Age | Commit message (Collapse) | Author |
|
All IO pages mapped to user space are handled by rdma_user_mmap_io(),
so add an empty stub for disassociate_ucontext.
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://patch.msgid.link/20240902112920.58749-3-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
We extracted the common parts of the initialization/destruction
process to make the code cleaner.
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://patch.msgid.link/20240902112920.58749-2-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
When creating kernel MRs, it is not known whether they will be used
for peer-to-peer transactions or for other use cases, since address
mapping is performed only after the MR is created.
Since peer-to-peer transactions benefit significantly from ATS
performance-wise, enable ATS on newly-allocated kernel MRs when
supported.
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Gal Shalom <galshalom@nvidia.com>
Link: https://patch.msgid.link/fafd4c9f14cf438d2882d88649c2947e1d05d0b4.1725273403.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The cited commit moves the pd allocation from function
mlx5r_umr_resource_cleanup() to a new function mlx5r_umr_cleanup().
So the fix in commit [1] is broken. In the error flow, the driver hits panic [2].
Fix it by checking the pd pointer to avoid the panic when it is NULL.
[1] RDMA/mlx5: Fix UMR cleanup on error flow of driver init
[2]
[ 347.567063] infiniband mlx5_0: Couldn't register device with driver model
[ 347.591382] BUG: kernel NULL pointer dereference, address: 0000000000000020
[ 347.593438] #PF: supervisor read access in kernel mode
[ 347.595176] #PF: error_code(0x0000) - not-present page
[ 347.596962] PGD 0 P4D 0
[ 347.601361] RIP: 0010:ib_dealloc_pd_user+0x12/0xc0 [ib_core]
[ 347.604171] RSP: 0018:ffff888106293b10 EFLAGS: 00010282
[ 347.604834] RAX: 0000000000000000 RBX: 000000000000000e RCX: 0000000000000000
[ 347.605672] RDX: ffff888106293ad0 RSI: 0000000000000000 RDI: 0000000000000000
[ 347.606529] RBP: 0000000000000000 R08: ffff888106293ae0 R09: ffff888106293ae0
[ 347.607379] R10: 0000000000000a06 R11: 0000000000000000 R12: 0000000000000000
[ 347.608224] R13: ffffffffa0704dc0 R14: 0000000000000001 R15: 0000000000000001
[ 347.609067] FS: 00007fdc720cd9c0(0000) GS:ffff88852c880000(0000) knlGS:0000000000000000
[ 347.610094] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 347.610727] CR2: 0000000000000020 CR3: 0000000103012003 CR4: 0000000000370eb0
[ 347.611421] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 347.612113] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 347.612804] Call Trace:
[ 347.613130] <TASK>
[ 347.613417] ? __die+0x20/0x60
[ 347.613793] ? page_fault_oops+0x150/0x3e0
[ 347.614243] ? free_msg+0x68/0x80 [mlx5_core]
[ 347.614840] ? cmd_exec+0x48f/0x11d0 [mlx5_core]
[ 347.615359] ? exc_page_fault+0x74/0x130
[ 347.615808] ? asm_exc_page_fault+0x22/0x30
[ 347.616273] ? ib_dealloc_pd_user+0x12/0xc0 [ib_core]
[ 347.616801] mlx5r_umr_cleanup+0x23/0x90 [mlx5_ib]
[ 347.617365] mlx5_ib_stage_pre_ib_reg_umr_cleanup+0x36/0x40 [mlx5_ib]
[ 347.618025] __mlx5_ib_add+0x96/0xd0 [mlx5_ib]
[ 347.618539] mlx5r_probe+0xe9/0x310 [mlx5_ib]
[ 347.619032] ? kernfs_add_one+0x107/0x150
[ 347.619478] ? __mlx5_ib_add+0xd0/0xd0 [mlx5_ib]
[ 347.619984] auxiliary_bus_probe+0x3e/0x90
[ 347.620448] really_probe+0xc5/0x3a0
[ 347.620857] __driver_probe_device+0x80/0x160
[ 347.621325] driver_probe_device+0x1e/0x90
[ 347.621770] __driver_attach+0xec/0x1c0
[ 347.622213] ? __device_attach_driver+0x100/0x100
[ 347.622724] bus_for_each_dev+0x71/0xc0
[ 347.623151] bus_add_driver+0xed/0x240
[ 347.623570] driver_register+0x58/0x100
[ 347.623998] __auxiliary_driver_register+0x6a/0xc0
[ 347.624499] ? driver_register+0xae/0x100
[ 347.624940] ? 0xffffffffa0893000
[ 347.625329] mlx5_ib_init+0x16a/0x1e0 [mlx5_ib]
[ 347.625845] do_one_initcall+0x4a/0x2a0
[ 347.626273] ? gcov_event+0x2e2/0x3a0
[ 347.626706] do_init_module+0x8a/0x260
[ 347.627126] init_module_from_file+0x8b/0xd0
[ 347.627596] __x64_sys_finit_module+0x1ca/0x2f0
[ 347.628089] do_syscall_64+0x4c/0x100
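The NULL check described in this message can be sketched in userspace C as follows. This is only an illustration under stated assumptions: `struct umr_ctx`, `struct fake_pd`, `fake_dealloc_pd()` and `umr_cleanup()` are hypothetical stand-ins and not the mlx5_ib code.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver's PD and UMR context. */
struct fake_pd { int id; };
struct umr_ctx { struct fake_pd *pd; };

static int dealloc_calls;	/* counts how often a PD was released */

static void fake_dealloc_pd(struct fake_pd *pd)
{
	(void)pd->id;		/* dereferences pd, so a NULL pd would fault */
	dealloc_calls++;
}

/* Cleanup that tolerates an error flow in which the PD was never allocated. */
static void umr_cleanup(struct umr_ctx *ctx)
{
	if (!ctx->pd)
		return;
	fake_dealloc_pd(ctx->pd);
	ctx->pd = NULL;
}
```

Calling `umr_cleanup()` on a context whose `pd` is still NULL returns without touching the pointer, which is the behaviour the fix aims for.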
Fixes: 638420115cc4 ("IB/mlx5: Create UMR QP just before first reg_mr occurs")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Link: https://patch.msgid.link/778c40c60287992da5d6ec92bb07b67f7bb5e6ef.1725273295.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Some of the adapters support Relaxed Ordering for the MRs.
The driver queries firmware for memory region relaxed ordering support and
sets the relaxed ordering bit in the REGISTER_MR request if the user
requests it. This is supported only if the PCIe device has the relaxed
ordering attribute enabled.
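The three conditions above (user request, firmware support, PCIe attribute) can be sketched as a single flag computation. The flag names and bit positions below are hypothetical and chosen only for this example; they are not the bnxt_re or firmware definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define REQ_ACCESS_RELAXED_ORDERING	(1u << 0)	/* requested by the user */
#define REG_MR_FLAG_RELAXED_ORDERING	(1u << 4)	/* bit in the register request */

static uint32_t reg_mr_flags(uint32_t user_access, bool fw_supports_ro,
			     bool pcie_ro_enabled)
{
	uint32_t flags = 0;

	/* All three conditions from the commit message must hold. */
	if ((user_access & REQ_ACCESS_RELAXED_ORDERING) &&
	    fw_supports_ro && pcie_ro_enabled)
		flags |= REG_MR_FLAG_RELAXED_ORDERING;

	return flags;
}
```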
Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Vijay Kumar Mandadapu <vijaykumar.mandadapu@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1725256351-12751-5-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Firmware now has a new MR registration command where
both MR allocation and registration can be done in a
single hwrm command. The driver has to issue this new hwrm
command whenever the support flag is set. This reduces
the number of hwrm commands issued per MR creation and speeds up
the MR creation.
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1725256351-12751-4-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Rename flags to access_flags for clarity.
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1725256351-12751-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Update the HW structures for PCIe relaxed ordering support.
The newly added interface structures will be used in the
follow-up patch.
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1725256351-12751-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
When mapping doorbell page from user-mode, the driver should use the system
page size as this memory is allocated via mmap() from user-mode.
Cc: stable@vger.kernel.org
Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter")
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://patch.msgid.link/1725030993-16213-2-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
MANA hardware uses 4k page size. When calculating the page table index,
it should use the hardware page size, not the system page size.
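A small sketch of why the page size matters for the index calculation. The 4 KiB hardware page comes from the message above; the function names and the example offset are assumptions made for illustration, not the mana_ib code.

```c
#include <stdint.h>

#define HW_PAGE_SHIFT 12u	/* 4 KiB hardware page, as stated above */

/* Index of the hardware page that contains byte `offset` of a buffer. */
static uint64_t hw_page_index(uint64_t offset)
{
	return offset >> HW_PAGE_SHIFT;
}

/* The same calculation with an arbitrary (e.g. system) page shift. */
static uint64_t page_index(uint64_t offset, unsigned int page_shift)
{
	return offset >> page_shift;
}
```

On a system with 64 KiB pages (shift 16), offset 0x10000 maps to index 1 when the system page size is used, but to index 16 in units of hardware pages, so the two calculations only agree when the system page size is also 4 KiB.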
Cc: stable@vger.kernel.org
Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter")
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://patch.msgid.link/1725030993-16213-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Gen P7 adapters need to share the toggle bits information received
in the kernel driver with user space. User space needs this
info to arm the SRQ.
A user space application can get this page using the
UAPI routines. The library will mmap this page and get the
toggle bits to be used in the next ARM Doorbell.
The driver uses a hash list to map the SRQ ID to the SRQ structure.
The SRQ structure is retrieved from the hash list when the
library calls the UAPI routine to get the toggle page
mapping. Currently the full page is mapped per SRQ. This
can be optimized to let multiple SRQs from the same
application share the same page at different offsets
in the page.
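The ID-to-structure lookup can be sketched with a small chained hash table. Everything here is hypothetical (the reduced `struct srq`, the bucket count, the helper names); the kernel driver would use its own hash list helpers instead.

```c
#include <stddef.h>
#include <stdint.h>

#define SRQ_HASH_BUCKETS 16u	/* assumed bucket count; must be a power of two */

struct srq {			/* hypothetical, reduced SRQ structure */
	uint32_t id;
	uint32_t toggle;
	struct srq *next;	/* chain within one bucket */
};

static struct srq *srq_table[SRQ_HASH_BUCKETS];

static unsigned int srq_bucket(uint32_t id)
{
	return id & (SRQ_HASH_BUCKETS - 1);
}

static void srq_hash_add(struct srq *srq)
{
	unsigned int b = srq_bucket(srq->id);

	srq->next = srq_table[b];
	srq_table[b] = srq;
}

/* Returns the SRQ with the given ID, or NULL if it is not registered. */
static struct srq *srq_hash_find(uint32_t id)
{
	struct srq *s;

	for (s = srq_table[srq_bucket(id)]; s; s = s->next)
		if (s->id == id)
			return s;
	return NULL;
}
```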
Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1724945645-14989-4-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Refactor the code in this function to have common code.
This is used in subsequent patches.
Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1724945645-14989-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
SRQ arming requires the toggle bits received from hardware.
Get the toggle bits from SRQ notification for the
gen p7 adapters. This value will be zero for the older adapters.
Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1724945645-14989-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Since commit be4c9bad9d0e ("MAINTAINERS: Add cxgb4 and iw_cxgb4 entries")
c4iw_post_terminate() declaration is not used anymore.
And other declarations were never implemented since introduction in
commit cfdda9d76436 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC").
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://patch.msgid.link/20240824091629.3659565-1-yuehaibing@huawei.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Selvin Xavier says:
=============
Enable the Variable size Work Queue entry support for Gen P7
adapters. This would help in the better utilization of the queue memory
and pci bandwidth due to the smaller send queue Work entries.
=============
Based on v6.11-rc5 for dependencies.
* bnxt_re_variable_wqes: (829 commits)
RDMA/bnxt_re: Enable variable size WQEs for user space applications
RDMA/bnxt_re: Handle variable WQE support for user applications
RDMA/bnxt_re: Fix the table size for PSN/MSN entries
RDMA/bnxt_re: Get the WQE index from slot index while completing the WQEs
RDMA/bnxt_re: Add support for Variable WQE in Genp7 adapters
Linux 6.11-rc5
...
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Add backward compatibility code to enable variable size WQEs only if the
user lib supports it.
Link: https://patch.msgid.link/r/1724042847-1481-6-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
User library calculates the number of slots required for user applications
and it can pass that information to the driver. Driver can use this value
and update the HW directly. This mechanism is currently used only for the
newly introduced variable size WQEs.
Extend the bnxt_re_qp_req structure to pass the Send Queue slot count.
Reorganize the code to get the sq_slots before initializing the Send Queue
attributes.
Link: https://patch.msgid.link/r/1724042847-1481-5-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
HW MSN table size is always a power of 2. So the pages should be mapped
accordingly.
Use the power of two calculation when getting the number of PSN/MSN entries.
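As a minimal sketch of the rounding involved, the helper below returns the smallest power of two that is not below the requested entry count. The name `roundup_pow2()` is an assumption for this example; the kernel provides its own helper for this.

```c
#include <stdint.h>

/* Smallest power of two >= n, valid for 1 <= n <= 2^31. */
static uint32_t roundup_pow2(uint32_t n)
{
	uint32_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}
```

A request for 1000 entries therefore sizes the table for 1024, while a request that is already a power of two is left unchanged.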
Fixes: 6f6bfbc595fb ("RDMA/bnxt_re: Expose the MSN table capability for user library")
Link: https://patch.msgid.link/r/1724042847-1481-4-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
While reporting the completions, SQ Work Queue index is required to
identify the WQE that generated the completions. In variable WQE mode, FW
returns the slot index for Error completions. The driver needs to walk through
the shadow queue between the consumer index and producer index and match
the slot index returned by FW. If a match is found, the next index of the
shadow queue is the WQE index to be considered for the remaining poll_cq loop.
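The walk described above can be sketched as a ring scan. The entry layout (`struct swq_entry` with a `slot_idx` field) and the function name are assumptions for illustration; how the driver actually records slot indices is not stated in this message.

```c
#include <stdint.h>

struct swq_entry {		/* hypothetical shadow-queue entry */
	uint32_t slot_idx;	/* first hardware slot used by this WQE */
};

/*
 * Walk the ring from cons up to (not including) prod and look for the entry
 * whose slot index matches the one reported by firmware. On a match, return
 * the index that follows it; return -1 when nothing matches.
 */
static int next_wqe_after_slot(const struct swq_entry *swq, uint32_t depth,
			       uint32_t cons, uint32_t prod, uint32_t fw_slot)
{
	uint32_t i;

	for (i = cons; i != prod; i = (i + 1) % depth)
		if (swq[i].slot_idx == fw_slot)
			return (int)((i + 1) % depth);
	return -1;
}
```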
Link: https://patch.msgid.link/r/1724042847-1481-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Variable size WQE means that each send Work Queue Entry to HW can use
different WQE sizes as opposed to the static WQE size on the current
devices. Set variable WQE mode for Gen P7 devices. The depth of the queue is
a multiple of the slot size, which is 16 bytes. The number of slots should be a
multiple of 256 as per the HW requirement.
Initialize the Software shadow queue to hold requests equal to the number
of slots. Also, do not expose the variable size WQE capability until the
last patch in the series.
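The two sizing rules (16-byte slots, slot count a multiple of 256) can be sketched as below. Sizing the queue from a WQE count and a per-WQE byte size is an assumption of this example, as is the function name; the driver may derive the slot count differently.

```c
#include <stdint.h>

#define SLOT_SIZE	16u	/* bytes per slot, from the text above */
#define SLOT_ALIGN	256u	/* slot count must be a multiple of this */

/* Slots needed for nwqes entries of wqe_bytes each, rounded up to 256. */
static uint32_t sq_slots(uint32_t nwqes, uint32_t wqe_bytes)
{
	uint32_t slots_per_wqe = (wqe_bytes + SLOT_SIZE - 1) / SLOT_SIZE;
	uint32_t slots = nwqes * slots_per_wqe;

	return (slots + SLOT_ALIGN - 1) / SLOT_ALIGN * SLOT_ALIGN;
}
```

For example, 100 WQEs of 48 bytes need 300 slots, which rounds up to 512.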
Link: https://patch.msgid.link/r/1724042847-1481-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Propagate the unique, per device, ID in the device attributes to the
standard node_guid value in IB device.
Link: https://patch.msgid.link/r/20240822171143.2800-1-mrgolin@amazon.com
Reviewed-by: Yonatan Nachum <ynachum@amazon.com>
Signed-off-by: Yehuda Yitschak <yehuday@amazon.com>
Signed-off-by: Michael Margolin <mrgolin@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Let alloc_ordered_workqueue() format the workqueue name instead of calling
snprintf() explicitly.
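The idea of letting the allocator format the name can be sketched in userspace with a variadic function. `struct fake_wq` and `fake_alloc_wq()` are hypothetical stand-ins; they only mimic the printf-style name argument that alloc_ordered_workqueue() accepts.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

struct fake_wq {		/* hypothetical userspace stand-in for a workqueue */
	char name[32];
};

/* Allocator that accepts a printf-style name, so callers need no buffer. */
static struct fake_wq *fake_alloc_wq(const char *fmt, ...)
{
	struct fake_wq *wq = calloc(1, sizeof(*wq));
	va_list ap;

	if (!wq)
		return NULL;
	va_start(ap, fmt);
	vsnprintf(wq->name, sizeof(wq->name), fmt, ap);
	va_end(ap);
	return wq;
}
```

A caller can then write `fake_alloc_wq("dev%d-wq", 3)` directly instead of filling a temporary buffer with snprintf() first.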
Link: https://patch.msgid.link/r/20240823101840.515398-5-ruanjinjie@huawei.com
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Let alloc_ordered_workqueue() format the workqueue name instead of calling
snprintf() explicitly.
Link: https://patch.msgid.link/r/20240823101840.515398-4-ruanjinjie@huawei.com
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Let alloc_ordered_workqueue() format the workqueue name instead of
calling snprintf() explicitly.
Link: https://patch.msgid.link/r/20240823101840.515398-2-ruanjinjie@huawei.com
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Commit e6fb246ccafb ("RDMA/mlx5: Consolidate MR destruction to
mlx5_ib_dereg_mr()") removed mlx5_ib_free_implicit_mr() but left
the declaration.
Commit d98995b4bf98 ("net/mlx5: Reimplement write combining test") left
mlx5_ib_test_wc().
Remove the unused declarations.
Link: https://patch.msgid.link/r/20240816101358.881247-1-yuehaibing@huawei.com
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Introduce the 'GET_DATA_DIRECT_SYSFS_PATH' ioctl to return the sysfs
path of the affiliated 'data direct' device for a given device.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://patch.msgid.link/403745463e0ef52adbef681ff09aa6a29a756352.1722512548.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add support for DMABUF MR registrations with Data-direct device.
Upon userspace calling to register a DMABUF MR with the data direct bit
set, the below algorithm will be followed.
1) Obtain a pinned DMABUF umem from the IB core using the user input
parameters (FD, offset, length) and the DMA PF device. The DMA PF
device is needed to allow the IOMMU to enable the DMA PF to access the
user buffer over PCI.
2) Create a KSM MKEY by setting its entries according to the user buffer
VA to IOVA mapping, with the MKEY being the data direct device-crossed
MKEY. This KSM MKEY is umrable and will be used as part of the MR cache.
The PD for creating it is the internal device 'data direct' kernel one.
3) Create a crossing MKEY that points to the KSM MKEY using the crossing
access mode.
4) Manage the KSM MKEY by adding it to a list of 'data direct' MKEYs
managed on the mlx5_ib device.
5) Return the crossing MKEY to the user, created with its supplied PD.
Upon DMA PF unbind flow, the driver will revoke the KSM entries.
The final deregistration will occur under the hood once the application
deregisters its MKEY.
Notes:
- This version supports only the PINNED UMEM mode, so there is no
dependency on ODP.
- The IOVA supplied by the application must be system page aligned due to
HW translations of KSM.
- The crossing MKEY will not be umrable or part of the MR cache, as we
cannot change its crossed (i.e. KSM) MKEY over UMR.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://patch.msgid.link/1f99d8020ed540d9702b9e2252a145a439609ba6.1722512548.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Pass uverbs_attr_bundle as part of '.reg_user_mr_dmabuf' API instead of
udata.
This enables passing some new ioctl attributes to the drivers, as will
be introduced in the next patches for mlx5 driver.
Change the involved drivers accordingly.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://patch.msgid.link/9a25b2fc02443f7c36c2d93499ae25252b6afd40.1722512548.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add the NET device initialization flow to utilize the 'data
direct' device.
When a NET mlx5_ib device is capable of 'data direct', the following
sequence of actions will occur:
- Find its affiliated 'data direct' VUID via a firmware command.
- Create its own private PD and 'data direct' mkey.
- Register to be notified when its 'data direct' driver is probed or removed.
The DMA device of the affiliated 'data direct' device, including the
private PD and the 'data direct' mkey, will be used later during MR
registrations that request the data direct functionality.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://patch.msgid.link/b11fa87b2a65bce4db8d40341bb6cee490fa4d06.1722512548.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Introduce the 'data direct' driver for a ConnectX-8 Data Direct device.
The 'data direct' driver functions as the affiliated DMA device for one
or more capable mlx5_ib devices. This DMA device, as the name suggests,
is used exclusively for DMA operations. It can be considered a DMA engine
managed by a PF/VF, lacking network capabilities and having minimal overall
capabilities.
Consequently, the DMA NIC PF will not be exposed to or directly used by
software applications. The driver will not have any direct interface or
interaction with the firmware (no command interface, no capabilities,
etc.). It will operate solely over PCI to enable its DMA functionality.
Registration and un-registration of the driver are handled as part of
the mlx5_ib initialization and exit processes, as the mlx5_ib devices
will effectively be its clients.
The driver will serve as the DMA device for accessing another PCI device
to achieve optimal performance (both on the same NUMA node, P2P access,
etc.).
Upon probing, it will read its VUID over PCI to handle mlx5_ib device
registrations with the same VUID.
Upon removal, it will notify its clients to allow them to clean up the
resources that were mmaped with its DMA device.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://patch.msgid.link/b77edecfd476c3f445da96ab6aef499ae47b2829.1722512548.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
In multiport mode, RDMA devices make it impossible for userspace to use
DEVX to discover vhca id values for ports beyond port 1. This patch
addresses the issue by exposing the vhca id of all ports.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Link: https://patch.msgid.link/41dea83aa51843aa4c067b4f73f28d64e51bd53c.1722331101.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
dma_alloc_coherent() used to allocate contiguous memory irrespective of
iommu mode, but after commit f5ff79fddf0e ("dma-mapping: remove CONFIG_DMA_REMAP")
if iommu is enabled in translate mode, dma_alloc_coherent() may
allocate non-contiguous memory. An attempt to map this memory results in a panic.
This patch fixes the issue by using dma_mmap_coherent() to map each page
to user space.
Signed-off-by: Anumula Murali Mohan Reddy <anumula@chelsio.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Link: https://lore.kernel.org/r/20240716142532.97423-1-anumula@chelsio.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu updates from Will Deacon:
"Core:
- Support for the "ats-supported" device-tree property
- Removal of the 'ops' field from 'struct iommu_fwspec'
- Introduction of iommu_paging_domain_alloc() and partial conversion
of existing users
- Introduce 'struct iommu_attach_handle' and provide corresponding
IOMMU interfaces which will be used by the IOMMUFD subsystem
- Remove stale documentation
- Add missing MODULE_DESCRIPTION() macro
- Misc cleanups
Allwinner Sun50i:
- Ensure bypass mode is disabled on H616 SoCs
- Ensure page-tables are allocated below 4GiB for the 32-bit
page-table walker
- Add new device-tree compatible strings
AMD Vi:
- Use try_cmpxchg64() instead of cmpxchg64() when updating pte
Arm SMMUv2:
- Print much more useful information on context faults
- Fix Qualcomm TBU probing when CONFIG_ARM_SMMU_QCOM_DEBUG=n
- Add new Qualcomm device-tree bindings
Arm SMMUv3:
- Support for hardware update of access/dirty bits and reporting via
IOMMUFD
- More driver rework from Jason, this time updating the PASID/SVA
support to prepare for full IOMMUFD support
- Add missing MODULE_DESCRIPTION() macro
- Minor fixes and cleanups
NVIDIA Tegra:
- Fix for benign fwspec initialisation issue exposed by rework on the
core branch
Intel VT-d:
- Use try_cmpxchg64() instead of cmpxchg64() when updating pte
- Use READ_ONCE() to read volatile descriptor status
- Remove support for handling Execute-Requested requests
- Avoid calling iommu_domain_alloc()
- Minor fixes and refactoring
Qualcomm MSM:
- Updates to the device-tree bindings"
* tag 'iommu-updates-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (72 commits)
iommu/tegra-smmu: Pass correct fwnode to iommu_fwspec_init()
iommu/vt-d: Fix identity map bounds in si_domain_init()
iommu: Move IOMMU_DIRTY_NO_CLEAR define
dt-bindings: iommu: Convert msm,iommu-v0 to yaml
iommu/vt-d: Fix aligned pages in calculate_psi_aligned_address()
iommu/vt-d: Limit max address mask to MAX_AGAW_PFN_WIDTH
docs: iommu: Remove outdated Documentation/userspace-api/iommu.rst
arm64: dts: fvp: Enable PCIe ATS for Base RevC FVP
iommu/of: Support ats-supported device-tree property
dt-bindings: PCI: generic: Add ats-supported property
iommu: Remove iommu_fwspec ops
OF: Simplify of_iommu_configure()
ACPI: Retire acpi_iommu_fwspec_ops()
iommu: Resolve fwspec ops automatically
iommu/mediatek-v1: Clean up redundant fwspec checks
RDMA/usnic: Use iommu_paging_domain_alloc()
wifi: ath11k: Use iommu_paging_domain_alloc()
wifi: ath10k: Use iommu_paging_domain_alloc()
drm/msm: Use iommu_paging_domain_alloc()
vhost-vdpa: Use iommu_paging_domain_alloc()
...
|
|
Pull rdma updates from Jason Gunthorpe:
"Usual collection of small improvements and fixes:
- Bug fixes and minor improvements in efa, irdma, mlx4, mlx5, rxe,
hfi1, qib, ocrdma
- bnxt_re support for MSN, which is a new retransmit logic
- Initial mana support for RC qps
- Use after free bug and cleanups in iwcm
- Reduce resource usage in mlx5 when RDMA verbs features are not used
- New verb to drain shared receive queues, similar to normal receive
queues. This is necessary to allow ULPs a clean shutdown. Used in
the iscsi rdma target
- mlx5 support for more than 16 bits of doorbell indexes
- Doorbell moderation support for bnxt_re
- IB multi-plane support for mlx5
- New EFA adaptor PCI IDs
- RDMA_NAME_ASSIGN_TYPE_USER to hint to userspace that it shouldn't
rename the device
- A collection of hns bugs
- Fix long standing bug in bnxt_re with incorrect endian handling of
immediate data"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (65 commits)
IB/hfi1: Constify struct flag_table
RDMA/mana_ib: Set correct device into ib
bnxt_re: Fix imm_data endianness
RDMA: Fix netdev tracker in ib_device_set_netdev
RDMA/hns: Fix mbx timing out before CMD execution is completed
RDMA/hns: Fix insufficient extend DB for VFs.
RDMA/hns: Fix undifined behavior caused by invalid max_sge
RDMA/hns: Fix shift-out-bounds when max_inline_data is 0
RDMA/hns: Fix missing pagesize and alignment check in FRMR
RDMA/hns: Fix unmatch exception handling when init eq table fails
RDMA/hns: Fix soft lockup under heavy CEQE load
RDMA/hns: Check atomic wr length
RDMA/ocrdma: Don't inline statistics functions
RDMA/core: Introduce "name_assign_type" for an IB device
RDMA/qib: Fix truncation compilation warnings in qib_verbs.c
RDMA/qib: Fix truncation compilation warnings in qib_init.c
RDMA/efa: Add EFA 0xefa3 PCI ID
RDMA/mlx5: Support per-plane port IB counters by querying PPCNT register
net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports
RDMA/mlx5: Add plane index support when querying PTYS registers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
aux-sysfs-irqs
Shay Says:
==========
Introduce auxiliary bus IRQs sysfs
Today, PCI PFs and VFs, which are anchored on the PCI bus, display their
IRQ information in the <pci_device>/msi_irqs/<irq_num> sysfs files. PCI
subfunctions (SFs) are similar to PFs and VFs and these SFs are anchored
on the auxiliary bus. However, these PCI SFs lack such IRQ information
on the auxiliary bus, leaving users without visibility into which IRQs
are used by the SFs. This absence makes it impossible to debug
situations and to understand the source of interrupts/SFs for
performance tuning and debug.
Additionally, the SFs are multifunctional devices supporting RDMA,
network devices, clocks, and more, similar to their peer PCI PFs and
VFs. Therefore, it is desirable to have SFs' IRQ information available
at the bus/device level.
To overcome the above limitations, this short series extends the
auxiliary bus to display IRQ information in sysfs, similar to that of
PFs and VFs.
It adds an 'irqs' directory under the auxiliary device and includes an
<irq_num> sysfs file within it.
For example:
$ ls /sys/bus/auxiliary/devices/mlx5_core.sf.1/irqs/
50 51 52 53 54 55 56 57 58
Patch summary:
patch-1 adds auxiliary bus support for showing IRQs used by an auxiliary device
patch-2 makes the mlx5 driver expose IRQs for PCI SF devices via the auxiliary
bus
==========
* tag 'aux-sysfs-irqs' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Expose SFs IRQs
driver core: auxiliary bus: show auxiliary device IRQs
RDMA/mlx5: Add Qcounters req_transport_retries_exceeded/req_rnr_retries_exceeded
net/mlx5: Reimplement write combining test
====================
Link: https://patch.msgid.link/20240711213140.256997-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The 'struct flag_table' instances are not modified in this driver.
Constifying this structure moves some data to a read-only section, which
increases overall security.
On x86_64, with allmodconfig:
Before:
======
  text    data   bss     dec     hex   filename
302932   40271   112  343315   53d13   drivers/infiniband/hw/hfi1/chip.o
After:
=====
  text    data   bss     dec     hex   filename
311636   31567   112  343315   53d13   drivers/infiniband/hw/hfi1/chip.o
Link: https://lore.kernel.org/r/782b6a648bfbbf2bb83f81a73c0460b5bb7642a1.1720959310.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Add mana_get_primary_netdev_rcu helper to get a primary
netdevice for a given port. When mana is used with
netvsc, the VF netdev is controlled by an upper netvsc
device. In a baremetal case, the VF netdev is the
primary device.
Use the mana_get_primary_netdev_rcu() helper in the mana_ib
to get the correct device for querying network states.
Fixes: 8b184e4f1c32 ("RDMA/mana_ib: Enable RoCE on port 1")
Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://lore.kernel.org/r/1720705077-322-1-git-send-email-kotaranov@linux.microsoft.com
Reviewed-by: Long Li <longli@microsoft.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
When mapping a device between servers with MLX and BCM RoCE NICs, the RTRS
server complains about an unknown imm type and can't map the device.
After more debugging, it seems bnxt_re handles the imm_data incorrectly;
this patch fixed the compat issue with MLX for us.
In an off-list discussion, Selvin confirmed that the HW works in little-endian format
and all data needs to be converted to LE when provided to it.
This patch fixes the endianness for imm_data.
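A byte-level sketch of such a conversion, assuming the immediate data arrives as a 32-bit big-endian (network order) value and the device expects little-endian. The helper names are made up for this example; kernel code would use its own byte-order helpers rather than open-coded shifts.

```c
#include <stdint.h>

/* Read a 32-bit big-endian (network order) value from 4 bytes. */
static uint32_t load_be32(const uint8_t b[4])
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Store a CPU value as 32-bit little-endian into 4 bytes. */
static void store_le32(uint8_t b[4], uint32_t v)
{
	b[0] = (uint8_t)v;
	b[1] = (uint8_t)(v >> 8);
	b[2] = (uint8_t)(v >> 16);
	b[3] = (uint8_t)(v >> 24);
}

/* Convert immediate data from wire order to the device's LE layout. */
static void imm_be_to_le(const uint8_t wire[4], uint8_t dev[4])
{
	store_le32(dev, load_be32(wire));
}
```

Working on byte arrays keeps the sketch independent of the host CPU's own endianness.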
Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/20240710122102.37569-1-jinpu.wang@ionos.com
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
When a large number of tasks are issued, the speed of HW processing
mbx will slow down. The standard for judging mbx timeout in the current
firmware is 30ms, and the current timeout standard for the driver is also
30ms.
Considering that firmware scheduling in multi-function scenarios takes a
certain amount of time, this will cause the driver to time out too early
and report a failure before mbx execution times out.
This patch introduces a new mechanism that can set different timeouts for
different cmds and extends the timeout of mbx to 35ms.
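One way to picture per-command timeouts is a lookup table with a fallback. The opcodes, the table layout and the 30ms fallback for other commands are assumptions of this sketch; only the 35ms mailbox value comes from the message above.

```c
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_CMD_TIMEOUT_MS	30u	/* assumed default for other commands */
#define MBX_CMD_TIMEOUT_MS	35u	/* extended mailbox timeout from the text */

enum { OPC_QUERY = 1, OPC_POST_MBOX = 2 };	/* hypothetical opcodes */

struct cmd_timeout {
	uint16_t opcode;
	uint32_t timeout_ms;
};

static const struct cmd_timeout cmd_timeouts[] = {
	{ OPC_POST_MBOX, MBX_CMD_TIMEOUT_MS },
};

/* Look up the timeout for an opcode; fall back to the default. */
static uint32_t cmd_timeout_ms(uint16_t opcode)
{
	size_t i;

	for (i = 0; i < sizeof(cmd_timeouts) / sizeof(cmd_timeouts[0]); i++)
		if (cmd_timeouts[i].opcode == opcode)
			return cmd_timeouts[i].timeout_ms;
	return DEFAULT_CMD_TIMEOUT_MS;
}
```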
Fixes: a04ff739f2a9 ("RDMA/hns: Add command queue support for hip08 RoCE driver")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20240710133705.896445-9-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
VFs and their PF share the memory of the extend DB. Currently,
the number of extend DBs allocated by the driver is only enough for the PF.
This can lead to DB loss and other problems in
scenarios where both the PF and VFs use a large number of QPs.
Fixes: 6b63597d3540 ("RDMA/hns: Add TSQ link table support")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20240710133705.896445-8-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
If max_sge has been set to 0, roundup_pow_of_two() in
set_srq_basic_param() may have undefined behavior.
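A userspace sketch of why a zero argument is a problem for a round-up implemented as "1 << fls(n - 1)": the subtraction wraps, and the shift exponent becomes the full width of the type. The helper names are hypothetical, and treating 0 as 1 in the guarded variant is an assumption of this example, not necessarily what the driver fix does.

```c
#include <stdint.h>

/* Position of the highest set bit, 1-based; 0 when x == 0. */
static unsigned int fls64_sketch(uint64_t x)
{
	unsigned int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/* Shift exponent a "1 << fls(n - 1)" style round-up would use for n. */
static unsigned int roundup_shift(uint64_t n)
{
	return fls64_sketch(n - 1);	/* n == 0 wraps to all ones: 64 */
}

/* Guarded round-up for n <= 2^63: treat 0 as 1 so the shift stays below 64. */
static uint64_t safe_roundup_pow2(uint64_t n)
{
	if (n == 0)
		n = 1;
	return UINT64_C(1) << roundup_shift(n);
}
```

`roundup_shift(0)` evaluates to 64, which is not a valid shift count for a 64-bit type, while the guarded variant never shifts by more than 63 within its stated range.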
Fixes: 9dd052474a26 ("RDMA/hns: Allocate one more recv SGE for HIP08")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20240710133705.896445-7-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
A shift-out-of-bounds may occur if max_inline_data has not been set.
The related log:
UBSAN: shift-out-of-bounds in kernel/include/linux/log2.h:57:13
shift exponent 64 is too large for 64-bit type 'long unsigned int'
Call trace:
dump_backtrace+0xb0/0x118
show_stack+0x20/0x38
dump_stack_lvl+0xbc/0x120
dump_stack+0x1c/0x28
__ubsan_handle_shift_out_of_bounds+0x104/0x240
set_ext_sge_param+0x40c/0x420 [hns_roce_hw_v2]
hns_roce_create_qp+0xf48/0x1c40 [hns_roce_hw_v2]
create_qp.part.0+0x294/0x3c0
ib_create_qp_kernel+0x7c/0x150
create_mad_qp+0x11c/0x1e0
ib_mad_init_device+0x834/0xc88
add_client_context+0x248/0x318
enable_device_and_get+0x158/0x280
ib_register_device+0x4ac/0x610
hns_roce_init+0x890/0xf98 [hns_roce_hw_v2]
__hns_roce_hw_v2_init_instance+0x398/0x720 [hns_roce_hw_v2]
hns_roce_hw_v2_init_instance+0x108/0x1e0 [hns_roce_hw_v2]
hclge_init_roce_client_instance+0x1a0/0x358 [hclge]
hclge_init_client_instance+0xa0/0x508 [hclge]
hnae3_register_client+0x18c/0x210 [hnae3]
hns_roce_hw_v2_init+0x28/0xff8 [hns_roce_hw_v2]
do_one_initcall+0xe0/0x510
do_init_module+0x110/0x370
load_module+0x2c6c/0x2f20
init_module_from_file+0xe0/0x140
idempotent_init_module+0x24c/0x350
__arm64_sys_finit_module+0x88/0xf8
invoke_syscall+0x68/0x1a0
el0_svc_common.constprop.0+0x11c/0x150
do_el0_svc+0x38/0x50
el0_svc+0x50/0xa0
el0t_64_sync_handler+0xc0/0xc8
el0t_64_sync+0x1a4/0x1a8
Fixes: 0c5e259b06a8 ("RDMA/hns: Fix incorrect sge nums calculation")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20240710133705.896445-6-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The offset requires 128B alignment and the page size ranges from
4K to 128M.
Fixes: 68a997c5d28c ("RDMA/hns: Add FRMR support for hip08")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20240710133705.896445-5-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
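
The two constraints stated above can be expressed as a small validity
check. This is a sketch only: the macro names and frmr_params_ok() are
hypothetical, and the real driver may perform the check differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Limits taken from the commit message; the names are hypothetical. */
#define SKETCH_OFFSET_ALIGN   128ULL          /* offset must be 128B aligned */
#define SKETCH_MIN_PAGE_SIZE  (4ULL << 10)    /* 4K */
#define SKETCH_MAX_PAGE_SIZE  (128ULL << 20)  /* 128M */

/* Return true when both the offset and the page size are acceptable. */
static bool frmr_params_ok(uint64_t offset, uint64_t page_size)
{
	if (offset & (SKETCH_OFFSET_ALIGN - 1))
		return false;
	if (page_size < SKETCH_MIN_PAGE_SIZE || page_size > SKETCH_MAX_PAGE_SIZE)
		return false;
	return true;
}
```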
|
|
The hw ctx should be destroyed when initializing the EQ table fails.
Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20240710133705.896445-4-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
CEQEs are currently handled in the interrupt handler. This may cause the
CPU core to stay in interrupt context for too long and lead to a soft
lockup under heavy load.
Handle CEQEs in a BH workqueue and set an upper limit on the number of
CEQEs handled by a single call of the work handler.
Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20240710133705.896445-3-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
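
The idea of capping the work done per handler call can be sketched in
plain userspace C. Everything here is hypothetical: struct sketch_eq,
SKETCH_CEQE_BUDGET and sketch_ceq_work() only model the control flow, and
the budget value is arbitrary rather than the driver's actual limit.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_CEQE_BUDGET 4	/* hypothetical per-call upper limit */

struct sketch_eq {
	size_t pending;	/* entries waiting to be handled */
	size_t handled;	/* entries handled so far */
};

/*
 * Handle at most SKETCH_CEQE_BUDGET entries in one call.
 * Returns true if entries remain, meaning the caller should queue the
 * work again instead of looping here.
 */
static bool sketch_ceq_work(struct sketch_eq *eq)
{
	size_t done = 0;

	while (eq->pending > 0 && done < SKETCH_CEQE_BUDGET) {
		eq->pending--;
		eq->handled++;
		done++;
	}
	return eq->pending > 0;
}
```

Returning early and rescheduling bounds the time spent in one invocation,
which is the property that avoids the soft lockup described above.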
|
|
8 bytes is the only supported length for atomic operations. Add this
check in set_rc_wqe(). In addition, stop processing WQEs and return from
set_rc_wqe() if any error occurs.
Fixes: 384f88185112 ("RDMA/hns: Add atomic support")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20240710133705.896445-2-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
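
A minimal sketch of the two behaviors described above: rejecting atomic
requests whose length is not 8 bytes, and stopping at the first failing
request. The struct and function names are hypothetical and do not
reflect the driver's real data structures.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_ATOMIC_LEN 8	/* only supported atomic length, per the commit message */

/* Hypothetical, simplified work request. */
struct sketch_wr {
	int is_atomic;
	size_t length;
};

/* Validate one request; atomics must be exactly 8 bytes long. */
static int sketch_check_wr(const struct sketch_wr *wr)
{
	if (wr->is_atomic && wr->length != SKETCH_ATOMIC_LEN)
		return -EINVAL;
	return 0;
}

/*
 * Walk the requests in order and stop at the first error.
 * *posted receives the number of requests accepted before the failure.
 */
static int sketch_post(const struct sketch_wr *wrs, size_t n, size_t *posted)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < n; i++) {
		ret = sketch_check_wr(&wrs[i]);
		if (ret)
			break;
	}
	*posted = i;
	return ret;
}
```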
|
|
Fix the problem of KASAN causing the stack frame size to increase:
drivers/infiniband/hw/ocrdma/ocrdma_stats.c:686:16: error: stack frame size (20664) exceeds limit (8192) in 'ocrdma_dbgfs_ops_read' [-Werror,-Wframe-larger-than]
static ssize_t ocrdma_dbgfs_ops_read(struct file *filp, char __user *buffer,
^
Some functions called by ocrdma_dbgfs_ops_read occupy a lot of stack space.
Mark these functions as noinline_for_stack to prevent them from accumulating
in ocrdma_dbgfs_ops_read.
Signed-off-by: Peng Hao <flyingpeng@tencent.com>
Link: https://lore.kernel.org/r/20240710091657.26291-1-flyingpeng@tencent.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
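
The effect of noinline_for_stack can be sketched in userspace. The sketch
assumes a GCC or Clang compiler, where __attribute__((noinline)) is
available; the macro sketch_noinline_for_stack and both functions are
hypothetical stand-ins for the ocrdma helpers.

```c
#include <stdio.h>
#include <string.h>

/* Assumption: GCC/Clang attribute syntax. */
#define sketch_noinline_for_stack __attribute__((noinline))

/*
 * Helper with a large local buffer. Keeping it out of line means its
 * frame exists only while it runs, instead of being merged into the caller.
 */
static sketch_noinline_for_stack size_t
sketch_fill_stat(char *out, size_t len, const char *name, unsigned long value)
{
	char scratch[4096];
	int n = snprintf(scratch, sizeof(scratch), "%s=%lu\n", name, value);
	size_t copy;

	if (n < 0 || len == 0)
		return 0;
	copy = (size_t)n < sizeof(scratch) ? (size_t)n : sizeof(scratch) - 1;
	if (copy > len - 1)
		copy = len - 1;
	memcpy(out, scratch, copy);
	out[copy] = '\0';
	return copy;
}

/* Caller stays small: it owns no large buffers of its own. */
static size_t sketch_read_stats(char *out, size_t len)
{
	size_t off = sketch_fill_stat(out, len, "rx", 1);

	off += sketch_fill_stat(out + off, len - off, "tx", 2);
	return off;
}
```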
|
|
usnic_uiom_alloc_pd() allocates a paging domain for a given device.
In this case, iommu_domain_alloc(dev->bus) is equivalent to
iommu_paging_domain_alloc(dev). Replace it as iommu_domain_alloc()
has been deprecated.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20240610085555.88197-15-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The name_assign_type indicates how the name is provided. Currently
these types are supported:
- RDMA_NAME_ASSIGN_TYPE_UNKNOWN: Unknown or not set;
- RDMA_NAME_ASSIGN_TYPE_USER: Name is provided by the user.
User-created sub devices, rxe devices, and siw devices have this type.
When filling nl device info, it is set in the new attribute
RDMA_NLDEV_ATTR_NAME_ASSIGN_TYPE. User-space tools like udev
"rdma_rename" can check this attribute to determine whether the
device needs to be renamed.
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/522591bef9a369cc8e5dcb77787e017bffee37fe.1719837610.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Reduce nodename string size to fit IB_DEVICE_NODE_DESC_MAX.
drivers/infiniband/hw/qib/qib_verbs.c: In function ‘qib_register_ib_device’:
drivers/infiniband/hw/qib/qib_verbs.c:1554:40: error: ‘%s’ directive output may be truncated writing up to 64 bytes into a region of size 43
[-Werror=format-truncation=]
1554 | "Intel Infiniband HCA %s", init_utsname()->nodename);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/hw/qib/qib_verbs.c:1553:9: note: ‘snprintf’ output between 22 and 86 bytes into a destination of size 64
1553 | snprintf(ibdev->node_desc, sizeof(ibdev->node_desc),
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1554 | "Intel Infiniband HCA %s", init_utsname()->nodename);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Link: https://lore.kernel.org/r/1fb6393fa2e0702fef995834c3c7db972bbc4d06.1719837715.git.leon@kernel.org
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
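
One way to keep such a format within the destination size is to bound the
%s argument with a precision. The sketch below is an illustration only and
is not necessarily the change made in this patch. SKETCH_NODE_DESC_MAX
stands in for the 64-byte destination reported by the compiler, and
sketch_set_node_desc() is a hypothetical name.

```c
#include <stdio.h>

#define SKETCH_NODE_DESC_MAX 64	/* destination size from the compiler note */

/*
 * Write "Intel Infiniband HCA <nodename>" into dst, limiting the node
 * name so the result always fits, including the terminating NUL.
 * Returns what snprintf() returns.
 */
static int sketch_set_node_desc(char dst[SKETCH_NODE_DESC_MAX],
				const char *nodename)
{
	static const char prefix[] = "Intel Infiniband HCA ";
	/* sizeof(prefix) counts the NUL, so this leaves room for it. */
	int room = SKETCH_NODE_DESC_MAX - (int)sizeof(prefix);

	return snprintf(dst, SKETCH_NODE_DESC_MAX, "%s%.*s",
			prefix, room, nodename);
}
```

Because the precision caps the node name, the formatted length can never
exceed the buffer, so the truncation diagnostic no longer applies.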
|