summaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw
AgeCommit message (Collapse)Author
2024-10-11RDMA/bnxt_re: Fix a possible NULL pointer dereferenceKalesh AP
There is a possibility of a NULL pointer dereference in the failure path of bnxt_re_add_device(). To address that, moved the update of "rdev->adev" to bnxt_re_dev_add(). Fixes: dee3da3422d5 ("RDMA/bnxt_re: Change aux driver data to en_info to hold more information") Link: https://patch.msgid.link/r/1728373302-19530-6-git-send-email-selvin.xavier@broadcom.com Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/linux-rdma/CAH-L+nMCwymKGqf5pd8-FZNhxEkDD=kb6AoCaE6fAVi7b3e5Qw@mail.gmail.com/T/#t Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-11RDMA/bnxt_re: Return more meaningful errorKalesh AP
When the HWRM command fails, driver currently returns -EFAULT(Bad address). This does not look correct. Modified to return -EIO(I/O error). Fixes: cc1ec769b87c ("RDMA/bnxt_re: Fixing the Control path command and response handling") Fixes: 65288a22ddd8 ("RDMA/bnxt_re: use shadow qd while posting non blocking rcfw command") Link: https://patch.msgid.link/r/1728373302-19530-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-11RDMA/bnxt_re: Fix incorrect dereference of srq in async eventKashyap Desai
Currently driver is not getting correct srq. Dereference only if qplib has a valid srq. Fixes: b02fd3f79ec3 ("RDMA/bnxt_re: Report async events and errors") Link: https://patch.msgid.link/r/1728373302-19530-4-git-send-email-selvin.xavier@broadcom.com Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-11RDMA/bnxt_re: Fix out of bound checkKalesh AP
Driver exports pacing stats only on GenP5 and P7 adapters. But while parsing the pacing stats, driver has a check for "rdev->dbr_pacing". This caused a trace when KASAN is enabled. BUG: KASAN: slab-out-of-bounds in bnxt_re_get_hw_stats+0x2b6a/0x2e00 [bnxt_re] Write of size 8 at addr ffff8885942a6340 by task modprobe/4809 Fixes: 8b6573ff3420 ("bnxt_re: Update the debug counters for doorbell pacing") Link: https://patch.msgid.link/r/1728373302-19530-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-11RDMA/bnxt_re: Fix the max CQ WQEs for older adaptersAbhishek Mohapatra
Older adapters doesn't support the MAX CQ WQEs reported by older FW. So restrict the value reported to 1M always for older adapters. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Link: https://patch.msgid.link/r/1728373302-19530-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Abhishek Mohapatra<abhishek.mohapatra@broadcom.com> Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-11RDMA/irdma: Fix misspelling of "accept*"Alexander Zubkov
There is "accept*" misspelled as "accpet*" in the comments. Fix the spelling. Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Link: https://patch.msgid.link/r/20241008161913.19965-1-green@qrator.net Signed-off-by: Alexander Zubkov <green@qrator.net> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-11RDMA/cxgb4: Fix RDMA_CM_EVENT_UNREACHABLE error for iWARPAnumula Murali Mohan Reddy
ip_dev_find() always returns real net_device address, whether traffic is running on a vlan or real device, if traffic is over vlan, filling endpoint struture with real ndev and an attempt to send a connect request will results in RDMA_CM_EVENT_UNREACHABLE error. This patch fixes the issue by using vlan_dev_real_dev(). Fixes: 830662f6f032 ("RDMA/cxgb4: Add support for active and passive open connection with IPv6 address") Link: https://patch.msgid.link/r/20241007132311.70593-1-anumula@chelsio.com Signed-off-by: Anumula Murali Mohan Reddy <anumula@chelsio.com> Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-10-08IB/hfi1: make clear_all_interrupts staticDr. David Alan Gilbert
clear_all_interrupts() in hw/hfi1/chip.c is currently global but only used in the same file, so make it static. There are also 'clear_all_interrupts' functions in i2c-nomadik and emif.c but fortunately they're already static. (Build and boot tested only, I don't have this hardware) Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patch.msgid.link/20241007235327.128613-1-linux@treblig.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-08RDMA/bnxt_re: Fix the max WQEs used in Static WQE modeSelvin Xavier
max_sw_wqe used for static wqe mode should be same as the max_wqe. Calculate the max_sw_wqe only for the variable WQE mode. Fixes: de1d364c3815 ("RDMA/bnxt_re: Add support for Variable WQE in Genp7 adapters") Link: https://patch.msgid.link/r/1726715161-18941-7-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-08RDMA/bnxt_re: Add a check for memory allocationKalesh AP
__alloc_pbl() can return error when memory allocation fails. Driver is not checking the status on one of the instances. Fixes: 0c4dcd602817 ("RDMA/bnxt_re: Refactor hardware queue memory allocation") Link: https://patch.msgid.link/r/1726715161-18941-4-git-send-email-selvin.xavier@broadcom.com Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-08RDMA/bnxt_re: Fix incorrect AVID type in WQE structureSaravanan Vajravel
Driver uses internal data structure to construct WQE frame. It used avid type as u16 which can accommodate up to 64K AVs. When outstanding AVID crosses 64K, driver truncates AVID and hence it uses incorrect AVID to WR. This leads to WR failure due to invalid AV ID and QP is moved to error state with reason set to 19 (INVALID AVID). When RDMA CM path is used, this issue hits QP1 and it is moved to error state Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Link: https://patch.msgid.link/r/1726715161-18941-3-git-send-email-selvin.xavier@broadcom.com Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-08RDMA/bnxt_re: Fix a possible memory leakKalesh AP
In bnxt_re_setup_chip_ctx() when bnxt_qplib_map_db_bar() fails driver is not freeing the memory allocated for "rdev->chip_ctx". Fixes: 0ac20faf5d83 ("RDMA/bnxt_re: Reorg the bar mapping") Link: https://patch.msgid.link/r/1726715161-18941-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-10-07RDMA/hns: Disassociate mmap pages for all uctx when HW is being resetChengchang Tang
When HW is being reset, userspace should not ring doorbell otherwise it may lead to abnormal consequence such as RAS. Disassociate mmap pages for all uctx to prevent userspace from ringing doorbell to HW. Since all resources will be destroyed during HW reset, no new mmap is allowed after HW reset is completed. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20240927103323.1897094-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-27[tree-wide] finally take no_llseek outAl Viro
no_llseek had been defined to NULL two years ago, in commit 868941b14441 ("fs: remove no_llseek") To quote that commit, At -rc1 we'll need do a mechanical removal of no_llseek - git grep -l -w no_llseek | grep -v porting.rst | while read i; do sed -i '/\<no_llseek\>/d' $i done would do it. Unfortunately, that hadn't been done. Linus, could you do that now, so that we could finally put that thing to rest? All instances are of the form .llseek = no_llseek, so it's obviously safe. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-22RDMA/bnxt_re: Remove the unused variable en_devJiapeng Chong
Variable en_dev is not effectively used, so delete it. drivers/infiniband/hw/bnxt_re/main.c:1980:22: warning: variable ‘en_dev’ set but not used. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=10867 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://patch.msgid.link/20240918021632.36091-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-16RDMA/irdma: fix error message in irdma_modify_qp_roce()Vitaliy Shevtsov
Use a correct field max_dest_rd_atomic instead of max_rd_atomic for the error output. Found by Linux Verification Center (linuxtesting.org) with Svace. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Vitaliy Shevtsov <v.shevtsov@maxima.ru> Link: https://lore.kernel.org/stable/20240916165817.14691-1-v.shevtsov%40maxima.ru Link: https://patch.msgid.link/20240916165817.14691-1-v.shevtsov@maxima.ru Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-16RDMA/cxgb4: Added NULL check for lookup_atidMikhail Lobanov
The lookup_atid() function can return NULL if the ATID is invalid or does not exist in the identifier table, which could lead to dereferencing a null pointer without a check in the `act_establish()` and `act_open_rpl()` functions. Add a NULL check to prevent null pointer dereferencing. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: cfdda9d76436 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC") Signed-off-by: Mikhail Lobanov <m.lobanov@rosalinux.ru> Link: https://patch.msgid.link/20240912145844.77516-1-m.lobanov@rosalinux.ru Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-16RDMA/hns: Fix ah error counter in sw stat not increasingJunxian Huang
There are several error cases where hns_roce_create_ah() returns directly without jumping to sw stat path, thus leading to a problem that the ah error counter does not increase. Fixes: ee20cc17e9d8 ("RDMA/hns: Support DSCP") Fixes: eb7854d63db5 ("RDMA/hns: Support SW stats with debugfs") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20240912115700.2016443-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-13RDMA/bnxt_re: Recover the device when FW error is detectedSelvin Xavier
If the FW crashes, L2 driver gets notified and it notifies the RoCE driver. Currently driver doesn't re-initialize the device. Add support for re-initialize the RoCE device. RoCE device is removed and re-attached in the ulp_stop and ulp_start respectively. The recovery logic expects the RoCE driver to be registered with L2 driver while its being removed. So the driver avoids unregistering with L2 driver in the recovery path. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1726027710-2292-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-13RDMA/bnxt_re: Group all operations under add_device and remove_deviceSelvin Xavier
Adding and removing device need to be handled from multiple contexts when Firmware error recovery is supported. So group all the add and remove operations to add_device and remove_device function. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1726027710-2292-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-13RDMA/bnxt_re: Use the aux device for L2 ULP callbacksChandramohan Akula
While registering with the L2 for ULP operations, use the aux device pointer as the handle. Aux device has the data bnxt_re_en_dev_info, which is used to store required information for the bnxt_re_suspend and bnxt_re_resume functions. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1726027710-2292-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-13RDMA/bnxt_re: Change aux driver data to en_info to hold more informationChandramohan Akula
rdev will be destroyed and recreated during the FW error recovery scenarios. So to keep the state, if any, use an en_info structure which gets created/freed based on auxiliary device initialization/de-initialization. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1726027710-2292-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-13RDMA/mlx5: Use IB set_netdev and get_netdev functionsChiara Meiohas
The IB layer provides a common interface to store and get net devices associated to an IB device port (ib_device_set_netdev() and ib_device_get_netdev()). Previously, mlx5_ib stored and managed the associated net devices internally. Replace internal net device management in mlx5_ib with ib_device_set_netdev() when attaching/detaching a net device and ib_device_get_netdev() when retrieving the net device. Export ib_device_get_netdev(). For mlx5 representors/PFs/VFs and lag creation we replace the netdev assignments with the IB set/get netdev functions. In active-backup mode lag the active slave net device is stored in the lag itself. To assure the net device stored in a lag bond IB device is the active slave we implement the following: - mlx5_core: when modifying the slave of a bond we send the internal driver event MLX5_DRIVER_EVENT_ACTIVE_BACKUP_LAG_CHANGE_LOWERSTATE. - mlx5_ib: when catching the event call ib_device_set_netdev() This patch also ensures the correct IB events are sent in switchdev lag. While at it, when in multiport eswitch mode, only a single IB device is created for all ports. The said IB device will receive all netdev events of its VFs once loaded, thus to avoid overwriting the mapping of PF IB device to PF netdev, ignore NETDEV_REGISTER events if the ib device has already been mapped to a netdev. Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/20240909173025.30422-6-michaelgur@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-13RDMA/mlx5: Initialize phys_port_cnt earlier in RDMA device creationChiara Meiohas
phys_port_cnt of the IB device must be initialized before calling ib_device_set_netdev(). Previously, phys_port_cnt was initialized in the mlx5_ib init function. Remove this initialization to allow setting it separately, providing the flexibility to call ib_device_set_netdev before registering the IB device. Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/20240909173025.30422-4-michaelgur@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-13RDMA/mlx5: Obtain upper net device only when neededMark Bloch
Report the upper device's state as the RDMA port state only in RoCE LAG or switchdev LAG. Fixes: 27f9e0ccb6da ("net/mlx5: Lag, Add single RDMA device in multiport mode") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/20240909173025.30422-3-michaelgur@nvidia.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-13RDMA/mlx5: Check RoCE LAG status before getting netdevMark Bloch
Check if RoCE LAG is active before calling the LAG layer for netdev. This clarifies if LAG is active. No behavior changes with this patch. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/20240909173025.30422-2-michaelgur@nvidia.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-13RDMA/mlx5: Consider the query_vuid cap for data_directYishai Hadas
Consider also the query_vuid cap before enabling the data_direct functionality. This may prevent a syndrome from the FW in case the query_vuid command is not supported. (e.g. migratable VF) Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Gal Shalom <galshalom@nvidia.com> Link: https://patch.msgid.link/274c4f6f1ac0b1078243dd296695a49dbe58e7d1.1725907637.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-11RDMA/mlx5: Add implicit MR handling to ODP memory schemeMichael Guralnik
Implicit MRs in ODP memory scheme require allocating a private null mkey and assigning the mkey and va differently in the KSM mkey. The page faults are received on the null mkey so we also add storing the null mkey in the odp_mkey xarray. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/20240909100504.29797-8-michaelgur@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-11RDMA/mlx5: Add handling for memory scheme page fault eventsMichael Guralnik
The memory scheme page fault event is a new approch in handling page fault on mkeys using the on-demand-paging feature. The major shift in handling the page fault in this scheme is that the HW is taking responsibilty for parsing the faulted mkey instead of the previous approach where the driver would read and parse the wqes and query the mkeys to get to the direct mkey that we need to handle. Therefore, the event we get from FW in this scheme will contain the direct mkey and address we need to handle and require much less work from driver. Additionally, to optimize performance, the FW can generate the event on a memory area that is larger than the faulted memory operation is requiring, to 'prefetch' memory that is around it and will likely be used soon. Unlike previous types of page fault, the memory page scheme fault does not always require a resume command after handling the page fault as the FW can post multiple events on same mkey and will set the 'last' flag only on the page fault that requires the resume command. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/20240909100504.29797-7-michaelgur@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-11RDMA/mlx5: Split ODP mkey search logicMichael Guralnik
Split the search for the ODP mkey when handling an rdma type page fault to a helper function, later to be used in other page fault types. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/20240909100504.29797-6-michaelgur@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-11RDMA/mlx5: Enforce umem boundaries for explicit ODP page faultsMichael Guralnik
The new memory scheme page faults are requesting the driver to fetch additinal pages to the faulted memory access. This is done in order to prefetch pages before and after the area that got the page fault, assuming this will reduce the total amount of page faults. The driver should ensure it handles only the pages that are within the umem range. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/20240909100504.29797-5-michaelgur@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-11RDMA/mlx5: Add new ODP memory scheme eqe formatMichael Guralnik
Add new fields to support the new memory scheme page fault and extend the token field to u64 as in the new scheme the token is 48 bit. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/20240909100504.29797-4-michaelgur@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-11net/mlx5: Expose HW bits for Memory scheme ODPMichael Guralnik
Expose IFC bits to support the new memory scheme on demand paging. Change the macro reading odp capabilities to be able to read from the new IFC layout and align the code in upper layers to be compiled. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/20240909100504.29797-3-michaelgur@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-11net/mlx5: Expand mkey page size to support 6 bitsMichael Guralnik
Protect the usage of the 6th bit with the relevant capability to ensure we are using the new page sizes with FW that supports the bit extension. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/20240909100504.29797-2-michaelgur@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-10RDMA/hns: Fix restricted __le16 degrades to integer issueJunxian Huang
Fix sparse warnings: restricted __le16 degrades to integer. Fixes: 5a87279591a1 ("RDMA/hns: Support hns HW stats") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202409080508.g4mNSLwy-lkp@intel.com/ Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20240909065331.3950268-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-10IB/qib: Remove unused declarations in header fileZhang Zekun
The definition of qib_rc_rnr_retry() has been removed since commit b4238e70579c ("IB/qib: Use new rdmavt timers"). Also, the definition of mr_rcu_callback() has been remove since commit 7c2e11fe2dbe ("IB/qib: Remove qp and mr functionality from qib"). So, let's remove the unused declartions. Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Link: https://patch.msgid.link/20240909121408.80079-3-zhangzekun11@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-10RDMA/hns: Optimize hem allocation performanceJunxian Huang
When allocating MTT hem, for each hop level of each hem that is being allocated, the driver iterates the hem list to find out whether the bt page has been allocated in this hop level. If not, allocate a new one and splice it to the list. The time complexity is O(n^2) in worst cases. Currently the allocation for-loop uses 'unit' as the step size. This actually has taken into account the reuse of last-hop-level MTT bt pages by multiple buffer pages. Thus pages of last hop level will never have been allocated, so there is no need to iterate the hem list in last hop level. Removing this unnecessary iteration can reduce the time complexity to O(n). Fixes: 38389eaa4db1 ("RDMA/hns: Add mtr support for mixed multihop addressing") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20240906093444.3571619-9-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-10RDMA/hns: Fix 1bit-ECC recovery address in non-4K OSChengchang Tang
The 1bit-ECC recovery address read from HW only contain bits 64:12, so it should be fixed left-shifted 12 bits when used. Currently, the driver will shift the address left by PAGE_SHIFT when used, which is wrong in non-4K OS. Fixes: 2de949abd6a5 ("RDMA/hns: Recover 1bit-ECC error of RAM on chip") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20240906093444.3571619-8-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-10RDMA/hns: Fix VF triggering PF reset in abnormal interrupt handlerJunxian Huang
In abnormal interrupt handler, a PF reset will be triggered even if the device is a VF. It should be a VF reset. Fixes: 2b9acb9a97fe ("RDMA/hns: Add the process of AEQ overflow for hip08") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20240906093444.3571619-7-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-10RDMA/hns: Fix spin_unlock_irqrestore() called with IRQs enabledChengchang Tang
Fix missuse of spin_lock_irq()/spin_unlock_irq() when spin_lock_irqsave()/spin_lock_irqrestore() was hold. This was discovered through the lock debugging, and the corresponding log is as follows: raw_local_irq_restore() called with IRQs enabled WARNING: CPU: 96 PID: 2074 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x30/0x40 ... Call trace: warn_bogus_irq_restore+0x30/0x40 _raw_spin_unlock_irqrestore+0x84/0xc8 add_qp_to_list+0x11c/0x148 [hns_roce_hw_v2] hns_roce_create_qp_common.constprop.0+0x240/0x780 [hns_roce_hw_v2] hns_roce_create_qp+0x98/0x160 [hns_roce_hw_v2] create_qp+0x138/0x258 ib_create_qp_kernel+0x50/0xe8 create_mad_qp+0xa8/0x128 ib_mad_port_open+0x218/0x448 ib_mad_init_device+0x70/0x1f8 add_client_context+0xfc/0x220 enable_device_and_get+0xd0/0x140 ib_register_device.part.0+0xf4/0x1c8 ib_register_device+0x34/0x50 hns_roce_register_device+0x174/0x3d0 [hns_roce_hw_v2] hns_roce_init+0xfc/0x2c0 [hns_roce_hw_v2] __hns_roce_hw_v2_init_instance+0x7c/0x1d0 [hns_roce_hw_v2] hns_roce_hw_v2_init_instance+0x9c/0x180 [hns_roce_hw_v2] Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20240906093444.3571619-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-10RDMA/hns: Fix the overflow risk of hem_list_calc_ba_range()wenglianfa
The max value of 'unit' and 'hop_num' is 2^24 and 2, so the value of 'step' may exceed the range of u32. Change the type of 'step' to u64. Fixes: 38389eaa4db1 ("RDMA/hns: Add mtr support for mixed multihop addressing") Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20240906093444.3571619-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-10RDMA/hns: Fix Use-After-Free of rsv_qp on HIP08wenglianfa
Currently rsv_qp is freed before ib_unregister_device() is called on HIP08. During the time interval, users can still dereg MR and rsv_qp will be used in this process, leading to a UAF. Move the release of rsv_qp after calling ib_unregister_device() to fix it. Fixes: 70f92521584f ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT") Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20240906093444.3571619-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-10RDMA/hns: Don't modify rq next block addr in HIP09 QPCJunxian Huang
The field 'rq next block addr' in QPC can be updated by driver only on HIP08. On HIP09 HW updates this field while driver is not allowed. Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20240906093444.3571619-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-09RDMA/bnxt_re: Fix the max WQE size for static WQE supportSelvin Xavier
When variable size WQE is supported, max_qp_sges reported is more than 6. For devices that supports variable size WQE, the Send WQE size calculation is wrong when an an older library that doesn't support variable size WQE is used. Set the WQE size to 128 when static WQE is supported. Fixes: de1d364c3815 ("RDMA/bnxt_re: Add support for Variable WQE in Genp7 adapters") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1725444253-13221-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-09RDMA/bnxt_re: Fix the compatibility flag for variable size WQESelvin Xavier
For older adapters that doesn't support variable size WQE, driver is wrongly reporting that variable WQE is supported, when the latest library is used. Report the variable WQE capability only if the driver supports it. Fixes: 10a104c0debb ("RDMA/bnxt_re: Enable variable size WQEs for user space applications") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1725444253-13221-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-09RDMA/mlx5: Fix MR cache temp entries cleanupMichael Guralnik
Fix the cleanup of the temp cache entries that are dynamically created in the MR cache. The cleanup of the temp cache entries is currently scheduled only when a new entry is created. Since in the cleanup of the entries only the mkeys are destroyed and the cache entry stays in the cache, subsequent registrations might reuse the entry and it will eventually be filled with new mkeys without cleanup ever getting scheduled again. On workloads that register and deregister MRs with a wide range of properties we see the cache ends up holding many cache entries, each holding the max number of mkeys that were ever used through it. Additionally, as the cleanup work is scheduled to run over the whole cache, any mkey that is returned to the cache after the cleanup was scheduled will be held for less than the intended 30 seconds timeout. Solve both issues by dropping the existing remove_ent_work and reusing the existing per-entry work to also handle the temp entries cleanup. Schedule the work to run with a 30 seconds delay every time we push an mkey to a clean temp entry. This ensures the cleanup runs on each entry only 30 seconds after the first mkey was pushed to an empty entry. As we have already been distinguishing between persistent and temp entries when scheduling the cache_work_func, it is not being scheduled in any other flows for the temp entries. Another benefit from moving to a per-entry cleanup is we now not required to hold the rb_tree mutex, thus enabling other flow to run concurrently. Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/e4fa4bb03bebf20dceae320f26816cd2dde23a26.1725362530.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-09RDMA/mlx5: Limit usage of over-sized mkeys from the MR cacheMichael Guralnik
When searching the MR cache for suitable cache entries, don't use mkeys larger than twice the size required for the MR. This should ensure the usage of mkeys closer to the minimal required size and reduce memory waste. On driver init we create entries for mkeys with clear attributes and powers of 2 sizes from 4 to the max supported size. This solves the issue for anyone using mkeys that fit these requirements. In the use case where an MR is registered with different attributes, like an access flag we can't UMR, we'll create a new cache entry to store it upon dereg. Without this fix, any later registration with same attributes and smaller size will use the newly created cache entry and it's mkeys, disregarding the memory waste of using mkeys larger than required. For example, one worst-case scenario can be when registering and deregistering a 1GB mkey with ATS enabled which will cause the creation of a new cache entry to hold those type of mkeys. A user registering a 4k MR with ATS will end up using the new cache entry and an mkey that can support a 1GB MR, thus wasting x250k memory than actually needed in the HW. Additionally, allow all small registration to use the smallest size cache entry that is initialized on driver load even if size is larger than twice the required size. Fixes: 73d09b2fe833 ("RDMA/mlx5: Introduce mlx5r_cache_rb_key") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/8ba3a6e3748aace2026de8b83da03aba084f78f4.1725362530.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-09RDMA/mlx5: Fix counter update on MR cache mkey creationMichael Guralnik
After an mkey is created, update the counter for pending mkeys before reshceduling the work that is filling the cache. Rescheduling the work with a full MR cache entry and a wrong 'pending' counter will cause us to miss disabling the fill_to_high_water flag. Thus leaving the cache full but with an indication that it's still needs to be filled up to it's full size (2 * limit). Next time an mkey will be taken from the cache, we'll unnecessarily continue the process of filling the cache to it's full size. Fixes: 57e7071683ef ("RDMA/mlx5: Implement mkeys management via LIFO queue") Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/0f44f462ba22e45f72cb3d0ec6a748634086b8d0.1725362530.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-09RDMA/mlx5: Drop redundant work canceling from clean_keys()Michael Guralnik
The canceling of dealyed work in clean_keys() is a leftover from years back and was added to prevent races in the cleanup process of MR cache. The cleanup process was rewritten a few years ago and the canceling of delayed work and flushing of workqueue was added before the call to clean_keys(). Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://patch.msgid.link/943d21f5a9dba7b98a3e1d531e3561ffe9745d71.1725362530.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-09-09RDMA/erdma: Return QP state in erdma_query_qpCheng Xu
Fix qp_state and cur_qp_state to return correct values in struct ib_qp_attr. Fixes: 155055771704 ("RDMA/erdma: Add verbs implementation") Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://patch.msgid.link/20240902112920.58749-4-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>