path: root/drivers/infiniband/hw
2020-09-09  RDMA: Make counters destroy symmetrical (Leon Romanovsky)
Change counters to return failure like any other verbs destroy; however, this flow shouldn't return an error at all. Link: https://lore.kernel.org/r/20200907120921.476363-10-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09  RDMA: Restore ability to return error for destroy WQ (Leon Romanovsky)
Make this interface symmetrical to other destroy paths. Fixes: a49b1dc7ae44 ("RDMA: Convert destroy_wq to be void") Link: https://lore.kernel.org/r/20200907120921.476363-9-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09  RDMA: Change XRCD destroy return value (Leon Romanovsky)
Update XRCD destroy flow to allow command failure. Fixes: 28ad5f65c314 ("RDMA: Move XRCD to be under ib_core responsibility") Link: https://lore.kernel.org/r/20200907120921.476363-8-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09  RDMA: Allow fail of destroy CQ (Leon Romanovsky)
Like any other verbs object, a CQ shouldn't fail during destroy, but mlx5_ib didn't follow this contract when it mixed IB verbs objects with DEVX. Such a mix leads to a situation where FW and kernel become fully interdependent on each side's reference counting. Kernel verbs and drivers that don't have DEVX flows shouldn't fail. Fixes: e39afe3d6dbd ("RDMA: Convert CQ allocations to be under core responsibility") Link: https://lore.kernel.org/r/20200907120921.476363-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09  RDMA: Restore ability to fail on SRQ destroy (Leon Romanovsky)
In a similar way to other IB objects, restore the ability to return an error on SRQ destroy. Strictly speaking, this change is not necessary; it is provided here to ensure a symmetrical interface like the other destroy functions. Fixes: 68e326dea1db ("RDMA: Handle SRQ allocations by IB/core") Link: https://lore.kernel.org/r/20200907120921.476363-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09  RDMA/mlx5: Issue FW command to destroy SRQ on reentry (Leon Romanovsky)
The HW release can fail and leave the system in a limbo state, where the SRQ is removed from the table but can't be destroyed later. On every reentry, the initial xa_erase_irq() check will fail. Rewrite the erase logic to keep the index but not store the entry itself. This way, the entry can safely be reinserted in the case of a destroy failure. Link: https://lore.kernel.org/r/20200907120921.476363-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
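A sketch of the reserve-and-reinsert pattern described above, using the XArray API; the table->array and destroy_srq_split() names follow the mlx5 sources but are shown here as assumptions:

    /* Swap the entry for a reserved zero entry: lookups now see NULL,
     * but the index stays owned so it can be refilled on failure. */
    tmp = xa_cmpxchg_irq(&table->array, srq->srqn, srq, XA_ZERO_ENTRY, 0);
    if (WARN_ON(tmp != srq))
            return tmp ? PTR_ERR(tmp) : -EINVAL;

    err = destroy_srq_split(dev, srq);
    if (err) {
            /* FW refused: put the entry back so a later retry can find it */
            xa_cmpxchg_irq(&table->array, srq->srqn, XA_ZERO_ENTRY, srq, 0);
            return err;
    }

    /* Success: release the index for good */
    xa_erase_irq(&table->array, srq->srqn);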
2020-09-09  RDMA: Restore ability to fail on AH destroy (Leon Romanovsky)
Like any other IB verbs objects, AHs are refcounted by ib_core. The release of those objects is controlled by ib_core with the promise that AH destroy can't fail. Since AHs are SW objects for now, this change makes dealloc_ah() behave like any other destroy IB flow. Fixes: d345691471b4 ("RDMA: Handle AH allocations by IB/core") Link: https://lore.kernel.org/r/20200907120921.476363-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09  RDMA: Restore ability to fail on PD deallocate (Leon Romanovsky)
The IB verbs objects are counted by the kernel, and ib_core ensures that PD deallocation will succeed because it is called only once all other objects that depend on the PD have been released. This is achieved by managing various reference counters on such objects. The mlx5 driver didn't follow this standard flow when it allowed DEVX objects, which are not managed by ib_core, to be interleaved with the ones under ib_core responsibility. In such interleaved scenarios the deallocate command can fail, and ib_core will leave the uobject in its internal DB and attempt to clean it up later to free resources anyway. This change partially restores the returned value from dealloc_pd() for all drivers, keeping in mind that non-DEVX devices and kernel verbs paths shouldn't fail. Fixes: 21a428a019c9 ("RDMA: Handle PD allocations by IB/core") Link: https://lore.kernel.org/r/20200907120921.476363-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
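A hedged sketch of what a driver's dealloc_pd() can look like after this series; fw_dealloc_pd(), to_mydev() and to_mypd() are hypothetical names used only for illustration:

    static int my_dealloc_pd(struct ib_pd *pd, struct ib_udata *udata)
    {
            /* hypothetical FW command; it can fail when DEVX objects
             * still hold a reference on the FW side */
            int err = fw_dealloc_pd(to_mydev(pd->device), to_mypd(pd)->pdn);

            if (udata)
                    return err;     /* user path: propagate the failure */

            WARN_ON_ONCE(err);      /* kernel verbs path must not fail */
            return 0;
    }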
2020-09-09  RDMA/hns: Avoid unnecessary initialization (Lijun Ou)
Some variables are assigned again before they are used, so their initial values are never read. Remove these unnecessary initial assignments. Link: https://lore.kernel.org/r/1599547944-30671-1-git-send-email-oulijun@huawei.com Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09  RDMA/bnxt_re: Remove set but not used variable 'qplib_ctx' (YueHaibing)
drivers/infiniband/hw/bnxt_re/main.c:1012:25: warning: variable ‘qplib_ctx’ set but not used [-Wunused-but-set-variable] Fixes: f86b31c6a28f ("RDMA/bnxt_re: Static NQ depth allocation") Link: https://lore.kernel.org/r/20200905121624.32776-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-03  RDMA/qib: Convert tasklets to use new tasklet_setup() API (Allen Pais)
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Link: https://lore.kernel.org/r/20200903060637.424458-5-allen.lkml@gmail.com Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
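The conversion shared by this and the following three commits, sketched with an assumed container struct my_dev and tasklet member name sdma:

    #include <linux/interrupt.h>

    struct my_dev {                         /* assumed container */
            struct tasklet_struct sdma;
    };

    /* new-style callback: receives the tasklet pointer instead of an
     * opaque unsigned long, and recovers its container via from_tasklet() */
    static void sdma_task(struct tasklet_struct *t)
    {
            struct my_dev *dd = from_tasklet(dd, t, sdma);
            /* ... process dd ... */
    }

    static void my_dev_init(struct my_dev *dd)
    {
            /* replaces tasklet_init(&dd->sdma, fn, (unsigned long)dd) */
            tasklet_setup(&dd->sdma, sdma_task);
    }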
2020-09-03  RDMA/i40iw: Convert tasklets to use new tasklet_setup() API (Allen Pais)
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Link: https://lore.kernel.org/r/20200903060637.424458-4-allen.lkml@gmail.com Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-03  RDMA/hfi1: Convert tasklets to use new tasklet_setup() API (Allen Pais)
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Link: https://lore.kernel.org/r/20200903060637.424458-3-allen.lkml@gmail.com Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-03  RDMA/bnxt_re: Convert tasklets to use new tasklet_setup() API (Allen Pais)
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Link: https://lore.kernel.org/r/20200903060637.424458-2-allen.lkml@gmail.com Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-03  RDMA/mlx5: Fix potential race between destroy and CQE poll (Leon Romanovsky)
The SRQ can be destroyed right before mlx5_cmd_get_srq() is called. In such a case, the latter will return NULL instead of the expected SRQ. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Link: https://lore.kernel.org/r/20200830084010.102381-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-02  RDMA/qedr: Fix reported max_pkeys (Kamal Heib)
As the qedr driver supports both RoCE and iWarp, make sure to set max_pkeys only when running in RoCE mode. Link: https://lore.kernel.org/r/20200827141655.406185-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31  RDMA/qib: Tidy up process_cc() (Alex Dewar)
This function has a lot of gotos which can be replaced by simple returns, making the function tidier and less bug-prone. Link: https://lore.kernel.org/r/20200825171242.448447-2-alex.dewar90@gmail.com Signed-off-by: Alex Dewar <alex.dewar90@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31  RDMA/qib: Remove superfluous fallthrough statements (Alex Dewar)
Commit 36a8f01cd24b ("IB/qib: Add congestion control agent implementation") erroneously marked a couple of switch cases as /* FALLTHROUGH */, which were later converted to fallthrough statements by commit df561f6688fe ("treewide: Use fallthrough pseudo-keyword"). This triggered a Coverity warning about unreachable code. Remove the fallthrough statements. Link: https://lore.kernel.org/r/20200825171242.448447-1-alex.dewar90@gmail.com Addresses-Coverity: ("Unreachable code") Fixes: 36a8f01cd24b ("IB/qib: Add congestion control agent implementation") Signed-off-by: Alex Dewar <alex.dewar90@gmail.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31  Merge tag 'v5.9-rc3' into rdma.git for-next (Jason Gunthorpe)
Required due to dependencies in following patches. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31  RDMA/hns: Get udp sport num dynamically instead of using a fixed value (Weihang Li)
The UDP source port number in RoCE v2 is used to create entropy for network routers (ECMP), load balancers and 802.3ad link aggregation switching that are not aware of RoCE IB headers. Now that the IB core provides an interface to get such a hashed value, the hns driver no longer needs the fixed value in the QPC and UD WQE and can set the port number dynamically. For the QPC of RC QPs, the value is hashed from flow_label if the user passes it in, or from the remote and local QPNs otherwise. For UD WQEs, it is set according to fl or as a random value. Link: https://lore.kernel.org/r/1598002289-8611-1-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
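A sketch of the helper a driver can build on top of the IB core's flow-label interface; the wrapper name get_udp_sport() is an assumption:

    #include <rdma/ib_verbs.h>

    /* Pick a RoCE v2 UDP source port: hash the user's flow label when
     * given, otherwise derive one from the local and remote QPNs. */
    static u16 get_udp_sport(u32 fl, u32 lqpn, u32 rqpn)
    {
            if (!fl)
                    fl = rdma_calc_flow_label(lqpn, rqpn);

            return rdma_flow_label_to_udp_sport(fl);
    }

rdma_flow_label_to_udp_sport() folds the 20-bit flow label into the 0xC000-0xFFFF dynamic UDP port range, so the result is always a valid RoCE v2 source port.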
2020-08-27  RDMA/hns: Add a check for current state before modifying QP (Lang Cheng)
It should be considered an illegal operation if the ULP attempts to modify a QP from another state to the current hardware state. Otherwise, the ULP could modify some fields of the QPC at any time. For example, for a QP in the RTS state, modifying it from RTR to RTS can change the PSN, which is never expected. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Link: https://lore.kernel.org/r/1598353674-24270-1-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
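A hedged sketch of such a guard in a modify_qp handler; hr_qp->state stands for the state the driver last programmed into HW, and the surrounding names are assumptions:

    /* Reject a transition whose claimed source state disagrees with
     * the state recorded for the hardware. */
    enum ib_qp_state cur_state = (attr_mask & IB_QP_CUR_STATE) ?
                                 attr->cur_qp_state : hr_qp->state;

    if (cur_state != hr_qp->state)
            return -EINVAL;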
2020-08-27  RDMA/bnxt_re: Remove the qp from list only if the qp destroy succeeds (Selvin Xavier)
The driver crashes when destroy_qp is retried after an error is returned, because the qp entry was removed from the qp list during the first call. Remove the qp from the list only if destroy_qp returns success. The driver will still trigger a WARN_ON due to the leaked memory, but at least it isn't corrupting memory too. Fixes: 8dae419f9ec7 ("RDMA/bnxt_re: Refactor queue pair creation code") Link: https://lore.kernel.org/r/1598292876-26529-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27  RDMA/bnxt_re: Fix driver crash on unaligned PSN entry address (Naresh Kumar PBS)
When computing the first psn entry, the driver checks for page alignment. If the address is not page aligned, it attempts to compute the offset within that page for later use by using the ALIGN macro. But ALIGN does not return the offset in bytes; it returns the requested aligned address, and hence cannot be used directly to store an offset. Since the driver was using the address itself instead of the offset, it produced an invalid address when filling the psn buffer. Fix the driver to use the PAGE_MASK macro to calculate the offset. Fixes: fddcbbb02af4 ("RDMA/bnxt_re: Simplify obtaining queue entry from hw ring") Link: https://lore.kernel.org/r/1598292876-26529-7-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
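The distinction in miniature (addr and offset are illustrative locals):

    /* wrong: ALIGN() rounds the address up to the next page boundary,
     * yielding an aligned address, not an offset */
    offset = ALIGN(addr, PAGE_SIZE);

    /* right: the offset of addr within its page */
    offset = addr & ~PAGE_MASK;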
2020-08-27  RDMA/bnxt_re: Restrict the max_gids to 256 (Naresh Kumar PBS)
Some adapters report more than 256 gid entries. Restrict it to 256 for now. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Link: https://lore.kernel.org/r/1598292876-26529-6-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27  RDMA/bnxt_re: Static NQ depth allocation (Naresh Kumar PBS)
At first, the driver allocates memory for the NQ based on qplib_ctx->cq_count and qplib_ctx->srqc_count, but later, when creating the ring, it uses a static value of 128K - 1. Fix the mismatch by using the static value for now. Fixes: b08fe048a69d ("RDMA/bnxt_re: Refactor net ring allocation function") Link: https://lore.kernel.org/r/1598292876-26529-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27  RDMA/bnxt_re: Fix the qp table indexing (Selvin Xavier)
qp->id can be a value outside the max number of QPs, so indexing the qp table with the id can cause an out-of-bounds crash. Change the qp table indexing to (qp->id % max_qp - 1) and allocate one extra entry for QP1: some adapters create one more QP than the max_qp requested in order to accommodate QP1. If qp->id is 1, store the information in the last entry of the qp table. Fixes: f218d67ef004 ("RDMA/bnxt_re: Allow posting when QPs are in error") Link: https://lore.kernel.org/r/1598292876-26529-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
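A minimal sketch of the mapping this describes; the helper name and tbl_size parameter are assumptions:

    /* Map a raw HW qp id onto the qp table: QP1 takes the reserved
     * last slot; other ids wrap modulo the remaining slots. */
    static u32 qp_id_to_tbl_indx(u32 qp_id, u32 tbl_size)
    {
            return (qp_id == 1) ? tbl_size - 1 : qp_id % (tbl_size - 1);
    }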
2020-08-27  RDMA/bnxt_re: Do not report transparent vlan from QP1 (Selvin Xavier)
A QP1 Rx CQE reports the transparent VLAN ID in the completion, and this is used while reporting the completion for a received MAD packet. Check whether the vlan id is configured before reporting it in the work completion. Fixes: 84511455ac5b ("RDMA/bnxt_re: report vlan_id and sl in qp1 recv completion") Link: https://lore.kernel.org/r/1598292876-26529-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27  RDMA/mlx4: Read pkey table length instead of hardcoded value (Mark Bloch)
If the pkey_table is not available (which is the case when RoCE is not supported), the cited commit caused a regression where mlx4 devices without RoCE are not created. Fix this by returning a pkey table length of zero from eth_link_query_port() if the pkey table length reported by the device is zero. Link: https://lore.kernel.org/r/20200824110229.1094376-1-leon@kernel.org Cc: <stable@vger.kernel.org> Fixes: 1901b91f9982 ("IB/core: Fix potential NULL pointer dereference in pkey cache") Fixes: fa417f7b520e ("IB/mlx4: Add support for IBoE") Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27  RDMA/usnic: Remove the query_pkey callback (Kamal Heib)
Now that the query_pkey() isn't mandatory by the RDMA core, this callback can be removed from the usnic provider. The libfabric userspace never touches the pkey. Link: https://lore.kernel.org/r/20200820125346.111902-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27  IB/mlx5: Add DCT RoCE LAG support (Mark Zhang)
When DCT QPs work in RoCE LAG mode:
1. DCT creation is allowed only when it is supported
2. The "port" of a DCT QP is assigned in a round-robin way
Link: https://lore.kernel.org/r/20200818115245.700581-3-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27  IB/mlx5: Add tx_affinity support for DCI QP (Mark Zhang)
DCI QP supports tx_affinity as well. Link: https://lore.kernel.org/r/20200818115245.700581-2-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24  RDMA/efa: Remove redundant udata check from alloc ucontext response (Gal Pressman)
The alloc ucontext flow is always called with a valid udata, so there's no need to test whether it's NULL. While at it, the 'udata->outlen' check is removed as well, since we copy the minimum between the size of the response and outlen; in the case of a zero outlen, zero bytes will be copied. Link: https://lore.kernel.org/r/20200818110835.54299-1-galpress@amazon.com Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
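The copy pattern this relies on, sketched with the core helper (resp stands for the driver's response struct):

    /* copy at most what userspace asked for; outlen == 0 copies nothing */
    err = ib_copy_to_udata(udata, &resp,
                           min_t(size_t, udata->outlen, sizeof(resp)));
    if (err)
            return err;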
2020-08-24  RDMA/vmw_pvrdma: Fix kernel-doc documentation (Kamal Heib)
Fix the kernel-doc documentation by matching between the functions definitions and documentation. Link: https://lore.kernel.org/r/20200820123512.105193-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24  IB/mlx4: Adjust delayed work when a dup is observed (Håkon Bugge)
When scheduling delayed work to clean up the cache, if the entry has already been scheduled for deletion, adjust the delay. Fixes: 3cf69cc8dbeb ("IB/mlx4: Add CM paravirtualization") Link: https://lore.kernel.org/r/20200803061941.1139994-7-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24  IB/mlx4: Add support for REJ due to timeout (Håkon Bugge)
A CM REJ packet with its reason equal to timeout is a special beast in the sense that it doesn't have a Remote Communication ID nor does it have a Remote Port GID. Using CX-3 virtual functions, either from a bare-metal machine or pass-through from a VM, MAD packets are proxied through the PF driver. Since the VF drivers have separate name spaces for MAD Transaction Ids (TIDs), the PF driver has to re-map the TIDs and keep the bookkeeping in a cache. This proxying does not handle said REJ packets. If the active side abandons its connection attempt after having sent a REQ, it will send a REJ with the reason being timeout. This can be provoked by a simple user-verbs program, which ends up doing: rdma_connect(cm_id, &conn_param); rdma_destroy_id(cm_id); using the async librdmacm API. With dynamic debug prints enabled in the mlx4_ib driver, we then see: mlx4_ib_demux_cm_handler: Couldn't find an entry for pv_cm_id 0x0, attr_id 0x12 The solution is to introduce a radix-tree. When a REQ packet is received and handled in mlx4_ib_demux_cm_handler(), we know the connecting peer's para-virtual cm_id and the destination slave. We then insert an entry into the tree with said information. We also schedule work to remove this entry from the tree and free it, in order to avoid a memory leak. When a REJ packet with reason timeout is received, we can look up the slave in the tree and deliver the packet to the correct slave. When a duplicate REQ packet is received, the entry is already in the tree. In this case, we adjust the delayed work in order to avoid a too premature eviction of the entry. When cleaning up, we simply traverse the tree and modify any delayed work to use a zero delay. A subsequent flush of the system_wq will ensure all entries are wiped out. Fixes: 3cf69cc8dbeb ("IB/mlx4: Add CM paravirtualization") Link: https://lore.kernel.org/r/20200803061941.1139994-6-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
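Condensed, the provoking sequence with the async librdmacm API looks roughly like this (error handling elided; the function wrapper is illustrative):

    #include <rdma/rdma_cma.h>

    void provoke_timeout_rej(struct sockaddr *src, struct sockaddr *dst)
    {
            struct rdma_event_channel *ch = rdma_create_event_channel();
            struct rdma_conn_param conn_param = {};
            struct rdma_cm_id *cm_id;

            rdma_create_id(ch, &cm_id, NULL, RDMA_PS_TCP);
            rdma_resolve_addr(cm_id, src, dst, 2000);
            /* ... wait for ADDR_RESOLVED on ch, call rdma_resolve_route(),
             * then wait for ROUTE_RESOLVED ... */
            rdma_connect(cm_id, &conn_param);  /* REQ goes out on the wire */
            rdma_destroy_id(cm_id);            /* abandon: the CM then sends
                                                * a REJ with reason timeout */
    }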
2020-08-24  IB/mlx4: Fix starvation in paravirt mux/demux (Håkon Bugge)
The mlx4 driver will proxy MAD packets through the PF driver. A VM or an instantiated VF will send its MAD packets to the PF driver using loop-back. The PF driver will be informed by an interrupt, but defer the handling and polling of CQEs to a worker thread running on an ordered work-queue. Consider the following scenario: the VMs will, in short succession (for example due to a network event), send many MAD packets to the PF driver. Let's say there are K VMs, each sending N packets. The interrupt from the first VM will start the worker thread, which will poll N CQEs. A common case here is where the PF driver will multiplex the packets received from the VMs out on the wire QP. But before the wire QP has returned a send CQE and associated interrupt, the other K - 1 VMs have sent their N packets as well. The PF driver now has to multiplex K * N packets out on the wire QP. But the send-queue on the wire QP has a finite capacity. So, in this scenario, if K * N is larger than the send-queue capacity of the wire QP, MAD packets get dropped on the floor with this dynamic debug message: mlx4_ib_multiplex_mad: failed sending GSI to wire on behalf of slave 2 (-11) and this despite the fact that the wire send-queue could have had capacity; the PF driver just isn't aware, because the wire send CQEs have not yet been polled. We can also have a similar scenario inbound, with a wire recv-queue larger than the tunnel QP's send-queue. If many remote peers send MAD packets to the very same VM, the tunnel send-queue destined to the VM could appear full to the PF driver. This starvation is fixed by introducing separate work queues for the wire QPs vs. the tunnel QPs. With this fix, using a dual-ported HCA with 8 VFs instantiated, we could run cmtime on each of the 18 interfaces towards a similarly configured peer, each cmtime instance with 800 QPs (all in all 14400 QPs), without a single CM packet getting lost. Fixes: 3cf69cc8dbeb ("IB/mlx4: Add CM paravirtualization") Link: https://lore.kernel.org/r/20200803061941.1139994-5-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
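The shape of the fix, as a hedged sketch: give the wire CQs their own ordered workqueue, separate from the tunnel one (the ctx field names and work items are assumptions):

    /* one ordered queue per role, so a burst of tunnel work can no
     * longer starve wire completions, and vice versa */
    ctx->wq = alloc_ordered_workqueue("mlx4_ib_tunnel%d", 0, port);
    ctx->wi_wq = alloc_ordered_workqueue("mlx4_ib_wire%d", 0, port);
    if (!ctx->wq || !ctx->wi_wq)
            return -ENOMEM;

    /* wire send CQEs are now polled from their own queue */
    queue_work(ctx->wi_wq, &ctx->wire_work);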
2020-08-24  IB/mlx4: Separate tunnel and wire bufs parameters (Håkon Bugge)
Using CX-3 in virtualized mode, MAD packets are proxied through the PF driver. The feed is N tunnel QPs, and what is received from the VFs is multiplexed out on the wire QP. Since this is a many-to-one scenario, it is better to have separate initialization parameters for the two usages. The number of wire and tunnel bufs is raised to 2K and 512, respectively. With this set of parameters, a system consisting of eight physical servers, each with eight VMs, and 14 I/O servers (BM) can run switch fail-over without seeing: mlx4_ib_demux_mad: failed sending GSI to slave 3 via tunnel qp (-11) or mlx4_ib_multiplex_mad: failed sending GSI to wire on behalf of slave 2 (-11) Fixes: 3cf69cc8dbeb ("IB/mlx4: Add CM paravirtualization") Link: https://lore.kernel.org/r/20200803061941.1139994-4-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24  IB/mlx4: Add support for MRA (Håkon Bugge)
Using CX-3 in virtualized mode, MAD packets are proxied through the PF driver. However, the handling lacks support for the MRA (Message Receipt Acknowledgment) packet. With dynamic debug enabled, we see tons of: mlx4_ib_multiplex_cm_handler: id{slave: 7, sl_cm_id: 0x8fcb45a0} is NULL! attr_id: 0x11 Fixes: 3cf69cc8dbeb ("IB/mlx4: Add CM paravirtualization") Link: https://lore.kernel.org/r/20200803061941.1139994-3-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24  IB/mlx4: Add and improve logging (Håkon Bugge)
Add a missing check for success after the call to mlx4_ib_send_to_wire() in mlx4_ib_multiplex_mad(). Amend the existing pr_debug() in mlx4_ib_multiplex_cm_handler() and mlx4_ib_demux_cm_handler() to include attr_id during a lookup failure. Remove two noisy pr_debug() calls in mad.c. Fixes: 3cf69cc8dbeb ("IB/mlx4: Add CM paravirtualization") Link: https://lore.kernel.org/r/20200803061941.1139994-2-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-23  treewide: Use fallthrough pseudo-keyword (Gustavo A. R. Silva)
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
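The treewide substitution in miniature; the case labels and bodies here are illustrative:

    switch (opcode) {
    case OP_SEND_FIRST:
            setup_first_packet();
            fallthrough;            /* previously a "fall through" comment */
    case OP_SEND_MIDDLE:
            handle_payload();
            break;
    default:
            break;
    }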
2020-08-20  Revert "RDMA/hns: Reserve one sge in order to avoid local length error" (Weihang Li)
This patch caused some issues with the SEND operation, and it should be reverted to make the drivers work correctly. A better solution that has been tested carefully will follow to solve the original problem. This reverts commit 711195e57d341e58133d92cf8aaab1db24e4768d. Fixes: 711195e57d34 ("RDMA/hns: Reserve one sge in order to avoid local length error") Link: https://lore.kernel.org/r/1597829984-20223-1-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-20  RDMA/hfi1: Correct an interlock issue for TID RDMA WRITE request (Kaike Wan)
The following message occurs when running an AI application with TID RDMA enabled: hfi1 0000:7f:00.0: hfi1_0: [QP74] hfi1_tid_timeout 4084 hfi1 0000:7f:00.0: hfi1_0: [QP70] hfi1_tid_timeout 4084 The issue happens when a TID RDMA WRITE request is followed by an IB_WR_RDMA_WRITE_WITH_IMM request; the latter could be completed first on the responder side. As a result, no ACK packet for the latter could be sent because the TID RDMA WRITE request is still being processed on the responder side. When the TID RDMA WRITE request is eventually completed, the requester will wait for the IB_WR_RDMA_WRITE_WITH_IMM request to be acknowledged. If the next request is another TID RDMA WRITE request, no TID RDMA WRITE DATA packet could be sent because the preceding IB_WR_RDMA_WRITE_WITH_IMM request is not completed yet. Consequently, the IB_WR_RDMA_WRITE_WITH_IMM request will be retried, but it will be ignored on the responder side because the responder thinks it has already been completed. Eventually the retries will be exhausted and the qp will be put into an error state on the requester side. On the responder side, the TID resource timer will eventually expire because no TID RDMA WRITE DATA packets will be received for the second TID RDMA WRITE request. There is also a risk of write-after-write memory corruption due to this issue. Fix by adding a requester-side interlock to prevent any potential data corruption and TID RDMA protocol errors. Fixes: a0b34f75ec20 ("IB/hfi1: Add interlock between a TID RDMA request and other requests") Link: https://lore.kernel.org/r/20200811174931.191210.84093.stgit@awfm-01.aw.intel.com Cc: <stable@vger.kernel.org> # 5.4.x+ Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-20  RDMA/bnxt_re: Do not add user qps to flushlist (Selvin Xavier)
The driver should add only kernel QPs to the flush list for cleanup. During async error events from the HW, the driver adds QPs to this list without checking whether the QP is a kernel QP. Add a check to avoid adding user QPs to the flush list. Fixes: 942c9b6ca8de ("RDMA/bnxt_re: Avoid Hard lockup during error CQE processing") Fixes: c50866e2853a ("bnxt_re: fix the regression due to changes in alloc_pbl") Link: https://lore.kernel.org/r/1596689148-4023-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
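One way to express the guard, sketched with the core's restrack helper; the add_to_flush_list() callee is a hypothetical stand-in for the driver's flush-list handling:

    #include <rdma/restrack.h>

    /* only kernel-owned QPs belong on the flush list; user QPs are
     * cleaned up through their own destroy path */
    if (rdma_is_kernel_res(&qp->ib_qp.res))
            add_to_flush_list(qp);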
2020-08-18  RDMA/mlx5: Enable sniffer when device is in switchdev mode (Maor Gottlieb)
To allow the sniffer when the RDMA device is in switchdev mode, don't set the source port when creating the sniffer rule. Link: https://lore.kernel.org/r/20200803060214.15328-1-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18  RDMA/mlx5: Add new IB rates support (Mark Zhang)
Support 56, 25, 100, 200 and 50Gbps IB rates in mlx5 driver. Link: https://lore.kernel.org/r/20200802081712.1993490-1-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18  RDMA/efa: Introduce SRD RNR retry (Gal Pressman)
This patch introduces the ability to configure SRD QPs with the RNR retry parameter when issuing a modify QP command. In addition, a capability bit was added to report support to the userspace library. Link: https://lore.kernel.org/r/20200731060420.17053-5-galpress@amazon.com Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18  RDMA/efa: Introduce SRD QP state machine (Gal Pressman)
This precursory patch adds the SRD QP type state machine, which is currently identical to the one of UD QP type. A following patch is going to change the SRD QP state machine to support RNR retry modifications. Link: https://lore.kernel.org/r/20200731060420.17053-4-galpress@amazon.com Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18  RDMA/efa: Be consistent with modify QP bitmask (Gal Pressman)
The modify QP bitmask was not consistent with other bitmasks used in the device interface. Remove the bitmask enum and allow usage with EFA_GET/SET. Link: https://lore.kernel.org/r/20200731060420.17053-3-galpress@amazon.com Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18  RDMA/efa: Add a generic capability check helper (Gal Pressman)
Instead of adding a new function for each capability added, introduce a generic helper to query device capabilities. Link: https://lore.kernel.org/r/20200731060420.17053-2-galpress@amazon.com Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18  RDMA: Remove constant domain argument from flow creation call (Leon Romanovsky)
The "domain" argument is constant, and the modern device (mlx5) doesn't support anything except IB_FLOW_DOMAIN_USER, so delete this extra parameter and simplify the code. Link: https://lore.kernel.org/r/20200730081235.1581127-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>