path: root/drivers/infiniband/hw
Age | Commit message | Author
2023-04-29 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma | Linus Torvalds
Pull rdma updates from Jason Gunthorpe:

"Usual wide collection of unrelated items in drivers:

- Driver bug fixes and treewide cleanups in hfi1, siw, qib, mlx5, rxe, usnic, bnxt_re, ocrdma, iser:
  - remove unnecessary NULL checks
  - kmap obsolescence
  - pci_enable_pcie_error_reporting() obsolescence
  - unused variables and macros
  - trace event related warnings
  - casting warnings
- Code cleanups for irdma and erdma
- EFA reporting of 128 byte PCIe TLP support
- mlx5 more aggressively uses the out of order HW feature
- Big rework of how state machines and tasks work in rxe
- Fix a syzkaller found crash netdev refcount leak in siw
- bnxt_re revises their HW description header
- Congestion control for bnxt_re
- Use mmu_notifiers more safely in hfi1
- mlx5 gets better support for PCIe relaxed ordering inside VMs"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (81 commits)
  RDMA/efa: Add rdma write capability to device caps
  RDMA/mlx5: Use correct device num_ports when modify DC
  RDMA/irdma: Drop spurious WQ_UNBOUND from alloc_ordered_workqueue() call
  RDMA/rxe: Fix spinlock recursion deadlock on requester
  RDMA/mlx5: Fix flow counter query via DEVX
  RDMA/rxe: Protect QP state with qp->state_lock
  RDMA/rxe: Move code to check if drained to subroutine
  RDMA/rxe: Remove qp->req.state
  RDMA/rxe: Remove qp->comp.state
  RDMA/rxe: Remove qp->resp.state
  RDMA/mlx5: Allow relaxed ordering read in VFs and VMs
  net/mlx5: Update relaxed ordering read HCA capabilities
  RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR
  RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write
  RDMA: Add ib_virt_dma_to_page()
  RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task"
  RDMA/irdma: Slightly optimize irdma_form_ah_cm_frame()
  RDMA/rxe: Fix incorrect TASKLET_STATE_SCHED check in rxe_task.c
  IB/hfi1: Place struct mmu_rb_handler on cache line start
  IB/hfi1: Fix bugs with non-PAGE_SIZE-end multi-iovec user SDMA requests
  ...
2023-04-27 | Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core | Linus Torvalds
Pull driver core updates from Greg KH:

"Here is the large set of driver core changes for 6.4-rc1.

Once again, a busy development cycle, with lots of changes happening in the driver core in the quest to be able to move "struct bus" and "struct class" into read-only memory, a task now complete with these changes.

This will make the future rust interactions with the driver core more "provably correct" as well as providing more obvious lifetime rules for all busses and classes in the kernel.

The changes required for this did touch many individual classes and busses as many callbacks were changed to take const * parameters instead. All of these changes have been submitted to the various subsystem maintainers, giving them plenty of time to review, and most of them actually did so.

Other than those changes, included in here are a small set of other things:

- kobject logging improvements
- cacheinfo improvements and updates
- obligatory fw_devlink updates and fixes
- documentation updates
- device property cleanups and const * changes
- firmware loader dependency fixes.

All of these have been in linux-next for a while with no reported problems"

* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
  device property: make device_property functions take const device *
  driver core: update comments in device_rename()
  driver core: Don't require dynamic_debug for initcall_debug probe timing
  firmware_loader: rework crypto dependencies
  firmware_loader: Strip off \n from customized path
  zram: fix up permission for the hot_add sysfs file
  cacheinfo: Add use_arch[|_cache]_info field/function
  arch_topology: Remove early cacheinfo error message if -ENOENT
  cacheinfo: Check cache properties are present in DT
  cacheinfo: Check sib_leaf in cache_leaves_are_shared()
  cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
  cacheinfo: Add arm64 early level initializer implementation
  cacheinfo: Add arch specific early level initializer
  tty: make tty_class a static const structure
  driver core: class: remove struct class_interface * from callbacks
  driver core: class: mark the struct class in struct class_interface constant
  driver core: class: make class_register() take a const *
  driver core: class: mark class_release() as taking a const *
  driver core: remove incorrect comment for device_create*
  MIPS: vpe-cmp: remove module owner pointer from struct class usage.
  ...
2023-04-24 | Merge tag 'iter-ubuf.2-2023-04-21' of git://git.kernel.dk/linux | Linus Torvalds
Pull ITER_UBUF updates from Jens Axboe:

"This turns single vector imports into ITER_UBUF, rather than ITER_IOVEC.

The former is more trivial to iterate and advance, and hence a bit more efficient. From some very unscientific testing, ~60% of all iovec imports are single vector"

* tag 'iter-ubuf.2-2023-04-21' of git://git.kernel.dk/linux:
  iov_iter: Mark copy_compat_iovec_from_user() noinline
  iov_iter: import single vector iovecs as ITER_UBUF
  iov_iter: convert import_single_range() to ITER_UBUF
  iov_iter: overlay struct iovec and ubuf/len
  iov_iter: set nr_segs = 1 for ITER_UBUF
  iov_iter: remove iov_iter_iovec()
  iov_iter: add iter_iov_addr() and iter_iov_len() helpers
  ALSA: pcm: check for user backed iterator, not specific iterator type
  IB/qib: check for user backed iterator, not specific iterator type
  IB/hfi1: check for user backed iterator, not specific iterator type
  iov_iter: add iter_iovec() helper
  block: ensure bio_alloc_map_data() deals with ITER_UBUF correctly
2023-04-21 | RDMA/efa: Add rdma write capability to device caps | Yonatan Nachum
Add rdma write capability that is propagated from the device to rdma-core. Enable MR creation with remote write permissions according to this device capability. Link: https://lore.kernel.org/r/20230404154313.35194-1-ynachum@amazon.com Reviewed-by: Firas Jahjah <firasj@amazon.com> Reviewed-by: Michael Margolin <mrgolin@amazon.com> Signed-off-by: Yonatan Nachum <ynachum@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-21 | RDMA/mlx5: Use correct device num_ports when modify DC | Mark Zhang
Just like other QP types, when modify DC, the port_num should be compared with dev->num_ports, instead of HCA_CAP.num_ports. Otherwise Multi-port vHCA on DC may not work. Fixes: 776a3906b692 ("IB/mlx5: Add support for DC target QP") Link: https://lore.kernel.org/r/20230420013906.1244185-1-markzhang@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
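The port check described above can be pictured with a small user-space sketch. It is not the mlx5 code: `struct demo_dev`, its fields, and `demo_port_num_valid()` are hypothetical names, and the only point carried over from the commit message is that the requested port is compared against the device's own port count rather than the raw HCA capability.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's device structure. */
struct demo_dev {
    uint8_t num_ports;     /* ports exposed by this IB device */
    uint8_t hca_cap_ports; /* raw HCA_CAP.num_ports value (not used for the check) */
};

/* Port numbers are 1-based; validate against the device's port count. */
static bool demo_port_num_valid(const struct demo_dev *dev, uint8_t port_num)
{
    return port_num >= 1 && port_num <= dev->num_ports;
}
```

With a device that exposes two ports while the raw capability reports one, port 2 is accepted by this check, which is the multi-port case the commit message is concerned with.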
2023-04-21 | RDMA/irdma: Drop spurious WQ_UNBOUND from alloc_ordered_workqueue() call | Tejun Heo
Workqueue is in the process of cleaning up the distinction between unbound workqueues w/ @nr_active==1 and ordered workqueues. Explicit WQ_UNBOUND isn't needed for alloc_ordered_workqueue() and will trigger a warning in the future. Let's remove it. This doesn't cause any functional changes. Link: https://lore.kernel.org/r/ZEGW-IcFReR1juVM@slm.duckdns.org Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-18 | RDMA/mlx5: Fix flow counter query via DEVX | Mark Bloch
The commit cited in the "Fixes" tag added bulk support for flow counters, but it didn't account for the fact that a counter can also be queried using a non-base id if the counter was allocated as bulk. When a user performs a query, validate that the flow counter id given in the mailbox is inside the valid range, taking the bulk value into account. Fixes: 208d70f562e5 ("IB/mlx5: Support flow counters offset for bulk counters") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Link: https://lore.kernel.org/r/79d7fbe291690128e44672418934256254d93115.1681377114.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
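A minimal sketch of the range validation described above, assuming a counter allocation is characterized by a base id and a bulk size. The structure and function names are hypothetical and are not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one flow counter allocation. */
struct demo_counter {
    uint32_t base_id; /* first counter id of the allocation */
    uint32_t bulk;    /* number of counters allocated; 0 means a single counter */
};

/* Accept any id inside [base_id, base_id + count), not only the base id. */
static bool demo_counter_id_in_range(const struct demo_counter *c, uint32_t id)
{
    uint32_t count = c->bulk ? c->bulk : 1;

    /* Subtract first so the comparison cannot overflow. */
    return id >= c->base_id && id - c->base_id < count;
}
```

For an allocation with base id 100 and bulk 8, ids 100 through 107 pass and 108 is rejected.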
2023-04-16 | RDMA/mlx5: Allow relaxed ordering read in VFs and VMs | Avihai Horon
According to PCIe spec, Enable Relaxed Ordering value in the VF's PCI config space is wired to 0 and PF relaxed ordering (RO) setting should be applied to the VF. In QEMU (and maybe others), when assigning VFs, the RO bit in PCI config space is not emulated properly and is always set to 0. Therefore, pcie_relaxed_ordering_enabled() always returns 0 for VFs and VMs and thus MKeys can't be created with RO read even if the PF supports it. pcie_relaxed_ordering_enabled() check was added to avoid a syndrome when creating a MKey with relaxed ordering (RO) enabled when the driver's relaxed_ordering_read_pci_enabled HCA capability is out of sync with FW. With the new relaxed_ordering_read capability this can't happen, as it's set regardless of RO value in PCI config space and thus can't change during runtime. Hence, to allow RO read in VFs and VMs, use the new HCA capability relaxed_ordering_read without checking pcie_relaxed_ordering_enabled(). The old capability checks are kept for backward compatibility with older FWs. Allowing RO in VFs and VMs is valuable since it can greatly improve performance on some setups. For example, testing throughput of a VF on an AMD EPYC 7763 and ConnectX-6 Dx setup showed roughly 60% performance improvement. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Link: https://lore.kernel.org/r/e7048640d66c341a8fa0465e099926e7989184bc.1681131553.git.leon@kernel.org Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
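The decision logic described in this commit message can be sketched as follows. This is a user-space illustration under stated assumptions, not the driver code: `struct demo_caps` and both helper functions are hypothetical, and the capability names are simply mirrored from the text.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the HCA capabilities named in the text. */
struct demo_caps {
    bool relaxed_ordering_write;            /* device supports RO write */
    bool relaxed_ordering_read;             /* new: device supports RO read */
    bool relaxed_ordering_read_pci_enabled; /* old: RO read supported AND RO set in PCI config */
};

/* RO read: the new capability is sufficient on its own; the old one still
 * needs the live PCI config check, for compatibility with older firmware. */
static bool demo_ro_read_allowed(const struct demo_caps *caps, bool pcie_ro_enabled)
{
    if (caps->relaxed_ordering_read)
        return true;
    return caps->relaxed_ordering_read_pci_enabled && pcie_ro_enabled;
}

/* RO write: depends only on device support, never on PCI config space. */
static bool demo_ro_write_allowed(const struct demo_caps *caps)
{
    return caps->relaxed_ordering_write;
}
```

In a VM where the emulated PCI config space always reports RO as disabled, `demo_ro_read_allowed()` still returns true once the firmware exposes the new capability.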
2023-04-16 | net/mlx5: Update relaxed ordering read HCA capabilities | Avihai Horon
Rename existing HCA capability relaxed_ordering_read to relaxed_ordering_read_pci_enabled. This is in accordance with recent PRM change to better describe the capability, as it's set only if both the device supports relaxed ordering (RO) read and RO is enabled in PCI config space. In addition, add new HCA capability relaxed_ordering_read which is set if the device supports RO read, regardless of RO in PCI config space. This will be used in the following patch to allow RO in VFs and VMs. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Link: https://lore.kernel.org/r/caa0002fd8135086357dfcc368e2f5cc73b08480.1681131553.git.leon@kernel.org Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-16 | RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR | Avihai Horon
relaxed_ordering_read HCA capability is set if both the device supports relaxed ordering (RO) read and RO is set in PCI config space. RO in PCI config space can change during runtime. This will change the value of relaxed_ordering_read HCA capability in FW, but the driver will not see it since it queries the capabilities only once. This can lead to the following scenario: 1. RO in PCI config space is enabled. 2. User creates MKey without RO. 3. RO in PCI config space is disabled. As a result, relaxed_ordering_read HCA capability is turned off in FW but remains on in driver copy of the capabilities. 4. User requests to reconfig the MKey with RO via UMR. 5. Driver will try to reconfig the MKey with RO read although it shouldn't (as relaxed_ordering_read HCA capability is really off). To fix this, check pcie_relaxed_ordering_enabled() before setting RO read in UMR. Fixes: 896ec9735336 ("RDMA/mlx5: Set mkey relaxed ordering by UMR with ConnectX-7") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Link: https://lore.kernel.org/r/8d39eb8317e7bed1a354311a20ae707788fd94ed.1681131553.git.leon@kernel.org Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
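The five-step scenario above comes down to a stale cached value. The sketch below models it in user space; `struct demo_device` and the three functions are hypothetical and only illustrate why re-checking the live PCI setting closes the gap.

```c
#include <stdbool.h>

/* Hypothetical model of the driver's view of the device. */
struct demo_device {
    bool cached_ro_read_cap; /* one-time snapshot of the capability */
    bool pci_ro_enabled;     /* live RO bit in PCI config space */
};

/* Capabilities are queried once; the result is never refreshed. */
static void demo_query_caps(struct demo_device *d, bool device_supports_ro_read)
{
    d->cached_ro_read_cap = device_supports_ro_read && d->pci_ro_enabled;
}

/* Before the fix: trusts the snapshot, which may be stale. */
static bool demo_umr_ro_read_unchecked(const struct demo_device *d)
{
    return d->cached_ro_read_cap;
}

/* After the fix: also consults the live PCI setting. */
static bool demo_umr_ro_read_checked(const struct demo_device *d)
{
    return d->cached_ro_read_cap && d->pci_ro_enabled;
}
```

If RO is enabled when the capabilities are queried and disabled afterwards, the unchecked variant still requests RO read while the checked variant does not.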
2023-04-16 | RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write | Avihai Horon
pcie_relaxed_ordering_enabled() check was added to avoid a syndrome when creating a MKey with relaxed ordering (RO) enabled when the driver's relaxed_ordering_{read,write} HCA capabilities are out of sync with FW. While this can happen with relaxed_ordering_read, it can't happen with relaxed_ordering_write as it's set if the device supports RO write, regardless of RO in PCI config space, and thus can't change during runtime. Therefore, drop the pcie_relaxed_ordering_enabled() check for relaxed_ordering_write while keeping it for relaxed_ordering_read. Doing so will also allow the usage of RO write in VFs and VMs (where RO in PCI config space is not reported/emulated properly). Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Link: https://lore.kernel.org/r/7e8f55e31572c1702d69cae015a395d3a824a38a.1681131553.git.leon@kernel.org Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-13 | RDMA/irdma: Slightly optimize irdma_form_ah_cm_frame() | Christophe JAILLET
There is no need to zero 'pktsize' bytes of 'buf'; only the header needs to be cleared, to be safe. All the other bytes are already written by a memcpy() at the end of the function. Doing so also gives the compiler the opportunity to avoid the memset() call, which can be inlined now that the length is known at compile time. Link: https://lore.kernel.org/r/098e3c397be0436f1867899245ecfe656c472110.1675369386.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
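The idea can be shown with a generic frame builder. This is only a sketch: `DEMO_HDR_LEN`, the header field value, and `demo_form_frame()` are made up for illustration and do not reflect the irdma frame layout.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_HDR_LEN 16 /* hypothetical fixed header size, known at compile time */

/* Build a frame: clear only the header, then copy the payload after it.
 * The caller must provide at least DEMO_HDR_LEN + payload_len bytes. */
static size_t demo_form_frame(unsigned char *buf,
                              const unsigned char *payload, size_t payload_len)
{
    memset(buf, 0, DEMO_HDR_LEN); /* constant length, easy for the compiler to inline */
    buf[0] = 0x45;                /* example header field */
    memcpy(buf + DEMO_HDR_LEN, payload, payload_len);
    return DEMO_HDR_LEN + payload_len;
}
```

Bytes past the payload are left untouched, which is safe as long as the returned length is what gets transmitted.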
2023-04-09 | IB/hfi1: Place struct mmu_rb_handler on cache line start | Patrick Kelsey
Place struct mmu_rb_handler on cache line start like so:

    struct mmu_rb_handler *h;
    void *free_ptr;
    int ret;

    free_ptr = kzalloc(sizeof(*h) + cache_line_size() - 1, GFP_KERNEL);
    if (!free_ptr)
            return -ENOMEM;

    h = PTR_ALIGN(free_ptr, cache_line_size());

Additionally, move struct mmu_rb_handler fields "root" and "ops_args" to start after the next cacheline using the "____cacheline_aligned_in_smp" annotation.

Allocating an additional cache_line_size() - 1 bytes to place struct mmu_rb_handler on a cache line start does increase memory consumption. However, few struct mmu_rb_handler are created when hfi1 is in use. As mmu_rb_handler->root and mmu_rb_handler->ops_args are accessed frequently, the advantage of having them both within a cache line is expected to outweigh the disadvantage of the additional memory consumption per struct mmu_rb_handler.

Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/168088636963.3027109.16959757980497822530.stgit@252.162.96.66.static.eigbox.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>
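The over-allocate-and-align pattern from the commit message has a straightforward user-space analogue. The sketch below assumes a 64-byte cache line and uses `calloc()` in place of `kzalloc()`; `struct demo_handler` and both functions are hypothetical. As in the kernel snippet, the original pointer is kept so it can be freed later.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_CACHE_LINE 64 /* assumed cache line size (power of two) */

struct demo_handler {
    void *free_ptr; /* original allocation, needed for free() */
    int payload;
};

/* Allocate cache_line - 1 extra bytes, then round the pointer up. */
static struct demo_handler *demo_handler_alloc(void)
{
    void *raw = calloc(1, sizeof(struct demo_handler) + DEMO_CACHE_LINE - 1);
    uintptr_t aligned;
    struct demo_handler *h;

    if (!raw)
        return NULL;

    aligned = ((uintptr_t)raw + DEMO_CACHE_LINE - 1) & ~(uintptr_t)(DEMO_CACHE_LINE - 1);
    h = (struct demo_handler *)aligned;
    h->free_ptr = raw;
    return h;
}

static void demo_handler_free(struct demo_handler *h)
{
    if (h)
        free(h->free_ptr); /* free the raw pointer, not the aligned one */
}
```

The cost is at most `DEMO_CACHE_LINE - 1` wasted bytes per object, which matches the trade-off the commit message discusses.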
2023-04-09 | IB/hfi1: Fix bugs with non-PAGE_SIZE-end multi-iovec user SDMA requests | Patrick Kelsey
hfi1 user SDMA request processing has two bugs that can cause data corruption for user SDMA requests that have multiple payload iovecs where an iovec other than the tail iovec does not run up to the page boundary for the buffer pointed to by that iovec.

Here are the specific bugs:

1. user_sdma_txadd() does not use struct user_sdma_iovec->iov.iov_len. Rather, user_sdma_txadd() will add up to PAGE_SIZE bytes from iovec to the packet, even if some of those bytes are past iovec->iov.iov_len and are thus not intended to be in the packet.
2. user_sdma_txadd() and user_sdma_send_pkts() fail to advance to the next iovec in user_sdma_request->iovs when the current iovec is not PAGE_SIZE and does not contain enough data to complete the packet. The transmitted packet will contain the wrong data from the iovec pages.

This has not been an issue with SDMA packets from hfi1 Verbs or PSM2 because they only produce iovecs that end short of PAGE_SIZE as the tail iovec of an SDMA request.

Fixing these bugs exposes other bugs with the SDMA pin cache (struct mmu_rb_handler) that get in the way of supporting user SDMA requests with multiple payload iovecs whose buffers do not end at PAGE_SIZE. So this commit fixes those issues as well.

Here are the mmu_rb_handler bugs that non-PAGE_SIZE-end multi-iovec payload user SDMA requests can hit:

1. Overlapping memory ranges in mmu_rb_handler will result in duplicate pinnings.
2. When extending an existing mmu_rb_handler entry (struct mmu_rb_node), the mmu_rb code (1) removes the existing entry under a lock, (2) releases that lock, pins the new pages, (3) then reacquires the lock to insert the extended mmu_rb_node. If someone else comes in and inserts an overlapping entry between (2) and (3), insert in (3) will fail. The failure path code in this case unpins _all_ pages in either the original mmu_rb_node or the new mmu_rb_node that was inserted between (2) and (3).
3. In hfi1_mmu_rb_remove_unless_exact(), mmu_rb_node->refcount is incremented outside of mmu_rb_handler->lock. As a result, mmu_rb_node could be evicted by another thread that gets mmu_rb_handler->lock and checks mmu_rb_node->refcount before mmu_rb_node->refcount is incremented.
4. Related to #2 above, SDMA request submission failure path does not check mmu_rb_node->refcount before freeing mmu_rb_node object. If there are other SDMA requests in progress whose iovecs have pointers to the now-freed mmu_rb_node(s), those pointers to the now-freed mmu_rb nodes will be dereferenced when those SDMA requests complete.

Fixes: 7be85676f1d1 ("IB/hfi1: Don't remove RB entry when not needed.")
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/168088636445.3027109.10054635277810177889.stgit@252.162.96.66.static.eigbox.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>
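The first two SDMA bugs above both come from ignoring how many bytes remain in the current iovec. A minimal sketch of the clamping that avoids them, assuming a 4096-byte page, is shown below. The function name and parameters are hypothetical and this is not the hfi1 implementation.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u /* assumed page size */

/* Number of bytes to add to the packet from the current page of an iovec.
 * Bounded by what is left in the page, in the iovec, and in the packet.
 * Assumes page_offset < DEMO_PAGE_SIZE and iov_offset <= iov_len. */
static size_t demo_chunk_len(size_t iov_len, size_t iov_offset,
                             size_t page_offset, size_t pkt_remaining)
{
    size_t len = DEMO_PAGE_SIZE - page_offset;
    size_t iov_left = iov_len - iov_offset;

    if (len > iov_left)
        len = iov_left; /* never read past iov_len (bug 1) */
    if (len > pkt_remaining)
        len = pkt_remaining;
    return len;
}
```

A return value of 0 while the packet still needs data means the current iovec is exhausted and the caller should advance to the next one, which is the step bug 2 describes as missing.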
2023-04-09 | IB/hfi1: Fix SDMA mmu_rb_node not being evicted in LRU order | Patrick Kelsey
hfi1_mmu_rb_remove_unless_exact() did not move mmu_rb_node objects in mmu_rb_handler->lru_list after getting a cache hit on an mmu_rb_node. As a result, hfi1_mmu_rb_evict() was not guaranteed to evict truly least-recently used nodes.

This could be a performance issue for an application when that application:

- Uses some long-lived buffers frequently.
- Uses a large number of buffers once.
- Hits the mmu_rb_handler cache size or pinned-page limits, forcing mmu_rb_handler cache entries to be evicted.

In this case, the one-time use buffers cause the long-lived buffer entries to eventually filter to the end of the LRU list where hfi1_mmu_rb_evict() will consider evicting a frequently-used long-lived entry instead of evicting one of the one-time use entries.

Fix this by inserting new mmu_rb_node at the tail of mmu_rb_handler->lru_list and move mmu_rb_node to the tail of mmu_rb_handler->lru_list when the mmu_rb_node is a hit in hfi1_mmu_rb_remove_unless_exact(). Change hfi1_mmu_rb_evict() to evict from the head of mmu_rb_handler->lru_list instead of the tail.

Fixes: 0636e9ab8355 ("IB/hfi1: Add cache evict LRU list")
Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/168088635931.3027109.10423156330761536044.stgit@252.162.96.66.static.eigbox.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>
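The fixed ordering (insert at tail, move to tail on a hit, evict from head) can be sketched with a small circular doubly linked list, similar in spirit to the kernel's list_head. All names below are hypothetical; locking and reference counting, which the real cache needs, are left out.

```c
#include <stddef.h>

struct demo_node {
    struct demo_node *prev, *next;
    int id;
};

/* Sentinel-based list: head.next is the least recently used entry. */
struct demo_lru {
    struct demo_node head;
};

static void demo_lru_init(struct demo_lru *l)
{
    l->head.prev = l->head.next = &l->head;
}

static void demo_unlink(struct demo_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* New entries go to the tail (most recently used end). */
static void demo_lru_add_tail(struct demo_lru *l, struct demo_node *n)
{
    n->prev = l->head.prev;
    n->next = &l->head;
    l->head.prev->next = n;
    l->head.prev = n;
}

/* A cache hit moves the entry back to the tail. */
static void demo_lru_touch(struct demo_lru *l, struct demo_node *n)
{
    demo_unlink(n);
    demo_lru_add_tail(l, n);
}

/* Eviction takes from the head; returns NULL when the list is empty. */
static struct demo_node *demo_lru_evict(struct demo_lru *l)
{
    struct demo_node *n = l->head.next;

    if (n == &l->head)
        return NULL;
    demo_unlink(n);
    return n;
}
```

After adding entries 1, 2, 3 and touching entry 1, eviction returns 2, then 3, then 1, so a frequently used entry outlives one-time entries.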
2023-04-09 | IB/hfi1: Suppress useless compiler warnings | Ehab Ababneh
These warnings can cause build failure:

In file included from ./include/trace/define_trace.h:102,
                 from drivers/infiniband/hw/hfi1/trace_dbg.h:111,
                 from drivers/infiniband/hw/hfi1/trace.h:15,
                 from drivers/infiniband/hw/hfi1/trace.c:6:
drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘trace_event_get_offsets_hfi1_trace_template’:
./include/trace/trace_events.h:261:9: warning: function ‘trace_event_get_offsets_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
  struct trace_event_raw_##call __maybe_unused *entry; \
  ^~~~~~~~~~~~~~~~
drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’
 DECLARE_EVENT_CLASS(hfi1_trace_template,
 ^~~~~~~~~~~~~~~~~~~
In file included from ./include/trace/define_trace.h:102,
                 from drivers/infiniband/hw/hfi1/trace_dbg.h:111,
                 from drivers/infiniband/hw/hfi1/trace.h:15,
                 from drivers/infiniband/hw/hfi1/trace.c:6:
drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘trace_event_raw_event_hfi1_trace_template’:
./include/trace/trace_events.h:386:9: warning: function ‘trace_event_raw_event_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
  struct trace_event_raw_##call *entry; \
  ^~~~~~~~~~~~~~~~
drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’
 DECLARE_EVENT_CLASS(hfi1_trace_template,
 ^~~~~~~~~~~~~~~~~~~
In file included from ./include/trace/define_trace.h:103,
                 from drivers/infiniband/hw/hfi1/trace_dbg.h:111,
                 from drivers/infiniband/hw/hfi1/trace.h:15,
                 from drivers/infiniband/hw/hfi1/trace.c:6:
drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘perf_trace_hfi1_trace_template’:
./include/trace/perf.h:70:9: warning: function ‘perf_trace_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
  struct hlist_head *head; \
  ^~~~~~~~~~
drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’
 DECLARE_EVENT_CLASS(hfi1_trace_template,
 ^~~~~~~~~~~~~~~~~~~

Solution adapted here is similar to the one in fbbc95a49d5b0

Signed-off-by: Ehab Ababneh <ehab.ababneh@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/168088635415.3027109.5711716700328939402.stgit@252.162.96.66.static.eigbox.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-09 | IB/hfi1: Remove trace newlines | Dean Luick
The hfi1_cdbg trace mechanism appends a newline. Remove trailing newlines from all format strings. Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://lore.kernel.org/r/168088634897.3027109.10401662436950683555.stgit@252.162.96.66.static.eigbox.net Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-04 | RDMA/bnxt_re: Enable congestion control by default | Selvin Xavier
Enable congestion control by default. Issue the FW command to enable CC during driver load and to disable it during unload. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1680169540-10029-8-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-04 | RDAM/bnxt_re: Use tlv apis while processing the slow path commands | Selvin Xavier
Use the new TLV APIs for existing slow path commands. The TLV APIs will be used to populate extended headers for some of the Firmware commands, which will be introduced in the patches that follow. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1680169540-10029-7-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-04 | RDMA/bnxt_re: RoCE slow path TLV support | Selvin Xavier
Header file to support TLV encapsulated commands. These functions will be used by the driver in the follow up patches. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1680169540-10029-6-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-04 | RDMA/bnxt_re: Reduce number of argumets to control path command APIs | Selvin Xavier
Reducing the number of arguments to bnxt_qplib_rcfw_send_message by enclosing all its arguments into a command message structure. Use the same struct while passing the command information to send_message. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1680169540-10029-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
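The refactoring pattern described above, replacing a long argument list with a single message structure, might look like the following sketch. `struct demo_cmdqmsg`, its fields, and both functions are hypothetical illustrations and do not reproduce the bnxt_re definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message structure bundling what used to be separate arguments. */
struct demo_cmdqmsg {
    void *req;       /* request buffer */
    void *resp;      /* response buffer */
    void *sb;        /* optional side buffer, may be NULL */
    uint32_t req_sz; /* request size in bytes */
    uint32_t res_sz; /* response size in bytes */
    uint8_t block;   /* nonzero: caller wants to block for completion */
};

/* Populate the message once; the same struct is then passed down the call chain. */
static inline void demo_fill_cmdqmsg(struct demo_cmdqmsg *msg, void *req, void *resp,
                                     void *sb, uint32_t req_sz, uint32_t res_sz,
                                     uint8_t block)
{
    msg->req = req;
    msg->resp = resp;
    msg->sb = sb;
    msg->req_sz = req_sz;
    msg->res_sz = res_sz;
    msg->block = block;
}

/* Stand-in for the send routine: it only validates the message here. */
static int demo_send_message(const struct demo_cmdqmsg *msg)
{
    if (!msg || !msg->req || !msg->resp || msg->req_sz == 0)
        return -1;
    return 0;
}
```

Adding a new per-command parameter later then means adding a field to the structure rather than changing every call site's signature.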
2023-04-04 | RDMA/bnxt_re: Convert RCFW_CMD_PREP macro to static inline function | Selvin Xavier
Convert RCFW_CMD_PREP macro to static inline function. Also, remove the cmd_flags passed as none of the functions are using it. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1680169540-10029-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
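As an illustration of why a static inline function is preferable to a preparation macro (typed arguments, single evaluation, no unused flag parameter), here is a small sketch. The header layout and `demo_cmd_prep()` are hypothetical and are not the bnxt_re structures.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical command header. */
struct demo_cmd_hdr {
    uint8_t opcode;
    uint8_t cmd_size;
    uint16_t flags;
};

/* Typed replacement for a "prepare command" macro: zero the header,
 * then set the fields every command needs. */
static inline void demo_cmd_prep(struct demo_cmd_hdr *hdr, uint8_t opcode,
                                 uint8_t cmd_size)
{
    memset(hdr, 0, sizeof(*hdr));
    hdr->opcode = opcode;
    hdr->cmd_size = cmd_size;
}
```

Unlike a macro, passing an argument of the wrong type to `demo_cmd_prep()` is diagnosed by the compiler at the call site.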
2023-04-04 | RDMA/bnxt_re: Remove HW queue mapping from RoCE Driver | Selvin Xavier
bnxt_en driver does the queue mapping for RoCE traffic. Removing the queue mapping from RoCE driver. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1680169540-10029-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-04 | RDMA/bnxt_re: Update HW interface headers | Selvin Xavier
Updating the HW structures to the latest version. This is copied from the code maintained internally. No functionality changes in this patch. Code is re-organized to match the file maintained in the internal tree. Also, New HW interface structures are added, which will be used by the drivers in future. CC: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1680169540-10029-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-03 | IB/qib: Remove unused cnt variable | Tom Rix
clang with W=1 reports drivers/infiniband/hw/qib/qib_file_ops.c:487:20: error: variable 'cnt' set but not used [-Werror,-Wunused-but-set-variable] u32 tid, ctxttid, cnt, limit, tidcnt; ^ drivers/infiniband/hw/qib/qib_file_ops.c:1771:9: error: variable 'cnt' set but not used [-Werror,-Wunused-but-set-variable] int i, cnt = 0, maxtid = ctxt_tidbase + dd->rcvtidcnt; ^ This variable is not used so remove it. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20230330235800.1845815-1-trix@redhat.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-03 | RDMA/mlx5: Remove unused num_alloc_xa_entries variable | Tom Rix
clang with W=1 reports drivers/infiniband/hw/mlx5/devx.c:1996:6: error: variable 'num_alloc_xa_entries' set but not used [-Werror,-Wunused-but-set-variable] int num_alloc_xa_entries = 0; ^ This variable is not used so remove it. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20230330153607.1838750-1-trix@redhat.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-30 | IB/qib: check for user backed iterator, not specific iterator type | Jens Axboe
In preparation for switching single segment iterators to using ITER_UBUF, swap the check for whether we are user backed or not. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-30 | IB/hfi1: check for user backed iterator, not specific iterator type | Jens Axboe
In preparation for switching single segment iterators to using ITER_UBUF, swap the check for whether we are user backed or not. While at it, move it outside the srcu locking area to clean up the code a bit. Signed-off-by: Jens Axboe <axboe@kernel.dk>
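The difference between checking for a specific iterator type and checking whether the iterator is user backed can be modeled in a few lines. The sketch below is a user-space model with hypothetical names; it does not use the kernel's iov_iter definitions.

```c
#include <stdbool.h>

/* Hypothetical iterator kinds, loosely modeled on the names in the text. */
enum demo_iter_type {
    DEMO_ITER_IOVEC, /* user memory, multiple segments */
    DEMO_ITER_UBUF,  /* user memory, single segment */
    DEMO_ITER_KVEC,  /* kernel memory */
    DEMO_ITER_BVEC   /* page-based kernel memory */
};

struct demo_iter {
    enum demo_iter_type type;
    bool user_backed;
};

static struct demo_iter demo_iter_make(enum demo_iter_type type)
{
    struct demo_iter it = {
        .type = type,
        .user_backed = (type == DEMO_ITER_IOVEC || type == DEMO_ITER_UBUF),
    };
    return it;
}

/* Old-style check: rejects single-segment user buffers once they become UBUF. */
static bool demo_is_iovec(const struct demo_iter *it)
{
    return it->type == DEMO_ITER_IOVEC;
}

/* New-style check: asks the question the driver actually cares about. */
static bool demo_user_backed(const struct demo_iter *it)
{
    return it->user_backed;
}
```

A driver that only accepts user memory keeps working when single-segment imports switch type, because `demo_user_backed()` is true for both user-memory kinds.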
2023-03-30 | iov_iter: add iter_iovec() helper | Jens Axboe
This returns a pointer to the current iovec entry in the iterator. Only useful with ITER_IOVEC right now, but it prepares us to treat ITER_UBUF and ITER_IOVEC identically for the first segment. Rename struct iov_iter->iov to iov_iter->__iov to find any potentially troublesome spots, and also to prevent anyone from adding new code that accesses iter->iov directly. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-29 | RDMA/ocrdma: remove unused discard_cnt variable | Tom Rix
clang with W=1 reports drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:1592:6: error: variable 'discard_cnt' set but not used [-Werror,-Wunused-but-set-variable] int discard_cnt = 0; ^ This variable is not used so remove it. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20230326120959.1351948-1-trix@redhat.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-29 | RDMA/bnxt_re: remove unused num_srqne_processed and num_cqne_processed variables | Tom Rix
clang with W=1 reports drivers/infiniband/hw/bnxt_re/qplib_fp.c:303:6: error: variable 'num_srqne_processed' set but not used [-Werror,-Wunused-but-set-variable] int num_srqne_processed = 0; ^ drivers/infiniband/hw/bnxt_re/qplib_fp.c:304:6: error: variable 'num_cqne_processed' set but not used [-Werror,-Wunused-but-set-variable] int num_cqne_processed = 0; ^ These variables are not used so remove them. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20230325140559.1336056-1-trix@redhat.com Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-29 | RDMA/usnic: Remove redundant pci_clear_master | Cai Huoqing
Remove pci_clear_master to simplify the code, the bus-mastering is also cleared in do_pci_disable_device, like this:

./drivers/pci/pci.c:2197

    static void do_pci_disable_device(struct pci_dev *dev)
    {
            u16 pci_command;

            pci_read_config_word(dev, PCI_COMMAND, &pci_command);
            if (pci_command & PCI_COMMAND_MASTER) {
                    pci_command &= ~PCI_COMMAND_MASTER;
                    pci_write_config_word(dev, PCI_COMMAND, pci_command);
            }

            pcibios_disable_device(dev);
    }

And dev->is_busmaster is set to 0 in pci_disable_device.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Link: https://lore.kernel.org/r/20230323115742.13836-1-cai.huoqing@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-29 | RDMA/mlx5: Expand switchdev Q-counters to expose representor statistics | Patrisious Haddad
Previously for switchdev only per device counters were supported. Currently we allocate counters for switchdev per port, which also includes the ports that belong to VF representors in order to expose them to users through the rdma tool, allowing the host to track the VFs statistics through their representors counters. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/ea31e1103c125cd27931ba213f307cde30d2eaed.1679566038.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-29 | RDMA/bnxt_re: Add resize_cq support | Selvin Xavier
Add resize_cq verb support for user space CQs. Resize operation for kernel CQs are not supported now. Driver should free the current CQ only after user library polls for all the completions and switch to new CQ. So after the resize_cq is returned from the driver, user library polls for existing completions and store it as temporary data. Once library reaps all completions in the current CQ, it invokes the ibv_cmd_poll_cq to inform the driver about the resize_cq completion. Adding a check for user CQs in driver's poll_cq and complete the resize operation for user CQs. Updating uverbs_cmd_mask with poll_cq to support this. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1678868215-23626-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-24 | RDMA/erdma: Use fixed hardware page size | Cheng Xu
Hardware's page size is 4096, but the kernel's page size may vary. Driver should use hardware's page size when communicating with hardware. Fixes: 155055771704 ("RDMA/erdma: Add verbs implementation") Link: https://lore.kernel.org/r/20230307102924.70577-2-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
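To make the point concrete: on a kernel built with 64 KiB pages, one kernel page spans sixteen 4096-byte hardware pages, so sizes handed to the hardware should be computed with the hardware shift. The sketch below uses hypothetical names and is not the erdma code.

```c
#include <stdint.h>

#define DEMO_HW_PAGE_SHIFT 12 /* hardware page size is 4096 per the text */
#define DEMO_HW_PAGE_SIZE  (1u << DEMO_HW_PAGE_SHIFT)

/* Number of hardware pages touched by the byte range [addr, addr + len). */
static uint32_t demo_hw_page_count(uint64_t addr, uint64_t len)
{
    uint64_t first, last;

    if (len == 0)
        return 0;

    first = addr >> DEMO_HW_PAGE_SHIFT;
    last = (addr + len - 1) >> DEMO_HW_PAGE_SHIFT;
    return (uint32_t)(last - first + 1);
}
```

A 64 KiB buffer at offset 0 yields 16 hardware pages regardless of the kernel's own page size, whereas a calculation based on a 64 KiB kernel page would have reported 1.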
2023-03-23 | Enable IB out-of-order by default in mlx5 | Leon Romanovsky
This series from Or changes default of IB out-of-order feature and allows to the RDMA users to decide if they need to wait for completion for all segments or it is enough to wait for last segment completion only. Thanks Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-23  RDMA/mlx5: Disable out-of-order in integrity enabled QPs  Or Har-Toov
Set retry_mode to GO_BACK_N when qp is created with INTEGRITY_EN flag because out-of-order is not supported when doing HW offload of signature operations. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/362de42cdc7a541afa5b1fd0ec6ae706061764a2.1679230449.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-22  RDMA/efa: Add data polling capability feature bit  Yonatan Nachum
Add feature bit to existing device caps field. EFA supports data polling of 128 bytes blocks. The flag indicates that the NIC guarantees that a 128 byte aligned block is written in order, i.e. that observing the last 8 bits of the block means the prior 127 bytes are also written. It is useful for "last data polling" acceleration techniques. Link: https://lore.kernel.org/r/20230219081328.10419-1-mrgolin@amazon.com Reviewed-by: Yehuda Yitschak <yehuday@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Yonatan Nachum <ynachum@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Acked-by: Gal Pressman <gal.pressman@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
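A rough sketch of what "last data polling" could look like on the consumer side, under the ordering guarantee described above. `POLL_BLOCK_SIZE`, `block_ready()` and the marker-byte convention are assumptions made for illustration; they are not part of the EFA driver or of rdma-core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Block size named in the commit message; the macro name is hypothetical. */
#define POLL_BLOCK_SIZE 128u

/*
 * Returns true once the last byte of the block holds the marker the sender
 * is assumed to place there. With the in-order write guarantee, seeing the
 * marker implies the preceding 127 bytes of the block have landed as well.
 * A real consumer would also need suitable CPU-side memory ordering before
 * reading the payload; that part is omitted from this sketch.
 */
static bool block_ready(const volatile uint8_t *block, uint8_t expected_marker)
{
    assert(block != NULL);

    return block[POLL_BLOCK_SIZE - 1] == expected_marker;
}
```

The benefit is that the receiver can poll the payload buffer itself rather than wait for a separate completion.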
2023-03-22  RDMA/erdma: Minor refactor of device init flow  Cheng Xu
After the necessary configuration, the driver should wait for the hardware to finish initialization. The wait currently sits in a CMDQ-related function even though it has nothing to do with the CMDQ. Refactor this part to make the code cleaner. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230322093319.84045-4-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-22  RDMA/erdma: Eliminate unnecessary casting of EQ doorbells  Cheng Xu
Using void * to define EQ doorbell pointer can eliminate unnecessary casting when performing assignment. Also rename *db_addr* to *db* for a shorter name. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230322093319.84045-3-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-22  RDMA/erdma: Unify byte ordering APIs usage  Cheng Xu
Replace __be32_to_cpu/__cpu_to_be16 with be32_to_cpu/cpu_to_be16. And use be32_to_cpu_array to copy and swap byte order to hide the loop. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230322093319.84045-2-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
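To show what "copy and swap byte order to hide the loop" amounts to, here is a userspace sketch of an array helper in the spirit of the kernel's `be32_to_cpu_array()`. The helpers `be32_load()` and `be32_array_to_cpu()` are stand-ins written for this example, not kernel functions, and they read the source as raw bytes so that the sketch is independent of host endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for be32_to_cpu(): assemble a host-order value from 4 BE bytes. */
static uint32_t be32_load(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Convert `len` big-endian 32-bit words from `src` into host order in `dst`.
 * Callers get one call instead of an open-coded loop at every use site.
 */
static void be32_array_to_cpu(uint32_t *dst, const uint8_t *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < len; i++)
        dst[i] = be32_load(src + 4 * i);
}
```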
2023-03-20  RDMA/erdma: Defer probing if netdevice can not be found  Cheng Xu
The ERDMA device may be probed before its associated netdevice. Returning -EPROBE_DEFER allows the OS to retry probing the erdma device later. Fixes: d55e6fb4803c ("RDMA/erdma: Add the erdma module") Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230320084652.16807-5-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-20  RDMA/erdma: Inline mtt entries into WQE if supported  Cheng Xu
The max inline mtt count supported is ERDMA_MAX_INLINE_MTT_ENTRIES. When mr->mem.mtt_nents == ERDMA_MAX_INLINE_MTT_ENTRIES, inline mtt is also supported, fix it. Fixes: 155055771704 ("RDMA/erdma: Add verbs implementation") Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230320084652.16807-4-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
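The boundary condition described above is a classic off-by-one. The sketch below contrasts the two comparisons; it assumes the pre-fix code used a strict `<`, which is what the commit message implies but does not quote. `MAX_INLINE_MTT_ENTRIES` is a placeholder with an arbitrary value, not the driver's `ERDMA_MAX_INLINE_MTT_ENTRIES` definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder value for illustration only; the driver defines the real one. */
#define MAX_INLINE_MTT_ENTRIES 4u

static_assert(MAX_INLINE_MTT_ENTRIES > 0, "placeholder limit must be positive");

/* Assumed pre-fix behaviour: the exact-limit case is wrongly excluded. */
static bool can_inline_mtt_old(uint32_t mtt_nents)
{
    return mtt_nents < MAX_INLINE_MTT_ENTRIES;
}

/* Post-fix behaviour: a count equal to the limit is also inlined. */
static bool can_inline_mtt_fixed(uint32_t mtt_nents)
{
    return mtt_nents <= MAX_INLINE_MTT_ENTRIES;
}
```

The two predicates differ only when the count equals the limit, which is exactly the case the commit addresses.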
2023-03-20  RDMA/erdma: Update default EQ depth to 4096 and max_send_wr to 8192  Cheng Xu
Max EQ depth of hardware is 32K, the current default EQ depth is too small for some applications, so change the default depth to 4096. Max send WRs the hardware can support is 8K, but the driver limits the value to 4K. Remove this limitation. Fixes: be3cff0f242d ("RDMA/erdma: Add the hardware related definitions") Fixes: db23ae64caac ("RDMA/erdma: Add verbs header file") Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230320084652.16807-3-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-20  RDMA/erdma: Fix some typos  Cheng Xu
FAA is short for atomic fetch and add, not FAD. Fix this. Fixes: 0ca9c2e2844a ("RDMA/erdma: Implement atomic operations support") Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230320084652.16807-2-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-20  IB/mlx5: Add support for 400G_8X lane speed  Maher Sanalla
Currently, when driver queries PTYS to report which link speed is being used on its RoCE ports, it does not check the case of having 400Gbps transmitted over 8 lanes. Thus it fails to report the said speed and instead it defaults to report 10G over 4 lanes. Add a check for the said speed when querying PTYS and report it back correctly when needed. Fixes: 08e8676f1607 ("IB/mlx5: Add support for 50Gbps per lane link modes") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/ec9040548d119d22557d6a4b4070d6f421701fd4.1678973994.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19  IB/qib: Drop redundant pci_enable_pcie_error_reporting()  Bjorn Helgaas
pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration, so the driver doesn't need to do it itself. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Note that this only controls ERR_* Messages from the device. An ERR_* Message may cause the Root Port to generate an interrupt, depending on the AER Root Error Command register managed by the AER service driver. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19  IB/hfi1: Drop redundant pci_enable_pcie_error_reporting()  Bjorn Helgaas
pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration, so the driver doesn't need to do it itself. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Note that this only controls ERR_* Messages from the device. An ERR_* Message may cause the Root Port to generate an interrupt, depending on the AER Root Error Command register managed by the AER service driver. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19  RDMA/mlx5: Coding style fix reported by checkpatch  Rohit Chavan
Block comments should align the * on each line (line 2849). Avoid line continuations in quoted strings (line 3848). Signed-off-by: Rohit Chavan <roheetchavan@gmail.com> Link: https://lore.kernel.org/r/20230319100847.5566-1-roheetchavan@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-19  RDMA/irdma: Add ipv4 check to irdma_find_listener()  Tatyana Nikolova
Add ipv4 check to irdma_find_listener(). Otherwise the function incorrectly finds and returns a listener with a different addr family for the zero IP addr, if a listener with a zero IP addr and the same port as the one searched for has already been created. Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Link: https://lore.kernel.org/r/20230315145231.931-5-shiraz.saleem@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
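The failure mode above can be reproduced with a simplified lookup. Everything in this sketch (`struct listener`, `find_listener()`, the wildcard-matching rule) is a hypothetical model written for illustration; it does not mirror the irdma data structures or the actual signature of irdma_find_listener().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical listener record; not the irdma structure. */
struct listener {
    uint32_t addr[4];   /* IPv4 uses addr[0] only in this model */
    uint16_t port;
    bool ipv4;          /* address family of the listener */
};

static const uint32_t zero_addr[4] = {0};

/*
 * Find a listener matching port and address, where an all-zero listener
 * address acts as a wildcard. The `l->ipv4 == ipv4` test is the point of
 * the fix: without it, a zero-address IPv6 listener would be returned for
 * an IPv4 lookup on the same port.
 */
static const struct listener *find_listener(const struct listener *tbl,
                                            size_t n, const uint32_t addr[4],
                                            bool ipv4, uint16_t port)
{
    assert(tbl != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        const struct listener *l = &tbl[i];
        bool addr_ok =
            memcmp(l->addr, addr, sizeof(l->addr)) == 0 ||
            memcmp(l->addr, zero_addr, sizeof(zero_addr)) == 0;

        if (l->ipv4 == ipv4 && l->port == port && addr_ok)
            return l;
    }
    return NULL;
}
```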