summaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw/mlx5/mr.c
AgeCommit message (Collapse)Author
2019-10-28Merge tag 'v5.4-rc5' into rdma.git for-nextJason Gunthorpe
Linux 5.4-rc5 For dependencies in the next patches Conflict resolved by keeping the delete of the unlock. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-08IB/mlx5: Introduce and use mkey context setting helper routineParav Pandit
Introduce and use set_mkc_access_pd_addr_fields() which sets mkey context's access rights, PD, address fields. Thereby avoid the code duplication. Link: https://lore.kernel.org/r/20191006155443.31068-1-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/mlx5: Add missing synchronize_srcu() for MW casesJason Gunthorpe
While MR uses live as the SRCU 'update', the MW case uses the xarray directly, xa_erase() causes the MW to become inaccessible to the pagefault thread. Thus whenever a MW is removed from the xarray we must synchronize_srcu() before freeing it. This must be done before freeing the mkey as re-use of the mkey while the pagefault thread is using the stale mkey is undesirable. Add the missing synchronizes to MW and DEVX indirect mkey and delete the bogus protection against double destroy in mlx5_core_destroy_mkey() Fixes: 534fd7aac56a ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP") Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure") Link: https://lore.kernel.org/r/20191001153821.23621-7-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/mlx5: Put live in the correct place for ODP MRsJason Gunthorpe
live is used to signal to the pagefault thread that the MR is initialized and ready for use. It should be after the umem is assigned and all other setup is completed. This prevents races (at least) of the form: CPU0 CPU1 mlx5_ib_alloc_implicit_mr() implicit_mr_alloc() live = 1 imr->umem = umem num_pending_prefetch_inc() if (live) atomic_inc(num_pending_prefetch) atomic_set(num_pending_prefetch,0) // Overwrites other thread's store Further, live is being used with SRCU as the 'update' in an acquire/release fashion, so it can not be read and written raw. Move all live = 1's to after MR initialization is completed and use smp_store_release/smp_load_acquire() for manipulating it. Add a missing live = 0 when an implicit MR child is deleted, before queuing work to do synchronize_srcu(). The barriers in update_odp_mr() were some broken attempt to create a acquire/release, but were not even applied consistently and missed the point, delete it as well. Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure") Link: https://lore.kernel.org/r/20191001153821.23621-6-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/mlx5: Order num_pending_prefetch properly with synchronize_srcuJason Gunthorpe
During destroy setting live = 0 and then synchronize_srcu() prevents num_pending_prefetch from incrementing, and also, ensures that all work holding that count is queued on the WQ. Testing before causes races of the form: CPU0 CPU1 dereg_mr() mlx5_ib_advise_mr_prefetch() srcu_read_lock() num_pending_prefetch_inc() if (!live) live = 0 atomic_read() == 0 // skip flush_workqueue() atomic_inc() queue_work(); srcu_read_unlock() WARN_ON(atomic_read()) // Fails Swap the order so that the synchronize_srcu() prevents this. Fixes: a6bc3875f176 ("IB/mlx5: Protect against prefetch of invalid MR") Link: https://lore.kernel.org/r/20191001153821.23621-5-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/mlx5: Do not allow rereg of a ODP MRJason Gunthorpe
This code is completely broken, the umem of a ODP MR simply cannot be discarded without a lot more locking, nor can an ODP mkey be blithely destroyed via destroy_mkey(). Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure") Link: https://lore.kernel.org/r/20191001153821.23621-2-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-21Merge tag 'for-linus-hmm' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull hmm updates from Jason Gunthorpe: "This is more cleanup and consolidation of the hmm APIs and the very strongly related mmu_notifier interfaces. Many places across the tree using these interfaces are touched in the process. Beyond that a cleanup to the page walker API and a few memremap related changes round out the series: - General improvement of hmm_range_fault() and related APIs, more documentation, bug fixes from testing, API simplification & consolidation, and unused API removal - Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE, and make them internal kconfig selects - Hoist a lot of code related to mmu notifier attachment out of drivers by using a refcount get/put attachment idiom and remove the convoluted mmu_notifier_unregister_no_release() and related APIs. - General API improvement for the migrate_vma API and revision of its only user in nouveau - Annotate mmu_notifiers with lockdep and sleeping region debugging Two series unrelated to HMM or mmu_notifiers came along due to dependencies: - Allow pagemap's memremap_pages family of APIs to work without providing a struct device - Make walk_page_range() and related use a constant structure for function pointers" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (75 commits) libnvdimm: Enable unit test infrastructure compile checks mm, notifier: Catch sleeping/blocking for !blockable kernel.h: Add non_block_start/end() drm/radeon: guard against calling an unpaired radeon_mn_unregister() csky: add missing brackets in a macro for tlb.h pagewalk: use lockdep_assert_held for locking validation pagewalk: separate function pointers from iterator data mm: split out a new pagewalk.h header from mm.h mm/mmu_notifiers: annotate with might_sleep() mm/mmu_notifiers: prime lockdep mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end mm/mmu_notifiers: remove the __mmu_notifier_invalidate_range_start/end exports mm/hmm: hmm_range_fault() infinite loop mm/hmm: hmm_range_fault() NULL pointer bug mm/hmm: fix hmm_range_fault()'s handling of swapped out pages mm/mmu_notifiers: remove unregister_no_release RDMA/odp: remove ib_ucontext from ib_umem RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm' RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr RDMA/mlx5: Use ib_umem_start instead of umem.address ...
2019-08-21RDMA/odp: Provide ib_umem_odp_release() to undo the allocsJason Gunthorpe
Now that there are allocator APIs that return the ib_umem_odp directly it should be freed through a umem_odp free'er as well. Link: https://lore.kernel.org/r/20190819111710.18440-8-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-21RDMA/odp: Split creating a umem_odp from ib_umem_getJason Gunthorpe
This is the last creation API that is overloaded for both, there is very little code sharing and a driver has to be specifically ready for a umem_odp to be created to use the odp version. Link: https://lore.kernel.org/r/20190819111710.18440-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-21RDMA/odp: Make it clearer when a umem is an implicit ODP umemJason Gunthorpe
Implicit ODP umems are special, they don't have any page lists, they don't exist in the interval tree and they are never DMA mapped. Instead of trying to guess this based on a zero length use an explicit flag. Further, do not allow non-implicit umems to be 0 size. Link: https://lore.kernel.org/r/20190819111710.18440-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-20IB/mlx5: Fix MR re-registration flow to use UMR properlyMoni Shoua
The UMR WQE in the MR re-registration flow requires that modify_atomic and modify_entity_size capabilities are enabled. Therefore, check that the these capabilities are present before going to umr flow and go through slow path if not. Fixes: c8d75a980fab ("IB/mlx5: Respect new UMR capabilities") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-8-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-20IB/mlx5: Consolidate use_umr checks into single functionMoni Shoua
Introduce helper function to unify various use_umr checks. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-6-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-01IB/mlx5: Fix MR registration flow to use UMR properlyGuy Levi
Driver shouldn't allow to use UMR to register a MR when umr_modify_atomic_disabled is set. Otherwise it will always end up with a failure in the post send flow which sets the UMR WQE to modify atomic access right. Fixes: c8d75a980fab ("IB/mlx5: Respect new UMR capabilities") Signed-off-by: Guy Levi <guyle@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190731081929.32559-1-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-07-24IB/mlx5: Fix clean_mr() to work in the expected orderYishai Hadas
Any dma map underlying the MR should only be freed once the MR is fenced at the hardware. As of the above we first destroy the MKEY and just after that can safely call to dma_unmap_single(). Link: https://lore.kernel.org/r/20190723065733.4899-6-leon@kernel.org Cc: <stable@vger.kernel.org> # 4.3 Fixes: 8a187ee52b04 ("IB/mlx5: Support the new memory registration API") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-24IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cacheYishai Hadas
Fix unreg_umr to move the MR to a kernel owned PD (i.e. the UMR PD) which can't be accessed by userspace. This ensures that nothing can continue to access the MR once it has been placed in the kernels cache for reuse. MRs in the cache continue to have their HW state, including DMA tables, present. Even though the MR has been invalidated, changing the PD provides an additional layer of protection against use of the MR. Link: https://lore.kernel.org/r/20190723065733.4899-5-leon@kernel.org Cc: <stable@vger.kernel.org> # 3.10 Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-24IB/mlx5: Use direct mkey destroy command upon UMR unreg failureYishai Hadas
Use a direct firmware command to destroy the mkey in case the unreg UMR operation has failed. This prevents a case that a mkey will leak out from the cache post a failure to be destroyed by a UMR WR. In case the MR cache limit didn't reach a call to add another entry to the cache instead of the destroyed one is issued. In addition, replaced a warn message to WARN_ON() as this flow is fatal and can't happen unless some bug around. Link: https://lore.kernel.org/r/20190723065733.4899-4-leon@kernel.org Cc: <stable@vger.kernel.org> # 4.10 Fixes: 49780d42dfc9 ("IB/mlx5: Expose MR cache for mlx5_ib") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-24IB/mlx5: Fix unreg_umr to ignore the mkey stateYishai Hadas
Fix unreg_umr to ignore the mkey state and do not fail if was freed. This prevents a case that a user space application already changed the mkey state to free and then the UMR operation will fail leaving the mkey in an inappropriate state. Link: https://lore.kernel.org/r/20190723065733.4899-3-leon@kernel.org Cc: <stable@vger.kernel.org> # 3.19 Fixes: 968e78dd9644 ("IB/mlx5: Enhance UMR support to allow partial page table update") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03Merge mlx5-next into rdma for-nextJason Gunthorpe
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Required for dependencies in the next patches. Resolved the conflicts: - esw_destroy_offloads_acl_tables() use the newer mlx5_esw_for_all_vports() version - esw_offloads_steering_init() drop the cap test - esw_offloads_init() drop the extra function arguments * branch 'mlx5-next': (39 commits) net/mlx5: Expose device definitions for object events net/mlx5: Report EQE data upon CQ completion net/mlx5: Report a CQ error event only when a handler was set net/mlx5: mlx5_core_create_cq() enhancements net/mlx5: Expose the API to register for ANY event net/mlx5: Use event mask based on device capabilities net/mlx5: Fix mlx5_core_destroy_cq() error flow net/mlx5: E-Switch, Handle UC address change in switchdev mode net/mlx5: E-Switch, Consider host PF for inline mode and vlan pop net/mlx5: E-Switch, Use iterator for vlan and min-inline setups net/mlx5: E-Switch, Reg/unreg function changed event at correct stage net/mlx5: E-Switch, Consolidate eswitch function number of VFs net/mlx5: E-Switch, Refactor eswitch SR-IOV interface net/mlx5: Handle host PF vport mac/guid for ECPF net/mlx5: E-Switch, Use correct flags when configuring vlan net/mlx5: Reduce dependency on enabled_vfs counter and num_vfs net/mlx5: Don't handle VF func change if host PF is disabled net/mlx5: Limit scope of mlx5_get_next_phys_dev() to PCI PF devices net/mlx5: Move pci status reg access mutex to mlx5_pci_init net/mlx5: Rename mlx5_pci_dev_type to mlx5_coredev_type ... Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24net/mlx5: Convert mkey_table to XArrayMatthew Wilcox
The lock protecting the data structure does not need to be an rwlock. The only read access to the lock is in an error path, and if that's limiting your scalability, you have bigger performance problems. Eliminate mlx5_mkey_table in favour of using the xarray directly. reg_mr_callback must use GFP_ATOMIC for allocating XArray nodes as it may be called in interrupt context. This also fixes a minor bug where SRCU locking was being used on the radix tree read side, when RCU was needed too. Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-24RDMA/mlx5: Refactor MR descriptors allocationMax Gurtovoy
Improve code readability using static helpers for each memory region type. Re-use the common logic to get smaller functions that are easy to maintain and reduce code duplication. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Use PA mapping for PI handoverMax Gurtovoy
If possibe, avoid doing a UMR operation to register data and protection buffers (via MTT/KLM mkeys). Instead, use the local DMA key and map the SG lists using PA access. This is safe, since the internal key for data and protection never exposed to the remote server (only signature key might be exposed). If PA mappings are not possible, perform mapping using MTT/KLM descriptors. The setup of the tested benchmark (using iSER ULP): - 2 servers with 24 cores (1 initiator and 1 target) - ConnectX-4/ConnectX-5 adapters - 24 target sessions with 1 LUN each - ramdisk backstore - PI active Performance results running fio (24 jobs, 128 iodepth) using write_generate=1 and read_verify=1 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1266.4K/1262.4K 1720.1K/1732.1K 4k 793139/570902 1129.6K/773982 32k 72660/72086 97229/96164 Using write_generate=0 and read_verify=0 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1590.2K/1600.1K 1828.2K/1830.3K 4k 1078.1K/937272 1142.1K/815304 32k 77012/77369 98125/97435 Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Suggested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Improve PI handover performanceIsrael Rukshin
In some loads, there is performance degradation when using KLM mkey instead of MTT mkey. This is because KLM descriptor access is via indirection that might require more HW resources and cycles. Using KLM descriptor is not necessary when there are no gaps at the data/metadata sg lists. As an optimization, use MTT mkey whenever it is possible. For that matter, allocate internal MTT mkey and choose the effective pi_mr for in transaction according to the required mapping scheme. The setup of the tested benchmark (using iSER ULP): - 2 servers with 24 cores (1 initiator and 1 target) - ConnectX-4/ConnectX-5 adapters - 24 target sessions with 1 LUN each - ramdisk backstore - PI active Performance results running fio (24 jobs, 128 iodepth) using write_generate=1 and read_verify=1 (w/w.o/baseline): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1262.4K/1243.3K/1147.1K 1732.1K/1725.1K/1423.8K 4k 570902/571233/457874 773982/743293/642080 32k 72086/72388/71933 96164/71789/93249 Using write_generate=0 and read_verify=0 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1600.1K/1572.1K/1393.3K 1830.3K/1823.5K/1557.2K 4k 937272/921992/762934 815304/753772/646071 32k 77369/75052/72058 97435/73180/94612 Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Suggested-by: Max Gurtovoy <maxg@mellanox.com> Suggested-by: Idan Burstein <idanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Remove unused IB_WR_REG_SIG_MR codeIsrael Rukshin
IB_WR_REG_SIG_MR is not needed after IB_WR_REG_MR_INTEGRITY was used. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/mlx5: Implement mlx5_ib_map_mr_sg_pi and mlx5_ib_alloc_mr_integrityMax Gurtovoy
mlx5_ib_map_mr_sg_pi() will map the PI and data dma mapped SG lists to the mlx5 memory region prior to the registration operation. In the new API, the mlx5 driver will allocate an internal memory region for the UMR operation to register both PI and data SG lists. The internal MR will use KLM mode in order to map 2 (possibly non-contiguous/non-align) SG lists using 1 memory key. In the new API, each ULP will use 1 memory region for the signature operation (instead of 3 in the old API). This memory region will have a key that will be exposed to remote server to perform RDMA operation. The internal memory key that will map the SG lists will stay private. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-20RDMA: Check umem pointer validity prior to releaseLeon Romanovsky
Update ib_umem_release() to behave similarly to kfree() and allow submitting NULL pointer as safe input to this function. Fixes: a52c8e2469c3 ("RDMA: Clean destroy CQ in drivers do not return errors") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-05-21RDMA/umem: Move page_shift from ib_umem to ib_odp_umemJason Gunthorpe
This value has always been set to PAGE_SHIFT in the core code, the only thing that does differently was the ODP path. Move the value into the ODP struct and still use it for ODP, but change all the non-ODP things to just use PAGE_SHIFT/PAGE_SIZE/PAGE_MASK directly. Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-05-06IB/mlx5: Add steering SW ICM device memory typeAriel Levkovich
This patch adds support for allocating, deallocating and registering a new device memory type, STEERING_SW_ICM. This memory can be allocated and used by a privileged user for direct rule insertion and management of the device's steering tables. The type is provided by the user via the dedicated attribute in the alloc_dm ioctl command. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06IB/mlx5: Support device memory type attributeAriel Levkovich
This patch intoruduces a new mlx5_ib driver attribute to the DM allocation method - the DM type. In order to allow addition of new types in downstream patches this patch also refactors the allocation, deallocation and registration handlers to consider the requested type and perform the necessary actions according to it. Since not all future device memory types will be such that are mapped to user memory, the mandatory page index output attribute is modified to be optional. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10RDMA/mlx5: Move rep into port structMark Bloch
In preparation of moving into a model of single IB device multiple ports move rep to be part of the port structure. We mark a representor device by setting is_rep, no functional change with this patch. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10Merge branch 'mlx5-next' into rdma.git for-nextJason Gunthorpe
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Required for dependencies on the next series * branch 'mlx5-next': net/mlx5: E-Switch, add a new prio to be used by the RDMA side net/mlx5: E-Switch, don't use hardcoded values for FDB prios net/mlx5: Fix false compilation warning net/mlx5: Expose MPEIN (Management PCIE INfo) register layout net/mlx5: Add rate limit print macros net/mlx5: Add explicit bar address field net/mlx5: Replace dev_err/warn/info by mlx5_core_err/warn/info net/mlx5: Use dev->priv.name instead of dev_name net/mlx5: Make mlx5_core messages independent from mdev->pdev net/mlx5: Break load_one into three stages net/mlx5: Function setup/teardown procedures net/mlx5: Move health and page alloc init to mdev_init net/mlx5: Split mdev init and pci init net/mlx5: Remove redundant init functions parameter net/mlx5: Remove spinlock support from mlx5_write64 net/mlx5: Remove unused MLX5_*_DOORBELL_LOCK macros Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-02net/mlx5: Add explicit bar address fieldHuy Nguyen
Add bar_addr field to store bar-0 address to avoid calling pci_resource_start with hard-coded bar-0 as parameter. Also note that different mlx5 device types will have bar_addr on different bars. This patch does not change any functionality. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-01IB: Pass uverbs_attr_bundle down ib_x destroy pathShamir Rabinovitch
The uverbs_attr_bundle with the ucontext is sent down to the drivers ib_x destroy path as ib_udata. The next patch will use the ib_udata to free the drivers destroy path from the dependency in 'uobject->context' as we already did for the create path. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-21IB/mlx5: Protect against prefetch of invalid MRMoni Shoua
When deferring a prefetch request we need to protect against MR or PD being destroyed while the request is still enqueued. The first step is to validate that PD owns the lkey that describes the MR and that the MR that the lkey refers to is owned by that PD. The second step is to dequeue all requests when MR is destroyed. Since PD can't be destroyed while it owns MRs it is guaranteed that when a worker wakes up the request it refers to is still valid. Now, it is possible to refrain from taking a reference on the device since it is assured to be present as pd. While that, replace the dedicated ordered workqueue with the system unbound workqueue to reuse an existing resource and improve performance. This will also fix a bug of queueing to the wrong workqueue. Fixes: 813e90b1aeaa ("IB/mlx5: Add advise_mr() support") Reported-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29Merge branch 'devx-async' into k.o/for-nextJason Gunthorpe
Yishai Hadas says: Enable DEVX asynchronous query commands This series enables querying a DEVX object in an asynchronous mode. The userspace application won't block when calling the firmware and it will be able to get the response back once that it will be ready. To enable the above functionality: - DEVX asynchronous command completion FD object was introduced. - The applicable file operations were implemented to enable using it by the user application. - Query asynchronous method was added to the DEVX object, it will call the firmware asynchronously and manages the response on the given input FD. - Hot unplug support was added for the FD to work properly upon unbind/disassociate. - mlx5 core fence for asynchronous commands was implemented and used to prevent racing upon unbind/disassociate. This branch is based on mlx5-next & v5.0-rc2 due to dependencies, from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux * branch 'devx-async': IB/mlx5: Implement DEVX hot unplug for async command FD IB/mlx5: Implement the file ops of DEVX async command FD IB/mlx5: Introduce async DEVX obj query API IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24infiniband: mlx5: no need to check return value of debugfs_create functionsGreg Kroah-Hartman
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24net/mlx5: Make mlx5_cmd_exec_cb() a safe APIJason Gunthorpe
APIs that have deferred callbacks should have some kind of cleanup function that callers can use to fence the callbacks. Otherwise things like module unloading can lead to dangling function pointers, or worse. The IB MR code is the only place that calls this function and had a really poor attempt at creating this fence. Provide a good version in the core code as future patches will add more places that need this fence. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-01-10IB/{core,hw}: Have ib_umem_get extract the ib_ucontext from ib_udataJason Gunthorpe
ib_umem_get() can only be called in a method callback, which always has a udata parameter. This allows ib_umem_get() to derive the ucontext pointer directly from the udata without requiring the drivers to find it in some way or another. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
2019-01-08RDMA/mlx5: Embed into the code flow the ODP config optionLeon Romanovsky
Convert various places to more readable code, which embeds CONFIG_INFINIBAND_ON_DEMAND_PAGING into the code flow. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08RDMA/mlx5: Introduce and reuse helper to identify ODP MRLeon Romanovsky
Consolidate various checks if MR is ODP backed to one simple helper and update call sites to use it. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-02Revert "IB/mlx5: Fix long EEH recover time with NVMe offloads"Leon Romanovsky
Longer term testing shows this patch didn't play well with MR cache and caused to call traces during remove_mkeys(). This reverts commit bb7e22a8ab00ff9ba911a45ba8784cef9e6d6f7a. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20IB/mlx5: Fix long EEH recover time with NVMe offloadsHuy Nguyen
On NVMe offloads connection with many IO queues, EEH takes long time to recover. The culprit is the synchronize_srcu in the destroy_mkey. The solution is to use synchronize_srcu only for ODP mkey. Fixes: b4cfe447d47b ("IB/mlx5: Implement on demand paging by adding support for MMU notifiers") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18IB/mlx5: Add advise_mr() supportMoni Shoua
The verb advise_mr() is used to give advice to the kernel about an address range that belongs to a MR. Implement the verb and register it on the device. The current implementation supports the only known advice to date, prefetch. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-18RDMA/mlx5: Fix function name typo 'fileds' -> 'fields'Gal Pressman
Fix typo in 'set_mr_fileds' -> 'set_mr_fields'. Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16IB/mlx5: Fix MR cache initializationArtemy Kovalyov
Schedule MR cache work only after bucket was initialized. Cc: <stable@vger.kernel.org> # 4.10 Fixes: 49780d42dfc9 ("IB/mlx5: Expose MR cache for mlx5_ib") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16Merge branch 'for-rc' into rdma.git for-nextJason Gunthorpe
From git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git This is required to resolve dependencies of the next series of RDMA patches. The code motion conflicts in drivers/infiniband/core/cache.c were resolved. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-10IB/mlx5: Unmap DMA addr from HCA before IOMMUValentine Fatiev
The function that puts back the MR in cache also removes the DMA address from the HCA. Therefore we need to call this function before we remove the DMA mapping from MMU. Otherwise the HCA may access a memory that is no longer DMA mapped. Call trace: NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0. CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-rc6+ #4 Hardware name: HP ProLiant DL360p Gen8, BIOS P71 08/20/2012 RIP: 0010:intel_idle+0x73/0x120 Code: 80 5c 01 00 0f ae 38 0f ae f0 31 d2 65 48 8b 04 25 80 5c 01 00 48 89 d1 0f 60 02 RSP: 0018:ffffffff9a403e38 EFLAGS: 00000046 RAX: 0000000000000030 RBX: 0000000000000005 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffffffff9a5790c0 RDI: 0000000000000000 RBP: 0000000000000030 R08: 0000000000000000 R09: 0000000000007cf9 R10: 000000000000030a R11: 0000000000000018 R12: 0000000000000000 R13: ffffffff9a5792b8 R14: ffffffff9a5790c0 R15: 0000002b48471e4d FS: 0000000000000000(0000) GS:ffff9c6caf400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f5737185000 CR3: 0000000590c0a002 CR4: 00000000000606f0 Call Trace: cpuidle_enter_state+0x7e/0x2e0 do_idle+0x1ed/0x290 cpu_startup_entry+0x6f/0x80 start_kernel+0x524/0x544 ? set_init_arg+0x55/0x55 secondary_startup_64+0xa4/0xb0 DMAR: DRHD: handling fault status reg 2 DMAR: [DMA Read] Request device [04:00.0] fault addr b34d2000 [fault reason 06] PTE Read access is not set DMAR: [DMA Read] Request device [01:00.2] fault addr bff8b000 [fault reason 06] PTE Read access is not set Fixes: f3f134f5260a ("RDMA/mlx5: Fix crash while accessing garbage pointer and freed memory") Signed-off-by: Valentine Fatiev <valentinef@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21RDMA/umem: Get rid of struct ib_umem.odp_dataJason Gunthorpe
This no longer has any use, we can use container_of to get to the umem_odp, and a simple flag to indicate if this is an odp MR. Remove the few remaining references to it. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21RDMA/umem: Use ib_umem_odp in all function signatures connected to ODPJason Gunthorpe
All of these functions already require the ODP version of the umem struct, make this very clear by having the signature require it. This paves the way to using the container_of() pattern to link umem_odp and umem together. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-07-30RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments constBart Van Assche
Since neither ib_post_send() nor ib_post_recv() modify the data structure their second argument points at, declare that argument const. This change makes it necessary to declare the 'bad_wr' argument const too and also to modify all ULPs that call ib_post_send(), ib_post_recv() or ib_post_srq_recv(). This patch does not change any functionality but makes it possible for the compiler to verify whether the ib_post_(send|recv|srq_recv) really do not modify the posted work request. To make this possible, only one cast had to be introduce that casts away constness, namely in rpcrdma_post_recvs(). The only way I can think of to avoid that cast is to introduce an additional loop in that function or to change the data type of bad_wr from struct ib_recv_wr ** into int (an index that refers to an element in the work request list). However, both approaches would require even more extensive changes than this patch. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09IB/mlx5: fix uaccess beyond "count" in debugfs read/write handlersJann Horn
In general, accessing userspace memory beyond the length of the supplied buffer in VFS read/write handlers can lead to both kernel memory corruption (via kernel_read()/kernel_write(), which can e.g. be triggered via sys_splice()) and privilege escalation inside userspace. In this case, the affected files are in debugfs (and should therefore only be accessible to root), and the read handlers check that *pos is zero (meaning that at least sys_splice() can't trigger kernel memory corruption). Because of the root requirement, this is not a security fix, but rather a cleanup. For the read handlers, fix it by using simple_read_from_buffer() instead of custom logic. Add min() calls to the write handlers. Fixes: 4a2da0b8c078 ("IB/mlx5: Add debug control parameters for congestion control") Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>