The pcie_relaxed_ordering_enabled() check was added to avoid a syndrome when
creating an MKey with relaxed ordering (RO) enabled when the driver's
relaxed_ordering_{read,write} HCA capabilities are out of sync with FW.
While this can happen with relaxed_ordering_read, it can't happen with
relaxed_ordering_write as it's set if the device supports RO write,
regardless of RO in PCI config space, and thus can't change during
runtime.
Therefore, drop the pcie_relaxed_ordering_enabled() check for
relaxed_ordering_write while keeping it for relaxed_ordering_read.
Doing so will also allow the usage of RO write in VFs and VMs (where RO
in PCI config space is not reported/emulated properly).
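The capability rule above can be pictured as two small predicates. This is a hypothetical sketch, not the mlx5 code: struct ro_caps, mkey_ro_read() and mkey_ro_write() are invented names, and the pcie_ro_enabled argument stands in for the result of pcie_relaxed_ordering_enabled().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability snapshot; field names are illustrative only. */
struct ro_caps {
	bool relaxed_ordering_read;  /* may go out of sync with FW at runtime */
	bool relaxed_ordering_write; /* fixed by device support for RO write */
};

/* RO read still requires RO to be enabled in PCI config space. */
static bool mkey_ro_read(const struct ro_caps *caps, bool pcie_ro_enabled)
{
	return caps->relaxed_ordering_read && pcie_ro_enabled;
}

/* RO write depends only on the device capability, so it also works in
 * VFs/VMs where the PCI config space bit is not reported properly. */
static bool mkey_ro_write(const struct ro_caps *caps, bool pcie_ro_enabled)
{
	(void)pcie_ro_enabled; /* intentionally ignored after this change */
	return caps->relaxed_ordering_write;
}
```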
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Link: https://lore.kernel.org/r/7e8f55e31572c1702d69cae015a395d3a824a38a.1681131553.git.leon@kernel.org
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
There is no need to zero 'pktsize' bytes of 'buf'; to be safe, only the
header needs to be cleared.
All the other bytes are already written by memcpy() at the end of the
function.
Doing so also gives the compiler the opportunity to avoid the memset()
call: it can be inlined now that the length is known at compile time.
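A minimal sketch of the pattern, assuming a hypothetical 4-byte header (HDR_LEN, with made-up type and length fields) followed by a payload that fills the rest of a pktsize-byte buffer; the real header layout and function are not shown in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HDR_LEN 4 /* hypothetical header size */

/* Fill a pktsize-byte buffer (pktsize >= HDR_LEN): header, then payload. */
static void build_pkt(unsigned char *buf, size_t pktsize,
		      const unsigned char *payload)
{
	size_t plen = pktsize - HDR_LEN;

	/* Clear only the header instead of memset(buf, 0, pktsize).
	 * HDR_LEN is a compile-time constant, so this may be inlined. */
	memset(buf, 0, HDR_LEN);
	buf[0] = 1;                          /* type */
	buf[2] = (unsigned char)(plen >> 8); /* length, big-endian */
	buf[3] = (unsigned char)plen;
	/* Every remaining byte is overwritten here; no zeroing needed. */
	memcpy(buf + HDR_LEN, payload, plen);
}
```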
Link: https://lore.kernel.org/r/098e3c397be0436f1867899245ecfe656c472110.1675369386.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Place struct mmu_rb_handler at the start of a cache line like so:
    struct mmu_rb_handler *h;
    void *free_ptr;
    int ret;

    free_ptr = kzalloc(sizeof(*h) + cache_line_size() - 1, GFP_KERNEL);
    if (!free_ptr)
        return -ENOMEM;
    h = PTR_ALIGN(free_ptr, cache_line_size());
Additionally, move struct mmu_rb_handler fields "root" and "ops_args" to
start after the next cacheline using the "____cacheline_aligned_in_smp"
annotation.
Allocating an additional cache_line_size() - 1 bytes to place
struct mmu_rb_handler on a cache line start does increase memory
consumption.
However, only a few struct mmu_rb_handler instances are created when hfi1 is in use.
As mmu_rb_handler->root and mmu_rb_handler->ops_args are accessed
frequently, the advantage of having them both within a cache line is
expected to outweigh the disadvantage of the additional memory
consumption per struct mmu_rb_handler.
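The over-allocate-and-align trick can be tried in userspace. This sketch assumes a 64-byte cache line (the kernel queries cache_line_size() instead), uses calloc() in place of kzalloc(), and defines a hypothetical ALIGN_UP_PTR() macro in place of PTR_ALIGN(); the ____cacheline_aligned_in_smp field placement is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace analogue of PTR_ALIGN(); 'a' must be a power of two. */
#define ALIGN_UP_PTR(p, a) \
	((void *)(((uintptr_t)(p) + ((uintptr_t)(a) - 1)) & ~((uintptr_t)(a) - 1)))

/* Assumed cache line size for this sketch. */
#define CACHE_LINE 64u

struct handler {
	long hot_a;
	long hot_b;
};

/* Over-allocate by CACHE_LINE - 1 bytes so an aligned start always fits.
 * The raw pointer is returned through free_ptr: only it may be freed. */
static struct handler *handler_alloc(void **free_ptr)
{
	*free_ptr = calloc(1, sizeof(struct handler) + CACHE_LINE - 1);
	if (!*free_ptr)
		return NULL;
	return ALIGN_UP_PTR(*free_ptr, CACHE_LINE);
}
```

Note that free() must be given the original free_ptr, not the aligned pointer, which is why the kernel snippet keeps both.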
Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/168088636963.3027109.16959757980497822530.stgit@252.162.96.66.static.eigbox.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
hfi1 user SDMA request processing has two bugs that can cause data
corruption for user SDMA requests that have multiple payload iovecs
where an iovec other than the tail iovec does not run up to the page
boundary for the buffer pointed to by that iovec.
Here are the specific bugs:
1. user_sdma_txadd() does not use struct user_sdma_iovec->iov.iov_len.
Rather, user_sdma_txadd() will add up to PAGE_SIZE bytes from iovec
to the packet, even if some of those bytes are past
iovec->iov.iov_len and are thus not intended to be in the packet.
2. user_sdma_txadd() and user_sdma_send_pkts() fail to advance to the
next iovec in user_sdma_request->iovs when the current iovec
is not PAGE_SIZE and does not contain enough data to complete the
packet. The transmitted packet will contain the wrong data from the
iovec pages.
This has not been an issue with SDMA packets from hfi1 Verbs or PSM2
because they only produce iovecs that end short of PAGE_SIZE as the tail
iovec of an SDMA request.
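The two fixes amount to bounding each copy by iov_len and advancing to the next iovec once the current one is used up. The userspace model below is a sketch under those assumptions only: struct iov_cursor and fill_packet() are hypothetical, and page pinning and page boundaries, which the real user_sdma_txadd() deals with, are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical cursor over a request's payload iovecs. */
struct iov_cursor {
	const struct iovec *iov;
	size_t nr;     /* number of iovecs */
	size_t idx;    /* current iovec */
	size_t offset; /* bytes already consumed from iov[idx] */
};

/* Copy up to 'want' bytes into 'dst'. Each iovec contributes at most
 * iov_len bytes (bug 1), and the cursor moves to the next iovec when the
 * current one is exhausted (bug 2). Returns the number of bytes copied. */
static size_t fill_packet(struct iov_cursor *c, unsigned char *dst, size_t want)
{
	size_t done = 0;

	while (done < want && c->idx < c->nr) {
		const struct iovec *v = &c->iov[c->idx];
		size_t left = v->iov_len - c->offset;
		size_t n = want - done < left ? want - done : left;

		memcpy(dst + done, (const unsigned char *)v->iov_base + c->offset, n);
		done += n;
		c->offset += n;
		if (c->offset == v->iov_len) {
			c->idx++;
			c->offset = 0;
		}
	}
	return done;
}
```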
Fixing these bugs exposes other bugs with the SDMA pin cache
(struct mmu_rb_handler) that get in the way of supporting user SDMA requests
with multiple payload iovecs whose buffers do not end at PAGE_SIZE. So
this commit fixes those issues as well.
Here are the mmu_rb_handler bugs that non-PAGE_SIZE-end multi-iovec
payload user SDMA requests can hit:
1. Overlapping memory ranges in mmu_rb_handler will result in duplicate
pinnings.
2. When extending an existing mmu_rb_handler entry (struct mmu_rb_node),
the mmu_rb code (1) removes the existing entry under a lock, (2)
releases that lock, pins the new pages, (3) then reacquires the lock
to insert the extended mmu_rb_node.
If someone else comes in and inserts an overlapping entry between (2)
and (3), insert in (3) will fail.
The failure path code in this case unpins _all_ pages in either the
original mmu_rb_node or the new mmu_rb_node that was inserted between
(2) and (3).
3. In hfi1_mmu_rb_remove_unless_exact(), mmu_rb_node->refcount is
incremented outside of mmu_rb_handler->lock. As a result, mmu_rb_node
could be evicted by another thread that gets mmu_rb_handler->lock and
checks mmu_rb_node->refcount before mmu_rb_node->refcount is
incremented.
4. Related to #2 above, the SDMA request submission failure path does not
check mmu_rb_node->refcount before freeing mmu_rb_node object.
If there are other SDMA requests in progress whose iovecs have
pointers to the now-freed mmu_rb_node(s), those pointers to the
now-freed mmu_rb nodes will be dereferenced when those SDMA requests
complete.
Fixes: 7be85676f1d1 ("IB/hfi1: Don't remove RB entry when not needed.")
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/168088636445.3027109.10054635277810177889.stgit@252.162.96.66.static.eigbox.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
hfi1_mmu_rb_remove_unless_exact() did not move mmu_rb_node objects in
mmu_rb_handler->lru_list after getting a cache hit on an mmu_rb_node.
As a result, hfi1_mmu_rb_evict() was not guaranteed to evict truly
least-recently used nodes.
This could be a performance issue for an application when that
application:
- Uses some long-lived buffers frequently.
- Uses a large number of buffers once.
- Hits the mmu_rb_handler cache size or pinned-page limits, forcing
mmu_rb_handler cache entries to be evicted.
In this case, the one-time use buffers cause the long-lived buffer
entries to eventually filter to the end of the LRU list where
hfi1_mmu_rb_evict() will consider evicting a frequently-used long-lived
entry instead of evicting one of the one-time use entries.
Fix this by inserting a new mmu_rb_node at the tail of
mmu_rb_handler->lru_list and moving an mmu_rb_node to the tail of
mmu_rb_handler->lru_list when it is a hit in
hfi1_mmu_rb_remove_unless_exact(). Change hfi1_mmu_rb_evict() to evict
from the head of mmu_rb_handler->lru_list instead of the tail.
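The resulting LRU discipline (insert at tail, move to tail on hit, evict from head) is sketched below with a hand-rolled circular list modelled on struct list_head; kernel code would use list_add_tail(), list_move_tail() and list_first_entry() instead. The node type and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list modelled on struct list_head. */
struct node {
	struct node *prev, *next;
	int key;
};

static void list_init(struct node *head) { head->prev = head->next = head; }

static void list_del_node(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Insert before head, i.e. at the tail (most recently used end). */
static void list_add_tail_node(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Cache hit: move the node to the tail. */
static void lru_touch(struct node *n, struct node *head)
{
	list_del_node(n);
	list_add_tail_node(n, head);
}

/* Evict from the head: the least recently used entry, or NULL if empty. */
static struct node *lru_evict(struct node *head)
{
	struct node *victim = head->next;

	if (victim == head)
		return NULL;
	list_del_node(victim);
	return victim;
}
```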
Fixes: 0636e9ab8355 ("IB/hfi1: Add cache evict LRU list")
Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/168088635931.3027109.10423156330761536044.stgit@252.162.96.66.static.eigbox.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
These warnings can cause build failure:
In file included from ./include/trace/define_trace.h:102,
from drivers/infiniband/hw/hfi1/trace_dbg.h:111,
from drivers/infiniband/hw/hfi1/trace.h:15,
from drivers/infiniband/hw/hfi1/trace.c:6:
drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘trace_event_get_offsets_hfi1_trace_template’:
./include/trace/trace_events.h:261:9: warning: function ‘trace_event_get_offsets_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
struct trace_event_raw_##call __maybe_unused *entry; \
^~~~~~~~~~~~~~~~
drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’
DECLARE_EVENT_CLASS(hfi1_trace_template,
^~~~~~~~~~~~~~~~~~~
In file included from ./include/trace/define_trace.h:102,
from drivers/infiniband/hw/hfi1/trace_dbg.h:111,
from drivers/infiniband/hw/hfi1/trace.h:15,
from drivers/infiniband/hw/hfi1/trace.c:6:
drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘trace_event_raw_event_hfi1_trace_template’:
./include/trace/trace_events.h:386:9: warning: function ‘trace_event_raw_event_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
struct trace_event_raw_##call *entry; \
^~~~~~~~~~~~~~~~
drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’
DECLARE_EVENT_CLASS(hfi1_trace_template,
^~~~~~~~~~~~~~~~~~~
In file included from ./include/trace/define_trace.h:103,
from drivers/infiniband/hw/hfi1/trace_dbg.h:111,
from drivers/infiniband/hw/hfi1/trace.h:15,
from drivers/infiniband/hw/hfi1/trace.c:6:
drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘perf_trace_hfi1_trace_template’:
./include/trace/perf.h:70:9: warning: function ‘perf_trace_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
struct hlist_head *head; \
^~~~~~~~~~
drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’
DECLARE_EVENT_CLASS(hfi1_trace_template,
^~~~~~~~~~~~~~~~~~~
The solution adopted here is similar to the one in commit fbbc95a49d5b0.
Signed-off-by: Ehab Ababneh <ehab.ababneh@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/168088635415.3027109.5711716700328939402.stgit@252.162.96.66.static.eigbox.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The hfi1_cdbg trace mechanism appends a newline. Remove trailing
newlines from all format strings.
Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://lore.kernel.org/r/168088634897.3027109.10401662436950683555.stgit@252.162.96.66.static.eigbox.net
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Enable congestion control (CC) by default. Issue a FW command to
enable CC during driver load and to disable it during
unload.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-8-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Use the new TLV APIs for existing slow path commands. The TLV
APIs will be used to populate extended headers for some of the
Firmware commands, which will be introduced in the patches that
follow.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-7-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add a header file to support TLV-encapsulated commands. These
functions will be used by the driver in follow-up patches.
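As a rough illustration of TLV encapsulation, the sketch below appends and reads type-length-value records in a byte buffer. The header layout (16-bit type, 16-bit length) and the helper names are assumptions made for this example, not the bnxt_re definitions, and fields are stored in host byte order for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV header: type plus length of the value that follows. */
struct tlv_hdr {
	uint16_t type;
	uint16_t length; /* bytes of value following the header */
};

/* Append one TLV at 'off'; returns the new offset, or 0 if it would not
 * fit in 'cap' bytes (a valid new offset is always > 0). */
static size_t tlv_put(uint8_t *buf, size_t cap, size_t off,
		      uint16_t type, const void *val, uint16_t len)
{
	struct tlv_hdr h = { .type = type, .length = len };

	if (off + sizeof(h) + len > cap)
		return 0;
	memcpy(buf + off, &h, sizeof(h));
	memcpy(buf + off + sizeof(h), val, len);
	return off + sizeof(h) + len;
}

/* Read the TLV header stored at 'off'. */
static struct tlv_hdr tlv_get(const uint8_t *buf, size_t off)
{
	struct tlv_hdr h;

	memcpy(&h, buf + off, sizeof(h));
	return h;
}
```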
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-6-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Reduce the number of arguments to bnxt_qplib_rcfw_send_message
by enclosing all of its arguments in a command message structure.
Use the same struct when passing the command information to
send_message.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-5-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Convert the RCFW_CMD_PREP macro to a static inline function.
Also, remove the cmd_flags argument, as none of the functions
use it.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-4-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The bnxt_en driver does the queue mapping for RoCE traffic, so remove the
queue mapping from the RoCE driver.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Update the HW structures to the latest version.
This is copied from the code maintained internally. There are no functional
changes in this patch. The code is reorganized to match the file maintained
in the internal tree. Also, new HW interface structures are added, which
will be used by the drivers in the future.
CC: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1680169540-10029-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
clang with W=1 reports
drivers/infiniband/hw/qib/qib_file_ops.c:487:20: error: variable
'cnt' set but not used [-Werror,-Wunused-but-set-variable]
u32 tid, ctxttid, cnt, limit, tidcnt;
^
drivers/infiniband/hw/qib/qib_file_ops.c:1771:9: error: variable
'cnt' set but not used [-Werror,-Wunused-but-set-variable]
int i, cnt = 0, maxtid = ctxt_tidbase + dd->rcvtidcnt;
^
This variable is not used so remove it.
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230330235800.1845815-1-trix@redhat.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
clang with W=1 reports
drivers/infiniband/hw/mlx5/devx.c:1996:6: error: variable
'num_alloc_xa_entries' set but not used [-Werror,-Wunused-but-set-variable]
int num_alloc_xa_entries = 0;
^
This variable is not used so remove it.
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230330153607.1838750-1-trix@redhat.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
In preparation for switching single segment iterators to using ITER_UBUF,
swap the check for whether we are user backed or not.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In preparation for switching single segment iterators to using ITER_UBUF,
swap the check for whether we are user backed or not. While at it, move
it outside the srcu locking area to clean up the code a bit.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This returns a pointer to the current iovec entry in the iterator. Only
useful with ITER_IOVEC right now, but it prepares us to treat ITER_UBUF
and ITER_IOVEC identically for the first segment.
Rename struct iov_iter->iov to iov_iter->__iov to find any potentially
troublesome spots, and also to prevent anyone from adding new code that
accesses iter->iov directly.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
clang with W=1 reports
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:1592:6: error: variable
'discard_cnt' set but not used [-Werror,-Wunused-but-set-variable]
int discard_cnt = 0;
^
This variable is not used so remove it.
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230326120959.1351948-1-trix@redhat.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
clang with W=1 reports
drivers/infiniband/hw/bnxt_re/qplib_fp.c:303:6: error: variable
'num_srqne_processed' set but not used [-Werror,-Wunused-but-set-variable]
int num_srqne_processed = 0;
^
drivers/infiniband/hw/bnxt_re/qplib_fp.c:304:6: error: variable
'num_cqne_processed' set but not used [-Werror,-Wunused-but-set-variable]
int num_cqne_processed = 0;
^
These variables are not used so remove them.
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230325140559.1336056-1-trix@redhat.com
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Remove pci_clear_master() to simplify the code. Bus mastering is also
cleared in do_pci_disable_device(), like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
    u16 pci_command;

    pci_read_config_word(dev, PCI_COMMAND, &pci_command);
    if (pci_command & PCI_COMMAND_MASTER) {
        pci_command &= ~PCI_COMMAND_MASTER;
        pci_write_config_word(dev, PCI_COMMAND, pci_command);
    }

    pcibios_disable_device(dev);
}
And dev->is_busmaster is set to 0 in pci_disable_device().
Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Link: https://lore.kernel.org/r/20230323115742.13836-1-cai.huoqing@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Previously, only per-device counters were supported for switchdev.
With this change, counters are allocated per port for switchdev, which
also includes the ports that belong to VF representors, in order to expose
them to users through the rdma tool. This allows the host to track VF
statistics through their representors' counters.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Link: https://lore.kernel.org/r/ea31e1103c125cd27931ba213f307cde30d2eaed.1679566038.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add resize_cq verb support for user space CQs. The resize operation for
kernel CQs is not supported for now.
The driver should free the current CQ only after the user library has
polled all the completions and switched to the new CQ. So, after resize_cq
returns from the driver, the user library polls the existing completions
and stores them as temporary data. Once the library reaps all completions
in the current CQ, it invokes ibv_cmd_poll_cq to inform the driver about
the resize_cq completion. Add a check for user CQs in the driver's
poll_cq and complete the resize operation for user CQs.
Update uverbs_cmd_mask with poll_cq to support this.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://lore.kernel.org/r/1678868215-23626-1-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The hardware's page size is 4096 bytes, but the kernel's page size may vary.
The driver should use the hardware's page size when communicating with the hardware.
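For example, the number of 4 KiB hardware pages covering a buffer should be computed from the hardware page size rather than the kernel's PAGE_SIZE. The helper and macro names below are hypothetical, not the erdma definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed hardware page size from the commit text: 4096 bytes. */
#define HW_PAGE_SHIFT 12
#define HW_PAGE_SIZE (1UL << HW_PAGE_SHIFT)

/* Number of hardware pages spanned by [addr, addr + len), independent of
 * the kernel's PAGE_SIZE. */
static size_t hw_npages(uint64_t addr, uint64_t len)
{
	if (len == 0)
		return 0;
	return (size_t)(((addr + len - 1) >> HW_PAGE_SHIFT) -
			 (addr >> HW_PAGE_SHIFT) + 1);
}
```

With a 64 KiB kernel PAGE_SIZE, a single kernel page still spans 16 hardware pages, which shows why the two sizes must not be mixed when describing memory to the device.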
Fixes: 155055771704 ("RDMA/erdma: Add verbs implementation")
Link: https://lore.kernel.org/r/20230307102924.70577-2-chengyou@linux.alibaba.com
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
This series from Or changes the default of the IB out-of-order feature and
allows RDMA users to decide whether they need to wait for completion
of all segments or whether waiting for the last segment's completion is enough.
Thanks
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Set retry_mode to GO_BACK_N when a QP is created with the INTEGRITY_EN flag,
because out-of-order is not supported when doing HW offload of signature
operations.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/362de42cdc7a541afa5b1fd0ec6ae706061764a2.1679230449.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add a feature bit to the existing device caps field. EFA supports data polling
of 128-byte blocks.
The flag indicates that the NIC guarantees that a 128-byte aligned block
is written in order, i.e., observing the last 8 bits (the last byte) of the
block means the prior 127 bytes are also written.
It is useful for "last data polling" acceleration techniques.
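A consumer exploiting this guarantee might poll only the last byte of an aligned block and then read the rest. In this sketch the 128-byte alignment and in-order guarantee come from the text above, while the marker-byte convention, poll_block() and the use of a C11 acquire fence (standing in for a read barrier such as the kernel's dma_rmb()) are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define POLL_BLOCK 128 /* block size guaranteed to be written in order */

/* Hypothetical consumer: returns true and copies the block into 'out'
 * once its last byte equals 'marker'. */
static bool poll_block(const volatile uint8_t *block, uint8_t marker, uint8_t *out)
{
	/* The ordering guarantee only covers 128-byte aligned blocks. */
	if ((uintptr_t)block % POLL_BLOCK)
		return false;
	if (block[POLL_BLOCK - 1] != marker)
		return false;
	/* Keep the reads below from being satisfied before the marker read. */
	atomic_thread_fence(memory_order_acquire);
	for (int i = 0; i < POLL_BLOCK; i++)
		out[i] = block[i];
	return true;
}
```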
Link: https://lore.kernel.org/r/20230219081328.10419-1-mrgolin@amazon.com
Reviewed-by: Yehuda Yitschak <yehuday@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Yonatan Nachum <ynachum@amazon.com>
Signed-off-by: Michael Margolin <mrgolin@amazon.com>
Acked-by: Gal Pressman <gal.pressman@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
After the necessary configuration, the driver should wait for the hardware to
finish initialization. This wait currently lives in a CMDQ-related function,
although it has nothing to do with the CMDQ. Refactor this part to make the code cleaner.
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230322093319.84045-4-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Using void * for the EQ doorbell pointer eliminates unnecessary
casting when performing assignments. Also rename *db_addr* to *db* for a
shorter name.
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230322093319.84045-3-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Replace __be32_to_cpu/__cpu_to_be16 with be32_to_cpu/cpu_to_be16.
Also use be32_to_cpu_array() to copy and swap the byte order, hiding the
loop.
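In userspace the same copy-and-swap can be written with ntohl(), which converts big-endian (network order) words to host order. be32_to_cpu_array_user() is a hypothetical stand-in for the kernel's be32_to_cpu_array(dst, src, len), which performs an equivalent per-element conversion.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy 'len' big-endian 32-bit words from src to dst in host byte order. */
static void be32_to_cpu_array_user(uint32_t *dst, const uint32_t *src, size_t len)
{
	for (size_t i = 0; i < len; i++)
		dst[i] = ntohl(src[i]);
}
```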
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230322093319.84045-2-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The ERDMA device may be probed before its associated netdevice; returning
-EPROBE_DEFER allows the OS to try probing the erdma device again later.
Fixes: d55e6fb4803c ("RDMA/erdma: Add the erdma module")
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230320084652.16807-5-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The maximum inline MTT count supported is ERDMA_MAX_INLINE_MTT_ENTRIES.
When mr->mem.mtt_nents == ERDMA_MAX_INLINE_MTT_ENTRIES, inline MTT
is also supported, so handle that case as inline too.
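The fix is an off-by-one on an inclusive limit. In this sketch MAX_INLINE_MTT_ENTRIES is a placeholder value, since the real ERDMA_MAX_INLINE_MTT_ENTRIES is not given here, and mtt_fits_inline() is a hypothetical helper; the pre-fix behavior corresponds to a strict '<' comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder value for illustration only. */
#define MAX_INLINE_MTT_ENTRIES 4

/* The limit is inclusive: a count equal to the maximum still fits inline. */
static bool mtt_fits_inline(unsigned int mtt_nents)
{
	return mtt_nents <= MAX_INLINE_MTT_ENTRIES;
}
```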
Fixes: 155055771704 ("RDMA/erdma: Add verbs implementation")
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230320084652.16807-4-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The hardware's maximum EQ depth is 32K, and the current default EQ depth is
too small for some applications, so change the default depth to 4096.
The hardware can support up to 8K send WRs, but the driver limits the
value to 4K. Remove this limitation.
Fixes: be3cff0f242d ("RDMA/erdma: Add the hardware related definitions")
Fixes: db23ae64caac ("RDMA/erdma: Add verbs header file")
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230320084652.16807-3-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
FAA is short for atomic fetch and add, not FAD. Fix this.
Fixes: 0ca9c2e2844a ("RDMA/erdma: Implement atomic operations support")
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230320084652.16807-2-chengyou@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Currently, when the driver queries PTYS to report which link speed is being
used on its RoCE ports, it does not check for the case of 400Gbps
transmitted over 8 lanes. Thus it fails to report that speed and
instead defaults to reporting 10G over 4 lanes.
Add a check for the said speed when querying PTYS and report it back
correctly when needed.
Fixes: 08e8676f1607 ("IB/mlx5: Add support for 50Gbps per lane link modes")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/ec9040548d119d22557d6a4b4070d6f421701fd4.1678973994.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration, so the
driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Block comments should align the * on each line on line 2849
Avoid line continuations in quoted strings on line 3848
Signed-off-by: Rohit Chavan <roheetchavan@gmail.com>
Link: https://lore.kernel.org/r/20230319100847.5566-1-roheetchavan@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add ipv4 check to irdma_find_listener(). Otherwise the function
incorrectly finds and returns a listener with a different addr family for
the zero IP addr, if a listener with a zero IP addr and the same port as
the one searched for has already been created.
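The issue is that an all-zero IPv4 address and an all-zero IPv6 address look identical when stored in the same array, so the lookup must compare the address family too. The struct and function below are hypothetical simplifications, not the irdma code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical listener record. */
struct listener {
	bool ipv4;        /* address family of the listener */
	uint32_t addr[4]; /* IPv4 uses addr[0] only */
	uint16_t port;
};

/* Return the listener matching address, port and family, or NULL.
 * Without the ipv4 comparison, a zero-address lookup for one family
 * could return a zero-address listener of the other family. */
static const struct listener *find_listener(const struct listener *ls, int n,
					    const uint32_t addr[4],
					    uint16_t port, bool ipv4)
{
	for (int i = 0; i < n; i++) {
		const struct listener *l = &ls[i];

		if (l->port == port && l->ipv4 == ipv4 &&
		    !memcmp(l->addr, addr, sizeof(l->addr)))
			return l;
	}
	return NULL;
}
```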
Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager")
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230315145231.931-5-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
When running perftest with a large number of connections in iWARP mode, the
passive side could be slow to respond. Increase the rexmit counter default
to allow connections to scale.
Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230315145231.931-4-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
On rmmod of irdma, the PBLE object memory is not being freed. Unlike other
HMC objects, PBLE object memory is not statically pre-allocated at function
initialization time. PBLE objects and the Segment Descriptors (SDs) for
them can be dynamically allocated during scale-up, and the SDs remain
allocated until function deinitialization.
Fix this leak by adding IRDMA_HMC_IW_PBLE to the iw_hmc_obj_types[] table
and skipping PBLEs in irdma_create_hmc_obj() but not in irdma_del_hmc_objects().
Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230315145231.931-3-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Currently, artificial SW completions are generated for NOP WQEs, which can
generate unexpected completions with wr_id = 0. Skip the generation of
artificial completions for NOPs.
Fixes: 81091d7696ae ("RDMA/irdma: Add SW mechanism to generate completions on error")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230315145231.931-2-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Refactor PBLE functions to use a bit mask representing the desired PBLE
level instead of two parameters, use_pble and lvl_one_only, which make the
code confusing.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Link: https://lore.kernel.org/r/20230315145305.955-5-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add more information to interrupt names.
Before this patch it was:
irdma
CEQ
CEQ
...
Now:
irdma-0000:18:00.0-AEQ
irdma-0000:18:00.0-CEQ-0
irdma-0000:18:00.0-CEQ-1
...
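Names in this format can be produced with snprintf(). The helper below is a hypothetical sketch: pci_name stands in for the device's PCI address string (as returned by the kernel's pci_name()), and an index of -1 marks the unindexed AEQ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build names like "irdma-0000:18:00.0-AEQ" or "irdma-0000:18:00.0-CEQ-1". */
static int format_irq_name(char *buf, size_t len, const char *drv,
			   const char *pci_name, const char *type, int idx)
{
	if (idx < 0) /* AEQ: no index suffix */
		return snprintf(buf, len, "%s-%s-%s", drv, pci_name, type);
	return snprintf(buf, len, "%s-%s-%s-%d", drv, pci_name, type, idx);
}
```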
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Suggested-by: Piotr Raczynski <piotr.raczynski@intel.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230315145305.955-4-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Remove a redundant function call in irdma_modify_qp_roce(), since
irdma_arp_table() with the IRDMA_ARP_RESOLVE action is called after the
if/else IP version check as part of irdma_add_arp().
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230315145305.955-3-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Refactor HW statistics. The new implementation:
- Unifies HW statistics support for all HW generations.
- Unifies support of 32- and 64-bit counters.
- Removes duplicated code and simplifies implementation.
- Fixes roll-over handling.
- Removes unneeded last_hw_stats.
With the new implementation, there is no separate handling and no separate
arrays for 32- and 64-bit counters (offsets, regs, values). Instead,
there is a HW stats map array for each HW revision, which defines
HW-specific width and location of each counter in the statistics buffer.
Once the statistics are gathered (either via CQP op, or by reading HW
registers), counter values are extracted from the statistics buffer using
the stats map and the delta between the last and new values is computed.
Finally, the counter values in rdma_hw_stats are incremented by those
deltas.
From the OS perspective, all the counters are 64-bit and their order in
rdma_hw_stats->value[] array, as well as in irdma_hw_stat_names[], is the
same for all HW gens. New statistics should always be added at the end.
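The roll-over-safe delta for a counter of a given width can be computed with unsigned subtraction masked to that width, after which the result is added to the 64-bit value exposed to the OS. counter_delta() is a hypothetical helper illustrating the idea, not the irdma implementation, and it assumes at most one wrap between two readings.

```c
#include <assert.h>
#include <stdint.h>

/* Delta between two raw readings of a 'width'-bit counter (e.g. 32 or 64).
 * Unsigned subtraction modulo 2^64, masked to the counter width, yields
 * the correct delta across a single roll-over. */
static uint64_t counter_delta(uint64_t last, uint64_t now, unsigned int width)
{
	uint64_t mask = width >= 64 ? UINT64_MAX : (1ULL << width) - 1;

	return (now - last) & mask;
}
```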
Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com>
Signed-off-by: Youvaraj Sagar <youvaraj.sagar@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230315145305.955-2-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
kmap() has been deprecated in favor of the kmap_local_page() call.
kmap_local_page() is thread local.
In the SDMA coalesce case, the allocated page is potentially freed in a
different context through qib_sdma_get_complete() ->
qib_user_sdma_make_progress(). The use of kmap_local_page() is
inappropriate in this call path. However, the page is allocated using
GFP_KERNEL and will never be from highmem.
Remove the use of kmap calls and use page_address() in this case.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20230217-kmap-qib-v1-1-e5a6fde167e0@intel.com
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The ucmd->log_sq_bb_count variable is controlled by the user so this
shift can wrap. Fix it by using check_shl_overflow() in the same way
that it was done in commit 515f60004ed9 ("RDMA/hns: Prevent undefined
behavior in hns_roce_set_user_sq_size()").
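The check can be modelled in userspace for unsigned 32-bit values: a shift overflows if the shift amount is out of range or if shifting the result back does not recover the original value. shl_overflows_u32() and sq_size_from_log() are hypothetical helpers; the kernel's check_shl_overflow() is type-generic and also rejects negative operands and results for signed types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stores the (possibly truncated) a << s in *d and returns true if bits
 * were lost or s is out of range for a 32-bit value. */
static bool shl_overflows_u32(uint32_t a, unsigned int s, uint32_t *d)
{
	if (s >= 32) {
		*d = 0; /* result is meaningless */
		return true;
	}
	*d = a << s;
	/* If any set bit was shifted out, shifting back cannot recover 'a'. */
	return (*d >> s) != a;
}

/* Hypothetical caller: size = 1 << log_cnt, rejected if it would wrap. */
static int sq_size_from_log(unsigned int log_cnt, uint32_t *size)
{
	return shl_overflows_u32(1, log_cnt, size) ? -1 : 0;
}
```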
Fixes: 839041329fd3 ("IB/mlx4: Sanity check userspace send queue sizes")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/a8dfbd1d-c019-4556-930b-bab1ded73b10@kili.mountain
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The module pointer in class_create() never actually did anything, and it
shouldn't have been required to be set as a parameter even if it did
something. So just remove it and fix up all callers of the function in
the kernel tree at the same time.
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20230313181843.1207845-4-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|