Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth.
Current release - regressions:
- rtnetlink: fix rtnl_dump_ifinfo() error path
- bluetooth: remove the redundant sco_conn_put
Previous releases - regressions:
- netlink: fix false positive warning in extack during dumps
- sched: sch_fq: don't follow the fast path if Tx is behind now
- ipv6: delete temporary address if mngtmpaddr is removed or
unmanaged
- tcp: fix use-after-free of nreq in reqsk_timer_handler().
- bluetooth: fix slab-use-after-free Read in set_powered_sync
- l2tp: fix warning in l2tp_exit_net found
- eth:
- bnxt_en: fix receive ring space parameters when XDP is active
- lan78xx: fix double free issue with interrupt buffer allocation
- tg3: set coherent DMA mask bits to 31 for BCM57766 chipsets
Previous releases - always broken:
- ipmr: fix tables suspicious RCU usage
- iucv: MSG_PEEK causes memory leak in iucv_sock_destruct()
- eth:
- octeontx2-af: fix low network performance
- stmmac: dwmac-socfpga: set RX watchdog interrupt as broken
- rtase: correct the speed for RTL907XD-V1
Misc:
- some documentation fixup"
* tag 'net-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
ipmr: fix build with clang and DEBUG_NET disabled.
Documentation: tls_offload: fix typos and grammar
Fix spelling mistake
ipmr: fix tables suspicious RCU usage
ip6mr: fix tables suspicious RCU usage
ipmr: add debug check for mr table cleanup
selftests: rds: move test.py to TEST_FILES
net_sched: sch_fq: don't follow the fast path if Tx is behind now
tcp: Fix use-after-free of nreq in reqsk_timer_handler().
net: phy: fix phy_ethtool_set_eee() incorrectly enabling LPI
net: Comment copy_from_sockptr() explaining its behaviour
rxrpc: Improve setsockopt() handling of malformed user input
llc: Improve setsockopt() handling of malformed user input
Bluetooth: SCO: remove the redundant sco_conn_put
Bluetooth: MGMT: Fix possible deadlocks
Bluetooth: MGMT: Fix slab-use-after-free Read in set_powered_sync
bnxt_en: Unregister PTP during PCI shutdown and suspend
bnxt_en: Refactor bnxt_ptp_init()
bnxt_en: Fix receive ring space parameters when XDP is active
bnxt_en: Fix queue start to update vnic RSS table
...
|
|
If we go through the PCI shutdown or suspend path, we shutdown the
NIC but PTP remains registered. If the kernel continues to run for
a little bit, the periodic PTP .do_aux_work() function may be called
and it will read the PHC from the BAR register. Since the device
has already been disabled, it will cause a PCIe completion timeout.
Fix it by calling bnxt_ptp_clear() in the PCI shutdown/suspend
handlers. bnxt_ptp_clear() will unregister from PTP and
.do_aux_work() will be canceled.
In bnxt_resume(), we need to re-initialize PTP.
Fixes: a521c8a01d26 ("bnxt_en: Move bnxt_ptp_init() from bnxt_open() back to bnxt_init_one()")
Cc: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Instead of passing the 2nd parameter phc_cfg to bnxt_ptp_init().
Store it in bp->ptp_cfg so that the caller doesn't need to know what
the value should be.
In the next patch, we'll need to call bnxt_ptp_init() in bnxt_resume()
and this will make it easier.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The MTU setting at the time an XDP multi-buffer is attached
determines whether the aggregation ring will be used and the
rx_skb_func handler. This is done in bnxt_set_rx_skb_mode().
If the MTU is later changed, the aggregation ring setting may need
to be changed and it may become out-of-sync with the settings
initially done in bnxt_set_rx_skb_mode(). This may result in
random memory corruption and crashes as the HW may DMA data larger
than the allocated buffer size, such as:
BUG: kernel NULL pointer dereference, address: 00000000000003c0
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 17 PID: 0 Comm: swapper/17 Kdump: loaded Tainted: G S OE 6.1.0-226bf9805506 #1
Hardware name: Wiwynn Delta Lake PVT BZA.02601.0150/Delta Lake-Class1, BIOS F0E_3A12 08/26/2021
RIP: 0010:bnxt_rx_pkt+0xe97/0x1ae0 [bnxt_en]
Code: 8b 95 70 ff ff ff 4c 8b 9d 48 ff ff ff 66 41 89 87 b4 00 00 00 e9 0b f7 ff ff 0f b7 43 0a 49 8b 95 a8 04 00 00 25 ff 0f 00 00 <0f> b7 14 42 48 c1 e2 06 49 03 95 a0 04 00 00 0f b6 42 33f
RSP: 0018:ffffa19f40cc0d18 EFLAGS: 00010202
RAX: 00000000000001e0 RBX: ffff8e2c805c6100 RCX: 00000000000007ff
RDX: 0000000000000000 RSI: ffff8e2c271ab990 RDI: ffff8e2c84f12380
RBP: ffffa19f40cc0e48 R08: 000000000001000d R09: 974ea2fcddfa4cbf
R10: 0000000000000000 R11: ffffa19f40cc0ff8 R12: ffff8e2c94b58980
R13: ffff8e2c952d6600 R14: 0000000000000016 R15: ffff8e2c271ab990
FS: 0000000000000000(0000) GS:ffff8e3b3f840000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000003c0 CR3: 0000000e8580a004 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
__bnxt_poll_work+0x1c2/0x3e0 [bnxt_en]
To address the issue, we now call bnxt_set_rx_skb_mode() within
bnxt_change_mtu() to properly set the AGG rings configuration and
update rx_skb_func based on the new MTU value.
Additionally, BNXT_FLAG_NO_AGG_RINGS is cleared at the beginning of
bnxt_set_rx_skb_mode() to make sure it gets set or cleared based on
the current MTU.
Fixes: 08450ea98ae9 ("bnxt_en: Fix max_mtu setting for multi-buf XDP")
Co-developed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Shravya KN <shravya.k-n@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
HWRM_RING_FREE followed by a HWRM_RING_ALLOC is not guaranteed to
have the same FW ring ID as before. So we must reinitialize the
RSS table with the correct ring IDs. Otherwise, traffic may not
resume properly if the restarted ring ID is stale. Since this
feature is only supported on P5_PLUS chips, we call
bnxt_vnic_set_rss_p5() to update the HW RSS table.
Fixes: 2d694c27d32e ("bnxt_en: implement netdev_queue_mgmt_ops")
Cc: David Wei <dw@davidwei.uk>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Use the return value from bnxt_get_media() to determine the port and
link modes. bnxt_get_media() returns the proper BNXT_MEDIA_KR when
the PHY is backplane. This will correct the ethtool settings for
backplane devices.
Fixes: 5d4e1bf60664 ("bnxt_en: extend media types to supported and autoneg modes")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Shravya KN <shravya.k-n@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
After successful PCIe AER recovery, FW will reset all resource
reservations. If it is IF_UP, the driver will call bnxt_open() and
all resources will be reserved again. It it is IF_DOWN, we should
call bnxt_reserve_rings() so that we can reserve resources including
RoCE resources to allow RoCE to resume after AER. Without this
patch, RoCE fails to resume in this IF_DOWN scenario.
Later, if it becomes IF_UP, bnxt_open() will see that resources have
been reserved and will not reserve again.
Fixes: fb1e6e562b37 ("bnxt_en: Fix AER recovery.")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The hardware on Broadcom 1G chipsets have a known limitation
where they cannot handle DMA addresses that cross over 4GB.
When such an address is encountered, the hardware sets the
address overflow error bit in the DMA status register and
triggers a reset.
However, BCM57766 hardware is setting the overflow bit and
triggering a reset in some cases when there is no actual
underlying address overflow. The hardware team analyzed the
issue and concluded that it is happening when the status
block update has an address with higher (b16 to b31) bits
as 0xffff following a previous update that had lowest bits
as 0xffff.
To work around this bug in the BCM57766 hardware, set the
coherent dma mask from the current 64b to 31b. This will
ensure that upper bits of the status block DMA address are
always at most 0x7fff, thus avoiding the improper overflow
check described above. This work around is intended for only
status block and ring memories and has no effect on TX and
RX buffers as they do not require coherent memory.
Fixes: 72f2afb8a685 ("[TG3]: Add DMA address workaround")
Reported-by: Salam Noureddine <noureddine@arista.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/20241119055741.147144-1-pavan.chebbi@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull rdma updates from Jason Gunthorpe:
"Seveal fixes scattered across the drivers and a few new features:
- Minor updates and bug fixes to hfi1, efa, iopob, bnxt, hns
- Force disassociate the userspace FD when hns does an async reset
- bnxt new features for optimized modify QP to skip certain stayes,
CQ coalescing, better debug dumping
- mlx5 new data placement ordering feature
- Faster destruction of mlx5 devx HW objects
- Improvements to RDMA CM mad handling"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (51 commits)
RDMA/bnxt_re: Correct the sequence of device suspend
RDMA/bnxt_re: Use the default mode of congestion control
RDMA/bnxt_re: Support different traffic class
IB/cm: Rework sending DREQ when destroying a cm_id
IB/cm: Do not hold reference on cm_id unless needed
IB/cm: Explicitly mark if a response MAD is a retransmission
RDMA/mlx5: Move events notifier registration to be after device registration
RDMA/bnxt_re: Cache MSIx info to a local structure
RDMA/bnxt_re: Refurbish CQ to NQ hash calculation
RDMA/bnxt_re: Refactor NQ allocation
RDMA/bnxt_re: Fail probe early when not enough MSI-x vectors are reserved
RDMA/hns: Fix different dgids mapping to the same dip_idx
RDMA/bnxt_re: Add set_func_resources support for P5/P7 adapters
RDMA/bnxt_re: Enhance RoCE SRIOV resource configuration design
bnxt_en: Add support for RoCE sriov configuration
RDMA/hns: Fix NULL pointer derefernce in hns_roce_map_mr_sg()
RDMA/hns: Fix out-of-order issue of requester when setting FENCE
RDMA/nldev: Add IB device and net device rename events
RDMA/mlx5: Add implementation for ufile_hw_cleanup device operation
RDMA/core: Move ib_uverbs_file struct to uverbs_types.h
...
|
|
The FW trace coredump segments are very similar to the context
memory segments in the previous patch. The main difference is
to call HWRM_DBG_LOG_BUFFER_FLUSH to flush the FW data to host
memory and to include an additional record in the coredump that
contains the head and tail information of the trace data.
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-12-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a new ethtool -W dump flag (2) to include driver coredump segments.
This patch adds the host backing store context memory pages used by the
chip and FW to store various states to the coredump. The pages for
each context memory type is dumped into a separate coredump segment.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Shruti Parab <shruti.parab@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-11-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pass the component ID and segment ID to this function to create
the coredump segment header. This will be needed in the next
patches to create more segments for the coredump.
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-10-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Host context memory is used by the newer chips to store context
information for various L2 and RoCE states and FW logs. This
information will be useful for debugging. This patch adds the
functions to copy all pages of a context memory type to a contiguous
buffer. The next patches will include the context memory dump
during ethtool -w coredump.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Co-developed-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-9-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If FW supports appending new FW logs to an offset in the context
memory after FW reset, then do not free this type of context memory
during reset. The driver will provide the initial offset to the FW
when configuring this type of context memory. This way, we don't lose
the older FW logs after reset.
Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-8-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The FW trace memory pages will be added to the ethtool -w coredump
in later patches. In addition to the raw data, the driver has to
add a header to provide the head and tail information on each FW
trace log segment when creating the coredump. The FW sends an async
message to the driver after DMAing a chunk of logs to the context
memory to indicate the last offset containing the tail of the logs.
The driver needs to keep track of that.
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-7-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Allocate the new FW trace log backing store context memory types
if they are supported by the FW. FW debug logs are DMA'ed to the host
backing store memory when the on-chip buffers are full. If host
memory cannot be allocated for these memory types, the driver
will not abort.
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-6-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If 'force' is false, it will keep the memory pages and all data
structures for the context memory type if the memory is valid.
This patch always passes true for the 'force' parameter so there is
no change in behavior. Later patches will adjust the 'force' parameter
for the FW log context memory types so that the logs will not be reset
after FW reset.
Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a new function bnxt_free_one_ctx_mem() to free one context
memory type. bnxt_free_ctx_mem() now calls the new function in
the loop to free each context memory type. There is no change in
behavior. Later patches will further make use of the new function.
Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a new bit to struct bnxt_ctx_mem_type to indicate that host
memory has been successfully allocated for this context memory type.
In the next patches, we'll be adding some additional context memory
types for FW debugging/logging. If memory cannot be allocated for
any of these new types, we will not abort and the cleared mem_valid
bit will indicate to skip configuring the memory type.
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-of-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The major change is the new firmware command to flush the FW debug
logs to the host backing store context memory buffers.
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241115151438.550106-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
warnings"
This reverts commit 3bd9b9abdf1563a22041b7255baea6d449902f1a. We cannot
use the new tagged struct group because it throws C++ errors even under
"extern C".
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20241115204308.3821419-1-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Current implementation of gettimex64() makes at least 3 PCIe reads to
get current PHC time. It takes at least 2.2us to get this value back to
userspace. At the same time there is cached value of upper bits of PHC
available for packet timestamps already. This patch reuses cached value
to speed up reading of PHC time.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20241114114820.1411660-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Testing small size RPCs (300B-400B) on a large AMD system suggests
that page pool recycling is very useful even for just the head frags.
With this patch (and copy break disabled) I see a 30% performance
improvement (82Gbps -> 106Gbps).
Convert bnxt from normal page frags to page pool frags for head buffers.
On systems with small page size we can use the same pool as for TPA
pages. On systems with large pages the frag allocation logic of the
page pool is already used to split a large page into TPA chunks.
TPA chunks are much larger than heads (8k or 64k, AFAICT vs 1kB)
and we always allocate the same sized chunks. Mixing allocation
of TPA and head pages would lead to sub-optimal memory use.
Plus Taehee's work on zero-copy / devmem will need to differentiate
between TPA and non-TPA page pool, anyway. Conditionally allocate
a new page pool for heads.
Link: https://patch.msgid.link/20241109035119.3391864-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Refine RoCE SRIOV resource configuration design,
using the INITIALIZE_FW's flag as an indication
for the new design to the firmware. RoCE driver does not
have to provision resources to VF when firmware
advertises support for RoCE resource management by NIC driver.
Signed-off-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
CC: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1730882676-24434-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
During driver load, PF RDMA driver provisions resources
to the RDMA VFs. This logic takes into consideration of
the total number of VFs supported on the PF while
allocating resources. Firmware now advertises a capability
where NIC driver can allocate resources for RDMA VFs when
the user actually creates a VF. So this resource
distribution can be based on the number of active VFs.
This patch adds the support to check for the firmware
capability and follow the new RDMA VF resource allocation
strategy. The current logic in the RDMA driver will be
removed for the newer Firmware versions in a subsequent
patch in this series.
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1730882676-24434-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Serialization of PHC read with FW reset mechanism uses ptp_lock which
also protects timecounter updates. This means we cannot grab it when
called from bnxt_cc_read(). Let's move locking into different function.
Fixes: 6c0828d00f07 ("bnxt_en: replace PTP spinlock with seqlock")
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20241107214917.2980976-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
irq_set_affinity_hint() is deprecated, Use irq_update_affinity_hint()
instead. This removes the side-effect of actually applying the affinity.
The driver does not really need to worry about spreading its IRQs across
CPUs. The core code already takes care of that. when the driver applies the
affinities by itself, it breaks the users' expectations:
1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in
order to prevent IRQs from being moved to certain CPUs that run a
real-time workload.
2. bnxt_en device reopening will resets the affinity
in bnxt_open().
3. bnxt_en has no idea about irqbalance's config, so it may move an IRQ to
a banned CPU. The real-time workload suffers unacceptable latency.
Signed-off-by: Mohammad Heib <mheib@redhat.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Link: https://patch.msgid.link/20241106180811.385175-1-mheib@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The latter is the preferred way to copy ethtool strings.
Avoids manually incrementing the pointer. Cleans up the code quite well.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20241104205317.306140-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The latter is the preferred way to copy ethtool strings.
Avoids manually incrementing the pointer. Cleans up the code quite well.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20241104202326.78418-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Previously, trying to insert an ip4/ip6 ntuple rule with an unset
l4proto would get rejected with -EOPNOTSUPP. For example, the following
would fail:
ethtool -N eth0 flow-type ip6 dst-ip $IP6 context 1
The reason was that all the l4proto validation was being run despite the
l4proto mask being set to 0x0. Fix by respecting the mask on l4proto and
treating a mask of 0x0 as wildcard l4proto.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/1ac93a2836b25f79e7045f8874d9a17875229ffc.1730778566.git.dxu@dxuuu.xyz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 9ba0e56199e3 ("bnxt_en: Enhance ethtool ntuple support for ip
flows besides TCP/UDP") added support for ip4/ip6 ntuple rules.
However, if you wanted to wildcard over l4proto, you had to provide
0xFF.
The choice of 0xFF is non-standard and non-intuitive. Delete support for
it in this commit. Next commit we will introduce a cleaner way to
wildcard l4proto.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/a5ba0d3bd926d27977c317efa7fdfbc8a704d2b8.1730778566.git.dxu@dxuuu.xyz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We can see high contention on ptp_lock while doing RX timestamping
on high packet rates over several queues. Spinlock is not effecient
to protect timecounter for RX timestamps when reads are the most
usual operations and writes are only occasional. It's better to use
seqlock in such cases.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Link: https://patch.msgid.link/20241103215108.557531-2-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This hardware can provide only 48 bits of cycle counter. We can leave
only 24 bits in the cache to extend RX timestamps from 32 bits to 48
bits. Lower 8 bits of the cached value will be used to check for
roll-over while extending to full 48 bits.
This change makes cache writes atomic even on 32 bit platforms and we
can simply use READ_ONCE()/WRITE_ONCE() pair and remove spinlock. The
configuration structure will be also reduced by 4 bytes.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Link: https://patch.msgid.link/20241103215108.557531-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
net_dim() is currently passed a struct dim_sample argument by value.
struct dim_sample is 24 bytes. Since this is greater 16 bytes, x86-64
passes it on the stack. All callers have already initialized dim_sample
on the stack, so passing it by value requires pushing a duplicated copy
to the stack. Either witing to the stack and immediately reading it, or
perhaps dereferencing addresses relative to the stack pointer in a chain
of push instructions, seems to perform quite poorly.
In a heavy TCP workload, mlx5e_handle_rx_dim() consumes 3% of CPU time,
94% of which is attributed to the first push instruction to copy
dim_sample on the stack for the call to net_dim():
// Call ktime_get()
0.26 |4ead2: call 4ead7 <mlx5e_handle_rx_dim+0x47>
// Pass the address of struct dim in %rdi
|4ead7: lea 0x3d0(%rbx),%rdi
// Set dim_sample.pkt_ctr
|4eade: mov %r13d,0x8(%rsp)
// Set dim_sample.byte_ctr
|4eae3: mov %r12d,0xc(%rsp)
// Set dim_sample.event_ctr
0.15 |4eae8: mov %bp,0x10(%rsp)
// Duplicate dim_sample on the stack
94.16 |4eaed: push 0x10(%rsp)
2.79 |4eaf1: push 0x10(%rsp)
0.07 |4eaf5: push %rax
// Call net_dim()
0.21 |4eaf6: call 4eafb <mlx5e_handle_rx_dim+0x6b>
To allow the caller to reuse the struct dim_sample already on the stack,
pass the struct dim_sample by reference to net_dim().
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Link: https://patch.msgid.link/20241031002326.3426181-2-csander@purestorage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Avoids having to use manual pointer manipulation.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241029233229.9385-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.
Change the type of the middle struct member currently causing trouble from
`struct ethtool_link_settings` to `struct ethtool_link_settings_hdr`.
Additionally, update the type of some variables in various functions that
don't access the flexible-array member, changing them to the newly created
`struct ethtool_link_settings_hdr`. These changes are needed because the
type of the conflicting middle members changed. So, those instances that
expect the type to be `struct ethtool_link_settings` should be adjusted to
the newly created type `struct ethtool_link_settings_hdr`.
Also, adjust variable declarations to follow the reverse xmas tree
convention.
Fix 3338 of the following -Wflex-array-member-not-at-end warnings:
include/linux/ethtool.h:214:38: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/0bc2809fe2a6c11dd4c8a9a10d9bd65cccdb559b.1730238285.git.gustavoars@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move the BCM_SYSPORT_IO_MACRO() definition and its use to bcmsysport.h
where it is more appropriate and where static inline helpers are
acceptable. While at it, make sure that the macro 'offset' argument does
not trigger a checkpatch warning due to possible argument re-use.
Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241021174935.57658-3-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Vladimir reported the following warning with clang-16 and W=1:
warning: unused function 'txchk_readl' [-Wunused-function]
BCM_SYSPORT_IO_MACRO(txchk, SYS_PORT_TXCHK_OFFSET);
note: expanded from macro 'BCM_SYSPORT_IO_MACRO'
warning: unused function 'txchk_writel' [-Wunused-function]
note: expanded from macro 'BCM_SYSPORT_IO_MACRO'
warning: unused function 'tbuf_readl' [-Wunused-function]
BCM_SYSPORT_IO_MACRO(tbuf, SYS_PORT_TBUF_OFFSET);
note: expanded from macro 'BCM_SYSPORT_IO_MACRO'
warning: unused function 'tbuf_writel' [-Wunused-function]
note: expanded from macro 'BCM_SYSPORT_IO_MACRO'
The TXCHK and RBUF blocks are not being accessed, remove the IO macros
used to access those blocks. No functional impact.
Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241021174935.57658-2-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts and no adjacent changes.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Cross-merge networking fixes after downstream PR (net-6.12-rc4).
Conflicts:
107a034d5c1e ("net/mlx5: qos: Store rate groups in a qos domain")
1da9cfd6c41c ("net/mlx5: Unregister notifier on eswitch init failure")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
There are some spelling mistakes of 'accelaration', 'exprienced' and
'rewritting' in comments which should be 'acceleration', 'experienced'
and 'rewriting'.
Suggested-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/all/20241017162846.GA51712@kernel.org/
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Message-ID: <90D42CB167CA0842+20241018021910.31359-1-wangyuli@uniontech.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
|
|
In netpoll configuration the completion processing can happen in hard
irq context which will break with spin_lock_bh() for fullfilling RX
timestamp in case of all packets timestamping. Replace it with
spin_lock_irqsave() variant.
Fixes: 7f5515d19cd7 ("bnxt_en: Get the RX packet timestamp")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Message-ID: <20241016195234.2622004-1-vadfed@meta.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
|
|
GCC is not happy with the current code, e.g.:
.../tg3.c:11313:37: error: ‘-txrx-’ directive output may be truncated writing 6 bytes into a region of size between 1 and 16 [-Werror=format-truncation=]
11313 | "%s-txrx-%d", tp->dev->name, irq_num);
| ^~~~~~
.../tg3.c:11313:34: note: using the range [-2147483648, 2147483647] for directive argument
11313 | "%s-txrx-%d", tp->dev->name, irq_num);
When `make W=1` is supplied, this prevents kernel building. Fix it by
increasing the buffer size for IRQ label and use sizeoF() instead of
hard coded constants.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Message-ID: <20241016090647.691022-1-andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
|
|
The bcmasp_xmit() returns NETDEV_TX_OK without freeing skb
in case of mapping fails, add dev_kfree_skb() to fix it.
Fixes: 490cb412007d ("net: bcmasp: Add support for ASP2.0 Ethernet controller")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20241014145901.48940-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The bcm_sysport_xmit() returns NETDEV_TX_OK without freeing skb
in case of dma_map_single() fails, add dev_kfree_skb() to fix it.
Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Link: https://patch.msgid.link/20241014145115.44977-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use netif_napi_add_config to assign persistent per-NAPI config when
initializing NAPIs.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-8-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Address byte-order miss-matches flagged by Sparse.
In tg3_load_firmware_cpu() and tg3_get_device_address()
this is done using appropriate types to store big endian values.
In the cases of tg3_test_nvram(), where buf is an array which
contains values of several different types, cast to __le32
before converting values to host byte order.
Reported by Sparse as:
.../tg3.c:3745:34: warning: cast to restricted __be32
.../tg3.c:13096:21: warning: cast to restricted __le32
.../tg3.c:13096:21: warning: cast from restricted __be32
.../tg3.c:13101:21: warning: cast to restricted __le32
.../tg3.c:13101:21: warning: cast from restricted __be32
.../tg3.c:17070:63: warning: incorrect type in argument 3 (different base types)
.../tg3.c:17070:63: expected restricted __be32 [usertype] *val
.../tg3.c:17070:63: got unsigned int *
dr.../tg3.c:17071:63: warning: incorrect type in argument 3 (different base types)
.../tg3.c:17071:63: expected restricted __be32 [usertype] *val
.../tg3.c:17071:63: got unsigned int *
Also, address white-space issues on lines modified for the above.
And, for consistency, lines adjacent to them.
Compile tested only.
No functional change intended.
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009-tg3-sparse-v1-1-6af38a7bf4ff@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add skb_tx_timestamp() call and enable support for SW
timestamping.
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20241010221506.802730-1-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The MII driver isn't used by brcmstb Ethernet drivers. Remove it
from the BCMASP, GENET, and SYSTEMPORT drivers.
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20241010191332.1074642-1-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Link queues to NAPIs using the netdev-genl API so this information is
queryable.
First, test with the default setting on my tg3 NIC at boot with 1 TX
queue:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump queue-get --json='{"ifindex": 2}'
[{'id': 0, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'},
{'id': 1, 'ifindex': 2, 'napi-id': 8195, 'type': 'rx'},
{'id': 2, 'ifindex': 2, 'napi-id': 8196, 'type': 'rx'},
{'id': 3, 'ifindex': 2, 'napi-id': 8197, 'type': 'rx'},
{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'}]
Now, adjust the number of TX queues to be 4 via ethtool:
$ sudo ethtool -L eth0 tx 4
$ sudo ethtool -l eth0 | tail -5
Current hardware settings:
RX: 4
TX: 4
Other: n/a
Combined: n/a
Despite "Combined: n/a" in the ethtool output, /proc/interrupts shows
the tg3 has renamed the IRQs to be combined:
343: [...] eth0-0
344: [...] eth0-txrx-1
345: [...] eth0-txrx-2
346: [...] eth0-txrx-3
347: [...] eth0-txrx-4
Now query this via netlink to ensure the queues are linked properly to
their NAPIs:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump queue-get --json='{"ifindex": 2}'
[{'id': 0, 'ifindex': 2, 'napi-id': 8960, 'type': 'rx'},
{'id': 1, 'ifindex': 2, 'napi-id': 8961, 'type': 'rx'},
{'id': 2, 'ifindex': 2, 'napi-id': 8962, 'type': 'rx'},
{'id': 3, 'ifindex': 2, 'napi-id': 8963, 'type': 'rx'},
{'id': 0, 'ifindex': 2, 'napi-id': 8960, 'type': 'tx'},
{'id': 1, 'ifindex': 2, 'napi-id': 8961, 'type': 'tx'},
{'id': 2, 'ifindex': 2, 'napi-id': 8962, 'type': 'tx'},
{'id': 3, 'ifindex': 2, 'napi-id': 8963, 'type': 'tx'}]
As you can see above, id 0 for both TX and RX share a NAPI, NAPI ID
8960, and so on for each queue index up to 3.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241009175509.31753-3-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|