summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)Author
2024-11-05net: stmmac: add support for dwmac 3.72aLothar Rubusch
The dwmac 3.72a is an ip version that can be found on Intel/Altera Arria10 SoCs. Going by the hardware features "snps,multicast-filter-bins" and "snps,perfect-filter-entries" shall be supported. Thus add a compatibility flag, and extend coverage of the driver for the 3.72a. Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241102114122.4631-2-l.rubusch@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05net: hisilicon: hns: use ethtool string helpersRosen Penev
The latter is the preferred way to copy ethtool strings. Avoids manually incrementing the pointer. Cleans up the code quite well. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20241101220023.290926-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05r8169: improve initialization of RSS registers on RTL8125/RTL8126Heiner Kallweit
Replace the register addresses with the names used in r8125/r8126 vendor driver, and consider that RSS_CTRL_8125 is a 32 bit register. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/3bf2f340-b369-4174-97bf-fd38d4217492@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05sfc: Remove more unused functionsDr. David Alan Gilbert
efx_ticks_to_usecs(), efx_reconfigure_port(), efx_ptp_get_mode(), and efx_tx_get_copy_buffer_limited() are unused. They seem to be partially due to the later splits to Siena, but some seem unused for longer. Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://patch.msgid.link/20241102151625.39535-5-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05sfc: Remove unused mcdi functionsDr. David Alan Gilbert
efx_mcdi_flush_rxqs(), efx_mcdi_rpc_async_quiet(), efx_mcdi_rpc_finish_quiet(), and efx_mcdi_wol_filter_get_magic() are unused. I think these are fall out from the split into Siena that happened in commit 4d49e5cd4b09 ("sfc/siena: Rename functions in mcdi headers to avoid conflicts with sfc") and commit d48523cb88e0 ("sfc: Copy shared files needed for Siena (part 2)") Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://patch.msgid.link/20241102151625.39535-4-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05sfc: Remove unused efx_mae_mport_vfDr. David Alan Gilbert
efx_mae_mport_vf() has been unused since commit 5227adff37af ("sfc: add mport lookup based on driver's mport data") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://patch.msgid.link/20241102151625.39535-3-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05sfc: Remove falcon deadcodeDr. David Alan Gilbert
ef4_farch_dimension_resources(), ef4_nic_fix_nodesc_drop_stat(), ef4_ticks_to_usecs() and ef4_tx_get_copy_buffer_limited() were copied over from efx_ equivalents in 2016 but never used by commit 5a6681e22c14 ("sfc: separate out SFC4000 ("Falcon") support into new sfc-falcon driver") EF4_MAX_FLUSH_TIME is also unused. Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://patch.msgid.link/20241102151625.39535-2-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05bnxt_en: replace PTP spinlock with seqlockVadim Fedorenko
We can see high contention on ptp_lock while doing RX timestamping on high packet rates over several queues. Spinlock is not effecient to protect timecounter for RX timestamps when reads are the most usual operations and writes are only occasional. It's better to use seqlock in such cases. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Link: https://patch.msgid.link/20241103215108.557531-2-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05bnxt_en: cache only 24 bits of hw counterVadim Fedorenko
This hardware can provide only 48 bits of cycle counter. We can leave only 24 bits in the cache to extend RX timestamps from 32 bits to 48 bits. Lower 8 bits of the cached value will be used to check for roll-over while extending to full 48 bits. This change makes cache writes atomic even on 32 bit platforms and we can simply use READ_ONCE()/WRITE_ONCE() pair and remove spinlock. The configuration structure will be also reduced by 4 bytes. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Link: https://patch.msgid.link/20241103215108.557531-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05mlx5_en: use read sequence for gettimex64Vadim Fedorenko
The gettimex64() doesn't modify values in timecounter, that's why there is no need to update sequence counter. Reduce the contention on sequence lock for multi-thread PHC reading use-case. Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241014170103.2473580-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05net: ethernet: ti: am65-cpsw: fix warning in am65_cpsw_nuss_remove_rx_chns()Roger Quadros
flow->irq is initialized to 0 which is a valid IRQ. Set it to -EINVAL in error path of am65_cpsw_nuss_init_rx_chns() so we do not try to free an unallocated IRQ in am65_cpsw_nuss_remove_rx_chns(). If user tried to change number of RX queues and am65_cpsw_nuss_init_rx_chns() failed due to any reason, the warning will happen if user tries to change the number of RX queues after the error condition. root@am62xx-evm:~# ethtool -L eth0 rx 3 [ 40.385293] am65-cpsw-nuss 8000000.ethernet: set new flow-id-base 19 [ 40.393211] am65-cpsw-nuss 8000000.ethernet: Failed to init rx flow2 netlink error: Invalid argument root@am62xx-evm:~# ethtool -L eth0 rx 2 [ 82.306427] ------------[ cut here ]------------ [ 82.311075] WARNING: CPU: 0 PID: 378 at kernel/irq/devres.c:144 devm_free_irq+0x84/0x90 [ 82.469770] Call trace: [ 82.472208] devm_free_irq+0x84/0x90 [ 82.475777] am65_cpsw_nuss_remove_rx_chns+0x6c/0xac [ti_am65_cpsw_nuss] [ 82.482487] am65_cpsw_nuss_update_tx_rx_chns+0x2c/0x9c [ti_am65_cpsw_nuss] [ 82.489442] am65_cpsw_set_channels+0x30/0x4c [ti_am65_cpsw_nuss] [ 82.495531] ethnl_set_channels+0x224/0x2dc [ 82.499713] ethnl_default_set_doit+0xb8/0x1b8 [ 82.504149] genl_family_rcv_msg_doit+0xc0/0x124 [ 82.508757] genl_rcv_msg+0x1f0/0x284 [ 82.512409] netlink_rcv_skb+0x58/0x130 [ 82.516239] genl_rcv+0x38/0x50 [ 82.519374] netlink_unicast+0x1d0/0x2b0 [ 82.523289] netlink_sendmsg+0x180/0x3c4 [ 82.527205] __sys_sendto+0xe4/0x158 [ 82.530779] __arm64_sys_sendto+0x28/0x38 [ 82.534782] invoke_syscall+0x44/0x100 [ 82.538526] el0_svc_common.constprop.0+0xc0/0xe0 [ 82.543221] do_el0_svc+0x1c/0x28 [ 82.546528] el0_svc+0x28/0x98 [ 82.549578] el0t_64_sync_handler+0xc0/0xc4 [ 82.553752] el0t_64_sync+0x190/0x194 [ 82.557407] ---[ end trace 0000000000000000 ]--- Fixes: da70d184a8c3 ("net: ethernet: ti: am65-cpsw: Introduce multi queue Rx") Signed-off-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-05net: ethernet: ti: am65-cpsw: Fix multi queue Rx on J7Roger Quadros
On J7 platforms, setting up multiple RX flows was failing as the RX free descriptor ring 0 is shared among all flows and we did not allocate enough elements in the RX free descriptor ring 0 to accommodate for all RX flows. This issue is not present on AM62 as separate pair of rings are used for free and completion rings for each flow. Fix this by allocating enough elements for RX free descriptor ring 0. However, we can no longer rely on desc_idx (descriptor based offsets) to identify the pages in the respective flows as free descriptor ring includes elements for all flows. To solve this, introduce a new swdata data structure to store flow_id and page. This can be used to identify which flow (page_pool) and page the descriptor belonged to when popped out of the RX rings. Fixes: da70d184a8c3 ("net: ethernet: ti: am65-cpsw: Introduce multi queue Rx") Signed-off-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-05net: hns3: fix kernel crash when uninstalling driverPeiyang Wang
When the driver is uninstalled and the VF is disabled concurrently, a kernel crash occurs. The reason is that the two actions call function pci_disable_sriov(). The num_VFs is checked to determine whether to release the corresponding resources. During the second calling, num_VFs is not 0 and the resource release function is called. However, the corresponding resource has been released during the first invoking. Therefore, the problem occurs: [15277.839633][T50670] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 ... [15278.131557][T50670] Call trace: [15278.134686][T50670] klist_put+0x28/0x12c [15278.138682][T50670] klist_del+0x14/0x20 [15278.142592][T50670] device_del+0xbc/0x3c0 [15278.146676][T50670] pci_remove_bus_device+0x84/0x120 [15278.151714][T50670] pci_stop_and_remove_bus_device+0x6c/0x80 [15278.157447][T50670] pci_iov_remove_virtfn+0xb4/0x12c [15278.162485][T50670] sriov_disable+0x50/0x11c [15278.166829][T50670] pci_disable_sriov+0x24/0x30 [15278.171433][T50670] hnae3_unregister_ae_algo_prepare+0x60/0x90 [hnae3] [15278.178039][T50670] hclge_exit+0x28/0xd0 [hclge] [15278.182730][T50670] __se_sys_delete_module.isra.0+0x164/0x230 [15278.188550][T50670] __arm64_sys_delete_module+0x1c/0x30 [15278.193848][T50670] invoke_syscall+0x50/0x11c [15278.198278][T50670] el0_svc_common.constprop.0+0x158/0x164 [15278.203837][T50670] do_el0_svc+0x34/0xcc [15278.207834][T50670] el0_svc+0x20/0x30 For details, see the following figure. rmmod hclge disable VFs ---------------------------------------------------- hclge_exit() sriov_numvfs_store() ... device_lock() pci_disable_sriov() hns3_pci_sriov_configure() pci_disable_sriov() sriov_disable() sriov_disable() if !num_VFs : if !num_VFs : return; return; sriov_del_vfs() sriov_del_vfs() ... ... klist_put() klist_put() ... ... num_VFs = 0; num_VFs = 0; device_unlock(); In this patch, when driver is removing, we get the device_lock() to protect num_VFs, just like sriov_numvfs_store(). Fixes: 0dd8a25f355b ("net: hns3: disable sriov before unload hclge layer") Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241101091507.3644584-1-shaojijie@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-05net: lan969x: add VCAP configuration dataDaniel Machon
Add configuration data (for consumption by the VCAP API) for the four VCAP's that we are going to support. The following VCAP's will be supported: - VCAP CLM: (also known as IS0) is part of the analyzer and enables frame classification using VCAP functionality. - VCAP IS2: is part of ANA_ACL and enables access control lists, using VCAP functionality. - VCAP ES0: is part of the rewriter and enables rewriting of frames using VCAP functionality. - VCAP ES2: is part of EACL and enables egress access control lists using VCAP functionality The two VCAP's: CLM and IS2 use shared resources from the SUPER VCAP. The SUPER VCAP is a shared pool of 6 blocks that can be distributed freely among CLM and IS2. Each block in the pool has 3,072 addresses with entries, actions, and counters. ES0 and ES2 does not use shared resources. In the configuration data for lan969x CLM uses blocks 2-4 with a total of 6 lookups. IS2 uses blocks 0-1 with a total of 4 lookups. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-05net: lan969x: add autogenerated VCAP informationDaniel Machon
Platform VCAP data for each VCAP instance is auto-generated using an internal Microchip tool. The generated VCAP data contains information about keyfields, keyfield sets, actionfields, actionfield sets and typegroups, which in combination are used to encode and decode rules in the VCAP. Add the auto-generated VCAP file lan969x_vcap_ag_api.c and assign the two structs: lan969x_vcaps and lan969x_vcap_stats to the match data. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-05net: sparx5: execute sparx5_vcap_init() on lan969xDaniel Machon
The is_sparx5() check was introduced in an earlier series, to make sure the sparx5_vcap_init() was not executed on lan969x, as it was not implemented there yet. Now that it is, remove that check. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-05net: sparx5: add new VCAP constants to match dataDaniel Machon
In preparation for lan969x VCAP support, add the following three new VCAP constants to match data: - vcaps_cfg (contains configuration data for each VCAP). - vcaps (contains auto-generated information about VCAP keys and actions). - vcap_stats: (contains auto-generated string names of all the keys and actions) Add these constants to the Sparx5 match data constants and use them to initialize the VCAP's in sparx5_vcap_init(). Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-05net: sparx5: replace SPX5_PORTS with n_portsDaniel Machon
The Sparx5 VCAP implementation uses the SPX5_PORTS symbol to iterate over the 65 front ports of Sparx5. Replace the use with the n_ports constant from the match data, which translates to 65 of Sparx5 and 30 on lan969x. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-05net: sparx5: expose some sparx5 VCAP symbolsDaniel Machon
In preparation for lan969x VCAP support, expose the following symbols for use by the lan969x VCAP implementation: - The symbols SPARX5_*_LOOKUPS defines the number of lookups in each VCAP instance. These are the same for lan969x. Move them to the header file. - The struct sparx5_vcap_inst encapsulates information about a single VCAP instance. Move this struct to the header file and declare the sparx5_vcap_inst_cfg as extern. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-04net: dpaa_eth: extract hash using __be32 pointer in rx_default_dqrr()Vladimir Oltean
Sparse provides the following output: warning: cast to restricted __be32 This is a harmless warning due to the fact that we dereference the hash stored in the FD using an incorrect type annotation. Suppress the warning by using the correct __be32 type instead of u32. No functional change. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Breno Leitao <leitao@debian.org> Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Link: https://patch.msgid.link/20241029164317.50182-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-04net: dpaa_eth: add assertions about SGT entry offsets in sg_fd_to_skb()Vladimir Oltean
Multi-buffer frame descriptors (FDs) point to a buffer holding a scatter/gather table (SGT), which is a finite array of fixed-size entries, the last of which has qm_sg_entry_is_final(&sgt[i]) == true. Each SGT entry points to a buffer holding pieces of the frame. DPAARM.pdf explains in the figure called "Internal and External Margins, Scatter/Gather Frame Format" that the SGT table is located within its buffer at the same offset as the frame data start is located within the first packet buffer. +------------------------+ Scatter/Gather Buffer | First Buffer | Last Buffer ^ +------------+ ^ +-|---->^ +------------+ +->+------------+ | | | | ICEOF | | | | | |////////////| | +------------+ v | | | | | |////////////| BSM | |/ part of //| | |BSM | | | |////////////| | |/ Internal /| | | | | | |////////////| | |/ Context //| | | | | | |// Frame ///| | +------------+ | | | | | ... |/ content //| | | | | | | | | |////////////| | | | | | | | | |////////////| v +------------+ | | v +------------+ |////////////| | Scatter/ //| sgt[0]--+ | |// Frame ///| |////////////| | Gather List| ... | |/ content //| +------------+ ^ |////////////| sgt[N]----+ |////////////| | | | BEM |////////////| |////////////| | | | +------------+ +------------+ +------------+ v BSM = Buffer Start Margin, BEM = Buffer End Margin, both are configured by dpaa_eth_init_rx_port() for the RX FMan port relevant here. sg_fd_to_skb() runs in the calling context of rx_default_dqrr() - the NAPI receive callback - which only expects to receive contiguous (qm_fd_contig) or scatter/gather (qm_fd_sg) frame descriptors. Everything else is irrelevant codewise. The processing done by sg_fd_to_skb() is weird because it does not conform to the expectations laid out by the aforementioned figure. Namely, it parses the OFFSET field only for SGT entries with i != 0 (codewise, skb != NULL). In those cases, OFFSET should always be 0. Also, it does not parse the OFFSET field for the sgt[0] case, the only case where the buffer offset is meaningful in this context. There, it uses the fd_off, aka the offset to the Scatter/Gather List in the Scatter/Gather Buffer from the figure. By equivalence, they should both be equal to the BSM (in turn, equal to priv->rx_headroom). This can actually be explained due to the bug which we had in qm_sg_entry_get_off() until the previous change: - qm_sg_entry_get_off() did not actually _work_ for sgt[0]. It returned zero even with a non-zero offset, so fd_off had to be used as a fill-in. - qm_sg_entry_get_off() always returned zero for sgt[i>0], and that resulted in no user-visible bug, because the buffer offset _was supposed_ to be zero for those buffers. So remove it from calculations. Add assertions about the OFFSET field in both cases (first or subsequent SGT entries) to make it absolutely obvious when something is not well handled. Similar logic can be seen in the driver for the architecturally similar DPAA2, where dpaa2_eth_build_frag_skb() calls dpaa2_sg_get_offset() only for i == 0. For the rest, there is even a comment stating the same thing: * Data in subsequent SG entries is stored from the * beginning of the buffer, so we don't need to add the * sg_offset. Tested on LS1046A. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Link: https://patch.msgid.link/20241029164317.50182-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-04net: ena: remove devm from ethtoolRosen Penev
There's no need for devm bloat here. In addition, these are freed right before the function exits. Also swapped kcalloc order for consistency. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Shay Agroskin <shayagr@amazon.com> Link: https://patch.msgid.link/20241101214828.289752-2-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-04net: ena: Remove deadcodeDr. David Alan Gilbert
ena_com_get_dev_basic_stats() has been unused since 2017's commit d81db2405613 ("net/ena: refactor ena_get_stats64 to be atomic context safe") ena_com_get_offload_settings() has been unused since the original commit of ENA back in 2016 in commit 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20241102220142.80285-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-04net: ena: Remove autopolling modeDr. David Alan Gilbert
This manually reverts commit a4e262cde3cd ("net: ena: allow automatic fallback to polling mode") which is unused. (I did it manually because there are other minor comment and function changes surrounding it). Build tested only. Suggested-by: David Arinzon <darinzon@amazon.com> Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patch.msgid.link/20241103194149.293456-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-04Revert "Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'"Jakub Kicinski
This reverts commit d80a3091308491455b6501b1c4b68698c4a7cd24, reversing changes made to 637f41476384c76d3cd7dcf5947caf2c8b8d7a9b: 2cf246143519 ("net: hns3: fix kernel crash when 1588 is sent on HIP08 devices") 3e22b7de34cb ("net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue") d1c2e2961ab4 ("net: hns3: initialize reset_timer before hclgevf_misc_irq_init()") 5f62009ff108 ("net: hns3: don't auto enable misc vector") 2758f18a83ef ("net: hns3: Resolved the issue that the debugfs query result is inconsistent.") 662ecfc46690 ("net: hns3: fix missing features due to dev->features configuration too early") 3e0f7cc887b7 ("net: hns3: fixed reset failure issues caused by the incorrect reset type") f2c14899caba ("net: hns3: add sync command to sync io-pgtable") e6ab19443b36 ("net: hns3: default enable tx bounce buffer when smmu enabled") The series is making the driver poke into IOMMU internals instead of implementing appropriate IOMMU workarounds. Link: https://lore.kernel.org/069c9838-b781-4012-934a-d2626fa78212@arm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-04e1000e: Remove Meteor Lake SMBUS workaroundsVitaly Lifshits
This is a partial revert to commit 76a0a3f9cc2f ("e1000e: fix force smbus during suspend flow"). That commit fixed a sporadic PHY access issue but introduced a regression in runtime suspend flows. The original issue on Meteor Lake systems was rare in terms of the reproduction rate and the number of the systems affected. After the integration of commit 0a6ad4d9e169 ("e1000e: avoid failing the system during pm_suspend"), PHY access loss can no longer cause a system-level suspend failure. As it only occurs when the LAN cable is disconnected, and is recovered during system resume flow. Therefore, its functional impact is low, and the priority is given to stabilizing runtime suspend. Fixes: 76a0a3f9cc2f ("e1000e: fix force smbus during suspend flow") Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-04i40e: fix race condition by adding filter's intermediate sync stateAleksandr Loktionov
Fix a race condition in the i40e driver that leads to MAC/VLAN filters becoming corrupted and leaking. Address the issue that occurs under heavy load when multiple threads are concurrently modifying MAC/VLAN filters by setting mac and port VLAN. 1. Thread T0 allocates a filter in i40e_add_filter() within i40e_ndo_set_vf_port_vlan(). 2. Thread T1 concurrently frees the filter in __i40e_del_filter() within i40e_ndo_set_vf_mac(). 3. Subsequently, i40e_service_task() calls i40e_sync_vsi_filters(), which refers to the already freed filter memory, causing corruption. Reproduction steps: 1. Spawn multiple VFs. 2. Apply a concurrent heavy load by running parallel operations to change MAC addresses on the VFs and change port VLANs on the host. 3. Observe errors in dmesg: "Error I40E_AQ_RC_ENOSPC adding RX filters on VF XX, please set promiscuous on manually for VF XX". Exact code for stable reproduction Intel can't open-source now. The fix involves implementing a new intermediate filter state, I40E_FILTER_NEW_SYNC, for the time when a filter is on a tmp_add_list. These filters cannot be deleted from the hash list directly but must be removed using the full process. Fixes: 278e7d0b9d68 ("i40e: store MAC/VLAN filters in a hash with the MAC Address as key") Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-04idpf: fix idpf_vc_core_init error pathPavan Kumar Linga
In an event where the platform running the device control plane is rebooted, reset is detected on the driver. It releases all the resources and waits for the reset to complete. Once the reset is done, it tries to build the resources back. At this time if the device control plane is not yet started, then the driver timeouts on the virtchnl message and retries to establish the mailbox again. In the retry flow, mailbox is deinitialized but the mailbox workqueue is still alive and polling for the mailbox message. This results in accessing the released control queue leading to null-ptr-deref. Fix it by unrolling the work queue cancellation and mailbox deinitialization in the reverse order which they got initialized. Fixes: 4930fbf419a7 ("idpf: add core init and interrupt request") Fixes: 34c21fa894a1 ("idpf: implement virtchnl transaction manager") Cc: stable@vger.kernel.org # 6.9+ Reviewed-by: Tarun K Singh <tarun.k.singh@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-04idpf: avoid vport access in idpf_get_link_ksettingsPavan Kumar Linga
When the device control plane is removed or the platform running device control plane is rebooted, a reset is detected on the driver. On driver reset, it releases the resources and waits for the reset to complete. If the reset fails, it takes the error path and releases the vport lock. At this time if the monitoring tools tries to access link settings, it call traces for accessing released vport pointer. To avoid it, move link_speed_mbps to netdev_priv structure which removes the dependency on vport pointer and the vport lock in idpf_get_link_ksettings. Also use netif_carrier_ok() to check the link status and adjust the offsetof to use link_up instead of link_speed_mbps. Fixes: 02cbfba1add5 ("idpf: add ethtool callbacks") Cc: stable@vger.kernel.org # 6.7+ Reviewed-by: Tarun K Singh <tarun.k.singh@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-04ice: change q_index variable type to s16 to store -1 valueMateusz Polchlopek
Fix Flow Director not allowing to re-map traffic to 0th queue when action is configured to drop (and vice versa). The current implementation of ethtool callback in the ice driver forbids change Flow Director action from 0 to -1 and from -1 to 0 with an error, e.g: # ethtool -U eth2 flow-type tcp4 src-ip 1.1.1.1 loc 1 action 0 # ethtool -U eth2 flow-type tcp4 src-ip 1.1.1.1 loc 1 action -1 rmgr: Cannot insert RX class rule: Invalid argument We set the value of `u16 q_index = 0` at the beginning of the function ice_set_fdir_input_set(). In case of "drop traffic" action (which is equal to -1 in ethtool) we store the 0 value. Later, when want to change traffic rule to redirect to queue with index 0 it returns an error caused by duplicate found. Fix this behaviour by change of the type of field `q_index` from u16 to s16 in `struct ice_fdir_fltr`. This allows to store -1 in the field in case of "drop traffic" action. What is more, change the variable type in the function ice_set_fdir_input_set() and assign at the beginning the new `#define ICE_FDIR_NO_QUEUE_IDX` which is -1. Later, if the action is set to another value (point specific queue index) the variable value is overwritten in the function. Fixes: cac2a27cd9ab ("ice: Support IPv4 Flow Director filters") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-04ice: Fix use after free during unload with ports in bridgeMarcin Szycik
Unloading the ice driver while switchdev port representors are added to a bridge can lead to kernel panic. Reproducer: modprobe ice devlink dev eswitch set $PF1_PCI mode switchdev ip link add $BR type bridge ip link set $BR up echo 2 > /sys/class/net/$PF1/device/sriov_numvfs sleep 2 ip link set $PF1 master $BR ip link set $VF1_PR master $BR ip link set $VF2_PR master $BR ip link set $PF1 up ip link set $VF1_PR up ip link set $VF2_PR up ip link set $VF1 up rmmod irdma ice When unloading the driver, ice_eswitch_detach() is eventually called as part of VF freeing. First, it removes a port representor from xarray, then unregister_netdev() is called (via repr->ops.rem()), finally representor is deallocated. The problem comes from the bridge doing its own deinit at the same time. unregister_netdev() triggers a notifier chain, resulting in ice_eswitch_br_port_deinit() being called. It should set repr->br_port = NULL, but this does not happen since repr has already been removed from xarray and is not found. Regardless, it finishes up deallocating br_port. At this point, repr is still not freed and an fdb event can happen, in which ice_eswitch_br_fdb_event_work() takes repr->br_port and tries to use it, which causes a panic (use after free). Note that this only happens with 2 or more port representors added to the bridge, since with only one representor port, the bridge deinit is slightly different (ice_eswitch_br_port_deinit() is called via ice_eswitch_br_ports_flush(), not ice_eswitch_br_port_unlink()). Trace: Oops: general protection fault, probably for non-canonical address 0xf129010fd1a93284: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: maybe wild-memory-access in range [0x8948287e8d499420-0x8948287e8d499427] (...) Workqueue: ice_bridge_wq ice_eswitch_br_fdb_event_work [ice] RIP: 0010:__rht_bucket_nested+0xb4/0x180 (...) Call Trace: (...) ice_eswitch_br_fdb_find+0x3fa/0x550 [ice] ? __pfx_ice_eswitch_br_fdb_find+0x10/0x10 [ice] ice_eswitch_br_fdb_event_work+0x2de/0x1e60 [ice] ? __schedule+0xf60/0x5210 ? mutex_lock+0x91/0xe0 ? __pfx_ice_eswitch_br_fdb_event_work+0x10/0x10 [ice] ? ice_eswitch_br_update_work+0x1f4/0x310 [ice] (...) A workaround is available: brctl setageing $BR 0, which stops the bridge from adding fdb entries altogether. Change the order of operations in ice_eswitch_detach(): move the call to unregister_netdev() before removing repr from xarray. This way repr->br_port will be correctly set to NULL in ice_eswitch_br_port_deinit(), preventing a panic. Fixes: fff292b47ac1 ("ice: add VF representors one by one") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-04RDMA/mlx5: Ensure active slave attachment to the bond IB deviceChiara Meiohas
Fix a race condition when creating a lag bond in active backup mode where after the bond creation the backup slave was attached to the IB device, instead of the active slave. This caused stale entries in the GID table, as the gid updating mechanism relies on ib_device_get_netdev(), which would return the backup slave. Send an MLX5_DRIVER_EVENT_ACTIVE_BACKUP_LAG_CHANGE_LOWERSTATE event when activating the lag, additionally to when modifying the lag. This ensures that eventually the active netdevice is stored in the bond IB device. When handling this event remove the GIDs of the previously attached netdevice in this port and rescan the GIDs of the newly attached netdevice. This ensures that eventually the active slave netdevice is correctly stored in the IB device port. While there might be a brief moment where the backup slave GIDs appear in the GID table, it will eventually stabilize with the correct GIDs (of the bond and the active slave). Fixes: 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions") Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Link: https://patch.msgid.link/91fc2cb24f63add266a528c1c702668a80416d9f.1730381292.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/mlx5: Call dev_put() after the blocking notifierChiara Meiohas
Move dev_put() call to occur directly after the blocking notifier, instead of within the event handler. Fixes: 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions") Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Link: https://patch.msgid.link/342ff94b3dcbb07da1c7dab862a73933d604b717.1730381292.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04net: enetc: add preliminary support for i.MX95 ENETC PFWei Fang
The i.MX95 ENETC has been upgraded to revision 4.1, which is different from the LS1028A ENETC (revision 1.0) except for the SI part. Therefore, the fsl-enetc driver is incompatible with i.MX95 ENETC PF. So add new nxp-enetc4 driver to support i.MX95 ENETC PF, and this driver will be used to support the ENETC PF with major revision 4 for other SoCs in the future. Currently, the nxp-enetc4 driver only supports basic transmission feature for i.MX95 ENETC PF, the more basic and advanced features will be added in the subsequent patches. In addition, PCS support has not been added yet, so 10G ENETC (ENETC instance 2) is not supported now. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-04net: enetc: optimize the allocation of tx_bdrClark Wang
There is a situation where num_tx_rings cannot be divided by bdr_int_num. For example, num_tx_rings is 8 and bdr_int_num is 3. According to the previous logic, this results in two tx_bdr corresponding memories not being allocated, so when sending packets to tx ring 6 or 7, wild pointers will be accessed. Of course, this issue doesn't exist on LS1028A, because its num_tx_rings is 8, and bdr_int_num is either 1 or 2. However, there is a risk for the upcoming i.MX95. Therefore, it is necessary to ensure that each tx_bdr can be allocated to the corresponding memory. Signed-off-by: Clark Wang <xiaoning.wang@nxp.com> Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-04net: enetc: extract enetc_int_vector_init/destroy() from enetc_alloc_msix()Clark Wang
Extract enetc_int_vector_init() and enetc_int_vector_destroy() from enetc_alloc_msix() so that the code is more concise and readable. Signed-off-by: Clark Wang <xiaoning.wang@nxp.com> Signed-off-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-04net: enetc: add i.MX95 EMDIO supportWei Fang
The verdor ID and device ID of i.MX95 EMDIO are different from LS1028A EMDIO, so add new vendor ID and device ID to pci_device_id table to support i.MX95 EMDIO. Signed-off-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-04net: enetc: remove ERR050089 workaround for i.MX95Vladimir Oltean
The ERR050089 workaround causes performance degradation and potential functional issues (e.g., RCU stalls) under certain workloads. Since new SoCs like i.MX95 do not require this workaround, use a static key to compile out enetc_lock_mdio() and enetc_unlock_mdio() at runtime, improving performance and avoiding unnecessary logic. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-04net: enetc: build enetc_pf_common.c as a separate moduleWei Fang
Compile enetc_pf_common.c as a standalone module to allow shared usage between ENETC v1 and v4 PF drivers. Add struct enetc_pf_ops to register different hardware operation interfaces for both ENETC v1 and v4 PF drivers. Signed-off-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-04net: enetc: extract common ENETC PF parts for LS1028A and i.MX95 platformsWei Fang
The ENETC PF driver of LS1028A (rev 1.0) is incompatible with the version used on the i.MX95 platform (rev 4.1), except for the station interface (SI) part. To reduce code redundancy and prepare for a new driver for rev 4.1 and later, extract shared interfaces from enetc_pf.c and move them to enetc_pf_common.c. This refactoring lays the groundwork for compiling enetc_pf_common.c into a shared driver for both platforms' PF drivers. Signed-off-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-04net: enetc: add initial netc-blk-ctrl driver supportWei Fang
The netc-blk-ctrl driver is used to configure Integrated Endpoint Register Block (IERB) and Privileged Register Block (PRB) of NETC. For i.MX platforms, it is also used to configure the NETCMIX block. The IERB contains registers that are used for pre-boot initialization, debug, and non-customer configuration. The PRB controls global reset and global error handling for NETC. The NETCMIX block is mainly used to set MII protocol and PCS protocol of the links, it also contains settings for some other functions. Note the IERB configuration registers can only be written after being unlocked by PRB, otherwise, all write operations are inhibited. A warm reset is performed when the IERB is unlocked, and it results in an FLR to all NETC devices. Therefore, all NETC device drivers must be probed or initialized after the warm reset is finished. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-03net/mlx5e: do not create xdp_redirect for non-uplink repWilliam Tu
XDP and XDP socket require extra SQ/RQ/CQs. Most of these resources are dynamically created: no XDP program loaded, no resources are created. One exception is the SQ/CQ created for XDP_REDRIECT, used for other netdev to forward packet to mlx5 for transmit. The patch disables creation of SQ and CQ used for egress XDP_REDIRECT, by checking whether ndo_xdp_xmit is set or not. For netdev without XDP support such as non-uplink representor, this saves around 0.35MB of memory, per representor netdevice per channel. Signed-off-by: William Tu <witu@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241031125856.530927-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net/mlx5e: move XDP_REDIRECT sq to dynamic allocationWilliam Tu
Dynamically allocating xdpsq, used by egress side XDP_REDIRECT. mlx5 has multiple XDP sqs. Under struct mlx5e_channel: 1. rx_xdpsq: used for XDP_TX, an XDP prog handles the rx packet and transmits using the same queue as rx. 2. xdpsq: used by egress side XDP_REDIRECT. This is for another interface to redirect packet to the mlx5 interface, using ndo_xdp_xmit . 3. xsksq: used by XSK. XSK has its own dedicated channel, and it also has resources of 1 and 2. The patch changes only the 2. xdpsq. Signed-off-by: William Tu <witu@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241031125856.530927-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net/mlx5: HWS, renamed the files in accordance with naming conventionYevgeny Kliteynik
Removed the 'mlx5hws_' file name prefix from the internal HWS files. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241031125856.530927-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net/mlx5: DR, moved all the SWS code into a separate directoryYevgeny Kliteynik
After adding HWS support in a separate folder, moving all the SWS code into its own folder as well. Now SWS and HWS implementation are located in their appropriate folders: - steering/sws/ - steering/hws/ Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241031125856.530927-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net/mlx5: Rework esw qos domain init and cleanupCosmin Ratiu
The first approach was flawed, because there are situations where the esw mode change fails, leaving the qos domain as NULL. Various calls into the QoS infra then trigger a NULL pointer access and unhappiness. Improve that by a combination of: - Allocating the QoS domain on esw init and cleaning it up on teardown. - Refactoring mode change to only call qos domain init but not cleanup. - Making qos domain init idempotent - not change anything if nothing needs changing. Together, these should guarantee that, as long as the memory allocations succeed, there should always be a valid qos domain until the esw cleanup, no matter what mode changes happen (or failures thereof). Fixes: 107a034d5c1e ("net/mlx5: qos: Store rate groups in a qos domain") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241031125856.530927-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net: stmmac: xgmac: Enable FPE for tc-mqprio/tc-taprioFurong Xu
The FPE on XGMAC is ready, it is time to update dwxgmac_tc_ops to let user configure FPE via tc-mqprio/tc-taprio. Signed-off-by: Furong Xu <0x1207@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/0575ef1553d572b7c8bc1baafa3fb7ac641073e0.1730449003.git.0x1207@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net: stmmac: xgmac: Complete FPE supportFurong Xu
Implement the necessary fpe_map_preemption_class callback for xgmac. Signed-off-by: Furong Xu <0x1207@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/d0347f2b8a71fee372e53293fe26a6538775ec5d.1730449003.git.0x1207@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net: stmmac: xgmac: Rename XGMAC_RQ to XGMAC_FPRQFurong Xu
Synopsys XGMAC Databook defines MAC_RxQ_Ctrl1 register: RQ: Frame Preemption Residue Queue XGMAC_FPRQ is more readable and more consistent with GMAC4. Signed-off-by: Furong Xu <0x1207@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/611991edf9e9d6fac8b29c3fe952791b193ca179.1730449003.git.0x1207@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net: stmmac: Get the TC number of net_device by netdev_get_num_tc()Furong Xu
netdev_get_num_tc() is the right method, we should not access net_device.num_tc directly. Signed-off-by: Furong Xu <0x1207@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/6298463f4655a76faf94e4273a4205c13ca17c77.1730449003.git.0x1207@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>