summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/igc
AgeCommit message (Collapse)Author
2024-01-31ethtool: replace struct ethtool_eee with a new struct ethtool_keee on kernel ↵Heiner Kallweit
side In order to pass EEE link modes beyond bit 32 to userspace we have to complement the 32 bit bitmaps in struct ethtool_eee with linkmode bitmaps. Therefore, similar to ethtool_link_settings and ethtool_link_ksettings, add a struct ethtool_keee. In a first step it's an identical copy of ethtool_eee. This patch simply does a s/ethtool_eee/ethtool_keee/g for all users. No functional change intended. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt.c e009b2efb7a8 ("bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()") 0f2b21477988 ("bnxt_en: Fix compile error without CONFIG_RFS_ACCEL") https://lore.kernel.org/all/20240105115509.225aa8a2@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04Merge branch '40GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-01-03 (i40e, ice, igc) This series contains updates to i40e, ice, and igc drivers. Ke Xiao fixes use after free for unicast filters on i40e. Andrii restores VF MSI-X flag after PCI reset on i40e. Paul corrects admin queue link status structure to fulfill firmware expectations for ice. Rodrigo Cataldo corrects value used for hicredit calculations on igc. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: igc: Fix hicredit calculation ice: fix Get link status data length i40e: Restore VF MSI-X state during PCI reset i40e: fix use-after-free in i40e_aqc_add_filters() ==================== Link: https://lore.kernel.org/r/20240103193254.822968-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03igc: Fix hicredit calculationRodrigo Cataldo
According to the Intel Software Manual for I225, Section 7.5.2.7, hicredit should be multiplied by the constant link-rate value, 0x7736. Currently, the old constant link-rate value, 0x7735, from the boards supported on igb are being used, most likely due to a copy'n'paste, as the rest of the logic is the same for both drivers. Update hicredit accordingly. Fixes: 1ab011b0bf07 ("igc: Add support for CBS offloading") Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Rodrigo Cataldo <rodrigo.cadore@l-acoustics.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-27igc: Check VLAN EtherType maskKurt Kanzenbach
Currently the driver accepts VLAN EtherType steering rules regardless of the configured mask. And things might fail silently or with confusing error messages to the user. The VLAN EtherType can only be matched by full mask. Therefore, add a check for that. For instance the following rule is invalid, but the driver accepts it and ignores the user specified mask: |root@host:~# ethtool -N enp3s0 flow-type ether vlan-etype 0x8100 \ | m 0x00ff action 0 |Added rule with ID 63 |root@host:~# ethtool --show-ntuple enp3s0 |4 RX rings available |Total 1 rules | |Filter: 63 | Flow Type: Raw Ethernet | Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF | Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF | Ethertype: 0x0 mask: 0xFFFF | VLAN EtherType: 0x8100 mask: 0x0 | VLAN: 0x0 mask: 0xffff | User-defined: 0x0 mask: 0xffffffffffffffff | Action: Direct to queue 0 After: |root@host:~# ethtool -N enp3s0 flow-type ether vlan-etype 0x8100 \ | m 0x00ff action 0 |rmgr: Cannot insert RX class rule: Operation not supported Fixes: 2b477d057e33 ("igc: Integrate flex filter into ethtool ops") Suggested-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-27igc: Check VLAN TCI maskKurt Kanzenbach
Currently the driver accepts VLAN TCI steering rules regardless of the configured mask. And things might fail silently or with confusing error messages to the user. There are two ways to handle the VLAN TCI mask: 1. Match on the PCP field using a VLAN prio filter 2. Match on complete TCI field using a flex filter Therefore, add checks and code for that. For instance the following rule is invalid and will be converted into a VLAN prio rule which is not correct: |root@host:~# ethtool -N enp3s0 flow-type ether vlan 0x0001 m 0xf000 \ | action 1 |Added rule with ID 61 |root@host:~# ethtool --show-ntuple enp3s0 |4 RX rings available |Total 1 rules | |Filter: 61 | Flow Type: Raw Ethernet | Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF | Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF | Ethertype: 0x0 mask: 0xFFFF | VLAN EtherType: 0x0 mask: 0xffff | VLAN: 0x1 mask: 0x1fff | User-defined: 0x0 mask: 0xffffffffffffffff | Action: Direct to queue 1 After: |root@host:~# ethtool -N enp3s0 flow-type ether vlan 0x0001 m 0xf000 \ | action 1 |rmgr: Cannot insert RX class rule: Operation not supported Fixes: 7991487ecb2d ("igc: Allow for Flex Filters to be installed") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-27igc: Report VLAN EtherType matching back to userKurt Kanzenbach
Currently the driver allows to configure matching by VLAN EtherType. However, the retrieval function does not report it back to the user. Add it. Before: |root@host:~# ethtool -N enp3s0 flow-type ether vlan-etype 0x8100 action 0 |Added rule with ID 63 |root@host:~# ethtool --show-ntuple enp3s0 |4 RX rings available |Total 1 rules | |Filter: 63 | Flow Type: Raw Ethernet | Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF | Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF | Ethertype: 0x0 mask: 0xFFFF | Action: Direct to queue 0 After: |root@host:~# ethtool -N enp3s0 flow-type ether vlan-etype 0x8100 action 0 |Added rule with ID 63 |root@host:~# ethtool --show-ntuple enp3s0 |4 RX rings available |Total 1 rules | |Filter: 63 | Flow Type: Raw Ethernet | Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF | Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF | Ethertype: 0x0 mask: 0xFFFF | VLAN EtherType: 0x8100 mask: 0x0 | VLAN: 0x0 mask: 0xffff | User-defined: 0x0 mask: 0xffffffffffffffff | Action: Direct to queue 0 Fixes: 2b477d057e33 ("igc: Integrate flex filter into ethtool ops") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-18igc: field get conversionJesse Brandeburg
Refactor the igc driver to use FIELD_GET() for mask and shift reads, which reduces lines of code and adds clarity of intent. This code was generated by the following coccinelle/spatch script and then manually repaired in a later patch. @get@ constant shift,mask; type T; expression a; @@ -((T)((a) & mask) >> shift) +FIELD_GET(mask, a) and applied via: spatch --sp-file field_prep.cocci --in-place --dir \ drivers/net/ethernet/intel/ Cc: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-18igc: field prep conversionJesse Brandeburg
Refactor igc driver to use FIELD_PREP(), which reduces lines of code and adds clarity of intent. This code was generated by the following coccinelle/spatch script and then manually repaired in a later patch. @prep2@ constant shift,mask; type T; expression a; @@ -(((T)(a) << shift) & mask) +FIELD_PREP(mask, a) @prep@ constant shift,mask; type T; expression a; @@ -((T)((a) << shift) & mask) +FIELD_PREP(mask, a) Cc: Julia Lawall <Julia.Lawall@inria.fr> Cc: Sasha Neftin <sasha.neftin@intel.com> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-18intel: add bit macro includes where neededJesse Brandeburg
This series is introducing the use of FIELD_GET and FIELD_PREP which requires bitfield.h to be included. Fix all the includes in this one change, and rearrange includes into alphabetical order to ease readability and future maintenance. virtchnl.h and it's usage was modified to have it's own includes as it should. This required including bits.h for virtchnl.h. Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-13net: ethtool: pass a pointer to parameters to get/set_rxfh ethtool opsAhmed Zaki
The get/set_rxfh ethtool ops currently takes the rxfh (RSS) parameters as direct function arguments. This will force us to change the API (and all drivers' functions) every time some new parameters are added. This is part 1/2 of the fix, as suggested in [1]: - First simplify the code by always providing a pointer to all params (indir, key and func); the fact that some of them may be NULL seems like a weird historic thing or a premature optimization. It will simplify the drivers if all pointers are always present. - Then make the functions take a dev pointer, and a pointer to a single struct wrapping all arguments. The set_* should also take an extack. Link: https://lore.kernel.org/netdev/20231121152906.2dd5f487@kernel.org/ [1] Suggested-by: Jakub Kicinski <kuba@kernel.org> Suggested-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Link: https://lore.kernel.org/r/20231213003321.605376-2-ahmed.zaki@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-08net: Convert some ethtool_sprintf() to ethtool_puts()justinstitt@google.com
This patch converts some basic cases of ethtool_sprintf() to ethtool_puts(). The conversions are used in cases where ethtool_sprintf() was being used with just two arguments: | ethtool_sprintf(&data, buffer[i].name); or when it's used with format string: "%s" | ethtool_sprintf(&data, "%s", buffer[i].name); which both now become: | ethtool_puts(&data, buffer[i].name); Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-13igc: Add support for PTP .getcyclesx64()Vinicius Costa Gomes
Add support for using Timer 1 (i225/i226 have 4 timer registers) as a free-running clock (the "cycles" clock) in addition to Timer 0 (the default, "adjustable clock"). The objective is to allow taprio/etf offloading to coexist with PTP vclocks. Besides the implementation of .getcyclesx64() for i225/i226, to keep timestamping working when vclocks are in use, we also need to add support for TX and RX timestamping using the free running timer, when the requesting socket is bound to a vclock. On the RX side, i225/i226 can be configured to store the values of two timers in the received packet metadata area, so it's a matter of configuring the right registers and retrieving the right timestamp. The TX is a bit more involved because the hardware stores a single timestamp (with the selected timer in the TX descriptor) into one of the timestamp registers. Note some changes at how the timestamps are done for RX, the conversion and adjustment of timestamps are now done closer to the consumption of the timestamp instead of near the reception. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-11-13igc: Simplify setting flags in the TX data descriptorVinicius Costa Gomes
We can re-use the IGC_SET_FLAG() macro to simplify setting some values in the TX data descriptor. With the macro it's easier to get the meaning of the operations. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-10-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/mac80211/rx.c 91535613b609 ("wifi: mac80211: don't drop all unprotected public action frames") 6c02fab72429 ("wifi: mac80211: split ieee80211_drop_unencrypted_mgmt() return value") Adjacent changes: drivers/net/ethernet/apm/xgene/xgene_enet_main.c 61471264c018 ("net: ethernet: apm: Convert to platform remove callback returning void") d2ca43f30611 ("net: xgene: Fix unused xgene_enet_of_match warning for !CONFIG_OF") net/vmw_vsock/virtio_transport.c 64c99d2d6ada ("vsock/virtio: support to send non-linear skb") 53b08c498515 ("vsock/virtio: initialize the_virtio_vsock before using VQs") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-20igc: Fix ambiguity in the ethtool advertisingSasha Neftin
The 'ethtool_convert_link_mode_to_legacy_u32' method does not allow us to advertise 2500M speed support and TP (twisted pair) properly. Convert to 'ethtool_link_ksettings_test_link_mode' to advertise supported speed and eliminate ambiguity. Fixes: 8c5ad0dae93c ("igc: Add ethtool support") Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Suggested-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231019203641.3661960-1-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18igc: replace deprecated strncpy with strscpyJustin Stitt
`strncpy` is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. We expect netdev->name to be NUL-terminated based on its use with format strings: | if (q_vector->rx.ring && q_vector->tx.ring) | sprintf(q_vector->name, "%s-TxRx-%u", netdev->name, Furthermore, we do not need NUL-padding as netdev is already zero-allocated: | netdev = alloc_etherdev_mq(sizeof(struct igc_adapter), | IGC_MAX_TX_QUEUES); ... alloc_etherdev() -> alloc_etherdev_mq() -> alloc_etherdev_mqs() -> alloc_netdev_mqs() ... | p = kvzalloc(alloc_size, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL); Considering the above, a suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231017190411.2199743-10-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18intel: fix format warningsJesse Brandeburg
Get ahead of the game and fix all the -Wformat=2 noted warnings in the intel drivers directory. There are one set of i40e and iavf warnings I couldn't figure out how to fix because the driver is already using vsnprintf without an explicit "const char *" format string. Tested with both gcc-12 and clang-15. I found gcc-12 runs clean after this series but clang-15 is a little worried about the vsnprintf lines. summary of warnings: drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c:148:34: warning: format string is not a string literal [-Wformat-nonliteral] drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:1416:24: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:1416:24: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:1421:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:1421:6: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/igc/igc_ethtool.c:776:24: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/igc/igc_ethtool.c:776:24: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/igc/igc_ethtool.c:779:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/igc/igc_ethtool.c:779:6: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/iavf/iavf_ethtool.c:199:34: warning: format string is not a string literal [-Wformat-nonliteral] drivers/net/ethernet/intel/igb/igb_ethtool.c:2360:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/igb/igb_ethtool.c:2360:6: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/igb/igb_ethtool.c:2363:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/igb/igb_ethtool.c:2363:6: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/i40e/i40e_ethtool.c:208:34: warning: format string is not a string literal [-Wformat-nonliteral] drivers/net/ethernet/intel/i40e/i40e_ethtool.c:2515:23: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/i40e/i40e_ethtool.c:2515:23: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/i40e/i40e_ethtool.c:2519:23: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/i40e/i40e_ethtool.c:2519:23: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/ice/ice_ethtool.c:1064:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/ice/ice_ethtool.c:1064:6: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/ice/ice_ethtool.c:1084:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/ice/ice_ethtool.c:1084:6: note: treat the string as an argument to avoid this drivers/net/ethernet/intel/ice/ice_ethtool.c:1100:24: warning: format string is not a string literal (potentially insecure) [-Wformat-security] drivers/net/ethernet/intel/ice/ice_ethtool.c:1100:24: note: treat the string as an argument to avoid this Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231017190411.2199743-3-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-09-21igc: Expose tx-usecs coalesce setting to userMuhammad Husaini Zulkifli
When users attempt to obtain the coalesce setting using the ethtool command, current code always returns 0 for tx-usecs. This is because I225/6 always uses a queue pair setting, hence tx_coalesce_usecs does not return a value during the igc_ethtool_get_coalesce() callback process. The pair queue condition checking in igc_ethtool_get_coalesce() is removed by this patch so that the user gets information of the value of tx-usecs. Even if i225/6 is using queue pair setting, there is no harm in notifying the user of the tx-usecs. The implementation of the current code may have previously been a copy of the legacy code i210. Since I225 has the queue pair setting enabled, tx-usecs will always adhere to the user-set rx-usecs value. An error message will appear when the user attempts to set the tx-usecs value for the input parameters because, by default, they should only set the rx-usecs value. This patch also adds the helper function to get the previous rx coalesce value similar to tx coalesce. How to test: User can get the coalesce value using ethtool command. Example command: Get: ethtool -c <interface> Previous output: rx-usecs: 3 rx-frames: n/a rx-usecs-irq: n/a rx-frames-irq: n/a tx-usecs: 0 tx-frames: n/a tx-usecs-irq: n/a tx-frames-irq: n/a New output: rx-usecs: 3 rx-frames: n/a rx-usecs-irq: n/a rx-frames-irq: n/a tx-usecs: 3 tx-frames: n/a tx-usecs-irq: n/a tx-frames-irq: n/a Fixes: 8c5ad0dae93c ("igc: Add ethtool support") Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230919170331.1581031-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-16igc: Fix infinite initialization loop with early XDP redirectVinicius Costa Gomes
When an XDP redirect happens before the link is ready, that transmission will not finish and will timeout, causing an adapter reset. If the redirects do not stop, the adapter will not stop resetting. Wait for the driver to signal that there's a carrier before allowing transmissions to proceed. Previous code was relying that when __IGC_DOWN is cleared, the NIC is ready to transmit as all the queues are ready, what happens is that the carrier presence will only be signaled later, after the watchdog workqueue has a chance to run. And during this interval (between clearing __IGC_DOWN and the watchdog running) if any transmission happens the timeout is emitted (detected by igc_tx_timeout()) which causes the reset, with the potential for the infinite loop. Fixes: 4ff320361092 ("igc: Add support for XDP_REDIRECT action") Reported-by: Ferenc Fejes <ferenc.fejes@ericsson.com> Closes: https://lore.kernel.org/netdev/0caf33cf6adb3a5bf137eeaa20e89b167c9986d5.camel@ericsson.com/ Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Ferenc Fejes <ferenc.fejes@ericsson.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-05igc: Change IGC_MIN to allow set rx/tx value between 64 and 80Olga Zaborska
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx value between 64 and 80. All igc devices can use as low as 64 descriptors. This change will unify igc with other drivers. Based on commit 7b1be1987c1e ("e1000e: lower ring minimum size to 64") Fixes: 0507ef8a0372 ("igc: Add transmit and receive fastpath and interrupt handlers") Signed-off-by: Olga Zaborska <olga.zaborska@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-24igc: Decrease PTM short interval from 10 us to 1 usSasha Neftin
With the 10us interval, we were seeing PTM transactions take around 12us. Hardware team suggested this interval could be lowered to 1us which was confirmed with PCIe sniffer. With the 1us interval, PTM dialogs took around 2us. Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-24igc: Add support for multiple in-flight TX timestampsVinicius Costa Gomes
Add support for using the four sets of timestamping registers that i225/i226 have available for TX. In some workloads, where multiple applications request hardware transmission timestamps, it was possible that some of those requests were denied because the only in use register was already occupied. This is also in preparation to future support for hardware timestamping with multiple PTP domains. With multiple domains chances of multiple TX timestamps being requested at the same time increase. Before: $ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o 37 | responses | TX timestamp offset (ns) rate clients | lost invalid basic xleave | min mean max stddev 1000 100 0.00% 0.00% 0.00% 100.00% +1 +41 +73 13 1500 150 0.00% 0.00% 0.00% 100.00% +9 +49 +87 15 2250 225 0.00% 0.00% 0.00% 100.00% +9 +42 +79 13 3375 337 0.00% 0.00% 0.00% 100.00% +11 +46 +81 13 5062 506 0.00% 0.00% 0.00% 100.00% +7 +44 +80 13 7593 759 0.00% 0.00% 0.00% 100.00% +9 +44 +79 12 11389 1138 0.00% 0.00% 0.00% 100.00% +14 +51 +87 13 17083 1708 0.00% 0.00% 0.00% 100.00% +1 +41 +80 14 25624 2562 0.00% 0.00% 0.00% 100.00% +11 +50 +5107 51 38436 3843 0.00% 0.00% 0.00% 100.00% -2 +36 +7843 38 57654 5765 0.00% 0.00% 0.00% 100.00% +4 +42 +10503 69 86481 8648 0.00% 0.00% 0.00% 100.00% +11 +54 +5492 65 129721 12972 0.00% 0.00% 0.00% 100.00% +31 +2680 +6942 2606 194581 16384 16.79% 0.00% 0.87% 82.34% +73 +4444 +15879 3116 291871 16384 35.05% 0.00% 1.53% 63.42% +188 +5381 +17019 3035 437806 16384 54.95% 0.00% 2.55% 42.50% +233 +6302 +13885 2846 After: $ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o 37 | responses | TX timestamp offset (ns) rate clients | lost invalid basic xleave | min mean max stddev 1000 100 0.00% 0.00% 0.00% 100.00% -20 +12 +43 13 1500 150 0.00% 0.00% 0.00% 100.00% -23 +18 +57 14 2250 225 0.00% 0.00% 0.00% 100.00% -2 +33 +67 13 3375 337 0.00% 0.00% 0.00% 100.00% +1 +38 +76 13 5062 506 0.00% 0.00% 0.00% 100.00% +9 +52 +93 14 7593 759 0.00% 0.00% 0.00% 100.00% +11 +47 +82 13 11389 1138 0.00% 0.00% 0.00% 100.00% -9 +27 +74 13 17083 1708 0.00% 0.00% 0.00% 100.00% -13 +25 +66 14 25624 2562 0.00% 0.00% 0.00% 100.00% -8 +28 +65 13 38436 3843 0.00% 0.00% 0.00% 100.00% -13 +28 +69 13 57654 5765 0.00% 0.00% 0.00% 100.00% -11 +32 +71 14 86481 8648 0.00% 0.00% 0.00% 100.00% +2 +44 +83 14 129721 12972 15.36% 0.00% 0.35% 84.29% -2 +2248 +22907 4252 194581 16384 42.98% 0.00% 1.98% 55.04% -4 +5278 +65039 5856 291871 16384 54.33% 0.00% 2.21% 43.46% -3 +6306 +22608 5665 We can see that with 4 registers, as expected, we are able to handle a increasing number of requests more consistently, but as soon as all registers are in use, the decrease in quality of service happens in a sharp step. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-08-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: include/net/inet_sock.h f866fbc842de ("ipv4: fix data-races around inet->inet_id") c274af224269 ("inet: introduce inet->inet_flags") https://lore.kernel.org/all/679ddff6-db6e-4ff6-b177-574e90d0103d@tessares.net/ Adjacent changes: drivers/net/bonding/bond_alb.c e74216b8def3 ("bonding: fix macvlan over alb bond support") f11e5bd159b0 ("bonding: support balance-alb with openvswitch") drivers/net/ethernet/broadcom/bgmac.c d6499f0b7c7c ("net: bgmac: Return PTR_ERR() for fixed_phy_register()") 23a14488ea58 ("net: bgmac: Fix return value check for fixed_phy_register()") drivers/net/ethernet/broadcom/genet/bcmmii.c 32bbe64a1386 ("net: bcmgenet: Fix return value check for fixed_phy_register()") acf50d1adbf4 ("net: bcmgenet: Return PTR_ERR() for fixed_phy_register()") net/sctp/socket.c f866fbc842de ("ipv4: fix data-races around inet->inet_id") b09bde5c3554 ("inet: move inet->mc_loop to inet->inet_frags") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22igc: Fix the typo in the PTM Control macroSasha Neftin
The IGC_PTM_CTRL_SHRT_CYC defines the time between two consecutive PTM requests. The bit resolution of this field is six bits. That bit five was missing in the mask. This patch comes to correct the typo in the IGC_PTM_CTRL_SHRT_CYC macro. Fixes: a90ec8483732 ("igc: Add support for PTP getcrosststamp()") Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://lore.kernel.org/r/20230821171721.2203572-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/intel/igc/igc_main.c 06b412589eef ("igc: Add lock to safeguard global Qbv variables") d3750076d464 ("igc: Add TransmissionOverrun counter") drivers/net/ethernet/microsoft/mana/mana_en.c a7dfeda6fdec ("net: mana: Fix MANA VF unload when hardware is unresponsive") a9ca9f9ceff3 ("page_pool: split types and declarations from page_pool.h") 92272ec4107e ("eth: add missing xdp.h includes in drivers") net/mptcp/protocol.h 511b90e39250 ("mptcp: fix disconnect vs accept race") b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning") tools/testing/selftests/net/mptcp/mptcp_join.sh c8c101ae390a ("selftests: mptcp: join: fix 'implicit EP' test") 03668c65d153 ("selftests: mptcp: join: rework detailed report") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-08igc: Add lock to safeguard global Qbv variablesMuhammad Husaini Zulkifli
Access to shared variables through hrtimer requires locking in order to protect the variables because actions to write into these variables (oper_gate_closed, admin_gate_closed, and qbv_transition) might potentially occur simultaneously. This patch provides a locking mechanisms to avoid such scenarios. Fixes: 175c241288c0 ("igc: Fix TX Hang issue when QBV Gate is closed") Suggested-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230807205129.3129346-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-03Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Martin KaFai Lau says: ==================== pull-request: bpf-next 2023-08-03 We've added 54 non-merge commits during the last 10 day(s) which contain a total of 84 files changed, 4026 insertions(+), 562 deletions(-). The main changes are: 1) Add SO_REUSEPORT support for TC bpf_sk_assign from Lorenz Bauer, Daniel Borkmann 2) Support new insns from cpu v4 from Yonghong Song 3) Non-atomically allocate freelist during prefill from YiFei Zhu 4) Support defragmenting IPv(4|6) packets in BPF from Daniel Xu 5) Add tracepoint to xdp attaching failure from Leon Hwang 6) struct netdev_rx_queue and xdp.h reshuffling to reduce rebuild time from Jakub Kicinski * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits) net: invert the netdevice.h vs xdp.h dependency net: move struct netdev_rx_queue out of netdevice.h eth: add missing xdp.h includes in drivers selftests/bpf: Add testcase for xdp attaching failure tracepoint bpf, xdp: Add tracepoint to xdp attaching failure selftests/bpf: fix static assert compilation issue for test_cls_*.c bpf: fix bpf_probe_read_kernel prototype mismatch riscv, bpf: Adapt bpf trampoline to optimized riscv ftrace framework libbpf: fix typos in Makefile tracing: bpf: use struct trace_entry in struct syscall_tp_t bpf, devmap: Remove unused dtab field from bpf_dtab_netdev bpf, cpumap: Remove unused cmap field from bpf_cpu_map_entry netfilter: bpf: Only define get_proto_defrag_hook() if necessary bpf: Fix an array-index-out-of-bounds issue in disasm.c net: remove duplicate INDIRECT_CALLABLE_DECLARE of udp[6]_ehashfn docs/bpf: Fix malformed documentation bpf: selftests: Add defrag selftests bpf: selftests: Support custom type and proto for client sockets bpf: selftests: Support not connecting client socket netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link ... ==================== Link: https://lore.kernel.org/r/20230803174845.825419-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-03eth: add missing xdp.h includes in driversJakub Kicinski
Handful of drivers currently expect to get xdp.h by virtue of including netdevice.h. This will soon no longer be the case so add explicit includes. Reviewed-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20230803010230.1755386-2-kuba@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-07-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-26igc: Fix Kernel Panic during ndo_tx_timeout callbackMuhammad Husaini Zulkifli
The Xeon validation group has been carrying out some loaded tests with various HW configurations, and they have seen some transmit queue time out happening during the test. This will cause the reset adapter function to be called by igc_tx_timeout(). Similar race conditions may arise when the interface is being brought down and up in igc_reinit_locked(), an interrupt being generated, and igc_clean_tx_irq() being called to complete the TX. When the igc_tx_timeout() function is invoked, this patch will turn off all TX ring HW queues during igc_down() process. TX ring HW queues will be activated again during the igc_configure_tx_ring() process when performing the igc_up() procedure later. This patch also moved existing igc_disable_tx_ring_hw() to avoid using forward declaration. Kernel trace: [ 7678.747813] ------------[ cut here ]------------ [ 7678.757914] NETDEV WATCHDOG: enp1s0 (igc): transmit queue 2 timed out [ 7678.770117] WARNING: CPU: 0 PID: 13 at net/sched/sch_generic.c:525 dev_watchdog+0x1ae/0x1f0 [ 7678.784459] Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay dm_mod emrcha(PO) emriio(PO) rktpm(PO) cegbuf_mod(PO) patch_update(PO) se(PO) sgx_tgts(PO) mktme(PO) keylocker(PO) svtdx(PO) svfs_pci_hotplug(PO) vtd_mod(PO) davemem(PO) svmabort(PO) svindexio(PO) usbx2(PO) ehci_sched(PO) svheartbeat(PO) ioapic(PO) sv8259(PO) svintr(PO) lt(PO) pcierootport(PO) enginefw_mod(PO) ata(PO) smbus(PO) spiflash_cdf(PO) arden(PO) dsa_iax(PO) oobmsm_punit(PO) cpm(PO) svkdb(PO) ebg_pch(PO) pch(PO) sviotargets(PO) svbdf(PO) svmem(PO) svbios(PO) dram(PO) svtsc(PO) targets(PO) superio(PO) svkernel(PO) cswitch(PO) mcf(PO) pentiumIII_mod(PO) fs_svfs(PO) mdevdefdb(PO) svfs_os_services(O) ixgbe mdio mdio_devres libphy emeraldrapids_svdefs(PO) regsupport(O) libnvdimm nls_cp437 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep x86_pkg_temp_thermal snd_hda_core snd_pcm snd_timer isst_if_mbox_pci [ 7678.784496] input_leds isst_if_mmio sg snd isst_if_common soundcore wmi button sad9(O) drm fuse backlight configfs efivarfs ip_tables x_tables vmd sdhci led_class rtl8150 r8152 hid_generic pegasus mmc_block usbhid mmc_core hid megaraid_sas ixgb igb i2c_algo_bit ice i40e hpsa scsi_transport_sas e1000e e1000 e100 ax88179_178a usbnet xhci_pci sd_mod xhci_hcd t10_pi crc32c_intel crc64_rocksoft igc crc64 crc_t10dif usbcore crct10dif_generic ptp crct10dif_common usb_common pps_core [ 7679.200403] RIP: 0010:dev_watchdog+0x1ae/0x1f0 [ 7679.210201] Code: 28 e9 53 ff ff ff 4c 89 e7 c6 05 06 42 b9 00 01 e8 17 d1 fb ff 44 89 e9 4c 89 e6 48 c7 c7 40 ad fb 81 48 89 c2 e8 52 62 82 ff <0f> 0b e9 72 ff ff ff 65 8b 05 80 7d 7c 7e 89 c0 48 0f a3 05 0a c1 [ 7679.245438] RSP: 0018:ffa00000001f7d90 EFLAGS: 00010282 [ 7679.256021] RAX: 0000000000000000 RBX: ff11000109938440 RCX: 0000000000000000 [ 7679.268710] RDX: ff11000361e26cd8 RSI: ff11000361e1b880 RDI: ff11000361e1b880 [ 7679.281314] RBP: ffa00000001f7da8 R08: ff1100035f8fffe8 R09: 0000000000027ffb [ 7679.293840] R10: 0000000000001f0a R11: ff1100035f840000 R12: ff11000109938000 [ 7679.306276] R13: 0000000000000002 R14: dead000000000122 R15: ffa00000001f7e18 [ 7679.318648] FS: 0000000000000000(0000) GS:ff11000361e00000(0000) knlGS:0000000000000000 [ 7679.332064] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7679.342757] CR2: 00007ffff7fca168 CR3: 000000013b08a006 CR4: 0000000000471ef8 [ 7679.354984] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 7679.367207] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 7679.379370] PKRU: 55555554 [ 7679.386446] Call Trace: [ 7679.393152] <TASK> [ 7679.399363] ? __pfx_dev_watchdog+0x10/0x10 [ 7679.407870] call_timer_fn+0x31/0x110 [ 7679.415698] expire_timers+0xb2/0x120 [ 7679.423403] run_timer_softirq+0x179/0x1e0 [ 7679.431532] ? __schedule+0x2b1/0x820 [ 7679.439078] __do_softirq+0xd1/0x295 [ 7679.446426] ? __pfx_smpboot_thread_fn+0x10/0x10 [ 7679.454867] run_ksoftirqd+0x22/0x30 [ 7679.462058] smpboot_thread_fn+0xb7/0x160 [ 7679.469670] kthread+0xcd/0xf0 [ 7679.476097] ? __pfx_kthread+0x10/0x10 [ 7679.483211] ret_from_fork+0x29/0x50 [ 7679.490047] </TASK> [ 7679.495204] ---[ end trace 0000000000000000 ]--- [ 7679.503179] igc 0000:01:00.0 enp1s0: Register Dump [ 7679.511230] igc 0000:01:00.0 enp1s0: Register Name Value [ 7679.519892] igc 0000:01:00.0 enp1s0: CTRL 181c0641 [ 7679.528782] igc 0000:01:00.0 enp1s0: STATUS 40280683 [ 7679.537551] igc 0000:01:00.0 enp1s0: CTRL_EXT 10000040 [ 7679.546284] igc 0000:01:00.0 enp1s0: MDIC 180a3800 [ 7679.554942] igc 0000:01:00.0 enp1s0: ICR 00000081 [ 7679.563503] igc 0000:01:00.0 enp1s0: RCTL 04408022 [ 7679.571963] igc 0000:01:00.0 enp1s0: RDLEN[0-3] 00001000 00001000 00001000 00001000 [ 7679.583075] igc 0000:01:00.0 enp1s0: RDH[0-3] 00000068 000000b6 0000000f 00000031 [ 7679.594162] igc 0000:01:00.0 enp1s0: RDT[0-3] 00000066 000000b2 0000000e 00000030 [ 7679.605174] igc 0000:01:00.0 enp1s0: RXDCTL[0-3] 02040808 02040808 02040808 02040808 [ 7679.616196] igc 0000:01:00.0 enp1s0: RDBAL[0-3] 1bb7c000 1bb7f000 1bb82000 0ef33000 [ 7679.627242] igc 0000:01:00.0 enp1s0: RDBAH[0-3] 00000001 00000001 00000001 00000001 [ 7679.638256] igc 0000:01:00.0 enp1s0: TCTL a503f0fa [ 7679.646607] igc 0000:01:00.0 enp1s0: TDBAL[0-3] 2ba4a000 1bb6f000 1bb74000 1bb79000 [ 7679.657609] igc 0000:01:00.0 enp1s0: TDBAH[0-3] 00000001 00000001 00000001 00000001 [ 7679.668551] igc 0000:01:00.0 enp1s0: TDLEN[0-3] 00001000 00001000 00001000 00001000 [ 7679.679470] igc 0000:01:00.0 enp1s0: TDH[0-3] 000000a7 0000002d 000000bf 000000d9 [ 7679.690406] igc 0000:01:00.0 enp1s0: TDT[0-3] 000000a7 0000002d 000000bf 000000d9 [ 7679.701264] igc 0000:01:00.0 enp1s0: TXDCTL[0-3] 02100108 02100108 02100108 02100108 [ 7679.712123] igc 0000:01:00.0 enp1s0: Reset adapter [ 7683.085967] igc 0000:01:00.0 enp1s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX [ 8086.945561] ------------[ cut here ]------------ Entering kdb (current=0xffffffff8220b200, pid 0) on processor 0 Oops: (null) due to oops @ 0xffffffff81573888 RIP: 0010:dql_completed+0x148/0x160 Code: c9 00 48 89 57 58 e9 46 ff ff ff 45 85 e4 41 0f 95 c4 41 39 db 0f 95 c1 41 84 cc 74 05 45 85 ed 78 0a 44 89 c1 e9 27 ff ff ff <0f> 0b 01 f6 44 89 c1 29 f1 0f 48 ca eb 8c cc cc cc cc cc cc cc cc RSP: 0018:ffa0000000003e00 EFLAGS: 00010287 RAX: 000000000000006c RBX: ffa0000003eb0f78 RCX: ff11000109938000 RDX: 0000000000000003 RSI: 0000000000000160 RDI: ff110001002e9480 RBP: ffa0000000003ed8 R08: ff110001002e93c0 R09: ffa0000000003d28 R10: 0000000000007cc0 R11: 0000000000007c54 R12: 00000000ffffffd9 R13: ff1100037039cb00 R14: 00000000ffffffd9 R15: ff1100037039c048 FS: 0000000000000000(0000) GS:ff11000361e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffff7fca168 CR3: 000000013b08a003 CR4: 0000000000471ef8 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> ? igc_poll+0x1a9/0x14d0 [igc] __napi_poll+0x2e/0x1b0 net_rx_action+0x126/0x250 __do_softirq+0xd1/0x295 irq_exit_rcu+0xc5/0xf0 common_interrupt+0x86/0xa0 </IRQ> <TASK> asm_common_interrupt+0x27/0x40 RIP: 0010:cpuidle_enter_state+0xd3/0x3e0 Code: 73 f1 ff ff 49 89 c6 8b 05 e2 ca a7 00 85 c0 0f 8f b3 02 00 00 31 ff e8 1b de 75 ff 80 7d d7 00 0f 85 cd 01 00 00 fb 45 85 ff <0f> 88 fd 00 00 00 49 63 cf 4c 2b 75 c8 48 8d 04 49 48 89 ca 48 8d RSP: 0018:ffffffff82203df0 EFLAGS: 00000202 RAX: ff11000361e2a200 RBX: 0000000000000002 RCX: 000000000000001f RDX: 0000000000000000 RSI: 000000003cf3cf3d RDI: 0000000000000000 RBP: ffffffff82203e28 R08: 0000075ae38471c8 R09: 0000000000000018 R10: 000000000000031a R11: ffffffff8238dca0 R12: ffd1ffffff200000 R13: ffffffff8238dca0 R14: 0000075ae38471c8 R15: 0000000000000002 cpuidle_enter+0x2e/0x50 call_cpuidle+0x23/0x40 do_idle+0x1be/0x220 cpu_startup_entry+0x20/0x30 rest_init+0xb5/0xc0 arch_call_rest_init+0xe/0x30 start_kernel+0x448/0x760 x86_64_start_kernel+0x109/0x150 secondary_startup_64_no_verify+0xe0/0xeb </TASK> more> [0]kdb> [0]kdb> [0]kdb> go Catastrophic error detected kdb_continue_catastrophic=0, type go a second time if you really want to continue [0]kdb> go Catastrophic error detected kdb_continue_catastrophic=0, attempting to continue [ 8086.955689] refcount_t: underflow; use-after-free. [ 8086.955697] WARNING: CPU: 0 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0xc2/0x110 [ 8086.955706] Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay dm_mod emrcha(PO) emriio(PO) rktpm(PO) cegbuf_mod(PO) patch_update(PO) se(PO) sgx_tgts(PO) mktme(PO) keylocker(PO) svtdx(PO) svfs_pci_hotplug(PO) vtd_mod(PO) davemem(PO) svmabort(PO) svindexio(PO) usbx2(PO) ehci_sched(PO) svheartbeat(PO) ioapic(PO) sv8259(PO) svintr(PO) lt(PO) pcierootport(PO) enginefw_mod(PO) ata(PO) smbus(PO) spiflash_cdf(PO) arden(PO) dsa_iax(PO) oobmsm_punit(PO) cpm(PO) svkdb(PO) ebg_pch(PO) pch(PO) sviotargets(PO) svbdf(PO) svmem(PO) svbios(PO) dram(PO) svtsc(PO) targets(PO) superio(PO) svkernel(PO) cswitch(PO) mcf(PO) pentiumIII_mod(PO) fs_svfs(PO) mdevdefdb(PO) svfs_os_services(O) ixgbe mdio mdio_devres libphy emeraldrapids_svdefs(PO) regsupport(O) libnvdimm nls_cp437 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep x86_pkg_temp_thermal snd_hda_core snd_pcm snd_timer isst_if_mbox_pci [ 8086.955751] input_leds isst_if_mmio sg snd isst_if_common soundcore wmi button sad9(O) drm fuse backlight configfs efivarfs ip_tables x_tables vmd sdhci led_class rtl8150 r8152 hid_generic pegasus mmc_block usbhid mmc_core hid megaraid_sas ixgb igb i2c_algo_bit ice i40e hpsa scsi_transport_sas e1000e e1000 e100 ax88179_178a usbnet xhci_pci sd_mod xhci_hcd t10_pi crc32c_intel crc64_rocksoft igc crc64 crc_t10dif usbcore crct10dif_generic ptp crct10dif_common usb_common pps_core [ 8086.955784] RIP: 0010:refcount_warn_saturate+0xc2/0x110 [ 8086.955788] Code: 01 e8 82 e7 b4 ff 0f 0b 5d c3 cc cc cc cc 80 3d 68 c6 eb 00 00 75 81 48 c7 c7 a0 87 f6 81 c6 05 58 c6 eb 00 01 e8 5e e7 b4 ff <0f> 0b 5d c3 cc cc cc cc 80 3d 42 c6 eb 00 00 0f 85 59 ff ff ff 48 [ 8086.955790] RSP: 0018:ffa0000000003da0 EFLAGS: 00010286 [ 8086.955793] RAX: 0000000000000000 RBX: ff1100011da40ee0 RCX: ff11000361e1b888 [ 8086.955794] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ff11000361e1b880 [ 8086.955795] RBP: ffa0000000003da0 R08: 80000000ffff9f45 R09: ffa0000000003d28 [ 8086.955796] R10: ff1100035f840000 R11: 0000000000000028 R12: ff11000319ff8000 [ 8086.955797] R13: ff1100011bb79d60 R14: 00000000ffffffd6 R15: ff1100037039cb00 [ 8086.955798] FS: 0000000000000000(0000) GS:ff11000361e00000(0000) knlGS:0000000000000000 [ 8086.955800] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8086.955801] CR2: 00007ffff7fca168 CR3: 000000013b08a003 CR4: 0000000000471ef8 [ 8086.955803] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 8086.955803] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 8086.955804] PKRU: 55555554 [ 8086.955805] Call Trace: [ 8086.955806] <IRQ> [ 8086.955808] tcp_wfree+0x112/0x130 [ 8086.955814] skb_release_head_state+0x24/0xa0 [ 8086.955818] napi_consume_skb+0x9c/0x160 [ 8086.955821] igc_poll+0x5d8/0x14d0 [igc] [ 8086.955835] __napi_poll+0x2e/0x1b0 [ 8086.955839] net_rx_action+0x126/0x250 [ 8086.955843] __do_softirq+0xd1/0x295 [ 8086.955846] irq_exit_rcu+0xc5/0xf0 [ 8086.955851] common_interrupt+0x86/0xa0 [ 8086.955857] </IRQ> [ 8086.955857] <TASK> [ 8086.955858] asm_common_interrupt+0x27/0x40 [ 8086.955862] RIP: 0010:cpuidle_enter_state+0xd3/0x3e0 [ 8086.955866] Code: 73 f1 ff ff 49 89 c6 8b 05 e2 ca a7 00 85 c0 0f 8f b3 02 00 00 31 ff e8 1b de 75 ff 80 7d d7 00 0f 85 cd 01 00 00 fb 45 85 ff <0f> 88 fd 00 00 00 49 63 cf 4c 2b 75 c8 48 8d 04 49 48 89 ca 48 8d [ 8086.955867] RSP: 0018:ffffffff82203df0 EFLAGS: 00000202 [ 8086.955869] RAX: ff11000361e2a200 RBX: 0000000000000002 RCX: 000000000000001f [ 8086.955870] RDX: 0000000000000000 RSI: 000000003cf3cf3d RDI: 0000000000000000 [ 8086.955871] RBP: ffffffff82203e28 R08: 0000075ae38471c8 R09: 0000000000000018 [ 8086.955872] R10: 000000000000031a R11: ffffffff8238dca0 R12: ffd1ffffff200000 [ 8086.955873] R13: ffffffff8238dca0 R14: 0000075ae38471c8 R15: 0000000000000002 [ 8086.955875] cpuidle_enter+0x2e/0x50 [ 8086.955880] call_cpuidle+0x23/0x40 [ 8086.955884] do_idle+0x1be/0x220 [ 8086.955887] cpu_startup_entry+0x20/0x30 [ 8086.955889] rest_init+0xb5/0xc0 [ 8086.955892] arch_call_rest_init+0xe/0x30 [ 8086.955895] start_kernel+0x448/0x760 [ 8086.955898] x86_64_start_kernel+0x109/0x150 [ 8086.955900] secondary_startup_64_no_verify+0xe0/0xeb [ 8086.955904] </TASK> [ 8086.955904] ---[ end trace 0000000000000000 ]--- [ 8086.955912] ------------[ cut here ]------------ [ 8086.955913] kernel BUG at lib/dynamic_queue_limits.c:27! [ 8086.955918] invalid opcode: 0000 [#1] SMP [ 8086.955922] RIP: 0010:dql_completed+0x148/0x160 [ 8086.955925] Code: c9 00 48 89 57 58 e9 46 ff ff ff 45 85 e4 41 0f 95 c4 41 39 db 0f 95 c1 41 84 cc 74 05 45 85 ed 78 0a 44 89 c1 e9 27 ff ff ff <0f> 0b 01 f6 44 89 c1 29 f1 0f 48 ca eb 8c cc cc cc cc cc cc cc cc [ 8086.955927] RSP: 0018:ffa0000000003e00 EFLAGS: 00010287 [ 8086.955928] RAX: 000000000000006c RBX: ffa0000003eb0f78 RCX: ff11000109938000 [ 8086.955929] RDX: 0000000000000003 RSI: 0000000000000160 RDI: ff110001002e9480 [ 8086.955930] RBP: ffa0000000003ed8 R08: ff110001002e93c0 R09: ffa0000000003d28 [ 8086.955931] R10: 0000000000007cc0 R11: 0000000000007c54 R12: 00000000ffffffd9 [ 8086.955932] R13: ff1100037039cb00 R14: 00000000ffffffd9 R15: ff1100037039c048 [ 8086.955933] FS: 0000000000000000(0000) GS:ff11000361e00000(0000) knlGS:0000000000000000 [ 8086.955934] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8086.955935] CR2: 00007ffff7fca168 CR3: 000000013b08a003 CR4: 0000000000471ef8 [ 8086.955936] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 8086.955937] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 8086.955938] PKRU: 55555554 [ 8086.955939] Call Trace: [ 8086.955939] <IRQ> [ 8086.955940] ? igc_poll+0x1a9/0x14d0 [igc] [ 8086.955949] __napi_poll+0x2e/0x1b0 [ 8086.955952] net_rx_action+0x126/0x250 [ 8086.955956] __do_softirq+0xd1/0x295 [ 8086.955958] irq_exit_rcu+0xc5/0xf0 [ 8086.955961] common_interrupt+0x86/0xa0 [ 8086.955964] </IRQ> [ 8086.955965] <TASK> [ 8086.955965] asm_common_interrupt+0x27/0x40 [ 8086.955968] RIP: 0010:cpuidle_enter_state+0xd3/0x3e0 [ 8086.955971] Code: 73 f1 ff ff 49 89 c6 8b 05 e2 ca a7 00 85 c0 0f 8f b3 02 00 00 31 ff e8 1b de 75 ff 80 7d d7 00 0f 85 cd 01 00 00 fb 45 85 ff <0f> 88 fd 00 00 00 49 63 cf 4c 2b 75 c8 48 8d 04 49 48 89 ca 48 8d [ 8086.955972] RSP: 0018:ffffffff82203df0 EFLAGS: 00000202 [ 8086.955973] RAX: ff11000361e2a200 RBX: 0000000000000002 RCX: 000000000000001f [ 8086.955974] RDX: 0000000000000000 RSI: 000000003cf3cf3d RDI: 0000000000000000 [ 8086.955974] RBP: ffffffff82203e28 R08: 0000075ae38471c8 R09: 0000000000000018 [ 8086.955975] R10: 000000000000031a R11: ffffffff8238dca0 R12: ffd1ffffff200000 [ 8086.955976] R13: ffffffff8238dca0 R14: 0000075ae38471c8 R15: 0000000000000002 [ 8086.955978] cpuidle_enter+0x2e/0x50 [ 8086.955981] call_cpuidle+0x23/0x40 [ 8086.955984] do_idle+0x1be/0x220 [ 8086.955985] cpu_startup_entry+0x20/0x30 [ 8086.955987] rest_init+0xb5/0xc0 [ 8086.955990] arch_call_rest_init+0xe/0x30 [ 8086.955992] start_kernel+0x448/0x760 [ 8086.955994] x86_64_start_kernel+0x109/0x150 [ 8086.955996] secondary_startup_64_no_verify+0xe0/0xeb [ 8086.955998] </TASK> [ 8086.955999] Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay dm_mod emrcha(PO) emriio(PO) rktpm(PO) cegbuf_mod(PO) patch_update(PO) se(PO) sgx_tgts(PO) mktme(PO) keylocker(PO) svtdx(PO) svfs_pci_hotplug(PO) vtd_mod(PO) davemem(PO) svmabort(PO) svindexio(PO) usbx2(PO) ehci_sched(PO) svheartbeat(PO) ioapic(PO) sv8259(PO) svintr(PO) lt(PO) pcierootport(PO) enginefw_mod(PO) ata(PO) smbus(PO) spiflash_cdf(PO) arden(PO) dsa_iax(PO) oobmsm_punit(PO) cpm(PO) svkdb(PO) ebg_pch(PO) pch(PO) sviotargets(PO) svbdf(PO) svmem(PO) svbios(PO) dram(PO) svtsc(PO) targets(PO) superio(PO) svkernel(PO) cswitch(PO) mcf(PO) pentiumIII_mod(PO) fs_svfs(PO) mdevdefdb(PO) svfs_os_services(O) ixgbe mdio mdio_devres libphy emeraldrapids_svdefs(PO) regsupport(O) libnvdimm nls_cp437 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep x86_pkg_temp_thermal snd_hda_core snd_pcm snd_timer isst_if_mbox_pci [ 8086.956029] input_leds isst_if_mmio sg snd isst_if_common soundcore wmi button sad9(O) drm fuse backlight configfs efivarfs ip_tables x_tables vmd sdhci led_class rtl8150 r8152 hid_generic pegasus mmc_block usbhid mmc_core hid megaraid_sas ixgb igb i2c_algo_bit ice i40e hpsa scsi_transport_sas e1000e e1000 e100 ax88179_178a usbnet xhci_pci sd_mod xhci_hcd t10_pi crc32c_intel crc64_rocksoft igc crc64 crc_t10dif usbcore crct10dif_generic ptp crct10dif_common usb_common pps_core [16762.543675] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.593 msecs [16762.543678] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.595 msecs [16762.543673] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.495 msecs [16762.543679] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.599 msecs [16762.543678] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.598 msecs [16762.543690] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.605 msecs [16762.543684] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.599 msecs [16762.543693] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.613 msecs [16762.543784] ---[ end trace 0000000000000000 ]--- [16762.849099] RIP: 0010:dql_completed+0x148/0x160 PANIC: Fatal exception in interrupt Fixes: 9b275176270e ("igc: Add ndo_tx_timeout support") Tested-by: Alejandra Victoria Alcaraz <alejandra.victoria.alcaraz@intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-18igc: Prevent garbled TX queue with XDP ZEROCOPYFlorian Kauer
In normal operation, each populated queue item has next_to_watch pointing to the last TX desc of the packet, while each cleaned item has it set to 0. In particular, next_to_use that points to the next (necessarily clean) item to use has next_to_watch set to 0. When the TX queue is used both by an application using AF_XDP with ZEROCOPY as well as a second non-XDP application generating high traffic, the queue pointers can get in an invalid state where next_to_use points to an item where next_to_watch is NOT set to 0. However, the implementation assumes at several places that this is never the case, so if it does hold, bad things happen. In particular, within the loop inside of igc_clean_tx_irq(), next_to_clean can overtake next_to_use. Finally, this prevents any further transmission via this queue and it never gets unblocked or signaled. Secondly, if the queue is in this garbled state, the inner loop of igc_clean_tx_ring() will never terminate, completely hogging a CPU core. The reason is that igc_xdp_xmit_zc() reads next_to_use before acquiring the lock, and writing it back (potentially unmodified) later. If it got modified before locking, the outdated next_to_use is written pointing to an item that was already used elsewhere (and thus next_to_watch got written). Fixes: 9acf59a752d4 ("igc: Enable TX via AF_XDP zero-copy") Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230717175444.3217831-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-18igc: Add TransmissionOverrun counterMuhammad Husaini Zulkifli
Add TransmissionOverrun as per defined by IEEE 802.1Q Bridges. TransmissionOverrun counter shall be incremented if the implementation detects that a frame from a given queue is still being transmitted by the MAC when that gate-close event for that queue occurs. This counter is utilised by the Certification conformance test to inform the user application whether any packets are currently being transmitted on a particular queue during a gate-close event. Intel Discrete I225/I226 have a mechanism to not transmit a packets if the gate open time is insufficient for the packet transmission by setting the Strict_End bit. Thus, it is expected for this counter to be always zero at this moment. Inspired from enetc_taprio_stats() and enetc_taprio_queue_stats(), now driver also report the tx_overruns counter per traffic class. User can get this counter by using below command: 1) tc -s qdisc show dev <interface> root 2) tc -s class show dev <interface> Test Result (Before): class mq :1 root Sent 1289 bytes 20 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 class mq :2 root Sent 124 bytes 2 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 class mq :3 root Sent 46028 bytes 86 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 class mq :4 root Sent 2596 bytes 14 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Test Result (After): class taprio 100:1 root Sent 8491 bytes 38 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Transmit overruns: 0 class taprio 100:2 root Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Transmit overruns: 0 class taprio 100:3 root Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Transmit overruns: 0 class taprio 100:4 root Sent 994 bytes 11 pkt (dropped 0, overlimits 0 requeues 1) backlog 0b 0p requeues 1 Transmit overruns: 0 Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230714201428.1718097-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-10igc: Fix inserting of empty frame for launchtimeFlorian Kauer
The insertion of an empty frame was introduced with commit db0b124f02ba ("igc: Enhance Qbv scheduling by using first flag bit") in order to ensure that the current cycle has at least one packet if there is some packet to be scheduled for the next cycle. However, the current implementation does not properly check if a packet is already scheduled for the current cycle. Currently, an empty packet is always inserted if and only if txtime >= end_of_cycle && txtime > last_tx_cycle but since last_tx_cycle is always either the end of the current cycle (end_of_cycle) or the end of a previous cycle, the second part (txtime > last_tx_cycle) is always true unless txtime == last_tx_cycle. What actually needs to be checked here is if the last_tx_cycle was already written within the current cycle, so an empty frame should only be inserted if and only if txtime >= end_of_cycle && end_of_cycle > last_tx_cycle. This patch does not only avoid an unnecessary insertion, but it can actually be harmful to insert an empty packet if packets are already scheduled in the current cycle, because it can lead to a situation where the empty packet is actually processed as the first packet in the upcoming cycle shifting the packet with the first_flag even one cycle into the future, finally leading to a TX hang. The TX hang can be reproduced on a i225 with: sudo tc qdisc replace dev enp1s0 parent root handle 100 taprio \ num_tc 1 \ map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \ queues 1@0 \ base-time 0 \ sched-entry S 01 300000 \ flags 0x1 \ txtime-delay 500000 \ clockid CLOCK_TAI sudo tc qdisc replace dev enp1s0 parent 100:1 etf \ clockid CLOCK_TAI \ delta 500000 \ offload \ skip_sock_check and traffic generator sudo trafgen -i traffic.cfg -o enp1s0 --cpp -n0 -q -t1400ns with traffic.cfg #define ETH_P_IP 0x0800 { /* Ethernet Header */ 0x30, 0x1f, 0x9a, 0xd0, 0xf0, 0x0e, # MAC Dest - adapt as needed 0x24, 0x5e, 0xbe, 0x57, 0x2e, 0x36, # MAC Src - adapt as needed const16(ETH_P_IP), /* IPv4 Header */ 0b01000101, 0, # IPv4 version, IHL, TOS const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header)) const16(2), # IPv4 ident 0b01000000, 0, # IPv4 flags, fragmentation off 64, # IPv4 TTL 17, # Protocol UDP csumip(14, 33), # IPv4 checksum /* UDP Header */ 10, 0, 48, 1, # IP Src - adapt as needed 10, 0, 48, 10, # IP Dest - adapt as needed const16(5555), # UDP Src Port const16(6666), # UDP Dest Port const16(1008), # UDP length (UDP header 8 bytes + payload length) csumudp(14, 34), # UDP checksum /* Payload */ fill('W', 1000), } and the observed message with that is for example igc 0000:01:00.0 enp1s0: Detected Tx Unit Hang Tx Queue <0> TDH <32> TDT <3c> next_to_use <3c> next_to_clean <32> buffer_info[next_to_clean] time_stamp <ffff26a8> next_to_watch <00000000632a1828> jiffies <ffff27f8> desc.status <1048000> Fixes: db0b124f02ba ("igc: Enhance Qbv scheduling by using first flag bit") Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-10igc: Fix launchtime before start of cycleFlorian Kauer
It is possible (verified on a running system) that frames are processed by igc_tx_launchtime with a txtime before the start of the cycle (baset_est). However, the result of txtime - baset_est is written into a u32, leading to a wrap around to a positive number. The following launchtime > 0 check will only branch to executing launchtime = 0 if launchtime is already 0. Fix it by using a s32 before checking launchtime > 0. Fixes: db0b124f02ba ("igc: Enhance Qbv scheduling by using first flag bit") Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-10igc: No strict mode in pure launchtime/CBS offloadFlorian Kauer
The flags IGC_TXQCTL_STRICT_CYCLE and IGC_TXQCTL_STRICT_END prevent the packet transmission over slot and cycle boundaries. This is important for taprio offload where the slots and cycles correspond to the slots and cycles configured for the network. However, the Qbv offload feature of the i225 is also used for enabling TX launchtime / ETF offload. In that case, however, the cycle has no meaning for the network and is only used internally to adapt the base time register after a second has passed. Enabling strict mode in this case would unnecessarily prevent the transmission of certain packets (i.e. at the boundary of a second) and thus interferes with the ETF qdisc that promises transmission at a certain point in time. Similar to ETF, this also applies to CBS offload that also should not be influenced by strict mode unless taprio offload would be enabled at the same time. This fully reverts commit d8f45be01dd9 ("igc: Use strict cycles for Qbv scheduling") but its commit message only describes what was already implemented before that commit. The difference to a plain revert of that commit is that it now copes with the base_time = 0 case that was fixed with commit e17090eb2494 ("igc: allow BaseTime 0 enrollment for Qbv") In particular, enabling strict mode leads to TX hang situations under high traffic if taprio is applied WITHOUT taprio offload but WITH ETF offload, e.g. as in sudo tc qdisc replace dev enp1s0 parent root handle 100 taprio \ num_tc 1 \ map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \ queues 1@0 \ base-time 0 \ sched-entry S 01 300000 \ flags 0x1 \ txtime-delay 500000 \ clockid CLOCK_TAI sudo tc qdisc replace dev enp1s0 parent 100:1 etf \ clockid CLOCK_TAI \ delta 500000 \ offload \ skip_sock_check and traffic generator sudo trafgen -i traffic.cfg -o enp1s0 --cpp -n0 -q -t1400ns with traffic.cfg #define ETH_P_IP 0x0800 { /* Ethernet Header */ 0x30, 0x1f, 0x9a, 0xd0, 0xf0, 0x0e, # MAC Dest - adapt as needed 0x24, 0x5e, 0xbe, 0x57, 0x2e, 0x36, # MAC Src - adapt as needed const16(ETH_P_IP), /* IPv4 Header */ 0b01000101, 0, # IPv4 version, IHL, TOS const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header)) const16(2), # IPv4 ident 0b01000000, 0, # IPv4 flags, fragmentation off 64, # IPv4 TTL 17, # Protocol UDP csumip(14, 33), # IPv4 checksum /* UDP Header */ 10, 0, 48, 1, # IP Src - adapt as needed 10, 0, 48, 10, # IP Dest - adapt as needed const16(5555), # UDP Src Port const16(6666), # UDP Dest Port const16(1008), # UDP length (UDP header 8 bytes + payload length) csumudp(14, 34), # UDP checksum /* Payload */ fill('W', 1000), } and the observed message with that is for example igc 0000:01:00.0 enp1s0: Detected Tx Unit Hang Tx Queue <0> TDH <d0> TDT <f0> next_to_use <f0> next_to_clean <d0> buffer_info[next_to_clean] time_stamp <ffff661f> next_to_watch <00000000245a4efb> jiffies <ffff6e48> desc.status <1048000> Fixes: d8f45be01dd9 ("igc: Use strict cycles for Qbv scheduling") Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-10igc: Handle already enabled taprio offload for basetime 0Florian Kauer
Since commit e17090eb2494 ("igc: allow BaseTime 0 enrollment for Qbv") it is possible to enable taprio offload with a basetime of 0. However, the check if taprio offload is already enabled (and thus -EALREADY should be returned for igc_save_qbv_schedule) still relied on adapter->base_time > 0. This can be reproduced as follows: # TAPRIO offload (flags == 0x2) and base-time = 0 sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \ num_tc 1 \ map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \ queues 1@0 \ base-time 0 \ sched-entry S 01 300000 \ flags 0x2 # The second call should fail with "Error: Device failed to setup taprio offload." # But that only happens if base-time was != 0 sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \ num_tc 1 \ map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \ queues 1@0 \ base-time 0 \ sched-entry S 01 300000 \ flags 0x2 Fixes: e17090eb2494 ("igc: allow BaseTime 0 enrollment for Qbv") Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-10igc: Do not enable taprio offload for invalid argumentsFlorian Kauer
Only set adapter->taprio_offload_enable after validating the arguments. Otherwise, it stays set even if the offload was not enabled. Since the subsequent code does not get executed in case of invalid arguments, it will not be read at first. However, by activating and then deactivating another offload (e.g. ETF/TX launchtime offload), taprio_offload_enable is read and erroneously keeps the offload feature of the NIC enabled. This can be reproduced as follows: # TAPRIO offload (flags == 0x2) and negative base-time leading to expected -ERANGE sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \ num_tc 1 \ map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \ queues 1@0 \ base-time -1000 \ sched-entry S 01 300000 \ flags 0x2 # IGC_TQAVCTRL is 0x0 as expected (iomem=relaxed for reading register) sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w*1 # Activate ETF offload sudo tc qdisc replace dev enp1s0 parent root handle 6666 mqprio \ num_tc 3 \ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \ queues 1@0 1@1 2@2 \ hw 0 sudo tc qdisc add dev enp1s0 parent 6666:1 etf \ clockid CLOCK_TAI \ delta 500000 \ offload # IGC_TQAVCTRL is 0x9 as expected sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w*1 # Deactivate ETF offload again sudo tc qdisc delete dev enp1s0 parent 6666:1 # IGC_TQAVCTRL should now be 0x0 again, but is observed as 0x9 sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w*1 Fixes: e17090eb2494 ("igc: allow BaseTime 0 enrollment for Qbv") Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-10igc: Rename qbv_enable to taprio_offload_enableFlorian Kauer
In the current implementation the flags adapter->qbv_enable and IGC_FLAG_TSN_QBV_ENABLED have a similar name, but do not have the same meaning. The first one is used only to indicate taprio offload (i.e. when igc_save_qbv_schedule was called), while the second one corresponds to the Qbv mode of the hardware. However, the second one is also used to support the TX launchtime feature, i.e. ETF qdisc offload. This leads to situations where adapter->qbv_enable is false, but the flag IGC_FLAG_TSN_QBV_ENABLED is set. This is prone to confusion. The rename should reduce this confusion. Since it is a pure rename, it has no impact on functionality. Fixes: e17090eb2494 ("igc: allow BaseTime 0 enrollment for Qbv") Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-05igc: Handle PPS start time programming for past time valuesAravindhan Gunasekaran
I225/6 hardware can be programmed to start PPS output once the time in Target Time registers is reached. The time programmed in these registers should always be into future. Only then PPS output is triggered when SYSTIM register reaches the programmed value. There are two modes in i225/6 hardware to program PPS, pulse and clock mode. There were issues reported where PPS is not generated when start time is in past. Example 1, "echo 0 0 0 2 0 > /sys/class/ptp/ptp0/period" In the current implementation, a value of '0' is programmed into Target time registers and PPS output is in pulse mode. Eventually an interrupt which is triggered upon SYSTIM register reaching Target time is not fired. Thus no PPS output is generated. Example 2, "echo 0 0 0 1 0 > /sys/class/ptp/ptp0/period" Above case, a value of '0' is programmed into Target time registers and PPS output is in clock mode. Here, HW tries to catch-up the current time by incrementing Target Time register. This catch-up time seem to vary according to programmed PPS period time as per the HW design. In my experiments, the delay ranged between few tens of seconds to few minutes. The PPS output is only generated after the Target time register reaches current time. In my experiments, I also observed PPS stopped working with below test and could not recover until module is removed and loaded again. 1) echo 0 <future time> 0 1 0 > /sys/class/ptp/ptp1/period 2) echo 0 0 0 1 0 > /sys/class/ptp/ptp1/period 3) echo 0 0 0 1 0 > /sys/class/ptp/ptp1/period After this PPS did not work even if i re-program with proper values. I could only get this back working by reloading the driver. This patch takes care of calculating and programming appropriate future time value into Target Time registers. Fixes: 5e91c72e560c ("igc: Fix PPS delta between two synchronized end-points") Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com> Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-05igc: Include the length/type field and VLAN tag in queueMaxSDUTan Tee Min
IEEE 802.1Q does not have clear definitions of what constitutes an SDU (Service Data Unit), but IEEE Std 802.3 clause 3.1.2 does define the MAC service primitives and clause 3.2.7 does define the MAC Client Data for Q-tagged frames. It shows that the mac_service_data_unit (MSDU) does NOT contain the preamble, destination and source address, or FCS. The MSDU does contain the length/type field, MAC client data, VLAN tag and any padding data (prior to the FCS). Thus, the maximum 802.3 frame size that is allowed to be transmitted should be QueueMaxSDU (MSDU) + 16 (6 byte SA + 6 byte DA + 4 byte FCS). Fixes: 92a0dcb8427d ("igc: offload queue max SDU from tc-taprio") Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-05igc: set TP bit in 'supported' and 'advertising' fields of ↵Prasad Koya
ethtool_link_ksettings set TP bit in the 'supported' and 'advertising' fields. i225/226 parts only support twisted pair copper. Fixes: 8c5ad0dae93c ("igc: Add ethtool support") Signed-off-by: Prasad Koya <prasad@arista.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-05igc: Fix TX Hang issue when QBV Gate is closedMuhammad Husaini Zulkifli
If a user schedules a Gate Control List (GCL) to close one of the QBV gates while also transmitting a packet to that closed gate, TX Hang will be happen. HW would not drop any packet when the gate is closed and keep queuing up in HW TX FIFO until the gate is re-opened. This patch implements the solution to drop the packet for the closed gate. This patch will also reset the adapter to perform SW initialization for each 1st Gate Control List (GCL) to avoid hang. This is due to the HW design, where changing to TSN transmit mode requires SW initialization. Intel Discrete I225/6 transmit mode cannot be changed when in dynamic mode according to Software User Manual Section 7.5.2.1. Subsequent Gate Control List (GCL) operations will proceed without a reset, as they already are in TSN Mode. Step to reproduce: DUT: 1) Configure GCL List with certain gate close. BASE=$(date +%s%N) tc qdisc replace dev $IFACE parent root handle 100 taprio \ num_tc 4 \ map 0 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 \ queues 1@0 1@1 1@2 1@3 \ base-time $BASE \ sched-entry S 0x8 500000 \ sched-entry S 0x4 500000 \ flags 0x2 2) Transmit the packet to closed gate. You may use udp_tai application to transmit UDP packet to any of the closed gate. ./udp_tai -i <interface> -P 100000 -p 90 -c 1 -t <0/1> -u 30004 Fixes: ec50a9d437f0 ("igc: Add support for taprio offloading") Co-developed-by: Tan Tee Min <tee.min.tan@linux.intel.com> Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Tested-by: Chwee Lin Choong <chwee.lin.choong@intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-05igc: Remove delay during TX ring configurationMuhammad Husaini Zulkifli
Remove unnecessary delay during the TX ring configuration. This will cause delay, especially during link down and link up activity. Furthermore, old SKUs like as I225 will call the reset_adapter to reset the controller during TSN mode Gate Control List (GCL) setting. This will add more time to the configuration of the real-time use case. It doesn't mentioned about this delay in the Software User Manual. It might have been ported from legacy code I210 in the past. Fixes: 13b5b7fd6a4a ("igc: Add support for Tx/Rx rings") Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-05igc: Add condition for qbv_config_change_errors counterMuhammad Husaini Zulkifli
Add condition to increase the qbv counter during taprio qbv configuration only. There might be a case when TC already been setup then user configure the ETF/CBS qdisc and this counter will increase if no condition above. Fixes: ae4fe4698300 ("igc: Add qbv_config_change_errors counter") Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Merge in late fixes to prepare for the 6.5 net-next PR. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-22igc: Work around HW bug causing missing timestampsVinicius Costa Gomes
There's an hardware issue that can cause missing timestamps. The bug is that the interrupt is only cleared if the IGC_TXSTMPH_0 register is read. The bug can cause a race condition if a timestamp is captured at the wrong time, and we will miss that timestamp. To reduce the time window that the problem is able to happen, in case no timestamp was ready, we read the "previous" value of the timestamp registers, and we compare with the "current" one, if it didn't change we can be reasonably sure that no timestamp was captured. If they are different, we use the new value as the captured timestamp. The HW bug is not easy to reproduce, got to reproduce it when smashing the NIC with timestamping requests from multiple applications (e.g. multiple ntpperf instances + ptp4l), after 10s of minutes. This workaround has more impact when multiple timestamp registers are used, and the IGC_TXSTMPH_0 register always need to be read, so the interrupt is cleared. Fixes: 2c344ae24501 ("igc: Add support for TX timestamping") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-22igc: Retrieve TX timestamp during interrupt handlingVinicius Costa Gomes
When the interrupt is handled, the TXTT_0 bit in the TSYNCTXCTL register should already be set and the timestamp value already loaded in the appropriate register. This simplifies the handling, and reduces the latency for retrieving the TX timestamp, which increase the amount of TX timestamps that can be handled in a given time period. As the "work" function doesn't run in a workqueue anymore, rename it to something more sensible, a event handler. Using ntpperf[1] we can see the following performance improvements: Before: $ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o -37 | responses | TX timestamp offset (ns) rate clients | lost invalid basic xleave | min mean max stddev 1000 100 0.00% 0.00% 0.00% 100.00% -56 +9 +52 19 1500 150 0.00% 0.00% 0.00% 100.00% -40 +30 +75 22 2250 225 0.00% 0.00% 0.00% 100.00% -11 +29 +72 15 3375 337 0.00% 0.00% 0.00% 100.00% -18 +40 +88 22 5062 506 0.00% 0.00% 0.00% 100.00% -19 +23 +77 15 7593 759 0.00% 0.00% 0.00% 100.00% +7 +47 +5168 43 11389 1138 0.00% 0.00% 0.00% 100.00% -11 +41 +5240 39 17083 1708 0.00% 0.00% 0.00% 100.00% +19 +60 +5288 50 25624 2562 0.00% 0.00% 0.00% 100.00% +1 +56 +5368 58 38436 3843 0.00% 0.00% 0.00% 100.00% -84 +12 +8847 66 57654 5765 0.00% 0.00% 100.00% 0.00% 86481 8648 0.00% 0.00% 100.00% 0.00% 129721 12972 0.00% 0.00% 100.00% 0.00% 194581 16384 0.00% 0.00% 100.00% 0.00% 291871 16384 27.35% 0.00% 72.65% 0.00% 437806 16384 50.05% 0.00% 49.95% 0.00% After: $ sudo ./ntpperf -i enp3s0 -m 10:22:22:22:22:21 -d 192.168.1.3 -s 172.18.0.0/16 -I -H -o -37 | responses | TX timestamp offset (ns) rate clients | lost invalid basic xleave | min mean max stddev 1000 100 0.00% 0.00% 0.00% 100.00% -44 +0 +61 19 1500 150 0.00% 0.00% 0.00% 100.00% -6 +39 +81 16 2250 225 0.00% 0.00% 0.00% 100.00% -22 +25 +69 15 3375 337 0.00% 0.00% 0.00% 100.00% -28 +15 +56 14 5062 506 0.00% 0.00% 0.00% 100.00% +7 +78 +143 27 7593 759 0.00% 0.00% 0.00% 100.00% -54 +24 +144 47 11389 1138 0.00% 0.00% 0.00% 100.00% -90 -33 +28 21 17083 1708 0.00% 0.00% 0.00% 100.00% -50 -2 +35 14 25624 2562 0.00% 0.00% 0.00% 100.00% -62 +7 +66 23 38436 3843 0.00% 0.00% 0.00% 100.00% -33 +30 +5395 36 57654 5765 0.00% 0.00% 100.00% 0.00% 86481 8648 0.00% 0.00% 100.00% 0.00% 129721 12972 0.00% 0.00% 100.00% 0.00% 194581 16384 19.50% 0.00% 80.50% 0.00% 291871 16384 35.81% 0.00% 64.19% 0.00% 437806 16384 55.40% 0.00% 44.60% 0.00% [1] https://github.com/mlichvar/ntpperf Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-22igc: Check if hardware TX timestamping is enabled earlierVinicius Costa Gomes
Before requesting a packet transmission to be hardware timestamped, check if the user has TX timestamping enabled. Fixes an issue that if a packet was internally forwarded to the NIC, and it had the SKBTX_HW_TSTAMP flag set, the driver would mark that timestamp as skipped. In reality, that timestamp was "not for us", as TX timestamp could never be enabled in the NIC. Checking if the TX timestamping is enabled earlier has a secondary effect that when TX timestamping is disabled, there's no need to check for timestamp timeouts. We should only take care to free any pending timestamp when TX timestamping is disabled, as that skb would never be released otherwise. Fixes: 2c344ae24501 ("igc: Add support for TX timestamping") Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>