diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2025-01-22 08:28:57 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2025-01-22 08:28:57 -0800 |
commit | 0ad9617c78acbc71373fb341a6f75d4012b01d69 (patch) | |
tree | 602d7c9ec86d9a4891a96a2996af6e4368a647eb /drivers/net/ethernet/intel/ice/ice_main.c | |
parent | 5f537664e705b0bf8b7e329861f20128534f6a83 (diff) | |
parent | cf33d96f50903214226b379b3f10d1f262dae018 (diff) |
Merge tag 'net-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"This is slightly smaller than usual, with the most interesting work
being still around RTNL scope reduction.
Core:
- More core refactoring to reduce the RTNL lock contention, including
preparatory work for the per-network namespace RTNL lock, replacing
RTNL lock with a per device-one to protect NAPI-related net device
data and moving synchronize_net() calls outside such lock.
- Extend drop reasons usage, adding net scheduler, AF_UNIX, bridge
and more specific TCP coverage.
- Reduce network namespace tear-down time by removing per-subsystems
synchronize_net() in tipc and sched.
- Add flow label selector support for fib rules, allowing traffic
redirection based on such header field.
Netfilter:
- Do not remove netdev basechain when last device is gone, allowing
netdev basechains without devices.
- Revisit the flowtable teardown strategy, dealing better with fin,
reset and re-open events.
- Scale-up IP-vs connection dumping by avoiding linear search on each
restart.
Protocols:
- A significant XDP socket refactor, consolidating and optimizing
several helpers into the core
- Better scaling of ICMP rate-limiting, by removing false-sharing in
inet peers handling.
- Introduces netlink notifications for multicast IPv4 and IPv6
address changes.
- Add ipsec support for IP-TFS/AggFrag encapsulation, allowing
aggregation and fragmentation of the inner IP.
- Add sysctl to configure TIME-WAIT reuse delay for TCP sockets, to
avoid local port exhaustion issues when the average connection
lifetime is very short.
- Support updating keys (re-keying) for connections using kernel TLS
(for TLS 1.3 only).
- Support ipv4-mapped ipv6 address clients in smc-r v2.
- Add support for jumbo data packet transmission in RxRPC sockets,
gluing multiple data packets in a single UDP packet.
- Support RxRPC RACK-TLP to manage packet loss and retransmission in
conjunction with the congestion control algorithm.
Driver API:
- Introduce a unified and structured interface for reporting PHY
statistics, exposing consistent data across different H/W via
ethtool.
- Make timestamping selectable, allow the user to select the desired
hwtstamp provider (PHY or MAC) administratively.
- Add support for configuring a header-data-split threshold (HDS)
value via ethtool, to deal with partial or buggy H/W
implementation.
- Consolidate DSA drivers Energy Efficiency Ethernet support.
- Add EEE management to phylink, making use of the phylib
implementation.
- Add phylib support for in-band capabilities negotiation.
- Simplify how phylib-enabled mac drivers expose the supported
interfaces.
Tests and tooling:
- Make the YNL tool package-friendly to make it easier to deploy it
separately from the kernel.
- Increase TCP selftest coverage importing several packetdrill
test-cases.
- Regenerate the ethtool uapi header from the YNL spec, to ease
maintenance and future development.
- Add YNL support for decoding the link types used in net self-tests,
allowing a single build to run both net and drivers/net.
Drivers:
- Ethernet high-speed NICs:
- nVidia/Mellanox (mlx5):
- add cross E-Switch QoS support
- add SW Steering support for ConnectX-8
- implement support for HW-Managed Flow Steering, improving the
rule deletion/insertion rate
- support for multi-host LAG
- Intel (ixgbe, ice, igb):
- ice: add support for devlink health events
- ixgbe: add initial support for E610 chipset variant
- igb: add support for AF_XDP zero-copy
- Meta:
- add support for basic RSS config
- allow changing the number of channels
- add hardware monitoring support
- Broadcom (bnxt):
- implement TCP data split and HDS threshold ethtool support,
enabling Device Memory TCP.
- Marvell Octeon:
- implement egress ipsec offload support for the cn10k family
- Hisilicon (HIBMC):
- implement unicast MAC filtering
- Ethernet NICs embedded and virtual:
- Convert UDP tunnel drivers to NETDEV_PCPU_STAT_DSTATS, avoiding
contented atomic operations for drop counters
- Freescale:
- quicc: phylink conversion
- enetc: support Tx and Rx checksum offload and improve TSO
performances
- MediaTek:
- airoha: introduce support for ETS and HTB Qdisc offload
- Microchip:
- lan78XX USB: preparation work for phylink conversion
- Synopsys (stmmac):
- support DWMAC IP on NXP Automotive SoCs S32G2xx/S32G3xx/S32R45
- refactor EEE support to leverage the new driver API
- optimize DMA and cache access to increase raw RX performances
by 40%
- TI:
- icssg-prueth: add multicast filtering support for VLAN
interface
- netkit:
- add ability to configure head/tailroom
- VXLAN:
- accepts packets with user-defined reserved bit
- Ethernet switches:
- Microchip:
- lan969x: add RGMII support
- lan969x: improve TX and RX performance using the FDMA engine
- nVidia/Mellanox:
- move Tx header handling to PCI driver, to ease XDP support
- Ethernet PHYs:
- Texas Instruments DP83822:
- add support for GPIO2 clock output
- Realtek:
- 8169: add support for RTL8125D rev.b
- rtl822x: add hwmon support for the temperature sensor
- Microchip:
- add support for RDS PTP hardware
- consolidate periodic output signal generation
- CAN:
- several DT-bindings to DT schema conversions
- tcan4x5x:
- add HW standby support
- support nWKRQ voltage selection
- kvaser:
- allowing Bus Error Reporting runtime configuration
- WiFi:
- the on-going Multi-Link Operation (MLO) effort continues,
affecting both the stack and in drivers
- mac80211/cfg80211:
- Emergency Preparedness Communication Services (EPCS) station
mode support
- support for adding and removing station links for MLO
- add support for WiFi 7/EHT mesh over 320 MHz channels
- report Tx power info for each link
- RealTek (rtw88):
- enable USB Rx aggregation and USB 3 to improve performance
- LED support
- RealTek (rtw89):
- refactor power save to support Multi-Link Operations
- add support for RTL8922AE-VS variant
- MediaTek (mt76):
- single wiphy multiband support (preparation for MLO)
- p2p device support
- add TP-Link TXE50UH USB adapter support
- Qualcomm (ath10k):
- support for the QCA6698AQ IP core
- Qualcomm (ath12k):
- enable MLO for QCN9274
- Bluetooth:
- Allow sysfs to trigger hdev reset, to allow recovering devices
not responsive from user-space
- MediaTek: add support for MT7922, MT7925, MT7921e devices
- Realtek: add support for RTL8851BE devices
- Qualcomm: add support for WCN785x devices
- ISO: allow BIG re-sync"
* tag 'net-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1386 commits)
net/rose: prevent integer overflows in rose_setsockopt()
net: phylink: fix regression when binding a PHY
net: ethernet: ti: am65-cpsw: streamline TX queue creation and cleanup
net: ethernet: ti: am65-cpsw: streamline RX queue creation and cleanup
net: ethernet: ti: am65-cpsw: ensure proper channel cleanup in error path
ipv6: Convert inet6_rtm_deladdr() to per-netns RTNL.
ipv6: Convert inet6_rtm_newaddr() to per-netns RTNL.
ipv6: Move lifetime validation to inet6_rtm_newaddr().
ipv6: Set cfg.ifa_flags before device lookup in inet6_rtm_newaddr().
ipv6: Pass dev to inet6_addr_add().
ipv6: Convert inet6_ioctl() to per-netns RTNL.
ipv6: Hold rtnl_net_lock() in addrconf_init() and addrconf_cleanup().
ipv6: Hold rtnl_net_lock() in addrconf_dad_work().
ipv6: Hold rtnl_net_lock() in addrconf_verify_work().
ipv6: Convert net.ipv6.conf.${DEV}.XXX sysctl to per-netns RTNL.
ipv6: Add __in6_dev_get_rtnl_net().
net: stmmac: Drop redundant skb_mark_for_recycle() for SKB frags
net: mii: Fix the Speed display when the network cable is not connected
sysctl net: Remove macro checks for CONFIG_SYSCTL
eth: bnxt: update header sizing defaults
...
Diffstat (limited to 'drivers/net/ethernet/intel/ice/ice_main.c')
-rw-r--r-- | drivers/net/ethernet/intel/ice/ice_main.c | 170 |
1 files changed, 102 insertions, 68 deletions
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 89fa3d53d317..c3a0fb97c5ee 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -14,7 +14,7 @@ #include "ice_dcb_lib.h" #include "ice_dcb_nl.h" #include "devlink/devlink.h" -#include "devlink/devlink_port.h" +#include "devlink/port.h" #include "ice_sf_eth.h" #include "ice_hwmon.h" /* Including ice_trace.h with CREATE_TRACE_POINTS defined will generate the @@ -1567,6 +1567,9 @@ static int __ice_clean_ctrlq(struct ice_pf *pf, enum ice_ctl_q q_type) case ice_aqc_opc_lldp_set_mib_change: ice_dcb_process_lldp_set_mib_change(pf, &event); break; + case ice_aqc_opc_get_health_status: + ice_process_health_status_event(pf, &event); + break; default: dev_dbg(dev, "%s Receive Queue unknown event 0x%04x ignored\n", qtype, opcode); @@ -1816,6 +1819,8 @@ static void ice_handle_mdd_event(struct ice_pf *pf) if (netif_msg_tx_err(pf)) dev_info(dev, "Malicious Driver Detection event %d on TX queue %d PF# %d VF# %d\n", event, queue, pf_num, vf_num); + ice_report_mdd_event(pf, ICE_MDD_SRC_TX_PQM, pf_num, vf_num, + event, queue); wr32(hw, GL_MDET_TX_PQM, 0xffffffff); } @@ -1829,6 +1834,8 @@ static void ice_handle_mdd_event(struct ice_pf *pf) if (netif_msg_tx_err(pf)) dev_info(dev, "Malicious Driver Detection event %d on TX queue %d PF# %d VF# %d\n", event, queue, pf_num, vf_num); + ice_report_mdd_event(pf, ICE_MDD_SRC_TX_TCLAN, pf_num, vf_num, + event, queue); wr32(hw, GL_MDET_TX_TCLAN_BY_MAC(hw), U32_MAX); } @@ -1842,6 +1849,8 @@ static void ice_handle_mdd_event(struct ice_pf *pf) if (netif_msg_rx_err(pf)) dev_info(dev, "Malicious Driver Detection event %d on RX queue %d PF# %d VF# %d\n", event, queue, pf_num, vf_num); + ice_report_mdd_event(pf, ICE_MDD_SRC_RX, pf_num, vf_num, event, + queue); wr32(hw, GL_MDET_RX, 0xffffffff); } @@ -2355,6 +2364,18 @@ static void ice_check_media_subtask(struct ice_pf *pf) } } +static void ice_service_task_recovery_mode(struct work_struct *work) +{ + struct ice_pf *pf = container_of(work, struct ice_pf, serv_task); + + set_bit(ICE_ADMINQ_EVENT_PENDING, pf->state); + ice_clean_adminq_subtask(pf); + + ice_service_task_complete(pf); + + mod_timer(&pf->serv_tmr, jiffies + msecs_to_jiffies(100)); +} + /** * ice_service_task - manage and run subtasks * @work: pointer to work_struct contained by the PF struct @@ -2364,9 +2385,11 @@ static void ice_service_task(struct work_struct *work) struct ice_pf *pf = container_of(work, struct ice_pf, serv_task); unsigned long start_time = jiffies; - /* subtasks */ + if (pf->health_reporters.tx_hang_buf.tx_ring) { + ice_report_tx_hang(pf); + pf->health_reporters.tx_hang_buf.tx_ring = NULL; + } - /* process reset requests first */ ice_reset_subtask(pf); /* bail if a reset/recovery cycle is pending or rebuild failed */ @@ -4741,55 +4764,12 @@ static void ice_decfg_netdev(struct ice_vsi *vsi) vsi->netdev = NULL; } -/** - * ice_wait_for_fw - wait for full FW readiness - * @hw: pointer to the hardware structure - * @timeout: milliseconds that can elapse before timing out - */ -static int ice_wait_for_fw(struct ice_hw *hw, u32 timeout) -{ - int fw_loading; - u32 elapsed = 0; - - while (elapsed <= timeout) { - fw_loading = rd32(hw, GL_MNG_FWSM) & GL_MNG_FWSM_FW_LOADING_M; - - /* firmware was not yet loaded, we have to wait more */ - if (fw_loading) { - elapsed += 100; - msleep(100); - continue; - } - return 0; - } - - return -ETIMEDOUT; -} - int ice_init_dev(struct ice_pf *pf) { struct device *dev = ice_pf_to_dev(pf); struct ice_hw *hw = &pf->hw; int err; - err = ice_init_hw(hw); - if (err) { - dev_err(dev, "ice_init_hw failed: %d\n", err); - return err; - } - - /* Some cards require longer initialization times - * due to necessity of loading FW from an external source. - * This can take even half a minute. - */ - if (ice_is_pf_c827(hw)) { - err = ice_wait_for_fw(hw, 30000); - if (err) { - dev_err(dev, "ice_wait_for_fw timed out"); - return err; - } - } - ice_init_feature_support(pf); err = ice_init_ddp_config(hw, pf); @@ -4810,7 +4790,7 @@ int ice_init_dev(struct ice_pf *pf) err = ice_init_pf(pf); if (err) { dev_err(dev, "ice_init_pf failed: %d\n", err); - goto err_init_pf; + return err; } pf->hw.udp_tunnel_nic.set_port = ice_udp_tunnel_set_port; @@ -4834,7 +4814,7 @@ int ice_init_dev(struct ice_pf *pf) if (err) { dev_err(dev, "ice_init_interrupt_scheme failed: %d\n", err); err = -EIO; - goto err_init_interrupt_scheme; + goto unroll_pf_init; } /* In case of MSIX we are going to setup the misc vector right here @@ -4845,17 +4825,15 @@ int ice_init_dev(struct ice_pf *pf) err = ice_req_irq_msix_misc(pf); if (err) { dev_err(dev, "setup of misc vector failed: %d\n", err); - goto err_req_irq_msix_misc; + goto unroll_irq_scheme_init; } return 0; -err_req_irq_msix_misc: +unroll_irq_scheme_init: ice_clear_interrupt_scheme(pf); -err_init_interrupt_scheme: +unroll_pf_init: ice_deinit_pf(pf); -err_init_pf: - ice_deinit_hw(hw); return err; } @@ -5087,6 +5065,7 @@ static int ice_init_devlink(struct ice_pf *pf) return err; ice_devlink_init_regions(pf); + ice_health_init(pf); ice_devlink_register(pf); return 0; @@ -5095,6 +5074,7 @@ static int ice_init_devlink(struct ice_pf *pf) static void ice_deinit_devlink(struct ice_pf *pf) { ice_devlink_unregister(pf); + ice_health_deinit(pf); ice_devlink_destroy_regions(pf); ice_devlink_unregister_params(pf); } @@ -5249,6 +5229,36 @@ void ice_unload(struct ice_pf *pf) ice_decfg_netdev(vsi); } +static int ice_probe_recovery_mode(struct ice_pf *pf) +{ + struct device *dev = ice_pf_to_dev(pf); + int err; + + dev_err(dev, "Firmware recovery mode detected. Limiting functionality. Refer to the Intel(R) Ethernet Adapters and Devices User Guide for details on firmware recovery mode\n"); + + INIT_HLIST_HEAD(&pf->aq_wait_list); + spin_lock_init(&pf->aq_wait_lock); + init_waitqueue_head(&pf->aq_wait_queue); + + timer_setup(&pf->serv_tmr, ice_service_timer, 0); + pf->serv_tmr_period = HZ; + INIT_WORK(&pf->serv_task, ice_service_task_recovery_mode); + clear_bit(ICE_SERVICE_SCHED, pf->state); + err = ice_create_all_ctrlq(&pf->hw); + if (err) + return err; + + scoped_guard(devl, priv_to_devlink(pf)) { + err = ice_init_devlink(pf); + if (err) + return err; + } + + ice_service_task_restart(pf); + + return 0; +} + /** * ice_probe - Device initialization routine * @pdev: PCI device information struct @@ -5312,13 +5322,7 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent) } pci_set_master(pdev); - - adapter = ice_adapter_get(pdev); - if (IS_ERR(adapter)) - return PTR_ERR(adapter); - pf->pdev = pdev; - pf->adapter = adapter; pci_set_drvdata(pdev, pf); set_bit(ICE_DOWN, pf->state); /* Disable service task until DOWN bit is cleared */ @@ -5346,29 +5350,47 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent) hw->debug_mask = debug; #endif + if (ice_is_recovery_mode(hw)) + return ice_probe_recovery_mode(pf); + + err = ice_init_hw(hw); + if (err) { + dev_err(dev, "ice_init_hw failed: %d\n", err); + return err; + } + + adapter = ice_adapter_get(pdev); + if (IS_ERR(adapter)) { + err = PTR_ERR(adapter); + goto unroll_hw_init; + } + pf->adapter = adapter; + err = ice_init(pf); if (err) - goto err_init; + goto unroll_adapter; devl_lock(priv_to_devlink(pf)); err = ice_load(pf); if (err) - goto err_load; + goto unroll_init; err = ice_init_devlink(pf); if (err) - goto err_init_devlink; + goto unroll_load; devl_unlock(priv_to_devlink(pf)); return 0; -err_init_devlink: +unroll_load: ice_unload(pf); -err_load: +unroll_init: devl_unlock(priv_to_devlink(pf)); ice_deinit(pf); -err_init: +unroll_adapter: ice_adapter_put(pdev); +unroll_hw_init: + ice_deinit_hw(hw); return err; } @@ -5448,6 +5470,14 @@ static void ice_remove(struct pci_dev *pdev) msleep(100); } + if (ice_is_recovery_mode(&pf->hw)) { + ice_service_task_stop(pf); + scoped_guard(devl, priv_to_devlink(pf)) { + ice_deinit_devlink(pf); + } + return; + } + if (test_bit(ICE_FLAG_SRIOV_ENA, pf->flags)) { set_bit(ICE_VF_RESETS_DISABLED, pf->state); ice_free_vfs(pf); @@ -7793,6 +7823,8 @@ static void ice_rebuild(struct ice_pf *pf, enum ice_reset_req reset_type) /* if we get here, reset flow is successful */ clear_bit(ICE_RESET_FAILED, pf->state); + ice_health_clear(pf); + ice_plug_aux_dev(pf); if (ice_is_feature_supported(pf, ICE_F_SRIOV_LAG)) ice_lag_rebuild(pf); @@ -8283,16 +8315,18 @@ void ice_tx_timeout(struct net_device *netdev, unsigned int txqueue) if (tx_ring) { struct ice_hw *hw = &pf->hw; - u32 head, val = 0; + u32 head, intr = 0; head = FIELD_GET(QTX_COMM_HEAD_HEAD_M, rd32(hw, QTX_COMM_HEAD(vsi->txq_map[txqueue]))); /* Read interrupt register */ - val = rd32(hw, GLINT_DYN_CTL(tx_ring->q_vector->reg_idx)); + intr = rd32(hw, GLINT_DYN_CTL(tx_ring->q_vector->reg_idx)); netdev_info(netdev, "tx_timeout: VSI_num: %d, Q %u, NTC: 0x%x, HW_HEAD: 0x%x, NTU: 0x%x, INT: 0x%x\n", vsi->vsi_num, txqueue, tx_ring->next_to_clean, - head, tx_ring->next_to_use, val); + head, tx_ring->next_to_use, intr); + + ice_prep_tx_hang_report(pf, tx_ring, vsi->vsi_num, head, intr); } pf->tx_timeout_last_recovery = jiffies; |