summaryrefslogtreecommitdiff
path: root/drivers/net
AgeCommit message (Collapse)Author
2023-07-26Revert "wifi: ath11k: Enable threaded NAPI"Kalle Valo
This reverts commit 13aa2fb692d3717767303817f35b3e650109add3. This commit broke QCN9074 initialisation: [ 358.960477] ath11k_pci 0000:04:00.0: ce desc not available for wmi command 36866 [ 358.960481] ath11k_pci 0000:04:00.0: failed to send WMI_STA_POWERSAVE_PARAM_CMDID [ 358.960484] ath11k_pci 0000:04:00.0: could not set uapsd params -105 As there's no fix available let's just revert it to get QCN9074 working again. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217536 Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20230720151444.2016637-1-kvalo@kernel.org
2023-07-25net: ipa: only reset hashed tables when supportedAlex Elder
Last year, the code that manages GSI channel transactions switched from using spinlock-protected linked lists to using indexes into the ring buffer used for a channel. Recently, Google reported seeing transaction reference count underflows occasionally during shutdown. Doug Anderson found a way to reproduce the issue reliably, and bisected the issue to the commit that eliminated the linked lists and the lock. The root cause was ultimately determined to be related to unused transactions being committed as part of the modem shutdown cleanup activity. Unused transactions are not normally expected (except in error cases). The modem uses some ranges of IPA-resident memory, and whenever it shuts down we zero those ranges. In ipa_filter_reset_table() a transaction is allocated to zero modem filter table entries. If hashing is not supported, hashed table memory should not be zeroed. But currently nothing prevents that, and the result is an unused transaction. Something similar occurs when we zero routing table entries for the modem. By preventing any attempt to clear hashed tables when hashing is not supported, the reference count underflow is avoided in this case. Note that there likely remains an issue with properly freeing unused transactions (if they occur due to errors). This patch addresses only the underflows that Google originally reported. Cc: <stable@vger.kernel.org> # 6.1.x Fixes: d338ae28d8a8 ("net: ipa: kill all other transaction lists") Tested-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/20230724224055.1688854-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-25macvlan: add forgotten nla_policy for IFLA_MACVLAN_BC_CUTOFFLin Ma
The previous commit 954d1fa1ac93 ("macvlan: Add netlink attribute for broadcast cutoff") added one additional attribute named IFLA_MACVLAN_BC_CUTOFF to allow broadcast cutfoff. However, it forgot to describe the nla_policy at macvlan_policy (drivers/net/macvlan.c). Hence, this suppose NLA_S32 (4 bytes) integer can be faked as empty (0 bytes) by a malicious user, which could leads to OOB in heap just like CVE-2023-3773. To fix it, this commit just completes the nla_policy description for IFLA_MACVLAN_BC_CUTOFF. This enforces the length check and avoids the potential OOB read. Fixes: 954d1fa1ac93 ("macvlan: Add netlink attribute for broadcast cutoff") Signed-off-by: Lin Ma <linma@zju.edu.cn> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230723080205.3715164-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-25net: stmmac: Apply redundant write work around on 4.xx tooVincent Whitchurch
commit a3a57bf07de23fe1ff779e0fdf710aa581c3ff73 ("net: stmmac: work around sporadic tx issue on link-up") worked around a problem with TX sometimes not working after a link-up by avoiding a redundant write to MAC_CTRL_REG (aka GMAC_CONFIG), since the IP appeared to have problems with handling multiple writes to that register in some cases. That commit however only added the work around to dwmac_lib.c (apart from the common code in stmmac_main.c), but my systems with version 4.21a of the IP exhibit the same problem, so add the work around to dwmac4_lib.c too. Fixes: a3a57bf07de2 ("net: stmmac: work around sporadic tx issue on link-up") Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230721-stmmac-tx-workaround-v1-1-9411cbd5ee07@axis.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-25octeontx2-af: Fix hash extraction enable configurationSuman Ghosh
As of today, hash extraction support is enabled for all the silicons. Because of which we are facing initialization issues when the silicon does not support hash extraction. During creation of the hardware parsing table for IPv6 address, we need to consider if hash extraction is enabled then extract only 32 bit, otherwise 128 bit needs to be extracted. This patch fixes the issue and configures the hardware parser based on the availability of the feature. Fixes: a95ab93550d3 ("octeontx2-af: Use hashed field in MCAM key") Signed-off-by: Suman Ghosh <sumang@marvell.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230721061222.2632521-1-sumang@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-25team: reset team's flags when down link is P2P deviceHangbin Liu
When adding a point to point downlink to team device, we neglected to reset the team's flags, which were still using flags like BROADCAST and MULTICAST. Consequently, this would initiate ARP/DAD for P2P downlink interfaces, such as when adding a GRE device to team device. Fix this by remove multicast/broadcast flags and add p2p and noarp flags. After removing the none ethernet interface and adding an ethernet interface to team, we need to reset team interface flags. Unlike bonding interface, team do not need restore IFF_MASTER, IFF_SLAVE flags. Reported-by: Liang Li <liali@redhat.com> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2221438 Fixes: 1d76efe1577b ("team: add support for non-ethernet devices") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-25bonding: reset bond's flags when down link is P2P deviceHangbin Liu
When adding a point to point downlink to the bond, we neglected to reset the bond's flags, which were still using flags like BROADCAST and MULTICAST. Consequently, this would initiate ARP/DAD for P2P downlink interfaces, such as when adding a GRE device to the bonding. To address this issue, let's reset the bond's flags for P2P interfaces. Before fix: 7: gre0@NONE: <POINTOPOINT,NOARP,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UNKNOWN group default qlen 1000 link/gre6 2006:70:10::1 peer 2006:70:10::2 permaddr 167f:18:f188:: 8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/gre6 2006:70:10::1 brd 2006:70:10::2 inet6 fe80::200:ff:fe00:0/64 scope link valid_lft forever preferred_lft forever After fix: 7: gre0@NONE: <POINTOPOINT,NOARP,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond2 state UNKNOWN group default qlen 1000 link/gre6 2006:70:10::1 peer 2006:70:10::2 permaddr c29e:557a:e9d9:: 8: bond0: <POINTOPOINT,NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/gre6 2006:70:10::1 peer 2006:70:10::2 inet6 fe80::1/64 scope link valid_lft forever preferred_lft forever Reported-by: Liang Li <liali@redhat.com> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2221438 Fixes: 872254dd6b1f ("net/bonding: Enable bonding to enslave non ARPHRD_ETHER") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-24Merge branch '40GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-07-21 (i40e, iavf) This series contains updates to i40e and iavf drivers. Wang Ming corrects an error check on i40e. Jake unlocks crit_lock on allocation failure to prevent deadlock and stops re-enabling of interrupts when it's not intended for iavf. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: iavf: check for removal state before IAVF_FLAG_PF_COMMS_FAILED iavf: fix potential deadlock on allocation failure i40e: Fix an NULL vs IS_ERR() bug for debugfs_create_dir() ==================== Link: https://lore.kernel.org/r/20230721155812.1292752-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-24Merge tag 'linux-can-fixes-for-6.5-20230724' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2023-07-24 The first patch is by me and adds a missing set of CAN state to CAN_STATE_STOPPED on close in the gs_usb driver. The last patch is by Eric Dumazet and fixes a lockdep issue in the CAN raw protocol. * tag 'linux-can-fixes-for-6.5-20230724' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: raw: fix lockdep issue in raw_release() can: gs_usb: gs_can_close(): add missing set of CAN state to CAN_STATE_STOPPED ==================== Link: https://lore.kernel.org/r/20230724150141.766047-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-24ice: Fix memory management in ice_ethtool_fdir.cJedrzej Jagielski
Fix ethtool FDIR logic to not use memory after its release. In the ice_ethtool_fdir.c file there are 2 spots where code can refer to pointers which may be missing. In the ice_cfg_fdir_xtrct_seq() function seg may be freed but even then may be still used by memcpy(&tun_seg[1], seg, sizeof(*seg)). In the ice_add_fdir_ethtool() function struct ice_fdir_fltr *input may first fail to be added via ice_fdir_update_list_entry() but then may be deleted by ice_fdir_update_list_entry. Terminate in both cases when the returned value of the previous operation is other than 0, free memory and don't use it anymore. Reported-by: Michal Schmidt <mschmidt@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2208423 Fixes: cac2a27cd9ab ("ice: Support IPv4 Flow Director filters") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230721155854.1292805-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-24net: fec: avoid tx queue timeout when XDP is enabledWei Fang
According to the implementation of XDP of FEC driver, the XDP path shares the transmit queues with the kernel network stack, so it is possible to lead to a tx timeout event when XDP uses the tx queue pretty much exclusively. And this event will cause the reset of the FEC hardware. To avoid timeout in this case, we use the txq_trans_cond_update() interface to update txq->trans_start to jiffies so that watchdog won't generate a transmit timeout warning. Fixes: 6d6b39f180b8 ("net: fec: add initial XDP support") Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://lore.kernel.org/r/20230721083559.2857312-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-24ethernet: atheros: fix return value check in atl1e_tso_csum()Yuanjun Gong
in atl1e_tso_csum, it should check the return value of pskb_trim(), and return an error code if an unexpected value is returned by pskb_trim(). Fixes: a6a5325239c2 ("atl1e: Atheros L1E Gigabit Ethernet driver") Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230720144219.39285-1-ruc_gongyuanjun@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-24atheros: fix return value check in atl1_tso()Yuanjun Gong
in atl1_tso(), it should check the return value of pskb_trim(), and return an error code if an unexpected value is returned by pskb_trim(). Fixes: 401c0aabec4b ("atl1: simplify tx packet descriptor") Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com> Link: https://lore.kernel.org/r/20230722142511.12448-1-ruc_gongyuanjun@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-24wifi: mt76: mt7615: do not advertise 5 GHz on first phy of MT7615D (DBDC)Paul Fertser
On DBDC devices the first (internal) phy is only capable of using 2.4 GHz band, and the 5 GHz band is exposed via a separate phy object, so avoid the false advertising. Reported-by: Rani Hod <rani.hod@gmail.com> Closes: https://github.com/openwrt/openwrt/pull/12361 Fixes: 7660a1bd0c22 ("mt76: mt7615: register ext_phy if DBDC is detected") Cc: stable@vger.kernel.org Signed-off-by: Paul Fertser <fercerpav@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20230605073408.8699-1-fercerpav@gmail.com
2023-07-24vxlan: fix GRO with VXLAN-GPEJiri Benc
In VXLAN-GPE, there may not be an Ethernet header following the VXLAN header. But in GRO, the vxlan driver calls eth_gro_receive unconditionally, which means the following header is incorrectly parsed as Ethernet. Introduce GPE specific GRO handling. For better performance, do not check for GPE during GRO but rather install a different set of functions at setup time. Fixes: e1e5314de08ba ("vxlan: implement GPE") Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-24vxlan: generalize vxlan_parse_gpe_hdr and remove unused argsJiri Benc
The vxlan_parse_gpe_hdr function extracts the next protocol value from the GPE header and marks GPE bits as parsed. In order to be used in the next patch, split the function into protocol extraction and bit marking. The bit marking is meaningful only in vxlan_rcv; move it directly there. Rename the function to vxlan_parse_gpe_proto to reflect what it now does. Remove unused arguments skb and vxflags. Move the function earlier in the file to allow it to be called from more places in the next patch. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-24ethernet: atheros: fix return value check in atl1c_tso_csum()Yuanjun Gong
in atl1c_tso_csum, it should check the return value of pskb_trim(), and return an error code if an unexpected value is returned by pskb_trim(). Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-24vxlan: calculate correct header length for GPEJiri Benc
VXLAN-GPE does not add an extra inner Ethernet header. Take that into account when calculating header length. This causes problems in skb_tunnel_check_pmtu, where incorrect PMTU is cached. In the collect_md mode (which is the only mode that VXLAN-GPE supports), there's no magic auto-setting of the tunnel interface MTU. It can't be, since the destination and thus the underlying interface may be different for each packet. So, the administrator is responsible for setting the correct tunnel interface MTU. Apparently, the administrators are capable enough to calculate that the maximum MTU for VXLAN-GPE is (their_lower_MTU - 36). They set the tunnel interface MTU to 1464. If you run a TCP stream over such interface, it's then segmented according to the MTU 1464, i.e. producing 1514 bytes frames. Which is okay, this still fits the lower MTU. However, skb_tunnel_check_pmtu (called from vxlan_xmit_one) uses 50 as the header size and thus incorrectly calculates the frame size to be 1528. This leads to ICMP too big message being generated (locally), PMTU of 1450 to be cached and the TCP stream to be resegmented. The fix is to use the correct actual header size, especially for skb_tunnel_check_pmtu calculation. Fixes: e1e5314de08ba ("vxlan: implement GPE") Signed-off-by: Jiri Benc <jbenc@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-24net: hns3: fix wrong bw weight of disabled tc issueJijie Shao
In dwrr mode, the default bandwidth weight of disabled tc is set to 0. If the bandwidth weight is 0, the mode will change to sp. Therefore, disabled tc default bandwidth weight need changed to 1, and 0 is returned when query the bandwidth weight of disabled tc. In addition, driver need stop configure bandwidth weight if tc is disabled. Fixes: 848440544b41 ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-24net: hns3: fix wrong tc bandwidth weight data issueJijie Shao
Currently, the weight saved by the driver is used as the query result, which may be different from the actual weight in the register. Therefore, the register value read from the firmware is used as the query result Fixes: 0e32038dc856 ("net: hns3: refactor dump tc of debugfs") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-24net: hns3: add tm flush when setting tmHao Lan
When the tm module is configured with traffic, traffic may be abnormal. This patch fixes this problem. Before the tm module is configured, traffic processing should be stopped. After the tm module is configured, traffic processing is enabled. Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-24net: hns3: fix the imp capability bit cannot exceed 32 bits issueHao Lan
Current only the first 32 bits of the capability flag bit are considered. When the matching capability flag bit is greater than 31 bits, it will get an error bit.This patch use bitmap to solve this issue. It can handle each capability bit whitout bit width limit. Fixes: da77aef9cc58 ("net: hns3: create common cmdq resource allocate/free/query APIs") Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-23net: phy: marvell10g: fix 88x3310 power upJiawen Wu
Clear MV_V2_PORT_CTRL_PWRDOWN bit to set power up for 88x3310 PHY, it sometimes does not take effect immediately. And a read of this register causes the bit not to clear. This will cause mv3310_reset() to time out, which will fail the config initialization. So add a delay before the next access. Fixes: c9cc1c815d36 ("net: phy: marvell10g: place in powersave mode at probe") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21iavf: check for removal state before IAVF_FLAG_PF_COMMS_FAILEDJacob Keller
In iavf_adminq_task(), if the function can't acquire the adapter->crit_lock, it checks if the driver is removing. If so, it simply exits without re-enabling the interrupt. This is done to ensure that the task stops processing as soon as possible once the driver is being removed. However, if the IAVF_FLAG_PF_COMMS_FAILED is set, the function checks this before attempting to acquire the lock. In this case, the function exits early and re-enables the interrupt. This will happen even if the driver is already removing. Avoid this, by moving the check to after the adapter->crit_lock is acquired. This way, if the driver is removing, we will not re-enable the interrupt. Fixes: fc2e6b3b132a ("iavf: Rework mutexes for better synchronisation") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-21iavf: fix potential deadlock on allocation failureJacob Keller
In iavf_adminq_task(), if kzalloc() fails to allocate the event.msg_buf, the function will exit without releasing the adapter->crit_lock. This is unlikely, but if it happens, the next access to that mutex will deadlock. Fix this by moving the unlock to the end of the function, and adding a new label to allow jumping to the unlock portion of the function exit flow. Fixes: fc2e6b3b132a ("iavf: Rework mutexes for better synchronisation") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-21i40e: Fix an NULL vs IS_ERR() bug for debugfs_create_dir()Wang Ming
The debugfs_create_dir() function returns error pointers. It never returns NULL. Most incorrect error checks were fixed, but the one in i40e_dbg_init() was forgotten. Fix the remaining error check. Fixes: 02e9c290814c ("i40e: debugfs interface") Signed-off-by: Wang Ming <machel@vivo.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-20net: phy: prevent stale pointer dereference in phy_init()Vladimir Oltean
mdio_bus_init() and phy_driver_register() both have error paths, and if those are ever hit, ethtool will have a stale pointer to the phy_ethtool_phy_ops stub structure, which references memory from a module that failed to load (phylib). It is probably hard to force an error in this code path even manually, but the error teardown path of phy_init() should be the same as phy_exit(), which is now simply not the case. Fixes: 55d8f053ce1b ("net: phy: Register ethtool PHY operations") Link: https://lore.kernel.org/netdev/ZLaiJ4G6TaJYGJyU@shell.armlinux.org.uk/ Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20230720000231.1939689-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-20can: gs_usb: gs_can_close(): add missing set of CAN state to CAN_STATE_STOPPEDMarc Kleine-Budde
After an initial link up the CAN device is in ERROR-ACTIVE mode. Due to a missing CAN_STATE_STOPPED in gs_can_close() it doesn't change to STOPPED after a link down: | ip link set dev can0 up | ip link set dev can0 down | ip --details link show can0 | 13: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 10 | link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0 | can state ERROR-ACTIVE restart-ms 1000 Add missing assignment of CAN_STATE_STOPPED in gs_can_close(). Cc: stable@vger.kernel.org Fixes: d08e973a77d1 ("can: gs_usb: Added support for the GS_USB CAN devices") Link: https://lore.kernel.org/all/20230718-gs_usb-fix-can-state-v1-1-f19738ae2c23@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-07-19net: ethernet: mtk_eth_soc: always mtk_get_ib1_pkt_typeDaniel Golle
entries and bind debugfs files would display wrong data on NETSYS_V2 and later because instead of using mtk_get_ib1_pkt_type the driver would use MTK_FOE_IB1_PACKET_TYPE which corresponds to NETSYS_V1(.x) SoCs. Use mtk_get_ib1_pkt_type so entries and bind records display correctly. Fixes: 03a3180e5c09e ("net: ethernet: mtk_eth_soc: introduce flow offloading support for mt7986") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/c0ae03d0182f4d27b874cbdf0059bc972c317f3c.1689727134.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-19Revert "r8169: disable ASPM during NAPI poll"Heiner Kallweit
This reverts commit e1ed3e4d91112027b90c7ee61479141b3f948e6a. Turned out the change causes a performance regression. Link: https://lore.kernel.org/netdev/20230713124914.GA12924@green245/T/ Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/055c6bc2-74fa-8c67-9897-3f658abb5ae7@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-19r8169: revert 2ab19de62d67 ("r8169: remove ASPM restrictions now that ASPM ↵Heiner Kallweit
is disabled during NAPI poll") There have been reports that on a number of systems this change breaks network connectivity. Therefore effectively revert it. Mainly affected seem to be systems where BIOS denies ASPM access to OS. Due to later changes we can't do a direct revert. Fixes: 2ab19de62d67 ("r8169: remove ASPM restrictions now that ASPM is disabled during NAPI poll") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/netdev/e47bac0d-e802-65e1-b311-6acb26d5cf10@freenet.de/T/ Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217596 Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/57f13ec0-b216-d5d8-363d-5b05528ec5fb@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-19drivers:net: fix return value check in ocelot_fdma_receive_skbYuanjun Gong
ocelot_fdma_receive_skb should return false if an unexpected value is returned by pskb_trim. Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-19drivers: net: fix return value check in emac_tso_csum()Yuanjun Gong
in emac_tso_csum(), return an error code if an unexpected value is returned by pskb_trim(). Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-18Merge branch '40GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-07-17 (iavf) This series contains updates to iavf driver only. Ding Hui fixes use-after-free issue by calling netif_napi_del() for all allocated q_vectors. He also resolves out-of-bounds issue by not updating to new values when timeout is encountered. Marcin and Ahmed change the way resets are handled so that the callback operating under the RTNL lock will wait for the reset to finish, the rtnl_lock sensitive functions in reset flow will schedule the netdev update for later in order to remove circular dependency with the critical lock. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: iavf: fix reset task race with iavf_remove() iavf: fix a deadlock caused by rtnl and driver's lock circular dependencies Revert "iavf: Do not restart Tx queues after reset task failure" Revert "iavf: Detach device during reset task" iavf: Wait for reset in callbacks which trigger it iavf: use internal state to free traffic IRQs iavf: Fix out-of-bounds when setting channels on remove iavf: Fix use-after-free in free_netdev ==================== Link: https://lore.kernel.org/r/20230717175205.3217774-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-18octeontx2-pf: mcs: Generate hash key using ecb(aes)Subbaraya Sundeep
Hardware generated encryption and ICV tags are found to be wrong when tested with IEEE MACSEC test vectors. This is because as per the HRM, the hash key (derived by AES-ECB block encryption of an all 0s block with the SAK) has to be programmed by the software in MCSX_RS_MCS_CPM_TX_SLAVE_SA_PLCY_MEM_4X register. Hence fix this by generating hash key in software and configuring in hardware. Fixes: c54ffc73601c ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://lore.kernel.org/r/1689574603-28093-1-git-send-email-sbhatta@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-18igc: Prevent garbled TX queue with XDP ZEROCOPYFlorian Kauer
In normal operation, each populated queue item has next_to_watch pointing to the last TX desc of the packet, while each cleaned item has it set to 0. In particular, next_to_use that points to the next (necessarily clean) item to use has next_to_watch set to 0. When the TX queue is used both by an application using AF_XDP with ZEROCOPY as well as a second non-XDP application generating high traffic, the queue pointers can get in an invalid state where next_to_use points to an item where next_to_watch is NOT set to 0. However, the implementation assumes at several places that this is never the case, so if it does hold, bad things happen. In particular, within the loop inside of igc_clean_tx_irq(), next_to_clean can overtake next_to_use. Finally, this prevents any further transmission via this queue and it never gets unblocked or signaled. Secondly, if the queue is in this garbled state, the inner loop of igc_clean_tx_ring() will never terminate, completely hogging a CPU core. The reason is that igc_xdp_xmit_zc() reads next_to_use before acquiring the lock, and writing it back (potentially unmodified) later. If it got modified before locking, the outdated next_to_use is written pointing to an item that was already used elsewhere (and thus next_to_watch got written). Fixes: 9acf59a752d4 ("igc: Enable TX via AF_XDP zero-copy") Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230717175444.3217831-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-18Merge tag 'linux-can-fixes-for-6.5-20230717' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2023-07-17 The 1st patch is by Ziyang Xuan and fixes a possible memory leak in the receiver handling in the CAN RAW protocol. YueHaibing contributes a use after free in bcm_proc_show() of the Broad Cast Manager (BCM) CAN protocol. The next 2 patches are by me and fix a possible null pointer dereference in the RX path of the gs_usb driver with activated hardware timestamps and the candlelight firmware. The last patch is by Fedor Ross, Marek Vasut and me and targets the mcp251xfd driver. The polling timeout of __mcp251xfd_chip_set_mode() is increased to fix bus joining on busy CAN buses and very low bit rate. * tag 'linux-can-fixes-for-6.5-20230717' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: mcp251xfd: __mcp251xfd_chip_set_mode(): increase poll timeout can: gs_usb: fix time stamp counter initialization can: gs_usb: gs_can_open(): improve error handling can: bcm: Fix UAF in bcm_proc_show() can: raw: fix receiver memory leak ==================== Link: https://lore.kernel.org/r/20230717180938.230816-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-18octeontx2-pf: Dont allocate BPIDs for LBK interfacesGeetha sowjanya
Current driver enables backpressure for LBK interfaces. But these interfaces do not support this feature. Hence, this patch fixes the issue by skipping the backpressure configuration for these interfaces. Fixes: 75f36270990c ("octeontx2-pf: Support to enable/disable pause frames via ethtool"). Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Link: https://lore.kernel.org/r/20230716093741.28063-1-gakula@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-18vrf: Fix lockdep splat in output pathIdo Schimmel
Cited commit converted the neighbour code to use the standard RCU variant instead of the RCU-bh variant, but the VRF code still uses rcu_read_lock_bh() / rcu_read_unlock_bh() around the neighbour lookup code in its IPv4 and IPv6 output paths, resulting in lockdep splats [1][2]. Can be reproduced using [3]. Fix by switching to rcu_read_lock() / rcu_read_unlock(). [1] ============================= WARNING: suspicious RCU usage 6.5.0-rc1-custom-g9c099e6dbf98 #403 Not tainted ----------------------------- include/net/neighbour.h:302 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by ping/183: #0: ffff888105ea1d80 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0xc6c/0x33c0 #1: ffffffff85b46820 (rcu_read_lock_bh){....}-{1:2}, at: vrf_output+0x2e3/0x2030 stack backtrace: CPU: 0 PID: 183 Comm: ping Not tainted 6.5.0-rc1-custom-g9c099e6dbf98 #403 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xc1/0xf0 lockdep_rcu_suspicious+0x211/0x3b0 vrf_output+0x1380/0x2030 ip_push_pending_frames+0x125/0x2a0 raw_sendmsg+0x200d/0x33c0 inet_sendmsg+0xa2/0xe0 __sys_sendto+0x2aa/0x420 __x64_sys_sendto+0xe5/0x1c0 do_syscall_64+0x38/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [2] ============================= WARNING: suspicious RCU usage 6.5.0-rc1-custom-g9c099e6dbf98 #403 Not tainted ----------------------------- include/net/neighbour.h:302 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by ping6/182: #0: ffff888114b63000 (sk_lock-AF_INET6){+.+.}-{0:0}, at: rawv6_sendmsg+0x1602/0x3e50 #1: ffffffff85b46820 (rcu_read_lock_bh){....}-{1:2}, at: vrf_output6+0xe9/0x1310 stack backtrace: CPU: 0 PID: 182 Comm: ping6 Not tainted 6.5.0-rc1-custom-g9c099e6dbf98 #403 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xc1/0xf0 lockdep_rcu_suspicious+0x211/0x3b0 vrf_output6+0xd32/0x1310 ip6_local_out+0xb4/0x1a0 ip6_send_skb+0xbc/0x340 ip6_push_pending_frames+0xe5/0x110 rawv6_sendmsg+0x2e6e/0x3e50 inet_sendmsg+0xa2/0xe0 __sys_sendto+0x2aa/0x420 __x64_sys_sendto+0xe5/0x1c0 do_syscall_64+0x38/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [3] #!/bin/bash ip link add name vrf-red up numtxqueues 2 type vrf table 10 ip link add name swp1 up master vrf-red type dummy ip address add 192.0.2.1/24 dev swp1 ip address add 2001:db8:1::1/64 dev swp1 ip neigh add 192.0.2.2 lladdr 00:11:22:33:44:55 nud perm dev swp1 ip neigh add 2001:db8:1::2 lladdr 00:11:22:33:44:55 nud perm dev swp1 ip vrf exec vrf-red ping 192.0.2.2 -c 1 &> /dev/null ip vrf exec vrf-red ping6 2001:db8:1::2 -c 1 &> /dev/null Fixes: 09eed1192cec ("neighbour: switch to standard rcu, instead of rcu_bh") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Link: https://lore.kernel.org/netdev/CA+G9fYtEr-=GbcXNDYo3XOkwR+uYgehVoDjsP0pFLUpZ_AZcyg@mail.gmail.com/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230715153605.4068066-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-18Merge branch '100GbE' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-07-14 (ice) This series contains updates to ice driver only. Petr Oros removes multiple calls made to unregister netdev and devlink_port. Michal fixes null pointer dereference that can occur during reload. ==================== Link: https://lore.kernel.org/r/20230714201041.1717834-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-17can: mcp251xfd: __mcp251xfd_chip_set_mode(): increase poll timeoutFedor Ross
The mcp251xfd controller needs an idle bus to enter 'Normal CAN 2.0 mode' or . The maximum length of a CAN frame is 736 bits (64 data bytes, CAN-FD, EFF mode, worst case bit stuffing and interframe spacing). For low bit rates like 10 kbit/s the arbitrarily chosen MCP251XFD_POLL_TIMEOUT_US of 1 ms is too small. Otherwise during polling for the CAN controller to enter 'Normal CAN 2.0 mode' the timeout limit is exceeded and the configuration fails with: | $ ip link set dev can1 up type can bitrate 10000 | [ 731.911072] mcp251xfd spi2.1 can1: Controller failed to enter mode CAN 2.0 Mode (6) and stays in Configuration Mode (4) (con=0x068b0760, osc=0x00000468). | [ 731.927192] mcp251xfd spi2.1 can1: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000) retrying. | [ 731.938101] A link change request failed with some changes committed already. Interface can1 may have been left with an inconsistent configuration, please check. | RTNETLINK answers: Connection timed out Make MCP251XFD_POLL_TIMEOUT_US timeout calculation dynamic. Use maximum of 1ms and bit time of 1 full 64 data bytes CAN-FD frame in EFF mode, worst case bit stuffing and interframe spacing at the current bit rate. For easier backporting define the macro MCP251XFD_FRAME_LEN_MAX_BITS that holds the max frame length in bits, which is 736. This can be replaced by can_frame_bits(true, true, true, true, CANFD_MAX_DLEN) in a cleanup patch later. Fixes: 55e5b97f003e8 ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN") Signed-off-by: Fedor Ross <fedor.ross@ifm.com> Signed-off-by: Marek Vasut <marex@denx.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20230717-mcp251xfd-fix-increase-poll-timeout-v5-1-06600f34c684@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-07-17iavf: fix reset task race with iavf_remove()Ahmed Zaki
The reset task is currently scheduled from the watchdog or adminq tasks. First, all direct calls to schedule the reset task are replaced with the iavf_schedule_reset(), which is modified to accept the flag showing the type of reset. To prevent the reset task from starting once iavf_remove() starts, we need to check the __IAVF_IN_REMOVE_TASK bit before we schedule it. This is now easily added to iavf_schedule_reset(). Finally, remove the check for IAVF_FLAG_RESET_NEEDED in the watchdog task. It is redundant since all callers who set the flag immediately schedules the reset task. Fixes: 3ccd54ef44eb ("iavf: Fix init state closure on remove") Fixes: 14756b2ae265 ("iavf: Fix __IAVF_RESETTING state usage") Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-17iavf: fix a deadlock caused by rtnl and driver's lock circular dependenciesAhmed Zaki
A driver's lock (crit_lock) is used to serialize all the driver's tasks. Lockdep, however, shows a circular dependency between rtnl and crit_lock. This happens when an ndo that already holds the rtnl requests the driver to reset, since the reset task (in some paths) tries to grab rtnl to either change real number of queues of update netdev features. [566.241851] ====================================================== [566.241893] WARNING: possible circular locking dependency detected [566.241936] 6.2.14-100.fc36.x86_64+debug #1 Tainted: G OE [566.241984] ------------------------------------------------------ [566.242025] repro.sh/2604 is trying to acquire lock: [566.242061] ffff9280fc5ceee8 (&adapter->crit_lock){+.+.}-{3:3}, at: iavf_close+0x3c/0x240 [iavf] [566.242167] but task is already holding lock: [566.242209] ffffffff9976d350 (rtnl_mutex){+.+.}-{3:3}, at: iavf_remove+0x6b5/0x730 [iavf] [566.242300] which lock already depends on the new lock. [566.242353] the existing dependency chain (in reverse order) is: [566.242401] -> #1 (rtnl_mutex){+.+.}-{3:3}: [566.242451] __mutex_lock+0xc1/0xbb0 [566.242489] iavf_init_interrupt_scheme+0x179/0x440 [iavf] [566.242560] iavf_watchdog_task+0x80b/0x1400 [iavf] [566.242627] process_one_work+0x2b3/0x560 [566.242663] worker_thread+0x4f/0x3a0 [566.242696] kthread+0xf2/0x120 [566.242730] ret_from_fork+0x29/0x50 [566.242763] -> #0 (&adapter->crit_lock){+.+.}-{3:3}: [566.242815] __lock_acquire+0x15ff/0x22b0 [566.242869] lock_acquire+0xd2/0x2c0 [566.242901] __mutex_lock+0xc1/0xbb0 [566.242934] iavf_close+0x3c/0x240 [iavf] [566.242997] __dev_close_many+0xac/0x120 [566.243036] dev_close_many+0x8b/0x140 [566.243071] unregister_netdevice_many_notify+0x165/0x7c0 [566.243116] unregister_netdevice_queue+0xd3/0x110 [566.243157] iavf_remove+0x6c1/0x730 [iavf] [566.243217] pci_device_remove+0x33/0xa0 [566.243257] device_release_driver_internal+0x1bc/0x240 [566.243299] pci_stop_bus_device+0x6c/0x90 [566.243338] pci_stop_and_remove_bus_device+0xe/0x20 [566.243380] pci_iov_remove_virtfn+0xd1/0x130 [566.243417] sriov_disable+0x34/0xe0 [566.243448] ice_free_vfs+0x2da/0x330 [ice] [566.244383] ice_sriov_configure+0x88/0xad0 [ice] [566.245353] sriov_numvfs_store+0xde/0x1d0 [566.246156] kernfs_fop_write_iter+0x15e/0x210 [566.246921] vfs_write+0x288/0x530 [566.247671] ksys_write+0x74/0xf0 [566.248408] do_syscall_64+0x58/0x80 [566.249145] entry_SYSCALL_64_after_hwframe+0x72/0xdc [566.249886] other info that might help us debug this: [566.252014] Possible unsafe locking scenario: [566.253432] CPU0 CPU1 [566.254118] ---- ---- [566.254800] lock(rtnl_mutex); [566.255514] lock(&adapter->crit_lock); [566.256233] lock(rtnl_mutex); [566.256897] lock(&adapter->crit_lock); [566.257388] *** DEADLOCK *** The deadlock can be triggered by a script that is continuously resetting the VF adapter while doing other operations requiring RTNL, e.g: while :; do ip link set $VF up ethtool --set-channels $VF combined 2 ip link set $VF down ip link set $VF up ethtool --set-channels $VF combined 4 ip link set $VF down done Any operation that triggers a reset can substitute "ethtool --set-channles" As a fix, add a new task "finish_config" that do all the work which needs rtnl lock. With the exception of iavf_remove(), all work that require rtnl should be called from this task. As for iavf_remove(), at the point where we need to call unregister_netdevice() (and grab rtnl_lock), we make sure the finish_config task is not running (cancel_work_sync()) to safely grab rtnl. Subsequent finish_config work cannot restart after that since the task is guarded by the __IAVF_IN_REMOVE_TASK bit in iavf_schedule_finish_config(). Fixes: 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections") Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-17Revert "iavf: Do not restart Tx queues after reset task failure"Marcin Szycik
This reverts commit 08f1c147b7265245d67321585c68a27e990e0c4b. Netdev is no longer being detached during reset, so this fix can be reverted. We leave the removal of "hacky" IFF_UP flag update. Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-17Revert "iavf: Detach device during reset task"Marcin Szycik
This reverts commit aa626da947e9cd30c4cf727493903e1adbb2c0a0. Detaching device during reset was not fully fixing the rtnl locking issue, as there could be a situation where callback was already in progress before detaching netdev. Furthermore, detaching netdevice causes TX timeouts if traffic is running. To reproduce: ip netns exec ns1 iperf3 -c $PEER_IP -t 600 --logfile /dev/null & while :; do for i in 200 7000 400 5000 300 3000 ; do ip netns exec ns1 ip link set $VF1 mtu $i sleep 2 done sleep 10 done Currently, callbacks such as iavf_change_mtu() wait for the reset. If the reset fails to acquire the rtnl_lock, they schedule the netdev update for later while continuing the reset flow. Operations like MTU changes are performed under the rtnl_lock. Therefore, when the operation finishes, another callback that uses rtnl_lock can start. Signed-off-by: Dawid Wesierski <dawidx.wesierski@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-17iavf: Wait for reset in callbacks which trigger itMarcin Szycik
There was a fail when trying to add the interface to bonding right after changing the MTU on the interface. It was caused by bonding interface unable to open the interface due to interface being in __RESETTING state because of MTU change. Add new reset_waitqueue to indicate that reset has finished. Add waiting for reset to finish in callbacks which trigger hw reset: iavf_set_priv_flags(), iavf_change_mtu() and iavf_set_ringparam(). We use a 5000ms timeout period because on Hyper-V based systems, this operation takes around 3000-4000ms. In normal circumstances, it doesn't take more than 500ms to complete. Add a function iavf_wait_for_reset() to reuse waiting for reset code and use it also in iavf_set_channels(), which already waits for reset. We don't use error handling in iavf_set_channels() as this could cause the device to be in incorrect state if the reset was scheduled but hit timeout or the waitng function was interrupted by a signal. Fixes: 4e5e6b5d9d13 ("iavf: Fix return of set the new channel count") Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Co-developed-by: Dawid Wesierski <dawidx.wesierski@intel.com> Signed-off-by: Dawid Wesierski <dawidx.wesierski@intel.com> Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-17iavf: use internal state to free traffic IRQsAhmed Zaki
If the system tries to close the netdev while iavf_reset_task() is running, __LINK_STATE_START will be cleared and netif_running() will return false in iavf_reinit_interrupt_scheme(). This will result in iavf_free_traffic_irqs() not being called and a leak as follows: [7632.489326] remove_proc_entry: removing non-empty directory 'irq/999', leaking at least 'iavf-enp24s0f0v0-TxRx-0' [7632.490214] WARNING: CPU: 0 PID: 10 at fs/proc/generic.c:718 remove_proc_entry+0x19b/0x1b0 is shown when pci_disable_msix() is later called. Fix by using the internal adapter state. The traffic IRQs will always exist if state == __IAVF_RUNNING. Fixes: 5b36e8d04b44 ("i40evf: Enable VF to request an alternate queue allocation") Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-17iavf: Fix out-of-bounds when setting channels on removeDing Hui
If we set channels greater during iavf_remove(), and waiting reset done would be timeout, then returned with error but changed num_active_queues directly, that will lead to OOB like the following logs. Because the num_active_queues is greater than tx/rx_rings[] allocated actually. Reproducer: [root@host ~]# cat repro.sh #!/bin/bash pf_dbsf="0000:41:00.0" vf0_dbsf="0000:41:02.0" g_pids=() function do_set_numvf() { echo 2 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs sleep $((RANDOM%3+1)) echo 0 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs sleep $((RANDOM%3+1)) } function do_set_channel() { local nic=$(ls -1 --indicator-style=none /sys/bus/pci/devices/${vf0_dbsf}/net/) [ -z "$nic" ] && { sleep $((RANDOM%3)) ; return 1; } ifconfig $nic 192.168.18.5 netmask 255.255.255.0 ifconfig $nic up ethtool -L $nic combined 1 ethtool -L $nic combined 4 sleep $((RANDOM%3)) } function on_exit() { local pid for pid in "${g_pids[@]}"; do kill -0 "$pid" &>/dev/null && kill "$pid" &>/dev/null done g_pids=() } trap "on_exit; exit" EXIT while :; do do_set_numvf ; done & g_pids+=($!) while :; do do_set_channel ; done & g_pids+=($!) wait Result: [ 3506.152887] iavf 0000:41:02.0: Removing device [ 3510.400799] ================================================================== [ 3510.400820] BUG: KASAN: slab-out-of-bounds in iavf_free_all_tx_resources+0x156/0x160 [iavf] [ 3510.400823] Read of size 8 at addr ffff88b6f9311008 by task repro.sh/55536 [ 3510.400823] [ 3510.400830] CPU: 101 PID: 55536 Comm: repro.sh Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1 [ 3510.400832] Hardware name: Powerleader PR2008AL/H12DSi-N6, BIOS 2.0 04/09/2021 [ 3510.400835] Call Trace: [ 3510.400851] dump_stack+0x71/0xab [ 3510.400860] print_address_description+0x6b/0x290 [ 3510.400865] ? iavf_free_all_tx_resources+0x156/0x160 [iavf] [ 3510.400868] kasan_report+0x14a/0x2b0 [ 3510.400873] iavf_free_all_tx_resources+0x156/0x160 [iavf] [ 3510.400880] iavf_remove+0x2b6/0xc70 [iavf] [ 3510.400884] ? iavf_free_all_rx_resources+0x160/0x160 [iavf] [ 3510.400891] ? wait_woken+0x1d0/0x1d0 [ 3510.400895] ? notifier_call_chain+0xc1/0x130 [ 3510.400903] pci_device_remove+0xa8/0x1f0 [ 3510.400910] device_release_driver_internal+0x1c6/0x460 [ 3510.400916] pci_stop_bus_device+0x101/0x150 [ 3510.400919] pci_stop_and_remove_bus_device+0xe/0x20 [ 3510.400924] pci_iov_remove_virtfn+0x187/0x420 [ 3510.400927] ? pci_iov_add_virtfn+0xe10/0xe10 [ 3510.400929] ? pci_get_subsys+0x90/0x90 [ 3510.400932] sriov_disable+0xed/0x3e0 [ 3510.400936] ? bus_find_device+0x12d/0x1a0 [ 3510.400953] i40e_free_vfs+0x754/0x1210 [i40e] [ 3510.400966] ? i40e_reset_all_vfs+0x880/0x880 [i40e] [ 3510.400968] ? pci_get_device+0x7c/0x90 [ 3510.400970] ? pci_get_subsys+0x90/0x90 [ 3510.400982] ? pci_vfs_assigned.part.7+0x144/0x210 [ 3510.400987] ? __mutex_lock_slowpath+0x10/0x10 [ 3510.400996] i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e] [ 3510.401001] sriov_numvfs_store+0x214/0x290 [ 3510.401005] ? sriov_totalvfs_show+0x30/0x30 [ 3510.401007] ? __mutex_lock_slowpath+0x10/0x10 [ 3510.401011] ? __check_object_size+0x15a/0x350 [ 3510.401018] kernfs_fop_write+0x280/0x3f0 [ 3510.401022] vfs_write+0x145/0x440 [ 3510.401025] ksys_write+0xab/0x160 [ 3510.401028] ? __ia32_sys_read+0xb0/0xb0 [ 3510.401031] ? fput_many+0x1a/0x120 [ 3510.401032] ? filp_close+0xf0/0x130 [ 3510.401038] do_syscall_64+0xa0/0x370 [ 3510.401041] ? page_fault+0x8/0x30 [ 3510.401043] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 3510.401073] RIP: 0033:0x7f3a9bb842c0 [ 3510.401079] Code: 73 01 c3 48 8b 0d d8 cb 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 24 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe dd 01 00 48 89 04 24 [ 3510.401080] RSP: 002b:00007ffc05f1fe18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 3510.401083] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f3a9bb842c0 [ 3510.401085] RDX: 0000000000000002 RSI: 0000000002327408 RDI: 0000000000000001 [ 3510.401086] RBP: 0000000002327408 R08: 00007f3a9be53780 R09: 00007f3a9c8a4700 [ 3510.401086] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ 3510.401087] R13: 0000000000000001 R14: 00007f3a9be52620 R15: 0000000000000001 [ 3510.401090] [ 3510.401093] Allocated by task 76795: [ 3510.401098] kasan_kmalloc+0xa6/0xd0 [ 3510.401099] __kmalloc+0xfb/0x200 [ 3510.401104] iavf_init_interrupt_scheme+0x26f/0x1310 [iavf] [ 3510.401108] iavf_watchdog_task+0x1d58/0x4050 [iavf] [ 3510.401114] process_one_work+0x56a/0x11f0 [ 3510.401115] worker_thread+0x8f/0xf40 [ 3510.401117] kthread+0x2a0/0x390 [ 3510.401119] ret_from_fork+0x1f/0x40 [ 3510.401122] 0xffffffffffffffff [ 3510.401123] In timeout handling, we should keep the original num_active_queues and reset num_req_queues to 0. Fixes: 4e5e6b5d9d13 ("iavf: Fix return of set the new channel count") Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> Cc: Donglin Peng <pengdonglin@sangfor.com.cn> Cc: Huang Cun <huangcun@sangfor.com.cn> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-17iavf: Fix use-after-free in free_netdevDing Hui
We do netif_napi_add() for all allocated q_vectors[], but potentially do netif_napi_del() for part of them, then kfree q_vectors and leave invalid pointers at dev->napi_list. Reproducer: [root@host ~]# cat repro.sh #!/bin/bash pf_dbsf="0000:41:00.0" vf0_dbsf="0000:41:02.0" g_pids=() function do_set_numvf() { echo 2 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs sleep $((RANDOM%3+1)) echo 0 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs sleep $((RANDOM%3+1)) } function do_set_channel() { local nic=$(ls -1 --indicator-style=none /sys/bus/pci/devices/${vf0_dbsf}/net/) [ -z "$nic" ] && { sleep $((RANDOM%3)) ; return 1; } ifconfig $nic 192.168.18.5 netmask 255.255.255.0 ifconfig $nic up ethtool -L $nic combined 1 ethtool -L $nic combined 4 sleep $((RANDOM%3)) } function on_exit() { local pid for pid in "${g_pids[@]}"; do kill -0 "$pid" &>/dev/null && kill "$pid" &>/dev/null done g_pids=() } trap "on_exit; exit" EXIT while :; do do_set_numvf ; done & g_pids+=($!) while :; do do_set_channel ; done & g_pids+=($!) wait Result: [ 4093.900222] ================================================================== [ 4093.900230] BUG: KASAN: use-after-free in free_netdev+0x308/0x390 [ 4093.900232] Read of size 8 at addr ffff88b4dc145640 by task repro.sh/6699 [ 4093.900233] [ 4093.900236] CPU: 10 PID: 6699 Comm: repro.sh Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1 [ 4093.900238] Hardware name: Powerleader PR2008AL/H12DSi-N6, BIOS 2.0 04/09/2021 [ 4093.900239] Call Trace: [ 4093.900244] dump_stack+0x71/0xab [ 4093.900249] print_address_description+0x6b/0x290 [ 4093.900251] ? free_netdev+0x308/0x390 [ 4093.900252] kasan_report+0x14a/0x2b0 [ 4093.900254] free_netdev+0x308/0x390 [ 4093.900261] iavf_remove+0x825/0xd20 [iavf] [ 4093.900265] pci_device_remove+0xa8/0x1f0 [ 4093.900268] device_release_driver_internal+0x1c6/0x460 [ 4093.900271] pci_stop_bus_device+0x101/0x150 [ 4093.900273] pci_stop_and_remove_bus_device+0xe/0x20 [ 4093.900275] pci_iov_remove_virtfn+0x187/0x420 [ 4093.900277] ? pci_iov_add_virtfn+0xe10/0xe10 [ 4093.900278] ? pci_get_subsys+0x90/0x90 [ 4093.900280] sriov_disable+0xed/0x3e0 [ 4093.900282] ? bus_find_device+0x12d/0x1a0 [ 4093.900290] i40e_free_vfs+0x754/0x1210 [i40e] [ 4093.900298] ? i40e_reset_all_vfs+0x880/0x880 [i40e] [ 4093.900299] ? pci_get_device+0x7c/0x90 [ 4093.900300] ? pci_get_subsys+0x90/0x90 [ 4093.900306] ? pci_vfs_assigned.part.7+0x144/0x210 [ 4093.900309] ? __mutex_lock_slowpath+0x10/0x10 [ 4093.900315] i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e] [ 4093.900318] sriov_numvfs_store+0x214/0x290 [ 4093.900320] ? sriov_totalvfs_show+0x30/0x30 [ 4093.900321] ? __mutex_lock_slowpath+0x10/0x10 [ 4093.900323] ? __check_object_size+0x15a/0x350 [ 4093.900326] kernfs_fop_write+0x280/0x3f0 [ 4093.900329] vfs_write+0x145/0x440 [ 4093.900330] ksys_write+0xab/0x160 [ 4093.900332] ? __ia32_sys_read+0xb0/0xb0 [ 4093.900334] ? fput_many+0x1a/0x120 [ 4093.900335] ? filp_close+0xf0/0x130 [ 4093.900338] do_syscall_64+0xa0/0x370 [ 4093.900339] ? page_fault+0x8/0x30 [ 4093.900341] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 4093.900357] RIP: 0033:0x7f16ad4d22c0 [ 4093.900359] Code: 73 01 c3 48 8b 0d d8 cb 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 24 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe dd 01 00 48 89 04 24 [ 4093.900360] RSP: 002b:00007ffd6491b7f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 4093.900362] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f16ad4d22c0 [ 4093.900363] RDX: 0000000000000002 RSI: 0000000001a41408 RDI: 0000000000000001 [ 4093.900364] RBP: 0000000001a41408 R08: 00007f16ad7a1780 R09: 00007f16ae1f2700 [ 4093.900364] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ 4093.900365] R13: 0000000000000001 R14: 00007f16ad7a0620 R15: 0000000000000001 [ 4093.900367] [ 4093.900368] Allocated by task 820: [ 4093.900371] kasan_kmalloc+0xa6/0xd0 [ 4093.900373] __kmalloc+0xfb/0x200 [ 4093.900376] iavf_init_interrupt_scheme+0x63b/0x1320 [iavf] [ 4093.900380] iavf_watchdog_task+0x3d51/0x52c0 [iavf] [ 4093.900382] process_one_work+0x56a/0x11f0 [ 4093.900383] worker_thread+0x8f/0xf40 [ 4093.900384] kthread+0x2a0/0x390 [ 4093.900385] ret_from_fork+0x1f/0x40 [ 4093.900387] 0xffffffffffffffff [ 4093.900387] [ 4093.900388] Freed by task 6699: [ 4093.900390] __kasan_slab_free+0x137/0x190 [ 4093.900391] kfree+0x8b/0x1b0 [ 4093.900394] iavf_free_q_vectors+0x11d/0x1a0 [iavf] [ 4093.900397] iavf_remove+0x35a/0xd20 [iavf] [ 4093.900399] pci_device_remove+0xa8/0x1f0 [ 4093.900400] device_release_driver_internal+0x1c6/0x460 [ 4093.900401] pci_stop_bus_device+0x101/0x150 [ 4093.900402] pci_stop_and_remove_bus_device+0xe/0x20 [ 4093.900403] pci_iov_remove_virtfn+0x187/0x420 [ 4093.900404] sriov_disable+0xed/0x3e0 [ 4093.900409] i40e_free_vfs+0x754/0x1210 [i40e] [ 4093.900415] i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e] [ 4093.900416] sriov_numvfs_store+0x214/0x290 [ 4093.900417] kernfs_fop_write+0x280/0x3f0 [ 4093.900418] vfs_write+0x145/0x440 [ 4093.900419] ksys_write+0xab/0x160 [ 4093.900420] do_syscall_64+0xa0/0x370 [ 4093.900421] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 4093.900422] 0xffffffffffffffff [ 4093.900422] [ 4093.900424] The buggy address belongs to the object at ffff88b4dc144200 which belongs to the cache kmalloc-8k of size 8192 [ 4093.900425] The buggy address is located 5184 bytes inside of 8192-byte region [ffff88b4dc144200, ffff88b4dc146200) [ 4093.900425] The buggy address belongs to the page: [ 4093.900427] page:ffffea00d3705000 refcount:1 mapcount:0 mapping:ffff88bf04415c80 index:0x0 compound_mapcount: 0 [ 4093.900430] flags: 0x10000000008100(slab|head) [ 4093.900433] raw: 0010000000008100 dead000000000100 dead000000000200 ffff88bf04415c80 [ 4093.900434] raw: 0000000000000000 0000000000030003 00000001ffffffff 0000000000000000 [ 4093.900434] page dumped because: kasan: bad access detected [ 4093.900435] [ 4093.900435] Memory state around the buggy address: [ 4093.900436] ffff88b4dc145500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 4093.900437] ffff88b4dc145580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 4093.900438] >ffff88b4dc145600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 4093.900438] ^ [ 4093.900439] ffff88b4dc145680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 4093.900440] ffff88b4dc145700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 4093.900440] ================================================================== Although the patch #2 (of 2) can avoid the issue triggered by this repro.sh, there still are other potential risks that if num_active_queues is changed to less than allocated q_vectors[] by unexpected, the mismatched netif_napi_add/del() can also cause UAF. Since we actually call netif_napi_add() for all allocated q_vectors unconditionally in iavf_alloc_q_vectors(), so we should fix it by letting netif_napi_del() match to netif_napi_add(). Fixes: 5eae00c57f5e ("i40evf: main driver core") Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> Cc: Donglin Peng <pengdonglin@sangfor.com.cn> Cc: Huang Cun <huangcun@sangfor.com.cn> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-17can: gs_usb: fix time stamp counter initializationMarc Kleine-Budde
If the gs_usb device driver is unloaded (or unbound) before the interface is shut down, the USB stack first calls the struct usb_driver::disconnect and then the struct net_device_ops::ndo_stop callback. In gs_usb_disconnect() all pending bulk URBs are killed, i.e. no more RX'ed CAN frames are send from the USB device to the host. Later in gs_can_close() a reset control message is send to each CAN channel to remove the controller from the CAN bus. In this race window the USB device can still receive CAN frames from the bus and internally queue them to be send to the host. At least in the current version of the candlelight firmware, the queue of received CAN frames is not emptied during the reset command. After loading (or binding) the gs_usb driver, new URBs are submitted during the struct net_device_ops::ndo_open callback and the candlelight firmware starts sending its already queued CAN frames to the host. However, this scenario was not considered when implementing the hardware timestamp function. The cycle counter/time counter infrastructure is set up (gs_usb_timestamp_init()) after the USBs are submitted, resulting in a NULL pointer dereference if timecounter_cyc2time() (via the call chain: gs_usb_receive_bulk_callback() -> gs_usb_set_timestamp() -> gs_usb_skb_set_timestamp()) is called too early. Move the gs_usb_timestamp_init() function before the URBs are submitted to fix this problem. For a comprehensive solution, we need to consider gs_usb devices with more than 1 channel. The cycle counter/time counter infrastructure is setup per channel, but the RX URBs are per device. Once gs_can_open() of _a_ channel has been called, and URBs have been submitted, the gs_usb_receive_bulk_callback() can be called for _all_ available channels, even for channels that are not running, yet. As cycle counter/time counter has not set up, this will again lead to a NULL pointer dereference. Convert the cycle counter/time counter from a "per channel" to a "per device" functionality. Also set it up, before submitting any URBs to the device. Further in gs_usb_receive_bulk_callback(), don't process any URBs for not started CAN channels, only resubmit the URB. Fixes: 45dfa45f52e6 ("can: gs_usb: add RX and TX hardware timestamp support") Closes: https://github.com/candle-usb/candleLight_fw/issues/137#issuecomment-1623532076 Cc: stable@vger.kernel.org Cc: John Whittington <git@jbrengineering.co.uk> Link: https://lore.kernel.org/all/20230716-gs_usb-fix-time-stamp-counter-v1-2-9017cefcd9d5@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>