summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)Author
2022-09-22Merge tag 'net-6.0-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wifi, netfilter and can. A handful of awaited fixes here - revert of the FEC changes, bluetooth fix, fixes for iwlwifi spew. We added a warning in PHY/MDIO code which is triggering on a couple of platforms in a false-positive-ish way. If we can't iron that out over the week we'll drop it and re-add for 6.1. I've added a new "follow up fixes" section for fixes to fixes in 6.0-rcs but it may actually give the false impression that those are problematic or that more testing time would have caught them. So likely a one time thing. Follow up fixes: - nf_tables_addchain: fix nft_counters_enabled underflow - ebtables: fix memory leak when blob is malformed - nf_ct_ftp: fix deadlock when nat rewrite is needed Current release - regressions: - Revert "fec: Restart PPS after link state change" and the related "net: fec: Use a spinlock to guard `fep->ptp_clk_on`" - Bluetooth: fix HCIGETDEVINFO regression - wifi: mt76: fix 5 GHz connection regression on mt76x0/mt76x2 - mptcp: fix fwd memory accounting on coalesce - rwlock removal fall out: - ipmr: always call ip{,6}_mr_forward() from RCU read-side critical section - ipv6: fix crash when IPv6 is administratively disabled - tcp: read multiple skbs in tcp_read_skb() - mdio_bus_phy_resume state warning fallout: - eth: ravb: fix PHY state warning splat during system resume - eth: sh_eth: fix PHY state warning splat during system resume Current release - new code bugs: - wifi: iwlwifi: don't spam logs with NSS>2 messages - eth: mtk_eth_soc: enable XDP support just for MT7986 SoC Previous releases - regressions: - bonding: fix NULL deref in bond_rr_gen_slave_id - wifi: iwlwifi: mark IWLMEI as broken Previous releases - always broken: - nf_conntrack helpers: - irc: tighten matching on DCC message - sip: fix ct_sip_walk_headers - osf: fix possible bogus match in nf_osf_find() - ipvlan: fix out-of-bound bugs caused by unset skb->mac_header - core: fix flow symmetric hash - bonding, team: unsync device addresses on ndo_stop - phy: micrel: fix shared interrupt on LAN8814" * tag 'net-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits) selftests: forwarding: add shebang for sch_red.sh bnxt: prevent skb UAF after handing over to PTP worker net: marvell: Fix refcounting bugs in prestera_port_sfp_bind() net: sched: fix possible refcount leak in tc_new_tfilter() net: sunhme: Fix packet reception for len < RX_COPY_THRESHOLD udp: Use WARN_ON_ONCE() in udp_read_skb() selftests: bonding: cause oops in bond_rr_gen_slave_id bonding: fix NULL deref in bond_rr_gen_slave_id net: phy: micrel: fix shared interrupt on LAN8814 net/smc: Stop the CLC flow if no link to map buffers on ice: Fix ice_xdp_xmit() when XDP TX queue number is not sufficient net: atlantic: fix potential memory leak in aq_ndev_close() can: gs_usb: gs_usb_set_phys_id(): return with error if identify is not supported can: gs_usb: gs_can_open(): fix race dev->can.state condition can: flexcan: flexcan_mailbox_read() fix return value for drop = true net: sh_eth: Fix PHY state warning splat during system resume net: ravb: Fix PHY state warning splat during system resume netfilter: nf_ct_ftp: fix deadlock when nat rewrite is needed netfilter: ebtables: fix memory leak when blob is malformed netfilter: nf_tables: fix percpu memory leak at nf_tables_addchain() ...
2022-09-22bnxt: prevent skb UAF after handing over to PTP workerJakub Kicinski
When reading the timestamp is required bnxt_tx_int() hands over the ownership of the completed skb to the PTP worker. The skb should not be used afterwards, as the worker may run before the rest of our code and free the skb, leading to a use-after-free. Since dev_kfree_skb_any() accepts NULL make the loss of ownership more obvious and set skb to NULL. Fixes: 83bb623c968e ("bnxt_en: Transmit and retrieve packet timestamps") Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20220921201005.335390-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22net: marvell: Fix refcounting bugs in prestera_port_sfp_bind()Liang He
In prestera_port_sfp_bind(), there are two refcounting bugs: (1) we should call of_node_get() before of_find_node_by_name() as it will automaitcally decrease the refcount of 'from' argument; (2) we should call of_node_put() for the break of the iteration for_each_child_of_node() as it will automatically increase and decrease the 'child'. Fixes: 52323ef75414 ("net: marvell: prestera: add phylink support") Signed-off-by: Liang He <windhl@126.com> Reviewed-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Link: https://lore.kernel.org/r/20220921133245.4111672-1-windhl@126.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22net: sunhme: Fix packet reception for len < RX_COPY_THRESHOLDSean Anderson
There is a separate receive path for small packets (under 256 bytes). Instead of allocating a new dma-capable skb to be used for the next packet, this path allocates a skb and copies the data into it (reusing the existing sbk for the next packet). There are two bytes of junk data at the beginning of every packet. I believe these are inserted in order to allow aligned DMA and IP headers. We skip over them using skb_reserve. Before copying over the data, we must use a barrier to ensure we see the whole packet. The current code only synchronizes len bytes, starting from the beginning of the packet, including the junk bytes. However, this leaves off the final two bytes in the packet. Synchronize the whole packet. To reproduce this problem, ping a HME with a payload size between 17 and 214 $ ping -s 17 <hme_address> which will complain rather loudly about the data mismatch. Small packets (below 60 bytes on the wire) do not have this issue. I suspect this is related to the padding added to increase the minimum packet size. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Sean Anderson <seanga2@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220920235018.1675956-1-seanga2@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-09-20 (ice) Michal re-sets TC configuration when changing number of queues. Mateusz moves the check and call for link-down-on-close to the specific path for downing/closing the interface. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: Fix interface being down after reset with link-down-on-close flag on ice: config netdev tc before setting queues number ==================== Link: https://lore.kernel.org/r/20220920205344.1860934-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21ice: Fix ice_xdp_xmit() when XDP TX queue number is not sufficientLarysa Zaremba
The original patch added the static branch to handle the situation, when assigning an XDP TX queue to every CPU is not possible, so they have to be shared. However, in the XDP transmit handler ice_xdp_xmit(), an error was returned in such cases even before static condition was checked, thus making queue sharing still impossible. Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path") Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/r/20220919134346.25030-1-larysa.zaremba@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21Merge branch '40GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-09-19 (iavf, i40e) Norbert adds checking of buffer size for Rx buffer checks in iavf. Michal corrects setting of max MTU in iavf to account for MTU data provided by PF, fixes i40e to set VF max MTU, and resolves lack of rate limiting when value was less than divisor for i40e. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: i40e: Fix set max_tx_rate when it is lower than 1 Mbps i40e: Fix VF set max MTU size iavf: Fix set max MTU size with port VLAN and jumbo frames iavf: Fix bad page state ==================== Link: https://lore.kernel.org/r/20220919223428.572091-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21net: atlantic: fix potential memory leak in aq_ndev_close()Jianglei Nie
If aq_nic_stop() fails, aq_ndev_close() returns err without calling aq_nic_deinit() to release the relevant memory and resource, which will lead to a memory leak. We can fix it by deleting the if condition judgment and goto statement to call aq_nic_deinit() directly after aq_nic_stop() to fix the memory leak. Signed-off-by: Jianglei Nie <niejianglei2021@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-20net: sh_eth: Fix PHY state warning splat during system resumeGeert Uytterhoeven
Since commit 744d23c71af39c7d ("net: phy: Warn about incorrect mdio_bus_phy_resume() state"), a warning splat is printed during system resume with Wake-on-LAN disabled: WARNING: CPU: 0 PID: 626 at drivers/net/phy/phy_device.c:323 mdio_bus_phy_resume+0xbc/0xe4 As the Renesas SuperH Ethernet driver already calls phy_{stop,start}() in its suspend/resume callbacks, it is sufficient to just mark the MAC responsible for managing the power state of the PHY. Fixes: fba863b816049b03 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/c6e1331b9bef61225fa4c09db3ba3e2e7214ba2d.1663598886.git.geert+renesas@glider.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20net: ravb: Fix PHY state warning splat during system resumeGeert Uytterhoeven
Since commit 744d23c71af39c7d ("net: phy: Warn about incorrect mdio_bus_phy_resume() state"), a warning splat is printed during system resume with Wake-on-LAN disabled: WARNING: CPU: 0 PID: 1197 at drivers/net/phy/phy_device.c:323 mdio_bus_phy_resume+0xbc/0xc8 As the Renesas Ethernet AVB driver already calls phy_{stop,start}() in its suspend/resume callbacks, it is sufficient to just mark the MAC responsible for managing the power state of the PHY. Fixes: fba863b816049b03 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/8ec796f47620980fdd0403e21bd8b7200b4fa1d4.1663598796.git.geert+renesas@glider.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20ice: Fix interface being down after reset with link-down-on-close flag onMateusz Palczewski
When performing a reset on ice driver with link-down-on-close flag on interface would always stay down. Fix this by moving a check of this flag to ice_stop() that is called only when user wants to bring interface down. Fixes: ab4ab73fc1ec ("ice: Add ethtool private flag to make forcing link down optional") Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Petr Oros <poros@redhat.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-20ice: config netdev tc before setting queues numberMichal Swiatkowski
After lowering number of tx queues the warning appears: "Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled!" Example command to reproduce: ethtool -L enp24s0f0 tx 36 rx 36 Fix this by setting correct tc mapping before setting real number of queues on netdev. Fixes: 0754d65bd4be5 ("ice: Add infrastructure for mqprio support via ndo_setup_tc") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-20net: enetc: deny offload of tc-based TSN features on VF interfacesVladimir Oltean
TSN features on the ENETC (taprio, cbs, gate, police) are configured through a mix of command BD ring messages and port registers: enetc_port_rd(), enetc_port_wr(). Port registers are a region of the ENETC memory map which are only accessible from the PCIe Physical Function. They are not accessible from the Virtual Functions. Moreover, attempting to access these registers crashes the kernel: $ echo 1 > /sys/bus/pci/devices/0000\:00\:00.0/sriov_numvfs pci 0000:00:01.0: [1957:ef00] type 00 class 0x020001 fsl_enetc_vf 0000:00:01.0: Adding to iommu group 15 fsl_enetc_vf 0000:00:01.0: enabling device (0000 -> 0002) fsl_enetc_vf 0000:00:01.0 eno0vf0: renamed from eth0 $ tc qdisc replace dev eno0vf0 root taprio num_tc 8 map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 base-time 0 \ sched-entry S 0x7f 900000 sched-entry S 0x80 100000 flags 0x2 Unable to handle kernel paging request at virtual address ffff800009551a08 Internal error: Oops: 96000007 [#1] PREEMPT SMP pc : enetc_setup_tc_taprio+0x170/0x47c lr : enetc_setup_tc_taprio+0x16c/0x47c Call trace: enetc_setup_tc_taprio+0x170/0x47c enetc_setup_tc+0x38/0x2dc taprio_change+0x43c/0x970 taprio_init+0x188/0x1e0 qdisc_create+0x114/0x470 tc_modify_qdisc+0x1fc/0x6c0 rtnetlink_rcv_msg+0x12c/0x390 Split enetc_setup_tc() into separate functions for the PF and for the VF drivers. Also remove enetc_qos.o from being included into enetc-vf.ko, since it serves absolutely no purpose there. Fixes: 34c6adf1977b ("enetc: Configure the Time-Aware Scheduler via tc-taprio offload") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220916133209.3351399-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20net: enetc: move enetc_set_psfp() out of the common enetc_set_features()Vladimir Oltean
The VF netdev driver shouldn't respond to changes in the NETIF_F_HW_TC flag; only PFs should. Moreover, TSN-specific code should go to enetc_qos.c, which should not be included in the VF driver. Fixes: 79e499829f3f ("net: enetc: add hw tc hw offload features for PSPF capability") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220916133209.3351399-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20sfc/siena: fix null pointer dereference in efx_hard_start_xmitÍñigo Huguet
Like in previous patch for sfc, prevent potential (but unlikely) NULL pointer dereference. Fixes: 12804793b17c ("sfc: decouple TXQ type from label") Reported-by: Tianhao Zhao <tizhao@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Link: https://lore.kernel.org/r/20220915141958.16458-1-ihuguet@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20sfc/siena: fix TX channel offset when using legacy interruptsÍñigo Huguet
As in previous commit for sfc, fix TX channels offset when efx_siena_separate_tx_channels is false (the default) Fixes: 25bde571b4a8 ("sfc/siena: fix wrong tx channel offset with efx_separate_tx_channels") Reported-by: Tianhao Zhao <tizhao@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Link: https://lore.kernel.org/r/20220915141653.15504-1-ihuguet@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20Revert "net: fec: Use a spinlock to guard `fep->ptp_clk_on`"Francesco Dolcini
This reverts commit b353b241f1eb9b6265358ffbe2632fdcb563354f, this is creating multiple issues, just not ready to be merged yet. Link: https://lore.kernel.org/all/CAHk-=wj1obPoTu1AHj9Bd_BGYjdjDyPP+vT5WMj8eheb3A9WHw@mail.gmail.com/ Link: https://lore.kernel.org/all/20220907143915.5w65kainpykfobte@pengutronix.de/ Fixes: b353b241f1eb ("net: fec: Use a spinlock to guard `fep->ptp_clk_on`") Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20Revert "fec: Restart PPS after link state change"Francesco Dolcini
This reverts commit f79959220fa5fbda939592bf91c7a9ea90419040, this is creating multiple issues, just not ready to be merged yet. Link: https://lore.kernel.org/all/20220905180542.GA3685102@roeck-us.net/ Link: https://lore.kernel.org/all/CAHk-=wj1obPoTu1AHj9Bd_BGYjdjDyPP+vT5WMj8eheb3A9WHw@mail.gmail.com/ Fixes: f79959220fa5 ("fec: Restart PPS after link state change") Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-19gve: Fix GFP flags when allocing pagesShailend Chand
Use GFP_ATOMIC when allocating pages out of the hotpath, continue to use GFP_KERNEL when allocating pages during setup. GFP_KERNEL will allow blocking which allows it to succeed more often in a low memory enviornment but in the hotpath we do not want to allow the allocation to block. Fixes: 9b8dd5e5ea48b ("gve: DQO: Add RX path") Signed-off-by: Shailend Chand <shailend@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Link: https://lore.kernel.org/r/20220913000901.959546-1-jeroendb@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19bnxt_en: fix flags to check for supported fw versionVadim Fedorenko
The warning message of unsupported FW appears every time RX timestamps are disabled on the interface. The patch fixes the flags to correct set for the check. Fixes: 66ed81dcedc6 ("bnxt_en: Enable packet timestamping for all RX packets") Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20220915234932.25497-1-vfedorenko@novek.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19sfc: fix null pointer dereference in efx_hard_start_xmitÍñigo Huguet
Trying to get the channel from the tx_queue variable here is wrong because we can only be here if tx_queue is NULL, so we shouldn't dereference it. As the above comment in the code says, this is very unlikely to happen, but it's wrong anyway so let's fix it. I hit this issue because of a different bug that caused tx_queue to be NULL. If that happens, this is the error message that we get here: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [...] RIP: 0010:efx_hard_start_xmit+0x153/0x170 [sfc] Fixes: 12804793b17c ("sfc: decouple TXQ type from label") Reported-by: Tianhao Zhao <tizhao@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/20220914111135.21038-1-ihuguet@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19sfc: fix TX channel offset when using legacy interruptsÍñigo Huguet
In legacy interrupt mode the tx_channel_offset was hardcoded to 1, but that's not correct if efx_sepparate_tx_channels is false. In that case, the offset is 0 because the tx queues are in the single existing channel at index 0, together with the rx queue. Without this fix, as soon as you try to send any traffic, it tries to get the tx queues from an uninitialized channel getting these errors: WARNING: CPU: 1 PID: 0 at drivers/net/ethernet/sfc/tx.c:540 efx_hard_start_xmit+0x12e/0x170 [sfc] [...] RIP: 0010:efx_hard_start_xmit+0x12e/0x170 [sfc] [...] Call Trace: <IRQ> dev_hard_start_xmit+0xd7/0x230 sch_direct_xmit+0x9f/0x360 __dev_queue_xmit+0x890/0xa40 [...] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [...] RIP: 0010:efx_hard_start_xmit+0x153/0x170 [sfc] [...] Call Trace: <IRQ> dev_hard_start_xmit+0xd7/0x230 sch_direct_xmit+0x9f/0x360 __dev_queue_xmit+0x890/0xa40 [...] Fixes: c308dfd1b43e ("sfc: fix wrong tx channel offset with efx_separate_tx_channels") Reported-by: Tianhao Zhao <tizhao@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/20220914103648.16902-1-ihuguet@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19net: ethernet: mtk_eth_soc: enable XDP support just for MT7986 SoCLorenzo Bianconi
Disable page_pool/XDP support for MT7621 SoC in order fix a regression introduce adding XDP for MT7986 SoC. There is no a real use case for XDP on MT7621 since it is a low-end cpu. Moreover this patch reduces the memory footprint. Tested-by: Sergio Paracuellos <sergio.paracuellos@gmail.com> Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com> Fixes: 23233e577ef9 ("net: ethernet: mtk_eth_soc: rely on page_pool for single page buffers") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/2bf31e27b888c43228b0d84dd2ef5033338269e2.1663074002.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19net: mana: Add rmb after checking owner bitsHaiyang Zhang
Per GDMA spec, rmb is necessary after checking owner_bits, before reading EQ or CQ entries. Add rmb in these two places to comply with the specs. Cc: stable@vger.kernel.org Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Reported-by: Sinan Kaya <Sinan.Kaya@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1662928805-15861-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19i40e: Fix set max_tx_rate when it is lower than 1 MbpsMichal Jaron
While converting max_tx_rate from bytes to Mbps, this value was set to 0, if the original value was lower than 125000 bytes (1 Mbps). This would cause no transmission rate limiting to occur. This happened due to lack of check of max_tx_rate against the 1 Mbps value for max_tx_rate and the following division by 125000. Fix this issue by adding a helper i40e_bw_bytes_to_mbits() which sets max_tx_rate to minimum usable value of 50 Mbps, if its value is less than 1 Mbps, otherwise do the required conversion by dividing by 125000. Fixes: 5ecae4120a6b ("i40e: Refactor VF BW rate limiting") Signed-off-by: Michal Jaron <michalx.jaron@intel.com> Signed-off-by: Andrii Staikov <andrii.staikov@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-19i40e: Fix VF set max MTU sizeMichal Jaron
Max MTU sent to VF is set to 0 during memory allocation. It cause that max MTU on VF is changed to IAVF_MAX_RXBUFFER and does not depend on data from HW. Set max_mtu field in virtchnl_vf_resource struct to inform VF in GET_VF_RESOURCES msg what size should be max frame. Fixes: dab86afdbbd1 ("i40e/i40evf: Change the way we limit the maximum frame size for Rx") Signed-off-by: Michal Jaron <michalx.jaron@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-19iavf: Fix set max MTU size with port VLAN and jumbo framesMichal Jaron
After setting port VLAN and MTU to 9000 on VF with ice driver there was an iavf error "PF returned error -5 (IAVF_ERR_PARAM) to our request 6". During queue configuration, VF's max packet size was set to IAVF_MAX_RXBUFFER but on ice max frame size was smaller by VLAN_HLEN due to making some space for port VLAN as VF is not aware whether it's in a port VLAN. This mismatch in sizes caused ice to reject queue configuration with ERR_PARAM error. Proper max_mtu is sent from ice PF to VF with GET_VF_RESOURCES msg but VF does not look at this. In iavf change max_frame from IAVF_MAX_RXBUFFER to max_mtu received from pf with GET_VF_RESOURCES msg to make vf's max_frame_size dependent from pf. Add check if received max_mtu is not in eligible range then set it to IAVF_MAX_RXBUFFER. Fixes: dab86afdbbd1 ("i40e/i40evf: Change the way we limit the maximum frame size for Rx") Signed-off-by: Michal Jaron <michalx.jaron@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-19mlxbf_gige: clear MDIO gateway lock after readDavid Thompson
The MDIO gateway (GW) lock in BlueField-2 GIGE logic is set after read. This patch adds logic to make sure the lock is always cleared at the end of each MDIO transaction. Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com> Signed-off-by: David Thompson <davthompson@nvidia.com> Link: https://lore.kernel.org/r/20220902164247.19862-1-davthompson@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19iavf: Fix bad page stateNorbert Zulinski
Fix bad page state, free inappropriate page in handling dummy descriptor. iavf_build_skb now has to check not only if rx_buffer is NULL but also if size is zero, same thing in iavf_clean_rx_irq. Without this patch driver would free page that will be used by napi_build_skb. Fixes: a9f49e006030 ("iavf: Fix handling of dummy receive descriptors") Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-16Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-09-08 (ice, iavf) This series contains updates to ice and iavf drivers. Dave removes extra unplug of auxiliary bus on reset which caused a scheduling while atomic to be reported for ice. Ding Hui defers setting of queues for TCs to ensure valid configuration and restores old config if invalid for ice. Sylwester fixes a check of setting MAC address to occur after result is received from PF for iavf driver. Brett changes check of ring tail to use software cached value as not all devices have access to register tail for iavf driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-16net: marvell: prestera: add support for for Aldrin2Oleksandr Mazur
Aldrin2 (98DX8525) is a Marvell Prestera PP, with 100G support. Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> V2: - retarget to net tree instead of net-next; - fix missed colon in patch subject ('net marvell' vs 'net: mavell'); Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-12Merge tag 'hyperv-fixes-signed-20220912' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Fix an error handling issue in DRM driver (Christophe JAILLET) - Fix some issues in framebuffer driver (Vitaly Kuznetsov) - Two typo fixes (Jason Wang, Shaomin Deng) - Drop unnecessary casting in kvp tool (Zhou Jie) * tag 'hyperv-fixes-signed-20220912' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Drivers: hv: Never allocate anything besides framebuffer from framebuffer memory region Drivers: hv: Always reserve framebuffer region for Gen1 VMs PCI: Move PCI_VENDOR_ID_MICROSOFT/PCI_DEVICE_ID_HYPERV_VIDEO definitions to pci_ids.h tools: hv: kvp: remove unnecessary (void*) conversions Drivers: hv: remove duplicate word in a comment tools: hv: Remove an extraneous "the" drm/hyperv: Fix an error handling path in hyperv_vmbus_probe()
2022-09-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Many bug fixes in several drivers: - Fix misuse of the DMA API in rtrs - Several irdma issues: hung task due to SQ flushing, incorrect capability reporting to userspace, improper error handling for MW corners, touching an uninitialized SGL for during invalidation. - hns was using the wrong page size limits for the HW, an incorrect calculation of wqe_shift causing WQE corruption, and mis computed a timer id. - Fix a crash in SRP triggered by blktests - Fix compiler errors by calling virt_to_page() with the proper type in siw - Userspace triggerable deadlock in ODP - mlx5 could use the wrong profile due to some driver loading races, counters were not working in some device configurations, and a crash on error unwind" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/irdma: Report RNR NAK generation in device caps RDMA/irdma: Use s/g array in post send only when its valid RDMA/irdma: Return correct WC error for bind operation failure RDMA/irdma: Return error on MR deregister CQP failure RDMA/irdma: Report the correct max cqes from query device MAINTAINERS: Update maintainers of HiSilicon RoCE RDMA/mlx5: Fix UMR cleanup on error flow of driver init RDMA/mlx5: Set local port to one when accessing counters RDMA/mlx5: Rely on RoCE fw cap instead of devlink when setting profile IB/core: Fix a nested dead lock as part of ODP flow RDMA/siw: Pass a pointer to virt_to_page() RDMA/srp: Set scmnd->result only when scmnd is not NULL RDMA/hns: Remove the num_qpc_timer variable RDMA/hns: Fix wrong fixed value of qp->rq.wqe_shift RDMA/hns: Fix supported page size RDMA/cma: Fix arguments order in net device validation RDMA/irdma: Fix drain SQ hang with no completion RDMA/rtrs-srv: Pass the correct number of entries for dma mapped SGL RDMA/rtrs-clt: Use the right sg_cnt after ib_dma_map_sg
2022-09-08iavf: Fix cached head and tail value for iavf_get_tx_pendingBrett Creeley
The underlying hardware may or may not allow reading of the head or tail registers and it really makes no difference if we use the software cached values. So, always used the software cached values. Fixes: 9c6c12595b73 ("i40e: Detection and recovery of TX queue hung logic moved to service_task from tx_timeout") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Co-developed-by: Norbert Zulinski <norbertx.zulinski@intel.com> Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-08iavf: Fix change VF's mac addressSylwester Dziedziuch
Previously changing mac address gives false negative because ip link set <interface> address <MAC> return with RTNLINK: Permission denied. In iavf_set_mac was check if PF handled our mac set request, even before filter was added to list. Because this check returns always true and it never waits for PF's response. Move iavf_is_mac_handled to wait_event_interruptible_timeout instead of false. Now it will wait for PF's response and then check if address was added or rejected. Fixes: 35a2443d0910 ("iavf: Add waiting for response from PF in set mac") Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Co-developed-by: Norbert Zulinski <norbertx.zulinski@intel.com> Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-08ice: Fix crash by keep old cfg when update TCs more than queuesDing Hui
There are problems if allocated queues less than Traffic Classes. Commit a632b2a4c920 ("ice: ethtool: Prohibit improper channel config for DCB") already disallow setting less queues than TCs. Another case is if we first set less queues, and later update more TCs config due to LLDP, ice_vsi_cfg_tc() will failed but left dirty num_txq/rxq and tc_cfg in vsi, that will cause invalid pointer access. [ 95.968089] ice 0000:3b:00.1: More TCs defined than queues/rings allocated. [ 95.968092] ice 0000:3b:00.1: Trying to use more Rx queues (8), than were allocated (1)! [ 95.968093] ice 0000:3b:00.1: Failed to config TC for VSI index: 0 [ 95.969621] general protection fault: 0000 [#1] SMP NOPTI [ 95.969705] CPU: 1 PID: 58405 Comm: lldpad Kdump: loaded Tainted: G U W O --------- -t - 4.18.0 #1 [ 95.969867] Hardware name: O.E.M/BC11SPSCB10, BIOS 8.23 12/30/2021 [ 95.969992] RIP: 0010:devm_kmalloc+0xa/0x60 [ 95.970052] Code: 5c ff ff ff 31 c0 5b 5d 41 5c c3 b8 f4 ff ff ff eb f4 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 89 d1 <8b> 97 60 02 00 00 48 8d 7e 18 48 39 f7 72 3f 55 89 ce 53 48 8b 4c [ 95.970344] RSP: 0018:ffffc9003f553888 EFLAGS: 00010206 [ 95.970425] RAX: dead000000000200 RBX: ffffea003c425b00 RCX: 00000000006080c0 [ 95.970536] RDX: 00000000006080c0 RSI: 0000000000000200 RDI: dead000000000200 [ 95.970648] RBP: dead000000000200 R08: 00000000000463c0 R09: ffff888ffa900000 [ 95.970760] R10: 0000000000000000 R11: 0000000000000002 R12: ffff888ff6b40100 [ 95.970870] R13: ffff888ff6a55018 R14: 0000000000000000 R15: ffff888ff6a55460 [ 95.970981] FS: 00007f51b7d24700(0000) GS:ffff88903ee80000(0000) knlGS:0000000000000000 [ 95.971108] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 95.971197] CR2: 00007fac5410d710 CR3: 0000000f2c1de002 CR4: 00000000007606e0 [ 95.971309] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 95.971419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 95.971530] PKRU: 55555554 [ 95.971573] Call Trace: [ 95.971622] ice_setup_rx_ring+0x39/0x110 [ice] [ 95.971695] ice_vsi_setup_rx_rings+0x54/0x90 [ice] [ 95.971774] ice_vsi_open+0x25/0x120 [ice] [ 95.971843] ice_open_internal+0xb8/0x1f0 [ice] [ 95.971919] ice_ena_vsi+0x4f/0xd0 [ice] [ 95.971987] ice_dcb_ena_dis_vsi.constprop.5+0x29/0x90 [ice] [ 95.972082] ice_pf_dcb_cfg+0x29a/0x380 [ice] [ 95.972154] ice_dcbnl_setets+0x174/0x1b0 [ice] [ 95.972220] dcbnl_ieee_set+0x89/0x230 [ 95.972279] ? dcbnl_ieee_del+0x150/0x150 [ 95.972341] dcb_doit+0x124/0x1b0 [ 95.972392] rtnetlink_rcv_msg+0x243/0x2f0 [ 95.972457] ? dcb_doit+0x14d/0x1b0 [ 95.972510] ? __kmalloc_node_track_caller+0x1d3/0x280 [ 95.972591] ? rtnl_calcit.isra.31+0x100/0x100 [ 95.972661] netlink_rcv_skb+0xcf/0xf0 [ 95.972720] netlink_unicast+0x16d/0x220 [ 95.972781] netlink_sendmsg+0x2ba/0x3a0 [ 95.975891] sock_sendmsg+0x4c/0x50 [ 95.979032] ___sys_sendmsg+0x2e4/0x300 [ 95.982147] ? kmem_cache_alloc+0x13e/0x190 [ 95.985242] ? __wake_up_common_lock+0x79/0x90 [ 95.988338] ? __check_object_size+0xac/0x1b0 [ 95.991440] ? _copy_to_user+0x22/0x30 [ 95.994539] ? move_addr_to_user+0xbb/0xd0 [ 95.997619] ? __sys_sendmsg+0x53/0x80 [ 96.000664] __sys_sendmsg+0x53/0x80 [ 96.003747] do_syscall_64+0x5b/0x1d0 [ 96.006862] entry_SYSCALL_64_after_hwframe+0x65/0xca Only update num_txq/rxq when passed check, and restore tc_cfg if setup queue map failed. Fixes: a632b2a4c920 ("ice: ethtool: Prohibit improper channel config for DCB") Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> Reviewed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-08ice: Don't double unplug aux on peer initiated resetDave Ertman
In the IDC callback that is accessed when the aux drivers request a reset, the function to unplug the aux devices is called. This function is also called in the ice_prepare_for_reset function. This double call is causing a "scheduling while atomic" BUG. [ 662.676430] ice 0000:4c:00.0 rocep76s0: cqp opcode = 0x1 maj_err_code = 0xffff min_err_code = 0x8003 [ 662.676609] ice 0000:4c:00.0 rocep76s0: [Modify QP Cmd Error][op_code=8] status=-29 waiting=1 completion_err=1 maj=0xffff min=0x8003 [ 662.815006] ice 0000:4c:00.0 rocep76s0: ICE OICR event notification: oicr = 0x10000003 [ 662.815014] ice 0000:4c:00.0 rocep76s0: critical PE Error, GLPE_CRITERR=0x00011424 [ 662.815017] ice 0000:4c:00.0 rocep76s0: Requesting a reset [ 662.815475] BUG: scheduling while atomic: swapper/37/0/0x00010002 [ 662.815475] BUG: scheduling while atomic: swapper/37/0/0x00010002 [ 662.815477] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill 8021q garp mrp stp llc vfat fat rpcrdma intel_rapl_msr intel_rapl_common sunrpc i10nm_edac rdma_ucm nfit ib_srpt libnvdimm ib_isert iscsi_target_mod x86_pkg_temp_thermal intel_powerclamp coretemp target_core_mod snd_hda_intel ib_iser snd_intel_dspcfg libiscsi snd_intel_sdw_acpi scsi_transport_iscsi kvm_intel iTCO_wdt rdma_cm snd_hda_codec kvm iw_cm ipmi_ssif iTCO_vendor_support snd_hda_core irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hwdep snd_seq snd_seq_device rapl snd_pcm snd_timer isst_if_mbox_pci pcspkr isst_if_mmio irdma intel_uncore idxd acpi_ipmi joydev isst_if_common snd mei_me idxd_bus ipmi_si soundcore i2c_i801 mei ipmi_devintf i2c_smbus i2c_ismt ipmi_msghandler acpi_power_meter acpi_pad rv(OE) ib_uverbs ib_cm ib_core xfs libcrc32c ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helpe r ttm [ 662.815546] nvme nvme_core ice drm crc32c_intel i40e t10_pi wmi pinctrl_emmitsburg dm_mirror dm_region_hash dm_log dm_mod fuse [ 662.815557] Preemption disabled at: [ 662.815558] [<0000000000000000>] 0x0 [ 662.815563] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: G S OE 5.17.1 #2 [ 662.815566] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS SE5C6301.86B.6624.D18.2111021741 11/02/2021 [ 662.815568] Call Trace: [ 662.815572] <IRQ> [ 662.815574] dump_stack_lvl+0x33/0x42 [ 662.815581] __schedule_bug.cold.147+0x7d/0x8a [ 662.815588] __schedule+0x798/0x990 [ 662.815595] schedule+0x44/0xc0 [ 662.815597] schedule_preempt_disabled+0x14/0x20 [ 662.815600] __mutex_lock.isra.11+0x46c/0x490 [ 662.815603] ? __ibdev_printk+0x76/0xc0 [ib_core] [ 662.815633] device_del+0x37/0x3d0 [ 662.815639] ice_unplug_aux_dev+0x1a/0x40 [ice] [ 662.815674] ice_schedule_reset+0x3c/0xd0 [ice] [ 662.815693] irdma_iidc_event_handler.cold.7+0xb6/0xd3 [irdma] [ 662.815712] ? bitmap_find_next_zero_area_off+0x45/0xa0 [ 662.815719] ice_send_event_to_aux+0x54/0x70 [ice] [ 662.815741] ice_misc_intr+0x21d/0x2d0 [ice] [ 662.815756] __handle_irq_event_percpu+0x4c/0x180 [ 662.815762] handle_irq_event_percpu+0xf/0x40 [ 662.815764] handle_irq_event+0x34/0x60 [ 662.815766] handle_edge_irq+0x9a/0x1c0 [ 662.815770] __common_interrupt+0x62/0x100 [ 662.815774] common_interrupt+0xb4/0xd0 [ 662.815779] </IRQ> [ 662.815780] <TASK> [ 662.815780] asm_common_interrupt+0x1e/0x40 [ 662.815785] RIP: 0010:cpuidle_enter_state+0xd6/0x380 [ 662.815789] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 65 d7 95 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 64 02 00 00 31 ff e8 ae c5 9c ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49 [ 662.815791] RSP: 0018:ff2c2c4f18edbe80 EFLAGS: 00000202 [ 662.815793] RAX: ff280805df140000 RBX: 0000000000000002 RCX: 000000000000001f [ 662.815795] RDX: 0000009a52da2d08 RSI: ffffffff93f8240b RDI: ffffffff93f53ee7 [ 662.815796] RBP: ff5e2bd11ff41928 R08: 0000000000000000 R09: 000000000002f8c0 [ 662.815797] R10: 0000010c3f18e2cf R11: 000000000000000f R12: 0000009a52da2d08 [ 662.815798] R13: ffffffff94ad7e20 R14: 0000000000000002 R15: 0000000000000000 [ 662.815801] cpuidle_enter+0x29/0x40 [ 662.815803] do_idle+0x261/0x2b0 [ 662.815807] cpu_startup_entry+0x19/0x20 [ 662.815809] start_secondary+0x114/0x150 [ 662.815813] secondary_startup_64_no_verify+0xd5/0xdb [ 662.815818] </TASK> [ 662.815846] bad: scheduling from the idle thread! [ 662.815849] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: G S W OE 5.17.1 #2 [ 662.815852] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS SE5C6301.86B.6624.D18.2111021741 11/02/2021 [ 662.815853] Call Trace: [ 662.815855] <IRQ> [ 662.815856] dump_stack_lvl+0x33/0x42 [ 662.815860] dequeue_task_idle+0x20/0x30 [ 662.815863] __schedule+0x1c3/0x990 [ 662.815868] schedule+0x44/0xc0 [ 662.815871] schedule_preempt_disabled+0x14/0x20 [ 662.815873] __mutex_lock.isra.11+0x3a8/0x490 [ 662.815876] ? __ibdev_printk+0x76/0xc0 [ib_core] [ 662.815904] device_del+0x37/0x3d0 [ 662.815909] ice_unplug_aux_dev+0x1a/0x40 [ice] [ 662.815937] ice_schedule_reset+0x3c/0xd0 [ice] [ 662.815961] irdma_iidc_event_handler.cold.7+0xb6/0xd3 [irdma] [ 662.815979] ? bitmap_find_next_zero_area_off+0x45/0xa0 [ 662.815985] ice_send_event_to_aux+0x54/0x70 [ice] [ 662.816011] ice_misc_intr+0x21d/0x2d0 [ice] [ 662.816033] __handle_irq_event_percpu+0x4c/0x180 [ 662.816037] handle_irq_event_percpu+0xf/0x40 [ 662.816039] handle_irq_event+0x34/0x60 [ 662.816042] handle_edge_irq+0x9a/0x1c0 [ 662.816045] __common_interrupt+0x62/0x100 [ 662.816048] common_interrupt+0xb4/0xd0 [ 662.816052] </IRQ> [ 662.816053] <TASK> [ 662.816054] asm_common_interrupt+0x1e/0x40 [ 662.816057] RIP: 0010:cpuidle_enter_state+0xd6/0x380 [ 662.816060] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 65 d7 95 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 64 02 00 00 31 ff e8 ae c5 9c ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49 [ 662.816063] RSP: 0018:ff2c2c4f18edbe80 EFLAGS: 00000202 [ 662.816065] RAX: ff280805df140000 RBX: 0000000000000002 RCX: 000000000000001f [ 662.816067] RDX: 0000009a52da2d08 RSI: ffffffff93f8240b RDI: ffffffff93f53ee7 [ 662.816068] RBP: ff5e2bd11ff41928 R08: 0000000000000000 R09: 000000000002f8c0 [ 662.816070] R10: 0000010c3f18e2cf R11: 000000000000000f R12: 0000009a52da2d08 [ 662.816071] R13: ffffffff94ad7e20 R14: 0000000000000002 R15: 0000000000000000 [ 662.816075] cpuidle_enter+0x29/0x40 [ 662.816077] do_idle+0x261/0x2b0 [ 662.816080] cpu_startup_entry+0x19/0x20 [ 662.816083] start_secondary+0x114/0x150 [ 662.816087] secondary_startup_64_no_verify+0xd5/0xdb [ 662.816091] </TASK> [ 662.816169] bad: scheduling from the idle thread! The correct place to unplug the aux devices for a reset is in the prepare_for_reset function, as this is a common place for all reset flows. It also has built in protection from being called twice in a single reset instance before the aux devices are replugged. Fixes: f9f5301e7e2d4 ("ice: Register auxiliary device to provide RDMA") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Helena Anna Dubel <helena.anna.dubel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-07net: ethernet: mtk_eth_soc: check max allowed hash in mtk_ppe_check_skbLorenzo Bianconi
Even if max hash configured in hw in mtk_ppe_hash_entry is MTK_PPE_ENTRIES - 1, check theoretical OOB accesses in mtk_ppe_check_skb routine Fixes: c4f033d9e03e9 ("net: ethernet: mtk_eth_soc: rework hardware flow table management") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-07net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clearLorenzo Bianconi
Set ib1 state to MTK_FOE_STATE_UNBIND in __mtk_foe_entry_clear routine. Fixes: 33fc42de33278 ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-05PCI: Move PCI_VENDOR_ID_MICROSOFT/PCI_DEVICE_ID_HYPERV_VIDEO definitions to ↵Vitaly Kuznetsov
pci_ids.h There are already three places in kernel which define PCI_VENDOR_ID_MICROSOFT and two for PCI_DEVICE_ID_HYPERV_VIDEO and there's a need to use these from core VMBus code. Move the defines where they belong. No functional change. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_ids.h Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20220827130345.1320254-2-vkuznets@redhat.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-09-05stmmac: intel: Simplify intel_eth_pci_remove()Christophe JAILLET
There is no point to call pcim_iounmap_regions() in the remove function, this frees a managed resource that would be release by the framework anyway. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-05net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()Greg Kroah-Hartman
When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. Fix this up to be much simpler logic and only create the root debugfs directory once when the driver is first accessed. That resolves the memory leak and makes things more obvious as to what the intent is. Cc: Marcin Wojtas <mw@semihalf.com> Cc: Russell King <linux@armlinux.org.uk> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Cc: stable <stable@kernel.org> Fixes: 21da57a23125 ("net: mvpp2: add a debugfs interface for the Header Parser") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-05RDMA/mlx5: Rely on RoCE fw cap instead of devlink when setting profileMaher Sanalla
When the RDMA auxiliary driver probes, it sets its profile based on devlink driverinit value. The latter might not be in sync with FW yet (In case devlink reload is not performed), thus causing a mismatch between RDMA driver and FW. This results in the following FW syndrome when the RDMA driver tries to adjust RoCE state, which fails the probe: "0xC1F678 | modify_nic_vport_context: roce_en set on a vport that doesn't support roce" To prevent this, select the PF profile based on FW RoCE capability instead of relying on devlink driverinit value. To provide backward compatibility of the RoCE disable feature, on older FW's where roce_rw is not set (FW RoCE capability is read-only), keep the current behavior e.g., rely on devlink driverinit value. Fixes: fbfa97b4d79f ("net/mlx5: Disable roce at HCA level") Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Link: https://lore.kernel.org/r/cb34ce9a1df4a24c135cb804db87f7d2418bd6cc.1661763459.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-03Merge branch '40GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-09-02 (i40e, iavf) This series contains updates to i40e and iavf drivers. Przemyslaw adds reset to ADQ configuration to allow for setting of rate limit beyond TC0 for i40e. Ivan Vecera does not free client on failure to open which could cause NULL pointer dereference to occur on i40e. He also detaches device during reset to prevent NDO calls with could cause races for iavf. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-03Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== This series contains updates to ice driver only. Przemyslaw fixes memory leak of DMA memory due to incorrect freeing of rx_buf. Michal S corrects incorrect call to free memory. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-02net: fec: Use a spinlock to guard `fep->ptp_clk_on`Csókás Bence
Mutexes cannot be taken in a non-preemptible context, causing a panic in `fec_ptp_save_state()`. Replacing `ptp_clk_mutex` by `tmreg_lock` fixes this. Fixes: 6a4d7234ae9a ("net: fec: ptp: avoid register access when ipg clock is disabled") Fixes: f79959220fa5 ("fec: Restart PPS after link state change") Reported-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/all/20220827160922.642zlcd5foopozru@pengutronix.de/ Signed-off-by: Csókás Bence <csokas.bence@prolan.hu> Tested-by: Francesco Dolcini <francesco.dolcini@toradex.com> # Toradex Apalis iMX6 Link: https://lore.kernel.org/r/20220901140402.64804-1-csokas.bence@prolan.hu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-02net: fec: add pm_qos support on imx6q platformWei Fang
There is a very low probability that tx timeout will occur during suspend and resume stress test on imx6q platform. So we add pm_qos support to prevent system from entering low level idles which may affect the transmission of tx. Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://lore.kernel.org/r/20220830070148.2021947-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-02iavf: Detach device during reset taskIvan Vecera
iavf_reset_task() takes crit_lock at the beginning and holds it during whole call. The function subsequently calls iavf_init_interrupt_scheme() that grabs RTNL. Problem occurs when userspace initiates during the reset task any ndo callback that runs under RTNL like iavf_open() because some of that functions tries to take crit_lock. This leads to classic A-B B-A deadlock scenario. To resolve this situation the device should be detached in iavf_reset_task() prior taking crit_lock to avoid subsequent ndos running under RTNL and reattach the device at the end. Fixes: 62fe2a865e6d ("i40evf: add missing rtnl_lock() around i40evf_set_interrupt_capability") Cc: Jacob Keller <jacob.e.keller@intel.com> Cc: Patryk Piotrowski <patryk.piotrowski@intel.com> Cc: SlawomirX Laba <slawomirx.laba@intel.com> Tested-by: Vitaly Grinberg <vgrinber@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-02i40e: Fix kernel crash during module removalIvan Vecera
The driver incorrectly frees client instance and subsequent i40e module removal leads to kernel crash. Reproducer: 1. Do ethtool offline test followed immediately by another one host# ethtool -t eth0 offline; ethtool -t eth0 offline 2. Remove recursively irdma module that also removes i40e module host# modprobe -r irdma Result: [ 8675.035651] i40e 0000:3d:00.0 eno1: offline testing starting [ 8675.193774] i40e 0000:3d:00.0 eno1: testing finished [ 8675.201316] i40e 0000:3d:00.0 eno1: offline testing starting [ 8675.358921] i40e 0000:3d:00.0 eno1: testing finished [ 8675.496921] i40e 0000:3d:00.0: IRDMA hardware initialization FAILED init_state=2 status=-110 [ 8686.188955] i40e 0000:3d:00.1: i40e_ptp_stop: removed PHC on eno2 [ 8686.943890] i40e 0000:3d:00.1: Deleted LAN device PF1 bus=0x3d dev=0x00 func=0x01 [ 8686.952669] i40e 0000:3d:00.0: i40e_ptp_stop: removed PHC on eno1 [ 8687.761787] BUG: kernel NULL pointer dereference, address: 0000000000000030 [ 8687.768755] #PF: supervisor read access in kernel mode [ 8687.773895] #PF: error_code(0x0000) - not-present page [ 8687.779034] PGD 0 P4D 0 [ 8687.781575] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 8687.785935] CPU: 51 PID: 172891 Comm: rmmod Kdump: loaded Tainted: G W I 5.19.0+ #2 [ 8687.794800] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0X.02.0001.051420190324 05/14/2019 [ 8687.805222] RIP: 0010:i40e_lan_del_device+0x13/0xb0 [i40e] [ 8687.810719] Code: d4 84 c0 0f 84 b8 25 01 00 e9 9c 25 01 00 41 bc f4 ff ff ff eb 91 90 0f 1f 44 00 00 41 54 55 53 48 8b 87 58 08 00 00 48 89 fb <48> 8b 68 30 48 89 ef e8 21 8a 0f d5 48 89 ef e8 a9 78 0f d5 48 8b [ 8687.829462] RSP: 0018:ffffa604072efce0 EFLAGS: 00010202 [ 8687.834689] RAX: 0000000000000000 RBX: ffff8f43833b2000 RCX: 0000000000000000 [ 8687.841821] RDX: 0000000000000000 RSI: ffff8f4b0545b298 RDI: ffff8f43833b2000 [ 8687.848955] RBP: ffff8f43833b2000 R08: 0000000000000001 R09: 0000000000000000 [ 8687.856086] R10: 0000000000000000 R11: 000ffffffffff000 R12: ffff8f43833b2ef0 [ 8687.863218] R13: ffff8f43833b2ef0 R14: ffff915103966000 R15: ffff8f43833b2008 [ 8687.870342] FS: 00007f79501c3740(0000) GS:ffff8f4adffc0000(0000) knlGS:0000000000000000 [ 8687.878427] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8687.884174] CR2: 0000000000000030 CR3: 000000014276e004 CR4: 00000000007706e0 [ 8687.891306] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 8687.898441] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 8687.905572] PKRU: 55555554 [ 8687.908286] Call Trace: [ 8687.910737] <TASK> [ 8687.912843] i40e_remove+0x2c0/0x330 [i40e] [ 8687.917040] pci_device_remove+0x33/0xa0 [ 8687.920962] device_release_driver_internal+0x1aa/0x230 [ 8687.926188] driver_detach+0x44/0x90 [ 8687.929770] bus_remove_driver+0x55/0xe0 [ 8687.933693] pci_unregister_driver+0x2a/0xb0 [ 8687.937967] i40e_exit_module+0xc/0xf48 [i40e] Two offline tests cause IRDMA driver failure (ETIMEDOUT) and this failure is indicated back to i40e_client_subtask() that calls i40e_client_del_instance() to free client instance referenced by pf->cinst and sets this pointer to NULL. During the module removal i40e_remove() calls i40e_lan_del_device() that dereferences pf->cinst that is NULL -> crash. Do not remove client instance when client open callbacks fails and just clear __I40E_CLIENT_INSTANCE_OPENED bit. The driver also needs to take care about this situation (when netdev is up and client is NOT opened) in i40e_notify_client_of_netdev_close() and calls client close callback only when __I40E_CLIENT_INSTANCE_OPENED is set. Fixes: 0ef2d5afb12d ("i40e: KISS the client interface") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Helena Anna Dubel <helena.anna.dubel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-02i40e: Fix ADQ rate limiting for PFPrzemyslaw Patynowski
Fix HW rate limiting for ADQ. Fallback to kernel queue selection for ADQ, as it is network stack that decides which queue to use for transmit with ADQ configured. Reset PF after creation of VMDq2 VSIs required for ADQ, as to reprogram TX queue contexts in i40e_configure_tx_ring. Without this patch PF would limit TX rate only according to TC0. Fixes: a9ce82f744dc ("i40e: Enable 'channel' mode in mqprio for TC configs") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>