summaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)Author
2024-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 35d92abfbad8 ("net: hns3: fix kernel crash when devlink reload during initialization") 2a1a1a7b5fd7 ("net: hns3: add command queue trace for hns3") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08Merge tag 'wireless-next-2024-05-08' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.10 The third, and most likely the last, "new features" pull request for v6.10 with changes both in stack and in drivers. In ath12k and rtw89 we disabled Wireless Extensions just like with iwlwifi earlier. Wi-Fi 7 devices will not support Wireless Extensions (WEXT) anymore so if someone is still using the legacy WEXT interface it's time to switch to nl80211 now! We merged wireless into wireless-next as we decided not to send a wireless pull request to v6.9 this late in the cycle. Also an immutable branch with MHI subsystem was merged to get ath11k and ath12k hibernation working. Major changes: mac80211/cfg80211 * handle color change per link mt76 * mt7921 LED control * mt7925 EHT radiotap support * mt7920e PCI support ath12k * debugfs support * dfs_simulate_radar debugfs file * disable Wireless Extensions * suspend and hibernation support * ACPI support * refactoring in preparation of multi-link support ath11k * support hibernation (required changes in qrtr and MHI subsystems) * ieee80211-freq-limit Device Tree property support ath10k * firmware-name Device Tree property support rtw89 * complete features of new WiFi 7 chip 8922AE including BT-coexistence and WoWLAN * use BIOS ACPI settings to set TX power and channels * disable Wireless Extensios on Wi-Fi 7 devices iwlwifi * block_esr debugfs file * support again firmware API 90 (was reverted earlier) * provide channel survey information for Automatic Channel Selection (ACS) * tag 'wireless-next-2024-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (214 commits) wifi: mwl8k: initialize cmd->addr[] properly wifi: iwlwifi: Ensure prph_mac dump includes all addresses wifi: iwlwifi: mvm: don't request statistics in restart wifi: iwlwifi: mvm: exit EMLSR if secondary link is not used wifi: iwlwifi: mvm: add beacon template version 14 wifi: iwlwifi: mvm: align UATS naming with firmware wifi: iwlwifi: Force SCU_ACTIVE for specific platforms wifi: iwlwifi: mvm: record and return channel survey information wifi: iwlwifi: mvm: add the firmware API for channel survey wifi: iwlwifi: mvm: Fix race in scan completion wifi: iwlwifi: mvm: Add a print for invalid link pair due to bandwidth wifi: iwlwifi: mvm: add a debugfs for reading EMLSR blocking reasons wifi: iwlwifi: mvm: Add active EMLSR blocking reasons prints wifi: iwlwifi: bump FW API to 90 for BZ/SC devices wifi: iwlwifi: mvm: fix primary link setting wifi: iwlwifi: mvm: use already determined cmd_id wifi: iwlwifi: mvm: don't reset link selection during restart wifi: iwlwifi: Print EMLSR states name wifi: iwlwifi: mvm: Block EMLSR when a p2p/softAP vif is active wifi: iwlwifi: mvm: fix typo in debug print ... ==================== Link: https://lore.kernel.org/r/20240508120726.85A10C113CC@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08net: dst_cache: annotate data-races around dst_cache->reset_tsEric Dumazet
dst_cache->reset_ts is read or written locklessly, add READ_ONCE() and WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240507132000.614591-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08net: dsa: add support switches global DSCP priority mappingOleksij Rempel
Some switches like Microchip KSZ variants do not support per port DSCP priority configuration. Instead there is a global DSCP mapping table. To handle it, we will accept set/del request to any of user ports to make global configuration and update dcb app entries for all other ports. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08net: add IEEE 802.1q specific helpersOleksij Rempel
IEEE 802.1q specification provides recommendation and examples which can be used as good default values for different drivers. This patch implements mapping examples documented in IEEE 802.1Q-2022 in Annex I "I.3 Traffic type to traffic class mapping" and IETF DSCP naming and mapping DSCP to Traffic Type inspired by RFC8325. This helpers will be used in followup patches for dsa/microchip DCB implementation. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08net: dsa: add support for DCB get/set apptrust configurationOleksij Rempel
Add DCB support to get/set trust configuration for different packet priority information sources. Some switch allow to chose different source of packet priority classification. For example on KSZ switches it is possible to configure VLAN PCP and/or DSCP sources. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08xsk: use generic DMA sync shortcut instead of a custom oneAlexander Lobakin
XSk infra's been using its own DMA sync shortcut to try avoiding redundant function calls. Now that there is a generic one, remove the custom implementation and rely on the generic helpers. xsk_buff_dma_sync_for_cpu() doesn't need the second argument anymore, remove it. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-05-07page_pool: don't use driver-set flags field directlyAlexander Lobakin
page_pool::p is driver-defined params, copied directly from the structure passed to page_pool_create(). The structure isn't meant to be modified by the Page Pool core code and this even might look confusing[0][1]. In order to be able to alter some flags, let's define our own, internal fields the same way as the already existing one (::has_init_callback). They are defined as bits in the driver-set params, leave them so here as well, to not waste byte-per-bit or so. Almost 30 bits are still free for future extensions. We could've defined only new flags here or only the ones we may need to alter, but checking some flags in one place while others in another doesn't sound convenient or intuitive. ::flags passed by the driver can now go to the "slow" PP params. Suggested-by: Jakub Kicinski <kuba@kernel.org> Link[0]: https://lore.kernel.org/netdev/20230703133207.4f0c54ce@kernel.org Suggested-by: Alexander Duyck <alexanderduyck@fb.com> Link[1]: https://lore.kernel.org/netdev/CAKgT0UfZCGnWgOH96E4GV3ZP6LLbROHM7SHE8NKwq+exX+Gk_Q@mail.gmail.com Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-05-07page_pool: make sure frag API fields don't span between cachelinesAlexander Lobakin
After commit 5027ec19f104 ("net: page_pool: split the page_pool_params into fast and slow") that made &page_pool contain only "hot" params at the start, cacheline boundary chops frag API fields group in the middle again. To not bother with this each time fast params get expanded or shrunk, let's just align them to `4 * sizeof(long)`, the closest upper pow-2 to their actual size (2 longs + 1 int). This ensures 16-byte alignment for the 32-bit architectures and 32-byte alignment for the 64-bit ones, excluding unnecessary false-sharing. ::page_state_hold_cnt is used quite intensively on hotpath no matter if frag API is used, so move it to the newly created hole in the first cacheline. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-05-06Merge tag 'ipsec-next-2024-05-03' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2024-05-03 1) Remove Obsolete UDP_ENCAP_ESPINUDP_NON_IKE Support. This was defined by an early version of an IETF draft that did not make it to a standard. 2) Introduce direction attribute for xfrm states. xfrm states have a direction, a stsate can be used either for input or output packet processing. Add a direction to xfrm states to make it clear for what a xfrm state is used. * tag 'ipsec-next-2024-05-03' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: Restrict SA direction attribute to specific netlink message types xfrm: Add dir validation to "in" data path lookup xfrm: Add dir validation to "out" data path lookup xfrm: Add Direction to the SA in or out udpencap: Remove Obsolete UDP_ENCAP_ESPINUDP_NON_IKE Support ==================== Link: https://lore.kernel.org/r/20240503082732.2835810-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07gtp: properly parse extension headersPablo Neira Ayuso
Currently GTP packets are dropped if the next extension field is set to non-zero value, but this are valid GTP packets. TS 29.281 provides a longer header format, which is defined as struct gtp1_header_long. Such long header format is used if any of the S, PN, E flags is set. This long header is 4 bytes longer than struct gtp1_header, plus variable length (optional) extension headers. The next extension header field is zero is no extension header is provided. The extension header is composed of a length field which includes total number of 4 byte words including the extension header itself (1 byte), payload (variable length) and next type (1 byte). The extension header size and its payload is aligned to 4 bytes. A GTP packet might come with a chain extensions headers, which makes it slightly cumbersome to parse because the extension next header field comes at the end of the extension header, and there is a need to check if this field becomes zero to stop the extension header parser. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-06net: create tcp_gro_header_pull helper functionFelix Fietkau
Pull the code out of tcp_gro_receive in order to access the tcp header from tcp4/6_gro_receive. Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-06net: create tcp_gro_lookup helper functionFelix Fietkau
This pulls the flow port matching out of tcp_gro_receive, so that it can be reused for the next change, which adds the TCP fraglist GRO heuristic. Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-06net: move skb_gro_receive_list from udp to coreFelix Fietkau
This helper function will be used for TCP fraglist GRO support Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-05queue_api: define queue apiMina Almasry
This API enables the net stack to reset the queues used for devmem TCP. Signed-off-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Shailend Chand <shailend@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-03Merge tag 'ipsec-2024-05-02' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2024-05-02 1) Fix an error pointer dereference in xfrm_in_fwd_icmp. From Antony Antony. 2) Preserve vlan tags for ESP transport mode software GRO. From Paul Davey. 3) Fix a spelling mistake in an uapi xfrm.h comment. From Anotny Antony. * tag 'ipsec-2024-05-02' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: Correct spelling mistake in xfrm.h comment xfrm: Preserve vlan tags for transport mode software GRO xfrm: fix possible derferencing in error path ==================== Link: https://lore.kernel.org/r/20240502084838.2269355-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-03ax.25: x.25: Remove the now superfluous sentinel elements from ctl_table arrayJoel Granados
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/) Avoid a buffer overflow when traversing the ctl_table by ensuring that AX25_MAX_VALUES is the same as the size of ax25_param_table. This is done with a BUILD_BUG_ON where ax25_param_table is defined and a CONFIG_AX25_DAMA_SLAVE guard in the unnamed enum definition as well as in the ax25_dev_device_up and ax25_ds_set_timer functions. The overflow happened when the sentinel was removed from ax25_param_table. The sentinel's data element was changed when CONFIG_AX25_DAMA_SLAVE was undefined. This had no adverse effects as it still stopped on the sentinel's null procname but needed to be addressed once the sentinel was removed. Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-03wifi: mac80211: handle color change per linkAditya Kumar Singh
In order to support color change with MLO, handle the link ID now passed from cfg80211, adjust the code to do everything per link and call the notifications to cfg80211 correctly. Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com> Link: https://msgid.link/20240422053412.2024075-4-quic_adisi@quicinc.com Link: https://msgid.link/20240422053412.2024075-5-quic_adisi@quicinc.com Link: https://msgid.link/20240422053412.2024075-6-quic_adisi@quicinc.com Link: https://msgid.link/20240422053412.2024075-7-quic_adisi@quicinc.com [squash, move API call updates to this patch] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-03wifi: cfg80211: handle color change per linkAditya Kumar Singh
Currently, during color change, no link id information is passed down. In order to support color change during Multi Link Operation, it is required to pass link id as well. Additionally, update notification APIs to allow drivers/mac80211 to pass the link ID. Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com> Link: https://msgid.link/20240422053412.2024075-2-quic_adisi@quicinc.com Link: https://msgid.link/20240422053412.2024075-3-quic_adisi@quicinc.com [squash, actually only pass 0 from mac80211] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-02tcp: Add new args for cong_control in tcp_congestion_opsMiao Xu
This patch adds two new arguments for cong_control of struct tcp_congestion_ops: - ack - flag These two arguments are inherited from the caller tcp_cong_control in tcp_intput.c. One use case of them is to update cwnd and pacing rate inside cong_control based on the info they provide. For example, the flag can be used to decide if it is the right time to raise or reduce a sender's cwnd. Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Miao Xu <miaxu@meta.com> Link: https://lore.kernel.org/r/20240502042318.801932-2-miaxu@meta.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: include/linux/filter.h kernel/bpf/core.c 66e13b615a0c ("bpf: verifier: prevent userspace memory access") d503a04f8bc0 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT") https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-02net: gro: fix udp bad offset in socket lookup by adding ↵Richard Gobert
{inner_}network_offset to napi_gro_cb Commits a602456 ("udp: Add GRO functions to UDP socket") and 57c67ff ("udp: additional GRO support") introduce incorrect usage of {ip,ipv6}_hdr in the complete phase of gro. The functions always return skb->network_header, which in the case of encapsulated packets at the gro complete phase, is always set to the innermost L3 of the packet. That means that calling {ip,ipv6}_hdr for skbs which completed the GRO receive phase (both in gro_list and *_gro_complete) when parsing an encapsulated packet's _outer_ L3/L4 may return an unexpected value. This incorrect usage leads to a bug in GRO's UDP socket lookup. udp{4,6}_lib_lookup_skb functions use ip_hdr/ipv6_hdr respectively. These *_hdr functions return network_header which will point to the innermost L3, resulting in the wrong offset being used in __udp{4,6}_lib_lookup with encapsulated packets. This patch adds network_offset and inner_network_offset to napi_gro_cb, and makes sure both are set correctly. To fix the issue, network_offsets union is used inside napi_gro_cb, in which both the outer and the inner network offsets are saved. Reproduction example: Endpoint configuration example (fou + local address bind) # ip fou add port 6666 ipproto 4 # ip link add name tun1 type ipip remote 2.2.2.1 local 2.2.2.2 encap fou encap-dport 5555 encap-sport 6666 mode ipip # ip link set tun1 up # ip a add 1.1.1.2/24 dev tun1 Netperf TCP_STREAM result on net-next before patch is applied: net-next main, GRO enabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.28 2.37 net-next main, GRO disabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.01 2745.06 patch applied, GRO enabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.01 2877.38 Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket") Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-01net: dsa: Remove adjust_link pathsFlorian Fainelli
Now that we no longer any drivers using PHYLIB's adjust_link callback, remove all paths that made use of adjust_link as well as the associated functions. Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20240430164816.2400606-3-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-01net: dsa: Remove fixed_link_update memberFlorian Fainelli
We have not had a switch driver use a fixed_link_update callback since 58d56fcc3964f9be0a9ca42fd126bcd9dc7afc90 ("net: dsa: bcm_sf2: Get rid of PHYLIB functions") remove this callback. Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20240430164816.2400606-2-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-01netfs, 9p: Implement helpers for new write codeDavid Howells
Implement the helpers for the new write code in 9p. There's now an optional ->prepare_write() that allows the filesystem to set the parameters for the next write, such as maximum size and maximum segment count, and an ->issue_write() that is called to initiate an (asynchronous) write operation. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: v9fs@lists.linux.dev cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org
2024-05-01xfrm: Add Direction to the SA in or outAntony Antony
This patch introduces the 'dir' attribute, 'in' or 'out', to the xfrm_state, SA, enhancing usability by delineating the scope of values based on direction. An input SA will restrict values pertinent to input, effectively segregating them from output-related values. And an output SA will restrict attributes for output. This change aims to streamline the configuration process and improve the overall consistency of SA attributes during configuration. This feature sets the groundwork for future patches, including the upcoming IP-TFS patch. Signed-off-by: Antony Antony <antony.antony@secunet.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-04-30net: move sysctl_mem_pcpu_rsv to net_hotdataEric Dumazet
sysctl_mem_pcpu_rsv is used in TCP fast path, move it to net_hodata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240429134025.1233626-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30net: add <net/proto_memory.h>Eric Dumazet
Move some proto memory definitions out of <net/sock.h> Very few files need them, and following patch will include <net/hotdata.h> from <net/proto_memory.h> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240429134025.1233626-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30tcp: move tcp_out_of_memory() to net/ipv4/tcp.cEric Dumazet
tcp_out_of_memory() has a single caller: tcp_check_oom(). Following patch will also make sk_memory_allocated() not anymore visible from <net/sock.h> and <net/tcp.h> Add const qualifier to sock argument of tcp_out_of_memory() and tcp_check_oom(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240429134025.1233626-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30net: move sysctl_skb_defer_max to net_hotdataEric Dumazet
sysctl_skb_defer_max is used in TCP fast path, move it to net_hodata. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240429134025.1233626-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30net: move sysctl_max_skb_frags to net_hotdataEric Dumazet
sysctl_max_skb_frags is used in TCP and MPTCP fast paths, move it to net_hodata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240429134025.1233626-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30inet: introduce dst_rtable() helperEric Dumazet
I added dst_rt6_info() in commit e8dfd42c17fa ("ipv6: introduce dst_rt6_info() helper") This patch does a similar change for IPv4. Instead of (struct rtable *)dst casts, we can use : #define dst_rtable(_ptr) \ container_of_const(_ptr, struct rtable, dst) Patch is smaller than IPv6 one, because IPv4 has skb_rtable() helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/20240429133009.1227754-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30net/smc: add operations to merge sndbuf with peer DMBWen Gu
In some scenarios using Emulated-ISM device, sndbuf can share the same physical memory region with peer DMB to avoid data copy from one side to the other. In such case the sndbuf is only a descriptor that describes the shared memory and does not actually occupy memory, it's more like a ghost buffer. +----------+ +----------+ | socket A | | socket B | +----------+ +----------+ | | +--------+ +--------+ | sndbuf | | DMB | | desc | | desc | +--------+ +--------+ | | | +----v-----+ +--------------------------> memory | +----------+ So here introduces three new SMC-D device operations to check if this feature is supported by device, and to {attach|detach} ghost sndbuf to peer DMB. For now only loopback-ism supports this. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-and-tested-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-30net/smc: mark optional smcd_ops and check for support when calledWen Gu
Some operations are not supported by new introduced Emulated-ISM, so mark them as optional and check if the device supports them when called. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-and-tested-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-30net/smc: decouple ism_client from SMC-D DMB registrationWen Gu
The struct 'ism_client' is specialized for s390 platform firmware ISM. So replace it with 'void' to make SMCD DMB registration helper generic for both Emulated-ISM and existing ISM. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-and-tested-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-30netdev: add queue statsXuan Zhuo
These stats are commonly. Support reporting those via netdev-genl queue stats. name: rx-hw-drops name: rx-hw-drop-overruns name: rx-csum-unnecessary name: rx-csum-none name: rx-csum-bad name: rx-hw-gro-packets name: rx-hw-gro-bytes name: rx-hw-gro-wire-packets name: rx-hw-gro-wire-bytes name: rx-hw-drop-ratelimits name: tx-hw-drops name: tx-hw-drop-errors name: tx-csum-none name: tx-needs-csum name: tx-hw-gso-packets name: tx-hw-gso-bytes name: tx-hw-gso-wire-packets name: tx-hw-gso-wire-bytes name: tx-hw-drop-ratelimits Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-29Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-04-29 We've added 147 non-merge commits during the last 32 day(s) which contain a total of 158 files changed, 9400 insertions(+), 2213 deletions(-). The main changes are: 1) Add an internal-only BPF per-CPU instruction for resolving per-CPU memory addresses and implement support in x86 BPF JIT. This allows inlining per-CPU array and hashmap lookups and the bpf_get_smp_processor_id() helper, from Andrii Nakryiko. 2) Add BPF link support for sk_msg and sk_skb programs, from Yonghong Song. 3) Optimize x86 BPF JIT's emit_mov_imm64, and add support for various atomics in bpf_arena which can be JITed as a single x86 instruction, from Alexei Starovoitov. 4) Add support for passing mark with bpf_fib_lookup helper, from Anton Protopopov. 5) Add a new bpf_wq API for deferring events and refactor sleepable bpf_timer code to keep common code where possible, from Benjamin Tissoires. 6) Fix BPF_PROG_TEST_RUN infra with regards to bpf_dummy_struct_ops programs to check when NULL is passed for non-NULLable parameters, from Eduard Zingerman. 7) Harden the BPF verifier's and/or/xor value tracking, from Harishankar Vishwanathan. 8) Introduce crypto kfuncs to make BPF programs able to utilize the kernel crypto subsystem, from Vadim Fedorenko. 9) Various improvements to the BPF instruction set standardization doc, from Dave Thaler. 10) Extend libbpf APIs to partially consume items from the BPF ringbuffer, from Andrea Righi. 11) Bigger batch of BPF selftests refactoring to use common network helpers and to drop duplicate code, from Geliang Tang. 12) Support bpf_tail_call_static() helper for BPF programs with GCC 13, from Jose E. Marchesi. 13) Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF program to have code sections where preemption is disabled, from Kumar Kartikeya Dwivedi. 14) Allow invoking BPF kfuncs from BPF_PROG_TYPE_SYSCALL programs, from David Vernet. 15) Extend the BPF verifier to allow different input maps for a given bpf_for_each_map_elem() helper call in a BPF program, from Philo Lu. 16) Add support for PROBE_MEM32 and bpf_addr_space_cast instructions for riscv64 and arm64 JITs to enable BPF Arena, from Puranjay Mohan. 17) Shut up a false-positive KMSAN splat in interpreter mode by unpoison the stack memory, from Martin KaFai Lau. 18) Improve xsk selftest coverage with new tests on maximum and minimum hardware ring size configurations, from Tushar Vyavahare. 19) Various ReST man pages fixes as well as documentation and bash completion improvements for bpftool, from Rameez Rehman & Quentin Monnet. 20) Fix libbpf with regards to dumping subsequent char arrays, from Quentin Deslandes. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (147 commits) bpf, docs: Clarify PC use in instruction-set.rst bpf_helpers.h: Define bpf_tail_call_static when building with GCC bpf, docs: Add introduction for use in the ISA Internet Draft selftests/bpf: extend BPF_SOCK_OPS_RTT_CB test for srtt and mrtt_us bpf: add mrtt and srtt as BPF_SOCK_OPS_RTT_CB args selftests/bpf: dummy_st_ops should reject 0 for non-nullable params bpf: check bpf_dummy_struct_ops program params for test runs selftests/bpf: do not pass NULL for non-nullable params in dummy_st_ops selftests/bpf: adjust dummy_st_ops_success to detect additional error bpf: mark bpf_dummy_struct_ops.test_1 parameter as nullable selftests/bpf: Add ring_buffer__consume_n test. bpf: Add bpf_guard_preempt() convenience macro selftests: bpf: crypto: add benchmark for crypto functions selftests: bpf: crypto skcipher algo selftests bpf: crypto: add skcipher to bpf crypto bpf: make common crypto API for TC/XDP programs bpf: update the comment for BTF_FIELDS_MAX selftests/bpf: Fix wq test. selftests/bpf: Use make_sockaddr in test_sock_addr selftests/bpf: Use connect_to_addr in test_sock_addr ... ==================== Link: https://lore.kernel.org/r/20240429131657.19423-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-29ipv6: introduce dst_rt6_info() helperEric Dumazet
Instead of (struct rt6_info *)dst casts, we can use : #define dst_rt6_info(_ptr) \ container_of_const(_ptr, struct rt6_info, dst) Some places needed missing const qualifiers : ip6_confirm_neigh(), ipv6_anycast_destination(), ipv6_unicast_destination(), has_gateway() v2: added missing parts (David Ahern) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-29ipv6: use call_rcu_hurry() in fib6_info_release()Eric Dumazet
This is a followup of commit c4e86b4363ac ("net: add two more call_rcu_hurry()") fib6_info_destroy_rcu() is calling nexthop_put() or fib6_nh_release() We must not delay it too much or risk unregister_netdevice/ref_tracker traces because references to netdev are not released in time. This should speedup device/netns dismantles when CONFIG_RCU_LAZY=y Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-26tcp: support rstreason for passive resetJason Xing
Reuse the dropreason logic to show the exact reason of tcp reset, so we can finally display the corresponding item in enum sk_reset_reason instead of reinventing new reset reasons. This patch replaces all the prior NOT_SPECIFIED reasons. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26rstreason: prepare for active resetJason Xing
Like what we did to passive reset: only passing possible reset reason in each active reset path. No functional changes. Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26rstreason: prepare for passive resetJason Xing
Adjust the parameter and support passing reason of reset which is for now NOT_SPECIFIED. No functional changes. Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26net: introduce rstreason to detect why the RST is sentJason Xing
Add a new standalone file for the easy future extension to support both active reset and passive reset in the TCP/DCCP/MPTCP protocols. This patch only does the preparations for reset reason mechanism, nothing else changes. The reset reasons are divided into three parts: 1) reuse drop reasons for passive reset in TCP 2) our own independent reasons which aren't relying on other reasons at all 3) reuse MP_TCPRST option for MPTCP The benefits of a standalone reset reason are listed here: 1) it can cover more than one case, such as reset reasons in MPTCP, active reset reasons. 2) people can easily/fastly understand and maintain this mechanism. 3) we get unified format of output with prefix stripped. 4) more new reset reasons are on the way ... I will implement the basic codes of active/passive reset reason in those three protocols, which are not complete for this moment. For passive reset part in TCP, I only introduce the NO_SOCKET common case which could be set as an example. After this series applied, it will have the ability to open a new gate to let other people contribute more reasons into it :) Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26net/sched: fix false lockdep warning on qdisc root lockDavide Caratti
Xiumei and Christoph reported the following lockdep splat, complaining of the qdisc root lock being taken twice: ============================================ WARNING: possible recursive locking detected 6.7.0-rc3+ #598 Not tainted -------------------------------------------- swapper/2/0 is trying to acquire lock: ffff888177190110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70 but task is already holding lock: ffff88811995a110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&sch->q.lock); lock(&sch->q.lock); *** DEADLOCK *** May be due to missing lock nesting notation 5 locks held by swapper/2/0: #0: ffff888135a09d98 ((&in_dev->mr_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x11a/0x510 #1: ffffffffaaee5260 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x2c0/0x1ed0 #2: ffffffffaaee5200 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x209/0x2e70 #3: ffff88811995a110 (&sch->q.lock){+.-.}-{2:2}, at: __dev_queue_xmit+0x1560/0x2e70 #4: ffffffffaaee5200 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x209/0x2e70 stack backtrace: CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.7.0-rc3+ #598 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x4a/0x80 __lock_acquire+0xfdd/0x3150 lock_acquire+0x1ca/0x540 _raw_spin_lock+0x34/0x80 __dev_queue_xmit+0x1560/0x2e70 tcf_mirred_act+0x82e/0x1260 [act_mirred] tcf_action_exec+0x161/0x480 tcf_classify+0x689/0x1170 prio_enqueue+0x316/0x660 [sch_prio] dev_qdisc_enqueue+0x46/0x220 __dev_queue_xmit+0x1615/0x2e70 ip_finish_output2+0x1218/0x1ed0 __ip_finish_output+0x8b3/0x1350 ip_output+0x163/0x4e0 igmp_ifc_timer_expire+0x44b/0x930 call_timer_fn+0x1a2/0x510 run_timer_softirq+0x54d/0x11a0 __do_softirq+0x1b3/0x88f irq_exit_rcu+0x18f/0x1e0 sysvec_apic_timer_interrupt+0x6f/0x90 </IRQ> This happens when TC does a mirred egress redirect from the root qdisc of device A to the root qdisc of device B. As long as these two locks aren't protecting the same qdisc, they can be acquired in chain: add a per-qdisc lockdep key to silence false warnings. This dynamic key should safely replace the static key we have in sch_htb: it was added to allow enqueueing to the device "direct qdisc" while still holding the qdisc root lock. v2: don't use static keys anymore in HTB direct qdiscs (thanks Eric Dumazet) CC: Maxim Mikityanskiy <maxim@isovalent.com> CC: Xiumei Mu <xmu@redhat.com> Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/451 Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/7dc06d6158f72053cf877a82e2a7a5bd23692faa.1713448007.git.dcaratti@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-26xfrm: Preserve vlan tags for transport mode software GROPaul Davey
The software GRO path for esp transport mode uses skb_mac_header_rebuild prior to re-injecting the packet via the xfrm_napi_dev. This only copies skb->mac_len bytes of header which may not be sufficient if the packet contains 802.1Q tags or other VLAN tags. Worse copying only the initial header will leave a packet marked as being VLAN tagged but without the corresponding tag leading to mangling when it is later untagged. The VLAN tags are important when receiving the decrypted esp transport mode packet after GRO processing to ensure it is received on the correct interface. Therefore record the full mac header length in xfrm*_transport_input for later use in corresponding xfrm*_transport_finish to copy the entire mac header when rebuilding the mac header for GRO. The skb->data pointer is left pointing skb->mac_header bytes after the start of the mac header as is expected by the network stack and network and transport header offsets reset to this location. Fixes: 7785bba299a8 ("esp: Add a software GRO codepath") Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-04-25mm: change inlined allocation helpers to account at the call siteSuren Baghdasaryan
Main goal of memory allocation profiling patchset is to provide accounting that is cheap enough to run in production. To achieve that we inject counters using codetags at the allocation call sites to account every time allocation is made. This injection allows us to perform accounting efficiently because injected counters are immediately available as opposed to the alternative methods, such as using _RET_IP_, which would require counter lookup and appropriate locking that makes accounting much more expensive. This method requires all allocation functions to inject separate counters at their call sites so that their callers can be individually accounted. Counter injection is implemented by allocation hooks which should wrap all allocation functions. Inlined functions which perform allocations but do not use allocation hooks are directly charged for the allocations they perform. In most cases these functions are just specialized allocation wrappers used from multiple places to allocate objects of a specific type. It would be more useful to do the accounting at their call sites instead. Instrument these helpers to do accounting at the call site. Simple inlined allocation wrappers are converted directly into macros. More complex allocators or allocators with documentation are converted into _noprof versions and allocation hooks are added. This allows memory allocation profiling mechanism to charge allocations to the callers of these functions. Link: https://lkml.kernel.org/r/20240415020731.1152108-1-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Jan Kara <jack@suse.cz> [jbd2] Cc: Anna Schumaker <anna@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jakub Sitnicki <jakub@cloudflare.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25Merge branch '40GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== net: intel: start The Great Code Dedup + Page Pool for iavf Alexander Lobakin says: Here's a two-shot: introduce {,Intel} Ethernet common library (libeth and libie) and switch iavf to Page Pool. Details are in the commit messages; here's a summary: Not a secret there's a ton of code duplication between two and more Intel ethernet modules. Before introducing new changes, which would need to be copied over again, start decoupling the already existing duplicate functionality into a new module, which will be shared between several Intel Ethernet drivers. The first name that came to my mind was "libie" -- "Intel Ethernet common library". Also this sounds like "lovelie" (-> one word, no "lib I E" pls) and can be expanded as "lib Internet Explorer" :P The "generic", pure-software part is placed separately, so that it can be easily reused in any driver by any vendor without linking to the Intel pre-200G guts. In a few words, it's something any modern driver does the same way, but nobody moved it level up (yet). The series is only the beginning. From now on, adding every new feature or doing any good driver refactoring will remove much more lines than add for quite some time. There's a basic roadmap with some deduplications planned already, not speaking of that touching every line now asks: "can I share this?". The final destination is very ambitious: have only one unified driver for at least i40e, ice, iavf, and idpf with a struct ops for each generation. That's never gonna happen, right? But you still can at least try. PP conversion for iavf lands within the same series as these two are tied closely. libie will support Page Pool model only, so that a driver can't use much of the lib until it's converted. iavf is only the example, the rest will eventually be converted soon on a per-driver basis. That is when it gets really interesting. Stay tech. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: MAINTAINERS: add entry for libeth and libie iavf: switch to Page Pool iavf: pack iavf_ring more efficiently libeth: add Rx buffer management page_pool: add DMA-sync-for-CPU inline helper page_pool: constify some read-only function arguments slab: introduce kvmalloc_array_node() and kvcalloc_node() iavf: drop page splitting and recycling iavf: kill "legacy-rx" for good net: intel: introduce {, Intel} Ethernet common library ==================== Link: https://lore.kernel.org/r/20240424203559.3420468-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-25net: add two more call_rcu_hurry()Eric Dumazet
I had failures with pmtu.sh selftests lately, with netns dismantles firing ref_tracking alerts [1]. After much debugging, I found that some queued rcu callbacks were delayed by minutes, because of CONFIG_RCU_LAZY=y option. Joel Fernandes had a similar issue in the past, fixed with commit 483c26ff63f4 ("net: Use call_rcu_hurry() for dst_release()") In this commit, I make sure nexthop_free_rcu() and free_fib_info_rcu() are not delayed too much because they both can release device references. tools/testing/selftests/net/pmtu.sh no longer fails. Traces were: [ 968.179860] ref_tracker: veth_A-R1@00000000d0ff3fe2 has 3/5 users at dst_alloc+0x76/0x160 ip6_dst_alloc+0x25/0x80 ip6_pol_route+0x2a8/0x450 ip6_pol_route_output+0x1f/0x30 fib6_rule_lookup+0x163/0x270 ip6_route_output_flags+0xda/0x190 ip6_dst_lookup_tail.constprop.0+0x1d0/0x260 ip6_dst_lookup_flow+0x47/0xa0 udp_tunnel6_dst_lookup+0x158/0x210 vxlan_xmit_one+0x4c2/0x1550 [vxlan] vxlan_xmit+0x52d/0x14f0 [vxlan] dev_hard_start_xmit+0x7b/0x1e0 __dev_queue_xmit+0x20b/0xe40 ip6_finish_output2+0x2ea/0x6e0 ip6_finish_output+0x143/0x320 ip6_output+0x74/0x140 [ 968.179860] ref_tracker: veth_A-R1@00000000d0ff3fe2 has 1/5 users at netdev_get_by_index+0xc0/0xe0 fib6_nh_init+0x1a9/0xa90 rtm_new_nexthop+0x6fa/0x1580 rtnetlink_rcv_msg+0x155/0x3e0 netlink_rcv_skb+0x61/0x110 rtnetlink_rcv+0x19/0x20 netlink_unicast+0x23f/0x380 netlink_sendmsg+0x1fc/0x430 ____sys_sendmsg+0x2ef/0x320 ___sys_sendmsg+0x86/0xd0 __sys_sendmsg+0x67/0xc0 __x64_sys_sendmsg+0x21/0x30 x64_sys_call+0x252/0x2030 do_syscall_64+0x6c/0x190 entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 968.179860] ref_tracker: veth_A-R1@00000000d0ff3fe2 has 1/5 users at ipv6_add_dev+0x136/0x530 addrconf_notify+0x19d/0x770 notifier_call_chain+0x65/0xd0 raw_notifier_call_chain+0x1a/0x20 call_netdevice_notifiers_info+0x54/0x90 register_netdevice+0x61e/0x790 veth_newlink+0x230/0x440 __rtnl_newlink+0x7d2/0xaa0 rtnl_newlink+0x4c/0x70 rtnetlink_rcv_msg+0x155/0x3e0 netlink_rcv_skb+0x61/0x110 rtnetlink_rcv+0x19/0x20 netlink_unicast+0x23f/0x380 netlink_sendmsg+0x1fc/0x430 ____sys_sendmsg+0x2ef/0x320 ___sys_sendmsg+0x86/0xd0 .... [ 1079.316024] ? show_regs+0x68/0x80 [ 1079.316087] ? __warn+0x8c/0x140 [ 1079.316103] ? ref_tracker_free+0x1a0/0x270 [ 1079.316117] ? report_bug+0x196/0x1c0 [ 1079.316135] ? handle_bug+0x42/0x80 [ 1079.316149] ? exc_invalid_op+0x1c/0x70 [ 1079.316162] ? asm_exc_invalid_op+0x1f/0x30 [ 1079.316193] ? ref_tracker_free+0x1a0/0x270 [ 1079.316208] ? _raw_spin_unlock+0x1a/0x40 [ 1079.316222] ? free_unref_page+0x126/0x1a0 [ 1079.316239] ? destroy_large_folio+0x69/0x90 [ 1079.316251] ? __folio_put+0x99/0xd0 [ 1079.316276] dst_dev_put+0x69/0xd0 [ 1079.316308] fib6_nh_release_dsts.part.0+0x3d/0x80 [ 1079.316327] fib6_nh_release+0x45/0x70 [ 1079.316340] nexthop_free_rcu+0x131/0x170 [ 1079.316356] rcu_do_batch+0x1ee/0x820 [ 1079.316370] ? rcu_do_batch+0x179/0x820 [ 1079.316388] rcu_core+0x1aa/0x4d0 [ 1079.316405] rcu_core_si+0x12/0x20 [ 1079.316417] __do_softirq+0x13a/0x3dc [ 1079.316435] __irq_exit_rcu+0xa3/0x110 [ 1079.316449] irq_exit_rcu+0x12/0x30 [ 1079.316462] sysvec_apic_timer_interrupt+0x5b/0xe0 [ 1079.316474] asm_sysvec_apic_timer_interrupt+0x1f/0x30 [ 1079.316569] RIP: 0033:0x7f06b65c63f0 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240423205408.39632-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-25bpf: add mrtt and srtt as BPF_SOCK_OPS_RTT_CB argsPhilo Lu
Two important arguments in RTT estimation, mrtt and srtt, are passed to tcp_bpf_rtt(), so that bpf programs get more information about RTT computation in BPF_SOCK_OPS_RTT_CB. The difference between bpf_sock_ops->srtt_us and the srtt here is: the former is an old rtt before update, while srtt passed by tcp_bpf_rtt() is that after update. Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Link: https://lore.kernel.org/r/20240425161724.73707-2-lulie@linux.alibaba.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/ti/icssg/icssg_prueth.c net/mac80211/chan.c 89884459a0b9 ("wifi: mac80211: fix idle calculation with multi-link") 87f5500285fb ("wifi: mac80211: simplify ieee80211_assign_link_chanctx()") https://lore.kernel.org/all/20240422105623.7b1fbda2@canb.auug.org.au/ net/unix/garbage.c 1971d13ffa84 ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().") 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") drivers/net/ethernet/ti/icssg/icssg_prueth.c drivers/net/ethernet/ti/icssg/icssg_common.c 4dcd0e83ea1d ("net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()") e2dc7bfd677f ("net: ti: icssg-prueth: Move common functions into a separate file") No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>