path: root/net/ipv4
Age | Commit message | Author
2024-11-18 | net: ip: fix unexpected return in fib_validate_source() | Menglong Dong
fib_validate_source() now returns negated drop reasons instead of errnos, so it must not return "-EINVAL" anymore. The remaining "-EINVAL" return causes the following warning, reported by syzkaller:

  netlink: 'syz-executor371': attribute type 4 has an invalid length.
  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 5842 at net/core/skbuff.c:1219 __sk_skb_reason_drop net/core/skbuff.c:1216 [inline]
  WARNING: CPU: 0 PID: 5842 at net/core/skbuff.c:1219 sk_skb_reason_drop+0x87/0x380 net/core/skbuff.c:1241
  Modules linked in:
  CPU: 0 UID: 0 PID: 5842 Comm: syz-executor371 Not tainted 6.12.0-rc6-syzkaller-01362-ga58f00ed24b8 #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
  RIP: 0010:__sk_skb_reason_drop net/core/skbuff.c:1216 [inline]
  RIP: 0010:sk_skb_reason_drop+0x87/0x380 net/core/skbuff.c:1241
  Code: 00 00 00 fc ff df 41 8d 9e 00 00 fc ff bf 01 00 fc ff 89 de e8 ea 9f 08 f8 81 fb 00 00 fc ff 77 3a 4c 89 e5 e8 9a 9b 08 f8 90 <0f> 0b 90 eb 5e bf 01 00 00 00 89 ee e8 c8 9f 08 f8 85 ed 0f 8e 49
  RSP: 0018:ffffc90003d57078 EFLAGS: 00010293
  RAX: ffffffff898c3ec6 RBX: 00000000fffbffea RCX: ffff8880347a5a00
  RDX: 0000000000000000 RSI: 00000000fffbffea RDI: 00000000fffc0001
  RBP: dffffc0000000000 R08: ffffffff898c3eb6 R09: 1ffff110023eb7d4
  R10: dffffc0000000000 R11: ffffed10023eb7d5 R12: dffffc0000000000
  R13: ffff888011f5bdc0 R14: 00000000ffffffea R15: 0000000000000000
  FS: 000055557d41e380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000056519d31d608 CR3: 000000007854e000 CR4: 00000000003526f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   kfree_skb_reason include/linux/skbuff.h:1263 [inline]
   ip_rcv_finish_core+0xfde/0x1b50 net/ipv4/ip_input.c:424
   ip_list_rcv_finish net/ipv4/ip_input.c:610 [inline]
   ip_sublist_rcv+0x3b1/0xab0 net/ipv4/ip_input.c:636
   ip_list_rcv+0x42b/0x480 net/ipv4/ip_input.c:670
   __netif_receive_skb_list_ptype net/core/dev.c:5715 [inline]
   __netif_receive_skb_list_core+0x94e/0x980 net/core/dev.c:5762
   __netif_receive_skb_list net/core/dev.c:5814 [inline]
   netif_receive_skb_list_internal+0xa51/0xe30 net/core/dev.c:5905
   netif_receive_skb_list+0x55/0x4b0 net/core/dev.c:5957
   xdp_recv_frames net/bpf/test_run.c:280 [inline]
   xdp_test_run_batch net/bpf/test_run.c:361 [inline]
   bpf_test_run_xdp_live+0x1b5e/0x21b0 net/bpf/test_run.c:390
   bpf_prog_test_run_xdp+0x805/0x11e0 net/bpf/test_run.c:1318
   bpf_prog_test_run+0x2e4/0x360 kernel/bpf/syscall.c:4266
   __sys_bpf+0x48d/0x810 kernel/bpf/syscall.c:5671
   __do_sys_bpf kernel/bpf/syscall.c:5760 [inline]
   __se_sys_bpf kernel/bpf/syscall.c:5758 [inline]
   __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5758
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  RIP: 0033:0x7f18af25a8e9
  Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007ffee4090af8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f18af25a8e9
  RDX: 0000000000000048 RSI: 0000000020000600 RDI: 000000000000000a
  RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

Fix it by returning "-SKB_DROP_REASON_IP_LOCAL_SOURCE" instead of "-EINVAL" in fib_validate_source().

Reported-by: syzbot+52fbd90f020788ec7709@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6738e539.050a0220.e1c64.0002.GAE@google.com/
Fixes: 82d9983ebeb8 ("net: ip: make ip_route_input_noref() return drop reasons")
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
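The contract this fix restores can be modelled in userspace: after the drop-reason conversion, fib_validate_source() reports failure as a negated drop reason, and a wrapper negates it back before the skb is freed with that reason. The sketch below is a hypothetical toy (the `demo_` names and enum values are invented and do not match the kernel's `enum skb_drop_reason`); it only illustrates how a single leftover errno return produces a value that a drop-reason range check rejects.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, trimmed stand-in for enum skb_drop_reason.
 * The values are invented and do NOT match the kernel's numbering. */
enum demo_drop_reason {
	DEMO_NOT_DROPPED_YET = 0,
	DEMO_REASON_NOT_SPECIFIED,
	DEMO_REASON_IP_LOCAL_SOURCE,
	DEMO_REASON_IP_INVALID_SOURCE,
	DEMO_REASON_MAX,
};

/* Models the converted contract: 0 on success, -reason on error.
 * 'buggy' reproduces a leftover errno return like the one the fix removes. */
static int demo_validate_source(bool src_is_local, bool buggy)
{
	if (src_is_local)
		return buggy ? -EINVAL : -DEMO_REASON_IP_LOCAL_SOURCE;
	return 0;
}

/* Wrapper that turns a negative return back into a drop reason. */
static int demo_validate_source_reason(bool src_is_local, bool buggy)
{
	int err = demo_validate_source(src_is_local, buggy);

	return err < 0 ? -err : DEMO_NOT_DROPPED_YET;
}

/* Range check of the kind a drop-reason consumer applies before trusting
 * the value; an out-of-range reason is what trips a warning. */
static bool demo_reason_is_valid(int reason)
{
	return reason > DEMO_NOT_DROPPED_YET && reason < DEMO_REASON_MAX;
}
```

In this toy, `-EINVAL` decodes to 22 on Linux, far outside the four-entry enum, so the check fails. With the real enum the numbers differ, but the failure mode is analogous: an errno on a path that promises a drop reason.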
2024-11-18 | ipv6/udp: Add 4-tuple hash for connected socket | Philo Lu
Implement IPv6 UDP hash4 like the IPv4 version. The major difference is that the hash value is calculated with udp6_ehashfn(). In addition, IPv4-mapped IPv6 addresses are handled before hash() and rehash(). Export udp_ehashfn(), because UDPv6 rehash now uses it.

The core hash/unhash/rehash procedures are the same as for IPv4, and UDPv4 and UDPv6 share the same udptable, so some of the IPv4 hash4 functions can be shared as well.

Co-developed-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Co-developed-by: Fred Chen <fred.cc@alibaba-inc.com>
Signed-off-by: Fred Chen <fred.cc@alibaba-inc.com>
Co-developed-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Signed-off-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
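The commit handles IPv4-mapped IPv6 addresses before hash() and rehash(). One plausible way to picture that step (an assumption, not the kernel code) is a dispatcher that detects `::ffff:a.b.c.d` and hashes the embedded IPv4 addresses exactly as a native IPv4 socket would, presumably so that lookups coming from the IPv4 receive path find the socket in the same slot. The FNV-1a hash and all `demo_` names are hypothetical; the kernel uses its own seeded udp_ehashfn()/udp6_ehashfn().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True for ::ffff:a.b.c.d, an IPv4-mapped IPv6 address (RFC 4291). */
static bool demo_is_v4_mapped(const uint8_t a[16])
{
	static const uint8_t prefix[12] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff };

	return memcmp(a, prefix, sizeof(prefix)) == 0;
}

/* Hypothetical stand-in hash (FNV-1a); not the kernel's hash functions. */
static uint32_t demo_fnv1a(uint32_t h, const uint8_t *p, size_t n)
{
	while (n--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Hash a 4-tuple whose addresses are 'alen' bytes long (4 or 16). */
static uint32_t demo_hash4(const uint8_t *laddr, uint16_t lport,
			   const uint8_t *raddr, uint16_t rport, size_t alen)
{
	uint32_t h = 2166136261u;

	h = demo_fnv1a(h, laddr, alen);
	h = demo_fnv1a(h, (const uint8_t *)&lport, sizeof(lport));
	h = demo_fnv1a(h, raddr, alen);
	return demo_fnv1a(h, (const uint8_t *)&rport, sizeof(rport));
}

/* IPv6 entry point: mapped pairs are reduced to their embedded IPv4
 * addresses first, so they hash exactly like a native IPv4 socket. */
static uint32_t demo_hash4_v6(const uint8_t laddr[16], uint16_t lport,
			      const uint8_t raddr[16], uint16_t rport)
{
	if (demo_is_v4_mapped(laddr) && demo_is_v4_mapped(raddr))
		return demo_hash4(laddr + 12, lport, raddr + 12, rport, 4);
	return demo_hash4(laddr, lport, raddr, rport, 16);
}
```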
2024-11-18 | ipv4/udp: Add 4-tuple hash for connected socket | Philo Lu
Currently, the udp_table has two hash tables: the port hash and the portaddr hash. For UDP servers, all sockets usually share the same local port and address, so they all land in the same hash slot within a reuseport group.

In some applications, UDP servers use connect() to manage clients. In particular, when a packet first arrives from an unseen 4-tuple, a new socket is created and connect()ed to the remote addr:port, and that fd is then used exclusively for that client.

Once a reuseport group contains connected sockets, UDP has to score every socket in the same hash2 slot to find the best match. With a large number of connections this becomes inefficient and causes high softirq overhead.

To solve the problem, this patch implements a 4-tuple hash for connected UDP sockets. During connect(), the hash4 slot is updated, together with a corresponding counter, hash4_cnt, in hslot2. In __udp4_lib_lookup(), hslot4 is searched first if the counter is non-zero; otherwise hslot2 is used as before. Only connected sockets enter this hash4 path; unconnected ones are not affected.

hlist_nulls is used for hash4 because a lookup racing with a concurrent rehash may wrongly move onto another hslot. The nulls value at the end of the list tells us whether the lookup must be restarted. Because UDP does not use SLAB_TYPESAFE_BY_RCU, we don't need to touch sk_refcnt during lookup.

Stress test results (with 1 CPU fully used), in pps:

(1) _unconnected_ socket as server
    [a] w/o hash4: 1,825,176
    [b] w/  hash4: 1,831,750 (+0.36%)

(2) 500 _connected_ sockets as server
    [c] w/o hash4:   290,860 (only 16% of [a])
    [d] w/  hash4: 1,889,658 (+3.1% compared with [b])

With hash4, compute_score is skipped during lookup, so [d] is slightly better than [b].

Co-developed-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Co-developed-by: Fred Chen <fred.cc@alibaba-inc.com>
Signed-off-by: Fred Chen <fred.cc@alibaba-inc.com>
Co-developed-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Signed-off-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
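The lookup order described in this commit (probe hash4 only when the hslot2 counter is non-zero, otherwise score every socket in the hash2 slot) can be sketched as a small single-threaded userspace table. Everything here is hypothetical: the `demo_` structures, the mixing functions, and the simplified scoring are stand-ins, with no RCU, reuseport, or nulls handling.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_NSLOTS 16u /* hypothetical table size, a power of two */

struct demo_sock {
	uint32_t laddr, raddr;   /* raddr == 0 means "not connected" */
	uint16_t lport, rport;
	struct demo_sock *next2; /* chain in the local addr:port (hash2) table */
	struct demo_sock *next4; /* chain in the 4-tuple (hash4) table */
};

struct demo_slot2 {
	struct demo_sock *head;
	unsigned int hash4_cnt;  /* sockets of this slot also present in hash4 */
};

struct demo_table {
	struct demo_slot2 hash2[DEMO_NSLOTS];
	struct demo_sock *hash4[DEMO_NSLOTS];
};

/* Hypothetical mixing functions; the kernel uses its own seeded hashes. */
static unsigned int demo_h2(uint32_t laddr, uint16_t lport)
{
	return ((laddr * 0x9e3779b1u) ^ lport) & (DEMO_NSLOTS - 1);
}

static unsigned int demo_h4(uint32_t laddr, uint16_t lport, uint32_t raddr, uint16_t rport)
{
	uint32_t h = (laddr * 0x9e3779b1u) ^ raddr;

	h ^= (((uint32_t)lport << 16) | rport) * 0x85ebca6bu;
	return h & (DEMO_NSLOTS - 1);
}

static void demo_hash(struct demo_table *t, struct demo_sock *sk)
{
	struct demo_slot2 *s2 = &t->hash2[demo_h2(sk->laddr, sk->lport)];

	sk->next2 = s2->head;
	s2->head = sk;
	if (sk->raddr) { /* connected: also enter hash4 and bump the counter */
		unsigned int i = demo_h4(sk->laddr, sk->lport, sk->raddr, sk->rport);

		sk->next4 = t->hash4[i];
		t->hash4[i] = sk;
		s2->hash4_cnt++;
	}
}

/* Simplified scoring in the spirit of compute_score(): -1 = no match. */
static int demo_score(const struct demo_sock *sk, uint32_t saddr, uint16_t sport,
		      uint32_t daddr, uint16_t dport)
{
	if (sk->laddr != daddr || sk->lport != dport)
		return -1;
	if (!sk->raddr)
		return 1; /* unconnected: matches any peer */
	return (sk->raddr == saddr && sk->rport == sport) ? 2 : -1;
}

/* Lookup for a packet saddr:sport -> daddr:dport. */
static struct demo_sock *demo_lookup(struct demo_table *t, uint32_t saddr, uint16_t sport,
				     uint32_t daddr, uint16_t dport)
{
	struct demo_slot2 *s2 = &t->hash2[demo_h2(daddr, dport)];
	struct demo_sock *sk, *best = NULL;
	int best_score = 0;

	if (s2->hash4_cnt) { /* cheap check: hslot2 is touched anyway */
		for (sk = t->hash4[demo_h4(daddr, dport, saddr, sport)]; sk; sk = sk->next4)
			if (sk->laddr == daddr && sk->lport == dport &&
			    sk->raddr == saddr && sk->rport == sport)
				return sk; /* exact 4-tuple hit, no scoring needed */
	}
	for (sk = s2->head; sk; sk = sk->next2) { /* fallback: score the whole slot */
		int score = demo_score(sk, saddr, sport, daddr, dport);

		if (score > best_score) {
			best = sk;
			best_score = score;
		}
	}
	return best;
}
```

A packet from a connected peer is answered straight from hash4; a packet from an unknown peer misses there and falls back to scoring, where the unconnected listener wins.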
2024-11-18 | net/udp: Add 4-tuple hash list basis | Philo Lu
Add a new hash list, hash4, to the UDP table. It will be used to implement a 4-tuple hash for connected UDP sockets. This patch adds the hlist to the table and implements the helpers and the initialization; the 4-tuple hash itself is implemented in the following patch.

hash4 uses hlist_nulls to avoid wrongly moving onto another hlist during a concurrent rehash, because rehash() can run at the same time as lookup().

Co-developed-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Co-developed-by: Fred Chen <fred.cc@alibaba-inc.com>
Signed-off-by: Fred Chen <fred.cc@alibaba-inc.com>
Co-developed-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Signed-off-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
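Why hlist_nulls helps here: each bucket's chain ends not in NULL but in an odd "nulls" marker that encodes a value, typically the bucket index. A lockless reader that ends on a marker belonging to a different bucket knows a concurrent rehash moved a node under it, and restarts instead of reporting a false miss. The single-threaded userspace model below mirrors that encoding (low bit set, value shifted left by one); the `demo_` names are hypothetical and the real API lives in the kernel's list_nulls.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* An odd 'next' value is not a pointer but an end marker carrying a value
 * (here: the bucket index). Node addresses are even, so there is no clash. */
#define DEMO_NULLS_MARKER(v) ((((uintptr_t)(v)) << 1) | 1u)

struct demo_node {
	uintptr_t next; /* pointer to the next node, or a nulls marker */
	int key;
};

static bool demo_is_a_nulls(uintptr_t p)
{
	return p & 1u;
}

static uintptr_t demo_get_nulls_value(uintptr_t p)
{
	return p >> 1;
}

enum demo_walk { DEMO_FOUND, DEMO_MISS, DEMO_RESTART };

/* One lockless pass over bucket 'slot'. Ending on another bucket's marker
 * means a node was moved during the walk, so the caller must restart. */
static enum demo_walk demo_walk_once(const uintptr_t *heads, unsigned int slot,
				     int key, struct demo_node **out)
{
	uintptr_t pos;

	for (pos = heads[slot]; !demo_is_a_nulls(pos);
	     pos = ((const struct demo_node *)pos)->next) {
		struct demo_node *n = (struct demo_node *)pos;

		if (n->key == key) {
			*out = n;
			return DEMO_FOUND;
		}
	}
	return demo_get_nulls_value(pos) == slot ? DEMO_MISS : DEMO_RESTART;
}
```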
2024-11-18 | net/udp: Add a new struct for hash2 slot | Philo Lu
Preparation for the UDP 4-tuple hash (uhash4 for short).

To implement uhash4 without cache-line misses during lookup, hslot2 records the number of hashed sockets in hslot4. This patch therefore adds a new struct, udp_hslot_main, with a hash4_cnt field, which is used by hash2. A separate struct avoids doubling the size of udp_hslot.

Before a uhash4 lookup, hash4_cnt is checked first to see whether hslot4 holds any hashed sockets. Because hslot2 is always accessed during lookup, this check causes no extra cache-line miss.

Related helpers are updated and used wherever possible. uhash4 is implemented in the following patches.

Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-18 | Merge tag 'ipsec-next-2024-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next | David S. Miller
Steffen Klassert says:

====================
ipsec-next-11-15

1) Add support for RFC 9611 per cpu xfrm state handling.

2) Add inbound and outbound xfrm state caches to speed up state lookups.

3) Convert xfrm to dscp_t. From Guillaume Nault.

4) Fix error handling in build_aevent. From Everest K.C.

5) Replace strncpy with strscpy_pad in copy_to_user_auth. From Daniel Yang.

6) Fix an uninitialized symbol during acquire state insertion.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-15 | Merge tag 'nf-next-24-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next | Jakub Kicinski
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Extended netlink error reporting if nfnetlink attribute parser fails, from Donald Hunter.

2) Incorrect request_module() module, from Simon Horman.

3) A series of patches to reduce memory consumption for set element transactions. Florian Westphal says:

   "When doing a flush on a set or mass adding/removing elements from a set, each element needs to allocate 96 bytes to hold the transactional state. In such cases, virtually all the information in struct nft_trans_elem is the same. Change nft_trans_elem to a flex-array, i.e. a single nft_trans_elem can hold multiple set element pointers. The number of elements that can be stored in one nft_trans_elem is limited by the slab allocator, this series limits the compaction to at most 62 elements as it caps the reallocation to 2048 bytes of memory."

4) A series of patches to prepare the transition to dscp_t in .flowi_tos. From Guillaume Nault.

5) Support for bitwise operations with two source registers, from Jeremy Sowden.

* tag 'nf-next-24-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: bitwise: add support for doing AND, OR and XOR directly
  netfilter: bitwise: rename some boolean operation functions
  netfilter: nf_dup4: Convert nf_dup_ipv4_route() to dscp_t.
  netfilter: nft_fib: Convert nft_fib4_eval() to dscp_t.
  netfilter: rpfilter: Convert rpfilter_mt() to dscp_t.
  netfilter: flow_offload: Convert nft_flow_route() to dscp_t.
  netfilter: ipv4: Convert ip_route_me_harder() to dscp_t.
  netfilter: nf_tables: allocate element update information dynamically
  netfilter: nf_tables: switch trans_elem to real flex array
  netfilter: nf_tables: prepare nft audit for set element compaction
  netfilter: nf_tables: prepare for multiple elements in nft_trans_elem structure
  netfilter: nf_tables: add nft_trans_commit_list_add_elem helper
  netfilter: bpf: Pass string literal as format argument of request_module()
  netfilter: nfnetlink: Report extack policy errors for batched ops
====================

Link: https://patch.msgid.link/20241115133207.8907-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15 | netfilter: nf_dup4: Convert nf_dup_ipv4_route() to dscp_t. | Guillaume Nault
Use ip4h_dscp() instead of reading iph->tos directly.

ip4h_dscp() returns a dscp_t value, which is temporarily converted back to __u8 with inet_dscp_to_dsfield(). When ->flowi4_tos is converted to dscp_t in the future, we'll only have to remove that inet_dscp_to_dsfield() call.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
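Roughly, ip4h_dscp() reads the header's DS field and masks off the two ECN bits, while inet_dscp_to_dsfield() casts the result back to a plain byte for interfaces that still expect one. The userspace sketch below mirrors that pattern; the `demo_` names and the two-byte header struct are hypothetical stand-ins, and the 0xfc mask reflects the DSCP/ECN split of the DS field (RFC 2474, RFC 3168).

```c
#include <stdint.h>

/* In the kernel dscp_t is a __bitwise type, which lets sparse flag mixing
 * it with raw tos bytes; a plain uint8_t typedef loses that check. */
typedef uint8_t demo_dscp_t;
#define DEMO_DSCP_MASK 0xfcu /* upper six bits of the DS field; low two are ECN */

/* Only the first two bytes of an IPv4 header matter here. */
struct demo_iphdr {
	uint8_t ver_ihl;
	uint8_t tos; /* DS field: DSCP << 2 | ECN */
};

static demo_dscp_t demo_dsfield_to_dscp(uint8_t dsfield)
{
	return (demo_dscp_t)(dsfield & DEMO_DSCP_MASK);
}

/* Analogue of ip4h_dscp(): read the DS field and drop the ECN bits. */
static demo_dscp_t demo_ip4h_dscp(const struct demo_iphdr *iph)
{
	return demo_dsfield_to_dscp(iph->tos);
}

/* Analogue of inet_dscp_to_dsfield(): back to a plain byte for an
 * interface (such as a __u8 tos field) that has not been converted yet. */
static uint8_t demo_dscp_to_dsfield(demo_dscp_t dscp)
{
	return (uint8_t)dscp;
}
```

For a header with tos 0xb9 (DSCP 46, EF, plus ECT(1)), the extracted value is 0xb8: the ECN bits no longer leak into whatever consumes it.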
2024-11-15 | netfilter: nft_fib: Convert nft_fib4_eval() to dscp_t. | Guillaume Nault
Use ip4h_dscp() instead of reading iph->tos directly.

ip4h_dscp() returns a dscp_t value, which is temporarily converted back to __u8 with inet_dscp_to_dsfield(). When ->flowi4_tos is converted to dscp_t in the future, we'll only have to remove that inet_dscp_to_dsfield() call.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-15 | netfilter: rpfilter: Convert rpfilter_mt() to dscp_t. | Guillaume Nault
Use ip4h_dscp() instead of reading iph->tos directly.

ip4h_dscp() returns a dscp_t value, which is temporarily converted back to __u8 with inet_dscp_to_dsfield(). When ->flowi4_tos is converted to dscp_t in the future, we'll only have to remove that inet_dscp_to_dsfield() call.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-15 | netfilter: ipv4: Convert ip_route_me_harder() to dscp_t. | Guillaume Nault
Use ip4h_dscp() instead of reading iph->tos directly.

ip4h_dscp() returns a dscp_t value, which is temporarily converted back to __u8 with inet_dscp_to_dsfield(). When ->flowi4_tos is converted to dscp_t in the future, we'll only have to remove that inet_dscp_to_dsfield() call.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-14 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.12-rc8).

Conflicts:

tools/testing/selftests/net/.gitignore
  252e01e68241 ("selftests: net: add netlink-dumps to .gitignore")
  be43a6b23829 ("selftests: ncdevmem: Move ncdevmem under drivers/net/hw")
  https://lore.kernel.org/all/20241113122359.1b95180a@canb.auug.org.au/

drivers/net/phy/phylink.c
  671154f174e0 ("net: phylink: ensure PHY momentary link-fails are handled")
  7530ea26c810 ("net: phylink: remove "using_mac_select_pcs"")

Adjacent changes:

drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c
  5b366eae7193 ("stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines")
  e96321fad3ad ("net: ethernet: Switch back to struct platform_driver::remove()")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-13 | ipmr: Fix access to mfc_cache_list without lock held | Breno Leitao
Accessing `mr_table->mfc_cache_list` is protected by an RCU lock. In the following code flow, the RCU read lock is not held, which triggers the following splat when CONFIG_PROVE_RCU_LIST is enabled. The same problem might show up in the IPv6 code path.

  6.12.0-rc5-kbuilder-01145-gbac17284bdcb #33 Tainted: G E N
  -----------------------------
  net/ipv4/ipmr_base.c:313 RCU-list traversed in non-reader section!!

  rcu_scheduler_active = 2, debug_locks = 1
  2 locks held by RetransmitAggre/3519:
   #0: ffff88816188c6c0 (nlk_cb_mutex-ROUTE){+.+.}-{3:3}, at: __netlink_dump_start+0x8a/0x290
   #1: ffffffff83fcf7a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_dumpit+0x6b/0x90

  stack backtrace:
   lockdep_rcu_suspicious
   mr_table_dump
   ipmr_rtm_dumproute
   rtnl_dump_all
   rtnl_dumpit
   netlink_dump
   __netlink_dump_start
   rtnetlink_rcv_msg
   netlink_rcv_skb
   netlink_unicast
   netlink_sendmsg

This is not a problem per se: the RTNL lock is held here, so it is safe to iterate over the list without the RCU read lock, as suggested by Eric.

To silence the false positive, modify the code to use list_for_each_entry_rcu() with the RTNL-held condition. The annotation then warns only if neither RTNL nor the RCU read lock is held during iteration, which would signal a real problem.

This also fixes the IPv6 case, since ip6mr_rtm_dumproute() calls the same function.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241108-ipmr_rcu-v2-1-c718998e209b@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
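In kernel code the change amounts to passing a lockdep condition, for example lockdep_rtnl_is_held(), as the optional last argument of list_for_each_entry_rcu(). The userspace model below only captures the decision the checker makes (warn unless the caller is an RCU reader or the extra condition holds); the `demo_` context struct and functions are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical snapshot of what lockdep knows about the current context. */
struct demo_ctx {
	bool rcu_read_held; /* inside rcu_read_lock() ... rcu_read_unlock() */
	bool rtnl_held;     /* RTNL mutex held */
};

/* Models the check an RCU list iterator performs under CONFIG_PROVE_RCU_LIST:
 * complain unless the caller is an RCU reader or the caller-supplied
 * condition is true. */
static bool demo_rcu_list_splats(const struct demo_ctx *c, bool cond)
{
	return !cond && !c->rcu_read_held;
}

/* Before the fix: no condition was passed, so only an RCU reader is quiet. */
static bool demo_dump_splats_before(const struct demo_ctx *c)
{
	return demo_rcu_list_splats(c, false);
}

/* After the fix: the iterator is told that holding RTNL is also fine. */
static bool demo_dump_splats_after(const struct demo_ctx *c)
{
	return demo_rcu_list_splats(c, c->rtnl_held);
}
```

The dump path (RTNL held, no RCU reader) splats before the fix and stays quiet after it, while a caller holding neither lock is still reported.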
2024-11-12 | net: ip: make ip_route_use_hint() return drop reasons | Menglong Dong
In this commit, we make ip_route_use_hint() return drop reasons. The drop reasons returned are similar to those in ip_route_input_slow(); no new drop reasons are added in this commit.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12 | net: ip: make ip_mkroute_input/__mkroute_input return drop reasons | Menglong Dong
In this commit, we make ip_mkroute_input() and __mkroute_input() return drop reasons.

The drop reason "SKB_DROP_REASON_ARP_PVLAN_DISABLE" is introduced for the case where a non-IP packet would be forwarded back out of the same in_dev it arrived on while proxy_arp_pvlan is not enabled.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12 | net: ip: make ip_route_input() return drop reasons | Menglong Dong
In this commit, we make ip_route_input() return the skb drop reasons that come from ip_route_input_noref(), and adjust all of its callers accordingly.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12 | net: ip: make ip_route_input_noref() return drop reasons | Menglong Dong
In this commit, we make ip_route_input_noref() return drop reasons, which come from ip_route_input_rcu().

The callers of ip_route_input_noref() need to be adjusted so that its return value is used properly. In the original logic, ip_route_input() and bpf_lwt_input_reroute() pass on the errno returned by ip_route_input_noref(); we make them return -EINVAL on error instead.

In the following patch, we will make ip_route_input() return drop reasons too.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12 | net: ip: make ip_route_input_rcu() return drop reasons | Menglong Dong
In this commit, we make ip_route_input_rcu() return drop reasons, which come from ip_route_input_mc() and ip_route_input_slow().

The only caller of ip_route_input_rcu() is ip_route_input_noref(). We adjust it to return -EINVAL on error, ignoring the reasons that ip_route_input_rcu() returns. In the following patch, we will make ip_route_input_noref() return the drop reasons.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12 | net: ip: make ip_route_input_slow() return drop reasons | Menglong Dong
In this commit, we make ip_route_input_slow() return skb drop reasons, and the following new skb drop reason is added:

  SKB_DROP_REASON_IP_INVALID_DEST

The only caller of ip_route_input_slow() is ip_route_input_rcu(), and we adjust it by making it return -EINVAL on error.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12 | net: ip: make ip_mc_validate_source() return drop reason | Menglong Dong
Make ip_mc_validate_source() return a drop reason, and adjust its call in ip_route_input_mc().

Its other caller is ip_rcv_finish_core() -> udp_v4_early_demux(), which does not check the errno in detail, so no further adjustment is needed there.

The drop reason "SKB_DROP_REASON_IP_LOCALNET" is added in this commit.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12 | net: ip: make ip_route_input_mc() return drop reason | Menglong Dong
Make ip_route_input_mc() return a drop reason, and adjust its call in ip_route_input_rcu().

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12 | net: ip: make fib_validate_source() support drop reasons | Menglong Dong
In this commit, we make fib_validate_source() and __fib_validate_source() return -reason instead of an errno on error.

fib_validate_source() can return -errno, 0, or 1, so it is hard to make it return drop reasons directly. It returns 1 if the scope of the source (reverse) route is HOST. In that case, __mkroute_input() marks the skb with IPSKB_DOREDIRECT (combined with some other conditions), and ip_forward() then sends an ICMP REDIRECT if the flag is set. This information could not reach __mkroute_input() if fib_validate_source() returned drop reasons only. Therefore, we introduce the wrapper fib_validate_source_reason() for fib_validate_source(), which returns drop reasons on error.

In the original logic, LINUX_MIB_IPRPFILTER is counted if fib_validate_source() returns -EXDEV. Now we need to adjust that by checking "reason == SKB_DROP_REASON_IP_RPFILTER". However, this only takes effect after the patch "net: ip: make ip_route_input_noref() return drop reasons", because this patch cannot yet pass the drop reasons from fib_validate_source() to ip_rcv_finish_core().

The following new drop reasons are added in this patch:

  SKB_DROP_REASON_IP_LOCAL_SOURCE
  SKB_DROP_REASON_IP_INVALID_SOURCE

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
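The tri-state return and the reason-only wrapper described in this commit can be sketched in userspace as follows. This is a hypothetical model (`demo_` names, invented enum values, a scenario enum instead of real routing): the forwarding-path caller keeps calling the tri-state function because the `1` feeds its redirect decision, while the receive path uses the wrapper and counts rpfilter drops by comparing the reason rather than the errno.

```c
#include <stdbool.h>

/* Hypothetical, trimmed drop reasons; values do NOT match the kernel's enum. */
enum {
	DEMO_NOT_DROPPED_YET = 0,
	DEMO_NOT_SPECIFIED,
	DEMO_IP_RPFILTER,
	DEMO_IP_LOCAL_SOURCE,
};

enum demo_case { DEMO_OK, DEMO_OK_HOST_SCOPE, DEMO_RPF_FAIL, DEMO_LOCAL_SRC };

/* Tri-state contract from the commit message: -reason on error,
 * 0 on success, 1 on success when the reverse route has host scope. */
static int demo_validate_source(enum demo_case c)
{
	switch (c) {
	case DEMO_OK_HOST_SCOPE:
		return 1;
	case DEMO_RPF_FAIL:
		return -DEMO_IP_RPFILTER;
	case DEMO_LOCAL_SRC:
		return -DEMO_IP_LOCAL_SOURCE;
	default:
		return 0;
	}
}

/* Wrapper for callers that only need a drop reason: 0 and 1 both mean "keep". */
static int demo_validate_source_reason(enum demo_case c)
{
	int err = demo_validate_source(c);

	return err < 0 ? -err : DEMO_NOT_DROPPED_YET;
}

struct demo_fwd_result {
	int reason;
	bool do_redirect;
};

/* Forwarding-path caller: needs the 1 to decide on an ICMP redirect. */
static struct demo_fwd_result demo_mkroute_input(enum demo_case c)
{
	int err = demo_validate_source(c);
	struct demo_fwd_result res = { DEMO_NOT_DROPPED_YET, false };

	if (err < 0)
		res.reason = -err;
	else if (err == 1)
		res.do_redirect = true; /* the real code combines this with other checks */
	return res;
}

/* Receive-path accounting keyed on the reason, where it used to compare
 * the errno against -EXDEV. */
static int demo_rcv(enum demo_case c, unsigned long *rpfilter_drops)
{
	int reason = demo_validate_source_reason(c);

	if (reason == DEMO_IP_RPFILTER)
		(*rpfilter_drops)++;
	return reason;
}
```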
2024-11-11 | net: ipv4: Cache pmtu for all packet paths if multipath enabled | Vladimir Vdovin
Check the number of paths with fib_info_num_path(), and call update_or_create_fnhe() for every path.

The problem is that the PMTU is cached only for the oif that received the ICMP "fragmentation needed" message; the other oifs still use the default interface MTU.

An example topology showing the problem:

                      | host1
                 +---------+
                 | dummy0  | 10.179.20.18/32  mtu9000
                 +---------+
          +-----------+----------------+
     +---------+                  +---------+
     | ens17f0 | 10.179.2.141/31  | ens17f1 | 10.179.2.13/31
     +---------+                  +---------+
          |  (all here have mtu 9000)  |
       +------+                     +------+
       | ro1  | 10.179.2.140/31     | ro2  | 10.179.2.12/31
       +------+                     +------+
          |                            |
  --------+-------------+--------------+------
                        |
                     +-----+
                     | ro3 | 10.10.10.10  mtu1500
                     +-----+
                        |
    ========================================
                  some networks
    ========================================
                        |
                     +-----+
                     | eth0| 10.10.30.30  mtu9000
                     +-----+
                        | host2

host1 has multipath enabled and sysctl net.ipv4.fib_multipath_hash_policy = 1:

  default proto static src 10.179.20.18
          nexthop via 10.179.2.12 dev ens17f1 weight 1
          nexthop via 10.179.2.140 dev ens17f0 weight 1

When host1 performs PMTUD from 10.179.20.18/32 to host2, it receives on ens17f1 an ICMP packet from ro3 reporting that ro3's MTU is 1500, and caches this in the nexthop exception cache.

The problem is that it is cached only for the interface that received the ICMP, and there is no guarantee that ro3 will ever send the ICMP message to host1 via the other path.

host1 now has these routes to host2:

  ip r g 10.10.30.30 sport 30000 dport 443
  10.10.30.30 via 10.179.2.12 dev ens17f1 src 10.179.20.18 uid 0
      cache expires 521sec mtu 1500

  ip r g 10.10.30.30 sport 30033 dport 443
  10.10.30.30 via 10.179.2.140 dev ens17f0 src 10.179.20.18 uid 0
      cache

So when host1 tries again to reach host2 with an MTU above 1500, a flow that happens to hash to oif=ens17f1 works, but a flow hashed to oif=ens17f0 blackholes, while the ICMP messages from ro3 keep arriving on ens17f1, until ro3 happens to send one through another flow to ens17f0.

Signed-off-by: Vladimir Vdovin <deliran@verdict.gg>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20241108093427.317942-1-deliran@verdict.gg
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
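The difference between caching the PMTU on one path and on every path can be shown with a toy multipath route: path 0 stands for ens17f0 and path 1 for ens17f1 in the example above. This is a simplified, hypothetical model (`demo_` names; one exception MTU per path, no expiry or per-destination keys), not the nexthop-exception code itself.

```c
#define DEMO_MAX_PATHS 4

/* Hypothetical model of a multipath route: each path keeps its own
 * nexthop-exception MTU; 0 means "nothing cached, use the device MTU". */
struct demo_path {
	unsigned int dev_mtu;
	unsigned int exc_mtu;
};

struct demo_route {
	unsigned int npaths;
	struct demo_path path[DEMO_MAX_PATHS];
};

/* Old behaviour: only the path that received the ICMP learns the PMTU. */
static void demo_learn_pmtu_one(struct demo_route *rt, unsigned int rx_path,
				unsigned int mtu)
{
	rt->path[rx_path].exc_mtu = mtu;
}

/* Behaviour the commit describes: record the exception on every path. */
static void demo_learn_pmtu_all(struct demo_route *rt, unsigned int mtu)
{
	for (unsigned int i = 0; i < rt->npaths; i++)
		rt->path[i].exc_mtu = mtu;
}

static unsigned int demo_path_mtu(const struct demo_route *rt, unsigned int i)
{
	const struct demo_path *p = &rt->path[i];

	return (p->exc_mtu && p->exc_mtu < p->dev_mtu) ? p->exc_mtu : p->dev_mtu;
}
```

With the per-path update, a flow hashed onto path 0 keeps using 9000 and blackholes; after updating every path, both flows use 1500.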
2024-11-11 | net: convert to nla_get_*_default() | Johannes Berg
Most of the original conversion is from the spatch below, but I edited some and left out other instances that were either buggy after conversion (where default values don't fit into the type) or just looked strange.

@@
expression attr, def;
expression val;
identifier fn =~ "^nla_get_.*";
fresh identifier dfn = fn ## "_default";
@@
(
-if (attr)
-  val = fn(attr);
-else
-  val = def;
+val = dfn(attr, def);
|
-if (!attr)
-  val = def;
-else
-  val = fn(attr);
+val = dfn(attr, def);
|
-if (!attr)
-  return def;
-return fn(attr);
+return dfn(attr, def);
|
-attr ? fn(attr) : def
+dfn(attr, def)
|
-!attr ? def : fn(attr)
+dfn(attr, def)
)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>
Link: https://patch.msgid.link/20241108114145.0580b8684e7f.I740beeaa2f70ebfc19bfca1045a24d6151992790@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
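What the spatch does can be shown on a userspace stand-in. The sketch below assumes the `_default` helpers behave roughly as "value if the attribute is present, else the default" (which is what the rewrite rules imply); the `demo_` attribute struct and functions are hypothetical and do not use the real netlink API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a netlink attribute holding a u32. */
struct demo_nlattr {
	uint16_t nla_len;
	uint16_t nla_type;
	uint32_t payload;
};

static uint32_t demo_nla_get_u32(const struct demo_nlattr *nla)
{
	return nla->payload;
}

/* Shape of the *_default helpers: the value if present, else the default. */
static uint32_t demo_nla_get_u32_default(const struct demo_nlattr *nla,
					 uint32_t defvalue)
{
	return nla ? demo_nla_get_u32(nla) : defvalue;
}

/* Code of the form matched by the first branch of the spatch ... */
static uint32_t demo_parse_before(const struct demo_nlattr *attr)
{
	uint32_t val;

	if (attr)
		val = demo_nla_get_u32(attr);
	else
		val = 64;
	return val;
}

/* ... and what it is rewritten to. */
static uint32_t demo_parse_after(const struct demo_nlattr *attr)
{
	return demo_nla_get_u32_default(attr, 64);
}
```

Because the default becomes a parameter of the attribute's type, a default that does not fit that type would be silently converted, which is presumably why such instances were left out of the conversion.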
2024-11-09 | neighbour: Remove bare neighbour::next pointer | Gilad Naaman
Remove the now-unused neighbour::next pointer, leaving struct neighbour solely with the hlist_node implementation.

Signed-off-by: Gilad Naaman <gnaaman@drivenets.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241107160444.2913124-6-gnaaman@drivenets.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07 | Merge tag 'nf-next-24-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next | Paolo Abeni
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following series contains Netfilter updates for net-next:

1) Make legacy xtables configs user selectable, from Breno Leitao.

2) Fix a few sparse warnings related to percpu, from Uros Bizjak.

3) Use strscpy_pad, from Justin Stitt.

4) Use nft_trans_elem_alloc() in catchall flush, from Florian Westphal.

5) A series of 7 patches to fix false positives with CONFIG_RCU_LIST=y. Florian also sees a possible issue with patch 10 during module load/removal when requesting an expression that is provided by a module. As for patch 11, the object is being updated, so a reference to the module already exists and I don't see any real issue.

   Florian says:

   "Unfortunately there are many more errors, and not all are false positives. First patches pass lockdep_commit_lock_is_held() to the rcu list traversal macro so that those splats are avoided. The last two patches are real code change as opposed to 'pass the transaction mutex to relax rcu check': Those two lists are not protected by transaction mutex so could be altered in parallel. This targets nf-next because these are long-standing issues."

netfilter pull request 24-11-07

* tag 'nf-next-24-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: must hold rcu read lock while iterating object type list
  netfilter: nf_tables: must hold rcu read lock while iterating expression type list
  netfilter: nf_tables: avoid false-positive lockdep splats with basechain hook
  netfilter: nf_tables: avoid false-positive lockdep splats in set walker
  netfilter: nf_tables: avoid false-positive lockdep splats with flowtables
  netfilter: nf_tables: avoid false-positive lockdep splats with sets
  netfilter: nf_tables: avoid false-positive lockdep splat on rule deletion
  netfilter: nf_tables: prefer nft_trans_elem_alloc helper
  netfilter: nf_tables: replace deprecated strncpy with strscpy_pad
  netfilter: nf_tables: Fix percpu address space issues in nf_tables_api.c
  netfilter: Make legacy configs user selectable
====================

Link: https://patch.msgid.link/20241106234625.168468-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-06 | xfrm: Convert struct xfrm_dst_lookup_params -> tos to dscp_t. | Guillaume Nault
Add a type annotation to the "tos" field of struct xfrm_dst_lookup_params, to ensure that the ECN bits aren't mistakenly taken into account when doing route lookups. Rename that field (tos -> dscp) to make that change explicit.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-11-03 | net/tcp: Add missing lockdep annotations for TCP-AO hlist traversals | Dmitry Safonov
Under CONFIG_PROVE_RCU_LIST + CONFIG_RCU_EXPERT, hlist_for_each_entry_rcu() provides very helpful splats that help find possible issues. I missed CONFIG_RCU_EXPERT=y in my testing config, the same as described in a3e4bf7f9675 ("configs/debug: make sure PROVE_RCU_LIST=y takes effect").

The fix itself is trivial: add the same lockdep annotations as those used to dereference ao_info from the socket.

Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20241028152645.35a8be66@kernel.org/
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20241030-tcp-ao-hlist-lockdep-annotate-v1-1-bf641a64d7c6@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03 | net: tcp: Add noinline_for_tracing annotation for tcp_drop_reason() | Yafang Shao
We previously hooked the tcp_drop_reason() function using BPF to monitor TCP drop reasons. However, after we upgraded our compiler from GCC 9 to GCC 11, tcp_drop_reason() is now inlined, which prevents us from hooking into it. To address this, explicitly mark it noinline for tracing.

Link: https://lore.kernel.org/netdev/CANn89iJuShCmidCi_ZkYABtmscwbVjhuDta1MS5LxV_4H9tKOA@mail.gmail.com/
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Menglong Dong <menglong8.dong@gmail.com>
Link: https://patch.msgid.link/20241024093742.87681-3-laoar.shao@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
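The effect of such an annotation can be sketched in plain C. The sketch assumes that noinline_for_tracing expands to an ordinary noinline attribute (spelled `__attribute__((__noinline__))` in GCC and Clang); the `demo_` names are hypothetical and the function bodies are placeholders, not TCP code.

```c
/* Assumption: noinline_for_tracing boils down to the compiler's noinline
 * attribute; GCC and Clang spell that __attribute__((__noinline__)). */
#define demo_noinline_for_tracing __attribute__((__noinline__))

enum demo_reason { DEMO_DROP_NONE, DEMO_DROP_CSUM, DEMO_DROP_MAX };

static unsigned long demo_drop_count[DEMO_DROP_MAX];

/* A tiny static helper like this is an obvious inlining candidate; the
 * attribute keeps it as a real out-of-line function with its own symbol,
 * so a tracing hook has something to attach to. */
static demo_noinline_for_tracing void demo_drop_reason(enum demo_reason r)
{
	demo_drop_count[r]++;
}

static void demo_rx_path(int csum_ok)
{
	if (!csum_ok)
		demo_drop_reason(DEMO_DROP_CSUM);
}
```

The behaviour is identical with or without the attribute; only the generated code differs, which is exactly why inlining can silently break an existing tracing setup after a compiler upgrade.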
2024-10-31 | tcp: only release congestion control if it has been initialized | Pengcheng Yang
Currently, when cleaning up congestion control, we always call release() regardless of whether init() has been called. There is no need to release when closing a socket in TCP_LISTEN or TCP_CLOSE (closed immediately after socket()).

In this case, tcp_cdg calls kfree(NULL) in release without causing an exception, but for some customized congestion control modules this could lead to unexpected exceptions. We need to ensure that init and release are called in pairs.

Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Link: https://patch.msgid.link/1729845944-6003-1-git-send-email-yangpc@wangsu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
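A minimal sketch of the pairing rule, assuming a per-socket "initialized" flag that is set after init() and checked before release(). The `demo_` structures and callbacks are hypothetical stand-ins for a congestion-control module with private state, not the kernel's tcp_congestion_ops.

```c
#include <stdbool.h>
#include <stdlib.h>

struct demo_sock;

/* Hypothetical congestion-control ops; init() allocates private state. */
struct demo_ca_ops {
	void (*init)(struct demo_sock *sk);
	void (*release)(struct demo_sock *sk);
};

struct demo_sock {
	const struct demo_ca_ops *ca_ops;
	bool ca_initialized; /* set once init() has run */
	int *ca_priv;        /* state owned by the CC module */
	int releases;        /* demo instrumentation */
};

static void demo_ca_init(struct demo_sock *sk)
{
	sk->ca_priv = calloc(1, sizeof(*sk->ca_priv));
}

static void demo_ca_release(struct demo_sock *sk)
{
	free(sk->ca_priv);
	sk->ca_priv = NULL;
	sk->releases++;
}

static const struct demo_ca_ops demo_ops = { demo_ca_init, demo_ca_release };

static void demo_init_cc(struct demo_sock *sk)
{
	if (sk->ca_ops->init)
		sk->ca_ops->init(sk);
	sk->ca_initialized = true;
}

/* Call release() only if init() ran, so the two always come in pairs. */
static void demo_cleanup_cc(struct demo_sock *sk)
{
	if (sk->ca_initialized && sk->ca_ops->release)
		sk->ca_ops->release(sk);
	sk->ca_initialized = false;
}
```

A socket closed before congestion control was ever initialized (for example a listener) now skips release() entirely, so a module whose release() assumes init() ran never sees an unpaired call.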
2024-10-31 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.12-rc6).

Conflicts:

drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c
  cbe84e9ad5e2 ("wifi: iwlwifi: mvm: really send iwl_txpower_constraints_cmd")
  188a1bf89432 ("wifi: mac80211: re-order assigning channel in activate links")
  https://lore.kernel.org/all/20241028123621.7bbb131b@canb.auug.org.au/

net/mac80211/cfg.c
  c4382d5ca1af ("wifi: mac80211: update the right link for tx power")
  8dd0498983ee ("wifi: mac80211: Fix setting txpower with emulate_chanctx")

drivers/net/ethernet/intel/ice/ice_ptp_hw.h
  6e58c3310622 ("ice: fix crash on probe for DPLL enabled E810 LOM")
  e4291b64e118 ("ice: Align E810T GPIO to other products")
  ebb2693f8fbd ("ice: Read SDP section from NVM for pin definitions")
  ac532f4f4251 ("ice: Cleanup unused declarations")
  https://lore.kernel.org/all/20241030120524.1ee1af18@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-31 | Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf | Linus Torvalds
Pull bpf fixes from Daniel Borkmann:

 - Fix BPF verifier to force a checkpoint when the program's jump history becomes too long (Eduard Zingerman)

 - Add several fixes to the BPF bits iterator addressing issues like memory leaks and overflow problems (Hou Tao)

 - Fix an out-of-bounds write in trie_get_next_key (Byeonguk Jeong)

 - Fix BPF test infra's LIVE_FRAME frame update after a page has been recycled (Toke Høiland-Jørgensen)

 - Fix BPF verifier and undo the 40-bytes extra stack space for bpf_fastcall patterns due to various bugs (Eduard Zingerman)

 - Fix a BPF sockmap race condition which could trigger a NULL pointer dereference in sock_map_link_update_prog (Cong Wang)

 - Fix tcp_bpf_recvmsg_parser to retrieve seq_copied from tcp_sk under the socket lock (Jiayuan Chen)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf, test_run: Fix LIVE_FRAME frame update after a page has been recycled
  selftests/bpf: Add three test cases for bits_iter
  bpf: Use __u64 to save the bits in bits iterator
  bpf: Check the validity of nr_words in bpf_iter_bits_new()
  bpf: Add bpf_mem_alloc_check_size() helper
  bpf: Free dynamically allocated bits in bpf_iter_bits_destroy()
  bpf: disallow 40-bytes extra stack for bpf_fastcall patterns
  selftests/bpf: Add test for trie_get_next_key()
  bpf: Fix out-of-bounds write in trie_get_next_key()
  selftests/bpf: Test with a very short loop
  bpf: Force checkpoint when jmp history is too long
  bpf: fix filed access without lock
  sock_map: fix a NULL pointer dereference in sock_map_link_update_prog()
2024-10-30 | tcp: add more warn of socket in tcp_send_loss_probe() | Jason Xing
Add two more fields to the output of the warning helper, which here covers tcp_send_loss_probe(). Link: https://lore.kernel.org/all/5632e043-bdba-4d75-bc7e-bf58014492fd@redhat.com/ Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Cc: Neal Cardwell <ncardwell@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-29 | ipv4: Prepare ip_rt_get_source() to future .flowi4_tos conversion. | Guillaume Nault
Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the dscp_t value to __u8 with inet_dscp_to_dsfield(). Then, once .flowi4_tos is converted to dscp_t, we will only have to drop the inet_dscp_to_dsfield() call. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/0a13a200f31809841975e38633914af1061e0c04.1729530028.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 | ipv4: Prepare ipmr_rt_fib_lookup() to future .flowi4_tos conversion. | Guillaume Nault
Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the dscp_t value to __u8 with inet_dscp_to_dsfield(). Then, once .flowi4_tos is converted to dscp_t, we will only have to drop the inet_dscp_to_dsfield() call. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/462402a097260357a7aba80228612305f230b6a9.1729530028.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 | ipv4: Prepare icmp_reply() to future .flowi4_tos conversion. | Guillaume Nault
Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the dscp_t value to __u8 with inet_dscp_to_dsfield(). Then, once .flowi4_tos is converted to dscp_t, we will only have to drop the inet_dscp_to_dsfield() call. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/61b7563563f8b0a562b5b62032fe5260034d0aac.1729530028.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 | ipv4: Prepare fib_compute_spec_dst() to future .flowi4_tos conversion. | Guillaume Nault
Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the dscp_t value to __u8 with inet_dscp_to_dsfield(). Then, once .flowi4_tos is converted to dscp_t, we will only have to drop the inet_dscp_to_dsfield() call. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/a0eba69cce94f747e4c7516184a85ffd0abbe3f0.1729530028.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
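The four ".flowi4_tos conversion" commits above share one pattern: extract the DSCP from the IPv4 TOS byte, then turn it back into a plain __u8 dsfield. A minimal userspace sketch of that round trip, assuming DSCP is the upper six bits of the TOS byte (mask 0xfc, the two ECN bits cleared) and that the DSCP value is kept unshifted. All names here are hypothetical stand-ins for ip4h_dscp() and inet_dscp_to_dsfield(), not the kernel definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for dscp_t: the TOS byte with the ECN bits cleared. */
typedef uint8_t dscp_sketch_t;

/* Assumption: DSCP occupies the upper six bits of the IPv4 TOS/DS field. */
#define DSCP_MASK_SKETCH 0xfcu

/* Minimal IPv4 header prefix; only the fields this sketch needs. */
struct iphdr_sketch {
    uint8_t version_ihl;
    uint8_t tos;
};

/* Sketch of "get the DSCP from the IPv4 header": drop the ECN bits. */
static dscp_sketch_t iph_dscp_sketch(const struct iphdr_sketch *iph)
{
    return (dscp_sketch_t)(iph->tos & DSCP_MASK_SKETCH);
}

/* Sketch of "convert dscp_t back to __u8": a plain value conversion,
 * so a field still typed as __u8 (like today's .flowi4_tos) can hold it. */
static uint8_t dscp_to_dsfield_sketch(dscp_sketch_t dscp)
{
    return (uint8_t)dscp;
}
```

For a TOS byte of 0xb9 (EF with ECN bits 01), the sketch yields 0xb8; shifting that right by two gives DSCP 46. Once the destination field itself is DSCP-typed, the second conversion becomes unnecessary, which is the point the commit messages make.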
2024-10-29 | ipv4: ip_tunnel: Fix suspicious RCU usage warning in ip_tunnel_find() | Ido Schimmel
The per-netns IP tunnel hash table is protected by the RTNL mutex and ip_tunnel_find() is only called from the control path where the mutex is taken. Add a lockdep expression to hlist_for_each_entry_rcu() in ip_tunnel_find() in order to validate that the mutex is held and to silence the suspicious RCU usage warning [1]. [1] WARNING: suspicious RCU usage 6.12.0-rc3-custom-gd95d9a31aceb #139 Not tainted ----------------------------- net/ipv4/ip_tunnel.c:221 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by ip/362: #0: ffffffff86fc7cb0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x377/0xf60 stack backtrace: CPU: 12 UID: 0 PID: 362 Comm: ip Not tainted 6.12.0-rc3-custom-gd95d9a31aceb #139 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: <TASK> dump_stack_lvl+0xba/0x110 lockdep_rcu_suspicious.cold+0x4f/0xd6 ip_tunnel_find+0x435/0x4d0 ip_tunnel_newlink+0x517/0x7a0 ipgre_newlink+0x14c/0x170 __rtnl_newlink+0x1173/0x19c0 rtnl_newlink+0x6c/0xa0 rtnetlink_rcv_msg+0x3cc/0xf60 netlink_rcv_skb+0x171/0x450 netlink_unicast+0x539/0x7f0 netlink_sendmsg+0x8c1/0xd80 ____sys_sendmsg+0x8f9/0xc20 ___sys_sendmsg+0x197/0x1e0 __sys_sendmsg+0x122/0x1f0 do_syscall_64+0xbb/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241023123009.749764-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29 | bpf: fix filed access without lock | Jiayuan Chen
The tcp_bpf_recvmsg_parser() function, running in user context, reads copied_seq from tcp_sk without holding the socket lock and stores it in a local variable, seq. However, softirq context can modify tcp_sk->copied_seq concurrently, for example in tcp_read_sock(). As a result, the seq value is stale when it is written back to tcp_sk->copied_seq at the end of tcp_bpf_recvmsg_parser(), leading to incorrect behavior. Because of this race, the copied_seq field might be set to an incorrect value (less than the actual copied_seq) at the end of the function: 'WRITE_ONCE(tcp->copied_seq, seq)'. This makes the 'offset' negative in tcp_read_sock()->tcp_recv_skb() when processing new incoming packets (sk->copied_seq - skb->seq becomes less than 0), and all subsequent packets are dropped. Signed-off-by: Jiayuan Chen <mrpre@163.com> Link: https://lore.kernel.org/r/20241028065226.35568-1-mrpre@163.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
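The "negative offset" above is easiest to see with concrete numbers. The sketch assumes, as the expression sk->copied_seq - skb->seq suggests, that the offset is computed in unsigned 32-bit sequence arithmetic and then compared against the segment length. A stale copied_seq that lags skb->seq then wraps to a huge value and fails the bounds check. The helper names are hypothetical; this models the arithmetic only and is not tcp_recv_skb() itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: offset of copied_seq into a segment starting at skb_seq.
 * Unsigned 32-bit subtraction, so normal sequence wrap-around is handled,
 * but a copied_seq that is *behind* skb_seq wraps to a very large value. */
static uint32_t seg_offset_sketch(uint32_t copied_seq, uint32_t skb_seq)
{
    return copied_seq - skb_seq;
}

/* Hypothetical check: the segment is consumable only if the offset falls
 * inside it. A wrapped ("negative") offset never does. */
static bool seg_usable_sketch(uint32_t copied_seq, uint32_t skb_seq,
                              uint32_t skb_len)
{
    return seg_offset_sketch(copied_seq, skb_seq) < skb_len;
}
```

With copied_seq = 900 and a segment starting at 1000, the offset wraps to 2^32 - 100, so the segment is treated as unusable; every later segment starts even further ahead, so none passes. By contrast, legitimate wrap-around (copied_seq = 10, segment at 2^32 - 10) gives an offset of 20 and the segment is usable.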
2024-10-29 | xfrm: Add an inbound percpu state cache. | Steffen Klassert
Now that we can have percpu xfrm states, the number of active states might increase. To improve lookup performance, add a percpu cache for the inbound xfrm states in use. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Tested-by: Antony Antony <antony.antony@secunet.com> Tested-by: Tobias Brunner <tobias@strongswan.org>
2024-10-29 | ipv4: Convert devinet_ioctl to per-netns RTNL. | Kuniyuki Iwashima
ioctl(SIOCGIFCONF) calls dev_ifconf(), which operates on the current netns. Let's use per-netns RTNL helpers in dev_ifconf() and inet_gifconf(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 | ipv4: Convert devinet_ioctl() to per-netns RTNL except for SIOCSIFFLAGS. | Kuniyuki Iwashima
Basically, devinet_ioctl() operates on a single netns. However, ioctl(SIOCSIFFLAGS) triggers the netdev notifier, which could touch another netdev in a different netns. Let's use per-netns RTNL helpers in devinet_ioctl() and place ASSERT_RTNL() for SIOCSIFFLAGS. We will remove ASSERT_RTNL() once RTM_SETLINK and RTM_DELLINK are converted. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 | ipv4: Convert devinet_sysctl_forward() to per-netns RTNL. | Kuniyuki Iwashima
devinet_sysctl_forward() touches only a single netns. Let's use rtnl_trylock() and __in_dev_get_rtnl_net(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 | ipv4: Convert check_lifetime() to per-netns RTNL. | Kuniyuki Iwashima
Since commit 1675f385213e ("ipv4: Namespacify IPv4 address GC."), check_lifetime() works on a per-netns basis. Let's use rtnl_net_lock() and rtnl_net_dereference(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 | ipv4: Convert RTM_DELADDR to per-netns RTNL. | Kuniyuki Iwashima
Let's push down RTNL into inet_rtm_deladdr() as rtnl_net_lock(). Now, ip_mc_autojoin_config() is always called under per-netns RTNL, so ASSERT_RTNL() can be replaced with ASSERT_RTNL_NET(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 | ipv4: Use per-netns RTNL helpers in inet_rtm_newaddr(). | Kuniyuki Iwashima
inet_rtm_to_ifa() and find_matching_ifa() are called under rtnl_net_lock(), so the __in_dev_get_rtnl() and in_dev_for_each_ifa_rtnl() calls there can use per-netns RTNL helpers instead. Let's define and use __in_dev_get_rtnl_net() and in_dev_for_each_ifa_rtnl_net(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 | ipv4: Convert RTM_NEWADDR to per-netns RTNL. | Kuniyuki Iwashima
The address hash table and GC are already namespacified. Let's push down RTNL into inet_rtm_newaddr() as rtnl_net_lock(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29 | ipv4: Don't allocate ifa for 0.0.0.0 in inet_rtm_newaddr(). | Kuniyuki Iwashima
When 0.0.0.0 is passed to __inet_insert_ifa(), it frees ifa and returns 0. For RTM_NEWADDR, this check can be done much earlier, before struct in_ifaddr is even allocated. Let's move the validation to: 1. inet_insert_ifa() for ioctl(), and 2. inet_rtm_newaddr() for RTM_NEWADDR. Now, we can remove the same check in find_matching_ifa(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
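The reordering described above amounts to running a cheap, stateless check before any allocation. A minimal userspace sketch, assuming a hypothetical address struct and malloc() in place of the kernel allocator; it mirrors only the described outcome (0.0.0.0 is a no-op that returns 0) and is not the actual inet_rtm_newaddr() code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct in_ifaddr. */
struct ifaddr_sketch {
    uint32_t local; /* IPv4 address, network byte order */
};

/* Hypothetical counter so a caller can observe whether anything was allocated. */
static int ifaddr_allocs_sketch;

/* Check for 0.0.0.0 first: return 0 with *out == NULL, the same result as
 * "allocate, then free and return 0", but without the wasted allocation. */
static int newaddr_sketch(uint32_t local, struct ifaddr_sketch **out)
{
    struct ifaddr_sketch *ifa;

    *out = NULL;
    if (local == 0)
        return 0;

    ifa = malloc(sizeof(*ifa));
    if (!ifa)
        return -ENOMEM;
    ifaddr_allocs_sketch++;
    ifa->local = local;
    *out = ifa;
    return 0;
}
```

The caller sees the same return value either way; the only difference is that the rejected path no longer touches the allocator.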
2024-10-29 | ipv4: Factorise RTM_NEWADDR validation to inet_validate_rtm(). | Kuniyuki Iwashima
rtm_to_ifaddr() validates some attributes, looks up a netdev, allocates struct in_ifaddr, and validates IFA_CACHEINFO. There is no reason to delay IFA_CACHEINFO validation. We will push RTNL down to inet_rtm_newaddr(), and then we want to complete rtnetlink validation before rtnl_net_lock(). Let's factorise the validation parts. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
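The motivation above, finishing request validation before taking the per-netns lock, can be sketched with a pthread mutex standing in for rtnl_net_lock(). Everything here is hypothetical: the structs, the field names, and the lifetime rule (a zero valid lifetime or a preferred lifetime longer than the valid one is rejected), which is only loosely modeled on IFA_CACHEINFO checks and should be read as an assumption.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical per-namespace state guarded by its own lock,
 * standing in for a per-netns RTNL. */
struct netns_sketch {
    pthread_mutex_t lock;
    int naddrs;
};

/* Hypothetical parsed request, standing in for RTM_NEWADDR attributes. */
struct newaddr_req_sketch {
    uint32_t local;
    uint32_t valid_lft;
    uint32_t preferred_lft;
};

/* Stateless validation: reads only the request, so it needs no lock. */
static int validate_req_sketch(const struct newaddr_req_sketch *req)
{
    if (req->valid_lft == 0 || req->preferred_lft > req->valid_lft)
        return -EINVAL;
    return 0;
}

/* Validate first; only a well-formed request ever takes the lock. */
static int newaddr_locked_sketch(struct netns_sketch *ns,
                                 const struct newaddr_req_sketch *req)
{
    int err = validate_req_sketch(req);

    if (err)
        return err; /* rejected without contending for the lock */

    pthread_mutex_lock(&ns->lock);
    ns->naddrs++; /* the state change is the only part under the lock */
    pthread_mutex_unlock(&ns->lock);
    return 0;
}
```

Keeping the critical section limited to the state change means malformed requests never contend for the namespace lock.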
2024-10-25 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Paolo Abeni
Cross-merge networking fixes after downstream PR. No conflicts and no adjacent changes. Signed-off-by: Paolo Abeni <pabeni@redhat.com>