summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2024-10-08net/sched: accept TCA_STAB only for root qdiscEric Dumazet
Most qdiscs maintain their backlog using qdisc_pkt_len(skb) on the assumption it is invariant between the enqueue() and dequeue() handlers. Unfortunately syzbot can crash a host rather easily using a TBF + SFQ combination, with an STAB on SFQ [1] We can't support TCA_STAB on arbitrary level, this would require to maintain per-qdisc storage. [1] [ 88.796496] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 88.798611] #PF: supervisor read access in kernel mode [ 88.799014] #PF: error_code(0x0000) - not-present page [ 88.799506] PGD 0 P4D 0 [ 88.799829] Oops: Oops: 0000 [#1] SMP NOPTI [ 88.800569] CPU: 14 UID: 0 PID: 2053 Comm: b371744477 Not tainted 6.12.0-rc1-virtme #1117 [ 88.801107] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 88.801779] RIP: 0010:sfq_dequeue (net/sched/sch_sfq.c:272 net/sched/sch_sfq.c:499) sch_sfq [ 88.802544] Code: 0f b7 50 12 48 8d 04 d5 00 00 00 00 48 89 d6 48 29 d0 48 8b 91 c0 01 00 00 48 c1 e0 03 48 01 c2 66 83 7a 1a 00 7e c0 48 8b 3a <4c> 8b 07 4c 89 02 49 89 50 08 48 c7 47 08 00 00 00 00 48 c7 07 00 All code ======== 0: 0f b7 50 12 movzwl 0x12(%rax),%edx 4: 48 8d 04 d5 00 00 00 lea 0x0(,%rdx,8),%rax b: 00 c: 48 89 d6 mov %rdx,%rsi f: 48 29 d0 sub %rdx,%rax 12: 48 8b 91 c0 01 00 00 mov 0x1c0(%rcx),%rdx 19: 48 c1 e0 03 shl $0x3,%rax 1d: 48 01 c2 add %rax,%rdx 20: 66 83 7a 1a 00 cmpw $0x0,0x1a(%rdx) 25: 7e c0 jle 0xffffffffffffffe7 27: 48 8b 3a mov (%rdx),%rdi 2a:* 4c 8b 07 mov (%rdi),%r8 <-- trapping instruction 2d: 4c 89 02 mov %r8,(%rdx) 30: 49 89 50 08 mov %rdx,0x8(%r8) 34: 48 c7 47 08 00 00 00 movq $0x0,0x8(%rdi) 3b: 00 3c: 48 rex.W 3d: c7 .byte 0xc7 3e: 07 (bad) ... Code starting with the faulting instruction =========================================== 0: 4c 8b 07 mov (%rdi),%r8 3: 4c 89 02 mov %r8,(%rdx) 6: 49 89 50 08 mov %rdx,0x8(%r8) a: 48 c7 47 08 00 00 00 movq $0x0,0x8(%rdi) 11: 00 12: 48 rex.W 13: c7 .byte 0xc7 14: 07 (bad) ... [ 88.803721] RSP: 0018:ffff9a1f892b7d58 EFLAGS: 00000206 [ 88.804032] RAX: 0000000000000000 RBX: ffff9a1f8420c800 RCX: ffff9a1f8420c800 [ 88.804560] RDX: ffff9a1f81bc1440 RSI: 0000000000000000 RDI: 0000000000000000 [ 88.805056] RBP: ffffffffc04bb0e0 R08: 0000000000000001 R09: 00000000ff7f9a1f [ 88.805473] R10: 000000000001001b R11: 0000000000009a1f R12: 0000000000000140 [ 88.806194] R13: 0000000000000001 R14: ffff9a1f886df400 R15: ffff9a1f886df4ac [ 88.806734] FS: 00007f445601a740(0000) GS:ffff9a2e7fd80000(0000) knlGS:0000000000000000 [ 88.807225] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 88.807672] CR2: 0000000000000000 CR3: 000000050cc46000 CR4: 00000000000006f0 [ 88.808165] Call Trace: [ 88.808459] <TASK> [ 88.808710] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434) [ 88.809261] ? page_fault_oops (arch/x86/mm/fault.c:715) [ 88.809561] ? exc_page_fault (./arch/x86/include/asm/irqflags.h:26 ./arch/x86/include/asm/irqflags.h:87 ./arch/x86/include/asm/irqflags.h:147 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539) [ 88.809806] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623) [ 88.810074] ? sfq_dequeue (net/sched/sch_sfq.c:272 net/sched/sch_sfq.c:499) sch_sfq [ 88.810411] sfq_reset (net/sched/sch_sfq.c:525) sch_sfq [ 88.810671] qdisc_reset (./include/linux/skbuff.h:2135 ./include/linux/skbuff.h:2441 ./include/linux/skbuff.h:3304 ./include/linux/skbuff.h:3310 net/sched/sch_generic.c:1036) [ 88.810950] tbf_reset (./include/linux/timekeeping.h:169 net/sched/sch_tbf.c:334) sch_tbf [ 88.811208] qdisc_reset (./include/linux/skbuff.h:2135 ./include/linux/skbuff.h:2441 ./include/linux/skbuff.h:3304 ./include/linux/skbuff.h:3310 net/sched/sch_generic.c:1036) [ 88.811484] netif_set_real_num_tx_queues (./include/linux/spinlock.h:396 ./include/net/sch_generic.h:768 net/core/dev.c:2958) [ 88.811870] __tun_detach (drivers/net/tun.c:590 drivers/net/tun.c:673) [ 88.812271] tun_chr_close (drivers/net/tun.c:702 drivers/net/tun.c:3517) [ 88.812505] __fput (fs/file_table.c:432 (discriminator 1)) [ 88.812735] task_work_run (kernel/task_work.c:230) [ 88.813016] do_exit (kernel/exit.c:940) [ 88.813372] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:58 (discriminator 4)) [ 88.813639] ? handle_mm_fault (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/memcontrol.h:1022 ./include/linux/memcontrol.h:1045 ./include/linux/memcontrol.h:1052 mm/memory.c:5928 mm/memory.c:6088) [ 88.813867] do_group_exit (kernel/exit.c:1070) [ 88.814138] __x64_sys_exit_group (kernel/exit.c:1099) [ 88.814490] x64_sys_call (??:?) [ 88.814791] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1)) [ 88.815012] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) [ 88.815495] RIP: 0033:0x7f44560f1975 Fixes: 175f9c1bba9b ("net_sched: Add size table for qdiscs") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Link: https://patch.msgid.link/20241007184130.3960565-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08caif: Remove unused cfsrvl_getphyidDr. David Alan Gilbert
cfsrvl_getphyid() has been unused since 2011's commit f36214408470 ("caif: Use RCU and lists in cfcnfg.c for managing caif link layers") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241007004456.149899-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08net-timestamp: namespacify the sysctl_tstamp_allow_dataJason Xing
Let it be tuned in per netns by admins. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20241005222609.94980-1-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08wifi: cfg80211: unexport wireless_nlevent_flush()Johannes Berg
This no longer needs to be exported, so don't export it. Link: https://patch.msgid.link/20241007214715.3dd736dc3ac0.I1388536e99c37f28a007dd753c473ad21513d9a9@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-08wifi: cfg80211: stop exporting wext symbolsJohannes Berg
CFG80211_WEXT_EXPORT is no longer needed, if we only make ipw2200 return the static name for SIOCGIWNAME itself. Link: https://patch.msgid.link/20241007211431.8d4a7242ce92.I66ceb885ddfa52c368feeea1ea884bf988c525f2@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-08wifi: wext/libipw: move spy implementation to libipwJohannes Berg
There's no driver left using this other than ipw2200, so move the data bookkeeping and code into libipw. Link: https://patch.msgid.link/20241007210254.037d864cda7d.Ib2197cb056ff05746d3521a5fba637062acb7314@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-08wifi: ipw2x00/lib80211: move remaining lib80211 into libipwJohannes Berg
There's already much code in libipw that used to be shared with more drivers, but now with the prior cleanups, those old Intel ipw2x00 drivers are also the only ones using whatever is now left of lib80211. Move lib80211 entirely into libipw. Link: https://patch.msgid.link/20241007202707.915ef7b9e7c7.Ib9876d2fe3c90f11d6df458b16d0b7d4bf551a8d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-08wifi: nl80211: remove redundant null pointer check in coalescingDmitry Kandybka
In 'cfg80211_free_coalesce', '&coalesce->rules[i]' is a pointer to VLA member of 'struct cfg80211_coalesce' and should never be NULL, so redundant check may be dropped. I think this is correct, but I haven't tested it seriously. Compile tested only. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Dmitry Kandybka <d.kandybka@gmail.com> Link: https://patch.msgid.link/20241003095912.218465-1-d.kandybka@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-08mac80211: Reorganize kerneldoc parameter namesJulia Lawall
Reorganize kerneldoc parameter names to match the parameter order in the function header. Problems identified using Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://patch.msgid.link/20240930112121.95324-28-Julia.Lawall@inria.fr Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-08wifi: cfg80211: check radio iface combination for multi radio per wiphyKarthikeyan Periyasamy
Currently, wiphy_verify_combinations() fails for the multi-radio per wiphy due to the condition check on new global interface combination that DFS only works on one channel. In a multi-radio scenario, new global interface combination encompasses the capabilities of all radio combinations, so it supports more than one channel with DFS. For multi-radio per wiphy, interface combination verification needs to be performed for radio specific interface combinations. This is necessary as the new global interface combination combines the capabilities of all radio combinations. Fixes: a01b1e9f9955 ("wifi: mac80211: add support for DFS with multiple radios") Signed-off-by: Karthikeyan Periyasamy <quic_periyasa@quicinc.com> Link: https://patch.msgid.link/20240917140239.886083-1-quic_periyasa@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-08wifi: mac80211, cfg80211: miscellaneous spelling fixesDmitry Antipov
Correct spelling here and there as suggested by codespell. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Link: https://patch.msgid.link/20240913084919.118862-1-dmantipov@yandex.ru Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-08wifi: mac80211: constify ieee80211_ie_build_{he,eht}_oper() chandefJeff Johnson
The chandef parameter passed to ieee80211_ie_build_he_oper() and ieee80211_ie_build_eht_oper is read-only. Since it is never modified, add the const qualifier to this parameter. This makes these consistent with ieee80211_ie_build_ht_oper() and ieee80211_ie_build_vht_oper(). Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://patch.msgid.link/20240910-wireless-utils-constify-v1-1-e59947bcb3c3@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-08wifi: mac80211: skip non-uploaded keys in ieee80211_iter_keysFelix Fietkau
Sync iterator conditions with ieee80211_iter_keys_rcu. Fixes: 830af02f24fb ("mac80211: allow driver to iterate keys") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/20241006153630.87885-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-08wifi: mac80211: do not pass a stopped vif to the driver in .get_txpowerFelix Fietkau
Avoid potentially crashing in the driver because of uninitialized private data Fixes: 5b3dc42b1b0d ("mac80211: add support for driver tx power reporting") Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/20241002095630.22431-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-08wifi: mac80211: Convert color collision detection to wiphy workRemi Pommarel
Call to ieee80211_color_collision_detection_work() needs wiphy lock to be held (see lockdep assert in cfg80211_bss_color_notify()). Not locking wiphy causes the following lockdep error: WARNING: CPU: 2 PID: 42 at net/wireless/nl80211.c:19505 cfg80211_bss_color_notify+0x1a4/0x25c Modules linked in: CPU: 2 PID: 42 Comm: kworker/u8:3 Tainted: G W 6.4.0-02327-g36c6cb260481 #1048 Hardware name: Workqueue: phy1 ieee80211_color_collision_detection_work pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : cfg80211_bss_color_notify+0x1a4/0x25c lr : cfg80211_bss_color_notify+0x1a0/0x25c sp : ffff000002947d00 x29: ffff000002947d00 x28: ffff800008e1a000 x27: ffff000002bd4705 x26: ffff00000d034000 x25: ffff80000903cf40 x24: 0000000000000000 x23: ffff00000cb70720 x22: 0000000000800000 x21: ffff800008dfb008 x20: 000000000000008d x19: ffff00000d035fa8 x18: 0000000000000010 x17: 0000000000000001 x16: 000003564b1ce96a x15: 000d69696d057970 x14: 000000000000003b x13: 0000000000000001 x12: 0000000000040000 x11: 0000000000000001 x10: ffff80000978f9c0 x9 : ffff0000028d3174 x8 : ffff800008e30000 x7 : 0000000000000000 x6 : 0000000000000028 x5 : 000000000002f498 x4 : ffff00000d034a80 x3 : 0000000000800000 x2 : ffff800016143000 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: cfg80211_bss_color_notify+0x1a4/0x25c ieee80211_color_collision_detection_work+0x20/0x118 process_one_work+0x294/0x554 worker_thread+0x70/0x440 kthread+0xf4/0xf8 ret_from_fork+0x10/0x20 irq event stamp: 77372 hardirqs last enabled at (77371): [<ffff800008a346fc>] _raw_spin_unlock_irq+0x2c/0x4c hardirqs last disabled at (77372): [<ffff800008a28754>] el1_dbg+0x20/0x48 softirqs last enabled at (77350): [<ffff8000089e120c>] batadv_send_outstanding_bcast_packet+0xb8/0x120 softirqs last disabled at (77348): [<ffff8000089e11d4>] batadv_send_outstanding_bcast_packet+0x80/0x120 The wiphy lock cannot be taken directly from color collision detection delayed work (ieee80211_color_collision_detection_work()) because this work is cancel_delayed_work_sync() under this wiphy lock causing a potential deadlock( see [0] for details). To fix that ieee80211_color_collision_detection_work() could be converted to a wiphy work and cancel_delayed_work_sync() can be simply replaced by wiphy_delayed_work_cancel() serving the same purpose under wiphy lock. This could potentially fix [1]. [0]: https://lore.kernel.org/linux-wireless/D4A40Q44OAY2.W3SIF6UEPBUN@freebox.fr/ [1]: https://lore.kernel.org/lkml/000000000000612f290618eee3e5@google.com/ Reported-by: Nicolas Escande <nescande@freebox.fr> Signed-off-by: Remi Pommarel <repk@triplefau.lt> Link: https://patch.msgid.link/20240924192805.13859-3-repk@triplefau.lt Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-08wifi: cfg80211: Add wiphy_delayed_work_pending()Remi Pommarel
Add wiphy_delayed_work_pending() to check if any delayed work timer is pending, that can be used to be sure that wiphy_delayed_work_queue() won't postpone an already pending delayed work. Signed-off-by: Remi Pommarel <repk@triplefau.lt> Link: https://patch.msgid.link/20240924192805.13859-2-repk@triplefau.lt [fix return value kernel-doc] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-08wifi: cfg80211: Do not create BSS entries for unsupported channelsChenming Huang
Currently, in cfg80211_parse_ml_elem_sta_data(), when RNR element indicates a BSS that operates in a channel that current regulatory domain doesn't support, a NULL value is returned by ieee80211_get_channel_khz() and assigned to this BSS entry's channel field. Later in cfg80211_inform_single_bss_data(), the reported BSS entry's channel will be wrongly overridden by transmitted BSS's. This could result in connection failure that when wpa_supplicant tries to select this reported BSS entry while it actually resides in an unsupported channel. Since this channel is not supported, it is reasonable to skip such entries instead of reporting wrong information. Signed-off-by: Chenming Huang <quic_chenhuan@quicinc.com> Link: https://patch.msgid.link/20240923021644.12885-1-quic_chenhuan@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-08wifi: mac80211: Fix setting txpower with emulate_chanctxBen Greear
Propagate hw conf into the driver when txpower changes and driver is emulating channel contexts. Signed-off-by: Ben Greear <greearb@candelatech.com> Link: https://patch.msgid.link/20240924011325.1509103-1-greearb@candelatech.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-08mac80211: MAC80211_MESSAGE_TRACING should depend on TRACINGGeert Uytterhoeven
When tracing is disabled, there is no point in asking the user about enabling tracing of all mac80211 debug messages. Fixes: 3fae0273168026ed ("mac80211: trace debug messages") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://patch.msgid.link/85bbe38ce0df13350f45714e2dc288cc70947a19.1727179690.git.geert@linux-m68k.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-08bpf: add get_netns_cookie helper to tc programsMahe Tardy
This is needed in the context of Cilium and Tetragon to retrieve netns cookie from hostns when traffic leaves Pod, so that we can correlate skb->sk's netns cookie. Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com> Link: https://lore.kernel.org/r/20241007095958.97442-1-mahe.tardy@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-10-08rtnetlink: Add ASSERT_RTNL_NET() placeholder for netdev notifier.Kuniyuki Iwashima
The global and per-netns netdev notifier depend on RTNL, and its dependency is not so clear due to nested calls. Let's add a placeholder to place ASSERT_RTNL_NET() for each event. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08rtnetlink: Add assertion helpers for per-netns RTNL.Kuniyuki Iwashima
Once an RTNL scope is converted with rtnl_net_lock(), we will replace RTNL helper functions inside the scope with the following per-netns alternatives: ASSERT_RTNL() -> ASSERT_RTNL_NET(net) rcu_dereference_rtnl(p) -> rcu_dereference_rtnl_net(net, p) Note that the per-netns helpers are equivalent to the conventional helpers unless CONFIG_DEBUG_NET_SMALL_RTNL is enabled. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08rtnetlink: Add per-netns RTNL.Kuniyuki Iwashima
The goal is to break RTNL down into per-netns mutex. This patch adds per-netns mutex and its helper functions, rtnl_net_lock() and rtnl_net_unlock(). rtnl_net_lock() acquires the global RTNL and per-netns RTNL mutex, and rtnl_net_unlock() releases them. We will replace 800+ rtnl_lock() with rtnl_net_lock() and finally removes rtnl_lock() in rtnl_net_lock(). When we need to nest per-netns RTNL mutex, we will use __rtnl_net_lock(), and its locking order is defined by rtnl_net_lock_cmp_fn() as follows: 1. init_net is first 2. netns address ascending order Note that the conversion will be done under CONFIG_DEBUG_NET_SMALL_RTNL with LOCKDEP so that we can carefully add the extra mutex without slowing down RTNL operations during conversion. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-07Merge tag 'for-net-2024-10-04' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - RFCOMM: FIX possible deadlock in rfcomm_sk_state_change - hci_conn: Fix UAF in hci_enhanced_setup_sync - btusb: Don't fail external suspend requests * tag 'for-net-2024-10-04' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: btusb: Don't fail external suspend requests Bluetooth: hci_conn: Fix UAF in hci_enhanced_setup_sync Bluetooth: RFCOMM: FIX possible deadlock in rfcomm_sk_state_change ==================== Link: https://patch.msgid.link/20241004210124.4010321-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-07ipv4: remove fib_info_devhash[]Eric Dumazet
Upcoming per-netns RTNL conversion needs to get rid of shared hash tables. fib_info_devhash[] is one of them. It is unclear why we used a hash table, because a single hlist_head per net device was cheaper and scalable. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20241004134720.579244-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-07ipv4: remove fib_info_lockEric Dumazet
After the prior patch, fib_info_lock became redundant because all of its users are holding RTNL. BH protection is not needed. Remove the READ_ONCE()/WRITE_ONCE() annotations around fib_info_cnt, since it is protected by RTNL. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20241004134720.579244-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-07ipv4: use rcu in ip_fib_check_default()Eric Dumazet
fib_info_devhash[] is not resized in fib_info_hash_move(). fib_nh structs are already freed after an rcu grace period. This will allow to remove fib_info_lock in the following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20241004134720.579244-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-07ipv4: remove fib_devindex_hashfn()Eric Dumazet
fib_devindex_hashfn() converts a 32bit ifindex value to a 8bit hash. It makes no sense doing this from fib_info_hashfn() and fib_find_info_nh(). It is better to keep as many bits as possible to let fib_info_hashfn_result() have better spread. Only fib_info_devhash_bucket() needs to make this operation, we can 'inline' trivial fib_devindex_hashfn() in it. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20241004134720.579244-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-07net: dsa: remove obsolete phylink dsa_switch operationsRussell King (Oracle)
No driver now uses the DSA switch phylink members, so we can now remove the method pointers, but we need to leave empty shim functions to allow those drivers that do not provide phylink MAC operations structure to continue functioning. Signed-off-by: Russell King (oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # sja1105, felix, dsa_loop Link: https://patch.msgid.link/E1swKNV-0060oN-1b@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-07net: explicitly clear the sk pointer, when pf->create failsIgnat Korchagin
We have recently noticed the exact same KASAN splat as in commit 6cd4a78d962b ("net: do not leave a dangling sk pointer, when socket creation fails"). The problem is that commit did not fully address the problem, as some pf->create implementations do not use sk_common_release in their error paths. For example, we can use the same reproducer as in the above commit, but changing ping to arping. arping uses AF_PACKET socket and if packet_create fails, it will just sk_free the allocated sk object. While we could chase all the pf->create implementations and make sure they NULL the freed sk object on error from the socket, we can't guarantee future protocols will not make the same mistake. So it is easier to just explicitly NULL the sk pointer upon return from pf->create in __sock_create. We do know that pf->create always releases the allocated sk object on error, so if the pointer is not NULL, it is definitely dangling. Fixes: 6cd4a78d962b ("net: do not leave a dangling sk pointer, when socket creation fails") Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Cc: stable@vger.kernel.org Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241003170151.69445-1-ignat@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-07net: tcp: refresh tcp_mstamp for compressed ack in timerMenglong Dong
For now, we refresh the tcp_mstamp for delayed acks and keepalives, but not for the compressed ack in tcp_compressed_ack_kick(). I have not found out the effact of the tcp_mstamp when sending ack, but we can still refresh it for the compressed ack to keep consistent. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241003082231.759759-1-dongml2@chinatelecom.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-07Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "Several small bugfixes all over the place. Most notably, fixes the vsock allocation with GFP_KERNEL in atomic context, which has been triggering warnings for lots of testers" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost/scsi: null-ptr-dereference in vhost_scsi_get_req() vsock/virtio: use GFP_ATOMIC under RCU read lock virtio_console: fix misc probe bugs virtio_ring: tag event_triggered as racy for KCSAN vdpa/octeon_ep: Fix format specifier for pointers in debug messages
2024-10-07remove pointless includes of <linux/fdtable.h>Al Viro
some of those used to be needed, some had been cargo-culted for no reason... Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-10-07vsock/virtio: use GFP_ATOMIC under RCU read lockMichael S. Tsirkin
virtio_transport_send_pkt in now called on transport fast path, under RCU read lock. In that case, we have a bug: virtio_add_sgs is called with GFP_KERNEL, and might sleep. Pass the gfp flags as an argument, and use GFP_ATOMIC on the fast path. Link: https://lore.kernel.org/all/hfcr2aget2zojmqpr4uhlzvnep4vgskblx5b6xf2ddosbsrke7@nt34bxgp7j2x Fixes: efcd71af38be ("vsock/virtio: avoid queuing packets when intermediate queue is empty") Reported-by: Christian Brauner <brauner@kernel.org> Cc: Stefano Garzarella <sgarzare@redhat.com> Cc: Luigi Leonardi <luigi.leonardi@outlook.com> Message-ID: <3fbfb6e871f625f89eb578c7228e127437b1975a.1727876449.git.mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Luigi Leonardi <luigi.leonardi@outlook.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2024-10-07xfrm: validate new SA's prefixlen using SA family when sel.family is unsetSabrina Dubroca
This expands the validation introduced in commit 07bf7908950a ("xfrm: Validate address prefix lengths in the xfrm selector.") syzbot created an SA with usersa.sel.family = AF_UNSPEC usersa.sel.prefixlen_s = 128 usersa.family = AF_INET Because of the AF_UNSPEC selector, verify_newsa_info doesn't put limits on prefixlen_{s,d}. But then copy_from_user_state sets x->sel.family to usersa.family (AF_INET). Do the same conversion in verify_newsa_info before validating prefixlen_{s,d}, since that's how prefixlen is going to be used later on. Reported-by: syzbot+cc39f136925517aed571@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-10-06batman-adv: Use string choice helper to print booleansSven Eckelmann
The commit ea4692c75e1c ("lib/string_helpers: Consolidate string helpers implementation") introduced common helpers to print string representations of boolean helpers. These are supposed to be used instead of open coded versions. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2024-10-05batman-adv: Add flex array to struct batadv_tvlv_tt_dataErick Archer
The "struct batadv_tvlv_tt_data" uses a dynamically sized set of trailing elements. Specifically, it uses an array of structures of type "batadv_tvlv_tt_vlan_data". So, use the preferred way in the kernel declaring a flexible array [1]. At the same time, prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). In this case, it is important to note that the attribute used is specifically __counted_by_be since variable "num_vlan" is of type __be16. The following change to the "batadv_tt_tvlv_ogm_handler_v1" function: - tt_vlan = (struct batadv_tvlv_tt_vlan_data *)(tt_data + 1); - tt_change = (struct batadv_tvlv_tt_change *)(tt_vlan + num_vlan); + tt_change = (struct batadv_tvlv_tt_change *)((void *)tt_data + + flex_size); is intended to prevent the compiler from generating an "out-of-bounds" notification due to the __counted_by attribute. The compiler can do a pointer calculation using the vlan_data flexible array memory, or in other words, this may be calculated as an array offset, since it is the same as: &tt_data->vlan_data[num_vlan] Therefore, we go past the end of the array. In other "multiple trailing flexible array" situations, this has been solved by addressing from the base pointer, since the compiler either knows the full allocation size or it knows nothing about it (this case, since it came from a "void *" function argument). The order in which the structure batadv_tvlv_tt_data and the structure batadv_tvlv_tt_vlan_data are defined must be swap to avoid an incomplete type error. Also, avoid the open-coded arithmetic in memory allocator functions [2] using the "struct_size" macro and use the "flex_array_size" helper to clarify some calculations, when possible. Moreover, the new structure member also allow us to avoid the open-coded arithmetic on pointers in some situations. Take advantage of this. This code was detected with the help of Coccinelle, and audited and modified manually. Link: https://www.kernel.org/doc/html/next/process/deprecated.html#zero-length-and-one-element-arrays [1] Link: https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [2] Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Erick Archer <erick.archer@outlook.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2024-10-05batman-adv: Start new development cycleSimon Wunderlich
This version will contain all the (major or even only minor) changes for Linux 6.13. The version number isn't a semantic version number with major and minor information. It is just encoding the year of the expected publishing as Linux -rc1 and the number of published versions this year (starting at 0). Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2024-10-04net_sched: sch_fq: add the ability to offload pacingJeffrey Ji
Some network devices have the ability to offload EDT (Earliest Departure Time) which is the model used for TCP pacing and FQ packet scheduler. Some of them implement the timing wheel mechanism described in https://saeed.github.io/files/carousel-sigcomm17.pdf with an associated 'timing wheel horizon'. This patchs adds to FQ packet scheduler TCA_FQ_OFFLOAD_HORIZON attribute. Its value is capped by the device max_pacing_offload_horizon, added in the prior patch. It allows FQ to let packets within pacing offload horizon to be delivered to the device, which will handle the needed delay without host involvement. Signed-off-by: Jeffrey Ji <jeffreyji@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20241003121219.2396589-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04net: add IFLA_MAX_PACING_OFFLOAD_HORIZON device attributeEric Dumazet
Some network devices have the ability to offload EDT (Earliest Departure Time) which is the model used for TCP pacing and FQ packet scheduler. Some of them implement the timing wheel mechanism described in https://saeed.github.io/files/carousel-sigcomm17.pdf with an associated 'timing wheel horizon'. This patch adds dev->max_pacing_offload_horizon expressing this timing wheel horizon in nsec units. This is a read-only attribute. Unless a driver sets it, dev->max_pacing_offload_horizon is zero. v2: addressed Jakub feedback ( https://lore.kernel.org/netdev/20240930152304.472767-2-edumazet@google.com/T/#mf6294d714c41cc459962154cc2580ce3c9693663 ) v3: added yaml doc (also per Jakub feedback) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20241003121219.2396589-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04net: Fix an unsafe loop on the listAnastasia Kovaleva
The kernel may crash when deleting a genetlink family if there are still listeners for that family: Oops: Kernel access of bad area, sig: 11 [#1] ... NIP [c000000000c080bc] netlink_update_socket_mc+0x3c/0xc0 LR [c000000000c0f764] __netlink_clear_multicast_users+0x74/0xc0 Call Trace: __netlink_clear_multicast_users+0x74/0xc0 genl_unregister_family+0xd4/0x2d0 Change the unsafe loop on the list to a safe one, because inside the loop there is an element removal from this list. Fixes: b8273570f802 ("genetlink: fix netns vs. netlink table locking (2)") Cc: stable@vger.kernel.org Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com> Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241003104431.12391-1-a.kovaleva@yadro.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04tcp: add a fast path in tcp_delack_timer()Eric Dumazet
delack timer is not stopped from inet_csk_clear_xmit_timer() because we do not define INET_CSK_CLEAR_TIMERS. This is a conscious choice : inet_csk_clear_xmit_timer() is often called from another cpu. Calling del_timer() would cause false sharing and lock contention. This means that very often, tcp_delack_timer() is called at the timer expiration, while there is no ACK to transmit. This can be detected very early, avoiding the socket spinlock. Notes: - test about tp->compressed_ack is racy, but in the unlikely case there is a race, the dedicated compressed_ack_timer hrtimer would close it. - Even if the fast path is not taken, reading icsk->icsk_ack.pending and tp->compressed_ack before acquiring the socket spinlock reduces acquisition time and chances of contention. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241002173042.917928-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04tcp: add a fast path in tcp_write_timer()Eric Dumazet
retransmit timer is not stopped from inet_csk_clear_xmit_timer() because we do not define INET_CSK_CLEAR_TIMERS. This is a conscious choice : for active TCP flows, it is better to only call mod_timer(), because there is more chances of keeping the timer unchanged. Also inet_csk_clear_xmit_timer() is often called from another cpu, and calling del_timer() would cause false sharing and lock contention. This means that very often, tcp_write_timer() is called at the timer expiration, while there is nothing to retransmit. This can be detected very early, avoiding the socket spinlock. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241002173042.917928-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04tcp: annotate data-races around icsk->icsk_pendingEric Dumazet
icsk->icsk_pending can be read locklessly already. Following patch in the series will add another lockless read. Add smp_load_acquire() and smp_store_release() annotations because following patch will add a test in tcp_write_timer(), and READ_ONCE()/WRITE_ONCE() alone would possibly lead to races. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241002173042.917928-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04Bluetooth: hci_conn: Fix UAF in hci_enhanced_setup_syncLuiz Augusto von Dentz
This checks if the ACL connection remains valid as it could be destroyed while hci_enhanced_setup_sync is pending on cmd_sync leading to the following trace: BUG: KASAN: slab-use-after-free in hci_enhanced_setup_sync+0x91b/0xa60 Read of size 1 at addr ffff888002328ffd by task kworker/u5:2/37 CPU: 0 UID: 0 PID: 37 Comm: kworker/u5:2 Not tainted 6.11.0-rc6-01300-g810be445d8d6 #7099 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014 Workqueue: hci0 hci_cmd_sync_work Call Trace: <TASK> dump_stack_lvl+0x5d/0x80 ? hci_enhanced_setup_sync+0x91b/0xa60 print_report+0x152/0x4c0 ? hci_enhanced_setup_sync+0x91b/0xa60 ? __virt_addr_valid+0x1fa/0x420 ? hci_enhanced_setup_sync+0x91b/0xa60 kasan_report+0xda/0x1b0 ? hci_enhanced_setup_sync+0x91b/0xa60 hci_enhanced_setup_sync+0x91b/0xa60 ? __pfx_hci_enhanced_setup_sync+0x10/0x10 ? __pfx___mutex_lock+0x10/0x10 hci_cmd_sync_work+0x1c2/0x330 process_one_work+0x7d9/0x1360 ? __pfx_lock_acquire+0x10/0x10 ? __pfx_process_one_work+0x10/0x10 ? assign_work+0x167/0x240 worker_thread+0x5b7/0xf60 ? __kthread_parkme+0xac/0x1c0 ? __pfx_worker_thread+0x10/0x10 ? __pfx_worker_thread+0x10/0x10 kthread+0x293/0x360 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2f/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Allocated by task 34: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x8f/0xa0 __hci_conn_add+0x187/0x17d0 hci_connect_sco+0x2e1/0xb90 sco_sock_connect+0x2a2/0xb80 __sys_connect+0x227/0x2a0 __x64_sys_connect+0x6d/0xb0 do_syscall_64+0x71/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e Freed by task 37: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x101/0x160 kfree+0xd0/0x250 device_release+0x9a/0x210 kobject_put+0x151/0x280 hci_conn_del+0x448/0xbf0 hci_abort_conn_sync+0x46f/0x980 hci_cmd_sync_work+0x1c2/0x330 process_one_work+0x7d9/0x1360 worker_thread+0x5b7/0xf60 kthread+0x293/0x360 ret_from_fork+0x2f/0x70 ret_from_fork_asm+0x1a/0x30 Cc: stable@vger.kernel.org Fixes: e07a06b4eb41 ("Bluetooth: Convert SCO configure_datapath to hci_sync") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-10-04Bluetooth: RFCOMM: FIX possible deadlock in rfcomm_sk_state_changeLuiz Augusto von Dentz
rfcomm_sk_state_change attempts to use sock_lock so it must never be called with it locked but rfcomm_sock_ioctl always attempt to lock it causing the following trace: ====================================================== WARNING: possible circular locking dependency detected 6.8.0-syzkaller-08951-gfe46a7dd189e #0 Not tainted ------------------------------------------------------ syz-executor386/5093 is trying to acquire lock: ffff88807c396258 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1671 [inline] ffff88807c396258 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: rfcomm_sk_state_change+0x5b/0x310 net/bluetooth/rfcomm/sock.c:73 but task is already holding lock: ffff88807badfd28 (&d->lock){+.+.}-{3:3}, at: __rfcomm_dlc_close+0x226/0x6a0 net/bluetooth/rfcomm/core.c:491 Reported-by: syzbot+d7ce59b06b3eb14fd218@syzkaller.appspotmail.com Tested-by: syzbot+d7ce59b06b3eb14fd218@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d7ce59b06b3eb14fd218 Fixes: 3241ad820dbb ("[Bluetooth] Add timestamp support to L2CAP, RFCOMM and SCO") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-10-04ethtool: rss: fix rss key initialization warningDaniel Zahka
This warning is emitted when a driver does not default populate an rss key when one is not provided from userspace. Some devices do not support individual rss keys per context. For these devices, it is ok to leave the key zeroed out in ethtool_rxfh_context. Do not warn on zeroed key when ethtool_ops.rxfh_per_ctx_key == 0. Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20241003162310.1310576-1-daniel.zahka@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04netfilter: br_netfilter: fix panic with metadata_dst skbAndy Roulin
Fix a kernel panic in the br_netfilter module when sending untagged traffic via a VxLAN device. This happens during the check for fragmentation in br_nf_dev_queue_xmit. It is dependent on: 1) the br_netfilter module being loaded; 2) net.bridge.bridge-nf-call-iptables set to 1; 3) a bridge with a VxLAN (single-vxlan-device) netdevice as a bridge port; 4) untagged frames with size higher than the VxLAN MTU forwarded/flooded When forwarding the untagged packet to the VxLAN bridge port, before the netfilter hooks are called, br_handle_egress_vlan_tunnel is called and changes the skb_dst to the tunnel dst. The tunnel_dst is a metadata type of dst, i.e., skb_valid_dst(skb) is false, and metadata->dst.dev is NULL. Then in the br_netfilter hooks, in br_nf_dev_queue_xmit, there's a check for frames that needs to be fragmented: frames with higher MTU than the VxLAN device end up calling br_nf_ip_fragment, which in turns call ip_skb_dst_mtu. The ip_dst_mtu tries to use the skb_dst(skb) as if it was a valid dst with valid dst->dev, thus the crash. This case was never supported in the first place, so drop the packet instead. PING 10.0.0.2 (10.0.0.2) from 0.0.0.0 h1-eth0: 2000(2028) bytes of data. [ 176.291791] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000110 [ 176.292101] Mem abort info: [ 176.292184] ESR = 0x0000000096000004 [ 176.292322] EC = 0x25: DABT (current EL), IL = 32 bits [ 176.292530] SET = 0, FnV = 0 [ 176.292709] EA = 0, S1PTW = 0 [ 176.292862] FSC = 0x04: level 0 translation fault [ 176.293013] Data abort info: [ 176.293104] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 176.293488] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 176.293787] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 176.293995] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043ef5000 [ 176.294166] [0000000000000110] pgd=0000000000000000, p4d=0000000000000000 [ 176.294827] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP [ 176.295252] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel veth br_netfilter bridge stp llc ipv6 crct10dif_ce [ 176.295923] CPU: 0 PID: 188 Comm: ping Not tainted 6.8.0-rc3-g5b3fbd61b9d1 #2 [ 176.296314] Hardware name: linux,dummy-virt (DT) [ 176.296535] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 176.296808] pc : br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter] [ 176.297382] lr : br_nf_dev_queue_xmit+0x2ac/0x4ec [br_netfilter] [ 176.297636] sp : ffff800080003630 [ 176.297743] x29: ffff800080003630 x28: 0000000000000008 x27: ffff6828c49ad9f8 [ 176.298093] x26: ffff6828c49ad000 x25: 0000000000000000 x24: 00000000000003e8 [ 176.298430] x23: 0000000000000000 x22: ffff6828c4960b40 x21: ffff6828c3b16d28 [ 176.298652] x20: ffff6828c3167048 x19: ffff6828c3b16d00 x18: 0000000000000014 [ 176.298926] x17: ffffb0476322f000 x16: ffffb7e164023730 x15: 0000000095744632 [ 176.299296] x14: ffff6828c3f1c880 x13: 0000000000000002 x12: ffffb7e137926a70 [ 176.299574] x11: 0000000000000001 x10: ffff6828c3f1c898 x9 : 0000000000000000 [ 176.300049] x8 : ffff6828c49bf070 x7 : 0008460f18d5f20e x6 : f20e0100bebafeca [ 176.300302] x5 : ffff6828c7f918fe x4 : ffff6828c49bf070 x3 : 0000000000000000 [ 176.300586] x2 : 0000000000000000 x1 : ffff6828c3c7ad00 x0 : ffff6828c7f918f0 [ 176.300889] Call trace: [ 176.301123] br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter] [ 176.301411] br_nf_post_routing+0x2a8/0x3e4 [br_netfilter] [ 176.301703] nf_hook_slow+0x48/0x124 [ 176.302060] br_forward_finish+0xc8/0xe8 [bridge] [ 176.302371] br_nf_hook_thresh+0x124/0x134 [br_netfilter] [ 176.302605] br_nf_forward_finish+0x118/0x22c [br_netfilter] [ 176.302824] br_nf_forward_ip.part.0+0x264/0x290 [br_netfilter] [ 176.303136] br_nf_forward+0x2b8/0x4e0 [br_netfilter] [ 176.303359] nf_hook_slow+0x48/0x124 [ 176.303803] __br_forward+0xc4/0x194 [bridge] [ 176.304013] br_flood+0xd4/0x168 [bridge] [ 176.304300] br_handle_frame_finish+0x1d4/0x5c4 [bridge] [ 176.304536] br_nf_hook_thresh+0x124/0x134 [br_netfilter] [ 176.304978] br_nf_pre_routing_finish+0x29c/0x494 [br_netfilter] [ 176.305188] br_nf_pre_routing+0x250/0x524 [br_netfilter] [ 176.305428] br_handle_frame+0x244/0x3cc [bridge] [ 176.305695] __netif_receive_skb_core.constprop.0+0x33c/0xecc [ 176.306080] __netif_receive_skb_one_core+0x40/0x8c [ 176.306197] __netif_receive_skb+0x18/0x64 [ 176.306369] process_backlog+0x80/0x124 [ 176.306540] __napi_poll+0x38/0x17c [ 176.306636] net_rx_action+0x124/0x26c [ 176.306758] __do_softirq+0x100/0x26c [ 176.307051] ____do_softirq+0x10/0x1c [ 176.307162] call_on_irq_stack+0x24/0x4c [ 176.307289] do_softirq_own_stack+0x1c/0x2c [ 176.307396] do_softirq+0x54/0x6c [ 176.307485] __local_bh_enable_ip+0x8c/0x98 [ 176.307637] __dev_queue_xmit+0x22c/0xd28 [ 176.307775] neigh_resolve_output+0xf4/0x1a0 [ 176.308018] ip_finish_output2+0x1c8/0x628 [ 176.308137] ip_do_fragment+0x5b4/0x658 [ 176.308279] ip_fragment.constprop.0+0x48/0xec [ 176.308420] __ip_finish_output+0xa4/0x254 [ 176.308593] ip_finish_output+0x34/0x130 [ 176.308814] ip_output+0x6c/0x108 [ 176.308929] ip_send_skb+0x50/0xf0 [ 176.309095] ip_push_pending_frames+0x30/0x54 [ 176.309254] raw_sendmsg+0x758/0xaec [ 176.309568] inet_sendmsg+0x44/0x70 [ 176.309667] __sys_sendto+0x110/0x178 [ 176.309758] __arm64_sys_sendto+0x28/0x38 [ 176.309918] invoke_syscall+0x48/0x110 [ 176.310211] el0_svc_common.constprop.0+0x40/0xe0 [ 176.310353] do_el0_svc+0x1c/0x28 [ 176.310434] el0_svc+0x34/0xb4 [ 176.310551] el0t_64_sync_handler+0x120/0x12c [ 176.310690] el0t_64_sync+0x190/0x194 [ 176.311066] Code: f9402e61 79402aa2 927ff821 f9400023 (f9408860) [ 176.315743] ---[ end trace 0000000000000000 ]--- [ 176.316060] Kernel panic - not syncing: Oops: Fatal exception in interrupt [ 176.316371] Kernel Offset: 0x37e0e3000000 from 0xffff800080000000 [ 176.316564] PHYS_OFFSET: 0xffff97d780000000 [ 176.316782] CPU features: 0x0,88000203,3c020000,0100421b [ 176.317210] Memory Limit: none [ 176.317527] ---[ end Kernel panic - not syncing: Oops: Fatal Exception in interrupt ]---\ Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Andy Roulin <aroulin@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20241001154400.22787-2-aroulin@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04net_tstamp: add SCM_TS_OPT_ID for RAW socketsVadim Fedorenko
The last type of sockets which supports SOF_TIMESTAMPING_OPT_ID is RAW sockets. To add new option this patch converts all callers (direct and indirect) of _sock_tx_timestamp to provide sockcm_cookie instead of tsflags. And while here fix __sock_tx_timestamp to receive tsflags as __u32 instead of __u16. Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Link: https://patch.msgid.link/20241001125716.2832769-3-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control messageVadim Fedorenko
SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX timestamps and packets sent via socket. Unfortunately, there is no way to reliably predict socket timestamp ID value in case of error returned by sendmsg. For UDP sockets it's impossible because of lockless nature of UDP transmit, several threads may send packets in parallel. In case of RAW sockets MSG_MORE option makes things complicated. More details are in the conversation [1]. This patch adds new control message type to give user-space software an opportunity to control the mapping between packets and values by providing ID with each sendmsg for UDP sockets. The documentation is also added in this patch. [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/ Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Link: https://patch.msgid.link/20241001125716.2832769-2-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>