Age | Commit message (Collapse) | Author |
|
nft_chain_validate already performs loop detection because a cycle will
result in a call stack overflow (ctx->level >= NFT_JUMP_STACK_SIZE).
It also follows maps via ->validate callback in nft_lookup, so there
appears no reason to iterate the maps again.
nf_tables_check_loops() and all its helper functions can be removed.
This improves ruleset load time significantly, from 23s down to 12s.
This also fixes a crash bug. Old loop detection code can result in
unbounded recursion:
BUG: TASK stack guard page was hit at ....
Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN
CPU: 4 PID: 1539 Comm: nft Not tainted 6.10.0-rc5+ #1
[..]
with a suitable ruleset during validation of register stores.
I can't see any actual reason to attempt to check for this from
nft_validate_register_store(), at this point the transaction is still in
progress, so we don't have a full picture of the rule graph.
For nf-next it might make sense to either remove it or make this depend
on table->validate_state in case we could catch an error earlier
(for improved error reporting to userspace).
Fixes: 20a69341f2d0 ("netfilter: nf_tables: add netlink set API")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Happens when rules get flushed/deleted while packet is out, so remove
this WARN_ON.
This WARN exists in one form or another since v4.14, no need to backport
this to older releases, hence use a more recent fixes tag.
Fixes: 3f8019688894 ("netfilter: move nf_reinject into nfnetlink_queue modules")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202407081453.11ac0f63-lkp@intel.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Do not attach SQI value if link is down. "SQI values are only valid if
link-up condition is present" per OpenAlliance specification of
100Base-T1 Interoperability Test suite [1]. The same rule would apply
for other link types.
[1] https://opensig.org/automotive-ethernet-specifications/#
Fixes: 806602191592 ("ethtool: provide UAPI for PHY Signal Quality Index (SQI)")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Woojung Huh <woojung.huh@microchip.com>
Link: https://patch.msgid.link/20240709061943.729381-1-o.rempel@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Since 'ppp_async_encode()' assumes valid LCP packets (with code
from 1 to 7 inclusive), add 'ppp_check_packet()' to ensure that
LCP packet has an actual body beyond PPP_LCP header bytes, and
reject claimed-as-LCP but actually malformed data otherwise.
Reported-by: syzbot+ec0723ba9605678b14bf@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ec0723ba9605678b14bf
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The below commit introduced a warning message when phy state is not in
the states: PHY_HALTED, PHY_READY, and PHY_UP.
commit 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
mtk-star-emac doesn't need mdiobus suspend/resume. To fix the warning
message during resume, indicate the phy resume/suspend is managed by the
mac when probing.
Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
Signed-off-by: Jian Hui Lee <jianhui.lee@canonical.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20240708065210.4178980-1-jianhui.lee@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Commit 861e8086029e ("e1000e: move force SMBUS from enable ulp function
to avoid PHY loss issue") resolved a PHY access loss during suspend on
Meteor Lake consumer platforms, but it affected corporate systems
incorrectly.
A better fix, working for both consumer and corporate systems, was
proposed in commit bfd546a552e1 ("e1000e: move force SMBUS near the end
of enable_ulp function"). However, it introduced a regression on older
devices, such as [8086:15B8], [8086:15F9], [8086:15BE].
This patch aims to fix the secondary regression, by limiting the scope of
the changes to Meteor Lake platforms only.
Fixes: bfd546a552e1 ("e1000e: move force SMBUS near the end of enable_ulp function")
Reported-by: Todd Brandt <todd.e.brandt@intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218940
Reported-by: Dieter Mummenschanz <dmummenschanz@web.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218936
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240709203123.2103296-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If a TCP socket is using TCP_USER_TIMEOUT, and the other peer
retracted its window to zero, tcp_retransmit_timer() can
retransmit a packet every two jiffies (2 ms for HZ=1000),
for about 4 minutes after TCP_USER_TIMEOUT has 'expired'.
The fix is to make sure tcp_rtx_probe0_timed_out() takes
icsk->icsk_user_timeout into account.
Before blamed commit, the socket would not timeout after
icsk->icsk_user_timeout, but would use standard exponential
backoff for the retransmits.
Also worth noting that before commit e89688e3e978 ("net: tcp:
fix unexcepted socket die when snd_wnd is 0"), the issue
would last 2 minutes instead of 4.
Fixes: b701a99e431d ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Jon Maxwell <jmaxwell37@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240710001402.2758273-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The number of the currently released descriptor is never incremented
which results in the same skb being released multiple times.
Fixes: 504d4721ee8e ("MIPS: Lantiq: Add ethernet driver")
Reported-by: Joe Perches <joe@perches.com>
Closes: https://lore.kernel.org/all/fc1bf93d92bb5b2f99c6c62745507cc22f3a7b2d.camel@perches.com/
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240708205826.5176-1-olek2@wp.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The commit 6533e558c650 ("i40e: Fix reset path while removing
the driver") introduced a new PF state "__I40E_IN_REMOVE" to block
modifying the XDP program while the driver is being removed.
Unfortunately, such a change is useful only if the ".ndo_bpf()"
callback was called out of the rmmod context because unloading the
existing XDP program is also a part of driver removing procedure.
In other words, from the rmmod context the driver is expected to
unload the XDP program without reporting any errors. Otherwise,
the kernel warning with callstack is printed out to dmesg.
Example failing scenario:
1. Load the i40e driver.
2. Load the XDP program.
3. Unload the i40e driver (using "rmmod" command).
The example kernel warning log:
[ +0.004646] WARNING: CPU: 94 PID: 10395 at net/core/dev.c:9290 unregister_netdevice_many_notify+0x7a9/0x870
[...]
[ +0.010959] RIP: 0010:unregister_netdevice_many_notify+0x7a9/0x870
[...]
[ +0.002726] Call Trace:
[ +0.002457] <TASK>
[ +0.002119] ? __warn+0x80/0x120
[ +0.003245] ? unregister_netdevice_many_notify+0x7a9/0x870
[ +0.005586] ? report_bug+0x164/0x190
[ +0.003678] ? handle_bug+0x3c/0x80
[ +0.003503] ? exc_invalid_op+0x17/0x70
[ +0.003846] ? asm_exc_invalid_op+0x1a/0x20
[ +0.004200] ? unregister_netdevice_many_notify+0x7a9/0x870
[ +0.005579] ? unregister_netdevice_many_notify+0x3cc/0x870
[ +0.005586] unregister_netdevice_queue+0xf7/0x140
[ +0.004806] unregister_netdev+0x1c/0x30
[ +0.003933] i40e_vsi_release+0x87/0x2f0 [i40e]
[ +0.004604] i40e_remove+0x1a1/0x420 [i40e]
[ +0.004220] pci_device_remove+0x3f/0xb0
[ +0.003943] device_release_driver_internal+0x19f/0x200
[ +0.005243] driver_detach+0x48/0x90
[ +0.003586] bus_remove_driver+0x6d/0xf0
[ +0.003939] pci_unregister_driver+0x2e/0xb0
[ +0.004278] i40e_exit_module+0x10/0x5f0 [i40e]
[ +0.004570] __do_sys_delete_module.isra.0+0x197/0x310
[ +0.005153] do_syscall_64+0x85/0x170
[ +0.003684] ? syscall_exit_to_user_mode+0x69/0x220
[ +0.004886] ? do_syscall_64+0x95/0x170
[ +0.003851] ? exc_page_fault+0x7e/0x180
[ +0.003932] entry_SYSCALL_64_after_hwframe+0x71/0x79
[ +0.005064] RIP: 0033:0x7f59dc9347cb
[ +0.003648] Code: 73 01 c3 48 8b 0d 65 16 0c 00 f7 d8 64 89 01 48 83
c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 35 16 0c 00 f7 d8 64 89 01 48
[ +0.018753] RSP: 002b:00007ffffac99048 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ +0.007577] RAX: ffffffffffffffda RBX: 0000559b9bb2f6e0 RCX: 00007f59dc9347cb
[ +0.007140] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000559b9bb2f748
[ +0.007146] RBP: 00007ffffac99070 R08: 1999999999999999 R09: 0000000000000000
[ +0.007133] R10: 00007f59dc9a5ac0 R11: 0000000000000206 R12: 0000000000000000
[ +0.007141] R13: 00007ffffac992d8 R14: 0000559b9bb2f6e0 R15: 0000000000000000
[ +0.007151] </TASK>
[ +0.002204] ---[ end trace 0000000000000000 ]---
Fix this by checking if the XDP program is being loaded or unloaded.
Then, block only loading a new program while "__I40E_IN_REMOVE" is set.
Also, move testing "__I40E_IN_REMOVE" flag to the beginning of XDP_SETUP
callback to avoid unnecessary operations and checks.
Fixes: 6533e558c650 ("i40e: Fix reset path while removing the driver")
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240708230750.625986-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
X would not start in my old 32-bit partition (and the "n"-handling looks
just as wrong on 64-bit, but for whatever reason did not show up there):
"n" must be accumulated over all pages before it's added to "offset" and
compared with "copy", immediately after the skb_frag_foreach_page() loop.
Fixes: d2d30a376d9c ("net: allow skb_datagram_iter to be called from any context")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://patch.msgid.link/fef352e8-b89a-da51-f8ce-04bc39ee6481@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-07-09
The following pull-request contains BPF updates for your *net* tree.
We've added 3 non-merge commits during the last 1 day(s) which contain
a total of 5 files changed, 81 insertions(+), 11 deletions(-).
The main changes are:
1) Fix a use-after-free in a corner case where tcx_entry got released too
early. Also add BPF test coverage along with the fix, from Daniel Borkmann.
2) Fix a kernel panic on Loongarch in sk_msg_recvmsg() which got triggered
by running BPF sockmap selftests, from Geliang Tang.
bpf-for-netdev
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
skmsg: Skip zero length skb in sk_msg_recvmsg
selftests/bpf: Extend tcx tests to cover late tcx_entry release
bpf: Fix too early release of tcx_entry
====================
Link: https://patch.msgid.link/20240709091452.27840-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When SMP is enabled and spinlocks are actually functional then there is
a deadlock with the 'statelock' spinlock between ks8851_start_xmit_spi
and ks8851_irq:
watchdog: BUG: soft lockup - CPU#0 stuck for 27s!
call trace:
queued_spin_lock_slowpath+0x100/0x284
do_raw_spin_lock+0x34/0x44
ks8851_start_xmit_spi+0x30/0xb8
ks8851_start_xmit+0x14/0x20
netdev_start_xmit+0x40/0x6c
dev_hard_start_xmit+0x6c/0xbc
sch_direct_xmit+0xa4/0x22c
__qdisc_run+0x138/0x3fc
qdisc_run+0x24/0x3c
net_tx_action+0xf8/0x130
handle_softirqs+0x1ac/0x1f0
__do_softirq+0x14/0x20
____do_softirq+0x10/0x1c
call_on_irq_stack+0x3c/0x58
do_softirq_own_stack+0x1c/0x28
__irq_exit_rcu+0x54/0x9c
irq_exit_rcu+0x10/0x1c
el1_interrupt+0x38/0x50
el1h_64_irq_handler+0x18/0x24
el1h_64_irq+0x64/0x68
__netif_schedule+0x6c/0x80
netif_tx_wake_queue+0x38/0x48
ks8851_irq+0xb8/0x2c8
irq_thread_fn+0x2c/0x74
irq_thread+0x10c/0x1b0
kthread+0xc8/0xd8
ret_from_fork+0x10/0x20
This issue has not been identified earlier because tests were done on
a device with SMP disabled and so spinlocks were actually NOPs.
Now use spin_(un)lock_bh for TX queue related locking to avoid execution
of softirq work synchronously that would lead to a deadlock.
Fixes: 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun")
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240706101337.854474-1-rwahl@gmx.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
rvu_check_rsrc_availability()
In rvu_check_rsrc_availability() in case of invalid SSOW req, an incorrect
data is printed to error log. 'req->sso' value is printed instead of
'req->ssow'. Looks like "copy-paste" mistake.
Fix this mistake by replacing 'req->sso' with 'req->ssow'.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 746ea74241fa ("octeontx2-af: Add RVU block LF provisioning support")
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240705095317.12640-1-amishin@t-argos.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
bnxt doesn't check if a ring is used by RSS contexts when reducing
ring count. Core performs a similar check for the drivers for
the main context, but core doesn't know about additional contexts,
so it can't validate them. bnxt_fill_hw_rss_tbl_p5() uses ring
id to index bp->rx_ring[], which without the check may end up
being out of bounds.
BUG: KASAN: slab-out-of-bounds in __bnxt_hwrm_vnic_set_rss+0xb79/0xe40
Read of size 2 at addr ffff8881c5809618 by task ethtool/31525
Call Trace:
__bnxt_hwrm_vnic_set_rss+0xb79/0xe40
bnxt_hwrm_vnic_rss_cfg_p5+0xf7/0x460
__bnxt_setup_vnic_p5+0x12e/0x270
__bnxt_open_nic+0x2262/0x2f30
bnxt_open_nic+0x5d/0xf0
ethnl_set_channels+0x5d4/0xb30
ethnl_default_set_doit+0x2f1/0x620
Core does track the additional contexts in net-next, so we can
move this validation out of the driver as a follow up there.
Fixes: b3d0083caf9a ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240705020005.681746-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When running BPF selftests (./test_progs -t sockmap_basic) on a Loongarch
platform, the following kernel panic occurs:
[...]
Oops[#1]:
CPU: 22 PID: 2824 Comm: test_progs Tainted: G OE 6.10.0-rc2+ #18
Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018
... ...
ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
PRMD: 0000000c (PPLV0 +PIE +PWE)
EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
BADV: 0000000000000040
PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack
Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...)
Stack : ...
Call Trace:
[<9000000004162774>] copy_page_to_iter+0x74/0x1c0
[<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
[<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
[<90000000049aae34>] inet_recvmsg+0x54/0x100
[<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
[<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
[<900000000481e27c>] sys_recvfrom+0x1c/0x40
[<9000000004c076ec>] do_syscall+0x8c/0xc0
[<9000000003731da4>] handle_syscall+0xc4/0x160
Code: ...
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Fatal exception
Kernel relocated by 0x3510000
.text @ 0x9000000003710000
.data @ 0x9000000004d70000
.bss @ 0x9000000006469400
---[ end Kernel panic - not syncing: Fatal exception ]---
[...]
This crash happens every time when running sockmap_skb_verdict_shutdown
subtest in sockmap_basic.
This crash is because a NULL pointer is passed to page_address() in the
sk_msg_recvmsg(). Due to the different implementations depending on the
architecture, page_address(NULL) will trigger a panic on Loongarch
platform but not on x86 platform. So this bug was hidden on x86 platform
for a while, but now it is exposed on Loongarch platform. The root cause
is that a zero length skb (skb->len == 0) was put on the queue.
This zero length skb is a TCP FIN packet, which was sent by shutdown(),
invoked in test_sockmap_skb_verdict_shutdown():
shutdown(p1, SHUT_WR);
In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and no
page is put to this sge (see sg_set_page in sg_set_page), but this empty
sge is queued into ingress_msg list.
And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is got by
sg_page(sge). Pass this NULL page to copy_page_to_iter(), which passes it
to kmap_local_page() and to page_address(), then kernel panics.
To solve this, we should skip this zero length skb. So in sk_msg_recvmsg(),
if copy is zero, that means it's a zero length skb, skip invoking
copy_page_to_iter(). We are using the EFAULT return triggered by
copy_page_to_iter to check for is_fin in tcp_bpf.c.
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Suggested-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/e3a16eacdc6740658ee02a33489b1b9d4912f378.1719992715.git.tanggeliang@kylinos.cn
|
|
Reinit PHY after cable test, otherwise link can't be established on
tested port. This issue is reproducible on LAN9372 switches with
integrated 100BaseT1 PHYs.
Fixes: 788050256c411 ("net: phy: microchip_t1: add cable test support for lan87xx phy")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20240705084954.83048-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a test case which replaces an active ingress qdisc while keeping the
miniq in-tact during the transition period to the new clsact qdisc.
# ./vmtest.sh -- ./test_progs -t tc_link
[...]
./test_progs -t tc_link
[ 3.412871] bpf_testmod: loading out-of-tree module taints kernel.
[ 3.413343] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
#332 tc_links_after:OK
#333 tc_links_append:OK
#334 tc_links_basic:OK
#335 tc_links_before:OK
#336 tc_links_chain_classic:OK
#337 tc_links_chain_mixed:OK
#338 tc_links_dev_chain0:OK
#339 tc_links_dev_cleanup:OK
#340 tc_links_dev_mixed:OK
#341 tc_links_ingress:OK
#342 tc_links_invalid:OK
#343 tc_links_prepend:OK
#344 tc_links_replace:OK
#345 tc_links_revision:OK
Summary: 14/0 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240708133130.11609-2-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Pedro Pinto and later independently also Hyunwoo Kim and Wongi Lee reported
an issue that the tcx_entry can be released too early leading to a use
after free (UAF) when an active old-style ingress or clsact qdisc with a
shared tc block is later replaced by another ingress or clsact instance.
Essentially, the sequence to trigger the UAF (one example) can be as follows:
1. A network namespace is created
2. An ingress qdisc is created. This allocates a tcx_entry, and
&tcx_entry->miniq is stored in the qdisc's miniqp->p_miniq. At the
same time, a tcf block with index 1 is created.
3. chain0 is attached to the tcf block. chain0 must be connected to
the block linked to the ingress qdisc to later reach the function
tcf_chain0_head_change_cb_del() which triggers the UAF.
4. Create and graft a clsact qdisc. This causes the ingress qdisc
created in step 1 to be removed, thus freeing the previously linked
tcx_entry:
rtnetlink_rcv_msg()
=> tc_modify_qdisc()
=> qdisc_create()
=> clsact_init() [a]
=> qdisc_graft()
=> qdisc_destroy()
=> __qdisc_destroy()
=> ingress_destroy() [b]
=> tcx_entry_free()
=> kfree_rcu() // tcx_entry freed
5. Finally, the network namespace is closed. This registers the
cleanup_net worker, and during the process of releasing the
remaining clsact qdisc, it accesses the tcx_entry that was
already freed in step 4, causing the UAF to occur:
cleanup_net()
=> ops_exit_list()
=> default_device_exit_batch()
=> unregister_netdevice_many()
=> unregister_netdevice_many_notify()
=> dev_shutdown()
=> qdisc_put()
=> clsact_destroy() [c]
=> tcf_block_put_ext()
=> tcf_chain0_head_change_cb_del()
=> tcf_chain_head_change_item()
=> clsact_chain_head_change()
=> mini_qdisc_pair_swap() // UAF
There are also other variants, the gist is to add an ingress (or clsact)
qdisc with a specific shared block, then to replace that qdisc, waiting
for the tcx_entry kfree_rcu() to be executed and subsequently accessing
the current active qdisc's miniq one way or another.
The correct fix is to turn the miniq_active boolean into a counter. What
can be observed, at step 2 above, the counter transitions from 0->1, at
step [a] from 1->2 (in order for the miniq object to remain active during
the replacement), then in [b] from 2->1 and finally [c] 1->0 with the
eventual release. The reference counter in general ranges from [0,2] and
it does not need to be atomic since all access to the counter is protected
by the rtnl mutex. With this in place, there is no longer a UAF happening
and the tcx_entry is freed at the correct time.
Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support")
Reported-by: Pedro Pinto <xten@osec.io>
Co-developed-by: Pedro Pinto <xten@osec.io>
Signed-off-by: Pedro Pinto <xten@osec.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Hyunwoo Kim <v4bel@theori.io>
Cc: Wongi Lee <qwerty@theori.io>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240708133130.11609-1-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Correct the example to match the help text from the devlink utility.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Loss recovery undo_retrans bookkeeping had a long-standing bug where a
DSACK from a spurious TLP retransmit packet could cause an erroneous
undo of a fast recovery or RTO recovery that repaired a single
really-lost packet (in a sequence range outside that of the TLP
retransmit). Basically, because the loss recovery state machine didn't
account for the fact that it sent a TLP retransmit, the DSACK for the
TLP retransmit could erroneously be implicitly be interpreted as
corresponding to the normal fast recovery or RTO recovery retransmit
that plugged a real hole, thus resulting in an improper undo.
For example, consider the following buggy scenario where there is a
real packet loss but the congestion control response is improperly
undone because of this bug:
+ send packets P1, P2, P3, P4
+ P1 is really lost
+ send TLP retransmit of P4
+ receive SACK for original P2, P3, P4
+ enter fast recovery, fast-retransmit P1, increment undo_retrans to 1
+ receive DSACK for TLP P4, decrement undo_retrans to 0, undo (bug!)
+ receive cumulative ACK for P1-P4 (fast retransmit plugged real hole)
The fix: when we initialize undo machinery in tcp_init_undo(), if
there is a TLP retransmit in flight, then increment tp->undo_retrans
so that we make sure that we receive a DSACK corresponding to the TLP
retransmit, as well as DSACKs for all later normal retransmits, before
triggering a loss recovery undo. Note that we also have to move the
line that clears tp->tlp_high_seq for RTO recovery, so that upon RTO
we remember the tp->tlp_high_seq value until tcp_init_undo() and clear
it only afterward.
Also note that the bug dates back to the original 2013 TLP
implementation, commit 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)").
However, this patch will only compile and work correctly with kernels
that have tp->tlp_retrans, which was added only in v5.8 in 2020 in
commit 76be93fc0702 ("tcp: allow at most one TLP probe per flight").
So we associate this fix with that later commit.
Fixes: 76be93fc0702 ("tcp: allow at most one TLP probe per flight")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Kevin Yang <yyd@google.com>
Link: https://patch.msgid.link/20240703171246.1739561-1-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jason A. Donenfeld says:
====================
wireguard fixes for 6.10-rc7
These are four small fixes for WireGuard, which are all marked for
stable:
1) A QEMU command line fix to remove deprecated flags.
2) Use of proper unaligned helpers to avoid unaligned memory access on
some systems, from Helge.
3) Two patches to annotate intentional data races, so KCSAN and syzbot
don't get upset.
====================
Link: https://patch.msgid.link/20240704154517.1572127-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
KCSAN reports a race in wg_packet_send_keepalive, which is intentional:
BUG: KCSAN: data-race in wg_packet_send_keepalive / wg_packet_send_staged_packets
write to 0xffff88814cd91280 of 8 bytes by task 3194 on cpu 0:
__skb_queue_head_init include/linux/skbuff.h:2162 [inline]
skb_queue_splice_init include/linux/skbuff.h:2248 [inline]
wg_packet_send_staged_packets+0xe5/0xad0 drivers/net/wireguard/send.c:351
wg_xmit+0x5b8/0x660 drivers/net/wireguard/device.c:218
__netdev_start_xmit include/linux/netdevice.h:4940 [inline]
netdev_start_xmit include/linux/netdevice.h:4954 [inline]
xmit_one net/core/dev.c:3548 [inline]
dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3564
__dev_queue_xmit+0xeff/0x1d80 net/core/dev.c:4349
dev_queue_xmit include/linux/netdevice.h:3134 [inline]
neigh_connected_output+0x231/0x2a0 net/core/neighbour.c:1592
neigh_output include/net/neighbour.h:542 [inline]
ip6_finish_output2+0xa66/0xce0 net/ipv6/ip6_output.c:137
ip6_finish_output+0x1a5/0x490 net/ipv6/ip6_output.c:222
NF_HOOK_COND include/linux/netfilter.h:303 [inline]
ip6_output+0xeb/0x220 net/ipv6/ip6_output.c:243
dst_output include/net/dst.h:451 [inline]
NF_HOOK include/linux/netfilter.h:314 [inline]
ndisc_send_skb+0x4a2/0x670 net/ipv6/ndisc.c:509
ndisc_send_rs+0x3ab/0x3e0 net/ipv6/ndisc.c:719
addrconf_dad_completed+0x640/0x8e0 net/ipv6/addrconf.c:4295
addrconf_dad_work+0x891/0xbc0
process_one_work kernel/workqueue.c:2633 [inline]
process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2706
worker_thread+0x525/0x730 kernel/workqueue.c:2787
kthread+0x1d7/0x210 kernel/kthread.c:388
ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
read to 0xffff88814cd91280 of 8 bytes by task 3202 on cpu 1:
skb_queue_empty include/linux/skbuff.h:1798 [inline]
wg_packet_send_keepalive+0x20/0x100 drivers/net/wireguard/send.c:225
wg_receive_handshake_packet drivers/net/wireguard/receive.c:186 [inline]
wg_packet_handshake_receive_worker+0x445/0x5e0 drivers/net/wireguard/receive.c:213
process_one_work kernel/workqueue.c:2633 [inline]
process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2706
worker_thread+0x525/0x730 kernel/workqueue.c:2787
kthread+0x1d7/0x210 kernel/kthread.c:388
ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
value changed: 0xffff888148fef200 -> 0xffff88814cd91280
Mark this race as intentional by using the skb_queue_empty_lockless()
function rather than skb_queue_empty(), which uses READ_ONCE()
internally to annotate the race.
Cc: stable@vger.kernel.org
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20240704154517.1572127-5-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
KCSAN reports a race in the CPU round robin function, which, as the
comment points out, is intentional:
BUG: KCSAN: data-race in wg_packet_send_staged_packets / wg_packet_send_staged_packets
read to 0xffff88811254eb28 of 4 bytes by task 3160 on cpu 1:
wg_cpumask_next_online drivers/net/wireguard/queueing.h:127 [inline]
wg_queue_enqueue_per_device_and_peer drivers/net/wireguard/queueing.h:173 [inline]
wg_packet_create_data drivers/net/wireguard/send.c:320 [inline]
wg_packet_send_staged_packets+0x60e/0xac0 drivers/net/wireguard/send.c:388
wg_packet_send_keepalive+0xe2/0x100 drivers/net/wireguard/send.c:239
wg_receive_handshake_packet drivers/net/wireguard/receive.c:186 [inline]
wg_packet_handshake_receive_worker+0x449/0x5f0 drivers/net/wireguard/receive.c:213
process_one_work kernel/workqueue.c:3248 [inline]
process_scheduled_works+0x483/0x9a0 kernel/workqueue.c:3329
worker_thread+0x526/0x720 kernel/workqueue.c:3409
kthread+0x1d1/0x210 kernel/kthread.c:389
ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
write to 0xffff88811254eb28 of 4 bytes by task 3158 on cpu 0:
wg_cpumask_next_online drivers/net/wireguard/queueing.h:130 [inline]
wg_queue_enqueue_per_device_and_peer drivers/net/wireguard/queueing.h:173 [inline]
wg_packet_create_data drivers/net/wireguard/send.c:320 [inline]
wg_packet_send_staged_packets+0x6e5/0xac0 drivers/net/wireguard/send.c:388
wg_packet_send_keepalive+0xe2/0x100 drivers/net/wireguard/send.c:239
wg_receive_handshake_packet drivers/net/wireguard/receive.c:186 [inline]
wg_packet_handshake_receive_worker+0x449/0x5f0 drivers/net/wireguard/receive.c:213
process_one_work kernel/workqueue.c:3248 [inline]
process_scheduled_works+0x483/0x9a0 kernel/workqueue.c:3329
worker_thread+0x526/0x720 kernel/workqueue.c:3409
kthread+0x1d1/0x210 kernel/kthread.c:389
ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
value changed: 0xffffffff -> 0x00000000
Mark this race as intentional by using READ/WRITE_ONCE().
Cc: stable@vger.kernel.org
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20240704154517.1572127-4-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On the parisc platform, the kernel issues kernel warnings because
swap_endian() tries to load a 128-bit IPv6 address from an unaligned
memory location:
Kernel: unaligned access to 0x55f4688c in wg_allowedips_insert_v6+0x2c/0x80 [wireguard] (iir 0xf3010df)
Kernel: unaligned access to 0x55f46884 in wg_allowedips_insert_v6+0x38/0x80 [wireguard] (iir 0xf2010dc)
Avoid such unaligned memory accesses by instead using the
get_unaligned_be64() helper macro.
Signed-off-by: Helge Deller <deller@gmx.de>
[Jason: replace src[8] in original patch with src+8]
Cc: stable@vger.kernel.org
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20240704154517.1572127-3-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
QEMU 9.0 removed -no-acpi, in favor of machine properties, so update the
Makefile to use the correct QEMU invocation.
Cc: stable@vger.kernel.org
Fixes: b83fdcd9fb8a ("wireguard: selftests: use microvm on x86")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://patch.msgid.link/20240704154517.1572127-2-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Return an error code if bcmasp_interface_create() fails. Don't return
success.
Fixes: 490cb412007d ("net: bcmasp: Add support for ASP2.0 Ethernet controller")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Justin Chen <justin.chen@broadcom.com>
Link: https://patch.msgid.link/ZoWKBkHH9D1fqV4r@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The 'phy' parameter supplied to lan9303_phy_read/_write was sometimes a
DSA port number and sometimes a PHY address. This isn't a problem as
long as they are equal. But if the external phy_addr_sel_strap pin is
wired to 'high', the PHY addresses change from 0-1-2 to 1-2-3 (CPU,
slave0, slave1). In this case, lan9303_phy_read/_write must translate
between DSA port numbers and the corresponding PHY address.
Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Christian Eggers <ceggers@arri.de>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/20240703145718.19951-1-ceggers@arri.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, wireless and netfilter.
There's one fix for power management with Intel's e1000e here,
Thorsten tells us there's another problem that started in v6.9. We're
trying to wrap that up but I don't think it's blocking.
Current release - new code bugs:
- wifi: mac80211: disable softirqs for queued frame handling
- af_unix: fix uninit-value in __unix_walk_scc(), with the new
garbage collection algo
Previous releases - regressions:
- Bluetooth:
- qca: fix BT enable failure for QCA6390 after warm reboot
- add quirk to ignore reserved PHY bits in LE Extended Adv Report,
abused by some Broadcom controllers found on Apple machines
- wifi: wilc1000: fix ies_len type in connect path
Previous releases - always broken:
- tcp: fix DSACK undo in fast recovery to call tcp_try_to_open(),
avoid premature timeouts
- net: make sure skb_datagram_iter maps fragments page by page, in
case we somehow get compound highmem mixed in
- eth: bnx2x: fix multiple UBSAN array-index-out-of-bounds when more
queues are used
Misc:
- MAINTAINERS: Remembering Larry Finger"
* tag 'net-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
bnxt_en: Fix the resource check condition for RSS contexts
mlxsw: core_linecards: Fix double memory deallocation in case of invalid INI file
inet_diag: Initialize pad field in struct inet_diag_req_v2
tcp: Don't flag tcp_sk(sk)->rx_opt.saw_unknown for TCP AO.
selftests: make order checking verbose in msg_zerocopy selftest
selftests: fix OOM in msg_zerocopy selftest
ice: use proper macro for testing bit
ice: Reject pin requests with unsupported flags
ice: Don't process extts if PTP is disabled
ice: Fix improper extts handling
selftest: af_unix: Add test case for backtrack after finalising SCC.
af_unix: Fix uninit-value in __unix_walk_scc()
bonding: Fix out-of-bounds read in bond_option_arp_ip_targets_set()
net: rswitch: Avoid use-after-free in rswitch_poll()
netfilter: nf_tables: unconditionally flush pending work before notifier
wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference
wifi: iwlwifi: mvm: avoid link lookup in statistics
wifi: iwlwifi: mvm: don't wake up rx_sync_waitq upon RFKILL
wifi: iwlwifi: properly set WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK
wifi: wilc1000: fix ies_len type in connect path
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- Fix and add physical to virtual address translations in dasd and
virtio_ccw drivers. For virtio_ccw this is just a minimal fix.
More code cleanup will follow.
- Small defconfig updates
* tag 's390-6.10-8' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/dasd: Fix invalid dereferencing of indirect CCW data pointer
s390/vfio_ccw: Fix target addresses of TIC CCWs
s390: Update defconfigs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fix from Hans de Goede:
- Fix regression in toshiba_acpi introduced in 6.10-rc1
* tag 'platform-drivers-x86-v6.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: toshiba_acpi: Fix quickstart quirk handling
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull Kselftest fix from Mickaël Salaün:
"Fix Kselftests timeout.
We can't use CLONE_VFORK, since that blocks the parent - and thus the
timeout handling - until the child exits or execve's.
Go back to using plain fork()"
* tag 'kselftest-fix-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
selftests/harness: Fix tests timeout and race condition
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from, Andrew Morton:
"6 hotfies, all cc:stable. Some fixes for longstanding nilfs2 issues
and three unrelated MM fixes"
* tag 'mm-hotfixes-stable-2024-07-03-22-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
nilfs2: fix incorrect inode allocation from reserved inodes
nilfs2: add missing check for inode numbers on directory entries
nilfs2: fix inode number range checks
mm: avoid overflows in dirty throttling logic
Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"
mm: optimize the redundant loop of mm_update_owner_next()
|
|
While creating a new RSS context, bnxt_rfs_capable() currently
makes a strict check to see if the required VNICs are already
available. If the current VNICs are not what is required,
either too many or not enough, it will call the firmware to
reserve the exact number required.
There is a bug in the firmware when the driver tries to
relinquish some reserved VNICs and RSS contexts. It will
cause the default VNIC to lose its RSS configuration and
cause receive packets to be placed incorrectly.
Workaround this problem by skipping the resource reduction.
The driver will not reduce the VNIC and RSS context reservations
when a context is deleted. The resources will be available for
use when new contexts are created later.
Potentially, this workaround can cause us to run out of VNIC
and RSS contexts if there are a lot of VF functions creating
and deleting RSS contexts. In the future, we will conditionally
disable this workaround when the firmware fix is available.
Fixes: 438ba39b25fe ("bnxt_en: Improve RSS context reservation infrastructure")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/netdev/20240625010210.2002310-1-kuba@kernel.org/
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240703180112.78590-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
file
In case of invalid INI file mlxsw_linecard_types_init() deallocates memory
but doesn't reset pointer to NULL and returns 0. In case of any error
occurred after mlxsw_linecard_types_init() call, mlxsw_linecards_init()
calls mlxsw_linecard_types_fini() which performs memory deallocation again.
Add pointer reset to NULL.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: b217127e5e4e ("mlxsw: core_linecards: Add line card objects and implement provisioning")
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/20240703203251.8871-1-amishin@t-argos.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v6.10
Hopefully the last fixes for v6.10. Fix a regression in wilc1000
where bitrate Information Elements longer than 255 bytes were broken.
Few fixes also to mac80211 and iwlwifi.
* tag 'wireless-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference
wifi: iwlwifi: mvm: avoid link lookup in statistics
wifi: iwlwifi: mvm: don't wake up rx_sync_waitq upon RFKILL
wifi: iwlwifi: properly set WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK
wifi: wilc1000: fix ies_len type in connect path
wifi: mac80211: fix BSS_CHANGED_UNSOL_BCAST_PROBE_RESP
====================
Link: https://patch.msgid.link/20240704111431.11DEDC3277B@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following batch contains a oneliner patch to inconditionally flush
workqueue containing stale objects to be released, syzbot managed to
trigger UaF. Patch from Florian Westphal.
netfilter pull request 24-07-04
* tag 'nf-24-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: unconditionally flush pending work before notifier
====================
Link: https://patch.msgid.link/20240703223304.1455-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
KMSAN reported uninit-value access in raw_lookup() [1]. Diag for raw
sockets uses the pad field in struct inet_diag_req_v2 for the
underlying protocol. This field corresponds to the sdiag_raw_protocol
field in struct inet_diag_req_raw.
inet_diag_get_exact_compat() converts inet_diag_req to
inet_diag_req_v2, but leaves the pad field uninitialized. So the issue
occurs when raw_lookup() accesses the sdiag_raw_protocol field.
Fix this by initializing the pad field in
inet_diag_get_exact_compat(). Also, do the same fix in
inet_diag_dump_compat() to avoid the similar issue in the future.
[1]
BUG: KMSAN: uninit-value in raw_lookup net/ipv4/raw_diag.c:49 [inline]
BUG: KMSAN: uninit-value in raw_sock_get+0x657/0x800 net/ipv4/raw_diag.c:71
raw_lookup net/ipv4/raw_diag.c:49 [inline]
raw_sock_get+0x657/0x800 net/ipv4/raw_diag.c:71
raw_diag_dump_one+0xa1/0x660 net/ipv4/raw_diag.c:99
inet_diag_cmd_exact+0x7d9/0x980
inet_diag_get_exact_compat net/ipv4/inet_diag.c:1404 [inline]
inet_diag_rcv_msg_compat+0x469/0x530 net/ipv4/inet_diag.c:1426
sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282
netlink_rcv_skb+0x537/0x670 net/netlink/af_netlink.c:2564
sock_diag_rcv+0x35/0x40 net/core/sock_diag.c:297
netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
netlink_unicast+0xe74/0x1240 net/netlink/af_netlink.c:1361
netlink_sendmsg+0x10c6/0x1260 net/netlink/af_netlink.c:1905
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x332/0x3d0 net/socket.c:745
____sys_sendmsg+0x7f0/0xb70 net/socket.c:2585
___sys_sendmsg+0x271/0x3b0 net/socket.c:2639
__sys_sendmsg net/socket.c:2668 [inline]
__do_sys_sendmsg net/socket.c:2677 [inline]
__se_sys_sendmsg net/socket.c:2675 [inline]
__x64_sys_sendmsg+0x27e/0x4a0 net/socket.c:2675
x64_sys_call+0x135e/0x3ce0 arch/x86/include/generated/asm/syscalls_64.h:47
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd9/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was stored to memory at:
raw_sock_get+0x650/0x800 net/ipv4/raw_diag.c:71
raw_diag_dump_one+0xa1/0x660 net/ipv4/raw_diag.c:99
inet_diag_cmd_exact+0x7d9/0x980
inet_diag_get_exact_compat net/ipv4/inet_diag.c:1404 [inline]
inet_diag_rcv_msg_compat+0x469/0x530 net/ipv4/inet_diag.c:1426
sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282
netlink_rcv_skb+0x537/0x670 net/netlink/af_netlink.c:2564
sock_diag_rcv+0x35/0x40 net/core/sock_diag.c:297
netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
netlink_unicast+0xe74/0x1240 net/netlink/af_netlink.c:1361
netlink_sendmsg+0x10c6/0x1260 net/netlink/af_netlink.c:1905
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x332/0x3d0 net/socket.c:745
____sys_sendmsg+0x7f0/0xb70 net/socket.c:2585
___sys_sendmsg+0x271/0x3b0 net/socket.c:2639
__sys_sendmsg net/socket.c:2668 [inline]
__do_sys_sendmsg net/socket.c:2677 [inline]
__se_sys_sendmsg net/socket.c:2675 [inline]
__x64_sys_sendmsg+0x27e/0x4a0 net/socket.c:2675
x64_sys_call+0x135e/0x3ce0 arch/x86/include/generated/asm/syscalls_64.h:47
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd9/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Local variable req.i created at:
inet_diag_get_exact_compat net/ipv4/inet_diag.c:1396 [inline]
inet_diag_rcv_msg_compat+0x2a6/0x530 net/ipv4/inet_diag.c:1426
sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282
CPU: 1 PID: 8888 Comm: syz-executor.6 Not tainted 6.10.0-rc4-00217-g35bb670d65fc #32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
Fixes: 432490f9d455 ("net: ip, diag -- Add diag interface for raw sockets")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240703091649.111773-1-syoshida@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When we process segments with TCP AO, we don't check it in
tcp_parse_options(). Thus, opt_rx->saw_unknown is set to 1,
which unconditionally triggers the BPF TCP option parser.
Let's avoid the unnecessary BPF invocation.
Fixes: 0a3a809089eb ("net/tcp: Verify inbound TCP-AO signed segments")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20240703033508.6321-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Zijian Zhang says:
====================
fix OOM and order check in msg_zerocopy selftest
In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg
on a socket with MSG_ZEROCOPY flag, and it will recv the notifications
until the socket is not writable. Typically, it will start the receiving
process after around 30+ sendmsgs. However, as the introduction of commit
dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is
always writable and does not get any chance to run recv notifications.
The selftest always exits with OUT_OF_MEMORY because the memory used by
opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a
different value to trigger OOM on older kernels too.
Thus, we introduce "cfg_notification_limit" to force sender to receive
notifications after some number of sendmsgs.
And, we find that when lock debugging is on, notifications may not come in
order. Thus, we have order checking outputs managed by cfg_verbose, to
avoid too many outputs in this case.
====================
Link: https://patch.msgid.link/20240701225349.3395580-1-zijianzhang@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We find that when lock debugging is on, notifications may not come in
order. Thus, we have order checking outputs managed by cfg_verbose, to
avoid too many outputs in this case.
Fixes: 07b65c5b31ce ("test: add msg_zerocopy test")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240701225349.3395580-3-zijianzhang@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg
on a socket with MSG_ZEROCOPY flag, and it will recv the notifications
until the socket is not writable. Typically, it will start the receiving
process after around 30+ sendmsgs. However, as the introduction of commit
dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is
always writable and does not get any chance to run recv notifications.
The selftest always exits with OUT_OF_MEMORY because the memory used by
opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a
different value to trigger OOM on older kernels too.
Thus, we introduce "cfg_notification_limit" to force sender to receive
notifications after some number of sendmsgs.
Fixes: 07b65c5b31ce ("test: add msg_zerocopy test")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240701225349.3395580-2-zijianzhang@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-06-25 (ice)
This series contains updates to ice driver only.
Milena adds disabling of extts events when PTP is disabled.
Jake prevents possible NULL pointer by checking that timestamps are
ready before processing extts events and adds checks for unsupported
PTP pin configuration.
Petr Oros replaces _test_bit() with the correct test_bit() macro.
v1: https://lore.kernel.org/netdev/20240625170248.199162-1-anthony.l.nguyen@intel.com/
====================
Link: https://patch.msgid.link/20240702171459.2606611-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Do not use _test_bit() macro for testing bit. The proper macro for this
is one without underline.
_test_bit() is what test_bit() was prior to const-optimization. It
directly calls arch_test_bit(), i.e. the arch-specific implementation
(or the generic one). It's strictly _internal_ and shouldn't be used
anywhere outside the actual test_bit() macro.
test_bit() is a wrapper which checks whether the bitmap and the bit
number are compile-time constants and if so, it calls the optimized
function which evaluates this call to a compile-time constant as well.
If either of them is not a compile-time constant, it just calls _test_bit().
test_bit() is the actual function to use anywhere in the kernel.
IOW, calling _test_bit() avoids potential compile-time optimizations.
The sensors is not a compile-time constant, thus most probably there
are no object code changes before and after the patch.
But anyway, we shouldn't call internal wrappers instead of
the actual API.
Fixes: 4da71a77fc3b ("ice: read internal temperature sensor")
Acked-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240702171459.2606611-5-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The driver receives requests for configuring pins via the .enable
callback of the PTP clock object. These requests come into the driver
with flags which modify the requested behavior from userspace. Current
implementation in ice does not reject flags that it doesn't support.
This causes the driver to incorrectly apply requests with such flags as
PTP_PEROUT_DUTY_CYCLE, or any future flags added by the kernel which it
is not yet aware of.
Fix this by properly validating flags in both ice_ptp_cfg_perout and
ice_ptp_cfg_extts. Ensure that we check by bit-wise negating supported
flags rather than just checking and rejecting known un-supported flags.
This is preferable, as it ensures better compatibility with future
kernels.
Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240702171459.2606611-4-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The ice_ptp_extts_event() function can race with ice_ptp_release() and
result in a NULL pointer dereference which leads to a kernel panic.
Panic occurs because the ice_ptp_extts_event() function calls
ptp_clock_event() with a NULL pointer. The ice driver has already
released the PTP clock by the time the interrupt for the next external
timestamp event occurs.
To fix this, modify the ice_ptp_extts_event() function to check the
PTP state and bail early if PTP is not ready.
Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240702171459.2606611-3-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Extts events are disabled and enabled by the application ts2phc.
However, in case where the driver is removed when the application is
running, a specific extts event remains enabled and can cause a kernel
crash.
As a side effect, when the driver is reloaded and application is started
again, remaining extts event for the channel from a previous run will
keep firing and the message "extts on unexpected channel" might be
printed to the user.
To avoid that, extts events shall be disabled when PTP is released.
Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Co-developed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Milena Olech <milena.olech@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240702171459.2606611-2-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzkaller reported a KMSAN splat in __unix_walk_scc() while backtracking
edge_stack after finalising SCC.
Let's add a test case exercising the path.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Link: https://patch.msgid.link/20240702160428.10153-2-syoshida@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
KMSAN reported uninit-value access in __unix_walk_scc() [1].
In the list_for_each_entry_reverse() loop, when the vertex's index
equals it's scc_index, the loop uses the variable vertex as a
temporary variable that points to a vertex in scc. And when the loop
is finished, the variable vertex points to the list head, in this case
scc, which is a local variable on the stack (more precisely, it's not
even scc and might underflow the call stack of __unix_walk_scc():
container_of(&scc, struct unix_vertex, scc_entry)).
However, the variable vertex is used under the label prev_vertex. So
if the edge_stack is not empty and the function jumps to the
prev_vertex label, the function will access invalid data on the
stack. This causes the uninit-value access issue.
Fix this by introducing a new temporary variable for the loop.
[1]
BUG: KMSAN: uninit-value in __unix_walk_scc net/unix/garbage.c:478 [inline]
BUG: KMSAN: uninit-value in unix_walk_scc net/unix/garbage.c:526 [inline]
BUG: KMSAN: uninit-value in __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584
__unix_walk_scc net/unix/garbage.c:478 [inline]
unix_walk_scc net/unix/garbage.c:526 [inline]
__unix_gc+0x2589/0x3c20 net/unix/garbage.c:584
process_one_work kernel/workqueue.c:3231 [inline]
process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312
worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393
kthread+0x3c4/0x530 kernel/kthread.c:389
ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Uninit was stored to memory at:
unix_walk_scc net/unix/garbage.c:526 [inline]
__unix_gc+0x2adf/0x3c20 net/unix/garbage.c:584
process_one_work kernel/workqueue.c:3231 [inline]
process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312
worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393
kthread+0x3c4/0x530 kernel/kthread.c:389
ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Local variable entries created at:
ref_tracker_free+0x48/0xf30 lib/ref_tracker.c:222
netdev_tracker_free include/linux/netdevice.h:4058 [inline]
netdev_put include/linux/netdevice.h:4075 [inline]
dev_put include/linux/netdevice.h:4101 [inline]
update_gid_event_work_handler+0xaa/0x1b0 drivers/infiniband/core/roce_gid_mgmt.c:813
CPU: 1 PID: 12763 Comm: kworker/u8:31 Not tainted 6.10.0-rc4-00217-g35bb670d65fc #32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
Workqueue: events_unbound __unix_gc
Fixes: 3484f063172d ("af_unix: Detect Strongly Connected Components.")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240702160428.10153-1-syoshida@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In function bond_option_arp_ip_targets_set(), if newval->string is an
empty string, newval->string+1 will point to the byte after the
string, causing an out-of-bound read.
BUG: KASAN: slab-out-of-bounds in strlen+0x7d/0xa0 lib/string.c:418
Read of size 1 at addr ffff8881119c4781 by task syz-executor665/8107
CPU: 1 PID: 8107 Comm: syz-executor665 Not tainted 6.7.0-rc7 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:364 [inline]
print_report+0xc1/0x5e0 mm/kasan/report.c:475
kasan_report+0xbe/0xf0 mm/kasan/report.c:588
strlen+0x7d/0xa0 lib/string.c:418
__fortify_strlen include/linux/fortify-string.h:210 [inline]
in4_pton+0xa3/0x3f0 net/core/utils.c:130
bond_option_arp_ip_targets_set+0xc2/0x910
drivers/net/bonding/bond_options.c:1201
__bond_opt_set+0x2a4/0x1030 drivers/net/bonding/bond_options.c:767
__bond_opt_set_notify+0x48/0x150 drivers/net/bonding/bond_options.c:792
bond_opt_tryset_rtnl+0xda/0x160 drivers/net/bonding/bond_options.c:817
bonding_sysfs_store_option+0xa1/0x120 drivers/net/bonding/bond_sysfs.c:156
dev_attr_store+0x54/0x80 drivers/base/core.c:2366
sysfs_kf_write+0x114/0x170 fs/sysfs/file.c:136
kernfs_fop_write_iter+0x337/0x500 fs/kernfs/file.c:334
call_write_iter include/linux/fs.h:2020 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x96a/0xd80 fs/read_write.c:584
ksys_write+0x122/0x250 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
---[ end trace ]---
Fix it by adding a check of string length before using it.
Fixes: f9de11a16594 ("bonding: add ip checks when store ip target")
Signed-off-by: Yue Sun <samsun1006219@gmail.com>
Signed-off-by: Simon Horman <horms@kernel.org>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20240702-bond-oob-v6-1-2dfdba195c19@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The use-after-free is actually in rswitch_tx_free(), which is inlined in
rswitch_poll(). Since `skb` and `gq->skbs[gq->dirty]` are in fact the
same pointer, the skb is first freed using dev_kfree_skb_any(), then the
value in skb->len is used to update the interface statistics.
Let's move around the instructions to use skb->len before the skb is
freed.
This bug is trivial to reproduce using KFENCE. It will trigger a splat
every few packets. A simple ARP request or ICMP echo request is enough.
Fixes: 271e015b9153 ("net: rswitch: Add unmap_addrs instead of dma address in each desc")
Signed-off-by: Radu Rendec <rrendec@redhat.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20240702210838.2703228-1-rrendec@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|