Age | Commit message (Collapse) | Author |
|
This reverts commit 56a16035bb6effb37177867cea94c13a8382f745.
Since the previous commit, STP works on bridge in netns.
# unshare -n
# ip link add br0 type bridge
# ip link add veth0 type veth peer name veth1
# ip link set veth0 master br0 up
[ 50.558135] br0: port 1(veth0) entered blocking state
[ 50.558366] br0: port 1(veth0) entered disabled state
[ 50.558798] veth0: entered allmulticast mode
[ 50.564401] veth0: entered promiscuous mode
# ip link set veth1 master br0 up
[ 54.215487] br0: port 2(veth1) entered blocking state
[ 54.215657] br0: port 2(veth1) entered disabled state
[ 54.215848] veth1: entered allmulticast mode
[ 54.219577] veth1: entered promiscuous mode
# ip link set br0 type bridge stp_state 1
# ip link set br0 up
[ 61.960726] br0: port 2(veth1) entered blocking state
[ 61.961097] br0: port 2(veth1) entered listening state
[ 61.961495] br0: port 1(veth0) entered blocking state
[ 61.961653] br0: port 1(veth0) entered listening state
[ 63.998835] br0: port 2(veth1) entered blocking state
[ 77.437113] br0: port 1(veth0) entered learning state
[ 86.653501] br0: received packet on veth0 with own address as source address (addr:6e:0f:e7:6f:5f:5f, vlan:0)
[ 92.797095] br0: port 1(veth0) entered forwarding state
[ 92.797398] br0: topology change detected, propagating
Let's remove the warning.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Now these upper layer protocol handlers can be called from llc_rcv()
as sap->rcv_func(), which is registered by llc_sap_open().
* function which is passed to register_8022_client()
-> no in-kernel user calls register_8022_client().
* snap_rcv()
`- proto->rcvfunc() : registered by register_snap_client()
-> aarp_rcv() and atalk_rcv() drop packets from non-root netns
* stp_pdu_rcv()
`- garp_protos[]->rcv() : registered by stp_proto_register()
-> garp_pdu_rcv() and br_stp_rcv() are netns-aware
So, we can safely remove the netns restriction in llc_rcv().
Fixes: e730c15519d0 ("[NET]: Make packet reception network namespace safe")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We will remove this restriction in llc_rcv() in the following patch,
which means that the protocol handler must be aware of netns.
if (!net_eq(dev_net(dev), &init_net))
goto drop;
llc_rcv() fetches llc_type_handlers[llc_pdu_type(skb) - 1] and calls it
if not NULL.
If the PDU type is LLC_DEST_CONN, llc_conn_handler() is called to pass
skb to corresponding sockets. Then, we must look up a proper socket in
the same netns with skb->dev.
llc_conn_handler() calls __llc_lookup() to look up a established or
litening socket by __llc_lookup_established() and llc_lookup_listener().
Both functions iterate on a list and call llc_estab_match() or
llc_listener_match() to check if the socket is the correct destination.
However, these functions do not check netns.
Also, bind() and connect() call llc_establish_connection(), which
finally calls __llc_lookup_established(), to check if there is a
conflicting socket.
Let's test netns in llc_estab_match() and llc_listener_match().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We will remove this restriction in llc_rcv() soon, which means that the
protocol handler must be aware of netns.
if (!net_eq(dev_net(dev), &init_net))
goto drop;
llc_rcv() fetches llc_type_handlers[llc_pdu_type(skb) - 1] and calls it
if not NULL.
If the PDU type is LLC_DEST_SAP, llc_sap_handler() is called to pass skb
to corresponding sockets. Then, we must look up a proper socket in the
same netns with skb->dev.
If the destination is a multicast address, llc_sap_handler() calls
llc_sap_mcast(). It calculates a hash based on DSAP and skb->dev->ifindex,
iterates on a socket list, and calls llc_mcast_match() to check if the
socket is the correct destination. Then, llc_mcast_match() checks if
skb->dev matches with llc_sk(sk)->dev. So, we need not check netns here.
OTOH, if the destination is a unicast address, llc_sap_handler() calls
llc_lookup_dgram() to look up a socket, but it does not check the netns.
Therefore, we need to add netns check in llc_lookup_dgram().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This reverts commit 3f4ca5fafc08881d7a57daa20449d171f2887043.
Commit 3f4ca5fafc08 ("tcp: avoid the lookup process failing to get sk in
ehash table") reversed the order in how a socket is inserted into ehash
to fix an issue that ehash-lookup could fail when reqsk/full sk/twsk are
swapped. However, it introduced another lookup failure.
The full socket in ehash is allocated from a slab with SLAB_TYPESAFE_BY_RCU
and does not have SOCK_RCU_FREE, so the socket could be reused even while
it is being referenced on another CPU doing RCU lookup.
Let's say a socket is reused and inserted into the same hash bucket during
lookup. After the blamed commit, a new socket is inserted at the end of
the list. If that happens, we will skip sockets placed after the previous
position of the reused socket, resulting in ehash lookup failure.
As described in Documentation/RCU/rculist_nulls.rst, we should insert a
new socket at the head of the list to avoid such an issue.
This issue, the swap-lookup-failure, and another variant reported in [0]
can all be handled properly by adding a locked ehash lookup suggested by
Eric Dumazet [1].
However, this issue could occur for every packet, thus more likely than
the other two races, so let's revert the change for now.
Link: https://lore.kernel.org/netdev/20230606064306.9192-1-duanmuquan@baidu.com/ [0]
Link: https://lore.kernel.org/netdev/CANn89iK8snOz8TYOhhwfimC7ykYA78GA3Nyv8x06SZYa1nKdyA@mail.gmail.com/ [1]
Fixes: 3f4ca5fafc08 ("tcp: avoid the lookup process failing to get sk in ehash table")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230717215918.15723-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
goto free_skb if an unexpected result is returned by pskb_tirm()
in erspan_xmit().
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
goto err_free_skb if an unexpected result is returned by pskb_tirm()
in erspan_fb_xmit().
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
goto tx_err if an unexpected result is returned by pskb_tirm()
in ip6erspan_tunnel_xmit().
Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
key might contain private part of the key, so better use
kfree_sensitive to free it.
Fixes: 38320c70d282 ("[IPSEC]: Use crypto_aead and authenc in ESP")
Signed-off-by: Wang Ming <machel@vivo.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
TCP request sockets are lockless, tcp_rsk(req)->ts_recent
can change while being read by another cpu as syzbot noticed.
This is harmless, but we should annotate the known races.
Note that tcp_check_req() changes req->ts_recent a bit early,
we might change this in the future.
BUG: KCSAN: data-race in tcp_check_req / tcp_check_req
write to 0xffff88813c8afb84 of 4 bytes by interrupt on cpu 1:
tcp_check_req+0x694/0xc70 net/ipv4/tcp_minisocks.c:762
tcp_v4_rcv+0x12db/0x1b70 net/ipv4/tcp_ipv4.c:2071
ip_protocol_deliver_rcu+0x356/0x6d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x13c/0x1a0 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:303 [inline]
ip_local_deliver+0xec/0x1c0 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:468 [inline]
ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
NF_HOOK include/linux/netfilter.h:303 [inline]
ip_rcv+0x197/0x270 net/ipv4/ip_input.c:569
__netif_receive_skb_one_core net/core/dev.c:5493 [inline]
__netif_receive_skb+0x90/0x1b0 net/core/dev.c:5607
process_backlog+0x21f/0x380 net/core/dev.c:5935
__napi_poll+0x60/0x3b0 net/core/dev.c:6498
napi_poll net/core/dev.c:6565 [inline]
net_rx_action+0x32b/0x750 net/core/dev.c:6698
__do_softirq+0xc1/0x265 kernel/softirq.c:571
do_softirq+0x7e/0xb0 kernel/softirq.c:472
__local_bh_enable_ip+0x64/0x70 kernel/softirq.c:396
local_bh_enable+0x1f/0x20 include/linux/bottom_half.h:33
rcu_read_unlock_bh include/linux/rcupdate.h:843 [inline]
__dev_queue_xmit+0xabb/0x1d10 net/core/dev.c:4271
dev_queue_xmit include/linux/netdevice.h:3088 [inline]
neigh_hh_output include/net/neighbour.h:528 [inline]
neigh_output include/net/neighbour.h:542 [inline]
ip_finish_output2+0x700/0x840 net/ipv4/ip_output.c:229
ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:317
NF_HOOK_COND include/linux/netfilter.h:292 [inline]
ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:431
dst_output include/net/dst.h:458 [inline]
ip_local_out net/ipv4/ip_output.c:126 [inline]
__ip_queue_xmit+0xa4d/0xa70 net/ipv4/ip_output.c:533
ip_queue_xmit+0x38/0x40 net/ipv4/ip_output.c:547
__tcp_transmit_skb+0x1194/0x16e0 net/ipv4/tcp_output.c:1399
tcp_transmit_skb net/ipv4/tcp_output.c:1417 [inline]
tcp_write_xmit+0x13ff/0x2fd0 net/ipv4/tcp_output.c:2693
__tcp_push_pending_frames+0x6a/0x1a0 net/ipv4/tcp_output.c:2877
tcp_push_pending_frames include/net/tcp.h:1952 [inline]
__tcp_sock_set_cork net/ipv4/tcp.c:3336 [inline]
tcp_sock_set_cork+0xe8/0x100 net/ipv4/tcp.c:3343
rds_tcp_xmit_path_complete+0x3b/0x40 net/rds/tcp_send.c:52
rds_send_xmit+0xf8d/0x1420 net/rds/send.c:422
rds_send_worker+0x42/0x1d0 net/rds/threads.c:200
process_one_work+0x3e6/0x750 kernel/workqueue.c:2408
worker_thread+0x5f2/0xa10 kernel/workqueue.c:2555
kthread+0x1d7/0x210 kernel/kthread.c:379
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
read to 0xffff88813c8afb84 of 4 bytes by interrupt on cpu 0:
tcp_check_req+0x32a/0xc70 net/ipv4/tcp_minisocks.c:622
tcp_v4_rcv+0x12db/0x1b70 net/ipv4/tcp_ipv4.c:2071
ip_protocol_deliver_rcu+0x356/0x6d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x13c/0x1a0 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:303 [inline]
ip_local_deliver+0xec/0x1c0 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:468 [inline]
ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
NF_HOOK include/linux/netfilter.h:303 [inline]
ip_rcv+0x197/0x270 net/ipv4/ip_input.c:569
__netif_receive_skb_one_core net/core/dev.c:5493 [inline]
__netif_receive_skb+0x90/0x1b0 net/core/dev.c:5607
process_backlog+0x21f/0x380 net/core/dev.c:5935
__napi_poll+0x60/0x3b0 net/core/dev.c:6498
napi_poll net/core/dev.c:6565 [inline]
net_rx_action+0x32b/0x750 net/core/dev.c:6698
__do_softirq+0xc1/0x265 kernel/softirq.c:571
run_ksoftirqd+0x17/0x20 kernel/softirq.c:939
smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164
kthread+0x1d7/0x210 kernel/kthread.c:379
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
value changed: 0x1cd237f1 -> 0x1cd237f2
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230717144445.653164-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
TCP request sockets are lockless, some of their fields
can change while being read by another cpu as syzbot noticed.
This is usually harmless, but we should annotate the known
races.
This patch takes care of tcp_rsk(req)->txhash,
a separate one is needed for tcp_rsk(req)->ts_recent.
BUG: KCSAN: data-race in tcp_make_synack / tcp_rtx_synack
write to 0xffff8881362304bc of 4 bytes by task 32083 on cpu 1:
tcp_rtx_synack+0x9d/0x2a0 net/ipv4/tcp_output.c:4213
inet_rtx_syn_ack+0x38/0x80 net/ipv4/inet_connection_sock.c:880
tcp_check_req+0x379/0xc70 net/ipv4/tcp_minisocks.c:665
tcp_v6_rcv+0x125b/0x1b20 net/ipv6/tcp_ipv6.c:1673
ip6_protocol_deliver_rcu+0x92f/0xf30 net/ipv6/ip6_input.c:437
ip6_input_finish net/ipv6/ip6_input.c:482 [inline]
NF_HOOK include/linux/netfilter.h:303 [inline]
ip6_input+0xbd/0x1b0 net/ipv6/ip6_input.c:491
dst_input include/net/dst.h:468 [inline]
ip6_rcv_finish+0x1e2/0x2e0 net/ipv6/ip6_input.c:79
NF_HOOK include/linux/netfilter.h:303 [inline]
ipv6_rcv+0x74/0x150 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core net/core/dev.c:5452 [inline]
__netif_receive_skb+0x90/0x1b0 net/core/dev.c:5566
netif_receive_skb_internal net/core/dev.c:5652 [inline]
netif_receive_skb+0x4a/0x310 net/core/dev.c:5711
tun_rx_batched+0x3bf/0x400
tun_get_user+0x1d24/0x22b0 drivers/net/tun.c:1997
tun_chr_write_iter+0x18e/0x240 drivers/net/tun.c:2043
call_write_iter include/linux/fs.h:1871 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x4ab/0x7d0 fs/read_write.c:584
ksys_write+0xeb/0x1a0 fs/read_write.c:637
__do_sys_write fs/read_write.c:649 [inline]
__se_sys_write fs/read_write.c:646 [inline]
__x64_sys_write+0x42/0x50 fs/read_write.c:646
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff8881362304bc of 4 bytes by task 32078 on cpu 0:
tcp_make_synack+0x367/0xb40 net/ipv4/tcp_output.c:3663
tcp_v6_send_synack+0x72/0x420 net/ipv6/tcp_ipv6.c:544
tcp_conn_request+0x11a8/0x1560 net/ipv4/tcp_input.c:7059
tcp_v6_conn_request+0x13f/0x180 net/ipv6/tcp_ipv6.c:1175
tcp_rcv_state_process+0x156/0x1de0 net/ipv4/tcp_input.c:6494
tcp_v6_do_rcv+0x98a/0xb70 net/ipv6/tcp_ipv6.c:1509
tcp_v6_rcv+0x17b8/0x1b20 net/ipv6/tcp_ipv6.c:1735
ip6_protocol_deliver_rcu+0x92f/0xf30 net/ipv6/ip6_input.c:437
ip6_input_finish net/ipv6/ip6_input.c:482 [inline]
NF_HOOK include/linux/netfilter.h:303 [inline]
ip6_input+0xbd/0x1b0 net/ipv6/ip6_input.c:491
dst_input include/net/dst.h:468 [inline]
ip6_rcv_finish+0x1e2/0x2e0 net/ipv6/ip6_input.c:79
NF_HOOK include/linux/netfilter.h:303 [inline]
ipv6_rcv+0x74/0x150 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core net/core/dev.c:5452 [inline]
__netif_receive_skb+0x90/0x1b0 net/core/dev.c:5566
netif_receive_skb_internal net/core/dev.c:5652 [inline]
netif_receive_skb+0x4a/0x310 net/core/dev.c:5711
tun_rx_batched+0x3bf/0x400
tun_get_user+0x1d24/0x22b0 drivers/net/tun.c:1997
tun_chr_write_iter+0x18e/0x240 drivers/net/tun.c:2043
call_write_iter include/linux/fs.h:1871 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x4ab/0x7d0 fs/read_write.c:584
ksys_write+0xeb/0x1a0 fs/read_write.c:637
__do_sys_write fs/read_write.c:649 [inline]
__se_sys_write fs/read_write.c:646 [inline]
__x64_sys_write+0x42/0x50 fs/read_write.c:646
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x91d25731 -> 0xe79325cd
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 32078 Comm: syz-executor.4 Not tainted 6.5.0-rc1-syzkaller-00033-geb26cbb1a754 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023
Fixes: 58d607d3e52f ("tcp: provide skb->hash to synack packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230717144445.653164-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2023-07-17
The 1st patch is by Ziyang Xuan and fixes a possible memory leak in
the receiver handling in the CAN RAW protocol.
YueHaibing contributes a use after free in bcm_proc_show() of the
Broad Cast Manager (BCM) CAN protocol.
The next 2 patches are by me and fix a possible null pointer
dereference in the RX path of the gs_usb driver with activated
hardware timestamps and the candlelight firmware.
The last patch is by Fedor Ross, Marek Vasut and me and targets the
mcp251xfd driver. The polling timeout of __mcp251xfd_chip_set_mode()
is increased to fix bus joining on busy CAN buses and very low bit
rate.
* tag 'linux-can-fixes-for-6.5-20230717' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: mcp251xfd: __mcp251xfd_chip_set_mode(): increase poll timeout
can: gs_usb: fix time stamp counter initialization
can: gs_usb: gs_can_open(): improve error handling
can: bcm: Fix UAF in bcm_proc_show()
can: raw: fix receiver memory leak
====================
Link: https://lore.kernel.org/r/20230717180938.230816-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
BUG: KASAN: slab-use-after-free in bcm_proc_show+0x969/0xa80
Read of size 8 at addr ffff888155846230 by task cat/7862
CPU: 1 PID: 7862 Comm: cat Not tainted 6.5.0-rc1-00153-gc8746099c197 #230
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0xd5/0x150
print_report+0xc1/0x5e0
kasan_report+0xba/0xf0
bcm_proc_show+0x969/0xa80
seq_read_iter+0x4f6/0x1260
seq_read+0x165/0x210
proc_reg_read+0x227/0x300
vfs_read+0x1d5/0x8d0
ksys_read+0x11e/0x240
do_syscall_64+0x35/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Allocated by task 7846:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
__kasan_kmalloc+0x9e/0xa0
bcm_sendmsg+0x264b/0x44e0
sock_sendmsg+0xda/0x180
____sys_sendmsg+0x735/0x920
___sys_sendmsg+0x11d/0x1b0
__sys_sendmsg+0xfa/0x1d0
do_syscall_64+0x35/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 7846:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
kasan_save_free_info+0x27/0x40
____kasan_slab_free+0x161/0x1c0
slab_free_freelist_hook+0x119/0x220
__kmem_cache_free+0xb4/0x2e0
rcu_core+0x809/0x1bd0
bcm_op is freed before procfs entry be removed in bcm_release(),
this lead to bcm_proc_show() may read the freed bcm_op.
Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20230715092543.15548-1-yuehaibing@huawei.com
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Got kmemleak errors with the following ltp can_filter testcase:
for ((i=1; i<=100; i++))
do
./can_filter &
sleep 0.1
done
==============================================================
[<00000000db4a4943>] can_rx_register+0x147/0x360 [can]
[<00000000a289549d>] raw_setsockopt+0x5ef/0x853 [can_raw]
[<000000006d3d9ebd>] __sys_setsockopt+0x173/0x2c0
[<00000000407dbfec>] __x64_sys_setsockopt+0x61/0x70
[<00000000fd468496>] do_syscall_64+0x33/0x40
[<00000000b7e47d51>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
It's a bug in the concurrent scenario of unregister_netdevice_many()
and raw_release() as following:
cpu0 cpu1
unregister_netdevice_many(can_dev)
unlist_netdevice(can_dev) // dev_get_by_index() return NULL after this
net_set_todo(can_dev)
raw_release(can_socket)
dev = dev_get_by_index(, ro->ifindex); // dev == NULL
if (dev) { // receivers in dev_rcv_lists not free because dev is NULL
raw_disable_allfilters(, dev, );
dev_put(dev);
}
...
ro->bound = 0;
...
call_netdevice_notifiers(NETDEV_UNREGISTER, )
raw_notify(, NETDEV_UNREGISTER, )
if (ro->bound) // invalid because ro->bound has been set 0
raw_disable_allfilters(, dev, ); // receivers in dev_rcv_lists will never be freed
Add a net_device pointer member in struct raw_sock to record bound
can_dev, and use rtnl_lock to serialize raw_socket members between
raw_bind(), raw_release(), raw_setsockopt() and raw_notify(). Use
ro->dev to decide whether to free receivers in dev_rcv_lists.
Fixes: 8d0caedb7596 ("can: bcm/raw/isotp: use per module netdevice notifier")
Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Link: https://lore.kernel.org/all/20230711011737.1969582-1-william.xuanziyang@huawei.com
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
If TCA_FLOWER_CLASSID is specified in the netlink message, the code will
call tcf_bind_filter. However, if any error occurs after that, the code
should undo this by calling tcf_unbind_filter.
Fixes: 77b9900ef53a ("tc: introduce Flower classifier")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If cls_bpf_offload errors out, we must also undo tcf_bind_filter that
was done before the error.
Fix that by calling tcf_unbind_filter in errout_parms.
Fixes: eadb41489fd2 ("net: cls_bpf: add support for marking filters as hardware-only")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the case of an update, when TCA_U32_LINK is set, u32_set_parms will
decrement the refcount of the ht_down (struct tc_u_hnode) pointer
present in the older u32 filter which we are replacing. However, if
u32_replace_hw_knode errors out, the update command fails and that
ht_down pointer continues decremented. To fix that, when
u32_replace_hw_knode fails, check if ht_down's refcount was decremented
and undo the decrement.
Fixes: d34e3e181395 ("net: cls_u32: Add support for skip-sw flag to tc u32 classifier.")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When u32_replace_hw_knode fails, we need to undo the tcf_bind_filter
operation done at u32_set_parms.
Fixes: d34e3e181395 ("net: cls_u32: Add support for skip-sw flag to tc u32 classifier.")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mall_set_parms
In case an error occurred after mall_set_parms executed successfully, we
must undo the tcf_bind_filter call it issues.
Fix that by calling tcf_unbind_filter in err_replace_hw_filter label.
Fixes: ec2507d2a306 ("net/sched: cls_matchall: Fix error path")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 1fd54773c267 ("udp: allow header check for dodgy GSO_UDP_L4
packets.") checks DODGY bit for UDP, but for packets that can be fed
directly to the device after gso_segs reset, it actually falls through
to fragmentation:
https://lore.kernel.org/all/CAJPywTKDdjtwkLVUW6LRA2FU912qcDmQOQGt2WaDo28KzYDg+A@mail.gmail.com/
This change restores the expected behavior of GSO_UDP_L4 packets.
Fixes: 1fd54773c267 ("udp: allow header check for dodgy GSO_UDP_L4 packets.")
Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When we create an L2 loop on a bridge in netns, we will see packets storm
even if STP is enabled.
# unshare -n
# ip link add br0 type bridge
# ip link add veth0 type veth peer name veth1
# ip link set veth0 master br0 up
# ip link set veth1 master br0 up
# ip link set br0 type bridge stp_state 1
# ip link set br0 up
# sleep 30
# ip -s link show br0
2: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether b6:61:98:1c:1c:b5 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped missed mcast
956553768 12861249 0 0 0 12861249 <-. Keep
TX: bytes packets errors dropped carrier collsns | increasing
1027834 11951 0 0 0 0 <-' rapidly
This is because llc_rcv() drops all packets in non-root netns and BPDU
is dropped.
Let's add extack warning when enabling STP in netns.
# unshare -n
# ip link add br0 type bridge
# ip link set br0 type bridge stp_state 1
Warning: bridge: STP does not work in non-root netns.
Note this commit will be reverted later when we namespacify the whole LLC
infra.
Fixes: e730c15519d0 ("[NET]: Make packet reception network namespace safe")
Suggested-by: Harry Coin <hcoin@quietfountain.com>
Link: https://lore.kernel.org/netdev/0f531295-e289-022d-5add-5ceffa0df9bc@quietfountain.com/
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Lion says:
-------
In the QFQ scheduler a similar issue to CVE-2023-31436
persists.
Consider the following code in net/sched/sch_qfq.c:
static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
struct sk_buff **to_free)
{
unsigned int len = qdisc_pkt_len(skb), gso_segs;
// ...
if (unlikely(cl->agg->lmax < len)) {
pr_debug("qfq: increasing maxpkt from %u to %u for class %u",
cl->agg->lmax, len, cl->common.classid);
err = qfq_change_agg(sch, cl, cl->agg->class_weight, len);
if (err) {
cl->qstats.drops++;
return qdisc_drop(skb, sch, to_free);
}
// ...
}
Similarly to CVE-2023-31436, "lmax" is increased without any bounds
checks according to the packet length "len". Usually this would not
impose a problem because packet sizes are naturally limited.
This is however not the actual packet length, rather the
"qdisc_pkt_len(skb)" which might apply size transformations according to
"struct qdisc_size_table" as created by "qdisc_get_stab()" in
net/sched/sch_api.c if the TCA_STAB option was set when modifying the qdisc.
A user may choose virtually any size using such a table.
As a result the same issue as in CVE-2023-31436 can occur, allowing heap
out-of-bounds read / writes in the kmalloc-8192 cache.
-------
We can create the issue with the following commands:
tc qdisc add dev $DEV root handle 1: stab mtu 2048 tsize 512 mpu 0 \
overhead 999999999 linklayer ethernet qfq
tc class add dev $DEV parent 1: classid 1:1 htb rate 6mbit burst 15k
tc filter add dev $DEV parent 1: matchall classid 1:1
ping -I $DEV 1.1.1.2
This is caused by incorrectly assuming that qdisc_pkt_len() returns a
length within the QFQ_MIN_LMAX < len < QFQ_MAX_LMAX.
Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Reported-by: Lion <nnamrec@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
25369891fcef deletes a check for the case where no 'lmax' is
specified which 3037933448f6 previously fixed as 'lmax'
could be set to the device's MTU without any bound checking
for QFQ_LMAX_MIN and QFQ_LMAX_MAX. Therefore, reintroduce the check.
Fixes: 25369891fcef ("net/sched: sch_qfq: refactor parsing of netlink parameters")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:
====================
pull-request: bpf 2023-07-12
We've added 5 non-merge commits during the last 7 day(s) which contain
a total of 7 files changed, 93 insertions(+), 28 deletions(-).
The main changes are:
1) Fix max stack depth check for async callbacks, from Kumar.
2) Fix inconsistent JIT image generation, from Björn.
3) Use trusted arguments in XDP hints kfuncs, from Larysa.
4) Fix memory leak in cpu_map_update_elem, from Pu.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
xdp: use trusted arguments in XDP hints kfuncs
bpf: cpumap: Fix memory leak in cpu_map_update_elem
riscv, bpf: Fix inconsistent JIT image generation
selftests/bpf: Add selftest for check_stack_max_depth bug
bpf: Fix max stack depth check for async callbacks
====================
Link: https://lore.kernel.org/r/20230712223045.40182-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix ethernet header length field after stripping the mesh header
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/CT5GNZSK28AI.2K6M69OXM9RW5@syracuse/
Fixes: 986e43b19ae9 ("wifi: mac80211: fix receiving A-MSDU frames on mesh interfaces")
Reported-and-tested-by: Nicolas Escande <nico.escande@gmail.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230711115052.68430-1-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The kernel does not currently validate that both the minimum and maximum
ports of a port range are specified. This can lead user space to think
that a filter matching on a port range was successfully added, when in
fact it was not. For example, with a patched (buggy) iproute2 that only
sends the minimum port, the following commands do not return an error:
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp src_port 100-200 action pass
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp dst_port 100-200 action pass
# tc filter show dev swp1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
ip_proto udp
not_in_hw
action order 1: gact action pass
random type none pass val 0
index 1 ref 1 bind 1
filter protocol ip pref 1 flower chain 0 handle 0x2
eth_type ipv4
ip_proto udp
not_in_hw
action order 1: gact action pass
random type none pass val 0
index 2 ref 1 bind 1
Fix by returning an error unless both ports are specified:
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp src_port 100-200 action pass
Error: Both min and max source ports must be specified.
We have an error talking to the kernel
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp dst_port 100-200 action pass
Error: Both min and max destination ports must be specified.
We have an error talking to the kernel
Fixes: 5c72299fba9d ("net: sched: cls_flower: Classify packets using port ranges")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, verifier does not reject XDP programs that pass NULL pointer to
hints functions. At the same time, this case is not handled in any driver
implementation (including veth). For example, changing
bpf_xdp_metadata_rx_timestamp(ctx, ×tamp);
to
bpf_xdp_metadata_rx_timestamp(ctx, NULL);
in xdp_metadata test successfully crashes the system.
Add KF_TRUSTED_ARGS flag to hints kfunc definitions, so driver code
does not have to worry about getting invalid pointers.
Fixes: 3d76a4d3d4e5 ("bpf: XDP metadata RX kfuncs")
Reported-by: Stanislav Fomichev <sdf@google.com>
Closes: https://lore.kernel.org/bpf/ZKWo0BbpLfkZHbyE@google.com/
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230711105930.29170-1-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().
Direct replacement is safe here since return value of -errno
is used to check for truncation instead of sizeof(dest).
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now in addrconf_mod_rs_timer(), reference idev depends on whether
rs_timer is not pending. Then modify rs_timer timeout.
There is a time gap in [1], during which if the pending rs_timer
becomes not pending. It will miss to hold idev, but the rs_timer
is activated. Thus rs_timer callback function addrconf_rs_timer()
will be executed and put idev later without holding idev. A refcount
underflow issue for idev can be caused by this.
if (!timer_pending(&idev->rs_timer))
in6_dev_hold(idev);
<--------------[1]
mod_timer(&idev->rs_timer, jiffies + when);
To fix the issue, hold idev if mod_timer() return 0.
Fixes: b7b1bfce0bb6 ("ipv6: split duplicate address detection and router solicitation timer")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Amit Klein reported that udp6_ehash_secret was initialized but never used.
Fixes: 1bbdceef1e53 ("inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once")
Reported-by: Amit Klein <aksecurity@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With some IPv6 Ext Hdr (RPL, SRv6, etc.), we can send a packet that
has the link-local address as src and dst IP and will be forwarded to
an external IP in the IPv6 Ext Hdr.
For example, the script below generates a packet whose src IP is the
link-local address and dst is updated to 11::.
# for f in $(find /proc/sys/net/ -name *seg6_enabled*); do echo 1 > $f; done
# python3
>>> from socket import *
>>> from scapy.all import *
>>>
>>> SRC_ADDR = DST_ADDR = "fe80::5054:ff:fe12:3456"
>>>
>>> pkt = IPv6(src=SRC_ADDR, dst=DST_ADDR)
>>> pkt /= IPv6ExtHdrSegmentRouting(type=4, addresses=["11::", "22::"], segleft=1)
>>>
>>> sk = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW)
>>> sk.sendto(bytes(pkt), (DST_ADDR, 0))
For such a packet, we call ip6_route_input() to look up a route for the
next destination in these three functions depending on the header type.
* ipv6_rthdr_rcv()
* ipv6_rpl_srh_rcv()
* ipv6_srh_rcv()
If no route is found, ip6_null_entry is set to skb, and the following
dst_input(skb) calls ip6_pkt_drop().
Finally, in icmp6_dev(), we dereference skb_rt6_info(skb)->rt6i_idev->dev
as the input device is the loopback interface. Then, we have to check if
skb_rt6_info(skb)->rt6i_idev is NULL or not to avoid NULL pointer deref
for ip6_null_entry.
BUG: kernel NULL pointer dereference, address: 0000000000000000
PF: supervisor read access in kernel mode
PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 157 Comm: python3 Not tainted 6.4.0-11996-gb121d614371c #35
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:icmp6_send (net/ipv6/icmp.c:436 net/ipv6/icmp.c:503)
Code: fe ff ff 48 c7 40 30 c0 86 5d 83 e8 c6 44 1c 00 e9 c8 fc ff ff 49 8b 46 58 48 83 e0 fe 0f 84 4a fb ff ff 48 8b 80 d0 00 00 00 <48> 8b 00 44 8b 88 e0 00 00 00 e9 34 fb ff ff 4d 85 ed 0f 85 69 01
RSP: 0018:ffffc90000003c70 EFLAGS: 00000286
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000000e0
RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff888006d72a18
RBP: ffffc90000003d80 R08: 0000000000000000 R09: 0000000000000001
R10: ffffc90000003d98 R11: 0000000000000040 R12: ffff888006d72a10
R13: 0000000000000000 R14: ffff8880057fb800 R15: ffffffff835d86c0
FS: 00007f9dc72ee740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000057b2000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
<IRQ>
ip6_pkt_drop (net/ipv6/route.c:4513)
ipv6_rthdr_rcv (net/ipv6/exthdrs.c:640 net/ipv6/exthdrs.c:686)
ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:437 (discriminator 5))
ip6_input_finish (./include/linux/rcupdate.h:781 net/ipv6/ip6_input.c:483)
__netif_receive_skb_one_core (net/core/dev.c:5455)
process_backlog (./include/linux/rcupdate.h:781 net/core/dev.c:5895)
__napi_poll (net/core/dev.c:6460)
net_rx_action (net/core/dev.c:6529 net/core/dev.c:6660)
__do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:554)
do_softirq (kernel/softirq.c:454 kernel/softirq.c:441)
</IRQ>
<TASK>
__local_bh_enable_ip (kernel/softirq.c:381)
__dev_queue_xmit (net/core/dev.c:4231)
ip6_finish_output2 (./include/net/neighbour.h:544 net/ipv6/ip6_output.c:135)
rawv6_sendmsg (./include/net/dst.h:458 ./include/linux/netfilter.h:303 net/ipv6/raw.c:656 net/ipv6/raw.c:914)
sock_sendmsg (net/socket.c:725 net/socket.c:748)
__sys_sendto (net/socket.c:2134)
__x64_sys_sendto (net/socket.c:2146 net/socket.c:2142 net/socket.c:2142)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
RIP: 0033:0x7f9dc751baea
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
RSP: 002b:00007ffe98712c38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007ffe98712cf8 RCX: 00007f9dc751baea
RDX: 0000000000000060 RSI: 00007f9dc6460b90 RDI: 0000000000000003
RBP: 00007f9dc56e8be0 R08: 00007ffe98712d70 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f9dc6af5d1b
</TASK>
Modules linked in:
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
RIP: 0010:icmp6_send (net/ipv6/icmp.c:436 net/ipv6/icmp.c:503)
Code: fe ff ff 48 c7 40 30 c0 86 5d 83 e8 c6 44 1c 00 e9 c8 fc ff ff 49 8b 46 58 48 83 e0 fe 0f 84 4a fb ff ff 48 8b 80 d0 00 00 00 <48> 8b 00 44 8b 88 e0 00 00 00 e9 34 fb ff ff 4d 85 ed 0f 85 69 01
RSP: 0018:ffffc90000003c70 EFLAGS: 00000286
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000000e0
RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff888006d72a18
RBP: ffffc90000003d80 R08: 0000000000000000 R09: 0000000000000001
R10: ffffc90000003d98 R11: 0000000000000040 R12: ffff888006d72a10
R13: 0000000000000000 R14: ffff8880057fb800 R15: ffffffff835d86c0
FS: 00007f9dc72ee740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000057b2000 CR4: 00000000007506f0
PKRU: 55555554
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address")
Reported-by: Wang Yufen <wangyufen@huawei.com>
Closes: https://lore.kernel.org/netdev/c41403a9-c2f6-3b7e-0c96-e1901e605cd0@huawei.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ian reported several skb corruptions triggered by rx-gro-list,
collecting different oops alike:
[ 62.624003] BUG: kernel NULL pointer dereference, address: 00000000000000c0
[ 62.631083] #PF: supervisor read access in kernel mode
[ 62.636312] #PF: error_code(0x0000) - not-present page
[ 62.641541] PGD 0 P4D 0
[ 62.644174] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 62.648629] CPU: 1 PID: 913 Comm: napi/eno2-79 Not tainted 6.4.0 #364
[ 62.655162] Hardware name: Supermicro Super Server/A2SDi-12C-HLN4F, BIOS 1.7a 10/13/2022
[ 62.663344] RIP: 0010:__udp_gso_segment (./include/linux/skbuff.h:2858
./include/linux/udp.h:23 net/ipv4/udp_offload.c:228 net/ipv4/udp_offload.c:261
net/ipv4/udp_offload.c:277)
[ 62.687193] RSP: 0018:ffffbd3a83b4f868 EFLAGS: 00010246
[ 62.692515] RAX: 00000000000000ce RBX: 0000000000000000 RCX: 0000000000000000
[ 62.699743] RDX: ffffa124def8a000 RSI: 0000000000000079 RDI: ffffa125952a14d4
[ 62.706970] RBP: ffffa124def8a000 R08: 0000000000000022 R09: 00002000001558c9
[ 62.714199] R10: 0000000000000000 R11: 00000000be554639 R12: 00000000000000e2
[ 62.721426] R13: ffffa125952a1400 R14: ffffa125952a1400 R15: 00002000001558c9
[ 62.728654] FS: 0000000000000000(0000) GS:ffffa127efa40000(0000)
knlGS:0000000000000000
[ 62.736852] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 62.742702] CR2: 00000000000000c0 CR3: 00000001034b0000 CR4: 00000000003526e0
[ 62.749948] Call Trace:
[ 62.752498] <TASK>
[ 62.779267] inet_gso_segment (net/ipv4/af_inet.c:1398)
[ 62.787605] skb_mac_gso_segment (net/core/gro.c:141)
[ 62.791906] __skb_gso_segment (net/core/dev.c:3403 (discriminator 2))
[ 62.800492] validate_xmit_skb (./include/linux/netdevice.h:4862
net/core/dev.c:3659)
[ 62.804695] validate_xmit_skb_list (net/core/dev.c:3710)
[ 62.809158] sch_direct_xmit (net/sched/sch_generic.c:330)
[ 62.813198] __dev_queue_xmit (net/core/dev.c:3805 net/core/dev.c:4210)
net/netfilter/core.c:626)
[ 62.821093] br_dev_queue_push_xmit (net/bridge/br_forward.c:55)
[ 62.825652] maybe_deliver (net/bridge/br_forward.c:193)
[ 62.829420] br_flood (net/bridge/br_forward.c:233)
[ 62.832758] br_handle_frame_finish (net/bridge/br_input.c:215)
[ 62.837403] br_handle_frame (net/bridge/br_input.c:298
net/bridge/br_input.c:416)
[ 62.851417] __netif_receive_skb_core.constprop.0 (net/core/dev.c:5387)
[ 62.866114] __netif_receive_skb_list_core (net/core/dev.c:5570)
[ 62.871367] netif_receive_skb_list_internal (net/core/dev.c:5638
net/core/dev.c:5727)
[ 62.876795] napi_complete_done (./include/linux/list.h:37
./include/net/gro.h:434 ./include/net/gro.h:429 net/core/dev.c:6067)
[ 62.881004] ixgbe_poll (drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:3191)
[ 62.893534] __napi_poll (net/core/dev.c:6498)
[ 62.897133] napi_threaded_poll (./include/linux/netpoll.h:89
net/core/dev.c:6640)
[ 62.905276] kthread (kernel/kthread.c:379)
[ 62.913435] ret_from_fork (arch/x86/entry/entry_64.S:314)
[ 62.917119] </TASK>
In the critical scenario, rx-gro-list GRO-ed packets are fed, via a
bridge, both to the local input path and to an egress device (tun).
The segmentation of such packets unsafely writes to the cloned skbs
with shared heads.
This change addresses the issue by uncloning as needed the
to-be-segmented skbs.
Reported-by: Ian Kumlien <ian.kumlien@gmail.com>
Tested-by: Ian Kumlien <ian.kumlien@gmail.com>
Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The tracepoint has existed for 12 years, but it only covered udp
over the legacy IPv4 protocol. Having it enabled for udp6 removes
the unnecessary difference in error visibility.
Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
Fixes: 296f7ea75b45 ("udp: add tracepoints for queueing skb to rcvbuf")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the event of a failure in tcf_change_indev(), fw_set_parms() will
immediately return an error after incrementing or decrementing
reference counter in tcf_bind_filter(). If attacker can control
reference counter to zero and make reference freed, leading to
use after free.
In order to prevent this, move the point of possible failure above the
point where the TC_FW_CLASSID is handled.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: M A Ramdhan <ramdhan@starlabs.sg>
Signed-off-by: M A Ramdhan <ramdhan@starlabs.sg>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Message-ID: <20230705161530.52003-1-ramdhan@starlabs.sg>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Fix missing overflow use refcount checks in nf_tables.
2) Do not set IPS_ASSURED for IPS_NAT_CLASH entries in GRE tracker,
from Florian Westphal.
3) Bail out if nf_ct_helper_hash is NULL before registering helper,
from Florent Revest.
4) Use siphash() instead siphash_4u64() to fix performance regression,
also from Florian.
5) Do not allow to add rules to removed chains via ID,
from Thadeu Lima de Souza Cascardo.
6) Fix oob read access in byteorder expression, also from Thadeu.
netfilter pull request 23-07-06
====================
Link: https://lore.kernel.org/r/20230705230406.52201-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.
It may lead to a stack-out-of-bounds access like the one below:
[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint
Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.
Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, bpf and wireguard.
Current release - regressions:
- nvme-tcp: fix comma-related oops after sendpage changes
Current release - new code bugs:
- ptp: make max_phase_adjustment sysfs device attribute invisible
when not supported
Previous releases - regressions:
- sctp: fix potential deadlock on &net->sctp.addr_wq_lock
- mptcp:
- ensure subflow is unhashed before cleaning the backlog
- do not rely on implicit state check in mptcp_listen()
Previous releases - always broken:
- net: fix net_dev_start_xmit trace event vs skb_transport_offset()
- Bluetooth:
- fix use-bdaddr-property quirk
- L2CAP: fix multiple UaFs
- ISO: use hci_sync for setting CIG parameters
- hci_event: fix Set CIG Parameters error status handling
- hci_event: fix parsing of CIS Established Event
- MGMT: fix marking SCAN_RSP as not connectable
- wireguard: queuing: use saner cpu selection wrapping
- sched: act_ipt: various bug fixes for iptables <> TC interactions
- sched: act_pedit: add size check for TCA_PEDIT_PARMS_EX
- dsa: fixes for receiving PTP packets with 8021q and sja1105 tagging
- eth: sfc: fix null-deref in devlink port without MAE access
- eth: ibmvnic: do not reset dql stats on NON_FATAL err
Misc:
- xsk: honor SO_BINDTODEVICE on bind"
* tag 'net-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
nfp: clean mc addresses in application firmware when closing port
selftests: mptcp: pm_nl_ctl: fix 32-bit support
selftests: mptcp: depend on SYN_COOKIES
selftests: mptcp: userspace_pm: report errors with 'remove' tests
selftests: mptcp: userspace_pm: use correct server port
selftests: mptcp: sockopt: return error if wrong mark
selftests: mptcp: sockopt: use 'iptables-legacy' if available
selftests: mptcp: connect: fail if nft supposed to work
mptcp: do not rely on implicit state check in mptcp_listen()
mptcp: ensure subflow is unhashed before cleaning the backlog
s390/qeth: Fix vipa deletion
octeontx-af: fix hardware timestamp configuration
net: dsa: sja1105: always enable the send_meta options
net: dsa: tag_sja1105: fix MAC DA patching from meta frames
net: Replace strlcpy with strscpy
pptp: Fix fib lookup calls.
mlxsw: spectrum_router: Fix an IS_ERR() vs NULL check
net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX
xsk: Honor SO_BINDTODEVICE on bind
ptp: Make max_phase_adjustment sysfs device attribute invisible when not supported
...
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-07-05
We've added 2 non-merge commits during the last 1 day(s) which contain
a total of 3 files changed, 16 insertions(+), 4 deletions(-).
The main changes are:
1) Fix BTF to warn but not returning an error for a NULL BTF to still be
able to load modules under CONFIG_DEBUG_INFO_BTF, from SeongJae Park.
2) Fix xsk sockets to honor SO_BINDTODEVICE in bind(), from Ilya Maximets.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
xsk: Honor SO_BINDTODEVICE on bind
bpf, btf: Warn but return no error for NULL btf from __register_btf_kfunc_id_set()
====================
Link: https://lore.kernel.org/r/20230705171716.6494-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When adding a rule to a chain referring to its ID, if that chain had been
deleted on the same batch, the rule might end up referring to a deleted
chain.
This will lead to a WARNING like following:
[ 33.098431] ------------[ cut here ]------------
[ 33.098678] WARNING: CPU: 5 PID: 69 at net/netfilter/nf_tables_api.c:2037 nf_tables_chain_destroy+0x23d/0x260
[ 33.099217] Modules linked in:
[ 33.099388] CPU: 5 PID: 69 Comm: kworker/5:1 Not tainted 6.4.0+ #409
[ 33.099726] Workqueue: events nf_tables_trans_destroy_work
[ 33.100018] RIP: 0010:nf_tables_chain_destroy+0x23d/0x260
[ 33.100306] Code: 8b 7c 24 68 e8 64 9c ed fe 4c 89 e7 e8 5c 9c ed fe 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 89 c6 89 c7 c3 cc cc cc cc <0f> 0b 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 89 c6 89 c7
[ 33.101271] RSP: 0018:ffffc900004ffc48 EFLAGS: 00010202
[ 33.101546] RAX: 0000000000000001 RBX: ffff888006fc0a28 RCX: 0000000000000000
[ 33.101920] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 33.102649] RBP: ffffc900004ffc78 R08: 0000000000000000 R09: 0000000000000000
[ 33.103018] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8880135ef500
[ 33.103385] R13: 0000000000000000 R14: dead000000000122 R15: ffff888006fc0a10
[ 33.103762] FS: 0000000000000000(0000) GS:ffff888024c80000(0000) knlGS:0000000000000000
[ 33.104184] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 33.104493] CR2: 00007fe863b56a50 CR3: 00000000124b0001 CR4: 0000000000770ee0
[ 33.104872] PKRU: 55555554
[ 33.104999] Call Trace:
[ 33.105113] <TASK>
[ 33.105214] ? show_regs+0x72/0x90
[ 33.105371] ? __warn+0xa5/0x210
[ 33.105520] ? nf_tables_chain_destroy+0x23d/0x260
[ 33.105732] ? report_bug+0x1f2/0x200
[ 33.105902] ? handle_bug+0x46/0x90
[ 33.106546] ? exc_invalid_op+0x19/0x50
[ 33.106762] ? asm_exc_invalid_op+0x1b/0x20
[ 33.106995] ? nf_tables_chain_destroy+0x23d/0x260
[ 33.107249] ? nf_tables_chain_destroy+0x30/0x260
[ 33.107506] nf_tables_trans_destroy_work+0x669/0x680
[ 33.107782] ? mark_held_locks+0x28/0xa0
[ 33.107996] ? __pfx_nf_tables_trans_destroy_work+0x10/0x10
[ 33.108294] ? _raw_spin_unlock_irq+0x28/0x70
[ 33.108538] process_one_work+0x68c/0xb70
[ 33.108755] ? lock_acquire+0x17f/0x420
[ 33.108977] ? __pfx_process_one_work+0x10/0x10
[ 33.109218] ? do_raw_spin_lock+0x128/0x1d0
[ 33.109435] ? _raw_spin_lock_irq+0x71/0x80
[ 33.109634] worker_thread+0x2bd/0x700
[ 33.109817] ? __pfx_worker_thread+0x10/0x10
[ 33.110254] kthread+0x18b/0x1d0
[ 33.110410] ? __pfx_kthread+0x10/0x10
[ 33.110581] ret_from_fork+0x29/0x50
[ 33.110757] </TASK>
[ 33.110866] irq event stamp: 1651
[ 33.111017] hardirqs last enabled at (1659): [<ffffffffa206a209>] __up_console_sem+0x79/0xa0
[ 33.111379] hardirqs last disabled at (1666): [<ffffffffa206a1ee>] __up_console_sem+0x5e/0xa0
[ 33.111740] softirqs last enabled at (1616): [<ffffffffa1f5d40e>] __irq_exit_rcu+0x9e/0xe0
[ 33.112094] softirqs last disabled at (1367): [<ffffffffa1f5d40e>] __irq_exit_rcu+0x9e/0xe0
[ 33.112453] ---[ end trace 0000000000000000 ]---
This is due to the nft_chain_lookup_byid ignoring the genmask. After this
change, adding the new rule will fail as it will not find the chain.
Fixes: 837830a4b439 ("netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute")
Cc: stable@vger.kernel.org
Reported-by: Mingi Cho of Theori working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Originally this used jhash2() over tuple and folded the zone id,
the pernet hash value, destination port and l4 protocol number into the
32bit seed value.
When the switch to siphash was done, I used an on-stack temporary
buffer to build a suitable key to be hashed via siphash().
But this showed up as performance regression, so I got rid of
the temporary copy and collected to-be-hashed data in 4 u64 variables.
This makes it easy to build tuples that produce the same hash, which isn't
desirable even though chain lengths are limited.
Switch back to plain siphash, but just like with jhash2(), take advantage
of the fact that most of to-be-hashed data is already in a suitable order.
Use an empty struct as annotation in 'struct nf_conntrack_tuple' to mark
last member that can be used as hash input.
The only remaining data that isn't present in the tuple structure are the
zone identifier and the pernet hash: fold those into the key.
Fixes: d2c806abcf0b ("netfilter: conntrack: use siphash_4u64")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
If nf_conntrack_init_start() fails (for example due to a
register_nf_conntrack_bpf() failure), the nf_conntrack_helper_fini()
clean-up path frees the nf_ct_helper_hash map.
When built with NF_CONNTRACK=y, further netfilter modules (e.g:
netfilter_conntrack_ftp) can still be loaded and call
nf_conntrack_helpers_register(), independently of whether nf_conntrack
initialized correctly. This accesses the nf_ct_helper_hash dangling
pointer and causes a uaf, possibly leading to random memory corruption.
This patch guards nf_conntrack_helper_register() from accessing a freed
or uninitialized nf_ct_helper_hash pointer and fixes possible
uses-after-free when loading a conntrack module.
Cc: stable@vger.kernel.org
Fixes: 12f7a505331e ("netfilter: add user-space connection tracking helper infrastructure")
Signed-off-by: Florent Revest <revest@chromium.org>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Now that conntrack core is allowd to insert clashing entries, make sure
GRE won't set assured flag on NAT_CLASH entries, just like UDP.
Doing so prevents early_drop logic for these entries.
Fixes: d671fd82eaa9 ("netfilter: conntrack: allow insertion clash of gre protocol")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Overflow use refcount checks are not complete.
Add helper function to deal with object reference counter tracking.
Report -EMFILE in case UINT_MAX is reached.
nft_use_dec() splats in case that reference counter underflows,
which should not ever happen.
Add nft_use_inc_restore() and nft_use_dec_restore() which are used
to restore reference counter from error and abort paths.
Use u32 in nft_flowtable and nft_object since helper functions cannot
work on bitfields.
Remove the few early incomplete checks now that the helper functions
are in place and used to check for refcount overflow.
Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Since the blamed commit, closing the first subflow resets the first
subflow socket state to SS_UNCONNECTED.
The current mptcp listen implementation relies only on such
state to prevent touching not-fully-disconnected sockets.
Incoming mptcp fastclose (or paired endpoint removal) unconditionally
closes the first subflow.
All the above allows an incoming fastclose followed by a listen() call
to successfully race with a blocking recvmsg(), potentially causing the
latter to hit a divide by zero bug in cleanup_rbuf/__tcp_select_window().
Address the issue explicitly checking the msk socket state in
mptcp_listen(). An alternative solution would be moving the first
subflow socket state update into mptcp_disconnect(), but in the long
term the first subflow socket should be removed: better avoid relaying
on it for internal consistency check.
Fixes: b29fcfb54cd7 ("mptcp: full disconnect implementation")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/414
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While tacking care of the mptcp-level listener I unintentionally
moved the subflow level unhash after the subflow listener backlog
cleanup.
That could cause some nasty race and makes the code harder to read.
Address the issue restoring the proper order of operations.
Fixes: 57fc0f1ceaa4 ("mptcp: ensure listener is unhashed before updating the sk status")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
incl_srcpt has the limitation, mentioned in commit b4638af8885a ("net:
dsa: sja1105: always enable the INCL_SRCPT option"), that frames with a
MAC DA of 01:80:c2:xx:yy:zz will be received as 01:80:c2:00:00:zz unless
PTP RX timestamping is enabled.
The incl_srcpt option was initially unconditionally enabled, then that
changed with commit 42824463d38d ("net: dsa: sja1105: Limit use of
incl_srcpt to bridge+vlan mode"), then again with b4638af8885a ("net:
dsa: sja1105: always enable the INCL_SRCPT option"). Bottom line is that
it now needs to be always enabled, otherwise the driver does not have a
reliable source of information regarding source_port and switch_id for
link-local traffic (tag_8021q VLANs may be imprecise since now they
identify an entire bridging domain when ports are not standalone).
If we accept that PTP RX timestamping (and therefore, meta frame
generation) is always enabled in hardware, then that limitation could be
avoided and packets with any MAC DA can be properly received, because
meta frames do contain the original bytes from the MAC DA of their
associated link-local packet.
This change enables meta frame generation unconditionally, which also
has the nice side effects of simplifying the switch control path
(a switch reset is no longer required on hwtstamping settings change)
and the tagger data path (it no longer needs to be informed whether to
expect meta frames or not - it always does).
Fixes: 227d07a07ef1 ("net: dsa: sja1105: Add support for traffic through standalone ports")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The SJA1105 manual says that at offset 4 into the meta frame payload we
have "MAC destination byte 2" and at offset 5 we have "MAC destination
byte 1". These are counted from the LSB, so byte 1 is h_dest[ETH_HLEN-2]
aka h_dest[4] and byte 2 is h_dest[ETH_HLEN-3] aka h_dest[3].
The sja1105_meta_unpack() function decodes these the other way around,
so a frame with MAC DA 01:80:c2:11:22:33 is received by the network
stack as having 01:80:c2:22:11:33.
Fixes: e53e18a6fe4d ("net: dsa: sja1105: Receive and decode meta frames")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The attribute TCA_PEDIT_PARMS_EX is not be included in pedit_policy and
one malicious user could fake a TCA_PEDIT_PARMS_EX whose length is
smaller than the intended sizeof(struct tc_pedit). Hence, the
dereference in tcf_pedit_init() could access dirty heap data.
static int tcf_pedit_init(...)
{
// ...
pattr = tb[TCA_PEDIT_PARMS]; // TCA_PEDIT_PARMS is included
if (!pattr)
pattr = tb[TCA_PEDIT_PARMS_EX]; // but this is not
// ...
parm = nla_data(pattr);
index = parm->index; // parm is able to be smaller than 4 bytes
// and this dereference gets dirty skb_buff
// data created in netlink_sendmsg
}
This commit adds TCA_PEDIT_PARMS_EX length in pedit_policy which avoid
the above case, just like the TCA_PEDIT_PARMS.
Fixes: 71d0ed7079df ("net/act_pedit: Support using offset relative to the conventional network headers")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20230703110842.590282-1-linma@zju.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Initial creation of an AF_XDP socket requires CAP_NET_RAW capability. A
privileged process might create the socket and pass it to a non-privileged
process for later use. However, that process will be able to bind the socket
to any network interface. Even though it will not be able to receive any
traffic without modification of the BPF map, the situation is not ideal.
Sockets already have a mechanism that can be used to restrict what interface
they can be attached to. That is SO_BINDTODEVICE.
To change the SO_BINDTODEVICE binding the process will need CAP_NET_RAW.
Make xsk_bind() honor the SO_BINDTODEVICE in order to allow safer workflow
when non-privileged process is using AF_XDP.
The intended workflow is following:
1. First process creates a bare socket with socket(AF_XDP, ...).
2. First process loads the XSK program to the interface.
3. First process adds the socket fd to a BPF map.
4. First process ties socket fd to a particular interface using
SO_BINDTODEVICE.
5. First process sends socket fd to a second process.
6. Second process allocates UMEM.
7. Second process binds socket to the interface with bind(...).
8. Second process sends/receives the traffic.
All the steps above are possible today if the first process is privileged
and the second one has sufficient RLIMIT_MEMLOCK and no capabilities.
However, the second process will be able to bind the socket to any interface
it wants on step 7 and send traffic from it. With the proposed change, the
second process will be able to bind the socket only to a specific interface
chosen by the first process at step 4.
Fixes: 965a99098443 ("xsk: add support for bind for Rx")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/bpf/20230703175329.3259672-1-i.maximets@ovn.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds
Pull LED updates from Lee Jones:
"New Drivers:
- Add support for Intel Cherry Trail Whiskey Cove PMIC LEDs
- Add support for Awinic AW20036/AW20054/AW20072 LEDs
New Device Support:
- Add support for PMI632 LPG to QCom LPG
- Add support for PMI8998 to QCom Flash
- Add support for MT6331, WLEDs and MT6332 to Mediatek MT6323 PMIC
New Functionality:
- Implement the LP55xx Charge Pump
- Add support for suspend / resume to Intel Cherry Trail Whiskey Cove PMIC
- Add support for breathing mode to Intel Cherry Trail Whiskey Cove PMIC
- Enable per-pin resolution Pinctrl in LEDs GPIO
Fix-ups:
- Allow thread to sleep by switching from spinlock to mutex
- Add lots of Device Tree bindings / support
- Adapt relationships / dependencies driven by Kconfig
- Switch I2C drivers from .probe_new() to .probe()
- Remove superfluous / duplicate code
- Replace strlcpy() with strscpy() for efficiency and overflow prevention
- Staticify various functions
- Trivial: Fixing coding style
- Simplify / reduce code
Bug Fixes:
- Prevent NETDEV_LED_MODE_LINKUP from being cleared on rename
- Repair race between led_set_brightness(LED_{OFF,FULL})
- Fix Oops relating to sleeping in critical sections
- Clear LED_INIT_DEFAULT_TRIGGER flag when clearing the current trigger
- Do not leak resources in error handling paths
- Fix unsigned comparison which can never be negative
- Provide missing NULL terminating entries in tables
- Fix misnaming issues"
* tag 'leds-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds: (53 commits)
leds: leds-mt6323: Adjust return/parameter types in wled get/set callbacks
leds: sgm3140: Add richtek,rt5033-led compatible
dt-bindings: leds: sgm3140: Document richtek,rt5033 compatible
dt-bindings: backlight: kinetic,ktz8866: Add missing type for "current-num-sinks"
dt-bindings: leds: Drop unneeded quotes
leds: Fix config reference for AW200xx driver
leds: leds-mt6323: Add support for WLEDs and MT6332
leds: leds-mt6323: Add support for MT6331 leds
leds: leds-mt6323: Open code and drop MT6323_CAL_HW_DUTY macro
leds: leds-mt6323: Drop MT6323_ prefix from macros and defines
leds: leds-mt6323: Specify registers and specs in platform data
dt-bindings: leds: leds-mt6323: Document mt6332 compatible
dt-bindings: leds: leds-mt6323: Document mt6331 compatible
leds: simatic-ipc-leds-gpio: Introduce more Kconfig switches
leds: simatic-ipc-leds-gpio: Split up into multiple drivers
leds: simatic-ipc-leds-gpio: Move two extra gpio pins into another table
leds: simatic-ipc-leds-gpio: Add terminating entries to gpio tables
leds: flash: leds-qcom-flash: Fix an unsigned comparison which can never be negative
leds: cht-wcove: Remove unneeded semicolon
leds: cht-wcove: Fix an unsigned comparison which can never be negative
...
|