Age | Commit message (Collapse) | Author |
|
We received a report[1] of kernel crashes when Cilium is used in XDP
mode with virtio_net after updating to newer kernels. After
investigating the reason it turned out that when using mergeable bufs
with an XDP program which adjusts xdp.data or xdp.data_meta page_to_buf()
calculates the build_skb address wrong because the offset can become less
than the headroom so it gets the address of the previous page (-X bytes
depending on how lower offset is):
page_to_skb: page addr ffff9eb2923e2000 buf ffff9eb2923e1ffc offset 252 headroom 256
This is a pr_err() I added in the beginning of page_to_skb which clearly
shows offset that is less than headroom by adding 4 bytes of metadata
via an xdp prog. The calculations done are:
receive_mergeable():
headroom = VIRTIO_XDP_HEADROOM; // VIRTIO_XDP_HEADROOM == 256 bytes
offset = xdp.data - page_address(xdp_page) -
vi->hdr_len - metasize;
page_to_skb():
p = page_address(page) + offset;
...
buf = p - headroom;
Now buf goes -4 bytes from the page's starting address as can be seen
above which is set as skb->head and skb->data by build_skb later. Depending
on what's done with the skb (when it's freed most often) we get all kinds
of corruptions and BUG_ON() triggers in mm[2]. We have to recalculate
the new headroom after the xdp program has run, similar to how offset
and len are recalculated. Headroom is directly related to
data_hard_start, data and data_meta, so we use them to get the new size.
The result is correct (similar pr_err() in page_to_skb, one case of
xdp_page and one case of virtnet buf):
a) Case with 4 bytes of metadata
[ 115.949641] page_to_skb: page addr ffff8b4dcfad2000 offset 252 headroom 252
[ 121.084105] page_to_skb: page addr ffff8b4dcf018000 offset 20732 headroom 252
b) Case of pushing data +32 bytes
[ 153.181401] page_to_skb: page addr ffff8b4dd0c4d000 offset 288 headroom 288
[ 158.480421] page_to_skb: page addr ffff8b4dd00b0000 offset 24864 headroom 288
c) Case of pushing data -33 bytes
[ 835.906830] page_to_skb: page addr ffff8b4dd3270000 offset 223 headroom 223
[ 840.839910] page_to_skb: page addr ffff8b4dcdd68000 offset 12511 headroom 223
Offset and headroom are equal because offset points to the start of
reserved bytes for the virtio_net header which are at buf start +
headroom, while data points at buf start + vnet hdr size + headroom so
when data or data_meta are adjusted by the xdp prog both the headroom size
and the offset change equally. We can use data_hard_start to compute the
new headroom after the xdp prog (linearized / page start case, the
virtnet buf case is similar just with bigger base offset):
xdp.data_hard_start = page_address + vnet_hdr
xdp.data = page_address + vnet_hdr + headroom
new headroom after xdp prog = xdp.data - xdp.data_hard_start - metasize
An example reproducer xdp prog[3] is below.
[1] https://github.com/cilium/cilium/issues/19453
[2] Two of the many traces:
[ 40.437400] BUG: Bad page state in process swapper/0 pfn:14940
[ 40.916726] BUG: Bad page state in process systemd-resolve pfn:053b7
[ 41.300891] kernel BUG at include/linux/mm.h:720!
[ 41.301801] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 41.302784] CPU: 1 PID: 1181 Comm: kubelet Kdump: loaded Tainted: G B W 5.18.0-rc1+ #37
[ 41.304458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
[ 41.306018] RIP: 0010:page_frag_free+0x79/0xe0
[ 41.306836] Code: 00 00 75 ea 48 8b 07 a9 00 00 01 00 74 e0 48 8b 47 48 48 8d 50 ff a8 01 48 0f 45 fa eb d0 48 c7 c6 18 b8 30 a6 e8 d7 f8 fc ff <0f> 0b 48 8d 78 ff eb bc 48 8b 07 a9 00 00 01 00 74 3a 66 90 0f b6
[ 41.310235] RSP: 0018:ffffac05c2a6bc78 EFLAGS: 00010292
[ 41.311201] RAX: 000000000000003e RBX: 0000000000000000 RCX: 0000000000000000
[ 41.312502] RDX: 0000000000000001 RSI: ffffffffa6423004 RDI: 00000000ffffffff
[ 41.313794] RBP: ffff993c98823600 R08: 0000000000000000 R09: 00000000ffffdfff
[ 41.315089] R10: ffffac05c2a6ba68 R11: ffffffffa698ca28 R12: ffff993c98823600
[ 41.316398] R13: ffff993c86311ebc R14: 0000000000000000 R15: 000000000000005c
[ 41.317700] FS: 00007fe13fc56740(0000) GS:ffff993cdd900000(0000) knlGS:0000000000000000
[ 41.319150] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 41.320152] CR2: 000000c00008a000 CR3: 0000000014908000 CR4: 0000000000350ee0
[ 41.321387] Call Trace:
[ 41.321819] <TASK>
[ 41.322193] skb_release_data+0x13f/0x1c0
[ 41.322902] __kfree_skb+0x20/0x30
[ 41.343870] tcp_recvmsg_locked+0x671/0x880
[ 41.363764] tcp_recvmsg+0x5e/0x1c0
[ 41.384102] inet_recvmsg+0x42/0x100
[ 41.406783] ? sock_recvmsg+0x1d/0x70
[ 41.428201] sock_read_iter+0x84/0xd0
[ 41.445592] ? 0xffffffffa3000000
[ 41.462442] new_sync_read+0x148/0x160
[ 41.479314] ? 0xffffffffa3000000
[ 41.496937] vfs_read+0x138/0x190
[ 41.517198] ksys_read+0x87/0xc0
[ 41.535336] do_syscall_64+0x3b/0x90
[ 41.551637] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 41.568050] RIP: 0033:0x48765b
[ 41.583955] Code: e8 4a 35 fe ff eb 88 cc cc cc cc cc cc cc cc e8 fb 7a fe ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
[ 41.632818] RSP: 002b:000000c000a2f5b8 EFLAGS: 00000212 ORIG_RAX: 0000000000000000
[ 41.664588] RAX: ffffffffffffffda RBX: 000000c000062000 RCX: 000000000048765b
[ 41.681205] RDX: 0000000000005e54 RSI: 000000c000e66000 RDI: 0000000000000016
[ 41.697164] RBP: 000000c000a2f608 R08: 0000000000000001 R09: 00000000000001b4
[ 41.713034] R10: 00000000000000b6 R11: 0000000000000212 R12: 00000000000000e9
[ 41.728755] R13: 0000000000000001 R14: 000000c000a92000 R15: ffffffffffffffff
[ 41.744254] </TASK>
[ 41.758585] Modules linked in: br_netfilter bridge veth netconsole virtio_net
and
[ 33.524802] BUG: Bad page state in process systemd-network pfn:11e60
[ 33.528617] page ffffe05dc0147b00 ffffe05dc04e7a00 ffff8ae9851ec000 (1) len 82 offset 252 metasize 4 hroom 0 hdr_len 12 data ffff8ae9851ec10c data_meta ffff8ae9851ec108 data_end ffff8ae9851ec14e
[ 33.529764] page:000000003792b5ba refcount:0 mapcount:-512 mapping:0000000000000000 index:0x0 pfn:0x11e60
[ 33.532463] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
[ 33.532468] raw: 000fffffc0000000 0000000000000000 dead000000000122 0000000000000000
[ 33.532470] raw: 0000000000000000 0000000000000000 00000000fffffdff 0000000000000000
[ 33.532471] page dumped because: nonzero mapcount
[ 33.532472] Modules linked in: br_netfilter bridge veth netconsole virtio_net
[ 33.532479] CPU: 0 PID: 791 Comm: systemd-network Kdump: loaded Not tainted 5.18.0-rc1+ #37
[ 33.532482] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
[ 33.532484] Call Trace:
[ 33.532496] <TASK>
[ 33.532500] dump_stack_lvl+0x45/0x5a
[ 33.532506] bad_page.cold+0x63/0x94
[ 33.532510] free_pcp_prepare+0x290/0x420
[ 33.532515] free_unref_page+0x1b/0x100
[ 33.532518] skb_release_data+0x13f/0x1c0
[ 33.532524] kfree_skb_reason+0x3e/0xc0
[ 33.532527] ip6_mc_input+0x23c/0x2b0
[ 33.532531] ip6_sublist_rcv_finish+0x83/0x90
[ 33.532534] ip6_sublist_rcv+0x22b/0x2b0
[3] XDP program to reproduce(xdp_pass.c):
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
SEC("xdp_pass")
int xdp_pkt_pass(struct xdp_md *ctx)
{
bpf_xdp_adjust_head(ctx, -(int)32);
return XDP_PASS;
}
char _license[] SEC("license") = "GPL";
compile: clang -O2 -g -Wall -target bpf -c xdp_pass.c -o xdp_pass.o
load on virtio_net: ip link set enp1s0 xdpdrv obj xdp_pass.o sec xdp_pass
CC: stable@vger.kernel.org
CC: Jason Wang <jasowang@redhat.com>
CC: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: virtualization@lists.linux-foundation.org
Fixes: 8fb7da9e9907 ("virtio_net: get build_skb() buf by data ptr")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20220425103703.3067292-1-razor@blackwall.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The other port_hidden functions rely on the port_read/port_write
functions to access the hidden control port. These functions apply the
offset for port_base_addr where applicable. Update port_hidden_wait to
use the port_wait_bit so that port_base_addr offsets are accounted for
when waiting for the busy bit to change.
Without the offset the port_hidden_wait function would timeout on
devices that have a non-zero port_base_addr (e.g. MV88E6141), however
devices that have a zero port_base_addr would operate correctly (e.g.
MV88E6390).
Fixes: 609070133aff ("net: dsa: mv88e6xxx: update code operating on hidden registers")
Signed-off-by: Nathan Rossi <nathan@nathanrossi.com>
Reviewed-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220425070454.348584-1-nathan@nathanrossi.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Return back the error value that we get from phy_read_mmd().
Fixes: c84786fa8f91 ("net: phy: marvell10g: read copper results from CSSR1")
Signed-off-by: Baruch Siach <baruch.siach@siklu.com>
Reviewed-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/f47cb031aeae873bb008ba35001607304a171a20.1650868058.git.baruch@tkos.co.il
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The __warn() prototype is declared in CONFIG_BUG scope but the function
definition in panic.c is unconditional. The IBT enablement started using
it unconditionally but a CONFIG_X86_KERNEL_IBT=y, CONFIG_BUG=n .config
will trigger a
arch/x86/kernel/traps.c: In function ‘__exc_control_protection’:
arch/x86/kernel/traps.c:249:17: error: implicit declaration of function \
‘__warn’; did you mean ‘pr_warn’? [-Werror=implicit-function-declaration]
Pull up the declarations so that they're unconditionally visible too.
[ bp: Rewrite commit message. ]
Fixes: 991625f3dd2c ("x86/ibt: Add IBT feature, MSR and #CP handling")
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220426032007.510245-1-starzhangzsd@gmail.com
|
|
The hardware checksum offloading requires use of a transmit
status block inserted before the outgoing frame data, this was
updated in '9a9ba2a4aaaa ("net: bcmgenet: always enable status blocks")'
However, skb_tx_timestamp() assumes that it is passed a raw frame
and PTP parsing chokes on this status block.
Fix this by calling __skb_pull(), which hides the TSB before calling
skb_tx_timestamp(), so an outgoing PTP packet is parsed correctly.
As the data in the skb has already been set up for DMA, and the
dma_unmap_* calls use a separately stored address, there is no
no effective change in the data transmission.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220424165307.591145-1-jonathan.lemon@gmail.com
Fixes: d03825fba459 ("net: bcmgenet: add skb_tx_timestamp call")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The function mctp_unregister() reclaims the device's relevant resource
when a netcard detaches. However, a running routine may be unaware of
this and cause the use-after-free of the mdev->addrs object.
The race condition can be demonstrated below
cleanup thread another thread
|
unregister_netdev() | mctp_sendmsg()
... | ...
mctp_unregister() | rt = mctp_route_lookup()
... | mctl_local_output()
kfree(mdev->addrs) | ...
| saddr = rt->dev->addrs[0];
|
An attacker can adopt the (recent provided) mtcpserial driver with pty
to fake the device detaching and use the userfaultfd to increase the
race success chance (in mctp_sendmsg). The KASan report for such a POC
is shown below:
[ 86.051955] ==================================================================
[ 86.051955] BUG: KASAN: use-after-free in mctp_local_output+0x4e9/0xb7d
[ 86.051955] Read of size 1 at addr ffff888005f298c0 by task poc/295
[ 86.051955]
[ 86.051955] Call Trace:
[ 86.051955] <TASK>
[ 86.051955] dump_stack_lvl+0x33/0x42
[ 86.051955] print_report.cold.13+0xb2/0x6b3
[ 86.051955] ? preempt_schedule_irq+0x57/0x80
[ 86.051955] ? mctp_local_output+0x4e9/0xb7d
[ 86.051955] kasan_report+0xa5/0x120
[ 86.051955] ? mctp_local_output+0x4e9/0xb7d
[ 86.051955] mctp_local_output+0x4e9/0xb7d
[ 86.051955] ? mctp_dev_set_key+0x79/0x79
[ 86.051955] ? copyin+0x38/0x50
[ 86.051955] ? _copy_from_iter+0x1b6/0xf20
[ 86.051955] ? sysvec_apic_timer_interrupt+0x97/0xb0
[ 86.051955] ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 86.051955] ? mctp_local_output+0x1/0xb7d
[ 86.051955] mctp_sendmsg+0x64d/0xdb0
[ 86.051955] ? mctp_sk_close+0x20/0x20
[ 86.051955] ? __fget_light+0x2fd/0x4f0
[ 86.051955] ? mctp_sk_close+0x20/0x20
[ 86.051955] sock_sendmsg+0xdd/0x110
[ 86.051955] __sys_sendto+0x1cc/0x2a0
[ 86.051955] ? __ia32_sys_getpeername+0xa0/0xa0
[ 86.051955] ? new_sync_write+0x335/0x550
[ 86.051955] ? alloc_file+0x22f/0x500
[ 86.051955] ? __ip_do_redirect+0x820/0x1820
[ 86.051955] ? vfs_write+0x44d/0x7b0
[ 86.051955] ? vfs_write+0x44d/0x7b0
[ 86.051955] ? fput_many+0x15/0x120
[ 86.051955] ? ksys_write+0x155/0x1b0
[ 86.051955] ? __ia32_sys_read+0xa0/0xa0
[ 86.051955] __x64_sys_sendto+0xd8/0x1b0
[ 86.051955] ? exit_to_user_mode_prepare+0x2f/0x120
[ 86.051955] ? syscall_exit_to_user_mode+0x12/0x20
[ 86.051955] do_syscall_64+0x3a/0x80
[ 86.051955] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 86.051955] RIP: 0033:0x7f82118a56b3
[ 86.051955] RSP: 002b:00007ffdb154b110 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
[ 86.051955] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f82118a56b3
[ 86.051955] RDX: 0000000000000010 RSI: 00007f8211cd4000 RDI: 0000000000000007
[ 86.051955] RBP: 00007ffdb154c1d0 R08: 00007ffdb154b164 R09: 000000000000000c
[ 86.051955] R10: 0000000000000000 R11: 0000000000000293 R12: 000055d779800db0
[ 86.051955] R13: 00007ffdb154c2b0 R14: 0000000000000000 R15: 0000000000000000
[ 86.051955] </TASK>
[ 86.051955]
[ 86.051955] Allocated by task 295:
[ 86.051955] kasan_save_stack+0x1c/0x40
[ 86.051955] __kasan_kmalloc+0x84/0xa0
[ 86.051955] mctp_rtm_newaddr+0x242/0x610
[ 86.051955] rtnetlink_rcv_msg+0x2fd/0x8b0
[ 86.051955] netlink_rcv_skb+0x11c/0x340
[ 86.051955] netlink_unicast+0x439/0x630
[ 86.051955] netlink_sendmsg+0x752/0xc00
[ 86.051955] sock_sendmsg+0xdd/0x110
[ 86.051955] __sys_sendto+0x1cc/0x2a0
[ 86.051955] __x64_sys_sendto+0xd8/0x1b0
[ 86.051955] do_syscall_64+0x3a/0x80
[ 86.051955] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 86.051955]
[ 86.051955] Freed by task 301:
[ 86.051955] kasan_save_stack+0x1c/0x40
[ 86.051955] kasan_set_track+0x21/0x30
[ 86.051955] kasan_set_free_info+0x20/0x30
[ 86.051955] __kasan_slab_free+0x104/0x170
[ 86.051955] kfree+0x8c/0x290
[ 86.051955] mctp_dev_notify+0x161/0x2c0
[ 86.051955] raw_notifier_call_chain+0x8b/0xc0
[ 86.051955] unregister_netdevice_many+0x299/0x1180
[ 86.051955] unregister_netdevice_queue+0x210/0x2f0
[ 86.051955] unregister_netdev+0x13/0x20
[ 86.051955] mctp_serial_close+0x6d/0xa0
[ 86.051955] tty_ldisc_kill+0x31/0xa0
[ 86.051955] tty_ldisc_hangup+0x24f/0x560
[ 86.051955] __tty_hangup.part.28+0x2ce/0x6b0
[ 86.051955] tty_release+0x327/0xc70
[ 86.051955] __fput+0x1df/0x8b0
[ 86.051955] task_work_run+0xca/0x150
[ 86.051955] exit_to_user_mode_prepare+0x114/0x120
[ 86.051955] syscall_exit_to_user_mode+0x12/0x20
[ 86.051955] do_syscall_64+0x46/0x80
[ 86.051955] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 86.051955]
[ 86.051955] The buggy address belongs to the object at ffff888005f298c0
[ 86.051955] which belongs to the cache kmalloc-8 of size 8
[ 86.051955] The buggy address is located 0 bytes inside of
[ 86.051955] 8-byte region [ffff888005f298c0, ffff888005f298c8)
[ 86.051955]
[ 86.051955] The buggy address belongs to the physical page:
[ 86.051955] flags: 0x100000000000200(slab|node=0|zone=1)
[ 86.051955] raw: 0100000000000200 dead000000000100 dead000000000122 ffff888005c42280
[ 86.051955] raw: 0000000000000000 0000000080660066 00000001ffffffff 0000000000000000
[ 86.051955] page dumped because: kasan: bad access detected
[ 86.051955]
[ 86.051955] Memory state around the buggy address:
[ 86.051955] ffff888005f29780: 00 fc fc fc fc 00 fc fc fc fc 00 fc fc fc fc 00
[ 86.051955] ffff888005f29800: fc fc fc fc 00 fc fc fc fc 00 fc fc fc fc 00 fc
[ 86.051955] >ffff888005f29880: fc fc fc fb fc fc fc fc fa fc fc fc fc fa fc fc
[ 86.051955] ^
[ 86.051955] ffff888005f29900: fc fc 00 fc fc fc fc 00 fc fc fc fc 00 fc fc fc
[ 86.051955] ffff888005f29980: fc 00 fc fc fc fc 00 fc fc fc fc 00 fc fc fc fc
[ 86.051955] ==================================================================
To this end, just like the commit e04480920d1e ("Bluetooth: defer
cleanup of resources in hci_unregister_dev()") this patch defers the
destructive kfree(mdev->addrs) in mctp_unregister to the mctp_dev_put,
where the refcount of mdev is zero and the entire device is reclaimed.
This prevents the use-after-free because the sendmsg thread holds the
reference of mdev in the mctp_route object.
Fixes: 583be982d934 (mctp: Add device handling and netlink interface)
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Acked-by: Jeremy Kerr <jk@codeconstruct.com.au>
Link: https://lore.kernel.org/r/20220422114340.32346-1-linma@zju.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
plane_state->uapi.crtc is not what we want to be looking at.
If bigjoiner is used hw.crtc is what tells us what crtc the plane
is supposedly using.
Not an actual problem on current hardware as the only FBC capable
pipe (A) can't be a bigjoiner slave and thus uapi.crtc==hw.crtc
always here. But when we get more FBC instances this will become
actually important.
Fixes: 2e6c99f88679 ("drm/i915/fbc: Nuke lots of crap from intel_fbc_state_cache")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220413152852.7336-1-ville.syrjala@linux.intel.com
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
(cherry picked from commit 3e1faae3398789abe8d4797255bfe28d95d81308)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
Fix typo in the _SEL_FETCH_PLANE_BASE_1_B register base address.
Fixes: a5523e2ff074a5 ("drm/i915: Add PSR2 selective fetch registers")
References: https://gitlab.freedesktop.org/drm/intel/-/issues/5400
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: <stable@vger.kernel.org> # v5.9+
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220421162221.2261895-1-imre.deak@intel.com
(cherry picked from commit af2cbc6ef967f61711a3c40fca5366ea0bc7fecc)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
It's noted that dcvs interrupts are not self-clearing, thus an interrupt
handler runs constantly, which leads to a severe regression in runtime.
To fix the problem an explicit write to clear interrupt register is
required, note that on OSM platforms the register may not be present.
Fixes: 275157b367f4 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
'size' may be used uninitialized in gsm_dlci_modem_output() if called with
an adaption that is neither 1 nor 2. The function is currently only called
by gsm_modem_upd_via_data() and only for adaption 2.
Properly handle every invalid case by returning -EINVAL to silence the
compiler warning and avoid future regressions.
Fixes: c19ffe00fed6 ("tty: n_gsm: fix invalid use of MSC in advanced option")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Daniel Starke <daniel.starke@siemens.com>
Link: https://lore.kernel.org/r/20220425104726.7986-1-daniel.starke@siemens.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into clk-fixes
Pull Allwinner clk fixes from Jernej Skrabec:
- Add missing sentinel
- check return value for platform_get_resource()
- mark rtc-32k as critical
* tag 'sunxi-clk-fixes-for-5.18-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
clk: sunxi: sun9i-mmc: check return value after calling platform_get_resource()
clk: sunxi-ng: sun6i-rtc: Mark rtc-32k as critical
clk: sunxi-ng: fix not NULL terminated coccicheck error
|
|
Since version 5.13, the standard syscon bindings have been added
to all clps711x DT nodes, so we can now use the more general
syscon_regmap_lookup_by_phandle function to get the syscon pointer.
Signed-off-by: Alexander Shiyan <eagle.alexander923@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Wen Gu says:
====================
net/smc: Two fixes for smc fallback
This patch set includes two fixes for smc fallback:
Patch 1/2 introduces some simple helpers to wrap the replacement
and restore of clcsock's callback functions. Make sure that only
the original callbacks will be saved and not overwritten.
Patch 2/2 fixes a syzbot reporting slab-out-of-bound issue where
smc_fback_error_report() accesses the already freed smc sock (see
https://lore.kernel.org/r/00000000000013ca8105d7ae3ada@google.com/).
The patch fixes it by resetting sk_user_data and restoring clcsock
callback functions timely in fallback situation.
But it should be noted that although patch 2/2 can fix the issue
of 'slab-out-of-bounds/use-after-free in smc_fback_error_report',
it can't pass the syzbot reproducer test. Because after applying
these two patches in upstream, syzbot reproducer triggered another
known issue like this:
==================================================================
BUG: KASAN: use-after-free in tcp_retransmit_timer+0x2ef3/0x3360 net/ipv4/tcp_timer.c:511
Read of size 8 at addr ffff888020328380 by task udevd/4158
CPU: 1 PID: 4158 Comm: udevd Not tainted 5.18.0-rc3-syzkaller-00074-gb05a5683eba6-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description.constprop.0.cold+0xeb/0x467 mm/kasan/report.c:313
print_report mm/kasan/report.c:429 [inline]
kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491
tcp_retransmit_timer+0x2ef3/0x3360 net/ipv4/tcp_timer.c:511
tcp_write_timer_handler+0x5e6/0xbc0 net/ipv4/tcp_timer.c:622
tcp_write_timer+0xa2/0x2b0 net/ipv4/tcp_timer.c:642
call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1421
expire_timers kernel/time/timer.c:1466 [inline]
__run_timers.part.0+0x679/0xa80 kernel/time/timer.c:1737
__run_timers kernel/time/timer.c:1715 [inline]
run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1750
__do_softirq+0x29b/0x9c2 kernel/softirq.c:558
invoke_softirq kernel/softirq.c:432 [inline]
__irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1097
</IRQ>
...
(detail report can be found in https://syzkaller.appspot.com/text?tag=CrashReport&x=15406b44f00000)
IMHO, the above issue is the same as this known one: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed,
and it doesn't seem to be related with SMC. The discussion about this known issue is ongoing and can be found in
https://lore.kernel.org/bpf/000000000000f75af905d3ba0716@google.com/T/.
And I added the temporary solution mentioned in the above discussion on
top of my two patches, the syzbot reproducer of 'slab-out-of-bounds/
use-after-free in smc_fback_error_report' no longer triggers any issue.
====================
Link: https://lore.kernel.org/r/1650614179-11529-1-git-send-email-guwen@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot reported a slab-out-of-bounds/use-after-free issue,
which was caused by accessing an already freed smc sock in
fallback-specific callback functions of clcsock.
This patch fixes the issue by restoring fallback-specific
callback functions to original ones and resetting clcsock
sk_user_data to NULL before freeing smc sock.
Meanwhile, this patch introduces sk_callback_lock to make
the access and assignment to sk_user_data mutually exclusive.
Reported-by: syzbot+b425899ed22c6943e00b@syzkaller.appspotmail.com
Fixes: 341adeec9ada ("net/smc: Forward wakeup to smc socket waitqueue after fallback")
Link: https://lore.kernel.org/r/00000000000013ca8105d7ae3ada@google.com/
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Both listen and fallback process will save the current clcsock
callback functions and establish new ones. But if both of them
happen, the saved callback functions will be overwritten.
So this patch introduces some helpers to ensure that only save
the original callback functions of clcsock.
Fixes: 341adeec9ada ("net/smc: Forward wakeup to smc socket waitqueue after fallback")
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs fixes from Jaegeuk Kim:
"This includes major bug fixes introduced in 5.18-rc1 and 5.17+:
- Remove obsolete whint_mode (5.18-rc1)
- Fix IO split issue caused by op_flags change in f2fs (5.18-rc1)
- Fix a wrong condition check to detect IO failure loop (5.18-rc1)
- Fix wrong data truncation during roll-forward (5.17+)"
* tag 'f2fs-fix-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: should not truncate blocks during roll-forward recovery
f2fs: fix wrong condition check when failing metapage read
f2fs: keep io_flags to avoid IO split due to different op_flags in two fio holders
f2fs: remove obsolete whint_mode
|
|
It will cause null-ptr-deref if platform_get_resource() returns NULL,
we need check the return value.
Fixes: 7a6fca879f59 ("clk: sunxi: Add driver for A80 MMC config clocks/resets")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://lore.kernel.org/r/20220421134308.2885094-1-yangyingliang@huawei.com
|
|
This code is really spurious.
It always returns an ERR_PTR, even when err is known to be 0 and calls
put_device() after a successful device_register() call.
It is likely that the return statement in the normal path is missing.
Add 'return rdev;' to fix it.
Fixes: d787dcdb9c8f ("bus: sunxi-rsb: Add driver for Allwinner Reduced Serial Bus")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Samuel Holland <samuel@sholland.org>
Tested-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://lore.kernel.org/r/ef2b9576350bba4c8e05e669e9535e9e2a415763.1650551719.git.christophe.jaillet@wanadoo.fr
|
|
It turns out that for the CONFIG_MMU=n builds, vmalloc_huge() was never
defined, since it's defined in mm/vmalloc.c, which doesn't get built for
the no-MMU configurations.
Just implement the trivial wrapper for the no-MMU case too. In fact,
just make it an alias to the existing __vmalloc() function that has the
same signature.
Link: https://lore.kernel.org/all/CAMuHMdVdx2V1uhv_152Sw3_z2xE0spiaWp1d6Ko8-rYmAxUBAg@mail.gmail.com/
Link: https://lore.kernel.org/all/CA+G9fYscb1y4a17Sf5G_Aibt+WuSf-ks_Qjw9tYFy=A4sjCEug@mail.gmail.com/
Link: https://lore.kernel.org/all/20220425150356.GA4138752@roeck-us.net/
Reported-and-tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
functions
Fix the documentation for the hsiphash functions to avoid conflating the
HalfSipHash algorithm with the hsiphash functions, since these functions
actually implement either HalfSipHash or SipHash, and random.c now uses
HalfSipHash (in a very special way) without the hsiphash functions.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Render usage example of HalfSipHash function as code block by using
literal block syntax.
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Render danger paragraph into warning block for emphasization.
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This reverts 35a33ff3807d ("random: use memmove instead of memcpy for
remaining 32 bytes"), which was made on a totally bogus basis. The thing
it was worried about overlapping came from the stack, not from one of
its arguments, as Eric pointed out.
But the fact that this confusion even happened draws attention to the
fact that it's a bit non-obvious that the random_data parameter can
alias chacha_state, and in fact should do so when the caller can't rely
on the stack being cleared in a timely manner. So this commit documents
that.
Reported-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This reverts commit 1a67653de0dd, which caused a boot regression.
The behavior of the "drive-push-pull" in the kernel does not
match what the binding document describes. Revert Rob's patch
to make the DT match the kernel again, rather than the binding.
Link: https://lore.kernel.org/lkml/YlVAy95eF%2F9b1nmu@orome/
Reported-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
syzbot complained about a recent change in TCP stack,
hitting a NULL pointer [1]
tcp request sockets have an af_specific pointer, which
was used before the blamed change only for SYNACK generation
in non SYNCOOKIE mode.
tcp requests sockets momentarily created when third packet
coming from client in SYNCOOKIE mode were not using
treq->af_specific.
Make sure this field is populated, in the same way normal
TCP requests sockets do in tcp_conn_request().
[1]
TCP: request_sock_TCPv6: Possible SYN flooding on port 20002. Sending cookies. Check SNMP counters.
general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 1 PID: 3695 Comm: syz-executor864 Not tainted 5.18.0-rc3-syzkaller-00224-g5fd1fe4807f9 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:tcp_create_openreq_child+0xe16/0x16b0 net/ipv4/tcp_minisocks.c:534
Code: 48 c1 ea 03 80 3c 02 00 0f 85 e5 07 00 00 4c 8b b3 28 01 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7e 08 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 c9 07 00 00 48 8b 3c 24 48 89 de 41 ff 56 08 48
RSP: 0018:ffffc90000de0588 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff888076490330 RCX: 0000000000000100
RDX: 0000000000000001 RSI: ffffffff87d67ff0 RDI: 0000000000000008
RBP: ffff88806ee1c7f8 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff87d67f00 R11: 0000000000000000 R12: ffff88806ee1bfc0
R13: ffff88801b0e0368 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f517fe58700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffcead76960 CR3: 000000006f97b000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
tcp_v6_syn_recv_sock+0x199/0x23b0 net/ipv6/tcp_ipv6.c:1267
tcp_get_cookie_sock+0xc9/0x850 net/ipv4/syncookies.c:207
cookie_v6_check+0x15c3/0x2340 net/ipv6/syncookies.c:258
tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1131 [inline]
tcp_v6_do_rcv+0x1148/0x13b0 net/ipv6/tcp_ipv6.c:1486
tcp_v6_rcv+0x3305/0x3840 net/ipv6/tcp_ipv6.c:1725
ip6_protocol_deliver_rcu+0x2e9/0x1900 net/ipv6/ip6_input.c:422
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:464
NF_HOOK include/linux/netfilter.h:307 [inline]
NF_HOOK include/linux/netfilter.h:301 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:473
dst_input include/net/dst.h:461 [inline]
ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline]
NF_HOOK include/linux/netfilter.h:307 [inline]
NF_HOOK include/linux/netfilter.h:301 [inline]
ipv6_rcv+0x27f/0x3b0 net/ipv6/ip6_input.c:297
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5405
__netif_receive_skb+0x24/0x1b0 net/core/dev.c:5519
process_backlog+0x3a0/0x7c0 net/core/dev.c:5847
__napi_poll+0xb3/0x6e0 net/core/dev.c:6413
napi_poll net/core/dev.c:6480 [inline]
net_rx_action+0x8ec/0xc60 net/core/dev.c:6567
__do_softirq+0x29b/0x9c2 kernel/softirq.c:558
invoke_softirq kernel/softirq.c:432 [inline]
__irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1097
Fixes: 5b0b9e4c2c89 ("tcp: md5: incorrect tcp_header_len for incoming connections")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I had this bug sitting for too long in my pile, it is time to fix it.
Thanks to Doug Porter for reminding me of it!
We had various attempts in the past, including commit
0cbe6a8f089e ("tcp: remove SOCK_QUEUE_SHRUNK"),
but the issue is that TCP stack currently only generates
EPOLLOUT from input path, when tp->snd_una has advanced
and skb(s) cleaned from rtx queue.
If a flow has a big RTT, and/or receives SACKs, it is possible
that the notsent part (tp->write_seq - tp->snd_nxt) reaches 0
and no more data can be sent until tp->snd_una finally advances.
What is needed is to also check if POLLOUT needs to be generated
whenever tp->snd_nxt is advanced, from output path.
This bug triggers more often after an idle period, as
we do not receive ACK for at least one RTT. tcp_notsent_lowat
could be a fraction of what CWND and pacing rate would allow to
send during this RTT.
In a followup patch, I will remove the bogus call
to tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED)
from tcp_check_space(). Fact that we have decided to generate
an EPOLLOUT does not mean the application has immediately
refilled the transmit queue. This optimistic call
might have been the reason the bug seemed not too serious.
Tested:
200 ms rtt, 1% packet loss, 32 MB tcp_rmem[2] and tcp_wmem[2]
$ echo 500000 >/proc/sys/net/ipv4/tcp_notsent_lowat
$ cat bench_rr.sh
SUM=0
for i in {1..10}
do
V=`netperf -H remote_host -l30 -t TCP_RR -- -r 10000000,10000 -o LOCAL_BYTES_SENT | egrep -v "MIGRATED|Bytes"`
echo $V
SUM=$(($SUM + $V))
done
echo SUM=$SUM
Before patch:
$ bench_rr.sh
130000000
80000000
140000000
140000000
140000000
140000000
130000000
40000000
90000000
110000000
SUM=1140000000
After patch:
$ bench_rr.sh
430000000
590000000
530000000
450000000
450000000
350000000
450000000
490000000
480000000
460000000
SUM=4680000000 # This is 410 % of the value before patch.
Fixes: c9bee3b7fdec ("tcp: TCP_NOTSENT_LOWAT socket option")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Doug Porter <dsp@fb.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
bridge
DSA, through dsa_port_bridge_leave(), first notifies the port of the
fact that it left a bridge, then, if that bridge was VLAN-aware, it
notifies the port of the change in VLAN awareness state, towards
VLAN-unaware mode.
So ocelot_port_vlan_filtering() can be called when ocelot_port->bridge
is NULL, and this makes ocelot_add_vlan_unaware_pvid() create a struct
ocelot_bridge_vlan with a vid of 0 and an "untagged" setting of true on
that port.
In a way this structure correctly reflects the reality, but by design,
VID 0 (OCELOT_STANDALONE_PVID) was not meant to be kept in the bridge
VLAN list of the driver, but managed separately.
Having OCELOT_STANDALONE_PVID in ocelot->vlans makes us trip up on
several sanity checks that did not expect to have this VID there.
For example, after we leave a VLAN-aware bridge and we re-join it, we
can no longer program egress-tagged VLANs to hardware:
# ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
# ip link set swp0 master br0
# ip link set swp0 nomaster
# ip link set swp0 master br0
# bridge vlan add dev swp0 vid 100
Error: mscc_ocelot_switch_lib: Port with more than one egress-untagged VLAN cannot have egress-tagged VLANs.
But this configuration is in fact supported by the hardware, since we
could use OCELOT_PORT_TAG_NATIVE. According to its comment:
/* all VLANs except the native VLAN and VID 0 are egress-tagged */
yet when assessing the eligibility for this mode, we do not check for
VID 0 in ocelot_port_uses_native_vlan(), instead we just ensure that
ocelot_port_num_untagged_vlans() == 1. This is simply because VID 0
doesn't have a bridge VLAN structure.
The way I identify the problem is that ocelot_port_vlan_filtering(false)
only means to call ocelot_add_vlan_unaware_pvid() when we dynamically
turn off VLAN awareness for a bridge we are under, and the PVID changes
from the bridge PVID to a reserved PVID based on the bridge number.
Since OCELOT_STANDALONE_PVID is statically added to the VLAN table
during ocelot_vlan_init() and never removed afterwards, calling
ocelot_add_vlan_unaware_pvid() for it is not intended and does not serve
any purpose.
Fix the issue by avoiding the call to ocelot_add_vlan_unaware_pvid(vid=0)
when we're resetting VLAN awareness after leaving the bridge, to become
a standalone port.
Fixes: 54c319846086 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Both the felix DSA driver and ocelot switchdev driver declare
dev->features & NETIF_F_HW_VLAN_CTAG_FILTER under certain circumstances*,
so the 8021q module will add VID 0 to our RX filter when the port goes
up, to ensure 802.1p traffic is not dropped.
We treat VID 0 as a special value (OCELOT_STANDALONE_PVID) which
deliberately does not have a struct ocelot_bridge_vlan associated with
it. Instead, this gets programmed to the VLAN table in ocelot_vlan_init().
If we allow external calls to modify VID 0, we reach the following
situation:
# ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
# ip link set swp0 master br0
# ip link set swp0 up # this adds VID 0 to ocelot->vlans with untagged=false
bridge vlan
port vlan-id
swp0 1 PVID Egress Untagged # the bridge also adds VID 1
br0 1 PVID Egress Untagged
# bridge vlan add dev swp0 vid 100 untagged
Error: mscc_ocelot_switch_lib: Port with egress-tagged VLANs cannot have more than one egress-untagged (native) VLAN.
This configuration should have been accepted, because
ocelot_port_manage_port_tag() should select OCELOT_PORT_TAG_NATIVE.
Yet it isn't, because we have an entry in ocelot->vlans which says
VID 0 should be egress-tagged, something the hardware can't do.
Fix this by suppressing additions/deletions on VID 0 and managing this
VLAN exclusively using OCELOT_STANDALONE_PVID.
*DSA toggles it when the port becomes VLAN-aware by joining a VLAN-aware
bridge. Ocelot declares it unconditionally for some reason.
Fixes: 54c319846086 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Certain DSA switches can eliminate flooding to the CPU when none of the
ports have the IFF_ALLMULTI or IFF_PROMISC flags set. This is done by
synthesizing a call to dsa_port_bridge_flags() for the CPU port, a call
which normally comes from the bridge driver via switchdev.
The bridge port flags and IFF_PROMISC|IFF_ALLMULTI have slightly
different semantics, and due to inattention/lack of proper testing, the
IFF_PROMISC flag allows unknown unicast to be flooded to the CPU, but
not unknown multicast.
This must be fixed by setting both BR_FLOOD (unicast) and BR_MCAST_FLOOD
in the synthesized dsa_port_bridge_flags() call, since IFF_PROMISC means
that packets should not be filtered regardless of their MAC DA.
Fixes: 7569459a52c9 ("net: dsa: manage flooding on the CPU ports")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As pointed out by Jakub Kicinski, currently using TUNNEL_SEQ in
collect_md mode is racy for [IP6]GRE[TAP] devices. Consider the
following sequence of events:
1. An [IP6]GRE[TAP] device is created in collect_md mode using "ip link
add ... external". "ip" ignores "[o]seq" if "external" is specified,
so TUNNEL_SEQ is off, and the device is marked as NETIF_F_LLTX (i.e.
it uses lockless TX);
2. Someone sets TUNNEL_SEQ on outgoing skb's, using e.g.
bpf_skb_set_tunnel_key() in an eBPF program attached to this device;
3. gre_fb_xmit() or __gre6_xmit() processes these skb's:
gre_build_header(skb, tun_hlen,
flags, protocol,
tunnel_id_to_key32(tun_info->key.tun_id),
(flags & TUNNEL_SEQ) ? htonl(tunnel->o_seqno++)
: 0); ^^^^^^^^^^^^^^^^^
Since we are not using the TX lock (&txq->_xmit_lock), multiple CPUs may
try to do this tunnel->o_seqno++ in parallel, which is racy. Fix it by
making o_seqno atomic_t.
As mentioned by Eric Dumazet in commit b790e01aee74 ("ip_gre: lockless
xmit"), making o_seqno atomic_t increases "chance for packets being out
of order at receiver" when NETIF_F_LLTX is on.
Maybe a better fix would be:
1. Do not ignore "oseq" in external mode. Users MUST specify "oseq" if
they want the kernel to allow sequencing of outgoing packets;
2. Reject all outgoing TUNNEL_SEQ packets if the device was not created
with "oseq".
Unfortunately, that would break userspace.
We could now make [IP6]GRE[TAP] devices always NETIF_F_LLTX, but let us
do it in separate patches to keep this fix minimal.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Fixes: 77a5196a804e ("gre: add sequence number for collect md mode.")
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For IP6GRE and IP6GRETAP devices, currently o_seqno starts from 1 in
native mode. According to RFC 2890 2.2., "The first datagram is sent
with a sequence number of 0." Fix it.
It is worth mentioning that o_seqno already starts from 0 in collect_md
mode, see the "if (tunnel->parms.collect_md)" clause in __gre6_xmit(),
where tunnel->o_seqno is passed to gre_build_header() before getting
incremented.
Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For GRE and GRETAP devices, currently o_seqno starts from 1 in native
mode. According to RFC 2890 2.2., "The first datagram is sent with a
sequence number of 0." Fix it.
It is worth mentioning that o_seqno already starts from 0 in collect_md
mode, see gre_fb_xmit(), where tunnel->o_seqno is passed to
gre_build_header() before getting incremented.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The lan966x->ports[] array has lan966x->num_phys_ports elements. These
are assigned in lan966x_probe(). That means the > comparison should be
changed to >=.
The first off by one check is harmless but the second one could lead to
an out of bounds access and a crash.
Fixes: 5ccd66e01cbe ("net: lan966x: add support for interrupts from analyzer")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the current implementation, when TCP initiates a connection
to an unavailable [ip,port], ECONNREFUSED will be stored in the
TCP socket, but SMC will not. However, some apps (like curl) use
getsockopt(,,SO_ERROR,,) to get the error information, which makes
them miss the error message and behave strangely.
Fixes: 50717a37db03 ("net/smc: nonblocking connect rework")
Signed-off-by: liuyacan <liuyacan@corp.netease.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In one of the error paths of the device_for_each_child_node() loop
in hns_mac_init, add missing call to fwnode_handle_put.
Signed-off-by: Peng Wu <wupeng58@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Guangbin Huang says:
====================
net: hns3: add some fixes for -net
This series adds some fixes for the HNS3 ethernet driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, there are some querying mailboxes sent from VF to PF,
and VF will wait the PF's handling result. For mailbox
HCLGE_MBX_GET_QID_IN_PF and HCLGE_MBX_GET_RSS_KEY, it may fail
when the input parameter is invalid, but the prototype of their
handler function is void. In this case, PF always return success
to VF, which may cause the VF get incorrect result.
Fixes it by adding return value for these function.
Fixes: 63b1279d9905 ("net: hns3: check queue id range before using")
Fixes: 532cfc0df1e4 ("net: hns3: add a check for index in hclge_get_rss_key()")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add validity check for message data length in function
hclge_send_mbx_msg(), avoid unexpected overflow.
Fixes: dde1a86e93ca ("net: hns3: Add mailbox support to PF driver")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, function hclge_get_ring_chain_from_mbx will return -ENOMEM if
ring_num is bigger than HCLGE_MBX_MAX_RING_CHAIN_PARAM_NUM. It is better to
return -EINVAL for the invalid parameter case.
So this patch fixes it by return -EINVAL in this abnormal branch.
Fixes: 5d02a58dae60 ("net: hns3: fix for buffer overflow smatch warning")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The comments in function hclge_comm_tqps_update_stats is not right,
so fix it.
Fixes: 287db5c40d15 ("net: hns3: create new set of common tqp stats APIs for PF and VF reuse")
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For debugfs node rx/tx_queue_info and rx/tx_bd_info, their output info is
aligned to the right, it's not aligned with output of other debugfs node,
so uniform their output info.
Fixes: 907676b13071 ("net: hns3: use tx bounce buffer for small packets")
Fixes: e44c495d95e0 ("net: hns3: refactor queue info of debugfs")
Fixes: 77e9184869c9 ("net: hns3: refactor dump bd info of debugfs")
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If failed to register netdev, it needs to clear INITED state and stop
client in case of cause problem when concurrency with uninitialized
process of driver.
Fixes: a289a7e5c1d4 ("net: hns3: put off calling register_netdev() until client initialize complete")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Fix incorrect printing of memory size of IPVS connection hash table,
from Pengcheng Yang.
2) Fix spurious EEXIST errors in nft_set_rbtree.
3) Remove leftover empty flowtable file, from Rongguang Wei.
4) Fix ip6_route_me_harder() with vrf driver, from Martin Willi.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The commit referenced below fixed packet re-routing if Netfilter mangles
a routing key property of a packet and the packet is routed in a VRF L3
domain. The fix, however, addressed IPv4 re-routing, only.
This commit applies the same behavior for IPv6. While at it, untangle
the nested ternary operator to make the code more readable.
Fixes: 6d8b49c3a3a3 ("netfilter: Update ip_route_me_harder to consider L3 domain")
Cc: stable@vger.kernel.org
Signed-off-by: Martin Willi <martin@strongswan.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The request will be inserted into the ci->i_unsafe_dirops before
assigning the req->r_session, so it's possible that we will hit
NULL pointer dereference bug here.
Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/55327
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Once the session is opened the s->s_ttl will be set, and when receiving
a new mdsmap and the MDS map is changed, it will be possibly will close
some sessions and open new ones. And then some sessions will be in
CLOSING state evening without unmounting.
URL: https://tracker.ceph.com/issues/54979
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
ceph_add_cap says in its function documentation that the caller should
hold the read lock on the session snap_rwsem. Furthermore, not only
ceph_add_cap needs that lock, when it calls to ceph_lookup_snap_realm it
eventually calls ceph_get_snap_realm which states via lockdep that
snap_rwsem needs to be held. handle_cap_export calls ceph_add_cap
without that mdsc->snap_rwsem held. Thus, since ceph_get_snap_realm
and ceph_add_cap both need the lock, the common place to acquire that
lock is inside handle_cap_export.
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
CONFIG_NF_FLOW_TABLE_IPV4 is already removed and the real user is also
removed(nf_flow_table_ipv4.c is empty).
Fixes: c42ba4290b2147aa ("netfilter: flowtable: remove ipv4/ipv6 modules")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
--------------------------------------------
unreferenced object 0xffff000010742a00 (size 128):
comm "swapper/0", pid 1, jiffies 4294902015 (age 1187.652s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000b4dfebaa>] __kmalloc+0x338/0x474
[<00000000d6e716db>] sun50i_cpufreq_nvmem_probe+0xc4/0x36c
[<000000007d6082a0>] platform_probe+0x98/0x11c
[<00000000c990f549>] really_probe+0x234/0x5a0
[<000000002d9fecc6>] __driver_probe_device+0x194/0x224
[<00000000cf0b94fa>] driver_probe_device+0x64/0x13c
[<00000000f238e4cf>] __device_attach_driver+0xf8/0x180
[<000000006720e418>] bus_for_each_drv+0xf8/0x160
[<00000000df4f14f6>] __device_attach+0x174/0x29c
[<00000000782002fb>] device_initial_probe+0x20/0x30
[<00000000c2681b06>] bus_probe_device+0xfc/0x110
[<00000000964cf3bd>] device_add+0x5f0/0xcd0
[<000000004b9264e3>] platform_device_add+0x198/0x390
[<00000000fa82a9d0>] platform_device_register_full+0x178/0x210
[<000000009a5daf13>] sun50i_cpufreq_init+0xf8/0x168
[<000000000377cc7c>] do_one_initcall+0xe4/0x570
--------------------------------------------
if sun50i_cpufreq_get_efuse failed, then opp_tables leak.
Fixes: f328584f7bff ("cpufreq: Add sun50i nvmem based CPU scaling driver")
Signed-off-by: Xiaobing Luo <luoxiaobing0926@gmail.com>
Reviewed-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|