summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-09-06Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-09-06 We've added 9 non-merge commits during the last 6 day(s) which contain a total of 12 files changed, 189 insertions(+), 44 deletions(-). The main changes are: 1) Fix bpf_sk_storage to address an invalid wait context lockdep report and another one to address missing omem uncharge, from Martin KaFai Lau. 2) Two BPF recursion detection related fixes, from Sebastian Andrzej Siewior. 3) Fix tailcall limit enforcement in trampolines for s390 JIT, from Ilya Leoshkevich. 4) Fix a sockmap refcount race where skbs in sk_psock_backlog can be referenced after user space side has already skb_consumed them, from John Fastabend. 5) Fix BPF CI flake/race wrt sockmap vsock write test where the transport endpoint is not connected, from Xu Kuohai. 6) Follow-up doc fix to address a cross-link warning, from Eduard Zingerman. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc bpf: bpf_sk_storage: Fix invalid wait context lockdep report s390/bpf: Pass through tail call counter in trampolines bpf: Assign bpf_tramp_run_ctx::saved_run_ctx before recursion check. bpf: Invoke __bpf_prog_exit_sleepable_recur() on recursion in kern_sys_bpf(). bpf, sockmap: Fix skb refcnt race after locking changes docs/bpf: Fix "file doesn't exist" warnings in {llvm_reloc,btf}.rst selftests/bpf: Fix a CI failure caused by vsock write ==================== Link: https://lore.kernel.org/r/20230906095117.16941-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-09-06Merge tag 'ceph-for-6.6-rc1' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "Mixed with some fixes and cleanups, this brings in reasonably complete fscrypt support to CephFS! The list of things which don't work with encryption should be fairly short, mostly around the edges: fallocate (not supported well in CephFS to begin with), copy_file_range (requires re-encryption), non-default striping patterns. This was a multi-year effort principally by Jeff Layton with assistance from Xiubo Li, Luís Henriques and others, including several dependant changes in the MDS, netfs helper library and fscrypt framework itself" * tag 'ceph-for-6.6-rc1' of https://github.com/ceph/ceph-client: (53 commits) ceph: make num_fwd and num_retry to __u32 ceph: make members in struct ceph_mds_request_args_ext a union rbd: use list_for_each_entry() helper libceph: do not include crypto/algapi.h ceph: switch ceph_lookup/atomic_open() to use new fscrypt helper ceph: fix updating i_truncate_pagecache_size for fscrypt ceph: wait for OSD requests' callbacks to finish when unmounting ceph: drop messages from MDS when unmounting ceph: update documentation regarding snapshot naming limitations ceph: prevent snapshot creation in encrypted locked directories ceph: add support for encrypted snapshot names ceph: invalidate pages when doing direct/sync writes ceph: plumb in decryption during reads ceph: add encryption support to writepage and writepages ceph: add read/modify/write to ceph_sync_write ceph: align data in pages in ceph_sync_write ceph: don't use special DIO path for encrypted inodes ceph: add truncate size handling support for fscrypt ceph: add object version support for sync read libceph: allow ceph_osdc_new_request to accept a multi-op read ...
2023-09-06Merge tag 'input-for-v6.6-rc0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: - a new driver for Azoteq IQS7210A/7211A/E touch controllers - support for Azoteq IQS7222D variant added to iqs7222 driver - support for touch keys functionality added to Melfas MMS114 driver - new hardware IDs added to exc3000 and Goodix drivers - xpad driver gained support for GameSir T4 Kaleid Controller - a fix for xpad driver to properly support some third-party controllers that need a magic packet to start properly - a fix for psmouse driver to more reliably switch to RMI4 mode on devices that use native RMI4/SMbus protocol - a quirk for i8042 for TUXEDO Gemini 17 Gen1/Clevo PD70PN laptops - multiple drivers have been updated to make use of devm and other newer APIs such as dev_err_probe(), devm_regulator_get_enable(), and others. * tag 'input-for-v6.6-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (83 commits) Input: goodix - add support for ACPI ID GDX9110 Input: rpckbd - fix the return value handle for platform_get_irq() Input: tca6416-keypad - switch to using input core's polling features Input: tca6416-keypad - convert to use devm_* api Input: tca6416-keypad - fix interrupt enable disbalance Input: tca6416-keypad - rely on I2C core to set up suspend/resume Input: tca6416-keypad - always expect proper IRQ number in i2c client Input: lm8323 - convert to use devm_* api Input: lm8323 - rely on device core to create kp_disable attribute Input: qt2160 - convert to use devm_* api Input: qt2160 - do not hard code interrupt trigger Input: qt2160 - switch to using threaded interrupt handler Input: qt2160 - tweak check for i2c adapter functionality Input: psmouse - add delay when deactivating for SMBus mode Input: mcs-touchkey - fix uninitialized use of error in mcs_touchkey_probe() Input: qt1070 - convert to use devm_* api Input: mcs-touchkey - convert to use devm_* api Input: amikbd - convert to use devm_* api Input: lm8333 - convert to use devm_* api Input: mms114 - add support for touch keys ...
2023-09-06Merge tag 'linux-watchdog-6.6-rc1' of ↵Linus Torvalds
git://www.linux-watchdog.org/linux-watchdog Pull watchdog updates from Wim Van Sebroeck: - add marvell GTI watchdog driver - add support for Amlogic-T7 SoCs - document the IPQ5018 watchdog compatible - enable COMPILE_TEST for more watchdog device drivers - core: stop watchdog when executing poweroff command - other small improvements and fixes * tag 'linux-watchdog-6.6-rc1' of git://www.linux-watchdog.org/linux-watchdog: (21 commits) watchdog: Add support for Amlogic-T7 SoCs watchdog: Add a new struct for Amlogic-GXBB driver dt-bindings: watchdog: Add support for Amlogic-T7 SoCs dt-bindings: watchdog: qcom-wdt: document IPQ5018 watchdog: imx2_wdt: Improve dev_crit() message watchdog: stm32: Drop unnecessary of_match_ptr() watchdog: sama5d4: readout initial state watchdog: intel-mid_wdt: add MODULE_ALIAS() to allow auto-load watchdog: core: stop watchdog when executing poweroff command watchdog: pm8916_wdt: Remove redundant of_match_ptr() watchdog: xilinx_wwdt: Use div_u64() in xilinx_wwdt_start() watchdog: starfive: Remove #ifdef guards for PM related functions watchdog: s3c2410: Fix potential deadlock on &wdt->lock watchdog:rit_wdt: Add support for WDIOF_CARDRESET dt-bindings: watchdog: ti,rti-wdt: Add support for WDIOF_CARDRESET watchdog: Enable COMPILE_TEST for more drivers watchdog: advantech_ec_wdt: fix Kconfig dependencies watchdog: Explicitly include correct DT includes Watchdog: Add marvell GTI watchdog driver dt-bindings: watchdog: marvell GTI system watchdog driver ...
2023-09-06netfilter: nf_tables: Unbreak audit log resetPablo Neira Ayuso
Deliver audit log from __nf_tables_dump_rules(), table dereference at the end of the table list loop might point to the list head, leading to this crash. [ 4137.407349] BUG: unable to handle page fault for address: 00000000001f3c50 [ 4137.407357] #PF: supervisor read access in kernel mode [ 4137.407359] #PF: error_code(0x0000) - not-present page [ 4137.407360] PGD 0 P4D 0 [ 4137.407363] Oops: 0000 [#1] PREEMPT SMP PTI [ 4137.407365] CPU: 4 PID: 500177 Comm: nft Not tainted 6.5.0+ #277 [ 4137.407369] RIP: 0010:string+0x49/0xd0 [ 4137.407374] Code: ff 77 36 45 89 d1 31 f6 49 01 f9 66 45 85 d2 75 19 eb 1e 49 39 f8 76 02 88 07 48 83 c7 01 83 c6 01 48 83 c2 01 4c 39 cf 74 07 <0f> b6 02 84 c0 75 e2 4c 89 c2 e9 58 e5 ff ff 48 c7 c0 0e b2 ff 81 [ 4137.407377] RSP: 0018:ffff8881179737f0 EFLAGS: 00010286 [ 4137.407379] RAX: 00000000001f2c50 RBX: ffff888117973848 RCX: ffff0a00ffffff04 [ 4137.407380] RDX: 00000000001f3c50 RSI: 0000000000000000 RDI: 0000000000000000 [ 4137.407381] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffffff [ 4137.407383] R10: ffffffffffffffff R11: ffff88813584d200 R12: 0000000000000000 [ 4137.407384] R13: ffffffffa15cf709 R14: 0000000000000000 R15: ffffffffa15cf709 [ 4137.407385] FS: 00007fcfc18bb580(0000) GS:ffff88840e700000(0000) knlGS:0000000000000000 [ 4137.407387] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4137.407388] CR2: 00000000001f3c50 CR3: 00000001055b2001 CR4: 00000000001706e0 [ 4137.407390] Call Trace: [ 4137.407392] <TASK> [ 4137.407393] ? __die+0x1b/0x60 [ 4137.407397] ? page_fault_oops+0x6b/0xa0 [ 4137.407399] ? exc_page_fault+0x60/0x120 [ 4137.407403] ? asm_exc_page_fault+0x22/0x30 [ 4137.407408] ? string+0x49/0xd0 [ 4137.407410] vsnprintf+0x257/0x4f0 [ 4137.407414] kvasprintf+0x3e/0xb0 [ 4137.407417] kasprintf+0x3e/0x50 [ 4137.407419] nf_tables_dump_rules+0x1c0/0x360 [nf_tables] [ 4137.407439] ? __alloc_skb+0xc3/0x170 [ 4137.407442] netlink_dump+0x170/0x330 [ 4137.407447] __netlink_dump_start+0x227/0x300 [ 4137.407449] nf_tables_getrule+0x205/0x390 [nf_tables] Deliver audit log only once at the end of the rule dump+reset for consistency with the set dump+reset. Ensure audit reset access to table under rcu read side lock. The table list iteration holds rcu read lock side, but recent audit code dereferences table object out of the rcu read lock side. Fixes: ea078ae9108e ("netfilter: nf_tables: Audit log rule reset") Fixes: 7e9be1124dbe ("netfilter: nf_tables: Audit log setelem reset") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-09-06netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ↵Kyle Zeng
ip_set_hash_netportnet.c The missing IP_SET_HASH_WITH_NET0 macro in ip_set_hash_netportnet can lead to the use of wrong `CIDR_POS(c)` for calculating array offsets, which can lead to integer underflow. As a result, it leads to slab out-of-bound access. This patch adds back the IP_SET_HASH_WITH_NET0 macro to ip_set_hash_netportnet to address the issue. Fixes: 886503f34d63 ("netfilter: ipset: actually allow allowable CIDR 0 in hash:net,port,net") Suggested-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Kyle Zeng <zengyhkyle@gmail.com> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-09-06netfilter: nft_set_rbtree: skip sync GC for new elements in this transactionPablo Neira Ayuso
New elements in this transaction might expired before such transaction ends. Skip sync GC for such elements otherwise commit path might walk over an already released object. Once transaction is finished, async GC will collect such expired element. Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-09-06netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_IDPhil Sutter
Add a brief description to the enum's comment. Fixes: 837830a4b439 ("netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-09-06netfilter: nfnetlink_osf: avoid OOB readWander Lairson Costa
The opt_num field is controlled by user mode and is not currently validated inside the kernel. An attacker can take advantage of this to trigger an OOB read and potentially leak information. BUG: KASAN: slab-out-of-bounds in nf_osf_match_one+0xbed/0xd10 net/netfilter/nfnetlink_osf.c:88 Read of size 2 at addr ffff88804bc64272 by task poc/6431 CPU: 1 PID: 6431 Comm: poc Not tainted 6.0.0-rc4 #1 Call Trace: nf_osf_match_one+0xbed/0xd10 net/netfilter/nfnetlink_osf.c:88 nf_osf_find+0x186/0x2f0 net/netfilter/nfnetlink_osf.c:281 nft_osf_eval+0x37f/0x590 net/netfilter/nft_osf.c:47 expr_call_ops_eval net/netfilter/nf_tables_core.c:214 nft_do_chain+0x2b0/0x1490 net/netfilter/nf_tables_core.c:264 nft_do_chain_ipv4+0x17c/0x1f0 net/netfilter/nft_chain_filter.c:23 [..] Also add validation to genre, subtype and version fields. Fixes: 11eeef41d5f6 ("netfilter: passive OS fingerprint xtables match") Reported-by: Lucas Leong <wmliang@infosec.exchange> Signed-off-by: Wander Lairson Costa <wander@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-09-06netfilter: nftables: exthdr: fix 4-byte stack OOB writeFlorian Westphal
If priv->len is a multiple of 4, then dst[len / 4] can write past the destination array which leads to stack corruption. This construct is necessary to clean the remainder of the register in case ->len is NOT a multiple of the register size, so make it conditional just like nft_payload.c does. The bug was added in 4.1 cycle and then copied/inherited when tcp/sctp and ip option support was added. Bug reported by Zero Day Initiative project (ZDI-CAN-21950, ZDI-CAN-21951, ZDI-CAN-21961). Fixes: 49499c3e6e18 ("netfilter: nf_tables: switch registers to 32 bit addressing") Fixes: 935b7f643018 ("netfilter: nft_exthdr: add TCP option matching") Fixes: 133dc203d77d ("netfilter: nft_exthdr: Support SCTP chunks") Fixes: dbb5281a1f84 ("netfilter: nf_tables: add support for matching IPv4 options") Signed-off-by: Florian Westphal <fw@strlen.de>
2023-09-06Merge tag 'backlight-next-6.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight Pull backlight updates from Lee Jones: "New Functionality: - Ensure correct includes are present and remove some that are not required - Drop redundant of_match_ptr() call to cast pointer to NULL Bug Fixes: - Revert to old (expected) behaviour of initialising PWM state on first brightness change - Correctly handle / propagate errors - Fix 'sometimes-uninitialised' issues" * tag 'backlight-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight: backlight: led_bl: Remove redundant of_match_ptr() backlight: lp855x: Drop ret variable in brightness change function backlight: gpio_backlight: Drop output GPIO direction check for initial power state backlight: lp855x: Catch errors when changing brightness backlight: lp855x: Initialize PWM state on first brightness change backlight: qcom-wled: Explicitly include correct DT includes
2023-09-06selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_allocMartin KaFai Lau
This patch checks the sk_omem_alloc has been uncharged by bpf_sk_storage during the __sk_destruct. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230901231129.578493-4-martin.lau@linux.dev
2023-09-06bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_allocMartin KaFai Lau
The commit c83597fa5dc6 ("bpf: Refactor some inode/task/sk storage functions for reuse"), refactored the bpf_{sk,task,inode}_storage_free() into bpf_local_storage_unlink_nolock() which then later renamed to bpf_local_storage_destroy(). The commit accidentally passed the "bool uncharge_mem = false" argument to bpf_selem_unlink_storage_nolock() which then stopped the uncharge from happening to the sk->sk_omem_alloc. This missing uncharge only happens when the sk is going away (during __sk_destruct). This patch fixes it by always passing "uncharge_mem = true". It is a noop to the task/inode/cgroup storage because they do not have the map_local_storage_(un)charge enabled in the map_ops. A followup patch will be done in bpf-next to remove the uncharge_mem argument. A selftest is added in the next patch. Fixes: c83597fa5dc6 ("bpf: Refactor some inode/task/sk storage functions for reuse") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230901231129.578493-3-martin.lau@linux.dev
2023-09-06bpf: bpf_sk_storage: Fix invalid wait context lockdep reportMartin KaFai Lau
'./test_progs -t test_local_storage' reported a splat: [ 27.137569] ============================= [ 27.138122] [ BUG: Invalid wait context ] [ 27.138650] 6.5.0-03980-gd11ae1b16b0a #247 Tainted: G O [ 27.139542] ----------------------------- [ 27.140106] test_progs/1729 is trying to lock: [ 27.140713] ffff8883ef047b88 (stock_lock){-.-.}-{3:3}, at: local_lock_acquire+0x9/0x130 [ 27.141834] other info that might help us debug this: [ 27.142437] context-{5:5} [ 27.142856] 2 locks held by test_progs/1729: [ 27.143352] #0: ffffffff84bcd9c0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x40 [ 27.144492] #1: ffff888107deb2c0 (&storage->lock){..-.}-{2:2}, at: bpf_local_storage_update+0x39e/0x8e0 [ 27.145855] stack backtrace: [ 27.146274] CPU: 0 PID: 1729 Comm: test_progs Tainted: G O 6.5.0-03980-gd11ae1b16b0a #247 [ 27.147550] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 27.149127] Call Trace: [ 27.149490] <TASK> [ 27.149867] dump_stack_lvl+0x130/0x1d0 [ 27.152609] dump_stack+0x14/0x20 [ 27.153131] __lock_acquire+0x1657/0x2220 [ 27.153677] lock_acquire+0x1b8/0x510 [ 27.157908] local_lock_acquire+0x29/0x130 [ 27.159048] obj_cgroup_charge+0xf4/0x3c0 [ 27.160794] slab_pre_alloc_hook+0x28e/0x2b0 [ 27.161931] __kmem_cache_alloc_node+0x51/0x210 [ 27.163557] __kmalloc+0xaa/0x210 [ 27.164593] bpf_map_kzalloc+0xbc/0x170 [ 27.165147] bpf_selem_alloc+0x130/0x510 [ 27.166295] bpf_local_storage_update+0x5aa/0x8e0 [ 27.167042] bpf_fd_sk_storage_update_elem+0xdb/0x1a0 [ 27.169199] bpf_map_update_value+0x415/0x4f0 [ 27.169871] map_update_elem+0x413/0x550 [ 27.170330] __sys_bpf+0x5e9/0x640 [ 27.174065] __x64_sys_bpf+0x80/0x90 [ 27.174568] do_syscall_64+0x48/0xa0 [ 27.175201] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 27.175932] RIP: 0033:0x7effb40e41ad [ 27.176357] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d8 [ 27.179028] RSP: 002b:00007ffe64c21fc8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141 [ 27.180088] RAX: ffffffffffffffda RBX: 00007ffe64c22768 RCX: 00007effb40e41ad [ 27.181082] RDX: 0000000000000020 RSI: 00007ffe64c22008 RDI: 0000000000000002 [ 27.182030] RBP: 00007ffe64c21ff0 R08: 0000000000000000 R09: 00007ffe64c22788 [ 27.183038] R10: 0000000000000064 R11: 0000000000000202 R12: 0000000000000000 [ 27.184006] R13: 00007ffe64c22788 R14: 00007effb42a1000 R15: 0000000000000000 [ 27.184958] </TASK> It complains about acquiring a local_lock while holding a raw_spin_lock. It means it should not allocate memory while holding a raw_spin_lock since it is not safe for RT. raw_spin_lock is needed because bpf_local_storage supports tracing context. In particular for task local storage, it is easy to get a "current" task PTR_TO_BTF_ID in tracing bpf prog. However, task (and cgroup) local storage has already been moved to bpf mem allocator which can be used after raw_spin_lock. The splat is for the sk storage. For sk (and inode) storage, it has not been moved to bpf mem allocator. Using raw_spin_lock or not, kzalloc(GFP_ATOMIC) could theoretically be unsafe in tracing context. However, the local storage helper requires a verifier accepted sk pointer (PTR_TO_BTF_ID), it is hypothetical if that (mean running a bpf prog in a kzalloc unsafe context and also able to hold a verifier accepted sk pointer) could happen. This patch avoids kzalloc after raw_spin_lock to silent the splat. There is an existing kzalloc before the raw_spin_lock. At that point, a kzalloc is very likely required because a lookup has just been done before. Thus, this patch always does the kzalloc before acquiring the raw_spin_lock and remove the later kzalloc usage after the raw_spin_lock. After this change, it will have a charge and then uncharge during the syscall bpf_map_update_elem() code path. This patch opts for simplicity and not continue the old optimization to save one charge and uncharge. This issue is dated back to the very first commit of bpf_sk_storage which had been refactored multiple times to create task, inode, and cgroup storage. This patch uses a Fixes tag with a more recent commit that should be easier to do backport. Fixes: b00fa38a9c1c ("bpf: Enable non-atomic allocations in local storage") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230901231129.578493-2-martin.lau@linux.dev
2023-09-06s390/bpf: Pass through tail call counter in trampolinesIlya Leoshkevich
s390x eBPF programs use the following extension to the s390x calling convention: tail call counter is passed on stack at offset STK_OFF_TCCNT, which callees otherwise use as scratch space. Currently trampoline does not respect this and clobbers tail call counter. This breaks enforcing tail call limits in eBPF programs, which have trampolines attached to them. Fix by forwarding a copy of the tail call counter to the original eBPF program in the trampoline (for fexit), and by restoring it at the end of the trampoline (for fentry). Fixes: 528eb2cb87bc ("s390/bpf: Implement arch_prepare_bpf_trampoline()") Reported-by: Leon Hwang <hffilwlqm@gmail.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230906004448.111674-1-iii@linux.ibm.com
2023-09-06bpf: Assign bpf_tramp_run_ctx::saved_run_ctx before recursion check.Sebastian Andrzej Siewior
__bpf_prog_enter_recur() assigns bpf_tramp_run_ctx::saved_run_ctx before performing the recursion check which means in case of a recursion __bpf_prog_exit_recur() uses the previously set bpf_tramp_run_ctx::saved_run_ctx value. __bpf_prog_enter_sleepable_recur() assigns bpf_tramp_run_ctx::saved_run_ctx after the recursion check which means in case of a recursion __bpf_prog_exit_sleepable_recur() uses an uninitialized value. This does not look right. If I read the entry trampoline code right, then bpf_tramp_run_ctx isn't initialized upfront. Align __bpf_prog_enter_sleepable_recur() with __bpf_prog_enter_recur() and set bpf_tramp_run_ctx::saved_run_ctx before the recursion check is made. Remove the assignment of saved_run_ctx in kern_sys_bpf() since it happens a few cycles later. Fixes: e384c7b7b46d0 ("bpf, x86: Create bpf_tramp_run_ctx on the caller thread's stack") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20230830080405.251926-3-bigeasy@linutronix.de
2023-09-06bpf: Invoke __bpf_prog_exit_sleepable_recur() on recursion in kern_sys_bpf().Sebastian Andrzej Siewior
If __bpf_prog_enter_sleepable_recur() detects recursion then it returns 0 without undoing rcu_read_lock_trace(), migrate_disable() or decrementing the recursion counter. This is fine in the JIT case because the JIT code will jump in the 0 case to the end and invoke the matching exit trampoline (__bpf_prog_exit_sleepable_recur()). This is not the case in kern_sys_bpf() which returns directly to the caller with an error code. Add __bpf_prog_exit_sleepable_recur() as clean up in the recursion case. Fixes: b1d18a7574d0d ("bpf: Extend sys_bpf commands for bpf_syscall programs.") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20230830080405.251926-2-bigeasy@linutronix.de
2023-09-06net: phylink: fix sphinx complaint about invalid literalJakub Kicinski
sphinx complains about the use of "%PHYLINK_PCS_NEG_*": Documentation/networking/kapi:144: ./include/linux/phylink.h:601: WARNING: Inline literal start-string without end-string. Documentation/networking/kapi:144: ./include/linux/phylink.h:633: WARNING: Inline literal start-string without end-string. These are not valid symbols so drop the '%' prefix. Alternatively we could use %PHYLINK_PCS_NEG_\* (escape the *) or use normal literal ``PHYLINK_PCS_NEG_*`` but there is already a handful of un-adorned DEFINE_* in this file. Fixes: f99d471afa03 ("net: phylink: add PCS negotiation mode") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/all/20230626162908.2f149f98@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-06Merge branch 'sja1105-fixes'David S. Miller
Vladimir Oltean says: ==================== tc-cbs offload fixes for SJA1105 DSA Yanan Yang has pointed out to me that certain tc-cbs offloaded configurations do not appear to do any shaping on the LS1021A-TSN board (SJA1105T). This is due to an apparent documentation error that also made its way into the driver, which patch 1/3 now fixes. While investigating and then testing, I've found 2 more bugs, which are patches 2/3 and 3/3. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-06net: dsa: sja1105: complete tc-cbs offload support on SJA1110Vladimir Oltean
The blamed commit left this delta behind: struct sja1105_cbs_entry { - u64 port; - u64 prio; + u64 port; /* Not used for SJA1110 */ + u64 prio; /* Not used for SJA1110 */ u64 credit_hi; u64 credit_lo; u64 send_slope; u64 idle_slope; }; but did not actually implement tc-cbs offload fully for the new switch. The offload is accepted, but it doesn't work. The difference compared to earlier switch generations is that now, the table of CBS shapers is sparse, because there are many more shapers, so the mapping between a {port, prio} and a table index is static, rather than requiring us to store the port and prio into the sja1105_cbs_entry. So, the problem is that the code programs the CBS shaper parameters at a dynamic table index which is incorrect. All that needs to be done for SJA1110 CBS shapers to work is to bypass the logic which allocates shapers in a dense manner, as for SJA1105, and use the fixed mapping instead. Fixes: 3e77e59bf8cf ("net: dsa: sja1105: add support for the SJA1110 switch family") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-06net: dsa: sja1105: fix -ENOSPC when replacing the same tc-cbs too many timesVladimir Oltean
After running command [2] too many times in a row: [1] $ tc qdisc add dev sw2p0 root handle 1: mqprio num_tc 8 \ map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0 [2] $ tc qdisc replace dev sw2p0 parent 1:1 cbs offload 1 \ idleslope 120000 sendslope -880000 locredit -1320 hicredit 180 (aka more than priv->info->num_cbs_shapers times) we start seeing the following error message: Error: Specified device failed to setup cbs hardware offload. This comes from the fact that ndo_setup_tc(TC_SETUP_QDISC_CBS) presents the same API for the qdisc create and replace cases, and the sja1105 driver fails to distinguish between the 2. Thus, it always thinks that it must allocate the same shaper for a {port, queue} pair, when it may instead have to replace an existing one. Fixes: 4d7525085a9b ("net: dsa: sja1105: offload the Credit-Based Shaper qdisc") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-06net: dsa: sja1105: fix bandwidth discrepancy between tc-cbs software and offloadVladimir Oltean
More careful measurement of the tc-cbs bandwidth shows that the stream bandwidth (effectively idleslope) increases, there is a larger and larger discrepancy between the rate limit obtained by the software Qdisc, and the rate limit obtained by its offloaded counterpart. The discrepancy becomes so large, that e.g. at an idleslope of 40000 (40Mbps), the offloaded cbs does not actually rate limit anything, and traffic will pass at line rate through a 100 Mbps port. The reason for the discrepancy is that the hardware documentation I've been following is incorrect. UM11040.pdf (for SJA1105P/Q/R/S) states about IDLE_SLOPE that it is "the rate (in unit of bytes/sec) at which the credit counter is increased". Cross-checking with UM10944.pdf (for SJA1105E/T) and UM11107.pdf (for SJA1110), the wording is different: "This field specifies the value, in bytes per second times link speed, by which the credit counter is increased". So there's an extra scaling for link speed that the driver is currently not accounting for, and apparently (empirically), that link speed is expressed in Kbps. I've pondered whether to pollute the sja1105_mac_link_up() implementation with CBS shaper reprogramming, but I don't think it is worth it. IMO, the UAPI exposed by tc-cbs requires user space to recalculate the sendslope anyway, since the formula for that depends on port_transmit_rate (see man tc-cbs), which is not an invariant from tc's perspective. So we use the offload->sendslope and offload->idleslope to deduce the original port_transmit_rate from the CBS formula, and use that value to scale the offload->sendslope and offload->idleslope to values that the hardware understands. Some numerical data points: 40Mbps stream, max interfering frame size 1500, port speed 100M --------------------------------------------------------------- tc-cbs parameters: idleslope 40000 sendslope -60000 locredit -900 hicredit 600 which result in hardware values: Before (doesn't work) After (works) credit_hi 600 600 credit_lo 900 900 send_slope 7500000 75 idle_slope 5000000 50 40Mbps stream, max interfering frame size 1500, port speed 1G ------------------------------------------------------------- tc-cbs parameters: idleslope 40000 sendslope -960000 locredit -1440 hicredit 60 which result in hardware values: Before (doesn't work) After (works) credit_hi 60 60 credit_lo 1440 1440 send_slope 120000000 120 idle_slope 5000000 5 5.12Mbps stream, max interfering frame size 1522, port speed 100M ----------------------------------------------------------------- tc-cbs parameters: idleslope 5120 sendslope -94880 locredit -1444 hicredit 77 which result in hardware values: Before (doesn't work) After (works) credit_hi 77 77 credit_lo 1444 1444 send_slope 11860000 118 idle_slope 640000 6 Tested on SJA1105T, SJA1105S and SJA1110A, at 1Gbps and 100Mbps. Fixes: 4d7525085a9b ("net: dsa: sja1105: offload the Credit-Based Shaper qdisc") Reported-by: Yanan Yang <yanan.yang@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-06Merge branch '1GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Change MIN_TXD and MIN_RXD to allow set rx/tx value between 64 and 80 Olga Zaborska says: Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx value between 64 and 80. All igb, igbvf and igc devices can use as low as 64 descriptors. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-06mlx5/core: E-Switch, Create ACL FT for eswitch manager in switchdev modeBodong Wang
ACL flow table is required in switchdev mode when metadata is enabled, driver creates such table when loading each vport. However, not every vport is loaded in switchdev mode. Such as ECPF if it's the eswitch manager. In this case, ACL flow table is still needed. To make it modularized, create ACL flow table for eswitch manager as default and skip such operations when loading manager vport. Also, there is no need to load the eswitch manager vport in switchdev mode. This means there is no need to load it on regular connect-x HCAs where the PF is the eswitch manager. This will avoid creating duplicate ACL flow table for host PF vport. Fixes: 29bcb6e4fe70 ("net/mlx5e: E-Switch, Use metadata for vport matching in send-to-vport rules") Fixes: eb8e9fae0a22 ("mlx5/core: E-Switch, Allocate ECPF vport if it's an eswitch manager") Fixes: 5019833d661f ("net/mlx5: E-switch, Introduce helper function to enable/disable vports") Signed-off-by: Bodong Wang <bodong@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-06net/mlx5e: Clear mirred devices array if the rule is splitJianbo Liu
In the cited commit, the mirred devices are recorded and checked while parsing the actions. In order to avoid system crash, the duplicate action in a single rule is not allowed. But the rule is actually break down into several FTEs in different tables, for either mirroring, or the specified types of actions which use post action infrastructure. It will reject certain action list by mistake, for example: actions:enp8s0f0_1,set(ipv4(ttl=63)),enp8s0f0_0,enp8s0f0_1. Here the rule is split to two FTEs because of pedit action. To fix this issue, when parsing the rule actions, reset if_count to clear the mirred devices array if the rule is split to multiple FTEs, and then the duplicate checking is restarted. Fixes: 554fe75c1b3f ("net/mlx5e: Avoid duplicating rule destinations") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-06ip_tunnels: use DEV_STATS_INC()Eric Dumazet
syzbot/KCSAN reported data-races in iptunnel_xmit_stats() [1] This can run from multiple cpus without mutual exclusion. Adopt SMP safe DEV_STATS_INC() to update dev->stats fields. [1] BUG: KCSAN: data-race in iptunnel_xmit / iptunnel_xmit read-write to 0xffff8881353df170 of 8 bytes by task 30263 on cpu 1: iptunnel_xmit_stats include/net/ip_tunnels.h:493 [inline] iptunnel_xmit+0x432/0x4a0 net/ipv4/ip_tunnel_core.c:87 ip_tunnel_xmit+0x1477/0x1750 net/ipv4/ip_tunnel.c:831 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:662 __netdev_start_xmit include/linux/netdevice.h:4889 [inline] netdev_start_xmit include/linux/netdevice.h:4903 [inline] xmit_one net/core/dev.c:3544 [inline] dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3560 __dev_queue_xmit+0xeee/0x1de0 net/core/dev.c:4340 dev_queue_xmit include/linux/netdevice.h:3082 [inline] __bpf_tx_skb net/core/filter.c:2129 [inline] __bpf_redirect_no_mac net/core/filter.c:2159 [inline] __bpf_redirect+0x723/0x9c0 net/core/filter.c:2182 ____bpf_clone_redirect net/core/filter.c:2453 [inline] bpf_clone_redirect+0x16c/0x1d0 net/core/filter.c:2425 ___bpf_prog_run+0xd7d/0x41e0 kernel/bpf/core.c:1954 __bpf_prog_run512+0x74/0xa0 kernel/bpf/core.c:2195 bpf_dispatcher_nop_func include/linux/bpf.h:1181 [inline] __bpf_prog_run include/linux/filter.h:609 [inline] bpf_prog_run include/linux/filter.h:616 [inline] bpf_test_run+0x15d/0x3d0 net/bpf/test_run.c:423 bpf_prog_test_run_skb+0x77b/0xa00 net/bpf/test_run.c:1045 bpf_prog_test_run+0x265/0x3d0 kernel/bpf/syscall.c:3996 __sys_bpf+0x3af/0x780 kernel/bpf/syscall.c:5353 __do_sys_bpf kernel/bpf/syscall.c:5439 [inline] __se_sys_bpf kernel/bpf/syscall.c:5437 [inline] __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5437 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read-write to 0xffff8881353df170 of 8 bytes by task 30249 on cpu 0: iptunnel_xmit_stats include/net/ip_tunnels.h:493 [inline] iptunnel_xmit+0x432/0x4a0 net/ipv4/ip_tunnel_core.c:87 ip_tunnel_xmit+0x1477/0x1750 net/ipv4/ip_tunnel.c:831 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:662 __netdev_start_xmit include/linux/netdevice.h:4889 [inline] netdev_start_xmit include/linux/netdevice.h:4903 [inline] xmit_one net/core/dev.c:3544 [inline] dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3560 __dev_queue_xmit+0xeee/0x1de0 net/core/dev.c:4340 dev_queue_xmit include/linux/netdevice.h:3082 [inline] __bpf_tx_skb net/core/filter.c:2129 [inline] __bpf_redirect_no_mac net/core/filter.c:2159 [inline] __bpf_redirect+0x723/0x9c0 net/core/filter.c:2182 ____bpf_clone_redirect net/core/filter.c:2453 [inline] bpf_clone_redirect+0x16c/0x1d0 net/core/filter.c:2425 ___bpf_prog_run+0xd7d/0x41e0 kernel/bpf/core.c:1954 __bpf_prog_run512+0x74/0xa0 kernel/bpf/core.c:2195 bpf_dispatcher_nop_func include/linux/bpf.h:1181 [inline] __bpf_prog_run include/linux/filter.h:609 [inline] bpf_prog_run include/linux/filter.h:616 [inline] bpf_test_run+0x15d/0x3d0 net/bpf/test_run.c:423 bpf_prog_test_run_skb+0x77b/0xa00 net/bpf/test_run.c:1045 bpf_prog_test_run+0x265/0x3d0 kernel/bpf/syscall.c:3996 __sys_bpf+0x3af/0x780 kernel/bpf/syscall.c:5353 __do_sys_bpf kernel/bpf/syscall.c:5439 [inline] __se_sys_bpf kernel/bpf/syscall.c:5437 [inline] __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5437 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x0000000000018830 -> 0x0000000000018831 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 30249 Comm: syz-executor.4 Not tainted 6.5.0-syzkaller-11704-g3f86ed6ec0b3 #0 Fixes: 039f50629b7f ("ip_tunnel: Move stats update to iptunnel_xmit()") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-06net: team: do not use dynamic lockdep keyTaehee Yoo
team interface has used a dynamic lockdep key to avoid false-positive lockdep deadlock detection. Virtual interfaces such as team usually have their own lock for protecting private data. These interfaces can be nested. team0 | team1 Each interface's lock is actually different(team0->lock and team1->lock). So, mutex_lock(&team0->lock); mutex_lock(&team1->lock); mutex_unlock(&team1->lock); mutex_unlock(&team0->lock); The above case is absolutely safe. But lockdep warns about deadlock. Because the lockdep understands these two locks are same. This is a false-positive lockdep warning. So, in order to avoid this problem, the team interfaces started to use dynamic lockdep key. The false-positive problem was fixed, but it introduced a new problem. When the new team virtual interface is created, it registers a dynamic lockdep key(creates dynamic lockdep key) and uses it. But there is the limitation of the number of lockdep keys. So, If so many team interfaces are created, it consumes all lockdep keys. Then, the lockdep stops to work and warns about it. In order to fix this problem, team interfaces use the subclass instead of the dynamic key. So, when a new team interface is created, it doesn't register(create) a new lockdep, but uses existed subclass key instead. It is already used by the bonding interface for a similar case. As the bonding interface does, the subclass variable is the same as the 'dev->nested_level'. This variable indicates the depth in the stacked interface graph. The 'dev->nested_level' is protected by RTNL and RCU. So, 'mutex_lock_nested()' for 'team->lock' requires RTNL or RCU. In the current code, 'team->lock' is usually acquired under RTNL, there is no problem with using 'dev->nested_level'. The 'team_nl_team_get()' and The 'lb_stats_refresh()' functions acquire 'team->lock' without RTNL. But these don't iterate their own ports nested so they don't need nested lock. Reproducer: for i in {0..1000} do ip link add team$i type team ip link add dummy$i master team$i type dummy ip link set dummy$i up ip link set team$i up done Splat looks like: BUG: MAX_LOCKDEP_ENTRIES too low! turning off the locking correctness validator. Please attach the output of /proc/lock_stat to the bug report CPU: 0 PID: 4104 Comm: ip Not tainted 6.5.0-rc7+ #45 Call Trace: <TASK> dump_stack_lvl+0x64/0xb0 add_lock_to_list+0x30d/0x5e0 check_prev_add+0x73a/0x23a0 ... sock_def_readable+0xfe/0x4f0 netlink_broadcast+0x76b/0xac0 nlmsg_notify+0x69/0x1d0 dev_open+0xed/0x130 ... Reported-by: syzbot+9bbbacfbf1e04d5221f7@syzkaller.appspotmail.com Fixes: 369f61bee0f5 ("team: fix nested locking lockdep warning") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-06net/ipv6: SKB symmetric hash should incorporate transport portsQuan Tian
__skb_get_hash_symmetric() was added to compute a symmetric hash over the protocol, addresses and transport ports, by commit eb70db875671 ("packet: Use symmetric hash for PACKET_FANOUT_HASH."). It uses flow_keys_dissector_symmetric_keys as the flow_dissector to incorporate IPv4 addresses, IPv6 addresses and ports. However, it should not specify the flag as FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL, which stops further dissection when an IPv6 flow label is encountered, making transport ports not being incorporated in such case. As a consequence, the symmetric hash is based on 5-tuple for IPv4 but 3-tuple for IPv6 when flow label is present. It caused a few problems, e.g. when nft symhash and openvswitch l4_sym rely on the symmetric hash to perform load balancing as different L4 flows between two given IPv6 addresses would always get the same symmetric hash, leading to uneven traffic distribution. Removing the use of FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL makes sure the symmetric hash is based on 5-tuple for both IPv4 and IPv6 consistently. Fixes: eb70db875671 ("packet: Use symmetric hash for PACKET_FANOUT_HASH.") Reported-by: Lars Ekman <uablrek@gmail.com> Closes: https://github.com/antrea-io/antrea/issues/5457 Signed-off-by: Quan Tian <qtian@vmware.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-06dt-bindings: rtc: ds3231: Remove text bindingFabio Estevam
The "maxim,ds3231" compatible is described in the rtc-ds1307.yaml, so there is no need to keep the text bindings version. Remove the maxim,ds3231.txt file in favor of the rtc-ds1307.yaml binding. Signed-off-by: Fabio Estevam <festevam@denx.de> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230902134407.2589099-1-festevam@gmail.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2023-09-06rtc: wm8350: remove unnecessary messagesAlexandre Belloni
The RTC core already prints a message when the RTC is registered and when registering fails, it is not necessary to have more in the driver. Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com> Link: https://lore.kernel.org/r/20230827221643.544259-3-alexandre.belloni@bootlin.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2023-09-06rtc: twl: remove unnecessary messagesAlexandre Belloni
The RTC core already prints a message when the RTC is registered and when registering fails, it is not necessary to have more in the driver. Link: https://lore.kernel.org/r/20230827221643.544259-2-alexandre.belloni@bootlin.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2023-09-06rtc: sun6i: remove unnecessary messageAlexandre Belloni
The core already print a message once the rtc is successfully registered, it is not necessary to print an other one. Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> Link: https://lore.kernel.org/r/20230827221643.544259-1-alexandre.belloni@bootlin.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2023-09-06rtc: stop warning for invalid alarms when the alarm is disabledAlexandre Belloni
When the alarm is not enabled, it may never have been set and so we can't expect it to be valid. This will prevent the apparition of boot messages like this one: rtc rtc0: invalid alarm value: 2023-7-8 45:85:85 Link: https://lore.kernel.org/r/20230827221532.543353-1-alexandre.belloni@bootlin.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2023-09-06i3c: master: svc: fix probe failure when no i3c device existFrank Li
I3C masters are expected to support hot-join. This means at initialization time we might not yet discover any device and this should not be treated as a fatal error. During the DAA procedure which happens at probe time, if no device has joined, all CCC will be NACKed (from a bus perspective). This leads to an early return with an error code which fails the probe of the master. Let's avoid this by just telling the core through an I3C_ERROR_M2 return command code that no device was discovered, which is a valid situation. This way the master will no longer bail out and fail to probe for a wrong reason. Cc: stable@vger.kernel.org Fixes: dd3c52846d59 ("i3c: master: svc: Add Silvaco I3C master driver") Signed-off-by: Frank Li <Frank.Li@nxp.com> Acked-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20230831141324.2841525-1-Frank.Li@nxp.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2023-09-05dt-bindings: irqchip: convert st,stih407-irq-syscfg to DT schemaRaphael Gallais-Pou
Convert deprecated format to DT schema format. Signed-off-by: Raphael Gallais-Pou <rgallaispou@gmail.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230905072740.23859-1-rgallaispou@gmail.com Signed-off-by: Rob Herring <robh@kernel.org>
2023-09-05media: dt-bindings: Convert Omnivision OV7251 to DT schemaRob Herring
Convert the OmniVision OV7251 Image Sensor binding to DT schema format. vddd-supply was listed as required, but the example and actual user don't have it. Also, the data brief says it has an internal regulator, so perhaps it is truly optional. Add missing common "link-frequencies" which is used and required by the Linux driver. Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Wolfram Sang <wsa@kernel.org> # for I2C Link: https://lore.kernel.org/r/20230817202713.2180195-1-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-09-05media: dt-bindings: Merge OV5695 into OV5693 bindingRob Herring
The OV5695 binding is almost the same as the OV5693 binding. The only difference is 'clock-names' is defined for OV5695. However, the lack of clock-names is an omission as the Linux OV5693 driver expects the same 'xvclk' clock name. 'link-frequencies' is required by OV5693, but not OV5695, so make that conditional. Really, this shouldn't vary by device, but we're stuck with the existing binding use. The rockchip-isp1 binding example is missing required properties, so it has to be updated as well. Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230817202647.2179609-1-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-09-05Merge tag 'gfs2-v6.5-rc5-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Fix a glock state (non-)transition bug when a dlm request times out and is canceled, and we have locking requests that can now be granted immediately - Various fixes and cleanups in how the logd and quotad daemons are woken up and terminated - Fix several bugs in the quota data reference counting and shrinking. Free quota data objects synchronously in put_super() instead of letting call_rcu() run wild - Make sure not to deallocate quota data during a withdraw; rather, defer quota data deallocation to put_super(). Withdraws can happen in contexts in which callers on the stack are holding quota data references - Many minor quota fixes and cleanups by Bob - Update the the mailing list address for gfs2 and dlm. (It's the same list for both and we are moving it to gfs2@lists.linux.dev) - Various other minor cleanups * tag 'gfs2-v6.5-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (51 commits) MAINTAINERS: Update dlm mailing list MAINTAINERS: Update gfs2 mailing list gfs2: change qd_slot_count to qd_slot_ref gfs2: check for no eligible quota changes gfs2: Remove useless assignment gfs2: simplify slot_get gfs2: Simplify qd2offset gfs2: introduce qd_bh_get_or_undo gfs2: Remove quota allocation info from quota file gfs2: use constant for array size gfs2: Set qd_sync_gen in do_sync gfs2: Remove useless err set gfs2: Small gfs2_quota_lock cleanup gfs2: move qdsb_put and reduce redundancy gfs2: improvements to sysfs status gfs2: Don't try to sync non-changes gfs2: Simplify function need_sync gfs2: remove unneeded pg_oflow variable gfs2: remove unneeded variable done gfs2: pass sdp to gfs2_write_buf_to_page ...
2023-09-05regulator: tps6594-regulator: Fix random kernel crashJerome Neanne
Random kernel crash detected in TI CICD when regulator driver is added. This is root caused to irq index increment being done twice causing irq_data being allocated outside of the range. - Rework tps6594_request_reg_irqs with correct index increment - Adjust irq_data kmalloc size to the exact size needed for the device This has been reported on TI mainline. No public bug report associated. Reported-by: Udit Kumar <u-kumar1@ti.com> Fixes: f17ccc5deb4d ("regulator: tps6594-regulator: Add driver for TI TPS6594 regulators") Signed-off-by: Jerome Neanne <jneanne@baylibre.com> Link: https://lore.kernel.org/r/20230828-tps6594_random_boot_crash_fix-v1-1-f29cbf9ddb37@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-09-05Merge tag 'fuse-update-6.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Revert non-waiting FLUSH due to a regression - Fix a lookup counter leak in readdirplus - Add an option to allow shared mmaps in no-cache mode - Add btime support and statx intrastructure to the protocol - Invalidate positive/negative dentry on failed create/delete * tag 'fuse-update-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: conditionally fill kstat in fuse_do_statx() fuse: invalidate dentry on EEXIST creates or ENOENT deletes fuse: cache btime fuse: implement statx fuse: add ATTR_TIMEOUT macro fuse: add STATX request fuse: handle empty request_mask in statx fuse: write back dirty pages before direct write in direct_io_relax mode fuse: add a new fuse init flag to relax restrictions in no cache mode fuse: invalidate page cache pages before direct write fuse: nlookup missing decrement in fuse_direntplus_link Revert "fuse: in fuse_flush only wait if someone wants the return code"
2023-09-05Merge tag 'ata-6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull ata updates from Damien Le Moal: - Fix OF include file for ata platform drivers (Rob) - Simplify various ahci, sata and pata platform drivers using the function devm_platform_ioremap_resource() (Yangtao) - Cleanup libata time related argument types (e.g. timeouts values) (Sergey) - Cleanup libata code around error handling as all ata drivers now define a error_handler operation (Hannes and Niklas) - Remove functions intended for libsas that are in fact unused (Niklas) - Change the remove device callback of platform drivers to a null function (Uwe) - Simplify the pata_imx driver using devm_clk_get_enabled() (Li) - Remove old and uinused remnants of the ide code in arm, parisc, powerpc, sparc and m68k architectures and associated drivers (pata_buddha, pata_falcon and pata_gayle) (Geert) - Add missing MODULE_DESCRIPTION() in the sata_gemini and pata_ftide010 drivers (me) - Several fixes for the pata_ep93xx and pata_falcon drivers (Nikita, Michael) - Add Elkhart Lake AHCI controller support to the ahci driver (Werner) - Disable NCQ trim on Micron 1100 drives (Pawel) * tag 'ata-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: (60 commits) ata: libata-core: Disable NCQ_TRIM on Micron 1100 drives ata: ahci: Add Elkhart Lake AHCI controller ata: pata_falcon: add data_swab option to byte-swap disk data ata: pata_falcon: fix IO base selection for Q40 ata: pata_ep93xx: use soc_device_match for UDMA modes ata: pata_ep93xx: fix error return code in probe ata: sata_gemini: Add missing MODULE_DESCRIPTION ata: pata_ftide010: Add missing MODULE_DESCRIPTION m68k: Remove <asm/ide.h> ata: pata_gayle: Remove #include <asm/ide.h> ata: pata_falcon: Remove #include <asm/ide.h> ata: pata_buddha: Remove #include <asm/ide.h> asm-generic: Remove ide_iops.h sparc: Remove <asm/ide.h> powerpc: Remove <asm/ide.h> parisc: Remove <asm/ide.h> ARM: Remove <asm/ide.h> ata: pata_imx: Use helper function devm_clk_get_enabled() ata: sata_rcar: Convert to platform remove callback returning void ata: sata_mv: Convert to platform remove callback returning void ...
2023-09-05Merge tag 'mailbox-v6.6' of ↵Linus Torvalds
git://git.linaro.org/landing-teams/working/fujitsu/integration Pull mailbox updates from Jassi Brar: - qcom: fix incorrect num_chans counting - mhu: Remove redundant dev_err - bcm: fix comments - common changes: - convert to use devm_platform_get_and_ioremap_resource - correct DT includes * tag 'mailbox-v6.6' of git://git.linaro.org/landing-teams/working/fujitsu/integration: mailbox: qcom-ipcc: fix incorrect num_chans counting mailbox: Explicitly include correct DT includes mailbox: ti-msgmgr: Use devm_platform_ioremap_resource_byname() mailbox: platform-mhu: Remove redundant dev_err() mailbox: bcm-pdc: Fix some kernel-doc comments mailbox: mailbox-test: Fix an error check in mbox_test_probe() mailbox: tegra-hsp: Convert to devm_platform_ioremap_resource() mailbox: rockchip: Use devm_platform_get_and_ioremap_resource() mailbox: mailbox-test: Use devm_platform_get_and_ioremap_resource() mailbox: bcm-pdc: Use devm_platform_get_and_ioremap_resource() mailbox: bcm-ferxrm-mailbox: Use devm_platform_get_and_ioremap_resource()
2023-09-05Merge tag 'mm-hotfixes-stable-2023-09-05-11-51' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Seven hotfixes. Four are cc:stable and the remainder pertain to issues which were introduced in the current merge window" * tag 'mm-hotfixes-stable-2023-09-05-11-51' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: sparc64: add missing initialization of folio in tlb_batch_add() mm: memory-failure: use rcu lock instead of tasklist_lock when collect_procs() revert "memfd: improve userspace warnings for missing exec-related flags". rcu: dump vmalloc memory info safely mm/vmalloc: add a safer version of find_vm_area() for debug tools/mm: fix undefined reference to pthread_once memcontrol: ensure memcg acquired by id is properly set up
2023-09-05Merge tag 'tpmdd-v6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull more tpm updates from Jarkko Sakkinen: "Two more bug fixes for tpm_crb, categorically disabling rng for AMD CPU's in the tpm_crb driver, discarding the earlier probing approach" * tag 'tpmdd-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: tpm: Enable hwrng only for Pluton on AMD CPUs tpm_crb: Fix an error handling path in crb_acpi_add()
2023-09-05s390/vmem: do not silently ignore mapping limitAlexander Gordeev
The only interface that allows drivers establishing liner mappings is vmem_add_mapping(). It does check a requested range against allowed limits and a call to modify_pagetable() with an invalid mapping range is impossible. Hence, an attempt to map an address range outside of the identity mapping or vmemmap array could only be kernel bug. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-09-05s390/zcrypt: utilize dev_set_name() ability to use a formatted stringAndy Shevchenko
With the dev_set_name() prototype it's not obvious that it takes a formatted string as a parameter. Use its facility instead of duplicating the same with strncpy()/snprintf() calls. With this, also prevent return error code to be shadowed. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20230831110000.24279-2-andriy.shevchenko@linux.intel.com Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-09-05s390/zcrypt: don't leak memory if dev_set_name() failsAndy Shevchenko
When dev_set_name() fails, zcdn_create() doesn't free the newly allocated resources. Do it. Fixes: 00fab2350e6b ("s390/zcrypt: multiple zcrypt device nodes support") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20230831110000.24279-1-andriy.shevchenko@linux.intel.com Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-09-05s390/mm: fix MAX_DMA_ADDRESS physical vs virtual confusionAlexander Gordeev
MAX_DMA_ADDRESS is defined and treated as a physical address, whereas it should be virtual. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-09-05sparc64: add missing initialization of folio in tlb_batch_add()Mike Rapoport (IBM)
Commit 1a10a44dfc1d ("sparc64: implement the new page table range API") missed initialization of folio variable in tlb_batch_add() which causes boot tests to crash. Add missing initialization. Link: https://lkml.kernel.org/r/20230904174350.GF3223@kernel.org Fixes: 1a10a44dfc1d ("sparc64: implement the new page table range API") Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-05mm: memory-failure: use rcu lock instead of tasklist_lock when collect_procs()Tong Tiangen
We found a softlock issue in our test, analyzed the logs, and found that the relevant CPU call trace as follows: CPU0: _do_fork -> copy_process() -> write_lock_irq(&tasklist_lock) //Disable irq,waiting for //tasklist_lock CPU1: wp_page_copy() ->pte_offset_map_lock() -> spin_lock(&page->ptl); //Hold page->ptl -> ptep_clear_flush() -> flush_tlb_others() ... -> smp_call_function_many() -> arch_send_call_function_ipi_mask() -> csd_lock_wait() //Waiting for other CPUs respond //IPI CPU2: collect_procs_anon() -> read_lock(&tasklist_lock) //Hold tasklist_lock ->for_each_process(tsk) -> page_mapped_in_vma() -> page_vma_mapped_walk() -> map_pte() ->spin_lock(&page->ptl) //Waiting for page->ptl We can see that CPU1 waiting for CPU0 respond IPI,CPU0 waiting for CPU2 unlock tasklist_lock, CPU2 waiting for CPU1 unlock page->ptl. As a result, softlockup is triggered. For collect_procs_anon(), what we're doing is task list iteration, during the iteration, with the help of call_rcu(), the task_struct object is freed only after one or more grace periods elapse. the logic as follows: release_task() -> __exit_signal() -> __unhash_process() -> list_del_rcu() -> put_task_struct_rcu_user() -> call_rcu(&task->rcu, delayed_put_task_struct) delayed_put_task_struct() -> put_task_struct() -> if (refcount_sub_and_test()) __put_task_struct() -> free_task() Therefore, under the protection of the rcu lock, we can safely use get_task_struct() to ensure a safe reference to task_struct during the iteration. By removing the use of tasklist_lock in task list iteration, we can break the softlock chain above. The same logic can also be applied to: - collect_procs_file() - collect_procs_fsdax() - collect_procs_ksm() Link: https://lkml.kernel.org/r/20230828022527.241693-1-tongtiangen@huawei.com Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>