summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2021-08-17locking/rwsem: Add rtmutex based R/W semaphore implementationThomas Gleixner
The RT specific R/W semaphore implementation used to restrict the number of readers to one, because a writer cannot block on multiple readers and inherit its priority or budget. The single reader restricting was painful in various ways: - Performance bottleneck for multi-threaded applications in the page fault path (mmap sem) - Progress blocker for drivers which are carefully crafted to avoid the potential reader/writer deadlock in mainline. The analysis of the writer code paths shows that properly written RT tasks should not take them. Syscalls like mmap(), file access which take mmap sem write locked have unbound latencies, which are completely unrelated to mmap sem. Other R/W sem users like graphics drivers are not suitable for RT tasks either. So there is little risk to hurt RT tasks when the RT rwsem implementation is done in the following way: - Allow concurrent readers - Make writers block until the last reader left the critical section. This blocking is not subject to priority/budget inheritance. - Readers blocked on a writer inherit their priority/budget in the normal way. There is a drawback with this scheme: R/W semaphores become writer unfair though the applications which have triggered writer starvation (mostly on mmap_sem) in the past are not really the typical workloads running on a RT system. So while it's unlikely to hit writer starvation, it's possible. If there are unexpected workloads on RT systems triggering it, the problem has to be revisited. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.016885947@linutronix.de
2021-08-17locking/rt: Add base code for RT rw_semaphore and rwlockThomas Gleixner
On PREEMPT_RT, rw_semaphores and rwlocks are substituted with an rtmutex and a reader count. The implementation is writer unfair, as it is not feasible to do priority inheritance on multiple readers, but experience has shown that real-time workloads are not the typical workloads which are sensitive to writer starvation. The inner workings of rw_semaphores and rwlocks on RT are almost identical except for the task state and signal handling. rw_semaphores are not state preserving over a contention, they are expected to enter and leave with state == TASK_RUNNING. rwlocks have a mechanism to preserve the state of the task at entry and restore it after unblocking taking potential non-lock related wakeups into account. rw_semaphores can also be subject to signal handling interrupting a blocked state, while rwlocks ignore signals. To avoid code duplication, provide a shared implementation which takes the small difference vs. state and signals into account. The code is included into the relevant rw_semaphore/rwlock base code and compiled for each use case separately. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.957920571@linutronix.de
2021-08-17locking/rtmutex: Provide rt_mutex_base_is_locked()Thomas Gleixner
Provide rt_mutex_base_is_locked(), which will be used for various wrapped locking primitives for RT. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.899572818@linutronix.de
2021-08-17locking/rtmutex: Split out the inner parts of 'struct rtmutex'Peter Zijlstra
RT builds substitutions for rwsem, mutex, spinlock and rwlock around rtmutexes. Split the inner working out so each lock substitution can use them with the appropriate lockdep annotations. This avoids having an extra unused lockdep map in the wrapped rtmutex. No functional change. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.784739994@linutronix.de
2021-08-17locking/rtmutex: Remove rt_mutex_is_locked()Peter Zijlstra
There are no more users left. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.552218335@linutronix.de
2021-08-17sched/wake_q: Provide WAKE_Q_HEAD_INITIALIZER()Thomas Gleixner
The RT specific spin/rwlock implementation requires special handling of the to be woken waiters. Provide a WAKE_Q_HEAD_INITIALIZER(), which can be used by the rtmutex code to implement an RT aware wake_q derivative. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.429918071@linutronix.de
2021-08-17sched/core: Provide a scheduling point for RT locksThomas Gleixner
RT enabled kernels substitute spin/rwlocks with 'sleeping' variants based on rtmutexes. Blocking on such a lock is similar to preemption versus: - I/O scheduling and worker handling, because these functions might block on another substituted lock, or come from a lock contention within these functions. - RCU considers this like a preemption, because the task might be in a read side critical section. Add a separate scheduling point for this, and hand a new scheduling mode argument to __schedule() which allows, along with separate mode masks, to handle this gracefully from within the scheduler, without proliferating that to other subsystems like RCU. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.372319055@linutronix.de
2021-08-17sched/wakeup: Prepare for RT sleeping spin/rwlocksThomas Gleixner
Waiting for spinlocks and rwlocks on non RT enabled kernels is task::state preserving. Any wakeup which matches the state is valid. RT enabled kernels substitutes them with 'sleeping' spinlocks. This creates an issue vs. task::__state. In order to block on the lock, the task has to overwrite task::__state and a consecutive wakeup issued by the unlocker sets the state back to TASK_RUNNING. As a consequence the task loses the state which was set before the lock acquire and also any regular wakeup targeted at the task while it is blocked on the lock. To handle this gracefully, add a 'saved_state' member to task_struct which is used in the following way: 1) When a task blocks on a 'sleeping' spinlock, the current state is saved in task::saved_state before it is set to TASK_RTLOCK_WAIT. 2) When the task unblocks and after acquiring the lock, it restores the saved state. 3) When a regular wakeup happens for a task while it is blocked then the state change of that wakeup is redirected to operate on task::saved_state. This is also required when the task state is running because the task might have been woken up from the lock wait and has not yet restored the saved state. To make it complete, provide the necessary helpers to save and restore the saved state along with the necessary documentation how the RT lock blocking is supposed to work. For non-RT kernels there is no functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.258751046@linutronix.de
2021-08-17sched/wakeup: Reorganize the current::__state helpersThomas Gleixner
In order to avoid more duplicate implementations for the debug and non-debug variants of the state change macros, split the debug portion out and make that conditional on CONFIG_DEBUG_ATOMIC_SLEEP=y. Suggested-by: Waiman Long <longman@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.200898048@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-08-17sched/wakeup: Introduce the TASK_RTLOCK_WAIT state bitThomas Gleixner
RT kernels have an extra quirk for try_to_wake_up() to handle task state preservation across periods of blocking on a 'sleeping' spin/rwlock. For this to function correctly and under all circumstances try_to_wake_up() must be able to identify whether the wakeup is lock related or not and whether the task is waiting for a lock or not. The original approach was to use a special wake_flag argument for try_to_wake_up() and just use TASK_UNINTERRUPTIBLE for the tasks wait state and the try_to_wake_up() state argument. This works in principle, but due to the fact that try_to_wake_up() cannot determine whether the task is waiting for an RT lock wakeup or for a regular wakeup it's suboptimal. RT kernels save the original task state when blocking on an RT lock and restore it when the lock has been acquired. Any non lock related wakeup is checked against the saved state and if it matches the saved state is set to running so that the wakeup is not lost when the state is restored. While the necessary logic for the wake_flag based solution is trivial, the downside is that any regular wakeup with TASK_UNINTERRUPTIBLE in the state argument set will wake the task despite the fact that it is still blocked on the lock. That's not a fatal problem as the lock wait has do deal with spurious wakeups anyway, but it introduces unnecessary latencies. Introduce the TASK_RTLOCK_WAIT state bit which will be set when a task blocks on an RT lock. The lock wakeup will use wake_up_state(TASK_RTLOCK_WAIT), so both the waiting state and the wakeup state are distinguishable, which avoids spurious wakeups and allows better analysis. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.144989915@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-08-17locking/rtmutex: Set proper wait context for lockdepThomas Gleixner
RT mutexes belong to the LD_WAIT_SLEEP class. Make them so. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.031014562@linutronix.de
2021-08-17locking/local_lock: Add missing owner initializationThomas Gleixner
If CONFIG_DEBUG_LOCK_ALLOC=y is enabled then local_lock_t has an 'owner' member which is checked for consistency, but nothing initialized it to zero explicitly. The static initializer does so implicit, and the run time allocated per CPU storage is usually zero initialized as well, but relying on that is not really good practice. Fixes: 91710728d172 ("locking: Introduce local_lock()") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211301.969975279@linutronix.de
2021-08-17Merge tag 'v5.14-rc6' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-08-15Merge tag 'irq-urgent-2021-08-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A set of fixes for PCI/MSI and x86 interrupt startup: - Mask all MSI-X entries when enabling MSI-X otherwise stale unmasked entries stay around e.g. when a crashkernel is booted. - Enforce masking of a MSI-X table entry when updating it, which mandatory according to speification - Ensure that writes to MSI[-X} tables are flushed. - Prevent invalid bits being set in the MSI mask register - Properly serialize modifications to the mask cache and the mask register for multi-MSI. - Cure the violation of the affinity setting rules on X86 during interrupt startup which can cause lost and stale interrupts. Move the initial affinity setting ahead of actualy enabling the interrupt. - Ensure that MSI interrupts are completely torn down before freeing them in the error handling case. - Prevent an array out of bounds access in the irq timings code" * tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: driver core: Add missing kernel doc for device::msi_lock genirq/msi: Ensure deactivation on teardown genirq/timings: Prevent potential array overflow in __irq_timings_store() x86/msi: Force affinity setup before startup x86/ioapic: Force affinity setup before startup genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP PCI/MSI: Protect msi_desc::masked for multi-MSI PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown() PCI/MSI: Correct misleading comments PCI/MSI: Do not set invalid bits in MSI mask PCI/MSI: Enforce MSI[X] entry updates to be visible PCI/MSI: Enforce that MSI-X table entry is masked for update PCI/MSI: Mask all unused MSI-X entries PCI/MSI: Enable and mask MSI-X early
2021-08-13driver core: Add missing kernel doc for device::msi_lockThomas Gleixner
Fixes: 77e89afc25f3 ("PCI/MSI: Protect msi_desc::masked for multi-MSI") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2021-08-12Merge tag 'net-5.14-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from netfilter, bpf, can and ieee802154. The size of this is pretty normal, but we got more fixes for 5.14 changes this week than last week. Nothing major but the trend is the opposite of what we like. We'll see how the next week goes.. Current release - regressions: - r8169: fix ASPM-related link-up regressions - bridge: fix flags interpretation for extern learn fdb entries - phy: micrel: fix link detection on ksz87xx switch - Revert "tipc: Return the correct errno code" - ptp: fix possible memory leak caused by invalid cast Current release - new code bugs: - bpf: add missing bpf_read_[un]lock_trace() for syscall program - bpf: fix potentially incorrect results with bpf_get_local_storage() - page_pool: mask the page->signature before the checking, avoid dma mapping leaks - netfilter: nfnetlink_hook: 5 fixes to information in netlink dumps - bnxt_en: fix firmware interface issues with PTP - mlx5: Bridge, fix ageing time Previous releases - regressions: - linkwatch: fix failure to restore device state across suspend/resume - bareudp: fix invalid read beyond skb's linear data Previous releases - always broken: - bpf: fix integer overflow involving bucket_size - ppp: fix issues when desired interface name is specified via netlink - wwan: mhi_wwan_ctrl: fix possible deadlock - dsa: microchip: ksz8795: fix number of VLAN related bugs - dsa: drivers: fix broken backpressure in .port_fdb_dump - dsa: qca: ar9331: make proper initial port defaults Misc: - bpf: add lockdown check for probe_write_user helper - netfilter: conntrack: remove offload_pickup sysctl before 5.14 is out - netfilter: conntrack: collect all entries in one cycle, heuristically slow down garbage collection scans on idle systems to prevent frequent wake ups" * tag 'net-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits) vsock/virtio: avoid potential deadlock when vsock device remove wwan: core: Avoid returning NULL from wwan_create_dev() net: dsa: sja1105: unregister the MDIO buses during teardown Revert "tipc: Return the correct errno code" net: mscc: Fix non-GPL export of regmap APIs net: igmp: increase size of mr_ifc_count MAINTAINERS: switch to my OMP email for Renesas Ethernet drivers tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets net: pcs: xpcs: fix error handling on failed to allocate memory net: linkwatch: fix failure to restore device state across suspend/resume net: bridge: fix memleak in br_add_if() net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge net: bridge: fix flags interpretation for extern learn fdb entries net: dsa: sja1105: fix broken backpressure in .port_fdb_dump net: dsa: lantiq: fix broken backpressure in .port_fdb_dump net: dsa: lan9303: fix broken backpressure in .port_fdb_dump net: dsa: hellcreek: fix broken backpressure in .port_fdb_dump bpf, core: Fix kernel-doc notation net: igmp: fix data-race in igmp_ifc_timer_expire() net: Fix memory leak in ieee802154_raw_deliver ...
2021-08-11net: igmp: increase size of mr_ifc_countEric Dumazet
Some arches support cmpxchg() on 4-byte and 8-byte only. Increase mr_ifc_count width to 32bit to fix this problem. Fixes: 4a2b285e7e10 ("net: igmp: fix data-race in igmp_ifc_timer_expire()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20210811195715.3684218-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-11vmlinux.lds.h: Handle clang's module.{c,d}tor sectionsNathan Chancellor
A recent change in LLVM causes module_{c,d}tor sections to appear when CONFIG_K{A,C}SAN are enabled, which results in orphan section warnings because these are not handled anywhere: ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_ctor) is being placed in '.text.asan.module_ctor' ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_dtor) is being placed in '.text.asan.module_dtor' ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.tsan.module_ctor) is being placed in '.text.tsan.module_ctor' Fangrui explains: "the function asan.module_ctor has the SHF_GNU_RETAIN flag, so it is in a separate section even with -fno-function-sections (default)". Place them in the TEXT_TEXT section so that these technologies continue to work with the newer compiler versions. All of the KASAN and KCSAN KUnit tests continue to pass after this change. Cc: stable@vger.kernel.org Link: https://github.com/ClangBuiltLinux/linux/issues/1432 Link: https://github.com/llvm/llvm-project/commit/7b789562244ee941b7bf2cefeb3fc08a59a01865 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Fangrui Song <maskray@google.com> Acked-by: Marco Elver <elver@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210731023107.1932981-1-nathan@kernel.org
2021-08-10net: bridge: fix flags interpretation for extern learn fdb entriesNikolay Aleksandrov
Ignore fdb flags when adding port extern learn entries and always set BR_FDB_LOCAL flag when adding bridge extern learn entries. This is closest to the behaviour we had before and avoids breaking any use cases which were allowed. This patch fixes iproute2 calls which assume NUD_PERMANENT and were allowed before, example: $ bridge fdb add 00:11:22:33:44:55 dev swp1 extern_learn Extern learn entries are allowed to roam, but do not expire, so static or dynamic flags make no sense for them. Also add a comment for future reference. Fixes: eb100e0e24a2 ("net: bridge: allow to add externally learned entries from user-space") Fixes: 0541a6293298 ("net: bridge: validate the NUD_PERMANENT bit when adding an extern_learn FDB entry") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20210810110010.43859-1-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-10Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Daniel Borkmann says: ==================== bpf 2021-08-10 We've added 5 non-merge commits during the last 2 day(s) which contain a total of 7 files changed, 27 insertions(+), 15 deletions(-). 1) Fix missing bpf_read_lock_trace() context for BPF loader progs, from Yonghong Song. 2) Fix corner case where BPF prog retrieves wrong local storage, also from Yonghong Song. 3) Restrict availability of BPF write_user helper behind lockdown, from Daniel Borkmann. 4) Fix multiple kernel-doc warnings in BPF core, from Randy Dunlap. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, core: Fix kernel-doc notation bpf: Fix potentially incorrect results with bpf_get_local_storage() bpf: Add missing bpf_read_[un]lock_trace() for syscall program bpf: Add lockdown check for probe_write_user helper bpf: Add _kernel suffix to internal lockdown_bpf_read ==================== Link: https://lore.kernel.org/r/20210810144025.22814-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-10genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUPThomas Gleixner
X86 IO/APIC and MSI interrupts (when used without interrupts remapping) require that the affinity setup on startup is done before the interrupt is enabled for the first time as the non-remapped operation mode cannot safely migrate enabled interrupts from arbitrary contexts. Provide a new irq chip flag which allows affected hardware to request this. This has to be opt-in because there have been reports in the past that some interrupt chips cannot handle affinity setting before startup. Fixes: 18404756765c ("genirq: Expose default irq affinity mask (take 3)") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.779791738@linutronix.de
2021-08-10PCI/MSI: Protect msi_desc::masked for multi-MSIThomas Gleixner
Multi-MSI uses a single MSI descriptor and there is a single mask register when the device supports per vector masking. To avoid reading back the mask register the value is cached in the MSI descriptor and updates are done by clearing and setting bits in the cache and writing it to the device. But nothing protects msi_desc::masked and the mask register from being modified concurrently on two different CPUs for two different Linux interrupts which belong to the same multi-MSI descriptor. Add a lock to struct device and protect any operation on the mask and the mask register with it. This makes the update of msi_desc::masked unconditional, but there is no place which requires a modification of the hardware register without updating the masked cache. msi_mask_irq() is now an empty wrapper which will be cleaned up in follow up changes. The problem goes way back to the initial support of multi-MSI, but picking the commit which introduced the mask cache is a valid cut off point (2.6.30). Fixes: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.726833414@linutronix.de
2021-08-10Merge tag 'mlx5-fixes-2021-08-09' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2021-08-09 This series introduces fixes to mlx5 driver. Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-10bpf: Fix potentially incorrect results with bpf_get_local_storage()Yonghong Song
Commit b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper") fixed a bug for bpf_get_local_storage() helper so different tasks won't mess up with each other's percpu local storage. The percpu data contains 8 slots so it can hold up to 8 contexts (same or different tasks), for 8 different program runs, at the same time. This in general is sufficient. But our internal testing showed the following warning multiple times: [...] warning: WARNING: CPU: 13 PID: 41661 at include/linux/bpf-cgroup.h:193 __cgroup_bpf_run_filter_sock_ops+0x13e/0x180 RIP: 0010:__cgroup_bpf_run_filter_sock_ops+0x13e/0x180 <IRQ> tcp_call_bpf.constprop.99+0x93/0xc0 tcp_conn_request+0x41e/0xa50 ? tcp_rcv_state_process+0x203/0xe00 tcp_rcv_state_process+0x203/0xe00 ? sk_filter_trim_cap+0xbc/0x210 ? tcp_v6_inbound_md5_hash.constprop.41+0x44/0x160 tcp_v6_do_rcv+0x181/0x3e0 tcp_v6_rcv+0xc65/0xcb0 ip6_protocol_deliver_rcu+0xbd/0x450 ip6_input_finish+0x11/0x20 ip6_input+0xb5/0xc0 ip6_sublist_rcv_finish+0x37/0x50 ip6_sublist_rcv+0x1dc/0x270 ipv6_list_rcv+0x113/0x140 __netif_receive_skb_list_core+0x1a0/0x210 netif_receive_skb_list_internal+0x186/0x2a0 gro_normal_list.part.170+0x19/0x40 napi_complete_done+0x65/0x150 mlx5e_napi_poll+0x1ae/0x680 __napi_poll+0x25/0x120 net_rx_action+0x11e/0x280 __do_softirq+0xbb/0x271 irq_exit_rcu+0x97/0xa0 common_interrupt+0x7f/0xa0 </IRQ> asm_common_interrupt+0x1e/0x40 RIP: 0010:bpf_prog_1835a9241238291a_tw_egress+0x5/0xbac ? __cgroup_bpf_run_filter_skb+0x378/0x4e0 ? do_softirq+0x34/0x70 ? ip6_finish_output2+0x266/0x590 ? ip6_finish_output+0x66/0xa0 ? ip6_output+0x6c/0x130 ? ip6_xmit+0x279/0x550 ? ip6_dst_check+0x61/0xd0 [...] Using drgn [0] to dump the percpu buffer contents showed that on this CPU slot 0 is still available, but slots 1-7 are occupied and those tasks in slots 1-7 mostly don't exist any more. So we might have issues in bpf_cgroup_storage_unset(). Further debugging confirmed that there is a bug in bpf_cgroup_storage_unset(). Currently, it tries to unset "current" slot with searching from the start. So the following sequence is possible: 1. A task is running and claims slot 0 2. Running BPF program is done, and it checked slot 0 has the "task" and ready to reset it to NULL (not yet). 3. An interrupt happens, another BPF program runs and it claims slot 1 with the *same* task. 4. The unset() in interrupt context releases slot 0 since it matches "task". 5. Interrupt is done, the task in process context reset slot 0. At the end, slot 1 is not reset and the same process can continue to occupy slots 2-7 and finally, when the above step 1-5 is repeated again, step 3 BPF program won't be able to claim an empty slot and a warning will be issued. To fix the issue, for unset() function, we should traverse from the last slot to the first. This way, the above issue can be avoided. The same reverse traversal should also be done in bpf_get_local_storage() helper itself. Otherwise, incorrect local storage may be returned to BPF program. [0] https://github.com/osandov/drgn Fixes: b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210810010413.1976277-1-yhs@fb.com
2021-08-10bpf: Add lockdown check for probe_write_user helperDaniel Borkmann
Back then, commit 96ae52279594 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers") added the bpf_probe_write_user() helper in order to allow to override user space memory. Its original goal was to have a facility to "debug, divert, and manipulate execution of semi-cooperative processes" under CAP_SYS_ADMIN. Write to kernel was explicitly disallowed since it would otherwise tamper with its integrity. One use case was shown in cf9b1199de27 ("samples/bpf: Add test/example of using bpf_probe_write_user bpf helper") where the program DNATs traffic at the time of connect(2) syscall, meaning, it rewrites the arguments to a syscall while they're still in userspace, and before the syscall has a chance to copy the argument into kernel space. These days we have better mechanisms in BPF for achieving the same (e.g. for load-balancers), but without having to write to userspace memory. Of course the bpf_probe_write_user() helper can also be used to abuse many other things for both good or bad purpose. Outside of BPF, there is a similar mechanism for ptrace(2) such as PTRACE_PEEK{TEXT,DATA} and PTRACE_POKE{TEXT,DATA}, but would likely require some more effort. Commit 96ae52279594 explicitly dedicated the helper for experimentation purpose only. Thus, move the helper's availability behind a newly added LOCKDOWN_BPF_WRITE_USER lockdown knob so that the helper is disabled under the "integrity" mode. More fine-grained control can be implemented also from LSM side with this change. Fixes: 96ae52279594 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org>
2021-08-09net/mlx5: Synchronize correct IRQ when destroying CQShay Drory
The CQ destroy is performed based on the IRQ number that is stored in cq->irqn. That number wasn't set explicitly during CQ creation and as expected some of the API users of mlx5_core_create_cq() forgot to update it. This caused to wrong synchronization call of the wrong IRQ with a number 0 instead of the real one. As a fix, set the IRQ number directly in the mlx5_core_create_cq() and update all users accordingly. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Fixes: ef1659ade359 ("IB/mlx5: Add DEVX support for CQ events") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-09psample: Add a fwd declaration for skbuffRoi Dayan
Without this there is a warning if source files include psample.h before skbuff.h or doesn't include it at all. Fixes: 6ae0a6286171 ("net: Introduce psample, a new genetlink channel for packet sampling") Signed-off-by: Roi Dayan <roid@nvidia.com> Link: https://lore.kernel.org/r/20210808065242.1522535-1-roid@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-09bpf: Add _kernel suffix to internal lockdown_bpf_readDaniel Borkmann
Rename LOCKDOWN_BPF_READ into LOCKDOWN_BPF_READ_KERNEL so we have naming more consistent with a LOCKDOWN_BPF_WRITE_USER option that we are adding. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org>
2021-08-08Merge tag 'tty-5.14-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial fixes from Greg KH: "Here are some small tty/serial driver fixes for 5.14-rc5 to resolve a number of reported problems. They include: - mips serial driver fixes - 8250 driver fixes for reported problems - fsl_lpuart driver fixes - other tiny driver fixes All have been in linux-next for a while with no reported problems" * tag 'tty-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: 8250_pci: Avoid irq sharing for MSI(-X) interrupts. serial: 8250_mtk: fix uart corruption issue when rx power off tty: serial: fsl_lpuart: fix the wrong return value in lpuart32_get_mctrl serial: 8250_pci: Enumerate Elkhart Lake UARTs via dedicated driver serial: 8250: fix handle_irq locking serial: tegra: Only print FIFO error message when an error occurs MIPS: Malta: Do not byte-swap accesses to the CBUS UART serial: 8250: Mask out floating 16/32-bit bus bits serial: max310x: Unprepare and disable clock in error path
2021-08-08Merge tag 'usb-5.14-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB driver fixes from Greg KH: "Here are some small USB driver fixes for 5.14-rc5. They resolve a number of small reported issues, including: - cdnsp driver fixes - usb serial driver fixes and device id updates - usb gadget hid fixes - usb host driver fixes - usb dwc3 driver fixes - other usb gadget driver fixes All of these have been in linux-next for a while with no reported issues" * tag 'usb-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (21 commits) usb: typec: tcpm: Keep other events when receiving FRS and Sourcing_vbus events usb: dwc3: gadget: Avoid runtime resume if disabling pullup usb: dwc3: gadget: Use list_replace_init() before traversing lists USB: serial: ftdi_sio: add device ID for Auto-M3 OP-COM v2 USB: serial: pl2303: fix GT type detection USB: serial: option: add Telit FD980 composition 0x1056 USB: serial: pl2303: fix HX type detection USB: serial: ch341: fix character loss at high transfer rates usb: cdnsp: Fix the IMAN_IE_SET and IMAN_IE_CLEAR macro usb: cdnsp: Fixed issue with ZLP usb: cdnsp: Fix incorrect supported maximum speed usb: cdns3: Fixed incorrect gadget state usb: gadget: f_hid: idle uses the highest byte for duration Revert "thunderbolt: Hide authorized attribute if router does not support PCIe tunnels" usb: otg-fsm: Fix hrtimer list corruption usb: host: ohci-at91: suspend/resume ports after/before OHCI accesses usb: musb: Fix suspend and resume issues for PHYs on I2C and SPI usb: gadget: f_hid: added GET_IDLE and SET_IDLE handlers usb: gadget: f_hid: fixed NULL pointer dereference usb: gadget: remove leaked entry from udc driver list ...
2021-08-08once: Fix panic when module unloadKefeng Wang
DO_ONCE DEFINE_STATIC_KEY_TRUE(___once_key); __do_once_done once_disable_jump(once_key); INIT_WORK(&w->work, once_deferred); struct once_work *w; w->key = key; schedule_work(&w->work); module unload //*the key is destroy* process_one_work once_deferred BUG_ON(!static_key_enabled(work->key)); static_key_count((struct static_key *)x) //*access key, crash* When module uses DO_ONCE mechanism, it could crash due to the above concurrency problem, we could reproduce it with link[1]. Fix it by add/put module refcount in the once work process. [1] https://lore.kernel.org/netdev/eaa6c371-465e-57eb-6be9-f4b16b9d7cbf@huawei.com/ Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Reported-by: Minmin chen <chenmingmin@huawei.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-06Merge tag 'soc-fixes-5.14-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "Lots of small fixes for Arm SoCs this time, nothing too worrying: - omap/beaglebone boot regression fix in gpt12 timer - revert for i.mx8 soc driver breaking as a platform_driver - kexec/kdump fixes for op-tee - various fixes for incorrect DT settings on imx, mvebu, omap, stm32, and tegra causing problems. - device tree fixes for static checks in nomadik, versatile, stm32 - code fixes for issues found in build testing and with static checking on tegra, ixp4xx, imx, omap" * tag 'soc-fixes-5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (36 commits) soc: ixp4xx/qmgr: fix invalid __iomem access soc: ixp4xx: fix printing resources ARM: ixp4xx: goramo_mlr depends on old PCI driver ARM: ixp4xx: fix compile-testing soc drivers soc/tegra: Make regulator couplers depend on CONFIG_REGULATOR ARM: dts: nomadik: Fix up interrupt controller node names ARM: dts: stm32: Fix touchscreen IRQ line assignment on DHCOM ARM: dts: stm32: Disable LAN8710 EDPD on DHCOM ARM: dts: stm32: Prefer HW RTC on DHCOM SoM omap5-board-common: remove not physically existing vdds_1v8_main fixed-regulator ARM: dts: am437x-l4: fix typo in can@0 node ARM: dts: am43x-epos-evm: Reduce i2c0 bus speed for tps65218 bus: ti-sysc: AM3: RNG is GP only ARM: omap2+: hwmod: fix potential NULL pointer access arm64: dts: armada-3720-turris-mox: remove mrvl,i2c-fast-mode arm64: dts: armada-3720-turris-mox: fixed indices for the SDHC controllers ARM: dts: imx: Swap M53Menlo pinctrl_power_button/pinctrl_power_out pins ARM: imx: fix missing 3rd argument in macro imx_mmdc_perf_init ARM: dts: colibri-imx6ull: limit SDIO clock to 25MHz arm64: dts: ls1028: sl28: fix networking for variant 2 ...
2021-08-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Restrict range element expansion in ipset to avoid soft lockup, from Jozsef Kadlecsik. 2) Memleak in error path for nf_conntrack_bridge for IPv4 packets, from Yajun Deng. 3) Simplify conntrack garbage collection strategy to avoid frequent wake-ups, from Florian Westphal. 4) Fix NFNLA_HOOK_FUNCTION_NAME string, do not include module name. 5) Missing chain family netlink attribute in chain description in nfnetlink_hook. 6) Incorrect sequence number on nfnetlink_hook dumps. 7) Use netlink request family in reply message for consistency. 8) Remove offload_pickup sysctl, use conntrack for established state instead, from Florian Westphal. 9) Translate NFPROTO_INET/ingress to NFPROTO_NETDEV/ingress, since NFPROTO_INET is not exposed through nfnetlink_hook. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf: netfilter: nfnetlink_hook: translate inet ingress to netdev netfilter: conntrack: remove offload_pickup sysctl again netfilter: nfnetlink_hook: Use same family as request message netfilter: nfnetlink_hook: use the sequence number of the request message netfilter: nfnetlink_hook: missing chain family netfilter: nfnetlink_hook: strip off module name from hookfn netfilter: conntrack: collect all entries in one cycle netfilter: nf_conntrack_bridge: Fix memory leak when error netfilter: ipset: Limit the maximal range of consecutive elements to add/delete ==================== Link: https://lore.kernel.org/r/20210806151149.6356-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-06netfilter: conntrack: remove offload_pickup sysctl againFlorian Westphal
These two sysctls were added because the hardcoded defaults (2 minutes, tcp, 30 seconds, udp) turned out to be too low for some setups. They appeared in 5.14-rc1 so it should be fine to remove it again. Marcelo convinced me that there should be no difference between a flow that was offloaded vs. a flow that was not wrt. timeout handling. Thus the default is changed to those for TCP established and UDP stream, 5 days and 120 seconds, respectively. Marcelo also suggested to account for the timeout value used for the offloading, this avoids increase beyond the value in the conntrack-sysctl and will also instantly expire the conntrack entry with altered sysctls. Example: nf_conntrack_udp_timeout_stream=60 nf_flowtable_udp_timeout=60 This will remove offloaded udp flows after one minute, rather than two. An earlier version of this patch also cleared the ASSURED bit to allow nf_conntrack to evict the entry via early_drop (i.e., table full). However, it looks like we can safely assume that connection timed out via HW is still in established state, so this isn't needed. Quoting Oz: [..] the hardware sends all packets with a set FIN flags to sw. [..] Connections that are aged in hardware are expected to be in the established state. In case it turns out that back-to-sw-path transition can occur for 'dodgy' connections too (e.g., one side disappeared while software-path would have been in RETRANS timeout), we can adjust this later. Cc: Oz Shlomo <ozsh@nvidia.com> Cc: Paul Blakey <paulb@nvidia.com> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-06netfilter: nfnetlink_hook: missing chain familyPablo Neira Ayuso
The family is relevant for pseudo-families like NFPROTO_INET otherwise the user needs to rely on the hook function name to differentiate it from NFPROTO_IPV4 and NFPROTO_IPV6 names. Add nfnl_hook_chain_desc_attributes instead of using the existing NFTA_CHAIN_* attributes, since these do not provide a family number. Fixes: e2cf17d3774c ("netfilter: add new hook nfnl subsystem") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-05Merge tag 'net-5.14-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from ipsec. Current release - regressions: - sched: taprio: fix init procedure to avoid inf loop when dumping - sctp: move the active_key update after sh_keys is added Current release - new code bugs: - sparx5: fix build with old GCC & bitmask on 32-bit targets Previous releases - regressions: - xfrm: redo the PREEMPT_RT RCU vs hash_resize_mutex deadlock fix - xfrm: fixes for the compat netlink attribute translator - phy: micrel: Fix detection of ksz87xx switch Previous releases - always broken: - gro: set inner transport header offset in tcp/udp GRO hook to avoid crashes when such packets reach GSO - vsock: handle VIRTIO_VSOCK_OP_CREDIT_REQUEST, as required by spec - dsa: sja1105: fix static FDB entries on SJA1105P/Q/R/S and SJA1110 - bridge: validate the NUD_PERMANENT bit when adding an extern_learn FDB entry - usb: lan78xx: don't modify phy_device state concurrently - usb: pegasus: check for errors of IO routines" * tag 'net-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (48 commits) net: vxge: fix use-after-free in vxge_device_unregister net: fec: fix use-after-free in fec_drv_remove net: pegasus: fix uninit-value in get_interrupt_interval net: ethernet: ti: am65-cpsw: fix crash in am65_cpsw_port_offload_fwd_mark_update() bnx2x: fix an error code in bnx2x_nic_load() net: wwan: iosm: fix recursive lock acquire in unregister net: wwan: iosm: correct data protocol mask bit net: wwan: iosm: endianness type correction net: wwan: iosm: fix lkp buildbot warning net: usb: lan78xx: don't modify phy_device state concurrently docs: networking: netdevsim rules net: usb: pegasus: Remove the changelog and DRIVER_VERSION. net: usb: pegasus: Check the return value of get_geristers() and friends; net/prestera: Fix devlink groups leakage in error flow net: sched: fix lockdep_set_class() typo error for sch->seqlock net: dsa: qca: ar9331: reorder MDIO write sequence VSOCK: handle VIRTIO_VSOCK_OP_CREDIT_REQUEST mptcp: drop unused rcu member in mptcp_pm_addr_entry net: ipv6: fix returned variable type in ip6_skb_dst_mtu nfp: update ethtool reporting of pauseframe control ...
2021-08-05Bluetooth: defer cleanup of resources in hci_unregister_dev()Tetsuo Handa
syzbot is hitting might_sleep() warning at hci_sock_dev_event() due to calling lock_sock() with rw spinlock held [1]. It seems that history of this locking problem is a trial and error. Commit b40df5743ee8 ("[PATCH] bluetooth: fix socket locking in hci_sock_dev_event()") in 2.6.21-rc4 changed bh_lock_sock() to lock_sock() as an attempt to fix lockdep warning. Then, commit 4ce61d1c7a8e ("[BLUETOOTH]: Fix locking in hci_sock_dev_event().") in 2.6.22-rc2 changed lock_sock() to local_bh_disable() + bh_lock_sock_nested() as an attempt to fix the sleep in atomic context warning. Then, commit 4b5dd696f81b ("Bluetooth: Remove local_bh_disable() from hci_sock.c") in 3.3-rc1 removed local_bh_disable(). Then, commit e305509e678b ("Bluetooth: use correct lock to prevent UAF of hdev object") in 5.13-rc5 again changed bh_lock_sock_nested() to lock_sock() as an attempt to fix CVE-2021-3573. This difficulty comes from current implementation that hci_sock_dev_event(HCI_DEV_UNREG) is responsible for dropping all references from sockets because hci_unregister_dev() immediately reclaims resources as soon as returning from hci_sock_dev_event(HCI_DEV_UNREG). But the history suggests that hci_sock_dev_event(HCI_DEV_UNREG) was not doing what it should do. Therefore, instead of trying to detach sockets from device, let's accept not detaching sockets from device at hci_sock_dev_event(HCI_DEV_UNREG), by moving actual cleanup of resources from hci_unregister_dev() to hci_cleanup_dev() which is called by bt_host_release() when all references to this unregistered device (which is a kobject) are gone. Since hci_sock_dev_event(HCI_DEV_UNREG) no longer resets hci_pi(sk)->hdev, we need to check whether this device was unregistered and return an error based on HCI_UNREGISTER flag. There might be subtle behavioral difference in "monitor the hdev" functionality; please report if you found something went wrong due to this patch. Link: https://syzkaller.appspot.com/bug?extid=a5df189917e79d5e59c9 [1] Reported-by: syzbot <syzbot+a5df189917e79d5e59c9@syzkaller.appspotmail.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: e305509e678b ("Bluetooth: use correct lock to prevent UAF of hdev object") Acked-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-04locking/atomic: simplify non-atomic wrappersMark Rutland
Since the non-atomic arch_*() bitops use plain accesses, they are implicitly instrumnted by the compiler, and we work around this in the instrumented wrappers to avoid double instrumentation. It's simpler to avoid the wrappers entirely, and use the preprocessor to alias the arch_*() bitops to their regular versions, removing the need for checks in the instrumented wrappers. Suggested-by: Marco Elver <elver@google.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Marco Elver <elver@google.com> Link: https://lore.kernel.org/r/20210721155813.17082-1-mark.rutland@arm.com
2021-08-04Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2021-08-04 1) Fix a sysbot reported memory leak in xfrm_user_rcv_msg. From Pavel Skripkin. 2) Revert "xfrm: policy: Read seqcount outside of rcu-read side in xfrm_policy_lookup_bytype". This commit tried to fix a lockin bug, but only cured some of the symptoms. A proper fix is applied on top of this revert. 3) Fix a locking bug on xfrm state hash resize. A recent change on sequence counters accidentally repaced a spinlock by a mutex. Fix from Frederic Weisbecker. 4) Fix possible user-memory-access in xfrm_user_rcv_msg_compat(). From Dmitry Safonov. 5) Add initialiation sefltest fot xfrm_spdattr_type_t. From Dmitry Safonov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-04netfilter: ipset: Limit the maximal range of consecutive elements to add/deleteJozsef Kadlecsik
The range size of consecutive elements were not limited. Thus one could define a huge range which may result soft lockup errors due to the long execution time. Now the range size is limited to 2^20 entries. Reported-by: Brad Spengler <spender@grsecurity.net> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-03net: ipv6: fix returned variable type in ip6_skb_dst_mtuAntoine Tenart
The patch fixing the returned value of ip6_skb_dst_mtu (int -> unsigned int) was rebased between its initial review and the version applied. In the meantime fade56410c22 was applied, which added a new variable (int) used as the returned value. This lead to a mismatch between the function prototype and the variable used as the return value. Fixes: 40fc3054b458 ("net: ipv6: fix return value of ip6_skb_dst_mtu") Cc: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-03net: sched: provide missing kdoc for tcf_pkt_info and tcf_ematch_opsBijie Xu
Provide missing kdoc of fields of struct tcf_pkt_info and tcf_ematch_ops. Found using ./scripts/kernel-doc -none -Werror include/net/pkt_cls.h Signed-off-by: Bijie Xu <bijie.xu@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-03net: flow_offload: correct comments mismatch with codeBijie Xu
Correct mismatch between the name of flow_offload_has_one_action() and its kdoc entry. Found using ./scripts/kernel-doc -Werror -none include/net/flow_offload.h Signed-off-by: Bijie Xu <bijie.xu@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-03net: really fix the build...David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-02Revert "mhi: Fix networking tree build."Jakub Kicinski
This reverts commit 40e159403896f7d55c98f858d0b20fee1d941fa4. Looks like this commit breaks the build for me. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-02Merge tag 'tee-kexec-fixes-for-v5.14' of ↵Arnd Bergmann
git://git.linaro.org:/people/jens.wiklander/linux-tee into arm/fixes tee: Improve support for kexec and kdump This fixes several bugs uncovered while exercising the OP-TEE, ftpm (firmware TPM), and tee_bnxt_fw (Broadcom BNXT firmware manager) drivers with kexec and kdump (emergency kexec) based workflows. * tag 'tee-kexec-fixes-for-v5.14' of git://git.linaro.org:/people/jens.wiklander/linux-tee: firmware: tee_bnxt: Release TEE shm, session, and context during kexec tpm_ftpm_tee: Free and unregister TEE shared memory during kexec tee: Correct inappropriate usage of TEE_SHM_DMA_BUF flag tee: add tee_shm_alloc_kernel_buf() optee: Clear stale cache entries during initialization optee: fix tee out of memory failure seen during kexec reboot optee: Refuse to load the driver under the kdump kernel optee: Fix memory leak when failing to register shm pages Link: https://lore.kernel.org/r/20210726081039.GA2482361@jade Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-08-02mhi: Fix networking tree build.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-30Merge tag 'net-5.14-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.14-rc4, including fixes from bpf, can, WiFi (mac80211) and netfilter trees. Current release - regressions: - mac80211: fix starting aggregation sessions on mesh interfaces Current release - new code bugs: - sctp: send pmtu probe only if packet loss in Search Complete state - bnxt_en: add missing periodic PHC overflow check - devlink: fix phys_port_name of virtual port and merge error - hns3: change the method of obtaining default ptp cycle - can: mcba_usb_start(): add missing urb->transfer_dma initialization Previous releases - regressions: - set true network header for ECN decapsulation - mlx5e: RX, avoid possible data corruption w/ relaxed ordering and LRO - phy: re-add check for PHY_BRCM_DIS_TXCRXC_NOENRGY on the BCM54811 PHY - sctp: fix return value check in __sctp_rcv_asconf_lookup Previous releases - always broken: - bpf: - more spectre corner case fixes, introduce a BPF nospec instruction for mitigating Spectre v4 - fix OOB read when printing XDP link fdinfo - sockmap: fix cleanup related races - mac80211: fix enabling 4-address mode on a sta vif after assoc - can: - raw: raw_setsockopt(): fix raw_rcv panic for sock UAF - j1939: j1939_session_deactivate(): clarify lifetime of session object, avoid UAF - fix number of identical memory leaks in USB drivers - tipc: - do not blindly write skb_shinfo frags when doing decryption - fix sleeping in tipc accept routine" * tag 'net-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits) gve: Update MAINTAINERS list can: esd_usb2: fix memory leak can: ems_usb: fix memory leak can: usb_8dev: fix memory leak can: mcba_usb_start(): add missing urb->transfer_dma initialization can: hi311x: fix a signedness bug in hi3110_cmd() MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver bpf: Fix leakage due to insufficient speculative store bypass mitigation bpf: Introduce BPF nospec instruction for mitigating Spectre v4 sis900: Fix missing pci_disable_device() in probe and remove net: let flow have same hash in two directions nfc: nfcsim: fix use after free during module unload tulip: windbond-840: Fix missing pci_disable_device() in probe and remove sctp: fix return value check in __sctp_rcv_asconf_lookup nfc: s3fwrn5: fix undefined parameter values in dev_err() net/mlx5: Fix mlx5_vport_tbl_attr chain from u16 to u32 net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev() net/mlx5: Unload device upon firmware fatal error net/mlx5e: Fix page allocation failure for ptp-RQ over SF net/mlx5e: Fix page allocation failure for trap-RQ over SF ...
2021-07-30Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Jiri Kosina: - resume timing fix for intel-ish driver (Ye Xiang) - fix for using incorrect MMIO register in amd_sfh driver (Dylan MacKenzie) - Cintiq 24HDT / 27QHDT regression fix and touch processing fix for Wacom driver (Jason Gerecke) - device removal bugfix for ft260 driver (Michael Zaidman) - other small assorted fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: ft260: fix device removal due to USB disconnect HID: wacom: Skip processing of touches with negative slot values HID: wacom: Re-enable touch by default for Cintiq 24HDT / 27QHDT HID: Kconfig: Fix spelling mistake "Uninterruptable" -> "Uninterruptible" HID: apple: Add support for Keychron K1 wireless keyboard HID: fix typo in Kconfig HID: ft260: fix format type warning in ft260_word_show() HID: amd_sfh: Use correct MMIO register for DMA address HID: asus: Remove check for same LED brightness on set HID: intel-ish-hid: use async resume function
2021-07-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2021-07-29 The following pull-request contains BPF updates for your *net* tree. We've added 9 non-merge commits during the last 14 day(s) which contain a total of 20 files changed, 446 insertions(+), 138 deletions(-). The main changes are: 1) Fix UBSAN out-of-bounds splat for showing XDP link fdinfo, from Lorenz Bauer. 2) Fix insufficient Spectre v4 mitigation in BPF runtime, from Daniel Borkmann, Piotr Krysiuk and Benedict Schlueter. 3) Batch of fixes for BPF sockmap found under stress testing, from John Fastabend. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>