summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2021-08-17locking/rtmutex: Add mutex variant for RTThomas Gleixner
Add the necessary defines, helpers and API functions for replacing struct mutex on a PREEMPT_RT enabled kernel with an rtmutex based variant. No functional change when CONFIG_PREEMPT_RT=n Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211305.081517417@linutronix.de
2021-08-17locking/ww_mutex: Implement rtmutex based ww_mutex API functionsPeter Zijlstra
Add the actual ww_mutex API functions which replace the mutex based variant on RT enabled kernels. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211305.024057938@linutronix.de
2021-08-17locking/rtmutex: Extend the rtmutex core to support ww_mutexPeter Zijlstra
Add a ww acquire context pointer to the waiter and various functions and add the ww_mutex related invocations to the proper spots in the locking code, similar to the mutex based variant. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.966139174@linutronix.de
2021-08-17locking/ww_mutex: Add rt_mutex based lock type and accessorsPeter Zijlstra
Provide the defines for RT mutex based ww_mutexes and fix up the debug logic so it's either enabled by DEBUG_MUTEXES or DEBUG_RT_MUTEXES on RT kernels. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.908012566@linutronix.de
2021-08-17locking/ww_mutex: Add RT priority to W/W orderPeter Zijlstra
RT mutex based ww_mutexes cannot order based on timestamps. They have to order based on priority. Add the necessary decision logic. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.847536630@linutronix.de
2021-08-17locking/ww_mutex: Implement rt_mutex accessorsPeter Zijlstra
Provide the type defines and the helper inlines for rtmutex based ww_mutexes. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.790760545@linutronix.de
2021-08-17locking/ww_mutex: Abstract out internal lock accessesThomas Gleixner
Accessing the internal wait_lock of mutex and rtmutex is slightly different. Provide helper functions for that. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.734635961@linutronix.de
2021-08-17locking/ww_mutex: Abstract out mutex typesPeter Zijlstra
Some ww_mutex helper functions use pointers for the underlying mutex and mutex_waiter. The upcoming rtmutex based implementation needs to share these functions. Add and use defines for the types and replace the direct types in the affected functions. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.678720245@linutronix.de
2021-08-17locking/ww_mutex: Abstract out mutex accessorsPeter Zijlstra
Move the mutex related access from various ww_mutex functions into helper functions so they can be substituted for rtmutex based ww_mutex later. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.622477030@linutronix.de
2021-08-17locking/ww_mutex: Abstract out waiter enqueueingPeter Zijlstra
The upcoming rtmutex based ww_mutex needs a different handling for enqueueing a waiter. Split it out into a helper function. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.566318143@linutronix.de
2021-08-17locking/ww_mutex: Abstract out the waiter iterationPeter Zijlstra
Split out the waiter iteration functions so they can be substituted for a rtmutex based ww_mutex later. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.509186185@linutronix.de
2021-08-17locking/ww_mutex: Remove the __sched annotation from ww_mutex APIsPeter Zijlstra
None of these functions will be on the stack when blocking in schedule(), hence __sched is not needed. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.453235952@linutronix.de
2021-08-17locking/ww_mutex: Split out the W/W implementation logic into ↵Peter Zijlstra (Intel)
kernel/locking/ww_mutex.h Split the W/W mutex helper functions out into a separate header file, so they can be shared with a rtmutex based variant later. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.396893399@linutronix.de
2021-08-17locking/ww_mutex: Split up ww_mutex_unlock()Peter Zijlstra (Intel)
Split the ww related part out into a helper function so it can be reused for a rtmutex based ww_mutex implementation. [ mingo: Fixed bisection failure. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.340166556@linutronix.de
2021-08-17locking/ww_mutex: Gather mutex_waiter initializationPeter Zijlstra
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.281927514@linutronix.de
2021-08-17locking/ww_mutex: Simplify lockdep annotationsPeter Zijlstra
No functional change. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.222921634@linutronix.de
2021-08-17locking/mutex: Make mutex::wait_lock rawThomas Gleixner
The wait_lock of mutex is really a low level lock. Convert it to a raw_spinlock like the wait_lock of rtmutex. [ mingo: backmerged the test_lockup.c build fix by bigeasy. ] Co-developed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.166863404@linutronix.de
2021-08-17locking/mutex: Move the 'struct mutex_waiter' definition from ↵Thomas Gleixner
<linux/mutex.h> to the internal header Move the mutex waiter declaration from the public <linux/mutex.h> header to the internal kernel/locking/mutex.h header. There is no reason to expose it outside of the core code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.054325923@linutronix.de
2021-08-17locking/mutex: Consolidate core headers, remove kernel/locking/mutex-debug.hThomas Gleixner
Having two header files which contain just the non-debug and debug variants is mostly waste of disc space and has no real value. Stick the debug variants into the common mutex.h file as counterpart to the stubs for the non-debug case. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.995350521@linutronix.de
2021-08-17locking/rtmutex: Squash !RT tasks to DEFAULT_PRIOPeter Zijlstra
Ensure all !RT tasks have the same prio such that they end up in FIFO order and aren't split up according to nice level. The reason why nice levels were taken into account so far is historical. In the early days of the rtmutex code it was done to give the PI boosting and deboosting a larger coverage. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.938676930@linutronix.de
2021-08-17locking/rwlock: Provide RT variantThomas Gleixner
Similar to rw_semaphores, on RT the rwlock substitution is not writer fair, because it's not feasible to have a writer inherit its priority to multiple readers. Readers blocked on a writer follow the normal rules of priority inheritance. Like RT spinlocks, RT rwlocks are state preserving across the slow lock operations (contended case). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.882793524@linutronix.de
2021-08-17locking/spinlock: Provide RT variantThomas Gleixner
Provide the actual locking functions which make use of the general and spinlock specific rtmutex code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.826621464@linutronix.de
2021-08-17locking/rtmutex: Provide the spin/rwlock core lock functionThomas Gleixner
A simplified version of the rtmutex slowlock function, which neither handles signals nor timeouts, and is careful about preserving the state of the blocked task across the lock operation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.770228446@linutronix.de
2021-08-17locking/rtmutex: Guard regular sleeping locks specific functionsThomas Gleixner
Guard the regular sleeping lock specific functionality, which is used for rtmutex on non-RT enabled kernels and for mutex, rtmutex and semaphores on RT enabled kernels so the code can be reused for the RT specific implementation of spinlocks and rwlocks in a different compilation unit. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.311535693@linutronix.de
2021-08-17locking/rtmutex: Prepare RT rt_mutex_wake_q for RT locksThomas Gleixner
Add an rtlock_task pointer to rt_mutex_wake_q, which allows to handle the RT specific wakeup for spin/rwlock waiters. The pointer is just consuming 4/8 bytes on the stack so it is provided unconditionaly to avoid #ifdeffery all over the place. This cannot use a regular wake_q, because a task can have concurrent wakeups which would make it miss either lock or the regular wakeups, depending on what gets queued first, unless task struct gains a separate wake_q_node for this, which would be overkill, because there can only be a single task which gets woken up in the spin/rw_lock unlock path. No functional change for non-RT enabled kernels. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.253614678@linutronix.de
2021-08-17locking/rtmutex: Use rt_mutex_wake_q_headThomas Gleixner
Prepare for the required state aware handling of waiter wakeups via wake_q and switch the rtmutex code over to the rtmutex specific wrapper. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.197113263@linutronix.de
2021-08-17locking/rtmutex: Provide rt_wake_q_head and helpersThomas Gleixner
To handle the difference between wakeups for regular sleeping locks (mutex, rtmutex, rw_semaphore) and the wakeups for 'sleeping' spin/rwlocks on PREEMPT_RT enabled kernels correctly, it is required to provide a wake_q_head construct which allows to keep them separate. Provide a wrapper around wake_q_head and the required helpers, which will be extended with the state handling later. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.139337655@linutronix.de
2021-08-17locking/rtmutex: Add wake_state to rt_mutex_waiterThomas Gleixner
Regular sleeping locks like mutexes, rtmutexes and rw_semaphores are always entering and leaving a blocking section with task state == TASK_RUNNING. On a non-RT kernel spinlocks and rwlocks never affect the task state, but on RT kernels these locks are converted to rtmutex based 'sleeping' locks. So in case of contention the task goes to block, which requires to carefully preserve the task state, and restore it after acquiring the lock taking regular wakeups for the task into account, which happened while the task was blocked. This state preserving is achieved by having a separate task state for blocking on a RT spin/rwlock and a saved_state field in task_struct along with careful handling of these wakeup scenarios in try_to_wake_up(). To avoid conditionals in the rtmutex code, store the wake state which has to be used for waking a lock waiter in rt_mutex_waiter which allows to handle the regular and RT spin/rwlocks by handing it to wake_up_state(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.079800739@linutronix.de
2021-08-17locking/rwsem: Add rtmutex based R/W semaphore implementationThomas Gleixner
The RT specific R/W semaphore implementation used to restrict the number of readers to one, because a writer cannot block on multiple readers and inherit its priority or budget. The single reader restricting was painful in various ways: - Performance bottleneck for multi-threaded applications in the page fault path (mmap sem) - Progress blocker for drivers which are carefully crafted to avoid the potential reader/writer deadlock in mainline. The analysis of the writer code paths shows that properly written RT tasks should not take them. Syscalls like mmap(), file access which take mmap sem write locked have unbound latencies, which are completely unrelated to mmap sem. Other R/W sem users like graphics drivers are not suitable for RT tasks either. So there is little risk to hurt RT tasks when the RT rwsem implementation is done in the following way: - Allow concurrent readers - Make writers block until the last reader left the critical section. This blocking is not subject to priority/budget inheritance. - Readers blocked on a writer inherit their priority/budget in the normal way. There is a drawback with this scheme: R/W semaphores become writer unfair though the applications which have triggered writer starvation (mostly on mmap_sem) in the past are not really the typical workloads running on a RT system. So while it's unlikely to hit writer starvation, it's possible. If there are unexpected workloads on RT systems triggering it, the problem has to be revisited. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.016885947@linutronix.de
2021-08-17locking/rt: Add base code for RT rw_semaphore and rwlockThomas Gleixner
On PREEMPT_RT, rw_semaphores and rwlocks are substituted with an rtmutex and a reader count. The implementation is writer unfair, as it is not feasible to do priority inheritance on multiple readers, but experience has shown that real-time workloads are not the typical workloads which are sensitive to writer starvation. The inner workings of rw_semaphores and rwlocks on RT are almost identical except for the task state and signal handling. rw_semaphores are not state preserving over a contention, they are expected to enter and leave with state == TASK_RUNNING. rwlocks have a mechanism to preserve the state of the task at entry and restore it after unblocking taking potential non-lock related wakeups into account. rw_semaphores can also be subject to signal handling interrupting a blocked state, while rwlocks ignore signals. To avoid code duplication, provide a shared implementation which takes the small difference vs. state and signals into account. The code is included into the relevant rw_semaphore/rwlock base code and compiled for each use case separately. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.957920571@linutronix.de
2021-08-17locking/rtmutex: Provide rt_mutex_slowlock_locked()Thomas Gleixner
Split the inner workings of rt_mutex_slowlock() out into a separate function, which can be reused by the upcoming RT lock substitutions, e.g. for rw_semaphores. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.841971086@linutronix.de
2021-08-17locking/rtmutex: Split out the inner parts of 'struct rtmutex'Peter Zijlstra
RT builds substitutions for rwsem, mutex, spinlock and rwlock around rtmutexes. Split the inner working out so each lock substitution can use them with the appropriate lockdep annotations. This avoids having an extra unused lockdep map in the wrapped rtmutex. No functional change. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.784739994@linutronix.de
2021-08-17locking/rtmutex: Split API from implementationThomas Gleixner
Prepare for reusing the inner functions of rtmutex for RT lock substitutions: introduce kernel/locking/rtmutex_api.c and move them there. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.726560996@linutronix.de
2021-08-17locking/rtmutex: Switch to from cmpxchg_*() to try_cmpxchg_*()Thomas Gleixner
Allows the compiler to generate better code depending on the architecture. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.668958502@linutronix.de
2021-08-17locking/rtmutex: Convert macros to inlinesSebastian Andrzej Siewior
Inlines are type-safe... Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.610830960@linutronix.de
2021-08-17sched/core: Provide a scheduling point for RT locksThomas Gleixner
RT enabled kernels substitute spin/rwlocks with 'sleeping' variants based on rtmutexes. Blocking on such a lock is similar to preemption versus: - I/O scheduling and worker handling, because these functions might block on another substituted lock, or come from a lock contention within these functions. - RCU considers this like a preemption, because the task might be in a read side critical section. Add a separate scheduling point for this, and hand a new scheduling mode argument to __schedule() which allows, along with separate mode masks, to handle this gracefully from within the scheduler, without proliferating that to other subsystems like RCU. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.372319055@linutronix.de
2021-08-17sched/core: Rework the __schedule() preempt argumentThomas Gleixner
PREEMPT_RT needs to hand a special state into __schedule() when a task blocks on a 'sleeping' spin/rwlock. This is required to handle rcu_note_context_switch() correctly without having special casing in the RCU code. From an RCU point of view the blocking on the sleeping spinlock is equivalent to preemption, because the task might be in a read side critical section. schedule_debug() also has a check which would trigger with the !preempt case, but that could be handled differently. To avoid adding another argument and extra checks which cannot be optimized out by the compiler, the following solution has been chosen: - Replace the boolean 'preempt' argument with an unsigned integer 'sched_mode' argument and define constants to hand in: (0 == no preemption, 1 = preemption). - Add two masks to apply on that mode: one for the debug/rcu invocations, and one for the actual scheduling decision. For a non RT kernel these masks are UINT_MAX, i.e. all bits are set, which allows the compiler to optimize the AND operation out, because it is not masking out anything. IOW, it's not different from the boolean. RT enabled kernels will define these masks separately. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.315473019@linutronix.de
2021-08-17sched/wakeup: Prepare for RT sleeping spin/rwlocksThomas Gleixner
Waiting for spinlocks and rwlocks on non RT enabled kernels is task::state preserving. Any wakeup which matches the state is valid. RT enabled kernels substitutes them with 'sleeping' spinlocks. This creates an issue vs. task::__state. In order to block on the lock, the task has to overwrite task::__state and a consecutive wakeup issued by the unlocker sets the state back to TASK_RUNNING. As a consequence the task loses the state which was set before the lock acquire and also any regular wakeup targeted at the task while it is blocked on the lock. To handle this gracefully, add a 'saved_state' member to task_struct which is used in the following way: 1) When a task blocks on a 'sleeping' spinlock, the current state is saved in task::saved_state before it is set to TASK_RTLOCK_WAIT. 2) When the task unblocks and after acquiring the lock, it restores the saved state. 3) When a regular wakeup happens for a task while it is blocked then the state change of that wakeup is redirected to operate on task::saved_state. This is also required when the task state is running because the task might have been woken up from the lock wait and has not yet restored the saved state. To make it complete, provide the necessary helpers to save and restore the saved state along with the necessary documentation how the RT lock blocking is supposed to work. For non-RT kernels there is no functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.258751046@linutronix.de
2021-08-17sched/wakeup: Split out the wakeup ->__state checkThomas Gleixner
RT kernels have a slightly more complicated handling of wakeups due to 'sleeping' spin/rwlocks. If a task is blocked on such a lock then the original state of the task is preserved over the blocking period, and any regular (non lock related) wakeup has to be targeted at the saved state to ensure that these wakeups are not lost. Once the task acquires the lock it restores the task state from the saved state. To avoid cluttering try_to_wake_up() with that logic, split the wakeup state check out into an inline helper and use it at both places where task::__state is checked against the state argument of try_to_wake_up(). No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.088945085@linutronix.de
2021-08-17locking/rtmutex: Set proper wait context for lockdepThomas Gleixner
RT mutexes belong to the LD_WAIT_SLEEP class. Make them so. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211302.031014562@linutronix.de
2021-08-17Merge tag 'v5.14-rc6' into locking/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-08-15Merge tag 'irq-urgent-2021-08-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A set of fixes for PCI/MSI and x86 interrupt startup: - Mask all MSI-X entries when enabling MSI-X otherwise stale unmasked entries stay around e.g. when a crashkernel is booted. - Enforce masking of a MSI-X table entry when updating it, which mandatory according to speification - Ensure that writes to MSI[-X} tables are flushed. - Prevent invalid bits being set in the MSI mask register - Properly serialize modifications to the mask cache and the mask register for multi-MSI. - Cure the violation of the affinity setting rules on X86 during interrupt startup which can cause lost and stale interrupts. Move the initial affinity setting ahead of actualy enabling the interrupt. - Ensure that MSI interrupts are completely torn down before freeing them in the error handling case. - Prevent an array out of bounds access in the irq timings code" * tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: driver core: Add missing kernel doc for device::msi_lock genirq/msi: Ensure deactivation on teardown genirq/timings: Prevent potential array overflow in __irq_timings_store() x86/msi: Force affinity setup before startup x86/ioapic: Force affinity setup before startup genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP PCI/MSI: Protect msi_desc::masked for multi-MSI PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown() PCI/MSI: Correct misleading comments PCI/MSI: Do not set invalid bits in MSI mask PCI/MSI: Enforce MSI[X] entry updates to be visible PCI/MSI: Enforce that MSI-X table entry is masked for update PCI/MSI: Mask all unused MSI-X entries PCI/MSI: Enable and mask MSI-X early
2021-08-15Merge tag 'locking_urgent_for_v5.14_rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Borislav Petkov: - Fix a CONFIG symbol's spelling * tag 'locking_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rtmutex: Use the correct rtmutex debugging config option
2021-08-12Merge tag 'net-5.14-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from netfilter, bpf, can and ieee802154. The size of this is pretty normal, but we got more fixes for 5.14 changes this week than last week. Nothing major but the trend is the opposite of what we like. We'll see how the next week goes.. Current release - regressions: - r8169: fix ASPM-related link-up regressions - bridge: fix flags interpretation for extern learn fdb entries - phy: micrel: fix link detection on ksz87xx switch - Revert "tipc: Return the correct errno code" - ptp: fix possible memory leak caused by invalid cast Current release - new code bugs: - bpf: add missing bpf_read_[un]lock_trace() for syscall program - bpf: fix potentially incorrect results with bpf_get_local_storage() - page_pool: mask the page->signature before the checking, avoid dma mapping leaks - netfilter: nfnetlink_hook: 5 fixes to information in netlink dumps - bnxt_en: fix firmware interface issues with PTP - mlx5: Bridge, fix ageing time Previous releases - regressions: - linkwatch: fix failure to restore device state across suspend/resume - bareudp: fix invalid read beyond skb's linear data Previous releases - always broken: - bpf: fix integer overflow involving bucket_size - ppp: fix issues when desired interface name is specified via netlink - wwan: mhi_wwan_ctrl: fix possible deadlock - dsa: microchip: ksz8795: fix number of VLAN related bugs - dsa: drivers: fix broken backpressure in .port_fdb_dump - dsa: qca: ar9331: make proper initial port defaults Misc: - bpf: add lockdown check for probe_write_user helper - netfilter: conntrack: remove offload_pickup sysctl before 5.14 is out - netfilter: conntrack: collect all entries in one cycle, heuristically slow down garbage collection scans on idle systems to prevent frequent wake ups" * tag 'net-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits) vsock/virtio: avoid potential deadlock when vsock device remove wwan: core: Avoid returning NULL from wwan_create_dev() net: dsa: sja1105: unregister the MDIO buses during teardown Revert "tipc: Return the correct errno code" net: mscc: Fix non-GPL export of regmap APIs net: igmp: increase size of mr_ifc_count MAINTAINERS: switch to my OMP email for Renesas Ethernet drivers tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets net: pcs: xpcs: fix error handling on failed to allocate memory net: linkwatch: fix failure to restore device state across suspend/resume net: bridge: fix memleak in br_add_if() net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge net: bridge: fix flags interpretation for extern learn fdb entries net: dsa: sja1105: fix broken backpressure in .port_fdb_dump net: dsa: lantiq: fix broken backpressure in .port_fdb_dump net: dsa: lan9303: fix broken backpressure in .port_fdb_dump net: dsa: hellcreek: fix broken backpressure in .port_fdb_dump bpf, core: Fix kernel-doc notation net: igmp: fix data-race in igmp_ifc_timer_expire() net: Fix memory leak in ieee802154_raw_deliver ...
2021-08-12Merge branch 'for-v5.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ucounts fix from Eric Biederman: "This fixes the ucount sysctls on big endian architectures. The counts were expanded to be longs instead of ints, and the sysctl code was overlooked, so only the low 32bit were being processed. On litte endian just processing the low 32bits is fine, but on 64bit big endian processing just the low 32bits results in the high order bits instead of the low order bits being processed and nothing works proper. This change took a little bit to mature as we have the SYSCTL_ZERO, and SYSCTL_INT_MAX macros that are only usable for sysctls operating on ints, but unfortunately are not obviously broken. Which resulted in the versions of this change working on big endian and not on little endian, because the int SYSCTL_ZERO when extended 64bit wound up being 0x100000000. So we only allowed values greater than 0x100000000 and less than 0faff. Which unfortunately broken everything that tried to set the sysctls. (First reported with the windows subsystem for linux). I have tested this on x86_64 64bit after first reproducing the problems with the earlier version of this change, and then verifying the problems do not exist when we use appropriate long min and max values for extra1 and extra2" * 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ucounts: add missing data type changes
2021-08-11Merge tag 'seccomp-v5.14-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp fixes from Kees Cook: - Fix typo in user notification documentation (Rodrigo Campos) - Fix userspace counter report when using TSYNC (Hsuan-Chi Kuo, Wiktor Garbacz) * tag 'seccomp-v5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: seccomp: Fix setting loaded filter count during TSYNC Documentation: seccomp: Fix typo in user notification
2021-08-11seccomp: Fix setting loaded filter count during TSYNCHsuan-Chi Kuo
The desired behavior is to set the caller's filter count to thread's. This value is reported via /proc, so this fixes the inaccurate count exposed to userspace; it is not used for reference counting, etc. Signed-off-by: Hsuan-Chi Kuo <hsuanchikuo@gmail.com> Link: https://lore.kernel.org/r/20210304233708.420597-1-hsuanchikuo@gmail.com Co-developed-by: Wiktor Garbacz <wiktorg@google.com> Signed-off-by: Wiktor Garbacz <wiktorg@google.com> Link: https://lore.kernel.org/lkml/20210810125158.329849-1-wiktorg@google.com Signed-off-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Fixes: c818c03b661c ("seccomp: Report number of loaded filters in /proc/$pid/status")
2021-08-10genirq/msi: Ensure deactivation on teardownBixuan Cui
msi_domain_alloc_irqs() invokes irq_domain_activate_irq(), but msi_domain_free_irqs() does not enforce deactivation before tearing down the interrupts. This happens when PCI/MSI interrupts are set up and never used before being torn down again, e.g. in error handling pathes. The only place which cleans that up is the error handling path in msi_domain_alloc_irqs(). Move the cleanup from msi_domain_alloc_irqs() into msi_domain_free_irqs() to cure that. Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early") Signed-off-by: Bixuan Cui <cuibixuan@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210518033117.78104-1-cuibixuan@huawei.com
2021-08-10genirq/timings: Prevent potential array overflow in __irq_timings_store()Ben Dai
When the interrupt interval is greater than 2 ^ PREDICTION_BUFFER_SIZE * PREDICTION_FACTOR us and less than 1s, the calculated index will be greater than the length of irqs->ema_time[]. Check the calculated index before using it to prevent array overflow. Fixes: 23aa3b9a6b7d ("genirq/timings: Encapsulate storing function") Signed-off-by: Ben Dai <ben.dai@unisoc.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210425150903.25456-1-ben.dai9703@gmail.com
2021-08-10bpf, core: Fix kernel-doc notationRandy Dunlap
Fix kernel-doc warnings in kernel/bpf/core.c (found by scripts/kernel-doc and W=1 builds). That is, correct a function name in a comment and add return descriptions for 2 functions. Fixes these kernel-doc warnings: kernel/bpf/core.c:1372: warning: expecting prototype for __bpf_prog_run(). Prototype was for ___bpf_prog_run() instead kernel/bpf/core.c:1372: warning: No description found for return value of '___bpf_prog_run' kernel/bpf/core.c:1883: warning: No description found for return value of 'bpf_prog_select_runtime' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210809215229.7556-1-rdunlap@infradead.org