summaryrefslogtreecommitdiff
path: root/kernel/rcu
AgeCommit message (Collapse)Author
2024-09-18Merge tag 'slab-for-6.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: "This time it's mostly refactoring and improving APIs for slab users in the kernel, along with some debugging improvements. - kmem_cache_create() refactoring (Christian Brauner) Over the years have been growing new parameters to kmem_cache_create() where most of them are needed only for a small number of caches - most recently the rcu_freeptr_offset parameter. To avoid adding new parameters to kmem_cache_create() and adjusting all its callers, or creating new wrappers such as kmem_cache_create_rcu(), we can now pass extra parameters using the new struct kmem_cache_args. Not explicitly initialized fields default to values interpreted as unused. kmem_cache_create() is for now a wrapper that works both with the new form: kmem_cache_create(name, object_size, args, flags) and the legacy form: kmem_cache_create(name, object_size, align, flags, ctor) - kmem_cache_destroy() waits for kfree_rcu()'s in flight (Vlastimil Babka, Uladislau Rezki) Since SLOB removal, kfree() is allowed for freeing objects allocated by kmem_cache_create(). By extension kfree_rcu() as allowed as well, which can allow converting simple call_rcu() callbacks that only do kmem_cache_free(), as there was never a kmem_cache_free_rcu() variant. However, for caches that can be destroyed e.g. on module removal, the cache owners knew to issue rcu_barrier() first to wait for the pending call_rcu()'s, and this is not sufficient for pending kfree_rcu()'s due to its internal batching optimizations. Ulad has provided a new kvfree_rcu_barrier() and to make the usage less error-prone, kmem_cache_destroy() calls it. Additionally, destroying SLAB_TYPESAFE_BY_RCU caches now again issues rcu_barrier() synchronously instead of using an async work, because the past motivation for async work no longer applies. Users of custom call_rcu() callbacks should however keep calling rcu_barrier() before cache destruction. - Debugging use-after-free in SLAB_TYPESAFE_BY_RCU caches (Jann Horn) Currently, KASAN cannot catch UAFs in such caches as it is legal to access them within a grace period, and we only track the grace period when trying to free the underlying slab page. The new CONFIG_SLUB_RCU_DEBUG option changes the freeing of individual object to be RCU-delayed, after which KASAN can poison them. - Delayed memcg charging (Shakeel Butt) In some cases, the memcg is uknown at allocation time, such as receiving network packets in softirq context. With kmem_cache_charge() these may be now charged later when the user and its memcg is known. - Misc fixes and improvements (Pedro Falcato, Axel Rasmussen, Christoph Lameter, Yan Zhen, Peng Fan, Xavier)" * tag 'slab-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (34 commits) mm, slab: restore kerneldoc for kmem_cache_create() io_uring: port to struct kmem_cache_args slab: make __kmem_cache_create() static inline slab: make kmem_cache_create_usercopy() static inline slab: remove kmem_cache_create_rcu() file: port to struct kmem_cache_args slab: create kmem_cache_create() compatibility layer slab: port KMEM_CACHE_USERCOPY() to struct kmem_cache_args slab: port KMEM_CACHE() to struct kmem_cache_args slab: remove rcu_freeptr_offset from struct kmem_cache slab: pass struct kmem_cache_args to do_kmem_cache_create() slab: pull kmem_cache_open() into do_kmem_cache_create() slab: pass struct kmem_cache_args to create_cache() slab: port kmem_cache_create_usercopy() to struct kmem_cache_args slab: port kmem_cache_create_rcu() to struct kmem_cache_args slab: port kmem_cache_create() to struct kmem_cache_args slab: add struct kmem_cache_args slab: s/__kmem_cache_create/do_kmem_cache_create/g memcg: add charging of already allocated slab objects mm/slab: Optimize the code logic in find_mergeable() ...
2024-09-18Merge tag 'rcu.release.v6.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux Pull RCU updates from Neeraj Upadhyay: "Context tracking: - rename context tracking state related symbols and remove references to "dynticks" in various context tracking state variables and related helpers - force context_tracking_enabled_this_cpu() to be inlined to avoid leaving a noinstr section CSD lock: - enhance CSD-lock diagnostic reports - add an API to provide an indication of ongoing CSD-lock stall nocb: - update and simplify RCU nocb code to handle (de-)offloading of callbacks only for offline CPUs - fix RT throttling hrtimer being armed from offline CPU rcutorture: - remove redundant rcu_torture_ops get_gp_completed fields - add SRCU ->same_gp_state and ->get_comp_state functions - add generic test for NUM_ACTIVE_*RCU_POLL* for testing RCU and SRCU polled grace periods - add CFcommon.arch for arch-specific Kconfig options - print number of update types in rcu_torture_write_types() - add rcutree.nohz_full_patience_delay testing to the TREE07 scenario - add a stall_cpu_repeat module parameter to test repeated CPU stalls - add argument to limit number of CPUs a guest OS can use in torture.sh rcustall: - abbreviate RCU CPU stall warnings during CSD-lock stalls - Allow dump_cpu_task() to be called without disabling preemption - defer printing stall-warning backtrace when holding rcu_node lock srcu: - make SRCU gp seq wrap-around faster - add KCSAN checks for concurrent updates to ->srcu_n_exp_nodelay and ->reschedule_count which are used in heuristics governing auto-expediting of normal SRCU grace periods and grace-period-state-machine delays - mark idle SRCU-barrier callbacks to help identify stuck SRCU-barrier callback rcu tasks: - remove RCU Tasks Rude asynchronous APIs as they are no longer used - stop testing RCU Tasks Rude asynchronous APIs - fix access to non-existent percpu regions - check processor-ID assumptions during chosen CPU calculation for callback enqueuing - update description of rtp->tasks_gp_seq grace-period sequence number - add rcu_barrier_cb_is_done() to identify whether a given rcu_barrier callback is stuck - mark idle Tasks-RCU-barrier callbacks - add *torture_stats_print() functions to print detailed diagnostics for Tasks-RCU variants - capture start time of rcu_barrier_tasks*() operation to help distinguish a hung barrier operation from a long series of barrier operations refscale: - add a TINY scenario to support tests of Tiny RCU and Tiny SRCU - optimize process_durations() operation rcuscale: - dump stacks of stalled rcu_scale_writer() instances and grace-period statistics when rcu_scale_writer() stalls - mark idle RCU-barrier callbacks to identify stuck RCU-barrier callbacks - print detailed grace-period and barrier diagnostics on rcu_scale_writer() hangs for Tasks-RCU variants - warn if async module parameter is specified for RCU implementations that do not have async primitives such as RCU Tasks Rude - make all writer tasks report upon hang - tolerate repeated GFP_KERNEL failure in rcu_scale_writer() - use special allocator for rcu_scale_writer() - NULL out top-level pointers to heap memory to avoid double-free bugs on modprobe failures - maintain per-task instead of per-CPU callbacks count to avoid any issues with migration of either tasks or callbacks - constify struct ref_scale_ops Fixes: - use system_unbound_wq for kfree_rcu work to avoid disturbing isolated CPUs Misc: - warn on unexpected rcu_state.srs_done_tail state - better define "atomic" for list_replace_rcu() and hlist_replace_rcu() routines - annotate struct kvfree_rcu_bulk_data with __counted_by()" * tag 'rcu.release.v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (90 commits) rcu: Defer printing stall-warning backtrace when holding rcu_node lock rcu/nocb: Remove superfluous memory barrier after bypass enqueue rcu/nocb: Conditionally wake up rcuo if not already waiting on GP rcu/nocb: Fix RT throttling hrtimer armed from offline CPU rcu/nocb: Simplify (de-)offloading state machine context_tracking: Tag context_tracking_enabled_this_cpu() __always_inline context_tracking, rcu: Rename rcu_dyntick trace event into rcu_watching rcu: Update stray documentation references to rcu_dynticks_eqs_{enter, exit}() rcu: Rename rcu_momentary_dyntick_idle() into rcu_momentary_eqs() rcu: Rename rcu_implicit_dynticks_qs() into rcu_watching_snap_recheck() rcu: Rename dyntick_save_progress_counter() into rcu_watching_snap_save() rcu: Rename struct rcu_data .exp_dynticks_snap into .exp_watching_snap rcu: Rename struct rcu_data .dynticks_snap into .watching_snap rcu: Rename rcu_dynticks_zero_in_eqs() into rcu_watching_zero_in_eqs() rcu: Rename rcu_dynticks_in_eqs_since() into rcu_watching_snap_stopped_since() rcu: Rename rcu_dynticks_in_eqs() into rcu_watching_snap_in_eqs() rcu: Rename rcu_dynticks_eqs_online() into rcu_watching_online() context_tracking, rcu: Rename rcu_dynticks_curr_cpu_in_eqs() into rcu_is_watching_curr_cpu() context_tracking, rcu: Rename rcu_dynticks_task*() into rcu_task*() refscale: Constify struct ref_scale_ops ...
2024-09-17Merge tag 'printk-for-6.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: "This is the "last" part of the support for the new nbcon consoles. Where "nbcon" stays for "No Big console lock CONsoles" aka not under the console_lock. New callbacks are added to struct console: - write_thread() for flushing nbcon consoles in task context. - write_atomic() for flushing nbcon consoles in atomic context, including NMI. - con->device_lock() and device_unlock() for taking the driver specific lock, for example, port->lock. New printk-specific kthreads are created: - per-console kthreads which get responsible for flushing normal priority messages on nbcon consoles. - thread which gets responsible for flushing normal priority messages on all consoles when CONFIG_RT enabled. The new callbacks are called under a special per-console lock which has already been added back in v6.7. It allows to distinguish three severities: normal, emergency, and panic. A context with a higher priority could take over the ownership when it is safe even in the middle of handling a record. The panic context could do it even when it is not safe. But it is allowed only for the final desperate flush before entering the infinite loop. The new lock helps to flush the messages directly in emergency and panic contexts. But it is not enough in all situations: - console_lock() is still need for synchronization against boot consoles. - con->device_lock() is need for synchronization against other operations on the same HW, e.g. serial port speed setting, non-printk related read/write. The dependency on con->device_lock() is mutual. Any code taking the driver specific lock has to acquire the related nbcon console context as well. For example, see the new uart_port_lock() API. It provides the necessary synchronization against emergency and panic contexts where the messages are flushed only under the new per-console lock. Maybe surprisingly, a quite tricky part is the decision how to flush the consoles in various situations. It has to take into account: - message priority: normal, emergency, panic - scheduling context: task, atomic, deferred_legacy - registered consoles: boot, legacy, nbcon - threads are running: early boot, suspend, shutdown, panic - caller: printk(), pr_flush(), printk_flush_in_panic(), console_unlock(), console_start(), ... The primary decision is made in printk_get_console_flush_type(). It creates a hint what the caller should do: - flush nbcon consoles directly or via the kthread - call the legacy loop (console_unlock()) directly or via irq_work The existing behavior is preserved for the legacy consoles. The only exception is that they are not longer flushed directly from printk() in panic() before CPUs are stopped. But this blocking happens only when at least one nbcon console is registered. The motivation is to increase a chance to produce the crash dump. They legacy consoles might create a deadlock in compare with nbcon consoles. The nbcon console should allow to see the messages even when the crash dump fails. There are three possible ways how nbcon consoles are flushed: - The per-nbcon-console kthread is responsible for flushing messages added with the normal priority. This is the default mode. - The legacy loop, aka console_unlock(), is used when there is still a boot console registered. There is no easy way how to match an early console driver with a nbcon console driver. And the console_lock() provides the only reliable serialization at the moment. The legacy loop uses either con->write_atomic() or con->write_thread() callbacks depending on whether it is allowed to schedule. The atomic variant has to be used from printk(). - In other situations, the messages are flushed directly using write_atomic() which can be called in any context, including NMI. It is primary needed during early boot or shutdown, in emergency situations, and panic. The emergency priority is used by a code called within nbcon_cpu_emergency_enter()/exit(). At the moment, it is used in four situations: WARN(), Oops, lockdep, and RCU stall reports. Finally, there is no nbcon console at the moment. It means that the changes should _not_ modify the existing behavior. The only exception is CONFIG_RT which would force offloading the legacy loop, for normal priority context, into the dedicated kthread" * tag 'printk-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (54 commits) printk: Avoid false positive lockdep report for legacy printing printk: nbcon: Assign nice -20 for printing threads printk: Implement legacy printer kthread for PREEMPT_RT tty: sysfs: Add nbcon support for 'active' proc: Add nbcon support for /proc/consoles proc: consoles: Add notation to c_start/c_stop printk: nbcon: Show replay message on takeover printk: Provide helper for message prepending printk: nbcon: Rely on kthreads for normal operation printk: nbcon: Use thread callback if in task context for legacy printk: nbcon: Relocate nbcon_atomic_emit_one() printk: nbcon: Introduce printer kthreads printk: nbcon: Init @nbcon_seq to highest possible printk: nbcon: Add context to usable() and emit() printk: Flush console on unregister_console() printk: Fail pr_flush() if before SYSTEM_SCHEDULING printk: nbcon: Add function for printers to reacquire ownership printk: nbcon: Use raw_cpu_ptr() instead of open coding printk: Use the BITS_PER_LONG macro lockdep: Mark emergency sections in lockdep splats ...
2024-09-09Merge branches 'context_tracking.15.08.24a', 'csd.lock.15.08.24a', ↵Neeraj Upadhyay
'nocb.09.09.24a', 'rcutorture.14.08.24a', 'rcustall.09.09.24a', 'srcu.12.08.24a', 'rcu.tasks.14.08.24a', 'rcu_scaling_tests.15.08.24a', 'fixes.12.08.24a' and 'misc.11.08.24a' into next.09.09.24a
2024-09-09rcu: Defer printing stall-warning backtrace when holding rcu_node lockPaul E. McKenney
The rcu_dump_cpu_stacks() holds the leaf rcu_node structure's ->lock when dumping the stakcks of any CPUs stalling the current grace period. This lock is held to prevent confusion that would otherwise occur when the stalled CPU reported its quiescent state (and then went on to do unrelated things) just as the backtrace NMI was heading towards it. This has worked well, but on larger systems has recently been observed to cause severe lock contention resulting in CSD-lock stalls and other general unhappiness. This commit therefore does printk_deferred_enter() before acquiring the lock and printk_deferred_exit() after releasing it, thus deferring the overhead of actually outputting the stack trace out of that lock's critical section. Reported-by: Rik van Riel <riel@surriel.com> Suggested-by: Rik van Riel <riel@surriel.com> Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-09-09rcu/nocb: Remove superfluous memory barrier after bypass enqueueFrederic Weisbecker
Pre-GP accesses performed by the update side must be ordered against post-GP accesses performed by the readers. This is ensured by the bypass or nocb locking on enqueue time, followed by the fully ordered rnp locking initiated while callbacks are accelerated, and then propagated throughout the whole GP lifecyle associated with the callbacks. Therefore the explicit barrier advertizing ordering between bypass enqueue and rcuo wakeup is superfluous. If anything, it would even only order the first bypass callback enqueue against the rcuo wakeup and ignore all the subsequent ones. Remove the needless barrier. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-09-09rcu/nocb: Conditionally wake up rcuo if not already waiting on GPFrederic Weisbecker
A callback enqueuer currently wakes up the rcuo kthread if it is adding the first non-done callback of a CPU, whether the kthread is waiting on a grace period or not (unless the CPU is offline). This looks like a desired behaviour because then the rcuo kthread doesn't wait for the end of the current grace period to handle the callback. It is accelerated right away and assigned to the next grace period. The GP kthread is notified about that fact and iterates with the upcoming GP without sleeping in-between. However this best-case scenario is contradicted by a few details, depending on the situation: 1) If the callback is a non-bypass one queued with IRQs enabled, the wake up only occurs if no other pending callbacks are on the list. Therefore the theoretical "optimization" actually applies on rare occasions. 2) If the callback is a non-bypass one queued with IRQs disabled, the situation is similar with even more uncertainty due to the deferred wake up. 3) If the callback is lazy, a few jiffies don't make any difference. 4) If the callback is bypass, the wake up timer is programmed 2 jiffies ahead by rcuo in case the regular pending queue has been handled in the meantime. The rare storm of callbacks can otherwise wait for the currently elapsing grace period to be flushed and handled. For all those reasons, the optimization is only theoretical and occasional. Therefore it is reasonable that callbacks enqueuers only wake up the rcuo kthread when it is not already waiting on a grace period to complete. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-09-09rcu/nocb: Fix RT throttling hrtimer armed from offline CPUFrederic Weisbecker
After a CPU is marked offline and until it reaches its final trip to idle, rcuo has several opportunities to be woken up, either because a callback has been queued in the meantime or because rcutree_report_cpu_dead() has issued the final deferred NOCB wake up. If RCU-boosting is enabled, RCU kthreads are set to SCHED_FIFO policy. And if RT-bandwidth is enabled, the related hrtimer might be armed. However this then happens after hrtimers have been migrated at the CPUHP_AP_HRTIMERS_DYING stage, which is broken as reported by the following warning: Call trace: enqueue_hrtimer+0x7c/0xf8 hrtimer_start_range_ns+0x2b8/0x300 enqueue_task_rt+0x298/0x3f0 enqueue_task+0x94/0x188 ttwu_do_activate+0xb4/0x27c try_to_wake_up+0x2d8/0x79c wake_up_process+0x18/0x28 __wake_nocb_gp+0x80/0x1a0 do_nocb_deferred_wakeup_common+0x3c/0xcc rcu_report_dead+0x68/0x1ac cpuhp_report_idle_dead+0x48/0x9c do_idle+0x288/0x294 cpu_startup_entry+0x34/0x3c secondary_start_kernel+0x138/0x158 Fix this with waking up rcuo using an IPI if necessary. Since the existing API to deal with this situation only handles swait queue, rcuo is only woken up from offline CPUs if it's not already waiting on a grace period. In the worst case some callbacks will just wait for a grace period to complete before being assigned to a subsequent one. Reported-by: "Cheng-Jui Wang (王正睿)" <Cheng-Jui.Wang@mediatek.com> Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-09-09rcu/nocb: Simplify (de-)offloading state machineFrederic Weisbecker
Now that the (de-)offloading process can only apply to offline CPUs, there is no more concurrency between rcu_core and nocb kthreads. Also the mutation now happens on empty queues. Therefore the state machine can be reduced to a single bit called SEGCBLIST_OFFLOADED. Simplify the transition as follows: * Upon offloading: queue the rdp to be added to the rcuog list and wait for the rcuog kthread to set the SEGCBLIST_OFFLOADED bit. Unpark rcuo kthread. * Upon de-offloading: Park rcuo kthread. Queue the rdp to be removed from the rcuog list and wait for the rcuog kthread to clear the SEGCBLIST_OFFLOADED bit. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-27rcu/kvfree: Add kvfree_rcu_barrier() APIUladzislau Rezki (Sony)
Add a kvfree_rcu_barrier() function. It waits until all in-flight pointers are freed over RCU machinery. It does not wait any GP completion and it is within its right to return immediately if there are no outstanding pointers. This function is useful when there is a need to guarantee that a memory is fully freed before destroying memory caches. For example, during unloading a kernel module. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-08-21rcu: Mark emergency sections in rcu stallsJohn Ogness
Mark emergency sections wherever multiple lines of rcu stall information are generated. In an emergency section, every printk() call will attempt to directly flush to the consoles using the EMERGENCY priority. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20240820063001.36405-35-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-20softirq: Remove unused 'action' parameter from action callbackCaleb Sander Mateos
When soft interrupt actions are called, they are passed a pointer to the struct softirq action which contains the action's function pointer. This pointer isn't useful, as the action callback already knows what function it is. And since each callback handles a specific soft interrupt, the callback also knows which soft interrupt number is running. No soft interrupt action callback actually uses this parameter, so remove it from the function pointer signature. This clarifies that soft interrupt actions are global routines and makes it slightly cheaper to call them. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/all/20240815171549.3260003-1-csander@purestorage.com
2024-08-15rcu: Rename rcu_momentary_dyntick_idle() into rcu_momentary_eqs()Valentin Schneider
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, replace "dyntick_idle" into "eqs" to drop the dyntick reference. Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15rcu: Rename rcu_implicit_dynticks_qs() into rcu_watching_snap_recheck()Valentin Schneider
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, drop the dyntick reference and update the name of this helper to express that it rechecks rdp->watching_snap after an earlier rcu_watching_snap_save(). Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15rcu: Rename dyntick_save_progress_counter() into rcu_watching_snap_save()Valentin Schneider
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, and the 'dynticks' prefix can be dropped without losing any meaning. Suggested-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15rcu: Rename struct rcu_data .exp_dynticks_snap into .exp_watching_snapValentin Schneider
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, and the snapshot helpers are now prefix by "rcu_watching". Reflect that change into the storage variables for these snapshots. Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15rcu: Rename struct rcu_data .dynticks_snap into .watching_snapValentin Schneider
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, and the snapshot helpers are now prefix by "rcu_watching". Reflect that change into the storage variables for these snapshots. Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15rcu: Rename rcu_dynticks_zero_in_eqs() into rcu_watching_zero_in_eqs()Valentin Schneider
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, reflect that change in the related helpers. Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15rcu: Rename rcu_dynticks_in_eqs_since() into rcu_watching_snap_stopped_since()Valentin Schneider
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, the dynticks prefix can go. While at it, this helper is only meant to be called after failing an earlier call to rcu_watching_snap_in_eqs(), document this in the comments and add a WARN_ON_ONCE() for good measure. Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15rcu: Rename rcu_dynticks_in_eqs() into rcu_watching_snap_in_eqs()Valentin Schneider
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, reflect that change in the related helpers. While at it, update a comment that still refers to rcu_dynticks_snap(), which was removed by commit: 7be2e6323b9b ("rcu: Remove full memory barrier on RCU stall printout") Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15rcu: Rename rcu_dynticks_eqs_online() into rcu_watching_online()Valentin Schneider
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, reflect that change in the related helpers. Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15context_tracking, rcu: Rename rcu_dynticks_curr_cpu_in_eqs() into ↵Valentin Schneider
rcu_is_watching_curr_cpu() The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, reflect that change in the related helpers. Note that "watching" is the opposite of "in EQS", so the negation is lifted out of the helper and into the callsites. Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15refscale: Constify struct ref_scale_opsChristophe JAILLET
'struct ref_scale_ops' are not modified in these drivers. Constifying this structure moves some data to a read-only section, so increase overall security. On a x86_64, with allmodconfig: Before: ====== text data bss dec hex filename 34231 4167 736 39134 98de kernel/rcu/refscale.o After: ===== text data bss dec hex filename 35175 3239 736 39150 98ee kernel/rcu/refscale.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Tested-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15rcuscale: Count outstanding callbacks per-task rather than per-CPUPaul E. McKenney
The current rcu_scale_writer() asynchronous grace-period testing uses a per-CPU counter to track the number of outstanding callbacks. This is subject to CPU-imbalance errors when tasks migrate from one CPU to another between the time that the counter is incremented and the callback is queued, and additionally in kernels configured such that callbacks can be invoked on some CPU other than the one that queued it. This commit therefore arranges for per-task callback counts, thus avoiding any issues with migration of either tasks or callbacks. Reported-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15rcuscale: NULL out top-level pointers to heap memoryPaul E. McKenney
Currently, if someone modprobes and rmmods rcuscale successfully, but the next run errors out during the modprobe, non-NULL pointers to freed memory will remain. If the run after that also errors out during the modprobe, there will be double-free bugs. This commit therefore NULLs out top-level pointers to memory that has just been freed. Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15rcuscale: Use special allocator for rcu_scale_writer()Paul E. McKenney
The rcu_scale_writer() function needs only a fixed number of rcu_head structures per kthread, which means that a trivial allocator suffices. This commit therefore uses an llist-based allocator using a fixed array of structures per kthread. This allows aggressive testing of RCU performance without stressing the slab allocators. Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15rcuscale: Make rcu_scale_writer() tolerate repeated GFP_KERNEL failurePaul E. McKenney
Under some conditions, kmalloc(GFP_KERNEL) allocations have been observed to repeatedly fail. This situation has been observed to cause one of the rcu_scale_writer() instances to loop indefinitely retrying memory allocation for an asynchronous grace-period primitive. The problem is that if memory is short, all the other instances will allocate all available memory before the looping task is awakened from its rcu_barrier*() call. This in turn results in hangs, so that rcuscale fails to complete. This commit therefore removes the tight retry loop, so that when this condition occurs, the affected task is still passing through the full loop with its full set of termination checks. This spreads the risk of indefinite memory-allocation retry failures across all instances of rcu_scale_writer() tasks, which in turn prevents the hangs. Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15rcuscale: Make all writer tasks report upon hangPaul E. McKenney
This commit causes all writer tasks to provide a brief report after a hang has been reported, spaced at one-second intervals. Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15rcuscale: Provide clear error when async specified without primitivesPaul E. McKenney
Currently, if the rcuscale module's async module parameter is specified for RCU implementations that do not have async primitives such as RCU Tasks Rude (which now lacks a call_rcu_tasks_rude() function), there will be a series of splats due to calls to a NULL pointer. This commit therefore warns of this situation, but switches to non-async testing. Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15rcu: Let dump_cpu_task() be used without preemption disabledRyo Takakura
The commit 2d7f00b2f0130 ("rcu: Suppress smp_processor_id() complaint in synchronize_rcu_expedited_wait()") disabled preemption around dump_cpu_task() to suppress warning on its usage within preemtible context. Calling dump_cpu_task() doesn't required to be in non-preemptible context except for suppressing the smp_processor_id() warning. As the smp_processor_id() is evaluated along with in_hardirq() to check if it's in interrupt context, this patch removes the need for its preemtion disablement by reordering the condition so that smp_processor_id() only gets evaluated when it's in interrupt context. Signed-off-by: Ryo Takakura <takakura@valinux.co.jp> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15rcu: Summarize expedited RCU CPU stall warnings during CSD-lock stallsPaul E. McKenney
During CSD-lock stalls, the additional information output by expedited RCU CPU stall warnings is usually redundant, flooding the console for not good reason. However, this has been the way things work for a few years. This commit therefore uses rcutree.csd_lock_suppress_rcu_stall kernel boot parameter that causes expedited RCU CPU stall warnings to be abbreviated to a single line when there is at least one CPU that has been stuck waiting for CSD lock for more than five seconds. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15rcu: Extract synchronize_rcu_expedited_stall() from ↵Paul E. McKenney
synchronize_rcu_expedited_wait() This commit extracts the RCU CPU stall-warning report code from synchronize_rcu_expedited_wait() and places it in a new function named synchronize_rcu_expedited_stall(). This is strictly a code-movement commit. A later commit will use this reorganization to avoid printing expedited RCU CPU stall warnings while there are ongoing CSD-lock stall reports. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-15rcu: Summarize RCU CPU stall warnings during CSD-lock stallsPaul E. McKenney
During CSD-lock stalls, the additional information output by RCU CPU stall warnings is usually redundant, flooding the console for not good reason. However, this has been the way things work for a few years. This commit therefore adds an rcutree.csd_lock_suppress_rcu_stall kernel boot parameter that causes RCU CPU stall warnings to be abbreviated to a single line when there is at least one CPU that has been stuck waiting for CSD lock for more than five seconds. To make this abbreviated message happen with decent probability: tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 8 \ --configs "2*TREE01" --kconfig "CONFIG_CSD_LOCK_WAIT_DEBUG=y" \ --bootargs "csdlock_debug=1 rcutorture.stall_cpu=200 \ rcutorture.stall_cpu_holdoff=120 rcutorture.stall_cpu_irqsoff=1 \ rcutree.csd_lock_suppress_rcu_stall=1 \ rcupdate.rcu_exp_cpu_stall_timeout=5000" --trust-make [ paulmck: Apply kernel test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14rcuscale: Print detailed grace-period and barrier diagnosticsPaul E. McKenney
This commit uses the new rcu_tasks_torture_stats_print(), rcu_tasks_trace_torture_stats_print(), and rcu_tasks_rude_torture_stats_print() functions in order to provide detailed diagnostics on grace-period, callback, and barrier state when rcu_scale_writer() hangs. [ paulmck: Apply kernel test robot feedback. ] Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14rcu: Mark callbacks not currently participating in barrier operationPaul E. McKenney
RCU keeps a count of the number of callbacks that the current rcu_barrier() is waiting on, but there is currently no easy way to work out which callback is stuck. One way to do this is to mark idle RCU-barrier callbacks by making the ->next pointer point to the callback itself, and this commit does just that. Later commits will use this for debug output. Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14rcuscale: Dump grace-period statistics when rcu_scale_writer() stallsPaul E. McKenney
This commit adds a .stats function pointer to the rcu_scale_ops structure, and if this is non-NULL, it is invoked after stack traces are dumped in response to a rcu_scale_writer() stall. Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14rcuscale: Dump stacks of stalled rcu_scale_writer() instancesPaul E. McKenney
This commit improves debuggability by dumping the stacks of rcu_scale_writer() instances that have not completed in a reasonable timeframe. These stacks are dumped remotely, but they will be accurate in the thus-far common case where the stalled rcu_scale_writer() instances are blocked. [ paulmck: Apply kernel test robot feedback. ] Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14rcuscale: Save a few lines with whitespace-only changePaul E. McKenney
This whitespace-only commit fuses a few lines of code, taking advantage of the newish 100-character-per-line limit to save a few lines of code. Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14refscale: Optimize process_durations()Christophe JAILLET
process_durations() is not a hot path, but there is no good reason to iterate over and over the data already in 'buf'. Using a seq_buf saves some useless strcat() and the need of a temp buffer. Data is written directly at the correct place. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Tested-by: "Paul E. McKenney" <paulmck@kernel.org> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14rcu/tasks: Add rcu_barrier_tasks*() start time to diagnosticsPaul E. McKenney
This commit adds the start time, in jiffies, of the most recently started rcu_barrier_tasks*() operation to the diagnostic output used by rcuscale. This information can be helpful in distinguishing a hung barrier operation from a long series of barrier operations. Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14rcu/tasks: Add detailed grace-period and barrier diagnosticsPaul E. McKenney
This commit adds rcu_tasks_torture_stats_print(), rcu_tasks_trace_torture_stats_print(), and rcu_tasks_rude_torture_stats_print() functions that provide detailed diagnostics on grace-period, callback, and barrier state. Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14rcu/tasks: Mark callbacks not currently participating in barrier operationPaul E. McKenney
Each Tasks RCU flavor keeps a count of the number of callbacks that the current rcu_barrier_tasks*() is waiting on, but there is currently no easy way to work out which callback is stuck. One way to do this is to mark idle RCU-barrier callbacks by making the ->next pointer point to the callback itself, and this commit does just that. Later commits will use this for debug output. Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14rcu: Provide rcu_barrier_cb_is_done() to check rcu_barrier() CBsPaul E. McKenney
This commit provides a rcu_barrier_cb_is_done() function that returns true if the *rcu_barrier*() callback passed in is done. This will be used when printing grace-period debugging information. Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14rcu/tasks: Update rtp->tasks_gp_seq commentPaul E. McKenney
The rtp->tasks_gp_seq grace-period sequence number is not a strict count, but rather the usual RCU sequence number with the lower few bits tracking per-grace-period state and the upper bits the count of grace periods since boot, give or take the initial value. This commit therefore adjusts this comment. Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14rcu/tasks: Check processor-ID assumptionsPaul E. McKenney
The current mapping of smp_processor_id() to a CPU processing Tasks-RCU callbacks makes some assumptions about layout. This commit therefore adds a WARN_ON() to check these assumptions. [ neeraj.upadhyay: Replace nr_cpu_ids with rcu_task_cpu_ids. ] Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14rcu-tasks: Fix access non-existent percpu rtpcp variable in ↵Zqiang
rcu_tasks_need_gpcb() For kernels built with CONFIG_FORCE_NR_CPUS=y, the nr_cpu_ids is defined as NR_CPUS instead of the number of possible cpus, this will cause the following system panic: smpboot: Allowing 4 CPUs, 0 hotplug CPUs ... setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:512 nr_node_ids:1 ... BUG: unable to handle page fault for address: ffffffff9911c8c8 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 15 Comm: rcu_tasks_trace Tainted: G W 6.6.21 #1 5dc7acf91a5e8e9ac9dcfc35bee0245691283ea6 RIP: 0010:rcu_tasks_need_gpcb+0x25d/0x2c0 RSP: 0018:ffffa371c00a3e60 EFLAGS: 00010082 CR2: ffffffff9911c8c8 CR3: 000000040fa20005 CR4: 00000000001706f0 Call Trace: <TASK> ? __die+0x23/0x80 ? page_fault_oops+0xa4/0x180 ? exc_page_fault+0x152/0x180 ? asm_exc_page_fault+0x26/0x40 ? rcu_tasks_need_gpcb+0x25d/0x2c0 ? __pfx_rcu_tasks_kthread+0x40/0x40 rcu_tasks_one_gp+0x69/0x180 rcu_tasks_kthread+0x94/0xc0 kthread+0xe8/0x140 ? __pfx_kthread+0x40/0x40 ret_from_fork+0x34/0x80 ? __pfx_kthread+0x40/0x40 ret_from_fork_asm+0x1b/0x80 </TASK> Considering that there may be holes in the CPU numbers, use the maximum possible cpu number, instead of nr_cpu_ids, for configuring enqueue and dequeue limits. [ neeraj.upadhyay: Fix htmldocs build error reported by Stephen Rothwell ] Closes: https://lore.kernel.org/linux-input/CALMA0xaTSMN+p4xUXkzrtR5r6k7hgoswcaXx7baR_z9r5jjskw@mail.gmail.com/T/#u Reported-by: Zhixu Liu <zhixu.liu@gmail.com> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14rcu-tasks: Remove RCU Tasks Rude asynchronous APIsPaul E. McKenney
The call_rcu_tasks_rude() and rcu_barrier_tasks_rude() APIs are currently unused. This commit therefore removes their definitions and boot-time self-tests. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14rcuscale: Stop testing RCU Tasks Rude asynchronous APIsPaul E. McKenney
The call_rcu_tasks_rude() and rcu_barrier_tasks_rude() APIs are currently unused. Furthermore, the idea is to get rid of RCU Tasks Rude entirely once all architectures have their deep-idle and entry/exit code correctly marked as inline or noinstr. As a step towards this goal, this commit therefore removes these two functions from rcuscale testing. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14rcutorture: Stop testing RCU Tasks Rude asynchronous APIsPaul E. McKenney
The call_rcu_tasks_rude() and rcu_barrier_tasks_rude() APIs are currently unused. Furthermore, the idea is to get rid of RCU Tasks Rude entirely once all architectures have their deep-idle and entry/exit code correctly marked as inline or noinstr. As a first step towards this goal, this commit therefore removes these two functions from rcutorture testing. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14rcutorture: Add a stall_cpu_repeat module parameterPaul E. McKenney
This commit adds an stall_cpu_repeat kernel, which is also the rcutorture.stall_cpu_repeat boot parameter, to test repeated CPU stalls. Note that only the first stall will pay attention to the stall_cpu_irqsoff module parameter. For the second and subsequent stalls, interrupts will be enabled. This is helpful when testing the interaction between RCU CPU stall warnings and CSD-lock stall warnings. Reported-by: Rik van Riel <riel@surriel.com> Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>