<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
	"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head><title>A Tour Through TREE_RCU's Expedited Grace Periods</title>
<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
</head><body>

<h2>Introduction</h2>

<p>
This document describes RCU's expedited grace periods.
Unlike RCU's normal grace periods, which accept long latencies to attain
high efficiency and minimal disturbance, expedited grace periods accept
lower efficiency and significant disturbance to attain shorter latencies.

<p>
There are two flavors of RCU (RCU-preempt and RCU-sched), with an earlier
third RCU-bh flavor having been implemented in terms of the other two.
Each of the two implementations is covered in its own section.

<ol>
<li>	<a href="#Expedited Grace Period Design">
	Expedited Grace Period Design</a>
<li>	<a href="#RCU-preempt Expedited Grace Periods">
	RCU-preempt Expedited Grace Periods</a>
<li>	<a href="#RCU-sched Expedited Grace Periods">
	RCU-sched Expedited Grace Periods</a>
<li>	<a href="#Expedited Grace Period and CPU Hotplug">
	Expedited Grace Period and CPU Hotplug</a>
<li>	<a href="#Expedited Grace Period Refinements">
	Expedited Grace Period Refinements</a>
</ol>

<h2><a name="Expedited Grace Period Design">
Expedited Grace Period Design</a></h2>

<p>
The expedited RCU grace periods cannot be accused of being subtle,
given that they for all intents and purposes hammer every CPU that
has not yet provided a quiescent state for the current expedited
grace period.
The one saving grace is that the hammer has grown a bit smaller
over time:  The old call to <tt>try_stop_cpus()</tt> has been
replaced with a set of calls to <tt>smp_call_function_single()</tt>,
each of which results in an IPI to the target CPU.
The corresponding handler function checks the CPU's state, motivating
a faster quiescent state where possible, and triggering a report
of that quiescent state.
As always for RCU, once everything has spent some time in a quiescent
state, the expedited grace period has completed.

<p>
The details of the <tt>smp_call_function_single()</tt> handler's
operation depend on the RCU flavor, as described in the following
sections.

<h2><a name="RCU-preempt Expedited Grace Periods">
RCU-preempt Expedited Grace Periods</a></h2>

<p>
<tt>CONFIG_PREEMPT=y</tt> kernels implement RCU-preempt.
The overall flow of the handling of a given CPU by an RCU-preempt
expedited grace period is shown in the following diagram:

<p><img src="ExpRCUFlow.svg" alt="ExpRCUFlow.svg" width="55%">

<p>
The solid arrows denote direct action, for example, a function call.
The dotted arrows denote indirect action, for example, an IPI
or a state that is reached after some time.

<p>
If a given CPU is offline or idle, <tt>synchronize_rcu_expedited()</tt>
will ignore it because idle and offline CPUs are already residing
in quiescent states.
Otherwise, the expedited grace period will use
<tt>smp_call_function_single()</tt> to send the CPU an IPI, which
is handled by <tt>rcu_exp_handler()</tt>.

<p>
However, because this is preemptible RCU, <tt>rcu_exp_handler()</tt>
can check to see if the CPU is currently running in an RCU read-side
critical section.
If not, the handler can immediately report a quiescent state.
Otherwise, it sets flags so that the outermost <tt>rcu_read_unlock()</tt>
invocation will provide the needed quiescent-state report.
This flag-setting avoids the previous forced preemption of all
CPUs that might have RCU read-side critical sections.
In addition, this flag-setting is done so as to avoid increasing
the overhead of the common-case fastpath through the scheduler.
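<p>
The following is a minimal sketch of that decision, in the style of
the kernel's C code.
It is not the actual <tt>rcu_exp_handler()</tt>:
the helpers <tt>rcu_reading_depth()</tt> and
<tt>defer_qs_to_rcu_read_unlock()</tt> are illustrative placeholders
for the real manipulation of <tt>rcu_data</tt> and
<tt>task_struct</tt> state, and the exact signature of
<tt>rcu_report_exp_rdp()</tt> varies across kernel versions.

<blockquote>
<pre>
/*
 * Hedged sketch of the RCU-preempt IPI handler's decision.
 * rcu_reading_depth() and defer_qs_to_rcu_read_unlock() are
 * placeholders, not real kernel APIs.
 */
static void rcu_exp_handler_sketch(void *unused)
{
	if (!rcu_reading_depth(current)) {
		/* Not in a read-side critical section: report now. */
		rcu_report_exp_rdp(this_cpu_ptr(&rcu_data));
		return;
	}
	/*
	 * Within a critical section: mark the task so that its
	 * outermost rcu_read_unlock() reports the quiescent state,
	 * leaving the scheduler fastpath untouched.
	 */
	defer_qs_to_rcu_read_unlock(current);
}
</pre>
</blockquote>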
<p>
Again because this is preemptible RCU, an RCU read-side critical section
can be preempted.
When that happens, RCU will enqueue the task, which will then continue to
block the current expedited grace period until it resumes and finds its
outermost <tt>rcu_read_unlock()</tt>.
The CPU will report a quiescent state just after enqueuing the task because
the CPU is no longer blocking the grace period.
It is instead the preempted task doing the blocking.
The list of blocked tasks is managed by <tt>rcu_preempt_ctxt_queue()</tt>,
which is called from <tt>rcu_preempt_note_context_switch()</tt>, which
in turn is called from <tt>rcu_note_context_switch()</tt>, which in
turn is called from the scheduler.

<table>
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
	Why not just have the expedited grace period check the
	state of all the CPUs?
	After all, that would avoid all those real-time-unfriendly IPIs.
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
	Because we want the RCU read-side critical sections to run fast,
	which means no memory barriers.
	Therefore, it is not possible to safely check the state from some
	other CPU.
	And even if it were possible to safely check the state, it would
	still be necessary to IPI the CPU to safely interact with the
	upcoming <tt>rcu_read_unlock()</tt> invocation, which means that
	the remote state testing would not help the worst-case
	latency that real-time applications care about.

	<p><font color="ffffff">One way to prevent your real-time
	application from getting hit with these IPIs is to
	build your kernel with <tt>CONFIG_NO_HZ_FULL=y</tt>.
	RCU would then perceive the CPU running your application
	as being idle, and it would be able to safely detect that
	state without needing to IPI the CPU.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>

<p>
Please note that this is just the overall flow:
Additional complications can arise due to races with CPUs going idle
or offline, among other things.

<h2><a name="RCU-sched Expedited Grace Periods">
RCU-sched Expedited Grace Periods</a></h2>

<p>
<tt>CONFIG_PREEMPT=n</tt> kernels implement RCU-sched.
The overall flow of the handling of a given CPU by an RCU-sched
expedited grace period is shown in the following diagram:

<p><img src="ExpSchedFlow.svg" alt="ExpSchedFlow.svg" width="55%">

<p>
As with RCU-preempt, RCU-sched's
<tt>synchronize_rcu_expedited()</tt> ignores offline and
idle CPUs, again because they are in remotely detectable
quiescent states.
However, because <tt>rcu_read_lock_sched()</tt> and
<tt>rcu_read_unlock_sched()</tt> leave no trace of their invocation,
in general it is not possible to tell
whether or not the current CPU is in an RCU read-side critical section.
The best that RCU-sched's <tt>rcu_exp_handler()</tt> can do is to check
for idle, on the off-chance that the CPU went idle while the IPI
was in flight.
If the CPU is idle, then <tt>rcu_exp_handler()</tt> reports
the quiescent state.

<p>
Otherwise, the handler forces a future context switch by setting the
current task's NEED_RESCHED thread flag and the CPU's preempt counter.
At the time of the context switch, the CPU reports the quiescent state.
Should the CPU go offline first, it will report the quiescent state
at that time.
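<p>
A hedged sketch of this handler appears below.
The <tt>is_idle_task()</tt>, <tt>set_tsk_need_resched()</tt>, and
<tt>set_preempt_need_resched()</tt> calls are real kernel APIs, but
the function as a whole is a simplification of the actual RCU-sched
<tt>rcu_exp_handler()</tt>, which must also deal with details
omitted here.

<blockquote>
<pre>
/* Simplified sketch of the RCU-sched IPI handler. */
static void rcu_sched_exp_handler_sketch(void *unused)
{
	if (is_idle_task(current)) {
		/* Idle: already quiescent, so report immediately. */
		rcu_report_exp_rdp(this_cpu_ptr(&rcu_data));
		return;
	}
	/* Force a near-term context switch, which is a quiescent state. */
	set_tsk_need_resched(current);	/* NEED_RESCHED thread flag. */
	set_preempt_need_resched();	/* CPU preempt counter. */
}
</pre>
</blockquote>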
<h2><a name="Expedited Grace Period and CPU Hotplug">
Expedited Grace Period and CPU Hotplug</a></h2>

<p>
The expedited nature of expedited grace periods requires a much tighter
interaction with CPU hotplug operations than is required for normal
grace periods.
In addition, attempting to IPI offline CPUs will result in splats, but
failing to IPI online CPUs can result in too-short grace periods.
Neither option is acceptable in production kernels.

<p>
The interaction between expedited grace periods and CPU hotplug operations
is carried out at several levels:

<ol>
<li>	The number of CPUs that have ever been online is tracked
	by the <tt>rcu_state</tt> structure's <tt>->ncpus</tt>
	field.
	The <tt>rcu_state</tt> structure's <tt>->ncpus_snap</tt>
	field tracks the number of CPUs that had ever been online
	at the beginning of the most recent RCU expedited grace period.
	Note that this number never decreases, at least in the absence
	of a time machine.
<li>	The identities of the CPUs that have ever been online are
	tracked by the <tt>rcu_node</tt> structure's
	<tt>->expmaskinitnext</tt> field.
	The <tt>rcu_node</tt> structure's <tt>->expmaskinit</tt>
	field tracks the identities of the CPUs that were online
	at least once at the beginning of the most recent RCU
	expedited grace period.
	The <tt>rcu_state</tt> structure's <tt>->ncpus</tt> and
	<tt>->ncpus_snap</tt> fields are used to detect when
	new CPUs have come online for the first time, that is,
	when the <tt>rcu_node</tt> structure's <tt>->expmaskinitnext</tt>
	field has changed since the beginning of the last RCU
	expedited grace period, which triggers an update of each
	<tt>rcu_node</tt> structure's <tt>->expmaskinit</tt>
	field from its <tt>->expmaskinitnext</tt> field.
<li>	Each <tt>rcu_node</tt> structure's <tt>->expmaskinit</tt>
	field is used to initialize that structure's
	<tt>->expmask</tt> field at the beginning of each RCU
	expedited grace period.
	This means that only those CPUs that have been online at least
	once will be considered for a given grace period.
<li>	Any CPU that goes offline will clear its bit in its leaf
	<tt>rcu_node</tt> structure's <tt>->qsmaskinitnext</tt>
	field, so any CPU with that bit clear can safely be ignored.
	However, it is possible for a CPU coming online or going offline
	to have this bit set for some time while <tt>cpu_online</tt>
	returns <tt>false</tt>.
<li>	For each non-idle CPU that RCU believes is currently online, the grace
	period invokes <tt>smp_call_function_single()</tt>.
	If this succeeds, the CPU was fully online.
	Failure indicates that the CPU is in the process of coming online
	or going offline, in which case it is necessary to wait for a
	short time period and try again, as shown in the sketch following
	this list.
	The purpose of this wait (or series of waits, as the case may be)
	is to permit a concurrent CPU-hotplug operation to complete.
<li>	In the case of RCU-sched, one of the last acts of an outgoing CPU
	is to invoke <tt>rcu_report_dead()</tt>, which
	reports a quiescent state for that CPU.
	However, this is likely paranoia-induced redundancy.
</ol>
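<p>
The following is a hedged sketch of the retry loop described in the
fifth item above.
It is loosely modeled on the kernel's
<tt>sync_rcu_exp_select_node_cpus()</tt>; the real code coordinates
with the <tt>rcu_node</tt> masks under lock, which this sketch omits.

<blockquote>
<pre>
/*
 * Sketch only:  IPI one CPU, retrying while a CPU-hotplug
 * operation is in flight.  Error handling is simplified.
 */
static void exp_ipi_with_hotplug_retry(int cpu)
{
	while (smp_call_function_single(cpu, rcu_exp_handler, NULL, 0)) {
		if (!cpu_online(cpu))
			return;	/* Now offline, hence quiescent. */
		/* CPU coming online or going offline: wait and retry. */
		schedule_timeout_uninterruptible(1);
	}
}
</pre>
</blockquote>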
<table>
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
	Why all the dancing around with multiple counters and masks
	tracking CPUs that were once online?
	Why not just have a single set of masks tracking the currently
	online CPUs and be done with it?
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
	Maintaining a single set of masks tracking the online CPUs
	<i>sounds</i> easier, at least until you try working out all the
	race conditions between grace-period initialization and
	CPU-hotplug operations.
	For example, suppose initialization is progressing down the
	tree while a CPU-offline operation is progressing up the tree.
	This situation can result in bits set at the top of the tree
	that have no counterparts at the bottom of the tree.
	Those bits will never be cleared, which will result in
	grace-period hangs.
	In short, that way lies madness, to say nothing of a great many
	bugs, hangs, and deadlocks.

	<p><font color="ffffff">
	In contrast, the current multi-mask multi-counter scheme ensures
	that grace-period initialization will always see consistent masks
	up and down the tree, which brings significant simplifications
	over the single-mask method.

	<p><font color="ffffff">
	This is an instance of
	<a href="http://www.cs.columbia.edu/~library/TR-repository/reports/reports-1992/cucs-039-92.ps.gz"><font color="ffffff">
	deferring work in order to avoid synchronization</a>.
	Lazily recording CPU-hotplug events at the beginning of the next
	grace period greatly simplifies maintenance of the CPU-tracking
	bitmasks in the <tt>rcu_node</tt> tree.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>

<h2><a name="Expedited Grace Period Refinements">
Expedited Grace Period Refinements</a></h2>

<ol>
<li>	<a href="#Idle-CPU Checks">Idle-CPU checks</a>.
<li>	<a href="#Batching via Sequence Counter">
	Batching via sequence counter</a>.
<li>	<a href="#Funnel Locking and Wait/Wakeup">
	Funnel locking and wait/wakeup</a>.
<li>	<a href="#Use of Workqueues">Use of workqueues</a>.
<li>	<a href="#Stall Warnings">Stall warnings</a>.
<li>	<a href="#Mid-Boot Operation">Mid-boot operation</a>.
</ol>

<h3><a name="Idle-CPU Checks">Idle-CPU Checks</a></h3>

<p>
Each expedited grace period checks for idle CPUs when initially forming
the mask of CPUs to be IPIed and again just before IPIing a CPU
(both checks are carried out by <tt>sync_rcu_exp_select_cpus()</tt>).
If the CPU is idle at any time between those two times, the CPU will
not be IPIed.
Instead, the task pushing the grace period forward will include the
idle CPUs in the mask passed to <tt>rcu_report_exp_cpu_mult()</tt>.

<p>
For RCU-sched, there is an additional check:
If the IPI has interrupted the idle loop, then
<tt>rcu_exp_handler()</tt> invokes <tt>rcu_report_exp_rdp()</tt>
to report the corresponding quiescent state.

<p>
For RCU-preempt, there is no specific check for idle in the
IPI handler (<tt>rcu_exp_handler()</tt>), but because
RCU read-side critical sections are not permitted within the
idle loop, if <tt>rcu_exp_handler()</tt> sees that the CPU is within
an RCU read-side critical section, the CPU cannot possibly be idle.
Otherwise, <tt>rcu_exp_handler()</tt> invokes
<tt>rcu_report_exp_rdp()</tt> to report the corresponding quiescent
state, regardless of whether or not that quiescent state was due to
the CPU being idle.

<p>
In summary, RCU expedited grace periods check for idle when building
the bitmask of CPUs that must be IPIed, just before sending each IPI,
and (either explicitly or implicitly) within the IPI handler.

<h3><a name="Batching via Sequence Counter">
Batching via Sequence Counter</a></h3>

<p>
If each grace-period request were carried out separately, expedited
grace periods would have abysmal scalability and
problematic high-load characteristics.
Because each grace-period operation can serve an unlimited number of
updates, it is important to <i>batch</i> requests, so that a single
expedited grace-period operation will cover all requests in the
corresponding batch.

<p>
This batching is controlled by a sequence counter named
<tt>->expedited_sequence</tt> in the <tt>rcu_state</tt> structure.
This counter has an odd value when there is an expedited grace period
in progress and an even value otherwise, so that dividing the counter
value by two gives the number of completed grace periods.
During any given update request, the counter must transition from
even to odd and then back to even, thus indicating that a grace
period has elapsed.
Therefore, if the initial value of the counter is <tt>s</tt>,
the updater must wait until the counter reaches at least the
value <tt>(s+3)&~0x1</tt>.
This counter is managed by the following access functions:

<ol>
<li>	<tt>rcu_exp_gp_seq_start()</tt>, which marks the start of
	an expedited grace period.
<li>	<tt>rcu_exp_gp_seq_end()</tt>, which marks the end of an
	expedited grace period.
<li>	<tt>rcu_exp_gp_seq_snap()</tt>, which obtains a snapshot of
	the counter.
<li>	<tt>rcu_exp_gp_seq_done()</tt>, which returns <tt>true</tt>
	if a full expedited grace period has elapsed since the
	corresponding call to <tt>rcu_exp_gp_seq_snap()</tt>.
</ol>

<p>
Again, only one request in a given batch need actually carry out
a grace-period operation, which means there must be an efficient
way to identify which of many concurrent requests will initiate
the grace period, and there must be an efficient way for the
remaining requests to wait for that grace period to complete.
However, that is the topic of the next section.
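<p>
The counter arithmetic can be illustrated with a small standalone
sketch.
The kernel's own versions are <tt>rcu_exp_gp_seq_snap()</tt> and
<tt>rcu_exp_gp_seq_done()</tt>; the function names and the wrap-safe
comparison below are illustrative simplifications.

<blockquote>
<pre>
/* Sketch of the expedited sequence-counter arithmetic. */

/* Counter value that a requester seeing value s must wait for. */
static unsigned long exp_seq_snap(unsigned long s)
{
	return (s + 3) & ~0x1UL;
}

/* Wrap-safe check: has the counter reached the snapshot? */
static int exp_seq_done(unsigned long cur, unsigned long snap)
{
	return (long)(cur - snap) >= 0;
}
</pre>
</blockquote>

<p>
For example, if the counter is idle at zero, the snapshot is two,
corresponding to one full start/end transition.
If a grace period is already in progress at one, the snapshot is four,
forcing the requester to wait for the in-progress grace period to end
and then for one full subsequent grace period.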
<h3><a name="Funnel Locking and Wait/Wakeup">
Funnel Locking and Wait/Wakeup</a></h3>

<p>
The natural way to sort out which of a batch of updaters will initiate
the expedited grace period is to use the <tt>rcu_node</tt> combining
tree, as implemented by the <tt>exp_funnel_lock()</tt> function.
The first updater corresponding to a given grace period arriving
at a given <tt>rcu_node</tt> structure records its desired grace-period
sequence number in the <tt>->exp_seq_rq</tt> field and moves up
to the next level in the tree.
Otherwise, if the <tt>->exp_seq_rq</tt> field already contains
the sequence number for the desired grace period or some later one,
the updater blocks on one of four wait queues in the
<tt>->exp_wq[]</tt> array, using the second-from-bottom
and third-from-bottom bits of the sequence number as an index.
An <tt>->exp_lock</tt> field in the <tt>rcu_node</tt> structure
synchronizes access to these fields.
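<p>
A hedged sketch of this walk appears below.
The field names match the text above, but
<tt>rcu_get_leaf_node()</tt> and <tt>sync_exp_work_done()</tt> stand
in for the corresponding lookups in the real
<tt>exp_funnel_lock()</tt>, which also includes fastpaths and
root-level handling omitted here.

<blockquote>
<pre>
/*
 * Sketch: walk the rcu_node tree for sequence number s.  Returns
 * true if the caller must initiate the grace period, false if a
 * prior updater covered this request and it has now completed.
 */
static bool exp_funnel_lock_sketch(unsigned long s)
{
	struct rcu_node *rnp;

	for (rnp = rcu_get_leaf_node(); rnp; rnp = rnp->parent) {
		spin_lock(&rnp->exp_lock);
		if (ULONG_CMP_GE(rnp->exp_seq_rq, s)) {
			/* Our grace period is already covered: wait. */
			spin_unlock(&rnp->exp_lock);
			wait_event(rnp->exp_wq[(s >> 1) & 0x3],
				   sync_exp_work_done(s));
			return false;
		}
		rnp->exp_seq_rq = s;  /* First to arrive: claim, ascend. */
		spin_unlock(&rnp->exp_lock);
	}
	return true;  /* Reached the root: initiate the grace period. */
}
</pre>
</blockquote>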
<p>
An empty <tt>rcu_node</tt> tree is shown in the following diagram,
with the white cells representing the <tt>->exp_seq_rq</tt> field
and the red cells representing the elements of the
<tt>->exp_wq[]</tt> array.

<p><img src="Funnel0.svg" alt="Funnel0.svg" width="75%">

<p>
The next diagram shows the situation after the arrival of Task A
and Task B at the leftmost and rightmost leaf <tt>rcu_node</tt>
structures, respectively.
The current value of the <tt>rcu_state</tt> structure's
<tt>->expedited_sequence</tt> field is zero, so adding three and
clearing the bottom bit results in the value two, which both tasks
record in the <tt>->exp_seq_rq</tt> field of their respective
<tt>rcu_node</tt> structures:

<p><img src="Funnel1.svg" alt="Funnel1.svg" width="75%">

<p>
Each of Tasks A and B will move up to the root
<tt>rcu_node</tt> structure.
Suppose that Task A wins, recording its desired grace-period sequence
number and resulting in the state shown below:

<p><img src="Funnel2.svg" alt="Funnel2.svg" width="75%">

<p>
Task A now advances to initiate a new grace period, while Task B
moves up to the root <tt>rcu_node</tt> structure, and, seeing that
its desired sequence number is already recorded, blocks on
<tt>->exp_wq[1]</tt>.

<table>
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
	Why <tt>->exp_wq[1]</tt>?
	Given that the value of these tasks' desired sequence number is
	two, shouldn't they instead block on <tt>->exp_wq[2]</tt>?
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
	No.

	<p><font color="ffffff">
	Recall that the bottom bit of the desired sequence number indicates
	whether or not a grace period is currently in progress.
	It is therefore necessary to shift the sequence number right one
	bit position to obtain the number of the grace period.
	This results in <tt>->exp_wq[1]</tt>.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>

<p>
If Tasks C and D also arrive at this point, they will compute the
same desired grace-period sequence number, and see that both leaf
<tt>rcu_node</tt> structures already have that value recorded.
They will therefore block on their respective <tt>rcu_node</tt>
structures' <tt>->exp_wq[1]</tt> fields, as shown below:

<p><img src="Funnel3.svg" alt="Funnel3.svg" width="75%">

<p>
Task A now acquires the <tt>rcu_state</tt> structure's
<tt>->exp_mutex</tt> and initiates the grace period, which
increments <tt>->expedited_sequence</tt>.
Therefore, if Tasks E and F arrive, they will compute
a desired sequence number of 4 and will record this value as
shown below:

<p><img src="Funnel4.svg" alt="Funnel4.svg" width="75%">

<p>
Tasks E and F will propagate up the <tt>rcu_node</tt>
combining tree, with Task F blocking on the root <tt>rcu_node</tt>
structure and Task E waiting for Task A to finish so that
it can start the next grace period.
The resulting state is as shown below:

<p><img src="Funnel5.svg" alt="Funnel5.svg" width="75%">

<p>
Once the grace period completes, Task A
starts waking up the tasks waiting for this grace period to complete,
increments <tt>->expedited_sequence</tt>,
acquires the <tt>->exp_wake_mutex</tt> and then releases the
<tt>->exp_mutex</tt>.
This results in the following state:

<p><img src="Funnel6.svg" alt="Funnel6.svg" width="75%">

<p>
Task E can then acquire <tt>->exp_mutex</tt> and increment
<tt>->expedited_sequence</tt> to the value three.
If new Tasks G and H arrive and move up the combining tree at the
same time, the state will be as follows:

<p><img src="Funnel7.svg" alt="Funnel7.svg" width="75%">

<p>
Note that three of the root <tt>rcu_node</tt> structure's
waitqueues are now occupied.
However, at some point, Task A will wake up the
tasks blocked on the <tt>->exp_wq</tt> waitqueues, resulting
in the following state:

<p><img src="Funnel8.svg" alt="Funnel8.svg" width="75%">

<p>
Execution will continue with Tasks E and H completing
their grace periods and carrying out their wakeups.

<table>
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
	What happens if Task A takes so long to do its wakeups
	that Task E's grace period completes?
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
	Then Task E will block on the <tt>->exp_wake_mutex</tt>,
	which will also prevent it from releasing <tt>->exp_mutex</tt>,
	which in turn will prevent the next grace period from starting.
	This last is important in preventing overflow of the
	<tt>->exp_wq[]</tt> array.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>

<h3><a name="Use of Workqueues">Use of Workqueues</a></h3>

<p>
In earlier implementations, the task requesting the expedited
grace period also drove it to completion.
This straightforward approach had the disadvantage of needing to
account for POSIX signals sent to user tasks,
so more recent implementations use the Linux kernel's
<a href="https://www.kernel.org/doc/Documentation/core-api/workqueue.rst">workqueues</a>.

<p>
The requesting task still does counter snapshotting and funnel-lock
processing, but the task reaching the top of the funnel lock
does a <tt>schedule_work()</tt> (from <tt>_synchronize_rcu_expedited()</tt>)
so that a workqueue kthread does the actual grace-period processing.
Because workqueue kthreads do not accept POSIX signals, grace-period-wait
processing need not allow for POSIX signals.

<p>
In addition, this approach allows wakeups for the previous expedited
grace period to be overlapped with processing for the next expedited
grace period.
Because there are only four sets of waitqueues, it is necessary to
ensure that the previous grace period's wakeups complete before the
next grace period's wakeups start.
This is handled by having the <tt>->exp_mutex</tt>
guard expedited grace-period processing and the
<tt>->exp_wake_mutex</tt> guard wakeups.
The key point is that the <tt>->exp_mutex</tt> is not released
until the first wakeup is complete, which means that the
<tt>->exp_wake_mutex</tt> has already been acquired at that point.
This approach ensures that the previous grace period's wakeups can
be carried out while the current grace period is in process, but
that these wakeups will complete before the next grace period starts.
This means that only three waitqueues are required, guaranteeing that
the four that are provided are sufficient.
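<p>
The following sketch shows the general shape of this hand-off.
The structure and the helpers <tt>drive_expedited_gp()</tt>,
<tt>exp_seq_snap_current()</tt>, and <tt>wait_for_exp_gp()</tt> are
illustrative placeholders; only <tt>INIT_WORK_ONSTACK()</tt>,
<tt>schedule_work()</tt>, <tt>container_of()</tt>, and
<tt>destroy_work_on_stack()</tt> are real kernel APIs here.

<blockquote>
<pre>
/* Sketch: hand grace-period processing to a workqueue kthread. */
struct exp_gp_work {
	struct work_struct work;
	unsigned long s;	/* Snapshotted sequence number. */
};

static void exp_gp_workfn(struct work_struct *wp)
{
	struct exp_gp_work *egw = container_of(wp, struct exp_gp_work, work);

	drive_expedited_gp(egw->s);	/* Placeholder GP driver. */
}

static void synchronize_exp_sketch(void)
{
	struct exp_gp_work egw;

	egw.s = exp_seq_snap_current();	/* Placeholder snapshot. */
	INIT_WORK_ONSTACK(&egw.work, exp_gp_workfn);
	schedule_work(&egw.work);	/* Workqueue kthread drives the GP. */
	wait_for_exp_gp(egw.s);		/* Block on the rcu_node waitqueue. */
	destroy_work_on_stack(&egw.work);
}
</pre>
</blockquote>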
<h3><a name="Stall Warnings">Stall Warnings</a></h3>

<p>
Expediting grace periods does nothing to speed things up when RCU
readers take too long, and therefore expedited grace periods check
for stalls just as normal grace periods do.

<table>
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
	But why not just let the normal grace-period machinery
	detect the stalls, given that a given reader must block
	both normal and expedited grace periods?
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
	Because it is quite possible that at a given time there
	is no normal grace period in progress, in which case the
	normal grace period cannot emit a stall warning.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>

<p>
The <tt>synchronize_sched_expedited_wait()</tt> function loops waiting
for the expedited grace period to end, but with a timeout set to the
current RCU CPU stall-warning time.
If this time is exceeded, any CPUs or <tt>rcu_node</tt> structures
blocking the current grace period are printed.
Each stall warning results in another pass through the loop, but the
second and subsequent passes use longer stall times.

<h3><a name="Mid-Boot Operation">Mid-Boot Operation</a></h3>

<p>
The use of workqueues has the advantage that the expedited
grace-period code need not worry about POSIX signals.
Unfortunately, it has the
corresponding disadvantage that workqueues cannot be used until
they are initialized, which does not happen until some time after
the scheduler spawns the first task.
Given that there are parts of the kernel that really do want to
execute grace periods during this mid-boot “dead zone”,
expedited grace periods must do something else during this time.

<p>
What they do is to fall back to the old practice of requiring that the
requesting task drive the expedited grace period, as was the case
before the use of workqueues.
However, the requesting task is only required to drive the grace period
during the mid-boot dead zone.
Before mid-boot, a synchronous grace period is a no-op.
Some time after mid-boot, workqueues are used.

<p>
Non-expedited non-SRCU synchronous grace periods must also operate
normally during mid-boot.
This is handled by causing non-expedited grace periods to take the
expedited code path during mid-boot.

<p>
The current code assumes that there are no POSIX signals during
the mid-boot dead zone.
However, if an overwhelming need for POSIX signals somehow arises,
appropriate adjustments can be made to the expedited stall-warning code.
One such adjustment would reinstate the pre-workqueue stall-warning
checks, but only during the mid-boot dead zone.

<p>
With this refinement, synchronous grace periods can now be used from
task context pretty much any time during the life of the kernel,
aside from some points in the suspend, hibernate, and shutdown code
paths.
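<p>
A hedged sketch of this fallback logic is shown below.
The <tt>rcu_scheduler_active</tt> variable and the
<tt>RCU_SCHEDULER_INACTIVE</tt>/<tt>RCU_SCHEDULER_INIT</tt> values
are real kernel symbols, but the two helper functions are
illustrative placeholders for the direct-drive and workqueue paths
described above.

<blockquote>
<pre>
/* Sketch: choose the grace-period driver based on boot phase. */
static void synchronize_exp_midboot_sketch(void)
{
	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
		return;		/* Pre-scheduler: grace period is a no-op. */
	if (rcu_scheduler_active == RCU_SCHEDULER_INIT) {
		drive_expedited_gp_directly();	/* Mid-boot dead zone. */
		return;
	}
	queue_expedited_gp_work();	/* Workqueues now available. */
}
</pre>
</blockquote>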
<h3><a name="Summary">
Summary</a></h3>

<p>
Expedited grace periods use a sequence-number approach to promote
batching, so that a single grace-period operation can serve numerous
requests.
A funnel lock is used to efficiently identify the one task out of
a concurrent group that will request the grace period.
All members of the group will block on waitqueues provided in
the <tt>rcu_node</tt> structure.
The actual grace-period processing is carried out by a workqueue.

<p>
CPU-hotplug operations are noted lazily in order to prevent the need
for tight synchronization between expedited grace periods and
CPU-hotplug operations.
The dyntick-idle counters are used to avoid sending IPIs to idle CPUs,
at least in the common case.
RCU-preempt and RCU-sched use different IPI handlers and different
code to respond to the state changes carried out by those handlers,
but otherwise use common code.

<p>
Quiescent states are tracked using the <tt>rcu_node</tt> tree,
and once all necessary quiescent states have been reported,
all tasks waiting on this expedited grace period are awakened.
A pair of mutexes are used to allow one grace period's wakeups
to proceed concurrently with the next grace period's processing.

<p>
This combination of mechanisms allows expedited grace periods to
run reasonably efficiently.
However, for non-time-critical tasks, normal grace periods should be
used instead because their longer duration permits much higher
degrees of batching, and thus much lower per-request overheads.

</body></html>