Commit de4d195

committed May 10, 2017
Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main changes are:

   - Debloat RCU headers

   - Parallelize SRCU callback handling (plus overlapping patches)

   - Improve the performance of Tree SRCU on a CPU-hotplug stress test

   - Documentation updates

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
  rcu: Open-code the rcu_cblist_n_lazy_cbs() function
  rcu: Open-code the rcu_cblist_n_cbs() function
  rcu: Open-code the rcu_cblist_empty() function
  rcu: Separately compile large rcu_segcblist functions
  srcu: Debloat the <linux/rcu_segcblist.h> header
  srcu: Adjust default auto-expediting holdoff
  srcu: Specify auto-expedite holdoff time
  srcu: Expedite first synchronize_srcu() when idle
  srcu: Expedited grace periods with reduced memory contention
  srcu: Make rcutorture writer stalls print SRCU GP state
  srcu: Exact tracking of srcu_data structures containing callbacks
  srcu: Make SRCU be built by default
  srcu: Fix Kconfig botch when SRCU not selected
  rcu: Make non-preemptive schedule be Tasks RCU quiescent state
  srcu: Expedite srcu_schedule_cbs_snp() callback invocation
  srcu: Parallelize callback handling
  kvm: Move srcu_struct fields to end of struct kvm
  rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
  rcu: Use true/false in assignment to bool
  rcu: Use bool value directly
  ...
2 parents dc9edaa + 20652ed commit de4d195

75 files changed, +3904 -1129 lines changed
 

‎Documentation/RCU/00-INDEX

+1 -1
@@ -17,7 +17,7 @@ rcu_dereference.txt
1717
rcubarrier.txt
1818
- RCU and Unloadable Modules
1919
rculist_nulls.txt
20-
- RCU list primitives for use with SLAB_DESTROY_BY_RCU
20+
- RCU list primitives for use with SLAB_TYPESAFE_BY_RCU
2121
rcuref.txt
2222
- Reference-count design for elements of lists/arrays protected by RCU
2323
rcu.txt

‎Documentation/RCU/Design/Data-Structures/Data-Structures.html

+169 -64
@@ -19,6 +19,8 @@ <h3>Introduction</h3>
1919
The <tt>rcu_state</tt> Structure</a>
2020
<li> <a href="#The rcu_node Structure">
2121
The <tt>rcu_node</tt> Structure</a>
22+
<li> <a href="#The rcu_segcblist Structure">
23+
The <tt>rcu_segcblist</tt> Structure</a>
2224
<li> <a href="#The rcu_data Structure">
2325
The <tt>rcu_data</tt> Structure</a>
2426
<li> <a href="#The rcu_dynticks Structure">
@@ -841,6 +843,134 @@ <h5>Sizing the <tt>rcu_node</tt> Array</h5>
841843
Finally, lines&nbsp;64-66 produce an error if the maximum number of
842844
CPUs is too large for the specified fanout.
843845

846+
<h3><a name="The rcu_segcblist Structure">
847+
The <tt>rcu_segcblist</tt> Structure</a></h3>
848+
849+
The <tt>rcu_segcblist</tt> structure maintains a segmented list of
850+
callbacks as follows:
851+
852+
<pre>
853+
1 #define RCU_DONE_TAIL 0
854+
2 #define RCU_WAIT_TAIL 1
855+
3 #define RCU_NEXT_READY_TAIL 2
856+
4 #define RCU_NEXT_TAIL 3
857+
5 #define RCU_CBLIST_NSEGS 4
858+
6
859+
7 struct rcu_segcblist {
860+
8 struct rcu_head *head;
861+
9 struct rcu_head **tails[RCU_CBLIST_NSEGS];
862+
10 unsigned long gp_seq[RCU_CBLIST_NSEGS];
863+
11 long len;
864+
12 long len_lazy;
865+
13 };
866+
</pre>
867+
868+
<p>
869+
The segments are as follows:
870+
871+
<ol>
872+
<li> <tt>RCU_DONE_TAIL</tt>: Callbacks whose grace periods have elapsed.
873+
These callbacks are ready to be invoked.
874+
<li> <tt>RCU_WAIT_TAIL</tt>: Callbacks that are waiting for the
875+
current grace period.
876+
Note that different CPUs can have different ideas about which
877+
grace period is current, hence the <tt>-&gt;gp_seq</tt> field.
878+
<li> <tt>RCU_NEXT_READY_TAIL</tt>: Callbacks waiting for the next
879+
grace period to start.
880+
<li> <tt>RCU_NEXT_TAIL</tt>: Callbacks that have not yet been
881+
associated with a grace period.
882+
</ol>
883+
884+
<p>
885+
The <tt>-&gt;head</tt> pointer references the first callback or
886+
is <tt>NULL</tt> if the list contains no callbacks (which is
887+
<i>not</i> the same as being empty).
888+
Each element of the <tt>-&gt;tails[]</tt> array references the
889+
<tt>-&gt;next</tt> pointer of the last callback in the corresponding
890+
segment of the list, or the list's <tt>-&gt;head</tt> pointer if
891+
that segment and all previous segments are empty.
892+
If the corresponding segment is empty but some previous segment is
893+
not empty, then the array element is identical to its predecessor.
894+
Older callbacks are closer to the head of the list, and new callbacks
895+
are added at the tail.
896+
This relationship between the <tt>-&gt;head</tt> pointer, the
897+
<tt>-&gt;tails[]</tt> array, and the callbacks is shown in this
898+
diagram:
899+
900+
</p><p><img src="nxtlist.svg" alt="nxtlist.svg" width="40%">
901+
902+
</p><p>In this figure, the <tt>-&gt;head</tt> pointer references the
903+
first
904+
RCU callback in the list.
905+
The <tt>-&gt;tails[RCU_DONE_TAIL]</tt> array element references
906+
the <tt>-&gt;head</tt> pointer itself, indicating that none
907+
of the callbacks is ready to invoke.
908+
The <tt>-&gt;tails[RCU_WAIT_TAIL]</tt> array element references callback
909+
CB&nbsp;2's <tt>-&gt;next</tt> pointer, which indicates that
910+
CB&nbsp;1 and CB&nbsp;2 are both waiting on the current grace period,
911+
give or take possible disagreements about exactly which grace period
912+
is the current one.
913+
The <tt>-&gt;tails[RCU_NEXT_READY_TAIL]</tt> array element
914+
references the same RCU callback that <tt>-&gt;tails[RCU_WAIT_TAIL]</tt>
915+
does, which indicates that there are no callbacks waiting on the next
916+
RCU grace period.
917+
The <tt>-&gt;tails[RCU_NEXT_TAIL]</tt> array element references
918+
CB&nbsp;4's <tt>-&gt;next</tt> pointer, indicating that all the
919+
remaining RCU callbacks have not yet been assigned to an RCU grace
920+
period.
921+
Note that the <tt>-&gt;tails[RCU_NEXT_TAIL]</tt> array element
922+
always references the last RCU callback's <tt>-&gt;next</tt> pointer
923+
unless the callback list is empty, in which case it references
924+
the <tt>-&gt;head</tt> pointer.
925+
926+
<p>
927+
There is one additional important special case for the
928+
<tt>-&gt;tails[RCU_NEXT_TAIL]</tt> array element: It can be <tt>NULL</tt>
929+
when this list is <i>disabled</i>.
930+
Lists are disabled when the corresponding CPU is offline or when
931+
the corresponding CPU's callbacks are offloaded to a kthread,
932+
both of which are described elsewhere.
933+
934+
</p><p>CPUs advance their callbacks from the
935+
<tt>RCU_NEXT_TAIL</tt> to the <tt>RCU_NEXT_READY_TAIL</tt> to the
936+
<tt>RCU_WAIT_TAIL</tt> to the <tt>RCU_DONE_TAIL</tt> list segments
937+
as grace periods advance.
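
The tail-pointer layout described in the added text can be tried out in user space. The following is a minimal sketch, not the kernel's implementation: `struct cb`, `struct segcblist`, and the `seg_*()` helpers are hypothetical stand-ins, and `seg_advance_all()` shifts every callback by one segment unconditionally, whereas the kernel consults the `->gp_seq[]` array before moving anything.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct rcu_head. */
struct cb {
	struct cb *next;
};

/* Segment indexes, mirroring the RCU_*_TAIL constants in the diff. */
enum { DONE_TAIL, WAIT_TAIL, NEXT_READY_TAIL, NEXT_TAIL, NSEGS };

/* Simplified analogue of struct rcu_segcblist (no gp_seq[], no len_lazy). */
struct segcblist {
	struct cb *head;
	struct cb **tails[NSEGS];
	long len;
};

/* Empty list: every tail pointer references ->head. */
static void seg_init(struct segcblist *l)
{
	l->head = NULL;
	for (int i = 0; i < NSEGS; i++)
		l->tails[i] = &l->head;
	l->len = 0;
}

/* New callbacks are always appended to the NEXT_TAIL segment. */
static void seg_enqueue(struct segcblist *l, struct cb *c)
{
	c->next = NULL;
	*l->tails[NEXT_TAIL] = c;
	l->tails[NEXT_TAIL] = &c->next;
	l->len++;
}

/* A segment is empty when its tail equals its predecessor's tail. */
static int seg_is_empty(struct segcblist *l, int seg)
{
	struct cb **prev = (seg == DONE_TAIL) ? &l->head : l->tails[seg - 1];

	return l->tails[seg] == prev;
}

/*
 * Assumption for illustration: move every callback one segment closer
 * to DONE_TAIL.  Only tail pointers change; no callback is relinked.
 */
static void seg_advance_all(struct segcblist *l)
{
	for (int i = DONE_TAIL; i < NEXT_TAIL; i++)
		l->tails[i] = l->tails[i + 1];
	/* The last tail pointer always references a NULL ->next (or ->head). */
	assert(*l->tails[NEXT_TAIL] == NULL);
}
```

Enqueuing two callbacks, advancing twice, and then enqueuing two more reproduces the state the text describes for the figure: the `DONE_TAIL` element references `->head`, the `WAIT_TAIL` and `NEXT_READY_TAIL` elements both reference CB 2's `->next` pointer, and the `NEXT_TAIL` element references CB 4's `->next` pointer.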
938+
939+
</p><p>The <tt>-&gt;gp_seq[]</tt> array records grace-period
940+
numbers corresponding to the list segments.
941+
This is what allows different CPUs to have different ideas as to
942+
which is the current grace period while still avoiding premature
943+
invocation of their callbacks.
944+
In particular, this allows CPUs that go idle for extended periods
945+
to determine which of their callbacks are ready to be invoked after
946+
reawakening.
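
To illustrate how per-segment grace-period numbers let a CPU decide for itself which callbacks are ready, here is a hedged user-space sketch. The names `seq_before()`, `seg_advance_to()`, and the `completed` parameter are assumptions made for this example. It is loosely modeled on the idea in the text rather than copied from the kernel, and it omits the compaction of the remaining segments that a full implementation would need.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct rcu_head. */
struct cb {
	struct cb *next;
};

enum { DONE_TAIL, WAIT_TAIL, NEXT_READY_TAIL, NEXT_TAIL, NSEGS };

/* Simplified analogue of struct rcu_segcblist, keeping gp_seq[]. */
struct segcblist {
	struct cb *head;
	struct cb **tails[NSEGS];
	unsigned long gp_seq[NSEGS];
};

/* Wraparound-tolerant test: is sequence number a earlier than b? */
static int seq_before(unsigned long a, unsigned long b)
{
	return a - b > ULONG_MAX / 2;
}

/*
 * Move to DONE_TAIL every callback whose segment's grace-period number
 * is no later than "completed", the most recent grace period that this
 * CPU knows to have ended.  Segments left behind are not compacted.
 */
static void seg_advance_to(struct segcblist *l, unsigned long completed)
{
	int i;

	for (i = WAIT_TAIL; i < NEXT_TAIL; i++) {
		if (seq_before(completed, l->gp_seq[i]))
			break;	/* This segment's grace period is not over. */
		l->tails[DONE_TAIL] = l->tails[i];
	}
	/* Segments that were drained into DONE_TAIL are now empty. */
	for (int j = WAIT_TAIL; j < i; j++)
		l->tails[j] = l->tails[DONE_TAIL];
	assert(*l->tails[NEXT_TAIL] == NULL);
}
```

A CPU that slept through several grace periods can call `seg_advance_to()` once with the latest completed number; both the `WAIT_TAIL` and `NEXT_READY_TAIL` callbacks then move to `DONE_TAIL` in a single pass, while `NEXT_TAIL` callbacks, which have no grace-period number yet, stay put.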
947+
948+
</p><p>The <tt>-&gt;len</tt> counter contains the number of
949+
callbacks in <tt>-&gt;head</tt>, and the
950+
<tt>-&gt;len_lazy</tt> contains the number of those callbacks that
951+
are known to only free memory, and whose invocation can therefore
952+
be safely deferred.
953+
954+
<p><b>Important note</b>: It is the <tt>-&gt;len</tt> field that
955+
determines whether or not there are callbacks associated with
956+
this <tt>rcu_segcblist</tt> structure, <i>not</i> the <tt>-&gt;head</tt>
957+
pointer.
958+
The reason for this is that all the ready-to-invoke callbacks
959+
(that is, those in the <tt>RCU_DONE_TAIL</tt> segment) are extracted
960+
all at once at callback-invocation time.
961+
If callback invocation must be postponed, for example, because a
962+
high-priority process just woke up on this CPU, then the remaining
963+
callbacks are placed back on the <tt>RCU_DONE_TAIL</tt> segment.
964+
Either way, the <tt>-&gt;len</tt> and <tt>-&gt;len_lazy</tt> counts
965+
are adjusted after the corresponding callbacks have been invoked, and so
966+
again it is the <tt>-&gt;len</tt> count that accurately reflects whether
967+
or not there are callbacks associated with this <tt>rcu_segcblist</tt>
968+
structure.
969+
Of course, off-CPU sampling of the <tt>-&gt;len</tt> count requires
970+
the use of appropriate synchronization, for example, memory barriers.
971+
This synchronization can be a bit subtle, particularly in the case
972+
of <tt>rcu_barrier()</tt>.
973+
844974
<h3><a name="The rcu_data Structure">
845975
The <tt>rcu_data</tt> Structure</a></h3>
846976

@@ -983,62 +1113,18 @@ <h5>RCU Callback Handling</h5>
9831113
as follows:
9841114

9851115
<pre>
986-
1 struct rcu_head *nxtlist;
987-
2 struct rcu_head **nxttail[RCU_NEXT_SIZE];
988-
3 unsigned long nxtcompleted[RCU_NEXT_SIZE];
989-
4 long qlen_lazy;
990-
5 long qlen;
991-
6 long qlen_last_fqs_check;
1116+
1 struct rcu_segcblist cblist;
1117+
2 long qlen_last_fqs_check;
1118+
3 unsigned long n_cbs_invoked;
1119+
4 unsigned long n_nocbs_invoked;
1120+
5 unsigned long n_cbs_orphaned;
1121+
6 unsigned long n_cbs_adopted;
9921122
7 unsigned long n_force_qs_snap;
993-
8 unsigned long n_cbs_invoked;
994-
9 unsigned long n_cbs_orphaned;
995-
10 unsigned long n_cbs_adopted;
996-
11 long blimit;
1123+
8 long blimit;
9971124
</pre>
9981125

999-
<p>The <tt>-&gt;nxtlist</tt> pointer and the
1000-
<tt>-&gt;nxttail[]</tt> array form a four-segment list with
1001-
older callbacks near the head and newer ones near the tail.
1002-
Each segment contains callbacks with the corresponding relationship
1003-
to the current grace period.
1004-
The pointer out of the end of each of the four segments is referenced
1005-
by the element of the <tt>-&gt;nxttail[]</tt> array indexed by
1006-
<tt>RCU_DONE_TAIL</tt> (for callbacks handled by a prior grace period),
1007-
<tt>RCU_WAIT_TAIL</tt> (for callbacks waiting on the current grace period),
1008-
<tt>RCU_NEXT_READY_TAIL</tt> (for callbacks that will wait on the next
1009-
grace period), and
1010-
<tt>RCU_NEXT_TAIL</tt> (for callbacks that are not yet associated
1011-
with a specific grace period)
1012-
respectively, as shown in the following figure.
1013-
1014-
</p><p><img src="nxtlist.svg" alt="nxtlist.svg" width="40%">
1015-
1016-
</p><p>In this figure, the <tt>-&gt;nxtlist</tt> pointer references the
1017-
first
1018-
RCU callback in the list.
1019-
The <tt>-&gt;nxttail[RCU_DONE_TAIL]</tt> array element references
1020-
the <tt>-&gt;nxtlist</tt> pointer itself, indicating that none
1021-
of the callbacks is ready to invoke.
1022-
The <tt>-&gt;nxttail[RCU_WAIT_TAIL]</tt> array element references callback
1023-
CB&nbsp;2's <tt>-&gt;next</tt> pointer, which indicates that
1024-
CB&nbsp;1 and CB&nbsp;2 are both waiting on the current grace period.
1025-
The <tt>-&gt;nxttail[RCU_NEXT_READY_TAIL]</tt> array element
1026-
references the same RCU callback that <tt>-&gt;nxttail[RCU_WAIT_TAIL]</tt>
1027-
does, which indicates that there are no callbacks waiting on the next
1028-
RCU grace period.
1029-
The <tt>-&gt;nxttail[RCU_NEXT_TAIL]</tt> array element references
1030-
CB&nbsp;4's <tt>-&gt;next</tt> pointer, indicating that all the
1031-
remaining RCU callbacks have not yet been assigned to an RCU grace
1032-
period.
1033-
Note that the <tt>-&gt;nxttail[RCU_NEXT_TAIL]</tt> array element
1034-
always references the last RCU callback's <tt>-&gt;next</tt> pointer
1035-
unless the callback list is empty, in which case it references
1036-
the <tt>-&gt;nxtlist</tt> pointer.
1037-
1038-
</p><p>CPUs advance their callbacks from the
1039-
<tt>RCU_NEXT_TAIL</tt> to the <tt>RCU_NEXT_READY_TAIL</tt> to the
1040-
<tt>RCU_WAIT_TAIL</tt> to the <tt>RCU_DONE_TAIL</tt> list segments
1041-
as grace periods advance.
1126+
<p>The <tt>-&gt;cblist</tt> structure is the segmented callback list
1127+
described earlier.
10421128
The CPU advances the callbacks in its <tt>rcu_data</tt> structure
10431129
whenever it notices that another RCU grace period has completed.
10441130
The CPU detects the completion of an RCU grace period by noticing
@@ -1049,16 +1135,7 @@ <h5>RCU Callback Handling</h5>
10491135
<tt>-&gt;completed</tt> field is updated at the end of each
10501136
grace period.
10511137

1052-
</p><p>The <tt>-&gt;nxtcompleted[]</tt> array records grace-period
1053-
numbers corresponding to the list segments.
1054-
This allows CPUs that go idle for extended periods to determine
1055-
which of their callbacks are ready to be invoked after reawakening.
1056-
1057-
</p><p>The <tt>-&gt;qlen</tt> counter contains the number of
1058-
callbacks in <tt>-&gt;nxtlist</tt>, and the
1059-
<tt>-&gt;qlen_lazy</tt> contains the number of those callbacks that
1060-
are known to only free memory, and whose invocation can therefore
1061-
be safely deferred.
1138+
<p>
10621139
The <tt>-&gt;qlen_last_fqs_check</tt> and
10631140
<tt>-&gt;n_force_qs_snap</tt> coordinate the forcing of quiescent
10641141
states from <tt>call_rcu()</tt> and friends when callback
@@ -1069,6 +1146,10 @@ <h5>RCU Callback Handling</h5>
10691146
fields count the number of callbacks invoked,
10701147
sent to other CPUs when this CPU goes offline,
10711148
and received from other CPUs when those other CPUs go offline.
1149+
The <tt>-&gt;n_nocbs_invoked</tt> is used when the CPU's callbacks
1150+
are offloaded to a kthread.
1151+
1152+
<p>
10721153
Finally, the <tt>-&gt;blimit</tt> counter is the maximum number of
10731154
RCU callbacks that may be invoked at a given time.
10741155

@@ -1104,6 +1185,9 @@ <h3><a name="The rcu_dynticks Structure">
11041185
1 int dynticks_nesting;
11051186
2 int dynticks_nmi_nesting;
11061187
3 atomic_t dynticks;
1188+
4 bool rcu_need_heavy_qs;
1189+
5 unsigned long rcu_qs_ctr;
1190+
6 bool rcu_urgent_qs;
11071191
</pre>
11081192

11091193
<p>The <tt>-&gt;dynticks_nesting</tt> field counts the
@@ -1117,11 +1201,32 @@ <h3><a name="The rcu_dynticks Structure">
11171201
field, except that NMIs that interrupt non-dyntick-idle execution
11181202
are not counted.
11191203

1120-
</p><p>Finally, the <tt>-&gt;dynticks</tt> field counts the corresponding
1204+
</p><p>The <tt>-&gt;dynticks</tt> field counts the corresponding
11211205
CPU's transitions to and from dyntick-idle mode, so that this counter
11221206
has an even value when the CPU is in dyntick-idle mode and an odd
11231207
value otherwise.
11241208

1209+
</p><p>The <tt>-&gt;rcu_need_heavy_qs</tt> field is used
1210+
to record the fact that the RCU core code would really like to
1211+
see a quiescent state from the corresponding CPU, so much so that
1212+
it is willing to call for heavy-weight dyntick-counter operations.
1213+
This flag is checked by RCU's context-switch and <tt>cond_resched()</tt>
1214+
code, which provide a momentary idle sojourn in response.
1215+
1216+
</p><p>The <tt>-&gt;rcu_qs_ctr</tt> field is used to record
1217+
quiescent states from <tt>cond_resched()</tt>.
1218+
Because <tt>cond_resched()</tt> can execute quite frequently, this
1219+
must be quite lightweight, as in a non-atomic increment of this
1220+
per-CPU field.
1221+
1222+
</p><p>Finally, the <tt>-&gt;rcu_urgent_qs</tt> field is used to record
1223+
the fact that the RCU core code would really like to see a quiescent
1224+
state from the corresponding CPU, with the various other fields indicating
1225+
just how badly RCU wants this quiescent state.
1226+
This flag is checked by RCU's context-switch and <tt>cond_resched()</tt>
1227+
code, which, if nothing else, non-atomically increment <tt>-&gt;rcu_qs_ctr</tt>
1228+
in response.
1229+
11251230
<table>
11261231
<tr><th>&nbsp;</th></tr>
11271232
<tr><th align="left">Quick Quiz:</th></tr>
