Skip to content

Commit

Permalink
Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/lin…
Browse files Browse the repository at this point in the history
…ux/kernel/git/tip/tip

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  cpu: Export cpu_up()
  rcu: Apply ACCESS_ONCE() to rcu_boost() return value
  Revert "rcu: Permit rt_mutex_unlock() with irqs disabled"
  docs: Additional LWN links to RCU API
  rcu: Augment rcu_batch_end tracing for idle and callback state
  rcu: Add rcutorture tests for srcu_read_lock_raw()
  rcu: Make rcutorture test for hotpluggability before offlining CPUs
  driver-core/cpu: Expose hotpluggability to the rest of the kernel
  rcu: Remove redundant rcu_cpu_stall_suppress declaration
  rcu: Adaptive dyntick-idle preparation
  rcu: Keep invoking callbacks if CPU otherwise idle
  rcu: Irq nesting is always 0 on rcu_enter_idle_common
  rcu: Don't check irq nesting from rcu idle entry/exit
  rcu: Permit dyntick-idle with callbacks pending
  rcu: Document same-context read-side constraints
  rcu: Identify dyntick-idle CPUs on first force_quiescent_state() pass
  rcu: Remove dynticks false positives and RCU failures
  rcu: Reduce latency of rcu_prepare_for_idle()
  rcu: Eliminate RCU_FAST_NO_HZ grace-period hang
  rcu: Avoid needlessly IPIing CPUs at GP end
  ...
  • Loading branch information
torvalds committed Jan 6, 2012
2 parents 1483b38 + 919b834 commit 423d091
Show file tree
Hide file tree
Showing 58 changed files with 1,512 additions and 407 deletions.
6 changes: 6 additions & 0 deletions Documentation/RCU/checklist.txt
Original file line number Diff line number Diff line change
Expand Up @@ -328,6 +328,12 @@ over a rather long period of time, but improvements are always welcome!
RCU rather than SRCU, because RCU is almost always faster and
easier to use than is SRCU.

If you need to enter your read-side critical section in a
hardirq or exception handler, and then exit that same read-side
critical section in the task that was interrupted, then you need
to srcu_read_lock_raw() and srcu_read_unlock_raw(), which avoid
the lockdep checking that would otherwise this practice illegal.

Also unlike other forms of RCU, explicit initialization
and cleanup is required via init_srcu_struct() and
cleanup_srcu_struct(). These are passed a "struct srcu_struct"
Expand Down
10 changes: 5 additions & 5 deletions Documentation/RCU/rcu.txt
Original file line number Diff line number Diff line change
Expand Up @@ -38,11 +38,11 @@ o How can the updater tell when a grace period has completed

Preemptible variants of RCU (CONFIG_TREE_PREEMPT_RCU) get the
same effect, but require that the readers manipulate CPU-local
counters. These counters allow limited types of blocking
within RCU read-side critical sections. SRCU also uses
CPU-local counters, and permits general blocking within
RCU read-side critical sections. These two variants of
RCU detect grace periods by sampling these counters.
counters. These counters allow limited types of blocking within
RCU read-side critical sections. SRCU also uses CPU-local
counters, and permits general blocking within RCU read-side
critical sections. These variants of RCU detect grace periods
by sampling these counters.

o If I am running on a uniprocessor kernel, which can only do one
thing at a time, why should I wait for a grace period?
Expand Down
16 changes: 10 additions & 6 deletions Documentation/RCU/stallwarn.txt
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,11 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning
messages.

o A hardware or software issue shuts off the scheduler-clock
interrupt on a CPU that is not in dyntick-idle mode. This
problem really has happened, and seems to be most likely to
result in RCU CPU stall warnings for CONFIG_NO_HZ=n kernels.

o A bug in the RCU implementation.

o A hardware failure. This is quite unlikely, but has occurred
Expand All @@ -109,12 +114,11 @@ o A hardware failure. This is quite unlikely, but has occurred
This resulted in a series of RCU CPU stall warnings, eventually
leading the realization that the CPU had failed.

The RCU, RCU-sched, and RCU-bh implementations have CPU stall
warning. SRCU does not have its own CPU stall warnings, but its
calls to synchronize_sched() will result in RCU-sched detecting
RCU-sched-related CPU stalls. Please note that RCU only detects
CPU stalls when there is a grace period in progress. No grace period,
no CPU stall warnings.
The RCU, RCU-sched, and RCU-bh implementations have CPU stall warning.
SRCU does not have its own CPU stall warnings, but its calls to
synchronize_sched() will result in RCU-sched detecting RCU-sched-related
CPU stalls. Please note that RCU only detects CPU stalls when there is
a grace period in progress. No grace period, no CPU stall warnings.

To diagnose the cause of the stall, inspect the stack traces.
The offending function will usually be near the top of the stack.
Expand Down
13 changes: 13 additions & 0 deletions Documentation/RCU/torture.txt
Original file line number Diff line number Diff line change
Expand Up @@ -61,11 +61,24 @@ nreaders This is the number of RCU reading threads supported.
To properly exercise RCU implementations with preemptible
read-side critical sections.

onoff_interval
The number of seconds between each attempt to execute a
randomly selected CPU-hotplug operation. Defaults to
zero, which disables CPU hotplugging. In HOTPLUG_CPU=n
kernels, rcutorture will silently refuse to do any
CPU-hotplug operations regardless of what value is
specified for onoff_interval.

shuffle_interval
The number of seconds to keep the test threads affinitied
to a particular subset of the CPUs, defaults to 3 seconds.
Used in conjunction with test_no_idle_hz.

shutdown_secs The number of seconds to run the test before terminating
the test and powering off the system. The default is
zero, which disables test termination and system shutdown.
This capability is useful for automated testing.

stat_interval The number of seconds between output of torture
statistics (via printk()). Regardless of the interval,
statistics are printed when the module is unloaded.
Expand Down
4 changes: 0 additions & 4 deletions Documentation/RCU/trace.txt
Original file line number Diff line number Diff line change
Expand Up @@ -105,14 +105,10 @@ o "dt" is the current value of the dyntick counter that is incremented
or one greater than the interrupt-nesting depth otherwise.
The number after the second "/" is the NMI nesting depth.

This field is displayed only for CONFIG_NO_HZ kernels.

o "df" is the number of times that some other CPU has forced a
quiescent state on behalf of this CPU due to this CPU being in
dynticks-idle state.

This field is displayed only for CONFIG_NO_HZ kernels.

o "of" is the number of times that some other CPU has forced a
quiescent state on behalf of this CPU due to this CPU being
offline. In a perfect world, this might never happen, but it
Expand Down
19 changes: 14 additions & 5 deletions Documentation/RCU/whatisRCU.txt
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ to start learning about RCU:
1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/
2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/
3. RCU part 3: the RCU API http://lwn.net/Articles/264090/
4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/


What is RCU?
Expand Down Expand Up @@ -834,6 +835,8 @@ SRCU: Critical sections Grace period Barrier

srcu_read_lock synchronize_srcu N/A
srcu_read_unlock synchronize_srcu_expedited
srcu_read_lock_raw
srcu_read_unlock_raw
srcu_dereference

SRCU: Initialization/cleanup
Expand All @@ -855,27 +858,33 @@ list can be helpful:

a. Will readers need to block? If so, you need SRCU.

b. What about the -rt patchset? If readers would need to block
b. Is it necessary to start a read-side critical section in a
hardirq handler or exception handler, and then to complete
this read-side critical section in the task that was
interrupted? If so, you need SRCU's srcu_read_lock_raw() and
srcu_read_unlock_raw() primitives.

c. What about the -rt patchset? If readers would need to block
in an non-rt kernel, you need SRCU. If readers would block
in a -rt kernel, but not in a non-rt kernel, SRCU is not
necessary.

c. Do you need to treat NMI handlers, hardirq handlers,
d. Do you need to treat NMI handlers, hardirq handlers,
and code segments with preemption disabled (whether
via preempt_disable(), local_irq_save(), local_bh_disable(),
or some other mechanism) as if they were explicit RCU readers?
If so, you need RCU-sched.

d. Do you need RCU grace periods to complete even in the face
e. Do you need RCU grace periods to complete even in the face
of softirq monopolization of one or more of the CPUs? For
example, is your code subject to network-based denial-of-service
attacks? If so, you need RCU-bh.

e. Is your workload too update-intensive for normal use of
f. Is your workload too update-intensive for normal use of
RCU, but inappropriate for other synchronization mechanisms?
If so, consider SLAB_DESTROY_BY_RCU. But please be careful!

f. Otherwise, use RCU.
g. Otherwise, use RCU.

Of course, this all assumes that you have determined that RCU is in fact
the right tool for your job.
Expand Down
87 changes: 87 additions & 0 deletions Documentation/atomic_ops.txt
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,93 @@ compiler optimizes the section accessing atomic_t variables.

*** YOU HAVE BEEN WARNED! ***

Properly aligned pointers, longs, ints, and chars (and unsigned
equivalents) may be atomically loaded from and stored to in the same
sense as described for atomic_read() and atomic_set(). The ACCESS_ONCE()
macro should be used to prevent the compiler from using optimizations
that might otherwise optimize accesses out of existence on the one hand,
or that might create unsolicited accesses on the other.

For example consider the following code:

while (a > 0)
do_something();

If the compiler can prove that do_something() does not store to the
variable a, then the compiler is within its rights transforming this to
the following:

tmp = a;
if (a > 0)
for (;;)
do_something();

If you don't want the compiler to do this (and you probably don't), then
you should use something like the following:

while (ACCESS_ONCE(a) < 0)
do_something();

Alternatively, you could place a barrier() call in the loop.

For another example, consider the following code:

tmp_a = a;
do_something_with(tmp_a);
do_something_else_with(tmp_a);

If the compiler can prove that do_something_with() does not store to the
variable a, then the compiler is within its rights to manufacture an
additional load as follows:

tmp_a = a;
do_something_with(tmp_a);
tmp_a = a;
do_something_else_with(tmp_a);

This could fatally confuse your code if it expected the same value
to be passed to do_something_with() and do_something_else_with().

The compiler would be likely to manufacture this additional load if
do_something_with() was an inline function that made very heavy use
of registers: reloading from variable a could save a flush to the
stack and later reload. To prevent the compiler from attacking your
code in this manner, write the following:

tmp_a = ACCESS_ONCE(a);
do_something_with(tmp_a);
do_something_else_with(tmp_a);

For a final example, consider the following code, assuming that the
variable a is set at boot time before the second CPU is brought online
and never changed later, so that memory barriers are not needed:

if (a)
b = 9;
else
b = 42;

The compiler is within its rights to manufacture an additional store
by transforming the above code into the following:

b = 42;
if (a)
b = 9;

This could come as a fatal surprise to other code running concurrently
that expected b to never have the value 42 if a was zero. To prevent
the compiler from doing this, write something like:

if (a)
ACCESS_ONCE(b) = 9;
else
ACCESS_ONCE(b) = 42;

Don't even -think- about doing this without proper use of memory barriers,
locks, or atomic operations if variable a can change at runtime!

*** WARNING: ACCESS_ONCE() DOES NOT IMPLY A BARRIER! ***

Now, we move onto the atomic operation interfaces typically implemented with
the help of assembly code.

Expand Down
63 changes: 63 additions & 0 deletions Documentation/lockdep-design.txt
Original file line number Diff line number Diff line change
Expand Up @@ -221,3 +221,66 @@ when the chain is validated for the first time, is then put into a hash
table, which hash-table can be checked in a lockfree manner. If the
locking chain occurs again later on, the hash table tells us that we
dont have to validate the chain again.

Troubleshooting:
----------------

The validator tracks a maximum of MAX_LOCKDEP_KEYS number of lock classes.
Exceeding this number will trigger the following lockdep warning:

(DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))

By default, MAX_LOCKDEP_KEYS is currently set to 8191, and typical
desktop systems have less than 1,000 lock classes, so this warning
normally results from lock-class leakage or failure to properly
initialize locks. These two problems are illustrated below:

1. Repeated module loading and unloading while running the validator
will result in lock-class leakage. The issue here is that each
load of the module will create a new set of lock classes for
that module's locks, but module unloading does not remove old
classes (see below discussion of reuse of lock classes for why).
Therefore, if that module is loaded and unloaded repeatedly,
the number of lock classes will eventually reach the maximum.

2. Using structures such as arrays that have large numbers of
locks that are not explicitly initialized. For example,
a hash table with 8192 buckets where each bucket has its own
spinlock_t will consume 8192 lock classes -unless- each spinlock
is explicitly initialized at runtime, for example, using the
run-time spin_lock_init() as opposed to compile-time initializers
such as __SPIN_LOCK_UNLOCKED(). Failure to properly initialize
the per-bucket spinlocks would guarantee lock-class overflow.
In contrast, a loop that called spin_lock_init() on each lock
would place all 8192 locks into a single lock class.

The moral of this story is that you should always explicitly
initialize your locks.

One might argue that the validator should be modified to allow
lock classes to be reused. However, if you are tempted to make this
argument, first review the code and think through the changes that would
be required, keeping in mind that the lock classes to be removed are
likely to be linked into the lock-dependency graph. This turns out to
be harder to do than to say.

Of course, if you do run out of lock classes, the next thing to do is
to find the offending lock classes. First, the following command gives
you the number of lock classes currently in use along with the maximum:

grep "lock-classes" /proc/lockdep_stats

This command produces the following output on a modest system:

lock-classes: 748 [max: 8191]

If the number allocated (748 above) increases continually over time,
then there is likely a leak. The following command can be used to
identify the leaking lock classes:

grep "BD" /proc/lockdep

Run the command and save the output, then compare against the output from
a later run of this command to identify the leakers. This same output
can also help you find situations where runtime lock initialization has
been omitted.
6 changes: 4 additions & 2 deletions arch/arm/kernel/process.c
Original file line number Diff line number Diff line change
Expand Up @@ -183,7 +183,8 @@ void cpu_idle(void)

/* endless idle loop with no priority at all */
while (1) {
tick_nohz_stop_sched_tick(1);
tick_nohz_idle_enter();
rcu_idle_enter();
leds_event(led_idle_start);
while (!need_resched()) {
#ifdef CONFIG_HOTPLUG_CPU
Expand Down Expand Up @@ -213,7 +214,8 @@ void cpu_idle(void)
}
}
leds_event(led_idle_end);
tick_nohz_restart_sched_tick();
rcu_idle_exit();
tick_nohz_idle_exit();
preempt_enable_no_resched();
schedule();
preempt_disable();
Expand Down
6 changes: 4 additions & 2 deletions arch/avr32/kernel/process.c
Original file line number Diff line number Diff line change
Expand Up @@ -34,10 +34,12 @@ void cpu_idle(void)
{
/* endless idle loop with no priority at all */
while (1) {
tick_nohz_stop_sched_tick(1);
tick_nohz_idle_enter();
rcu_idle_enter();
while (!need_resched())
cpu_idle_sleep();
tick_nohz_restart_sched_tick();
rcu_idle_exit();
tick_nohz_idle_exit();
preempt_enable_no_resched();
schedule();
preempt_disable();
Expand Down
6 changes: 4 additions & 2 deletions arch/blackfin/kernel/process.c
Original file line number Diff line number Diff line change
Expand Up @@ -88,10 +88,12 @@ void cpu_idle(void)
#endif
if (!idle)
idle = default_idle;
tick_nohz_stop_sched_tick(1);
tick_nohz_idle_enter();
rcu_idle_enter();
while (!need_resched())
idle();
tick_nohz_restart_sched_tick();
rcu_idle_exit();
tick_nohz_idle_exit();
preempt_enable_no_resched();
schedule();
preempt_disable();
Expand Down
6 changes: 4 additions & 2 deletions arch/microblaze/kernel/process.c
Original file line number Diff line number Diff line change
Expand Up @@ -103,10 +103,12 @@ void cpu_idle(void)
if (!idle)
idle = default_idle;

tick_nohz_stop_sched_tick(1);
tick_nohz_idle_enter();
rcu_idle_enter();
while (!need_resched())
idle();
tick_nohz_restart_sched_tick();
rcu_idle_exit();
tick_nohz_idle_exit();

preempt_enable_no_resched();
schedule();
Expand Down
Loading

0 comments on commit 423d091

Please sign in to comment.