Skip to content

Commit

Permalink
Merge branches 'doc.2013.03.12a', 'fixes.2013.03.13a' and 'idlenocb.2…
Browse files Browse the repository at this point in the history
…013.03.26b' into HEAD

doc.2013.03.12a: Documentation changes.

fixes.2013.03.13a: Miscellaneous fixes.

idlenocb.2013.03.26b: Remove restrictions on no-CBs CPUs, make
	RCU_FAST_NO_HZ take advantage of numbered callbacks, add
	callback acceleration based on numbered callbacks.
  • Loading branch information
paulmck committed Mar 26, 2013
3 parents 3f944ad + 81e5949 + 910ee45 commit 6d87669
Show file tree
Hide file tree
Showing 11 changed files with 603 additions and 505 deletions.
33 changes: 24 additions & 9 deletions Documentation/RCU/stallwarn.txt
Original file line number Diff line number Diff line change
Expand Up @@ -92,14 +92,14 @@ If the CONFIG_RCU_CPU_STALL_INFO kernel configuration parameter is set,
more information is printed with the stall-warning message, for example:

INFO: rcu_preempt detected stall on CPU
0: (63959 ticks this GP) idle=241/3fffffffffffffff/0
0: (63959 ticks this GP) idle=241/3fffffffffffffff/0 softirq=82/543
(t=65000 jiffies)

In kernels with CONFIG_RCU_FAST_NO_HZ, even more information is
printed:

INFO: rcu_preempt detected stall on CPU
0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 drain=0 . timer not pending
0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 nonlazy_posted: 25 .D
(t=65000 jiffies)

The "(64628 ticks this GP)" indicates that this CPU has taken more
Expand All @@ -116,13 +116,28 @@ number between the two "/"s is the value of the nesting, which will
be a small positive number if in the idle loop and a very large positive
number (as shown above) otherwise.

For CONFIG_RCU_FAST_NO_HZ kernels, the "drain=0" indicates that the CPU is
not in the process of trying to force itself into dyntick-idle state, the
"." indicates that the CPU has not given up forcing RCU into dyntick-idle
mode (it would be "H" otherwise), and the "timer not pending" indicates
that the CPU has not recently forced RCU into dyntick-idle mode (it
would otherwise indicate the number of microseconds remaining in this
forced state).
The "softirq=" portion of the message tracks the number of RCU softirq
handlers that the stalled CPU has executed. The number before the "/"
is the number that had executed since boot at the time that this CPU
last noted the beginning of a grace period, which might be the current
(stalled) grace period, or it might be some earlier grace period (for
example, if the CPU might have been in dyntick-idle mode for an extended
time period. The number after the "/" is the number that have executed
since boot until the current time. If this latter number stays constant
across repeated stall-warning messages, it is possible that RCU's softirq
handlers are no longer able to execute on this CPU. This can happen if
the stalled CPU is spinning with interrupts are disabled, or, in -rt
kernels, if a high-priority process is starving RCU's softirq handler.

For CONFIG_RCU_FAST_NO_HZ kernels, the "last_accelerate:" prints the
low-order 16 bits (in hex) of the jiffies counter when this CPU last
invoked rcu_try_advance_all_cbs() from rcu_needs_cpu() or last invoked
rcu_accelerate_cbs() from rcu_prepare_for_idle(). The "nonlazy_posted:"
prints the number of non-lazy callbacks posted since the last call to
rcu_needs_cpu(). Finally, an "L" indicates that there are currently
no non-lazy callbacks ("." is printed otherwise, as shown above) and
"D" indicates that dyntick-idle processing is enabled ("." is printed
otherwise, for example, if disabled via the "nohz=" kernel boot parameter).


Multiple Warnings From One Stall
Expand Down
35 changes: 24 additions & 11 deletions Documentation/kernel-parameters.txt
Original file line number Diff line number Diff line change
Expand Up @@ -2461,9 +2461,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
In kernels built with CONFIG_RCU_NOCB_CPU=y, set
the specified list of CPUs to be no-callback CPUs.
Invocation of these CPUs' RCU callbacks will
be offloaded to "rcuoN" kthreads created for
that purpose. This reduces OS jitter on the
be offloaded to "rcuox/N" kthreads created for
that purpose, where "x" is "b" for RCU-bh, "p"
for RCU-preempt, and "s" for RCU-sched, and "N"
is the CPU number. This reduces OS jitter on the
offloaded CPUs, which can be useful for HPC and

real-time workloads. It can also improve energy
efficiency for asymmetric multiprocessors.

Expand All @@ -2487,6 +2490,17 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
leaf rcu_node structure. Useful for very large
systems.

rcutree.jiffies_till_first_fqs= [KNL,BOOT]
Set delay from grace-period initialization to
first attempt to force quiescent states.
Units are jiffies, minimum value is zero,
and maximum value is HZ.

rcutree.jiffies_till_next_fqs= [KNL,BOOT]
Set delay between subsequent attempts to force
quiescent states. Units are jiffies, minimum
value is one, and maximum value is HZ.

rcutree.qhimark= [KNL,BOOT]
Set threshold of queued
RCU callbacks over which batch limiting is disabled.
Expand All @@ -2501,16 +2515,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
rcutree.rcu_cpu_stall_timeout= [KNL,BOOT]
Set timeout for RCU CPU stall warning messages.

rcutree.jiffies_till_first_fqs= [KNL,BOOT]
Set delay from grace-period initialization to
first attempt to force quiescent states.
Units are jiffies, minimum value is zero,
and maximum value is HZ.
rcutree.rcu_idle_gp_delay= [KNL,BOOT]
Set wakeup interval for idle CPUs that have
RCU callbacks (RCU_FAST_NO_HZ=y).

rcutree.jiffies_till_next_fqs= [KNL,BOOT]
Set delay between subsequent attempts to force
quiescent states. Units are jiffies, minimum
value is one, and maximum value is HZ.
rcutree.rcu_idle_lazy_gp_delay= [KNL,BOOT]
Set wakeup interval for idle CPUs that have
only "lazy" RCU callbacks (RCU_FAST_NO_HZ=y).
Lazy RCU callbacks are those which RCU can
prove do nothing more than free memory.

rcutorture.fqs_duration= [KNL,BOOT]
Set duration of force_quiescent_state bursts.
Expand Down
5 changes: 5 additions & 0 deletions include/linux/list_bl.h
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,11 @@ static inline void hlist_bl_unlock(struct hlist_bl_head *b)
__bit_spin_unlock(0, (unsigned long *)b);
}

static inline bool hlist_bl_is_locked(struct hlist_bl_head *b)
{
return bit_spin_is_locked(0, (unsigned long *)b);
}

/**
* hlist_bl_for_each_entry - iterate over list of given type
* @tpos: the type * to use as a loop cursor.
Expand Down
2 changes: 1 addition & 1 deletion include/linux/rculist_bl.h
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ static inline void hlist_bl_set_first_rcu(struct hlist_bl_head *h,
static inline struct hlist_bl_node *hlist_bl_first_rcu(struct hlist_bl_head *h)
{
return (struct hlist_bl_node *)
((unsigned long)rcu_dereference(h->first) & ~LIST_BL_LOCKMASK);
((unsigned long)rcu_dereference_check(h->first, hlist_bl_is_locked(h)) & ~LIST_BL_LOCKMASK);
}

/**
Expand Down
1 change: 1 addition & 0 deletions include/linux/rcupdate.h
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,7 @@ extern void do_trace_rcu_torture_read(char *rcutorturename,
#define UINT_CMP_LT(a, b) (UINT_MAX / 2 < (a) - (b))
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))
#define ulong2long(a) (*(long *)(&(a)))

/* Exported common interfaces */

Expand Down
55 changes: 55 additions & 0 deletions include/trace/events/rcu.h
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,58 @@ TRACE_EVENT(rcu_grace_period,
__entry->rcuname, __entry->gpnum, __entry->gpevent)
);

/*
* Tracepoint for future grace-period events, including those for no-callbacks
* CPUs. The caller should pull the data from the rcu_node structure,
* other than rcuname, which comes from the rcu_state structure, and event,
* which is one of the following:
*
* "Startleaf": Request a nocb grace period based on leaf-node data.
* "Startedleaf": Leaf-node start proved sufficient.
* "Startedleafroot": Leaf-node start proved sufficient after checking root.
* "Startedroot": Requested a nocb grace period based on root-node data.
* "StartWait": Start waiting for the requested grace period.
* "ResumeWait": Resume waiting after signal.
* "EndWait": Complete wait.
* "Cleanup": Clean up rcu_node structure after previous GP.
* "CleanupMore": Clean up, and another no-CB GP is needed.
*/
TRACE_EVENT(rcu_future_grace_period,

TP_PROTO(char *rcuname, unsigned long gpnum, unsigned long completed,
unsigned long c, u8 level, int grplo, int grphi,
char *gpevent),

TP_ARGS(rcuname, gpnum, completed, c, level, grplo, grphi, gpevent),

TP_STRUCT__entry(
__field(char *, rcuname)
__field(unsigned long, gpnum)
__field(unsigned long, completed)
__field(unsigned long, c)
__field(u8, level)
__field(int, grplo)
__field(int, grphi)
__field(char *, gpevent)
),

TP_fast_assign(
__entry->rcuname = rcuname;
__entry->gpnum = gpnum;
__entry->completed = completed;
__entry->c = c;
__entry->level = level;
__entry->grplo = grplo;
__entry->grphi = grphi;
__entry->gpevent = gpevent;
),

TP_printk("%s %lu %lu %lu %u %d %d %s",
__entry->rcuname, __entry->gpnum, __entry->completed,
__entry->c, __entry->level, __entry->grplo, __entry->grphi,
__entry->gpevent)
);

/*
* Tracepoint for grace-period-initialization events. These are
* distinguished by the type of RCU, the new grace-period number, the
Expand Down Expand Up @@ -601,6 +653,9 @@ TRACE_EVENT(rcu_barrier,
#define trace_rcu_grace_period(rcuname, gpnum, gpevent) do { } while (0)
#define trace_rcu_grace_period_init(rcuname, gpnum, level, grplo, grphi, \
qsmask) do { } while (0)
#define trace_rcu_future_grace_period(rcuname, gpnum, completed, c, \
level, grplo, grphi, event) \
do { } while (0)
#define trace_rcu_preempt_task(rcuname, pid, gpnum) do { } while (0)
#define trace_rcu_unlock_preempted_task(rcuname, gpnum, pid) do { } while (0)
#define trace_rcu_quiescent_state_report(rcuname, gpnum, mask, qsmask, level, \
Expand Down
73 changes: 58 additions & 15 deletions init/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -582,13 +582,16 @@ config RCU_FAST_NO_HZ
depends on NO_HZ && SMP
default n
help
This option causes RCU to attempt to accelerate grace periods in
order to allow CPUs to enter dynticks-idle state more quickly.
On the other hand, this option increases the overhead of the
dynticks-idle checking, thus degrading scheduling latency.
This option permits CPUs to enter dynticks-idle state even if
they have RCU callbacks queued, and prevents RCU from waking
these CPUs up more than roughly once every four jiffies (by
default, you can adjust this using the rcutree.rcu_idle_gp_delay
parameter), thus improving energy efficiency. On the other
hand, this option increases the duration of RCU grace periods,
for example, slowing down synchronize_rcu().

Say Y if energy efficiency is critically important, and you don't
care about real-time response.
Say Y if energy efficiency is critically important, and you
don't care about increased grace-period durations.

Say N if you are unsure.

Expand Down Expand Up @@ -655,7 +658,7 @@ config RCU_BOOST_DELAY
Accept the default if unsure.

config RCU_NOCB_CPU
bool "Offload RCU callback processing from boot-selected CPUs"
bool "Offload RCU callback processing from boot-selected CPUs (EXPERIMENTAL"
depends on TREE_RCU || TREE_PREEMPT_RCU
default n
help
Expand All @@ -666,16 +669,56 @@ config RCU_NOCB_CPU

This option offloads callback invocation from the set of
CPUs specified at boot time by the rcu_nocbs parameter.
For each such CPU, a kthread ("rcuoN") will be created to
invoke callbacks, where the "N" is the CPU being offloaded.
Nothing prevents this kthread from running on the specified
CPUs, but (1) the kthreads may be preempted between each
callback, and (2) affinity or cgroups can be used to force
the kthreads to run on whatever set of CPUs is desired.

Say Y here if you want reduced OS jitter on selected CPUs.
For each such CPU, a kthread ("rcuox/N") will be created to
invoke callbacks, where the "N" is the CPU being offloaded,
and where the "x" is "b" for RCU-bh, "p" for RCU-preempt, and
"s" for RCU-sched. Nothing prevents this kthread from running
on the specified CPUs, but (1) the kthreads may be preempted
between each callback, and (2) affinity or cgroups can be used
to force the kthreads to run on whatever set of CPUs is desired.

Say Y here if you want to help to debug reduced OS jitter.
Say N here if you are unsure.

choice
prompt "Build-forced no-CBs CPUs"
default RCU_NOCB_CPU_NONE
help
This option allows no-CBs CPUs to be specified at build time.
Additional no-CBs CPUs may be specified by the rcu_nocbs=
boot parameter.

config RCU_NOCB_CPU_NONE
bool "No build_forced no-CBs CPUs"
depends on RCU_NOCB_CPU
help
This option does not force any of the CPUs to be no-CBs CPUs.
Only CPUs designated by the rcu_nocbs= boot parameter will be
no-CBs CPUs.

config RCU_NOCB_CPU_ZERO
bool "CPU 0 is a build_forced no-CBs CPU"
depends on RCU_NOCB_CPU
help
This option forces CPU 0 to be a no-CBs CPU. Additional CPUs
may be designated as no-CBs CPUs using the rcu_nocbs= boot
parameter will be no-CBs CPUs.

Select this if CPU 0 needs to be a no-CBs CPU for real-time
or energy-efficiency reasons.

config RCU_NOCB_CPU_ALL
bool "All CPUs are build_forced no-CBs CPUs"
depends on RCU_NOCB_CPU
help
This option forces all CPUs to be no-CBs CPUs. The rcu_nocbs=
boot parameter will be ignored.

Select this if all CPUs need to be no-CBs CPUs for real-time
or energy-efficiency reasons.

endchoice

endmenu # "RCU Subsystem"

config IKCONFIG
Expand Down
Loading

0 comments on commit 6d87669

Please sign in to comment.