Commit

Merge tag 'rcu.release.v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU updates from Neeraj Upadhyay:
 "Context tracking:
   - rename context tracking state related symbols and remove references
     to "dynticks" in various context tracking state variables and
     related helpers
   - force context_tracking_enabled_this_cpu() to be inlined to avoid
     leaving a noinstr section

  CSD lock:
   - enhance CSD-lock diagnostic reports
   - add an API to provide an indication of ongoing CSD-lock stall

  nocb:
   - update and simplify RCU nocb code to handle (de-)offloading of
     callbacks only for offline CPUs
   - fix RT throttling hrtimer being armed from offline CPU

  rcutorture:
   - remove redundant rcu_torture_ops get_gp_completed fields
   - add SRCU ->same_gp_state and ->get_comp_state functions
   - add generic test for NUM_ACTIVE_*RCU_POLL* for testing RCU and SRCU
     polled grace periods
   - add CFcommon.arch for arch-specific Kconfig options
   - print number of update types in rcu_torture_write_types()
   - add rcutree.nohz_full_patience_delay testing to the TREE07 scenario
   - add a stall_cpu_repeat module parameter to test repeated CPU stalls
   - add argument to limit number of CPUs a guest OS can use in
     torture.sh

  rcustall:
   - abbreviate RCU CPU stall warnings during CSD-lock stalls
   - Allow dump_cpu_task() to be called without disabling preemption
   - defer printing stall-warning backtrace when holding rcu_node lock

  srcu:
   - make SRCU gp seq wrap-around faster
   - add KCSAN checks for concurrent updates to ->srcu_n_exp_nodelay and
     ->reschedule_count which are used in heuristics governing
     auto-expediting of normal SRCU grace periods and
     grace-period-state-machine delays
   - mark idle SRCU-barrier callbacks to help identify stuck
     SRCU-barrier callback

  rcu tasks:
   - remove RCU Tasks Rude asynchronous APIs as they are no longer used
   - stop testing RCU Tasks Rude asynchronous APIs
   - fix access to non-existent percpu regions
   - check processor-ID assumptions during chosen CPU calculation for
     callback enqueuing
   - update description of rtp->tasks_gp_seq grace-period sequence
     number
   - add rcu_barrier_cb_is_done() to identify whether a given
     rcu_barrier callback is stuck
   - mark idle Tasks-RCU-barrier callbacks
   - add *torture_stats_print() functions to print detailed diagnostics
     for Tasks-RCU variants
   - capture start time of rcu_barrier_tasks*() operation to help
     distinguish a hung barrier operation from a long series of barrier
     operations

  refscale:
   - add a TINY scenario to support tests of Tiny RCU and Tiny
     SRCU
   - optimize process_durations() operation

  rcuscale:
   - dump stacks of stalled rcu_scale_writer() instances and
     grace-period statistics when rcu_scale_writer() stalls
   - mark idle RCU-barrier callbacks to identify stuck RCU-barrier
     callbacks
   - print detailed grace-period and barrier diagnostics on
     rcu_scale_writer() hangs for Tasks-RCU variants
   - warn if async module parameter is specified for RCU implementations
     that do not have async primitives such as RCU Tasks Rude
   - make all writer tasks report upon hang
   - tolerate repeated GFP_KERNEL failure in rcu_scale_writer()
   - use special allocator for rcu_scale_writer()
   - NULL out top-level pointers to heap memory to avoid double-free
     bugs on modprobe failures
   - maintain per-task instead of per-CPU callbacks count to avoid any
     issues with migration of either tasks or callbacks
   - constify struct ref_scale_ops

  Fixes:
   - use system_unbound_wq for kfree_rcu work to avoid disturbing
     isolated CPUs

  Misc:
   - warn on unexpected rcu_state.srs_done_tail state
   - better define "atomic" for list_replace_rcu() and
     hlist_replace_rcu() routines
   - annotate struct kvfree_rcu_bulk_data with __counted_by()"

* tag 'rcu.release.v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (90 commits)
  rcu: Defer printing stall-warning backtrace when holding rcu_node lock
  rcu/nocb: Remove superfluous memory barrier after bypass enqueue
  rcu/nocb: Conditionally wake up rcuo if not already waiting on GP
  rcu/nocb: Fix RT throttling hrtimer armed from offline CPU
  rcu/nocb: Simplify (de-)offloading state machine
  context_tracking: Tag context_tracking_enabled_this_cpu() __always_inline
  context_tracking, rcu: Rename rcu_dyntick trace event into rcu_watching
  rcu: Update stray documentation references to rcu_dynticks_eqs_{enter, exit}()
  rcu: Rename rcu_momentary_dyntick_idle() into rcu_momentary_eqs()
  rcu: Rename rcu_implicit_dynticks_qs() into rcu_watching_snap_recheck()
  rcu: Rename dyntick_save_progress_counter() into rcu_watching_snap_save()
  rcu: Rename struct rcu_data .exp_dynticks_snap into .exp_watching_snap
  rcu: Rename struct rcu_data .dynticks_snap into .watching_snap
  rcu: Rename rcu_dynticks_zero_in_eqs() into rcu_watching_zero_in_eqs()
  rcu: Rename rcu_dynticks_in_eqs_since() into rcu_watching_snap_stopped_since()
  rcu: Rename rcu_dynticks_in_eqs() into rcu_watching_snap_in_eqs()
  rcu: Rename rcu_dynticks_eqs_online() into rcu_watching_online()
  context_tracking, rcu: Rename rcu_dynticks_curr_cpu_in_eqs() into rcu_is_watching_curr_cpu()
  context_tracking, rcu: Rename rcu_dynticks_task*() into rcu_task*()
  refscale: Constify struct ref_scale_ops
  ...
torvalds committed Sep 18, 2024
2 parents 85a77db + 355debb commit 067610e
Showing 57 changed files with 1,088 additions and 786 deletions.
28 changes: 14 additions & 14 deletions Documentation/RCU/Design/Data-Structures/Data-Structures.rst
@@ -921,10 +921,10 @@ This portion of the ``rcu_data`` structure is declared as follows:

::

-1 int dynticks_snap;
+1 int watching_snap;
 2 unsigned long dynticks_fqs;

-The ``->dynticks_snap`` field is used to take a snapshot of the
+The ``->watching_snap`` field is used to take a snapshot of the
corresponding CPU's dyntick-idle state when forcing quiescent states,
and is therefore accessed from other CPUs. Finally, the
``->dynticks_fqs`` field is used to count the number of times this CPU
@@ -935,8 +935,8 @@ This portion of the rcu_data structure is declared as follows:

::

-1 long dynticks_nesting;
-2 long dynticks_nmi_nesting;
+1 long nesting;
+2 long nmi_nesting;
3 atomic_t dynticks;
4 bool rcu_need_heavy_qs;
5 bool rcu_urgent_qs;
@@ -945,27 +945,27 @@ These fields in the rcu_data structure maintain the per-CPU dyntick-idle
state for the corresponding CPU. The fields may be accessed only from
the corresponding CPU (and from tracing) unless otherwise stated.

-The ``->dynticks_nesting`` field counts the nesting depth of process
+The ``->nesting`` field counts the nesting depth of process
 execution, so that in normal circumstances this counter has value zero
 or one. NMIs, irqs, and tracers are counted by the
-``->dynticks_nmi_nesting`` field. Because NMIs cannot be masked, changes
+``->nmi_nesting`` field. Because NMIs cannot be masked, changes
 to this variable have to be undertaken carefully using an algorithm
 provided by Andy Lutomirski. The initial transition from idle adds one,
 and nested transitions add two, so that a nesting level of five is
-represented by a ``->dynticks_nmi_nesting`` value of nine. This counter
+represented by a ``->nmi_nesting`` value of nine. This counter
can therefore be thought of as counting the number of reasons why this
CPU cannot be permitted to enter dyntick-idle mode, aside from
process-level transitions.

However, it turns out that when running in non-idle kernel context, the
Linux kernel is fully capable of entering interrupt handlers that never
exit and perhaps also vice versa. Therefore, whenever the
-``->dynticks_nesting`` field is incremented up from zero, the
-``->dynticks_nmi_nesting`` field is set to a large positive number, and
-whenever the ``->dynticks_nesting`` field is decremented down to zero,
-the ``->dynticks_nmi_nesting`` field is set to zero. Assuming that
+``->nesting`` field is incremented up from zero, the
+``->nmi_nesting`` field is set to a large positive number, and
+whenever the ``->nesting`` field is decremented down to zero,
+the ``->nmi_nesting`` field is set to zero. Assuming that
 the number of misnested interrupts is not sufficient to overflow the
-counter, this approach corrects the ``->dynticks_nmi_nesting`` field
+counter, this approach corrects the ``->nmi_nesting`` field
every time the corresponding CPU enters the idle loop from process
context.
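
Reader's note (illustrative, not part of the commit): the counting rules in the hunk above can be sketched as a toy userspace model in C. Everything prefixed `toy_` is hypothetical, as is the choice of `LONG_MAX / 2` for the "large positive number". The model also uses `nmi_nesting == 0` as a stand-in for "the CPU is idle", which is a simplification of whatever check the kernel actually performs.

```c
#include <assert.h>
#include <limits.h>

/* Toy userspace model of the two counters; this is not kernel code. */
struct toy_ct {
	long nesting;     /* process-level nesting depth: zero or one */
	long nmi_nesting; /* NMI/irq/tracer nesting, 2n - 1 when entered from idle */
};

/* Assumption: any value far above a plausible nesting depth will do. */
#define TOY_NMI_NESTING_LARGE (LONG_MAX / 2)

/* Enter an irq or NMI: the first entry from idle adds one, nested ones add two. */
static void toy_nmi_enter(struct toy_ct *ct)
{
	ct->nmi_nesting += (ct->nmi_nesting == 0) ? 1 : 2;
}

/* Leave an irq or NMI: undo exactly what the matching entry added. */
static void toy_nmi_exit(struct toy_ct *ct)
{
	assert(ct->nmi_nesting > 0);
	ct->nmi_nesting -= (ct->nmi_nesting == 1) ? 1 : 2;
}

/* Leave idle for process context: ->nesting goes up from zero. */
static void toy_process_enter(struct toy_ct *ct)
{
	if (ct->nesting++ == 0)
		ct->nmi_nesting = TOY_NMI_NESTING_LARGE;
}

/* Return to idle: ->nesting drops to zero, which also repairs ->nmi_nesting. */
static void toy_process_exit(struct toy_ct *ct)
{
	if (--ct->nesting == 0)
		ct->nmi_nesting = 0;
}

/* Value of ->nmi_nesting after `levels` nested entries starting from idle. */
static long toy_nested_value(int levels)
{
	struct toy_ct ct = { 0, 0 };

	for (int i = 0; i < levels; i++)
		toy_nmi_enter(&ct);
	return ct.nmi_nesting;
}

/* An irq handler that never exits is corrected at the next idle entry. */
static long toy_misnested_value(void)
{
	struct toy_ct ct = { 0, 0 };

	toy_process_enter(&ct); /* nesting 0 -> 1, nmi_nesting = LARGE */
	toy_nmi_enter(&ct);     /* LARGE + 2, and the handler never returns */
	toy_process_exit(&ct);  /* nesting 1 -> 0, nmi_nesting forced to 0 */
	return ct.nmi_nesting;
}
```

In this model `toy_nested_value(5)` evaluates to nine, matching the "nesting level of five" example in the text, and `toy_misnested_value()` evaluates to zero, which is the correction the second paragraph describes.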

@@ -992,8 +992,8 @@ code.
+-----------------------------------------------------------------------+
| **Quick Quiz**: |
+-----------------------------------------------------------------------+
-| Why not simply combine the ``->dynticks_nesting`` and |
-| ``->dynticks_nmi_nesting`` counters into a single counter that just |
+| Why not simply combine the ``->nesting`` and |
+| ``->nmi_nesting`` counters into a single counter that just |
| counts the number of reasons that the corresponding CPU is non-idle? |
+-----------------------------------------------------------------------+
| **Answer**: |
@@ -147,10 +147,10 @@ RCU read-side critical sections preceding and following the current
idle sojourn.
This case is handled by calls to the strongly ordered
``atomic_add_return()`` read-modify-write atomic operation that
-is invoked within ``rcu_dynticks_eqs_enter()`` at idle-entry
-time and within ``rcu_dynticks_eqs_exit()`` at idle-exit time.
-The grace-period kthread invokes first ``ct_dynticks_cpu_acquire()``
-(preceded by a full memory barrier) and ``rcu_dynticks_in_eqs_since()``
+is invoked within ``ct_kernel_exit_state()`` at idle-entry
+time and within ``ct_kernel_enter_state()`` at idle-exit time.
+The grace-period kthread invokes first ``ct_rcu_watching_cpu_acquire()``
+(preceded by a full memory barrier) and ``rcu_watching_snap_stopped_since()``
(both of which rely on acquire semantics) to detect idle CPUs.

+-----------------------------------------------------------------------+
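Reader's note (illustrative, not part of the commit): the ordering pattern described in the hunk above, a fully ordered read-modify-write on idle entry and exit paired with an acquire-load snapshot on the grace-period side, can be sketched with C11 atomics. This is a minimal single-variable model under stated assumptions: all `toy_` names are hypothetical, the low bit standing for "RCU is watching" is an assumed encoding, and the real kernel helpers named in the diff are not reproduced here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed encoding: the low bit is set while RCU is watching this CPU. */
#define TOY_RCU_WATCHING 1

/* One CPU's state; it starts out non-idle, so RCU is watching. */
static atomic_int toy_state = TOY_RCU_WATCHING;

/* Idle entry: a fully ordered RMW flips the watching bit off. */
static void toy_eqs_enter(void)
{
	atomic_fetch_add_explicit(&toy_state, 1, memory_order_seq_cst);
}

/* Idle exit: the same fully ordered RMW flips the watching bit back on. */
static void toy_eqs_exit(void)
{
	atomic_fetch_add_explicit(&toy_state, 1, memory_order_seq_cst);
}

/* Grace-period side: full fence, then an acquire load of the state. */
static int toy_snap_acquire(void)
{
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&toy_state, memory_order_acquire);
}

/* Was the CPU in an extended quiescent state when the snapshot was taken? */
static bool toy_snap_in_eqs(int snap)
{
	return !(snap & TOY_RCU_WATCHING);
}

/* Has the CPU passed through at least one transition since the snapshot? */
static bool toy_snap_stopped_since(int snap)
{
	return snap != toy_snap_acquire();
}
```

Because every transition increments the counter, a later snapshot that differs from an earlier one shows that the CPU entered or left idle in between, even if it is non-idle again by the time of the second read.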
8 changes: 4 additions & 4 deletions Documentation/RCU/Design/Memory-Ordering/TreeRCU-dyntick.svg
8 changes: 4 additions & 4 deletions Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg
8 changes: 4 additions & 4 deletions Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
4 changes: 2 additions & 2 deletions Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg
3 changes: 1 addition & 2 deletions Documentation/RCU/Design/Requirements/Requirements.rst
@@ -2649,8 +2649,7 @@ those that are idle from RCU's perspective) and then Tasks Rude RCU can
be removed from the kernel.

The tasks-rude-RCU API is also reader-marking-free and thus quite compact,
-consisting of call_rcu_tasks_rude(), synchronize_rcu_tasks_rude(),
-and rcu_barrier_tasks_rude().
+consisting solely of synchronize_rcu_tasks_rude().

Tasks Trace RCU
~~~~~~~~~~~~~~~
61 changes: 28 additions & 33 deletions Documentation/RCU/checklist.rst
@@ -194,14 +194,13 @@ over a rather long period of time, but improvements are always welcome!
when publicizing a pointer to a structure that can
be traversed by an RCU read-side critical section.

-5. If any of call_rcu(), call_srcu(), call_rcu_tasks(),
-call_rcu_tasks_rude(), or call_rcu_tasks_trace() is used,
-the callback function may be invoked from softirq context,
-and in any case with bottom halves disabled. In particular,
-this callback function cannot block. If you need the callback
-to block, run that code in a workqueue handler scheduled from
-the callback. The queue_rcu_work() function does this for you
-in the case of call_rcu().
+5. If any of call_rcu(), call_srcu(), call_rcu_tasks(), or
+call_rcu_tasks_trace() is used, the callback function may be
+invoked from softirq context, and in any case with bottom halves
+disabled. In particular, this callback function cannot block.
+If you need the callback to block, run that code in a workqueue
+handler scheduled from the callback. The queue_rcu_work()
+function does this for you in the case of call_rcu().

6. Since synchronize_rcu() can block, it cannot be called
from any sort of irq context. The same rule applies
@@ -254,10 +253,10 @@ over a rather long period of time, but improvements are always welcome!
corresponding readers must use rcu_read_lock_trace()
and rcu_read_unlock_trace().

-c. If an updater uses call_rcu_tasks_rude() or
-synchronize_rcu_tasks_rude(), then the corresponding
-readers must use anything that disables preemption,
-for example, preempt_disable() and preempt_enable().
+c. If an updater uses synchronize_rcu_tasks_rude(),
+then the corresponding readers must use anything that
+disables preemption, for example, preempt_disable()
+and preempt_enable().

Mixing things up will result in confusion and broken kernels, and
has even resulted in an exploitable security issue. Therefore,
@@ -326,11 +325,9 @@ over a rather long period of time, but improvements are always welcome!
d. Periodically invoke rcu_barrier(), permitting a limited
number of updates per grace period.

-The same cautions apply to call_srcu(), call_rcu_tasks(),
-call_rcu_tasks_rude(), and call_rcu_tasks_trace(). This is
-why there is an srcu_barrier(), rcu_barrier_tasks(),
-rcu_barrier_tasks_rude(), and rcu_barrier_tasks_rude(),
-respectively.
+The same cautions apply to call_srcu(), call_rcu_tasks(), and
+call_rcu_tasks_trace(). This is why there is an srcu_barrier(),
+rcu_barrier_tasks(), and rcu_barrier_tasks_trace(), respectively.

Note that although these primitives do take action to avoid
memory exhaustion when any given CPU has too many callbacks,
@@ -383,17 +380,17 @@ over a rather long period of time, but improvements are always welcome!
must use whatever locking or other synchronization is required
to safely access and/or modify that data structure.

-Do not assume that RCU callbacks will be executed on
-the same CPU that executed the corresponding call_rcu(),
-call_srcu(), call_rcu_tasks(), call_rcu_tasks_rude(), or
-call_rcu_tasks_trace(). For example, if a given CPU goes offline
-while having an RCU callback pending, then that RCU callback
-will execute on some surviving CPU. (If this was not the case,
-a self-spawning RCU callback would prevent the victim CPU from
-ever going offline.) Furthermore, CPUs designated by rcu_nocbs=
-might well *always* have their RCU callbacks executed on some
-other CPUs, in fact, for some real-time workloads, this is the
-whole point of using the rcu_nocbs= kernel boot parameter.
+Do not assume that RCU callbacks will be executed on the same
+CPU that executed the corresponding call_rcu(), call_srcu(),
+call_rcu_tasks(), or call_rcu_tasks_trace(). For example, if
+a given CPU goes offline while having an RCU callback pending,
+then that RCU callback will execute on some surviving CPU.
+(If this was not the case, a self-spawning RCU callback would
+prevent the victim CPU from ever going offline.) Furthermore,
+CPUs designated by rcu_nocbs= might well *always* have their
+RCU callbacks executed on some other CPUs, in fact, for some
+real-time workloads, this is the whole point of using the
+rcu_nocbs= kernel boot parameter.

In addition, do not assume that callbacks queued in a given order
will be invoked in that order, even if they all are queued on the
@@ -507,9 +504,9 @@ over a rather long period of time, but improvements are always welcome!
These debugging aids can help you find problems that are
otherwise extremely difficult to spot.

-17. If you pass a callback function defined within a module to one of
-call_rcu(), call_srcu(), call_rcu_tasks(), call_rcu_tasks_rude(),
-or call_rcu_tasks_trace(), then it is necessary to wait for all
+17. If you pass a callback function defined within a module
+to one of call_rcu(), call_srcu(), call_rcu_tasks(), or
+call_rcu_tasks_trace(), then it is necessary to wait for all
pending callbacks to be invoked before unloading that module.
Note that it is absolutely *not* sufficient to wait for a grace
period! For example, synchronize_rcu() implementation is *not*
@@ -522,7 +519,6 @@ over a rather long period of time, but improvements are always welcome!
 - call_rcu() -> rcu_barrier()
 - call_srcu() -> srcu_barrier()
 - call_rcu_tasks() -> rcu_barrier_tasks()
-- call_rcu_tasks_rude() -> rcu_barrier_tasks_rude()
 - call_rcu_tasks_trace() -> rcu_barrier_tasks_trace()

However, these barrier functions are absolutely *not* guaranteed
@@ -539,7 +535,6 @@ over a rather long period of time, but improvements are always welcome!
 - Either synchronize_srcu() or synchronize_srcu_expedited(),
 together with and srcu_barrier()
 - synchronize_rcu_tasks() and rcu_barrier_tasks()
-- synchronize_tasks_rude() and rcu_barrier_tasks_rude()
 - synchronize_tasks_trace() and rcu_barrier_tasks_trace()

If necessary, you can use something like workqueues to execute
2 changes: 1 addition & 1 deletion Documentation/RCU/whatisRCU.rst
@@ -1103,7 +1103,7 @@ RCU-Tasks-Rude::

Critical sections Grace period Barrier

-N/A call_rcu_tasks_rude rcu_barrier_tasks_rude
+N/A N/A
synchronize_rcu_tasks_rude


20 changes: 11 additions & 9 deletions Documentation/admin-guide/kernel-parameters.txt
@@ -4969,6 +4969,10 @@
Set maximum number of finished RCU callbacks to
process in one batch.

+rcutree.csd_lock_suppress_rcu_stall= [KNL]
+Do only a one-line RCU CPU stall warning when
+there is an ongoing too-long CSD-lock wait.

rcutree.do_rcu_barrier= [KNL]
Request a call to rcu_barrier(). This is
throttled so that userspace tests can safely
@@ -5416,7 +5420,13 @@
Time to wait (s) after boot before inducing stall.

rcutorture.stall_cpu_irqsoff= [KNL]
-Disable interrupts while stalling if set.
+Disable interrupts while stalling if set, but only
+on the first stall in the set.

+rcutorture.stall_cpu_repeat= [KNL]
+Number of times to repeat the stall sequence,
+so that rcutorture.stall_cpu_repeat=3 will result
+in four stall sequences.

rcutorture.stall_gp_kthread= [KNL]
Duration (s) of forced sleep within RCU
@@ -5604,14 +5614,6 @@
of zero will disable batching. Batching is
always disabled for synchronize_rcu_tasks().

-rcupdate.rcu_tasks_rude_lazy_ms= [KNL]
-Set timeout in milliseconds RCU Tasks
-Rude asynchronous callback batching for
-call_rcu_tasks_rude(). A negative value
-will take the default. A value of zero will
-disable batching. Batching is always disabled
-for synchronize_rcu_tasks_rude().

rcupdate.rcu_tasks_trace_lazy_ms= [KNL]
Set timeout in milliseconds RCU Tasks
Trace asynchronous callback batching for
2 changes: 1 addition & 1 deletion arch/Kconfig
@@ -862,7 +862,7 @@ config HAVE_CONTEXT_TRACKING_USER_OFFSTACK
Architecture neither relies on exception_enter()/exception_exit()
nor on schedule_user(). Also preempt_schedule_notrace() and
preempt_schedule_irq() can't be called in a preemptible section
-while context tracking is CONTEXT_USER. This feature reflects a sane
+while context tracking is CT_STATE_USER. This feature reflects a sane
entry implementation where the following requirements are met on
critical entry code, ie: before user_exit() or after user_enter():

2 changes: 1 addition & 1 deletion arch/arm64/kernel/entry-common.c
@@ -103,7 +103,7 @@ static void noinstr exit_to_kernel_mode(struct pt_regs *regs)
static __always_inline void __enter_from_user_mode(void)
{
lockdep_hardirqs_off(CALLER_ADDR0);
-CT_WARN_ON(ct_state() != CONTEXT_USER);
+CT_WARN_ON(ct_state() != CT_STATE_USER);
user_exit_irqoff();
trace_hardirqs_off_finish();
mte_disable_tco_entry(current);