smp: Reduce NMI traffic from CSD waiters to CSD destination
On systems with hundreds of CPUs, if most of the CPUs detect a CSD hang,
then all of these waiting CPUs send an NMI to the destination CPU in
order to dump its backtrace.

Given enough NMIs, the destination CPU will spend much of its time
producing backtraces, thus further delaying that CPU's response to the
original CSD IPI.  In the worst case, by the time the destination CPU is
done handling all of these backtrace NMIs, the CSD wait timeout will
have elapsed again, so the waiters resend their backtrace NMIs,
further delaying forward progress.

Therefore, to avoid these delays, issue the backtrace NMI only from
the first waiter.  The destination CPU's other waiters can make use of
the backtrace obtained from the first waiter's NMI.

Signed-off-by: Imran Khan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Yury Norov <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
imran-kn authored and paulmckrcu committed Jul 10, 2023
1 parent 5bd00f6 commit 0d3a00b
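
The "first waiter only" rule from the commit message can be sketched in userspace with C11 atomics. This is a minimal illustration, not the kernel code: the single global flag stands in for the per-CPU `trigger_backtrace` variable, and the helper names `claim_backtrace()` and `rearm_backtrace()` are hypothetical. The memory orderings are chosen to mirror the acquire cmpxchg and release store that the patch uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU trigger_backtrace flag: 1 = armed. */
static atomic_int trigger_backtrace = 1;

/*
 * Called by every waiter that detects a hang on the destination.
 * Returns true only for the waiter that flips the flag from 1 to 0;
 * that waiter is the one that would send the backtrace NMI.
 */
static bool claim_backtrace(void)
{
	int expected = 1;

	return atomic_compare_exchange_strong_explicit(&trigger_backtrace,
						       &expected, 0,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/*
 * Called by the destination once it starts processing its queue again,
 * so that a later hang can be reported by a fresh backtrace.
 */
static void rearm_backtrace(void)
{
	atomic_store_explicit(&trigger_backtrace, 1, memory_order_release);
}
```

With the flag armed, the first call to `claim_backtrace()` succeeds and every later call fails until `rearm_backtrace()` runs. Note one difference in return convention: the kernel's `atomic_cmpxchg_acquire()` returns the old value rather than a boolean, which is why the patch can test its result directly in an `if`.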
Showing 1 changed file with 9 additions and 1 deletion.
10 changes: 9 additions & 1 deletion kernel/smp.c
@@ -46,6 +46,8 @@ static DEFINE_PER_CPU_ALIGNED(struct call_function_data, cfd_data);

static DEFINE_PER_CPU_SHARED_ALIGNED(struct llist_head, call_single_queue);

static DEFINE_PER_CPU(atomic_t, trigger_backtrace) = ATOMIC_INIT(1);

static void __flush_smp_call_function_queue(bool warn_cpu_offline);

int smpcfd_prepare_cpu(unsigned int cpu)
@@ -253,7 +255,8 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
*bug_id, !cpu_cur_csd ? "unresponsive" : "handling this request");
}
if (cpu >= 0) {
- dump_cpu_task(cpu);
+ if (atomic_cmpxchg_acquire(&per_cpu(trigger_backtrace, cpu), 1, 0))
+ 	dump_cpu_task(cpu);
if (!cpu_cur_csd) {
pr_alert("csd: Re-sending CSD lock (#%d) IPI from CPU#%02d to CPU#%02d\n", *bug_id, raw_smp_processor_id(), cpu);
arch_send_call_function_single_ipi(cpu);
@@ -434,9 +437,14 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
struct llist_node *entry, *prev;
struct llist_head *head;
static bool warned;
atomic_t *tbt;

lockdep_assert_irqs_disabled();

/* Allow waiters to send backtrace NMI from here onwards */
tbt = this_cpu_ptr(&trigger_backtrace);
atomic_set_release(tbt, 1);

head = this_cpu_ptr(&call_single_queue);
entry = llist_del_all(head);
entry = llist_reverse_order(entry);
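
Because the flag is re-armed only when the destination CPU reaches `__flush_smp_call_function_queue()`, a destination that stays stuck across several timeout rounds receives a single backtrace NMI in total. The following is a rough counting model of that effect, under simplifying assumptions: every waiter times out in every round, the destination never re-arms the flag while stuck, and the function names `nmis_ungated()` and `nmis_gated()` are hypothetical.

```c
#include <stdatomic.h>

/* Before the patch (modelled): every waiter sends an NMI in every round. */
static unsigned long nmis_ungated(unsigned int waiters, unsigned int rounds)
{
	return (unsigned long)waiters * rounds;
}

/*
 * After the patch (modelled): a waiter sends an NMI only if it wins the
 * 1 -> 0 transition. The stuck destination never sets the flag back to 1,
 * so at most one NMI is sent no matter how many rounds elapse.
 */
static unsigned long nmis_gated(unsigned int waiters, unsigned int rounds)
{
	atomic_int armed;
	unsigned long sent = 0;

	atomic_init(&armed, 1);

	for (unsigned int r = 0; r < rounds; r++) {
		for (unsigned int w = 0; w < waiters; w++) {
			int expected = 1;

			if (atomic_compare_exchange_strong(&armed, &expected, 0))
				sent++;
		}
	}
	return sent;
}
```

For example, with 256 waiters and 3 timeout rounds, the ungated model counts 768 NMIs while the gated model counts 1.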