kernel/hung_task.c: introduce sysctl to print all traces when a hung task is detected

Commit 401c636 ("kernel/hung_task.c: show all hung tasks before
panic") made the kernel show the backtraces of all CPUs when a hung
task is detected _and_ the sysctl/kernel parameter "hung_task_panic"
is set.  The idea is good because, when observing deadlocks (which may
lead to hung tasks), the culprit is usually another task holding a lock
and not necessarily the task detected as hung.

The problem with this approach is that dumping backtraces is a somewhat
expensive operation, especially when printing to the console (and
especially on machines with many CPUs, such as the servers commonly
found nowadays).  So, users who plan to collect a kdump to investigate
the hung tasks and narrow down the deadlock don't need the CPU
backtraces on dmesg/console; they only delay the panic and pollute the
log (the crash tool can easily grab all CPU traces with the 'bt -a'
command).

Also, there's the reciprocal scenario: some users may be interested in
seeing the CPU backtraces without having the system panic when a hung
task is detected.  The current approach hence almost embeds a policy in
the kernel, by forcing the CPU backtrace dump (only) on
hung_task_panic.

This patch decouples the panic event on hung task from the CPU
backtrace dump by creating (and documenting) a new sysctl called
"hung_task_all_cpu_backtrace", analogous to the approach taken for
soft/hard lockups, which have both a panic and an "all_cpu_backtrace"
sysctl to allow individual control.  The new mechanism for dumping the
CPU backtraces on hung task detection respects "hung_task_warnings" by
not dumping the traces when there are no warnings left.
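
The decoupled decision logic described above can be sketched as a small
userspace model.  This is an illustration only, not the kernel code: the
names hung_task_model, hung_task_actions and model_on_hung_task are
hypothetical, and treating a negative "hung_task_warnings" value as
"unlimited" is an assumption based on the existing sysctl semantics, not
something this patch changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the three knobs; not kernel code. */
struct hung_task_model {
	unsigned int panic_enabled;     /* models hung_task_panic */
	unsigned int all_cpu_backtrace; /* models hung_task_all_cpu_backtrace */
	int warnings_left;              /* models hung_task_warnings */
};

/* What the model would request after one hung task detection. */
struct hung_task_actions {
	bool dump_all_cpus;
	bool panic;
};

static struct hung_task_actions
model_on_hung_task(struct hung_task_model *m)
{
	struct hung_task_actions a = { false, false };

	assert(m != NULL);

	/* The panic decision no longer implies an all-CPU dump. */
	if (m->panic_enabled)
		a.panic = true;

	/*
	 * The all-CPU dump is only requested while warnings remain.
	 * Assumption: a negative count means "unlimited" and is not
	 * decremented.
	 */
	if (m->warnings_left != 0) {
		if (m->warnings_left > 0)
			m->warnings_left--;
		if (m->all_cpu_backtrace)
			a.dump_all_cpus = true;
	}

	return a;
}
```

In this model, with panic_enabled = 0, all_cpu_backtrace = 1 and one
warning left, the first detection requests the all-CPU dump without a
panic; a second detection requests neither, since no warnings remain.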

Signed-off-by: Guilherme G. Piccoli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
Guilherme G. Piccoli authored and torvalds committed Jun 8, 2020
1 parent f117955 commit 0ec9dc9
Showing 4 changed files with 50 additions and 2 deletions.
14 changes: 14 additions & 0 deletions Documentation/admin-guide/sysctl/kernel.rst
@@ -335,6 +335,20 @@ Path for the hotplug policy agent.
 Default value is "``/sbin/hotplug``".


+hung_task_all_cpu_backtrace:
+================
+
+If this option is set, the kernel will send an NMI to all CPUs to dump
+their backtraces when a hung task is detected. This file shows up if
+CONFIG_DETECT_HUNG_TASK and CONFIG_SMP are enabled.
+
+0: Won't show all CPUs backtraces when a hung task is detected.
+This is the default behavior.
+
+1: Will non-maskably interrupt all CPUs and dump their backtraces when
+a hung task is detected.
+
+
 hung_task_panic
 ===============

7 changes: 7 additions & 0 deletions include/linux/sched/sysctl.h
@@ -7,6 +7,13 @@
 struct ctl_table;

 #ifdef CONFIG_DETECT_HUNG_TASK
+
+#ifdef CONFIG_SMP
+extern unsigned int sysctl_hung_task_all_cpu_backtrace;
+#else
+#define sysctl_hung_task_all_cpu_backtrace 0
+#endif /* CONFIG_SMP */
+
 extern int sysctl_hung_task_check_count;
 extern unsigned int sysctl_hung_task_panic;
 extern unsigned long sysctl_hung_task_timeout_secs;
20 changes: 18 additions & 2 deletions kernel/hung_task.c
@@ -53,9 +53,18 @@ int __read_mostly sysctl_hung_task_warnings = 10;
 static int __read_mostly did_panic;
 static bool hung_task_show_lock;
 static bool hung_task_call_panic;
+static bool hung_task_show_all_bt;

 static struct task_struct *watchdog_task;

+#ifdef CONFIG_SMP
+/*
+ * Should we dump all CPUs backtraces in a hung task event?
+ * Defaults to 0, can be changed via sysctl.
+ */
+unsigned int __read_mostly sysctl_hung_task_all_cpu_backtrace;
+#endif /* CONFIG_SMP */
+
 /*
  * Should we panic (and reboot, if panic_timeout= is set) when a
  * hung task is detected:
@@ -127,6 +136,9 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 			" disables this message.\n");
 		sched_show_task(t);
 		hung_task_show_lock = true;
+
+		if (sysctl_hung_task_all_cpu_backtrace)
+			hung_task_show_all_bt = true;
 	}

 	touch_nmi_watchdog();
@@ -191,10 +203,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	rcu_read_unlock();
 	if (hung_task_show_lock)
 		debug_show_all_locks();
-	if (hung_task_call_panic) {
+
+	if (hung_task_show_all_bt) {
+		hung_task_show_all_bt = false;
 		trigger_all_cpu_backtrace();
-		panic("hung_task: blocked tasks");
 	}
+
+	if (hung_task_call_panic)
+		panic("hung_task: blocked tasks");
 }

 static long hung_timeout_jiffies(unsigned long last_checked,
11 changes: 11 additions & 0 deletions kernel/sysctl.c
@@ -2437,6 +2437,17 @@ static struct ctl_table kern_table[] = {
 	},
 #endif
 #ifdef CONFIG_DETECT_HUNG_TASK
+#ifdef CONFIG_SMP
+	{
+		.procname	= "hung_task_all_cpu_backtrace",
+		.data		= &sysctl_hung_task_all_cpu_backtrace,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
+#endif /* CONFIG_SMP */
 	{
 		.procname	= "hung_task_panic",
 		.data		= &sysctl_hung_task_panic,