Skip to content

Commit

Permalink
sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
Browse files Browse the repository at this point in the history
If we crossed a sample window while in NO_HZ we will add LOAD_FREQ to
the pending sample window time on exit, setting the next update not
one window into the future, but two.

This situation on exiting NO_HZ is described by:

  this_rq->calc_load_update < jiffies < calc_load_update

In this scenario, what we should be doing is:

  this_rq->calc_load_update = calc_load_update		     [ next window ]

But what we actually do is:

  this_rq->calc_load_update = calc_load_update + LOAD_FREQ   [ next+1 window ]

This has the effect of delaying load average updates for potentially
up to ~9seconds.

This can result in huge spikes in the load average values due to
per-cpu uninterruptible task counts being out of sync when accumulated
across all CPUs.

It's safe to update the per-cpu active count if we wake between sample
windows because any load that we left in 'calc_load_idle' will have
been zero'd when the idle load was folded in calc_global_load().

This issue is easy to reproduce before,

  commit 9d89c25 ("sched/fair: Rewrite runnable load and utilization average tracking")

just by forking short-lived process pipelines built from ps(1) and
grep(1) in a loop. I'm unable to reproduce the spikes after that
commit, but the bug still seems to be present from code review.

Signed-off-by: Matt Fleming <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Morten Rasmussen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vincent Guittot <[email protected]>
Fixes: commit 5167e8d ("sched/nohz: Rewrite and fix load-avg computation -- again")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
  • Loading branch information
mfleming authored and Ingo Molnar committed Mar 16, 2017
1 parent dcc3b5f commit 6e5f32f
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions kernel/sched/loadavg.c
Original file line number Diff line number Diff line change
Expand Up @@ -202,8 +202,9 @@ void calc_load_exit_idle(void)
struct rq *this_rq = this_rq();

/*
* If we're still before the sample window, we're done.
* If we're still before the pending sample window, we're done.
*/
this_rq->calc_load_update = calc_load_update;
if (time_before(jiffies, this_rq->calc_load_update))
return;

Expand All @@ -212,7 +213,6 @@ void calc_load_exit_idle(void)
* accounted through the nohz accounting, so skip the entire deal and
* sync up for the next window.
*/
this_rq->calc_load_update = calc_load_update;
if (time_before(jiffies, this_rq->calc_load_update + 10))
this_rq->calc_load_update += LOAD_FREQ;
}
Expand Down

0 comments on commit 6e5f32f

Please sign in to comment.