stop_machine: make stop_machine_run more virtualization friendly
On kvm I have seen some rare hangs in stop_machine when I used more guest
cpus than host cpus, e.g. 32 guest cpus on 1 host cpu triggered the
hang quite often. I could also reproduce the problem on a 4-way z/VM host with
a 64-way guest.

It turned out that the guest was consuming all available cpus, mostly for
spinning on scheduler locks like rq->lock. This is expected, as the threads are
calling yield all the time.
The problem is that the host scheduling decisions, together with the guest
scheduling decisions and spinlocks not being fair, managed to create an
interesting scenario similar to a live lock. (Sometimes the hang resolved
itself after some minutes.)

Changing stop_machine to yield the cpu to the hypervisor when yielding inside
the guest fixed the problem for me. While I am not completely happy with this
patch, I think it causes no harm and it really improves the situation for me.

I used cpu_relax for yielding to the hypervisor. Does that work on all
architectures?
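
The wait loop described above can be sketched in userspace as follows. This is only an illustration, not the kernel code: `wait_for_acks`, `worker`, `thread_ack` and `cpu_relax_hint` are hypothetical names, `sched_yield()` stands in for the in-kernel `yield()`, and `cpu_relax_hint()` is just a compiler barrier, so unlike a real `cpu_relax()` on some architectures it does not hand the cpu back to a hypervisor.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define MAX_THREADS 64

/* Userspace analogue of stopmachine_thread_ack (hypothetical name). */
static atomic_int thread_ack;

/* Hypothetical stand-in for cpu_relax(): a compiler barrier only.
 * It does NOT yield to a hypervisor. */
static inline void cpu_relax_hint(void)
{
	__asm__ __volatile__("" ::: "memory");
}

/* Each worker acknowledges that it has come to life. */
static void *worker(void *arg)
{
	(void)arg;
	atomic_fetch_add(&thread_ack, 1);
	return NULL;
}

/* Start nthreads workers and wait until all started workers have acked.
 * Returns the number of acks seen, or -1 if nthreads is out of range. */
static int wait_for_acks(int nthreads)
{
	pthread_t tid[MAX_THREADS];
	int i, started = 0;

	if (nthreads < 0 || nthreads > MAX_THREADS)
		return -1;

	atomic_store(&thread_ack, 0);
	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tid[i], NULL, worker, NULL) != 0)
			break;
		started++;
	}

	/* Same shape as the patched loop: yield to the local scheduler,
	 * then issue the relax hint on every iteration. */
	while (atomic_load(&thread_ack) != started) {
		sched_yield();
		cpu_relax_hint();
	}

	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);

	return atomic_load(&thread_ack);
}
```

Calling `wait_for_acks(4)` from a small `main()` (built with `-pthread`) starts four workers and returns once all of them have acknowledged.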

p.s.: If you want to reproduce the problem, cpu hotplug and kprobes use
stop_machine_run and both triggered the problem after some retries.

Signed-off-by: Christian Borntraeger <[email protected]>
CC: Ingo Molnar <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
borntraeger authored and rustyrussell committed May 23, 2008
1 parent 4d2e7d0 commit 3401a61
Showing 1 changed file with 4 additions and 3 deletions.
7 changes: 4 additions & 3 deletions kernel/stop_machine.c
@@ -62,8 +62,7 @@ static int stopmachine(void *cpu)
 		 * help our sisters onto their CPUs. */
 		if (!prepared && !irqs_disabled)
 			yield();
-		else
-			cpu_relax();
+		cpu_relax();
 	}

 	/* Ack: we are exiting. */
@@ -106,8 +105,10 @@ static int stop_machine(void)
 	}

 	/* Wait for them all to come to life. */
-	while (atomic_read(&stopmachine_thread_ack) != stopmachine_num_threads)
+	while (atomic_read(&stopmachine_thread_ack) != stopmachine_num_threads) {
 		yield();
+		cpu_relax();
+	}

 	/* If some failed, kill them all. */
 	if (ret < 0) {