eventfd: document lockless access in eventfd_poll
Since commit e22553e ("eventfd: don't take the spinlock in
eventfd_poll", 2015-02-17), eventfd has been reading ctx->count outside
ctx->wqh.lock.

However, things aren't as simple as the read barrier in eventfd_poll
would suggest.  The read barrier, besides lacking a comment, is not
paired in any obvious manner with another barrier, and it is pointless
anyway: a read barrier only orders reads against other reads, yet this
one sits between a write (deep in poll_wait) and the read of
ctx->count.  In practice it acts as nothing more than a compiler
barrier, for which READ_ONCE can be used instead.  That is what the
code change in this patch does.
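
Condensed, the change amounts to the following (a sketch of the
relevant lines of eventfd_poll, not the verbatim hunk, which appears in
full below):

    /*
     * Before: smp_rmb() sits between a write (deep inside poll_wait,
     * which links the wait queue entry under ctx->wqh.lock) and the
     * read of ctx->count.  A read barrier only orders reads against
     * reads, so here it degenerates to a compiler barrier.
     */
    poll_wait(file, &ctx->wqh, wait);
    smp_rmb();
    count = ctx->count;

    /*
     * After: READ_ONCE() provides the same compiler-barrier effect
     * while making explicit that the load is deliberately lockless.
     */
    poll_wait(file, &ctx->wqh, wait);
    count = READ_ONCE(ctx->count);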

The documentation change is just as important, however.  The question,
posed by Andrea Arcangeli, is why the lockless read is safe on
architectures where spin_unlock does not imply a store-load memory
barrier.  The answer is that it is safe because writes of ctx->count
take the same lock as poll_wait, and hence the acquire barrier implicit
in poll_wait provides the necessary synchronization between
eventfd_poll and callers of wake_up_locked_poll.  This is hinted at in
the earlier commit's message with respect to eventfd_ctx_read
("eventfd_read is similar, it will do a single decrement with the lock
held"), but it applies to all the other callers too.  It is tricky
enough that it deserves to be documented in the code.
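
For reference, the writer side follows this pattern (a condensed sketch
along the lines of eventfd_signal/eventfd_write in fs/eventfd.c, not
the verbatim kernel code):

    unsigned long flags;

    spin_lock_irqsave(&ctx->wqh.lock, flags);
    ctx->count += n;                      /* the write poll must observe */
    if (waitqueue_active(&ctx->wqh))      /* has poll_wait queued a waiter? */
        wake_up_locked_poll(&ctx->wqh, POLLIN);
    spin_unlock_irqrestore(&ctx->wqh.lock, flags);

Because the update of ctx->count and the waitqueue_active check sit
inside the very lock that add_wait_queue takes, the two critical
sections are totally ordered against each other, which is exactly what
the comment added below relies on.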

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Andrea Arcangeli <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Davide Libenzi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
bonzini authored and torvalds committed Mar 22, 2016
1 parent 0335695 commit a484c3d
Showing 1 changed file with 40 additions and 2 deletions.
fs/eventfd.c
@@ -121,8 +121,46 @@ static unsigned int eventfd_poll(struct file *file, poll_table *wait)
         u64 count;
 
         poll_wait(file, &ctx->wqh, wait);
-        smp_rmb();
-        count = ctx->count;
+
+        /*
+         * All writes to ctx->count occur within ctx->wqh.lock.  This read
+         * can be done outside ctx->wqh.lock because we know that poll_wait
+         * takes that lock (through add_wait_queue) if our caller will sleep.
+         *
+         * The read _can_ therefore seep into add_wait_queue's critical
+         * section, but cannot move above it!  add_wait_queue's spin_lock acts
+         * as an acquire barrier and ensures that the read is ordered properly
+         * against the writes.  The following CAN happen and is safe:
+         *
+         *     poll                               write
+         *     -----------------                  ------------
+         *     lock ctx->wqh.lock (in poll_wait)
+         *     count = ctx->count
+         *     __add_wait_queue
+         *     unlock ctx->wqh.lock
+         *                                        lock ctx->wqh.lock
+         *                                        ctx->count += n
+         *                                        if (waitqueue_active)
+         *                                          wake_up_locked_poll
+         *                                        unlock ctx->wqh.lock
+         *     eventfd_poll returns 0
+         *
+         * but the following, which would miss a wakeup, cannot happen:
+         *
+         *     poll                               write
+         *     -----------------                  ------------
+         *     count = ctx->count (INVALID!)
+         *                                        lock ctx->wqh.lock
+         *                                        ctx->count += n
+         *                                        **waitqueue_active is false**
+         *                                        **no wake_up_locked_poll!**
+         *                                        unlock ctx->wqh.lock
+         *     lock ctx->wqh.lock (in poll_wait)
+         *     __add_wait_queue
+         *     unlock ctx->wqh.lock
+         *     eventfd_poll returns 0
+         */
+        count = READ_ONCE(ctx->count);
 
         if (count > 0)
                 events |= POLLIN;
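The guarantee is easy to exercise from userspace.  The following
stand-alone program (illustrative only, not part of the patch) writes
to an eventfd and checks that a subsequent poll() reports POLLIN, i.e.
the wakeup that the second interleaving above must never miss:

    #include <sys/eventfd.h>
    #include <poll.h>
    #include <unistd.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int efd = eventfd(0, 0);            /* ctx->count starts at 0 */
        uint64_t n = 1;
        struct pollfd pfd = { .fd = efd, .events = POLLIN };

        write(efd, &n, sizeof(n));          /* ctx->count += 1 under ctx->wqh.lock */
        if (poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN))
            printf("POLLIN reported, as expected\n");

        close(efd);
        return 0;
    }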
