Skip to content

Commit

Permalink
mm/filemap: avoid buffered read/write race to read inconsistent data
Browse files Browse the repository at this point in the history
The following concurrency may cause the data read to be inconsistent with
the data on disk:

             cpu1                           cpu2
------------------------------|------------------------------
                               // Buffered write 2048 from 0
                               ext4_buffered_write_iter
                                generic_perform_write
                                 copy_page_from_iter_atomic
                                 ext4_da_write_end
                                  ext4_da_do_write_end
                                   block_write_end
                                    __block_commit_write
                                     folio_mark_uptodate
// Buffered read 4096 from 0          smp_wmb()
ext4_file_read_iter                   set_bit(PG_uptodate, folio_flags)
 generic_file_read_iter            i_size_write // 2048
  filemap_read                     unlock_page(page)
   filemap_get_pages
    filemap_get_read_batch
    folio_test_uptodate(folio)
     ret = test_bit(PG_uptodate, folio_flags)
     if (ret)
      smp_rmb();
      // Ensure that the data in page 0-2048 is up-to-date.

                               // New buffered write 2048 from 2048
                               ext4_buffered_write_iter
                                generic_perform_write
                                 copy_page_from_iter_atomic
                                 ext4_da_write_end
                                  ext4_da_do_write_end
                                   block_write_end
                                    __block_commit_write
                                     folio_mark_uptodate
                                      smp_wmb()
                                      set_bit(PG_uptodate, folio_flags)
                                   i_size_write // 4096
                                   unlock_page(page)

   isize = i_size_read(inode) // 4096
   // Read the latest isize 4096, but without smp_rmb(), there may be
   // Load-Load disorder resulting in the data in the 2048-4096 range
   // in the page is not up-to-date.
   copy_page_to_iter
   // copyout 4096

In the concurrency above, we read the updated i_size, but there is no read
barrier to ensure that the data in the page is the same as the i_size at
this point, so we may copy the unsynchronized page out.  Hence adding the
missing read memory barrier to fix this.

This is a Load-Load reordering issue, which only occurs on some weak
mem-ordering architectures (e.g.  ARM64, ALPHA), but not on strong
mem-ordering architectures (e.g.  X86).  And theoretically the problem
doesn't only happen on ext4, filesystems that call filemap_read() but
don't hold inode lock (e.g.  btrfs, f2fs, ubifs ...) will have this
problem, while filesystems with inode lock (e.g.  xfs, nfs) won't have
this problem.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Baokun Li <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Ritesh Harjani (IBM) <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Cc: yangerkun <[email protected]>
Cc: Yu Kuai <[email protected]>
Cc: Zhang Yi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
  • Loading branch information
LiBaokun96 authored and akpm00 committed Dec 20, 2023
1 parent b2325bf commit e2c27b8
Showing 1 changed file with 9 additions and 0 deletions.
9 changes: 9 additions & 0 deletions mm/filemap.c
Original file line number Diff line number Diff line change
Expand Up @@ -2607,6 +2607,15 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
goto put_folios;
end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);

/*
* Pairs with a barrier in
* block_write_end()->mark_buffer_dirty() or other page
* dirtying routines like iomap_write_end() to ensure
* changes to page contents are visible before we see
* increased inode size.
*/
smp_rmb();

/*
* Once we start copying data, we don't want to be touching any
* cachelines that might be contended:
Expand Down

0 comments on commit e2c27b8

Please sign in to comment.