Skip to content

Commit

Permalink
mm: remap_file_pages() fixes
Browse files Browse the repository at this point in the history
We have many vma manipulation functions that are fast in the typical
case, but can optionally be instructed to populate an unbounded number
of ptes within the region they work on:

 - mmap with MAP_POPULATE or MAP_LOCKED flags;
 - remap_file_pages() with MAP_NONBLOCK not set or when working on a
   VM_LOCKED vma;
 - mmap_region() and all its wrappers when mlock(MCL_FUTURE) is in
   effect;
 - brk() when mlock(MCL_FUTURE) is in effect.

Current code handles these pte operations locally, while the
sourrounding code has to hold the mmap_sem write side since it's
manipulating vmas.  This means we're doing an unbounded amount of pte
population work with mmap_sem held, and this causes problems as Andy
Lutomirski reported (we've hit this at Google as well, though it's not
entirely clear why people keep trying to use mlock(MCL_FUTURE) in the
first place).

I propose introducing a new mm_populate() function to do this pte
population work after the mmap_sem has been released.  mm_populate()
does need to acquire the mmap_sem read side, but critically, it doesn't
need to hold it continuously for the entire duration of the operation -
it can drop it whenever things take too long (such as when hitting disk
for a file read) and re-acquire it later on.

The following patches are included

- Patches 1 fixes some issues I noticed while working on the existing code.
  If needed, they could potentially go in before the rest of the patches.

- Patch 2 introduces the new mm_populate() function and changes
  mmap_region() call sites to use it after they drop mmap_sem. This is
  inspired from Andy Lutomirski's proposal and is built as an extension
  of the work I had previously done for mlock() and mlockall() around
  v2.6.38-rc1. I had tried doing something similar at the time but had
  given up as there were so many do_mmap() call sites; the recent cleanups
  by Linus and Viro are a tremendous help here.

- Patches 3-5 convert some of the less-obvious places doing unbounded
  pte populates to the new mm_populate() mechanism.

- Patches 6-7 are code cleanups that are made possible by the
  mm_populate() work. In particular, they remove more code than the
  entire patch series added, which should be a good thing :)

- Patch 8 is optional to this entire series. It only helps to deal more
  nicely with racy userspace programs that might modify their mappings
  while we're trying to populate them. It adds a new VM_POPULATE flag
  on the mappings we do want to populate, so that if userspace replaces
  them with mappings it doesn't want populated, mm_populate() won't
  populate those replacement mappings.

This patch:

Assorted small fixes. The first two are quite small:

- Move check for vma->vm_private_data && !(vma->vm_flags & VM_NONLINEAR)
  within existing if (!(vma->vm_flags & VM_NONLINEAR)) block.
  Purely cosmetic.

- In the VM_LOCKED case, when dropping PG_Mlocked for the over-mapped
  range, make sure we own the mmap_sem write lock around the
  munlock_vma_pages_range call as this manipulates the vma's vm_flags.

Last fix requires a longer explanation. remap_file_pages() can do its work
either through VM_NONLINEAR manipulation or by creating extra vmas.
These two cases were inconsistent with each other (and ultimately, both wrong)
as to exactly when did they fault in the newly mapped file pages:

- In the VM_NONLINEAR case, new file pages would be populated if
  the MAP_NONBLOCK flag wasn't passed. If MAP_NONBLOCK was passed,
  new file pages wouldn't be populated even if the vma is already
  marked as VM_LOCKED.

- In the linear (emulated) case, the work is passed to the mmap_region()
  function which would populate the pages if the vma is marked as
  VM_LOCKED, and would not otherwise - regardless of the value of the
  MAP_NONBLOCK flag, because MAP_POPULATE wasn't being passed to
  mmap_region().

The desired behavior is that we want the pages to be populated and locked
if the vma is marked as VM_LOCKED, or to be populated if the MAP_NONBLOCK
flag is not passed to remap_file_pages().

Signed-off-by: Michel Lespinasse <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Tested-by: Andy Lutomirski <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: David Howells <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
  • Loading branch information
walken-google authored and torvalds committed Feb 24, 2013
1 parent dafcb73 commit 940e7da
Showing 1 changed file with 14 additions and 8 deletions.
22 changes: 14 additions & 8 deletions mm/fremap.c
Original file line number Diff line number Diff line change
Expand Up @@ -160,15 +160,11 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
/*
* Make sure the vma is shared, that it supports prefaulting,
* and that the remapped range is valid and fully within
* the single existing vma. vm_private_data is used as a
* swapout cursor in a VM_NONLINEAR vma.
* the single existing vma.
*/
if (!vma || !(vma->vm_flags & VM_SHARED))
goto out;

if (vma->vm_private_data && !(vma->vm_flags & VM_NONLINEAR))
goto out;

if (!vma->vm_ops || !vma->vm_ops->remap_pages)
goto out;

Expand All @@ -177,13 +173,21 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,

/* Must set VM_NONLINEAR before any pages are populated. */
if (!(vma->vm_flags & VM_NONLINEAR)) {
/*
* vm_private_data is used as a swapout cursor
* in a VM_NONLINEAR vma.
*/
if (vma->vm_private_data)
goto out;

/* Don't need a nonlinear mapping, exit success */
if (pgoff == linear_page_index(vma, start)) {
err = 0;
goto out;
}

if (!has_write_lock) {
get_write_lock:
up_read(&mm->mmap_sem);
down_write(&mm->mmap_sem);
has_write_lock = 1;
Expand All @@ -199,7 +203,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
unsigned long addr;
struct file *file = get_file(vma->vm_file);

flags &= MAP_NONBLOCK;
flags = (flags & MAP_NONBLOCK) | MAP_POPULATE;
addr = mmap_region(file, start, size,
flags, vma->vm_flags, pgoff);
fput(file);
Expand All @@ -225,20 +229,22 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
* drop PG_Mlocked flag for over-mapped range
*/
vm_flags_t saved_flags = vma->vm_flags;
if (!has_write_lock)
goto get_write_lock;
munlock_vma_pages_range(vma, start, start + size);
vma->vm_flags = saved_flags;
}

mmu_notifier_invalidate_range_start(mm, start, start + size);
err = vma->vm_ops->remap_pages(vma, start, size, pgoff);
mmu_notifier_invalidate_range_end(mm, start, start + size);
if (!err && !(flags & MAP_NONBLOCK)) {
if (!err) {
if (vma->vm_flags & VM_LOCKED) {
/*
* might be mapping previously unmapped range of file
*/
mlock_vma_pages_range(vma, start, start + size);
} else {
} else if (!(flags & MAP_NONBLOCK)) {
if (unlikely(has_write_lock)) {
downgrade_write(&mm->mmap_sem);
has_write_lock = 0;
Expand Down

0 comments on commit 940e7da

Please sign in to comment.