Skip to content

Commit

Permalink
mm/munlock: delete page_mlock() and all its works
Browse files Browse the repository at this point in the history
We have recommended some applications to mlock their userspace, but that
turns out to be counter-productive: when many processes mlock the same
file, contention on rmap's i_mmap_rwsem can become intolerable at exit: it
is needed for write, to remove any vma mapping that file from rmap's tree;
but hogged for read by those with mlocks calling page_mlock() (formerly
known as try_to_munlock()) on *each* page mapped from the file (the
purpose being to find out whether another process has the page mlocked,
so therefore it should not be unmlocked yet).

Several optimizations have been made in the past: one is to skip
page_mlock() when mapcount tells that nothing else has this page
mapped; but that doesn't help at all when others do have it mapped.
This time around, I initially intended to add a preliminary search
of the rmap tree for overlapping VM_LOCKED ranges; but that gets
messy with locking order, when in doubt whether a page is actually
present; and risks adding even more contention on the i_mmap_rwsem.

A solution would be much easier, if only there were space in struct page
for an mlock_count... but actually, most of the time, there is space for
it - an mlocked page spends most of its life on an unevictable LRU, but
since 3.18 removed the scan_unevictable_pages sysctl, that "LRU" has
been redundant.  Let's try to reuse its page->lru.

But leave that until a later patch: in this patch, clear the ground by
removing page_mlock(), and all the infrastructure that has gathered
around it - which mostly hinders understanding, and will make reviewing
new additions harder.  Don't mind those old comments about THPs, they
date from before 4.5's refcounting rework: splitting is not a risk here.

Just keep a minimal version of munlock_vma_page(), as reminder of what it
should attend to (in particular, the odd way PGSTRANDED is counted out of
PGMUNLOCKED), and likewise a stub for munlock_vma_pages_range().  Move
unchanged __mlock_posix_error_return() out of the way, down to above its
caller: this series then makes no further change after mlock_fixup().

After this and each following commit, the kernel builds, boots and runs;
but with deficiencies which may show up in testing of mlock and munlock.
The system calls succeed or fail as before, and mlock remains effective
in preventing page reclaim; but meminfo's Unevictable and Mlocked amounts
may be shown too low after mlock, grow, then stay too high after munlock:
with previously mlocked pages remaining unevictable for too long, until
finally unmapped and freed and counts corrected. Normal service will be
resumed in "mm/munlock: mlock_pte_range() when mlocking or munlocking".

Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
  • Loading branch information
Hugh Dickins authored and Matthew Wilcox (Oracle) committed Feb 17, 2022
1 parent f71077a commit ebcbc6e
Show file tree
Hide file tree
Showing 4 changed files with 25 additions and 438 deletions.
6 changes: 0 additions & 6 deletions include/linux/rmap.h
Original file line number Diff line number Diff line change
Expand Up @@ -237,12 +237,6 @@ unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
*/
int folio_mkclean(struct folio *);

/*
* called in munlock()/munmap() path to check for other vmas holding
* the page mlocked.
*/
void page_mlock(struct page *page);

void remove_migration_ptes(struct page *old, struct page *new, bool locked);

/*
Expand Down
2 changes: 1 addition & 1 deletion mm/internal.h
Original file line number Diff line number Diff line change
Expand Up @@ -409,7 +409,7 @@ static inline void munlock_vma_pages_all(struct vm_area_struct *vma)
* must be called with vma's mmap_lock held for read or write, and page locked.
*/
extern void mlock_vma_page(struct page *page);
extern unsigned int munlock_vma_page(struct page *page);
extern void munlock_vma_page(struct page *page);

extern int mlock_future_check(struct mm_struct *mm, unsigned long flags,
unsigned long len);
Expand Down
Loading

0 comments on commit ebcbc6e

Please sign in to comment.