mm: enable MADV_DONTNEED for hugetlb mappings
Patch series "Add hugetlb MADV_DONTNEED support", v3.

The userfaultfd selftests for hugetlb do not perform UFFD_EVENT_REMAP
testing.  However, mremap support was recently added in commit
550a7d6 ("mm, hugepages: add mremap() support for hugepage backed
vma").  While attempting to enable mremap support in the test, it was
discovered that the mremap test indirectly depends on MADV_DONTNEED.

madvise does not allow MADV_DONTNEED for hugetlb mappings.  However, that
is primarily due to the check in can_madv_lru_vma().  By simply removing
the check and adding huge page alignment, MADV_DONTNEED can be made to
work for hugetlb mappings.

Do note that there is no compelling use case for adding this support.
This was discussed in the RFC [1].  However, adding support makes sense as
it is fairly trivial and brings hugetlb functionality more in line with
'normal' memory.

After enabling support, add selftests for MADV_DONTNEED as well as
MADV_REMOVE.  Then update the userfaultfd selftest.

If the new functionality is accepted, the madvise man page will be updated
to indicate that hugetlb is supported.  It will also be updated to clarify
what happens to the passed length argument.

This patch (of 3):

MADV_DONTNEED is currently disabled for hugetlb mappings.  This certainly
makes sense in shared file mappings as the pagecache maintains a reference
to the page and it will never be freed.  However, it could be useful to
unmap and free pages in private mappings.  In addition, userfaultfd minor
fault users may be able to simplify code by using MADV_DONTNEED.
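
As an illustration of the private mapping case, the following is a minimal
user-space sketch, not part of this patch.  The helper name
try_hugetlb_dontneed() is hypothetical, and the 2 MiB huge page size is an
assumption that only holds where that is the default huge page size.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: 2 MiB default huge page size; adjust for other configurations. */
#define HPAGE_SIZE (2UL * 1024 * 1024)

/*
 * Hypothetical helper, not part of the patch.
 * Returns 0 if MADV_DONTNEED succeeded, 1 if no huge page mapping could be
 * created (nothing to exercise), or a negative errno if madvise() failed.
 * Kernels without this change are expected to return -EINVAL.
 */
int try_hugetlb_dontneed(void)
{
	char *p = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	int ret = 0;

	if (p == MAP_FAILED)
		return 1;	/* e.g. no huge pages reserved on this system */

	memset(p, 0xaa, HPAGE_SIZE);	/* fault in the huge page */

	/* start comes from mmap() and the length is exactly one huge page */
	if (madvise(p, HPAGE_SIZE, MADV_DONTNEED) != 0)
		ret = -errno;

	munmap(p, HPAGE_SIZE);
	return ret;
}
```

Calling try_hugetlb_dontneed() on a system with reserved huge pages gives a
quick way to tell whether the running kernel accepts the request.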

The primary thing preventing MADV_DONTNEED from working on hugetlb
mappings is a check in can_madv_lru_vma().  To allow support for hugetlb
mappings create and use a new routine madvise_dontneed_free_valid_vma()
that allows hugetlb mappings in this specific case.

For normal mappings, madvise requires the start address be PAGE aligned
and rounds up length to the next multiple of PAGE_SIZE.  Do similarly for
hugetlb mappings: require start address be huge page size aligned and
round up length to the next multiple of huge page size.  Use the new
madvise_dontneed_free_valid_vma routine to check alignment and round up
length/end.  zap_page_range() requires this alignment for hugetlb vmas;
otherwise we will hit BUGs.
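
The alignment rule can be sketched in user space as follows.  This is only
an illustration under stated assumptions: hugetlb_dontneed_range() is a
hypothetical stand-in for the hugetlb branch of the new routine, and the
huge page size is passed in as a power of two instead of being read from
the vma's hstate.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model of the hugetlb alignment handling.
 * hpage_size is assumed to be a power of two.
 */
static bool hugetlb_dontneed_range(unsigned long start, unsigned long *end,
				   unsigned long hpage_size)
{
	/* Same shape as huge_page_mask(): clears the offset within a huge page */
	unsigned long mask = ~(hpage_size - 1);

	/* Reject a start address that is not huge page aligned */
	if (start & ~mask)
		return false;

	/* Round end up to the next huge page boundary, like ALIGN() */
	*end = (*end + hpage_size - 1) & mask;
	return true;
}
```

With a 2 MiB huge page size, a request starting at 0x200000 and ending at
0x201000 is accepted and its end becomes 0x400000, while a request starting
at 0x201000 is rejected and its end is left untouched.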

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Kravetz <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Mina Almasry <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
mjkravetz authored and torvalds committed Mar 25, 2022
1 parent c32caa2 commit 90e7e7f
Showing 1 changed file with 30 additions and 3 deletions.
33 changes: 30 additions & 3 deletions mm/madvise.c
@@ -502,9 +502,14 @@ static void madvise_cold_page_range(struct mmu_gather *tlb,
 	tlb_end_vma(tlb, vma);
 }

+static inline bool can_madv_lru_non_huge_vma(struct vm_area_struct *vma)
+{
+	return !(vma->vm_flags & (VM_LOCKED|VM_PFNMAP));
+}
+
 static inline bool can_madv_lru_vma(struct vm_area_struct *vma)
 {
-	return !(vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP));
+	return can_madv_lru_non_huge_vma(vma) && !is_vm_hugetlb_page(vma);
 }

 static long madvise_cold(struct vm_area_struct *vma,
@@ -777,6 +782,23 @@ static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
 	return 0;
 }

+static bool madvise_dontneed_free_valid_vma(struct vm_area_struct *vma,
+					    unsigned long start,
+					    unsigned long *end,
+					    int behavior)
+{
+	if (!is_vm_hugetlb_page(vma))
+		return can_madv_lru_non_huge_vma(vma);
+
+	if (behavior != MADV_DONTNEED)
+		return false;
+	if (start & ~huge_page_mask(hstate_vma(vma)))
+		return false;
+
+	*end = ALIGN(*end, huge_page_size(hstate_vma(vma)));
+	return true;
+}
+
 static long madvise_dontneed_free(struct vm_area_struct *vma,
 				  struct vm_area_struct **prev,
 				  unsigned long start, unsigned long end,
@@ -785,7 +807,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;

 	*prev = vma;
-	if (!can_madv_lru_vma(vma))
+	if (!madvise_dontneed_free_valid_vma(vma, start, &end, behavior))
 		return -EINVAL;

 	if (!userfaultfd_remove(vma, start, end)) {
@@ -807,7 +829,12 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 			 */
 			return -ENOMEM;
 		}
-		if (!can_madv_lru_vma(vma))
+		/*
+		 * Potential end adjustment for hugetlb vma is OK as
+		 * the check below keeps end within vma.
+		 */
+		if (!madvise_dontneed_free_valid_vma(vma, start, &end,
+						     behavior))
 			return -EINVAL;
 		if (end > vma->vm_end) {
 			/*
