mm/hugetlb: fix getting refcount 0 page in hugetlb_fault()
When running the test which causes the race as shown in the previous patch,
we can hit the BUG "get_page() on refcount 0 page" in hugetlb_fault().

This race happens when the pte turns into a migration entry just after the
first check of is_hugetlb_entry_migration() in hugetlb_fault() returns false.
To fix this, we need to check pte_present() again after huge_ptep_get().
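
In outline, after the pte is re-read with huge_ptep_get(), the new check
bails out before any refcounting is attempted (a condensed sketch of the
hunk below; the surrounding locking and error handling are omitted):

	entry = huge_ptep_get(ptep);
	...
	/*
	 * entry may have turned into a migration/hwpoison entry after the
	 * earlier is_hugetlb_entry_migration() check; give up here and let
	 * the second page fault handle it instead of touching a page whose
	 * refcount may already be zero.
	 */
	if (!pte_present(entry))
		goto out_mutex;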

This patch also reorders taking the ptl and calling pte_page(), because
pte_page() should be done under the ptl.  Due to this reordering, we need to
use trylock_page() in the page != pagecache_page case to respect the locking
order.
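
With the reordering, the fault path takes the ptl first and only then
derives and locks the page, roughly as follows (again condensed from the
hunk below; the unlock/error paths are left out):

	ptl = huge_pte_lock(h, mm, ptep);

	/* Check for a racing update before calling hugetlb_cow */
	if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
		goto out_ptl;

	page = pte_page(entry);			/* now done under the ptl */
	if (page != pagecache_page)
		if (!trylock_page(page)) {	/* cannot sleep in lock_page() under the ptl */
			need_wait_lock = 1;	/* wait_on_page_locked() after all locks are dropped */
			goto out_ptl;
		}

	get_page(page);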

Fixes: 66aebce ("hugetlb: fix race condition in hugetlb_fault()")
Signed-off-by: Naoya Horiguchi <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: James Hogan <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Nishanth Aravamudan <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Steve Capper <[email protected]>
Cc: <[email protected]>	[3.2+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Naoya Horiguchi authored and torvalds committed Feb 12, 2015
1 parent e66f17f commit 0f792cf
Showing 1 changed file with 36 additions and 16 deletions.
mm/hugetlb.c
@@ -3134,6 +3134,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
struct page *pagecache_page = NULL;
struct hstate *h = hstate_vma(vma);
struct address_space *mapping;
int need_wait_lock = 0;

address &= huge_page_mask(h);

@@ -3171,6 +3172,16 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,

ret = 0;

/*
* entry could be a migration/hwpoison entry at this point, so this
* check prevents the kernel from going below assuming that we have
* a active hugepage in pagecache. This goto expects the 2nd page fault,
* and is_hugetlb_entry_(migration|hwpoisoned) check will properly
* handle it.
*/
if (!pte_present(entry))
goto out_mutex;

/*
* If we are going to COW the mapping later, we examine the pending
* reservations for this page now. This will ensure that any
@@ -3190,51 +3201,60 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
vma, address);
}

ptl = huge_pte_lock(h, mm, ptep);

/* Check for a racing update before calling hugetlb_cow */
if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
goto out_ptl;

/*
* hugetlb_cow() requires page locks of pte_page(entry) and
* pagecache_page, so here we need take the former one
* when page != pagecache_page or !pagecache_page.
* Note that locking order is always pagecache_page -> page,
* so no worry about deadlock.
*/
page = pte_page(entry);
get_page(page);
if (page != pagecache_page)
lock_page(page);

ptl = huge_pte_lockptr(h, mm, ptep);
spin_lock(ptl);
/* Check for a racing update before calling hugetlb_cow */
if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
goto out_ptl;
if (!trylock_page(page)) {
need_wait_lock = 1;
goto out_ptl;
}

get_page(page);

if (flags & FAULT_FLAG_WRITE) {
if (!huge_pte_write(entry)) {
ret = hugetlb_cow(mm, vma, address, ptep, entry,
pagecache_page, ptl);
goto out_ptl;
goto out_put_page;
}
entry = huge_pte_mkdirty(entry);
}
entry = pte_mkyoung(entry);
if (huge_ptep_set_access_flags(vma, address, ptep, entry,
flags & FAULT_FLAG_WRITE))
update_mmu_cache(vma, address, ptep);

out_put_page:
if (page != pagecache_page)
unlock_page(page);
put_page(page);
out_ptl:
spin_unlock(ptl);

if (pagecache_page) {
unlock_page(pagecache_page);
put_page(pagecache_page);
}
if (page != pagecache_page)
unlock_page(page);
put_page(page);

out_mutex:
mutex_unlock(&htlb_fault_mutex_table[hash]);
/*
* Generally it's safe to hold refcount during waiting page lock. But
* here we just wait to defer the next page fault to avoid busy loop and
* the page is not used after unlocked before returning from the current
* page fault. So we are safe from accessing freed page, even if we wait
* here without taking refcount.
*/
if (need_wait_lock)
wait_on_page_locked(page);
return ret;
}
