mm/hwpoison: mf_mutex for soft offline and unpoison
Patch series "mm/hwpoison: fix unpoison_memory()", v4.

The main purpose of this series is to bring the unpoison code in line
with recent changes to how the hwpoison code takes page refcounts.
Unpoison should either work or, if that is impossible, simply fail
without crashing.

Recent work on keeping hwpoison pages in the shmem pagecache introduces
a new state for hwpoisoned pages, but this series does not yet support
unpoison for such pages.

It seems that soft-offline and unpoison could be used as a general-purpose
page offline/online mechanism (outside the context of memory errors).  I
think some additional work is needed to get there, because soft-offline
and unpoison are currently assumed to happen infrequently (they print
too many messages for aggressive use cases).  In any case, this could
be an interesting follow-up topic.

v1: https://lore.kernel.org/linux-mm/[email protected]/
v2: https://lore.kernel.org/linux-mm/[email protected]/
v3: https://lore.kernel.org/linux-mm/[email protected]/

This patch (of 3):

mf_mutex was originally introduced to serialize multiple MCE events, but
there is little benefit in allowing unpoison to run in parallel with
memory_failure() and soft offline.  So apply mf_mutex to soft offline
and unpoison as well.  The memory failure handler and soft offline
handler get simpler with this.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Ding Hui <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
nhoriguchi authored and torvalds committed Jan 15, 2022
1 parent e1c63e1 commit 91d0054
Showing 1 changed file with 18 additions and 44 deletions.
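
The reasoning in the commit message (one mutex shared by the failure handler and unpoison, so the handler no longer has to re-check for a concurrent unpoison) can be illustrated with a small user-space analogy. This is only a sketch under stated assumptions: pthread_mutex_t stands in for the kernel's mf_mutex, and toy_memory_failure(), toy_unpoison(), toy_race_is_consistent(), page_poisoned and num_poisoned are hypothetical names invented for this example, not kernel symbols.

```c
#include <pthread.h>
#include <stdbool.h>

/* Assumption: a pthread mutex plays the role of the kernel's mf_mutex. */
static pthread_mutex_t mf_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool page_poisoned;	/* hypothetical stand-in for PG_hwpoison */
static long num_poisoned;	/* hypothetical stand-in for the poisoned-page counter */

/* Toy handler: poison the page and account for it under the shared mutex. */
static int toy_memory_failure(void)
{
	pthread_mutex_lock(&mf_mutex);
	if (!page_poisoned) {
		page_poisoned = true;
		num_poisoned++;
		/*
		 * No "was it just unpoisoned?" re-check is needed here:
		 * toy_unpoison() cannot run until the mutex is dropped.
		 */
	}
	pthread_mutex_unlock(&mf_mutex);
	return 0;
}

/* Toy unpoison: clear the flag and the accounting under the same mutex. */
static int toy_unpoison(void)
{
	pthread_mutex_lock(&mf_mutex);
	if (page_poisoned) {
		page_poisoned = false;
		num_poisoned--;
	}
	pthread_mutex_unlock(&mf_mutex);
	return 0;
}

static void *poison_loop(void *arg)
{
	int n = *(int *)arg;

	for (int i = 0; i < n; i++)
		toy_memory_failure();
	return NULL;
}

static void *unpoison_loop(void *arg)
{
	int n = *(int *)arg;

	for (int i = 0; i < n; i++)
		toy_unpoison();
	return NULL;
}

/* Race the two paths; the counter must still agree with the flag. */
static bool toy_race_is_consistent(int iterations)
{
	pthread_t a, b;

	pthread_create(&a, NULL, poison_loop, &iterations);
	pthread_create(&b, NULL, unpoison_loop, &iterations);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return num_poisoned == (page_poisoned ? 1 : 0);
}
```

Because both paths update the flag and the counter inside the same critical section, neither can observe the other's half-finished state, which is the property that lets the real handler drop its "just unpoisoned" checks.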
62 changes: 18 additions & 44 deletions mm/memory-failure.c
@@ -1502,14 +1502,6 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	lock_page(head);
 	page_flags = head->flags;

-	if (!PageHWPoison(head)) {
-		pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
-		num_poisoned_pages_dec();
-		unlock_page(head);
-		put_page(head);
-		return 0;
-	}
-
 	/*
 	 * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
 	 * simply disable it. In order to make it work properly, we need
@@ -1623,6 +1615,8 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	return rc;
 }

+static DEFINE_MUTEX(mf_mutex);
+
 /**
  * memory_failure - Handle memory failure of a page.
  * @pfn: Page Number of the corrupted page
@@ -1649,7 +1643,6 @@ int memory_failure(unsigned long pfn, int flags)
 	int res = 0;
 	unsigned long page_flags;
 	bool retry = true;
-	static DEFINE_MUTEX(mf_mutex);

 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
@@ -1783,16 +1776,6 @@ int memory_failure(unsigned long pfn, int flags)
 	 */
 	page_flags = p->flags;

-	/*
-	 * unpoison always clear PG_hwpoison inside page lock
-	 */
-	if (!PageHWPoison(p)) {
-		pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
-		num_poisoned_pages_dec();
-		unlock_page(p);
-		put_page(p);
-		goto unlock_mutex;
-	}
 	if (hwpoison_filter(p)) {
 		if (TestClearPageHWPoison(p))
 			num_poisoned_pages_dec();
@@ -1973,6 +1956,7 @@ int unpoison_memory(unsigned long pfn)
 	struct page *page;
 	struct page *p;
 	int freeit = 0;
+	int ret = 0;
 	unsigned long flags = 0;
 	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
 					DEFAULT_RATELIMIT_BURST);
@@ -1983,69 +1967,54 @@ int unpoison_memory(unsigned long pfn)
 	p = pfn_to_page(pfn);
 	page = compound_head(p);

+	mutex_lock(&mf_mutex);
+
 	if (!PageHWPoison(p)) {
 		unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
 				 pfn, &unpoison_rs);
-		return 0;
+		goto unlock_mutex;
 	}

 	if (page_count(page) > 1) {
 		unpoison_pr_info("Unpoison: Someone grabs the hwpoison page %#lx\n",
 				 pfn, &unpoison_rs);
-		return 0;
+		goto unlock_mutex;
 	}

 	if (page_mapped(page)) {
 		unpoison_pr_info("Unpoison: Someone maps the hwpoison page %#lx\n",
 				 pfn, &unpoison_rs);
-		return 0;
+		goto unlock_mutex;
 	}

 	if (page_mapping(page)) {
 		unpoison_pr_info("Unpoison: the hwpoison page has non-NULL mapping %#lx\n",
 				 pfn, &unpoison_rs);
-		return 0;
-	}
-
-	/*
-	 * unpoison_memory() can encounter thp only when the thp is being
-	 * worked by memory_failure() and the page lock is not held yet.
-	 * In such case, we yield to memory_failure() and make unpoison fail.
-	 */
-	if (!PageHuge(page) && PageTransHuge(page)) {
-		unpoison_pr_info("Unpoison: Memory failure is now running on %#lx\n",
-				 pfn, &unpoison_rs);
-		return 0;
+		goto unlock_mutex;
 	}

 	if (!get_hwpoison_page(p, flags)) {
 		if (TestClearPageHWPoison(p))
 			num_poisoned_pages_dec();
 		unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
 				 pfn, &unpoison_rs);
-		return 0;
+		goto unlock_mutex;
 	}

-	lock_page(page);
-	/*
-	 * This test is racy because PG_hwpoison is set outside of page lock.
-	 * That's acceptable because that won't trigger kernel panic. Instead,
-	 * the PG_hwpoison page will be caught and isolated on the entrance to
-	 * the free buddy page pool.
-	 */
 	if (TestClearPageHWPoison(page)) {
 		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
 				 pfn, &unpoison_rs);
 		num_poisoned_pages_dec();
 		freeit = 1;
 	}
-	unlock_page(page);

 	put_page(page);
 	if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
 		put_page(page);

-	return 0;
+unlock_mutex:
+	mutex_unlock(&mf_mutex);
+	return ret;
 }
 EXPORT_SYMBOL(unpoison_memory);

@@ -2226,9 +2195,12 @@ int soft_offline_page(unsigned long pfn, int flags)
 		return -EIO;
 	}

+	mutex_lock(&mf_mutex);
+
 	if (PageHWPoison(page)) {
 		pr_info("%s: %#lx page already poisoned\n", __func__, pfn);
 		put_ref_page(ref_page);
+		mutex_unlock(&mf_mutex);
 		return 0;
 	}

@@ -2247,5 +2219,7 @@ int soft_offline_page(unsigned long pfn, int flags)
 		}
 	}

+	mutex_unlock(&mf_mutex);
+
 	return ret;
 }
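
The unpoison_memory() hunk replaces each early "return 0;" with "goto unlock_mutex;" so that every path releases mf_mutex exactly once before returning. A minimal user-space sketch of that single-exit pattern follows. It is an analogy, not the kernel code: struct toy_page, toy_unpoison_memory() and toy_mutex_is_free() are hypothetical names, and pthread_mutex_t stands in for the kernel mutex.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for struct page. */
struct toy_page {
	bool poisoned;
	int refcount;
	bool mapped;
};

/* Assumption: a pthread mutex plays the role of the kernel's mf_mutex. */
static pthread_mutex_t mf_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Single-exit pattern: take the mutex once, route every early exit
 * through the unlock_mutex label, and return the shared ret variable.
 */
static int toy_unpoison_memory(struct toy_page *page)
{
	int ret = 0;

	pthread_mutex_lock(&mf_mutex);

	if (!page->poisoned)
		goto unlock_mutex;	/* already unpoisoned */

	if (page->refcount > 1)
		goto unlock_mutex;	/* someone else holds the page */

	if (page->mapped)
		goto unlock_mutex;	/* page is still mapped */

	page->poisoned = false;

unlock_mutex:
	pthread_mutex_unlock(&mf_mutex);
	return ret;
}

/* Test helper: true if the mutex is currently free, i.e. it was released. */
static bool toy_mutex_is_free(void)
{
	if (pthread_mutex_trylock(&mf_mutex) != 0)
		return false;
	pthread_mutex_unlock(&mf_mutex);
	return true;
}
```

A plain "return 0;" left on any of the early-exit branches would leave the mutex held and block the next caller, which is why the patch converts all of them in unpoison_memory() and adds an explicit mutex_unlock() before the early return in soft_offline_page().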
