mm: migrate: prevent racy access to tlb_flush_pending
Patch series "fixes of TLB batching races", v6.

It turns out that the Linux TLB batching mechanism suffers from various
races.  Races caused by batching during reclamation were recently
handled by Mel, and this patch-set deals with the others.  The more
fundamental issue is that concurrent updates of the page-tables allow
for TLB flushes to be batched on one core, while another core changes
the page-tables.  This other core may assume a PTE change does not
require a flush based on the updated PTE value, while it is unaware that
TLB flushes are still pending.
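
To make the interleaving concrete, the following is a deterministic
userspace model of the race described above.  It is an illustration
only, not kernel code: the pte and tlb_entry types and the tlb_fill()
and tlb_flush_all() helpers are made up for the sketch.  One CPU
batches the flush for a write-protection change, a second CPU skips
flushing because the PTE it observes is already read-only, and a third
CPU keeps writing through a stale TLB entry.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct pte { bool writable; };
struct tlb_entry { bool valid; bool writable; };

static struct pte pte = { .writable = true };
static struct tlb_entry tlb[3];	/* one single-entry "TLB" per CPU */

static void tlb_fill(int cpu)
{
	tlb[cpu] = (struct tlb_entry){ .valid = true, .writable = pte.writable };
}

static void tlb_flush_all(void)
{
	for (int cpu = 0; cpu < 3; cpu++)
		tlb[cpu].valid = false;
}

int main(void)
{
	/* CPU 2 wrote to the page earlier and caches a writable entry. */
	tlb_fill(2);

	/* CPU 0: write-protects the page but only batches the flush;
	 * tlb_flush_all() is deferred. */
	pte.writable = false;

	/* CPU 1: sees a read-only PTE and assumes no CPU can still be
	 * writing the page, so it skips the flush -- unaware that a
	 * batched flush is still pending. */
	if (pte.writable)
		tlb_flush_all();	/* skipped */

	/* CPU 2 can still write through its stale, writable TLB entry. */
	assert(tlb[2].valid && tlb[2].writable && !pte.writable);
	printf("stale writable TLB entry survived the PTE update\n");
	return 0;
}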

This behavior affects KSM (which may result in memory corruption) and
MADV_FREE and MADV_DONTNEED (which may result in incorrect behavior).  A
proof-of-concept can easily produce the wrong behavior of MADV_DONTNEED.
Memory corruption in KSM is harder to produce in practice, but was
observed by hacking the kernel and adding a delay before flushing and
replacing the KSM page.

Finally, there is also one memory barrier missing, which may affect
architectures with a weak memory model.

This patch (of 7):

Setting and clearing mm->tlb_flush_pending can be performed by multiple
threads, since mmap_sem may only be acquired for read in
task_numa_work().  If this happens, tlb_flush_pending might be cleared
while one of the threads still changes PTEs and batches TLB flushes.
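
A minimal userspace replay of this interleaving (again an illustration
only: the set/clear and inc/dec names in the comments refer to the
kernel helpers touched by this patch, while the flag_pending and
count_pending variables and the *_sees_pending() helpers are made up
for the sketch) shows why a boolean flag loses the "pending"
indication while a counter keeps it until the last batcher is done.

#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static bool flag_pending;		/* old scheme: single bool      */
static atomic_int count_pending;	/* new scheme: pending counter  */

static bool flag_sees_pending(void)  { return flag_pending; }
static bool count_sees_pending(void) { return atomic_load(&count_pending) > 0; }

int main(void)
{
	/* Threads A and B both start batching PTE changes. */
	flag_pending = true;			/* A: set_tlb_flush_pending() */
	atomic_fetch_add(&count_pending, 1);	/* A: inc_tlb_flush_pending() */
	flag_pending = true;			/* B: set_tlb_flush_pending() */
	atomic_fetch_add(&count_pending, 1);	/* B: inc_tlb_flush_pending() */

	/* A flushes its batch and clears/decrements; B is still batching. */
	flag_pending = false;			/* A: clear_tlb_flush_pending() */
	atomic_fetch_sub(&count_pending, 1);	/* A: dec_tlb_flush_pending() */

	/* A concurrent reader (e.g. migration) asks "is a flush pending?" */
	assert(!flag_sees_pending());	/* bool wrongly reports "no"        */
	assert(count_sees_pending());	/* counter correctly reports "yes"  */
	printf("bool=%d counter=%d\n", flag_sees_pending(), count_sees_pending());
	return 0;
}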

This can lead to the same race between migration and
change_protection_range() that led to the introduction of
tlb_flush_pending.  The result of this race was data corruption, which
means that this patch also addresses a theoretically possible data
corruption.

An actual data corruption was not observed, yet the race was confirmed
by adding an assertion to check that tlb_flush_pending is not set by two
threads, adding artificial latency in change_protection_range(), and
using sysctl to reduce kernel.numa_balancing_scan_delay_ms.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 2084140 ("mm: fix TLB flush race between migration, and change_protection_range")
Signed-off-by: Nadav Amit <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Russell King <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
anadav authored and torvalds committed Aug 10, 2017
1 parent 9eeb52a commit 16af97d
Showing 4 changed files with 26 additions and 13 deletions.
31 changes: 22 additions & 9 deletions include/linux/mm_types.h

@@ -493,7 +493,7 @@ struct mm_struct {
 	 * can move process memory needs to flush the TLB when moving a
 	 * PROT_NONE or PROT_NUMA mapped page.
 	 */
-	bool tlb_flush_pending;
+	atomic_t tlb_flush_pending;
 #endif
 #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
 	/* See flush_tlb_batched_pending() */
@@ -532,33 +532,46 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
 static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
 {
 	barrier();
-	return mm->tlb_flush_pending;
+	return atomic_read(&mm->tlb_flush_pending) > 0;
 }
-static inline void set_tlb_flush_pending(struct mm_struct *mm)
+
+static inline void init_tlb_flush_pending(struct mm_struct *mm)
 {
-	mm->tlb_flush_pending = true;
+	atomic_set(&mm->tlb_flush_pending, 0);
+}
+
+static inline void inc_tlb_flush_pending(struct mm_struct *mm)
+{
+	atomic_inc(&mm->tlb_flush_pending);
 
 	/*
-	 * Guarantee that the tlb_flush_pending store does not leak into the
+	 * Guarantee that the tlb_flush_pending increase does not leak into the
	 * critical section updating the page tables
 	 */
 	smp_mb__before_spinlock();
 }
+
 /* Clearing is done after a TLB flush, which also provides a barrier. */
-static inline void clear_tlb_flush_pending(struct mm_struct *mm)
+static inline void dec_tlb_flush_pending(struct mm_struct *mm)
 {
 	barrier();
-	mm->tlb_flush_pending = false;
+	atomic_dec(&mm->tlb_flush_pending);
 }
 #else
 static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
 {
 	return false;
 }
-static inline void set_tlb_flush_pending(struct mm_struct *mm)
+
+static inline void init_tlb_flush_pending(struct mm_struct *mm)
 {
 }
-static inline void clear_tlb_flush_pending(struct mm_struct *mm)
+
+static inline void inc_tlb_flush_pending(struct mm_struct *mm)
+{
+}
+
+static inline void dec_tlb_flush_pending(struct mm_struct *mm)
 {
 }
 #endif
2 changes: 1 addition & 1 deletion kernel/fork.c

@@ -807,7 +807,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	mm_init_aio(mm);
 	mm_init_owner(mm, p);
 	mmu_notifier_mm_init(mm);
-	clear_tlb_flush_pending(mm);
+	init_tlb_flush_pending(mm);
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
 	mm->pmd_huge_pte = NULL;
 #endif
2 changes: 1 addition & 1 deletion mm/debug.c

@@ -159,7 +159,7 @@ void dump_mm(const struct mm_struct *mm)
 		mm->numa_next_scan, mm->numa_scan_offset, mm->numa_scan_seq,
 #endif
 #if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
-		mm->tlb_flush_pending,
+		atomic_read(&mm->tlb_flush_pending),
 #endif
 		mm->def_flags, &mm->def_flags
 	);
4 changes: 2 additions & 2 deletions mm/mprotect.c

@@ -244,7 +244,7 @@ static unsigned long change_protection_range(struct vm_area_struct *vma,
 	BUG_ON(addr >= end);
 	pgd = pgd_offset(mm, addr);
 	flush_cache_range(vma, addr, end);
-	set_tlb_flush_pending(mm);
+	inc_tlb_flush_pending(mm);
 	do {
 		next = pgd_addr_end(addr, end);
 		if (pgd_none_or_clear_bad(pgd))
@@ -256,7 +256,7 @@ static unsigned long change_protection_range(struct vm_area_struct *vma,
 	/* Only flush the TLB if we actually modified any entries: */
 	if (pages)
 		flush_tlb_range(vma, start, end);
-	clear_tlb_flush_pending(mm);
+	dec_tlb_flush_pending(mm);
 
 	return pages;
 }
