mm: soft-dirty bits for user memory changes tracking
Soft-dirty is a bit on a PTE that helps track which pages a task
writes to.  To do this tracking, one should:

  1. Clear soft-dirty bits from PTEs ("echo 4 > /proc/PID/clear_refs")
  2. Wait some time.
  3. Read soft-dirty bits (bit 55 in /proc/PID/pagemap2 entries)

To make this work, the writable bit is cleared from a PTE whenever its
soft-dirty bit is cleared.  When the task later tries to modify a page
at that virtual address, a #PF occurs and the kernel sets the
soft-dirty bit on the respective PTE.

Note that although the task's whole address space is marked r/o after
the soft-dirty bits are cleared, the #PFs that follow are handled
quickly.  The pages are still mapped to physical memory, so all the
kernel has to do is notice this and put the writable, dirty and
soft-dirty bits back on the PTE.

Also note that when mremap moves PTEs, they are marked soft-dirty as
well, since from the user's perspective mremap modifies the virtual
memory at mremap's new address.

Signed-off-by: Pavel Emelyanov <[email protected]>
Cc: Matt Mackall <[email protected]>
Cc: Xiao Guangrong <[email protected]>
Cc: Glauber Costa <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
xemul authored and torvalds committed Jul 3, 2013
1 parent 2b0a9f0 commit 0f8975e
Showing 11 changed files with 158 additions and 10 deletions.
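The three-step procedure from the commit message can be sketched from user space roughly as follows. This is a minimal illustration, not code from the commit: it assumes the interface as described in Documentation/vm/soft-dirty.txt (write "4" to /proc/PID/clear_refs, then test bit 55 of the 64-bit /proc/PID/pagemap entry), and every function name in it is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit 55 of a 64-bit pagemap entry is the soft-dirty flag (per soft-dirty.txt). */
#define PAGEMAP_SOFT_DIRTY (UINT64_C(1) << 55)

/* Byte offset of the pagemap entry that describes virtual address vaddr. */
uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
	assert(page_size > 0);
	return (vaddr / page_size) * sizeof(uint64_t);
}

/* Returns 1 if the soft-dirty bit is set in a raw pagemap entry, else 0. */
int entry_is_soft_dirty(uint64_t entry)
{
	return (entry & PAGEMAP_SOFT_DIRTY) != 0;
}

/* Step 1: write "4" to /proc/PID/clear_refs. Returns 0 on success, -1 on error. */
int clear_soft_dirty_bits(pid_t pid)
{
	char path[64];
	ssize_t n;
	int fd;

	snprintf(path, sizeof(path), "/proc/%ld/clear_refs", (long)pid);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	n = write(fd, "4", 1);
	close(fd);
	return n == 1 ? 0 : -1;
}

/* Step 3: returns 1 if the page at vaddr is soft-dirty, 0 if not, -1 on error. */
int page_is_soft_dirty(pid_t pid, uint64_t vaddr)
{
	char path[64];
	uint64_t entry;
	ssize_t n;
	long page_size = sysconf(_SC_PAGESIZE);
	int fd;

	if (page_size <= 0)
		return -1;
	snprintf(path, sizeof(path), "/proc/%ld/pagemap", (long)pid);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (lseek(fd, (off_t)pagemap_offset(vaddr, (uint64_t)page_size),
		  SEEK_SET) == (off_t)-1) {
		close(fd);
		return -1;
	}
	n = read(fd, &entry, sizeof(entry));
	close(fd);
	if (n != (ssize_t)sizeof(entry))
		return -1;
	return entry_is_soft_dirty(entry);
}
```

A tracker would call clear_soft_dirty_bits(pid), wait (step 2), and then call page_is_soft_dirty(pid, addr) for each page of interest. The caller needs permission to open both /proc files of the target task.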
7 changes: 6 additions & 1 deletion Documentation/filesystems/proc.txt
@@ -473,7 +473,8 @@ This file is only present if the CONFIG_MMU kernel configuration option is
enabled.

The /proc/PID/clear_refs is used to reset the PG_Referenced and ACCESSED/YOUNG
-bits on both physical and virtual pages associated with a process.
+bits on both physical and virtual pages associated with a process, and the
+soft-dirty bit on pte (see Documentation/vm/soft-dirty.txt for details).
To clear the bits for all the pages associated with the process
> echo 1 > /proc/PID/clear_refs

@@ -482,6 +483,10 @@ To clear the bits for the anonymous pages associated with the process

To clear the bits for the file mapped pages associated with the process
> echo 3 > /proc/PID/clear_refs

To clear the soft-dirty bit
> echo 4 > /proc/PID/clear_refs

Any other value written to /proc/PID/clear_refs will have no effect.

The /proc/pid/pagemap gives the PFN, which can be used to find the pageflags
36 changes: 36 additions & 0 deletions Documentation/vm/soft-dirty.txt
@@ -0,0 +1,36 @@
SOFT-DIRTY PTEs

The soft-dirty is a bit on a PTE which helps to track which pages a task
writes to. In order to do this tracking one should

1. Clear soft-dirty bits from the task's PTEs.

This is done by writing "4" into the /proc/PID/clear_refs file of the
task in question.

2. Wait some time.

3. Read soft-dirty bits from the PTEs.

This is done by reading from the /proc/PID/pagemap. The bit 55 of the
64-bit qword is the soft-dirty one. If set, the respective PTE was
written to since step 1.


Internally, to do this tracking, the writable bit is cleared from PTEs
when the soft-dirty bit is cleared. So, after this, when the task tries to
modify a page at some virtual address the #PF occurs and the kernel sets
the soft-dirty bit on the respective PTE.

Note, that although all the task's address space is marked as r/o after the
soft-dirty bits clear, the #PF-s that occur after that are processed fast.
This is so, since the pages are still mapped to physical memory, and thus all
the kernel does is finds this fact out and puts both writable and soft-dirty
bits on the PTE.


This feature is actively used by the checkpoint-restore project. You
can find more details about it on http://criu.org


-- Pavel Emelyanov, Apr 9, 2013
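The write-protect-and-fault mechanism described in soft-dirty.txt can be illustrated with a toy model of the PTE flag transitions. This is only a sketch of the idea: the toy_* names are hypothetical, the flag positions are illustrative values chosen for the example, and the real kernel paths (pte_wrprotect(), the page-fault handler) do considerably more.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_pte_t;

/* Illustrative flag positions; assumptions of this sketch, not kernel definitions. */
#define TOY_PAGE_PRESENT    (UINT64_C(1) << 0)
#define TOY_PAGE_RW         (UINT64_C(1) << 1)
#define TOY_PAGE_DIRTY      (UINT64_C(1) << 6)
#define TOY_PAGE_SOFT_DIRTY (UINT64_C(1) << 11)

/* Effect of "echo 4 > clear_refs" on one PTE: drop writable and soft-dirty. */
toy_pte_t toy_clear_soft_dirty(toy_pte_t pte)
{
	return pte & ~(TOY_PAGE_RW | TOY_PAGE_SOFT_DIRTY);
}

/*
 * Effect of the write fault that follows: the page is still mapped, so the
 * handler only has to put the writable, dirty and soft-dirty bits back.
 */
toy_pte_t toy_write_fault(toy_pte_t pte)
{
	assert(pte & TOY_PAGE_PRESENT); /* fast path applies to mapped pages only */
	return pte | TOY_PAGE_RW | TOY_PAGE_DIRTY | TOY_PAGE_SOFT_DIRTY;
}

/* Returns 1 if the toy PTE has been written to since the last clear. */
int toy_pte_soft_dirty(toy_pte_t pte)
{
	return (pte & TOY_PAGE_SOFT_DIRTY) != 0;
}
```

In this model a page that is never written after toy_clear_soft_dirty() keeps the soft-dirty bit clear, which is exactly what step 3 of the procedure observes.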
3 changes: 3 additions & 0 deletions arch/Kconfig
@@ -365,6 +365,9 @@ config HAVE_IRQ_TIME_ACCOUNTING
config HAVE_ARCH_TRANSPARENT_HUGEPAGE
bool

config HAVE_ARCH_SOFT_DIRTY
bool

config HAVE_MOD_ARCH_SPECIFIC
bool
help
1 change: 1 addition & 0 deletions arch/x86/Kconfig
@@ -102,6 +102,7 @@ config X86
select HAVE_ARCH_SECCOMP_FILTER
select BUILDTIME_EXTABLE_SORT
select GENERIC_CMOS_UPDATE
select HAVE_ARCH_SOFT_DIRTY
select CLOCKSOURCE_WATCHDOG
select GENERIC_CLOCKEVENTS
select ARCH_CLOCKSOURCE_DATA if X86_64
24 changes: 22 additions & 2 deletions arch/x86/include/asm/pgtable.h
@@ -207,7 +207,7 @@ static inline pte_t pte_mkexec(pte_t pte)

static inline pte_t pte_mkdirty(pte_t pte)
{
-return pte_set_flags(pte, _PAGE_DIRTY);
+return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
}

static inline pte_t pte_mkyoung(pte_t pte)
@@ -271,7 +271,7 @@ static inline pmd_t pmd_wrprotect(pmd_t pmd)

static inline pmd_t pmd_mkdirty(pmd_t pmd)
{
-return pmd_set_flags(pmd, _PAGE_DIRTY);
+return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
}

static inline pmd_t pmd_mkhuge(pmd_t pmd)
@@ -294,6 +294,26 @@ static inline pmd_t pmd_mknotpresent(pmd_t pmd)
return pmd_clear_flags(pmd, _PAGE_PRESENT);
}

static inline int pte_soft_dirty(pte_t pte)
{
return pte_flags(pte) & _PAGE_SOFT_DIRTY;
}

static inline int pmd_soft_dirty(pmd_t pmd)
{
return pmd_flags(pmd) & _PAGE_SOFT_DIRTY;
}

static inline pte_t pte_mksoft_dirty(pte_t pte)
{
return pte_set_flags(pte, _PAGE_SOFT_DIRTY);
}

static inline pmd_t pmd_mksoft_dirty(pmd_t pmd)
{
return pmd_set_flags(pmd, _PAGE_SOFT_DIRTY);
}

/*
* Mask out unsupported bits in a present pgprot. Non-present pgprots
* can use those bits for other purposes, so leave them be.
12 changes: 12 additions & 0 deletions arch/x86/include/asm/pgtable_types.h
@@ -55,6 +55,18 @@
#define _PAGE_HIDDEN (_AT(pteval_t, 0))
#endif

/*
* The same hidden bit is used by kmemcheck, but since kmemcheck
* works on kernel pages while soft-dirty engine on user space,
* they do not conflict with each other.
*/

#ifdef CONFIG_MEM_SOFT_DIRTY
#define _PAGE_SOFT_DIRTY (_AT(pteval_t, 1) << _PAGE_BIT_HIDDEN)
#else
#define _PAGE_SOFT_DIRTY (_AT(pteval_t, 0))
#endif

#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
#define _PAGE_NX (_AT(pteval_t, 1) << _PAGE_BIT_NX)
#else
47 changes: 42 additions & 5 deletions fs/proc/task_mmu.c
@@ -11,6 +11,7 @@
#include <linux/rmap.h>
#include <linux/swap.h>
#include <linux/swapops.h>
#include <linux/mmu_notifier.h>

#include <asm/elf.h>
#include <asm/uaccess.h>
@@ -692,13 +693,32 @@ enum clear_refs_types {
CLEAR_REFS_ALL = 1,
CLEAR_REFS_ANON,
CLEAR_REFS_MAPPED,
CLEAR_REFS_SOFT_DIRTY,
CLEAR_REFS_LAST,
};

struct clear_refs_private {
struct vm_area_struct *vma;
enum clear_refs_types type;
};

static inline void clear_soft_dirty(struct vm_area_struct *vma,
unsigned long addr, pte_t *pte)
{
#ifdef CONFIG_MEM_SOFT_DIRTY
/*
* The soft-dirty tracker uses #PF-s to catch writes
* to pages, so write-protect the pte as well. See the
* Documentation/vm/soft-dirty.txt for full description
* of how soft-dirty works.
*/
pte_t ptent = *pte;
ptent = pte_wrprotect(ptent);
ptent = pte_clear_flags(ptent, _PAGE_SOFT_DIRTY);
set_pte_at(vma->vm_mm, addr, pte, ptent);
#endif
}

static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
unsigned long end, struct mm_walk *walk)
{
@@ -718,6 +738,11 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
if (!pte_present(ptent))
continue;

if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
clear_soft_dirty(vma, addr, pte);
continue;
}

page = vm_normal_page(vma, addr, ptent);
if (!page)
continue;
@@ -759,13 +784,16 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
mm = get_task_mm(task);
if (mm) {
struct clear_refs_private cp = {
.type = type,
};
struct mm_walk clear_refs_walk = {
.pmd_entry = clear_refs_pte_range,
.mm = mm,
.private = &cp,
};
down_read(&mm->mmap_sem);
if (type == CLEAR_REFS_SOFT_DIRTY)
mmu_notifier_invalidate_range_start(mm, 0, -1);
for (vma = mm->mmap; vma; vma = vma->vm_next) {
cp.vma = vma;
if (is_vm_hugetlb_page(vma))
@@ -786,6 +814,8 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
walk_page_range(vma->vm_start, vma->vm_end,
&clear_refs_walk);
}
if (type == CLEAR_REFS_SOFT_DIRTY)
mmu_notifier_invalidate_range_end(mm, 0, -1);
flush_tlb_mm(mm);
up_read(&mm->mmap_sem);
mmput(mm);
@@ -827,6 +857,7 @@ struct pagemapread {
/* in "new" pagemap pshift bits are occupied with more status bits */
#define PM_STATUS2(v2, x) (__PM_PSHIFT(v2 ? x : PAGE_SHIFT))

#define __PM_SOFT_DIRTY (1LL)
#define PM_PRESENT PM_STATUS(4LL)
#define PM_SWAP PM_STATUS(2LL)
#define PM_FILE PM_STATUS(1LL)
@@ -868,6 +899,7 @@ static void pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
{
u64 frame, flags;
struct page *page = NULL;
int flags2 = 0;

if (pte_present(pte)) {
frame = pte_pfn(pte);
@@ -888,13 +920,15 @@ static void pte_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,

if (page && !PageAnon(page))
flags |= PM_FILE;
if (pte_soft_dirty(pte))
flags2 |= __PM_SOFT_DIRTY;

-*pme = make_pme(PM_PFRAME(frame) | PM_STATUS2(pm->v2, 0) | flags);
+*pme = make_pme(PM_PFRAME(frame) | PM_STATUS2(pm->v2, flags2) | flags);
}

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
-pmd_t pmd, int offset)
+pmd_t pmd, int offset, int pmd_flags2)
{
/*
* Currently pmd for thp is always present because thp can not be
@@ -903,13 +937,13 @@ static void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *p
*/
if (pmd_present(pmd))
*pme = make_pme(PM_PFRAME(pmd_pfn(pmd) + offset)
-| PM_STATUS2(pm->v2, 0) | PM_PRESENT);
+| PM_STATUS2(pm->v2, pmd_flags2) | PM_PRESENT);
else
*pme = make_pme(PM_NOT_PRESENT(pm->v2));
}
#else
static inline void thp_pmd_to_pagemap_entry(pagemap_entry_t *pme, struct pagemapread *pm,
-pmd_t pmd, int offset)
+pmd_t pmd, int offset, int pmd_flags2)
{
}
#endif
@@ -926,12 +960,15 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
/* find the first VMA at or above 'addr' */
vma = find_vma(walk->mm, addr);
if (vma && pmd_trans_huge_lock(pmd, vma) == 1) {
int pmd_flags2;

pmd_flags2 = (pmd_soft_dirty(*pmd) ? __PM_SOFT_DIRTY : 0);
for (; addr != end; addr += PAGE_SIZE) {
unsigned long offset;

offset = (addr & ~PAGEMAP_WALK_MASK) >>
PAGE_SHIFT;
-thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset);
+thp_pmd_to_pagemap_entry(&pme, pm, *pmd, offset, pmd_flags2);
err = add_to_pagemap(addr, &pme, pm);
if (err)
break;
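The fs/proc/task_mmu.c hunks above pass a flags2 value of __PM_SOFT_DIRTY (1) through PM_STATUS2(), and the documentation says the result shows up as bit 55. The arithmetic can be checked with a small stand-alone sketch. The field widths below are assumptions taken from the pagemap layout of kernels of this era (3 status bits at the top, a 6-bit "pshift" field below them); they are not part of the hunks shown, and pm_status2() is a hypothetical stand-in for the macro.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed pagemap entry layout (not shown in the hunks above). */
#define PM_STATUS_BITS   3
#define PM_STATUS_OFFSET (64 - PM_STATUS_BITS)               /* 61 */
#define PM_PSHIFT_BITS   6
#define PM_PSHIFT_OFFSET (PM_STATUS_OFFSET - PM_PSHIFT_BITS) /* 55 */
#define PM_PSHIFT_MASK \
	(((UINT64_C(1) << PM_PSHIFT_BITS) - 1) << PM_PSHIFT_OFFSET)

#define PM_SOFT_DIRTY_FLAG UINT64_C(1) /* mirrors __PM_SOFT_DIRTY */

/*
 * Stand-in for PM_STATUS2(v2, x): in the "new" (v2) format the pshift field
 * carries extra status bits, otherwise it carries the page shift.
 */
uint64_t pm_status2(int v2, uint64_t flags2, unsigned int page_shift)
{
	uint64_t field;

	assert(page_shift < (1u << PM_PSHIFT_BITS));
	field = v2 ? flags2 : page_shift;
	return (field << PM_PSHIFT_OFFSET) & PM_PSHIFT_MASK;
}
```

Under these assumptions, pm_status2(1, PM_SOFT_DIRTY_FLAG, 12) has only bit 55 set, which matches the bit number given in soft-dirty.txt.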
22 changes: 22 additions & 0 deletions include/asm-generic/pgtable.h
@@ -396,6 +396,28 @@ static inline void ptep_modify_prot_commit(struct mm_struct *mm,
#define arch_start_context_switch(prev) do {} while (0)
#endif

#ifndef CONFIG_HAVE_ARCH_SOFT_DIRTY
static inline int pte_soft_dirty(pte_t pte)
{
return 0;
}

static inline int pmd_soft_dirty(pmd_t pmd)
{
return 0;
}

static inline pte_t pte_mksoft_dirty(pte_t pte)
{
return pte;
}

static inline pmd_t pmd_mksoft_dirty(pmd_t pmd)
{
return pmd;
}
#endif

#ifndef __HAVE_PFNMAP_TRACKING
/*
* Interfaces that can be used by architecture code to keep track of
12 changes: 12 additions & 0 deletions mm/Kconfig
@@ -477,3 +477,15 @@ config FRONTSWAP
and swap data is stored as normal on the matching swap device.

If unsure, say Y to enable frontswap.

config MEM_SOFT_DIRTY
bool "Track memory changes"
depends on CHECKPOINT_RESTORE && HAVE_ARCH_SOFT_DIRTY
select PROC_PAGE_MONITOR
help
This option enables memory changes tracking by introducing a
soft-dirty bit on pte-s. This bit it set when someone writes
into a page just as regular dirty bit, but unlike the latter
it can be cleared by hands.

See Documentation/vm/soft-dirty.txt for more details.
2 changes: 1 addition & 1 deletion mm/huge_memory.c
@@ -1429,7 +1429,7 @@ int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
if (ret == 1) {
pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
VM_BUG_ON(!pmd_none(*new_pmd));
-set_pmd_at(mm, new_addr, new_pmd, pmd);
+set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
spin_unlock(&mm->page_table_lock);
}
out:
2 changes: 1 addition & 1 deletion mm/mremap.c
@@ -126,7 +126,7 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
continue;
pte = ptep_get_and_clear(mm, old_addr, old_pte);
pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
-set_pte_at(mm, new_addr, new_pte, pte);
+set_pte_at(mm, new_addr, new_pte, pte_mksoft_dirty(pte));
}

arch_leave_lazy_mmu_mode();