Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:
 "Incoming:

   - a small number of updates to scripts/, ocfs2 and fs/buffer.c

   - most of MM

  I still have quite a lot of material (mostly not MM) staged after
  linux-next due to -next dependencies. I'll send those across next week
  as the prerequisites get merged up"

* emailed patches from Andrew Morton <[email protected]>: (135 commits)
  mm/page_io.c: annotate refault stalls from swap_readpage
  mm/Kconfig: fix trivial help text punctuation
  mm/Kconfig: fix indentation
  mm/memory_hotplug.c: remove __online_page_set_limits()
  mm: fix typos in comments when calling __SetPageUptodate()
  mm: fix struct member name in function comments
  mm/shmem.c: cast the type of unmap_start to u64
  mm: shmem: use proper gfp flags for shmem_writepage()
  mm/shmem.c: make array 'values' static const, makes object smaller
  userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
  fs/userfaultfd.c: wp: clear VM_UFFD_MISSING or VM_UFFD_WP during userfaultfd_register()
  userfaultfd: wrap the common dst_vma check into an inlined function
  userfaultfd: remove unnecessary WARN_ON() in __mcopy_atomic_hugetlb()
  userfaultfd: use vma_pagesize for all huge page size calculation
  mm/madvise.c: use PAGE_ALIGN[ED] for range checking
  mm/madvise.c: replace with page_size() in madvise_inject_error()
  mm/mmap.c: make vma_merge() comment more easy to understand
  mm/hwpoison-inject: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops
  autonuma: reduce cache footprint when scanning page tables
  autonuma: fix watermark checking in migrate_balanced_pgdat()
  ...
torvalds committed Dec 2, 2019
2 parents c3bfc5d + 9377906 · commit 596cf45
Showing 95 changed files with 2,661 additions and 1,506 deletions.
7 changes: 6 additions & 1 deletion Documentation/admin-guide/cgroup-v2.rst
@@ -1288,7 +1288,12 @@ PAGE_SIZE multiple when read back.
inactive_anon, active_anon, inactive_file, active_file, unevictable
Amount of memory, swap-backed and filesystem-backed,
on the internal memory management lists used by the
page reclaim algorithm
page reclaim algorithm.

As these represent internal list state (eg. shmem pages are on anon
memory management lists), inactive_foo + active_foo may not be equal to
the value for the foo counter, since the foo counter is type-based, not
list-based.

slab_reclaimable
Part of "slab" that might be reclaimed, such as
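A note on the caveat documented above: it can be observed directly from
userspace. The following is a minimal, illustrative reader -- the cgroup
path is an example and assumes a cgroup v2 hierarchy with the memory
controller enabled -- comparing the type-based "anon" counter with the
list-based inactive_anon + active_anon pair:

/*
 * Illustrative only: the type-based "anon" counter need not equal
 * inactive_anon + active_anon, e.g. because shmem pages are counted
 * as file-backed but live on the anon LRU lists.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/sys/fs/cgroup/user.slice/memory.stat", "r");
	char key[64];
	unsigned long long val, anon = 0, inactive = 0, active = 0;

	if (!f) {
		perror("fopen");
		return 1;
	}
	while (fscanf(f, "%63s %llu", key, &val) == 2) {
		if (!strcmp(key, "anon"))
			anon = val;
		else if (!strcmp(key, "inactive_anon"))
			inactive = val;
		else if (!strcmp(key, "active_anon"))
			active = val;
	}
	fclose(f);
	printf("anon=%llu, inactive_anon+active_anon=%llu\n",
	       anon, inactive + active);
	return 0;
}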
63 changes: 63 additions & 0 deletions Documentation/dev-tools/kasan.rst
@@ -218,3 +218,66 @@ brk handler is used to print bug reports.
A potential expansion of this mode is a hardware tag-based mode, which would
use hardware memory tagging support instead of compiler instrumentation and
manual shadow memory manipulation.

What memory accesses are sanitised by KASAN?
--------------------------------------------

The kernel maps memory in a number of different parts of the address
space. This poses something of a problem for KASAN, which requires
that all addresses accessed by instrumented code have a valid shadow
region.

The range of kernel virtual addresses is large: there is not enough
real memory to support a real shadow region for every address that
could be accessed by the kernel.

By default
~~~~~~~~~~

By default, architectures only map real memory over the shadow region
for the linear mapping (and potentially other small areas). For all
other areas - such as vmalloc and vmemmap space - a single read-only
page is mapped over the shadow area. This read-only shadow page
declares all memory accesses as permitted.

This presents a problem for modules: they do not live in the linear
mapping, but in a dedicated module space. By hooking in to the module
allocator, KASAN can temporarily map real shadow memory to cover
them. This allows detection of invalid accesses to module globals, for
example.

This also creates an incompatibility with ``VMAP_STACK``: if the stack
lives in vmalloc space, it will be shadowed by the read-only page, and
the kernel will fault when trying to set up the shadow data for stack
variables.

CONFIG_KASAN_VMALLOC
~~~~~~~~~~~~~~~~~~~~

With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
cost of greater memory usage. Currently this is only supported on x86.

This works by hooking into vmalloc and vmap, and dynamically
allocating real shadow memory to back the mappings.

Most mappings in vmalloc space are small, requiring less than a full
page of shadow space. Allocating a full shadow page per mapping would
therefore be wasteful. Furthermore, to ensure that different mappings
use different shadow pages, mappings would have to be aligned to
``KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE``.

Instead, we share backing space across multiple mappings. We allocate
a backing page when a mapping in vmalloc space uses a particular page
of the shadow region. This page can be shared by other vmalloc
mappings later on.

We hook in to the vmap infrastructure to lazily clean up unused shadow
memory.

To avoid the difficulties around swapping mappings around, we expect
that the part of the shadow region that covers the vmalloc space will
not be covered by the early shadow page, but will be left
unmapped. This will require changes in arch-specific code.

This allows ``VMAP_STACK`` support on x86, and can simplify support of
architectures that do not have a fixed module region.
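
For context on the shadow regions discussed in this document: generic
KASAN translates an address to its shadow with a shift plus a per-arch
offset. A minimal sketch, mirroring the kernel's helper (the offset is
left symbolic; one shadow byte covers eight bytes of memory):

#define KASAN_SHADOW_SCALE_SHIFT	3

static inline void *kasan_mem_to_shadow(const void *addr)
{
	return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
		+ KASAN_SHADOW_OFFSET;
}

Every checked access to addr consults the byte at
kasan_mem_to_shadow(addr), which is why each mapped region either needs
real shadow pages behind it or, as described above, the shared read-only
page that reports all accesses as valid.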
9 changes: 5 additions & 4 deletions arch/Kconfig
@@ -836,16 +836,17 @@ config HAVE_ARCH_VMAP_STACK
config VMAP_STACK
default y
bool "Use a virtually-mapped stack"
depends on HAVE_ARCH_VMAP_STACK && !KASAN
depends on HAVE_ARCH_VMAP_STACK
depends on !KASAN || KASAN_VMALLOC
---help---
Enable this if you want the use of virtually-mapped kernel stacks
with guard pages. This causes kernel stack overflows to be
caught immediately rather than causing difficult-to-diagnose
corruption.

This is presently incompatible with KASAN because KASAN expects
the stack to map directly to the KASAN shadow map using a formula
that is incorrect if the stack is in vmalloc space.
To use this with KASAN, the architecture must support backing
virtual mappings with real shadow memory, and KASAN_VMALLOC must
be enabled.

config ARCH_OPTIONAL_KERNEL_RWX
def_bool n
1 change: 0 additions & 1 deletion arch/arc/include/asm/pgtable.h
@@ -33,7 +33,6 @@
#define _ASM_ARC_PGTABLE_H

#include <linux/bits.h>
#define __ARCH_USE_5LEVEL_HACK
#include <asm-generic/pgtable-nopmd.h>
#include <asm/page.h>
#include <asm/mmu.h> /* to propagate CONFIG_ARC_MMU_VER <n> */
10 changes: 8 additions & 2 deletions arch/arc/mm/fault.c
@@ -30,6 +30,7 @@ noinline static int handle_kernel_vaddr_fault(unsigned long address)
* with the 'reference' page table.
*/
pgd_t *pgd, *pgd_k;
p4d_t *p4d, *p4d_k;
pud_t *pud, *pud_k;
pmd_t *pmd, *pmd_k;

@@ -39,8 +40,13 @@
if (!pgd_present(*pgd_k))
goto bad_area;

pud = pud_offset(pgd, address);
pud_k = pud_offset(pgd_k, address);
p4d = p4d_offset(pgd, address);
p4d_k = p4d_offset(pgd_k, address);
if (!p4d_present(*p4d_k))
goto bad_area;

pud = pud_offset(p4d, address);
pud_k = pud_offset(p4d_k, address);
if (!pud_present(*pud_k))
goto bad_area;

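For reference alongside this fix: a sketch of the full five-level
descent that such synchronization code now assumes. This is not the
kernel's exact helper, just the shape of the walk -- on architectures
with fewer levels the folded helpers (p4d on 4-level, pud on 3-level)
compile down to no-ops:

static pte_t *walk_to_kernel_pte(unsigned long addr)
{
	pgd_t *pgd = pgd_offset_k(addr);
	p4d_t *p4d;
	pud_t *pud;
	pmd_t *pmd;

	if (pgd_none(*pgd))
		return NULL;
	p4d = p4d_offset(pgd, addr);
	if (p4d_none(*p4d))
		return NULL;
	pud = pud_offset(p4d, addr);
	if (pud_none(*pud))
		return NULL;
	pmd = pmd_offset(pud, addr);
	if (pmd_none(*pmd))
		return NULL;
	return pte_offset_kernel(pmd, addr);
}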
4 changes: 3 additions & 1 deletion arch/arc/mm/highmem.c
@@ -111,12 +111,14 @@ EXPORT_SYMBOL(__kunmap_atomic);
static noinline pte_t * __init alloc_kmap_pgtable(unsigned long kvaddr)
{
pgd_t *pgd_k;
p4d_t *p4d_k;
pud_t *pud_k;
pmd_t *pmd_k;
pte_t *pte_k;

pgd_k = pgd_offset_k(kvaddr);
pud_k = pud_offset(pgd_k, kvaddr);
p4d_k = p4d_offset(pgd_k, kvaddr);
pud_k = pud_offset(p4d_k, kvaddr);
pmd_k = pmd_offset(pud_k, kvaddr);

pte_k = (pte_t *)memblock_alloc_low(PAGE_SIZE, PAGE_SIZE);
3 changes: 0 additions & 3 deletions arch/powerpc/include/asm/book3s/64/pgtable-4k.h
@@ -70,9 +70,6 @@ static inline int get_hugepd_cache_index(int index)
/* should not reach */
}

#else /* !CONFIG_HUGETLB_PAGE */
static inline int pmd_huge(pmd_t pmd) { return 0; }
static inline int pud_huge(pud_t pud) { return 0; }
#endif /* CONFIG_HUGETLB_PAGE */

#endif /* __ASSEMBLY__ */
3 changes: 0 additions & 3 deletions arch/powerpc/include/asm/book3s/64/pgtable-64k.h
@@ -59,9 +59,6 @@ static inline int get_hugepd_cache_index(int index)
BUG();
}

#else /* !CONFIG_HUGETLB_PAGE */
static inline int pmd_huge(pmd_t pmd) { return 0; }
static inline int pud_huge(pud_t pud) { return 0; }
#endif /* CONFIG_HUGETLB_PAGE */

static inline int remap_4k_pfn(struct vm_area_struct *vma, unsigned long addr,
1 change: 1 addition & 0 deletions arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -13,6 +13,7 @@
#include <linux/memblock.h>
#include <linux/of_fdt.h>
#include <linux/mm.h>
#include <linux/hugetlb.h>
#include <linux/string_helpers.h>
#include <linux/stop_machine.h>

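These three powerpc changes belong together: the local pmd_huge() and
pud_huge() stubs can go because <linux/hugetlb.h> provides fallbacks
when CONFIG_HUGETLB_PAGE is off, which is also why radix_pgtable.c now
includes that header explicitly. Roughly (approximate; see the header
itself for the authoritative definitions):

#ifndef CONFIG_HUGETLB_PAGE
/* Without huge page support, no page table entry is ever "huge". */
#define pmd_huge(x)	0
#define pud_huge(x)	0
#endif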
1 change: 1 addition & 0 deletions arch/x86/Kconfig
@@ -134,6 +134,7 @@ config X86
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_JUMP_LABEL_RELATIVE
select HAVE_ARCH_KASAN if X86_64
select HAVE_ARCH_KASAN_VMALLOC if X86_64
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS if MMU
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if MMU && COMPAT
61 changes: 61 additions & 0 deletions arch/x86/mm/kasan_init_64.c
@@ -245,6 +245,49 @@ static void __init kasan_map_early_shadow(pgd_t *pgd)
} while (pgd++, addr = next, addr != end);
}

static void __init kasan_shallow_populate_p4ds(pgd_t *pgd,
unsigned long addr,
unsigned long end)
{
p4d_t *p4d;
unsigned long next;
void *p;

p4d = p4d_offset(pgd, addr);
do {
next = p4d_addr_end(addr, end);

if (p4d_none(*p4d)) {
p = early_alloc(PAGE_SIZE, NUMA_NO_NODE, true);
p4d_populate(&init_mm, p4d, p);
}
} while (p4d++, addr = next, addr != end);
}

static void __init kasan_shallow_populate_pgds(void *start, void *end)
{
unsigned long addr, next;
pgd_t *pgd;
void *p;

addr = (unsigned long)start;
pgd = pgd_offset_k(addr);
do {
next = pgd_addr_end(addr, (unsigned long)end);

if (pgd_none(*pgd)) {
p = early_alloc(PAGE_SIZE, NUMA_NO_NODE, true);
pgd_populate(&init_mm, pgd, p);
}

/*
* we need to populate p4ds to be synced when running in
* four level mode - see sync_global_pgds_l4()
*/
kasan_shallow_populate_p4ds(pgd, addr, next);
} while (pgd++, addr = next, addr != (unsigned long)end);
}

#ifdef CONFIG_KASAN_INLINE
static int kasan_die_handler(struct notifier_block *self,
unsigned long val,
@@ -354,6 +397,24 @@

kasan_populate_early_shadow(
kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),
kasan_mem_to_shadow((void *)VMALLOC_START));

/*
* If we're in full vmalloc mode, don't back vmalloc space with early
* shadow pages. Instead, prepopulate pgds/p4ds so they are synced to
* the global table and we can populate the lower levels on demand.
*/
if (IS_ENABLED(CONFIG_KASAN_VMALLOC))
kasan_shallow_populate_pgds(
kasan_mem_to_shadow((void *)VMALLOC_START),
kasan_mem_to_shadow((void *)VMALLOC_END));
else
kasan_populate_early_shadow(
kasan_mem_to_shadow((void *)VMALLOC_START),
kasan_mem_to_shadow((void *)VMALLOC_END));

kasan_populate_early_shadow(
kasan_mem_to_shadow((void *)VMALLOC_END + 1),
shadow_cpu_entry_begin);

kasan_populate_shadow((unsigned long)shadow_cpu_entry_begin,
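With the top levels prepopulated as above, the lower levels of the
vmalloc shadow can be filled on demand when a vmalloc area is created.
A hypothetical sketch of that step -- the helper names are illustrative,
not the kernel's actual API:

/*
 * Hypothetical: back the shadow of [addr, addr + size) with real pages,
 * sharing page-aligned shadow with neighbouring mappings where possible.
 */
static int shadow_populate_range(unsigned long addr, unsigned long size)
{
	unsigned long start, end;

	start = (unsigned long)kasan_mem_to_shadow((void *)addr);
	end = (unsigned long)kasan_mem_to_shadow((void *)(addr + size));
	start = round_down(start, PAGE_SIZE);
	end = round_up(end, PAGE_SIZE);

	for (; start < end; start += PAGE_SIZE) {
		/* A neighbouring mapping may already back this page. */
		if (shadow_page_mapped(start))	/* hypothetical helper */
			continue;
		if (map_one_shadow_page(start))	/* hypothetical helper */
			return -ENOMEM;
	}
	return 0;
}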
40 changes: 15 additions & 25 deletions drivers/base/memory.c
@@ -19,15 +19,12 @@
#include <linux/memory.h>
#include <linux/memory_hotplug.h>
#include <linux/mm.h>
#include <linux/mutex.h>
#include <linux/stat.h>
#include <linux/slab.h>

#include <linux/atomic.h>
#include <linux/uaccess.h>

static DEFINE_MUTEX(mem_sysfs_mutex);

#define MEMORY_CLASS_NAME "memory"

#define to_memory_block(dev) container_of(dev, struct memory_block, dev)
@@ -538,12 +535,7 @@ static ssize_t soft_offline_page_store(struct device *dev,
if (kstrtoull(buf, 0, &pfn) < 0)
return -EINVAL;
pfn >>= PAGE_SHIFT;
if (!pfn_valid(pfn))
return -ENXIO;
/* Only online pages can be soft-offlined (esp., not ZONE_DEVICE). */
if (!pfn_to_online_page(pfn))
return -EIO;
ret = soft_offline_page(pfn_to_page(pfn), 0);
ret = soft_offline_page(pfn, 0);
return ret == 0 ? count : ret;
}

@@ -705,6 +697,8 @@ static void unregister_memory(struct memory_block *memory)
* Create memory block devices for the given memory area. Start and size
* have to be aligned to memory block granularity. Memory block devices
* will be initialized as offline.
*
* Called under device_hotplug_lock.
*/
int create_memory_block_devices(unsigned long start, unsigned long size)
{
@@ -718,7 +712,6 @@ int create_memory_block_devices(unsigned long start, unsigned long size)
!IS_ALIGNED(size, memory_block_size_bytes())))
return -EINVAL;

mutex_lock(&mem_sysfs_mutex);
for (block_id = start_block_id; block_id != end_block_id; block_id++) {
ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
if (ret)
@@ -730,18 +723,21 @@ int create_memory_block_devices(unsigned long start, unsigned long size)
for (block_id = start_block_id; block_id != end_block_id;
block_id++) {
mem = find_memory_block_by_id(block_id);
if (WARN_ON_ONCE(!mem))
continue;
mem->section_count = 0;
unregister_memory(mem);
}
}
mutex_unlock(&mem_sysfs_mutex);
return ret;
}

/*
* Remove memory block devices for the given memory area. Start and size
* have to be aligned to memory block granularity. Memory block devices
* have to be offline.
*
* Called under device_hotplug_lock.
*/
void remove_memory_block_devices(unsigned long start, unsigned long size)
{
@@ -754,7 +750,6 @@ void remove_memory_block_devices(unsigned long start, unsigned long size)
!IS_ALIGNED(size, memory_block_size_bytes())))
return;

mutex_lock(&mem_sysfs_mutex);
for (block_id = start_block_id; block_id != end_block_id; block_id++) {
mem = find_memory_block_by_id(block_id);
if (WARN_ON_ONCE(!mem))
Expand All @@ -763,7 +758,6 @@ void remove_memory_block_devices(unsigned long start, unsigned long size)
unregister_memory_block_under_nodes(mem);
unregister_memory(mem);
}
mutex_unlock(&mem_sysfs_mutex);
}

/* return true if the memory block is offlined, otherwise, return false */
@@ -797,12 +791,13 @@ static const struct attribute_group *memory_root_attr_groups[] = {
};

/*
* Initialize the sysfs support for memory devices...
* Initialize the sysfs support for memory devices. At the time this function
* is called, we cannot have concurrent creation/deletion of memory block
* devices, the device_hotplug_lock is not needed.
*/
void __init memory_dev_init(void)
{
int ret;
int err;
unsigned long block_sz, nr;

/* Validate the configured memory block size */
@@ -813,24 +808,19 @@ void __init memory_dev_init(void)

ret = subsys_system_register(&memory_subsys, memory_root_attr_groups);
if (ret)
goto out;
panic("%s() failed to register subsystem: %d\n", __func__, ret);

/*
* Create entries for memory sections that were found
* during boot and have been initialized
*/
mutex_lock(&mem_sysfs_mutex);
for (nr = 0; nr <= __highest_present_section_nr;
nr += sections_per_block) {
err = add_memory_block(nr);
if (!ret)
ret = err;
ret = add_memory_block(nr);
if (ret)
panic("%s() failed to add memory block: %d\n", __func__,
ret);
}
mutex_unlock(&mem_sysfs_mutex);

out:
if (ret)
panic("%s() failed: %d\n", __func__, ret);
}

/**
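For completeness, the reworked soft_offline_page store is reachable from
userspace via sysfs. A small illustrative program -- the physical
address is an example; note the handler parses an address, not a pfn,
and shifts it down by PAGE_SHIFT itself (root and CONFIG_MEMORY_FAILURE
required):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/sys/devices/system/memory/soft_offline_page";
	const char *addr = "0x200000\n";	/* example physical address */
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, addr, strlen(addr)) < 0)
		perror("write");
	close(fd);
	return 0;
}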