Skip to content

Commit

Permalink
Merge branch 'akpm' (patches from Andrew)
Browse files Browse the repository at this point in the history
Merge yet more updates from Andrew Morton:
 "The rest of MM and a kernel-wide procfs cleanup.

  Summary of the more significant patches:

   - Patch series "mm/memory_hotplug: Factor out memory block
     devicehandling", v3. David Hildenbrand.

     Some spring-cleaning of the memory hotplug code, notably in
     drivers/base/memory.c

   - "mm: thp: fix false negative of shmem vma's THP eligibility". Yang
     Shi.

     Fix /proc/pid/smaps output for THP pages used in shmem.

   - "resource: fix locking in find_next_iomem_res()" + 1. Nadav Amit.

     Bugfix and speedup for kernel/resource.c

   - Patch series "mm: Further memory block device cleanups", David
     Hildenbrand.

     More spring-cleaning of the memory hotplug code.

   - Patch series "mm: Sub-section memory hotplug support". Dan
     Williams.

     Generalise the memory hotplug code so that pmem can use it more
     completely. Then remove the hacks from the libnvdimm code which
     were there to work around the memory-hotplug code's constraints.

   - "proc/sysctl: add shared variables for range check", Matteo Croce.

     We have about 250 instances of

          int zero;
          ...
                  .extra1 = &zero,

     in the tree. This is a tree-wide sweep to make all those private
     "zero"s and "one"s use global variables.

     Alas, it isn't practical to make those two global integers const"

* emailed patches from Andrew Morton <[email protected]>: (38 commits)
  proc/sysctl: add shared variables for range check
  mm: migrate: remove unused mode argument
  mm/sparsemem: cleanup 'section number' data types
  libnvdimm/pfn: stop padding pmem namespaces to section alignment
  libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
  mm/devm_memremap_pages: enable sub-section remap
  mm: document ZONE_DEVICE memory-model implications
  mm/sparsemem: support sub-section hotplug
  mm/sparsemem: prepare for sub-section ranges
  mm: kill is_dev_zone() helper
  mm/hotplug: kill is_dev_zone() usage in __remove_pages()
  mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap()
  mm/hotplug: prepare shrink_{zone, pgdat}_span for sub-section removal
  mm/sparsemem: add helpers track active portions of a section at boot
  mm/sparsemem: introduce a SECTION_IS_EARLY flag
  mm/sparsemem: introduce struct mem_section_usage
  drivers/base/memory.c: get rid of find_memory_block_hinted()
  mm/memory_hotplug: move and simplify walk_memory_blocks()
  mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of pfns
  mm: make register_mem_sect_under_node() static
  ...
  • Loading branch information
torvalds committed Jul 19, 2019
2 parents 3bfe1fc + eec4844 commit 249be85
Show file tree
Hide file tree
Showing 71 changed files with 1,087 additions and 1,020 deletions.
4 changes: 2 additions & 2 deletions Documentation/filesystems/proc.txt
Original file line number Diff line number Diff line change
Expand Up @@ -486,8 +486,8 @@ replaced by copy-on-write) part of the underlying shmem object out on swap.
"SwapPss" shows proportional swap share of this mapping. Unlike "Swap", this
does not take into account swapped out page of underlying shmem objects.
"Locked" indicates whether the mapping is locked in memory or not.
"THPeligible" indicates whether the mapping is eligible for THP pages - 1 if
true, 0 otherwise.
"THPeligible" indicates whether the mapping is eligible for allocating THP
pages - 1 if true, 0 otherwise. It just shows the current status.

"VmFlags" field deserves a separate description. This member represents the kernel
flags associated with the particular virtual memory area in two letter encoded
Expand Down
40 changes: 40 additions & 0 deletions Documentation/vm/memory-model.rst
Original file line number Diff line number Diff line change
Expand Up @@ -181,3 +181,43 @@ that is eventually passed to vmemmap_populate() through a long chain
of function calls. The vmemmap_populate() implementation may use the
`vmem_altmap` along with :c:func:`altmap_alloc_block_buf` helper to
allocate memory map on the persistent memory device.

ZONE_DEVICE
===========
The `ZONE_DEVICE` facility builds upon `SPARSEMEM_VMEMMAP` to offer
`struct page` `mem_map` services for device driver identified physical
address ranges. The "device" aspect of `ZONE_DEVICE` relates to the fact
that the page objects for these address ranges are never marked online,
and that a reference must be taken against the device, not just the page
to keep the memory pinned for active use. `ZONE_DEVICE`, via
:c:func:`devm_memremap_pages`, performs just enough memory hotplug to
turn on :c:func:`pfn_to_page`, :c:func:`page_to_pfn`, and
:c:func:`get_user_pages` service for the given range of pfns. Since the
page reference count never drops below 1 the page is never tracked as
free memory and the page's `struct list_head lru` space is repurposed
for back referencing to the host device / driver that mapped the memory.

While `SPARSEMEM` presents memory as a collection of sections,
optionally collected into memory blocks, `ZONE_DEVICE` users have a need
for smaller granularity of populating the `mem_map`. Given that
`ZONE_DEVICE` memory is never marked online it is subsequently never
subject to its memory ranges being exposed through the sysfs memory
hotplug api on memory block boundaries. The implementation relies on
this lack of user-api constraint to allow sub-section sized memory
ranges to be specified to :c:func:`arch_add_memory`, the top-half of
memory hotplug. Sub-section support allows for 2MB as the cross-arch
common alignment granularity for :c:func:`devm_memremap_pages`.

The users of `ZONE_DEVICE` are:

* pmem: Map platform persistent memory to be used as a direct-I/O target
via DAX mappings.

* hmm: Extend `ZONE_DEVICE` with `->page_fault()` and `->page_free()`
event callbacks to allow a device-driver to coordinate memory management
events related to device-memory, typically GPU memory. See
Documentation/vm/hmm.rst.

* p2pdma: Create `struct page` objects to allow peer devices in a
PCI/-E topology to coordinate direct-DMA operations between themselves,
i.e. bypass host memory.
17 changes: 17 additions & 0 deletions arch/arm64/mm/mmu.c
Original file line number Diff line number Diff line change
Expand Up @@ -1074,4 +1074,21 @@ int arch_add_memory(int nid, u64 start, u64 size,
return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
restrictions);
}
void arch_remove_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
struct zone *zone;

/*
* FIXME: Cleanup page tables (also in arch_add_memory() in case
* adding fails). Until then, this function should only be used
* during memory hotplug (adding memory), not for memory
* unplug. ARCH_ENABLE_MEMORY_HOTREMOVE must not be
* unlocked yet.
*/
zone = page_zone(pfn_to_page(start_pfn));
__remove_pages(zone, start_pfn, nr_pages, altmap);
}
#endif
2 changes: 0 additions & 2 deletions arch/ia64/mm/init.c
Original file line number Diff line number Diff line change
Expand Up @@ -681,7 +681,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
return ret;
}

#ifdef CONFIG_MEMORY_HOTREMOVE
void arch_remove_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap)
{
Expand All @@ -693,4 +692,3 @@ void arch_remove_memory(int nid, u64 start, u64 size,
__remove_pages(zone, start_pfn, nr_pages, altmap);
}
#endif
#endif
2 changes: 0 additions & 2 deletions arch/powerpc/mm/mem.c
Original file line number Diff line number Diff line change
Expand Up @@ -125,7 +125,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
return __add_pages(nid, start_pfn, nr_pages, restrictions);
}

#ifdef CONFIG_MEMORY_HOTREMOVE
void __ref arch_remove_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap)
{
Expand All @@ -151,7 +150,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
pr_warn("Hash collision while resizing HPT\n");
}
#endif
#endif /* CONFIG_MEMORY_HOTPLUG */

#ifndef CONFIG_NEED_MULTIPLE_NODES
void __init mem_topology_setup(void)
Expand Down
23 changes: 11 additions & 12 deletions arch/powerpc/platforms/powernv/memtrace.c
Original file line number Diff line number Diff line change
Expand Up @@ -70,23 +70,23 @@ static int change_memblock_state(struct memory_block *mem, void *arg)
/* called with device_hotplug_lock held */
static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
{
u64 end_pfn = start_pfn + nr_pages - 1;
const unsigned long start = PFN_PHYS(start_pfn);
const unsigned long size = PFN_PHYS(nr_pages);

if (walk_memory_range(start_pfn, end_pfn, NULL,
check_memblock_online))
if (walk_memory_blocks(start, size, NULL, check_memblock_online))
return false;

walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
change_memblock_state);
walk_memory_blocks(start, size, (void *)MEM_GOING_OFFLINE,
change_memblock_state);

if (offline_pages(start_pfn, nr_pages)) {
walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
change_memblock_state);
walk_memory_blocks(start, size, (void *)MEM_ONLINE,
change_memblock_state);
return false;
}

walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
change_memblock_state);
walk_memory_blocks(start, size, (void *)MEM_OFFLINE,
change_memblock_state);


return true;
Expand Down Expand Up @@ -242,9 +242,8 @@ static int memtrace_online(void)
*/
if (!memhp_auto_online) {
lock_device_hotplug();
walk_memory_range(PFN_DOWN(ent->start),
PFN_UP(ent->start + ent->size - 1),
NULL, online_mem_block);
walk_memory_blocks(ent->start, ent->size, NULL,
online_mem_block);
unlock_device_hotplug();
}

Expand Down
15 changes: 5 additions & 10 deletions arch/s390/appldata/appldata_base.c
Original file line number Diff line number Diff line change
Expand Up @@ -220,15 +220,13 @@ appldata_timer_handler(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
int timer_active = appldata_timer_active;
int zero = 0;
int one = 1;
int rc;
struct ctl_table ctl_entry = {
.procname = ctl->procname,
.data = &timer_active,
.maxlen = sizeof(int),
.extra1 = &zero,
.extra2 = &one,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
};

rc = proc_douintvec_minmax(&ctl_entry, write, buffer, lenp, ppos);
Expand All @@ -255,13 +253,12 @@ appldata_interval_handler(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
int interval = appldata_interval;
int one = 1;
int rc;
struct ctl_table ctl_entry = {
.procname = ctl->procname,
.data = &interval,
.maxlen = sizeof(int),
.extra1 = &one,
.extra1 = SYSCTL_ONE,
};

rc = proc_dointvec_minmax(&ctl_entry, write, buffer, lenp, ppos);
Expand Down Expand Up @@ -289,13 +286,11 @@ appldata_generic_handler(struct ctl_table *ctl, int write,
struct list_head *lh;
int rc, found;
int active;
int zero = 0;
int one = 1;
struct ctl_table ctl_entry = {
.data = &active,
.maxlen = sizeof(int),
.extra1 = &zero,
.extra2 = &one,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
};

found = 0;
Expand Down
6 changes: 2 additions & 4 deletions arch/s390/kernel/topology.c
Original file line number Diff line number Diff line change
Expand Up @@ -587,15 +587,13 @@ static int topology_ctl_handler(struct ctl_table *ctl, int write,
{
int enabled = topology_is_enabled();
int new_mode;
int zero = 0;
int one = 1;
int rc;
struct ctl_table ctl_entry = {
.procname = ctl->procname,
.data = &enabled,
.maxlen = sizeof(int),
.extra1 = &zero,
.extra2 = &one,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
};

rc = proc_douintvec_minmax(&ctl_entry, write, buffer, lenp, ppos);
Expand Down
18 changes: 10 additions & 8 deletions arch/s390/mm/init.c
Original file line number Diff line number Diff line change
Expand Up @@ -273,6 +273,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
unsigned long size_pages = PFN_DOWN(size);
int rc;

if (WARN_ON_ONCE(restrictions->altmap))
return -EINVAL;

rc = vmem_add_mapping(start, size);
if (rc)
return rc;
Expand All @@ -283,16 +286,15 @@ int arch_add_memory(int nid, u64 start, u64 size,
return rc;
}

#ifdef CONFIG_MEMORY_HOTREMOVE
void arch_remove_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap)
{
/*
* There is no hardware or firmware interface which could trigger a
* hot memory remove on s390. So there is nothing that needs to be
* implemented.
*/
BUG();
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
struct zone *zone;

zone = page_zone(pfn_to_page(start_pfn));
__remove_pages(zone, start_pfn, nr_pages, altmap);
vmem_remove_mapping(start, size);
}
#endif
#endif /* CONFIG_MEMORY_HOTPLUG */
2 changes: 0 additions & 2 deletions arch/sh/mm/init.c
Original file line number Diff line number Diff line change
Expand Up @@ -429,7 +429,6 @@ int memory_add_physaddr_to_nid(u64 addr)
EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
#endif

#ifdef CONFIG_MEMORY_HOTREMOVE
void arch_remove_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap)
{
Expand All @@ -440,5 +439,4 @@ void arch_remove_memory(int nid, u64 start, u64 size,
zone = page_zone(pfn_to_page(start_pfn));
__remove_pages(zone, start_pfn, nr_pages, altmap);
}
#endif
#endif /* CONFIG_MEMORY_HOTPLUG */
7 changes: 2 additions & 5 deletions arch/x86/entry/vdso/vdso32-setup.c
Original file line number Diff line number Diff line change
Expand Up @@ -65,18 +65,15 @@ subsys_initcall(sysenter_setup);
/* Register vsyscall32 into the ABI table */
#include <linux/sysctl.h>

static const int zero;
static const int one = 1;

static struct ctl_table abi_table2[] = {
{
.procname = "vsyscall32",
.data = &vdso32_enabled,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = (int *)&zero,
.extra2 = (int *)&one,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
},
{}
};
Expand Down
6 changes: 2 additions & 4 deletions arch/x86/kernel/itmt.c
Original file line number Diff line number Diff line change
Expand Up @@ -65,17 +65,15 @@ static int sched_itmt_update_handler(struct ctl_table *table, int write,
return ret;
}

static unsigned int zero;
static unsigned int one = 1;
static struct ctl_table itmt_kern_table[] = {
{
.procname = "sched_itmt_enabled",
.data = &sysctl_sched_itmt_enabled,
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = sched_itmt_update_handler,
.extra1 = &zero,
.extra2 = &one,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
},
{}
};
Expand Down
2 changes: 0 additions & 2 deletions arch/x86/mm/init_32.c
Original file line number Diff line number Diff line change
Expand Up @@ -860,7 +860,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
return __add_pages(nid, start_pfn, nr_pages, restrictions);
}

#ifdef CONFIG_MEMORY_HOTREMOVE
void arch_remove_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap)
{
Expand All @@ -872,7 +871,6 @@ void arch_remove_memory(int nid, u64 start, u64 size,
__remove_pages(zone, start_pfn, nr_pages, altmap);
}
#endif
#endif

int kernel_set_to_readonly __read_mostly;

Expand Down
6 changes: 3 additions & 3 deletions arch/x86/mm/init_64.c
Original file line number Diff line number Diff line change
Expand Up @@ -1198,7 +1198,6 @@ void __ref vmemmap_free(unsigned long start, unsigned long end,
remove_pagetable(start, end, false, altmap);
}

#ifdef CONFIG_MEMORY_HOTREMOVE
static void __meminit
kernel_physical_mapping_remove(unsigned long start, unsigned long end)
{
Expand All @@ -1219,7 +1218,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
__remove_pages(zone, start_pfn, nr_pages, altmap);
kernel_physical_mapping_remove(start, start + size);
}
#endif
#endif /* CONFIG_MEMORY_HOTPLUG */

static struct kcore_list kcore_vsyscall;
Expand Down Expand Up @@ -1520,7 +1518,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
{
int err;

if (boot_cpu_has(X86_FEATURE_PSE))
if (end - start < PAGES_PER_SECTION * sizeof(struct page))
err = vmemmap_populate_basepages(start, end, node);
else if (boot_cpu_has(X86_FEATURE_PSE))
err = vmemmap_populate_hugepages(start, end, node, altmap);
else if (altmap) {
pr_err_once("%s: no cpu support for altmap allocations\n",
Expand Down
19 changes: 4 additions & 15 deletions drivers/acpi/acpi_memhotplug.c
Original file line number Diff line number Diff line change
Expand Up @@ -155,16 +155,6 @@ static int acpi_memory_check_device(struct acpi_memory_device *mem_device)
return 0;
}

static unsigned long acpi_meminfo_start_pfn(struct acpi_memory_info *info)
{
return PFN_DOWN(info->start_addr);
}

static unsigned long acpi_meminfo_end_pfn(struct acpi_memory_info *info)
{
return PFN_UP(info->start_addr + info->length-1);
}

static int acpi_bind_memblk(struct memory_block *mem, void *arg)
{
return acpi_bind_one(&mem->dev, arg);
Expand All @@ -173,9 +163,8 @@ static int acpi_bind_memblk(struct memory_block *mem, void *arg)
static int acpi_bind_memory_blocks(struct acpi_memory_info *info,
struct acpi_device *adev)
{
return walk_memory_range(acpi_meminfo_start_pfn(info),
acpi_meminfo_end_pfn(info), adev,
acpi_bind_memblk);
return walk_memory_blocks(info->start_addr, info->length, adev,
acpi_bind_memblk);
}

static int acpi_unbind_memblk(struct memory_block *mem, void *arg)
Expand All @@ -186,8 +175,8 @@ static int acpi_unbind_memblk(struct memory_block *mem, void *arg)

static void acpi_unbind_memory_blocks(struct acpi_memory_info *info)
{
walk_memory_range(acpi_meminfo_start_pfn(info),
acpi_meminfo_end_pfn(info), NULL, acpi_unbind_memblk);
walk_memory_blocks(info->start_addr, info->length, NULL,
acpi_unbind_memblk);
}

static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
Expand Down
Loading

0 comments on commit 249be85

Please sign in to comment.