Skip to content

Commit

Permalink
Merge branch 'akpm' (patches from Andrew)
Browse files Browse the repository at this point in the history
Merge more updates from Andrew Morton:
 "The remainder of the main mm/ queue.

  143 patches.

  Subsystems affected by this patch series (all mm): pagecache, hugetlb,
  userfaultfd, vmscan, compaction, migration, cma, ksm, vmstat, mmap,
  kconfig, util, memory-hotplug, zswap, zsmalloc, highmem, cleanups, and
  kfence"

* emailed patches from Andrew Morton <[email protected]>: (143 commits)
  kfence: use power-efficient work queue to run delayed work
  kfence: maximize allocation wait timeout duration
  kfence: await for allocation using wait_event
  kfence: zero guard page after out-of-bounds access
  mm/process_vm_access.c: remove duplicate include
  mm/mempool: minor coding style tweaks
  mm/highmem.c: fix coding style issue
  btrfs: use memzero_page() instead of open coded kmap pattern
  iov_iter: lift memzero_page() to highmem.h
  mm/zsmalloc: use BUG_ON instead of if condition followed by BUG.
  mm/zswap.c: switch from strlcpy to strscpy
  arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
  x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
  mm,memory_hotplug: add kernel boot option to enable memmap_on_memory
  acpi,memhotplug: enable MHP_MEMMAP_ON_MEMORY when supported
  mm,memory_hotplug: allocate memmap from the added memory range
  mm,memory_hotplug: factor out adjusting present pages into adjust_present_page_count()
  mm,memory_hotplug: relax fully spanned sections check
  drivers/base/memory: introduce memory_block_{online,offline}
  mm/memory_hotplug: remove broken locking of zone PCP structures during hot remove
  ...
  • Loading branch information
torvalds committed May 5, 2021
2 parents a79cdfb + 36f0b35 commit 8404c9f
Show file tree
Hide file tree
Showing 125 changed files with 3,386 additions and 1,458 deletions.
25 changes: 25 additions & 0 deletions Documentation/ABI/testing/sysfs-kernel-mm-cma
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
What: /sys/kernel/mm/cma/
Date: Feb 2021
Contact: Minchan Kim <[email protected]>
Description:
/sys/kernel/mm/cma/ contains a subdirectory for each CMA
heap name (also sometimes called CMA areas).

Each CMA heap subdirectory (that is, each
/sys/kernel/mm/cma/<cma-heap-name> directory) contains the
following items:

alloc_pages_success
alloc_pages_fail

What: /sys/kernel/mm/cma/<cma-heap-name>/alloc_pages_success
Date: Feb 2021
Contact: Minchan Kim <[email protected]>
Description:
the number of pages CMA API succeeded to allocate

What: /sys/kernel/mm/cma/<cma-heap-name>/alloc_pages_fail
Date: Feb 2021
Contact: Minchan Kim <[email protected]>
Description:
the number of pages CMA API failed to allocate
17 changes: 17 additions & 0 deletions Documentation/admin-guide/kernel-parameters.txt
Original file line number Diff line number Diff line change
Expand Up @@ -2804,6 +2804,23 @@
seconds. Use this parameter to check at some
other rate. 0 disables periodic checking.

memory_hotplug.memmap_on_memory
[KNL,X86,ARM] Boolean flag to enable this feature.
Format: {on | off (default)}
When enabled, runtime hotplugged memory will
allocate its internal metadata (struct pages)
from the hotadded memory which will allow to
hotadd a lot of memory without requiring
additional memory to do so.
This feature is disabled by default because it
has some implication on large (e.g. GB)
allocations in some configurations (e.g. small
memory blocks).
The state of the flag can be read in
/sys/module/memory_hotplug/parameters/memmap_on_memory.
Note that even when enabled, there are a few cases where
the feature is not effective.

memtest= [KNL,X86,ARM,PPC] Enable memtest
Format: <integer>
default : 0 <disable>
Expand Down
9 changes: 9 additions & 0 deletions Documentation/admin-guide/mm/memory-hotplug.rst
Original file line number Diff line number Diff line change
Expand Up @@ -357,6 +357,15 @@ creates ZONE_MOVABLE as following.
Unfortunately, there is no information to show which memory block belongs
to ZONE_MOVABLE. This is TBD.

.. note::
Techniques that rely on long-term pinnings of memory (especially, RDMA and
vfio) are fundamentally problematic with ZONE_MOVABLE and, therefore, memory
hot remove. Pinned pages cannot reside on ZONE_MOVABLE, to guarantee that
memory can still get hot removed - be aware that pinning can fail even if
there is plenty of free memory in ZONE_MOVABLE. In addition, using
ZONE_MOVABLE might make page pinning more expensive, because pages have to be
migrated off that zone first.

.. _memory_hotplug_how_to_offline_memory:

How to offline memory
Expand Down
107 changes: 66 additions & 41 deletions Documentation/admin-guide/mm/userfaultfd.rst
Original file line number Diff line number Diff line change
Expand Up @@ -63,68 +63,93 @@ the generic ioctl available.

The ``uffdio_api.features`` bitmask returned by the ``UFFDIO_API`` ioctl
defines what memory types are supported by the ``userfaultfd`` and what
events, except page fault notifications, may be generated.

If the kernel supports registering ``userfaultfd`` ranges on hugetlbfs
virtual memory areas, ``UFFD_FEATURE_MISSING_HUGETLBFS`` will be set in
``uffdio_api.features``. Similarly, ``UFFD_FEATURE_MISSING_SHMEM`` will be
set if the kernel supports registering ``userfaultfd`` ranges on shared
memory (covering all shmem APIs, i.e. tmpfs, ``IPCSHM``, ``/dev/zero``,
``MAP_SHARED``, ``memfd_create``, etc).

The userland application that wants to use ``userfaultfd`` with hugetlbfs
or shared memory need to set the corresponding flag in
``uffdio_api.features`` to enable those features.

If the userland desires to receive notifications for events other than
page faults, it has to verify that ``uffdio_api.features`` has appropriate
``UFFD_FEATURE_EVENT_*`` bits set. These events are described in more
detail below in `Non-cooperative userfaultfd`_ section.

Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should
be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to
register a memory range in the ``userfaultfd`` by setting the
events, except page fault notifications, may be generated:

- The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events
other than page faults are supported. These events are described in more
detail below in the `Non-cooperative userfaultfd`_ section.

- ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM``
indicate that the kernel supports ``UFFDIO_REGISTER_MODE_MISSING``
registrations for hugetlbfs and shared memory (covering all shmem APIs,
i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, ``MAP_SHARED``, ``memfd_create``,
etc) virtual memory areas, respectively.

- ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports
``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory
areas.

The userland application should set the feature flags it intends to use
when invoking the ``UFFDIO_API`` ioctl, to request that those features be
enabled if supported.

Once the ``userfaultfd`` API has been enabled the ``UFFDIO_REGISTER``
ioctl should be invoked (if present in the returned ``uffdio_api.ioctls``
bitmask) to register a memory range in the ``userfaultfd`` by setting the
uffdio_register structure accordingly. The ``uffdio_register.mode``
bitmask will specify to the kernel which kind of faults to track for
the range (``UFFDIO_REGISTER_MODE_MISSING`` would track missing
pages). The ``UFFDIO_REGISTER`` ioctl will return the
the range. The ``UFFDIO_REGISTER`` ioctl will return the
``uffdio_register.ioctls`` bitmask of ioctls that are suitable to resolve
userfaults on the range registered. Not all ioctls will necessarily be
supported for all memory types depending on the underlying virtual
memory backend (anonymous memory vs tmpfs vs real filebacked
mappings).
supported for all memory types (e.g. anonymous memory vs. shmem vs.
hugetlbfs), or all types of intercepted faults.

Userland can use the ``uffdio_register.ioctls`` to manage the virtual
address space in the background (to add or potentially also remove
memory from the ``userfaultfd`` registered range). This means a userfault
could be triggering just before userland maps in the background the
user-faulted page.

The primary ioctl to resolve userfaults is ``UFFDIO_COPY``. That
atomically copies a page into the userfault registered range and wakes
up the blocked userfaults
(unless ``uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE`` is set).
Other ioctl works similarly to ``UFFDIO_COPY``. They're atomic as in
guaranteeing that nothing can see an half copied page since it'll
keep userfaulting until the copy has finished.
Resolving Userfaults
--------------------

There are three basic ways to resolve userfaults:

- ``UFFDIO_COPY`` atomically copies some existing page contents from
userspace.

- ``UFFDIO_ZEROPAGE`` atomically zeros the new page.

- ``UFFDIO_CONTINUE`` maps an existing, previously-populated page.

These operations are atomic in the sense that they guarantee nothing can
see a half-populated page, since readers will keep userfaulting until the
operation has finished.

By default, these wake up userfaults blocked on the range in question.
They support a ``UFFDIO_*_MODE_DONTWAKE`` ``mode`` flag, which indicates
that waking will be done separately at some later time.

Which ioctl to choose depends on the kind of page fault, and what we'd
like to do to resolve it:

- For ``UFFDIO_REGISTER_MODE_MISSING`` faults, the fault needs to be
resolved by either providing a new page (``UFFDIO_COPY``), or mapping
the zero page (``UFFDIO_ZEROPAGE``). By default, the kernel would map
the zero page for a missing fault. With userfaultfd, userspace can
decide what content to provide before the faulting thread continues.

- For ``UFFDIO_REGISTER_MODE_MINOR`` faults, there is an existing page (in
the page cache). Userspace has the option of modifying the page's
contents before resolving the fault. Once the contents are correct
(modified or not), userspace asks the kernel to map the page and let the
faulting thread continue with ``UFFDIO_CONTINUE``.

Notes:

- If you requested ``UFFDIO_REGISTER_MODE_MISSING`` when registering then
you must provide some kind of page in your thread after reading from
the uffd. You must provide either ``UFFDIO_COPY`` or ``UFFDIO_ZEROPAGE``.
The normal behavior of the OS automatically providing a zero page on
an anonymous mmaping is not in place.
- You can tell which kind of fault occurred by examining
``pagefault.flags`` within the ``uffd_msg``, checking for the
``UFFD_PAGEFAULT_FLAG_*`` flags.

- None of the page-delivering ioctls default to the range that you
registered with. You must fill in all fields for the appropriate
ioctl struct including the range.

- You get the address of the access that triggered the missing page
event out of a struct uffd_msg that you read in the thread from the
uffd. You can supply as many pages as you want with ``UFFDIO_COPY`` or
``UFFDIO_ZEROPAGE``. Keep in mind that unless you used DONTWAKE then
the first of any of those IOCTLs wakes up the faulting thread.
uffd. You can supply as many pages as you want with these IOCTLs.
Keep in mind that unless you used DONTWAKE then the first of any of
those IOCTLs wakes up the faulting thread.

- Be sure to test for all errors including
(``pollfd[0].revents & POLLERR``). This can happen, e.g. when ranges
Expand Down
9 changes: 2 additions & 7 deletions arch/arc/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
config ARC
def_bool y
select ARC_TIMERS
select ARCH_HAS_CACHE_LINE_SIZE
select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DMA_PREP_COHERENT
select ARCH_HAS_PTE_SPECIAL
Expand All @@ -28,6 +29,7 @@ config ARC
select GENERIC_SMP_IDLE_THREAD
select HAVE_ARCH_KGDB
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE if ARC_MMU_V4
select HAVE_DEBUG_STACKOVERFLOW
select HAVE_DEBUG_KMEMLEAK
select HAVE_FUTEX_CMPXCHG if FUTEX
Expand All @@ -48,9 +50,6 @@ config ARC
select HAVE_ARCH_JUMP_LABEL if ISA_ARCV2 && !CPU_ENDIAN_BE32
select SET_FS

config ARCH_HAS_CACHE_LINE_SIZE
def_bool y

config TRACE_IRQFLAGS_SUPPORT
def_bool y

Expand Down Expand Up @@ -86,10 +85,6 @@ config STACKTRACE_SUPPORT
def_bool y
select STACKTRACE

config HAVE_ARCH_TRANSPARENT_HUGEPAGE
def_bool y
depends on ARC_MMU_V4

menu "ARC Architecture Configuration"

menu "ARC Platform/SoC/Board"
Expand Down
10 changes: 2 additions & 8 deletions arch/arm/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@ config ARM
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT if CPU_V7
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_HUGETLBFS if ARM_LPAE
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_USE_MEMTEST
Expand Down Expand Up @@ -77,6 +78,7 @@ config ARM
select HAVE_ARCH_SECCOMP_FILTER if AEABI && !OABI_COMPAT
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE if ARM_LPAE
select HAVE_ARM_SMCCC if CPU_V7
select HAVE_EBPF_JIT if !CPU_ENDIAN_BE32
select HAVE_CONTEXT_TRACKING
Expand Down Expand Up @@ -1511,14 +1513,6 @@ config HW_PERF_EVENTS
def_bool y
depends on ARM_PMU

config SYS_SUPPORTS_HUGETLBFS
def_bool y
depends on ARM_LPAE

config HAVE_ARCH_TRANSPARENT_HUGEPAGE
def_bool y
depends on ARM_LPAE

config ARCH_WANT_GENERAL_HUGETLB
def_bool y

Expand Down
30 changes: 9 additions & 21 deletions arch/arm64/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,12 @@ config ARM64
select ACPI_PPTT if ACPI
select ARCH_HAS_DEBUG_WX
select ARCH_BINFMT_ELF_STATE
select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
select ARCH_ENABLE_MEMORY_HOTPLUG
select ARCH_ENABLE_MEMORY_HOTREMOVE
select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
select ARCH_HAS_CACHE_LINE_SIZE
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DMA_PREP_COHERENT
Expand Down Expand Up @@ -72,6 +78,7 @@ config ARM64
select ARCH_USE_QUEUED_SPINLOCKS
select ARCH_USE_SYM_ANNOTATIONS
select ARCH_SUPPORTS_DEBUG_PAGEALLOC
select ARCH_SUPPORTS_HUGETLBFS
select ARCH_SUPPORTS_MEMORY_FAILURE
select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK
select ARCH_SUPPORTS_LTO_CLANG if CPU_LITTLE_ENDIAN
Expand Down Expand Up @@ -213,6 +220,7 @@ config ARM64
select SWIOTLB
select SYSCTL_EXCEPTION_TRACE
select THREAD_INFO_IN_TASK
select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD
help
ARM 64-bit (AArch64) Linux support.

Expand Down Expand Up @@ -308,10 +316,7 @@ config ZONE_DMA32
bool "Support DMA32 zone" if EXPERT
default y

config ARCH_ENABLE_MEMORY_HOTPLUG
def_bool y

config ARCH_ENABLE_MEMORY_HOTREMOVE
config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
def_bool y

config SMP
Expand Down Expand Up @@ -1070,18 +1075,9 @@ config HW_PERF_EVENTS
def_bool y
depends on ARM_PMU

config SYS_SUPPORTS_HUGETLBFS
def_bool y

config ARCH_HAS_CACHE_LINE_SIZE
def_bool y

config ARCH_HAS_FILTER_PGPROT
def_bool y

config ARCH_ENABLE_SPLIT_PMD_PTLOCK
def_bool y if PGTABLE_LEVELS > 2

# Supported by clang >= 7.0
config CC_HAVE_SHADOW_CALL_STACK
def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18)
Expand Down Expand Up @@ -1923,14 +1919,6 @@ config SYSVIPC_COMPAT
def_bool y
depends on COMPAT && SYSVIPC

config ARCH_ENABLE_HUGEPAGE_MIGRATION
def_bool y
depends on HUGETLB_PAGE && MIGRATION

config ARCH_ENABLE_THP_MIGRATION
def_bool y
depends on TRANSPARENT_HUGEPAGE

menu "Power management options"

source "kernel/power/Kconfig"
Expand Down
7 changes: 3 additions & 4 deletions arch/arm64/mm/hugetlbpage.c
Original file line number Diff line number Diff line change
Expand Up @@ -252,7 +252,7 @@ void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
set_pte(ptep, pte);
}

pte_t *huge_pte_alloc(struct mm_struct *mm,
pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, unsigned long sz)
{
pgd_t *pgdp;
Expand Down Expand Up @@ -284,9 +284,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
*/
ptep = pte_alloc_map(mm, pmdp, addr);
} else if (sz == PMD_SIZE) {
if (IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) &&
pud_none(READ_ONCE(*pudp)))
ptep = huge_pmd_share(mm, addr, pudp);
if (want_pmd_share(vma, addr) && pud_none(READ_ONCE(*pudp)))
ptep = huge_pmd_share(mm, vma, addr, pudp);
else
ptep = (pte_t *)pmd_alloc(mm, pudp, addr);
} else if (sz == (CONT_PMD_SIZE)) {
Expand Down
Loading

0 comments on commit 8404c9f

Please sign in to comment.