Skip to content

Commit

Permalink
Merge branch 'akpm' (patches from Andrew)
Browse files Browse the repository at this point in the history
Merge more updates from Andrew Morton:
 "190 patches.

  Subsystems affected by this patch series: mm (hugetlb, userfaultfd,
  vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock,
  migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap,
  zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc,
  core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs,
  signals, exec, kcov, selftests, compress/decompress, and ipc"

* emailed patches from Andrew Morton <[email protected]>: (190 commits)
  ipc/util.c: use binary search for max_idx
  ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock
  ipc: use kmalloc for msg_queue and shmid_kernel
  ipc sem: use kvmalloc for sem_undo allocation
  lib/decompressors: remove set but not used variabled 'level'
  selftests/vm/pkeys: exercise x86 XSAVE init state
  selftests/vm/pkeys: refill shadow register after implicit kernel write
  selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
  selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
  kcov: add __no_sanitize_coverage to fix noinstr for all architectures
  exec: remove checks in __register_bimfmt()
  x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned
  hfsplus: report create_date to kstat.btime
  hfsplus: remove unnecessary oom message
  nilfs2: remove redundant continue statement in a while-loop
  kprobes: remove duplicated strong free_insn_page in x86 and s390
  init: print out unknown kernel parameters
  checkpatch: do not complain about positive return values starting with EPOLL
  checkpatch: improve the indented label test
  checkpatch: scripts/spdxcheck.py now requires python3
  ...
  • Loading branch information
torvalds committed Jul 2, 2021
2 parents 3dbdb38 + b869d5b commit 71bd934
Show file tree
Hide file tree
Showing 299 changed files with 5,991 additions and 2,961 deletions.
21 changes: 21 additions & 0 deletions Documentation/admin-guide/kernel-parameters.txt
Original file line number Diff line number Diff line change
Expand Up @@ -1594,6 +1594,23 @@
Documentation/admin-guide/mm/hugetlbpage.rst.
Format: size[KMG]

hugetlb_free_vmemmap=
[KNL] Reguires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
enabled.
Allows heavy hugetlb users to free up some more
memory (6 * PAGE_SIZE for each 2MB hugetlb page).
Format: { on | off (default) }

on: enable the feature
off: disable the feature

Built with CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON=y,
the default is on.

This is not compatible with memory_hotplug.memmap_on_memory.
If both parameters are enabled, hugetlb_free_vmemmap takes
precedence over memory_hotplug.memmap_on_memory.

hung_task_panic=
[KNL] Should the hung task detector generate panics.
Format: 0 | 1
Expand Down Expand Up @@ -2860,6 +2877,10 @@
Note that even when enabled, there are a few cases where
the feature is not effective.

This is not compatible with hugetlb_free_vmemmap. If
both parameters are enabled, hugetlb_free_vmemmap takes
precedence over memory_hotplug.memmap_on_memory.

memtest= [KNL,X86,ARM,PPC,RISCV] Enable memtest
Format: <integer>
default : 0 <disable>
Expand Down
11 changes: 11 additions & 0 deletions Documentation/admin-guide/mm/hugetlbpage.rst
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,10 @@ HugePages_Surp
the pool above the value in ``/proc/sys/vm/nr_hugepages``. The
maximum number of surplus huge pages is controlled by
``/proc/sys/vm/nr_overcommit_hugepages``.
Note: When the feature of freeing unused vmemmap pages associated
with each hugetlb page is enabled, the number of surplus huge pages
may be temporarily larger than the maximum number of surplus huge
pages when the system is under memory pressure.
Hugepagesize
is the default hugepage size (in Kb).
Hugetlb
Expand All @@ -80,6 +84,10 @@ returned to the huge page pool when freed by a task. A user with root
privileges can dynamically allocate more or free some persistent huge pages
by increasing or decreasing the value of ``nr_hugepages``.

Note: When the feature of freeing unused vmemmap pages associated with each
hugetlb page is enabled, we can fail to free the huge pages triggered by
the user when ths system is under memory pressure. Please try again later.

Pages that are used as huge pages are reserved inside the kernel and cannot
be used for other purposes. Huge pages cannot be swapped out under
memory pressure.
Expand Down Expand Up @@ -145,6 +153,9 @@ default_hugepagesz

will all result in 256 2M huge pages being allocated. Valid default
huge page size is architecture dependent.
hugetlb_free_vmemmap
When CONFIG_HUGETLB_PAGE_FREE_VMEMMAP is set, this enables freeing
unused vmemmap pages associated with each HugeTLB page.

When multiple huge page sizes are supported, ``/proc/sys/vm/nr_hugepages``
indicates the current number of pre-allocated huge pages of the default size.
Expand Down
13 changes: 13 additions & 0 deletions Documentation/admin-guide/mm/memory-hotplug.rst
Original file line number Diff line number Diff line change
Expand Up @@ -357,6 +357,19 @@ creates ZONE_MOVABLE as following.
Unfortunately, there is no information to show which memory block belongs
to ZONE_MOVABLE. This is TBD.

Memory offlining can fail when dissolving a free huge page on ZONE_MOVABLE
and the feature of freeing unused vmemmap pages associated with each hugetlb
page is enabled.

This can happen when we have plenty of ZONE_MOVABLE memory, but not enough
kernel memory to allocate vmemmmap pages. We may even be able to migrate
huge page contents, but will not be able to dissolve the source huge page.
This will prevent an offline operation and is unfortunate as memory offlining
is expected to succeed on movable zones. Users that depend on memory hotplug
to succeed for movable zones should carefully consider whether the memory
savings gained from this feature are worth the risk of possibly not being
able to offline memory in certain situations.

.. note::
Techniques that rely on long-term pinnings of memory (especially, RDMA and
vfio) are fundamentally problematic with ZONE_MOVABLE and, therefore, memory
Expand Down
2 changes: 2 additions & 0 deletions Documentation/admin-guide/mm/pagemap.rst
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,8 @@ There are four components to pagemap:
* Bit 55 pte is soft-dirty (see
:ref:`Documentation/admin-guide/mm/soft-dirty.rst <soft_dirty>`)
* Bit 56 page exclusively mapped (since 4.2)
* Bit 57 pte is uffd-wp write-protected (since 5.13) (see
:ref:`Documentation/admin-guide/mm/userfaultfd.rst <userfaultfd>`)
* Bits 57-60 zero
* Bit 61 page is file-page or shared-anon (since 3.5)
* Bit 62 page swapped
Expand Down
3 changes: 2 additions & 1 deletion Documentation/admin-guide/mm/userfaultfd.rst
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,8 @@ events, except page fault notifications, may be generated:

- ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports
``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory
areas.
areas. ``UFFD_FEATURE_MINOR_SHMEM`` is the analogous feature indicating
support for shmem virtual memory areas.

The userland application should set the feature flags it intends to use
when invoking the ``UFFDIO_API`` ioctl, to request that those features be
Expand Down
7 changes: 2 additions & 5 deletions Documentation/core-api/kernel-api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -24,11 +24,8 @@ String Conversions
.. kernel-doc:: lib/vsprintf.c
:export:

.. kernel-doc:: include/linux/kernel.h
:functions: kstrtol

.. kernel-doc:: include/linux/kernel.h
:functions: kstrtoul
.. kernel-doc:: include/linux/kstrtox.h
:functions: kstrtol kstrtoul

.. kernel-doc:: lib/kstrtox.c
:export:
Expand Down
48 changes: 40 additions & 8 deletions Documentation/filesystems/proc.rst
Original file line number Diff line number Diff line change
Expand Up @@ -933,8 +933,15 @@ meminfo
~~~~~~~

Provides information about distribution and utilization of memory. This
varies by architecture and compile options. The following is from a
16GB PIII, which has highmem enabled. You may not have all of these fields.
varies by architecture and compile options. Some of the counters reported
here overlap. The memory reported by the non overlapping counters may not
add up to the overall memory usage and the difference for some workloads
can be substantial. In many cases there are other means to find out
additional memory using subsystem specific interfaces, for instance
/proc/net/sockstat for TCP memory allocations.

The following is from a 16GB PIII, which has highmem enabled.
You may not have all of these fields.

::

Expand Down Expand Up @@ -1913,18 +1920,20 @@ if precise results are needed.
3.8 /proc/<pid>/fdinfo/<fd> - Information about opened file
---------------------------------------------------------------
This file provides information associated with an opened file. The regular
files have at least three fields -- 'pos', 'flags' and 'mnt_id'. The 'pos'
represents the current offset of the opened file in decimal form [see lseek(2)
for details], 'flags' denotes the octal O_xxx mask the file has been
created with [see open(2) for details] and 'mnt_id' represents mount ID of
the file system containing the opened file [see 3.5 /proc/<pid>/mountinfo
for details].
files have at least four fields -- 'pos', 'flags', 'mnt_id' and 'ino'.
The 'pos' represents the current offset of the opened file in decimal
form [see lseek(2) for details], 'flags' denotes the octal O_xxx mask the
file has been created with [see open(2) for details] and 'mnt_id' represents
mount ID of the file system containing the opened file [see 3.5
/proc/<pid>/mountinfo for details]. 'ino' represents the inode number of
the file.

A typical output is::

pos: 0
flags: 0100002
mnt_id: 19
ino: 63107

All locks associated with a file descriptor are shown in its fdinfo too::

Expand All @@ -1941,6 +1950,7 @@ Eventfd files
pos: 0
flags: 04002
mnt_id: 9
ino: 63107
eventfd-count: 5a

where 'eventfd-count' is hex value of a counter.
Expand All @@ -1953,6 +1963,7 @@ Signalfd files
pos: 0
flags: 04002
mnt_id: 9
ino: 63107
sigmask: 0000000000000200

where 'sigmask' is hex value of the signal mask associated
Expand All @@ -1966,6 +1977,7 @@ Epoll files
pos: 0
flags: 02
mnt_id: 9
ino: 63107
tfd: 5 events: 1d data: ffffffffffffffff pos:0 ino:61af sdev:7

where 'tfd' is a target file descriptor number in decimal form,
Expand All @@ -1982,6 +1994,8 @@ For inotify files the format is the following::

pos: 0
flags: 02000000
mnt_id: 9
ino: 63107
inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d

where 'wd' is a watch descriptor in decimal form, i.e. a target file
Expand All @@ -2004,6 +2018,7 @@ For fanotify files the format is::
pos: 0
flags: 02
mnt_id: 9
ino: 63107
fanotify flags:10 event-flags:0
fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
Expand All @@ -2028,6 +2043,7 @@ Timerfd files
pos: 0
flags: 02
mnt_id: 9
ino: 63107
clockid: 0
ticks: 0
settime flags: 01
Expand All @@ -2042,6 +2058,22 @@ details]. 'it_value' is remaining time until the timer expiration.
with TIMER_ABSTIME option which will be shown in 'settime flags', but 'it_value'
still exhibits timer's remaining time.

DMA Buffer files
~~~~~~~~~~~~~~~~

::

pos: 0
flags: 04002
mnt_id: 9
ino: 63107
size: 32768
count: 2
exp_name: system-heap

where 'size' is the size of the DMA buffer in bytes. 'count' is the file count of
the DMA buffer file. 'exp_name' is the name of the DMA buffer exporter.

3.9 /proc/<pid>/map_files - Information about memory mapped files
---------------------------------------------------------------------
This directory contains symbolic links which represent memory mapped files
Expand Down
19 changes: 18 additions & 1 deletion Documentation/vm/hmm.rst
Original file line number Diff line number Diff line change
Expand Up @@ -332,7 +332,7 @@ between device driver specific code and shared common code:
walks to fill in the ``args->src`` array with PFNs to be migrated.
The ``invalidate_range_start()`` callback is passed a
``struct mmu_notifier_range`` with the ``event`` field set to
``MMU_NOTIFY_MIGRATE`` and the ``migrate_pgmap_owner`` field set to
``MMU_NOTIFY_MIGRATE`` and the ``owner`` field set to
the ``args->pgmap_owner`` field passed to migrate_vma_setup(). This is
allows the device driver to skip the invalidation callback and only
invalidate device private MMU mappings that are actually migrating.
Expand Down Expand Up @@ -405,6 +405,23 @@ between device driver specific code and shared common code:

The lock can now be released.

Exclusive access memory
=======================

Some devices have features such as atomic PTE bits that can be used to implement
atomic access to system memory. To support atomic operations to a shared virtual
memory page such a device needs access to that page which is exclusive of any
userspace access from the CPU. The ``make_device_exclusive_range()`` function
can be used to make a memory range inaccessible from userspace.

This replaces all mappings for pages in the given range with special swap
entries. Any attempt to access the swap entry results in a fault which is
resovled by replacing the entry with the original mapping. A driver gets
notified that the mapping has been changed by MMU notifiers, after which point
it will no longer have exclusive access to the page. Exclusive access is
guranteed to last until the driver drops the page lock and page reference, at
which point any CPU faults on the page may proceed as described.

Memory cgroup (memcg) and rss accounting
========================================

Expand Down
33 changes: 13 additions & 20 deletions Documentation/vm/unevictable-lru.rst
Original file line number Diff line number Diff line change
Expand Up @@ -389,14 +389,14 @@ mlocked, munlock_vma_page() updates that zone statistics for the number of
mlocked pages. Note, however, that at this point we haven't checked whether
the page is mapped by other VM_LOCKED VMAs.

We can't call try_to_munlock(), the function that walks the reverse map to
We can't call page_mlock(), the function that walks the reverse map to
check for other VM_LOCKED VMAs, without first isolating the page from the LRU.
try_to_munlock() is a variant of try_to_unmap() and thus requires that the page
page_mlock() is a variant of try_to_unmap() and thus requires that the page
not be on an LRU list [more on these below]. However, the call to
isolate_lru_page() could fail, in which case we couldn't try_to_munlock(). So,
isolate_lru_page() could fail, in which case we can't call page_mlock(). So,
we go ahead and clear PG_mlocked up front, as this might be the only chance we
have. If we can successfully isolate the page, we go ahead and
try_to_munlock(), which will restore the PG_mlocked flag and update the zone
have. If we can successfully isolate the page, we go ahead and call
page_mlock(), which will restore the PG_mlocked flag and update the zone
page statistics if it finds another VMA holding the page mlocked. If we fail
to isolate the page, we'll have left a potentially mlocked page on the LRU.
This is fine, because we'll catch it later if and if vmscan tries to reclaim
Expand Down Expand Up @@ -545,31 +545,24 @@ munlock or munmap system calls, mm teardown (munlock_vma_pages_all), reclaim,
holepunching, and truncation of file pages and their anonymous COWed pages.


try_to_munlock() Reverse Map Scan
page_mlock() Reverse Map Scan
---------------------------------

.. warning::
[!] TODO/FIXME: a better name might be page_mlocked() - analogous to the
page_referenced() reverse map walker.

When munlock_vma_page() [see section :ref:`munlock()/munlockall() System Call
Handling <munlock_munlockall_handling>` above] tries to munlock a
page, it needs to determine whether or not the page is mapped by any
VM_LOCKED VMA without actually attempting to unmap all PTEs from the
page. For this purpose, the unevictable/mlock infrastructure
introduced a variant of try_to_unmap() called try_to_munlock().
introduced a variant of try_to_unmap() called page_mlock().

try_to_munlock() calls the same functions as try_to_unmap() for anonymous and
mapped file and KSM pages with a flag argument specifying unlock versus unmap
processing. Again, these functions walk the respective reverse maps looking
for VM_LOCKED VMAs. When such a VMA is found, as in the try_to_unmap() case,
the functions mlock the page via mlock_vma_page() and return SWAP_MLOCK. This
undoes the pre-clearing of the page's PG_mlocked done by munlock_vma_page.
page_mlock() walks the respective reverse maps looking for VM_LOCKED VMAs. When
such a VMA is found the page is mlocked via mlock_vma_page(). This undoes the
pre-clearing of the page's PG_mlocked done by munlock_vma_page.

Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's
Note that page_mlock()'s reverse map walk must visit every VMA in a page's
reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA.
However, the scan can terminate when it encounters a VM_LOCKED VMA.
Although try_to_munlock() might be called a great many times when munlocking a
Although page_mlock() might be called a great many times when munlocking a
large region or tearing down a large address space that has been mlocked via
mlockall(), overall this is a fairly rare event.

Expand Down Expand Up @@ -602,7 +595,7 @@ inactive lists to the appropriate node's unevictable list.
shrink_inactive_list() should only see SHM_LOCK'd pages that became SHM_LOCK'd
after shrink_active_list() had moved them to the inactive list, or pages mapped
into VM_LOCKED VMAs that munlock_vma_page() couldn't isolate from the LRU to
recheck via try_to_munlock(). shrink_inactive_list() won't notice the latter,
recheck via page_mlock(). shrink_inactive_list() won't notice the latter,
but will pass on to shrink_page_list().

shrink_page_list() again culls obviously unevictable pages that it could
Expand Down
10 changes: 9 additions & 1 deletion MAINTAINERS
Original file line number Diff line number Diff line change
Expand Up @@ -7704,6 +7704,14 @@ L: [email protected]
S: Maintained
F: drivers/input/touchscreen/resistive-adc-touch.c

GENERIC STRING LIBRARY
R: Andy Shevchenko <[email protected]>
S: Maintained
F: lib/string.c
F: lib/string_helpers.c
F: lib/test_string.c
F: lib/test-string_helpers.c

GENERIC UIO DRIVER FOR PCI DEVICES
M: "Michael S. Tsirkin" <[email protected]>
L: [email protected]
Expand Down Expand Up @@ -11900,6 +11908,7 @@ F: include/linux/mmzone.h
F: include/linux/pagewalk.h
F: include/linux/vmalloc.h
F: mm/
F: tools/testing/selftests/vm/

MEMORY TECHNOLOGY DEVICES (MTD)
M: Miquel Raynal <[email protected]>
Expand Down Expand Up @@ -20307,7 +20316,6 @@ M: Seth Jennings <[email protected]>
M: Dan Streetman <[email protected]>
L: [email protected]
S: Maintained
F: include/linux/zbud.h
F: mm/zbud.c

ZD1211RW WIRELESS DRIVER
Expand Down
5 changes: 1 addition & 4 deletions arch/alpha/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@ config ALPHA
select MMU_GATHER_NO_RANGE
select SET_FS
select SPARSEMEM_EXTREME if SPARSEMEM
select ZONE_DMA
help
The Alpha is a 64-bit general-purpose processor designed and
marketed by the Digital Equipment Corporation of blessed memory,
Expand All @@ -65,10 +66,6 @@ config GENERIC_CALIBRATE_DELAY
bool
default y

config ZONE_DMA
bool
default y

config GENERIC_ISA_DMA
bool
default y
Expand Down
1 change: 0 additions & 1 deletion arch/alpha/include/asm/pgalloc.h
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,6 @@ pmd_populate(struct mm_struct *mm, pmd_t *pmd, pgtable_t pte)
{
pmd_set(pmd, (pte_t *)(page_to_pa(pte) + PAGE_OFFSET));
}
#define pmd_pgtable(pmd) pmd_page(pmd)

static inline void
pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd, pte_t *pte)
Expand Down
Loading

0 comments on commit 71bd934

Please sign in to comment.