Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:

 - a few random little subsystems

 - almost all of the MM patches which are staged ahead of linux-next
   material. I'll trickle to post-linux-next work in as the dependents
   get merged up.

Subsystems affected by this patch series: kthread, kbuild, ide, ntfs,
ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache,
gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation,
kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction,
oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc,
uaccess, zram, and cleanups).

* emailed patches from Andrew Morton <[email protected]>: (200 commits)
  mm: cleanup kstrto*() usage
  mm: fix fall-through warnings for Clang
  mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at
  mm: shmem: convert shmem_enabled_show to use sysfs_emit_at
  mm:backing-dev: use sysfs_emit in macro defining functions
  mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening
  mm: use sysfs_emit for struct kobject * uses
  mm: fix kernel-doc markups
  zram: break the strict dependency from lzo
  zram: add stat to gather incompressible pages since zram set up
  zram: support page writeback
  mm/process_vm_access: remove redundant initialization of iov_r
  mm/zsmalloc.c: rework the list_add code in insert_zspage()
  mm/zswap: move to use crypto_acomp API for hardware acceleration
  mm/zswap: fix passing zero to 'PTR_ERR' warning
  mm/zswap: make struct kernel_param_ops definitions const
  userfaultfd/selftests: hint the test runner on required privilege
  userfaultfd/selftests: fix retval check for userfaultfd_open()
  userfaultfd/selftests: always dump something in modes
  userfaultfd: selftests: make __{s,u}64 format specifiers portable
  ...
torvalds committed Dec 15, 2020
2 parents 148842c + dfefd22 commit ac73e3d
Showing 216 changed files with 4,331 additions and 2,882 deletions.
6 changes: 6 additions & 0 deletions Documentation/admin-guide/blockdev/zram.rst
@@ -266,6 +266,7 @@ line of text and contains the following stats separated by whitespace:
No memory is allocated for such pages.
pages_compacted the number of pages freed during compaction
huge_pages the number of incompressible pages
huge_pages_since the number of incompressible pages since zram set up
================ =============================================================

File /sys/block/zram<id>/bd_stat
@@ -334,6 +335,11 @@ Admin can request writeback of those idle pages at right timing via::

With the command, zram writeback idle pages from memory to the storage.

If admin want to write a specific page in zram device to backing device,
they could write a page index into the interface.

echo "page_index=1251" > /sys/block/zramX/writeback

If there are lots of write IO with flash device, potentially, it has
flash wearout problem so that admin needs to design write limitation
to guarantee storage health for entire product life.
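
The per-page writeback request documented in this hunk is a short formatted string written to the writeback node. A minimal C sketch of that step follows. The helper names zram_format_page_writeback and zram_request_page_writeback are hypothetical (they are not part of the kernel or of any library), and the caller is assumed to supply the path, for example /sys/block/zramX/writeback with the real device number filled in.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the "page_index=<n>" request string.
 * Returns the string length, or -1 if the buffer is too small.
 */
int zram_format_page_writeback(char *buf, size_t len, unsigned long index)
{
	int n = snprintf(buf, len, "page_index=%lu", index);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}

/*
 * Hypothetical helper: write the request to the given writeback node.
 * Returns 0 on success, -1 on any formatting or I/O error.
 */
int zram_request_page_writeback(const char *path, unsigned long index)
{
	char cmd[64];
	FILE *f;
	int rc;

	assert(path != NULL);
	if (zram_format_page_writeback(cmd, sizeof(cmd), index) < 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	rc = (fputs(cmd, f) == EOF) ? -1 : 0;
	/* Errors from the kernel may only surface when the stream is flushed. */
	if (fclose(f) == EOF)
		rc = -1;
	return rc;
}
```

Calling zram_request_page_writeback(path, 1251) would correspond to the echo command shown in the added documentation. Keeping the formatting separate from the write makes the string easy to check without touching sysfs.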
8 changes: 3 additions & 5 deletions Documentation/admin-guide/cgroup-v1/memcg_test.rst
@@ -219,13 +219,11 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.

This is an easy way to test page migration, too.

9.5 mkdir/rmdir
---------------
9.5 nested cgroups
------------------

When using hierarchy, mkdir/rmdir test should be done.
Use tests like the following::
Use tests like the following for testing nested cgroups::

echo 1 >/opt/cgroup/01/memory/use_hierarchy
mkdir /opt/cgroup/01/child_a
mkdir /opt/cgroup/01/child_b

40 changes: 13 additions & 27 deletions Documentation/admin-guide/cgroup-v1/memory.rst
@@ -77,6 +77,8 @@ Brief summary of control files.
memory.soft_limit_in_bytes set/show soft limit of memory usage
memory.stat show various statistics
memory.use_hierarchy set/show hierarchical account enabled
This knob is deprecated and shouldn't be
used.
memory.force_empty trigger forced page reclaim
memory.pressure_level set memory pressure notifications
memory.swappiness set/show swappiness parameter of vmscan
@@ -495,16 +497,13 @@ cgroup might have some charge associated with it, even though all
tasks have migrated away from it. (because we charge against pages, not
against tasks.)

We move the stats to root (if use_hierarchy==0) or parent (if
use_hierarchy==1), and no change on the charge except uncharging
We move the stats to parent, and no change on the charge except uncharging
from the child.

Charges recorded in swap information is not updated at removal of cgroup.
Recorded information is discarded and a cgroup which uses swap (swapcache)
will be charged as a new owner of it.

About use_hierarchy, see Section 6.

5. Misc. interfaces
===================

@@ -527,8 +526,6 @@ About use_hierarchy, see Section 6.
write will still return success. In this case, it is expected that
memory.kmem.usage_in_bytes == memory.usage_in_bytes.

About use_hierarchy, see Section 6.

5.2 stat file
-------------

@@ -675,31 +672,20 @@ hierarchy::
d e

In the diagram above, with hierarchical accounting enabled, all memory
usage of e, is accounted to its ancestors up until the root (i.e, c and root),
that has memory.use_hierarchy enabled. If one of the ancestors goes over its
limit, the reclaim algorithm reclaims from the tasks in the ancestor and the
children of the ancestor.

6.1 Enabling hierarchical accounting and reclaim
------------------------------------------------

A memory cgroup by default disables the hierarchy feature. Support
can be enabled by writing 1 to memory.use_hierarchy file of the root cgroup::
usage of e, is accounted to its ancestors up until the root (i.e, c and root).
If one of the ancestors goes over its limit, the reclaim algorithm reclaims
from the tasks in the ancestor and the children of the ancestor.

# echo 1 > memory.use_hierarchy

The feature can be disabled by::
6.1 Hierarchical accounting and reclaim
---------------------------------------

# echo 0 > memory.use_hierarchy
Hierarchical accounting is enabled by default. Disabling the hierarchical
accounting is deprecated. An attempt to do it will result in a failure
and a warning printed to dmesg.

NOTE1:
Enabling/disabling will fail if either the cgroup already has other
cgroups created below it, or if the parent cgroup has use_hierarchy
enabled.
For compatibility reasons writing 1 to memory.use_hierarchy will always pass::

NOTE2:
When panic_on_oom is set to "2", the whole system will panic in
case of an OOM event in any cgroup.
# echo 1 > memory.use_hierarchy

7. Soft limits
==============
11 changes: 11 additions & 0 deletions Documentation/admin-guide/cgroup-v2.rst
@@ -1274,6 +1274,9 @@ PAGE_SIZE multiple when read back.
kernel_stack
Amount of memory allocated to kernel stacks.

pagetables
Amount of memory allocated for page tables.

percpu(npn)
Amount of memory used for storing per-cpu kernel
data structures.
@@ -1300,6 +1303,14 @@ PAGE_SIZE multiple when read back.
Amount of memory used in anonymous mappings backed by
transparent hugepages

file_thp
Amount of cached filesystem data backed by transparent
hugepages

shmem_thp
Amount of shm, tmpfs, shared anonymous mmap()s backed by
transparent hugepages

inactive_anon, active_anon, inactive_file, active_file, unevictable
Amount of memory, swap-backed and filesystem-backed,
on the internal memory management lists used by the
15 changes: 0 additions & 15 deletions Documentation/admin-guide/mm/transhuge.rst
@@ -401,21 +401,6 @@ compact_fail
is incremented if the system tries to compact memory
but failed.

compact_pages_moved
is incremented each time a page is moved. If
this value is increasing rapidly, it implies that the system
is copying a lot of data to satisfy the huge page allocation.
It is possible that the cost of copying exceeds any savings
from reduced TLB misses.

compact_pagemigrate_failed
is incremented when the underlying mechanism
for moving a page failed.

compact_blocks_moved
is incremented each time memory compaction examines
a huge page aligned range of pages.

It is possible to establish how long the stalls were using the function
tracer to record how long was spent in __alloc_pages_nodemask and
using the mm_page_alloc tracepoint to identify which allocations were
15 changes: 10 additions & 5 deletions Documentation/admin-guide/sysctl/vm.rst
@@ -873,12 +873,17 @@ file-backed pages is less than the high watermark in a zone.
unprivileged_userfaultfd
========================

This flag controls whether unprivileged users can use the userfaultfd
system calls. Set this to 1 to allow unprivileged users to use the
userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
privileged users (with SYS_CAP_PTRACE capability).
This flag controls the mode in which unprivileged users can use the
userfaultfd system calls. Set this to 0 to restrict unprivileged users
to handle page faults in user mode only. In this case, users without
SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
succeed. Prohibiting use of userfaultfd for handling faults from kernel
mode may make certain vulnerabilities more difficult to exploit.

The default value is 1.
Set this to 1 to allow unprivileged users to use the userfaultfd system
calls without any restrictions.

The default value is 0.
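
With the new default of 0, an unprivileged caller has to request user-mode-only fault handling explicitly. A minimal sketch of such a call through the raw syscall follows, assuming a Linux build environment with the uapi headers installed. The names uffd_open_flags and uffd_open are hypothetical helpers, and the fallback definition of UFFD_USER_MODE_ONLY is an assumption made only so the sketch builds against older headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* Assumption: older uapi headers lack this flag; 1 is the uapi value. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/* Hypothetical helper: compute the flags argument for userfaultfd(2). */
int uffd_open_flags(int user_mode_only)
{
	int flags = O_CLOEXEC | O_NONBLOCK;

	assert(user_mode_only == 0 || user_mode_only == 1);
	if (user_mode_only)
		flags |= UFFD_USER_MODE_ONLY;
	return flags;
}

/*
 * Hypothetical helper: open a userfaultfd descriptor.
 * Returns the descriptor, or -1 with errno set by the kernel.
 */
int uffd_open(int user_mode_only)
{
	return (int)syscall(SYS_userfaultfd, uffd_open_flags(user_mode_only));
}
```

Whether uffd_open() succeeds depends on the sysctl value and on the caller's privileges, so the sketch leaves errno handling to the caller.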


user_reserve_kbytes
4 changes: 4 additions & 0 deletions Documentation/core-api/memory-allocation.rst
@@ -147,6 +147,10 @@ The address of a chunk allocated with `kmalloc` is aligned to at least
ARCH_KMALLOC_MINALIGN bytes. For sizes which are a power of two, the
alignment is also guaranteed to be at least the respective size.

Chunks allocated with kmalloc() can be resized with krealloc(). Similarly
to kmalloc_array(): a helper for resizing arrays is provided in the form of
krealloc_array().

For large allocations you can use vmalloc() and vzalloc(), or directly
request pages from the page allocator. The memory allocated by `vmalloc`
and related functions is not physically contiguous.
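
krealloc_array() mirrors kmalloc_array() in taking the element count and element size separately, so the multiplication can be checked for overflow before anything is resized. A user-space analogue of that idea is sketched below. It is an illustration only: realloc_array_checked is a hypothetical name, it wraps realloc() rather than any kernel allocator, and it has no GFP flags argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical user-space analogue of an overflow-checked array resize.
 * Returns NULL if new_n * elem_size would overflow size_t or if realloc()
 * fails; in both cases the original allocation is left untouched.
 */
void *realloc_array_checked(void *p, size_t new_n, size_t elem_size)
{
	/* Precondition of this sketch: elements have a nonzero size. */
	assert(elem_size > 0);

	if (new_n > SIZE_MAX / elem_size)
		return NULL;
	return realloc(p, new_n * elem_size);
}
```

As with plain realloc(), the caller should assign the result to a temporary pointer first, so that the old buffer is not leaked when the resize fails.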
6 changes: 3 additions & 3 deletions Documentation/core-api/pin_user_pages.rst
@@ -221,12 +221,12 @@ Unit testing
============
This file::

tools/testing/selftests/vm/gup_benchmark.c
tools/testing/selftests/vm/gup_test.c

has the following new calls to exercise the new pin*() wrapper functions:

* PIN_FAST_BENCHMARK (./gup_benchmark -a)
* PIN_BENCHMARK (./gup_benchmark -b)
* PIN_FAST_BENCHMARK (./gup_test -a)
* PIN_BASIC_TEST (./gup_test -b)

You can monitor how many total dma-pinned pages have been acquired and released
since the system was booted, via two new /proc/vmstat entries: ::
5 changes: 3 additions & 2 deletions Documentation/dev-tools/kasan.rst
@@ -190,8 +190,9 @@ function calls GCC directly inserts the code to check the shadow memory.
This option significantly enlarges kernel but it gives x1.1-x2 performance
boost over outline instrumented kernel.

Generic KASAN prints up to 2 call_rcu() call stacks in reports, the last one
and the second to last.
Generic KASAN also reports the last 2 call stacks to creation of work that
potentially has access to an object. Call stacks for the following are shown:
call_rcu() and workqueue queuing.

Software tag-based KASAN
~~~~~~~~~~~~~~~~~~~~~~~~
8 changes: 4 additions & 4 deletions Documentation/filesystems/tmpfs.rst
@@ -4,7 +4,7 @@
Tmpfs
=====

Tmpfs is a file system which keeps all files in virtual memory.
Tmpfs is a file system which keeps all of its files in virtual memory.


Everything in tmpfs is temporary in the sense that no files will be
@@ -35,7 +35,7 @@ tmpfs has the following uses:
memory.

This mount does not depend on CONFIG_TMPFS. If CONFIG_TMPFS is not
set, the user visible part of tmpfs is not build. But the internal
set, the user visible part of tmpfs is not built. But the internal
mechanisms are always present.

2) glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for
@@ -50,7 +50,7 @@ tmpfs has the following uses:
This mount is _not_ needed for SYSV shared memory. The internal
mount is used for that. (In the 2.3 kernel versions it was
necessary to mount the predecessor of tmpfs (shm fs) to use SYSV
shared memory)
shared memory.)

3) Some people (including me) find it very convenient to mount it
e.g. on /tmp and /var/tmp and have a big swap partition. And now
@@ -83,7 +83,7 @@ If nr_blocks=0 (or size=0), blocks will not be limited in that instance;
if nr_inodes=0, inodes will not be limited. It is generally unwise to
mount with such options, since it allows any user with write access to
use up all the memory on the machine; but enhances the scalability of
that instance in a system with many cpus making intensive use of it.
that instance in a system with many CPUs making intensive use of it.


tmpfs has a mount option to set the NUMA memory allocation policy for
3 changes: 1 addition & 2 deletions Documentation/vm/memory-model.rst
@@ -51,8 +51,7 @@ call :c:func:`free_area_init` function. Yet, the mappings array is not
usable until the call to :c:func:`memblock_free_all` that hands all the
memory to the page allocator.

If an architecture enables `CONFIG_ARCH_HAS_HOLES_MEMORYMODEL` option,
it may free parts of the `mem_map` array that do not cover the
An architecture may free parts of the `mem_map` array that do not cover the
actual physical pages. In such case, the architecture specific
:c:func:`pfn_valid` implementation should take the holes in the
`mem_map` into account.
12 changes: 6 additions & 6 deletions Documentation/vm/page_owner.rst
@@ -41,17 +41,17 @@ size change due to this facility.
- Without page owner::

text data bss dec hex filename
40662 1493 644 42799 a72f mm/page_alloc.o
48392 2333 644 51369 c8a9 mm/page_alloc.o

- With page owner::

text data bss dec hex filename
40892 1493 644 43029 a815 mm/page_alloc.o
1427 24 8 1459 5b3 mm/page_ext.o
2722 50 0 2772 ad4 mm/page_owner.o
48800 2445 644 51889 cab1 mm/page_alloc.o
6574 108 29 6711 1a37 mm/page_owner.o
1025 8 8 1041 411 mm/page_ext.o

Although, roughly, 4 KB code is added in total, page_alloc.o increase by
230 bytes and only half of it is in hotpath. Building the kernel with
Although, roughly, 8 KB code is added in total, page_alloc.o increase by
520 bytes and less than half of it is in hotpath. Building the kernel with
page owner and turning it on if needed would be great option to debug
kernel memory problem.

21 changes: 17 additions & 4 deletions arch/Kconfig
@@ -261,7 +261,7 @@ config ARCH_HAS_SET_DIRECT_MAP

#
# Select if the architecture provides the arch_dma_set_uncached symbol to
# either provide an uncached segement alias for a DMA allocation, or
# either provide an uncached segment alias for a DMA allocation, or
# to remap the page tables in place.
#
config ARCH_HAS_DMA_SET_UNCACHED
@@ -314,14 +314,14 @@ config ARCH_32BIT_OFF_T
config HAVE_ASM_MODVERSIONS
bool
help
This symbol should be selected by an architecure if it provides
This symbol should be selected by an architecture if it provides
<asm/asm-prototypes.h> to support the module versioning for symbols
exported from assembly code.

config HAVE_REGS_AND_STACK_ACCESS_API
bool
help
This symbol should be selected by an architecure if it supports
This symbol should be selected by an architecture if it supports
the API needed to access registers and stack entries from pt_regs,
declared in asm/ptrace.h
For example the kprobes-based event tracer needs this API.
@@ -336,7 +336,7 @@ config HAVE_RSEQ
config HAVE_FUNCTION_ARG_ACCESS_API
bool
help
This symbol should be selected by an architecure if it supports
This symbol should be selected by an architecture if it supports
the API needed to access function arguments from pt_regs,
declared in asm/ptrace.h

@@ -665,6 +665,13 @@ config HAVE_IRQ_TIME_ACCOUNTING
Archs need to ensure they use a high enough resolution clock to
support irq time accounting and then call enable_sched_clock_irqtime().

config HAVE_MOVE_PUD
bool
help
Architectures that select this are able to move page tables at the
PUD level. If there are only 3 page table levels, the move effectively
happens at the PGD level.

config HAVE_MOVE_PMD
bool
help
@@ -1054,6 +1061,12 @@ config ARCH_WANT_LD_ORPHAN_WARN
by the linker, since the locations of such sections can change between linker
versions.

config HAVE_ARCH_PFN_VALID
bool

config ARCH_SUPPORTS_DEBUG_PAGEALLOC
bool

source "kernel/gcov/Kconfig"

source "scripts/gcc-plugins/Kconfig"
8 changes: 8 additions & 0 deletions arch/alpha/Kconfig
@@ -40,6 +40,7 @@ config ALPHA
select CPU_NO_EFFICIENT_FFS if !ALPHA_EV67
select MMU_GATHER_NO_RANGE
select SET_FS
select SPARSEMEM_EXTREME if SPARSEMEM
help
The Alpha is a 64-bit general-purpose processor designed and
marketed by the Digital Equipment Corporation of blessed memory,
@@ -551,12 +552,19 @@ config NR_CPUS

config ARCH_DISCONTIGMEM_ENABLE
bool "Discontiguous Memory Support"
depends on BROKEN
help
Say Y to support efficient handling of discontiguous physical memory,
for architectures which are either NUMA (Non-Uniform Memory Access)
or have huge holes in the physical address space for other reasons.
See <file:Documentation/vm/numa.rst> for more.

config ARCH_SPARSEMEM_ENABLE
bool "Sparse Memory Support"
help
Say Y to support efficient handling of discontiguous physical memory,
for systems that have huge holes in the physical address space.

config NUMA
bool "NUMA Support (EXPERIMENTAL)"
depends on DISCONTIGMEM && BROKEN