Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "191 patches.

  Subsystems affected by this patch series: kthread, ia64, scripts,
  ntfs, squashfs, ocfs2, kernel/watchdog, and mm (gup, pagealloc, slab,
  slub, kmemleak, dax, debug, pagecache, gup, swap, memcg, pagemap,
  mprotect, bootmem, dma, tracing, vmalloc, kasan, initialization,
  pagealloc, and memory-failure)"

* emailed patches from Andrew Morton <[email protected]>: (191 commits)
  mm,hwpoison: make get_hwpoison_page() call get_any_page()
  mm,hwpoison: send SIGBUS with error virtual address
  mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes
  mm/page_alloc: allow high-order pages to be stored on the per-cpu lists
  mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM
  mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA
  docs: remove description of DISCONTIGMEM
  arch, mm: remove stale mentions of DISCONTIGMEM
  mm: remove CONFIG_DISCONTIGMEM
  m68k: remove support for DISCONTIGMEM
  arc: remove support for DISCONTIGMEM
  arc: update comment about HIGHMEM implementation
  alpha: remove DISCONTIGMEM and NUMA
  mm/page_alloc: move free_the_page
  mm/page_alloc: fix counting of managed_pages
  mm/page_alloc: improve memmap_pages dbg msg
  mm: drop SECTION_SHIFT in code comments
  mm/page_alloc: introduce vm.percpu_pagelist_high_fraction
  mm/page_alloc: limit the number of pages on PCP lists when reclaim is active
  mm/page_alloc: scale the number of pages that are batch freed
  ...
torvalds committed Jun 30, 2021
2 parents 349a2d5 + 0ed950d commit 65090f3
Showing 259 changed files with 3,757 additions and 2,804 deletions.
6 changes: 6 additions & 0 deletions Documentation/admin-guide/kernel-parameters.txt
@@ -3591,6 +3591,12 @@
off: turn off poisoning (default)
on: turn on poisoning

+page_reporting.page_reporting_order=
+[KNL] Minimal page reporting order
+Format: <integer>
+Adjust the minimal page reporting order. The page
+reporting is disabled when it exceeds (MAX_ORDER-1).
+
panic= [KNL] Kernel behaviour on panic: delay <timeout>
timeout > 0: seconds before rebooting
timeout = 0: wait forever
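A minimal sketch in C of the rule this hunk documents; MAX_ORDER below is an illustrative stand-in for the architecture-dependent kernel constant, not the kernel's own code.

    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_ORDER 11	/* illustrative default; per-architecture in reality */

    /* Page reporting is effectively disabled when the requested minimal
     * order exceeds MAX_ORDER - 1. */
    static bool reporting_enabled(unsigned int page_reporting_order)
    {
    	return page_reporting_order <= MAX_ORDER - 1;
    }

    int main(void)
    {
    	printf("order 9:  %d\n", reporting_enabled(9));		/* 1: enabled */
    	printf("order 11: %d\n", reporting_enabled(11));	/* 0: disabled */
    	return 0;
    }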
4 changes: 2 additions & 2 deletions Documentation/admin-guide/lockup-watchdogs.rst
@@ -39,15 +39,15 @@ in principle, they should work in any architecture where these
subsystems are present.

A periodic hrtimer runs to generate interrupts and kick the watchdog
-task. An NMI perf event is generated every "watchdog_thresh"
+job. An NMI perf event is generated every "watchdog_thresh"
(compile-time initialized to 10 and configurable through sysctl of the
same name) seconds to check for hardlockups. If any CPU in the system
does not receive any hrtimer interrupt during that time the
'hardlockup detector' (the handler for the NMI perf event) will
generate a kernel warning or call panic, depending on the
configuration.

-The watchdog task is a high priority kernel thread that updates a
+The watchdog job runs in a stop scheduling thread that updates a
timestamp every time it is scheduled. If that timestamp is not updated
for 2*watchdog_thresh seconds (the softlockup threshold) the
'softlockup detector' (coded inside the hrtimer callback function)
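A rough sketch of the two checks described above; every name here is illustrative rather than the kernel's actual implementation.

    #include <stdbool.h>

    static int watchdog_thresh = 10;	/* sysctl default, in seconds */

    /* Softlockup: the hrtimer callback compares the timestamp last updated
     * by the watchdog job against the 2*watchdog_thresh threshold. */
    static bool is_softlockup(unsigned long now, unsigned long touch_ts)
    {
    	return now - touch_ts > 2 * (unsigned long)watchdog_thresh;
    }

    /* Hardlockup: the NMI perf event fires every watchdog_thresh seconds
     * and checks whether any hrtimer interrupt arrived since the last check. */
    static bool is_hardlockup(unsigned long hrtimer_intr_count,
    			  unsigned long saved_count)
    {
    	return hrtimer_intr_count == saved_count;
    }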
10 changes: 5 additions & 5 deletions Documentation/admin-guide/sysctl/kernel.rst
@@ -1297,11 +1297,11 @@ This parameter can be used to control the soft lockup detector.
= =================================

The soft lockup detector monitors CPUs for threads that are hogging the CPUs
-without rescheduling voluntarily, and thus prevent the 'watchdog/N' threads
-from running. The mechanism depends on the CPUs ability to respond to timer
-interrupts which are needed for the 'watchdog/N' threads to be woken up by
-the watchdog timer function, otherwise the NMI watchdog — if enabled — can
-detect a hard lockup condition.
+without rescheduling voluntarily, and thus prevent the 'migration/N' threads
+from running, causing the watchdog work to fail to execute. The mechanism depends
+on the CPUs ability to respond to timer interrupts which are needed for the
+watchdog work to be queued by the watchdog timer function, otherwise the NMI
+watchdog — if enabled — can detect a hard lockup condition.


stack_erasing
42 changes: 22 additions & 20 deletions Documentation/admin-guide/sysctl/vm.rst
@@ -64,7 +64,7 @@ Currently, these files are in /proc/sys/vm:
- overcommit_ratio
- page-cluster
- panic_on_oom
-- percpu_pagelist_fraction
+- percpu_pagelist_high_fraction
- stat_interval
- stat_refresh
- numa_stat
@@ -790,22 +790,24 @@ panic_on_oom=2+kdump gives you very strong tool to investigate
why oom happens. You can get snapshot.


-percpu_pagelist_fraction
-========================
+percpu_pagelist_high_fraction
+=============================

-This is the fraction of pages at most (high mark pcp->high) in each zone that
-are allocated for each per cpu page list. The min value for this is 8. It
-means that we don't allow more than 1/8th of pages in each zone to be
-allocated in any single per_cpu_pagelist. This entry only changes the value
-of hot per cpu pagelists. User can specify a number like 100 to allocate
-1/100th of each zone to each per cpu page list.
+This is the fraction of pages in each zone that can be stored on
+per-cpu page lists. It is an upper boundary that is divided depending
+on the number of online CPUs. The min value for this is 8 which means
+that we do not allow more than 1/8th of pages in each zone to be stored
+on per-cpu page lists. This entry only changes the value of hot per-cpu
+page lists. A user can specify a number like 100 to allocate 1/100th of
+each zone between per-cpu lists.

-The batch value of each per cpu pagelist is also updated as a result. It is
-set to pcp->high/4. The upper limit of batch is (PAGE_SHIFT * 8)
+The batch value of each per-cpu page list remains the same regardless of
+the value of the high fraction so allocation latencies are unaffected.

-The initial value is zero. Kernel does not use this value at boot time to set
-the high water marks for each per cpu page list. If the user writes '0' to this
-sysctl, it will revert to this default behavior.
+The initial value is zero. Kernel uses this value to set the high pcp->high
+mark based on the low watermark for the zone and the number of local
+online CPUs. If the user writes '0' to this sysctl, it will revert to
+this default behavior.
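As a sketch of the sizing rule the new text describes (managed_pages and nr_local_cpus are illustrative stand-ins, not the kernel's exact code):

    /* Per-zone budget = managed_pages / high_fraction, split across the
     * CPUs that are local to the zone's node; the minimum fraction is 8. */
    static unsigned long pcp_high_sketch(unsigned long managed_pages,
    				     unsigned int nr_local_cpus,
    				     unsigned int high_fraction)
    {
    	unsigned long budget = managed_pages / high_fraction;

    	return budget / (nr_local_cpus ? nr_local_cpus : 1);
    }

For example, a zone of 1,048,576 managed pages with high_fraction=100 and 4 local online CPUs works out to roughly 2,621 pages of pcp->high per CPU.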


stat_interval
@@ -936,12 +938,12 @@ allocations, THP and hugetlbfs pages.

To make it sensible with respect to the watermark_scale_factor
parameter, the unit is in fractions of 10,000. The default value of
-15,000 on !DISCONTIGMEM configurations means that up to 150% of the high
-watermark will be reclaimed in the event of a pageblock being mixed due
-to fragmentation. The level of reclaim is determined by the number of
-fragmentation events that occurred in the recent past. If this value is
-smaller than a pageblock then a pageblocks worth of pages will be reclaimed
-(e.g. 2MB on 64-bit x86). A boost factor of 0 will disable the feature.
+15,000 means that up to 150% of the high watermark will be reclaimed in the
+event of a pageblock being mixed due to fragmentation. The level of reclaim
+is determined by the number of fragmentation events that occurred in the
+recent past. If this value is smaller than a pageblock then a pageblocks
+worth of pages will be reclaimed (e.g. 2MB on 64-bit x86). A boost factor
+of 0 will disable the feature.
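The boost arithmetic described above, sketched with illustrative names (high_wmark and pageblock_pages are stand-ins):

    /* boost = high watermark * factor / 10000, raised to at least one
     * pageblock's worth of pages; a factor of 0 disables the feature. */
    static unsigned long wmark_boost_sketch(unsigned long high_wmark,
    					unsigned long boost_factor,
    					unsigned long pageblock_pages)
    {
    	unsigned long boost;

    	if (!boost_factor)
    		return 0;
    	boost = high_wmark * boost_factor / 10000;
    	return boost < pageblock_pages ? pageblock_pages : boost;
    }

With the default factor of 15,000 this comes to 150% of the high watermark, matching the figure in the text.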


watermark_scale_factor
9 changes: 4 additions & 5 deletions Documentation/dev-tools/kasan.rst
@@ -447,11 +447,10 @@ When a test fails due to a failed ``kmalloc``::

When a test fails due to a missing KASAN report::

-# kmalloc_double_kzfree: EXPECTATION FAILED at lib/test_kasan.c:629
-Expected kasan_data->report_expected == kasan_data->report_found, but
-kasan_data->report_expected == 1
-kasan_data->report_found == 0
-not ok 28 - kmalloc_double_kzfree
+# kmalloc_double_kzfree: EXPECTATION FAILED at lib/test_kasan.c:974
+KASAN failure expected in "kfree_sensitive(ptr)", but none occurred
+not ok 44 - kmalloc_double_kzfree


At the end the cumulative status of all KASAN tests is printed. On success::

45 changes: 2 additions & 43 deletions Documentation/vm/memory-model.rst
@@ -14,15 +14,11 @@ for the CPU. Then there could be several contiguous ranges at
completely distinct addresses. And, don't forget about NUMA, where
different memory banks are attached to different CPUs.

-Linux abstracts this diversity using one of the three memory models:
-FLATMEM, DISCONTIGMEM and SPARSEMEM. Each architecture defines what
+Linux abstracts this diversity using one of the two memory models:
+FLATMEM and SPARSEMEM. Each architecture defines what
memory models it supports, what the default memory model is and
whether it is possible to manually override that default.

-.. note::
-   At time of this writing, DISCONTIGMEM is considered deprecated,
-   although it is still in use by several architectures.

All the memory models track the status of physical page frames using
struct page arranged in one or more arrays.

@@ -63,43 +59,6 @@ straightforward: `PFN - ARCH_PFN_OFFSET` is an index to the
The `ARCH_PFN_OFFSET` defines the first page frame number for
systems with physical memory starting at address different from 0.

-DISCONTIGMEM
-============
-
-The DISCONTIGMEM model treats the physical memory as a collection of
-`nodes` similarly to how Linux NUMA support does. For each node Linux
-constructs an independent memory management subsystem represented by
-`struct pglist_data` (or `pg_data_t` for short). Among other
-things, `pg_data_t` holds the `node_mem_map` array that maps
-physical pages belonging to that node. The `node_start_pfn` field of
-`pg_data_t` is the number of the first page frame belonging to that
-node.
-
-The architecture setup code should call :c:func:`free_area_init_node` for
-each node in the system to initialize the `pg_data_t` object and its
-`node_mem_map`.
-
-Every `node_mem_map` behaves exactly as FLATMEM's `mem_map` -
-every physical page frame in a node has a `struct page` entry in the
-`node_mem_map` array. When DISCONTIGMEM is enabled, a portion of the
-`flags` field of the `struct page` encodes the node number of the
-node hosting that page.
-
-The conversion between a PFN and the `struct page` in the
-DISCONTIGMEM model became slightly more complex as it has to determine
-which node hosts the physical page and which `pg_data_t` object
-holds the `struct page`.
-
-Architectures that support DISCONTIGMEM provide :c:func:`pfn_to_nid`
-to convert PFN to the node number. The opposite conversion helper
-:c:func:`page_to_nid` is generic as it uses the node number encoded in
-page->flags.
-
-Once the node number is known, the PFN can be used to index
-appropriate `node_mem_map` array to access the `struct page` and
-the offset of the `struct page` from the `node_mem_map` plus
-`node_start_pfn` is the PFN of that page.
-
SPARSEMEM
=========

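For contrast with the DISCONTIGMEM conversion removed above, a minimal sketch of the FLATMEM mapping that remains (`PFN - ARCH_PFN_OFFSET` indexes `mem_map`); the names mirror the kernel's, but the code is illustrative:

    struct page { unsigned long flags; /* ... */ };

    extern struct page mem_map[];	/* one entry per page frame */
    #define ARCH_PFN_OFFSET 0UL	/* nonzero when RAM starts above address 0 */

    static inline struct page *pfn_to_page_sketch(unsigned long pfn)
    {
    	return &mem_map[pfn - ARCH_PFN_OFFSET];
    }

    static inline unsigned long page_to_pfn_sketch(const struct page *page)
    {
    	return (unsigned long)(page - mem_map) + ARCH_PFN_OFFSET;
    }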
22 changes: 0 additions & 22 deletions arch/alpha/Kconfig
@@ -549,29 +549,12 @@ config NR_CPUS
MARVEL support can handle a maximum of 32 CPUs, all the others
with working support have a maximum of 4 CPUs.

-config ARCH_DISCONTIGMEM_ENABLE
-bool "Discontiguous Memory Support"
-depends on BROKEN
-help
-Say Y to support efficient handling of discontiguous physical memory,
-for architectures which are either NUMA (Non-Uniform Memory Access)
-or have huge holes in the physical address space for other reasons.
-See <file:Documentation/vm/numa.rst> for more.
-
config ARCH_SPARSEMEM_ENABLE
bool "Sparse Memory Support"
help
Say Y to support efficient handling of discontiguous physical memory,
for systems that have huge holes in the physical address space.

-config NUMA
-bool "NUMA Support (EXPERIMENTAL)"
-depends on DISCONTIGMEM && BROKEN
-help
-Say Y to compile the kernel to support NUMA (Non-Uniform Memory
-Access). This option is for configuring high-end multiprocessor
-server machines. If in doubt, say N.
-
config ALPHA_WTINT
bool "Use WTINT" if ALPHA_SRM || ALPHA_GENERIC
default y if ALPHA_QEMU
@@ -596,11 +579,6 @@ config ALPHA_WTINT

If unsure, say N.

-config NODES_SHIFT
-int
-default "7"
-depends on NEED_MULTIPLE_NODES
-
# LARGE_VMALLOC is racy, if you *really* need it then fix it first
config ALPHA_LARGE_VMALLOC
bool
Expand Down
6 changes: 0 additions & 6 deletions arch/alpha/include/asm/machvec.h
@@ -99,12 +99,6 @@ struct alpha_machine_vector

const char *vector_name;

-/* NUMA information */
-int (*pa_to_nid)(unsigned long);
-int (*cpuid_to_nid)(int);
-unsigned long (*node_mem_start)(int);
-unsigned long (*node_mem_size)(int);
-
/* System specific parameters. */
union {
struct {
100 changes: 0 additions & 100 deletions arch/alpha/include/asm/mmzone.h

This file was deleted.

4 changes: 0 additions & 4 deletions arch/alpha/include/asm/pgtable.h
@@ -206,7 +206,6 @@ extern unsigned long __zero_page(void);
#define page_to_pa(page) (page_to_pfn(page) << PAGE_SHIFT)
#define pte_pfn(pte) (pte_val(pte) >> 32)

-#ifndef CONFIG_DISCONTIGMEM
#define pte_page(pte) pfn_to_page(pte_pfn(pte))
#define mk_pte(page, pgprot) \
({ \
@@ -215,7 +214,6 @@ extern unsigned long __zero_page(void);
pte_val(pte) = (page_to_pfn(page) << 32) | pgprot_val(pgprot); \
pte; \
})
-#endif

extern inline pte_t pfn_pte(unsigned long physpfn, pgprot_t pgprot)
{ pte_t pte; pte_val(pte) = (PHYS_TWIDDLE(physpfn) << 32) | pgprot_val(pgprot); return pte; }
@@ -330,9 +328,7 @@ extern inline pte_t mk_swap_pte(unsigned long type, unsigned long offset)
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })

-#ifndef CONFIG_DISCONTIGMEM
#define kern_addr_valid(addr) (1)
-#endif

#define pte_ERROR(e) \
printk("%s:%d: bad pte %016lx.\n", __FILE__, __LINE__, pte_val(e))