Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - most of the rest of MM (memcg, hugetlb, vmscan, proc, compaction,
   mempolicy, oom-kill, hugetlbfs, migration, thp, cma, util,
   memory-hotplug, cleanups, uaccess, migration, gup, pagemap),

 - various other subsystems (alpha, misc, sparse, bitmap, lib, bitops,
   checkpatch, autofs, minix, nilfs, ufs, fat, signals, kmod, coredump,
   exec, kdump, rapidio, panic, kcov, kgdb, ipc).

* emailed patches from Andrew Morton <[email protected]>: (164 commits)
  mm/gup: remove task_struct pointer for all gup code
  mm: clean up the last pieces of page fault accountings
  mm/xtensa: use general page fault accounting
  mm/x86: use general page fault accounting
  mm/sparc64: use general page fault accounting
  mm/sparc32: use general page fault accounting
  mm/sh: use general page fault accounting
  mm/s390: use general page fault accounting
  mm/riscv: use general page fault accounting
  mm/powerpc: use general page fault accounting
  mm/parisc: use general page fault accounting
  mm/openrisc: use general page fault accounting
  mm/nios2: use general page fault accounting
  mm/nds32: use general page fault accounting
  mm/mips: use general page fault accounting
  mm/microblaze: use general page fault accounting
  mm/m68k: use general page fault accounting
  mm/ia64: use general page fault accounting
  mm/hexagon: use general page fault accounting
  mm/csky: use general page fault accounting
  ...
torvalds committed Aug 12, 2020
2 parents 24fb33d + 64019a2 commit 9ad57f6
Showing 264 changed files with 2,357 additions and 1,437 deletions.
4 changes: 4 additions & 0 deletions Documentation/admin-guide/cgroup-v2.rst
@@ -1274,6 +1274,10 @@ PAGE_SIZE multiple when read back.
Amount of memory used for storing in-kernel data
structures.

percpu
Amount of memory used for storing per-cpu kernel
data structures.

sock
Amount of memory used in network transmission buffers

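The new percpu entry appears as one more key/value pair in each cgroup's memory.stat file. A minimal sketch for reading it from user space, assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup (the mount point and cgroup path are illustrative, not mandated by the change):

```c
/* Print the "percpu" line from a cgroup v2 memory.stat file.
 * The path below is an example; point it at the cgroup of interest. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/sys/fs/cgroup/memory.stat", "r");
    char key[64];
    unsigned long long val;

    if (!f) {
        perror("memory.stat");
        return 1;
    }
    while (fscanf(f, "%63s %llu", key, &val) == 2) {
        if (!strcmp(key, "percpu"))
            printf("percpu: %llu bytes\n", val);
    }
    fclose(f);
    return 0;
}
```
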
3 changes: 2 additions & 1 deletion Documentation/admin-guide/sysctl/kernel.rst
@@ -164,7 +164,8 @@ core_pattern
%s signal number
%t UNIX time of dump
%h hostname
%e executable filename (may be shortened)
%e executable filename (may be shortened, could be changed by prctl etc)
%f executable filename
%E executable path
%c maximum size of core file by resource limit RLIMIT_CORE
%<OTHER> both are dropped
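The new %f specifier can be combined with the existing ones when setting the pattern. A hedged sketch of doing so from a small C program; the template string is only an example, and the write requires root:

```c
/* Install an example core_pattern that uses both %e and the new %f.
 * Requires root; the template itself is illustrative. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "w");

    if (!f) {
        perror("core_pattern");
        return 1;
    }
    /* Doubled '%' so fprintf emits literal %e, %f and %p specifiers. */
    fprintf(f, "/tmp/core.%%e.%%f.%%p\n");
    fclose(f);
    return 0;
}
```
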
15 changes: 15 additions & 0 deletions Documentation/admin-guide/sysctl/vm.rst
@@ -119,6 +119,21 @@ all zones are compacted such that free memory is available in contiguous
blocks where possible. This can be important for example in the allocation of
huge pages although processes will also directly compact memory as required.

compaction_proactiveness
========================

This tunable takes a value in the range [0, 100] with a default value of
20. This tunable determines how aggressively compaction is done in the
background. Setting it to 0 disables proactive compaction.

Note that compaction has a non-trivial system-wide impact as pages
belonging to different processes are moved around, which could also lead
to latency spikes in unsuspecting applications. The kernel employs
various heuristics to avoid wasting CPU cycles if it detects that
proactive compaction is not being effective.

Be careful when setting it to extreme values like 100, as that may
cause excessive background compaction activity.

compact_unevictable_allowed
===========================
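The compaction_proactiveness tunable documented above is reachable through the usual /proc/sys mapping. A small sketch of reading and adjusting it, assuming that path and using 30 purely as an example value:

```c
/* Read vm.compaction_proactiveness, then write back an example value.
 * Assumes the conventional /proc/sys path; the write needs root. */
#include <stdio.h>

int main(void)
{
    const char *path = "/proc/sys/vm/compaction_proactiveness";
    FILE *f = fopen(path, "r");
    int val = -1;

    if (!f) {
        perror(path);
        return 1;
    }
    if (fscanf(f, "%d", &val) != 1)
        val = -1;
    fclose(f);
    printf("current proactiveness: %d\n", val);

    f = fopen(path, "w");
    if (f) {
        fprintf(f, "30\n");    /* example value within [0, 100] */
        fclose(f);
    }
    return 0;
}
```
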
11 changes: 3 additions & 8 deletions Documentation/filesystems/proc.rst
@@ -1633,9 +1633,6 @@ may allocate from based on an estimation of its current memory and swap use.
For example, if a task is using all allowed memory, its badness score will be
1000. If it is using half of its allowed memory, its score will be 500.

There is an additional factor included in the badness score: the current memory
and swap usage is discounted by 3% for root processes.

The amount of "allowed" memory depends on the context in which the oom killer
was called. If it is due to the memory assigned to the allocating task's cpuset
being exhausted, the allowed memory represents the set of mems assigned to that
@@ -1671,11 +1668,6 @@ The value of /proc/<pid>/oom_score_adj may be reduced no lower than the last
value set by a CAP_SYS_RESOURCE process. To reduce the value any lower
requires CAP_SYS_RESOURCE.

Caveat: when a parent task is selected, the oom killer will sacrifice any first
generation children with separate address spaces instead, if possible. This
avoids servers and important system daemons from being killed and loses the
minimal amount of work.


3.2 /proc/<pid>/oom_score - Display current oom-killer score
-------------------------------------------------------------
@@ -1684,6 +1676,9 @@ This file can be used to check the current score used by the oom-killer for
any given <pid>. Use it together with /proc/<pid>/oom_score_adj to tune which
process should be killed in an out-of-memory situation.

Please note that the exported value includes oom_score_adj so it is
effectively in range [0,2000].


3.3 /proc/<pid>/io - Display the IO accounting fields
-------------------------------------------------------
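Since the exported oom_score now folds in oom_score_adj, it can be useful to look at both files side by side. A hedged helper sketch; the pid comes from the command line, and nothing here is specific to this commit beyond the widened [0, 2000] range:

```c
/* Print /proc/<pid>/oom_score and /proc/<pid>/oom_score_adj together. */
#include <stdio.h>

static long read_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long v = -1;

    if (f) {
        if (fscanf(f, "%ld", &v) != 1)
            v = -1;
        fclose(f);
    }
    return v;
}

int main(int argc, char **argv)
{
    char path[64];
    int pid;

    if (argc < 2 || sscanf(argv[1], "%d", &pid) != 1) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    snprintf(path, sizeof(path), "/proc/%d/oom_score", pid);
    printf("oom_score     = %ld\n", read_long(path));
    snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", pid);
    printf("oom_score_adj = %ld\n", read_long(path));
    return 0;
}
```
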
27 changes: 27 additions & 0 deletions Documentation/vm/page_migration.rst
@@ -253,5 +253,32 @@ which are function pointers of struct address_space_operations.
PG_isolated is an alias of the PG_reclaim flag, so drivers shouldn't use the flag
for their own purposes.

Monitoring Migration
=====================

The following events (counters) can be used to monitor page migration.

1. PGMIGRATE_SUCCESS: Normal page migration success. Each count means that a
page was migrated. If the page was a non-THP page, then this counter is
increased by one. If the page was a THP, then this counter is increased by
the number of THP subpages. For example, migration of a single 2MB THP that
has 4KB-size base pages (subpages) will cause this counter to increase by
512.

2. PGMIGRATE_FAIL: Normal page migration failure. Same counting rules as for
_SUCCESS, above: this will be increased by the number of subpages, if it was
a THP.

3. THP_MIGRATION_SUCCESS: A THP was migrated without being split.

4. THP_MIGRATION_FAIL: A THP could not be migrated, nor could it be split.

5. THP_MIGRATION_SPLIT: A THP was migrated, but not as such: first, the THP had
   to be split. After splitting, a migration retry was used for its sub-pages.

THP_MIGRATION_* events also update the appropriate PGMIGRATE_SUCCESS or
PGMIGRATE_FAIL events. For example, a THP migration failure will cause both
THP_MIGRATION_FAIL and PGMIGRATE_FAIL to increase.

Christoph Lameter, May 8, 2006.
Minchan Kim, Mar 28, 2016.
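
These events are exported through /proc/vmstat; the lower-case names used below (pgmigrate_success and friends) are the conventional forms and worth double-checking against the running kernel. A minimal reader, including the THP counters added in this series:

```c
/* Dump the page-migration counters from /proc/vmstat.
 * The lower-case names are assumed to match the events described above. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    static const char * const names[] = {
        "pgmigrate_success", "pgmigrate_fail",
        "thp_migration_success", "thp_migration_fail",
        "thp_migration_split",
    };
    FILE *f = fopen("/proc/vmstat", "r");
    char key[64];
    unsigned long long val;
    size_t i;

    if (!f) {
        perror("/proc/vmstat");
        return 1;
    }
    while (fscanf(f, "%63s %llu", key, &val) == 2)
        for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
            if (!strcmp(key, names[i]))
                printf("%-22s %llu\n", key, val);
    fclose(f);
    return 0;
}
```

Remember the counting rule above: one 2MB THP made of 4KB base pages contributes 512 (2MB / 4KB) to PGMIGRATE_SUCCESS or PGMIGRATE_FAIL.
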
4 changes: 4 additions & 0 deletions Makefile
@@ -893,6 +893,10 @@ KBUILD_CFLAGS += $(CC_FLAGS_SCS)
export CC_FLAGS_SCS
endif

ifdef CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_32B
KBUILD_CFLAGS += -falign-functions=32
endif

# arch Makefile may override CC so keep this after arch Makefile is included
NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) -print-file-name=include)

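For intuition about what the new CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_32B switch asks of the compiler, the same flag can be observed in a userspace build: compiled with -falign-functions=32, the entry points below should land on 32-byte boundaries. The functions are just stand-ins:

```c
/* Build once with and once without -falign-functions=32 and compare. */
#include <stdio.h>
#include <stdint.h>

static void fn_a(void) { }
static void fn_b(void) { }

int main(void)
{
    printf("fn_a entry mod 32 = %lu\n", (unsigned long)((uintptr_t)fn_a % 32));
    printf("fn_b entry mod 32 = %lu\n", (unsigned long)((uintptr_t)fn_b % 32));
    return 0;
}
```
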
8 changes: 4 additions & 4 deletions arch/alpha/include/asm/io.h
@@ -489,10 +489,10 @@ extern inline void writeq(u64 b, volatile void __iomem *addr)
}
#endif

#define ioread16be(p) be16_to_cpu(ioread16(p))
#define ioread32be(p) be32_to_cpu(ioread32(p))
#define iowrite16be(v,p) iowrite16(cpu_to_be16(v), (p))
#define iowrite32be(v,p) iowrite32(cpu_to_be32(v), (p))
#define ioread16be(p) swab16(ioread16(p))
#define ioread32be(p) swab32(ioread32(p))
#define iowrite16be(v,p) iowrite16(swab16(v), (p))
#define iowrite32be(v,p) iowrite32(swab32(v), (p))

#define inb_p inb
#define inw_p inw
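Alpha is little-endian, so be16_to_cpu()/be32_to_cpu() and swab16()/swab32() reduce to the same byte swap there and the substitution changes no behaviour. A userspace sketch of that equivalence, using glibc's be16toh() and a compiler builtin as stand-ins for the kernel helpers:

```c
/* On a little-endian host, "convert from big-endian" and "byte swap"
 * produce identical results; both calls below print 0x3412. */
#define _DEFAULT_SOURCE
#include <endian.h>
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint16_t raw = 0x1234;

    printf("be16toh(0x%04x)           = 0x%04x\n",
           (unsigned)raw, (unsigned)be16toh(raw));
    printf("__builtin_bswap16(0x%04x) = 0x%04x\n",
           (unsigned)raw, (unsigned)__builtin_bswap16(raw));
    return 0;
}
```
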
2 changes: 1 addition & 1 deletion arch/alpha/include/asm/uaccess.h
@@ -20,7 +20,7 @@
#define get_fs() (current_thread_info()->addr_limit)
#define set_fs(x) (current_thread_info()->addr_limit = (x))

#define segment_eq(a, b) ((a).seg == (b).seg)
#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)

/*
 * Is an address valid? This does a straightforward calculation rather
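The same conversion repeats across the architectures below: the open-coded segment_eq(get_fs(), KERNEL_DS) comparison is wrapped behind a uaccess_kernel() helper. A userspace mock of the idea, with mm_segment_t, KERNEL_DS and get_fs() as simplified stand-ins rather than the real kernel definitions:

```c
/* Mock-up of the segment test behind uaccess_kernel(); the types and
 * values here are illustrative, not the kernel's. */
#include <stdio.h>

typedef struct { unsigned long seg; } mm_segment_t;

#define KERNEL_DS   ((mm_segment_t){ ~0UL })
#define USER_DS     ((mm_segment_t){ 0x00007fffffffffffUL })

static mm_segment_t current_fs = { 0x00007fffffffffffUL };

static mm_segment_t get_fs(void) { return current_fs; }
static void set_fs(mm_segment_t fs) { current_fs = fs; }

#define segment_eq(a, b)    ((a).seg == (b).seg)
#define uaccess_kernel()    segment_eq(get_fs(), KERNEL_DS)

int main(void)
{
    printf("user segment  : uaccess_kernel() = %d\n", uaccess_kernel());
    set_fs(KERNEL_DS);
    printf("kernel segment: uaccess_kernel() = %d\n", uaccess_kernel());
    return 0;
}
```
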
8 changes: 3 additions & 5 deletions arch/alpha/mm/fault.c
@@ -25,6 +25,7 @@
#include <linux/interrupt.h>
#include <linux/extable.h>
#include <linux/uaccess.h>
#include <linux/perf_event.h>

extern void die_if_kernel(char *,struct pt_regs *,long, unsigned long *);

@@ -116,6 +117,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
#endif
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
retry:
mmap_read_lock(mm);
vma = find_vma(mm, address);
@@ -148,7 +150,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
/* If for any reason at all we couldn't handle the fault,
make sure we exit gracefully rather than endlessly redo
the fault. */
fault = handle_mm_fault(vma, address, flags);
fault = handle_mm_fault(vma, address, flags, regs);

if (fault_signal_pending(fault, regs))
return;
@@ -164,10 +166,6 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
}

if (flags & FAULT_FLAG_ALLOW_RETRY) {
if (fault & VM_FAULT_MAJOR)
current->maj_flt++;
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;

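The per-architecture maj_flt/min_flt bumps and PERF_COUNT_SW_PAGE_FAULTS_{MAJ,MIN} events being deleted here (and in the other fault handlers below) are now handled once in common code, keyed off the regs pointer passed to handle_mm_fault(). A conceptual userspace mock of that bookkeeping, deliberately not the literal mm/memory.c implementation:

```c
/* Conceptual mock: decide, from a fault result, whether to count a major
 * or a minor fault and which perf software event would be emitted. */
#include <stdbool.h>
#include <stdio.h>

#define VM_FAULT_MAJOR  0x01u
#define VM_FAULT_RETRY  0x02u
#define VM_FAULT_ERROR  0x04u

struct task { unsigned long maj_flt, min_flt; };

static void account_fault(struct task *tsk, unsigned int fault, bool have_regs)
{
    /* Skip accounting with no regs, on errors, or when the fault retries. */
    if (!have_regs || (fault & (VM_FAULT_ERROR | VM_FAULT_RETRY)))
        return;
    if (fault & VM_FAULT_MAJOR) {
        tsk->maj_flt++;
        puts("perf event: PERF_COUNT_SW_PAGE_FAULTS_MAJ");
    } else {
        tsk->min_flt++;
        puts("perf event: PERF_COUNT_SW_PAGE_FAULTS_MIN");
    }
}

int main(void)
{
    struct task tsk = { 0, 0 };

    account_fault(&tsk, 0, true);               /* minor fault */
    account_fault(&tsk, VM_FAULT_MAJOR, true);  /* major fault */
    account_fault(&tsk, VM_FAULT_RETRY, true);  /* retried, not counted */
    printf("maj_flt=%lu min_flt=%lu\n", tsk.maj_flt, tsk.min_flt);
    return 0;
}
```
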
3 changes: 1 addition & 2 deletions arch/arc/include/asm/segment.h
@@ -14,8 +14,7 @@ typedef unsigned long mm_segment_t;

#define KERNEL_DS MAKE_MM_SEG(0)
#define USER_DS MAKE_MM_SEG(TASK_SIZE)

#define segment_eq(a, b) ((a) == (b))
#define uaccess_kernel() (get_fs() == KERNEL_DS)

#endif /* __ASSEMBLY__ */
#endif /* __ASMARC_SEGMENT_H */
2 changes: 1 addition & 1 deletion arch/arc/kernel/process.c
@@ -91,7 +91,7 @@ SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new)
goto fail;

mmap_read_lock(current->mm);
ret = fixup_user_fault(current, current->mm, (unsigned long) uaddr,
ret = fixup_user_fault(current->mm, (unsigned long) uaddr,
FAULT_FLAG_WRITE, NULL);
mmap_read_unlock(current->mm);

18 changes: 3 additions & 15 deletions arch/arc/mm/fault.c
@@ -105,6 +105,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
if (write)
flags |= FAULT_FLAG_WRITE;

perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
retry:
mmap_read_lock(mm);

@@ -130,7 +131,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
goto bad_area;
}

fault = handle_mm_fault(vma, address, flags);
fault = handle_mm_fault(vma, address, flags, regs);

/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
@@ -155,22 +156,9 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
* Major/minor page fault accounting
* (in case of retry we only land here once)
*/
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);

if (likely(!(fault & VM_FAULT_ERROR))) {
if (fault & VM_FAULT_MAJOR) {
tsk->maj_flt++;
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
regs, address);
} else {
tsk->min_flt++;
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
regs, address);
}

if (likely(!(fault & VM_FAULT_ERROR)))
/* Normal return path: fault Handled Gracefully */
return;
}

if (!user_mode(regs))
goto no_context;
4 changes: 2 additions & 2 deletions arch/arm/include/asm/uaccess.h
@@ -76,7 +76,7 @@ static inline void set_fs(mm_segment_t fs)
modify_domain(DOMAIN_KERNEL, fs ? DOMAIN_CLIENT : DOMAIN_MANAGER);
}

#define segment_eq(a, b) ((a) == (b))
#define uaccess_kernel() (get_fs() == KERNEL_DS)

/*
* We use 33-bit arithmetic here. Success returns zero, failure returns
@@ -267,7 +267,7 @@ extern int __put_user_8(void *, unsigned long long);
*/
#define USER_DS KERNEL_DS

#define segment_eq(a, b) (1)
#define uaccess_kernel() (true)
#define __addr_ok(addr) ((void)(addr), 1)
#define __range_ok(addr, size) ((void)(addr), 0)
#define get_fs() (KERNEL_DS)
2 changes: 2 additions & 0 deletions arch/arm/kernel/signal.c
@@ -713,7 +713,9 @@ struct page *get_signal_page(void)
/* Defer to generic check */
asmlinkage void addr_limit_check_failed(void)
{
#ifdef CONFIG_MMU
addr_limit_user_check();
#endif
}

#ifdef CONFIG_DEBUG_RSEQ
25 changes: 6 additions & 19 deletions arch/arm/mm/fault.c
@@ -202,7 +202,8 @@ static inline bool access_error(unsigned int fsr, struct vm_area_struct *vma)

static vm_fault_t __kprobes
__do_page_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
unsigned int flags, struct task_struct *tsk)
unsigned int flags, struct task_struct *tsk,
struct pt_regs *regs)
{
struct vm_area_struct *vma;
vm_fault_t fault;
@@ -224,7 +225,7 @@ __do_page_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
goto out;
}

return handle_mm_fault(vma, addr & PAGE_MASK, flags);
return handle_mm_fault(vma, addr & PAGE_MASK, flags, regs);

check_stack:
/* Don't allow expansion below FIRST_USER_ADDRESS */
@@ -266,6 +267,8 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
if ((fsr & FSR_WRITE) && !(fsr & FSR_CM))
flags |= FAULT_FLAG_WRITE;

perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);

/*
* As per x86, we may deadlock here. However, since the kernel only
* validly references user space from well defined areas of the code,
@@ -290,7 +293,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
#endif
}

fault = __do_page_fault(mm, addr, fsr, flags, tsk);
fault = __do_page_fault(mm, addr, fsr, flags, tsk, regs);

/* If we need to retry but a fatal signal is pending, handle the
* signal first. We do not need to release the mmap_lock because
@@ -302,23 +305,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
return 0;
}

/*
* Major/minor page fault accounting is only done on the
* initial attempt. If we go through a retry, it is extremely
* likely that the page will be found in page cache at that point.
*/

perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
if (!(fault & VM_FAULT_ERROR) && flags & FAULT_FLAG_ALLOW_RETRY) {
if (fault & VM_FAULT_MAJOR) {
tsk->maj_flt++;
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
regs, addr);
} else {
tsk->min_flt++;
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
regs, addr);
}
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;
goto retry;
2 changes: 1 addition & 1 deletion arch/arm64/include/asm/uaccess.h
@@ -50,7 +50,7 @@ static inline void set_fs(mm_segment_t fs)
CONFIG_ARM64_UAO));
}

#define segment_eq(a, b) ((a) == (b))
#define uaccess_kernel() (get_fs() == KERNEL_DS)

/*
* Test whether a block of memory is a valid user space address.
2 changes: 1 addition & 1 deletion arch/arm64/kernel/sdei.c
@@ -180,7 +180,7 @@ static __kprobes unsigned long _sdei_handler(struct pt_regs *regs,

/*
* We didn't take an exception to get here, set PAN. UAO will be cleared
* by sdei_event_handler()s set_fs(USER_DS) call.
* by sdei_event_handler()s force_uaccess_begin() call.
*/
__uaccess_enable_hw_pan();

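The comment now points at force_uaccess_begin(), the helper introduced by this series as the replacement for open-coded set_fs(USER_DS)/set_fs(oldfs) pairs. A userspace mock of the begin/end pairing; the real helpers operate on the kernel's address limit, which is only imitated here:

```c
/* Mock of the force_uaccess_begin()/force_uaccess_end() pairing that
 * replaces explicit set_fs(USER_DS) ... set_fs(oldfs) sequences. */
#include <stdio.h>

typedef int mm_segment_t;
#define USER_DS     0
#define KERNEL_DS   1

static mm_segment_t current_fs = KERNEL_DS;

static mm_segment_t force_uaccess_begin(void)
{
    mm_segment_t old = current_fs;

    current_fs = USER_DS;   /* force user-space access semantics */
    return old;
}

static void force_uaccess_end(mm_segment_t old)
{
    current_fs = old;       /* restore the caller's segment */
}

int main(void)
{
    mm_segment_t old = force_uaccess_begin();

    printf("inside handler: fs=%d (USER_DS)\n", current_fs);
    force_uaccess_end(old);
    printf("after handler : fs=%d (restored)\n", current_fs);
    return 0;
}
```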