Skip to content

Commit

Permalink
Merge branch 'akpm' (patches from Andrew)
Browse files Browse the repository at this point in the history
Merge patch-bomb from Andrew Morton:

 - inotify tweaks

 - some ocfs2 updates (many more are awaiting review)

 - various misc bits

 - kernel/watchdog.c updates

 - Some of mm.  I have a huge number of MM patches this time and quite a
   lot of it is quite difficult and much will be held over to next time.

* emailed patches from Andrew Morton <[email protected]>: (162 commits)
  selftests: vm: add tests for lock on fault
  mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage
  mm: introduce VM_LOCKONFAULT
  mm: mlock: add new mlock system call
  mm: mlock: refactor mlock, munlock, and munlockall code
  kasan: always taint kernel on report
  mm, slub, kasan: enable user tracking by default with KASAN=y
  kasan: use IS_ALIGNED in memory_is_poisoned_8()
  kasan: Fix a type conversion error
  lib: test_kasan: add some testcases
  kasan: update reference to kasan prototype repo
  kasan: move KASAN_SANITIZE in arch/x86/boot/Makefile
  kasan: various fixes in documentation
  kasan: update log messages
  kasan: accurately determine the type of the bad access
  kasan: update reported bug types for kernel memory accesses
  kasan: update reported bug types for not user nor kernel memory accesses
  mm/kasan: prevent deadlock in kasan reporting
  mm/kasan: don't use kasan shadow pointer in generic functions
  mm/kasan: MODULE_VADDR is not available on all archs
  ...
  • Loading branch information
torvalds committed Nov 6, 2015
2 parents ea5c58e + b3b0d09 commit 2e3078a
Show file tree
Hide file tree
Showing 127 changed files with 3,093 additions and 1,351 deletions.
22 changes: 14 additions & 8 deletions Documentation/filesystems/proc.txt
Original file line number Diff line number Diff line change
Expand Up @@ -175,6 +175,7 @@ read the file /proc/PID/status:
VmLib: 1412 kB
VmPTE: 20 kb
VmSwap: 0 kB
HugetlbPages: 0 kB
Threads: 1
SigQ: 0/28578
SigPnd: 0000000000000000
Expand Down Expand Up @@ -238,6 +239,7 @@ Table 1-2: Contents of the status files (as of 4.1)
VmPTE size of page table entries
VmPMD size of second level page tables
VmSwap size of swap usage (the number of referred swapents)
HugetlbPages size of hugetlb memory portions
Threads number of threads
SigQ number of signals queued/max. number for queue
SigPnd bitmap of pending signals for the thread
Expand Down Expand Up @@ -424,12 +426,15 @@ Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 892 kB
Anonymous: 0 kB
AnonHugePages: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 374 kB
VmFlags: rd ex mr mw me de
Locked: 0 kB
VmFlags: rd ex mr mw me dw

the first of these lines shows the same information as is displayed for the
mapping in /proc/PID/maps. The remaining lines show the size of the mapping
Expand All @@ -449,9 +454,14 @@ accessed.
"Anonymous" shows the amount of memory that does not belong to any file. Even
a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
and a page is modified, the file page is replaced by a private anonymous copy.
"Swap" shows how much would-be-anonymous memory is also used, but out on
swap.
"AnonHugePages" shows the ammount of memory backed by transparent hugepage.
"Shared_Hugetlb" and "Private_Hugetlb" show the ammounts of memory backed by
hugetlbfs page which is *not* counted in "RSS" or "PSS" field for historical
reasons. And these are not included in {Shared,Private}_{Clean,Dirty} field.
"Swap" shows how much would-be-anonymous memory is also used, but out on swap.
"SwapPss" shows proportional swap share of this mapping.
"Locked" indicates whether the mapping is locked in memory or not.

"VmFlags" field deserves a separate description. This member represents the kernel
flags associated with the particular virtual memory area in two letter encoded
manner. The codes are the following:
Expand All @@ -475,7 +485,6 @@ manner. The codes are the following:
ac - area is accountable
nr - swap space is not reserved for the area
ht - area uses huge tlb pages
nl - non-linear mapping
ar - architecture specific flag
dd - do not include area into core dump
sd - soft-dirty flag
Expand Down Expand Up @@ -815,9 +824,6 @@ varies by architecture and compile options. The following is from a

> cat /proc/meminfo

The "Locked" indicates whether the mapping is locked in memory or not.


MemTotal: 16344972 kB
MemFree: 13634064 kB
MemAvailable: 14836172 kB
Expand Down
46 changes: 23 additions & 23 deletions Documentation/kasan.txt
Original file line number Diff line number Diff line change
@@ -1,36 +1,34 @@
Kernel address sanitizer
================
KernelAddressSanitizer (KASAN)
==============================

0. Overview
===========

Kernel Address sanitizer (KASan) is a dynamic memory error detector. It provides
KernelAddressSANitizer (KASAN) is a dynamic memory error detector. It provides
a fast and comprehensive solution for finding use-after-free and out-of-bounds
bugs.

KASan uses compile-time instrumentation for checking every memory access,
therefore you will need a gcc version of 4.9.2 or later. KASan could detect out
of bounds accesses to stack or global variables, but only if gcc 5.0 or later was
used to built the kernel.
KASAN uses compile-time instrumentation for checking every memory access,
therefore you will need a GCC version 4.9.2 or later. GCC 5.0 or later is
required for detection of out-of-bounds accesses to stack or global variables.

Currently KASan is supported only for x86_64 architecture and requires that the
kernel be built with the SLUB allocator.
Currently KASAN is supported only for x86_64 architecture and requires the
kernel to be built with the SLUB allocator.

1. Usage
=========
========

To enable KASAN configure kernel with:

CONFIG_KASAN = y

and choose between CONFIG_KASAN_OUTLINE and CONFIG_KASAN_INLINE. Outline/inline
is compiler instrumentation types. The former produces smaller binary the
latter is 1.1 - 2 times faster. Inline instrumentation requires a gcc version
of 5.0 or later.
and choose between CONFIG_KASAN_OUTLINE and CONFIG_KASAN_INLINE. Outline and
inline are compiler instrumentation types. The former produces smaller binary
the latter is 1.1 - 2 times faster. Inline instrumentation requires a GCC
version 5.0 or later.

Currently KASAN works only with the SLUB memory allocator.
For better bug detection and nicer report, enable CONFIG_STACKTRACE and put
at least 'slub_debug=U' in the boot cmdline.
For better bug detection and nicer reporting, enable CONFIG_STACKTRACE.

To disable instrumentation for specific files or directories, add a line
similar to the following to the respective kernel Makefile:
Expand All @@ -42,7 +40,7 @@ similar to the following to the respective kernel Makefile:
KASAN_SANITIZE := n

1.1 Error reports
==========
=================

A typical out of bounds access report looks like this:

Expand Down Expand Up @@ -119,14 +117,16 @@ Memory state around the buggy address:
ffff8800693bc800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

First sections describe slub object where bad access happened.
See 'SLUB Debug output' section in Documentation/vm/slub.txt for details.
The header of the report discribe what kind of bug happened and what kind of
access caused it. It's followed by the description of the accessed slub object
(see 'SLUB Debug output' section in Documentation/vm/slub.txt for details) and
the description of the accessed memory page.

In the last section the report shows memory state around the accessed address.
Reading this part requires some more understanding of how KASAN works.
Reading this part requires some understanding of how KASAN works.

Each 8 bytes of memory are encoded in one shadow byte as accessible,
partially accessible, freed or they can be part of a redzone.
The state of each 8 aligned bytes of memory is encoded in one shadow byte.
Those 8 bytes can be accessible, partially accessible, freed or be a redzone.
We use the following encoding for each shadow byte: 0 means that all 8 bytes
of the corresponding memory region are accessible; number N (1 <= N <= 7) means
that the first N bytes are accessible, and other (8 - N) bytes are not;
Expand All @@ -139,7 +139,7 @@ the accessed address is partially accessible.


2. Implementation details
========================
=========================

From a high level, our approach to memory error detection is similar to that
of kmemcheck: use shadow memory to record whether each byte of memory is safe
Expand Down
5 changes: 5 additions & 0 deletions Documentation/kernel-parameters.txt
Original file line number Diff line number Diff line change
Expand Up @@ -1275,6 +1275,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Format: <unsigned int> such that (rxsize & ~0x1fffc0) == 0.
Default: 1024

hardlockup_all_cpu_backtrace=
[KNL] Should the hard-lockup detector generate
backtraces on all cpus.
Format: <integer>

hashdist= [KNL,NUMA] Large hashes allocated during boot
are distributed across NUMA nodes. Defaults on
for 64-bit NUMA, off otherwise.
Expand Down
5 changes: 3 additions & 2 deletions Documentation/lockup-watchdogs.txt
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,9 @@ kernel mode for more than 10 seconds (see "Implementation" below for
details), without letting other interrupts have a chance to run.
Similarly to the softlockup case, the current stack trace is displayed
upon detection and the system will stay locked up unless the default
behavior is changed, which can be done through a compile time knob,
"BOOTPARAM_HARDLOCKUP_PANIC", and a kernel parameter, "nmi_watchdog"
behavior is changed, which can be done through a sysctl,
'hardlockup_panic', a compile time knob, "BOOTPARAM_HARDLOCKUP_PANIC",
and a kernel parameter, "nmi_watchdog"
(see "Documentation/kernel-parameters.txt" for details).

The panic option can be used in combination with panic_timeout (this
Expand Down
12 changes: 12 additions & 0 deletions Documentation/sysctl/kernel.txt
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ show up in /proc/sys/kernel:
- domainname
- hostname
- hotplug
- hardlockup_all_cpu_backtrace
- hung_task_panic
- hung_task_check_count
- hung_task_timeout_secs
Expand Down Expand Up @@ -292,6 +293,17 @@ Information Service) or YP (Yellow Pages) domainname. These two
domain names are in general different. For a detailed discussion
see the hostname(1) man page.

==============================================================
hardlockup_all_cpu_backtrace:

This value controls the hard lockup detector behavior when a hard
lockup condition is detected as to whether or not to gather further
debug information. If enabled, arch-specific all-CPU stack dumping
will be initiated.

0: do nothing. This is the default behavior.

1: on detection capture more debug information.
==============================================================

hotplug:
Expand Down
27 changes: 12 additions & 15 deletions Documentation/vm/page_migration
Original file line number Diff line number Diff line change
Expand Up @@ -92,29 +92,26 @@ Steps:

2. Insure that writeback is complete.

3. Prep the new page that we want to move to. It is locked
and set to not being uptodate so that all accesses to the new
page immediately lock while the move is in progress.
3. Lock the new page that we want to move to. It is locked so that accesses to
this (not yet uptodate) page immediately lock while the move is in progress.

4. The new page is prepped with some settings from the old page so that
accesses to the new page will discover a page with the correct settings.

5. All the page table references to the page are converted
to migration entries or dropped (nonlinear vmas).
This decrease the mapcount of a page. If the resulting
mapcount is not zero then we do not migrate the page.
All user space processes that attempt to access the page
will now wait on the page lock.
4. All the page table references to the page are converted to migration
entries. This decreases the mapcount of a page. If the resulting
mapcount is not zero then we do not migrate the page. All user space
processes that attempt to access the page will now wait on the page lock.

6. The radix tree lock is taken. This will cause all processes trying
5. The radix tree lock is taken. This will cause all processes trying
to access the page via the mapping to block on the radix tree spinlock.

7. The refcount of the page is examined and we back out if references remain
6. The refcount of the page is examined and we back out if references remain
otherwise we know that we are the only one referencing this page.

8. The radix tree is checked and if it does not contain the pointer to this
7. The radix tree is checked and if it does not contain the pointer to this
page then we back out because someone else modified the radix tree.

8. The new page is prepped with some settings from the old page so that
accesses to the new page will discover a page with the correct settings.

9. The radix tree is changed to point to the new page.

10. The reference count of the old page is dropped because the radix tree
Expand Down
10 changes: 10 additions & 0 deletions Documentation/vm/transhuge.txt
Original file line number Diff line number Diff line change
Expand Up @@ -170,6 +170,16 @@ A lower value leads to gain less thp performance. Value of
max_ptes_none can waste cpu time very little, you can
ignore it.

max_ptes_swap specifies how many pages can be brought in from
swap when collapsing a group of pages into a transparent huge page.

/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_swap

A higher value can cause excessive swap IO and waste
memory. A lower value can prevent THPs from being
collapsed, resulting fewer pages being collapsed into
THPs, and lower memory access performance.

== Boot parameter ==

You can change the sysfs boot time defaults of Transparent Hugepage
Expand Down
Loading

0 comments on commit 2e3078a

Please sign in to comment.