Skip to content

Commit

Permalink
Merge branch 'akpm' (patches from Andrew)
Browse files Browse the repository at this point in the history
Merge more updates from Andrew Morton:

 - a lot more of MM, quite a bit more yet to come: (memcg, pagemap,
   vmalloc, pagealloc, migration, thp, ksm, madvise, virtio,
   userfaultfd, memory-hotplug, shmem, rmap, zswap, zsmalloc, cleanups)

 - various other subsystems (procfs, misc, MAINTAINERS, bitops, lib,
   checkpatch, epoll, binfmt, kallsyms, reiserfs, kmod, gcov, kconfig,
   ubsan, fault-injection, ipc)

* emailed patches from Andrew Morton <[email protected]>: (158 commits)
  ipc/shm.c: make compat_ksys_shmctl() static
  ipc/mqueue.c: fix a brace coding style issue
  lib/Kconfig.debug: fix a typo "capabilitiy" -> "capability"
  ubsan: include bug type in report header
  kasan: unset panic_on_warn before calling panic()
  ubsan: check panic_on_warn
  drivers/misc/lkdtm/bugs.c: add arithmetic overflow and array bounds checks
  ubsan: split "bounds" checker from other options
  ubsan: add trap instrumentation option
  init/Kconfig: clean up ANON_INODES and old IO schedulers options
  kernel/gcov/fs.c: replace zero-length array with flexible-array member
  gcov: gcc_3_4: replace zero-length array with flexible-array member
  gcov: gcc_4_7: replace zero-length array with flexible-array member
  kernel/kmod.c: fix a typo "assuems" -> "assumes"
  reiserfs: clean up several indentation issues
  kallsyms: unexport kallsyms_lookup_name() and kallsyms_on_each_symbol()
  samples/hw_breakpoint: drop use of kallsyms_lookup_name()
  samples/hw_breakpoint: drop HW_BREAKPOINT_R when reporting writes
  fs/binfmt_elf.c: don't free interpreter's ELF pheaders on common path
  fs/binfmt_elf.c: allocate less for static executable
  ...
  • Loading branch information
torvalds committed Apr 7, 2020
2 parents 04de788 + 1cd377b commit 63bef48
Show file tree
Hide file tree
Showing 169 changed files with 3,630 additions and 1,190 deletions.
13 changes: 11 additions & 2 deletions Documentation/admin-guide/kernel-parameters.txt
Original file line number Diff line number Diff line change
Expand Up @@ -2573,13 +2573,22 @@
For details see: Documentation/admin-guide/hw-vuln/mds.rst

mem=nn[KMG] [KNL,BOOT] Force usage of a specific amount of memory
Amount of memory to be used when the kernel is not able
to see the whole system memory or for test.
Amount of memory to be used in cases as follows:

1 for test;
2 when the kernel is not able to see the whole system memory;
3 memory that lies after 'mem=' boundary is excluded from
the hypervisor, then assigned to KVM guests.

[X86] Work as limiting max address. Use together
with memmap= to avoid physical address space collisions.
Without memmap= PCI devices could be placed at addresses
belonging to unused RAM.

Note that this only takes effects during boot time since
in above case 3, memory may need be hot added after boot
if system memory of hypervisor is not sufficient.

mem=nopentium [BUGS=X86-32] Disable usage of 4MB pages for kernel
memory.

Expand Down
14 changes: 14 additions & 0 deletions Documentation/admin-guide/mm/transhuge.rst
Original file line number Diff line number Diff line change
Expand Up @@ -310,6 +310,11 @@ thp_fault_fallback
is incremented if a page fault fails to allocate
a huge page and instead falls back to using small pages.

thp_fault_fallback_charge
is incremented if a page fault fails to charge a huge page and
instead falls back to using small pages even though the
allocation was successful.

thp_collapse_alloc_failed
is incremented if khugepaged found a range
of pages that should be collapsed into one huge page but failed
Expand All @@ -319,6 +324,15 @@ thp_file_alloc
is incremented every time a file huge page is successfully
allocated.

thp_file_fallback
is incremented if a file huge page is attempted to be allocated
but fails and instead falls back to using small pages.

thp_file_fallback_charge
is incremented if a file huge page cannot be charged and instead
falls back to using small pages even though the allocation was
successful.

thp_file_mapped
is incremented every time a file huge page is mapped into
user address space.
Expand Down
51 changes: 51 additions & 0 deletions Documentation/admin-guide/mm/userfaultfd.rst
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,57 @@ UFFDIO_COPY. They're atomic as in guaranteeing that nothing can see an
half copied page since it'll keep userfaulting until the copy has
finished.

Notes:

- If you requested UFFDIO_REGISTER_MODE_MISSING when registering then
you must provide some kind of page in your thread after reading from
the uffd. You must provide either UFFDIO_COPY or UFFDIO_ZEROPAGE.
The normal behavior of the OS automatically providing a zero page on
an annonymous mmaping is not in place.

- None of the page-delivering ioctls default to the range that you
registered with. You must fill in all fields for the appropriate
ioctl struct including the range.

- You get the address of the access that triggered the missing page
event out of a struct uffd_msg that you read in the thread from the
uffd. You can supply as many pages as you want with UFFDIO_COPY or
UFFDIO_ZEROPAGE. Keep in mind that unless you used DONTWAKE then
the first of any of those IOCTLs wakes up the faulting thread.

- Be sure to test for all errors including (pollfd[0].revents &
POLLERR). This can happen, e.g. when ranges supplied were
incorrect.

Write Protect Notifications
---------------------------

This is equivalent to (but faster than) using mprotect and a SIGSEGV
signal handler.

Firstly you need to register a range with UFFDIO_REGISTER_MODE_WP.
Instead of using mprotect(2) you use ioctl(uffd, UFFDIO_WRITEPROTECT,
struct *uffdio_writeprotect) while mode = UFFDIO_WRITEPROTECT_MODE_WP
in the struct passed in. The range does not default to and does not
have to be identical to the range you registered with. You can write
protect as many ranges as you like (inside the registered range).
Then, in the thread reading from uffd the struct will have
msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP set. Now you send
ioctl(uffd, UFFDIO_WRITEPROTECT, struct *uffdio_writeprotect) again
while pagefault.mode does not have UFFDIO_WRITEPROTECT_MODE_WP set.
This wakes up the thread which will continue to run with writes. This
allows you to do the bookkeeping about the write in the uffd reading
thread before the ioctl.
If you registered with both UFFDIO_REGISTER_MODE_MISSING and
UFFDIO_REGISTER_MODE_WP then you need to think about the sequence in
which you supply a page and undo write protect. Note that there is a
difference between writes into a WP area and into a !WP area. The
former will have UFFD_PAGEFAULT_FLAG_WP set, the latter
UFFD_PAGEFAULT_FLAG_WRITE. The latter did not fail on protection but
you still need to supply a page when UFFDIO_REGISTER_MODE_MISSING was
used.

QEMU/KVM
========

Expand Down
40 changes: 40 additions & 0 deletions Documentation/vm/free_page_reporting.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
.. _free_page_reporting:

=====================
Free Page Reporting
=====================

Free page reporting is an API by which a device can register to receive
lists of pages that are currently unused by the system. This is useful in
the case of virtualization where a guest is then able to use this data to
notify the hypervisor that it is no longer using certain pages in memory.

For the driver, typically a balloon driver, to use of this functionality
it will allocate and initialize a page_reporting_dev_info structure. The
field within the structure it will populate is the "report" function
pointer used to process the scatterlist. It must also guarantee that it can
handle at least PAGE_REPORTING_CAPACITY worth of scatterlist entries per
call to the function. A call to page_reporting_register will register the
page reporting interface with the reporting framework assuming no other
page reporting devices are already registered.

Once registered the page reporting API will begin reporting batches of
pages to the driver. The API will start reporting pages 2 seconds after
the interface is registered and will continue to do so 2 seconds after any
page of a sufficiently high order is freed.

Pages reported will be stored in the scatterlist passed to the reporting
function with the final entry having the end bit set in entry nent - 1.
While pages are being processed by the report function they will not be
accessible to the allocator. Once the report function has been completed
the pages will be returned to the free area from which they were obtained.

Prior to removing a driver that is making use of free page reporting it
is necessary to call page_reporting_unregister to have the
page_reporting_dev_info structure that is currently in use by free page
reporting removed. Doing this will prevent further reports from being
issued via the interface. If another driver or the same driver is
registered it is possible for it to resume where the previous driver had
left off in terms of reporting free pages.

Alexander Duyck, Dec 04, 2019
20 changes: 12 additions & 8 deletions Documentation/vm/zswap.rst
Original file line number Diff line number Diff line change
Expand Up @@ -35,9 +35,11 @@ Zswap evicts pages from compressed cache on an LRU basis to the backing swap
device when the compressed pool reaches its size limit. This requirement had
been identified in prior community discussions.

Zswap is disabled by default but can be enabled at boot time by setting
the ``enabled`` attribute to 1 at boot time. ie: ``zswap.enabled=1``. Zswap
can also be enabled and disabled at runtime using the sysfs interface.
Whether Zswap is enabled at the boot time depends on whether
the ``CONFIG_ZSWAP_DEFAULT_ON`` Kconfig option is enabled or not.
This setting can then be overridden by providing the kernel command line
``zswap.enabled=`` option, for example ``zswap.enabled=0``.
Zswap can also be enabled and disabled at runtime using the sysfs interface.
An example command to enable zswap at runtime, assuming sysfs is mounted
at ``/sys``, is::

Expand All @@ -64,9 +66,10 @@ allocation in zpool is not directly accessible by address. Rather, a handle is
returned by the allocation routine and that handle must be mapped before being
accessed. The compressed memory pool grows on demand and shrinks as compressed
pages are freed. The pool is not preallocated. By default, a zpool
of type zbud is created, but it can be selected at boot time by
setting the ``zpool`` attribute, e.g. ``zswap.zpool=zbud``. It can
also be changed at runtime using the sysfs ``zpool`` attribute, e.g.::
of type selected in ``CONFIG_ZSWAP_ZPOOL_DEFAULT`` Kconfig option is created,
but it can be overridden at boot time by setting the ``zpool`` attribute,
e.g. ``zswap.zpool=zbud``. It can also be changed at runtime using the sysfs
``zpool`` attribute, e.g.::

echo zbud > /sys/module/zswap/parameters/zpool

Expand Down Expand Up @@ -97,8 +100,9 @@ controlled policy:
* max_pool_percent - The maximum percentage of memory that the compressed
pool can occupy.

The default compressor is lzo, but it can be selected at boot time by
setting the ``compressor`` attribute, e.g. ``zswap.compressor=lzo``.
The default compressor is selected in ``CONFIG_ZSWAP_COMPRESSOR_DEFAULT``
Kconfig option, but it can be overridden at boot time by setting the
``compressor`` attribute, e.g. ``zswap.compressor=lzo``.
It can also be changed at runtime using the sysfs "compressor"
attribute, e.g.::

Expand Down
35 changes: 18 additions & 17 deletions MAINTAINERS
Original file line number Diff line number Diff line change
Expand Up @@ -77,21 +77,13 @@ Tips for patch submitters

8. Happy hacking.

Descriptions of section entries
-------------------------------
Descriptions of section entries and preferred order
---------------------------------------------------

M: *Mail* patches to: FullName <address@domain>
R: Designated *Reviewer*: FullName <address@domain>
These reviewers should be CCed on patches.
L: *Mailing list* that is relevant to this area
W: *Web-page* with status/info
B: URI for where to file *bugs*. A web-page with detailed bug
filing info, a direct bug tracker link, or a mailto: URI.
C: URI for *chat* protocol, server and channel where developers
usually hang out, for example irc://server/channel.
Q: *Patchwork* web based patch tracking system site
T: *SCM* tree type and location.
Type is one of: git, hg, quilt, stgit, topgit
S: *Status*, one of the following:
Supported: Someone is actually paid to look after this.
Maintained: Someone actually looks after it.
Expand All @@ -102,30 +94,39 @@ Descriptions of section entries
Obsolete: Old code. Something tagged obsolete generally means
it has been replaced by a better system and you
should be using that.
W: *Web-page* with status/info
Q: *Patchwork* web based patch tracking system site
B: URI for where to file *bugs*. A web-page with detailed bug
filing info, a direct bug tracker link, or a mailto: URI.
C: URI for *chat* protocol, server and channel where developers
usually hang out, for example irc://server/channel.
P: Subsystem Profile document for more details submitting
patches to the given subsystem. This is either an in-tree file,
or a URI. See Documentation/maintainer/maintainer-entry-profile.rst
for details.
T: *SCM* tree type and location.
Type is one of: git, hg, quilt, stgit, topgit
F: *Files* and directories wildcard patterns.
A trailing slash includes all files and subdirectory files.
F: drivers/net/ all files in and below drivers/net
F: drivers/net/* all files in drivers/net, but not below
F: */net/* all files in "any top level directory"/net
One pattern per line. Multiple F: lines acceptable.
X: *Excluded* files and directories that are NOT maintained, same
rules as F:. Files exclusions are tested before file matches.
Can be useful for excluding a specific subdirectory, for instance:
F: net/
X: net/ipv6/
matches all files in and below net excluding net/ipv6/
N: Files and directories *Regex* patterns.
N: [^a-z]tegra all files whose path contains the word tegra
N: [^a-z]tegra all files whose path contains tegra
(not including files like integrator)
One pattern per line. Multiple N: lines acceptable.
scripts/get_maintainer.pl has different behavior for files that
match F: pattern and matches of N: patterns. By default,
get_maintainer will not look at git log history when an F: pattern
match occurs. When an N: match occurs, git log history is used
to also notify the people that have git commit signatures.
X: *Excluded* files and directories that are NOT maintained, same
rules as F:. Files exclusions are tested before file matches.
Can be useful for excluding a specific subdirectory, for instance:
F: net/
X: net/ipv6/
matches all files in and below net excluding net/ipv6/
K: *Content regex* (perl extended) pattern match in a patch or file.
For instance:
K: of_get_profile
Expand Down
2 changes: 0 additions & 2 deletions arch/alpha/include/asm/mmzone.h
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,6 @@

#include <asm/smp.h>

struct bootmem_data_t; /* stupid forward decl. */

/*
* Following are macros that are specific to this numa platform.
*/
Expand Down
2 changes: 1 addition & 1 deletion arch/alpha/kernel/syscalls/syscallhdr.sh
Original file line number Diff line number Diff line change
Expand Up @@ -32,5 +32,5 @@ grep -E "^[0-9A-Fa-fXx]+[[:space:]]+${my_abis}" "$in" | sort -n | (
printf "#define __NR_syscalls\t%s\n" "${nxt}"
printf "#endif\n"
printf "\n"
printf "#endif /* %s */" "${fileguard}"
printf "#endif /* %s */\n" "${fileguard}"
) > "$out"
2 changes: 1 addition & 1 deletion arch/csky/mm/fault.c
Original file line number Diff line number Diff line change
Expand Up @@ -141,7 +141,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long write,
if (!(vma->vm_flags & VM_WRITE))
goto bad_area;
} else {
if (!(vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)))
if (unlikely(!vma_is_accessible(vma)))
goto bad_area;
}

Expand Down
2 changes: 1 addition & 1 deletion arch/ia64/kernel/syscalls/syscallhdr.sh
Original file line number Diff line number Diff line change
Expand Up @@ -32,5 +32,5 @@ grep -E "^[0-9A-Fa-fXx]+[[:space:]]+${my_abis}" "$in" | sort -n | (
printf "#define __NR_syscalls\t%s\n" "${nxt}"
printf "#endif\n"
printf "\n"
printf "#endif /* %s */" "${fileguard}"
printf "#endif /* %s */\n" "${fileguard}"
) > "$out"
2 changes: 2 additions & 0 deletions arch/ia64/kernel/vmlinux.lds.S
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,8 @@ SECTIONS {
CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
SOFTIRQENTRY_TEXT
*(.gnu.linkonce.t*)
}

Expand Down
2 changes: 1 addition & 1 deletion arch/m68k/mm/fault.c
Original file line number Diff line number Diff line change
Expand Up @@ -125,7 +125,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
case 1: /* read, present */
goto acc_err;
case 0: /* read, not present */
if (!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)))
if (unlikely(!vma_is_accessible(vma)))
goto acc_err;
}

Expand Down
2 changes: 1 addition & 1 deletion arch/microblaze/kernel/syscalls/syscallhdr.sh
Original file line number Diff line number Diff line change
Expand Up @@ -32,5 +32,5 @@ grep -E "^[0-9A-Fa-fXx]+[[:space:]]+${my_abis}" "$in" | sort -n | (
printf "#define __NR_syscalls\t%s\n" "${nxt}"
printf "#endif\n"
printf "\n"
printf "#endif /* %s */" "${fileguard}"
printf "#endif /* %s */\n" "${fileguard}"
) > "$out"
3 changes: 1 addition & 2 deletions arch/mips/kernel/syscalls/syscallhdr.sh
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,5 @@ grep -E "^[0-9A-Fa-fXx]+[[:space:]]+${my_abis}" "$in" | sort -n | (
printf "#define __NR_syscalls\t%s\n" "${nxt}"
printf "#endif\n"
printf "\n"
printf "#endif /* %s */" "${fileguard}"
printf "\n"
printf "#endif /* %s */\n" "${fileguard}"
) > "$out"
2 changes: 1 addition & 1 deletion arch/mips/mm/fault.c
Original file line number Diff line number Diff line change
Expand Up @@ -142,7 +142,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
goto bad_area;
}
} else {
if (!(vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)))
if (unlikely(!vma_is_accessible(vma)))
goto bad_area;
}
}
Expand Down
1 change: 1 addition & 0 deletions arch/nds32/kernel/vmlinux.lds.S
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ SECTIONS
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
SOFTIRQENTRY_TEXT
*(.fixup)
}

Expand Down
2 changes: 1 addition & 1 deletion arch/parisc/kernel/syscalls/syscallhdr.sh
Original file line number Diff line number Diff line change
Expand Up @@ -32,5 +32,5 @@ grep -E "^[0-9A-Fa-fXx]+[[:space:]]+${my_abis}" "$in" | sort -n | (
printf "#define __NR_syscalls\t%s\n" "${nxt}"
printf "#endif\n"
printf "\n"
printf "#endif /* %s */" "${fileguard}"
printf "#endif /* %s */\n" "${fileguard}"
) > "$out"
3 changes: 1 addition & 2 deletions arch/powerpc/kernel/syscalls/syscallhdr.sh
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,5 @@ grep -E "^[0-9A-Fa-fXx]+[[:space:]]+${my_abis}" "$in" | sort -n | (
printf "#define __NR_syscalls\t%s\n" "${nxt}"
printf "#endif\n"
printf "\n"
printf "#endif /* %s */" "${fileguard}"
printf "\n"
printf "#endif /* %s */\n" "${fileguard}"
) > "$out"
2 changes: 1 addition & 1 deletion arch/powerpc/kvm/e500_mmu_host.c
Original file line number Diff line number Diff line change
Expand Up @@ -422,7 +422,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
break;
}
} else if (vma && hva >= vma->vm_start &&
(vma->vm_flags & VM_HUGETLB)) {
is_vm_hugetlb_page(vma)) {
unsigned long psize = vma_kernel_pagesize(vma);

tsize = (gtlbe->mas1 & MAS1_TSIZE_MASK) >>
Expand Down
2 changes: 1 addition & 1 deletion arch/powerpc/mm/fault.c
Original file line number Diff line number Diff line change
Expand Up @@ -314,7 +314,7 @@ static bool access_error(bool is_write, bool is_exec,
return false;
}

if (unlikely(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE))))
if (unlikely(!vma_is_accessible(vma)))
return true;
/*
* We should ideally do the vma pkey access check here. But in the
Expand Down
14 changes: 4 additions & 10 deletions arch/powerpc/platforms/powernv/memtrace.c
Original file line number Diff line number Diff line change
Expand Up @@ -231,16 +231,10 @@ static int memtrace_online(void)
continue;
}

/*
* If kernel isn't compiled with the auto online option
* we need to online the memory ourselves.
*/
if (!memhp_auto_online) {
lock_device_hotplug();
walk_memory_blocks(ent->start, ent->size, NULL,
online_mem_block);
unlock_device_hotplug();
}
lock_device_hotplug();
walk_memory_blocks(ent->start, ent->size, NULL,
online_mem_block);
unlock_device_hotplug();

/*
* Memory was added successfully so clean up references to it
Expand Down
Loading

0 comments on commit 63bef48

Please sign in to comment.