Skip to content

Commit

Permalink
Merge tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/sc…
Browse files Browse the repository at this point in the history
…m/linux/kernel/git/akpm/mm

Pull more mm updates from Andrew Morton:
 "Jeff Xu's implementation of the mseal() syscall"

* tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  selftest mm/mseal read-only elf memory segment
  mseal: add documentation
  selftest mm/mseal memory sealing
  mseal: add mseal syscall
  mseal: wire up mseal syscall
  • Loading branch information
torvalds committed May 24, 2024
2 parents f1f9984 + a52b4f1 commit 0b32d43
Show file tree
Hide file tree
Showing 33 changed files with 2,732 additions and 3 deletions.
1 change: 1 addition & 0 deletions Documentation/userspace-api/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ System calls
futex2
ebpf/index
ioctl/index
mseal

Security-related interfaces
===========================
Expand Down
199 changes: 199 additions & 0 deletions Documentation/userspace-api/mseal.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,199 @@
.. SPDX-License-Identifier: GPL-2.0
=====================
Introduction of mseal
=====================

:Author: Jeff Xu <[email protected]>

Modern CPUs support memory permissions such as RW and NX bits. The memory
permission feature improves security stance on memory corruption bugs, i.e.
the attacker can’t just write to arbitrary memory and point the code to it,
the memory has to be marked with X bit, or else an exception will happen.

Memory sealing additionally protects the mapping itself against
modifications. This is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For example,
such an attacker primitive can break control-flow integrity guarantees
since read-only memory that is supposed to be trusted can become writable
or .text pages can get remapped. Memory sealing can automatically be
applied by the runtime loader to seal .text and .rodata pages and
applications can additionally seal security critical data at runtime.

A similar feature already exists in the XNU kernel with the
VM_FLAGS_PERMANENT flag [1] and on OpenBSD with the mimmutable syscall [2].

User API
========
mseal()
-----------
The mseal() syscall has the following signature:

``int mseal(void addr, size_t len, unsigned long flags)``

**addr/len**: virtual memory address range.

The address range set by ``addr``/``len`` must meet:
- The start address must be in an allocated VMA.
- The start address must be page aligned.
- The end address (``addr`` + ``len``) must be in an allocated VMA.
- no gap (unallocated memory) between start and end address.

The ``len`` will be paged aligned implicitly by the kernel.

**flags**: reserved for future use.

**return values**:

- ``0``: Success.

- ``-EINVAL``:
- Invalid input ``flags``.
- The start address (``addr``) is not page aligned.
- Address range (``addr`` + ``len``) overflow.

- ``-ENOMEM``:
- The start address (``addr``) is not allocated.
- The end address (``addr`` + ``len``) is not allocated.
- A gap (unallocated memory) between start and end address.

- ``-EPERM``:
- sealing is supported only on 64-bit CPUs, 32-bit is not supported.

- For above error cases, users can expect the given memory range is
unmodified, i.e. no partial update.

- There might be other internal errors/cases not listed here, e.g.
error during merging/splitting VMAs, or the process reaching the max
number of supported VMAs. In those cases, partial updates to the given
memory range could happen. However, those cases should be rare.

**Blocked operations after sealing**:
Unmapping, moving to another location, and shrinking the size,
via munmap() and mremap(), can leave an empty space, therefore
can be replaced with a VMA with a new set of attributes.

Moving or expanding a different VMA into the current location,
via mremap().

Modifying a VMA via mmap(MAP_FIXED).

Size expansion, via mremap(), does not appear to pose any
specific risks to sealed VMAs. It is included anyway because
the use case is unclear. In any case, users can rely on
merging to expand a sealed VMA.

mprotect() and pkey_mprotect().

Some destructive madvice() behaviors (e.g. MADV_DONTNEED)
for anonymous memory, when users don't have write permission to the
memory. Those behaviors can alter region contents by discarding pages,
effectively a memset(0) for anonymous memory.

Kernel will return -EPERM for blocked operations.

For blocked operations, one can expect the given address is unmodified,
i.e. no partial update. Note, this is different from existing mm
system call behaviors, where partial updates are made till an error is
found and returned to userspace. To give an example:

Assume following code sequence:

- ptr = mmap(null, 8192, PROT_NONE);
- munmap(ptr + 4096, 4096);
- ret1 = mprotect(ptr, 8192, PROT_READ);
- mseal(ptr, 4096);
- ret2 = mprotect(ptr, 8192, PROT_NONE);

ret1 will be -ENOMEM, the page from ptr is updated to PROT_READ.

ret2 will be -EPERM, the page remains to be PROT_READ.

**Note**:

- mseal() only works on 64-bit CPUs, not 32-bit CPU.

- users can call mseal() multiple times, mseal() on an already sealed memory
is a no-action (not error).

- munseal() is not supported.

Use cases:
==========
- glibc:
The dynamic linker, during loading ELF executables, can apply sealing to
non-writable memory segments.

- Chrome browser: protect some security sensitive data-structures.

Notes on which memory to seal:
==============================

It might be important to note that sealing changes the lifetime of a mapping,
i.e. the sealed mapping won’t be unmapped till the process terminates or the
exec system call is invoked. Applications can apply sealing to any virtual
memory region from userspace, but it is crucial to thoroughly analyze the
mapping's lifetime prior to apply the sealing.

For example:

- aio/shm

aio/shm can call mmap()/munmap() on behalf of userspace, e.g. ksys_shmdt() in
shm.c. The lifetime of those mapping are not tied to the lifetime of the
process. If those memories are sealed from userspace, then munmap() will fail,
causing leaks in VMA address space during the lifetime of the process.

- Brk (heap)

Currently, userspace applications can seal parts of the heap by calling
malloc() and mseal().
let's assume following calls from user space:

- ptr = malloc(size);
- mprotect(ptr, size, RO);
- mseal(ptr, size);
- free(ptr);

Technically, before mseal() is added, the user can change the protection of
the heap by calling mprotect(RO). As long as the user changes the protection
back to RW before free(), the memory range can be reused.

Adding mseal() into the picture, however, the heap is then sealed partially,
the user can still free it, but the memory remains to be RO. If the address
is re-used by the heap manager for another malloc, the process might crash
soon after. Therefore, it is important not to apply sealing to any memory
that might get recycled.

Furthermore, even if the application never calls the free() for the ptr,
the heap manager may invoke the brk system call to shrink the size of the
heap. In the kernel, the brk-shrink will call munmap(). Consequently,
depending on the location of the ptr, the outcome of brk-shrink is
nondeterministic.


Additional notes:
=================
As Jann Horn pointed out in [3], there are still a few ways to write
to RO memory, which is, in a way, by design. Those cases are not covered
by mseal(). If applications want to block such cases, sandbox tools (such as
seccomp, LSM, etc) might be considered.

Those cases are:

- Write to read-only memory through /proc/self/mem interface.
- Write to read-only memory through ptrace (such as PTRACE_POKETEXT).
- userfaultfd.

The idea that inspired this patch comes from Stephen Röttger’s work in V8
CFI [4]. Chrome browser in ChromeOS will be the first user of this API.

Reference:
==========
[1] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b9f69dbd3c8c3fd30a/osfmk/mach/vm_statistics.h#L274

[2] https://man.openbsd.org/mimmutable.2

[3] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426FkcgnfUGLvA@mail.gmail.com

[4] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXgeaRHo/edit#heading=h.bvaojj9fu6hc
1 change: 1 addition & 0 deletions arch/alpha/kernel/syscalls/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -501,3 +501,4 @@
569 common lsm_get_self_attr sys_lsm_get_self_attr
570 common lsm_set_self_attr sys_lsm_set_self_attr
571 common lsm_list_modules sys_lsm_list_modules
572 common mseal sys_mseal
1 change: 1 addition & 0 deletions arch/arm/tools/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -475,3 +475,4 @@
459 common lsm_get_self_attr sys_lsm_get_self_attr
460 common lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules
462 common mseal sys_mseal
2 changes: 1 addition & 1 deletion arch/arm64/include/asm/unistd.h
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@
#define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
#define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)

#define __NR_compat_syscalls 462
#define __NR_compat_syscalls 463
#endif

#define __ARCH_WANT_SYS_CLONE
Expand Down
2 changes: 2 additions & 0 deletions arch/arm64/include/asm/unistd32.h
Original file line number Diff line number Diff line change
Expand Up @@ -929,6 +929,8 @@ __SYSCALL(__NR_lsm_get_self_attr, sys_lsm_get_self_attr)
__SYSCALL(__NR_lsm_set_self_attr, sys_lsm_set_self_attr)
#define __NR_lsm_list_modules 461
__SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules)
#define __NR_mseal 462
__SYSCALL(__NR_mseal, sys_mseal)

/*
* Please add new compat syscalls above this comment and update
Expand Down
1 change: 1 addition & 0 deletions arch/m68k/kernel/syscalls/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -461,3 +461,4 @@
459 common lsm_get_self_attr sys_lsm_get_self_attr
460 common lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules
462 common mseal sys_mseal
1 change: 1 addition & 0 deletions arch/microblaze/kernel/syscalls/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -467,3 +467,4 @@
459 common lsm_get_self_attr sys_lsm_get_self_attr
460 common lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules
462 common mseal sys_mseal
1 change: 1 addition & 0 deletions arch/mips/kernel/syscalls/syscall_n32.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -400,3 +400,4 @@
459 n32 lsm_get_self_attr sys_lsm_get_self_attr
460 n32 lsm_set_self_attr sys_lsm_set_self_attr
461 n32 lsm_list_modules sys_lsm_list_modules
462 n32 mseal sys_mseal
1 change: 1 addition & 0 deletions arch/mips/kernel/syscalls/syscall_n64.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -376,3 +376,4 @@
459 n64 lsm_get_self_attr sys_lsm_get_self_attr
460 n64 lsm_set_self_attr sys_lsm_set_self_attr
461 n64 lsm_list_modules sys_lsm_list_modules
462 n64 mseal sys_mseal
1 change: 1 addition & 0 deletions arch/mips/kernel/syscalls/syscall_o32.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -449,3 +449,4 @@
459 o32 lsm_get_self_attr sys_lsm_get_self_attr
460 o32 lsm_set_self_attr sys_lsm_set_self_attr
461 o32 lsm_list_modules sys_lsm_list_modules
462 o32 mseal sys_mseal
1 change: 1 addition & 0 deletions arch/parisc/kernel/syscalls/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -460,3 +460,4 @@
459 common lsm_get_self_attr sys_lsm_get_self_attr
460 common lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules
462 common mseal sys_mseal
1 change: 1 addition & 0 deletions arch/powerpc/kernel/syscalls/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -548,3 +548,4 @@
459 common lsm_get_self_attr sys_lsm_get_self_attr
460 common lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules
462 common mseal sys_mseal
1 change: 1 addition & 0 deletions arch/s390/kernel/syscalls/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -464,3 +464,4 @@
459 common lsm_get_self_attr sys_lsm_get_self_attr sys_lsm_get_self_attr
460 common lsm_set_self_attr sys_lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules sys_lsm_list_modules
462 common mseal sys_mseal sys_mseal
1 change: 1 addition & 0 deletions arch/sh/kernel/syscalls/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -464,3 +464,4 @@
459 common lsm_get_self_attr sys_lsm_get_self_attr
460 common lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules
462 common mseal sys_mseal
1 change: 1 addition & 0 deletions arch/sparc/kernel/syscalls/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -507,3 +507,4 @@
459 common lsm_get_self_attr sys_lsm_get_self_attr
460 common lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules
462 common mseal sys_mseal
1 change: 1 addition & 0 deletions arch/x86/entry/syscalls/syscall_32.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -466,3 +466,4 @@
459 i386 lsm_get_self_attr sys_lsm_get_self_attr
460 i386 lsm_set_self_attr sys_lsm_set_self_attr
461 i386 lsm_list_modules sys_lsm_list_modules
462 i386 mseal sys_mseal
1 change: 1 addition & 0 deletions arch/x86/entry/syscalls/syscall_64.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -383,6 +383,7 @@
459 common lsm_get_self_attr sys_lsm_get_self_attr
460 common lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules
462 common mseal sys_mseal

#
# Due to a historical design error, certain syscalls are numbered differently
Expand Down
1 change: 1 addition & 0 deletions arch/xtensa/kernel/syscalls/syscall.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -432,3 +432,4 @@
459 common lsm_get_self_attr sys_lsm_get_self_attr
460 common lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules
462 common mseal sys_mseal
1 change: 1 addition & 0 deletions include/linux/syscalls.h
Original file line number Diff line number Diff line change
Expand Up @@ -821,6 +821,7 @@ asmlinkage long sys_process_mrelease(int pidfd, unsigned int flags);
asmlinkage long sys_remap_file_pages(unsigned long start, unsigned long size,
unsigned long prot, unsigned long pgoff,
unsigned long flags);
asmlinkage long sys_mseal(unsigned long start, size_t len, unsigned long flags);
asmlinkage long sys_mbind(unsigned long start, unsigned long len,
unsigned long mode,
const unsigned long __user *nmask,
Expand Down
5 changes: 4 additions & 1 deletion include/uapi/asm-generic/unistd.h
Original file line number Diff line number Diff line change
Expand Up @@ -842,8 +842,11 @@ __SYSCALL(__NR_lsm_set_self_attr, sys_lsm_set_self_attr)
#define __NR_lsm_list_modules 461
__SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules)

#define __NR_mseal 462
__SYSCALL(__NR_mseal, sys_mseal)

#undef __NR_syscalls
#define __NR_syscalls 462
#define __NR_syscalls 463

/*
* 32 bit systems traditionally used different
Expand Down
1 change: 1 addition & 0 deletions kernel/sys_ni.c
Original file line number Diff line number Diff line change
Expand Up @@ -196,6 +196,7 @@ COND_SYSCALL(migrate_pages);
COND_SYSCALL(move_pages);
COND_SYSCALL(set_mempolicy_home_node);
COND_SYSCALL(cachestat);
COND_SYSCALL(mseal);

COND_SYSCALL(perf_event_open);
COND_SYSCALL(accept4);
Expand Down
4 changes: 4 additions & 0 deletions mm/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,10 @@ ifdef CONFIG_CROSS_MEMORY_ATTACH
mmu-$(CONFIG_MMU) += process_vm_access.o
endif

ifdef CONFIG_64BIT
mmu-$(CONFIG_MMU) += mseal.o
endif

obj-y := filemap.o mempool.o oom_kill.o fadvise.o \
maccess.o page-writeback.o folio-compat.o \
readahead.o swap.o truncate.o vmscan.o shrinker.o \
Expand Down
37 changes: 37 additions & 0 deletions mm/internal.h
Original file line number Diff line number Diff line change
Expand Up @@ -1435,6 +1435,43 @@ void __meminit __init_single_page(struct page *page, unsigned long pfn,
unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
int priority);

#ifdef CONFIG_64BIT
/* VM is sealed, in vm_flags */
#define VM_SEALED _BITUL(63)
#endif

#ifdef CONFIG_64BIT
static inline int can_do_mseal(unsigned long flags)
{
if (flags)
return -EINVAL;

return 0;
}

bool can_modify_mm(struct mm_struct *mm, unsigned long start,
unsigned long end);
bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
unsigned long end, int behavior);
#else
static inline int can_do_mseal(unsigned long flags)
{
return -EPERM;
}

static inline bool can_modify_mm(struct mm_struct *mm, unsigned long start,
unsigned long end)
{
return true;
}

static inline bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
unsigned long end, int behavior)
{
return true;
}
#endif

#ifdef CONFIG_SHRINKER_DEBUG
static inline __printf(2, 0) int shrinker_debugfs_name_alloc(
struct shrinker *shrinker, const char *fmt, va_list ap)
Expand Down
Loading

0 comments on commit 0b32d43

Please sign in to comment.