Skip to content

Commit

Permalink
Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/lin…
Browse files Browse the repository at this point in the history
…ux/kernel/git/tip/tip

Pull protection keys syscall interface from Thomas Gleixner:
 "This is the final step of Protection Keys support which adds the
  syscalls so user space can actually allocate keys and protect memory
  areas with them. Details and usage examples can be found in the
  documentation.

  The mm side of this has been acked by Mel"

* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pkeys: Update documentation
  x86/mm/pkeys: Do not skip PKRU register if debug registers are not used
  x86/pkeys: Fix pkeys build breakage for some non-x86 arches
  x86/pkeys: Add self-tests
  x86/pkeys: Allow configuration of init_pkru
  x86/pkeys: Default to a restrictive init PKRU
  pkeys: Add details of system call use to Documentation/
  generic syscalls: Wire up memory protection keys syscalls
  x86: Wire up protection keys system calls
  x86/pkeys: Allocation/free syscalls
  x86/pkeys: Make mprotect_key() mask off additional vm_flags
  mm: Implement new pkey_mprotect() system call
  x86/pkeys: Add fault handling for PF_PK page fault bit
  • Loading branch information
torvalds committed Oct 10, 2016
2 parents 5fa0eb0 + 6679dac commit 93c26d7
Show file tree
Hide file tree
Showing 25 changed files with 2,127 additions and 50 deletions.
5 changes: 5 additions & 0 deletions Documentation/kernel-parameters.txt
Original file line number Diff line number Diff line change
Expand Up @@ -1666,6 +1666,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.

initrd= [BOOT] Specify the location of the initial ramdisk

init_pkru= [x86] Specify the default memory protection keys rights
register contents for all processes. 0x55555554 by
default (disallow access to all but pkey 0). Can
override in debugfs after boot.

inport.irq= [HW] Inport (ATI XL and Microsoft) busmouse driver
Format: <irq>

Expand Down
70 changes: 64 additions & 6 deletions Documentation/x86/protection-keys.txt
Original file line number Diff line number Diff line change
Expand Up @@ -18,10 +18,68 @@ even though there is theoretically space in the PAE PTEs. These
permissions are enforced on data access only and have no effect on
instruction fetches.

=========================== Config Option ===========================
=========================== Syscalls ===========================

This config option adds approximately 1.5kb of text. and 50 bytes of
data to the executable. A workload which does large O_DIRECT reads
of holes in XFS files was run to exercise get_user_pages_fast(). No
performance delta was observed with the config option
enabled or disabled.
There are 3 system calls which directly interact with pkeys:

int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
int pkey_free(int pkey);
int pkey_mprotect(unsigned long start, size_t len,
unsigned long prot, int pkey);

Before a pkey can be used, it must first be allocated with
pkey_alloc(). An application calls the WRPKRU instruction
directly in order to change access permissions to memory covered
with a key. In this example WRPKRU is wrapped by a C function
called pkey_set().

int real_prot = PROT_READ|PROT_WRITE;
pkey = pkey_alloc(0, PKEY_DENY_WRITE);
ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
... application runs here

Now, if the application needs to update the data at 'ptr', it can
gain access, do the update, then remove its write access:

pkey_set(pkey, 0); // clear PKEY_DENY_WRITE
*ptr = foo; // assign something
pkey_set(pkey, PKEY_DENY_WRITE); // set PKEY_DENY_WRITE again

Now when it frees the memory, it will also free the pkey since it
is no longer in use:

munmap(ptr, PAGE_SIZE);
pkey_free(pkey);

(Note: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
An example implementation can be found in
tools/testing/selftests/x86/protection_keys.c)

=========================== Behavior ===========================

The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect(). For instance if you do this:

mprotect(ptr, size, PROT_NONE);
something(ptr);

you can expect the same effects with protection keys when doing this:

pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_READ);
pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
something(ptr);

That should be true whether something() is a direct access to 'ptr'
like:

*ptr = foo;

or when the kernel does the access on the application's behalf like
with a read():

read(fd, ptr, 1);

The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.
5 changes: 5 additions & 0 deletions arch/alpha/include/uapi/asm/mman.h
Original file line number Diff line number Diff line change
Expand Up @@ -78,4 +78,9 @@
#define MAP_HUGE_SHIFT 26
#define MAP_HUGE_MASK 0x3f

#define PKEY_DISABLE_ACCESS 0x1
#define PKEY_DISABLE_WRITE 0x2
#define PKEY_ACCESS_MASK (PKEY_DISABLE_ACCESS |\
PKEY_DISABLE_WRITE)

#endif /* __ALPHA_MMAN_H__ */
5 changes: 5 additions & 0 deletions arch/mips/include/uapi/asm/mman.h
Original file line number Diff line number Diff line change
Expand Up @@ -105,4 +105,9 @@
#define MAP_HUGE_SHIFT 26
#define MAP_HUGE_MASK 0x3f

#define PKEY_DISABLE_ACCESS 0x1
#define PKEY_DISABLE_WRITE 0x2
#define PKEY_ACCESS_MASK (PKEY_DISABLE_ACCESS |\
PKEY_DISABLE_WRITE)

#endif /* _ASM_MMAN_H */
5 changes: 5 additions & 0 deletions arch/parisc/include/uapi/asm/mman.h
Original file line number Diff line number Diff line change
Expand Up @@ -75,4 +75,9 @@
#define MAP_HUGE_SHIFT 26
#define MAP_HUGE_MASK 0x3f

#define PKEY_DISABLE_ACCESS 0x1
#define PKEY_DISABLE_WRITE 0x2
#define PKEY_ACCESS_MASK (PKEY_DISABLE_ACCESS |\
PKEY_DISABLE_WRITE)

#endif /* __PARISC_MMAN_H__ */
5 changes: 5 additions & 0 deletions arch/x86/entry/syscalls/syscall_32.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -386,3 +386,8 @@
377 i386 copy_file_range sys_copy_file_range
378 i386 preadv2 sys_preadv2 compat_sys_preadv2
379 i386 pwritev2 sys_pwritev2 compat_sys_pwritev2
380 i386 pkey_mprotect sys_pkey_mprotect
381 i386 pkey_alloc sys_pkey_alloc
382 i386 pkey_free sys_pkey_free
#383 i386 pkey_get sys_pkey_get
#384 i386 pkey_set sys_pkey_set
5 changes: 5 additions & 0 deletions arch/x86/entry/syscalls/syscall_64.tbl
Original file line number Diff line number Diff line change
Expand Up @@ -335,6 +335,11 @@
326 common copy_file_range sys_copy_file_range
327 64 preadv2 sys_preadv2
328 64 pwritev2 sys_pwritev2
329 common pkey_mprotect sys_pkey_mprotect
330 common pkey_alloc sys_pkey_alloc
331 common pkey_free sys_pkey_free
#332 common pkey_get sys_pkey_get
#333 common pkey_set sys_pkey_set

#
# x32-specific system call numbers start at 512 to avoid cache impact
Expand Down
8 changes: 8 additions & 0 deletions arch/x86/include/asm/mmu.h
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,14 @@ typedef struct {
const struct vdso_image *vdso_image; /* vdso image in use */

atomic_t perf_rdpmc_allowed; /* nonzero if rdpmc is allowed */
#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
/*
* One bit per protection key says whether userspace can
* use it or not. protected by mmap_sem.
*/
u16 pkey_allocation_map;
s16 execute_only_pkey;
#endif
} mm_context_t;

#ifdef CONFIG_SMP
Expand Down
25 changes: 19 additions & 6 deletions arch/x86/include/asm/mmu_context.h
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
#include <asm/desc.h>
#include <linux/atomic.h>
#include <linux/mm_types.h>
#include <linux/pkeys.h>

#include <trace/events/tlb.h>

Expand Down Expand Up @@ -107,7 +108,16 @@ static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
static inline int init_new_context(struct task_struct *tsk,
struct mm_struct *mm)
{
#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
/* pkey 0 is the default and always allocated */
mm->context.pkey_allocation_map = 0x1;
/* -1 means unallocated or invalid */
mm->context.execute_only_pkey = -1;
}
#endif
init_new_context_ldt(tsk, mm);

return 0;
}
static inline void destroy_context(struct mm_struct *mm)
Expand Down Expand Up @@ -195,16 +205,20 @@ static inline void arch_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
mpx_notify_unmap(mm, vma, start, end);
}

#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
static inline int vma_pkey(struct vm_area_struct *vma)
{
u16 pkey = 0;
#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
unsigned long vma_pkey_mask = VM_PKEY_BIT0 | VM_PKEY_BIT1 |
VM_PKEY_BIT2 | VM_PKEY_BIT3;
pkey = (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT;
#endif
return pkey;

return (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT;
}
#else
static inline int vma_pkey(struct vm_area_struct *vma)
{
return 0;
}
#endif

static inline bool __pkru_allows_pkey(u16 pkey, bool write)
{
Expand Down Expand Up @@ -258,5 +272,4 @@ static inline bool arch_pte_access_permitted(pte_t pte, bool write)
{
return __pkru_allows_pkey(pte_flags_pkey(pte_flags(pte)), write);
}

#endif /* _ASM_X86_MMU_CONTEXT_H */
73 changes: 72 additions & 1 deletion arch/x86/include/asm/pkeys.h
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,6 @@ extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
* Try to dedicate one of the protection keys to be used as an
* execute-only protection key.
*/
#define PKEY_DEDICATED_EXECUTE_ONLY 15
extern int __execute_only_pkey(struct mm_struct *mm);
static inline int execute_only_pkey(struct mm_struct *mm)
{
Expand All @@ -31,4 +30,76 @@ static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma,
return __arch_override_mprotect_pkey(vma, prot, pkey);
}

extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
unsigned long init_val);

#define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | VM_PKEY_BIT3)

#define mm_pkey_allocation_map(mm) (mm->context.pkey_allocation_map)
#define mm_set_pkey_allocated(mm, pkey) do { \
mm_pkey_allocation_map(mm) |= (1U << pkey); \
} while (0)
#define mm_set_pkey_free(mm, pkey) do { \
mm_pkey_allocation_map(mm) &= ~(1U << pkey); \
} while (0)

static inline
bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
{
return mm_pkey_allocation_map(mm) & (1U << pkey);
}

/*
* Returns a positive, 4-bit key on success, or -1 on failure.
*/
static inline
int mm_pkey_alloc(struct mm_struct *mm)
{
/*
* Note: this is the one and only place we make sure
* that the pkey is valid as far as the hardware is
* concerned. The rest of the kernel trusts that
* only good, valid pkeys come out of here.
*/
u16 all_pkeys_mask = ((1U << arch_max_pkey()) - 1);
int ret;

/*
* Are we out of pkeys? We must handle this specially
* because ffz() behavior is undefined if there are no
* zeros.
*/
if (mm_pkey_allocation_map(mm) == all_pkeys_mask)
return -1;

ret = ffz(mm_pkey_allocation_map(mm));

mm_set_pkey_allocated(mm, ret);

return ret;
}

static inline
int mm_pkey_free(struct mm_struct *mm, int pkey)
{
/*
* pkey 0 is special, always allocated and can never
* be freed.
*/
if (!pkey)
return -EINVAL;
if (!mm_pkey_is_allocated(mm, pkey))
return -EINVAL;

mm_set_pkey_free(mm, pkey);

return 0;
}

extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
unsigned long init_val);
extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
unsigned long init_val);
extern void copy_init_pkru_to_fpregs(void);

#endif /*_ASM_X86_PKEYS_H */
4 changes: 4 additions & 0 deletions arch/x86/kernel/fpu/core.c
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
#include <asm/traps.h>

#include <linux/hardirq.h>
#include <linux/pkeys.h>

#define CREATE_TRACE_POINTS
#include <asm/trace/fpu.h>
Expand Down Expand Up @@ -505,6 +506,9 @@ static inline void copy_init_fpstate_to_fpregs(void)
copy_kernel_to_fxregs(&init_fpstate.fxsave);
else
copy_kernel_to_fregs(&init_fpstate.fsave);

if (boot_cpu_has(X86_FEATURE_OSPKE))
copy_init_pkru_to_fpregs();
}

/*
Expand Down
5 changes: 4 additions & 1 deletion arch/x86/kernel/fpu/xstate.c
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
*/
#include <linux/compat.h>
#include <linux/cpu.h>
#include <linux/mman.h>
#include <linux/pkeys.h>

#include <asm/fpu/api.h>
Expand Down Expand Up @@ -866,9 +867,10 @@ const void *get_xsave_field_ptr(int xsave_state)
return get_xsave_addr(&fpu->state.xsave, xsave_state);
}

#ifdef CONFIG_ARCH_HAS_PKEYS

#define NR_VALID_PKRU_BITS (CONFIG_NR_PROTECTION_KEYS * 2)
#define PKRU_VALID_MASK (NR_VALID_PKRU_BITS - 1)

/*
* This will go out and modify PKRU register to set the access
* rights for @pkey to @init_val.
Expand Down Expand Up @@ -914,6 +916,7 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,

return 0;
}
#endif /* ! CONFIG_ARCH_HAS_PKEYS */

/*
* This is similar to user_regset_copyout(), but will not add offset to
Expand Down
13 changes: 7 additions & 6 deletions arch/x86/kernel/process_64.c
Original file line number Diff line number Diff line change
Expand Up @@ -109,12 +109,13 @@ void __show_regs(struct pt_regs *regs, int all)
get_debugreg(d7, 7);

/* Only print out debug registers if they are in their non-default state. */
if ((d0 == 0) && (d1 == 0) && (d2 == 0) && (d3 == 0) &&
(d6 == DR6_RESERVED) && (d7 == 0x400))
return;

printk(KERN_DEFAULT "DR0: %016lx DR1: %016lx DR2: %016lx\n", d0, d1, d2);
printk(KERN_DEFAULT "DR3: %016lx DR6: %016lx DR7: %016lx\n", d3, d6, d7);
if (!((d0 == 0) && (d1 == 0) && (d2 == 0) && (d3 == 0) &&
(d6 == DR6_RESERVED) && (d7 == 0x400))) {
printk(KERN_DEFAULT "DR0: %016lx DR1: %016lx DR2: %016lx\n",
d0, d1, d2);
printk(KERN_DEFAULT "DR3: %016lx DR6: %016lx DR7: %016lx\n",
d3, d6, d7);
}

if (boot_cpu_has(X86_FEATURE_OSPKE))
printk(KERN_DEFAULT "PKRU: %08x\n", read_pkru());
Expand Down
9 changes: 9 additions & 0 deletions arch/x86/mm/fault.c
Original file line number Diff line number Diff line change
Expand Up @@ -1144,6 +1144,15 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
{
/* This is only called for the current mm, so: */
bool foreign = false;

/*
* Read or write was blocked by protection keys. This is
* always an unconditional error and can never result in
* a follow-up action to resolve the fault, like a COW.
*/
if (error_code & PF_PK)
return 1;

/*
* Make sure to check the VMA so that we do not perform
* faults just to hit a PF_PK as soon as we fill in a
Expand Down
Loading

0 comments on commit 93c26d7

Please sign in to comment.