Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux…
…/kernel/git/tip/tip

Pull x86 mm changes from Ingo Molnar:
 "PCID support, 5-level paging support, Secure Memory Encryption support

  The main changes in this cycle are support for three new, complex
  hardware features of x86 CPUs:

   - Add 5-level paging support, which is a new hardware feature on
     upcoming Intel CPUs allowing up to 128 PB of virtual address space
     and 4 PB of physical RAM space - a 512-fold increase over the old
     limits. (Supercomputers of the future forecasting hurricanes on an
     ever warming planet can certainly make good use of more RAM.)

     Many of the necessary changes went upstream in previous cycles;
     v4.14 is the first kernel that can enable 5-level paging.

     This feature is activated via CONFIG_X86_5LEVEL=y - disabled by
     default.

     (By Kirill A. Shutemov)

   - Add 'encrypted memory' support, which is a new hardware feature on
     upcoming AMD CPUs ('Secure Memory Encryption', SME) allowing system
     RAM to be encrypted and decrypted (mostly) transparently by the
     CPU, with a little help from the kernel to transition to/from
     encrypted RAM. Such RAM should be more secure against various
     attacks like RAM access via the memory bus and should make the
     radio signature of memory bus traffic harder to intercept (and
     decrypt) as well.

     This feature is activated via CONFIG_AMD_MEM_ENCRYPT=y - disabled
     by default.

     (By Tom Lendacky)

   - Enable PCID optimized TLB flushing on newer Intel CPUs: PCID is a
     hardware feature that attaches an address space tag to TLB entries
     and thus allows TLB flushing to be skipped in many cases, even if
     we switch mm's.

     (By Andy Lutomirski)

  All three of these features were in the works for a long time, and
  it is a coincidence of the three independent development paths that
  they are all enabled in v4.14 at once"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (65 commits)
  x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)
  x86/mm: Use pr_cont() in dump_pagetable()
  x86/mm: Fix SME encryption stack ptr handling
  kvm/x86: Avoid clearing the C-bit in rsvd_bits()
  x86/CPU: Align CR3 defines
  x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages
  acpi, x86/mm: Remove encryption mask from ACPI page protection type
  x86/mm, kexec: Fix memory corruption with SME on successive kexecs
  x86/mm/pkeys: Fix typo in Documentation/x86/protection-keys.txt
  x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y
  x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID
  x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y
  x86/mm: Allow userspace have mappings above 47-bit
  x86/mm: Prepare to expose larger address space to userspace
  x86/mpx: Do not allow MPX if we have mappings above 47-bit
  x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit()
  x86/xen: Redefine XEN_ELFNOTE_INIT_P2M using PUD_SIZE * PTRS_PER_PUD
  x86/mm/dump_pagetables: Fix printout of p4d level
  x86/mm/dump_pagetables: Generalize address normalization
  x86/boot: Fix memremap() related build failure
  ...
torvalds committed Sep 4, 2017
2 parents 5f82e71 + 9e52fc2 commit b1b6f83
Showing 120 changed files with 3,134 additions and 470 deletions.
13 changes: 13 additions & 0 deletions Documentation/admin-guide/kernel-parameters.txt
@@ -2233,6 +2233,17 @@
memory contents and reserves bad memory
regions that are detected.

mem_encrypt= [X86-64] AMD Secure Memory Encryption (SME) control
Valid arguments: on, off
Default (depends on kernel configuration option):
on (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y)
off (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=n)
mem_encrypt=on: Activate SME
mem_encrypt=off: Do not activate SME

Refer to Documentation/x86/amd-memory-encryption.txt
for details on when memory encryption can be activated.

mem_sleep_default= [SUSPEND] Default system suspend mode:
s2idle - Suspend-To-Idle
shallow - Power-On Suspend or equivalent (if supported)
@@ -2697,6 +2708,8 @@
nopat [X86] Disable PAT (page attribute table extension of
pagetables) support.

nopcid [X86-64] Disable the PCID cpu feature.

norandmaps Don't use address space randomization. Equivalent to
echo 0 > /proc/sys/kernel/randomize_va_space

68 changes: 68 additions & 0 deletions Documentation/x86/amd-memory-encryption.txt
@@ -0,0 +1,68 @@
Secure Memory Encryption (SME) is a feature found on AMD processors.

SME provides the ability to mark individual pages of memory as encrypted using
the standard x86 page tables. A page that is marked encrypted will be
automatically decrypted when read from DRAM and encrypted when written to
DRAM. SME can therefore be used to protect the contents of DRAM from physical
attacks on the system.

A page is encrypted when a page table entry has the encryption bit set (see
below on how to determine its position). The encryption bit can also be
specified in the cr3 register, allowing the PGD table to be encrypted. Each
successive level of page tables can also be encrypted by setting the encryption
bit in the page table entry that points to the next table. This allows the full
page table hierarchy to be encrypted. Note that setting the encryption bit in
cr3 does not by itself imply that the full hierarchy is encrypted. Each page
table entry in the hierarchy needs to have the encryption bit set to achieve
that. So, theoretically, you could have the encryption bit set in cr3 so that
the PGD is encrypted, but not set the encryption bit in the PGD entry for a
PUD, in which case the PUD pointed to by that entry is not encrypted.

Support for SME can be determined through the CPUID instruction. The CPUID
function 0x8000001f reports information related to SME:

0x8000001f[eax]:
Bit[0] indicates support for SME
0x8000001f[ebx]:
Bits[5:0] pagetable bit number used to activate memory
encryption
Bits[11:6] reduction in physical address space, in bits, when
memory encryption is enabled (this only affects
system physical addresses, not guest physical
addresses)
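
Editorial illustration (not part of the commit): the field layout above can be decoded with a few shifts and masks. This is a minimal user-space sketch; the struct and function names are hypothetical, and only the bit positions are taken from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of CPUID function 0x8000001f (names are hypothetical). */
struct sme_cpuid_info {
	bool     sme_supported;        /* EAX bit 0 */
	unsigned c_bit_position;       /* EBX bits 5:0 */
	unsigned phys_addr_reduction;  /* EBX bits 11:6 */
};

static struct sme_cpuid_info sme_decode_cpuid(uint32_t eax, uint32_t ebx)
{
	struct sme_cpuid_info info;

	info.sme_supported       = (eax & 1u) != 0;
	info.c_bit_position      = ebx & 0x3fu;
	info.phys_addr_reduction = (ebx >> 6) & 0x3fu;
	return info;
}

/* Page-table mask with only the encryption bit set, or 0 if SME is absent. */
static uint64_t sme_mask_from_cpuid(uint32_t eax, uint32_t ebx)
{
	struct sme_cpuid_info info = sme_decode_cpuid(eax, ebx);

	return info.sme_supported ? (UINT64_C(1) << info.c_bit_position) : 0;
}
```

On real hardware the two register values would come from executing CPUID, for example through the `__get_cpuid()` helper in the GCC/Clang `<cpuid.h>` header; the sketch takes them as parameters so the decoding can be checked on any machine.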

If support for SME is present, MSR 0xc0010010 (MSR_K8_SYSCFG) can be used to
determine if SME is enabled and/or to enable memory encryption:

0xc0010010:
Bit[23] 0 = memory encryption features are disabled
1 = memory encryption features are enabled

Linux relies on BIOS to set this bit if BIOS has determined that the reduction
in the physical address space as a result of enabling memory encryption (see
CPUID information above) will not conflict with the address space resource
requirements for the system. If this bit is not set upon Linux startup then
Linux itself will not set it and memory encryption will not be possible.

The state of SME in the Linux kernel can be documented as follows:
- Supported:
The CPU supports SME (determined through CPUID instruction).

- Enabled:
Supported and bit 23 of MSR_K8_SYSCFG is set.

- Active:
Supported, Enabled and the Linux kernel is actively applying
the encryption bit to page table entries (the SME mask in the
kernel is non-zero).
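
Editorial illustration (not part of the commit): the three states nest, so they can be modelled as a simple ordered check. The enum and function names below are hypothetical; the inputs stand for the CPUID support bit, the raw MSR_K8_SYSCFG value, and the kernel's SME mask.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYSCFG_MEM_ENCRYPT_BIT 23  /* bit 23 of MSR_K8_SYSCFG, per the text */

enum sme_state { SME_UNSUPPORTED, SME_SUPPORTED, SME_ENABLED, SME_ACTIVE };

static enum sme_state sme_classify(bool cpuid_sme, uint64_t syscfg,
				   uint64_t sme_mask)
{
	if (!cpuid_sme)
		return SME_UNSUPPORTED;
	if (!(syscfg & (UINT64_C(1) << SYSCFG_MEM_ENCRYPT_BIT)))
		return SME_SUPPORTED;   /* CPU can do it, BIOS did not enable it */
	if (sme_mask == 0)
		return SME_ENABLED;     /* enabled, but the kernel is not applying it */
	return SME_ACTIVE;
}
```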

SME can also be enabled and activated in the BIOS. If SME is enabled and
activated in the BIOS, then all memory accesses will be encrypted and it will
not be necessary to activate the Linux memory encryption support. If the BIOS
merely enables SME (sets bit 23 of the MSR_K8_SYSCFG), then Linux can activate
memory encryption by default (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y) or
by supplying mem_encrypt=on on the kernel command line. However, if BIOS does
not enable SME, then Linux will not be able to activate memory encryption, even
if configured to do so by default or the mem_encrypt=on command line parameter
is specified.
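
Editorial illustration (not part of the commit): the interaction between the BIOS setting, the Kconfig default and the mem_encrypt= parameter can be summarised as a small decision function. This is a sketch of the rules as described above, not the kernel's implementation; the function name and the handling of unrecognised values are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * bios_enabled:    bit 23 of MSR_K8_SYSCFG was set by BIOS
 * config_default:  models CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
 * mem_encrypt_arg: value of mem_encrypt= ("on" or "off"), NULL if absent
 */
static bool sme_linux_activates(bool bios_enabled, bool config_default,
				const char *mem_encrypt_arg)
{
	bool want = config_default;

	/* Without BIOS enablement Linux cannot activate SME at all. */
	if (!bios_enabled)
		return false;

	if (mem_encrypt_arg != NULL) {
		if (strcmp(mem_encrypt_arg, "on") == 0)
			want = true;
		else if (strcmp(mem_encrypt_arg, "off") == 0)
			want = false;
		/* Assumption: any other value leaves the default in place. */
	}
	return want;
}
```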
6 changes: 3 additions & 3 deletions Documentation/x86/protection-keys.txt
@@ -34,17 +34,17 @@ with a key. In this example WRPKRU is wrapped by a C function
called pkey_set().

int real_prot = PROT_READ|PROT_WRITE;
- pkey = pkey_alloc(0, PKEY_DENY_WRITE);
+ pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
... application runs here

Now, if the application needs to update the data at 'ptr', it can
gain access, do the update, then remove its write access:

- pkey_set(pkey, 0); // clear PKEY_DENY_WRITE
+ pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
*ptr = foo; // assign something
- pkey_set(pkey, PKEY_DENY_WRITE); // set PKEY_DENY_WRITE again
+ pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again

Now when it frees the memory, it will also free the pkey since it
is no longer in use:
64 changes: 64 additions & 0 deletions Documentation/x86/x86_64/5level-paging.txt
@@ -0,0 +1,64 @@
== Overview ==

Original x86-64 was limited by 4-level paging to 256 TiB of virtual address
space and 64 TiB of physical address space. We are already bumping into
this limit: some vendors offer servers with 64 TiB of memory today.

To overcome the limitation upcoming hardware will introduce support for
5-level paging. It is a straight-forward extension of the current page
table structure adding one more layer of translation.

It bumps the limits to 128 PiB of virtual address space and 4 PiB of
physical address space. This "ought to be enough for anybody" ©.
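
Editorial illustration (not part of the commit): the quoted limits follow from the address widths. With 4 KiB pages each paging level resolves 9 bits on top of the 12-bit page offset, giving 48 bits for four levels and 57 bits for five. The physical widths of 46 and 52 bits are inferred here from the 64 TiB and 4 PiB figures in the text; the helper names are hypothetical.

```c
#include <stdint.h>

#define TIB (UINT64_C(1) << 40)
#define PIB (UINT64_C(1) << 50)

/* Virtual address width for a given number of paging levels (4 KiB pages). */
static unsigned va_bits(unsigned levels)
{
	return 12 + 9 * levels;
}

/* Size in bytes of an address space that is `bits` wide (bits < 64). */
static uint64_t space_bytes(unsigned bits)
{
	return UINT64_C(1) << bits;
}
```

With these helpers, `space_bytes(va_bits(4))` is 256 TiB and `space_bytes(va_bits(5))` is 128 PiB, a 512-fold increase in virtual address space; the physical limit grows from `space_bytes(46)` to `space_bytes(52)`, a 64-fold increase.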

QEMU 2.9 and later support 5-level paging.

Virtual memory layout for 5-level paging is described in
Documentation/x86/x86_64/mm.txt

== Enabling 5-level paging ==

CONFIG_X86_5LEVEL=y enables the feature.

So far, a kernel compiled with the option enabled will be able to boot
only on machines that support the feature -- look for the 'la57' flag in
/proc/cpuinfo.

The plan is to implement boot-time switching between 4- and 5-level paging
in the future.

== User-space and large virtual address space ==

On x86, 5-level paging enables 56-bit userspace virtual address space.
Not all user space is ready to handle wide addresses. It's known that
at least some JIT compilers use higher bits in pointers to encode their
information. It collides with valid pointers with 5-level paging and
leads to crashes.

To mitigate this, we are not going to allocate virtual address space
above 47-bit by default.

But userspace can ask for allocation from full address space by
specifying hint address (with or without MAP_FIXED) above 47-bits.

If the hint address is set above 47-bit but MAP_FIXED is not specified, we
try to look for an unmapped area at the specified address. If it is already
occupied, we look for an unmapped area in the *full* address space, rather
than in the 47-bit window.

A high hint address would only affect the allocation in question, but not
any future mmap()s.

Specifying high hint address on older kernel or on machine without 5-level
paging support is safe. The hint will be ignored and kernel will fall back
to allocation from 47-bit address space.

This approach makes it easy to make an application's memory allocator aware
of the large address space without manually tracking allocated virtual
address space.
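
Editorial illustration (not part of the commit): opting in to the full address space only requires passing a hint above the 47-bit boundary to mmap(). A minimal sketch, assuming a 64-bit Linux system; the hint value of 1 << 48 and the helper names are arbitrary choices for the example.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Request anonymous memory with a hint above the 47-bit window. */
static void *map_with_high_hint(size_t len)
{
	void *hint = (void *)(uintptr_t)(UINT64_C(1) << 48);

	/* No MAP_FIXED: the kernel may place the mapping elsewhere. */
	return mmap(hint, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* True if the address lies at or above 2^47. */
static int is_above_47bit(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(UINT64_C(1) << 47);
}
```

On a machine without 5-level paging the call is still expected to succeed, with the returned address below the boundary, so a caller can use `is_above_47bit()` on the result to see which case applied.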

One important case we need to handle here is interaction with MPX.
MPX (without MAWA extension) cannot handle addresses above 47-bit, so we
need to make sure that MPX cannot be enabled if we already have a VMA above
the boundary, and forbid creating such VMAs once MPX is enabled.

2 changes: 0 additions & 2 deletions arch/ia64/include/asm/acpi.h
@@ -112,8 +112,6 @@ static inline void arch_acpi_set_pdc_bits(u32 *buf)
buf[2] |= ACPI_PDC_EST_CAPABILITY_SMP;
}

- #define acpi_unlazy_tlb(x)

#ifdef CONFIG_ACPI_NUMA
extern cpumask_t early_cpu_possible_map;
#define for_each_possible_early_cpu(cpu) \
4 changes: 2 additions & 2 deletions arch/ia64/kernel/efi.c
@@ -757,14 +757,14 @@ efi_memmap_intersects (unsigned long phys_addr, unsigned long size)
return 0;
}

- u32
+ int
efi_mem_type (unsigned long phys_addr)
{
efi_memory_desc_t *md = efi_memory_descriptor(phys_addr);

if (md)
return md->type;
- return 0;
+ return -EINVAL;
}

u64
49 changes: 49 additions & 0 deletions arch/x86/Kconfig
@@ -169,6 +169,7 @@ config X86
select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_NMI
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
select HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE if X86_64 && FRAME_POINTER_UNWINDER && STACK_VALIDATION
select HAVE_STACK_VALIDATION if X86_64
@@ -329,6 +330,7 @@ config FIX_EARLYCON_MEM

config PGTABLE_LEVELS
int
default 5 if X86_5LEVEL
default 4 if X86_64
default 3 if X86_PAE
default 2
@@ -1399,6 +1401,24 @@ config X86_PAE
has the cost of more pagetable lookup overhead, and also
consumes more pagetable space per process.

config X86_5LEVEL
bool "Enable 5-level page tables support"
depends on X86_64
---help---
5-level paging enables access to larger address space:
up to 128 PiB of virtual address space and 4 PiB of
physical address space.

It will be supported by future Intel CPUs.

Note: a kernel with this option enabled can only be booted
on machines that support the feature.

See Documentation/x86/x86_64/5level-paging.txt for more
information.

Say N if unsure.

config ARCH_PHYS_ADDR_T_64BIT
def_bool y
depends on X86_64 || X86_PAE
@@ -1416,6 +1436,35 @@ config X86_DIRECT_GBPAGES
supports them), so don't confuse the user by printing
that we have them enabled.

config ARCH_HAS_MEM_ENCRYPT
def_bool y

config AMD_MEM_ENCRYPT
bool "AMD Secure Memory Encryption (SME) support"
depends on X86_64 && CPU_SUP_AMD
---help---
Say yes to enable support for the encryption of system memory.
This requires an AMD processor that supports Secure Memory
Encryption (SME).

config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
bool "Activate AMD Secure Memory Encryption (SME) by default"
default y
depends on AMD_MEM_ENCRYPT
---help---
Say yes to have system memory encrypted by default if running on
an AMD processor that supports Secure Memory Encryption (SME).

If set to Y, then the encryption of system memory can be
deactivated with the mem_encrypt=off command line option.

If set to N, then the encryption of system memory can be
activated with the mem_encrypt=on command line option.

config ARCH_USE_MEMREMAP_PROT
def_bool y
depends on AMD_MEM_ENCRYPT

# Common NUMA Features
config NUMA
bool "Numa Memory Allocation and Scheduler Support"
7 changes: 7 additions & 0 deletions arch/x86/boot/compressed/pagetable.c
@@ -15,6 +15,13 @@
#define __pa(x) ((unsigned long)(x))
#define __va(x) ((void *)((unsigned long)(x)))

/*
* The pgtable.h and mm/ident_map.c includes make use of the SME related
* information which is not used in the compressed image support. Un-define
* the SME support to avoid any compile and link errors.
*/
#undef CONFIG_AMD_MEM_ENCRYPT

#include "misc.h"

/* These actually do the work of building the kernel identity maps. */
13 changes: 6 additions & 7 deletions arch/x86/include/asm/acpi.h
@@ -150,8 +150,6 @@ static inline void disable_acpi(void) { }
extern int x86_acpi_numa_init(void);
#endif /* CONFIG_ACPI_NUMA */

- #define acpi_unlazy_tlb(x) leave_mm(x)

#ifdef CONFIG_ACPI_APEI
static inline pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
{
@@ -162,12 +160,13 @@ static inline pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
* you call efi_mem_attributes() during boot and at runtime,
* you could theoretically see different attributes.
*
- * Since we are yet to see any x86 platforms that require
- * anything other than PAGE_KERNEL (some arm64 platforms
- * require the equivalent of PAGE_KERNEL_NOCACHE), return that
- * until we know differently.
+ * We are yet to see any x86 platforms that require anything
+ * other than PAGE_KERNEL (some ARM64 platforms require the
+ * equivalent of PAGE_KERNEL_NOCACHE). Additionally, if SME
+ * is active, the ACPI information will not be encrypted,
+ * so return PAGE_KERNEL_NOENC until we know differently.
*/
- return PAGE_KERNEL;
+ return PAGE_KERNEL_NOENC;
}
#endif

2 changes: 2 additions & 0 deletions arch/x86/include/asm/cmdline.h
@@ -2,5 +2,7 @@
#define _ASM_X86_CMDLINE_H

int cmdline_find_option_bool(const char *cmdline_ptr, const char *option);
int cmdline_find_option(const char *cmdline_ptr, const char *option,
char *buffer, int bufsize);

#endif /* _ASM_X86_CMDLINE_H */
1 change: 1 addition & 0 deletions arch/x86/include/asm/cpufeatures.h
@@ -196,6 +196,7 @@

#define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */
#define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
#define X86_FEATURE_SME ( 7*32+10) /* AMD Secure Memory Encryption */

#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */
#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */
4 changes: 3 additions & 1 deletion arch/x86/include/asm/disabled-features.h
@@ -21,11 +21,13 @@
# define DISABLE_K6_MTRR (1<<(X86_FEATURE_K6_MTRR & 31))
# define DISABLE_CYRIX_ARR (1<<(X86_FEATURE_CYRIX_ARR & 31))
# define DISABLE_CENTAUR_MCR (1<<(X86_FEATURE_CENTAUR_MCR & 31))
# define DISABLE_PCID 0
#else
# define DISABLE_VME 0
# define DISABLE_K6_MTRR 0
# define DISABLE_CYRIX_ARR 0
# define DISABLE_CENTAUR_MCR 0
# define DISABLE_PCID (1<<(X86_FEATURE_PCID & 31))
#endif /* CONFIG_X86_64 */

#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
@@ -49,7 +51,7 @@
#define DISABLED_MASK1 0
#define DISABLED_MASK2 0
#define DISABLED_MASK3 (DISABLE_CYRIX_ARR|DISABLE_CENTAUR_MCR|DISABLE_K6_MTRR)
- #define DISABLED_MASK4 0
+ #define DISABLED_MASK4 (DISABLE_PCID)
#define DISABLED_MASK5 0
#define DISABLED_MASK6 0
#define DISABLED_MASK7 0
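
Editorial illustration (not part of the commit): DISABLE_PCID lands in DISABLED_MASK4 because a feature number encodes both a 32-bit capability word and a bit within that word. X86_FEATURE_PCID is defined in cpufeatures.h as (4*32+17), so it lives in word 4 at bit 17. The two helper functions below are hypothetical names used only to show the arithmetic.

```c
#include <stdint.h>

/* As defined in arch/x86/include/asm/cpufeatures.h: word 4, bit 17. */
#define X86_FEATURE_PCID (4*32 + 17)

/* Index of the 32-bit capability word that holds the feature. */
static unsigned feature_word(unsigned feature)
{
	return feature >> 5;  /* feature / 32 */
}

/* Single-bit mask within that word, as used by the DISABLE_* macros. */
static uint32_t feature_mask(unsigned feature)
{
	return UINT32_C(1) << (feature & 31);
}
```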
5 changes: 3 additions & 2 deletions arch/x86/include/asm/dma-mapping.h
@@ -12,6 +12,7 @@
#include <asm/io.h>
#include <asm/swiotlb.h>
#include <linux/dma-contiguous.h>
#include <linux/mem_encrypt.h>

#ifdef CONFIG_ISA
# define ISA_DMA_BIT_MASK DMA_BIT_MASK(24)
@@ -57,12 +58,12 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)

static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
{
- return paddr;
+ return __sme_set(paddr);
}

static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
{
- return daddr;
+ return __sme_clr(daddr);
}
#endif /* CONFIG_X86_DMA_REMAP */
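
Editorial illustration (not part of the commit): the effect of the change to phys_to_dma() and dma_to_phys() can be modelled in user space, assuming that __sme_set() ORs the encryption mask into an address and __sme_clr() strips it again. The mask value below (bit 47) and the un-prefixed function names are hypothetical; in the kernel the mask is determined at boot and is zero when SME is not active.

```c
#include <stdint.h>

/* Hypothetical mask: encryption bit at position 47; 0 means SME inactive. */
static uint64_t sme_me_mask = UINT64_C(1) << 47;

/* Model of __sme_set(): tag an address with the encryption bit. */
static uint64_t sme_set(uint64_t addr)
{
	return addr | sme_me_mask;
}

/* Model of __sme_clr(): recover the plain address. */
static uint64_t sme_clr(uint64_t addr)
{
	return addr & ~sme_me_mask;
}
```

With a zero mask both helpers return their argument unchanged, which is why the same code path can serve systems where SME is inactive.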
