Merge tag 'x86_sgx_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SGX updates from Borislav Petkov:

 - Add support for handling hw errors in SGX pages: poisoning,
   recovering from poisoned memory and error injection into SGX pages

 - A bunch of changes to the SGX selftests to simplify them and allow
   testing of SGX features without the need for a whole SGX software stack

 - Add a sysfs attribute which is supposed to show the amount of SGX
   memory in a NUMA node, similar to what /proc/meminfo is to normal
   memory

 - The usual bunch of fixes and cleanups too

* tag 'x86_sgx_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/sgx: Fix NULL pointer dereference on non-SGX systems
  selftests/sgx: Fix corrupted cpuid macro invocation
  x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node
  x86/sgx: Fix minor documentation issues
  selftests/sgx: Add test for multiple TCS entry
  selftests/sgx: Enable multiple thread support
  selftests/sgx: Add page permission and exception test
  selftests/sgx: Rename test properties in preparation for more enclave tests
  selftests/sgx: Provide per-op parameter structs for the test enclave
  selftests/sgx: Add a new kselftest: Unclobbered_vdso_oversubscribed
  selftests/sgx: Move setup_test_encl() to each TEST_F()
  selftests/sgx: Encpsulate the test enclave creation
  selftests/sgx: Dump segments and /proc/self/maps only on failure
  selftests/sgx: Create a heap for the test enclave
  selftests/sgx: Make data measurement for an enclave segment optional
  selftests/sgx: Assign source for each segment
  selftests/sgx: Fix a benign linker warning
  x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Hook arch_memory_failure() into mainline code
  ...
torvalds committed Jan 10, 2022
2 parents d3c20bf + 2056e29 commit bfed6ef
Showing 23 changed files with 698 additions and 103 deletions.
6 changes: 6 additions & 0 deletions Documentation/ABI/stable/sysfs-devices-node
@@ -176,3 +176,9 @@ Contact: Keith Busch <[email protected]>
Description:
The cache write policy: 0 for write-back, 1 for write-through,
other or unknown.

What: /sys/devices/system/node/nodeX/x86/sgx_total_bytes
Date: November 2021
Contact: Jarkko Sakkinen <[email protected]>
Description:
The total amount of SGX physical memory in bytes.
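
As a usage illustration (not part of this commit), a minimal C sketch that
reads the new attribute; node0 is an assumption, and the x86/ group is
invisible until SGX has initialized, so open may fail:

#include <stdio.h>

int main(void)
{
	unsigned long long bytes;
	FILE *f = fopen("/sys/devices/system/node/node0/x86/sgx_total_bytes", "r");

	if (!f) {
		perror("sgx_total_bytes");	/* missing file: no SGX, or no such node */
		return 1;
	}
	if (fscanf(f, "%llu", &bytes) != 1) {
		fclose(f);
		return 1;
	}
	fclose(f);
	printf("node0 SGX EPC: %llu bytes\n", bytes);
	return 0;
}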
19 changes: 19 additions & 0 deletions Documentation/firmware-guide/acpi/apei/einj.rst
@@ -181,5 +181,24 @@ You should see something like this in dmesg::
[22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
[22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)

Special notes for injection into SGX enclaves:

There may be a separate BIOS setup option to enable SGX injection.

The injection process consists of setting some special memory controller
trigger that will inject the error on the next write to the target
address. But the h/w prevents any software outside of an SGX enclave
from accessing enclave pages (even BIOS SMM mode).

The following sequence can be used (a sketch of steps 4-7 follows the list):
1) Determine the physical address of the enclave page
2) Use "notrigger=1" mode to inject (this will set up
   the injection address, but will not actually inject)
3) Enter the enclave
4) Store data to the virtual address matching the physical address from step 1
5) Execute CLFLUSH for that virtual address
6) Spin delay for 250ms
7) Read from the virtual address. This will trigger the error
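
A minimal sketch of steps 4-7 (an assumption-laden illustration, not code from
this commit): it must already be executing inside the enclave (step 3), va is
the enclave virtual address backed by the physical address armed in step 2,
and the spin count for the ~250ms delay is a placeholder that needs per-CPU
calibration:

/* Runs inside the enclave; no syscalls or timers are available here. */
static void trigger_injected_error(volatile unsigned char *va)
{
	unsigned long i;

	*va = 0x5a;					/* step 4: store to the armed address */
	asm volatile("clflush %0" : "+m" (*va));	/* step 5: flush the cache line */
	for (i = 0; i < 50000000UL; i++)		/* step 6: crude spin delay, calibrate for ~250ms */
		asm volatile("pause");
	(void)*va;					/* step 7: read back, consuming the poison */
}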

For more information about EINJ, please refer to ACPI specification
version 4.0, section 17.5 and ACPI 5.0, section 18.6.
14 changes: 7 additions & 7 deletions Documentation/x86/sgx.rst
@@ -10,7 +10,7 @@ Overview
Software Guard eXtensions (SGX) hardware enables user space applications
to set aside private memory regions of code and data:

-* Privileged (ring-0) ENCLS functions orchestrate the construction of the.
+* Privileged (ring-0) ENCLS functions orchestrate the construction of the
regions.
* Unprivileged (ring-3) ENCLU functions allow an application to enter and
execute inside the regions.
@@ -91,7 +91,7 @@ In addition to the traditional compiler and linker build process, SGX has a
separate enclave “build” process. Enclaves must be built before they can be
executed (entered). The first step in building an enclave is opening the
**/dev/sgx_enclave** device. Since enclave memory is protected from direct
-access, special privileged instructions are Then used to copy data into enclave
+access, special privileged instructions are then used to copy data into enclave
pages and establish enclave page permissions.
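
As an illustration of this first build step, a hedged user-space sketch using
the driver's uapi (the SECS contents are deliberately elided: a real SECS must
be populated per the SDM - size, base, ssa_frame_size, attributes - and the
zeroed page used here would be rejected by the driver):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <asm/sgx.h>	/* SGX_IOC_ENCLAVE_CREATE, struct sgx_enclave_create */

int main(void)
{
	static uint8_t secs[4096] __attribute__((aligned(4096)));	/* placeholder SECS page */
	struct sgx_enclave_create create = { .src = (uintptr_t)secs };
	int fd = open("/dev/sgx_enclave", O_RDWR);

	if (fd < 0) {
		perror("/dev/sgx_enclave");
		return 1;
	}
	if (ioctl(fd, SGX_IOC_ENCLAVE_CREATE, &create))
		perror("SGX_IOC_ENCLAVE_CREATE");	/* expected with an empty SECS */
	close(fd);
	return 0;
}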

.. kernel-doc:: arch/x86/kernel/cpu/sgx/ioctl.c
@@ -126,13 +126,13 @@ the need to juggle signal handlers.
ksgxd
=====

-SGX support includes a kernel thread called *ksgxwapd*.
+SGX support includes a kernel thread called *ksgxd*.

EPC sanitization
----------------

ksgxd is started when SGX initializes. Enclave memory is typically ready
-For use when the processor powers on or resets. However, if SGX has been in
+for use when the processor powers on or resets. However, if SGX has been in
use since the reset, enclave pages may be in an inconsistent state. This might
occur after a crash and kexec() cycle, for instance. At boot, ksgxd
reinitializes all enclave pages so that they can be allocated and re-used.
@@ -147,7 +147,7 @@ Page reclaimer

Similar to the core kswapd, ksgxd is responsible for managing the
overcommitment of enclave memory. If the system runs out of enclave memory,
-*ksgxwapd* “swaps” enclave memory to normal memory.
+*ksgxd* “swaps” enclave memory to normal memory.

Launch Control
==============
@@ -156,7 +156,7 @@ SGX provides a launch control mechanism. After all enclave pages have been
copied, the kernel executes the EINIT function, which initializes the enclave. Only
after this can the CPU execute inside the enclave.

-ENIT function takes an RSA-3072 signature of the enclave measurement. The function
+EINIT function takes an RSA-3072 signature of the enclave measurement. The function
checks that the measurement is correct and signature is signed with the key
hashed to the four **IA32_SGXLEPUBKEYHASH{0, 1, 2, 3}** MSRs representing the
SHA256 of a public key.
@@ -184,7 +184,7 @@ CPUs starting from Icelake use Total Memory Encryption (TME) in the place of
MEE. TME-based SGX implementations do not have an integrity Merkle tree, which
means integrity and replay-attacks are not mitigated. But, it includes
additional changes to prevent cipher text from being returned and SW memory
-aliases from being Created.
+aliases from being created.

DMA to enclave memory is blocked by range registers on both MEE and TME systems
(SDM section 41.10).
4 changes: 4 additions & 0 deletions arch/Kconfig
@@ -1312,6 +1312,10 @@ config ARCH_HAS_PARANOID_L1D_FLUSH
config DYNAMIC_SIGFRAME
bool

# Select if the arch has a named attribute group bound to NUMA device nodes.
config HAVE_ARCH_NODE_DEV_GROUP
bool

source "kernel/gcov/Kconfig"

source "scripts/gcc-plugins/Kconfig"
2 changes: 2 additions & 0 deletions arch/x86/Kconfig
@@ -269,6 +269,7 @@ config X86
select HAVE_ARCH_KCSAN if X86_64
select X86_FEATURE_NAMES if PROC_FS
select PROC_PID_ARCH_STATUS if PROC_FS
select HAVE_ARCH_NODE_DEV_GROUP if X86_SGX
imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI

config INSTRUCTION_DECODER
@@ -1921,6 +1922,7 @@ config X86_SGX
select SRCU
select MMU_NOTIFIER
select NUMA_KEEP_MEMINFO if NUMA
select XARRAY_MULTI
help
Intel(R) Software Guard eXtensions (SGX) is a set of CPU instructions
that can be used by applications to set aside private regions of code
8 changes: 8 additions & 0 deletions arch/x86/include/asm/processor.h
@@ -855,4 +855,12 @@ enum mds_mitigations {
MDS_MITIGATION_VMWERV,
};

#ifdef CONFIG_X86_SGX
int arch_memory_failure(unsigned long pfn, int flags);
#define arch_memory_failure arch_memory_failure

bool arch_is_platform_page(u64 paddr);
#define arch_is_platform_page arch_is_platform_page
#endif

#endif /* _ASM_X86_PROCESSOR_H */
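
For context, these #defines override generic fallbacks added to
include/linux/mm.h by the "x86/sgx: Hook arch_memory_failure() into mainline
code" commit in this series; paraphrased (not the verbatim hunk, which this
view truncates), they look roughly like:

#ifndef arch_memory_failure
static inline int arch_memory_failure(unsigned long pfn, int flags)
{
	return -ENXIO;	/* no arch-specific handler claimed this pfn */
}
#endif

#ifndef arch_is_platform_page
static inline bool arch_is_platform_page(u64 paddr)
{
	return false;	/* by default, no pages outside the struct page map */
}
#endif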
4 changes: 4 additions & 0 deletions arch/x86/include/asm/set_memory.h
@@ -2,6 +2,7 @@
#ifndef _ASM_X86_SET_MEMORY_H
#define _ASM_X86_SET_MEMORY_H

#include <linux/mm.h>
#include <asm/page.h>
#include <asm-generic/set_memory.h>

@@ -99,6 +100,9 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
unsigned long decoy_addr;
int rc;

/* SGX pages are not in the 1:1 map */
if (arch_is_platform_page(pfn << PAGE_SHIFT))
return 0;
/*
* We would like to just call:
* set_memory_XX((unsigned long)pfn_to_kaddr(pfn), 1);
162 changes: 161 additions & 1 deletion arch/x86/kernel/cpu/sgx/main.c
@@ -6,11 +6,13 @@
#include <linux/highmem.h>
#include <linux/kthread.h>
#include <linux/miscdevice.h>
#include <linux/node.h>
#include <linux/pagemap.h>
#include <linux/ratelimit.h>
#include <linux/sched/mm.h>
#include <linux/sched/signal.h>
#include <linux/slab.h>
#include <linux/sysfs.h>
#include <asm/sgx.h>
#include "driver.h"
#include "encl.h"
@@ -20,6 +22,7 @@ struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
static int sgx_nr_epc_sections;
static struct task_struct *ksgxd_tsk;
static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
static DEFINE_XARRAY(sgx_epc_address_space);

/*
* These variables are part of the state of the reclaimer, and must be accessed
@@ -60,6 +63,24 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)

page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);

/*
* Checking page->poison without holding the node->lock
* is racy, but losing the race (i.e. poison is set just
* after the check) just means __eremove() will be uselessly
* called for a page that sgx_free_epc_page() will put onto
* the node->sgx_poison_page_list later.
*/
if (page->poison) {
struct sgx_epc_section *section = &sgx_epc_sections[page->section];
struct sgx_numa_node *node = section->node;

spin_lock(&node->lock);
list_move(&page->list, &node->sgx_poison_page_list);
spin_unlock(&node->lock);

continue;
}

ret = __eremove(sgx_get_epc_virt_addr(page));
if (!ret) {
/*
@@ -471,6 +492,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)

page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
list_del_init(&page->list);
page->flags = 0;

spin_unlock(&node->lock);
atomic_long_dec(&sgx_nr_free_pages);
@@ -624,7 +646,12 @@ void sgx_free_epc_page(struct sgx_epc_page *page)

spin_lock(&node->lock);

list_add_tail(&page->list, &node->free_page_list);
page->owner = NULL;
if (page->poison)
list_add(&page->list, &node->sgx_poison_page_list);
else
list_add_tail(&page->list, &node->free_page_list);
page->flags = SGX_EPC_PAGE_IS_FREE;

spin_unlock(&node->lock);
atomic_long_inc(&sgx_nr_free_pages);
@@ -648,17 +675,102 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
}

section->phys_addr = phys_addr;
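/*
 * Register this section in the EPC address space XArray: one
 * multi-index entry (hence the XARRAY_MULTI select added in Kconfig)
 * covers every address in [phys_addr, phys_addr + size - 1], so
 * arch_is_platform_page() can test membership with a single xa_load()
 * of any physical address.
 */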
xa_store_range(&sgx_epc_address_space, section->phys_addr,
phys_addr + size - 1, section, GFP_KERNEL);

for (i = 0; i < nr_pages; i++) {
section->pages[i].section = index;
section->pages[i].flags = 0;
section->pages[i].owner = NULL;
section->pages[i].poison = 0;
list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
}

return true;
}

bool arch_is_platform_page(u64 paddr)
{
return !!xa_load(&sgx_epc_address_space, paddr);
}
EXPORT_SYMBOL_GPL(arch_is_platform_page);

static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
{
struct sgx_epc_section *section;

section = xa_load(&sgx_epc_address_space, paddr);
if (!section)
return NULL;

return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
}

/*
* Called in process context to handle a hardware reported
* error in an SGX EPC page.
* If the MF_ACTION_REQUIRED bit is set in flags, then the
* context is the task that consumed the poison data. Otherwise
* this is called from a kernel thread unrelated to the page.
*/
int arch_memory_failure(unsigned long pfn, int flags)
{
struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
struct sgx_epc_section *section;
struct sgx_numa_node *node;

/*
* mm/memory-failure.c calls this routine for all errors
* where there isn't a "struct page" for the address. But that
* includes other address ranges besides SGX.
*/
if (!page)
return -ENXIO;

/*
* If poison was consumed synchronously, send a SIGBUS to
* the task. Hardware has already exited the SGX enclave and
* will not allow re-entry to an enclave that has a memory
* error. The signal may help the task understand why the
* enclave is broken.
*/
if (flags & MF_ACTION_REQUIRED)
force_sig(SIGBUS);

section = &sgx_epc_sections[page->section];
node = section->node;

spin_lock(&node->lock);

/* Already poisoned? Nothing more to do */
if (page->poison)
goto out;

page->poison = 1;

/*
* If the page is on a free list, move it to the per-node
* poison page list.
*/
if (page->flags & SGX_EPC_PAGE_IS_FREE) {
list_move(&page->list, &node->sgx_poison_page_list);
goto out;
}

/*
* TBD: Add additional plumbing to enable pre-emptive
* action for asynchronous poison notification. Until
* then just hope that the poison:
* a) is not accessed - sgx_free_epc_page() will deal with it
* when the user gives it back
* b) results in a recoverable machine check rather than
* a fatal one
*/
out:
spin_unlock(&node->lock);
return 0;
}

/**
* A section metric is concatenated in a way that @low bits 12-31 define the
* bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
@@ -670,6 +782,48 @@ static inline u64 __init sgx_calc_section_metric(u64 low, u64 high)
((high & GENMASK_ULL(19, 0)) << 32);
}

#ifdef CONFIG_NUMA
static ssize_t sgx_total_bytes_show(struct device *dev, struct device_attribute *attr, char *buf)
{
return sysfs_emit(buf, "%lu\n", sgx_numa_nodes[dev->id].size);
}
static DEVICE_ATTR_RO(sgx_total_bytes);

static umode_t arch_node_attr_is_visible(struct kobject *kobj,
struct attribute *attr, int idx)
{
/* Make all x86/ attributes invisible when SGX is not initialized: */
if (nodes_empty(sgx_numa_mask))
return 0;

return attr->mode;
}

static struct attribute *arch_node_dev_attrs[] = {
&dev_attr_sgx_total_bytes.attr,
NULL,
};

const struct attribute_group arch_node_dev_group = {
.name = "x86",
.attrs = arch_node_dev_attrs,
.is_visible = arch_node_attr_is_visible,
};

static void __init arch_update_sysfs_visibility(int nid)
{
struct node *node = node_devices[nid];
int ret;

ret = sysfs_update_group(&node->dev.kobj, &arch_node_dev_group);

if (ret)
pr_err("sysfs update failed (%d), files may be invisible", ret);
}
#else /* !CONFIG_NUMA */
static void __init arch_update_sysfs_visibility(int nid) {}
#endif

static bool __init sgx_page_cache_init(void)
{
u32 eax, ebx, ecx, edx, type;
@@ -713,10 +867,16 @@ static bool __init sgx_page_cache_init(void)
if (!node_isset(nid, sgx_numa_mask)) {
spin_lock_init(&sgx_numa_nodes[nid].lock);
INIT_LIST_HEAD(&sgx_numa_nodes[nid].free_page_list);
INIT_LIST_HEAD(&sgx_numa_nodes[nid].sgx_poison_page_list);
node_set(nid, sgx_numa_mask);
sgx_numa_nodes[nid].size = 0;

/* Make SGX-specific node sysfs files visible: */
arch_update_sysfs_visibility(nid);
}

sgx_epc_sections[i].node = &sgx_numa_nodes[nid];
sgx_numa_nodes[nid].size += size;

sgx_nr_epc_sections++;
}
