Merge tag 'x86-cpu-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cache flush updates from Thomas Gleixner:
 "A reworked version of the opt-in L1D flush mechanism.

  This is a stop gap for potential future speculation related hardware
  vulnerabilities and a mechanism for truly security paranoid
  applications.

  It allows a task to request that the L1D cache is flushed when the
  kernel switches to a different mm. This can be requested via prctl().

  Changes vs the previous versions:

   - Get rid of the software flush fallback

   - Make the handling consistent with other mitigations

   - Kill the task when it ends up on a SMT enabled core which defeats
     the purpose of L1D flushing obviously"

* tag 'x86-cpu-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation: Add L1D flushing Documentation
  x86, prctl: Hook L1D flushing in via prctl
  x86/mm: Prepare for opt-in based L1D flush in switch_mm()
  x86/process: Make room for TIF_SPEC_L1D_FLUSH
  sched: Add task_work callback for paranoid L1D flush
  x86/mm: Refactor cond_ibpb() to support other use cases
  x86/smp: Add a per-cpu view of SMT state
torvalds committed Aug 30, 2021
2 parents 7d6e3fa + b7fe54f commit 0a096f2
Showing 15 changed files with 281 additions and 28 deletions.
1 change: 1 addition & 0 deletions Documentation/admin-guide/hw-vuln/index.rst
@@ -16,3 +16,4 @@ are configurable at compile, boot or run time.
multihit.rst
special-register-buffer-data-sampling.rst
core-scheduling.rst
l1d_flush.rst
69 changes: 69 additions & 0 deletions Documentation/admin-guide/hw-vuln/l1d_flush.rst
@@ -0,0 +1,69 @@
L1D Flushing
============

With an increasing number of vulnerabilities being reported around data
leaks from the Level 1 Data cache (L1D) the kernel provides an opt-in
mechanism to flush the L1D cache on context switch.

This mechanism can be used to address e.g. CVE-2020-0550. For applications
that opt in, it protects against vulnerabilities related to data leaks
(snooping) from the L1D cache.


Related CVEs
------------
The following CVEs can be addressed by this mechanism:

============= ======================== ==================
CVE-2020-0550 Improper Data Forwarding OS related aspects
============= ======================== ==================

Usage Guidelines
----------------

Please see document: :ref:`Documentation/userspace-api/spec_ctrl.rst
<set_spec_ctrl>` for details.

**NOTE**: The feature is disabled by default; applications need to
specifically opt in to enable it.

Mitigation
----------

When L1D flushing is enabled for a task (via PR_SET_SPECULATION_CTRL with
PR_SPEC_L1D_FLUSH), a flush of the L1D cache is performed when the task is
scheduled out and the incoming task belongs to a different process and
therefore to a different address space.

If the underlying CPU supports L1D flushing in hardware, the hardware
mechanism is used; a software fallback for the mitigation is not supported.

Mitigation control on the kernel command line
---------------------------------------------

The kernel command line allows control of the L1D flush mitigation at boot
time with the option "l1d_flush=". The valid arguments for this option are:

============ =============================================================
on Enables the prctl interface; applications trying to use
the prctl() will fail with an error if l1d_flush is not
enabled
============ =============================================================

By default the mechanism is disabled.

Limitations
-----------

The mechanism does not mitigate L1D data leaks between tasks belonging to
different processes which are concurrently executing on sibling threads of
a physical CPU core when SMT is enabled on the system.

This can be addressed by controlled placement of processes on physical CPU
cores or by disabling SMT. See the relevant chapter in the L1TF mitigation
document: :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <smt_control>`.

**NOTE**: The opt-in of a task for L1D flushing works only when the task's
affinity is limited to cores running in non-SMT mode. If a task which
requested L1D flushing is scheduled on an SMT-enabled core, the kernel sends
a SIGBUS to the task.
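
For illustration, a minimal userspace sketch of the opt-in flow described above: pin the task to a single CPU, check via sysfs that this CPU has no SMT sibling, then request L1D flushing through the speculation-control prctl(). The chosen CPU number, the sibling-list heuristic and the fallback #define are assumptions for the example; PR_SPEC_L1D_FLUSH is only available with uapi headers from a kernel that carries this series.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/prctl.h>
    #include <linux/prctl.h>

    #ifndef PR_SPEC_L1D_FLUSH
    #define PR_SPEC_L1D_FLUSH 2    /* assumed fallback for older uapi headers */
    #endif

    /* Heuristic: a CPU runs in non-SMT mode if its sibling list names only itself. */
    static int cpu_has_no_smt_sibling(int cpu)
    {
        char path[128], buf[64];
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list", cpu);
        f = fopen(path, "r");
        if (!f)
            return 0;
        if (!fgets(buf, sizeof(buf), f)) {
            fclose(f);
            return 0;
        }
        fclose(f);
        /* A lone CPU reads as e.g. "3"; siblings read as "3,7" or "3-4". */
        return !strchr(buf, ',') && !strchr(buf, '-');
    }

    int main(void)
    {
        int cpu = 3;    /* example CPU, assumed to be placed suitably by the admin */
        cpu_set_t set;

        if (!cpu_has_no_smt_sibling(cpu)) {
            fprintf(stderr, "cpu%d has an SMT sibling, not opting in\n", cpu);
            return 1;
        }

        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        if (sched_setaffinity(0, sizeof(set), &set))
            perror("sched_setaffinity");

        /* Fails with EPERM unless the kernel was booted with l1d_flush=on. */
        if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH, PR_SPEC_ENABLE, 0, 0))
            perror("prctl(PR_SPEC_L1D_FLUSH)");

        return 0;
    }

If the affinity mask later grows to include an SMT-enabled core and the task runs there, the task still receives SIGBUS as described in the note above.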
17 changes: 17 additions & 0 deletions Documentation/admin-guide/kernel-parameters.txt
@@ -2421,6 +2421,23 @@
feature (tagged TLBs) on capable Intel chips.
Default is 1 (enabled)

l1d_flush= [X86,INTEL]
Control mitigation for L1D based snooping vulnerability.

Certain CPUs are vulnerable to an exploit against CPU
internal buffers which can forward information to a
disclosure gadget under certain conditions.

In vulnerable processors, the speculatively
forwarded data can be used in a cache side channel
attack to access data to which the attacker does
not have direct access.

This parameter controls the mitigation. The
options are:

on - enable the interface for the mitigation

l1tf= [X86] Control mitigation of the L1TF vulnerability on
affected CPUs

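As a usage sketch (assuming a GRUB-style boot loader; the configuration file and the command to regenerate it differ per distribution), enabling the opt-in at boot means adding the option to the kernel command line:

    # /etc/default/grub
    GRUB_CMDLINE_LINUX="... l1d_flush=on"

    # verify after regenerating the boot configuration and rebooting:
    grep -o 'l1d_flush=on' /proc/cmdline

Without this option the mitigation stays disabled and the prctl() opt-in returns an error.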
8 changes: 8 additions & 0 deletions Documentation/userspace-api/spec_ctrl.rst
@@ -106,3 +106,11 @@ Speculation misfeature controls
* prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0);
* prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0);
* prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0);

- PR_SPEC_L1D_FLUSH: Flush L1D Cache on context switch out of the task
(works only when tasks run on non-SMT cores)

Invocations:
* prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH, 0, 0, 0);
* prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH, PR_SPEC_ENABLE, 0, 0);
* prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH, PR_SPEC_DISABLE, 0, 0);
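
A rough sketch of how these invocations compose in a program (the fallback #define for PR_SPEC_L1D_FLUSH is an assumption for headers that predate this series):

    #include <stdio.h>
    #include <sys/prctl.h>
    #include <linux/prctl.h>

    #ifndef PR_SPEC_L1D_FLUSH
    #define PR_SPEC_L1D_FLUSH 2    /* assumed fallback for older uapi headers */
    #endif

    int main(void)
    {
        long state;

        /* Opt in; this fails unless the kernel was booted with l1d_flush=on. */
        if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH, PR_SPEC_ENABLE, 0, 0))
            perror("PR_SPEC_ENABLE");

        /*
         * Query the state: PR_SPEC_PRCTL | PR_SPEC_ENABLE when active,
         * PR_SPEC_FORCE_DISABLE when the mitigation is unavailable.
         */
        state = prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH, 0, 0, 0);
        if (state < 0)
            perror("PR_GET_SPECULATION_CTRL");
        else if (state & PR_SPEC_ENABLE)
            printf("L1D flush on mm switch: enabled\n");
        else
            printf("L1D flush on mm switch: disabled (0x%lx)\n", state);

        /* Opt back out. */
        prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH, PR_SPEC_DISABLE, 0, 0);
        return 0;
    }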
3 changes: 3 additions & 0 deletions arch/Kconfig
@@ -1282,6 +1282,9 @@ config ARCH_SPLIT_ARG64
config ARCH_HAS_ELFCORE_COMPAT
bool

config ARCH_HAS_PARANOID_L1D_FLUSH
bool

source "kernel/gcov/Kconfig"

source "scripts/gcc-plugins/Kconfig"
1 change: 1 addition & 0 deletions arch/x86/Kconfig
@@ -119,6 +119,7 @@ config X86
select ARCH_WANT_HUGE_PMD_SHARE
select ARCH_WANT_LD_ORPHAN_WARN
select ARCH_WANTS_THP_SWAP if X86_64
select ARCH_HAS_PARANOID_L1D_FLUSH
select BUILDTIME_TABLE_SORT
select CLKEVT_I8253
select CLOCKSOURCE_VALIDATE_LAST_CYCLE
2 changes: 2 additions & 0 deletions arch/x86/include/asm/nospec-branch.h
@@ -252,6 +252,8 @@ DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
DECLARE_STATIC_KEY_FALSE(mds_user_clear);
DECLARE_STATIC_KEY_FALSE(mds_idle_clear);

DECLARE_STATIC_KEY_FALSE(switch_mm_cond_l1d_flush);

#include <asm/segment.h>

/**
2 changes: 2 additions & 0 deletions arch/x86/include/asm/processor.h
@@ -136,6 +136,8 @@ struct cpuinfo_x86 {
u16 logical_die_id;
/* Index into per_cpu list: */
u16 cpu_index;
/* Is SMT active on this core? */
bool smt_active;
u32 microcode;
/* Address space bits used by the cache internally */
u8 x86_cache_bits;
6 changes: 4 additions & 2 deletions arch/x86/include/asm/thread_info.h
@@ -81,7 +81,7 @@ struct thread_info {
#define TIF_SINGLESTEP 4 /* reenable singlestep on user return*/
#define TIF_SSBD 5 /* Speculative store bypass disable */
#define TIF_SPEC_IB 9 /* Indirect branch speculation mitigation */
#define TIF_SPEC_FORCE_UPDATE 10 /* Force speculation MSR update in context switch */
#define TIF_SPEC_L1D_FLUSH 10 /* Flush L1D on mm switches (processes) */
#define TIF_USER_RETURN_NOTIFY 11 /* notify kernel of userspace return */
#define TIF_UPROBE 12 /* breakpointed or singlestepping */
#define TIF_PATCH_PENDING 13 /* pending live patching update */
@@ -93,6 +93,7 @@ struct thread_info {
#define TIF_MEMDIE 20 /* is terminating due to OOM killer */
#define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
#define TIF_IO_BITMAP 22 /* uses I/O bitmap */
#define TIF_SPEC_FORCE_UPDATE 23 /* Force speculation MSR update in context switch */
#define TIF_FORCED_TF 24 /* true if TF in eflags artificially */
#define TIF_BLOCKSTEP 25 /* set when we want DEBUGCTLMSR_BTF */
#define TIF_LAZY_MMU_UPDATES 27 /* task is updating the mmu lazily */
@@ -104,7 +105,7 @@ struct thread_info {
#define _TIF_SINGLESTEP (1 << TIF_SINGLESTEP)
#define _TIF_SSBD (1 << TIF_SSBD)
#define _TIF_SPEC_IB (1 << TIF_SPEC_IB)
#define _TIF_SPEC_FORCE_UPDATE (1 << TIF_SPEC_FORCE_UPDATE)
#define _TIF_SPEC_L1D_FLUSH (1 << TIF_SPEC_L1D_FLUSH)
#define _TIF_USER_RETURN_NOTIFY (1 << TIF_USER_RETURN_NOTIFY)
#define _TIF_UPROBE (1 << TIF_UPROBE)
#define _TIF_PATCH_PENDING (1 << TIF_PATCH_PENDING)
@@ -115,6 +116,7 @@ struct thread_info {
#define _TIF_SLD (1 << TIF_SLD)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
#define _TIF_SPEC_FORCE_UPDATE (1 << TIF_SPEC_FORCE_UPDATE)
#define _TIF_FORCED_TF (1 << TIF_FORCED_TF)
#define _TIF_BLOCKSTEP (1 << TIF_BLOCKSTEP)
#define _TIF_LAZY_MMU_UPDATES (1 << TIF_LAZY_MMU_UPDATES)
2 changes: 1 addition & 1 deletion arch/x86/include/asm/tlbflush.h
@@ -83,7 +83,7 @@ struct tlb_state {
/* Last user mm for optimizing IBPB */
union {
struct mm_struct *last_user_mm;
unsigned long last_user_mm_ibpb;
unsigned long last_user_mm_spec;
};

u16 loaded_mm_asid;
70 changes: 70 additions & 0 deletions arch/x86/kernel/cpu/bugs.c
@@ -43,6 +43,7 @@ static void __init mds_select_mitigation(void);
static void __init mds_print_mitigation(void);
static void __init taa_select_mitigation(void);
static void __init srbds_select_mitigation(void);
static void __init l1d_flush_select_mitigation(void);

/* The base value of the SPEC_CTRL MSR that always has to be preserved. */
u64 x86_spec_ctrl_base;
@@ -76,6 +77,13 @@ EXPORT_SYMBOL_GPL(mds_user_clear);
DEFINE_STATIC_KEY_FALSE(mds_idle_clear);
EXPORT_SYMBOL_GPL(mds_idle_clear);

/*
* Controls whether l1d flush based mitigations are enabled,
* based on hw features and admin setting via boot parameter
* defaults to false
*/
DEFINE_STATIC_KEY_FALSE(switch_mm_cond_l1d_flush);

void __init check_bugs(void)
{
identify_boot_cpu();
@@ -111,6 +119,7 @@ void __init check_bugs(void)
mds_select_mitigation();
taa_select_mitigation();
srbds_select_mitigation();
l1d_flush_select_mitigation();

/*
* As MDS and TAA mitigations are inter-related, print MDS
@@ -491,6 +500,34 @@ static int __init srbds_parse_cmdline(char *str)
}
early_param("srbds", srbds_parse_cmdline);

#undef pr_fmt
#define pr_fmt(fmt) "L1D Flush : " fmt

enum l1d_flush_mitigations {
L1D_FLUSH_OFF = 0,
L1D_FLUSH_ON,
};

static enum l1d_flush_mitigations l1d_flush_mitigation __initdata = L1D_FLUSH_OFF;

static void __init l1d_flush_select_mitigation(void)
{
if (!l1d_flush_mitigation || !boot_cpu_has(X86_FEATURE_FLUSH_L1D))
return;

static_branch_enable(&switch_mm_cond_l1d_flush);
pr_info("Conditional flush on switch_mm() enabled\n");
}

static int __init l1d_flush_parse_cmdline(char *str)
{
if (!strcmp(str, "on"))
l1d_flush_mitigation = L1D_FLUSH_ON;

return 0;
}
early_param("l1d_flush", l1d_flush_parse_cmdline);

#undef pr_fmt
#define pr_fmt(fmt) "Spectre V1 : " fmt

@@ -1215,6 +1252,24 @@ static void task_update_spec_tif(struct task_struct *tsk)
speculation_ctrl_update_current();
}

static int l1d_flush_prctl_set(struct task_struct *task, unsigned long ctrl)
{

if (!static_branch_unlikely(&switch_mm_cond_l1d_flush))
return -EPERM;

switch (ctrl) {
case PR_SPEC_ENABLE:
set_ti_thread_flag(&task->thread_info, TIF_SPEC_L1D_FLUSH);
return 0;
case PR_SPEC_DISABLE:
clear_ti_thread_flag(&task->thread_info, TIF_SPEC_L1D_FLUSH);
return 0;
default:
return -ERANGE;
}
}

static int ssb_prctl_set(struct task_struct *task, unsigned long ctrl)
{
if (ssb_mode != SPEC_STORE_BYPASS_PRCTL &&
@@ -1324,6 +1379,8 @@ int arch_prctl_spec_ctrl_set(struct task_struct *task, unsigned long which,
return ssb_prctl_set(task, ctrl);
case PR_SPEC_INDIRECT_BRANCH:
return ib_prctl_set(task, ctrl);
case PR_SPEC_L1D_FLUSH:
return l1d_flush_prctl_set(task, ctrl);
default:
return -ENODEV;
}
@@ -1340,6 +1397,17 @@ void arch_seccomp_spec_mitigate(struct task_struct *task)
}
#endif

static int l1d_flush_prctl_get(struct task_struct *task)
{
if (!static_branch_unlikely(&switch_mm_cond_l1d_flush))
return PR_SPEC_FORCE_DISABLE;

if (test_ti_thread_flag(&task->thread_info, TIF_SPEC_L1D_FLUSH))
return PR_SPEC_PRCTL | PR_SPEC_ENABLE;
else
return PR_SPEC_PRCTL | PR_SPEC_DISABLE;
}

static int ssb_prctl_get(struct task_struct *task)
{
switch (ssb_mode) {
@@ -1390,6 +1458,8 @@ int arch_prctl_spec_ctrl_get(struct task_struct *task, unsigned long which)
return ssb_prctl_get(task);
case PR_SPEC_INDIRECT_BRANCH:
return ib_prctl_get(task);
case PR_SPEC_L1D_FLUSH:
return l1d_flush_prctl_get(task);
default:
return -ENODEV;
}
10 changes: 9 additions & 1 deletion arch/x86/kernel/smpboot.c
@@ -610,6 +610,9 @@ void set_cpu_sibling_map(int cpu)
if (threads > __max_smt_threads)
__max_smt_threads = threads;

for_each_cpu(i, topology_sibling_cpumask(cpu))
cpu_data(i).smt_active = threads > 1;

/*
* This needs a separate iteration over the cpus because we rely on all
* topology_sibling_cpumask links to be set-up.
@@ -1552,8 +1555,13 @@ static void remove_siblinginfo(int cpu)

for_each_cpu(sibling, topology_die_cpumask(cpu))
cpumask_clear_cpu(cpu, topology_die_cpumask(sibling));
for_each_cpu(sibling, topology_sibling_cpumask(cpu))

for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
cpumask_clear_cpu(cpu, topology_sibling_cpumask(sibling));
if (cpumask_weight(topology_sibling_cpumask(sibling)) == 1)
cpu_data(sibling).smt_active = false;
}

for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
cpumask_clear_cpu(cpu, cpu_llc_shared_mask(sibling));
cpumask_clear(cpu_llc_shared_mask(cpu));
