Skip to content

Commit

Permalink
x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue
Browse files Browse the repository at this point in the history
AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with
SS == 0 results in an invalid usermode state in which SS is apparently
equal to __USER_DS but causes #SS if used.

Work around the issue by setting SS to __KERNEL_DS __switch_to, thus
ensuring that SYSRET never happens with SS set to NULL.

This was exposed by a recent vDSO cleanup.

Fixes: e7d6eef x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Peter Anvin <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Brian Gerst <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
  • Loading branch information
amluto authored and torvalds committed Apr 27, 2015
1 parent 1190944 commit 61f01dd
Show file tree
Hide file tree
Showing 5 changed files with 48 additions and 0 deletions.
7 changes: 7 additions & 0 deletions arch/x86/ia32/ia32entry.S
Original file line number Diff line number Diff line change
Expand Up @@ -427,6 +427,13 @@ sysretl_from_sys_call:
* cs and ss are loaded from MSRs.
* (Note: 32bit->32bit SYSRET is different: since r11
* does not exist, it merely sets eflags.IF=1).
*
* NB: On AMD CPUs with the X86_BUG_SYSRET_SS_ATTRS bug, the ss
* descriptor is not reinitialized. This means that we must
* avoid SYSRET with SS == NULL, which could happen if we schedule,
* exit the kernel, and re-enter using an interrupt vector. (All
* interrupt entries on x86_64 set SS to NULL.) We prevent that
* from happening by reloading SS in __switch_to.
*/
USERGS_SYSRET32

Expand Down
1 change: 1 addition & 0 deletions arch/x86/include/asm/cpufeature.h
Original file line number Diff line number Diff line change
Expand Up @@ -265,6 +265,7 @@
#define X86_BUG_11AP X86_BUG(5) /* Bad local APIC aka 11AP */
#define X86_BUG_FXSAVE_LEAK X86_BUG(6) /* FXSAVE leaks FOP/FIP/FOP */
#define X86_BUG_CLFLUSH_MONITOR X86_BUG(7) /* AAI65, CLFLUSH required before MONITOR */
#define X86_BUG_SYSRET_SS_ATTRS X86_BUG(8) /* SYSRET doesn't fix up SS attrs */

#if defined(__KERNEL__) && !defined(__ASSEMBLY__)

Expand Down
3 changes: 3 additions & 0 deletions arch/x86/kernel/cpu/amd.c
Original file line number Diff line number Diff line change
Expand Up @@ -720,6 +720,9 @@ static void init_amd(struct cpuinfo_x86 *c)
if (!cpu_has(c, X86_FEATURE_3DNOWPREFETCH))
if (cpu_has(c, X86_FEATURE_3DNOW) || cpu_has(c, X86_FEATURE_LM))
set_cpu_cap(c, X86_FEATURE_3DNOWPREFETCH);

/* AMD CPUs don't reset SS attributes on SYSRET */
set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
}

#ifdef CONFIG_X86_32
Expand Down
9 changes: 9 additions & 0 deletions arch/x86/kernel/entry_64.S
Original file line number Diff line number Diff line change
Expand Up @@ -295,6 +295,15 @@ system_call_fastpath:
* rflags from r11 (but RF and VM bits are forced to 0),
* cs and ss are loaded from MSRs.
* Restoration of rflags re-enables interrupts.
*
* NB: On AMD CPUs with the X86_BUG_SYSRET_SS_ATTRS bug, the ss
* descriptor is not reinitialized. This means that we should
* avoid SYSRET with SS == NULL, which could happen if we schedule,
* exit the kernel, and re-enter using an interrupt vector. (All
* interrupt entries on x86_64 set SS to NULL.) We prevent that
* from happening by reloading SS in __switch_to. (Actually
* detecting the failure in 64-bit userspace is tricky but can be
* done.)
*/
USERGS_SYSRET64

Expand Down
28 changes: 28 additions & 0 deletions arch/x86/kernel/process_64.c
Original file line number Diff line number Diff line change
Expand Up @@ -419,6 +419,34 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV))
__switch_to_xtra(prev_p, next_p, tss);

if (static_cpu_has_bug(X86_BUG_SYSRET_SS_ATTRS)) {
/*
* AMD CPUs have a misfeature: SYSRET sets the SS selector but
* does not update the cached descriptor. As a result, if we
* do SYSRET while SS is NULL, we'll end up in user mode with
* SS apparently equal to __USER_DS but actually unusable.
*
* The straightforward workaround would be to fix it up just
* before SYSRET, but that would slow down the system call
* fast paths. Instead, we ensure that SS is never NULL in
* system call context. We do this by replacing NULL SS
* selectors at every context switch. SYSCALL sets up a valid
* SS, so the only way to get NULL is to re-enter the kernel
* from CPL 3 through an interrupt. Since that can't happen
* in the same task as a running syscall, we are guaranteed to
* context switch between every interrupt vector entry and a
* subsequent SYSRET.
*
* We read SS first because SS reads are much faster than
* writes. Out of caution, we force SS to __KERNEL_DS even if
* it previously had a different non-NULL value.
*/
unsigned short ss_sel;
savesegment(ss, ss_sel);
if (ss_sel != __KERNEL_DS)
loadsegment(ss, __KERNEL_DS);
}

return prev_p;
}

Expand Down

0 comments on commit 61f01dd

Please sign in to comment.