x86/mm: Micro-optimise clflush_cache_range()
Whilst inspecting the asm for clflush_cache_range() and some perf profiles
that required extensive flushing of single cachelines (from part of the
intel-gpu-tools GPU benchmarks), we noticed that gcc was reloading
boot_cpu_data.x86_clflush_size on every iteration of the loop. We can
manually hoist that read, which perf regarded as taking ~25% of the
function time for a single cacheline flush.
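
A minimal sketch of the pattern, using assumed stand-in names (the cpuinfo
struct, clflush_one() and the flush_*() helpers below are illustrative, not
the kernel code): a loop whose step expression re-reads a global field forces
a reload on every pass, while hoisting the read into a local lets gcc keep
the stride in a register.

struct cpuinfo { unsigned int x86_clflush_size; };
extern struct cpuinfo boot_cpu_data;

/* Flush one cache line; the "+m" memory operand means gcc cannot assume
 * that boot_cpu_data is left unchanged by the asm, since p may alias it. */
static inline void clflush_one(void *p)
{
	asm volatile("clflush %0" : "+m" (*(volatile char *)p));
}

static void flush_reload(void *p, void *vend)
{
	/* boot_cpu_data.x86_clflush_size is re-read on every iteration */
	for (; p < vend; p += boot_cpu_data.x86_clflush_size)
		clflush_one(p);
}

static void flush_hoisted(void *p, void *vend)
{
	/* read the stride once; it stays in a register for the whole loop */
	const unsigned long step = boot_cpu_data.x86_clflush_size;

	for (; p < vend; p += step)
		clflush_one(p);
}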

Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Ross Zwisler <[email protected]>
Acked-by: "H. Peter Anvin" <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Luis R. Rodriguez <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Sai Praneeth <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
ickle authored and KAGA-KOKO committed Jan 8, 2016
1 parent 2039e6a commit 1f1a89a
Showing 1 changed file with 6 additions and 4 deletions: arch/x86/mm/pageattr.c

@@ -129,14 +129,16 @@ within(unsigned long addr, unsigned long start, unsigned long end)
  */
 void clflush_cache_range(void *vaddr, unsigned int size)
 {
-	unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1;
+	const unsigned long clflush_size = boot_cpu_data.x86_clflush_size;
+	void *p = (void *)((unsigned long)vaddr & ~(clflush_size - 1));
 	void *vend = vaddr + size;
-	void *p;
+
+	if (p >= vend)
+		return;
 
 	mb();
 
-	for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
-	     p < vend; p += boot_cpu_data.x86_clflush_size)
+	for (; p < vend; p += clflush_size)
 		clflushopt(p);
 
 	mb();
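
For reference, the whole function as it reads after this change, reassembled
from the hunk above (clflushopt(), mb() and boot_cpu_data come from the
surrounding kernel code):

void clflush_cache_range(void *vaddr, unsigned int size)
{
	const unsigned long clflush_size = boot_cpu_data.x86_clflush_size;
	void *p = (void *)((unsigned long)vaddr & ~(clflush_size - 1));
	void *vend = vaddr + size;

	if (p >= vend)
		return;

	mb();

	for (; p < vend; p += clflush_size)
		clflushopt(p);

	mb();
}

Besides hoisting the read, the new if (p >= vend) test returns before the two
memory barriers when the loop would not flush anything.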
