x86/mm: Micro-optimise clflush_cache_range()
Whilst inspecting the asm for clflush_cache_range() and some perf profiles
that required extensive flushing of single cachelines (from part of the
intel-gpu-tools GPU benchmarks), we noticed that gcc was reloading
boot_cpu_data.x86_clflush_size on every iteration of the loop. We can
manually hoist that read, which perf regarded as taking ~25% of the
function time for a single cacheline flush.

Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Ross Zwisler <[email protected]>
Acked-by: "H. Peter Anvin" <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Luis R. Rodriguez <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Sai Praneeth <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
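A minimal userspace sketch of the hoisting described above, not the kernel patch itself. It assumes a hypothetical `struct cpuinfo_sketch` standing in for `boot_cpu_data` and a hypothetical counting stub `flush_line()` standing in for the real flush instruction. The line size is read once into a local before the loop, so each iteration only adds a value the compiler can keep in a register. Whether gcc reloads the field in the un-hoisted form depends on whether it can prove the flush leaves the field unchanged; in the kernel the flush is inline asm, which presumably prevents that proof.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's boot_cpu_data; only the field name is borrowed. */
struct cpuinfo_sketch {
	unsigned short x86_clflush_size;
};

static struct cpuinfo_sketch cpu_data_sketch = { .x86_clflush_size = 64 };

/* Number of lines handed to the stub below, so callers can check the loop. */
static unsigned long lines_flushed;

/* Hypothetical stub for a cache-line flush (e.g. clflushopt); it only counts calls. */
static void flush_line(uintptr_t addr)
{
	(void)addr;
	lines_flushed++;
}

/* Flush every cache line that overlaps [vaddr, vaddr + size). */
static void flush_range(const void *vaddr, unsigned int size)
{
	/* Hoisted: read the line size once rather than on every iteration. */
	const uintptr_t line = cpu_data_sketch.x86_clflush_size;
	/* Rounding down with a mask requires a power-of-two line size. */
	assert(line != 0 && (line & (line - 1)) == 0);

	const uintptr_t end = (uintptr_t)vaddr + size;
	/* Start at the beginning of the line containing vaddr. */
	uintptr_t addr = (uintptr_t)vaddr & ~(line - 1);

	for (; addr < end; addr += line)
		flush_line(addr);
}
```

The loop works on integer addresses rather than pointers so that rounding the start down below the caller's buffer never forms an out-of-bounds pointer; a range that straddles a line boundary correctly touches both lines.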