Skip to content

Commit

Permalink
powerpc: POWER7 optimised memcpy using VMX and enhanced prefetch
Browse files Browse the repository at this point in the history
Implement a POWER7 optimised memcpy using VMX and enhanced prefetch
instructions.

This is a copy of the POWER7 optimised copy_to_user/copy_from_user
loop. Detailed implementation and performance details can be found in
commit a66086b (powerpc: POWER7 optimised
copy_to_user/copy_from_user using VMX).

I noticed memcpy issues when profiling a RAID6 workload:

	.memcpy
	.async_memcpy
	.async_copy_data
	.__raid_run_ops
	.handle_stripe
	.raid5d
	.md_thread

I created a simplified testcase by building a RAID6 array with 4 1GB
ramdisks (booting with brd.rd_size=1048576):

# mdadm -CR -e 1.2 /dev/md0 --level=6 -n4 /dev/ram[0-3]

I then timed how long it took to write to the entire array:

# dd if=/dev/zero of=/dev/md0 bs=1M

Before: 892 MB/s
After:  999 MB/s

A 12% improvement.

Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
  • Loading branch information
antonblanchard authored and ozbenh committed Jul 3, 2012
1 parent bce4b4b commit b3f271e
Show file tree
Hide file tree
Showing 3 changed files with 656 additions and 1 deletion.
3 changes: 2 additions & 1 deletion arch/powerpc/lib/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,8 @@ obj-$(CONFIG_HAS_IOMEM) += devres.o
obj-$(CONFIG_PPC64) += copypage_64.o copyuser_64.o \
memcpy_64.o usercopy_64.o mem_64.o string.o \
checksum_wrappers_64.o hweight_64.o \
copyuser_power7.o string_64.o copypage_power7.o
copyuser_power7.o string_64.o copypage_power7.o \
memcpy_power7.o
obj-$(CONFIG_XMON) += sstep.o ldstfp.o
obj-$(CONFIG_KPROBES) += sstep.o ldstfp.o
obj-$(CONFIG_HAVE_HW_BREAKPOINT) += sstep.o ldstfp.o
Expand Down
4 changes: 4 additions & 0 deletions arch/powerpc/lib/memcpy_64.S
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,11 @@

.align 7
_GLOBAL(memcpy)
BEGIN_FTR_SECTION
std r3,48(r1) /* save destination pointer for return value */
FTR_SECTION_ELSE
b memcpy_power7
ALT_FTR_SECTION_END_IFCLR(CPU_FTR_VMX_COPY)
PPC_MTOCRF(0x01,r5)
cmpldi cr1,r5,16
neg r6,r3 # LS 3 bits = # bytes to 8-byte dest bdry
Expand Down
Loading

0 comments on commit b3f271e

Please sign in to comment.