crypto: arm/chacha20 - always use vrev for 16-bit rotates
The 4-way ChaCha20 NEON code implements 16-bit rotates with vrev32.16,
but the one-way code (used on remainder blocks) implements it with
vshl + vsri, which is slower.  Switch the one-way code to vrev32.16 too.

Signed-off-by: Eric Biggers <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
ebiggers authored and herbertx committed Aug 3, 2018
1 parent f53ad3e commit 4e34e51
Showing 1 changed file with 4 additions and 6 deletions.
Showing 4 additions and 6 deletions in arch/arm/crypto/chacha20-neon-core.S
@@ -51,9 +51,8 @@ ENTRY(chacha20_block_xor_neon)
 .Ldoubleround:
 	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
 	vadd.i32	q0, q0, q1
-	veor		q4, q3, q0
-	vshl.u32	q3, q4, #16
-	vsri.u32	q3, q4, #16
+	veor		q3, q3, q0
+	vrev32.16	q3, q3
 
 	// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
 	vadd.i32	q2, q2, q3
@@ -82,9 +81,8 @@ ENTRY(chacha20_block_xor_neon)

 	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
 	vadd.i32	q0, q0, q1
-	veor		q4, q3, q0
-	vshl.u32	q3, q4, #16
-	vsri.u32	q3, q4, #16
+	veor		q3, q3, q0
+	vrev32.16	q3, q3
 
 	// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
 	vadd.i32	q2, q2, q3