Skip to content

Commit

Permalink
block-sha1: improved SHA1 hashing
Browse files Browse the repository at this point in the history
I think I have found a way to avoid the gcc crazyness.

Lookie here:

	#             TIME[s] SPEED[MB/s]
	rfc3174         5.094       119.8
	rfc3174         5.098       119.7
	linus           1.462       417.5
	linusas         2.008         304
	linusas2        1.878         325
	mozilla         5.566       109.6
	mozillaas       5.866       104.1
	openssl         1.609       379.3
	spelvin         1.675       364.5
	spelvina        1.601       381.3
	nettle          1.591       383.6

notice? I outperform all the hand-tuned asm on 32-bit too. By quite a
margin, in fact.

Now, I didn't try a P4, and it's possible that it won't do that there, but
the 32-bit code generation sure looks impressive on my Nehalem box. The
magic? I force the stores to the 512-bit hash bucket to be done in order.
That seems to help a lot.

The diff is trivial (on top of the "rename registers with cpp" patch), as
appended. And it does seem to fix the P4 issues too, although I can
obviously (once again) only test Prescott, and only in 64-bit mode:

	#             TIME[s] SPEED[MB/s]
	rfc3174         1.662       36.73
	rfc3174          1.64       37.22
	linus          0.2523       241.9
	linusas        0.4367       139.8
	linusas2       0.4487         136
	mozilla        0.9704        62.9
	mozillaas      0.9399       64.94

that's some really impressive improvement. All from just saying "do the
stores in the order I told you to, dammit!" to the compiler.

Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
  • Loading branch information
torvalds authored and gitster committed Aug 8, 2009
1 parent 30d12d4 commit 66c9c6c
Showing 1 changed file with 4 additions and 3 deletions.
7 changes: 4 additions & 3 deletions block-sha1/sha1.c
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,7 @@ void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx)

/* This "rolls" over the 512-bit array */
#define W(x) (array[(x)&15])
#define setW(x, val) (*(volatile unsigned int *)&W(x) = (val))

/*
* Where do we get the source from? The first 16 iterations get it from
Expand All @@ -102,9 +103,9 @@ void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx)
#define SHA_MIX(t) SHA_ROL(W(t+13) ^ W(t+8) ^ W(t+2) ^ W(t), 1)

#define SHA_ROUND(t, input, fn, constant, A, B, C, D, E) do { \
unsigned int TEMP = input(t); W(t) = TEMP; \
TEMP += E + SHA_ROL(A,5) + (fn) + (constant); \
B = SHA_ROR(B, 2); E = TEMP; } while (0)
unsigned int TEMP = input(t); setW(t, TEMP); \
E += TEMP + SHA_ROL(A,5) + (fn) + (constant); \
B = SHA_ROR(B, 2); } while (0)

#define T_0_15(t, A, B, C, D, E) SHA_ROUND(t, SHA_SRC, (((C^D)&B)^D) , 0x5a827999, A, B, C, D, E )
#define T_16_19(t, A, B, C, D, E) SHA_ROUND(t, SHA_MIX, (((C^D)&B)^D) , 0x5a827999, A, B, C, D, E )
Expand Down

0 comments on commit 66c9c6c

Please sign in to comment.