x86, core: Optimize hweight32()
Optimize hweight32() by using the same technique as hweight64().

The proof of this technique can be found in the commit log for
f9b4192 ("bitops: hweight() speedup").
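
The referenced commit log is not reproduced here. As an informal illustration only (not that proof), the sketch below shows why a single multiply can replace the final shift-and-add steps: once each byte of w holds a count of at most 8, multiplying by 0x01010101 adds w to copies of itself shifted left by 8, 16 and 24 bits, so the top byte ends up holding the sum of all four bytes, and no column sum is large enough to carry. The helper names sum_bytes_mul, sum_bytes_explicit and check_byte_sum are hypothetical and do not appear in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sum the four bytes of w with one multiply.
 * Assumes every byte of w is at most 8 (as after the masking steps),
 * so each column sum stays below 256 and nothing carries upward.
 */
static uint32_t sum_bytes_mul(uint32_t w)
{
	return (uint32_t)(w * 0x01010101u) >> 24;
}

/* The same sum written out byte by byte, for comparison. */
static uint32_t sum_bytes_explicit(uint32_t w)
{
	return (w & 0xff) + ((w >> 8) & 0xff) + ((w >> 16) & 0xff) + (w >> 24);
}

/* Worked values: with all 32 input bits set, each byte holds 8. */
static void check_byte_sum(void)
{
	/* Bytes of the product, low to high: 8, 16, 24, 32. */
	assert((uint32_t)(0x08080808u * 0x01010101u) == 0x20181008u);
	assert(sum_bytes_mul(0x08080808u) == 32);
	assert(sum_bytes_mul(0x01020304u) == sum_bytes_explicit(0x01020304u));
}
```

The multiply form only pays off where multiplication is cheap, which is presumably why the change is guarded by ARCH_HAS_FAST_MULTIPLIER.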

A userspace benchmark on x86_32 showed a 20% speedup for
bitmap_weight(), which uses hweight32() to count the bits in each
unsigned long on 32-bit architectures.

 int main(void)
 {
	#define SZ (1024 * 1024 * 512)

	static DECLARE_BITMAP(bitmap, SZ) = {
	        [0 ... 100] = 1,
	};

	return bitmap_weight(bitmap, SZ);
 }
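
The benchmark above relies on kernel helpers (DECLARE_BITMAP, bitmap_weight) and on the GNU range-initializer extension. For readers who want to try something similar without the kernel headers, here is a minimal standalone sketch. It assumes 32-bit words stored as uint32_t; bitmap_weight32 and run_benchmark are hypothetical stand-ins, not the kernel functions, and the block does no timing of its own.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Multiplier-based population count for one 32-bit word. */
static unsigned int hweight32_mul(uint32_t w)
{
	w -= (w >> 1) & 0x55555555u;
	w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);
	w = (w + (w >> 4)) & 0x0f0f0f0fu;
	return (uint32_t)(w * 0x01010101u) >> 24;
}

/*
 * Hypothetical stand-in for bitmap_weight(): counts set bits in a
 * bitmap of nbits bits. For simplicity nbits must be a multiple of 32.
 */
static unsigned long bitmap_weight32(const uint32_t *map, size_t nbits)
{
	unsigned long total = 0;
	size_t i;

	assert(nbits % 32 == 0);
	for (i = 0; i < nbits / 32; i++)
		total += hweight32_mul(map[i]);
	return total;
}

/*
 * Builds a zeroed bitmap whose first 101 words are set to 1 (as in
 * the [0 ... 100] = 1 initializer) and returns its weight, or -1 if
 * the allocation fails.
 */
static long run_benchmark(size_t nbits)
{
	size_t nwords = nbits / 32;
	uint32_t *map = calloc(nwords, sizeof(*map));
	size_t i;
	long weight;

	if (!map)
		return -1;
	for (i = 0; i <= 100 && i < nwords; i++)
		map[i] = 1;
	weight = (long)bitmap_weight32(map, nbits);
	free(map);
	return weight;
}
```

Calling run_benchmark((size_t)1024 * 1024 * 512) from a main() and running the program under a timer would approximate the original setup; any timings obtained this way depend on the compiler, flags and CPU, and need not match the figure quoted in the commit message.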

Signed-off-by: Akinobu Mita <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
LKML-Reference: <[email protected]>
[ only x86 sets ARCH_HAS_FAST_MULTIPLIER, so we do this via the x86 tree ]
Signed-off-by: Ingo Molnar <[email protected]>
mita authored and Ingo Molnar committed Dec 28, 2009
1 parent 6b7b284 commit 39d997b
Showing 1 changed file with 7 additions and 0 deletions.
7 changes: 7 additions & 0 deletions lib/hweight.c
@@ -11,11 +11,18 @@

unsigned int hweight32(unsigned int w)
{
#ifdef ARCH_HAS_FAST_MULTIPLIER
	w -= (w >> 1) & 0x55555555;
	w = (w & 0x33333333) + ((w >> 2) & 0x33333333);
	w = (w + (w >> 4)) & 0x0f0f0f0f;
	return (w * 0x01010101) >> 24;
#else
	unsigned int res = w - ((w >> 1) & 0x55555555);
	res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
	res = (res + (res >> 4)) & 0x0F0F0F0F;
	res = res + (res >> 8);
	return (res + (res >> 16)) & 0x000000FF;
#endif
}
EXPORT_SYMBOL(hweight32);
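
Since both branches are meant to return the same value, a quick way to gain confidence in the change is to compare them in userspace against a straightforward bit-by-bit count. The sketch below copies the two branches into separately named functions; hweight32_mul, hweight32_shift, hweight32_naive and check_hweight32 are hypothetical names for this illustration, and uint32_t is used in place of the kernel's unsigned int to make the 32-bit assumption explicit.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplier variant, mirroring the ARCH_HAS_FAST_MULTIPLIER branch. */
static unsigned int hweight32_mul(uint32_t w)
{
	w -= (w >> 1) & 0x55555555u;
	w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);
	w = (w + (w >> 4)) & 0x0f0f0f0fu;
	return (uint32_t)(w * 0x01010101u) >> 24;
}

/* Shift-and-add variant, mirroring the #else branch. */
static unsigned int hweight32_shift(uint32_t w)
{
	uint32_t res = w - ((w >> 1) & 0x55555555u);

	res = (res & 0x33333333u) + ((res >> 2) & 0x33333333u);
	res = (res + (res >> 4)) & 0x0f0f0f0fu;
	res = res + (res >> 8);
	return (res + (res >> 16)) & 0x000000ffu;
}

/* Reference count: clear the lowest set bit until none remain. */
static unsigned int hweight32_naive(uint32_t w)
{
	unsigned int n = 0;

	while (w) {
		w &= w - 1;
		n++;
	}
	return n;
}

/* Asserts that the variants agree; returns the number of inputs checked. */
static unsigned long check_hweight32(void)
{
	uint32_t x = 0;
	unsigned long checked = 0;
	int i;

	/* Edge cases: every single-bit value, then all-zeros and all-ones. */
	for (i = 0; i < 32; i++) {
		uint32_t bit = (uint32_t)1 << i;

		assert(hweight32_mul(bit) == 1 && hweight32_shift(bit) == 1);
		checked++;
	}
	assert(hweight32_mul(0) == 0 && hweight32_shift(0) == 0);
	assert(hweight32_mul(0xffffffffu) == 32);
	assert(hweight32_shift(0xffffffffu) == 32);
	checked += 2;

	/* Pseudo-random inputs from a simple linear congruential generator. */
	for (i = 0; i < 100000; i++) {
		x = x * 1664525u + 1013904223u;
		assert(hweight32_mul(x) == hweight32_naive(x));
		assert(hweight32_shift(x) == hweight32_naive(x));
		checked++;
	}
	return checked;
}
```

This is a spot check over a sample of inputs, not an exhaustive test; looping over all 2^32 values is feasible on modern hardware if a full comparison is wanted.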
