forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
word-at-a-time: make the interfaces truly generic
This changes the interfaces in <asm/word-at-a-time.h> to be a bit more complicated, but a lot more generic. In particular, it allows us to really do the operations efficiently on both little-endian and big-endian machines, pretty much regardless of machine details. For example, if you can rely on a fast population count instruction on your architecture, this will allow you to make your optimized <asm/word-at-a-time.h> file with that. NOTE! The "generic" version in include/asm-generic/word-at-a-time.h is not truly generic, it actually only works on big-endian. Why? Because on little-endian the generic algorithms are wasteful, since you can inevitably do better. The x86 implementation is an example of that. (The only truly non-generic part of the asm-generic implementation is the "find_zero()" function, and you could make a little-endian version of it. And if the Kbuild infrastructure allowed us to pick a particular header file, that would be lovely) The <asm/word-at-a-time.h> functions are as follows: - WORD_AT_A_TIME_CONSTANTS: specific constants that the algorithm uses. - has_zero(): take a word, and determine if it has a zero byte in it. It gets the word, the pointer to the constant pool, and a pointer to an intermediate "data" field it can set. This is the "quick-and-dirty" zero tester: it's what is run inside the hot loops. - "prep_zero_mask()": take the word, the data that has_zero() produced, and the constant pool, and generate an *exact* mask of which byte had the first zero. This is run directly *outside* the loop, and allows the "has_zero()" function to answer the "is there a zero byte" question without necessarily getting exactly *which* byte is the first one to contain a zero. If you do multiple byte lookups concurrently (eg "hash_name()", which looks for both NUL and '/' bytes), after you've done the prep_zero_mask() phase, the result of those can be or'ed together to get the "either or" case. - The result from "prep_zero_mask()" can then be fed into "find_zero()" (to find the byte offset of the first byte that was zero) or into "zero_bytemask()" (to find the bytemask of the bytes preceding the zero byte). The existence of zero_bytemask() is optional, and is not necessary for the normal string routines. But dentry name hashing needs it, so if you enable DENTRY_WORD_AT_A_TIME you need to expose it. This changes the generic strncpy_from_user() function and the dentry hashing functions to use these modified word-at-a-time interfaces. This gets us back to the optimized state of the x86 strncpy that we lost in the previous commit when moving over to the generic version. Signed-off-by: Linus Torvalds <[email protected]>
- Loading branch information
Showing
6 changed files
with
102 additions
and
53 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
#ifndef _ASM_WORD_AT_A_TIME_H | ||
#define _ASM_WORD_AT_A_TIME_H | ||
|
||
/* | ||
* This says "generic", but it's actually big-endian only. | ||
* Little-endian can use more efficient versions of these | ||
* interfaces, see for example | ||
* arch/x86/include/asm/word-at-a-time.h | ||
* for those. | ||
*/ | ||
|
||
#include <linux/kernel.h> | ||
|
||
struct word_at_a_time { | ||
const unsigned long high_bits, low_bits; | ||
}; | ||
|
||
#define WORD_AT_A_TIME_CONSTANTS { REPEAT_BYTE(0xfe) + 1, REPEAT_BYTE(0x7f) } | ||
|
||
/* Bit set in the bytes that have a zero */ | ||
static inline long prep_zero_mask(unsigned long val, unsigned long rhs, const struct word_at_a_time *c) | ||
{ | ||
unsigned long mask = (val & c->low_bits) + c->low_bits; | ||
return ~(mask | rhs); | ||
} | ||
|
||
#define create_zero_mask(mask) (mask) | ||
|
||
static inline long find_zero(unsigned long mask) | ||
{ | ||
long byte = 0; | ||
#ifdef CONFIG_64BIT | ||
if (mask >> 32) | ||
mask >>= 32; | ||
else | ||
byte = 4; | ||
#endif | ||
if (mask >> 16) | ||
mask >>= 16; | ||
else | ||
byte += 2; | ||
return (mask >> 8) ? byte : byte + 1; | ||
} | ||
|
||
static inline bool has_zero(unsigned long val, unsigned long *data, const struct word_at_a_time *c) | ||
{ | ||
unsigned long rhs = val | c->low_bits; | ||
*data = rhs; | ||
return (val + c->high_bits) & ~rhs; | ||
} | ||
|
||
#endif /* _ASM_WORD_AT_A_TIME_H */ |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters