-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MM SPM, bench tool and general Simd/v3 #11617
base: master
Are you sure you want to change the base?
Conversation
Tool to benchmark detection engine content inspection, which is the inspection of individual groups of content, etc matches for a buffer. Also add a set of basic tests for the various single pattern matching implementation. Output is in csv. To files for the rule based tests. To stdout for the spm tests.
Rename to match coding style. Update callers.
AVX2 implementation that compares 32 bytes at a time. Rearrange code to make parts reusable. Fall back to smaller SIMD for remaining buffer. When (remaining) buffer is smaller than 32 bytes fall back to other SIMD implementations that deal with 16 bytes of data per iteration. Add 16/32/64 byte implementations using AVX512.
Implement for AVX512, AVX2 and SSE42.
Wrapper around `memmem`. The case sensitive search is implemented by directly calling `memmem`. As there is no case insensitieve variant available, a wrapper around memmem is created, that takes a sliding window approach: 1. take a slice of the haystack 2. convert it to lowercase 3. search it using memmem 4. move window forward
For the transform tolower, use new SIMD enabled tolower logic. On an AVX2 system, this gives a noticeable speed up: Non-SIMD: -------------------------------------------------------------------------------------------------------------------------------- Date: 8/4/2024 -- 20:06:58 -------------------------------------------------------------------------------------------------------------------------------- Stats for: total -------------------------------------------------------------------------------------------------------------------------------- Prefilter Ticks Called Max Ticks Avg Bytes Called Max Bytes Avg Bytes Ticks/Byte -------------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- file_data#172 (to_lowercase) 3318781786 424799 3201525 7812.00 252733244 20026 98304 12620.00 13.00 AVX2: -------------------------------------------------------------------------------------------------------------------------------- Date: 8/4/2024 -- 20:08:11 -------------------------------------------------------------------------------------------------------------------------------- Stats for: total -------------------------------------------------------------------------------------------------------------------------------- Prefilter Ticks Called Max Ticks Avg Bytes Called Max Bytes Avg Bytes Ticks/Byte -------------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- file_data#172 (to_lowercase) 865647888 424798 487271 2037.00 249009608 20326 98304 12250.00 3.00
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #11617 +/- ##
==========================================
- Coverage 82.53% 82.48% -0.05%
==========================================
Files 923 924 +1
Lines 248838 249228 +390
==========================================
+ Hits 205381 205587 +206
- Misses 43457 43641 +184
Flags with carried forward coverage won't be shown. Click here to find out more. |
Information: QA ran without warnings. Pipeline 22015 |
#define UPPER_LOW 0x40 /* "A" - 1 */ | ||
#define UPPER_HIGH 0x5B /* "Z" + 1 */ | ||
|
||
static inline int SCMemcmpLowercase(const void *s1, const void *s2, size_t len) | ||
// clang-format off | ||
static char scmemcmp_sse41_ul[16] __attribute__((aligned(16))) = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, as someone not that familiar with sse/avx/etc, I'd like to see more comments around what these are for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fun code.
Comments:
- Minor nits inline. Nothing blocking the merge.
- Tool worked well and created the expected benchmarks ✅
- SIMD calculations to convert to lower seemed correct ✅
- memmem SPM seemed to work as intended ✅
|
||
uint64_t nsecs = diff.tv_sec * 1000000000ULL + diff.tv_nsec; | ||
uint64_t nsecs_avg = nsecs / cnt; | ||
total_nsecs += nsecs_avg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming is a bit odd. Makes you wonder why does the average get added to the total?
uint64_t nsecs = diff.tv_sec * 1000000000ULL + diff.tv_nsec; | ||
uint64_t nsecs_avg = nsecs / cnt; | ||
total_nsecs += nsecs_avg; | ||
total_evals++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could we just use i
?
mask1 = _mm256_cmpgt_epi8(b2, upper1); | ||
/* mark all chars lower than upper2 */ | ||
mask2 = _mm256_cmpgt_epi8(upper2, b2); | ||
/* merge the two, leaving only those that are true in both */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: They just have to be equal, not necessarily true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, but not a nit. I think this is a logic error. The goal is to take both masks (one for lower bound and the one for upper bound) and create a mask that is only true for bytes that satisfy both. Switching to _mm256_and_si256
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. I tested this one on a small dataset and it gave correct results.. 🤔
Could you please share a string where this shows the logical error?
Edit: checked it again. It seems like an unneeded op indeed but not wrong. It's just that the condition in which both the masks are false cannot happen. lmk wdyt
In clearer words: This looks like a not so straightforward way of ANDing the masks to me as intended. Would indeed be good to replace w a proper and
call. lmk wdyt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a rebase and there is a logic error to fix apparently
What is the plan for this Victor ? |
Replaces #11614 and #11615