Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MM SPM, bench tool and general Simd/v3 #11617

Draft
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

victorjulien
Copy link
Member

Replaces #11614 and #11615

Tool to benchmark detection engine content inspection, which is the
inspection of individual groups of content, etc matches for a buffer.

Also add a set of basic tests for the various single pattern matching
implementation.

Output is in csv. To files for the rule based tests. To stdout for the
spm tests.
Rename to match coding style. Update callers.
AVX2 implementation that compares 32 bytes at a time.

Rearrange code to make parts reusable.

Fall back to smaller SIMD for remaining buffer.

When (remaining) buffer is smaller than 32 bytes fall back to other
SIMD implementations that deal with 16 bytes of data per iteration.

Add 16/32/64 byte implementations using AVX512.
Implement for AVX512, AVX2 and SSE42.
Wrapper around `memmem`.

The case sensitive search is implemented by directly calling `memmem`.

As there is no case insensitieve variant available, a wrapper around
memmem is created, that takes a sliding window approach:

1. take a slice of the haystack
2. convert it to lowercase
3. search it using memmem
4. move window forward
For the transform tolower, use new SIMD enabled tolower logic.

On an AVX2 system, this gives a noticeable speed up:

Non-SIMD:

  --------------------------------------------------------------------------------------------------------------------------------
  Date: 8/4/2024 -- 20:06:58
  --------------------------------------------------------------------------------------------------------------------------------
  Stats for: total
  --------------------------------------------------------------------------------------------------------------------------------
  Prefilter                        Ticks           Called          Max Ticks       Avg             Bytes           Called          Max Bytes       Avg Bytes       Ticks/Byte
  -------------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ---------------
  file_data#172 (to_lowercase)     3318781786      424799          3201525         7812.00         252733244       20026           98304           12620.00        13.00

AVX2:

  --------------------------------------------------------------------------------------------------------------------------------
  Date: 8/4/2024 -- 20:08:11
  --------------------------------------------------------------------------------------------------------------------------------
  Stats for: total
  --------------------------------------------------------------------------------------------------------------------------------
  Prefilter                        Ticks           Called          Max Ticks       Avg             Bytes           Called          Max Bytes       Avg Bytes       Ticks/Byte
  -------------------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ---------------
  file_data#172 (to_lowercase)     865647888       424798          487271          2037.00         249009608       20326           98304           12250.00        3.00
@victorjulien victorjulien requested review from jasonish and a team as code owners August 7, 2024 19:07
Copy link

codecov bot commented Aug 7, 2024

Codecov Report

Attention: Patch coverage is 56.81159% with 149 lines in your changes missing coverage. Please review.

Project coverage is 82.48%. Comparing base (61cb14d) to head (68ab0c2).
Report is 140 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11617      +/-   ##
==========================================
- Coverage   82.53%   82.48%   -0.05%     
==========================================
  Files         923      924       +1     
  Lines      248838   249228     +390     
==========================================
+ Hits       205381   205587     +206     
- Misses      43457    43641     +184     
Flag Coverage Δ
fuzzcorpus 60.48% <45.45%> (-0.09%) ⬇️
livemode 18.57% <27.84%> (-0.08%) ⬇️
pcap 43.98% <45.45%> (-0.16%) ⬇️
suricata-verify 61.78% <45.96%> (-0.04%) ⬇️
unittests 59.04% <58.46%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@suricata-qa
Copy link

Information: QA ran without warnings.

Pipeline 22015

#define UPPER_LOW 0x40 /* "A" - 1 */
#define UPPER_HIGH 0x5B /* "Z" + 1 */

static inline int SCMemcmpLowercase(const void *s1, const void *s2, size_t len)
// clang-format off
static char scmemcmp_sse41_ul[16] __attribute__((aligned(16))) = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, as someone not that familiar with sse/avx/etc, I'd like to see more comments around what these are for?

Copy link
Member

@inashivb inashivb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fun code.
Comments:

  • Minor nits inline. Nothing blocking the merge.
  • Tool worked well and created the expected benchmarks ✅
  • SIMD calculations to convert to lower seemed correct ✅
  • memmem SPM seemed to work as intended ✅

tools/benches/bench-content-inspect/main.c Show resolved Hide resolved

uint64_t nsecs = diff.tv_sec * 1000000000ULL + diff.tv_nsec;
uint64_t nsecs_avg = nsecs / cnt;
total_nsecs += nsecs_avg;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming is a bit odd. Makes you wonder why does the average get added to the total?

uint64_t nsecs = diff.tv_sec * 1000000000ULL + diff.tv_nsec;
uint64_t nsecs_avg = nsecs / cnt;
total_nsecs += nsecs_avg;
total_evals++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could we just use i?

mask1 = _mm256_cmpgt_epi8(b2, upper1);
/* mark all chars lower than upper2 */
mask2 = _mm256_cmpgt_epi8(upper2, b2);
/* merge the two, leaving only those that are true in both */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: They just have to be equal, not necessarily true.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, but not a nit. I think this is a logic error. The goal is to take both masks (one for lower bound and the one for upper bound) and create a mask that is only true for bytes that satisfy both. Switching to _mm256_and_si256

Copy link
Member

@inashivb inashivb Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I tested this one on a small dataset and it gave correct results.. 🤔
Could you please share a string where this shows the logical error?

Edit: checked it again. It seems like an unneeded op indeed but not wrong. It's just that the condition in which both the masks are false cannot happen. lmk wdyt
In clearer words: This looks like a not so straightforward way of ANDing the masks to me as intended. Would indeed be good to replace w a proper and call. lmk wdyt

@catenacyber catenacyber added the needs rebase Needs rebase to master label Sep 3, 2024
Copy link
Contributor

@catenacyber catenacyber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a rebase and there is a logic error to fix apparently

@victorjulien victorjulien marked this pull request as draft September 11, 2024 08:29
@catenacyber
Copy link
Contributor

What is the plan for this Victor ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs rebase Needs rebase to master
Development

Successfully merging this pull request may close these issues.

5 participants