Skip to content

A very fast SIMD (AVX2) accelerated lexer that does not rely on tables

License

Notifications You must be signed in to change notification settings

Mateuszd6/simd-lexing

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

A very fast single-header C99 AVX2-accelerated lexer that does not use any
tables, can be customized by manipulating macros and is on average 3-5 times
faster than corresponding flex lexer.

It should work out of a box, but it's more of a prove-of-concept right now.

Nice things about it:
* No code generation, include and call a function
* No tables, custom language, etc
* It's fast
* Can define you own callback (through macros) on each token for advanced usage
* Handles some tricky things, which some "example" flex files don't, like
  BACKSLASH-LF in string and single-line comments, CR-LF line endings etc

Examples comming soon, I hope

Speed for lexing the GCC source code (all GCC files concatenated into a single
file, from: https://people.csail.mit.edu/smcc/projects/single-file-programs/ ),
on my Skylake 2.40GHz i5 laptop. This only counts idents, strings etc, does not
parse integers or others (this is on the way, but the reference flex file does
not do that either):
|-------------+----------|
| Lexer:      | time(ms) |
|-------------+----------|
| flex        |      231 |
| flex --fast |      102 |
| flex --full |      106 |
| simd-lex    |       34 |
|-------------+----------|