FlatSQL is a flat representation of a PostgreSQL AST. The library provides a Bison parser that materializes AST Nodes into a single compact Flatbuffer vector. It can be compiled to WebAssembly and has been originally built for lightweight SQL instrumentation, running on every user keystroke in DashQL.
Each AST node is packed into 24 bytes and references matched substrings in the original script text. This encoding is compact and efficient for simple passes, but is not directly suited for a full semantic analysis.
make infra_macos # Install dependencies to .infra, `infra_linux` for linux
yarn install # Install npm packages
make proto # Generate flatbuffers
make core_native_o0 # Build unoptimized native core library
make core_native_o2 # Build optimized native core library with debug symbols
make core_wasm_o3 # Build optimized wasm core library
make core_js_o3 # Build js bundle with wasm module and js api
make core_native_tests # Run native tests
make core_js_tests # Run js tests using the wasm module
make parser_snapshots # Update parser snapshots in snapshots/parser
make analyzer_snapshots # Update analyzer snapshots in snapshots/analyzer
make benchmark_pipeline # Benchmark the processing pipeline
make benchmark_layout # Benchmark the schema graph layout
make lsp # Build the lsp library
make vscode # Build the vscode extension
make editor_start # Start the dev server for the editor
Tree-sitter is a great parser framework and I recommend everyone to try it out. It gives you flexible incremental parsing without much hassle and is a perfect fit for many editors.
FlatSQL was built for specific database systems. The systems Hyper, Umbra, NoisePage, AlloyDB and DuckDB all use Bison parsers derived from the PostgreSQL grammar. The PostgreSQL grammar stood the test of time and established itself as de-facto reference for SQL syntax. Staying close to PostgreSQL simplifies building frontends for these database systems without worrying too much about grammar differences. FlatSQL builds around a carefully optimized and very fast parser based on the PostgreSQL-grammar to provide lightweight semantic analysis passes, running on every single keystroke.
FlatSQL is still doing work in O(text-length)
with every input event, as opposed to O(change-size)
by Tree-sitter.
Yet, FlatSQL analyzes most input in well under a millisecond in your browser, even when replacing the entire text.
After all, the parser is not the only component that has to be tuned for fast analysis passes, incremental parsing alone "only" gives you a head-start for the AST update.
FlatSQL maintains B+-tree ropes, dictionary-encodes and tags SQL object names in-flight and performs efficient post-order DFS traversals through linear scans over a compact AST representation.
Here are timings for TPC-DS Q1 on my laptop. All steps run single-threaded on a M1Max. FlatSQL spends 5us with scanning, 8us with parsing, 12us with analyzing and 5us with building the completion index.
Running /Users/andre.kohn/Repositories/flatsql/packages/flatsql/build/native/o2/benchmark_pipeline
Run on (10 X 24.1209 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x10)
Load Average: 6.73, 4.68, 3.56
--------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------
scan_query 5171 ns 5169 ns 109632
parse_query 8414 ns 8414 ns 83443
analyze_query 11926 ns 11925 ns 58638
index_query 4602 ns 4602 ns 150386