Stars
LETSQL is a deferred compute system focused on Preprocessing for AI pipelines. Optimize performance with cross-engine caching and static planning. Easily go from research to production with portabl…
Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.
DiceDB is an open source, Redis-compliant, reactive, scalable, highly-available, unified cache optimized for modern hardware.
An experimental Python-to-C transpiler and domain-specific language for embedded high-performance computing
Make your functions return something meaningful, typed, and safe!
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
Friends don't let friends make certain types of data visualization - What are they and why are they bad.
Master programming by recreating your favorite technologies from scratch.
A query language for exploring knowledge graphs.
A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.
Turning PySpark Into a Universal DataFrame API
Dropbase helps developers build and prototype web apps faster with AI. Dropbase is local-first and self-hosted.
New file format for storage of large columnar datasets.
Polars extension for general data science use cases
An extensible, lightweight relational/logic programming DSL written in pure Python
An Awesome List of Open-Source Data Engineering Projects
A friendly programming language from the future
List and get specific files from remote zip archives without downloading the whole thing
A modular SQL linter and auto-formatter with support for multiple dialects and templated code.