Stars
nannyml: post-deployment data science in Python
Logica is a logic programming language that compiles to SQL. It runs on DuckDB, Google BigQuery, PostgreSQL and SQLite.
The Trino (https://trino.io/) adapter plugin for dbt (https://getdbt.com)
Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.
DiceDB is an open-source in-memory reactive database with query subscriptions.
An experimental Python-to-C transpiler and domain-specific language for embedded high-performance computing
Make your functions return something meaningful, typed, and safe!
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
Friends don't let friends make certain types of data visualization: what they are and why they are bad.
Master programming by recreating your favorite technologies from scratch.
A query language for exploring knowledge graphs.
A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.
Turning PySpark Into a Universal DataFrame API
Dropbase helps developers build and prototype web apps faster with AI. Dropbase is local-first and self-hosted.
New file format for storage of large columnar datasets.
Polars extension for general data science use cases
An extensible, lightweight relational/logic programming DSL written in pure Python
An Awesome List of Open-Source Data Engineering Projects