Stars
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Distributed stream processing engine in Rust
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streami…
A curated list of Rust code and resources.
🔥 🔥 🔥 Open Source JIRA, Linear, Monday, and Asana Alternative. Plane helps you track your issues, epics, and product roadmaps in the simplest way possible.
Blazing-fast query execution engine speaks Apache Spark language and has Arrow-DataFusion at its core.
The Metadata Platform for your Data and AI Stack
SQL Lineage Analysis Tool powered by Python
⚡ Workflow Automation Platform. Orchestrate & Schedule code in any language, run anywhere, 500+ plugins. Alternative to Zapier, Rundeck, Camunda, Airflow...
SQL Database Explorer [SQLite, libSQL, PostgreSQL, MySQL/MariaDB, DuckDB, ClickHouse]
Cube: Universal semantic layer platform for AI, BI, spreadsheets, and embedded analytics
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
Apache Doris is an easy-to-use, high performance and unified analytics database.
DuckDB is an analytical in-process SQL database management system
Ceph is a distributed object, block, and file storage platform
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Get up and running with Llama 3.3, Mistral, Gemma 2, and other large language models.
Kubernetes Virtualization API and runtime in order to define and manage virtual machines.
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.