Stars
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance …
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
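The presto hbase connector component is implemented against the Presto Connector interface specification and adds HBase query support to Presto. Compared with other open-source HBase connectors, ours is 10 to 100 times faster or more.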
Alluxio, data orchestration for analytics and machine learning in the cloud
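This repository collects the source code of the author's prize-winning solutions from major data science competitions, along with original baselines for some newer competitions. It mainly covers Kaggle, Alibaba Tianchi, the Huawei Cloud competition (campus track), Baidu AI Studio, the Heywhale community, DataFountain, and others.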
A Java library to perform direct I/O in Linux, bypassing file page cache.
Occlum is a memory-safe, multi-process library OS for Intel SGX
BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
Apache Beam is a unified programming model for Batch and Streaming data processing.
Apache DolphinScheduler is a modern data orchestration platform for quickly building high-performance workflows with low code.
A list of learning materials for understanding database internals
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
ZooKeeper client wrapper and rich ZooKeeper framework
[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
A powerful flow control component enabling reliability, resilience and monitoring for microservices. (A high-availability flow control and protection component for cloud-native microservices)
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
😎 A curated list of amazingly awesome Flink and Flink ecosystem resources
Benchmarks for queries over continuous data streams.
Time Series Benchmark Suite, a tool for comparing and evaluating databases for time series data