Starred repositories
The official home of the Presto distributed SQL query engine for big data
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance …
ClickHouse® is a real-time analytics database management system
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-like data
Apache Pulsar - distributed pub-sub messaging system
Apache Doris is an easy-to-use, high performance and unified analytics database.
An AI to play the Rock Paper Scissors game
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
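[2024 latest edition] Big data, data analytics, e-commerce system, real-time data warehouse, offline data warehouse, and data lake: construction plans and hands-on code, covering the components #flink #paimon #doris #seatunnel #dolphinscheduler #datart #dinky #hudi #iceberg.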
Upserts, Deletes And Incremental Processing on Big Data.
Source code analysis of popular Java frameworks: Spring, SpringBoot, SpringAOP, SpringSecurity, SpringSecurity OAuth2, JDK, and Netty source code
A list of awesome beginner-friendly projects.
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team co…
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Design patterns implemented in Java
Apache Flink Kubernetes Operator
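A knowledge base for data infrastructure and big data technology, covering mainstream frameworks such as hadoop, hive, spark, and flink and their related frameworks, as well as data middle platforms, data lakes, data governance, data warehouse construction, digital transformation, and more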
#1 Locally hosted web application that allows you to perform various operations on PDF files
one-stop telemetry collector for nightingale
freeCodeCamp.org's open-source codebase and curriculum. Learn to code for free.