bigdata
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.
A lightning-fast search API that fits effortlessly into your apps, websites, and workflow
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
Supports agile DataOps based on Flink, DataX, Flink-CDC, and Chunjun, with a Web-UI.
The next-generation cloud-native big data management expert, aiming to help users rapidly build stable, efficient, and scalable cloud-native platforms for big data.
An Open Standard for lineage metadata collection
Collect, aggregate, and visualize a data ecosystem's metadata
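A multi-database SQL parser based on antlr4 that extracts metadata from SQL. It can be used in several data platform scenarios: extracting metadata from DDL statements, SQL permission checks, table-level lineage, SQL syntax validation, and more. Supports Spark, Flink, Gauss, StarRocks, Oracle, MySQL, PostgreSQL, SQL Server, DB2, and others.
Open data platform based on Kubernetes. Scaleph supports SeaTunnel, Flink, and Doris, backed by the SeaTunnel on Flink engine, Flink Kubernetes Operator, and Doris Operator.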
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
The Metadata Platform for your Data and AI Stack
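Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.
Big data computing platform based on Spark <至轻云: an ultra-lightweight big data computing platform / data middle platform>
A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.
Apache Paimon Rust: the Rust implementation of Apache Paimon.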
Lakekeeper: A Rust native Iceberg REST Catalog
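Dynamically registers and switches data sources at runtime, automatically generates SQL (DDL/DML/DQL), reads and writes metadata, and compares database structure differences. Adapts to 100+ relational and non-relational databases. Commonly used as underlying support for dynamic scenarios, such as data middle platforms, visualization, low-code backends, workflows, custom forms, heterogeneous database migration and synchronization, IoT and Internet of Vehicles data processing, data cleansing, runtime-defined reports/query conditions/data structures, and crawler data parsing.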
A collection of RBIR projects and posts for anyone interested in joining this journey.