- 众安在线财险有限公司
- Shanghai
Starred repositories
VincentSleepless / substrait
Forked from substrait-io/substrait
A cross-platform way to express data transformation, relational algebra, standardized record expression and plans.
Grammars written for ANTLR v4; expectation that the grammars are free of actions.
Examples and guides for using the OpenAI API
Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on data-in-motion for real-time insights.
CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on und…
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
The next generation of cloud-native big data management expert, aiming to help users rapidly build stable, efficient, and scalable cloud-native platforms for big data.
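Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, principles, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, Table API & SQL, and more, as well as large-scale production Flink project case studies (PV/UV, log storage, tens of billions of records in real time…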
The Lineage Analysis system for FlinkSQL supports advanced syntax such as Watermark, UDTF, CEP, Windowing TVFs, and CTAS.
Apache InLong - a one-stop, full-scenario integration framework for massive data
Taier is a big data development platform for submission, scheduling, operation and maintenance, and indicator information display
BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data ever…
SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
VincentSleepless / bk-job
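Forked from TencentBlueKing/bk-job
The BlueKing Job Platform (Job) is a basic operations management system capable of handling massive numbers of concurrent tasks. Besides supporting basic operations scenarios such as script execution, file distribution, and scheduled tasks, it can use workflow scheduling to assemble scattered individual tasks into an automated job flow; each job can also serve as an atomic node for upper-layer or peripheral systems and platforms, enabling automated scheduling.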
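SQL parsers for multiple databases, based on antlr4. They extract metadata from SQL and can be used in several data platform scenarios: extracting metadata from DDL statements, SQL permission checks, table-level lineage, SQL syntax validation, and more. Supports Spark, Flink, Gauss, StarRocks, Oracle, MySQL, PostgreSQL, SQL Server, DB2, and others.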
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
Parses SQL to obtain column-level and table-level lineage, converts it into a lineage model, and displays it in the graph database neo4j.
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Implements a YARN client; datax-on-yarn lets DataX run on the YARN master.
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification
Supports agile DataOps based on Flink, DataX, Flink-CDC, and Chunjun, with a Web UI.