stack
Remote shuffle service for Apache Spark to store shuffle data on remote servers.
The world's simplest facial recognition API for Python and the command line
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
Maxwell's daemon, a MySQL-to-JSON Kafka producer
Instructions for getting started with Ververica Platform on minikube.
Flink CDC is a streaming data integration tool
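Source code for the second edition of "剑指Offer: Typical Programming Interview Questions Explained by Interviewers from Top Companies"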
MaxCompute Spark demo for building a runnable application.
Fivetran data models for QuickBooks using dbt.
A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
One-line data quality profiling and exploratory data analysis for Pandas and Spark DataFrames.
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.
Enable Kubernetes and Istio for Docker Desktop for Mac/Windows.
Production-Grade Container Scheduling and Management
TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
Apache Flink Kubernetes Operator
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
MinIO is a high-performance, S3 compatible object store, open sourced under GNU AGPLv3 license.
Example code and files from "Kubernetes: Up and Running"