Disney Streaming
San Francisco Bay Area, CA
Stars
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
This repository contains the best profile READMEs for your reference.
Apache Flink Kubernetes Operator
Apache Pinot - A realtime distributed OLAP datastore
Stream Processing with Apache Flink - Scala Examples
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
A dark Vim/Neovim color scheme inspired by Atom's One Dark syntax theme.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Build an Elasticsearch index with Python APIs on AWS EC2. Search the Elasticsearch index with appropriate queries.
Cloud-based SQL engine using Spark, where data is accessible as a JDBC/ODBC data source via Spark ThriftServer.
Already merged into apache/incubator-kyuubi: ACL Management for Apache Spark SQL with Apache Ranger.
Upserts, Deletes, and Incremental Processing on Big Data.
ClickHouse® is a real-time analytics DBMS
🐠 Beats - Lightweight shippers for Elasticsearch & Logstash
Logstash - transport and process your logs, events, or other data
Fluentd: Unified Logging Layer (project under CNCF)
Apache Superset is a Data Visualization and Data Exploration Platform
Apache Druid: a high performance real-time analytics database.
Alluxio, data orchestration for analytics and machine learning in the cloud
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Build machine learning and deep learning models on Kaggle.