bigData
A curated list of awesome big data frameworks, resources and other awesomeness.
Upserts, Deletes And Incremental Processing on Big Data.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Apache Spark - A unified analytics engine for large-scale data processing
The official home of the Presto distributed SQL query engine for big data
Apache Druid: a high performance real-time analytics database.
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance …
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
A composable and fully extensible C++ execution engine library for data management systems.
Notes talking about the design and implementation of Apache Spark
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
A blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.