    Repositories list

    • Apache Pinot (Incubating) - A realtime distributed OLAP datastore
      Java
      Apache License 2.0
      1.3k0025 Updated Mar 4, 2025
    • Jupyter Notebook
      Apache License 2.0
      1100 Updated Feb 22, 2025
    • delta

      Public
      An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
      Scala
      Apache License 2.0
      1.8k000 Updated Jan 6, 2025
    • minio

      Public
      High Performance, Kubernetes Native Object Storage
      Go
      GNU Affero General Public License v3.0
      5.7k000 Updated Jan 6, 2025
    • spark

      Public
      Apache Spark - A unified analytics engine for large-scale data processing
      Scala
      Apache License 2.0
      29k200 Updated Jan 6, 2025
    • marquez

      Public
      Collect, aggregate, and visualize a data ecosystem's metadata
      Java
      Apache License 2.0
      330000 Updated Dec 24, 2024
    • Jupyter magics and kernels for working with remote Spark clusters
      Python
      Other
      447000 Updated Dec 24, 2024
    • druid

      Public
      Apache Druid: a high performance real-time analytics database.
      Java
      Apache License 2.0
      3.7k000 Updated Dec 24, 2024
    • ClickHouse
      C++
      Apache License 2.0
      7.1k000 Updated Dec 24, 2024
    • hive

      Public
      Apache Hive
      Java
      Apache License 2.0
      4.7k000 Updated Dec 24, 2024
    • iceberg

      Public
      Apache Iceberg
      Java
      Apache License 2.0
      2.4k000 Updated Dec 24, 2024
    • REST job server for Apache Spark
      Scala
      Other
      991100 Updated Dec 24, 2024
    • airflow

      Public
      Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
      Python
      Apache License 2.0
      15k000 Updated Dec 24, 2024
    • doris

      Public
      Apache Doris is an easy-to-use, high performance and unified analytics database.
      Java
      Apache License 2.0
      3.4k000 Updated Dec 24, 2024
    • starrocks

      Public
      StarRocks is a next-gen sub-second MPP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc query.
      Java
      Apache License 2.0
      1.9k000 Updated Dec 24, 2024
    • hudi

      Public
      Upserts, Deletes And Incremental Processing on Big Data.
      Java
      Apache License 2.0
      2.5k000 Updated Dec 24, 2024
    • Mirror of Apache livy (Incubating)
      Scala
      Apache License 2.0
      606000 Updated Dec 24, 2024
    • hadoop

      Public
      Apache Hadoop
      Java
      Apache License 2.0
      9k000 Updated Dec 24, 2024
    • Open, Multi-modal Catalog for Data & AI
      Java
      Apache License 2.0
      444000 Updated Dec 23, 2024
    • How to optimize your Spark Cluster with Interactive Spark Jobs
      Scala
      Apache License 2.0
      0000 Updated Jul 27, 2024
    • doc

      Public
      Ilum - Apache Spark cluster on Kubernetes
      0610 Updated Jun 27, 2024