Starred repositories
Apache Spark - A unified analytics engine for large-scale data processing
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and with APIs
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Byzer (formerly MLSQL): A low-code open-source programming language for data pipelines, analytics and AI.
Code to accompany Advanced Analytics with Spark from O'Reilly Media
Example of an HTTP (micro)service in Scala & akka-http
A Spark UI and Spark History Server alternative with CPU and memory metrics! Delight is free, cross-platform, and open-source.
Wayeb is a Complex Event Processing and Forecasting (CEP/F) engine written in Scala.
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
Dynamically compiles and reloads Scala classes
A small akka-persistence-jdbc demo project