Starred repositories: 8 stars written in Scala
Apache Spark - A unified analytics engine for large-scale data processing
CMAK is a tool for managing Apache Kafka clusters
The leader in Next-Generation Customer Data Infrastructure
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
HTML Content / Article Extractor in Scala - open sourced from Gravity Labs
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive