Stars
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
A topic-centric list of high-quality open datasets.
Records actions made in the AWS Management Console and outputs the equivalent CLI/SDK commands and CloudFormation/Terraform templates.
A CLI tool that lets you log in and retrieve temporary AWS credentials using a SAML IdP.
🔥 Simple AWS authentication CLI with support for MFA, secure credential storage and easy IAM role switching.
State of the Art Natural Language Processing
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
A team lead is a ❄️, because in every company they are unique and one of a kind.
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
A command-line tool for launching Apache Spark clusters.
tekumara / pyspark
Forked from apache/spark. Tekumara build of Apache PySpark with Hadoop 3.x and cloud jars for S3 access.
A Database Change Management tool for Snowflake
A collection of three Lambda functions, invoked by Amazon S3 or Amazon API Gateway, that analyze uploaded images with Amazon Rekognition and save the picture labels to Elasticsearch (written in Kotlin).
A pure Go library for reading and writing Parquet files.
Always know what to expect from your data.
Sampling CPU and heap profiler for Java featuring AsyncGetCallTrace and perf_events.
The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data 📊
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination…
Karabiner-Elements is a powerful tool for customizing keyboards on macOS
Qubole Sparklens tool for performance tuning Apache Spark
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster
REST job server for Apache Spark
Apache Spark - A unified analytics engine for large-scale data processing