supersum9
🎓
I may be slow to respond.
  • EnergyAustralia
  • Melbourne, Australia

Starred repositories

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

Java 1,132 143 Updated Feb 21, 2025

PacBot (Policy as Code Bot)

Java 1,293 279 Updated Dec 8, 2022

An open-source framework that simplifies implementation of data solutions.

TypeScript 123 24 Updated Feb 21, 2025

Learn how to design, develop, deploy and iterate on production-grade ML applications.

Jupyter Notebook 38,199 6,048 Updated Aug 18, 2024

Python packaging and dependency management made easy

Python 32,616 2,320 Updated Feb 19, 2025

Apache Flink

Java 24,546 13,517 Updated Feb 21, 2025

A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)

529 133 Updated Aug 21, 2024

A dbt package for modelling dbt metadata. https://brooklyn-data.github.io/dbt_artifacts

Shell 352 133 Updated Jan 27, 2025

Efficient data transformation and modeling framework that is backwards compatible with dbt.

Python 2,087 183 Updated Feb 21, 2025

💯 Curated coding interview preparation materials for busy software engineers

TypeScript 121,985 15,005 Updated Oct 8, 2024

A curated list of docker-compose files prepared for testing data engineering tools, databases, and open source libraries.

Jupyter Notebook 578 59 Updated Sep 17, 2023

On day 1 of any Airflow project, you can spin up a local desktop Kubernetes Airflow environment AND one in Google Cloud Composer with tested data pipelines (DAGs) 🖥️ >> [ 🚀, 🚢 ]

HCL 111 32 Updated Sep 21, 2023

Docker base images for Ruby, Python, Node.js and Meteor web apps

Shell 2,779 420 Updated Feb 19, 2025

Master programming by recreating your favorite technologies from scratch.

Markdown 337,878 31,255 Updated Sep 3, 2024

Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.

Python 289,797 48,211 Updated Dec 2, 2024

A complete computer science study plan to become a software engineer.

312,151 77,984 Updated Dec 5, 2024

Koalas: pandas API on Apache Spark

Python 3,347 359 Updated Mar 20, 2024

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

Python 8,337 598 Updated Oct 8, 2024

Kafka Connect Store Partitioner by custom fields and time

Java 39 29 Updated Dec 8, 2021

Public interface definitions of Google APIs.

Starlark 7,759 2,362 Updated Feb 21, 2025

Jupyter Notebook 9 3 Updated Dec 18, 2020

A series of exercises to apply your Scala knowledge

Scala 7 30 Updated Nov 18, 2024

The Data Explorer gives you fast, safe access to data stored in Cassandra, Dynomite, and Redis.

TypeScript 433 40 Updated Apr 10, 2023

My Insight Data Engineering Fellowship project. I implemented a big data processing pipeline based on lambda architecture, that aggregates Twitter and US stock market data for user sentiment anal…

Scala 503 128 Updated Aug 24, 2022

The Data Engineering Cookbook

Python 14,014 2,555 Updated Dec 11, 2024

Containerized Spark, ready to use with AWS infrastructure

Dockerfile 4 Updated Nov 1, 2020

Flink CDC is a streaming data integration tool

Java 5,955 2,011 Updated Feb 20, 2025

A simple Spark-powered ETL framework that just works 🍺

Scala 179 32 Updated Feb 3, 2025

Build better AWS infrastructure

Python 1,497 311 Updated Feb 3, 2025