Starred repositories
Smart, pythonic, ad-hoc, typed polymorphism for Python
Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.
Spark-based library that helps construct and query knowledge graphs from unstructured and structured data
Create, share, and keep track of your learning curricula.
Architecture decision record (ADR) examples for software planning, IT leadership, and template documentation
RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
🎨 Diagram as Code for prototyping cloud system architectures
This is a repo with links to everything you'd ever want to learn about data engineering
A binary for parallel copying of CSV data into a TimescaleDB hypertable
The multi-node setup of TimescaleDB 🐯🐯🐯 🐘 🐯🐯🐯
Terragrunt is a flexible orchestration tool that allows Infrastructure as Code written in OpenTofu/Terraform to scale.
An example showing how to apply software engineering best practices to Databricks notebooks.
Examples of using Terraform to deploy Databricks resources
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Roadmap to becoming a data engineer in 2021
fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues.
Apache Jena, a free and open-source Java framework for building Semantic Web and Linked Data applications.
Java binary serialization and cloning: fast, efficient, automatic
IoT Event Analytics is a complex event processing and agent network platform
Software-defined radio plane tracking into KSQL Kafka queries
Video Streaming Analytics platform