Category | Tools/Technologies |
---|---|
Big Data Frameworks | PySpark |
Data Storage and Management | Iceberg, MinIO, Nessie |
Workflow Orchestration | Airflow, SSIS |
Data Quality | Soda, dbt, Regex for failure detection |
Data Transformation | dbt (Data Build Tool), SQL, Jinja templating |
Version Control for Data | Implementing branching and versioning with Nessie |
File Formats | Parquet, CSV, JSON, YAML |
CI/CD | GitHub Actions, act |
Containerization | Docker, Docker Compose |
Testing | Python unittest, dbt unit tests, dbt data tests, Soda quality tests |
Data Modeling | Kimball Approach, Data Vault |
Programming Languages | Python, JavaScript, Shell |

ETL Pipelines | Orchestration and Automation |
---|---|
Loading and Partitioning | Orchestrating remote Spark jobs |
Object Storage Integration | Custom Airflow Operators via SSH |
Environment Orchestration | Data-Aware Scheduling |

| |
---|
Precision Over Convenience |
Efficiency First |
Collaboration is Key |
Modularity and Reusability |

- Deepen my knowledge of dbt and evaluate it against custom SQL workflows.
- Continue refining incremental load strategies to support real-time analytics.
- Explore advanced lakehouse concepts and cutting-edge tools.

I'm always open to learning and collaborating. If you're working on an interesting data engineering project, I'd love to discuss and exchange ideas. Let's build something amazing together!