Repository of course projects from Udacity's Data Engineer Nanodegree. You can learn more about the course and program on Udacity's website or in the course syllabus.
-
Course 1: Data Modeling
- Relational Data Modeling
- Unstructured (NoSQL) Data Modeling
- OLAP and OLTP
- Normalization and Denormalization
- Star and Snowflake Schemas
- Technologies Utilized:
- Python, PostgreSQL, Apache Cassandra
- Project 1: Data Modeling with Postgres: ETL Process
- Project 2: Data Modeling with Apache Cassandra: ETL Process
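
As an illustration of the star schema topic above, here is a minimal sketch of one fact table joined to a dimension table. It is not taken from the course projects: the table and column names (`dim_user`, `dim_song`, `fact_play`) and the sample rows are hypothetical, and Python's built-in `sqlite3` stands in for PostgreSQL so the example runs without a database server.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; all names and rows are hypothetical.
conn = sqlite3.connect(":memory:")

# Dimension tables hold descriptive attributes.
conn.execute("CREATE TABLE dim_user (user_id INTEGER PRIMARY KEY, name TEXT, level TEXT)")
conn.execute("CREATE TABLE dim_song (song_id INTEGER PRIMARY KEY, title TEXT, artist TEXT)")

# The fact table records events and references each dimension by key.
conn.execute("""
    CREATE TABLE fact_play (
        play_id    INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL REFERENCES dim_user (user_id),
        song_id    INTEGER NOT NULL REFERENCES dim_song (song_id),
        duration_s REAL NOT NULL
    )
""")

conn.executemany("INSERT INTO dim_user VALUES (?, ?, ?)",
                 [(1, "Ana", "paid"), (2, "Ben", "free")])
conn.executemany("INSERT INTO dim_song VALUES (?, ?, ?)",
                 [(10, "Song A", "Artist X"), (11, "Song B", "Artist Y")])
conn.executemany("INSERT INTO fact_play VALUES (?, ?, ?, ?)",
                 [(100, 1, 10, 200.0), (101, 1, 11, 180.0), (102, 2, 10, 200.0)])
conn.commit()


def total_listening_by_level(connection):
    """Join the fact table to the user dimension and sum seconds played per level."""
    return connection.execute("""
        SELECT u.level, SUM(f.duration_s)
        FROM fact_play AS f
        JOIN dim_user AS u ON u.user_id = f.user_id
        GROUP BY u.level
        ORDER BY u.level
    """).fetchall()


print(total_listening_by_level(conn))
```

Analytical queries in this layout need only one join per dimension, which is the main appeal of the star layout over a fully normalized one.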
-
Course 2: Cloud Data Warehouses
- Data Warehouse Architecture
- Dimensional Modeling
- Denormalizing a 3NF database into a Star Schema with an ETL process
- OLAP Cube and its Operations: Roll-up, Drill-down, Slice and Dice, Pivot
- Cloud Computing with Amazon Web Services (AWS)
- AWS Redshift Architecture
- Set up AWS infrastructure as code (IaC)
- Optimized table design with distribution style and sorting key
- Technologies Utilized:
- Python, AWS (EC2, S3, IAM, VPC, RDS PostgreSQL, Redshift)
- Project 3: Data Warehouse with AWS Redshift
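
The OLAP cube operations listed above can be sketched as plain SQL queries. This is a hedged illustration, not code from the course project: the `sales` table, its columns, and the sample rows are hypothetical, and Python's built-in `sqlite3` is used in place of Redshift so the example is self-contained.

```python
import sqlite3

# Hypothetical three-dimensional cube: month x country x product, measure = revenue.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, country TEXT, product TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("2024-01", "US", "book", 100),
    ("2024-01", "FR", "book", 50),
    ("2024-02", "US", "film", 70),
    ("2024-02", "FR", "book", 30),
])
conn.commit()


def roll_up_by_country(c):
    """Roll-up: aggregate away month and product, keeping only country."""
    return c.execute(
        "SELECT country, SUM(revenue) FROM sales GROUP BY country ORDER BY country"
    ).fetchall()


def slice_month(c, month):
    """Slice: fix one dimension (month) to a single value."""
    return c.execute(
        "SELECT country, product, revenue FROM sales "
        "WHERE month = ? ORDER BY country, product",
        (month,),
    ).fetchall()


def dice(c, months, countries):
    """Dice: restrict two dimensions to subsets of their values."""
    month_marks = ",".join("?" * len(months))
    country_marks = ",".join("?" * len(countries))
    sql = (
        "SELECT month, country, SUM(revenue) FROM sales "
        f"WHERE month IN ({month_marks}) AND country IN ({country_marks}) "
        "GROUP BY month, country ORDER BY month, country"
    )
    return c.execute(sql, [*months, *countries]).fetchall()


print(roll_up_by_country(conn))
print(slice_month(conn, "2024-01"))
print(dice(conn, ["2024-02"], ["US", "FR"]))
```

Drill-down is the reverse of roll-up: adding `month` or `product` back to the `GROUP BY` clause returns the finer-grained totals.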
-
Course 3: Data Lakes with Spark
- Big Data Ecosystem
- Distributed Systems
- Data Wrangling with Spark
- Data Lakes
- Technologies Utilized:
- Python, PySpark, AWS (EC2, S3, IAM, EMR, Redshift)
- Project 4: Data Lakes with AWS EMR and PySpark