Data-Engineering-with-AWS

This repository contains the projects completed as part of the Udacity Data Engineering Nanodegree.
A brief description of each of the projects can be found below.
The scripts created for each project can be found in the corresponding sub-directories.

Full details of what was studied in the Nanodegree can be found in the Syllabus.
Projects were completed for each of the four main modules of the course:

Data Modelling

Model event data to create a non-relational database and ETL pipeline for a music streaming app. Learners will define queries and tables for a database built using Apache Cassandra.
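Cassandra tables are usually designed around the queries they have to serve, so the table definition and the query are written together. The following is a minimal sketch of that idea, not the project's actual schema: the table name `songs_by_session`, its columns, and the `QueryTable` helper are all hypothetical and only generate CQL strings. Running them would require a Cassandra driver session, which is left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryTable:
    """A table designed to serve exactly one query (all names are hypothetical)."""
    name: str
    columns: dict            # column name -> CQL type
    partition_key: tuple     # columns the query filters on with equality
    clustering: tuple = ()   # columns that order rows within a partition

    def create_cql(self) -> str:
        cols = ", ".join(f"{col} {cql_type}" for col, cql_type in self.columns.items())
        key = "(" + ", ".join(self.partition_key) + ")"
        if self.clustering:
            key += ", " + ", ".join(self.clustering)
        return f"CREATE TABLE IF NOT EXISTS {self.name} ({cols}, PRIMARY KEY ({key}))"

    def select_cql(self, fields: tuple) -> str:
        # Filter on every key column; %s is a placeholder for a driver to bind.
        where = " AND ".join(f"{col} = %s" for col in self.partition_key + self.clustering)
        return f"SELECT {', '.join(fields)} FROM {self.name} WHERE {where}"


# Assumed example query: "which song was played at a given position in a session?"
songs_by_session = QueryTable(
    name="songs_by_session",
    columns={"session_id": "int", "item_in_session": "int",
             "artist": "text", "song_title": "text"},
    partition_key=("session_id",),
    clustering=("item_in_session",),
)

create_stmt = songs_by_session.create_cql()
select_stmt = songs_by_session.select_cql(("artist", "song_title"))
print(create_stmt)
print(select_stmt)
```

Keeping the key columns and the `WHERE` clause in one place makes it harder to write a query that the table's primary key cannot serve.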

Cloud Data Warehouses

In this project, learners will act as a data engineer for a streaming music service. They are tasked with building an ELT pipeline that extracts data from S3, stages it in Redshift, and transforms it into a set of dimensional tables for an analytics team to find insights into what songs their users are listening to.
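The stage-then-transform pattern described for the Cloud Data Warehouses project can be sketched as follows. This is an illustration under loud assumptions: an in-memory SQLite database stands in for Redshift, a Python list stands in for the files in S3, and every table and column name is hypothetical. In Redshift itself the load step would typically be a `COPY` command reading from S3 rather than row-by-row inserts.

```python
import sqlite3

# Hypothetical raw event records; in the project these would be files in S3.
raw_events = [
    {"user_id": 1, "level": "free", "song": "Song A"},
    {"user_id": 2, "level": "paid", "song": "Song B"},
    {"user_id": 1, "level": "free", "song": "Song C"},
]

conn = sqlite3.connect(":memory:")  # stand-in for the Redshift cluster
conn.executescript("""
    CREATE TABLE staging_events (user_id INTEGER, level TEXT, song TEXT);
    CREATE TABLE dim_users (user_id INTEGER PRIMARY KEY, level TEXT);
    CREATE TABLE fact_songplays (
        songplay_id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id INTEGER REFERENCES dim_users (user_id),
        song TEXT
    );
""")

# Load: copy the raw records into the staging table unchanged.
conn.executemany(
    "INSERT INTO staging_events VALUES (:user_id, :level, :song)", raw_events
)

# Transform: build the dimensional tables inside the warehouse with SQL.
conn.execute("INSERT INTO dim_users SELECT DISTINCT user_id, level FROM staging_events")
conn.execute(
    "INSERT INTO fact_songplays (user_id, song) SELECT user_id, song FROM staging_events"
)
conn.commit()

# An analytics-style query over the dimensional model: plays per user level.
plays_by_level = conn.execute("""
    SELECT u.level, COUNT(*)
    FROM fact_songplays AS f
    JOIN dim_users AS u ON u.user_id = f.user_id
    GROUP BY u.level
    ORDER BY u.level
""").fetchall()
print(plays_by_level)
```

Because the transformation runs as SQL against the staging table, the raw data stays available in the warehouse and the dimensional tables can be rebuilt without extracting from the source again.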

Spark and Data Lakes

STEDI Human Balance Analytics - In this project, learners will act as a data engineer for the STEDI team to build a data lakehouse solution for sensor data that trains a machine learning model. They will build an ELT (Extract, Load, Transform) pipeline for a lakehouse architecture, load data from an AWS S3 data lake, process the data into analytics tables using Spark and AWS Glue, and load the tables back into the lakehouse.
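The data flow in the Spark and Data Lakes project (read from the lake, process, write back) can be illustrated with a small stand-in. The sketch below is plain Python, not Spark or Glue code: a temporary directory plays the role of the S3 bucket, and the record fields, folder names, and `process_sensor_file` function are assumptions made for the example. In the project itself, the same steps would be expressed as Spark operations in a Glue job.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_FIELDS = ("device_id", "timestamp", "reading")  # hypothetical schema


def process_sensor_file(source: Path, target: Path) -> int:
    """Read JSON-lines records, keep the complete ones, write them to target.

    Returns the number of records written.
    """
    kept = []
    for line in source.read_text().splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        is_complete = isinstance(record, dict) and all(
            record.get(field) is not None for field in REQUIRED_FIELDS
        )
        if is_complete:
            kept.append(record)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(json.dumps(r) for r in kept))
    return len(kept)


with tempfile.TemporaryDirectory() as lake:  # stand-in for an S3 bucket
    raw = Path(lake) / "raw" / "sensor.json"
    raw.parent.mkdir(parents=True)
    raw.write_text(
        '{"device_id": "a1", "timestamp": 1, "reading": 0.5}\n'
        '{"device_id": "a1", "timestamp": 2}\n'
        'not json\n'
        '{"device_id": "b2", "timestamp": 3, "reading": 0.7}\n'
    )
    written = process_sensor_file(raw, Path(lake) / "processed" / "sensor.json")
    print(written)
```

Writing the cleaned records to a separate folder keeps the raw input untouched, so the processing step can be rerun if the rules change.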

Automate Data Pipeline

In this project, learners will build high-grade data pipelines from reusable tasks that can be monitored and allow easy backfills for a music streaming company, Sparkify. They will move JSON logs of user activity and JSON metadata from S3 and process them in Sparkify’s data warehouse in Amazon Redshift. To complete the project, learners will need to create their own custom operators to perform tasks such as staging the data, filling the data warehouse, and running checks on the data as the final step.

Further details can be found in the README files for each project.
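The final data-check step of the Automate Data Pipeline project lends itself to a small reusable task. The sketch below is a minimal, hypothetical version: `DataQualityCheck` is a plain Python class, SQLite stands in for Redshift, and the table and checks are invented for the example. In an orchestrator such as Apache Airflow (an assumption, since the description above does not name the tool), the same logic would sit inside the `execute` method of a custom operator.

```python
import sqlite3


class DataQualityCheck:
    """Hypothetical task: run SQL checks and fail loudly on unexpected results."""

    def __init__(self, connection, checks):
        self.connection = connection
        self.checks = checks  # list of (sql, expected_value) pairs

    def execute(self):
        for sql, expected in self.checks:
            actual = self.connection.execute(sql).fetchone()[0]
            if actual != expected:
                raise ValueError(
                    f"Check failed: {sql!r} returned {actual}, expected {expected}"
                )
        return len(self.checks)  # number of checks that passed


conn = sqlite3.connect(":memory:")  # stand-in for the Redshift warehouse
conn.execute("CREATE TABLE users (user_id INTEGER, level TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "free"), (2, "paid")])

task = DataQualityCheck(
    connection=conn,
    checks=[
        # No row may be missing its key.
        ("SELECT COUNT(*) FROM users WHERE user_id IS NULL", 0),
        # The table must not be empty (SQLite returns 1 for a true comparison).
        ("SELECT COUNT(*) > 0 FROM users", 1),
    ],
)
passed = task.execute()
print(passed)
```

Passing the checks in as data rather than hard-coding them lets the same task be reused for every table in the pipeline.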

yalazad/Data-Engineering-with-AWS
