Skip to content

amald94/Sparkify-data-warehouse-RedShift

Repository files navigation

Project Overview

A music streaming startup, Sparkify, has grown their user base and song database and want to move their processes and data onto the cloud. Their data resides in S3, in a directory of JSON logs on user activity on the app, as well as a directory with JSON metadata on the songs in their app.

In this project, we will create an ETL pipeline to build a data warehouses hosted on Redshift.

Target Redshift AWS Architecture

Summary

Structure

  • /images - some screenshots.
  • Analytics.ipynb - It is a notebook containing basic analytics on the datawarehouse.
  • create_cluster.ipynb - It is a notebook containing code to create a redshift cluster.
  • create_cluster.py - A script to create a redshift cluster.
  • create_tables.py - A script to drop and create tables.
  • etl.py - A script to load data from s3 to stagging tables and then to fact and dim tables using the given dataset on S3.
  • sql_queries.py - A script containing sql queries.
  • dwh.cfg - Configuration file to add AWS credentials.
  • delete_cluster.py - A script to delete the redshift cluster.

Datasets

You'll be working with two datasets that reside in S3. Here are the S3 links for each:

  • Song data - s3://udacity-dend/song_data
  • Log data - s3://udacity-dend/log_data
  • Log data json path - s3://udacity-dend/log_json_path.json

Schema

Fact Table

songplays - records in event data associated with song plays. Columns for the table:

songplay_id, start_time, user_id, level, song_id, artist_id, session_id, location, user_agent

Dimension Tables

users
user_id, first_name, last_name, gender, level
songs
song_id, title, artist_id, year, duration
artists
artist_id, name, location, lattitude, longitude
time
start_time, hour, day, week, month, year, weekday

Execute

Create Cluster

$ python3 create_cluster.py

Create Tables

$ python3 create_tables.py

Load The Data

$ python3 etl.py

Delete The Cluster

$ python3 delete_cluster.py 

Analytics

Run Analytics notebook to get the insights.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published