Project Background And Purpose

A hypothetical startup, Sparkify, has data about user activity on its streaming app. The data is stored in JSON log files. Sparkify wants this information processed and loaded into a well-designed database. The analytics team will use the resulting database to learn about user behaviour and help drive business decisions.

Database Schema

This project uses a star schema for a few reasons:

It simplifies queries
It allows for fast aggregation
It is denormalized, which means fewer joins and faster aggregation
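
To illustrate the points above, here is a minimal sketch of an aggregation over a star schema. It is not the project's actual schema: the columns, the sample rows, and the use of an in-memory SQLite database are all assumptions made so the example runs on its own. Only the songplays table name comes from the example query in this README.

```python
import sqlite3

# Hypothetical, trimmed-down star schema: one fact table (songplays)
# and one dimension table (users). Column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, level TEXT);
CREATE TABLE songplays (
    songplay_id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(user_id),
    song_title TEXT
);
INSERT INTO users VALUES (1, 'free'), (2, 'paid');
INSERT INTO songplays (user_id, song_title) VALUES
    (1, 'Song A'), (2, 'Song B'), (2, 'Song C');
""")

# A single join from the fact table to a dimension is enough to aggregate.
plays_per_level = conn.execute("""
    SELECT u.level, COUNT(*) AS plays
    FROM songplays AS s
    JOIN users AS u ON u.user_id = s.user_id
    GROUP BY u.level
    ORDER BY u.level
""").fetchall()

print(plays_per_level)  # [('free', 1), ('paid', 2)]
```

The fact table holds one row per event, and each dimension sits one join away, so most analytic questions need only a short query.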

Example Query

SELECT * FROM songplays WHERE duration > 100

How to Run Scripts

Run create_tables.py to create the database and the necessary tables.
Run etl.py to begin the ETL process.
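
The two steps above can also be run from a small helper, sketched below. The helper functions build_commands and run_pipeline are hypothetical and not part of this repository. The sketch assumes both scripts sit in the current working directory and that the database they connect to is already reachable.

```python
import subprocess
import sys

# Script names come from the steps above; the order matters because
# the tables must exist before the ETL process loads data into them.
SCRIPTS = ["create_tables.py", "etl.py"]


def build_commands(scripts=SCRIPTS):
    """Return one command per script, using the current Python interpreter."""
    return [[sys.executable, script] for script in scripts]


def run_pipeline(scripts=SCRIPTS):
    """Run each script in order and stop at the first failure."""
    for command in build_commands(scripts):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so etl.py never runs if create_tables.py fails.
        subprocess.run(command, check=True)
```

Calling run_pipeline() from the repository root would then be equivalent to running the two scripts by hand, one after the other.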
