Sparkify, a hypothetical startup, has data about user activity on its streaming app. The data are stored in JSON log files. The company wants this information processed and loaded into a well-designed database. The analytics team will use the resulting database to study user behaviour and help drive business decisions.
In this project, a star schema was used for a few reasons:
- It simplifies queries
- It allows for fast aggregation
- It is denormalized, so queries need fewer joins
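
To illustrate the points above, here is a minimal sketch of a star-schema aggregation that needs only one join from the fact table to a dimension table. It uses an in-memory SQLite database as a stand-in for the project's real database. Only the songplays table name comes from this project; the users table, every column name, and the sample rows are assumptions made up for the example.

```python
import sqlite3


def build_demo_star_schema():
    """Create a tiny in-memory star schema: one fact table, one dimension table."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Dimension table (hypothetical columns).
    cur.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, level TEXT)")
    # Fact table (hypothetical columns); each row is one song play.
    cur.execute(
        "CREATE TABLE songplays ("
        "songplay_id INTEGER PRIMARY KEY, "
        "user_id INTEGER REFERENCES users(user_id), "
        "song_id TEXT)"
    )
    # Made-up sample data.
    cur.executemany("INSERT INTO users VALUES (?, ?)", [(1, "free"), (2, "paid")])
    cur.executemany(
        "INSERT INTO songplays (user_id, song_id) VALUES (?, ?)",
        [(1, "S1"), (2, "S1"), (2, "S2")],
    )
    conn.commit()
    return conn


def plays_per_level(conn):
    """Count song plays per subscription level with a single fact-to-dimension join."""
    query = """
        SELECT u.level, COUNT(*) AS plays
        FROM songplays AS sp
        JOIN users AS u ON u.user_id = sp.user_id
        GROUP BY u.level
        ORDER BY u.level
    """
    return conn.execute(query).fetchall()


conn = build_demo_star_schema()
print(plays_per_level(conn))  # [('free', 1), ('paid', 2)]
```

Because every dimension hangs directly off the fact table, an aggregate like this never has to chain through intermediate tables.
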
An example query against the resulting database:
SELECT * FROM songplays WHERE duration > 100;
To run the project:
- Run create_tables.py to create the database and the necessary tables
- Run etl.py to begin the ETL process
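
The ETL step can be pictured roughly as follows. This is a simplified sketch, not the contents of etl.py: it assumes the log files hold one JSON object per line, uses an in-memory SQLite database instead of the project's real database, and the events table and the userId, song, and ts field names are hypothetical.

```python
import json
import sqlite3


def parse_log_lines(lines):
    """Extract: turn JSON-lines text into a list of dicts, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        records.append(json.loads(line))
    return records


def load_events(conn, records):
    """Transform and load: pick the fields of interest and insert them as rows."""
    # Hypothetical target table for this sketch.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (user_id INTEGER, song TEXT, ts INTEGER)"
    )
    rows = [(r["userId"], r["song"], r["ts"]) for r in records]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Made-up sample log content, including one blank line.
sample = [
    '{"userId": 1, "song": "A", "ts": 100}',
    "",
    '{"userId": 2, "song": "B", "ts": 200}',
]

conn = sqlite3.connect(":memory:")
loaded = load_events(conn, parse_log_lines(sample))
print(loaded)  # 2
```
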