This project takes the HTML watch history file you get from Google Takeout
and saves it into an SQLite database. It then uses the YouTube API to fetch additional information
about each video and runs some analysis to show interesting statistics.
The data it saves after the API calls:
- Video id (for each video in your watch history)
- Video duration (for each video in your watch history)
- Id of channel that uploaded each video
- Name of channel that uploaded each video
- The tags on each video
- The title of each video
- The like count on each video
- The view count on each video
- Whether the video is a short
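
As a rough illustration of how the fields above could be laid out in SQLite, here is a minimal sketch. The table name `videos` and every column name are assumptions made up for this example; the actual layout of youtube_data.db may differ.

```python
import sqlite3

# Hypothetical schema: the table and column names are assumptions for
# illustration, not necessarily what youtube_data.db actually uses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS videos (
    video_id      TEXT PRIMARY KEY,
    title         TEXT,
    channel_id    TEXT,
    channel_name  TEXT,
    duration_sec  INTEGER,
    tags          TEXT,     -- stored as one comma-separated string here
    like_count    INTEGER,
    view_count    INTEGER,
    is_short      INTEGER   -- 0 or 1
);
"""

def create_schema(conn):
    """Create the example table if it does not exist yet."""
    conn.executescript(SCHEMA)

# An in-memory database keeps the example from touching any real file.
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute(
    "INSERT INTO videos VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("vid1", "Example title", "chan1", "Example channel", 300, "a,b", 10, 1000, 0),
)
```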
The results it can output include:
- The top channels (channels you clicked on the most videos from)
- Top videos (videos you watched the most times)
- Most common tags in the videos in your watch history
- The average, median, and max like and view count of the videos
- The average, median, and total duration of all the videos
It's pretty easy to add more, though.
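
To give an idea of what adding a statistic might look like, here is a small sketch of the "top channels", "top videos", and view count summaries. It works on made-up in-memory rows rather than the real database, and the function names and row layout are assumptions for this example, not the code in data_analysis.py. It also assumes each video should count once in the view statistics, which may not match what the project does.

```python
from collections import Counter
from statistics import mean, median

# Made-up sample rows, one per watch: (channel_name, video_id, view_count).
watches = [
    ("Channel A", "vid1", 1000),
    ("Channel B", "vid2", 50),
    ("Channel A", "vid3", 200),
    ("Channel A", "vid1", 1000),  # the same video watched a second time
]

def top_channels(rows, n=3):
    """Channels ordered by how many watches came from them."""
    return Counter(channel for channel, _, _ in rows).most_common(n)

def top_videos(rows, n=3):
    """Videos ordered by how many times they were watched."""
    return Counter(video_id for _, video_id, _ in rows).most_common(n)

def view_stats(rows):
    """Mean, median, and max view count, counting each video once (an assumption)."""
    views = list({video_id: count for _, video_id, count in rows}.values())
    return {"mean": mean(views), "median": median(views), "max": max(views)}

print(top_channels(watches))
print(top_videos(watches, n=1))
print(view_stats(watches))
```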
How to run it:
- Create an apiKey.json file with ["Your api key here"] inside
- Set the path to your watch history HTML file in read_html_fil.py, then run it to save the info to youtube_data.db
- Run add_api_data.py to fetch the extra info from the YouTube API and add it to youtube_data.db
- Uncomment whatever you want to see at the bottom of data_analysis.py and run it
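
For reference, the apiKey.json format described above is a JSON list holding a single string. A minimal sketch of reading it could look like this; `load_api_key` is a hypothetical helper for illustration and not necessarily how add_api_data.py does it.

```python
import json

def load_api_key(path="apiKey.json"):
    """Read the API key from a JSON file shaped like ["Your api key here"].

    Hypothetical helper: the scripts in this project may load the key differently.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Fail early with a readable message if the file has a different shape.
    if not (isinstance(data, list) and len(data) == 1 and isinstance(data[0], str)):
        raise ValueError(f'{path} should contain a JSON list with one string, e.g. ["key"]')
    return data[0]
```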
To avoid outliers like 356 day livestreams, the duration statistics drop the 5 longest videos. If anyone is smart and wants to make that actually do something reliable, be my guest.
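
One possible alternative to dropping a fixed number of videos is an interquartile range cutoff (Tukey's fences), sketched below. This is a different technique from what the project currently does; the function name and the multiplier `k` are assumptions for illustration, and it needs Python 3.8+ for `statistics.quantiles`.

```python
from statistics import quantiles

def drop_long_outliers(durations, k=1.5):
    """Drop durations above Q3 + k * IQR (the upper Tukey fence).

    Sketch only: k=1.5 is the conventional default, not a value tuned
    for watch history data.
    """
    if len(durations) < 2:
        return list(durations)  # too few points to estimate quartiles
    q1, _, q3 = quantiles(durations, n=4)
    upper_fence = q3 + k * (q3 - q1)
    return [d for d in durations if d <= upper_fence]

# Durations in seconds, with one very long livestream at the end.
sample = [60, 120, 180, 240, 300, 600, 31_000_000]
print(drop_long_outliers(sample))
```

Video durations tend to be heavily skewed, so this cutoff can also remove ordinary long videos; raising `k` makes it more lenient.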
Download your Google Takeout data here: https://takeout.google.com/settings/takeout?pli=1
You only need to check YouTube and YouTube Music at the bottom.