This is the repo for a movie recommendation system made using the IMDb dataset.
The deployed app can be found here.
The dataset was cleaned and trivial/non-essential features were discarded.
The final dataset consisted of the following features:
- ID
- Title
- IMDb ID
- List of genres
- List of top 3 actors
- List of director(s)
- Popularity
The logic behind the program is inspired by k-nearest neighbours (KNN): the idea that an object is most similar to the objects closest to it. However, we computed cosine distances between movies directly instead of using the NearestNeighbors class available in scikit-learn. For genres, actors, and directors, we built a master list of all unique entities in each feature. Using these lists, we created a binary vector per movie for each of the three features, in which each position indicates whether the corresponding entity is present in that movie. To measure closeness, we use SciPy's spatial.distance.cosine function.
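The encoding step described above can be sketched as follows. This is a minimal illustration, not the repo's actual code: the toy movies and the helper names `build_vocabulary` and `to_binary_vector` are hypothetical, and only the genre feature is shown (actors and directors would be encoded the same way).

```python
# Hypothetical toy data; the real dataset's titles and columns will differ.
movies = {
    "Movie A": ["Action", "Thriller"],
    "Movie B": ["Action", "Comedy"],
    "Movie C": ["Drama"],
}


def build_vocabulary(entity_lists):
    """Return a sorted list of every unique entity across all movies."""
    return sorted({entity for entities in entity_lists for entity in entities})


def to_binary_vector(entities, vocabulary):
    """Return 1 where the vocabulary entry appears in the movie, else 0."""
    present = set(entities)
    return [1 if entry in present else 0 for entry in vocabulary]


# The "master list" of genres, in a fixed order shared by all movies.
genre_vocabulary = build_vocabulary(movies.values())

# One binary vector per movie, aligned with genre_vocabulary.
genre_vectors = {
    title: to_binary_vector(genres, genre_vocabulary)
    for title, genres in movies.items()
}
```

Sorting the vocabulary keeps the vector positions stable between runs, so vectors computed at different times stay comparable.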
The spatial.distance.cosine function calculates the distance between two 1-D arrays u and v as

$$1 - \frac{u \cdot v}{\lVert u \rVert_2 \, \lVert v \rVert_2}$$

where u and v are the binary lists for the movies.
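To make the distance concrete, here is a small pure-Python sketch that computes one minus the cosine similarity of two vectors, which is the quantity SciPy's `spatial.distance.cosine` is documented to return, and uses it to rank candidates. The function names and the toy vectors are hypothetical, and how the repo combines the genre, actor, and director distances is not shown here.

```python
import math


def cosine_distance(u, v):
    """Return 1 - (u . v) / (||u|| * ||v||) for two equal-length vectors.

    Assumes neither vector is all zeros; otherwise the norm is 0 and the
    division fails.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_by_distance(query_title, vectors):
    """Return the other titles sorted from closest to farthest."""
    query = vectors[query_title]
    others = [title for title in vectors if title != query_title]
    return sorted(others, key=lambda title: cosine_distance(query, vectors[title]))


# Hypothetical binary genre vectors over [Action, Comedy, Drama, Thriller].
toy_vectors = {
    "Movie A": [1, 0, 0, 1],
    "Movie B": [1, 1, 0, 0],
    "Movie C": [0, 0, 1, 0],
}

distance_a_b = cosine_distance(toy_vectors["Movie A"], toy_vectors["Movie B"])
distance_a_c = cosine_distance(toy_vectors["Movie A"], toy_vectors["Movie C"])
recommendations = rank_by_distance("Movie A", toy_vectors)
```

Movies A and B share one of their two genres, so their distance is about 0.5, while A and C share none and sit at the maximum distance of 1 for binary vectors. A smaller distance means a closer match.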
Finally, we retrieved the poster links of the movies by IMDb ID using The Movie Database's (TMDB) public API. To learn more, visit this link.
Note: You will need an API key, which is available for free after you create an account.
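A poster lookup by IMDb ID might look like the sketch below. It assumes TMDB's v3 "find by external ID" endpoint, the `movie_results` and `poster_path` response fields, and the `w500` image size, so check these against the current TMDB documentation before relying on them. The function names are hypothetical and this is not necessarily how the repo performs the lookup.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed TMDB v3 endpoint and image base URL; verify against the TMDB docs.
TMDB_FIND_URL = "https://api.themoviedb.org/3/find/{imdb_id}"
POSTER_BASE_URL = "https://image.tmdb.org/t/p/w500"


def build_find_url(imdb_id, api_key):
    """Build the lookup URL for a movie identified by its IMDb ID."""
    query = urlencode({"api_key": api_key, "external_source": "imdb_id"})
    return TMDB_FIND_URL.format(imdb_id=imdb_id) + "?" + query


def poster_url_from_response(payload):
    """Return the full poster URL from a parsed response, or None if absent."""
    results = payload.get("movie_results") or []
    if not results or not results[0].get("poster_path"):
        return None
    return POSTER_BASE_URL + results[0]["poster_path"]


def fetch_poster_url(imdb_id, api_key):
    """Call the API over the network and return the poster URL, if any."""
    with urlopen(build_find_url(imdb_id, api_key), timeout=10) as response:
        return poster_url_from_response(json.load(response))
```

Calling `fetch_poster_url("tt0111161", "<your API key>")` would make a live request. Keeping the URL building and response parsing in separate functions lets both be checked without network access.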