In this project, I performed exploratory data analysis, data visualization, and modeling. The dataset is a CSV file with 11,123 rows, each representing a book described by 12 attributes. Before modeling, I checked for NaN values, made some small adjustments to make the dataset easier to work with, and merged regional language codes into one (en-AUS and en-UK into eng). I then made several visualizations to understand the dataset better. For the modeling part, I used K-means, an unsupervised learning algorithm that groups unlabelled data. To decide on the number of clusters I used the Elbow Method and settled on 5. Finally, I tested the model with several books and added an input function for easy searching.
The dataset contains 12 columns and 11,123 rows.
Columns Description:
- bookID = contains the unique ID for each book/series
- title = contains the titles of the books
- authors = contains the author of the particular book
- average_rating = the average rating of the books, as decided by the users
- isbn = the ISBN-10 number, which identifies a specific edition and publisher of a book
- isbn13 = the newer 13-digit ISBN format, introduced in 2007
- language_code = tells the language for the books
- num_pages = contains the number of pages in the book
- ratings_count = contains the number of ratings given for the book
- text_reviews_count = has the count of reviews left by users
- publication_date = date of publication
- publisher = name of the publisher
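The cleaning steps described in the introduction (dropping NaN values and merging the English variants) can be sketched as follows; the column names follow the list above, while the helper name is an assumption:

```python
import pandas as pd

def clean_books(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning helper: drop NaNs and merge English variants."""
    df = df.dropna()                        # remove rows with missing values
    df.columns = df.columns.str.strip()     # headers sometimes carry stray spaces
    # Merge regional English codes into a single 'eng' label
    df["language_code"] = df["language_code"].replace(
        {"en-AUS": "eng", "en-UK": "eng"}
    )
    return df
```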
In the EDA, I visualized the language distribution, the top 20 authors by number of books, the top 20 highest-rated books, and the distribution of average ratings across all books.
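A minimal sketch of two of these EDA views, assuming a pandas DataFrame `df` with the columns listed above (the function names and plot styling are illustrative, not the project's exact figures):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

def top_authors(df: pd.DataFrame, n: int = 20) -> pd.Series:
    """Authors ranked by how many books they have in the dataset."""
    return df["authors"].value_counts().head(n)

def plot_language_distribution(df: pd.DataFrame) -> pd.Series:
    """Bar chart of how many books exist per language code."""
    counts = df["language_code"].value_counts()
    counts.plot(kind="bar", title="Language distribution")
    plt.tight_layout()
    return counts
```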
Secondly, I created a list of my favorite authors and visualized their books by average rating.
authors = ['Gabriel García Márquez', 'Jack London', 'George Orwell', 'Jules Verne', 'Richard P. Feynman']
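Selecting those authors' books for the ratings chart can look like this sketch (the helper name is an assumption; the columns match the description above):

```python
import pandas as pd

def books_by_authors(df: pd.DataFrame, authors: list) -> pd.DataFrame:
    """Books by the given authors, sorted by average rating (highest first)."""
    sel = df[df["authors"].isin(authors)]
    return sel.sort_values("average_rating", ascending=False)[
        ["title", "authors", "average_rating"]
    ]
```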
After all these steps, I wanted to investigate the relationships between columns: average rating vs. number of pages, average rating vs. text review count, and ratings count vs. average rating.
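These pairwise relationships can be inspected with simple scatter plots; a sketch under the assumed column names (the figure layout is illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# The three column pairs examined above
PAIRS = [("average_rating", "num_pages"),
         ("average_rating", "text_reviews_count"),
         ("ratings_count", "average_rating")]

def plot_relationships(df: pd.DataFrame) -> None:
    """One scatter panel per column pair."""
    fig, axes = plt.subplots(1, len(PAIRS), figsize=(15, 4))
    for ax, (x, y) in zip(axes, PAIRS):
        ax.scatter(df[x], df[y], s=5, alpha=0.5)
        ax.set_xlabel(x)
        ax.set_ylabel(y)
    fig.tight_layout()
```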
In the modeling part, I had already decided to use the K-means algorithm, but I still had to choose the number of clusters. To do this I used the Elbow Method, which gives a good estimate; the resulting graph is shown in the figure below.
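The Elbow Method fits K-means for a range of k values and plots the within-cluster sum of squares (inertia) against k; the k where the curve bends is the candidate. A sketch assuming a numeric feature matrix `X`:

```python
import numpy as np
from sklearn.cluster import KMeans

def elbow_inertias(X: np.ndarray, k_max: int = 10) -> list:
    """Inertia (WCSS) for k = 1..k_max; plot these and look for the bend."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
        for k in range(1, k_max + 1)
    ]
```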
After deciding on 5 clusters, I plotted the data and examined the clusters.
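Fitting the final model and labelling each book can be sketched as follows (the feature choice and helper name are assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_clusters(X: np.ndarray, k: int = 5):
    """Fit K-means with k clusters; return the model and one label per row."""
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = km.fit_predict(X)
    return km, labels
```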
Lastly, I applied a Min-Max scaler to reduce bias: some books have very large feature values (such as huge ratings counts) while others have very few, and Min-Max scaling rescales every feature to the same 0-1 range so no single feature dominates the clustering.
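A sketch of the scaling step, using scikit-learn's `MinMaxScaler` (the choice of feature columns is an assumption):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def scale_features(X: np.ndarray) -> np.ndarray:
    """Rescale each feature column to the [0, 1] range."""
    return MinMaxScaler().fit_transform(X)
```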
print_similar_books("Caesar (Masters of Rome #5)")
- The Metaphysical Club
- One Hundred Years of Solitude
- Alice's Adventures in Wonderland and Through the Looking-Glass (Alice's Adventures in Wonderland #1-2)
- In Cold Blood
- Desperation / The Regulators: Box Set
print_similar_books("Lord of the Flies")
- Introduction to the Philosophy of History with Selections from The Philosophy of Right
- Marie Dancing
- The Odyssey
- The Hour Before Dark
- A Philosophical Enquiry into the Origin of our Ideas of the Sublime and Beautiful
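One way the `print_similar_books` helper used above could work, as a sketch: look up the query title's K-means cluster and list other books from the same cluster. The ranking within the cluster is an assumption (the project's exact ordering is not shown):

```python
import numpy as np
import pandas as pd

def print_similar_books(title: str, df: pd.DataFrame,
                        labels: np.ndarray, n: int = 5) -> list:
    """Recommend n titles from the same K-means cluster as `title`."""
    matches = np.flatnonzero(df["title"].to_numpy() == title)
    if matches.size == 0:
        print("Book not found")
        return []
    cluster = labels[matches[0]]
    # Other books in the same cluster, excluding the query title itself
    mask = (labels == cluster) & (df["title"].to_numpy() != title)
    picks = df.loc[mask, "title"].head(n).tolist()
    for t in picks:
        print("-", t)
    return picks
```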
As a result, my book recommender gives good results, but there is still room for improvement. For example, adding a category for each book would make the recommendations more effective, and increasing the amount of data (more rows or more attributes) could make them more accurate.