Analyze the Black Friday Kaggle data set to predict customer purchase behavior

amshar05/Customer-Purchase-Behaviour

To do this homework, create a free user account on the Kaggle website and download the Black Friday dataset.

https://www.kaggle.com/mehdidag/black-friday

Description

The dataset is a sample of the transactions made in a retail store. The store wants to better understand customer purchase behaviour across different products. This is a regression problem: we try to predict the dependent variable (the purchase amount) from the information contained in the other variables. The dataset has more than 500,000 observations from Black Friday at a retail store and contains both numerical and categorical variables.

User_ID | User ID

Product_ID | Product ID

Gender | Sex of User

Age | Age in bins

Occupation | Occupation

City_Category | Category of the City (A,B,C)

Stay_In_Current_City_Years | Number of years of stay in the current city

Marital_Status | Marital Status

Product_Category_1 | Product Category

Product_Category_2 | Product may also belong to another category

Product_Category_3 | Product may also belong to another category

Purchase | Purchase Amount

The homework has two parts. The first part requires building a linear regression model; the second part requires building a classification model.

PART 1: Linear Regression & Cross Validation

Q1) Use set.seed(8). Divide the data set into a training set (80% of the data) and a test set (20%).
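One way to do the 80:20 split in base R is sketched below. The data frame `black_friday` is a small hypothetical stand-in for the downloaded Kaggle file, and the object names (`train_idx`, `train_set`, `test_set`) are assumptions rather than names required by the assignment.

```r
# Hypothetical stand-in for the Kaggle data; the real file has 500,000+ rows.
black_friday <- data.frame(
  Gender   = rep(c("F", "M"), 50),
  Purchase = seq(1000, by = 50, length.out = 100)
)

set.seed(8)
n <- nrow(black_friday)

# Draw 80% of the row indices at random, without replacement.
train_idx <- sample(n, size = floor(0.8 * n))

train_set <- black_friday[train_idx, ]   # 80% of the rows
test_set  <- black_friday[-train_idx, ]  # the remaining 20%
```

Calling set.seed(8) immediately before sample() is what makes the split reproducible.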

Q2) Build a linear regression model to predict Purchase using the predictors Gender, Age, Occupation, City_Category, Stay_In_Current_City_Years, Marital_Status, and Product_Category_1. Do not use other variables, and make sure categorical variables are converted to “factor” before you build the model.
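A minimal sketch of the factor conversion and the model fit follows. It runs on a synthetic data frame `bf` (hypothetical values, same column names as the assignment) and assumes that all seven predictors are treated as categorical; whether Occupation, Marital_Status, and Product_Category_1 should be factors is a judgment the assignment leaves to you. The model is fitted on the whole synthetic data frame for brevity.

```r
set.seed(8)
n <- 200

# Synthetic stand-in with the same column names as the assignment.
bf <- data.frame(
  Gender                     = sample(c("F", "M"), n, replace = TRUE),
  Age                        = sample(c("0-17", "18-25", "26-35"), n, replace = TRUE),
  Occupation                 = sample(0:4, n, replace = TRUE),
  City_Category              = sample(c("A", "B", "C"), n, replace = TRUE),
  Stay_In_Current_City_Years = sample(c("0", "1", "2", "3", "4+"), n, replace = TRUE),
  Marital_Status             = sample(0:1, n, replace = TRUE),
  Product_Category_1         = sample(1:5, n, replace = TRUE),
  Purchase                   = round(runif(n, 500, 20000))
)

predictors <- c("Gender", "Age", "Occupation", "City_Category",
                "Stay_In_Current_City_Years", "Marital_Status",
                "Product_Category_1")

# Convert every predictor column to a factor before fitting.
bf[predictors] <- lapply(bf[predictors], factor)

fit <- lm(Purchase ~ Gender + Age + Occupation + City_Category +
            Stay_In_Current_City_Years + Marital_Status + Product_Category_1,
          data = bf)

# Coefficient table with standard errors and p-values.
summary(fit)
```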

Q3) Which variables are significant predictors of Purchase?

Q4) Using the model you built, predict the Purchase amount separately for the training set and the test set. Calculate the root mean squared error (RMSE) of the training and test set predictions. The test set RMSE is the holdout error value.
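RMSE is the square root of the mean of the squared differences between actual and predicted values. The sketch below computes it for both sets, assuming the model is fitted on the training set only. The helper `rmse()` and the synthetic data frame `bf` are hypothetical, and a single predictor is used to keep the example short.

```r
# Hypothetical helper: root mean squared error of a set of predictions.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

set.seed(8)
n <- 200

# Synthetic stand-in for the Black Friday data.
bf <- data.frame(Gender = factor(rep(c("F", "M"), n / 2)))
bf$Purchase <- 9000 + 500 * (bf$Gender == "M") + rnorm(n, sd = 1000)

train_idx <- sample(n, size = floor(0.8 * n))
train_set <- bf[train_idx, ]
test_set  <- bf[-train_idx, ]

fit <- lm(Purchase ~ Gender, data = train_set)

train_rmse <- rmse(train_set$Purchase, predict(fit, newdata = train_set))
test_rmse  <- rmse(test_set$Purchase,  predict(fit, newdata = test_set))  # holdout error
```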

Q5) Using the cv.glm function, perform a 5-fold cross validation on the Black Friday data set with the same predictors and response as in Q2. Calculate the cross validation RMSE and compare it to the holdout value you found in Q4. Which method would you prefer?
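cv.glm comes from the boot package and expects a model fitted with glm(); with the default gaussian family, glm() fits the same linear regression as lm(). The sketch below uses a synthetic data frame `bf` with two predictors as a hypothetical stand-in for the full predictor list. The first element of `delta` is the raw cross-validation estimate of the mean squared error, so its square root gives the cross validation RMSE.

```r
library(boot)

set.seed(8)
n <- 200

# Synthetic stand-in for the Black Friday data.
bf <- data.frame(
  Gender        = factor(rep(c("F", "M"), n / 2)),
  City_Category = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
bf$Purchase <- 9000 + 500 * (bf$Gender == "M") + rnorm(n, sd = 1000)

# glm() with the default gaussian family is ordinary linear regression.
glm_fit <- glm(Purchase ~ Gender + City_Category, data = bf)

# 5-fold cross validation on the full data set.
cv_out  <- cv.glm(bf, glm_fit, K = 5)
cv_rmse <- sqrt(cv_out$delta[1])
```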

PART 2: Logistic Regression

Use set.seed(9) and only the holdout method for this analysis, again with an 80:20 ratio for the train and test sets.

Q6) Create a histogram of Product_Category_1

Q7) Create a table that shows the counts (frequency) of each Product_Category_1 level using the table() function. How many different categories were observed in this variable in the BlackFriday data set?

Q8) Using the output from Q7, calculate the probability of occurrence of each level of Product_Category_1.
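The counts and probabilities from Q7 and Q8 can be sketched as follows. The vector `product_category` holds hypothetical category codes standing in for the Product_Category_1 column.

```r
# Hypothetical category codes standing in for Product_Category_1.
product_category <- c(1, 1, 1, 1, 5, 5, 5, 8, 8, 2)

counts <- table(product_category)   # frequency of each level
n_categories <- length(counts)      # number of distinct categories observed

# Relative frequency of each level; equivalent to counts / sum(counts).
probs <- prop.table(counts)
```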

Q9) Determine the levels of Product_Category_1 that have a probability of occurrence lower than 0.03, and consider these levels “low probability levels”.

Q10) Create a new variable that takes the value 0 if an observation’s Product_Category_1 level belongs to the “low probability group” described in Q9, and 1 otherwise. Add this newly created binary variable to the BlackFriday data set.
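A minimal sketch of Q9 and Q10 on synthetic data is shown below. The data frame `bf` and the column name `High_Prob` are hypothetical; the assignment does not name the new variable.

```r
# Synthetic stand-in: 100 rows, two of which fall in rare categories.
bf <- data.frame(
  Product_Category_1 = c(rep(1, 50), rep(5, 40), rep(8, 8), 12, 17)
)

probs <- prop.table(table(bf$Product_Category_1))

# Levels whose probability of occurrence is below 0.03.
low_levels <- names(probs)[as.vector(probs) < 0.03]

# 0 for rows in a low probability level, 1 for all other rows.
bf$High_Prob <- ifelse(as.character(bf$Product_Category_1) %in% low_levels, 0, 1)
```

Comparing on the level names as character strings avoids surprises if Product_Category_1 has already been converted to a factor.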

Q11) Build a logistic regression model that predicts whether a purchase belongs to a low probability level or not. Use the predictors Gender, Age, Occupation, City_Category, Stay_In_Current_City_Years, Marital_Status, and Purchase.

Q12) Calculate the test set misclassification rate and accuracy of your model using different threshold (cutoff) probabilities. Which cutoff value would you choose? Why?
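The logistic regression and the cutoff comparison can be sketched as below. Everything here is illustrative: the data frame `bf` is synthetic, the binary response `High_Prob` is a hypothetical name for the variable from Q10, only two of the predictors are used, and the grid of cutoffs from 0.1 to 0.9 is an arbitrary choice.

```r
set.seed(9)
n <- 500

# Synthetic stand-in for the Black Friday data with a 0/1 response.
bf <- data.frame(
  Gender   = factor(rep(c("F", "M"), n / 2)),
  Purchase = round(runif(n, 500, 20000))
)
bf$High_Prob <- rbinom(n, 1, plogis(-1 + bf$Purchase / 8000))

# 80:20 holdout split.
train_idx <- sample(n, size = floor(0.8 * n))
train_set <- bf[train_idx, ]
test_set  <- bf[-train_idx, ]

logit_fit <- glm(High_Prob ~ Gender + Purchase, data = train_set,
                 family = binomial)

# Predicted probability that High_Prob equals 1 for each test row.
test_prob <- predict(logit_fit, newdata = test_set, type = "response")

# Misclassification rate on the test set at each cutoff.
cutoffs  <- seq(0.1, 0.9, by = 0.1)
misclass <- sapply(cutoffs, function(cut) {
  predicted <- as.integer(test_prob > cut)
  mean(predicted != test_set$High_Prob)
})

results <- data.frame(cutoff = cutoffs,
                      misclassification = misclass,
                      accuracy = 1 - misclass)
```

When one class is much rarer than the other, accuracy alone can favor a cutoff that never predicts the rare class, so it may be worth inspecting a confusion matrix such as table(predicted, actual) at each cutoff as well.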
