Your First Ever Kaggle Project

Kaggle's Titanic Machine Learning problem is one of the easiest projects for starting your journey into machine learning.

Follow along with the Notebook.

About the Competition

We all know the legendary movie Titanic. That romantic story is fictional, but the disaster was real. On April 15, 1912, the widely considered "unsinkable" RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren't enough lifeboats for everyone on board, resulting in the deaths of 1,502 of the 2,224 passengers and crew. While some luck was involved in surviving, some groups of people appear to have been more likely to survive than others. Kaggle provides train and test data sets containing details such as Sex, Age, Passenger Class, and Embarked. Our job is to predict who survived in the test data set.

Brief Overview

The data set has missing values, as well as hidden but useful features such as the name prefix and family size, which play a significant role in predicting a person's chance of survival. We imputed the missing values with a Random Forest Regressor, which is considerably more accurate than filling with the mean, median, or most frequent value (mode). We then trained several models, tuned them, and picked the best one for the final prediction.
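The Random Forest imputation described above can be sketched roughly like this: train a regressor on the rows where `Age` is known, then predict `Age` for the rows where it is missing. The helper name `impute_age_rf` and the feature columns are illustrative, not from the original notebook.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def impute_age_rf(df, feature_cols):
    """Fit a Random Forest on rows where Age is known, then predict
    Age for the rows where it is missing (instead of a mean/median fill)."""
    known = df[df["Age"].notna()]
    missing_mask = df["Age"].isna()
    rf = RandomForestRegressor(n_estimators=200, random_state=0)
    rf.fit(known[feature_cols], known["Age"])
    df.loc[missing_mask, "Age"] = rf.predict(df.loc[missing_mask, feature_cols])
    return df
```

Because the regressor conditions on correlated columns such as `Pclass` and `SibSp`, its fills track each passenger's profile rather than collapsing everyone to a single global value.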

```python
df.isna().sum()
```

```
PassengerId      0
Survived       418
Pclass           0
Sex              0
Age            263
SibSp            0
Parch            0
Ticket           0
Fare             1
Embarked         2
```

We fill the missing Fare with the median value, because there are too many outliers, which would skew the mean.
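A minimal sketch of that fill, using a few made-up fare values to show why the median is the safer choice here:

```python
import pandas as pd

fares = pd.Series([7.25, 71.28, None, 8.05, 512.33])  # illustrative values
# The median ignores extreme fares, so outliers don't distort the fill value.
fares = fares.fillna(fares.median())
```

With the 512.33 outlier present, the mean would land near 150 while the median stays near the typical fare.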


Family Size, Name Prefix, and Their Survival Chances

(Figures: survival rate by family size; survival rate by name prefix)

Passengers with a family size of 2, 3, or 4 had a higher chance of survival.

Similarly, the prefix Mr (married man) had a very low chance of survival.
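Both engineered features come straight from existing columns. A sketch of how they might be derived (the column names `Prefix` and `FamSize` are assumptions, not confirmed by the original notebook):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"],
    "SibSp": [1, 0],
    "Parch": [0, 0],
})
# The title sits between the comma and the first period of each name.
df["Prefix"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False)
# Family size counts siblings/spouses, parents/children, plus the passenger.
df["FamSize"] = df["SibSp"] + df["Parch"] + 1
```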

Tuning Different Models and Predicting with the Best One

KNN model's accuracy over K values

(Figure: KNN accuracy against K)
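A sweep like the one plotted above can be sketched as follows; the synthetic data from `make_classification` stands in for the real Titanic features, and the range of K values is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the engineered Titanic features.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fit one KNN model per K and record held-out accuracy.
scores = {}
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    scores[k] = knn.score(X_te, y_te)

best_k = max(scores, key=scores.get)
```

Plotting `scores` against K produces the kind of line chart shown in the figure.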

After some trial and error, Random Forest stood out.

```python
confusion_matrix(y_test, y_pred_rfc)
```

Output:

```
array([[268,   5],
       [  2, 171]])
```
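Reading that matrix: rows are the true classes and columns the predictions, so only 5 + 2 = 7 of the 446 held-out passengers were misclassified. The accuracy falls out directly:

```python
import numpy as np

# Rows are true classes, columns are predictions: [[TN, FP], [FN, TP]].
cm = np.array([[268,   5],
               [  2, 171]])
accuracy = np.trace(cm) / cm.sum()  # correct predictions / all predictions
```

That works out to 439/446, roughly 98.4% on this held-out split.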

Feature importance:

(Figure: feature importances from the Random Forest)

The prefix Mr, Sex_male, and Passenger Class correlate most strongly with whether a person survived.
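Importances like those come straight off the fitted forest. A minimal sketch, again on synthetic data, with column names that are illustrative rather than taken from the original notebook:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative feature names; a real run uses the engineered Titanic columns.
cols = ["Prefix_Mr", "Sex_male", "Pclass", "Fare", "FamSize"]
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

rfc = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# feature_importances_ sums to 1; sorting ranks the most predictive features.
importances = pd.Series(rfc.feature_importances_, index=cols).sort_values(ascending=False)
```

A bar plot of `importances` gives the chart referenced above.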

For deployment, check the original repo [here](https://github.com/Aditya-Rajgor/Personal-Projects/tree/master/Titanic%20ML%20Problem).

[Live demo on Heroku](https://titanic101.herokuapp.com/)