Porto Seguro’s Safe Driver Prediction:

Objective:

Predict whether a driver will file an auto insurance claim in the next year.

Business Aspect:

Help provide accurate, tailored insurance plans and, hopefully, make auto insurance coverage accessible to more drivers.

Project Description:

Porto Seguro is one of Brazil’s largest auto and homeowner insurance companies. Inaccuracies in car insurance companies’ claim predictions raise the cost of insurance for good drivers and reduce the price for bad ones. In this competition, you’re challenged to build a model that predicts the probability that a driver will initiate an auto insurance claim in the next year. Although Porto Seguro has used machine learning for the past 20 years, the company is looking to Kaggle’s machine learning community to explore new, more powerful methods.

Data

Source: Kaggle, Porto Seguro’s Safe Driver Prediction competition.

In the data, features that belong to similar groupings are tagged as such in the feature names (e.g., ind, reg, car, calc). In addition, feature names include the postfix "bin" to indicate binary features and "cat" to indicate categorical features. Features without these designations are either continuous or ordinal. Values of -1 indicate that the feature was missing from the observation.
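The naming convention above can be used to split the columns into groups and to turn the -1 sentinel into a real missing value. The sketch below assumes column names of the form ps_ind_06_bin / ps_ind_02_cat (the ps_ prefix, the underscore separators and the id/target column names are assumptions here), and the tiny data frame is invented for illustration.

```python
import numpy as np
import pandas as pd


def tag_columns(df: pd.DataFrame) -> dict:
    """Split feature names into binary, categorical and continuous/ordinal groups
    using the 'bin' / 'cat' postfix convention (assumed to be '_bin' / '_cat')."""
    features = [c for c in df.columns if c not in ("id", "target")]
    binary = [c for c in features if c.endswith("_bin")]
    categorical = [c for c in features if c.endswith("_cat")]
    other = [c for c in features if c not in binary and c not in categorical]
    return {"bin": binary, "cat": categorical, "other": other}


def mark_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Replace the -1 missing-value sentinel with NaN so pandas treats it as missing."""
    return df.replace(-1, np.nan)


# Tiny invented frame using Porto Seguro-style (assumed) column names.
toy = pd.DataFrame({
    "id": [7, 9, 13],
    "target": [0, 0, 1],
    "ps_ind_01": [2, 1, 5],
    "ps_ind_02_cat": [2, -1, 1],
    "ps_ind_06_bin": [0, 1, 0],
    "ps_reg_03": [0.72, -1, 0.5],
})

groups = tag_columns(toy)
clean = mark_missing(toy)
print(groups)
print(clean.isna().sum().to_dict())
```

Converting -1 to NaN first means later steps (imputation, summary statistics) will not silently treat -1 as a real value.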

Pipeline

1. Exploratory Data Analysis:

a. Label/Target Distribution:

Label: '1' indicates drivers who filed an insurance claim, while '0' indicates drivers who did not. The plot shows that the data is heavily imbalanced. As a result, the baseline accuracy is 96.36%, i.e., the accuracy obtained by predicting that no one files a claim.
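The baseline accuracy is simply the share of the majority class, so a 96.36% baseline corresponds to roughly 3.64% of drivers filing a claim. A minimal sketch of that arithmetic, using invented labels rather than the real data:

```python
import numpy as np


def majority_baseline_accuracy(y) -> float:
    """Accuracy obtained by always predicting the most frequent label."""
    y = np.asarray(y, dtype=int)
    counts = np.bincount(y)  # counts[0] = no claim, counts[1] = claim
    return float(counts.max() / y.size)


def positive_rate(y) -> float:
    """Share of rows with label 1 (drivers who filed a claim)."""
    return float(np.mean(np.asarray(y) == 1))


# Invented labels: 4 claims out of 100 drivers.
labels = np.array([1] * 4 + [0] * 96)
print(positive_rate(labels))               # 0.04
print(majority_baseline_accuracy(labels))  # 0.96
```

This is why accuracy is a poor target here: a model that never predicts a claim already scores very high, which motivates a ranking metric such as the normalized Gini index.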

b. Correlation plot

The correlation plot shows that the columns tagged 'calc' are not correlated with any other columns in the dataset. We can use this information to keep only the useful columns when predicting the label.
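One way to check this numerically is to compute, for each column, its largest absolute correlation with any other column, and then drop the 'calc' group. The sketch below uses synthetic data, and it assumes that calc columns can be recognized by the substring "_calc_" in their names (an assumption about the exact naming).

```python
import numpy as np
import pandas as pd


def max_abs_correlation(df: pd.DataFrame) -> pd.Series:
    """For each column, the largest absolute Pearson correlation with any other column."""
    corr = df.corr().abs()
    # Mask the diagonal so a column's perfect correlation with itself is ignored.
    off_diagonal = corr.where(~np.eye(len(corr), dtype=bool), 0.0)
    return off_diagonal.max()


def drop_calc_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop every column whose name carries the (assumed) '_calc_' group tag."""
    return df.drop(columns=[c for c in df.columns if "_calc_" in c])


# Synthetic data: two related features and two independent 'calc'-style columns.
rng = np.random.default_rng(0)
n = 1_000
base = rng.normal(size=n)
toy = pd.DataFrame({
    "ps_reg_01": base,
    "ps_car_13": base + 0.1 * rng.normal(size=n),
    "ps_calc_01": rng.normal(size=n),
    "ps_calc_02": rng.normal(size=n),
})

print(max_abs_correlation(toy).round(3))
reduced = drop_calc_columns(toy)
print(list(reduced.columns))  # ['ps_reg_01', 'ps_car_13']
```

Note that a near-zero Pearson correlation only rules out linear relationships; a column can still carry non-linear signal, so dropping it is a judgment call worth validating against the model metric.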

c. Imputing missing values: filling with the median or mean, and using outlier detection methods
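A minimal sketch of median and mean imputation, assuming the fill values are learned on the training data only and then reused on the test data (a common precaution against leakage; the project's exact procedure is not stated). The column names and values are invented, and the outlier detection step is not shown.

```python
import numpy as np
import pandas as pd


def fit_fill_values(train: pd.DataFrame, median_cols, mean_cols) -> dict:
    """Learn per-column fill values from the training data, ignoring the -1 sentinel."""
    train = train.replace(-1, np.nan)
    values = {col: train[col].median() for col in median_cols}
    values.update({col: train[col].mean() for col in mean_cols})
    return values


def apply_fill_values(df: pd.DataFrame, values: dict) -> pd.DataFrame:
    """Turn -1 into NaN, then fill each column with its learned value."""
    return df.replace(-1, np.nan).fillna(value=values)


train = pd.DataFrame({"ps_reg_03": [0.5, -1, 1.5, 1.0], "ps_car_14": [0.3, 0.4, -1, 0.5]})
test = pd.DataFrame({"ps_reg_03": [-1, 2.0], "ps_car_14": [-1, 0.1]})

fill_values = fit_fill_values(train, median_cols=["ps_reg_03"], mean_cols=["ps_car_14"])
filled_test = apply_fill_values(test, fill_values)
print(filled_test)
```

The median is usually the safer choice for skewed columns, since a few extreme values pull the mean but barely move the median.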

2. Machine Learning:

Trained different machine learning models and used ensemble techniques to improve the evaluation metric, the normalized Gini index.
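The normalized Gini index measures how well the predicted probabilities rank claimants above non-claimants: the model's Gini divided by the Gini of a perfect ranking, so 1 is a perfect ordering, 0 is random and -1 is fully reversed. Below is a minimal sketch of one common formulation (ties in the predictions are broken by original row order). For binary labels with no tied predictions it equals 2 · AUC − 1.

```python
import numpy as np


def gini(actual, pred) -> float:
    """Gini coefficient of `actual` when rows are ranked by `pred` (highest first)."""
    actual = np.asarray(actual, dtype=float)
    pred = np.asarray(pred, dtype=float)
    n = len(actual)
    # Sort by prediction descending; break ties by original row order.
    order = np.lexsort((np.arange(n), -pred))
    # Sum over the ranking of the cumulative share of positives captured so far.
    cumulative_share_sum = np.cumsum(actual[order]).sum() / actual.sum()
    return (cumulative_share_sum - (n + 1) / 2.0) / n


def gini_normalized(actual, pred) -> float:
    """Gini of the model divided by the Gini of a perfect ranking."""
    return gini(actual, pred) / gini(actual, actual)


y = [0, 1, 0, 1, 0]
print(gini_normalized(y, [0.1, 0.9, 0.3, 0.4, 0.2]))  # 1.0 (perfect ranking)
print(gini_normalized(y, [0.9, 0.1, 0.7, 0.6, 0.8]))  # -1.0 (fully reversed)
```

Because only the ordering of the predictions matters, any monotonic rescaling of the probabilities leaves the score unchanged.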

Machine learning models used:

Random Forest
AdaBoost Classifier
Gradient Boosted Trees
XGBoost
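The source does not say how the models were combined, so the sketch below assumes the simplest ensembling technique: averaging the predicted claim probabilities of several models. It uses synthetic imbalanced data from scikit-learn's make_classification instead of the Kaggle data, hypothetical hyperparameters, and scores with 2 · AUC − 1 as the normalized Gini. XGBoost is left out to avoid an extra dependency; xgboost.XGBClassifier exposes the same fit / predict_proba interface and could be added to the models dict if the package is installed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def normalized_gini(y_true, scores) -> float:
    """Normalized Gini expressed through ROC AUC: 2 * AUC - 1."""
    return 2 * roc_auc_score(y_true, scores) - 1


# Synthetic stand-in for the Kaggle data: roughly 4% positives (claims).
X, y = make_classification(n_samples=4000, n_features=20, n_informative=5,
                           weights=[0.96], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

# Hypothetical hyperparameters; the original settings are not given.
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, min_samples_leaf=5,
                                            random_state=0),
    "adaboost": AdaBoostClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                                    random_state=0),
}

probas = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    probas[name] = model.predict_proba(X_te)[:, 1]  # P(claim) for each test row
    print(f"{name}: {normalized_gini(y_te, probas[name]):.4f}")

# Simple ensemble: average the three probability vectors.
ensemble = np.mean(list(probas.values()), axis=0)
print(f"ensemble: {normalized_gini(y_te, ensemble):.4f}")
```

Averaging works best when the models make different errors; weighted averages or rank averaging are common variations, but which one (if any) the project used is not stated.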

Result

Top 17% on the Kaggle leaderboard, with a normalized Gini index of 0.28986

Detailed Project report and Code

GitHub

cjvegi/DataScience_career_track
