MFE Applied Financial Project

In this project, we use consumer data to predict whether a consumer will default. The methods we may use include linear regression, logistic regression, neural networks, and support vector machines.

Current Achievement

Clean the data and split it into training and testing sets.
Run logistic regression on the training data: the accuracy is about 0.74 on the training data and about 0.65 on the testing data.
Reduce the feature set to 50 features according to the F-value. The accuracy becomes about 0.71.
Run random forest on the training dataset; the accuracy is about 0.65.
Use 10-fold cross-validation for logistic regression; the accuracy is about 0.71 when the feature set is reduced to around 100 features.
Before running logistic regression, normalize the data: for each feature, subtract the feature's mean from each value and divide by the feature's standard deviation.
After normalization, the accuracy using all the data becomes 0.77 and the AUC is about 0.73. The results are in logistic_regression_ROC.ipynb or the corresponding HTML file.
We tried feature selection with the built-in library methods and with PCA. Neither increased the accuracy or the AUC.
The best AUC we obtained is about 0.730, with an accuracy of 0.769.
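
The normalization step and the two reported metrics can be illustrated with a minimal, standard-library-only sketch. It is not the code from the notebooks: the function names, the toy data, and the choice of the sample standard deviation are assumptions made for illustration. Statistics are fitted on the training rows only and then reused for the testing rows, and the AUC is computed as the fraction of (defaulter, non-defaulter) pairs that the scores rank correctly.

```python
from statistics import mean, stdev


def fit_standardizer(rows):
    """Return per-feature (mean, sd) computed from the training rows only."""
    columns = list(zip(*rows))
    # Assumption: sample standard deviation; the README only says "sd".
    return [(mean(col), stdev(col)) for col in columns]


def standardize(rows, stats):
    """Apply z = (x - mean) / sd per feature; constant features map to 0."""
    return [
        [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
        for row in rows
    ]


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two features per consumer, label 1 = default.
train_X = [[1.0, 200.0], [2.0, 180.0], [3.0, 220.0], [4.0, 240.0]]
test_X = [[2.5, 210.0]]

stats = fit_standardizer(train_X)      # fitted on training data only
train_Z = standardize(train_X, stats)
test_Z = standardize(test_X, stats)    # reuses the training statistics

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]         # hypothetical predicted default probabilities
y_pred = [int(s >= 0.5) for s in scores]

print(accuracy(y_true, y_pred), auc(y_true, scores))
```

Accuracy depends on the 0.5 cutoff chosen here, while AUC depends only on how the scores rank the observations, which is one reason the two numbers can move differently. In practice a library such as scikit-learn offers equivalents (`StandardScaler`, `roc_auc_score`).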

Future improvement

We may try higher-order features or interaction features.
In the article from our literature review, the author obtained an AUC of 0.779 for logistic regression using about 25,000 data points. It is therefore reasonable that our AUC is lower, because we have only about 3,000 data points left after deleting the records with null values. To address this, we may need more data.
We may apply other machine learning techniques.
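
As a rough illustration of the first idea, the sketch below expands one row of features with all squares and pairwise products (degree-2 terms). The function name `expand_degree2` and the sample row are hypothetical and not taken from the notebooks.

```python
from itertools import combinations_with_replacement


def expand_degree2(row):
    """Return the original features plus all squares and pairwise products."""
    pairs = combinations_with_replacement(range(len(row)), 2)
    return list(row) + [row[i] * row[j] for i, j in pairs]


# Hypothetical row with three features.
row = [2.0, 3.0, 5.0]
expanded = expand_degree2(row)
print(expanded)
# [2.0, 3.0, 5.0, 4.0, 6.0, 10.0, 9.0, 15.0, 25.0]
```

With n original features the expansion has n + n(n+1)/2 columns, so 50 features would grow to 1,325. With only about 3,000 data points, such an expansion would probably need to be combined with feature selection or regularization. scikit-learn's `PolynomialFeatures` provides a comparable transformation.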

Folders and files

.ipynb_checkpoints
Clean_data.ipynb
Ensemble_Method_KNN.ipynb
Final_report.ipynb
Observations.csv
README.md
Random_Forest.ipynb
Rename_and_VIF
Summary.ipynb
Summary_add_individual_factors.ipynb
cleaned_data.csv
logistic_regression_ROC.html
logistic_regression_ROC.ipynb
logistic_regression_all.ipynb
logistic_regression_normalization.ipynb
logistic_regression_without_normalization.ipynb
regression_yang
testing_data.csv
training_data.csv
