Lending Club is the world’s largest peer-to-peer marketplace connecting borrowers and investors. Since its inception in 2007, the number of loans in the marketplace has increased exponentially. Lending Club publishes its historical loan data every year. In this paper we use this data to train several supervised learning models that predict whether a borrower will default, so that investors can avoid those borrowers. Each loan's status is labeled as either good or bad, so that the company can use the prediction to approve or decline new loan applications.
Predicting loan defaults is a binary classification problem: a borrower will either default at some time during the loan term or finish repaying the loan. The dataset contains approximately 42,500 loan records from 2007 to 2011, with 145 columns (predictors). Only 20% of these records are defaults, so the machine learning task here is an imbalanced two-class classification problem.
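To see why the imbalance matters, note that a model which always predicts "no default" would already be about 80% accurate while catching no defaults at all. The sketch below illustrates this on synthetic labels (not the Lending Club data) and computes inverse-frequency class weights, one common way to compensate for imbalance. The function names are hypothetical and are not taken from the notebooks.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most common class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Synthetic labels mimicking a 20% default rate (assumption for illustration):
# 1 = default, 0 = fully paid.
labels = [1] * 20 + [0] * 80

baseline = majority_baseline_accuracy(labels)
weights = balanced_class_weights(labels)

print(baseline)  # 0.8
print(weights)   # {1: 2.5, 0: 0.625}
```

With these weights, each default counts four times as much as a fully paid loan during training, which is why metrics such as recall or precision on the default class are usually more informative here than plain accuracy.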
Data source: Lending Club
Please see the Python notebooks and the report for further details.