Car Price Prediction Project
Background: This project addresses the problem of predicting car prices. Suppose a sales company operates a website where people can buy and sell second-hand cars. When sellers post adverts on such websites, they often struggle to come up with a meaningful price or valuation for their cars. An automatic price recommendation system can help users of such platforms get better deals: they simply specify the car's model, make, year, mileage, and other important characteristics.
This project demonstrates the application of Machine Learning by means of a Linear Regression model, using the Python programming language and the NumPy library, to predict car prices. The dataset used for this project was obtained from the open datasets available on Kaggle.com. The original data comes from a German second-hand car sales website, and each data point is an advertisement placed by an individual. The dataset is therefore quite inconsistent, with a large percentage of missing data, most of which cannot be filled in by interpolation or other conventional fill methods.
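As an illustration of what a Linear Regression model in NumPy can look like, here is a minimal sketch that fits the weights with the normal equation. The function name train_linear_regression and the toy feature values (car age and mileage) are hypothetical and are not taken from the project's dataset or code.

```python
import numpy as np


def train_linear_regression(X, y):
    """Fit a linear model with the normal equation; returns (bias, weights)."""
    # Prepend a column of ones so the first coefficient acts as the bias term.
    X = np.column_stack([np.ones(X.shape[0]), X])
    # Solve (X^T X) w = X^T y instead of inverting X^T X explicitly.
    w = np.linalg.solve(X.T.dot(X), X.T.dot(y))
    return w[0], w[1:]


# Hypothetical toy data: car age in years and mileage in thousands of km.
X = np.array([
    [1.0, 10.0],
    [3.0, 40.0],
    [5.0, 90.0],
    [8.0, 120.0],
    [10.0, 200.0],
])
# Made-up prices that follow an exact linear rule, so the fit can recover it.
y = 20000 - 1000 * X[:, 0] - 20 * X[:, 1]

w0, w = train_linear_regression(X, y)
y_pred = w0 + X.dot(w)
print("bias:", w0)
print("weights:", w)
```

Because the toy prices are an exact linear function of the two features, the fitted bias and weights should match the rule used to generate them, up to floating-point error. Real advert data would of course not fit this cleanly.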
First, an Exploratory Data Analysis (EDA) was performed to clean the data and to carry out some preliminary checks on the distribution of the target variable. Second, a validation strategy was set up to check that the model produces reliable predictions. Next, Feature Engineering was performed on the dataset to extract important features for the purpose of improving the model. Lastly, regularization was applied to check for stability, and the resulting model was trained and evaluated before the actual prediction.
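The steps above can be sketched roughly as follows, under several assumptions that are not stated in the project description: a shuffled 60/20/20 train/validation/test split, a log1p transform of the price (a common choice for long-tailed targets), ridge-style regularization, and RMSE as the evaluation metric. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def split_indices(n, val_frac=0.2, test_frac=0.2, seed=2):
    """Shuffle row indices and cut them into train/validation/test parts."""
    idx = np.random.default_rng(seed).permutation(n)
    n_val = int(n * val_frac)
    n_test = int(n * test_frac)
    n_train = n - n_val - n_test
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def train_ridge(X, y, r=0.01):
    """Normal equation with r added to the diagonal of X^T X."""
    X = np.column_stack([np.ones(X.shape[0]), X])
    # Adding r * I keeps the system well-conditioned when features are
    # nearly collinear. For simplicity the bias term is regularized too.
    XTX = X.T.dot(X) + r * np.eye(X.shape[1])
    w = np.linalg.solve(XTX, X.T.dot(y))
    return w[0], w[1:]


def rmse(y, y_pred):
    """Root mean squared error."""
    return np.sqrt(np.mean((y - y_pred) ** 2))


# Synthetic stand-in for the cleaned dataset (not the real Kaggle data).
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(0, 20, n)          # years
mileage = rng.uniform(0, 300, n)     # thousands of km
price = np.exp(10 - 0.08 * age - 0.003 * mileage + rng.normal(0, 0.1, n))

X = np.column_stack([age, mileage])
y = np.log1p(price)                  # model the log of the price

train, val, test = split_indices(n)

# Compare a few regularization strengths on the validation set.
scores = {}
for r in [0.0, 0.01, 1.0]:
    w0, w = train_ridge(X[train], y[train], r=r)
    scores[r] = rmse(y[val], w0 + X[val].dot(w))
    print(f"r={r}: validation RMSE={scores[r]:.3f}")

# Predictions are in log space; np.expm1 converts them back to prices.
w0, w = train_ridge(X[train], y[train], r=0.01)
predicted_prices = np.expm1(w0 + X[test].dot(w))
```

The test part is held back until the end so that the choice of r is made on validation data only. The specific split fractions and candidate values of r here are placeholders, not the project's settings.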