Step-1-Data-Insights-and-Data-Preprocessing

In this step, I explored the data and preprocessed it so that it is ready for the later steps.

First, I merged the relevant tables and examined our data.
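A minimal sketch of this kind of join and first inspection is given here. The tables and column names (customers, orders, customer_id) are hypothetical stand-ins, not the actual contents of DataSets.zip, and the left join is only one possible choice of join type.

```python
import pandas as pd

# Hypothetical tables; the real tables and columns in DataSets.zip may differ.
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "city": ["Ankara", "Izmir", "Bursa"]}
)
orders = pd.DataFrame(
    {"order_id": [10, 11, 12], "customer_id": [1, 1, 3], "amount": [25.0, 40.0, 15.5]}
)

# A left join keeps every order row, even when no customer row matches.
merged = pd.merge(orders, customers, on="customer_id", how="left")

# First look at the merged data: size, column types, missing values per column.
print(merged.shape)
print(merged.dtypes)
print(merged.isna().sum())
```

Checking the shape and the missing-value counts right after the merge is a quick way to spot keys that failed to match.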

Then, I preprocessed the data to make it ready for the later steps.

In this step, I used the following methods:

Joining tables
    pd.merge
Preprocessing
    SimpleImputer with the most_frequent strategy, since it also accepts categorical values
    Normalization, since our problem is a classification problem and the features have many unique values
Feature Extraction
    PCA; plotting the features and choosing PCA's n_components for 90 % usefulness
Outlier Detection
    Matplotlib's boxplot
Outlier Elimination
    Quantile-based outlier elimination
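The methods listed above can be sketched roughly as follows. This is an illustration on synthetic data, not the notebook's code. Several details are assumptions: the column names, the use of MinMaxScaler as the normalization method, reading "90 % usefulness" as 90 % explained variance, the 1 % and 99 % quantile cutoffs, and the helper name drop_quantile_outliers.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the merged table (hypothetical columns).
rng = np.random.default_rng(0)
num_cols = [f"f{i}" for i in range(5)]
df = pd.DataFrame(rng.normal(size=(200, 5)), columns=num_cols)
df["category"] = rng.choice(["a", "b", "c"], size=200)
df.loc[::20, "f0"] = np.nan
df.loc[::25, "category"] = np.nan

# Imputation: most_frequent works for numeric and categorical columns alike.
# Casting to object lets both kinds of column share one array.
imputer = SimpleImputer(strategy="most_frequent")
imputed = pd.DataFrame(imputer.fit_transform(df.astype(object)), columns=df.columns)
imputed[num_cols] = imputed[num_cols].astype(float)

# Normalization (assumption: min-max scaling to the [0, 1] range).
scaled = MinMaxScaler().fit_transform(imputed[num_cols])

# PCA: a float n_components keeps just enough components to explain
# at least that fraction of the variance (assumed meaning of "90 %").
pca = PCA(n_components=0.90, svd_solver="full")
reduced = pca.fit_transform(scaled)
print(pca.n_components_, pca.explained_variance_ratio_.sum())


def drop_quantile_outliers(frame, columns, low=0.01, high=0.99):
    """Keep rows whose values lie inside [low, high] quantiles for every column."""
    lower = frame[columns].quantile(low)
    upper = frame[columns].quantile(high)
    within = ((frame[columns] >= lower) & (frame[columns] <= upper)).all(axis=1)
    return frame[within]


# Quantile-based outlier elimination (assumed 1 % / 99 % cutoffs).
filtered = drop_quantile_outliers(imputed, num_cols)
print(len(imputed), len(filtered))
```

The order in which the notebook applies these steps is not stated here, so the sequence in the sketch is only one plausible arrangement. A boxplot of the numeric columns before and after the filter is a simple way to check visually that the extreme points were removed.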

The DataSets.zip file contains the dataset that I used in this step.
