In this step, I got to know the data and prepared it for the later steps.
First, I merged the relevant tables and examined the result.
Then, I preprocessed the data.
The methods I used in this step are:
- Joining tables: `pd.merge`
- Preprocessing:
  - `SimpleImputer` with the most-frequent strategy, as it accepts categorical values
  - Normalization, as our problem is classification and there are lots of unique values
- Feature Extraction: `PCA`; plotting features and PCA's `n_components` for 90% usefulness
- Outlier Detection: Matplotlib's `boxplot`
- Outlier Elimination: quantile-based outlier elimination technique
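
The methods listed above can be sketched end to end on a toy example. This is only an illustration, not the project's actual code: the two tables, their column names, and the join key are hypothetical, `MinMaxScaler` is an assumed choice of normalization, "90% usefulness" is read here as 90% explained variance, and the 1.5 × IQR rule is an assumed form of the quantile-based elimination. The boxplot step is left out because it only produces a figure.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy tables; the real ones come from the project's dataset.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "segment": ["a", "b", np.nan, "a", "a", "b"],
})
orders = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "amount": [10.0, 12.0, 11.0, 13.0, 9.0, 500.0],
    "items": [1.0, 2.0, 2.0, 3.0, 1.0, 40.0],
})

# 1. Join the tables on their shared key.
df = pd.merge(customers, orders, on="customer_id", how="inner")

# 2. Fill missing values with the most frequent value of the column.
#    This strategy also works for categorical (string) columns.
imputer = SimpleImputer(strategy="most_frequent")
df["segment"] = imputer.fit_transform(df[["segment"]]).ravel()

# 3. Normalize the numeric columns (MinMaxScaler is an assumption).
numeric_cols = ["amount", "items"]
scaled = pd.DataFrame(
    MinMaxScaler().fit_transform(df[numeric_cols]),
    columns=numeric_cols,
    index=df.index,
)

# 4. PCA: a float n_components keeps as many components as are needed
#    to explain at least that share of the variance.
pca = PCA(n_components=0.90, svd_solver="full")
reduced = pca.fit_transform(scaled)

# 5. Quantile-based outlier elimination (assumed 1.5 * IQR rule):
#    keep only rows where every numeric column is inside the fences.
q1 = scaled.quantile(0.25)
q3 = scaled.quantile(0.75)
iqr = q3 - q1
inside = ((scaled >= q1 - 1.5 * iqr) & (scaled <= q3 + 1.5 * iqr)).all(axis=1)
cleaned = df[inside]
```

In this toy data the last row (amount 500, items 40) falls outside the fences and is dropped, so `cleaned` keeps five of the six rows. On real data, the fence multiplier and the variance threshold would need to be tuned against the boxplots and the explained-variance plot.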
The `DataSets.zip` file contains the dataset that I used in the first step.