The goal of this project is to leverage data science techniques to assist doctors in recommending the best type of drug treatment for at-risk patients.The dataset provided contains information about patients, and the objective is to predict the drug type based on various features. The project involves building and comparing different machine learning models, tuning hyperparameters, and selecting the best-performing model.
-Data loading and initial exploration.
-Handling missing values using appropriate imputation strategies.
-Ensuring data types match the data dictionary.
-Addressing inconsistent and impossible values.
-Exploratory Data Analysis (EDA) with visualizations to gain insights into the data.
-Preprocessing data using pipelines to avoid data leakage.
-Scaling data when necessary for specific models.
-Encoding variables using appropriate techniques (e.g., one-hot encoding).
-Creating default versions of at least two different models.
-Tuning hyperparameters for each model using techniques like RandomizedSearchCV.
-Evaluating models using appropriate metrics (e.g., accuracy, precision, recall).
-Choosing a final model based on performance and justifying the selection.