This project predicts the likelihood that a person has purchased, or otherwise holds, a medical insurance cover, based on various demographic and socio-political factors. The prediction relies on identifying the key features that contribute most to whether a person holds a medical insurance cover.
The dataset includes the following features:
- Patient_Age: Age group of the respondent.
- Patient_Gender: Gender of the respondent.
- Marital_Status: Marital status of the respondent.
- Number_of_Children: Number of children.
- Employment_Status: Employment status of the respondent.
- Monthly_Income: Respondent's monthly income.
- Had_Health_Insurance: If the respondent has ever had health insurance.
- Current_Insurance: Type of insurance cover.
- Last_Visit: Time since the last hospital visit in months.
- Insurance_During_Last_Visit: Health insurance status during the last hospital visit.
- Routine_Checkup: If the respondent has ever had a routine check-up.
- Checkup_Interval_Years: Frequency of routine check-ups.
- Had_Cancer_Screening: If the respondent has ever had a cancer screening.
- Cancer_Screening_Interval_Years: Frequency of cancer screenings.
- Geographical_Code: Geographical coordinates of the respondent's location.
- Geographical_Address: Address of the respondent.
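As a rough illustration of how features like these might be prepared for modelling, the sketch below one-hot encodes a few categorical columns and scales a few numeric ones with scikit-learn. The split into categorical and numeric columns, the choice of library, and the toy rows are all assumptions made for the example; the actual column types and preprocessing used in this project may differ.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column types: the real dataset may encode these differently.
CATEGORICAL = ["Patient_Age", "Patient_Gender", "Employment_Status"]
NUMERIC = ["Number_of_Children", "Last_Visit"]


def build_preprocessor():
    """One-hot encode categorical columns and standardize numeric ones."""
    return ColumnTransformer(
        transformers=[
            # handle_unknown="ignore" avoids errors on unseen categories.
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
            ("num", StandardScaler(), NUMERIC),
        ],
        remainder="drop",  # columns not listed above are discarded
    )


# Made-up rows for illustration only; these are not taken from the dataset.
toy = pd.DataFrame(
    {
        "Patient_Age": ["18-25", "26-35", "26-35"],
        "Patient_Gender": ["F", "M", "F"],
        "Employment_Status": ["Employed", "Unemployed", "Employed"],
        "Number_of_Children": [0, 2, 1],
        "Last_Visit": [3, 12, 6],
    }
)

X = build_preprocessor().fit_transform(toy)
print(X.shape)
```

Each categorical column expands into one indicator column per observed category, while the numeric columns are rescaled to zero mean and unit variance.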
The Logistic Regression, Random Forest, and SVM models predict whether a person has a medical insurance cover. Precise feature engineering and prioritizing key factors improve the models' accuracy, and the project aims to provide insights that medical insurance companies can use when deciding how to market their covers.
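A minimal sketch of comparing the three model families with scikit-learn is shown here. It uses synthetic data from `make_classification` as a stand-in for the encoded survey features, and the hyperparameters, train/test split, and use of accuracy as the only metric are assumptions for illustration, not the settings used in this project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the encoded features and a binary "has cover" target.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Logistic regression and SVM are scale-sensitive, so they get a scaler.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on held-out rows
    print(f"{name}: test accuracy = {scores[name]:.3f}")

# Rank features by random forest importance, most important first.
forest = models["Random Forest"]
ranking = np.argsort(forest.feature_importances_)[::-1]
print("Feature indices by importance:", ranking.tolist())
```

The importance ranking at the end is one possible way to surface the key factors mentioned above; coefficients from the logistic regression would be another.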
Data was compiled and consolidated by SaFra Data.