Analytics Vidhya Job-A-Thon November 2021
- The problem is to predict employee attrition, which is analogous to customer churn, so a similar approach can be used: the data is modelled as a binary classification problem
- Each employee has records spread over a time frame (reporting_dates)
- The aim is to predict if an employee will churn in the next six months
- The target variable in the train set will be generated for each record as a binary output (0 or 1, with 1 indicating attrition)
- Inspiration for the approach was drawn from Korichi et al.
- Add the last working date to all records of each emp_id that has left
- Evaluate `tenure` of emp_id for each record {record_date - dateofjoining}
- Evaluate `grade_chg_join` {designation - joining_designation}
- Evaluate median salary at a particular period and difference from employee salary (salary - median_salary)
- Separate out the test set, i.e. the employee records dated '2017-12-01'
- Create target variable `attr_risk`
- Calculate time before attrition `t_attr` (in months) for all records (lastworkingdate - record_date)
- Binarize above result and evaluate attrition risk `attr_risk` as per the formula:
- Encode categorical variables
- The train set is modelled using XGBoost
- There are 43 features in the final set
- The model is tuned with RandomizedSearchCV, using 15 cross-validation folds and 60 iterations
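
The feature and target steps listed above can be sketched in pandas roughly as follows. This is a minimal illustration, not the original implementation. It assumes the columns are named as in the list (`emp_id`, `record_date`, `dateofjoining`, `lastworkingdate`, `designation`, `joining_designation`, `salary`) and that designations are already numeric. The function name `add_features`, the column `salary_diff`, the choice of days as the unit for `tenure`, and the six-month binarization threshold are all assumptions. In particular, the threshold only mirrors the stated prediction horizon; the formula used in the original work is not reproduced here.

```python
import pandas as pd


def add_features(df: pd.DataFrame, horizon_months: int = 6) -> pd.DataFrame:
    """Hypothetical helper: derive the features and target named in the list."""
    out = df.copy()
    for col in ("record_date", "dateofjoining", "lastworkingdate"):
        out[col] = pd.to_datetime(out[col])

    # Copy the last working date onto every record of an employee who left;
    # employees who never left keep NaT.
    out["lastworkingdate"] = out.groupby("emp_id")["lastworkingdate"].transform("max")

    # tenure = record_date - dateofjoining (in days; the unit is an assumption)
    out["tenure"] = (out["record_date"] - out["dateofjoining"]).dt.days

    # grade_chg_join = designation - joining_designation
    out["grade_chg_join"] = out["designation"] - out["joining_designation"]

    # Median salary in each reporting period and the employee's gap to it
    out["median_salary"] = out.groupby("record_date")["salary"].transform("median")
    out["salary_diff"] = out["salary"] - out["median_salary"]

    # t_attr = whole calendar months between record_date and lastworkingdate
    lwd, rec = out["lastworkingdate"], out["record_date"]
    out["t_attr"] = (lwd.dt.year - rec.dt.year) * 12 + (lwd.dt.month - rec.dt.month)

    # ASSUMPTION: flag a record when attrition falls within the horizon.
    # Records with no last working date (NaN t_attr) get 0.
    out["attr_risk"] = out["t_attr"].between(0, horizon_months).astype(int)
    return out


# Toy data: employee 1 leaves in March 2017, employee 2 stays.
records = pd.DataFrame({
    "emp_id": [1, 1, 1, 2, 2],
    "record_date": ["2017-01-01", "2017-02-01", "2017-03-01",
                    "2017-01-01", "2017-02-01"],
    "dateofjoining": ["2016-01-01"] * 3 + ["2015-06-01"] * 2,
    "lastworkingdate": [None, None, "2017-03-15", None, None],
    "designation": [2, 2, 2, 3, 3],
    "joining_designation": [1, 1, 1, 3, 3],
    "salary": [100, 100, 100, 200, 200],
})
feats = add_features(records)
print(feats[["emp_id", "record_date", "tenure", "salary_diff", "t_attr", "attr_risk"]])
```

Taking the per-employee maximum of `lastworkingdate` is one simple way to propagate the date, since it is only populated on an employee's final record in this toy data.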