We built a CV analysis system for a particular organization to ease its recruitment process.
- The organization provides a dataset of resumes labelled with job positions, on which our machine learning model is trained.
- A user interface lets a recruiter submit a CV for evaluation.
- The interface takes a CV in PDF format and predicts which position the candidate is suitable for.
- Overall, the system is built from a recruiter’s perspective.
Our project consists of two parts:
- The Python implementation of the system in Google Colab.
- The user interface developed with Streamlit.
We used the “UpdatedResumeDataSet.csv” dataset from Kaggle, which has two features: Category (the job position) and Resume (the resume text).
- The resume text was cleaned using the Python re (regex) module, and the cleaned text was stored in a new 'clean text' feature (see the cleaning sketch after this list).
- The job-position categories were encoded to numeric labels with scikit-learn's LabelEncoder.
- The 'clean text' feature was vectorized using TfidfVectorizer with a maximum of 2000 features.
- We wrapped KNeighborsClassifier in a OneVsRestClassifier, trained it on 769 samples, and tested it on 193 samples (see the training sketch below).
- This approach achieved an accuracy of 98% on the test set, so we use this model to take a single CV as input and predict which job position the candidate is suitable for.
- We integrated Streamlit with Colab using pyngrok, which tunnels the locally running app to a public URL (see the tunnelling sketch below).
- The frontend has a file-upload field that accepts a CV as a PDF, runs it through the system, and displays the predicted job position (a sketch of the app follows).
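A minimal sketch of the cleaning step. The column names follow the dataset described above, but the exact regex patterns are illustrative assumptions rather than the full cleaning logic:

```python
import re

import pandas as pd

def clean_resume(text):
    """Remove URLs, mentions, non-ASCII characters, punctuation, and extra whitespace."""
    text = re.sub(r'http\S+', ' ', text)       # strip URLs
    text = re.sub(r'[@#]\S+', ' ', text)       # strip mentions and hashtags
    text = re.sub(r'[^\x00-\x7f]', ' ', text)  # strip non-ASCII characters
    text = re.sub(r'[^\w\s]', ' ', text)       # strip punctuation
    return re.sub(r'\s+', ' ', text).strip()   # collapse repeated whitespace

df = pd.read_csv('UpdatedResumeDataSet.csv')
df['clean text'] = df['Resume'].apply(clean_resume)
```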
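A sketch of the label-encoding and vectorization steps with scikit-learn. Apart from max_features=2000, the TfidfVectorizer settings are assumed defaults:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder

# Encode each job-position category as an integer label.
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(df['Category'])

# Turn the cleaned resume text into a sparse TF-IDF matrix capped at 2000 features.
vectorizer = TfidfVectorizer(max_features=2000)
X = vectorizer.fit_transform(df['clean text'])
```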
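A sketch of the training and evaluation step. The 80/20 split and the random_state are assumptions chosen to reproduce the 769/193 train/test sizes reported above:

```python
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier

# An 80/20 split of the 962 resumes gives roughly 769 training and 193 test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# One-vs-rest fits one binary KNN classifier per job category.
model = OneVsRestClassifier(KNeighborsClassifier())
model.fit(X_train, y_train)

print('Test accuracy:', accuracy_score(y_test, model.predict(X_test)))
```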
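A sketch of the Streamlit frontend, saved as app.py. The PyPDF2 text extraction, the simplified cleaning, and the pickled-artifact file names are assumptions, not the exact implementation:

```python
# app.py -- minimal Streamlit frontend sketch
import pickle
import re

import streamlit as st
from PyPDF2 import PdfReader

# Load artifacts assumed to be pickled from the Colab notebook (file names are assumptions).
model = pickle.load(open('model.pkl', 'rb'))
vectorizer = pickle.load(open('tfidf.pkl', 'rb'))
label_encoder = pickle.load(open('label_encoder.pkl', 'rb'))

st.title('CV Analysis System')
uploaded_cv = st.file_uploader('Upload a CV (PDF)', type=['pdf'])

if uploaded_cv is not None:
    # Pull the raw text out of every page of the uploaded PDF.
    reader = PdfReader(uploaded_cv)
    raw_text = ' '.join(page.extract_text() or '' for page in reader.pages)

    # Apply the same style of regex cleaning used on the training data.
    clean = re.sub(r'\s+', ' ', re.sub(r'[^\w\s]', ' ', raw_text)).strip()

    # Vectorize, predict, and map the numeric label back to the job title.
    prediction = model.predict(vectorizer.transform([clean]))[0]
    st.success('Predicted position: ' + label_encoder.inverse_transform([prediction])[0])
```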
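A sketch of the Colab cell that launches the app and exposes it through pyngrok; the port and the background-launch mechanism are assumptions:

```python
# Colab cell -- run the Streamlit app in the background and tunnel it with pyngrok.
import subprocess

from pyngrok import ngrok

# ngrok.set_auth_token('YOUR_NGROK_TOKEN')  # may be required depending on your ngrok account

# Start Streamlit on its default port (8501) without blocking the notebook.
subprocess.Popen(['streamlit', 'run', 'app.py', '--server.port', '8501'])

# Open a public HTTP tunnel to the local Streamlit server and print the shareable URL.
public_url = ngrok.connect(8501)
print('Streamlit app is live at:', public_url)
```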