Project Description

In this project, I will use a data set of resumes to train a model that can be used for automated recruitment. I will also extract the most common words from these resumes to find out the main interests of the job seekers. Many people have already studied this data set and reported interesting results, so I will take a different approach: I will mainly use deep learning algorithms, and I will also train some conventional machine learning algorithms for comparison.

I chose deep learning because it has three advantages over traditional machine learning algorithms on NLP problems. The first is its expressive power. With deep learning, data in different formats, such as text and images, can be represented as real-valued vectors, which allows information to be processed across multiple modalities. The second is its trainability. Because the input to a neural network is encoded uniformly, the model can be trained end to end, which can solve problems quickly and with high quality. The third is its scalability. A trained deep learning model can often predict well on data it was not trained on.
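The word-frequency step described above can be sketched with the Python standard library alone. This is a minimal illustration rather than the notebook's actual code: the function name `top_words`, the tiny stop-word list, and the sample sentences are all assumptions made for the example.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; a real run would use a fuller list).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "with", "on", "at", "is"}


def top_words(texts, n=10):
    """Return the n most frequent non-stop-words across an iterable of strings."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep runs of letters as tokens.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS and len(t) > 1)
    return counts.most_common(n)


# Made-up sentences standing in for resume text (not taken from the data set).
sample_texts = [
    "Data scientist with experience in Python and machine learning",
    "Python developer and data analyst",
    "Research assistant in statistics",
]
top = top_words(sample_texts, n=2)
print(top)
```

On the real data, `texts` would be the resume column of the data set, and the stop-word list would probably need to be extended before the counts say much about the applicants' interests.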

Data set

I found this data set on Kaggle. It is available at: https://storage.googleapis.com/kaggle-data-sets/897/1638/bundle/archive.zip?GoogleAccessId=web-data@kaggle-161607.iam.gserviceaccount.com&Expires=1575746495&Signature=Q6SOxb4XB8HL3cXehQzq%2B7ACdWkNYGJBsY5Xk86uP%2F%2BqxyHZoRorG47mgPV%2BAMEHPtlVVHBLLvSQGEB38Co0CKhCi2hw4EmMvHoeHoJlpuVeSnkLXp8AU7K03mn7uvPGcbZaDwzMUmYcl8mMqvlWOOW1gfc8szmhgXm0UUOaP5jzBGJ67Jp%2F9h9RL9DlMTFE%2B%2FnOPf45AfMqpX9fW%2BVy0dD6KRRBGydTkmu3JjeUXWlevAJZPHXAcC0eWF4uI7ihw%2FephXmo42cWdINoDRSOQxWXJy1cnRTsxmTNL7HC0I4vYq2h6ZJlJSM4G2UOcV1By6cNhKdOw9nZ7SlnV%2FJ5eg%3D%3D&response-content-disposition=attachment%3B+filename%3Ddeepnlp.zip. I will use the file called “Sheet_2.csv”. It contains 125 rows and three columns. The first column is the resume ID, from 1 to 125. The second column, “class”, is the label, and the third column is the resume text. The resumes were queried from “Indeed.com” with the keyword “data scientist” and the location “Vermont”. The “class” column takes two values, “flagged” and “not flagged”. If a resume is “not flagged”, the applicant can submit a modified version of the resume at a later date. If it is “flagged”, the applicant is invited to interview.
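As a rough sketch of how a file with this layout could be read and the label balance checked, the snippet below uses only the standard library. The helper names `load_resumes` and `label_counts` are hypothetical, the columns are read by position because only the “class” header name is stated above, and the in-memory sample (including its header names) is made up so that the example runs without the real file.

```python
import csv
import io
from collections import Counter


def load_resumes(handle):
    """Read (resume_id, label, text) tuples from an open CSV file object.

    Columns are taken by position: ID, class label, resume text.
    """
    reader = csv.reader(handle)
    next(reader, None)  # assumption: the first row is a header row
    rows = []
    for record in reader:
        if len(record) < 3:
            continue  # skip blank or malformed rows
        rows.append((record[0], record[1].strip(), record[2]))
    return rows


def label_counts(rows):
    """Count how many resumes carry each label."""
    return Counter(label for _, label, _ in rows)


# Tiny in-memory stand-in for Sheet_2.csv (made-up rows, not real data).
sample = io.StringIO(
    "resume_id,class,resume_text\n"
    '1,not flagged,"Python, SQL and statistics"\n'
    "2,flagged,Machine learning engineer\n"
    "3,not flagged,Data analyst\n"
)
rows = load_resumes(sample)
counts = label_counts(rows)
print(counts)
```

With the real file, the same function could be called as `load_resumes(open("Sheet_2.csv", newline="", encoding="latin-1"))`. The encoding here is a guess and may need to be changed. Checking the label counts first is useful because an imbalance between “flagged” and “not flagged” affects how accuracy should be interpreted when the models are compared.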

