Thesis - Edison Competencies Evaluation and Term extraction

Prerequisites

Python3
Spark2.7
MongoDB
Check env.sh and set the SPARK_HOME variable to the directory where Spark is installed
pip3, the Python 3 package manager (install any missing Python 3 dependency with pip3)
Python dependencies (not every script needs all of them): scipy, numpy, pathlib, nltk (the WordNet corpus must also be downloaded), langdetect, flask, pymongo, flask_cors
Clone the repository and run "source env.sh" to set up the environment
MongoDB must be running locally: sudo service mongodb start
To run the server: python3 server.py (required by the front-end app; MongoDB must be running)
The main Spark jobs are defined as aliases in env.sh: each alias runs one job (for example, rr1 runs the rr1 job) and rrall runs them all. Check env.sh before running any job.
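The prerequisites above can be checked in one go before running any job. The script below is a minimal sketch and is not part of the repository. It assumes that env.sh exports SPARK_HOME (so run "source env.sh" first), that MongoDB listens on its default port 27017 on localhost, and that the import names match the dependency list above (flask_cors is installed from pip as flask-cors).

```python
import importlib.util
import os
import socket

# Import names of the Python dependencies listed in the prerequisites.
# pathlib is part of the Python 3 standard library, so it is normally present.
REQUIRED_MODULES = ["scipy", "numpy", "pathlib", "nltk", "langdetect",
                    "flask", "pymongo", "flask_cors"]

# Import names whose pip package name differs.
PIP_NAMES = {"flask_cors": "flask-cors"}


def missing_modules(modules=REQUIRED_MODULES):
    """Return the modules that the current interpreter cannot find."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


def pip_command(missing):
    """Build a pip3 install command for the missing modules ('' if none)."""
    if not missing:
        return ""
    return "pip3 install " + " ".join(PIP_NAMES.get(m, m) for m in missing)


def spark_submit_path(spark_home=None):
    """Return $SPARK_HOME/bin/spark-submit if that file exists, else None.

    Assumes SPARK_HOME was exported by env.sh.
    """
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        return None
    path = os.path.join(spark_home, "bin", "spark-submit")
    return path if os.path.isfile(path) else None


def mongodb_reachable(host="localhost", port=27017, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


if __name__ == "__main__":
    missing = missing_modules()
    print(pip_command(missing) or "All listed Python dependencies are importable.")
    print("spark-submit:", spark_submit_path() or "not found (check SPARK_HOME in env.sh)")
    print("MongoDB on localhost:27017:", "up" if mongodb_reachable() else "down")
    # The WordNet corpus for nltk is a separate download, for example:
    #   python3 -c "import nltk; nltk.download('wordnet')"
```

A plain TCP check is used for MongoDB so the script still runs when pymongo itself is one of the missing dependencies.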

Notes

Job ads are referred to as "jobs" and go under the jobs/ folder, one JSON file per job. The job ad text is stored in the "description" field. After adding jobs, run python3 jobs4rdd.py
Each folder under Categories/ is a category, and each .txt file inside a category folder is a "skill". To add a skill, create the folder and file using the layout Categories/<new-category-name>/<new-skill-name>.txt, then run python3 categories4rdd.py
CVs are stored under CVs/. To add a new CV, place it in this folder as a single JSON file and run python3 cvs4rdd.py
The *4rdd.py scripts prepare the input data for Spark
Spark job outputs are written to Calculated/
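The data layout described in these notes can also be scripted. The helpers below are a hypothetical sketch, not code from this repository: add_job, add_skill, invalid_jobs and list_skills are illustrative names, and the .json file extension for job ads is an assumption (the notes only say that each job is a single JSON file with its text in the "description" field).

```python
import json
from pathlib import Path


def add_job(jobs_dir, name, description):
    """Write one job ad as <jobs_dir>/<name>.json with its text in "description".

    The .json extension is an assumption; only the "description"
    field is documented.
    """
    path = Path(jobs_dir) / (name + ".json")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"description": description}), encoding="utf-8")
    return path


def add_skill(categories_dir, category, skill, text=""):
    """Create <categories_dir>/<category>/<skill>.txt."""
    path = Path(categories_dir) / category / (skill + ".txt")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


def invalid_jobs(jobs_dir):
    """Return names of job files that are not JSON objects with a string "description"."""
    bad = []
    for path in sorted(Path(jobs_dir).glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            bad.append(path.name)
            continue
        if not isinstance(data, dict) or not isinstance(data.get("description"), str):
            bad.append(path.name)
    return bad


def list_skills(categories_dir):
    """Map each category folder to the sorted skill names (.txt stems) inside it."""
    return {
        folder.name: sorted(p.stem for p in folder.glob("*.txt"))
        for folder in sorted(Path(categories_dir).iterdir())
        if folder.is_dir()
    }
```

Files created this way still have to be picked up by running python3 jobs4rdd.py or python3 categories4rdd.py.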

Folders and files

.vscode
BioNLP-ST_2011_Bacteria_Biotopes_train_data_rev1
BioNLP-ST_2011_Entity_Relations_training_data
BioNLP-ST_2011_Epi_and_PTM_training_data_rev1
BioNLP-ST_2011_GE_devel
BioNLP-ST_2011_Infectious_Diseases_training_data_rev1
BioNLP-ST_2011_bacteria_interactions_train_data_rev1
BioNLP-ST_2011_bacteria_rename_train_data
BioNLP-ST_2011_coreference_training_data
CVs
Calculated
Categories
Entities-newjobs
Entities
FP-Entities-newjobs-with-blacklist-all
FP-Entities-newjobs-with-blacklist
FP-Entities-with-blacklist
FP-Entities
allcategories4rdd
allcvs4rdd
alljobs4rdd
fpjobs-newjobs-with-blacklist
fpjobs-with-blacklist
fpjobs
jobs-importantwords
jobs
lda-topics-lemmatized
lda-topics-stemmed
newjobs
newjobs4rdd
word2vec-model
.gitignore
CVs.tar.gz
License
README.md
SmartStoplist.txt
biomed-score.py
categories4rdd.py
collectjobs.py
count-pca.py
countvectorizer.py
cvs4rdd.py
dbwriter.py
entity-linker.py
entity-tfidf.py
env.sh
fetch.py
fiddle.py
fnonenglish.py
fpjobs-important-words.py
fpjobs.py
fpreduced-as-is.txt
fpreduced-newjobs.txt
fpreduced.txt
jobs-collected
jobs4rdd.py
lda.py
newjobs4rdd.py
rake-on-alljobs.txt
rake.py
rake_tutorial.py
reduced.txt
reducefp.py
server.py
synonyms.py
tfidf.py
word2vec.py
word2vec2.py

bcanvural/thesis
