The purpose of this seminar paper was to show that antonyms and synonyms can be discerned based on their distributional semantic differences, and to use these results to improve the predictions of existing vector space models (VSMs).
All the datasets, neural networks, and code used to create and evaluate them during the writing of the seminar paper have been uploaded here so that the findings can be replicated if necessary.
To replicate the findings or re-use the code, one needs to do the following (a sketch of the full workflow follows this list):
- Have a word pair list ready (or use the one in the word pairs folder).
- Have a vector embedding model ready (e.g. GloVe or word2vec); these can be loaded using the model_utils module.
- Create the datasets using the prelim module.
- Create and train the neural network with the model_creation module.
- Evaluate the neural network model and the VSM (vector space model) using the model_evaluation module.
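Below is a minimal sketch of that workflow. The module names come from this repository, but the function names and file paths are assumptions for illustration and may differ from the actual API:

```python
# Hypothetical end-to-end sketch of the steps above; function names and
# file paths are assumed for illustration and may differ from the real API.
import model_utils        # loads embedding models into memory
import prelim             # builds the word-pair datasets
import model_creation     # creates and trains the Keras network
import model_evaluation   # evaluates the network and the VSM

# 1. Load a pre-trained embedding model (assumed helper name and path).
embeddings = model_utils.load_embeddings("glove.6B.300d.txt")

# 2. Build the datasets from the word-pair list (assumed helper name and path).
train_data, test_data = prelim.create_datasets("word_pairs/pairs.txt", embeddings)

# 3. Create and train the neural network (assumed helper names).
network = model_creation.build_model()
model_creation.train_model(network, train_data)

# 4. Evaluate both the trained network and the raw embedding model
#    against SimLex-999 via Spearman's rho (assumed helper name).
model_evaluation.evaluate(network, embeddings, "SimLex-999.txt")
```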
The repository's modules are organized as follows:
- The prelim module creates the word pairs and the datasets.
- The model_utils module loads the different embedding models to memory.
- The model_creation module creates the neural network model (keras) and trains it with the relevant datasets.
- The model_evaluation module takes an embedding model and a neural network model and evaluates their performance via Spearman's rho against the SimLex-999 dataset (an illustrative sketch follows this list).
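As an illustration of the kind of work model_utils and model_evaluation perform (the actual implementation may differ), a word2vec model can be loaded with gensim and scored against SimLex-999 with Spearman's rho roughly as follows; the file paths and the SimLex-999 column names are assumptions:

```python
import csv

from gensim.models import KeyedVectors
from scipy.stats import spearmanr

# Load a pre-trained word2vec model into memory (the path is an assumption).
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# Read SimLex-999: each row holds two words and a gold similarity rating
# (column names are assumed to match the distributed tab-separated file).
gold_scores, model_scores = [], []
with open("SimLex-999.txt") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        w1, w2 = row["word1"], row["word2"]
        if w1 in vectors.vocab and w2 in vectors.vocab:
            gold_scores.append(float(row["SimLex999"]))
            model_scores.append(vectors.similarity(w1, w2))

# Spearman's rho between the model's cosine similarities and the gold ratings.
rho, p_value = spearmanr(gold_scores, model_scores)
print("Spearman's rho: %.3f (p = %.3g)" % (rho, p_value))
```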
The following packages are required (a pinned requirements list follows):
- Python 3
- gensim (2.3.0)
- h5py (2.7.0)
- Keras (1.2.0) - backend configuration file uploaded.
- matplotlib (2.0.0)
- nltk (3.2.2)
- numpy (1.12.0+mkl)
- scikit-learn (0.18.1)
- scipy (0.18.1)
- tensorflow (0.12.1)
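Assuming the packages are installed from PyPI with pip, the versions above can be pinned in a requirements.txt along these lines (the +mkl numpy build refers to a Windows wheel; the plain 1.12.0 release should work elsewhere):

```
gensim==2.3.0
h5py==2.7.0
Keras==1.2.0
matplotlib==2.0.0
nltk==3.2.2
numpy==1.12.0
scikit-learn==0.18.1
scipy==0.18.1
tensorflow==0.12.1
```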