Please use the following BibTeX entry to cite this work.
@InProceedings{abhishek-anand-awekar:2017:EACLlong,
author = {Abhishek, Abhishek and Anand, Ashish and Awekar, Amit},
title = {Fine-Grained Entity Type Classification by Jointly Learning Representations and Label Embeddings},
booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers},
month = {April},
year = {2017},
address = {Valencia, Spain},
publisher = {Association for Computational Linguistics},
pages = {797--807},
url = {http://www.aclweb.org/anthology/E17-1075}
}
An updated version of the main code, compatible with TensorFlow 1.10, is available at https://github.com/abhipec/FgEC. The transfer learning experiments are not part of that code.
Download the necessary data by following the instructions in data/processed/f1/README.md.
Directory structure:
- /home/
  - EACL-2017
    - fnet
    - glove.840B.300d
- EACL-2017
These experiments use the Python 3 build of TensorFlow (0.10.0rc0).
pip install numpy docopt pandas plotly matplotlib scipy scikit-learn
Compile the C++ libraries.
cd src/lib
bash compile_gcc_5.bash
cd src
bash scripts/BBN.bash
bash scripts/OntoNotes.bash
bash scripts/Wiki.bash
This will create model checkpoints in the ckpt directory.
Review the scripts and modify the necessary variables before running them.
Report results:
python report_results.py ~/EACL-2017/fnet/ckpt/
Download the necessary data by following the instructions in data/processed/f4/README.md.
bash scripts/tl.bash
python report_results.py ~/EACL-2017/fnet/ckpt/
These steps convert the original data (https://github.com/shanzhenren/AFET) to the TFRecord format used by this code.
Download the necessary data by following the instructions in data/AFET/dataset/README.md.
Also download the GloVe vectors (http://nlp.stanford.edu/data/glove.840B.300d.zip) and extract them into the glove.840B.300d directory.
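The download-and-extract step can be sketched in Python as follows. This is a minimal illustration and not part of the repository: the function name `fetch_glove` is hypothetical, and the code assumes the archive contains a single file named glove.840B.300d.txt, as the paths used later in this README suggest. The archive is large, so the sketch skips the download if the zip file is already present.

```python
import urllib.request
import zipfile
from pathlib import Path

GLOVE_URL = "http://nlp.stanford.edu/data/glove.840B.300d.zip"


def fetch_glove(target_dir, url=GLOVE_URL):
    """Download (if needed) and extract the GloVe archive into target_dir.

    Hypothetical helper: returns the path of the extracted text file.
    """
    target = Path(target_dir).expanduser()
    target.mkdir(parents=True, exist_ok=True)

    archive = target / "glove.840B.300d.zip"
    vectors = target / "glove.840B.300d.txt"  # assumed member name

    if vectors.exists():
        return vectors  # already extracted, nothing to do

    if not archive.exists():
        # Large download; only triggered when the archive is missing.
        urllib.request.urlretrieve(url, str(archive))

    with zipfile.ZipFile(str(archive)) as zf:
        zf.extractall(str(target))
    return vectors


# Example call (commented out to avoid an accidental download):
# fetch_glove("~/EACL-2017/glove.840B.300d")
```

Extracting with `unzip` from the shell works equally well; the only requirement stated here is that the vectors end up in the glove.840B.300d directory.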
Dataset names used: BBN, Wiki and OntoNotes.
Preprocess the data and generate the training, development, and test sets.
cd data_processing/
python sanitizer.py BBN ~/EACL-2017/fnet/data/AFET/ 10 ~/EACL-2017/fnet/data/sanitized/
Convert JSON to TFRecord format.
python data_processing/json_to_tfrecord.py BBN ~/EACL-2017/fnet/data/sanitized/ ~/EACL-2017/glove.840B.300d/glove.840B.300d.txt f1 ~/EACL-2017/fnet/data/processed/
python data_processing/json_to_tfrecord.py BBN ~/EACL-2017/fnet/data/sanitized/ ~/EACL-2017/glove.840B.300d/glove.840B.300d.txt f2 ~/EACL-2017/fnet/data/processed/
python data_processing/json_to_tfrecord.py BBN ~/EACL-2017/fnet/data/sanitized/ ~/EACL-2017/glove.840B.300d/glove.840B.300d.txt f3 ~/EACL-2017/fnet/data/processed/
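The three commands above differ only in the data format alias. A small Python sketch can generate them for every dataset, which avoids copy-paste mistakes. Note the assumptions: the helper `conversion_commands` is hypothetical, and the README only shows the BBN invocation, so using the same argument order for Wiki and OntoNotes is a guess that should be checked against the scripts.

```python
DATA_FORMATS = ("f1", "f2", "f3")
DATASETS = ("BBN", "Wiki", "OntoNotes")


def conversion_commands(dataset, root="~/EACL-2017"):
    """Return one json_to_tfrecord.py command (as an argument list) per format.

    Hypothetical helper; paths mirror the BBN example in this README.
    """
    sanitized = "{}/fnet/data/sanitized/".format(root)
    glove = "{}/glove.840B.300d/glove.840B.300d.txt".format(root)
    processed = "{}/fnet/data/processed/".format(root)
    return [
        ["python", "data_processing/json_to_tfrecord.py",
         dataset, sanitized, glove, data_format, processed]
        for data_format in DATA_FORMATS
    ]


# Print the commands so they can be reviewed and pasted into a shell,
# which also takes care of expanding "~".
for dataset in DATASETS:
    for command in conversion_commands(dataset):
        print(" ".join(command))
```

Printing rather than executing is a deliberate choice here: it keeps the sketch free of side effects and lets the shell handle `~` expansion.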
data_format | alias | remarks |
---|---|---|
our | f1 | Used in our, our-NoM, our-AllC |
Attentive | f2 | Used in Attentive |
transfer-learning-model | f3 | Used in model level transfer learning |
- Train our model on the Wiki dataset.
- Note down its uid.
- Modify the ../ckpt/uid/checkpoint file so that it points to the best-performing checkpoint.
- Change the fintune_directory parameter in the following scripts to include the uid noted above.
bash scripts/transfer_learning_model.bash
bash scripts/transfer_learning_feature_dumping.bash
bash scripts/tl.bash
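The checkpoint-editing step listed above can also be done programmatically. TensorFlow's `checkpoint` file is a small text file whose `model_checkpoint_path` line names the checkpoint that will be restored. The sketch below rewrites that line and leaves the rest untouched. The function name `point_to_checkpoint` and the checkpoint name in the example call are hypothetical; substitute the file name of the checkpoint that performed best in your run.

```python
from pathlib import Path


def point_to_checkpoint(checkpoint_file, best_name):
    """Rewrite the model_checkpoint_path entry of a TensorFlow checkpoint file.

    Hypothetical helper: best_name is the checkpoint file prefix to restore.
    Lines such as all_model_checkpoint_paths are kept unchanged.
    """
    path = Path(checkpoint_file)
    new_entry = 'model_checkpoint_path: "{}"'.format(best_name)

    updated = []
    replaced = False
    for line in path.read_text().splitlines():
        if line.startswith("model_checkpoint_path:"):
            updated.append(new_entry)
            replaced = True
        else:
            updated.append(line)

    if not replaced:
        # No entry found; add one at the top of the file.
        updated.insert(0, new_entry)

    path.write_text("\n".join(updated) + "\n")


# Example call with a hypothetical checkpoint name:
# point_to_checkpoint("../ckpt/uid/checkpoint", "model.ckpt-7000")
```

Editing the file by hand in a text editor achieves the same result; the sketch is mainly useful when several runs need to be updated.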
Report results:
python report_results.py ~/EACL-2017/fnet/ckpt/
For a type-wise analysis, change the dataset and the result file path in the following command to match the run you want to analyse.
python class_wise_analysis.py --all_labels_file=../data/sanitized/BBN/sanitized_labels.txt --json_file=../data/sanitized/BBN/sanitized_test.json --result_file=../ckpt/Wiki_1.2/result_7.txt --dataset=Wiki