Abstract: Distant Supervision (DS) is a popular technique for developing relation extractors starting with limited supervision. Our contributions in this paper are threefold. Firstly, we propose three novel models for distantly-supervised relation extraction: (1) a Bi-GRU based word attention model (BGWA), (2) an entity-centric attention model (EA), and (3) a combination model (BNET-DS) which jointly trains and combines multiple complementary models for improved relation extraction. Secondly, we introduce GDS, a new distant supervision dataset for relation extraction. GDS removes test data noise present in all previous distant supervision benchmark datasets, making credible automatic evaluation possible. Thirdly, through extensive experiments on multiple real-world datasets, we demonstrate the effectiveness of the proposed methods.
The folder Code/Preprocess/ contains the scripts for preprocessing the data. Run preprocess.sh to generate the output files in the same folder. Some intermediate files are created along the way, but the final processed files have the following names (a sketch for inspecting them follows the list):
- train_final.p : The processed train file
- test_final.p : The processed test file
- dev_final.p : The processed dev file
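The .p extension and the use of python2.7 below suggest these are Python pickle files; the snippet below is a minimal sketch for sanity-checking one of them. The exact structure of the pickled object (e.g., a list or dict of preprocessed sentences) is an assumption, not something specified in this README.

```python
# Minimal sketch: load one processed file and report its type and size.
# Assumption: the .p files are standard Python pickle files.
import pickle

with open("Code/Preprocess/train_final.p", "rb") as f:
    train_data = pickle.load(f)

print(type(train_data))          # container type of the pickled object
if hasattr(train_data, "__len__"):
    print(len(train_data))       # number of top-level entries, if applicable
```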
The folder Codes/Models/ contains the implementations of the three models:
- BGWA.py : Bi-GRU based word attention model
- EA.py : Entity-centric attention model
- PCNN.py : Piecewise convolutional neural network model
Each of the files can be run in the following way (an example invocation is given after the argument list below):
python2.7 <file> <data directory> <train file name> <test file name> <dev file name> <word embedding file name>
The command takes five arguments:
- <data directory> : The name of the directory containing the processed files
- <train file name> : The name (not the path) of the processed train file
- <test file name> : The name (not the path) of the processed test file
- <dev file name> : The name (not the path) of the processed dev file
- <word embedding file name> : The name (not the path) of the processed word embedding file
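For example, assuming the processed files are still in Code/Preprocess/ and the word embedding file is named word_embeddings.p (both are placeholders; substitute your actual directory and file names), the BGWA model can be run as:

python2.7 BGWA.py Code/Preprocess/ train_final.p test_final.p dev_final.p word_embeddings.p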