- agreement: contains evaluation data generated based on long-distance agreement patterns
- evaluation_output: contains the evaluation data and results of our trained models (this is probably where you want to look if you are interested in using our agreement test sets)
- linzen_testset: contains the subset of data from Linzen et al. TACL 2016 (https://github.com/TalLinzen/rnn_agreement) which we used for our evaluation
- raw_mturk_data: contains the responses of MTurk subjects for the extended Italian agreement data
Each corpus consists of around 100M tokens; we used training (80M) and validation (10M) subsets in our experiments. All corpora were shuffled at the sentence level.
- English train / valid / test / vocab
- Hebrew train / valid / test / vocab
- Italian train / valid / test / vocab
- Russian train / valid / test / vocab
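The sentence-level shuffling and splitting described above can be sketched as follows. This is a minimal illustration, not the exact preprocessing script from this repository; the function name, the fixed seed, and the 80/10/10 fractions are assumptions for the example.

```python
import random

def shuffle_and_split(sentences, train_frac=0.8, valid_frac=0.1, seed=0):
    """Shuffle a corpus at the sentence level, then split it into
    train / valid / test portions (illustrative fractions)."""
    sentences = list(sentences)
    # Fixed seed so the split is reproducible across runs.
    random.Random(seed).shuffle(sentences)
    n = len(sentences)
    n_train = int(n * train_frac)
    n_valid = int(n * valid_frac)
    train = sentences[:n_train]
    valid = sentences[n_train:n_train + n_valid]
    test = sentences[n_train + n_valid:]
    return train, valid, test
```

The remaining 10% falls into the test portion; shuffling before splitting ensures each portion draws sentences from the whole corpus rather than from contiguous document spans.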
For each language, we distribute the trained LSTM model that achieved the lowest perplexity on our test set (the validation files in the data above). The name of each model file indicates the hyperparameters that were used to train it. See the supplementary materials for more details, and the scripts in the src directory for usage examples.
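Since the model file names encode the training hyperparameters, one might recover them with a small parser along these lines. The naming scheme assumed here (lowercase parameter names followed directly by their values, joined with underscores, e.g. hidden650_batch128_dropout0.2.pt) is a hypothetical illustration; check the actual file names and the src scripts for the real convention.

```python
import re

def parse_hyperparams(filename):
    """Parse hyperparameters from a model filename assumed to look like
    'hidden650_batch128_dropout0.2.pt' (hypothetical naming scheme)."""
    stem = filename.rsplit(".", 1)[0]  # drop the file extension
    params = {}
    for part in stem.split("_"):
        # Each part is a parameter name followed by a numeric value.
        m = re.fullmatch(r"([a-z]+)([0-9.]+)", part)
        if m:
            key, value = m.groups()
            params[key] = float(value) if "." in value else int(value)
    return params
```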
The models were trained with the vocabularies given above. Each vocabulary lists words in order of their indices, starting from 0; the <unk> and <eos> tokens are already included in the vocabulary.
Please cite the paper if you use the above resources.