This is the official codebase accompanying our ACL 2022 paper "RELiC: Retrieving Evidence for Literary Claims" (https://openreview.net/forum?id=xcelRQScTjP).
The code uses PyTorch 1.10+ and HuggingFace's transformers
library for training the RoBERTa models. To install PyTorch, look for the Python package compatible with your local CUDA setup here.
virtualenv relic-venv
source relic-venv/bin/activate
pip install torch torchvision # install the build matching your local CUDA setup (see https://pytorch.org); the authors used CUDA 10.1
pip install transformers
pip install tensorboardX
pip install --editable .
Download the dataset from this link. Your RELiC
folder should look like,
(relic-venv) kalpesh@node187:relic-retrieval$ ls RELiC/
test.json train.json val.json
(relic-venv) kalpesh@node187:relic-retrieval$
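As a quick sanity check after downloading, the splits can be loaded with standard JSON tooling. A minimal sketch (illustrative only; see the paper and dataset documentation for the per-entry schema):

```python
# Minimal sanity check that the downloaded RELiC splits load
# (illustrative only; the per-entry schema is documented in the paper).
import json

for split in ["train", "val", "test"]:
    with open(f"RELiC/{split}.json") as f:
        data = json.load(f)
    print(f"{split}: {len(data)} top-level entries")
```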
All pretrained models can be found in the dataset Google Drive folder. Individual checkpoint links are added below,
Model | Google Drive link |
---|---|
dense-RELiC (4 left, 4 right sentences) | link |
dense-RELiC (4 left, 0 right sentences) | link |
dense-RELiC (0 left, 4 right sentences) | link |
dense-RELiC (1 left, 1 right sentences) | link |
dense-RELiC (1 left, 0 right sentences) | link |
Make sure you have downloaded the dataset as described above. The evaluation script assumes the pretrained models have been downloaded from the Google Drive links above and placed in the retriever_train/saved_models folder. It's best to run this on a GPU, since the dense vectors need to be computed before retrieval takes place.
# you may need to run "export CUDA_VISIBLE_DEVICES=0" to use GPU-0
# remove --write_to_file if you don't wish to write a 1GB output file with retrieval ranks
python scripts/relic_evaluation.py \
--model retriever_train/saved_models/model_denserelic_4_4 \
--write_to_file \
--split val
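Conceptually, the evaluation embeds every query context and every candidate quote with the trained encoders and ranks candidates by vector similarity. A minimal sketch of that idea, assuming an off-the-shelf roberta-base encoder, mean pooling, and dot-product scoring (these are illustrative choices, not the repository's actual implementation):

```python
# Sketch of dense retrieval scoring (assumed: roberta-base encoder, mean
# pooling, dot-product similarity). The real scripts use the trained
# dense-RELiC checkpoints and their own pooling/scoring logic.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
encoder = AutoModel.from_pretrained("roberta-base").eval()

@torch.no_grad()
def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state       # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1)      # (batch, seq_len, 1)
    return (hidden * mask).sum(1) / mask.sum(1)       # mean-pooled embeddings

queries = embed(["<claim context surrounding the missing quote>"])
candidates = embed(["candidate quote 1", "candidate quote 2"])
scores = queries @ candidates.T                       # dense similarity scores
ranks = scores.argsort(dim=-1, descending=True)       # candidate ranking per query
print(ranks)
```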
Make sure you have downloaded the dataset as described above. Run the following preprocessing script (adjust the --left_sents / --right_sents flags for shorter contexts):
python scripts/preprocess_lit_analysis_data.py --left_sents 4 --right_sents 4
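For intuition, the --left_sents / --right_sents flags control how many analysis sentences before and after the quoted evidence form each query. A toy illustration with hypothetical sentences and a placeholder token (the actual preprocessing lives in scripts/preprocess_lit_analysis_data.py):

```python
# Toy illustration of how left/right context windows shape a query
# (hypothetical data; not the actual preprocessing logic).
left_sents, right_sents = 4, 4

analysis = ["s1", "s2", "s3", "s4", "s5", "QUOTE", "s6", "s7", "s8", "s9"]
quote_idx = analysis.index("QUOTE")  # position of the quoted evidence

left = analysis[max(0, quote_idx - left_sents):quote_idx]
right = analysis[quote_idx + 1:quote_idx + 1 + right_sents]
query = " ".join(left + ["<quote placeholder>"] + right)
print(query)  # s2 s3 s4 s5 <quote placeholder> s6 s7 s8 s9
```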
Two scripts are used while training dense-RELiC: a model training script and an early stopping evaluation script. Both scripts can be run simultaneously --- the evaluation script periodically looks at the checkpoint folder and deletes suboptimal checkpoints. Alternatively, the evaluation script can be run after model training is finished (to find the best checkpoints).
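A sketch of the checkpoint-pruning idea behind that evaluation loop (illustrative only; the folder names and scores below are placeholders, and the real logic lives under retriever_train/):

```python
# Sketch of early-stopping checkpoint pruning: given validation scores for the
# checkpoints currently on disk, keep the best one and delete the rest.
import os
import shutil

def prune_checkpoints(save_dir, val_scores):
    """val_scores maps checkpoint folder names to validation metrics (higher is better)."""
    if not val_scores:
        return
    best = max(val_scores, key=val_scores.get)
    for name in val_scores:
        if name != best:
            shutil.rmtree(os.path.join(save_dir, name), ignore_errors=True)

# e.g. prune_checkpoints("retriever_train/saved_models/model_X",
#                        {"checkpoint-1000": 0.21, "checkpoint-2000": 0.26})
```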
There are two ways to run training (directly or using SLURM) ---
- Run the example bash scripts directly,
# in terminal 1
# you may need to run "export CUDA_VISIBLE_DEVICES=0" to use GPU-0
bash retriever_train/examples/schedule.sh
# in terminal 2
# you may need to run "export CUDA_VISIBLE_DEVICES=1" to use GPU-1
# this script is used for early stopping over checkpoints; it is not a precise evaluation.
bash retriever_train/examples/evaluate.sh
- If you have a SLURM setup, you can configure model hyperparameters using
retriever_train/hyperparameter_config.py
(which supports grid search too) and then run,
python retriever_train/schedule.py
This script launches both the training and evaluation processes simultaneously on SLURM, giving them a unique job_id (let's say X). You can access the logs using,
### Access training logs
cat retriever_train/logs/log_X.txt
### Access early stopping evaluation logs
cat retriever_train/logs/log_eval_X.txt
### Access hyperparameter config for experiment X
cat retriever_train/logs/expts.txt | grep "model_X"
### Access the bash scripts running on SLURM
cat retriever_train/slurm-schedulers/schedule_X.sh
cat retriever_train/slurm-schedulers/evaluate_X.sh
This script exports checkpoints to retriever_train/saved_models/model_X. There's also TensorBoard support; see retriever_train/runs.
NOTE: You may need to make minor changes to retriever_train/run_finetune_gpt2_template.sh, retriever_train/run_evaluate_gpt2_template.sh and retriever_train/schedule.py to make them compatible with your SLURM setup.
Additional libraries will be needed to run the baseline retrievers.
- SIM --- A semantic similarity model from Wieting et al. 2019 trained on STS data.
pip install nltk
pip install sentencepiece
# remove --write_to_file if you don't wish to write a 1GB output file with retrieval ranks
python scripts/relic_evaluation_sim.py \
--left_sents 1 --right_sents 1 \
--write_to_file \
--split val
- DPR --- A retriever from Karpukhin et al. 2020 trained on Natural Questions data.
# remove --write_to_file if you don't wish to write a 1GB output file with retrieval ranks
python scripts/relic_evaluation_dpr.py \
--left_sents 1 --right_sents 1 \
--write_to_file \
--split val
- c-REALM --- A retriever from Krishna et al. 2021 based on REALM and trained on ELI5 data.
### for c-REALM
# TF 2.3 is the version compatible with CUDA 10.1
# See https://www.tensorflow.org/install/source#gpu for TF-CUDA mapping
pip install tensorflow==2.3
pip install tensor2tensor
# Download and unzip the c-REALM checkpoint
wget https://storage.googleapis.com/rt-checkpoint/retriever.zip
unzip retriever.zip && rm retriever.zip
mv retriever crealm-retriever
rm -rf crealm-retriever/encoded_*
# remove --write_to_file if you don't wish to write a 1GB output file with retrieval ranks
python scripts/relic_evaluation_crealm.py \
--left_sents 1 --right_sents 1 \
--write_to_file \
--split val
- Random retrieval
python scripts/relic_evaluation_random.py --num_samples 100 --split val
You may submit your predictions for the test set here: https://forms.gle/1B6JuQ3nbGXCR2kC8
The format of your submission file should be a .json file containing a dictionary where the unique IDs of each test set instance are the keys and the values are rank lists. Each list should contain the indices of the top 100 candidates retrieved by your model, in rank order. For example, if your retriever's top-ranked candidate is 99 for test set instance "070789", one entry in your .json dict should look like:
"070789": [99, ...]
If you found our paper or this repository useful, please cite:
@inproceedings{relic22,
  author = {Katherine Thai and Yapei Chang and Kalpesh Krishna and Mohit Iyyer},
  booktitle = {Association for Computational Linguistics},
  year = {2022},
  title = {{RELiC}: Retrieving Evidence for Literary Claims},
}