This is the official codebase accompanying our ACL 2022 paper "RELiC: Retrieving Evidence for Literary Claims" (https://openreview.net/forum?id=xcelRQScTjP).
The code uses PyTorch 1.10+ and HuggingFace's transformers
library for training the RoBERTa models. To install PyTorch, look for the Python package compatible with your local CUDA setup here.
virtualenv relic-venv
source relic-venv/bin/activate
pip install torch torchvision # install the build matching your local CUDA setup (see https://pytorch.org); the authors used CUDA 10.1
pip install transformers
pip install tensorboardX
pip install --editable .
Download the dataset from this link. Your RELiC
folder should look like,
(relic-venv) kalpesh@node187:relic-retrieval$ ls RELiC/
test.json train.json val.json
(relic-venv) kalpesh@node187:relic-retrieval$
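As a quick sanity check after downloading, the splits can be loaded with standard JSON tooling. A minimal sketch (illustrative only; see the paper and dataset documentation for the per-entry schema):

```python
# Minimal sanity check that the downloaded RELiC splits load
# (illustrative only; the per-entry schema is documented in the paper).
import json

for split in ["train", "val", "test"]:
    with open(f"RELiC/{split}.json") as f:
        data = json.load(f)
    print(f"{split}: {len(data)} top-level entries")
```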
All pretrained models can be found in the dataset Google Drive folder. Individual checkpoint links are added below,
Model | Google Drive link |
---|---|
dense-RELiC (4 left, 4 right sentences) | link |
dense-RELiC (4 left, 0 right sentences) | link |
dense-RELiC (0 left, 4 right sentences) | link |
dense-RELiC (1 left, 1 right sentences) | link |
dense-RELiC (1 left, 0 right sentences) | link |
Make sure you have downloaded the dataset as described above. The evaluation script assumes the pretrained models have been downloaded from the Google Drive links above and placed in the retriever_train/saved_models folder. It's best to run this on a GPU, since the dense vectors need to be computed before retrieval takes place.
# you may need to run "export CUDA_VISIBLE_DEVICES=0" to use GPU-0
# remove --write_to_file if you don't wish to write a 1GB output file with retrieval ranks
python scripts/relic_evaluation.py \
--model retriever_train/saved_models/model_denserelic_4_4 \
--write_to_file \
--split val
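Conceptually, the evaluation embeds every query context and every candidate quote with the trained encoders and ranks candidates by vector similarity. A minimal sketch of that idea, assuming an off-the-shelf roberta-base encoder, mean pooling, and dot-product scoring (these are illustrative choices, not the repository's actual implementation):

```python
# Sketch of dense retrieval scoring (assumed: roberta-base encoder, mean
# pooling, dot-product similarity). The real scripts use the trained
# dense-RELiC checkpoints and their own pooling/scoring logic.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
encoder = AutoModel.from_pretrained("roberta-base").eval()

@torch.no_grad()
def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state       # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1)      # (batch, seq_len, 1)
    return (hidden * mask).sum(1) / mask.sum(1)       # mean-pooled embeddings

queries = embed(["<claim context surrounding the missing quote>"])
candidates = embed(["candidate quote 1", "candidate quote 2"])
scores = queries @ candidates.T                       # dense similarity scores
ranks = scores.argsort(dim=-1, descending=True)       # candidate ranking per query
print(ranks)
```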
Make sure you have downloaded the dataset as described above. Run the following preprocessing script (adjust the --left_sents / --right_sents flags for shorter contexts):
python scripts/preprocess_lit_analysis_data.py --left_sents 4 --right_sents 4
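For intuition, the --left_sents / --right_sents flags control how many analysis sentences before and after the quoted evidence form each query. A toy illustration with hypothetical sentences and a placeholder token (the actual preprocessing lives in scripts/preprocess_lit_analysis_data.py):

```python
# Toy illustration of how left/right context windows shape a query
# (hypothetical data; not the actual preprocessing logic).
left_sents, right_sents = 4, 4

analysis = ["s1", "s2", "s3", "s4", "s5", "QUOTE", "s6", "s7", "s8", "s9"]
quote_idx = analysis.index("QUOTE")  # position of the quoted evidence

left = analysis[max(0, quote_idx - left_sents):quote_idx]
right = analysis[quote_idx + 1:quote_idx + 1 + right_sents]
query = " ".join(left + ["<quote placeholder>"] + right)
print(query)  # s2 s3 s4 s5 <quote placeholder> s6 s7 s8 s9
```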
Two scripts are used while training dense-RELiC: a model training script and an early stopping evaluation script. Both scripts can be run simultaneously --- the evaluation script periodically looks at the checkpoint folder and deletes suboptimal checkpoints. Alternatively, the evaluation script can be run after model training is finished (to find the best checkpoints).
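A sketch of the checkpoint-pruning idea behind that evaluation loop (illustrative only; the folder names and scores below are placeholders, and the real logic lives under retriever_train/):

```python
# Sketch of early-stopping checkpoint pruning: given validation scores for the
# checkpoints currently on disk, keep the best one and delete the rest.
import os
import shutil

def prune_checkpoints(save_dir, val_scores):
    """val_scores maps checkpoint folder names to validation metrics (higher is better)."""
    if not val_scores:
        return
    best = max(val_scores, key=val_scores.get)
    for name in val_scores:
        if name != best:
            shutil.rmtree(os.path.join(save_dir, name), ignore_errors=True)

# e.g. prune_checkpoints("retriever_train/saved_models/model_X",
#                        {"checkpoint-1000": 0.21, "checkpoint-2000": 0.26})
```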
There are two ways to run training (directly or using SLURM) ---
- Run the example bash scripts directly,
# in terminal 1
# you may need to run "export CUDA_VISIBLE_DEVICES=0" to use GPU-0
bash retriever_train/examples/schedule.sh
# in terminal 2
# you may need to run "export CUDA_VISIBLE_DEVICES=1" to use GPU-1
# this script is used for early stopping over checkpoints; it is not a precise evaluation.
bash retriever_train/examples/evaluate.sh
- If you have a SLURM setup, you can configure model hyperparameters using
retriever_train/hyperparameter_config.py
(which supports grid search too) and then run,
python retriever_train/schedule.py
This script launches both the training and evaluation processes simultaneously on SLURM, giving them a unique job_id (let's say X). You can access the logs using,
### Access training logs
cat retriever_train/logs/log_X.txt
### Access early stopping evaluation logs
cat retriever_train/logs/log_eval_X.txt
### Access hyperparameter config for experiment X
cat retriever_train/logs/expts.txt | grep "model_X"
### Access the bash scripts running on SLURM
cat retriever_train/slurm-schedulers/schedule_X.sh
cat retriever_train/slurm-schedulers/evaluate_X.sh
This script exports checkpoints to retriever_train/saved_models/model_X. There's also TensorBoard support; see retriever_train/runs.
NOTE: You may need to make minor changes to retriever_train/run_finetune_gpt2_template.sh, retriever_train/run_evaluate_gpt2_template.sh and retriever_train/schedule.py to make them compatible with your SLURM setup.
Additional libraries will be needed to run the baseline retrievers.
- SIM --- A semantic similarity model from Wieting et al. 2019 trained on STS data.
pip install nltk
pip install sentencepiece
# remove --write_to_file if you don't wish to write a 1GB output file with retrieval ranks
python scripts/relic_evaluation_sim.py \
--left_sents 1 --right_sents 1 \
--write_to_file \
--split val
- DPR --- A retriever from Karpukhin et al. 2020 trained on Natural Questions data.
# remove --write_to_file if you don't wish to write a 1GB output file with retrieval ranks
python scripts/relic_evaluation_dpr.py \
--left_sents 1 --right_sents 1 \
--write_to_file \
--split val
- c-REALM --- A retriever from Krishna et al. 2021 based on REALM and trained on ELI5 data.
### for c-REALM
# TF 2.3 is the version compatible with CUDA 10.1
# See https://www.tensorflow.org/install/source#gpu for TF-CUDA mapping
pip install tensorflow==2.3
pip install tensor2tensor
# Download and unzip the c-REALM checkpoint
wget https://storage.googleapis.com/rt-checkpoint/retriever.zip
unzip retriever.zip && rm retriever.zip
mv retriever crealm-retriever
rm -rf crealm-retriever/encoded_*
# remove --write_to_file if you don't wish to write a 1GB output file with retrieval ranks
python scripts/relic_evaluation_crealm.py \
--left_sents 1 --right_sents 1 \
--write_to_file \
--split val
- Random retrieval
python scripts/relic_evaluation_random.py --num_samples 100 --split val
You may submit your predictions for the test set here: https://forms.gle/1B6JuQ3nbGXCR2kC8
The format of your submission file should be a .json file containing a dictionary where the unique IDs of each test set instance are the keys and the values are rank lists. Each list should contain the indices of the top 100 candidates retrieved by your model, in rank order. For example, if your retriever's top-ranked candidate is 99 for test set instance "070789", one entry in your .json dict should look like:
"070789": [99, ...]
If you found our paper or this repository useful, please cite:
@inproceedings{relic22,
  author = {Katherine Thai and Yapei Chang and Kalpesh Krishna and Mohit Iyyer},
  booktitle = {Association for Computational Linguistics},
  year = {2022},
  title = {{RELiC}: Retrieving Evidence for Literary Claims},
}