SciCo-Radar: Inferring Scientific Cross-Document Coreference and Hierarchy with Definition-Augmented Relational Reasoning
We present SciCo-Radar, a method for detecting cross-document coreference and hierarchy in scientific papers by augmenting the original paper inputs with context-sensitive generated definitions and relational reasoning.
- Preprint
- Definitions files
- Trained model on HF (WIP)
To run and evaluate the baseline model, please follow the instructions in the original SciCo model repository. Make sure you run this project in a separate environment, as the original SciCo repo has many dependencies that might conflict with this one.
Install all dependencies (we recommend using conda):

```
pip install -r requirements.txt
```
First, create a folder named `data` in the root directory.
To download the raw data, click here.
Each file (train, dev, test) is in the jsonl format, where each row corresponds to a topic.
See below for a description of the fields in each topic. After downloading the data, place it in the `data` folder in the root directory.
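To sanity-check the download, you can read a split directly. This is a minimal sketch: the path `data/train.jsonl` assumes you placed the files as described above, and the printed keys are whatever the field description below lists.

```python
import json

# Minimal sketch: load one split of the raw data.
# The path assumes the files were placed in data/ as described above.
with open("data/train.jsonl") as f:
    topics = [json.loads(line) for line in f]

print(f"{len(topics)} topics in the training split")
# Each row is a topic; print its fields (described below).
print(sorted(topics[0].keys()))
```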
You can also load SciCo directly from huggingface.co/datasets/allenai/scico as follows:
```python
from datasets import load_dataset

scico = load_dataset("allenai/scico")
```
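Once loaded, the object is a standard Hugging Face `DatasetDict`, so you can inspect the splits and individual topics directly (a small usage sketch, assuming nothing beyond the dataset ID above):

```python
from datasets import load_dataset

scico = load_dataset("allenai/scico")

# Print the available splits and their sizes,
# then look at the fields of a single topic.
print(scico)
print(scico["train"][0].keys())
```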
Download the Singleton and Relational definitions datasets from here and place them in the `data` folder in the root directory.
To fine-tune the model, run the following command:
```
python train_llm_classification.py --output_dir your-output-dir
```
or use `accelerate launch` to train on multiple GPUs, as sketched below.
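For example, the same run on multiple GPUs (same script and arguments as the single-GPU command above):

```
accelerate launch train_llm_classification.py --output_dir your-output-dir
```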
WIP
To run inference on the model, execute the following command:
```
python predict_llm_classification.py --adapter your-saved-model-path --output_dir your-output-dir
```
or use `accelerate launch` to run inference on multiple GPUs.
Add `--use_relational_def True` to use the relational definitions.
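Putting it together, a full inference run with relational definitions looks like this (paths are placeholders):

```
python predict_llm_classification.py \
    --adapter your-saved-model-path \
    --output_dir your-output-dir \
    --use_relational_def True
```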
Note: the provided relational definitions were generated with the specific model stated in the paper, so scores can vary with other models.
Set `save_path` (where you want to save the results) and `scores_path` (where you saved the inference scores) in the config file `configs/multiclass.yaml`, then run the following script:
```
python predict.py --config configs/multiclass.yaml
```
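For reference, the relevant part of `configs/multiclass.yaml` would look roughly like this. This is a sketch with placeholder paths; the two keys are the ones named above, and everything else in the shipped config stays as-is.

```yaml
# Excerpt from configs/multiclass.yaml (placeholder paths).
save_path: path/to/save/results        # where the final predictions are written
scores_path: path/to/inference/scores  # where the inference scores were saved
```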
After the inference process finishes, it produces a jsonl file with the fields `tokens`, `mentions`, `relations`, and `id`.
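A quick sanity check on the output (a minimal sketch; the output filename is a placeholder for your `save_path`):

```python
import json

# Minimal sketch: inspect the predictions written by predict.py.
# Replace the path with the save_path you set in configs/multiclass.yaml.
with open("path/to/your/save_path.jsonl") as f:
    for line in f:
        topic = json.loads(line)
        # Each row is one topic with the four fields described above.
        print(topic["id"],
              len(topic["mentions"]), "mentions,",
              len(topic["relations"]), "relations")
```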
Models are evaluated using the standard coreference metrics (via the coval script), hierarchy metrics (recall, precision, and F1), and the directed path ratio.
```
python evaluation.py [gold_jsonl_path] [sys_jsonl_path] options
```
If you want to evaluate only on the hard topics (based on Levenshtein performance, see Section 4.5), you can set `options` to `hard_10`, `hard_20`, or `curated`.
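For example, to score only the hardest 10% of topics (file paths are placeholders; the positional `options` argument follows the usage line above):

```
python evaluation.py data/test.jsonl path/to/system_predictions.jsonl hard_10
```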