RLQG - Towards Better QG in QA-based EE

[2024/10] Check our video presentation on Underline!

[2024/08] The video presentation of our paper will be available soon.

[2024/08] The presentation of our paper is scheduled for Virtual Poster Session 2; check the poster and slides here.

[2024/07] The code is available now.

[2024/05] Our paper is accepted as a Findings paper at ACL 2024!

We propose RLQG, a novel framework for generating better questions in QA-based event extraction via reinforcement learning. The paper is available here.

Framework

Setup

Environment

The GPU resources used in our study are 4*A800-SXM4-80G with CUDA 12.1; we strongly recommend using PyTorch 2.0 or above.

# Clone the repository
git clone https://github.com/Rcrossmeister/RLQG.git
cd ./RLQG

# Create the conda environment
conda create -n rlqg python=3.11.3
conda activate rlqg

# Install the required packages
pip install -r requirements.txt
python -m spacy download en_core_web_lg
python -m nltk.downloader punkt
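
As an optional sanity check (not part of the repository's scripts), you can verify that PyTorch and the GPUs are visible before moving on:

# Optional sanity check: confirm PyTorch >= 2.0 and that the GPUs are visible.
import torch

print(torch.__version__)          # recommended to be 2.0 or above
print(torch.version.cuda)         # e.g. 12.1
print(torch.cuda.is_available())  # should be True on a GPU machine
print(torch.cuda.device_count())  # e.g. 4 on a 4*A800 node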

Dataset

We use the ACE2005 and RAMS datasets in our study; please follow their license terms to obtain the one you need (ACE2005 is preferred, as it offers more template options). Support for another widely used dataset, WikiEvent, is planned.

Pre-processing

Follow ./ACE2005/README.md or ./RAMS/README.md to pre-process the ACE2005 or RAMS dataset, respectively. Pre-processing is required when starting from the raw datasets and must be completed before generating template questions.

Template questions for ACE2005

We support three types of template questions for the ACE2005 dataset: standard, annotation, and dynamic; you can check more details here. We recommend the dynamic template if you have no specific requirements, and it is also the default value of the --template_type argument.

python dataset/generator.py --template_type dynamic

The questions, which are used to supervised fine-tune a QG model and also by the beam-search implementation, will be saved to ./model/data.
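
If you want to inspect the generated questions, a minimal sketch like the following works; the file name below is a placeholder, so point it at whatever dataset/generator.py actually writes into ./model/data:

# Minimal inspection sketch; the file name is hypothetical -- replace it with
# the actual output produced by dataset/generator.py under ./model/data.
import json

path = "./model/data/ace2005_dynamic_questions.json"  # placeholder name
with open(path, "r", encoding="utf-8") as f:
    data = json.load(f)  # adjust the loading logic if the file is JSON Lines
print(len(data))         # number of generated question records
print(data[0])           # peek at one record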

Template questions for RAMS

Currently, we only support standard template questions for the RAMS dataset; see more details here. You can also create your own question templates by following the given template format. The questions are obtained directly during the pre-processing step and will be saved to ./model/data.

Models

We use LLaMA-2 as the backbone model in our paper, and we also support several popular open-source LLMs such as ChatGLM and Qwen. To download the model weights locally, using LLaMA-2-7b as an example:

mkdir backbone_model && cd backbone_model
git lfs install
git clone https://huggingface.co/meta-llama/Llama-2-7b-hf

Alternatively, you can replace the local path in the --model_name_or_path argument with the Hugging Face repository name (e.g., meta-llama/Llama-2-7b-hf) in the following training scripts; the model weights will then be downloaded and loaded automatically.
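
As a quick check (independent of the training scripts) that the downloaded weights are intact and loadable, something like the following with transformers should work:

# Quick loading check with Hugging Face transformers; the local path is an
# assumption based on the clone command above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "./backbone_model/Llama-2-7b-hf"  # or "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, torch_dtype="auto")
print(model.config.model_type)  # expected: "llama"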

Training

The training implementation was inspired by LLaMA Factory; you can check their technical report here. For better robustness, this repository uses DPO training after SFT as the refining algorithm instead of PPO. If you are interested in PPO, please refer to the usage here (support in this repository is coming soon).
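
For intuition only, the DPO objective used in the refining step can be sketched as follows; this is a conceptual illustration, not the code that this repository or LLaMA Factory actually runs:

# Conceptual sketch of the DPO loss (illustration only, not the repository's code).
# logp_* are summed token log-probabilities of the preferred/dispreferred question
# under the policy being trained and under the frozen SFT reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    chosen_ratio = policy_logp_chosen - ref_logp_chosen        # reward of preferred question
    rejected_ratio = policy_logp_rejected - ref_logp_rejected  # reward of dispreferred question
    # Push the policy to widen the margin between the two, relative to the reference.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

loss = dpo_loss(torch.tensor([-5.0]), torch.tensor([-7.0]),
                torch.tensor([-6.0]), torch.tensor([-6.5]))
print(loss)  # scalar training loss for this preference pair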

Tip

If you need to use the fine-tuned Inverse Prompting Model (IPM) from our paper, please download Rcross/IPM-Llama-2-13b to ./backbone_model before starting. Note that we only provide the LoRA weights; please merge them with the backbone model meta-llama/Llama-2-13b-hf before use, as sketched below. You can also train a customized IPM; please refer here for how to organize your own training data.
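
A minimal merging sketch with peft is given below; the local paths are assumptions, so adjust them to your own directory layout:

# Merge the IPM LoRA adapter into the Llama-2-13b backbone (paths are assumptions).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("./backbone_model/Llama-2-13b-hf",
                                            torch_dtype="auto")
model = PeftModel.from_pretrained(base, "./backbone_model/IPM-Llama-2-13b")
model = model.merge_and_unload()  # fold the LoRA weights into the base model

tokenizer = AutoTokenizer.from_pretrained("./backbone_model/Llama-2-13b-hf")
model.save_pretrained("./backbone_model/IPM-Llama-2-13b-merged")
tokenizer.save_pretrained("./backbone_model/IPM-Llama-2-13b-merged")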

Quick Start

We provide a quick-start script for the ACE2005 dataset, which supervised fine-tunes the QG model on the dynamic template questions proposed by Lu et al. (2023) and then further refines it with the RLQG framework.

cd ./model && sh run.sh 

Important

Please download meta-llama/Llama-2-7b-hf, meta-llama/Llama-2-13b-hf, and meta-llama/Llama-2-13b-chat-hf to ./backbone_model before using the quick-start script. The quick start uses the IPM Rcross/IPM-Llama-2-13b proposed in our paper and deploys meta-llama/Llama-2-13b-chat-hf as the local off-the-shelf QA model on the default port 19777.

Detailed Workflow

You can check the detailed workflow and further usage in this README, which explains each module involved in the quick-start script run.sh.

Evaluation

Question Answering

We support two paradigms to answer the generated questions:

LLaMA-2 QA

Use an open-source model such as LLaMA-2 to answer the generated questions with few-shot prompting; the server is deployed following the OpenAI API style. Fill in [your-url-port] with the port you used to deploy your local QA model.

python evaluation/llama2_qa.py \
    --url http://localhost:[your-url-port]/v1/chat/completions \
    --model_name [QA-model-name] \
    --input_path [path-to-QG-file] \
    --num_shots 5
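
For reference, the script sends standard OpenAI-style chat-completions requests; a manual call to the local server (with a placeholder question, the default port 19777, and an assumed model name) looks roughly like this:

# Manual request to the local OpenAI-style QA server (for reference only).
# The port, model name, and question are placeholders -- match your deployment.
import requests

resp = requests.post(
    "http://localhost:19777/v1/chat/completions",
    json={
        "model": "Llama-2-13b-chat-hf",
        "messages": [{"role": "user", "content": "Who was attacked in the event?"}],
        "temperature": 0.0,
    },
)
print(resp.json()["choices"][0]["message"]["content"])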

Tip

We support various open-source models as the QA model in our framework (e.g., ChatGLM and Qwen), and we also support deploying a fine-tuned QA model with LoRA weights locally. You can modify the deployment details by following the instructions here.

OpenAI QA

Use the OpenAI API (e.g., GPT-4) to answer the generated questions with few-shot prompting; you need to prepare an API key (e.g., sk-a1b2c3) for the requests. Please find more details on the OpenAI API platform.

python evaluation/openai_qa.py \
    --api_key [your-api-key] \
    --api_model [your-api-model] \
    --input_path [path-to-QG-file] \
    --num_shots 5 

Response Assessment

python evaluation/eval.py --input_dir [path-to-QA-file]

If you use the quick-start script, you should obtain the experimental results reported in Table 2 of our paper, which look as follows:

============================ Practical Eval ============================
Metric              EM                  COR                 SemSim       
========================================================================
Value(%)          41.47                48.55                68.04        

============================== Full Eval ===============================
Metric              EM                  COR                 SemSim       
========================================================================
Value(%)          21.94                24.31                31.92  
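
The exact metric definitions are those in evaluation/eval.py and the paper; as a rough illustration of the string-level matching involved, an exact-match (EM) style check typically looks like:

# Rough illustration of an exact-match (EM) style check; see evaluation/eval.py
# for the metrics actually reported above (EM, COR, SemSim).
import string

def normalize(text: str) -> str:
    text = text.lower().strip()
    return text.translate(str.maketrans("", "", string.punctuation))

def exact_match(prediction: str, gold: str) -> bool:
    return normalize(prediction) == normalize(gold)

print(exact_match("The U.S. Army", "the US army"))  # True after normalization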

Citation

Please cite our paper if you include RLQG in your work:

@inproceedings{hong2024towards,
    title = "Towards Better Question Generation in {QA}-based Event Extraction",
    author = "Hong, Zijin  and Liu, Jian",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
    year = "2024"
}
