Universal information extraction with instruction learning

License

Notifications You must be signed in to change notification settings

waywayyang/InstructUIE

InstructUIE

  • This repo releases our implementation of the InstructUIE model.
  • It builds on the pretrained Flan-T5 model and is fine-tuned on our data.

Requirements

Our main experiments and analysis were conducted in the following environment:

  • CUDA (11.3)
  • cuDNN (8.2.0.53)
  • PyTorch (1.10.0)
  • Transformers (4.26.1)
  • DeepSpeed (0.7.7)

You can install the required libraries by running:

bash setup.sh
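
After installation, it can help to confirm that the installed Python packages match the versions listed above. The following is a minimal sketch, not part of this repo: the helper name check_versions is hypothetical, and it assumes the packages are installed under their usual distribution names (torch, transformers, deepspeed). CUDA and cuDNN are not checked here.

```python
from importlib import metadata

# Versions taken from the Requirements list in this README.
EXPECTED = {
    "torch": "1.10.0",
    "transformers": "4.26.1",
    "deepspeed": "0.7.7",
}


def check_versions(expected):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu113" when comparing.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in check_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

An empty result means all three packages match the listed versions; other versions may still work, but they are not the ones the experiments were run with.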

Data

Our models are trained and evaluated on IE INSTRUCTIONS. You can download the data from Baidu NetDisk or Google Drive.

Training

A sample script for training the InstructUIE model in our paper can be found at scripts/train_flan-t5.sh. You can run it as follows:

bash ./scripts/train_flan-t5.sh

Released Checkpoints

Our model checkpoints will be released soon.

Evaluation

A sample script for evaluating the InstructUIE model in our paper can be found at scripts/eval_flan-t5.sh. You can run it as follows:

bash ./scripts/eval_flan-t5.sh

The decoded results are saved to predict_eval_predictions.jsonl in your output directory. To calculate F1 from predict_eval_predictions.jsonl, run:

python calculate_f1.py
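
For readers who want to see what an F1 computation over a JSONL predictions file can look like, here is a minimal sketch of micro-averaged F1 with exact matching. It is an illustration only and not the logic of calculate_f1.py: the field names prediction and label, the ";" separator between extracted items, and all function names are assumptions, so check the actual schema of predict_eval_predictions.jsonl before reusing it.

```python
import json


def split_items(text, sep=";"):
    """Split a decoded string into a set of stripped, non-empty items."""
    return {part.strip() for part in text.split(sep) if part.strip()}


def micro_f1(records, pred_key="prediction", gold_key="label"):
    """Micro-averaged precision, recall and F1 over exact-match items.

    pred_key and gold_key are hypothetical field names.
    """
    n_correct = n_pred = n_gold = 0
    for rec in records:
        pred = split_items(rec[pred_key])
        gold = split_items(rec[gold_key])
        n_correct += len(pred & gold)
        n_pred += len(pred)
        n_gold += len(gold)
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Under these assumptions, usage would be along the lines of micro_f1(load_jsonl("predict_eval_predictions.jsonl")), with the path adjusted to your output directory.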

Citation

If you use InstructUIE in your work, please cite our paper:

@article{wang2023instructuie,
  title={InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction},
  author={Wang, Xiao and Zhou, Weikang and Zu, Can and Xia, Han and Chen, Tianze and Zhang, Yuansen and Zheng, Rui and Ye, Junjie and Zhang, Qi and Gui, Tao and others},
  journal={arXiv preprint arXiv:2304.08085},
  year={2023}
}
