
Sentence-Permuted Paragraph Generation

This repository contains the code package for the EMNLP'2021 paper:

Sentence-Permuted Paragraph Generation [arXiv] [slides] [video]

Wenhao Yu (ND), Chenguang Zhu (MSR), Tong Zhao (ND), Zhichun Guo (ND), Meng Jiang (ND).

In this paper, we propose a novel framework, PermGen, whose objective is to maximize the expected log-likelihood of the output paragraph distribution with respect to all possible sentence orders. PermGen uses hierarchical positional embeddings and introduces new procedures for training and decoding. Experiments on three paragraph generation benchmarks demonstrate that PermGen generates more diverse outputs of higher quality than existing models.
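A rough sketch of this objective (the notation here is ours, not copied from the paper): for input x, a target paragraph whose n sentences can be arranged in an order σ, and model parameters θ,

$$
\mathcal{L}(\theta) \;=\; \mathbb{E}_{\sigma \sim \mathrm{Perm}(n)}\big[\log p_\theta(y_\sigma \mid x)\big] \;\approx\; \frac{1}{K}\sum_{k=1}^{K} \log p_\theta\big(y_{\sigma_k} \mid x\big),
$$

where y_σ denotes the paragraph with its sentences arranged in order σ, and the expectation is approximated with K sampled orderings. See the paper for the exact formulation.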

Model Usage

Step 1: Download datasets

We conducted experiments on three paragraph generation tasks: story generation (ROCStory), news generation (DailyMail), and paper abstract generation (AGENDA). We downloaded the ROCStory and AGENDA datasets directly from their official repositories. For the DailyMail dataset, we randomly sampled 53,102 news articles from the original corpus and extracted keyphrases from each sentence using RAKE.

| Dataset Name | Original Link | Paper Link | Our Pre-processed |
|--------------|---------------|------------|-------------------|
| ROCStory     | OL-ROC        | PL-ROC     | OP-ROC            |
| AGENDA       | OL-AG         | PL-AG      | OP-AG             |
| DailyMail    | OL-DM         | PL-DM      | OP-DM             |

After downloading the pre-processed datasets, please put them in the dataset folder.
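The DailyMail keyphrases in the released data were already extracted with RAKE. If you want to reproduce that step yourself, one option is the rake-nltk package; the snippet below is a minimal sketch of our own, not the exact pipeline used for the released files:

```python
# Illustrative only: extract keyphrases from one sentence with RAKE.
# Requires `pip install rake-nltk` plus the NLTK stopwords/punkt data.
from rake_nltk import Rake

rake = Rake()
sentence = "The city council approved the new transit budget on Tuesday."
rake.extract_keywords_from_text(sentence)
print(rake.get_ranked_phrases())  # highest-scoring keyphrases first
```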

Step 2: Install packages

The Python version should be at least 3.6.0.

conda create -n permgen python=3.6
conda activate permgen
pip install transformers==3.3.1
pip install torch==1.7.0
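After installing, an optional quick sanity check confirms the pinned versions were picked up:

```python
# Quick sanity check of the environment created above.
import torch
import transformers

print(transformers.__version__)   # expected: 3.3.1
print(torch.__version__)          # expected: 1.7.0
print(torch.cuda.is_available())  # True if a GPU build of torch is usable
```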

Step 3: Randomly permute sentences

Add or remove the dataset flags (--agenda, --dailymail, --rocstory) to choose which datasets to permute.

python dataset/preprocessing.py --agenda --dailymail --rocstory
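Conceptually, this step produces several randomly permuted sentence orders for each target paragraph. The sketch below only illustrates the idea; dataset/preprocessing.py is what actually builds the training files:

```python
import random

def sample_sentence_orders(sentences, num_orders=3, seed=0):
    """Return `num_orders` randomly permuted copies of a paragraph's sentences.
    Illustrative only; the repository's preprocessing script produces the real data files."""
    rng = random.Random(seed)
    permuted = []
    for _ in range(num_orders):
        order = list(range(len(sentences)))
        rng.shuffle(order)
        permuted.append([sentences[i] for i in order])
    return permuted

# Example: three alternative orderings of a four-sentence paragraph.
paragraph = ["S1.", "S2.", "S3.", "S4."]
for variant in sample_sentence_orders(paragraph):
    print(variant)
```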

Step 4: Train the model

bash scripts/train_agenda.sh
bash scripts/train_dailymail.sh
bash scripts/train_rocstory.sh
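All training hyperparameters live in the scripts above. As a rough illustration of the sentence-permuted objective, the hypothetical helper below averages a seq2seq loss over several permuted label sequences; it is not the repository's training code, and the model and tensor names are placeholders:

```python
import torch

def permutation_averaged_loss(model, input_ids, attention_mask, permuted_labels):
    """Hypothetical helper: average the seq2seq loss over several permuted label tensors,
    one per sampled sentence order of the same target paragraph."""
    losses = []
    for labels in permuted_labels:
        outputs = model(input_ids=input_ids,
                        attention_mask=attention_mask,
                        labels=labels,
                        return_dict=True)
        losses.append(outputs.loss)
    return torch.stack(losses).mean()
```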

Step 5: Test with saved checkpoints

Do not forget to specify the path to your saved checkpoints!

bash scripts/test_agenda.sh
bash scripts/test_dailymail.sh
bash scripts/test_rocstory.sh
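The test scripts handle decoding with the paper's settings. If you only want to poke at a saved checkpoint interactively, a hypothetical sketch with the standard transformers API looks like the following; the checkpoint path is a placeholder, and it assumes the saved files are compatible with the stock BART classes:

```python
# Hypothetical usage sketch; replace the checkpoint path with your own saved model directory.
from transformers import BartForConditionalGeneration, BartTokenizer

checkpoint_dir = "output/agenda/checkpoint-best"  # placeholder path
tokenizer = BartTokenizer.from_pretrained(checkpoint_dir)
model = BartForConditionalGeneration.from_pretrained(checkpoint_dir)

inputs = tokenizer(["keyphrase one ; keyphrase two"], return_tensors="pt")
output_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```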

Easy-to-use baseline implementation

The baseline BART implementation can be found here. That repository contains the code to reproduce the baseline performance reported in our paper; all hyperparameters and evaluation settings are the same as in this repository.

Output examples

Please find our output examples in the examples folder.

Reference

If you find this repository useful in your research, please consider citing our paper:

@inproceedings{yu2021sentence,
  title={Sentence-Permuted Paragraph Generation},
  author={Yu, Wenhao and Zhu, Chenguang and Zhao, Tong and Guo, Zhichun and Jiang, Meng},
  booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2021}
}

Contact

If you have any questions, please contact Wenhao Yu ([email protected]).
