Chinese Idiom Paraphrasing (CIP) aims to rephrase the idioms in an input sentence, generating a fluent, meaning-preserving sentence that contains no idioms.
This repository contains the CIP dataset and implementations of several approaches:
- LSTM approach
- Transformer approach
- mt5-seq2seq approach
- mt5-infill approach
- mt5-knowledge approach
- Python>=3.6
- torch>=1.7.1
- transformers==4.8.0
- fairseq==0.10.2
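The dependencies above can be installed with pip (versions taken from the list; the exact torch build you need depends on your CUDA setup):

```shell
# Install the pinned dependencies listed above; the torch build you need
# may vary with your CUDA version.
pip install "torch>=1.7.1" "transformers==4.8.0" "fairseq==0.10.2"
```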
You can download all pre-trained models here and put them into the model directory.
If you want to train models from scratch, you need to download the pre-trained language model mt5-base (from Hugging Face) and place it under the model directory.
To train the LSTM and Transformer models with fairseq, you first need to preprocess the data by tokenizing sentences with jieba and BPE. We use scripts from Subword-nmt:
git clone https://github.com/rsennrich/subword-nmt
Then run
sh prepare.sh
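As a rough illustration of the BPE step that prepare.sh applies after jieba segmentation (the merge table below is invented for the example; the real merges are learned from the training corpus by subword-nmt's learn_bpe):

```python
# Toy illustration of applying BPE merges to one word. The merge list here is
# invented for the example; subword-nmt learns the real merges from data.
def apply_bpe(word, merges):
    # Start from individual characters, as subword-nmt does.
    symbols = list(word)
    for a, b in merges:  # merges applied in priority order (simplified)
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b:
                symbols[i:i + 2] = [a + b]
            else:
                i += 1
    # subword-nmt marks every non-final subword with a trailing "@@".
    return "@@ ".join(symbols)

merges = [("l", "o"), ("lo", "w")]    # hypothetical learned merges
print(apply_bpe("lower", merges))     # -> low@@ e@@ r
```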
Train the LSTM, Transformer, mt5-seq2seq, mt5-infill, and mt5-knowledge models:
sh train_lstm.sh
sh train_transformer.sh
sh train_t5.sh
sh train_t5_fill.sh
sh train_t5_knowledge.sh
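How the mt5-infill training pairs might be built is sketched below. This is an assumption about the approach, not code from the scripts: the idea would be to replace the idiom with an mT5 sentinel token and train the model to generate the replacement.

```python
# Hypothetical construction of an mT5 infilling pair: mask the idiom with a
# sentinel token; the target restates the sentinel followed by the paraphrase.
# This is an assumed sketch of the mt5-infill setup, not the repo's exact code.
def make_infill_pair(sentence, idiom, paraphrase):
    source = sentence.replace(idiom, "<extra_id_0>")
    target = "<extra_id_0>" + paraphrase
    return source, target

src, tgt = make_infill_pair("他做事总是三心二意。", "三心二意", "不专心")
print(src)  # 他做事总是<extra_id_0>。
print(tgt)  # <extra_id_0>不专心
```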
Run the following commands to evaluate:
sh evaluate_base.sh
sh evaluate_t5.sh
sh evaluate_t5_knowledge.sh
sh evaluate_t5_fill.sh
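The evaluation scripts presumably compare system outputs against reference paraphrases with n-gram overlap metrics such as BLEU (an assumption; the scripts define the actual metrics). As a toy illustration, clipped unigram precision between a hypothesis and a reference:

```python
from collections import Counter

# Toy unigram (BLEU-1-style) precision with clipped counts; real evaluation
# would use a full BLEU implementation over the whole test set.
def unigram_precision(hyp_tokens, ref_tokens):
    hyp, ref = Counter(hyp_tokens), Counter(ref_tokens)
    clipped = sum(min(c, ref[t]) for t, c in hyp.items())
    return clipped / max(1, sum(hyp.values()))

hyp = list("他做事总是不专心")  # character-level tokens for the example
ref = list("他做事不专心")
print(round(unigram_precision(hyp, ref), 3))  # -> 0.75
```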
@article{qiang2022chinese,
title={Chinese Idiom Paraphrasing},
author={Jipeng Qiang and Yang Li and Chaowei Zhang and Yun Li and YunHao Yuan and Yi Zhu and Xindong Wu},
journal={},
year={2022},
}