knowledge-driven-dialogue-2019-lic

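5th-place solution on Leaderboard B of the Knowledge-Driven Dialogue track, 2019 Language and Intelligence Challenge (LIC 2019)
Because online deployment imposed a time limit, the version submitted for human evaluation dropped some global topic features. This lowered the model's results, and it finished 9th in the final human evaluation. Leaderboard A: 4th place. Leaderboard B: 5th place.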

Overview

To build a proactive dialogue chatbot, we used a generation-reranking method. First, the generative models (Multi-Seq2Seq) produce candidate replies. Next, the re-ranking model performs query-answer matching to choose the most informative reply among the candidates. A paper describing our solution in detail is available at https://arxiv.org/pdf/1907.03590.pdf.
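
The two-stage flow described above can be sketched as follows. This is a minimal illustration, not the repository's code: `rerank` and `overlap_score` are hypothetical names, and the token-overlap scorer is only a stand-in for the learned ranking model.

```python
from typing import Callable, List


def rerank(query: str, candidates: List[str],
           score: Callable[[str, str], float]) -> str:
    """Return the candidate reply with the highest matching score."""
    if not candidates:
        raise ValueError("no candidates to rank")
    # Different generators often produce the same reply; score each one once.
    unique = list(dict.fromkeys(candidates))
    return max(unique, key=lambda reply: score(query, reply))


def overlap_score(query: str, reply: str) -> float:
    """Toy stand-in for the ranker: number of tokens shared with the query."""
    return float(len(set(query.split()) & set(reply.split())))


# Candidates as they might come from several Seq2Seq models (made-up strings).
candidates = [
    "i do not know",
    "the movie stars jackie chan",
    "the movie stars jackie chan",
]
best = rerank("who stars in the movie", candidates, overlap_score)
```

Keeping the scorer as a plain callable means the generation stage and the ranking stage can be developed and swapped independently.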

Data Augmentation

We used four data augmentation techniques (Entity Generalization, Knowledge Selection, Switch, and Conversation Extraction) to construct multiple datasets for training the Seq2Seq models. The scripts Seq2Seq/preclean_*.py produce 6 datasets with slight modifications to their parameters.
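
The preclean scripts are not reproduced here. As one plausible reading of Entity Generalization, the sketch below replaces known entity names with placeholder tokens so that a model sees a generalized pattern, and restores them afterwards. The function names and the `<ent_i>` placeholder format are assumptions made for illustration, not taken from the repository.

```python
from typing import Dict, List, Tuple


def generalize_entities(text: str, entities: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each entity found in text with a placeholder such as <ent_0>."""
    mapping: Dict[str, str] = {}
    unique = list(dict.fromkeys(e for e in entities if e))
    # Longest names first, so a short name nested in a longer one
    # (e.g. "Chan" inside "Jackie Chan") does not split the longer name.
    for name in sorted(unique, key=len, reverse=True):
        if name in text:
            placeholder = f"<ent_{len(mapping)}>"
            text = text.replace(name, placeholder)
            mapping[placeholder] = name
    return text, mapping


def restore_entities(text: str, mapping: Dict[str, str]) -> str:
    """Undo generalize_entities on a generated reply."""
    for placeholder, name in mapping.items():
        text = text.replace(placeholder, name)
    return text


original = "Jackie Chan starred in Police Story"
generalized, mapping = generalize_entities(
    original, ["Police Story", "Jackie Chan", "Chan"]
)
restored = restore_entities(generalized, mapping)
```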

Seq2Seq Model

For ensembling, we used different encoder and decoder architectures, namely LSTM cells and the Transformer. This part is built on the OpenNMT framework.

Training

python preprocess.py
python train.py

Testing

python translate.py
All training and testing config files can be modified in config/*.yml.
In total, we trained 27 Seq2Seq models for the ensemble.
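
With many models, the outputs have to be merged into one candidate pool per test example before ranking. The sketch below assumes each model's predictions are available as a list aligned by example index. `pool_candidates` is a hypothetical helper, and using the vote count as a signal is an assumption rather than something the repository documents.

```python
from collections import Counter
from typing import Dict, List


def pool_candidates(model_outputs: List[List[str]]) -> List[Dict[str, int]]:
    """Merge per-model predictions into {reply: vote count} for each example."""
    if not model_outputs:
        return []
    n_examples = len(model_outputs[0])
    if any(len(out) != n_examples for out in model_outputs):
        raise ValueError("all models must predict the same number of examples")
    pooled = []
    # zip(*...) groups the replies of every model for the same example.
    for replies in zip(*model_outputs):
        counts = Counter(r.strip() for r in replies if r.strip())
        pooled.append(dict(counts))
    return pooled


# Three models, two test examples (made-up strings).
outputs = [
    ["hello there", "yes"],
    ["hello there", "no"],
    ["hi", "yes"],
]
pooled = pool_candidates(outputs)
```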

Answer rank

We used a GBDT regressor for ranking. One may ask why we did not use a neural network such as BERT for ranking. We tried, but it did not work well.
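
To make the idea concrete, here is a dependency-free sketch of gradient boosting for regression with squared loss, using depth-1 trees (stumps). It is not the ranker in this repository: the source does not say which GBDT library was used, and in practice one would likely reach for an existing implementation such as scikit-learn's `GradientBoostingRegressor`. The two features and the target scores are made up for illustration.

```python
from typing import List, Tuple

# A stump: (feature index, threshold, value if <= threshold, value if > threshold)
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Find the single split that minimizes the squared error on residuals."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best_err:
                best, best_err = (j, thr, lv, rv), err
    if best is None:  # every feature is constant: predict the mean residual
        mean = sum(residuals) / len(residuals)
        best = (0, 0.0, mean, mean)
    return best


def fit_gbdt(X, y, n_rounds: int = 50, lr: float = 0.1):
    """Fit stumps sequentially, each on the residuals of the current model."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [t - p for t, p in zip(y, preds)]
        j, thr, lv, rv = stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + lr * (lv if row[j] <= thr else rv) for row, p in zip(X, preds)]
    return base, lr, stumps


def predict(model, row: List[float]) -> float:
    base, lr, stumps = model
    return base + lr * sum(lv if row[j] <= thr else rv for j, thr, lv, rv in stumps)


# Hypothetical features per candidate: [overlap with knowledge, reply length]
X = [[0.9, 8.0], [0.1, 3.0], [0.6, 10.0], [0.2, 12.0]]
y = [0.8, 0.1, 0.6, 0.2]  # hypothetical quality scores
model = fit_gbdt(X, y)
scores = [predict(model, row) for row in X]
best_index = max(range(len(scores)), key=scores.__getitem__)
```

At inference time, the candidate with the highest predicted score is returned as the reply.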

Creating ranking dataset

python create_gbdt_dataset.py

Feature extraction

python feature_util_multiprocess.py
The feature extraction code draws on Kaggle_HomeDepot by ChenglongChen.
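
As an illustration of the kind of query-answer matching features such a pipeline might compute, the sketch below derives a few set-overlap features from whitespace tokens. The feature names and the choice of features are assumptions, not a listing of what feature_util_multiprocess.py actually extracts.

```python
from typing import Dict, List, Sequence, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def jaccard(a: Sequence, b: Sequence) -> float:
    """Size of the intersection over size of the union of two item sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def dice(a: Sequence, b: Sequence) -> float:
    """Dice coefficient: twice the intersection over the summed set sizes."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def matching_features(query: str, reply: str) -> Dict[str, float]:
    q, r = query.split(), reply.split()
    return {
        "jaccard_unigram": jaccard(q, r),
        "dice_unigram": dice(q, r),
        "jaccard_bigram": jaccard(ngrams(q, 2), ngrams(r, 2)),
        "len_ratio": len(r) / max(len(q), 1),
    }


features = matching_features(
    "who directed police story",
    "jackie chan directed police story",
)
```

Each (query, candidate) pair yields one such feature dictionary, which becomes one row of the ranking dataset.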

Checkpoints

It might take some extra time to upload the checkpoints because they are rather large in size.

canwushuang/knowledge-driven-dialogue-lic2019
