Code for the SemDial 2018 paper "Multi-Task Learning for Domain-General Spoken Disfluency Detection in Dialogue Systems" (Igor Shalyminov, Arash Eshghi, and Oliver Lemon).
- Set up the environment (below are steps for Conda):
$ cd code-directory
$ git submodule update --init
$ conda create -n multitask_disfluency python=2.7
$ conda activate multitask_disfluency
$ pip install -r requirements.txt
- Preprocess the Switchboard dataset for training:
$ python make_deep_disfluency_dataset.py swbd disfluency
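The preprocessing step produces utterances with per-token disfluency labels. As a rough illustration of what such labels capture (this is a toy scheme, not the tag set the repository's script actually emits), the sketch below marks the first of two identical adjacent tokens as edited, the way a simple repeat disfluency ("i i like this") would be flagged:

```python
# Toy illustration of per-token disfluency labels (hypothetical tag names,
# NOT the repository's actual tag scheme): mark the first of two identical
# adjacent tokens as edited ("<e/>"), everything else as fluent ("<f/>").

def tag_repeats(tokens):
    """Return (token, tag) pairs; '<e/>' if the next token repeats it."""
    tags = []
    for i, tok in enumerate(tokens):
        if i + 1 < len(tokens) and tokens[i + 1] == tok:
            tags.append("<e/>")
        else:
            tags.append("<f/>")
    return list(zip(tokens, tags))

print(tag_repeats("i i like this".split()))
```

Real disfluency annotation is richer (reparandum, interregnum, and repair spans rather than only repeats), but the token-to-tag alignment shown here is the shape the tagging task operates on.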
- Train the model:
$ python train.py swbd model
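The paper's multi-task setup trains a shared model on disfluency tagging jointly with an auxiliary task. The core idea is optimizing a weighted sum of per-task losses; a minimal sketch with hypothetical loss values and weights (the actual tasks, weighting, and training loop live in train.py):

```python
# Minimal sketch of multi-task loss combination (values are hypothetical):
# a shared encoder is trained on a weighted sum of per-task losses, e.g.
# disfluency tagging plus an auxiliary task such as language modelling.

def multitask_loss(task_losses, weights):
    """Weighted sum of per-task losses."""
    assert len(task_losses) == len(weights)
    return sum(w * l for l, w in zip(task_losses, weights))

# e.g. tagging loss 1.0, auxiliary loss 3.0, equal weighting
total = multitask_loss([1.0, 3.0], [0.5, 0.5])
print(total)  # 2.0
```

Because the tasks share parameters, gradients from the auxiliary task regularize the tagger, which is what drives the domain-generality studied in the paper.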
- Get the bAbI tools and install their requirements
- Download the bAbI dialog tasks into the babi_tools folder
- Generate the generalization study datasets:
$ sh make_generalization_study_datasets.sh <RESULT_FOLDER>
- Tag the datasets (run once for every config in 2018_generalization_study_configs):
$ sh tag_dataset.sh <RESULT_FOLDER> <config_file_name>
- The resulting datasets are <RESULT_FOLDER>/<BABI_DATASET_NAME>/*.tagged.json
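To consume the resulting *.tagged.json files downstream, something like the sketch below works. The JSON schema here is an assumption (a list of utterances, each with parallel "tokens" and "tags" arrays); the demo writes a synthetic file in that assumed shape rather than relying on real output:

```python
# Hypothetical sketch of loading the *.tagged.json outputs.
# ASSUMPTION: each file holds a JSON list of utterances with parallel
# "tokens" and "tags" arrays; check the real files before relying on this.
import glob
import json
import os
import tempfile

def load_tagged(folder):
    """Collect every utterance from all *.tagged.json files in `folder`."""
    utterances = []
    for path in sorted(glob.glob(os.path.join(folder, "*.tagged.json"))):
        with open(path) as f:
            utterances.extend(json.load(f))
    return utterances

# Demonstrate with a synthetic file in the assumed format
tmp = tempfile.mkdtemp()
sample = [{"tokens": ["i", "i", "like", "this"],
           "tags": ["<e/>", "<f/>", "<f/>", "<f/>"]}]
with open(os.path.join(tmp, "task1.tagged.json"), "w") as f:
    json.dump(sample, f)

data = load_tagged(tmp)
print(len(data))  # 1
```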