- This repository contains the code to train the `tagger` and `generator` modules.
- Apart from scripts to train the modules, it also has scripts needed to run inference on the test set and to run evaluation for metrics like BLEU, ROUGE, and METEOR.
- Both `tagger` and `generator` are seq2seq models that require parallel data generated by the data prep module.
- The parallel datasets are:
  - Tagger: `entagged_parallel.{split}.en` → `entagged_parallel.{split}.tagged`
  - Generator: `engenerated_parallel.{split}.en` → `engenerated_parallel.{split}.generated`

  (where `{split}` is either train, test, or dev.)
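Concretely, the file pairs the two modules expect can be enumerated per split. This is only a sketch of the naming scheme; the actual files are produced by the data prep module:

```shell
# Enumerate the parallel file pairs for each split produced by the data prep module.
splits="train dev test"
for split in $splits; do
  printf '%s\n' "entagged_parallel.${split}.en -> entagged_parallel.${split}.tagged"
  printf '%s\n' "engenerated_parallel.${split}.en -> engenerated_parallel.${split}.generated"
done
```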
- To prepare the BPE-tokenized data (use `tagged` for the tagger and `generated` for the generator):

  ```
  bash scripts/prepare_bpe.sh [tagged|generated] {base_folder}
  ```

  Where:
  - `base_folder`: The folder in which the data files are stored (argument used in the creation of training data).
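As a concrete sketch, assuming the prepared data lives under a hypothetical `data/politeness` folder (an assumption, not a repository default), the two invocations would look like:

```shell
# Hypothetical base_folder (assumption); substitute your own data directory.
base_folder=data/politeness
# Guarded so the snippet is a no-op outside a repository checkout.
if [ -f scripts/prepare_bpe.sh ]; then
  bash scripts/prepare_bpe.sh tagged "${base_folder}"      # data for the tagger
  bash scripts/prepare_bpe.sh generated "${base_folder}"   # data for the generator
fi
```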
- To train the tagger:

  ```
  bash scripts/train_tagger.sh tagged {handle} {base_folder}
  ```

  Where:
  - `handle`: An identifier used to bucketize models trained on different datasets. Models for each `handle` are stored in separate folders with names indexed by `{handle}`, within the `{models}` directory.
  - `base_folder`: The folder in which the data files are stored (argument used in the creation of training data).
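A worked invocation, with hypothetical `handle` and `base_folder` values (assumptions, shown only to make the argument order concrete):

```shell
# Hypothetical identifiers (assumptions): adapt to your dataset.
handle=politeness
base_folder=data/politeness
# Guarded so the snippet is a no-op outside a repository checkout.
if [ -f scripts/train_tagger.sh ]; then
  bash scripts/train_tagger.sh tagged "${handle}" "${base_folder}"
fi
```

Training the generator follows the same pattern with `scripts/train_generator.sh` and the `generated` data argument.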
- To train the generator:

  ```
  bash scripts/train_generator.sh generated {handle} {base_folder}
  ```

  Where:
  - `handle`: An identifier used to bucketize models trained on different datasets. Models for each `handle` are stored in separate folders with names indexed by `{handle}`, within the `{models}` directory.
  - `base_folder`: The folder in which the data files are stored (argument used in the creation of training data).
- To run inference:

  ```
  bash scripts/inference.sh {input_file} {jobname} \
      tagged generated \
      {handle} \
      {style_0_label} {style_1_label} \
      {base_folder} {device}
  ```

  Where:
  - `input_file`: The input test file to be transferred. This is a raw text file with one sentence per line.
  - `jobname`: A unique identifier for the inference job.
  - `handle`: The dataset argument passed when training the `tagger` or `generator`; used to identify the model paths for the `tagger` and `generator`.
  - `style_0_label`: A label for style 0.
  - `style_1_label`: A label for style 1.
  - `base_folder`: The folder in which the data files are stored (argument used in the creation of training data).
  - `device`: GPU id.
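A filled-in sketch of the inference call. All values below are hypothetical assumptions (the style labels in particular depend on how the data prep module named the two styles):

```shell
# Hypothetical values (assumptions): adapt every variable to your setup.
input_file=test.en            # raw text, one sentence per line
jobname=politeness_run1       # unique identifier for this job
handle=politeness             # same handle used during training
style_0_label=style0          # assumption: label strings from data prep
style_1_label=style1
base_folder=data/politeness
device=0                      # GPU id
# Guarded so the snippet is a no-op outside a repository checkout.
if [ -f scripts/inference.sh ]; then
  bash scripts/inference.sh "${input_file}" "${jobname}" \
      tagged generated \
      "${handle}" \
      "${style_0_label}" "${style_1_label}" \
      "${base_folder}" "${device}"
fi
```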
- To run evaluation:

  ```
  bash run_context_eval.sh {hypothesis_filepath} {reference_filepath}
  ```

  Where:
  - `hypothesis_filepath`: The full path to the transferred output from the trained model (the hypothesis).
  - `reference_filepath`: The full path to the ideal output (for BLEU-r) or the original input file (for BLEU-s).
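For example, with hypothetical paths (assumptions), BLEU-r scores the hypothesis against references, while BLEU-s would pass the original input file as the second argument instead:

```shell
# Hypothetical file paths (assumptions); use your own job outputs.
hypothesis_filepath=outputs/politeness_run1/transferred.txt
reference_filepath=data/politeness/test.reference.txt
# Guarded so the snippet is a no-op outside a repository checkout.
if [ -f run_context_eval.sh ]; then
  bash run_context_eval.sh "${hypothesis_filepath}" "${reference_filepath}"
fi
```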
The trained models can be found here.
- The code for evaluation has been partially borrowed from https://github.com/Maluuba/nlg-eval
- Most of the code for the training pipeline has been borrowed from https://github.com/pmichel31415/jsalt-2019-mt-tutorial