This is the Mechanical Turk task for evaluating models trained on the Wizard of Wikipedia task.
As an example, we have one of the pre-trained models loaded inside the task. Please edit config
in run.py to swap out the model for one of yours.
To run the task with two humans speaking to each other, pass the flag --human-eval True.
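
The two ways of launching the task can be sketched as a small helper that assembles the command line. This is only an illustration: build_command is a hypothetical name, not part of the task, and it assumes run.py is started from the task directory with the current Python interpreter. Only the script name and the --human-eval True flag come from the text above.

```python
import shlex
import sys


def build_command(human_eval=False, script="run.py"):
    """Return the argument list for launching the task script.

    Hypothetical helper: assumes the script is run from the task directory.
    """
    cmd = [sys.executable, script]
    if human_eval:
        # Flag and value exactly as given in the text above.
        cmd += ["--human-eval", "True"]
    return cmd


if __name__ == "__main__":
    # Print the command rather than running it, so nothing is launched by accident.
    print(shlex.join(build_command(human_eval=True)))
```

To actually start the task, the returned list could be passed to subprocess.run(cmd, check=True) from the task directory.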