A dataset survey about task-oriented dialogue, including information about recent datasets.
Name | Introduction | Multi/Single Turn | Task | Task Detail | Publicly Accessible | Links | Size & Stats | Included Labels | Missing Labels |
---|---|---|---|---|---|---|---|---|---|
Dialog bAbI tasks data | 1. Facebook's task-oriented dialogue dataset, consisting of 6 different tasks. 2. The data for tasks 1-5 is constructed automatically from bot-to-bot chats (Bot2Bot); the data for task 6 is simply the DSTC2 dataset, reformatted. 3. A shared database is included. 4. This is the only task-oriented dataset among the bAbI tasks. 5. Its goal is to evaluate end-to-end systems, so there are no intent or slot labels. | M | Task Oriented | Book a table at a restaurant | Yes | Download: https://research.fb.com/downloads/babi/ Paper: http://arxiv.org/abs/1605.07683 | For each task: training 1000, development 1000, test 1000. For tasks 1-5, there is a second test set (with suffix -OOV.txt) that contains dialogs including entities not present in the training set. | API call; Full database | Slot; Intent; User Act; Agent Act |
Stanford Dialog Dataset | 1. The Stanford NLP group's data for a car autopilot agent. 2. Human2Human 3. A quick intro: http://m.sohu.com/n/499803391/ | M | Task Oriented | Car autopilot agent: schedule, weather, navigation | Yes | Download: http://nlp.stanford.edu/projects/kvret/kvret_dataset_public.zip Paper: https://arxiv.org/abs/1705.05414 | Training dialogues 2,425; Validation dialogues 302; Test dialogues 304; Avg. # of utterances per dialogue 5.25 | Dialogue-level database; User Act (inform, request slots); Agent Act (inform, request slots) | API call; Intent; Slot |
Stanford Dialog Dataset Labeled | 1. The Stanford data labeled by us, with slots and intents relabeled. 2. Human2Human 3. A quick intro to the Stanford data: http://m.sohu.com/n/499803391/ 4. Annotation handbook: https://docs.google.com/document/d/1ROARKf8AJNnG2_nPINe1Xm5Rza7V0jPnQV8io09hcFY/edit | M | Task Oriented | Car autopilot agent: schedule, weather, navigation | No | N/A | Training dialogues 2,425; Validation dialogues 302; Test dialogues 304; Avg. # of utterances per dialogue 5.25 | Slot; Intent | API call. Sample alignment is needed to obtain the following: Dialogue-level database; User Act (inform, request slots); Agent Act (inform, request slots); Agent reply |
Lingxi data (灵犀数据) | 1. All data consists of single-turn user inputs that have already been word-segmented; it is relatively noisy. 2. Part-of-speech tagging and slot labeling are completed. 3. Language: Chinese | S | Task Oriented | User logs from a conversational robot service | No | N/A | Utterances: 5132 | Slot; POS | Agent reply; Intent; API call; Database |
DSTC-2 | 1. Human2Bot restaurant booking dataset. 2. For usage, refer to the handbook: http://camdial.org/~mh521/dstc/downloads/handbook.pdf 3. Each dialogue is stored in a separate folder, which contains a log and a label file. | M | Task Oriented | Restaurant booking | Yes | http://camdial.org/~mh521/dstc/ | Train 1612 calls; Dev 506 calls; Test 1117 dialogs | Slot; User Act (inform, request slots); Agent Act (inform, request slots) | Intent; API call; Database |
CamRest676 | The CamRest676 Human2Human dataset contains the following three JSON files: 1. CamRest676.json: the WOZ dialogue dataset, which contains the conversations between users and wizards, as well as a set of coarse labels for each user turn. 2. CamRestDB.json: the Cambridge restaurant database file, containing restaurants in the Cambridge, UK area and a set of attributes. 3. The ontology file, specifying all the values the three informable slots can take. | M | Task Oriented | Restaurant booking | Yes | Download: https://www.repository.cam.ac.uk/handle/1810/260970 Paper: https://arxiv.org/abs/1604.04562 | Total 676 dialogues; Total 1500 turns; Train:Dev:Test 3:1:1 (test set not given) | Slot; User Act (inform, request slots); Agent Act (inform, request slots) | Intent; API call; Database |
Human-human goal-oriented dataset | 1. Maluuba released a travel booking dataset. 2. Designed for a new task: frame tracking (allows comparing between entities from the dialogue history). 3. Homepage: https://datasets.maluuba.com/Frames 4. Human2Human | M | Task Oriented | Travel booking | Yes | Download: https://datasets.maluuba.com/Frames/dl Paper: https://arxiv.org/abs/1706.01690 https://1drv.ms/b/s!Aqj1OvgfsHB7dsg42yp2BzDUK6U | Dialogues 1369; Turns 19986; Average user satisfaction (from 1-5) 4.58 | Frame; User agenda; User Act (inform, request slots); Agent Act (inform, request slots); API call; User satisfaction; Task success; Database; Entity reference | Intent |
DSTC4 | 1. The data, named TourSG, consists of 35 dialog sessions on tourist information for Singapore, collected from Skype calls between three tour guides and 35 tourists. 2. All the recorded dialogs, with a total length of 21 hours, have been manually transcribed and annotated with speech acts and semantic labels at the turn level. 3. Homepage: http://www.colips.org/workshop/dstc4/data.html 4. Human2Human | M | Task Oriented | Query tourist information | No | N/A | Train 20 dialogs; Test 15 dialogs | Speech act (User & Agent); Semantic labels (Intent? User & Agent); Topic for turn (Intent?) | N/A |
Movie Booking Dataset | 1. (Microsoft) Raw conversational data collected via Amazon Mechanical Turk, with annotations provided by domain experts. 2. Human2Human | M | Task Oriented | Movie booking | Yes | Download: https://github.com/MiuLab/TC-Bot#data Paper: TC-bot | 280 dialogues; approximately 11 turns per dialogue | User Act (inform, request slots); Agent Act (inform, request slots); Intent; Slots | Database; API call |
Microsoft Dialogue Challenge | Human-annotated conversational data in three domains (movie-ticket booking, restaurant reservation, and taxi booking), as well as an experiment platform with built-in simulators in each domain, for training and evaluation purposes. | M | Task Oriented | Movie-Ticket Booking; Restaurant Reservation; Taxi Ordering | Yes | Paper: https://arxiv.org/pdf/1807.11125.pdf | Movie-Ticket Booking: 11 intents, 29 slots, 2890 dialogues; Restaurant Reservation: 11 intents, 30 slots, 4103 dialogues; Taxi Ordering: 11 intents, 29 slots, 3094 dialogues | Intent; Slots | Database; API call |
MultiWOZ | EMNLP 2018 best paper; not released yet. | N/A | Task Oriented | N/A | N/A | Paper: MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling (not released yet) | N/A | N/A | N/A |