
[Under Progress] Code & Data for the AAAI 2020 Paper "Likelihood Ratios and Generative Classifiers For Unsupervised OOD Detection In Task-Based Dialog" - Varun Gangal, Abhinav Arora, Arash Einolghozati, Sonal Gupta


liushui9404/LR_GC_OOD

 
 


LR_GC_OOD

Code & Data for the AAAI 2020 Paper "Likelihood Ratios and Generative Classifiers For Unsupervised OOD Detection In Task-Based Dialog"

Data:
The ROSTD dataset of OOD points can be found under data/fbrelease
This tsv file contains ~4500 OOD examples. The 3rd field of each line contains the sentence.
This is the only field of interest; the other fields are vestigial and can be ignored.
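A minimal sketch for pulling the sentences out with Python's standard csv module follows. It assumes the file is UTF-8, tab-separated, has no header row and uses no quoting. The function names are hypothetical and are not part of this repo. The exact tsv filename under data/fbrelease is not given here, so the path in the usage note is a placeholder.

```python
import csv
from typing import Iterable, List


def parse_rostd_lines(lines: Iterable[str]) -> List[str]:
    """Return the 3rd tab-separated field of each line (hypothetical helper)."""
    # QUOTE_NONE: treat quote characters in sentences as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Skip blank or malformed rows that have fewer than three fields.
    return [row[2] for row in reader if len(row) >= 3]


def load_rostd_sentences(tsv_path: str) -> List[str]:
    """Read a ROSTD-style tsv file and return its sentences (assumes UTF-8)."""
    with open(tsv_path, encoding="utf-8", newline="") as f:
        return parse_rostd_lines(f)
```

Usage: `sentences = load_rostd_sentences("data/fbrelease/<tsv file>")`, where `<tsv file>` stands for the actual file name in that folder. If the file turns out to have a header row, drop the first element.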

Note that this OOD dataset is a companion to the ID dataset released as part of the paper "Cross-lingual transfer learning for multilingual task oriented dialog" by Schuster et al. at NAACL 2019.
This ID dataset can be found in its original form here.

Alternatively, you can directly use the splits we made (with ID train, and ID-OOD mixed validation and test) as described under the "Dataset Splits" section below.

Reference:
If you find our code or data useful, please consider citing our paper:

@article{gangal2019likelihood,
  title={Likelihood Ratios and Generative Classifiers for Unsupervised Out-of-Domain Detection In Task Oriented Dialog},
  author={Gangal, Varun and Arora, Abhinav and Einolghozati, Arash and Gupta, Sonal},
  journal={arXiv preprint arXiv:1912.12800},
  year={2019}
}

Contact:
For any questions or issues, either raise an issue here or drop an email at [email protected]

Code: [Under Progress]

Refer to requirements.txt for the Python package requirements. For other specifications, refer to other_specifications.txt.

Code Structure and TLDR:

code/util.py: Contains most of the argument specifications. Ignore arguments or argument groups with an "IGNORE" comment on top of them

code/train.py: Contains the training and inference mechanism

code/model.py: Specifies the architecture for most of the models, e.g., the Discriminative Classifier, Generative Classifier, etc.

code/oodmetrics.py: Code for computing the OOD-related metrics, such as AUROC
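For orientation, AUROC for OOD detection can be computed from one score per example in a mixed ID-OOD split. The sketch below is not the contents of code/oodmetrics.py. It assumes that higher scores mean "more likely OOD", treats OOD as the positive class, and uses the rank-based (Mann-Whitney U) formulation, in which tied scores count as half.

```python
from typing import Sequence


def auroc(id_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """AUROC with OOD as the positive class; higher score = more likely OOD.

    Equals P(score_ood > score_id) + 0.5 * P(score_ood == score_id),
    computed from average ranks (Mann-Whitney U statistic).
    """
    if len(id_scores) == 0 or len(ood_scores) == 0:
        raise ValueError("need at least one ID and one OOD score")
    # Pair each score with its label (1 = OOD, 0 = ID) and sort by score.
    scored = sorted([(s, 0) for s in id_scores] + [(s, 1) for s in ood_scores])
    rank_sum_ood = 0.0
    i = 0
    while i < len(scored):
        # Find the block scored[i..j] of tied scores.
        j = i
        while j + 1 < len(scored) and scored[j + 1][0] == scored[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by the tie block
        n_ood_in_block = sum(label for _, label in scored[i:j + 1])
        rank_sum_ood += avg_rank * n_ood_in_block
        i = j + 1
    n_ood, n_id = len(ood_scores), len(id_scores)
    u = rank_sum_ood - n_ood * (n_ood + 1) / 2
    return u / (n_ood * n_id)
```

Sorting once gives O(n log n) time, which avoids comparing every ID/OOD pair on splits with thousands of examples. A value of 1.0 means every OOD example outscores every ID example, and 0.5 is chance level.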

Please ignore code/model_gan.py and code/wasserstein.py. They play little to no role in the paper experiments; we have retained them only to avoid breaking existing imports.

Dataset Splits:

  • For fbrelease and fbreleasecoarse, you can directly find the ready-to-use dataset splits under code/data/{dataset_name}/unsup/ (dataset_name = fbrelease / fbreleasecoarse).
    These already contain the plain ID train split and the ID-OOD mixed dev and test splits.
    Note that only the OOD part of the fbrelease dev and test splits constitutes our own released data. The rest is formed from existing datasets.
  • For atis and snips, you will need to run a script that splits the data randomly, holding out a fraction of the classes as OOD.
    Run code/data/{dataset_name}/preprocess_{dataset_name}.sh for this (where dataset_name = atis / snips).
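The held-out-class idea can be illustrated with a short sketch. It is hypothetical and does not reproduce what preprocess_{dataset_name}.sh does. The (label, sentence) input format, the OOD class fraction, the dev/test proportions and the even division of OOD examples between dev and test are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple

Example = Tuple[str, str]  # (label, sentence)


def split_by_heldout_classes(
    examples: Sequence[Example],
    ood_fraction: float = 0.25,
    dev_frac: float = 0.1,
    test_frac: float = 0.1,
    seed: int = 0,
) -> Dict[str, List]:
    """Hold out a random fraction of classes as OOD (hypothetical sketch)."""
    rng = random.Random(seed)
    labels = sorted({label for label, _ in examples})
    if len(labels) < 2:
        raise ValueError("need at least two classes to hold some out")
    # Keep at least one class in-domain and hold out at least one class as OOD.
    n_ood = min(len(labels) - 1, max(1, round(ood_fraction * len(labels))))
    ood_labels = set(rng.sample(labels, n_ood))

    id_ex = [ex for ex in examples if ex[0] not in ood_labels]
    ood_ex = [ex for ex in examples if ex[0] in ood_labels]
    rng.shuffle(id_ex)
    rng.shuffle(ood_ex)

    # ID examples: carve out dev and test; everything else is ID train.
    n_dev = int(len(id_ex) * dev_frac)
    n_test = int(len(id_ex) * test_frac)
    id_dev = id_ex[:n_dev]
    id_test = id_ex[n_dev:n_dev + n_test]
    id_train = id_ex[n_dev + n_test:]

    # OOD examples never enter train; split them evenly between dev and test.
    half = len(ood_ex) // 2
    return {
        "train": id_train,
        "dev": id_dev + ood_ex[:half],
        "test": id_test + ood_ex[half:],
        "ood_labels": sorted(ood_labels),
    }
```

The property that matters for unsupervised OOD detection is that the training split contains no examples from the held-out classes. Those classes appear only in the mixed dev and test splits.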

Shell Scripts:

train_for_fbrelease.sh - Commands for fbrelease, i.e., ROSTD with its corresponding ID training set and validation sets

train_for_fbreleasecoarse.sh - Commands for fbreleasecoarse, i.e., ROSTD with its corresponding ID training set and validation sets, but with coarsened labels

train_for_atis.sh - Commands for atis

train_for_snips.sh - Commands for snips

Notes:

  • In all of these scripts, you will need to set super_root to the absolute path where the repo resides on your system. This is needed because we use torchtext to preprocess, build the vocabulary, load and minibatch our datasets, and we could only get it to work with absolute paths.
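If you call the code from Python rather than through the shell scripts, pathlib's `resolve()` offers one way to turn a possibly relative repo location into an absolute path. The helper below is hypothetical. It only mirrors the code/data/{dataset_name}/unsup/ layout described under "Dataset Splits" and does not check that the directory exists.

```python
from pathlib import Path


def unsup_split_dir(super_root: str, dataset_name: str) -> Path:
    """Absolute path to code/data/{dataset_name}/unsup/ under the repo root (hypothetical helper)."""
    # expanduser() expands "~"; resolve() makes the path absolute and normalizes it.
    root = Path(super_root).expanduser().resolve()
    return root / "code" / "data" / dataset_name / "unsup"
```

Usage: `unsup_split_dir(".", "fbrelease")` returns an absolute path ending in `code/data/fbrelease/unsup` when run from the repo root.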
