deepaknlp/MMQA

MMQA Dataset

Multi-domain English-Hindi Question Answering Dataset.

The dataset can be downloaded from here.

Details

This multilingual QA dataset was created from comparable documents in six domains (Tourism, History, Geography, Environment, Diseases and Economics). The resources are divided into three sub-resources:

  • Multi-domain Multi-lingual Question-Answer (MMQA): This dataset contains 5,495 question-answer pairs in English and Hindi (see our paper for details). The file “QA_Pairs.tsv” contains the dataset in tab-separated format.
  • Question classification dataset: This dataset comprises 1,022 English questions, each associated with a coarse and a fine class label. The file “Question_Classification_Data.tsv” contains the dataset in tab-separated format.
  • Comparable Corpora: This dataset contains 500 comparable documents in English and Hindi. The folder “Comparable Corpora” contains the dataset.
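
Since the two TSV files are described only as tab-separated, a minimal Python sketch for loading them could look like the following. The column layout, the presence or absence of a header row, and the UTF-8 encoding are assumptions, not facts stated in this README; `load_tsv` is a hypothetical helper name. Inspect the first few rows before relying on any column position.

```python
import csv
from pathlib import Path


def load_tsv(path, encoding="utf-8"):
    """Read a tab-separated file into a list of rows (each a list of strings).

    Assumptions (not confirmed by the README): the file is UTF-8 encoded,
    and quote characters inside questions or answers are literal text.
    """
    with open(path, newline="", encoding=encoding) as handle:
        # QUOTE_NONE keeps quotation marks in the text as ordinary characters.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        # Drop completely empty lines; keep every other row unchanged.
        return [row for row in reader if row]


if __name__ == "__main__":
    # File names are taken from the list above; they are expected
    # in the current working directory.
    for name in ("QA_Pairs.tsv", "Question_Classification_Data.tsv"):
        if Path(name).exists():
            rows = load_tsv(name)
            print(name, len(rows), "rows; first row:", rows[0] if rows else None)
        else:
            print(name, "not found in the current directory")
```

The sketch returns plain lists rather than named fields because the column names are not documented here; once the layout is confirmed, the rows can be mapped to whatever structure suits the task.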

Reference

If you use this resource, please cite our paper:

Gupta, Deepak, et al. "MMQA: A Multi-domain Multi-lingual Question-Answering Framework for English and Hindi." Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018). 2018.

@InProceedings{GUPTA18.826,
author = {Deepak Gupta and Surabhi Kumari and Asif Ekbal and Pushpak Bhattacharyya},
title = "{MMQA: A Multi-domain Multi-lingual Question-Answering Framework for English and Hindi}",
booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation
(LREC 2018)},
year = {2018},
month = {May 7-12, 2018},
address = {Miyazaki, Japan},
editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck
and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo
and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},
publisher = {European Language Resources Association (ELRA)},
isbn = {979-10-95546-00-9},
language = {english}
}

License

The MMQA dataset is distributed under the CC BY-NC-SA license.
