language: en [This needs to be a supported huggingface language code]
bigbio_language: English
license: apache-2.0 [this should be a supported huggingface license]
bigbio_license_shortname: APACHE_2p0
multilinguality: monolingual
pretty_name: SciTail
homepage:
bigbio_pubmed: false
bigbio_public: true
bigbio_tasks: TEXTUAL_ENTAILMENT
paperswithcode_id: scitail

Dataset Card for SciTail

Dataset Description

[This can be equal to the _DESCRIPTION attribute of the dataset you are implementing] The SciTail dataset is an entailment dataset created from multiple-choice science exams and web sentences. Each question and its correct answer choice are converted into an assertive statement to form the hypothesis. We use information retrieval to obtain relevant text from a large corpus of web sentences, and use these sentences as a premise P. We crowdsource the annotation of each such premise-hypothesis pair as supports (entails) or not (neutral) in order to create the SciTail dataset. The dataset contains 27,026 examples: 10,101 with the entails label and 16,925 with the neutral label.
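
As a rough illustration of the structure described above, the following Python sketch models one premise-hypothesis pair and checks that the quoted label counts add up to the quoted total. The class name, field names, and the sample sentences are assumptions made for illustration only; they are not the dataset's actual schema or contents.

```python
from dataclasses import dataclass

# Label names as given in the description above.
LABELS = ("entails", "neutral")

# Counts quoted in the description above.
REPORTED_COUNTS = {"entails": 10_101, "neutral": 16_925}
REPORTED_TOTAL = 27_026


@dataclass(frozen=True)
class PremiseHypothesisPair:
    """Hypothetical record layout; the field names are assumptions."""

    premise: str
    hypothesis: str
    label: str

    def __post_init__(self):
        # Reject anything other than the two labels named in the description.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


def label_shares(counts):
    """Return each label's fraction of the total number of examples."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Invented pair for illustration only; it is not taken from the dataset.
example = PremiseHypothesisPair(
    premise="Plants use sunlight to make their own food.",
    hypothesis="Plants produce food through photosynthesis.",
    label="neutral",
)

# The two per-label counts should sum to the reported total.
assert sum(REPORTED_COUNTS.values()) == REPORTED_TOTAL
shares = label_shares(REPORTED_COUNTS)
```

Computing the shares this way makes the class imbalance explicit: a little over a third of the examples carry the entails label, which is worth keeping in mind when interpreting accuracy figures.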

Citation Information

@inproceedings{scitail,
    author = {Tushar Khot and Ashish Sabharwal and Peter Clark},
    booktitle = {AAAI},
    title = {SciTail: A Textual Entailment Dataset from Science Question Answering},
    year = {2018}