mo-arvan/clinical-nlp

Clinical NLP

Transfer Learning is not a Silver Bullet: A Case Study on Medical Relation Extraction

Datasets

i2b2 2010

This dataset provides a corpus of assertions in clinical discharge summaries. The task is split into six classes, namely present, possible, absent, hypothetical, conditional and associated with someone else. However, the distribution is highly skewed, such that only 6% of the assertions belong to the latter three classes. Hence we only use the present, possible, and absent assertions for our evaluation as they present the most important information for doctors.

From [1].
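The class restriction described above can be sketched as a simple label filter. This is a minimal illustration, not the repository's actual preprocessing: the record layout (`span`, `label` keys), the label strings, and the `filter_assertions` helper are assumptions, and the toy records are made up rather than taken from i2b2.

```python
from collections import Counter

# The three assertion classes kept for evaluation, per the description above.
KEPT_LABELS = {"present", "possible", "absent"}


def filter_assertions(records):
    """Keep only records whose label is one of the three evaluated classes."""
    return [r for r in records if r["label"] in KEPT_LABELS]


# Hypothetical toy records (assumed layout; not real i2b2 data).
records = [
    {"span": "chest pain", "label": "present"},
    {"span": "pneumonia", "label": "possible"},
    {"span": "fever", "label": "absent"},
    {"span": "diabetes", "label": "associated_with_someone_else"},
    {"span": "bleeding", "label": "hypothetical"},
]

kept = filter_assertions(records)

# Checking the label distribution after filtering is a cheap sanity check
# on a skewed dataset.
label_counts = Counter(r["label"] for r in kept)
```

Counting labels after the filter makes the remaining class balance visible before any training or evaluation.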

BioScope

This is a corpus of assertions in biomedical publications. It was specifically curated for the study of negation and speculation (or absent and possible in this paper) scope and does not contain present annotations. The BioScope dataset does not completely match the information need of health professionals and the i2b2 corpus lacks varied medical text types.

From [1].
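To evaluate on BioScope alongside i2b2, the two label vocabularies need to be aligned. The sketch below follows the correspondence stated above (negation as absent, speculation as possible). The cue-type strings and the `to_assertion_label` helper are assumptions for illustration, not the repository's code.

```python
# Assumed mapping from BioScope cue types to i2b2-style assertion labels.
# BioScope has no "present" annotations, so that class has no source type here.
BIOSCOPE_TO_ASSERTION = {
    "negation": "absent",
    "speculation": "possible",
}


def to_assertion_label(cue_type):
    """Map a BioScope cue type to an assertion label; raise on unknown types."""
    key = cue_type.strip().lower()
    if key not in BIOSCOPE_TO_ASSERTION:
        raise ValueError(f"Unknown BioScope cue type: {cue_type!r}")
    return BIOSCOPE_TO_ASSERTION[key]


mapped = [to_assertion_label(t) for t in ["negation", "Speculation"]]
```

Raising on unknown types, rather than silently defaulting to a class, keeps annotation mismatches visible.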

MIMIC-III

MIMIC-III provides texts from discharge summaries as well as other clinical notes (physician letters, nurse letters, and radiology reports), representing a promising source of varied medical text. Therefore, two annotators followed the annotation guidelines from the i2b2 challenge and labelled 5,000 assertions, i.e., word spans of entities and their corresponding present / possible / absent class.

From [1].

*SEM 2012 - Sherlock

Taken from stories by Sir Arthur Conan Doyle (literary work)

SFU Review Corpus

A collection of product reviews (free text by human users)

References

1
