HOIP dataset

Dataset for our BioNLP'24 paper titled "Mention-Agnostic Information Extraction for Ontological Annotation of Biomedical Articles".

Named entities are typically assumed to appear explicitly in text (such textual instances are called mentions), and entity features are derived based on the mentions. Mentions are strong indicators in information extraction tasks, since they directly indicate how entities are described in text. However, in real-world scenarios, important entities sometimes appear only implicitly.

To accelerate research on mention-agnostic information extraction, we introduce the HOIP dataset, a new biomedical dataset built on the Homeostasis Imbalance Process Ontology (HOIP), which focuses on understanding the mechanisms (courses) of COVID-19 infection.

The HOIP dataset consists of plain-text passages extracted from PubMed and Wikipedia articles that describe biomedical processes in the context of COVID-19 infectious courses. Each passage is a brief portion of an article that describes at least two specific processes.
The dataset annotates both entities and relation triples of the form (head entity, relation, tail entity).
Solving it requires inferring entities, and the relations between them, that are not explicitly described in the text, using background knowledge.
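
To make the triple format concrete, here is a minimal sketch of how (head entity, relation, tail entity) annotations could be represented, and how entities that take part in a triple without any explicit mention could be singled out. Everything in it is hypothetical: the `Triple` class, the helper names, and the placeholder identifiers are illustrative only and are not the dataset's actual schema or real HOIP ontology terms.

```python
from typing import Iterable, NamedTuple, Set


class Triple(NamedTuple):
    """Hypothetical container for one relation triple."""
    head: str      # identifier of the head entity (placeholder)
    relation: str  # relation label (placeholder)
    tail: str      # identifier of the tail entity (placeholder)


def entities_in(triples: Iterable[Triple]) -> Set[str]:
    """Collect every entity that appears as the head or tail of a triple."""
    found = set()
    for triple in triples:
        found.add(triple.head)
        found.add(triple.tail)
    return found


def implicit_entities(triples: Iterable[Triple], mentioned: Iterable[str]) -> Set[str]:
    """Return entities used in triples that have no explicit mention in the text."""
    return entities_in(triples) - set(mentioned)


# Placeholder annotations: ENTITY_C takes part in a relation but is never mentioned.
example_triples = [
    Triple("ENTITY_A", "RELATION_X", "ENTITY_B"),
    Triple("ENTITY_B", "RELATION_X", "ENTITY_C"),
]
example_mentioned = ["ENTITY_A", "ENTITY_B"]
print(implicit_entities(example_triples, example_mentioned))
```

In this toy example only `ENTITY_C` is reported as implicit, which is the kind of entity a mention-based extractor would have no textual span to anchor on.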

The following figure shows an example in the HOIP dataset along with the approach proposed in our paper.

For the details of the dataset, please see our paper.

The HOIP ontology itself is also available from the NCBO BioPortal ontology repository (https://bioportal.bioontology.org/ontologies/HOIP) and on GitHub (https://github.com/yuki-yamagata/hoip).

Directory structure

.
|-- README.md
|-- LICENSE
|-- releases/ # dataset
|   |-- v1/
|       |-- train.json
|       |-- dev.json
|       |-- test.json
|       |-- hoip_ontology.json
|-- construction/ # source code to generate the dataset
|-- docs/ # our paper and some figures
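
Given the layout above, the three splits can be read with the standard `json` module. The sketch below is a minimal example, not an official loader: `load_splits` and `describe` are hypothetical helper names, and since the internal schema of the JSON files is not documented here, the code only reports the top-level type and size of each file.

```python
import json
from pathlib import Path


def load_splits(release_dir, names=("train", "dev", "test")):
    """Parse <name>.json for each split name found under release_dir."""
    release_dir = Path(release_dir)
    splits = {}
    for name in names:
        path = release_dir / f"{name}.json"
        with path.open(encoding="utf-8") as f:
            splits[name] = json.load(f)
    return splits


def describe(obj):
    """Summarise a parsed JSON value without assuming anything about its schema."""
    if isinstance(obj, (list, dict)):
        return f"{type(obj).__name__} with {len(obj)} items"
    return type(obj).__name__
```

Run from the repository root, `load_splits("releases/v1")` returns a dict keyed by split name, and `describe(splits["train"])` gives a quick first look at the file before you inspect individual records.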

Citation

If you use the dataset, please cite this paper:

@inproceedings{khettari-etal-2024-mention,
    title={Mention-Agnostic Information Extraction for Ontological Annotation of Biomedical Articles},
    author={
        Khettari, Oumaima El and
        Nishida, Noriki and
        Liu, Shanshan and
        Munne, Rumana Ferdous and
        Yamagata, Yuki and
        Quiniou, Solen and
        Chaffron, Samuel and
        Matsumoto, Yuji
    },
    booktitle={The 23rd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks},
    month={August},
    year={2024},
    publisher={Association for Computational Linguistics},
    url={},
    doi={}
}
