formatconversion

files related to converting between different corpus formats

Conversion tools by our group

without annotations (input format)

CORD-19 json to BioBERT input format

https://github.com/AnttonLA/BINP37/tree/master/dataset_generation
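The BINP37 converter linked above is the one we use. The sketch below is only a rough illustration of the same idea. It assumes the BioBERT NER input layout is one `token<TAB>label` pair per line with a blank line between sentences, and that unannotated text simply gets the label `O`. It flattens the title, abstract and body paragraphs of a CORD-19 record (`metadata.title`, `abstract[].text`, `body_text[].text`). The regex tokenizer, the naive sentence splitter and the function names are illustrative assumptions and are not taken from that repository.

```python
import re

# Hypothetical helpers for illustration; not taken from the BINP37 repository.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")  # words, or single punctuation marks
SENT_RE = re.compile(r"(?<=[.!?])\s+")  # naive sentence split after ., ! or ?


def cord19_paragraphs(paper: dict) -> list:
    """Return the non-empty title, abstract and body paragraphs of one CORD-19 JSON record."""
    texts = [paper.get("metadata", {}).get("title", "")]
    for section in ("abstract", "body_text"):
        texts.extend(par.get("text", "") for par in paper.get(section, []))
    return [t for t in texts if t.strip()]


def to_token_lines(paper: dict) -> str:
    """Emit one 'token<TAB>O' line per token and a blank line after each sentence."""
    lines = []
    for paragraph in cord19_paragraphs(paper):
        for sentence in SENT_RE.split(paragraph.strip()):
            tokens = TOKEN_RE.findall(sentence)
            if tokens:
                lines.extend(f"{tok}\tO" for tok in tokens)
                lines.append("")  # sentence separator
    return "\n".join(lines)
```

For example, `to_token_lines(json.load(open("paper.json")))` returns the text to write to the input file. BioBERT applies its own WordPiece tokenization on top of these word tokens.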

with annotations (output format)

BioC xml to BioC json and vice versa

https://github.com/Aitslab/BioNLP/blob/master/formatconversion/BioCxml-BioCjson.ipynb
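The notebook above handles both directions. For the XML-to-JSON direction, a minimal standard-library sketch could look like the following. It assumes the JSON key layout used by the `bioc` Python package (`documents`, `passages`, `annotations`, and `locations` with `offset` and `length`). Sentences and relations are left out to keep it short, and the function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def _infons(elem: ET.Element) -> dict:
    """Collect the <infon key="...">value</infon> children of an element."""
    return {i.get("key"): i.text or "" for i in elem.findall("infon")}


def _annotation(ann: ET.Element) -> dict:
    return {
        "id": ann.get("id"),
        "infons": _infons(ann),
        "text": ann.findtext("text", default=""),
        "locations": [
            {"offset": int(loc.get("offset")), "length": int(loc.get("length"))}
            for loc in ann.findall("location")
        ],
    }


def _passage(p: ET.Element) -> dict:
    return {
        "offset": int(p.findtext("offset", default="0")),
        "infons": _infons(p),
        "text": p.findtext("text", default=""),  # direct <text> child only
        "annotations": [_annotation(a) for a in p.findall("annotation")],
    }


def bioc_xml_to_json(xml_string: str) -> dict:
    """Map a BioC XML <collection> onto nested dicts (sentences and relations omitted)."""
    root = ET.fromstring(xml_string)
    return {
        "source": root.findtext("source", default=""),
        "date": root.findtext("date", default=""),
        "key": root.findtext("key", default=""),
        "infons": _infons(root),
        "documents": [
            {
                "id": doc.findtext("id", default=""),
                "infons": _infons(doc),
                "passages": [_passage(p) for p in doc.findall("passage")],
            }
            for doc in root.findall("document")
        ],
    }


def bioc_xml_to_json_string(xml_string: str) -> str:
    return json.dumps(bioc_xml_to_json(xml_string), indent=2, ensure_ascii=False)
```

Passing the file contents as bytes (`open(path, "rb").read()`) lets ElementTree honour the encoding declared in the XML header.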

BioBERT output format to PubAnnotation format

https://github.com/AnttonLA/BINP37/tree/master/output_generation
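A minimal sketch of this conversion follows. It assumes the BioBERT NER predictions are available as word-level `(token, label)` pairs with `B`/`I`/`O` labels (optionally typed, as in `B-Gene`), and that the original text is at hand to recover character offsets. Each entity becomes a PubAnnotation `denotation` with a `span` (`begin`/`end`) and an `obj` holding the entity type. The function name and the `default_type` parameter are hypothetical and are not taken from the BINP37 code.

```python
def bio_to_pubannotation(text, tagged_tokens, default_type="Entity",
                         sourcedb=None, sourceid=None):
    """Convert (token, BIO label) pairs over `text` into a PubAnnotation-style dict."""
    spans = []      # finished entities as (begin, end, type)
    current = None  # [begin, end, type] of the entity being built
    cursor = 0      # search position, so repeated tokens map to the right occurrence
    for token, label in tagged_tokens:
        begin = text.find(token, cursor)
        if begin < 0:
            raise ValueError(f"token {token!r} not found after offset {cursor}")
        end = begin + len(token)
        cursor = end
        prefix, _, etype = label.partition("-")
        etype = etype or default_type
        starts_new = prefix == "B" or (
            prefix == "I" and (current is None or current[2] != etype))
        if starts_new:
            if current is not None:
                spans.append(tuple(current))
            current = [begin, end, etype]
        elif prefix == "I":
            current[1] = end  # extend the open entity
        else:
            # "O" (or any unknown label) closes the open entity
            if current is not None:
                spans.append(tuple(current))
            current = None
    if current is not None:
        spans.append(tuple(current))

    doc = {
        "text": text,
        "denotations": [
            {"id": f"T{i}", "span": {"begin": b, "end": e}, "obj": t}
            for i, (b, e, t) in enumerate(spans, start=1)
        ],
    }
    if sourcedb:
        doc["sourcedb"] = sourcedb
    if sourceid:
        doc["sourceid"] = sourceid
    return doc
```

An `I` label that follows `O`, or that changes the entity type, is treated here as the start of a new entity rather than as an error; stricter handling may be preferable when evaluating predictions.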

additional converters

https://github.com/Aitslab/BioNLP/tree/master/antton/formatting

Conversion tools by others

BioC convert

http://sourceforge.net/projects/bioc/files/BioCconvert-0.1.tar.gz/download

Comment: so far we have not been able to get this to work

Brat2BioC

https://bitbucket.org/nicta_biomed/brat2bioc

DKPro-core

converts between many formats, including CoNLL and PubAnnotation

https://github.com/dkpro/dkpro-core

INCEpTION

https://github.com/inception-project

Standoff2conll

converts brat standoff format to CoNLL format

https://github.com/spyysalo/standoff2conll
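standoff2conll is the tool to use for this. The sketch below only illustrates the core mapping from brat standoff offsets to BIO tags. It assumes a single `.txt`/`.ann` pair, a regex tokenizer, and one sentence per line of text. Only text-bound (`T`) annotations are read, a discontinuous span is widened to its full extent, and when entities overlap, the one with the earliest start offset wins. These are simplifications of this sketch; standoff2conll may handle such cases differently. The function names are hypothetical.

```python
import re

TOKEN_RE = re.compile(r"\w+|[^\w\s]")  # words, or single punctuation marks


def parse_textbounds(ann: str) -> list:
    """Read text-bound annotations such as 'T1<TAB>Protein 0 4<TAB>ACE2' from .ann content."""
    spans = []
    for line in ann.splitlines():
        if not line.startswith("T"):
            continue  # relations (R), events (E), attributes (A), notes (#) are ignored
        _, type_and_offsets, _ = line.split("\t", 2)
        etype, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous spans look like "0 4;11 14"; this sketch widens them to 0-14.
        fragments = [tuple(map(int, frag.split())) for frag in offsets.split(";")]
        begin = min(start for start, _ in fragments)
        end = max(stop for _, stop in fragments)
        spans.append((begin, end, etype))
    return sorted(spans)


def standoff_to_conll(text: str, ann: str) -> str:
    """Return 'token<TAB>BIO-label' lines, with a blank line after each line of text."""
    spans = parse_textbounds(ann)
    out = []
    for line in re.finditer(r"[^\n]+", text):
        rows = []
        for tok in TOKEN_RE.finditer(line.group()):
            begin, end = line.start() + tok.start(), line.start() + tok.end()
            label = "O"
            for s_begin, s_end, etype in spans:
                if begin < s_end and end > s_begin:  # token overlaps this entity
                    label = ("B-" if begin <= s_begin else "I-") + etype
                    break  # earliest-starting entity wins
            rows.append(f"{tok.group()}\t{label}")
        if rows:
            out.extend(rows)
            out.append("")  # sentence separator
    return "\n".join(out)
```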

Format documentation and example files

Comparison of PubTator, BioC and PubAnnotation formats

https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/Format.html

BioC (xml)

http://bioc.sourceforge.net/

Reference: https://academic.oup.com/database/article/doi/10.1093/database/bat064/341301

BioNLP Shared Task format = brat standoff format (text) (not needed by us for now)

http://2013.bionlp-st.org/file-formats

http://2011.bionlp-st.org/home/file-formats

http://www.nactem.ac.uk/tsujii/GENIA/SharedTask/detail.shtml#format

http://brat.nlplab.org/standoff.html

https://github.com/nlplab/brat/wiki/Annotation-Data-Format

CORD-19 (json)

https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge#json_schema.txt

PubAnnotation (json)

http://www.pubannotation.org/docs/annotation-format/

Reference: https://dl.acm.org/doi/10.5555/2391123.2391150

PubTator (text)

https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/tutorial/

Reference: https://academic.oup.com/nar/article/41/W1/W518/1105731 and https://academic.oup.com/database/article/doi/10.1093/database/bas041/438535

Example files:

https://github.com/chanzuckerberg/MedMentions
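MedMentions is distributed in PubTator format. Below is a minimal parser sketch that assumes the usual layout: a `PMID|t|title` line, a `PMID|a|abstract` line, then tab-separated annotation rows (PMID, start, end, mention, type, identifier), with a blank line between documents. It also assumes the offsets count over the title and abstract joined by a single space, which the `text` property reconstructs. Relation rows and other row shapes are skipped. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field

HEADER_RE = re.compile(r"^([^|\t]+)\|([ta])\|(.*)$")  # "PMID|t|title" or "PMID|a|abstract"


@dataclass
class PubTatorDoc:
    pmid: str
    title: str = ""
    abstract: str = ""
    # (begin, end, mention, type, identifier) tuples
    annotations: list = field(default_factory=list)

    @property
    def text(self) -> str:
        # Assumption: offsets are counted over title + " " + abstract.
        return f"{self.title} {self.abstract}"


def parse_pubtator(data: str) -> list:
    """Split PubTator text into documents separated by blank lines."""
    docs = []
    for block in re.split(r"\n\s*\n", data.strip()):
        doc = None
        for line in block.splitlines():
            header = HEADER_RE.match(line)
            if header:
                pmid, kind, content = header.groups()
                if doc is None:
                    doc = PubTatorDoc(pmid)
                if kind == "t":
                    doc.title = content
                else:
                    doc.abstract = content
                continue
            cols = line.split("\t")
            # Annotation rows: PMID, start, end, mention, type[, identifier]
            if doc is not None and len(cols) >= 5 and cols[1].isdigit() and cols[2].isdigit():
                identifier = cols[5] if len(cols) > 5 else ""
                doc.annotations.append((int(cols[1]), int(cols[2]), cols[3], cols[4], identifier))
            # Anything else (e.g. relation rows) is skipped in this sketch.
        if doc is not None:
            docs.append(doc)
    return docs
```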

Universal Dependencies = CoNLL-U

https://universaldependencies.org/format.html
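A minimal CoNLL-U reader sketch following the layout described on that page: sentences are separated by blank lines, comment lines start with `#`, and each word line has ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). Multiword-token ranges (IDs like `1-2`) and empty nodes (IDs like `5.1`) are kept as ordinary rows here. The function name is hypothetical.

```python
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def parse_conllu(data: str) -> list:
    """Return sentences as {'comments': [...], 'tokens': [{field: value, ...}, ...]}."""
    sentences = []
    current = {"comments": [], "tokens": []}
    for line in data.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current["tokens"]:
                sentences.append(current)
            current = {"comments": [], "tokens": []}
        elif line.startswith("#"):
            current["comments"].append(line)
        else:
            cols = line.split("\t")
            if len(cols) != len(FIELDS):
                raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}: {line!r}")
            current["tokens"].append(dict(zip(FIELDS, cols)))
    if current["tokens"]:  # the file may lack a final blank line
        sentences.append(current)
    return sentences
```

For example, `[t["form"] for t in s["tokens"] if t["id"].isdigit()]` lists only the syntactic words of a sentence `s`, skipping multiword-token ranges and empty nodes.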