forked from UB-Mannheim/cas2iob

A converter of UIMA CAS XMI files exported from INCEpTION into IOB TSV files with nested NER/NEL tags and components

CAS2IOB

CAS2IOB converts UIMA CAS XMI files exported from the INCEpTION annotation platform into IOB TSV files. Unlike the built-in converter in INCEpTION, it handles nested NER tags, NEL tags and components, and saves them in separate columns of a TSV file:

TOKEN  NE-COARSE   NE-FINE NE-FINE-COMP    NE-NESTED   NEL-WikidataQID

It reads the UIMA CAS XMI files using the dkpro-cassis library.
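To illustrate the column layout, here is a minimal sketch that parses such a TSV file with Python's standard `csv` module. The column names come from the list above. The sample rows, the tag values, the `_` placeholder and the QID `Q0000` are hypothetical and only stand in for real output. The sketch also assumes the first line is a header row with the column names.

```python
import csv
import io

# Column names as listed above.
COLUMNS = ["TOKEN", "NE-COARSE", "NE-FINE", "NE-FINE-COMP", "NE-NESTED", "NEL-WikidataQID"]

# Hypothetical sample content; real cas2iob output may use different tag values.
SAMPLE_TSV = "\n".join([
    "\t".join(COLUMNS),
    "\t".join(["Visit", "O", "O", "O", "O", "_"]),
    "\t".join(["Mannheim", "B-LOC", "B-LOC.city", "O", "O", "Q0000"]),
    "\t".join(["today", "O", "O", "O", "O", "_"]),
]) + "\n"


def read_iob_tsv(handle):
    """Return one dict per token, keyed by the column names in the header row."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


rows = read_iob_tsv(io.StringIO(SAMPLE_TSV))

# Keep only the tokens that carry a coarse entity tag.
entity_tokens = [row["TOKEN"] for row in rows if row["NE-COARSE"] != "O"]
print(entity_tokens)  # ['Mannheim']
```

For a file on disk, pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` object.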

Installation

pip install cas2iob

Using as a library

Import cas2iob:

import cas2iob

Convert ./input.xmi with ./TypeSystem.xml into ./output.tsv:

cas2iob.file('./input.xmi', './output.tsv')

Convert all files in ./input folder with ./TypeSystem.xml into ./output folder:

cas2iob.folder('./input', './output')

If ./TypeSystem.xml is located in a different folder, add it to the commands above as the third argument.

If you don't want to include column names in the TSV file, add the fourth argument metadata=False.
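The optional arguments can be combined in one call. The sketch below is a hypothetical wrapper, not part of cas2iob: it picks `cas2iob.folder` or `cas2iob.file` depending on whether the input path is a directory, and passes the type system path as the third positional argument and `metadata` as a keyword, as described above. The wrapper name `convert` and its `backend` parameter are assumptions introduced for illustration. The exact signatures of `cas2iob.file` and `cas2iob.folder` are inferred from this README only.

```python
from pathlib import Path


def convert(input_path, output_path, typesystem="./TypeSystem.xml",
            metadata=True, backend=None):
    """Dispatch to cas2iob.folder or cas2iob.file (hypothetical helper)."""
    if backend is None:
        # Imported lazily so the helper can be defined without cas2iob installed.
        import cas2iob as backend
    # Directories go to folder(); everything else is treated as a single file.
    func = backend.folder if Path(input_path).is_dir() else backend.file
    # Third positional argument: type system path; metadata passed by keyword.
    return func(str(input_path), str(output_path), str(typesystem),
                metadata=metadata)
```

For example, `convert('./input', './output', './schemas/TypeSystem.xml', metadata=False)` would convert a folder without writing column names, where `./schemas/TypeSystem.xml` is a placeholder path.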

Using in CLI

% cas2iob --help
                                                                                
 Usage: cas2iob [OPTIONS] INPUT_PATH OUTPUT_PATH [TYPESYSTEM_XML] [METADATA]    
                                                                                
╭─ Arguments ──────────────────────────────────────────────────────────────────╮
│ *    input_path          PATH              [default: None] [required]        │
│ *    output_path         PATH              [default: None] [required]        │
│      typesystem_xml      [TYPESYSTEM_XML]  [default: ./TypeSystem.xml]       │
│      metadata            [METADATA]        [default: True]                   │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --install-completion          Install completion for the current shell.      │
│ --show-completion             Show completion for the current shell, to copy │
│                               it or customize the installation.              │
│ --help                        Show this message and exit.                    │
╰──────────────────────────────────────────────────────────────────────────────╯
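When the CLI is driven from a script, the positional order in the usage line matters: `METADATA` can only be given after `TYPESYSTEM_XML`. The following sketch builds the argument list accordingly. The helper `build_command` is hypothetical, and that the CLI accepts the literal `False` for `METADATA` is an assumption based on the help output, not verified behavior.

```python
import shlex


def build_command(input_path, output_path, typesystem_xml=None, metadata=None):
    """Build a cas2iob argument list following the usage line (hypothetical helper)."""
    cmd = ["cas2iob", str(input_path), str(output_path)]
    # METADATA is the fourth positional argument, so TYPESYSTEM_XML must be
    # spelled out whenever METADATA is given; fall back to the documented default.
    if typesystem_xml is not None or metadata is not None:
        cmd.append(str(typesystem_xml or "./TypeSystem.xml"))
    if metadata is not None:
        cmd.append(str(metadata))  # assumed to be parsed as a boolean by the CLI
    return cmd


# './schemas/TypeSystem.xml' is a placeholder path.
cmd = build_command("./input", "./output", "./schemas/TypeSystem.xml", metadata=False)
print(shlex.join(cmd))  # cas2iob ./input ./output ./schemas/TypeSystem.xml False
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once cas2iob is installed.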

Archived code

Shigapov, Renat. (2023). CAS2IOB: A converter of UIMA CAS XMI files with nested NER tags, NEL tags and components into IOB TSV files. Zenodo. https://doi.org/10.5281/zenodo.8420111
