Source code of Unisound-NER project. Some important information has been ignored.

StevenZhaoo/Unisound-NER

Latest commit

7428218 · Oct 12, 2021

Unisound Chinese Medical Named Entity Recognition

Author: StevenChaoo

This README was written with Neovim and Visual Studio Code. To read it as intended, clone this repository and open it in Visual Studio Code with the Markdown Preview Enhanced plugin installed. All code is written in Python.

Quick links

Results

We ran the experiments on Ubuntu 16.04 with an Intel Xeon E5-2620 CPU @ 3.2GHz and a GeForce GTX 1080 Ti 12GB. The Python and Anaconda versions are 3.7.11 and 4.10.1, respectively.

Multi-label prediction:

 -------------------------------------------------
|      TOTAL      | p=81.388% r=83.226% f=82.297% |
|-----------------|-------------------------------|
|     DISEASE     | p=82.809% r=91.200% f=86.802% |
|     PURPOSE     | p=76.190% r=86.486% f=81.013% |
|   PERSON_GROUP  | p=66.667% r=45.455% f=54.054% |
|    CONDITION    | p=80.392% r=63.077% f=70.690% |
|     SYMPTOM     | p=83.784% r=71.264% f=77.019% |
| INAPPLICABILITY | p=100.00% r=100.00% f=100.00% |
 -------------------------------------------------

Disease-label prediction:

 -------------------------------------
|     DISEASE     | p=89% r=87% f=88% |
 -------------------------------------
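
The p, r, and f columns above are presumably precision, recall, and F1 computed over entities. This is an assumption, since the evaluation script is not reproduced here. A minimal sketch of that arithmetic, with hypothetical function names:

```python
from typing import Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf_from_counts(n_correct: int, n_predicted: int, n_gold: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from entity counts.

    n_correct:   predicted entities that exactly match a gold entity
    n_predicted: all predicted entities
    n_gold:      all gold entities
    """
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    return precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    # Sanity check against the TOTAL row: p=81.388%, r=83.226%.
    print(f"{f1_score(0.81388, 0.83226):.3%}")
```

Plugging the TOTAL row's precision and recall into `f1_score` gives roughly 82.297%, which agrees with the reported f value.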

Setup

Install dependencies

Please install all the dependency packages using the following command:

pip install -r requirements.txt

Preprocess the datasets

Place train.txt and test.txt in /data/raw/, formatted as follows, with one character and its tag per line:

...
预 B-PURPOSE
防 E-PURPOSE
和 O
治 B-PURPOSE
疗 E-PURPOSE
癌 B-CONDITAION
症 I-CONDITAION
化 I-CONDITAION
疗 E-CONDITAION
引 O
起 O
...
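
As an illustration of this format, the following sketch reads such lines into (character, tag) pairs. The function name `parse_tagged_lines` is hypothetical and is not part of this repository. It assumes the character and tag are separated by whitespace and that blank lines carry no token:

```python
from typing import Iterable, List, Tuple


def parse_tagged_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Turn 'char TAG' lines into a list of (char, tag) pairs."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            # Assumption: blank lines are separators and hold no token.
            continue
        # Split on the last run of whitespace: left is the character, right is the tag.
        char, tag = line.rsplit(maxsplit=1)
        pairs.append((char, tag))
    return pairs


if __name__ == "__main__":
    sample = ["预 B-PURPOSE", "防 E-PURPOSE", "和 O"]
    for char, tag in parse_tagged_lines(sample):
        print(char, tag)
```

To read a real file, pass an open file handle, for example `parse_tagged_lines(open("data/raw/train.txt", encoding="utf-8"))`.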

The directory /data/dis/ contains data annotated with only the DISEASE label:

...
儿 O
童 O
的 O
支 B-DISEASE
气 I-DISEASE
管 I-DISEASE
哮 I-DISEASE
喘 E-DISEASE
...
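
The tags follow a B/I/E/O scheme: B opens an entity, I continues it, E closes it, and O marks characters outside any entity. The sketch below shows one way to recover entity spans from such a sequence. `extract_entities` is a hypothetical helper, not the repository's code. It assumes every entity has both a B and an E tag, and it silently drops malformed sequences:

```python
from typing import List, Sequence, Tuple


def extract_entities(chars: Sequence[str], tags: Sequence[str]) -> List[Tuple[str, str, int, int]]:
    """Return (label, text, start, end) for each well-formed B..E span (end inclusive)."""
    entities = []
    start, label = None, None  # state of the currently open entity, if any
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        if prefix == "B":
            start, label = i, name  # open a new entity
        elif prefix == "I":
            if start is None or name != label:
                start, label = None, None  # malformed continuation: drop it
        elif prefix == "E":
            if start is not None and name == label:
                entities.append((label, "".join(chars[start:i + 1]), start, i))
            start, label = None, None  # close (or discard) the entity
        else:
            start, label = None, None  # "O" ends any unfinished entity
    return entities


if __name__ == "__main__":
    chars = list("儿童的支气管哮喘")
    tags = ["O", "O", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE", "I-DISEASE", "E-DISEASE"]
    print(extract_entities(chars, tags))
```

On the DISEASE sample above, this yields a single entity covering 支气管哮喘 at character positions 3 to 7.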

Quick Start

The following command runs our pre-trained model on the data in /data/. To fine-tune the pre-trained model on an extra dataset, pass --mode=finetune instead:

python core/bert.py \
    --save \
    --mode=train \
    --pretrained_bert_path=model/roberta_wwm_ext_large \
    --saved_path=model \
    --model_path=model \
    --trainset_path="['./data/raw/train.txt']" \
    --testset_path=./data/raw/test.txt \
    --cuda=0 \
    --batch_size=2
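
Note that --trainset_path is passed as a quoted Python list literal, while --testset_path is a plain path. How core/bert.py actually parses this argument is not shown here. One plausible approach is `ast.literal_eval`, sketched below. The helper name `parse_path_list` is an assumption for illustration:

```python
import argparse
import ast
from typing import List


def parse_path_list(value: str) -> List[str]:
    """Parse a string such as "['a.txt', 'b.txt']" into a list of path strings."""
    try:
        paths = ast.literal_eval(value)  # safely evaluates literals only, no code execution
    except (ValueError, SyntaxError):
        raise argparse.ArgumentTypeError(f"not a Python list literal: {value!r}")
    if not isinstance(paths, list) or not all(isinstance(p, str) for p in paths):
        raise argparse.ArgumentTypeError(f"expected a list of strings: {value!r}")
    return paths


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--trainset_path", type=parse_path_list, required=True)
    # Simulates the shell invocation; the outer double quotes are consumed by the shell.
    args = parser.parse_args(["--trainset_path=['./data/raw/train.txt']"])
    print(args.trainset_path)
```

Under this reading, several training files could be supplied by adding more quoted paths inside the brackets.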

The output files are stored in /pred/. Use the following command to post-process them:

python core/compose.py \
    --path={OUTPUT FILE}

To evaluate other results, rename the result file to BIEO.txt and run the evaluation script:

mv {OTHER RESULT} BIEO.txt
python core/evaluate.py

Alternatively, predict only the DISEASE label with the lightweight and efficient CRF model:

python core/run_crf.py
