Unisound Chinese Medical Named Entity Recognition

Author: StevenChaoo

This document was written with Neovim and Visual Studio Code. To read it as intended, clone this repository to your local machine and open it in Visual Studio Code; the Markdown Preview Enhanced plugin is also required. All code is written in Python.

Quick links

Results
Setup
- Install dependencies
- Preprocess the datasets
Quick Start

Results

We ran our experiments on Ubuntu 16.04 with an Intel Xeon E5-2620 CPU @ 3.2GHz and a GeForce GTX 1080 Ti 12GB GPU. The Python and Anaconda versions are 3.7.11 and 4.10.1, respectively.

Multi-label prediction:

 -------------------------------------------------
|      TOTAL      | p=81.388% r=83.226% f=82.297% |
|-----------------|-------------------------------|
|     DISEASE     | p=82.809% r=91.200% f=86.802% |
|     PURPOSE     | p=76.190% r=86.486% f=81.013% |
|   PERSON_GROUP  | p=66.667% r=45.455% f=54.054% |
|    CONDITION    | p=80.392% r=63.077% f=70.690% |
|     SYMPTOM     | p=83.784% r=71.264% f=77.019% |
| INAPPLICABILITY | p=100.00% r=100.00% f=100.00% |
 -------------------------------------------------

Disease-label prediction:

 -------------------------------------
|     DISEASE     | p=89% r=87% f=88% |
 -------------------------------------
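The f values in the tables above are consistent with the usual F1 definition, the harmonic mean of precision and recall. As a minimal sketch (the function name f1_score is our own and is not taken from core/evaluate.py), the TOTAL row of the multi-label table can be checked like this:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# TOTAL row of the multi-label table, in percent.
total_f1 = f1_score(81.388, 83.226)
print(f"{total_f1:.3f}")  # 82.297
```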

Setup

Install dependencies

Please install all the dependency packages using the following command:

pip install -r requirements.txt

Preprocess the datasets

Please put train.txt and test.txt in /data/raw/, formatted as follows:

...
预 B-PURPOSE
防 E-PURPOSE
和 O
治 B-PURPOSE
疗 E-PURPOSE
癌 B-CONDITAION
症 I-CONDITAION
化 I-CONDITAION
疗 E-CONDITAION
引 O
起 O
...
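For illustration, a file in this format could be loaded with a small reader like the one below. This is only a sketch: the function name read_tagged_lines is hypothetical, and we assume that sentences are separated by blank lines, which the excerpt above does not show.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # one (character, tag) pair per line


def read_tagged_lines(lines: Iterable[str]) -> List[Sentence]:
    """Group "character tag" lines into sentences.

    Assumption: a blank line marks a sentence boundary.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            # Blank line: close the sentence collected so far, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {number}: expected 'char tag', got {raw!r}")
        current.append((parts[0], parts[1]))
    if current:
        sentences.append(current)
    return sentences


sample = ["预 B-PURPOSE", "防 E-PURPOSE", "和 O", "", "治 B-PURPOSE", "疗 E-PURPOSE"]
sentences = read_tagged_lines(sample)
print(len(sentences))  # 2
```

To read a real file, pass an open file handle instead of a list, for example open("data/raw/train.txt", encoding="utf-8").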

The directory /data/dis/ contains data with only the DISEASE label:

...
儿 O
童 O
的 O
支 B-DISEASE
气 I-DISEASE
管 I-DISEASE
哮 I-DISEASE
喘 E-DISEASE
...
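The tags in both excerpts follow a B/I/E/O pattern: B opens an entity, I continues it, E closes it, and O marks characters outside any entity. A minimal sketch of turning such a tag sequence back into entity spans is shown below. The function name extract_entities is hypothetical, and we assume there is no separate tag for single-character entities, since none appears in the excerpts.

```python
from typing import List, Sequence, Tuple

Entity = Tuple[str, str, int, int]  # (text, label, start, end), end exclusive


def extract_entities(chars: Sequence[str], tags: Sequence[str]) -> List[Entity]:
    """Collect spans that start with B-X and end with E-X for the same label X."""
    entities: List[Entity] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        if prefix == "B":
            # A new entity begins; any unfinished one is discarded.
            start, label = i, name
        elif prefix == "I" and start is not None and name == label:
            continue  # still inside the current entity
        elif prefix == "E" and start is not None and name == label:
            entities.append(("".join(chars[start:i + 1]), label, start, i + 1))
            start, label = None, None
        else:
            # "O" or an inconsistent tag: drop any open span.
            start, label = None, None
    return entities


chars = list("儿童的支气管哮喘")
tags = ["O", "O", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE", "I-DISEASE", "E-DISEASE"]
print(extract_entities(chars, tags))  # [('支气管哮喘', 'DISEASE', 3, 8)]
```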

Quick Start

The following command can be used to run our pre-trained model on the data in /data/. You can also fine-tune our pre-trained model on an additional dataset by setting --mode=finetune:

python core/bert.py \
    --save \
    --mode=train \
    --pretrained_bert_path=model/roberta_wwm_ext_large \
    --saved_path=model \
    --model_path=model \
    --trainset_path="['./data/raw/train.txt']" \
    --testset_path=./data/raw/test.txt \
    --cuda=0 \
    --batch_size=2
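Note that --trainset_path is passed as a quoted Python list literal, so more than one training file can be listed. How core/bert.py parses this flag is not shown here; one plausible way to handle such a value, sketched with the standard argparse and ast modules (the helper name parse_path_list is our own), is:

```python
import argparse
import ast
from typing import List


def parse_path_list(value: str) -> List[str]:
    """Turn a string such as "['a.txt', 'b.txt']" into a list of paths."""
    try:
        paths = ast.literal_eval(value)  # safely evaluates literals only
    except (ValueError, SyntaxError) as exc:
        raise argparse.ArgumentTypeError(f"not a list literal: {value!r}") from exc
    if not isinstance(paths, list) or not all(isinstance(p, str) for p in paths):
        raise argparse.ArgumentTypeError(f"expected a list of strings: {value!r}")
    return paths


parser = argparse.ArgumentParser()
parser.add_argument("--trainset_path", type=parse_path_list, required=True)

# Simulate the command line from the example above.
args = parser.parse_args(["--trainset_path=['./data/raw/train.txt']"])
print(args.trainset_path)  # ['./data/raw/train.txt']
```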

The output files will be stored in /pred/. Please use the following command to post-process the data:

python core/compose.py \
    --path={OUTPUT FILE}

You may want to evaluate other results with the following command:

mv {OTHER RESULT} BIEO.txt
python core/evaluate.py

Or predict only the DISEASE label with a light and efficient CRF model by using the following command:

python core/run_crf.py
