Natural Language Processing for Extraction of Phenotypes for Inherited Retinal Disease from Electronic Health Records
Moorfields Eye Hospital (MEH) unstructured free-text EHR data
MIMIC-III unstructured free-text EHR data
CogStack-SemEHR is used to identify eye disease phenotypes.
https://github.com/CogStack/CogStack-SemEHR
A BERT model performs binary classification to determine whether each mention identified by CogStack-SemEHR is true or not, with both internal and external validation.
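As a rough illustration of how mentions might be framed as binary classification examples, the sketch below cuts a context window around each mention and pairs it with a 0/1 label. The `Mention` fields, the `[M]`/`[/M]` markers, the window width, and the toy note are all assumptions made for this example; they are not CogStack-SemEHR's output schema and may differ from what the notebooks do. The resulting text would then be passed to a BERT tokenizer and classifier, which is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    """One phenotype mention (hypothetical fields, not SemEHR's schema)."""
    text: str   # full free-text note
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    label: int  # 1 = true mention, 0 = not true (from manual annotation)


def context_window(m: Mention, width: int = 50) -> str:
    """Return the mention with up to `width` characters of context on each
    side, wrapping the mention itself in [M] ... [/M] markers (assumed)."""
    left = m.text[max(0, m.start - width):m.start]
    right = m.text[m.end:m.end + width]
    return f"{left}[M]{m.text[m.start:m.end]}[/M]{right}"


def build_examples(mentions, width: int = 50):
    """Turn mentions into (text, label) pairs for a binary classifier."""
    return [(context_window(m, width), m.label) for m in mentions]


def find_mention(note: str, phrase: str, label: int) -> Mention:
    """Helper for the toy data: locate `phrase` in `note` by string search."""
    start = note.index(phrase)
    return Mention(note, start, start + len(phrase), label)


# Made-up note for illustration only; not taken from MEH or MIMIC-III.
note = "Patient has Stargardt disease. No evidence of retinitis pigmentosa."
mentions = [
    find_mention(note, "Stargardt disease", 1),     # affirmed
    find_mention(note, "retinitis pigmentosa", 0),  # negated
]
examples = build_examples(mentions, width=20)
for text, label in examples:
    print(label, text)
```

Marking the mention inside its context is one way to tell the classifier which span is being judged when a note contains several mentions.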
- Preprocessing of the datasets
MIMIC-III: https://github.com/pontikos-lab/NLP_EyeDisease/blob/main/MIMIC-III_preprocessing.ipynb
MEH: https://github.com/pontikos-lab/NLP_EyeDisease/blob/main/MEH_preprocessing.ipynb
- Internal validation of BERT
MIMIC-III: https://github.com/pontikos-lab/NLP_EyeDisease/blob/main/MIMIC-III_BERT.ipynb
MEH: https://github.com/pontikos-lab/NLP_EyeDisease/blob/main/MEH_BERT.ipynb
- External validation of BERT
MIMIC-III model validated on MEH: https://github.com/pontikos-lab/NLP_EyeDisease/blob/main/MIMIC-III_bert_MEH_validation.ipynb
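For either internal or external validation, the classifier's predictions can be scored against the annotated labels. This README does not state which metrics the notebooks report, so the choice of precision, recall, F1 and accuracy below is an assumption, and `binary_metrics` is a hypothetical helper written for this sketch. The label lists are made-up toy values, not results.

```python
def binary_metrics(y_true, y_pred):
    """Compute precision, recall, F1 and accuracy for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy labels for illustration: annotations vs. model predictions.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In an external validation setting, `y_pred` would come from a model trained on one dataset and `y_true` from annotations on the other, so the same function applies unchanged.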