Dataset compiled for Natural Language Processing (NLP) using a corpus of medical transcriptions and custom-generated clinical stop words and vocabulary.
Clone or download the files for use in medical-text NLP experiments.
- mtsamples.csv: Compiled from Kaggle's medical transcriptions dataset by Tara Boyle, scraped from Transcribed Medical Transcription Sample Reports and Examples. See Kaggle repository.
- clinical-stopwords.txt: Compiled from Dr. Kavita Ganesan's clinical-concepts repository. See the Discovering Related Clinical Concepts Using Large Amounts of Clinical Notes paper.
- vocab.txt: Vocabulary text file for NLP, generated from the Systematized Nomenclature of Medicine International (SNMI) data. See how to Generate your own vocab file.
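
As a starting point, the three files can be combined roughly as in the sketch below, which uses only the Python standard library. It is not part of this repository. The helper names (`load_word_list`, `load_transcriptions`, `tokenize`) are hypothetical, and the sketch assumes that the CSV has a `transcription` column, that the word lists hold one term per line, and that lines starting with `#` are comments; check these against the actual files before relying on it.

```python
import csv
import re
from pathlib import Path


def load_word_list(path):
    """Read one term per line into a lowercase set.

    Assumption: blank lines and lines starting with '#' carry no terms.
    """
    terms = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        term = line.strip().lower()
        if term and not term.startswith("#"):
            terms.add(term)
    return terms


def load_transcriptions(path, text_column="transcription"):
    """Return the non-empty values of one text column from a CSV file.

    Assumption: the file has a header row containing `text_column`.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return [row[text_column] for row in reader if row.get(text_column)]


def tokenize(text, stopwords):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in stopwords]
```

From the repository root, `tokenize(load_transcriptions("mtsamples.csv")[0], load_word_list("clinical-stopwords.txt"))` would return the filtered tokens of the first transcription. `load_word_list("vocab.txt")` can be used the same way if you want to keep only in-vocabulary tokens instead of removing stop words.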