Skip to content

Automatically exported from code.google.com/p/disease-crawler

Notifications You must be signed in to change notification settings

narzeja/disease-crawler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

==Abstract==

In this paper we design and construct models for assisting physicians with the task of diagnosing rare diseases. Using a prior knowledge of rare diseases consisting of 'disease name' and 'abstract', we utilize the 'Google Search Engine' to harvest additional knowledge of 3882 rare diseases to expand the model. Using various techniques, ranging from data set noise reduction, to Machine Learning and Natural Language Processing are applied in order to construct different models and subsequently compare them in terms of prediction precision.

Results: The most successful approach (Orphanet Abstracts+Noise Reduced googled data with TFIDF modelling) places 86% (37 out of 43) of the diseases within the top 5 predicted results and 95% (41 out of 43) within top 20. The model is based on harvested information and prior information (abstracts from Orphanet), and is implemented using a basic 'Term Frequency - Inverse Document Frequency' model.

About

Automatically exported from code.google.com/p/disease-crawler

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published