GitHub - narzeja/disease-crawler: Automatically exported from code.google.com/p/disease-crawler

==Abstract==

In this paper we design and construct models for assisting physicians with the task of diagnosing rare diseases. Starting from prior knowledge of rare diseases consisting of a 'disease name' and an 'abstract', we use the Google search engine to harvest additional knowledge about 3882 rare diseases and expand the model. Various techniques, ranging from data set noise reduction to Machine Learning and Natural Language Processing, are applied to construct different models, which are then compared in terms of prediction precision.

Results: The most successful approach (Orphanet abstracts + noise-reduced Googled data with TFIDF modelling) places 86% (37 out of 43) of the diseases within the top 5 predicted results and 95% (41 out of 43) within the top 20. The model is based on harvested information and prior information (abstracts from Orphanet), and is implemented using a basic 'Term Frequency - Inverse Document Frequency' model.
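To illustrate the kind of model and evaluation described above, here is a minimal sketch of TF-IDF ranking with a top-k hit rate. It is not the code from this repository: the function names (`build_tfidf`, `rank`, `top_k_rate`), the whitespace tokenizer, the cosine-similarity scoring, and the toy documents are all assumptions made for the example, and the placeholder texts are not real disease descriptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def build_tfidf(docs):
    """Build TF-IDF vectors from a dict mapping document name -> text."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[name] = {t: (c / len(tokens)) * idf[t] for t, c in tf.items()}
    return vectors, idf


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, vectors, idf):
    """Return document names ordered from most to least similar to the query."""
    tf = Counter(t for t in tokenize(query) if t in idf)  # ignore unknown terms
    total = sum(tf.values())
    qvec = {t: (c / total) * idf[t] for t, c in tf.items()} if total else {}
    scores = {name: cosine(qvec, vec) for name, vec in vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)


def top_k_rate(cases, vectors, idf, k):
    """Fraction of (query, expected_name) cases whose expected name is in the top k."""
    hits = sum(1 for query, expected in cases
               if expected in rank(query, vectors, idf)[:k])
    return hits / len(cases)


# Toy placeholder data (hypothetical, for illustration only).
docs = {
    "disease_a": "fever rash joint pain",
    "disease_b": "muscle weakness fatigue",
    "disease_c": "vision loss headache",
}
vectors, idf = build_tfidf(docs)
cases = [
    ("rash and joint pain", "disease_a"),
    ("fatigue with muscle weakness", "disease_b"),
    ("headache vision", "disease_c"),
]
print(rank("rash and joint pain", vectors, idf))
print(top_k_rate(cases, vectors, idf, k=1))
```

The reported figures follow the same top-k definition: 37 of 43 test diseases in the top 5 gives 37/43 ≈ 86%, and 41 of 43 in the top 20 gives 41/43 ≈ 95%.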

OrphanetData
hcluster-0.2.0
report
testdata1
BaseCrawler.py
CrawlerController.py
CrawlerInterface.py
DiseaseListCrawler.py
FeatureExtractor.py
GCrawler.py
IOmodule.py
InitialCrawler.py
KeywordCrawler.py
MainProgram.py
NLPminer.py
OrphanetCrawler.py
PreliminaryTests.py
README.md
RareDiseases.py
SearchGoogle.py
SymptomListCrawler.py
TextCleaner.py
TextmineThis.py
TextmineThis_backup.py
TextmineThis_symptoms.py
TheMatrix.py
Wikipedia.py
WrongdiagnosisCrawler.py
data.db
db.db.tar.bz2
db.py
db_2100ish.db
everything.py
icd10_term_extractor.py
knn.py
orphanet_urls.pcl
pod.py
symptoms.dict
symptoms.list
tagger.pkl
test.py
testicd.py
testmodule.py
