forked from lushl9301/PubMed-Text-Mining-Tool
A Simple Text Mining Tool for Analyzing Research Paper Abstracts
yuxiaohui78/PubMed-Text-Mining-Tool
Project Name: A Simple Text Mining Tool for Analyzing Research Paper Abstracts

Description:
This project is a text mining tool that works on search results from the National Center for Biotechnology Information's PubMed database (http://www.ncbi.nlm.nih.gov/pubmed). It uses Perl and Python for text processing and statistical analysis.

Modules and files (not all):
pubmed_result.txt -- results downloaded from NCBI PubMed
preProcess.pl -- takes pubmed_result.txt as input and reformats it for later processing
myFormat.txt -- generated by preProcess.pl
stem.pl -- takes myFormat.txt as input; stems each word in every sentence
stemDict.txt -- stemmed words and their corresponding original words; generated by stem.pl
stemmedSentence.txt -- stemmed words in sentences; generated by stem.pl
selectSentence.pl -- takes stemmedSentence.txt as input and uses stemKeyword.pl as a sub-module; processes all stemmed sentences and selects those containing the given keywords; if no keywords are provided, uses myFormat.txt as the result instead
stemKeyword.pl -- takes keywords.txt as input; stems the keywords
keywords.txt -- keywords provided by the user
stemFunction.pl -- core stemming function (Porter stemmer)
dict.py -- takes stemDict.txt as input; eliminates stop words and computes simple statistics
static_words.txt -- stemmed words and their frequencies; generated by dict.py
pmidList.txt -- PMID list file; generated by selectSentence.pl
htmlGenerator.py -- uses pmidList.txt to generate a simple webpage for easy database access
PMIDList.html -- simple webpage containing PMIDs, hyperlinks and titles

HOWTO:
1. Run a search on http://www.ncbi.nlm.nih.gov/pubmed.
2. Click "Send to" at the top right of the page, select "File" as the destination and "MEDLINE" as the format, then click "Create File".
3. Put the downloaded file "pubmed_result.txt" into the same directory as the code.
4. cd to that directory and type make<RETURN> on the command line.
5. Type make<SPACE>html<RETURN> on the command line to generate PMIDList.html.

Installation (Ubuntu as example):
#install perl, python and make
#you can install build-essential too
$sudo apt-get install perl python make
#install CPAN for perl modules
$sudo perl -MCPAN -e shell
#press <RETURN> until the installation is finished
$sudo cpan
cpan[1]> install Lingua::EN::Sentence
cpan[2]> install Unicode::Normalize
#quit cpan shell
cpan[3]> exit
#DONE

LICENSE: See LICENSE
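The preProcess.pl step turns the MEDLINE export (pubmed_result.txt) into a simpler format. The following is a minimal Python sketch of that kind of parsing, not the project's Perl code. It assumes the standard MEDLINE layout: each field starts with a tag padded to four characters followed by "- " (for example "PMID- " or "TI  - "), continuation lines are indented by six spaces, and records are separated by blank lines. The function name parse_medline and the choice to keep only PMID, TI (title) and AB (abstract) are hypothetical.

```python
import re
from typing import Dict, List, Optional

# A MEDLINE field line: tag (2-4 capital letters), optional padding, then "- ".
TAG_LINE = re.compile(r"^([A-Z]{2,4})\s*- (.*)$")
KEPT_TAGS = ("PMID", "TI", "AB")  # assumption: only these fields are needed


def parse_medline(text: str) -> List[Dict[str, str]]:
    """Split MEDLINE text into records holding PMID, title and abstract."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    last_tag: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
            current, last_tag = {}, None
            continue
        match = TAG_LINE.match(line)
        if match:
            last_tag, value = match.group(1), match.group(2).strip()
            if last_tag in KEPT_TAGS:
                current[last_tag] = value
        elif line.startswith("      ") and last_tag in current:
            # Six-space continuation line: extend the most recent kept field.
            current[last_tag] += " " + line.strip()
    if current:
        records.append(current)
    return records
```

For example, parse_medline(open("pubmed_result.txt").read()) would yield one dictionary per article; fields the sketch does not keep (authors, MeSH terms, and so on) are silently skipped.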
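selectSentence.pl keeps only the sentences that contain at least one of the user's keywords. A minimal Python sketch of that filtering idea follows; it is an illustration, not the project's implementation. It assumes sentences and keywords have already been stemmed the same way (the real tool stems keywords via stemKeyword.pl), and it simplifies the no-keyword fallback: instead of switching to myFormat.txt, it just returns every sentence. The name select_sentences is hypothetical.

```python
import re
from typing import Iterable, List


def select_sentences(sentences: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Keep sentences containing at least one keyword (case-insensitive)."""
    sentences = list(sentences)
    wanted = {k.strip().lower() for k in keywords if k.strip()}
    if not wanted:
        # Simplified fallback: with no keywords, keep every sentence.
        return sentences
    selected = []
    for sentence in sentences:
        # Tokenize on letters and digits so trailing punctuation does not block a match.
        tokens = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        if tokens & wanted:
            selected.append(sentence)
    return selected
```

Matching whole tokens rather than substrings avoids false hits such as "cell" inside "cellular", which matters less once both sides are stemmed but is still the safer default.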
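dict.py removes stop words and counts how often each stemmed word appears, producing static_words.txt. The sketch below shows that stop-word filtering and frequency counting with collections.Counter. The tiny STOP_WORDS set and the name word_frequencies are assumptions for illustration; the project's actual stop-word list and input parsing of stemDict.txt are not described here.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical, deliberately small stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
              "in", "is", "it", "of", "on", "or", "that", "the", "to", "was",
              "were", "with"}


def word_frequencies(sentences: Iterable[str]) -> List[Tuple[str, int]]:
    """Count non-stop words across sentences, most frequent first."""
    counts: Counter = Counter()
    for sentence in sentences:
        for word in re.findall(r"[a-z0-9]+", sentence.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    # Ties keep first-seen order because most_common() sorts stably.
    return counts.most_common()
```

Writing each (word, count) pair as a tab-separated line would give a file in the spirit of static_words.txt.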
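htmlGenerator.py turns pmidList.txt into PMIDList.html, a page of PMIDs, links and titles. A minimal Python sketch of that idea follows. It assumes the input is already available as (PMID, title) pairs and links to the current PubMed article URL form, https://pubmed.ncbi.nlm.nih.gov/<PMID>/; the original script may use the older www.ncbi.nlm.nih.gov/pubmed address. The name build_pmid_page is hypothetical.

```python
import html
from typing import Iterable, Tuple

# Current PubMed article URL pattern (the original tool may use an older form).
PUBMED_URL = "https://pubmed.ncbi.nlm.nih.gov/{pmid}/"


def build_pmid_page(entries: Iterable[Tuple[str, str]]) -> str:
    """Render (PMID, title) pairs as a minimal HTML list with one link per article."""
    rows = []
    for pmid, title in entries:
        url = PUBMED_URL.format(pmid=html.escape(pmid, quote=True))
        # Escape titles so characters like "<" or "&" cannot break the markup.
        rows.append(f'<li><a href="{url}">{html.escape(pmid)}</a> {html.escape(title)}</li>')
    return (
        "<!DOCTYPE html>\n<html>\n<head><meta charset=\"utf-8\">"
        "<title>PMID List</title></head>\n<body>\n<ul>\n"
        + "\n".join(rows)
        + "\n</ul>\n</body>\n</html>\n"
    )
```

Escaping matters here because article titles regularly contain characters such as "<" (for example in p-value ranges) that would otherwise be read as markup.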
Languages
- Perl 54.4%
- Python 29.7%
- Makefile 15.9%