miniproject: viral epidemics and viruses
Kareena Singh
Jitu Ram Bhargav
- To answer the question: “Which viruses are reported as being involved in causing viral epidemics?”
- Not every virus leads to an epidemic or a pandemic. Some viruses have been reported as being involved in a viral epidemic, whereas others have not. The goal is to find out which viruses can cause, or have caused, an epidemic outbreak.
- To create a dictionary of viruses from scratch (there is no built-in ami dictionary for viruses).
- To download a corpus of 1000 articles on viruses that cause viral epidemics using getpapers.
- To run ami search with the viruses dictionary.
- To perform binary classification of the papers using KNIME.
- To section the papers using ami section.
- To identify and extract entities and display the data.
- Initially, a communal corpus called epidemic50noCov, consisting of 50 articles on viral epidemics, will be created.
- After analyzing that corpus, we will each build our own individual corpus of 950 papers. It will be created using the virus dictionary.
- The corpus of 950 articles was created and committed here in 4 parts. https://github.com/petermr/openVirus/tree/master/miniproject/virus
- Virus dictionary: a test dictionary of human viruses was created to begin with. https://github.com/petermr/openVirus/blob/master/dictionaries/test/virus.xml
- Use of getpapers for downloading a corpus of 950 articles from PubMedCentral. See https://github.com/petermr/openVirus/wiki/getpapers
- Use of ami/SPARQL with the amidict tool for creating a dictionary of viruses. See https://github.com/petermr/openVirus/wiki/INSTALLING-ami3
- Using ami search for testing the virus dictionary. See https://github.com/petermr/openVirus/wiki/ami-search and https://github.com/petermr/openVirus/wiki/How-ami-search-works
- Using ami section to split a document in a CTree into sections (front, body, back). See https://github.com/petermr/openVirus/wiki/ami:section
- Data analysis using KNIME. KNIME allows users to visually create data flows (or pipelines), selectively execute some or all analysis steps, and later inspect the results and models using interactive widgets and views. See https://github.com/petermr/openVirus/wiki/Tools:-KNIME
- Using Python, Keras and Jupyter Notebook.
For Python, see https://github.com/petermr/openVirus/wiki/Tools:-Python
Keras is a powerful and easy-to-use free open source Python library for developing and evaluating deep learning models. It wraps the efficient numerical computation libraries Theano and TensorFlow and allows you to define and train neural network models in just a few lines of code.
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.
- R for summarizing the extracted information. R is a powerful language used widely for data analysis and statistical computing.
- To display the extracted information, we will use Excel to create spreadsheets and other forms of data display such as histograms, timelines, pie charts and other graphical representations. This forms part of a scoping review.
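As a rough illustration of what a dictionary search over a corpus produces, the following Python sketch counts how often each dictionary term occurs in each paper and how many papers mention two terms together. It is not how ami search is implemented. The function names `count_terms` and `cooccurrence`, the sample terms and the sample paper texts are all made up for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def count_terms(text, terms):
    """Count case-insensitive whole-phrase occurrences of each term in text."""
    counts = Counter()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = re.findall(pattern, text, flags=re.IGNORECASE)
        if hits:
            counts[term] = len(hits)
    return counts


def cooccurrence(papers, terms):
    """For each pair of terms, count the papers that mention both."""
    pairs = Counter()
    for text in papers.values():
        present = sorted(count_terms(text, terms))
        pairs.update(combinations(present, 2))
    return pairs


# Hypothetical dictionary terms and paper texts, for illustration only.
terms = ["Zika virus", "Ebola virus", "Dengue virus"]
papers = {
    "paper1": "Zika virus and dengue virus share a vector. Zika virus spread fast.",
    "paper2": "An Ebola virus outbreak was reported.",
}

for paper_id, text in papers.items():
    print(paper_id, dict(count_terms(text, terms)))
print(dict(cooccurrence(papers, terms)))
```

Per-paper counts of this kind are what would go into a spreadsheet or histogram, and the pair counts give a simple co-occurrence table.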
- Analysis of the 50 papers of the communal corpus epidemic50noCov, displayed in the form of a spreadsheet. (Finished) 🟢
- Sectioning of the 50 papers using ami section. (Finished) 🟢
- Downloaded a corpus of 950 articles on viruses using getpapers. (Finished) 🟢
- Sectioning of corpus 950 using ami section. (Finished) 🟢
- Created a test dictionary (link above) with 30 entries on human viruses, using amidict and a test file containing a list of names of human viruses. (Finished) 🟢
- Created a dictionary using Wikidata Query Service and SPARQL. (Finished) 🟢
- Ran ami search on corpus 950 and received co-occurrence results. (Finished) 🟢
- Committed corpus 950 on GitHub. (Finished) 🟢
- Installation of Jupyter Notebook as a machine learning tool. (Finished) 🟢
- Manual classification of corpus950. (Ongoing) 🔵
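The step of turning a plain list of virus names into a dictionary file can be sketched in Python as follows. This does not use amidict. It assumes a simplified dictionary layout: a `dictionary` root element with a `title` attribute and one `entry` element per name, carrying `term` and `name` attributes. The function name `build_dictionary` and the sample names are hypothetical, and the real virus.xml may carry further attributes.

```python
import xml.etree.ElementTree as ET


def build_dictionary(title, names):
    """Build a <dictionary> element with one <entry> per unique name."""
    root = ET.Element("dictionary", {"title": title})
    seen = set()
    for raw in names:
        name = raw.strip()
        key = name.lower()
        if not name or key in seen:
            continue  # skip blank lines and case-insensitive duplicates
        seen.add(key)
        # Assumed layout: each entry carries the name as both term and name.
        ET.SubElement(root, "entry", {"term": name, "name": name})
    return root


# Hypothetical input, standing in for lines read from a list of virus names.
names = ["Zika virus", "Ebola virus", "", "zika virus", "Dengue virus"]
dictionary = build_dictionary("virus", names)
xml_text = ET.tostring(dictionary, encoding="unicode")
print(xml_text)
```

Removing blank lines and duplicates before writing the file keeps the number of entries equal to the number of distinct viruses in the list.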
- Download and install GitHub Desktop from https://desktop.github.com, log in to your GitHub account, and clone the openVirus repository using its URL.
- Note the folder on your system where you cloned the repo. Open your miniproject folder inside it and move your corpus950 files there. In GitHub Desktop, the changes then appear in the panel on the left.
- Add a summary such as 'added files' and commit to master.
- Then click Push; your data will be uploaded. (This takes time, depending on the file size.)
- The committed corpus is here: https://github.com/petermr/openVirus/tree/master/miniproject/virus
- I am an MSc student and want to pursue a PhD in a related field.
- The project is helpful in understanding the current situation in viral epidemics.
- It gives an idea of how to access data stored online and how to use that information for our research.
- This project will help me understand research methodology using computational biology and bioinformatics.
- To create and maintain a dictionary of viruses that are responsible for causing viral epidemics.
- To find papers and articles related to viruses and viral epidemics.
- To identify the different types of viruses that are causing viral epidemics around the globe.
- To collect up-to-date data from trusted sources related to viruses and viral epidemics.
- getpapers to obtain papers
- ami to create and maintain the dictionary
- ami search for testing the dictionary
- ami section for document sectioning
- amidict tool for creating the dictionary
- An overview of the specific work that an individual is working on.
- It covers the work done to date and indicates the possibilities for further research on the topic, within or beyond its limits.
- A platform for editing and analysing processed documents and papers, also used for uploading data and creating dictionaries.
- My dictionary is virus, created from Wikidata using the software ami.
- My corpus consists of 950 articles taken from European PubMedCentral (EuPMC) with the help of getpapers.
- EuPMC is a collection of journal literature and research articles related to the life sciences from around the globe.
- No bugs or issues have been faced so far. I hope that any problems can be solved by communicating with the allocated mentor and the members of the openVirus group before the completion of my four-week programme.
- I learnt about the purpose and usage of getpapers, ami, corpus 950, etc.
- I understood how to update and edit pages on GitHub.
- I learnt where I can collect the articles from.
- I also understood how to download and install software from GitHub.