Python tool to request bulk data from KEGG database.

mdsufz/OrtScraper-1

 
 

OrtScraper

Description

Python tool to request information in bulk from the KEGG (Kyoto Encyclopedia of Genes and Genomes) database.

Installation

Python 3 is required.

Use the package manager pip to install virtualenv.

pip install virtualenv

Create a virtual environment to use with this tool.

virtualenv venv_OrtScraper

Activate the virtual environment.

source venv_OrtScraper/bin/activate

Install the dependencies.

pip install grequests
pip install bs4

Then move to the folder that contains the OrtScraper setup.py file.

cd /path/to/OrtScraper

Run the command:

python setup.py install

The tool should be ready to use.

Usage

download_kos

Download all the sequences from each desired KO (KEGG Orthology) group to a FASTA file.

The input can be:

  • KEGG pathway map ID

  • List of KO IDs

  • List of KEGG Reaction IDs

  • List of EC (Enzyme commission) numbers

Note: Each input list must be a plain-text (.txt) file with exactly one ID per line.
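
Before a long download, it can help to check that a list file really has one ID per line. The sketch below is not part of OrtScraper. The function name check_id_list and the patterns are assumptions based on the usual KEGG identifier shapes (K or R followed by five digits, four-field EC numbers), and download_kos itself may accept more or less than this.

```python
import re

# Usual KEGG identifier shapes (an assumption about what download_kos
# accepts; the tool itself may be stricter or more lenient).
ID_PATTERNS = {
    "ko": re.compile(r"K\d{5}"),
    "reaction": re.compile(r"R\d{5}"),
    "ec": re.compile(r"\d+(\.(\d+|-)){3}"),
}


def check_id_list(lines, kind):
    """Return (line_number, text) pairs for lines that are not a single valid ID."""
    pattern = ID_PATTERNS[kind]
    problems = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        # fullmatch rejects blank lines, extra IDs on a line, and wrong case.
        if not pattern.fullmatch(text):
            problems.append((number, text))
    return problems


if __name__ == "__main__":
    sample = ["K00001", "K00002 K00003", "", "k00004"]
    for number, text in check_id_list(sample, "ko"):
        print(f"line {number}: {text!r} is not a single KO ID")
```

To check a real file, pass the open file object, for example check_id_list(open("kos.txt"), "ko").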

Run the command with the help option to see the usage and all the available options.

download_kos -h

To test if the tool is working you can use the files contained in the examples folder.

To download all the sequences from the KOs associated with the pathway with the ID map00362:

download_kos -o /path/to/output/folder/ -m map map00362

To download all the sequences from the KOs listed in the file examples/kos.txt:

download_kos -o /path/to/output/folder/ -k /path/to/OrtScraper/examples/kos.txt

To download all the sequences from the KOs associated with the reactions listed in the file examples/reactions.txt:

download_kos -o /path/to/output/folder/ -r /path/to/OrtScraper/examples/reactions.txt

To download all the sequences from the KOs associated with the enzymes listed in the file examples/ecs.txt:

download_kos -o /path/to/output/folder/ -e /path/to/OrtScraper/examples/ecs.txt

Note: Running these commands may take some time and memory.
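
If you want to call the tool from a Python script, one option is to build the argument list first and pass it to subprocess. The sketch below uses only the options shown in the examples above (-o, -m, -k, -r, -e). The helper name build_download_kos_command is hypothetical, and the rule of exactly one input per call is an assumption that mirrors those examples, not a documented restriction of download_kos.

```python
import shlex


def build_download_kos_command(output_dir, map_id=None, kos=None,
                               reactions=None, ecs=None):
    """Build the argument list for one download_kos call.

    Assumption: one input option per call, as in the README examples.
    """
    options = [("-m", map_id), ("-k", kos), ("-r", reactions), ("-e", ecs)]
    chosen = [(flag, value) for flag, value in options if value is not None]
    if len(chosen) != 1:
        raise ValueError("give exactly one of map_id, kos, reactions, ecs")
    flag, value = chosen[0]
    return ["download_kos", "-o", str(output_dir), flag, str(value)]


if __name__ == "__main__":
    command = build_download_kos_command("/path/to/output/folder/",
                                         kos="examples/kos.txt")
    # Print the shell-quoted command instead of running it.
    print(shlex.join(command))
```

To run the command, pass the returned list to subprocess.run(command, check=True) with the virtual environment active.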

Output

The output folder contains one FASTA file for each selected KO. If you use the -e or -r option, it also contains a file, associations.txt, that lists which KOs were selected for download for each reaction or EC number. The same folder contains info_db.csv, a table listing the KOs that were selected for download, their names, and the EC numbers and Reaction IDs they are associated with.
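
A quick way to check a finished run is to count the sequences in each FASTA file. The sketch below is not part of OrtScraper. The function names are hypothetical, and the file extensions it looks for are an assumption, so adjust them to match the names you see in your output folder.

```python
from pathlib import Path


def count_fasta_records(fasta_path):
    """Count sequences in a FASTA file by counting header lines ('>')."""
    with open(fasta_path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize_output(folder, suffixes=(".fasta", ".fa", ".faa")):
    """Map each FASTA file name in `folder` to its number of sequences.

    The default suffixes are a guess; other files (associations.txt,
    info_db.csv) are skipped.
    """
    counts = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in suffixes:
            counts[path.name] = count_fasta_records(path)
    return counts
```

Calling summarize_output("/path/to/output/folder/") returns a dictionary of file names and sequence counts. A count of zero points to a KO for which nothing was downloaded.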
