Python tool to request information in bulk from the KEGG (Kyoto Encyclopedia of Genes and Genomes) database.
Python 3 is required.
Use the package manager pip to install virtualenv.
pip install virtualenv
Create a virtual environment to use with this tool.
virtualenv venv_OrtScraper
Activate the virtual environment.
source venv_OrtScraper/bin/activate
Install the dependencies.
pip install grequests
pip install bs4
Then move to the folder that contains the OrtScraper setup.py file.
cd /path/to/OrtScraper
Run the command:
python setup.py install
The tool should be ready to use.
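As a quick sanity check after installation, the short sketch below tests whether the download_kos command is reachable on the PATH of the active virtual environment. The helper name tool_available is hypothetical and not part of OrtScraper; it relies only on the standard library.

```python
import shutil


def tool_available(command="download_kos"):
    """Return True if `command` can be found on the current PATH."""
    # shutil.which returns the full path of the executable, or None if absent.
    return shutil.which(command) is not None


if __name__ == "__main__":
    if tool_available():
        print("download_kos found on PATH")
    else:
        print("download_kos not found; is the virtual environment activated?")
```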
Download all the sequences from each desired KO (KEGG Orthology) group to a FASTA file.
The input can be:
- KEGG pathway map ID
- List of KO IDs
- List of KEGG Reaction IDs
- List of EC (Enzyme Commission) numbers
Note: Each input list must be a plain-text (.txt) file with exactly one ID per line.
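Before starting a long download, it can help to check that an input list really has one ID per line. The following is a minimal sketch, not part of OrtScraper: the function name check_id_list is hypothetical, and the patterns follow the usual KEGG identifier conventions (K or R followed by five digits, four-field EC numbers). Whether the tool itself enforces these formats is an assumption.

```python
import re
from pathlib import Path

# Assumed identifier shapes, based on common KEGG conventions.
ID_PATTERNS = {
    "ko": re.compile(r"^K\d{5}$"),        # e.g. K00001
    "reaction": re.compile(r"^R\d{5}$"),  # e.g. R00001
    "ec": re.compile(r"^\d+\.(\d+|-)\.(\d+|-)\.(n?\d+|-)$"),  # e.g. 1.1.1.1
}


def check_id_list(path, kind):
    """Return (valid_ids, problems) for a one-ID-per-line text file.

    `problems` is a list of (line_number, content) pairs for lines that
    do not look like a single ID of the requested kind.
    """
    pattern = ID_PATTERNS[kind]
    valid, problems = [], []
    lines = Path(path).read_text().splitlines()
    for lineno, raw in enumerate(lines, start=1):
        entry = raw.strip()
        if not entry:
            continue  # blank lines are ignored here
        if pattern.match(entry):
            valid.append(entry)
        else:
            problems.append((lineno, entry))
    return valid, problems
```

For example, check_id_list("examples/kos.txt", "ko") would return the IDs that look well formed together with the line numbers of any entries worth inspecting by hand.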
Run the command with the help option to see the usage and all the available options.
download_kos -h
To test whether the tool is working, you can use the files in the examples folder.
Using the tool to download all the sequences from the KOs associated with the pathway with the ID map00362:
download_kos -o /path/to/output/folder/ -m map00362
Using the tool to download all the sequences from the KOs listed in the file examples/kos.txt:
download_kos -o /path/to/output/folder/ -k /path/to/OrtScraper/examples/kos.txt
Using the tool to download all the sequences from the KOs associated with the reactions listed in the file examples/reactions.txt:
download_kos -o /path/to/output/folder/ -r /path/to/OrtScraper/examples/reactions.txt
Using the tool to download all the sequences from the KOs associated with the enzymes listed in the file examples/ecs.txt:
download_kos -o /path/to/output/folder/ -e /path/to/OrtScraper/examples/ecs.txt
Note: Running these commands may take some time and memory.
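If you prefer to launch these downloads from a Python script, for instance to loop over several input lists, the commands above can be wrapped with the standard subprocess module. This is a minimal sketch covering only the three file-based options (-k, -r, -e) exactly as they appear in the examples. The helper names and the INPUT_FLAGS mapping are hypothetical and not part of OrtScraper.

```python
import subprocess

# Flags copied from the usage examples; the dictionary keys are made up here.
INPUT_FLAGS = {"kos": "-k", "reactions": "-r", "ecs": "-e"}


def build_download_command(output_dir, input_kind, list_file):
    """Build the argument list for one download_kos call."""
    flag = INPUT_FLAGS[input_kind]
    return ["download_kos", "-o", str(output_dir), flag, str(list_file)]


def run_download(output_dir, input_kind, list_file):
    """Run download_kos and raise CalledProcessError if it exits with an error."""
    cmd = build_download_command(output_dir, input_kind, list_file)
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Prints the command instead of running it, so the sketch is safe to try.
    print(build_download_command("/path/to/output/folder/", "kos",
                                 "/path/to/OrtScraper/examples/kos.txt"))
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems when paths contain spaces.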
In the output folder you will find one FASTA file for each selected KO. If you use the -e or -r option, you will also find a file, associations.txt, which indicates which KOs were selected for download for each reaction or EC number. The same folder also contains a file, info_db.csv, with a table of the KOs that were selected for download, their names, and the EC numbers and Reaction IDs with which they are associated.
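To get a quick overview of a finished run, the sketch below counts the sequences in each FASTA file of the output folder and loads info_db.csv when it is present. Several details are assumptions and not confirmed by the tool: the FASTA file extensions, the comma delimiter of info_db.csv, and the presence of a header row. The function names are hypothetical.

```python
import csv
from pathlib import Path

# Assumption: the FASTA files use one of these extensions.
FASTA_SUFFIXES = {".fasta", ".fa", ".faa"}


def count_fasta_records(fasta_path):
    """Count sequences in a FASTA file by counting header lines ('>')."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize_output(folder):
    """Return ({fasta_name: n_sequences}, rows_of_info_db) for a folder."""
    folder = Path(folder)
    counts = {
        path.name: count_fasta_records(path)
        for path in sorted(folder.iterdir())
        if path.suffix.lower() in FASTA_SUFFIXES
    }
    info_rows = []
    info_path = folder / "info_db.csv"
    if info_path.exists():
        # Assumption: comma-separated with a header row; column names are
        # read from the file and not hard-coded.
        with open(info_path, newline="") as handle:
            info_rows = list(csv.DictReader(handle))
    return counts, info_rows
```

Calling summarize_output("/path/to/output/folder/") returns a dictionary of sequence counts per file and a list of rows from info_db.csv, which makes it easy to spot KOs with no sequences.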