docs(installation): Update Bioconda install documentation and README.md.
aaronmussig committed Dec 1, 2022
1 parent ea7f601 commit 08ef9cd
Showing 3 changed files with 73 additions and 46 deletions.
41 changes: 30 additions & 11 deletions README.md
Expand Up @@ -7,16 +7,37 @@
[![Docker Image Version (latest by date)](https://img.shields.io/docker/v/ecogenomic/gtdbtk?sort=date&color=299bec&label=docker)](https://hub.docker.com/r/ecogenomic/gtdbtk)
[![Docker Pulls](https://img.shields.io/docker/pulls/ecogenomic/gtdbtk?color=299bec&label=pulls)](https://hub.docker.com/r/ecogenomic/gtdbtk)

<b>[GTDB-Tk v2.1.0](https://ecogenomics.github.io/GTDBTk/announcements.html) was released on May 11, 2022. Upgrading is recommended.</b>
<b> Please note v2.1.0+ is not compatible with GTDB-Tk package [R207_v1](https://data.gtdb.ecogenomic.org/releases/latest/auxillary_files/gtdbtk_data.tar.gz). It is necessary to upgrade to GTDB-Tk package [R207_v2](https://data.gtdb.ecogenomic.org/releases/latest/auxillary_files/gtdbtk_v2_data.tar.gz).</b>
<b>GTDB-Tk v2.1.0+ requires an updated reference package ([R207_v2](https://data.gtdb.ecogenomic.org/releases/latest/auxillary_files/gtdbtk_v2_data.tar.gz)), [read more](https://ecogenomics.github.io/GTDBTk/installing/index.html#gtdb-tk-reference-data).</b>

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Database Taxonomy ([GTDB](https://gtdb.ecogenomic.org/)). It is designed to work with recent advances that allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples. It can also be applied to isolate and single-cell genomes. The GTDB-Tk is open source and released under the [GNU General Public License (Version 3)](https://www.gnu.org/licenses/gpl-3.0.en.html).
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based
on the Genome Database Taxonomy ([GTDB](https://gtdb.ecogenomic.org/)). It is designed to work with recent advances that
allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples.
It can also be applied to isolate and single-cell genomes. The GTDB-Tk is open source and released under the
[GNU General Public License (Version 3)](https://www.gnu.org/licenses/gpl-3.0.en.html).

Notifications about GTDB-Tk releases will be available through the [GTDB Twitter](https://twitter.com/ace_gtdb) account and the [GTDB Announcements Forum](https://forum.gtdb.ecogenomic.org/c/announcements/10).
Notifications about GTDB-Tk releases will be available through the [GTDB Twitter](https://twitter.com/ace_gtdb)
account and the [GTDB Announcements Forum](https://forum.gtdb.ecogenomic.org/c/announcements/10).

Please post questions and issues related to GTDB-Tk on the Issues section of the GitHub repository. Questions related to the [GTDB](https://gtdb.ecogenomic.org/) can be posted on the [GTDB Forum](https://forum.gtdb.ecogenomic.org/) or sent to the [GTDB team](https://gtdb.ecogenomic.org/about).
Please post questions and issues related to GTDB-Tk on the Issues section of the GitHub repository. Questions
related to the [GTDB](https://gtdb.ecogenomic.org/) can be posted on the [GTDB Forum](https://forum.gtdb.ecogenomic.org/)
or sent to the [GTDB team](https://gtdb.ecogenomic.org/about).

## New Features

## 🚀 Getting started

Be sure to check the [hardware requirements](https://ecogenomics.github.io/GTDBTk/installing/index.html), then choose your preferred method:

* [Bioconda](https://ecogenomics.github.io/GTDBTk/installing/bioconda.html)
* [Docker](https://ecogenomics.github.io/GTDBTk/installing/docker.html)
* [pip](https://ecogenomics.github.io/GTDBTk/installing/pip.html)


## 📖 Documentation

Documentation for GTDB-Tk can be found [here](https://ecogenomics.github.io/GTDBTk/).


## ✨ New Features

GTDB-Tk v2.1.0 includes the following new features:
- GTDB-Tk now uses a **divide-and-conquer** approach where the bacterial reference tree is split into multiple **class**-level subtrees. This reduces the memory requirements of GTDB-Tk from **320 GB** of RAM when using the full GTDB R07-RS207 reference tree to approximately **55 GB**. A manuscript describing this approach is in preparation. If you wish to continue using the full GTDB reference tree, use the `--full-tree` flag.
Expand All @@ -26,10 +47,7 @@ This is the main change from v2.0.0. The split tree approach has been modified f
- `--write_single_copy_genes` flag is now available in the `classify_wf` and `de_novo_wf` workflows.


## Documentation
Documentation for GTDB-Tk can be found [here](https://ecogenomics.github.io/GTDBTk/).

## References
## 📚 References

GTDB-Tk is described in:

Expand All @@ -53,6 +71,7 @@ We strongly encourage you to cite the following 3rd party dependencies:
* Eddy SR. 2011. [Accelerated profile HMM searches](https://www.ncbi.nlm.nih.gov/pubmed/22039361). <i>PLOS Comp. Biol.</i>, 7:e1002195.
* Ondov BD, et al. 2016. [Mash: fast genome and metagenome distance estimation using MinHash](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0997-x). <i>Genome Biol</i> 17, 132. doi: 10.1186/s13059-016-0997-x.

## Copyright

## © Copyright

Copyright 2017 Pierre-Alain Chaumeil. See LICENSE for further details.
37 changes: 20 additions & 17 deletions docs/src/installing/bioconda.rst
Expand Up @@ -3,28 +3,32 @@
Bioconda
========

Step 1: Install anaconda (if not already done)
----------------------------------------------
Step 1: Install conda (if not already done)
-------------------------------------------

Ensure that ``conda`` is on the system path. It is recommended to download `miniconda <https://docs.conda.io/en/latest/miniconda.html>`_.
We strongly recommend using `Mamba <https://mamba.readthedocs.io/en/latest/installation.html>`_ (much faster!) over `miniconda <https://docs.conda.io/en/latest/miniconda.html>`_/`conda <https://docs.conda.io/projects/conda/en/stable/user-guide/install/index.html>`_, but all will work.


Step 2: Create the GTDB-Tk environment
--------------------------------------

.. note:: It is strongly recommended to create a new GTDB-Tk environment for each version of GTDB-Tk released.
It is strongly recommended to create a new conda environment for each version of GTDB-Tk released.

GTDB-Tk package requires third-party packages from the ``conda-forge`` and ``bioconda`` channels.
.. warning:: You must always specify the version of GTDB-Tk, as conda may try to install a **very old version (v1.0.2)**.


GTDB-Tk requires third-party packages from the ``conda-forge`` and ``bioconda`` channels. Make sure to
specify those channels in that order!

.. code-block:: bash
# latest version
conda create -n gtdbtk -c conda-forge -c bioconda gtdbtk
# NOTE: replace 2.1.1 with the version you wish to install
# specific version (replace 1.3.0 with the version you wish to install, recommended)
conda create -n gtdbtk-1.3.0 -c conda-forge -c bioconda gtdbtk=1.3.0
# using conda
conda create -n gtdbtk-2.1.1 -c conda-forge -c bioconda gtdbtk=2.1.1
# using mamba (alternative)
mamba create -n gtdbtk-2.1.1 -c conda-forge -c bioconda gtdbtk=2.1.1
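
The choice between mamba and conda described above can be scripted. The following is a minimal sketch, not part of GTDB-Tk: ``pick_solver``, ``create_cmd`` and the ``GTDBTK_VERSION`` variable are hypothetical names introduced here for illustration. It only prints the pinned create command so that it can be reviewed before it is run.

```shell
# Sketch: prefer mamba when it is on the PATH, otherwise fall back to conda.
# pick_solver, create_cmd and GTDBTK_VERSION are illustrative names only.
pick_solver() {
    if command -v mamba >/dev/null 2>&1; then
        echo "mamba"
    else
        echo "conda"
    fi
}

# Build the create command for a pinned version.
# Channel order matters: conda-forge first, then bioconda.
create_cmd() {
    echo "$(pick_solver) create -n gtdbtk-$1 -c conda-forge -c bioconda gtdbtk=$1"
}

# Print the command for the requested version (defaults to 2.1.1).
create_cmd "${GTDBTK_VERSION:-2.1.1}"
```

Printing rather than executing keeps the version pin visible, which matters given the warning above about conda resolving to a very old release when no version is specified.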
Step 3: Download and alias the GTDB-Tk reference data
-----------------------------------------------------
Expand All @@ -35,7 +39,7 @@ containing the unarchived :ref:`installing#gtdbtk-reference-data`.
Automatically
^^^^^^^^^^^^^

The conda package is bundled with a script ``download-db.sh`` `(located here) <https://github.com/bioconda/bioconda-recipes/blob/master/recipes/gtdbtk/download-db.sh>`_
The conda package is bundled with a script ``download-db.sh`` `(source) <https://github.com/bioconda/bioconda-recipes/blob/master/recipes/gtdbtk/download-db.sh>`_
that will automatically download and extract the GTDB-Tk reference data. The script will be on the system path, so simply run:

.. code-block:: bash
Expand All @@ -47,14 +51,13 @@ that will automatically download, and extract the GTDB-Tk reference data. The sc
Manually
^^^^^^^^

You can automatically alias ``GTDBTK_DATA_PATH`` whenever the environment is activated by editing ``{gtdbtk environment path}/etc/conda/activate.d/gtdbtk.sh``, e.g.:
You can automatically alias ``GTDBTK_DATA_PATH`` whenever the environment is activated by
`setting environment-specific variables <https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#setting-environment-variables>`_, e.g.:

.. code-block:: bash
# Determine the GTDB-Tk environment path
conda activate gtdbtk-1.3.0
which gtdbtk
# /miniconda3/envs/gtdbtk-1.3.0/bin/gtdbtk
# Activate the GTDB-Tk conda environment
conda activate gtdbtk-2.1.1
# Edit the activate file
echo "export GTDBTK_DATA_PATH=/path/to/release/package/" > /miniconda3/envs/gtdbtk-1.3.0/etc/conda/activate.d/gtdbtk.sh
# Set the environment variable to the directory containing the GTDB-Tk reference data
conda env config vars set GTDBTK_DATA_PATH="/path/to/unarchived/gtdbtk/data";
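
Variables set with ``conda env config vars set`` only take effect after the environment is reactivated. The sketch below is one way to confirm the result; ``check_data_path`` is a hypothetical helper introduced here, not something shipped with GTDB-Tk or conda.

```shell
# Sketch: confirm GTDBTK_DATA_PATH is set and points at an existing directory.
# check_data_path is a hypothetical helper, not part of GTDB-Tk.
check_data_path() {
    if [ -z "${GTDBTK_DATA_PATH:-}" ]; then
        echo "GTDBTK_DATA_PATH is not set"
        return 1
    fi
    if [ ! -d "$GTDBTK_DATA_PATH" ]; then
        echo "GTDBTK_DATA_PATH is not a directory: $GTDBTK_DATA_PATH"
        return 1
    fi
    echo "GTDBTK_DATA_PATH OK: $GTDBTK_DATA_PATH"
}

# Typical use after setting the variable:
#   conda deactivate
#   conda activate gtdbtk-2.1.1
#   check_data_path
```

This only checks that the directory exists; it does not validate the contents of the reference data.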
41 changes: 23 additions & 18 deletions docs/src/installing/index.rst
Expand Up @@ -3,9 +3,8 @@
Installing GTDB-Tk
==================

GTDB-Tk is available through multiple sources.

If you are unsure which source to install, Bioconda is generally the easiest.
GTDB-Tk is available through multiple sources; you only need to choose one.
If you are unsure which one to choose, Bioconda is generally the easiest.


Sources
Expand All @@ -15,11 +14,11 @@ Sources
.. toctree::
:maxdepth: 1

pip
bioconda
pip
docker

Alternatively, GTDB-Tk can be run online through `KBase <https://kbase.us/applist/apps/kb_gtdbtk/run_kb_gtdbtk>`_ (third party).
Alternatively, GTDB-Tk can be run online through `KBase <https://kbase.us/applist/apps/kb_gtdbtk/run_kb_gtdbtk>`_ (third party). Note that the version may not be the most recent release.


Hardware requirements
Expand Down Expand Up @@ -127,40 +126,46 @@ GTDB-Tk requires ~66G of external data that needs to be downloaded and unarchive
tar xvzf gtdbtk_v2_data.tar.gz
Note that different versions of the GTDB release data may not run on all versions of GTDB-Tk, below are all supported versions:
.. note:: Different versions of the GTDB release data may not run on all versions of GTDB-Tk; check the supported versions below!


.. list-table::
:widths: 10 10 10
:widths: 10 10 10 20
:header-rows: 1

* - GTDB Release
- Minimum version
- Maximum version
* - R207_v2
- MD5
* - `R207_v2 <https://data.gtdb.ecogenomic.org/releases/release207/207.0/auxillary_files/gtdbtk_r207_v2_data.tar.gz>`_
- 2.1.0
- Current
* - R207
- df468d63265e8096d8ca01244cb95f30
* - `R207 <https://data.gtdb.ecogenomic.org/releases/release207/207.0/auxillary_files/gtdbtk_r207_data.tar.gz>`_
- 2.0.0
- 2.0.0
* - R202
- b04c55104b491f84e053a9011b36164a
* - `R202 <https://data.gtdb.ecogenomic.org/releases/release202/202.0/auxillary_files/gtdbtk_r202_data.tar.gz>`_
- 1.5.0
- 1.7.0
* - R95
- 4986526c2b935fd4dcc2e604c0322517
* - `R95 <https://data.gtdb.ecogenomic.org/releases/release95/95.0/auxillary_files/gtdbtk_r95_data.tar.gz>`_
- 1.3.0
- 1.4.2
* - R89
- 06924c63f4b555ac6fd1525b09901186
* - `R89 <https://data.gtdb.ecogenomic.org/releases/release89/89.0/gtdbtk_r89_data.tar.gz>`_
- 0.3.0
- 0.1.2
* - R86.2
- 82966ef36086237d7230955e2bfff759
* - `R86.2 <https://data.gtdb.ecogenomic.org/releases/release86/86.2/gtdbtk.r86_v2_data.tar.gz>`_
- 0.2.1
- 0.2.2
* - R86
- f71408d69fa2a289f2cdc734b7a58a02
* - `R86 <https://data.gtdb.ecogenomic.org/releases/release86/86.0/gtdbtk_r86_data.tar.gz>`_
- 0.1.0
- 0.1.6
* - R83
- d019b3541746c3673181f24e666594ba
* - `R83 <https://data.gtdb.ecogenomic.org/releases/release83/83.0/gtdbtk_r83_data.tar.gz>`_
- 0.0.6
- 0.0.7


Reference data for prior releases of GTDB-Tk are available at: https://data.ace.uq.edu.au/public/gtdbtk
- 9cf523761da843b5787f591f6c5a80de
