GTDB-Tk

GTDB-Tk v2.1.0+ requires an updated reference package (R207_v2).

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB). It is designed to work with recent advances that allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples. It can also be applied to isolate and single-cell genomes. GTDB-Tk is open source and released under the GNU General Public License (Version 3).

Notifications about GTDB-Tk releases will be available through the GTDB Twitter account and the GTDB Announcements Forum.

Please post questions and issues related to GTDB-Tk in the Issues section of the GitHub repository. Questions related to the GTDB can be posted on the GTDB Forum or sent to the GTDB team.

🚀 Getting started

Be sure to check the hardware requirements, then choose your preferred method:

📖 Documentation

Documentation for GTDB-Tk can be found here.

✨ New Features

GTDB-Tk v2.1.0 includes the following new features:

  • GTDB-Tk now uses a divide-and-conquer approach in which the bacterial reference tree is split into multiple class-level subtrees. This reduces the memory requirements of GTDB-Tk from 320 GB of RAM when using the full GTDB R07-RS207 reference tree to approximately 55 GB. A manuscript describing this approach is in preparation. If you wish to continue using the full GTDB reference tree, use the --full_tree flag.
    This is the main change from v2.0.0: the split-tree approach has been changed from order-level trees to class-level trees to resolve specific classification issues (see #383).
  • Genomes that cannot be assigned to a domain (e.g. genomes with no bacterial or archaeal markers, or genomes with no genes called by Prodigal) are now reported in gtdbtk.bac120.summary.tsv as 'Unclassified'.
  • Genomes filtered out during the alignment step are now reported in gtdbtk.bac120.summary.tsv or gtdbtk.ar53.summary.tsv as 'Unclassified Bacteria/Archaea'.
  • The --write_single_copy_genes flag is now available in the classify_wf and de_novo_wf workflows.
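
The features above can be exercised from a small Python script. The following is a minimal sketch, not part of GTDB-Tk itself: the helper names `build_classify_cmd` and `unclassified_genomes` are hypothetical, the full-tree option is spelled `--full_tree` on the assumption that it follows the underscore style of the other GTDB-Tk options (confirm with `gtdbtk classify_wf -h`), and the summary file is assumed to be tab-separated with `user_genome` and `classification` columns.

```python
import csv


def build_classify_cmd(genome_dir, out_dir, cpus=1, full_tree=False):
    """Return the argument list for a `gtdbtk classify_wf` run.

    The command is only assembled here, not executed. Pass the result to
    subprocess.run(cmd, check=True) to launch GTDB-Tk.
    """
    cmd = [
        "gtdbtk", "classify_wf",
        "--genome_dir", str(genome_dir),
        "--out_dir", str(out_dir),
        "--cpus", str(cpus),
    ]
    if full_tree:
        # Assumed spelling of the flag; opts out of the split-tree approach
        # and therefore needs far more RAM (about 320 GB per the notes above).
        cmd.append("--full_tree")
    return cmd


def unclassified_genomes(summary_path):
    """List (genome, label) pairs whose classification starts with 'Unclassified'.

    Assumes a tab-separated summary file with `user_genome` and
    `classification` columns.
    """
    with open(summary_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [
            (row["user_genome"], row["classification"])
            for row in reader
            if row["classification"].startswith("Unclassified")
        ]
```

Matching on the 'Unclassified' prefix catches both genomes with no domain assignment and genomes filtered out during alignment ('Unclassified Bacteria' or 'Unclassified Archaea'), so one pass over each summary file is enough to collect everything that needs a closer look.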

📚 References

GTDB-Tk is described in:

The Genome Taxonomy Database (GTDB) is described in:

We strongly encourage you to cite the following 3rd party dependencies:

© Copyright

Copyright 2017 Pierre-Alain Chaumeil. See LICENSE for further details.