Merge pull request #411 from Ecogenomics/write_doc
Write doc
Showing 12 changed files with 627 additions and 138 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,7 +3,7 @@ Announcements | |
|
||
|
||
GTDB-Tk 2.1.0 available | ||
------------------- | ||
----------------------- | ||
|
||
*May 11, 2022* | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
.. _commands/convert_to_itol: | ||
|
||
convert_to_itol | ||
=============== | ||
|
||
The `convert_to_itol` command removes internal labels from a Newick tree, making it suitable for visualization in `iTOL <http://itol.embl.de/>`_. | |
|
||
Arguments | ||
--------- | ||
|
||
.. argparse:: | ||
:module: gtdbtk.cli | ||
:func: get_main_parser | ||
:prog: gtdbtk | ||
:path: convert_to_itol | ||
:nodefaultconst: | ||
|
||
Example | ||
------- | ||
|
||
Input | ||
^^^^^ | ||
|
||
|
||
.. code-block:: bash | ||
gtdbtk convert_to_itol --input some_tree.tree --output itol.tree | ||
Output | ||
^^^^^^ | ||
|
||
|
||
.. code-block:: text | ||
[2022-06-30 18:44:54] INFO: GTDB-Tk v2.1.0 | ||
[2022-06-30 18:44:54] INFO: gtdbtk convert_to_itol --input /tmp/decorated.tree --output new.tree | ||
[2022-06-30 18:44:54] INFO: Using GTDB-Tk reference data version r207: /gtdbtk-data | ||
[2022-06-30 18:44:54] INFO: Convert GTDB-Tk tree to iTOL format | ||
[2022-06-30 18:44:54] INFO: Done. | ||
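
To illustrate what removing internal labels from a Newick tree means, here is a minimal sketch in plain Python. It is not GTDB-Tk's implementation: the function name ``strip_internal_labels`` and the example tree are hypothetical, and the sketch assumes that leaf names do not contain parentheses or quotes.

```python
import re

# An internal label sits directly after a closing parenthesis. It is either
# a single-quoted string or a run of characters that ends at the next Newick
# delimiter. Branch lengths (for example ":0.3") are left untouched.
_INTERNAL_LABEL = re.compile(r"\)(?:'[^']*'|[^:,;()\[\]']+)")


def strip_internal_labels(newick: str) -> str:
    """Return the Newick string with internal node labels removed.

    Hypothetical helper for illustration only; assumes leaf names contain
    no parentheses or quotes.
    """
    return _INTERNAL_LABEL.sub(")", newick)


# Made-up tree with one quoted internal label and a labelled root.
tree = "((A:0.1,B:0.2)'100.0:g__Example':0.3,C:0.4)root;"
print(strip_internal_labels(tree))
```

Leaf names and branch lengths are preserved; only the text attached to internal nodes is dropped.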
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
.. _performance/Accuracy: | ||
|
||
|
||
Accuracy | ||
======== | ||
|
||
The similarity of GTDB-Tk v1 and v2 classifications was first assessed using 16,710 bacterial genomes from the GEMs dataset (Nayfach et al., 2021) that represent novel taxa relative to GTDB R07-RS207. | |
| Only 12 genomes (0.07%) did not have identical classifications between GTDB-Tk v1 and the divide-and-conquer approach used in GTDB-Tk v2. | ||
| Most of the incongruence was due to genomes being over-classified (6 genomes) or under-classified (4 genomes) by a single taxonomic rank. Only 2 genomes had conflicting taxonomic assignments, and both were relatively poor-quality genomes assigned as new classes in alternative phyla. | |
.. flat-table:: Table 1. Novelty of GEM genomes relative to GTDB R07-RS207 based on GTDB-Tk v1 classifications. | ||
:header-rows: 2 | ||
|
||
* - | ||
- | ||
- :cspan:`3` GTDB-Tk v2 classifications relative to GTDB-Tk v1 classifications | |
* - Taxon Novelty | |
- No. genomes | |
- Congruent | ||
- Conflict | ||
- Underclassified | ||
- Overclassified | ||
* - Novel phylum | ||
- 3 | ||
- 2 | ||
- 0 | ||
- 0 | ||
- 1 | ||
* - Novel class | ||
- 42 | ||
- 35 | ||
- 2 | ||
- 2 | ||
- 2 | ||
* - Novel order | ||
- 144 | ||
- 143 | ||
- 0 | ||
- 0 | ||
- 1 | ||
* - Novel family | ||
- 543 | ||
- 540 | ||
- 0 | ||
- 1 | ||
- 2 | ||
* - Novel genus | ||
- 3,222 | ||
- 3,219 | ||
- 0 | ||
- 1 | ||
- 0 | ||
* - Novel species | ||
- 12,756 | ||
- 12,756 | ||
- 0 | ||
- 0 | ||
- 0 |
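
The headline figures quoted above can be recomputed from the table. The sketch below copies the per-rank counts from Table 1 into a dictionary (the variable names are illustrative only) and derives the number and percentage of genomes whose classification differs between v1 and v2.

```python
# Counts copied from Table 1:
# (no. genomes, congruent, conflict, underclassified, overclassified)
rows = {
    "Novel phylum": (3, 2, 0, 0, 1),
    "Novel class": (42, 35, 2, 2, 2),
    "Novel order": (144, 143, 0, 0, 1),
    "Novel family": (543, 540, 0, 1, 2),
    "Novel genus": (3222, 3219, 0, 1, 0),
    "Novel species": (12756, 12756, 0, 0, 0),
}

total = sum(r[0] for r in rows.values())
conflict = sum(r[2] for r in rows.values())
under = sum(r[3] for r in rows.values())
over = sum(r[4] for r in rows.values())

# Genomes that are not classified identically by v1 and v2.
incongruent = conflict + under + over
pct = 100 * incongruent / total

print(f"{incongruent} of {total:,} genomes ({pct:.2f}%) differ")
print(f"conflict={conflict}, underclassified={under}, overclassified={over}")
```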
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
.. _performance: | ||
|
||
############################ | ||
Performance and Accuracy | ||
############################ | ||
|
||
|
||
.. toctree:: | ||
:maxdepth: 1 | ||
|
||
performance | ||
accuracy |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
.. _performance/Performance: | ||
|
||
Performance | ||
=========== | ||
|
||
|
||
| GTDB-Tk v2 runs 22% to 35% faster than v1 when processing 1,000 genomes with 1 to 64 CPUs (Fig. 1) and is >40% faster when processing 5,000 genomes using 32 CPUs (Fig. 2). | |
| The tests below were run on a machine with four AMD EPYC 7402 24-core processors and 512 GB of RAM. | |
.. plot:: | ||
|
||
import matplotlib.pyplot as plt | ||
from matplotlib.ticker import ScalarFormatter | ||
import numpy as np | ||
|
||
color_setup = ['#5a6855', 'red']  # colours for GTDB-Tk v2 and v1 | |
setup = color_setup | ||
cpus_list = [1, 8, 16, 32, 64] | ||
split_list = [2571, 457, 309, 230, 216] | ||
nosplit_list= [3980, 638, 398, 295, 279] | ||
values = ['1', '8', '16', '32','64'] | ||
|
||
fig, ax = plt.subplots() | ||
|
||
plt.ylim(200,4500) | ||
plt.xticks(cpus_list, values) | ||
|
||
ax.scatter(cpus_list, split_list, label="GTDB-Tk v2", marker="s", s=30, color=setup[0]) | ||
ax.plot(cpus_list, split_list,linestyle='dashed', color=setup[0]) | ||
ax.scatter(cpus_list, nosplit_list, label="GTDB-Tk v1",color=setup[1]) | ||
ax.plot(cpus_list, nosplit_list,linestyle=':', color=setup[1]) | ||
|
||
ax.set_yscale('log') | ||
ax.set_yticks([200,500,1000,2000,4000]) | ||
ax.yaxis.set_major_formatter(ScalarFormatter()) | ||
|
||
plt.ylabel('Runtime (min)') | ||
plt.xlabel('No. CPUs') | ||
plt.title('Fig.1: GTDB-Tk runtime for 1000 genomes') | ||
|
||
# show a legend on the plot | ||
plt.legend(loc=1, prop={'size': 12},frameon=False) | ||
plt.show() | ||
|
||
Fig. 1: Processing time for 1,000 randomly selected GEM MAGs for increasing numbers of CPUs. | ||
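
The 22% to 35% range can be checked against the runtimes plotted in Fig. 1. This sketch reuses the lists from the plot code and reads "faster" as the relative reduction in wall-clock time; the helper name ``runtime_reduction`` is hypothetical.

```python
# Runtimes (minutes) for 1,000 genomes, copied from the Fig. 1 plot code.
cpus_list = [1, 8, 16, 32, 64]
split_list = [2571, 457, 309, 230, 216]    # GTDB-Tk v2
nosplit_list = [3980, 638, 398, 295, 279]  # GTDB-Tk v1


def runtime_reduction(v1_minutes: float, v2_minutes: float) -> float:
    """Percentage of v1 runtime saved by v2 (hypothetical helper)."""
    return 100.0 * (v1_minutes - v2_minutes) / v1_minutes


reductions = [
    runtime_reduction(v1, v2) for v1, v2 in zip(nosplit_list, split_list)
]

for cpus, reduction in zip(cpus_list, reductions):
    print(f"{cpus:>2} CPUs: {reduction:.1f}% less runtime")
```

Under this reading the largest saving occurs with a single CPU and the saving levels off at about 22% from 16 CPUs upward.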
|
||
.. plot:: | ||
|
||
import matplotlib.pyplot as plt | |
color_setup = ['#5a6855', 'red']  # colours for GTDB-Tk v2 and v1 | |
setup = color_setup | ||
pool_size_list=[10, 50, 100, 200, 500, 1000, 2000, 5000] | ||
split_list=[88, 150, 160, 169, 195, 235, 312, 558] | ||
nosplit_list=[137, 151, 163, 180, 219, 280, 416, 934] | ||
values = [10,500,1000,1500,2000,2500,3000,3500,4000,4500,5000] | ||
|
||
plt.scatter(pool_size_list, split_list, label="GTDB-Tk v2" , marker="s", s=30, color=setup[0]) | ||
plt.plot(pool_size_list, split_list,linestyle='dashed', color=setup[0]) | ||
plt.scatter(pool_size_list, nosplit_list, label="GTDB-Tk v1",color=setup[1]) | ||
plt.plot(pool_size_list, nosplit_list,linestyle=':', color=setup[1]) | ||
|
||
# setting the tick positions on both axes | |
|
||
plt.xticks(values) | ||
plt.yticks([100,300,500,700,900]) | ||
|
||
# naming the axis | ||
plt.ylabel('Runtime (min)') | ||
plt.xlabel('No. genomes') | ||
# giving a title to my graph | ||
plt.title('Fig.2: GTDB-Tk runtime with 32 CPUs') | |
|
||
# show a legend on the plot | ||
plt.legend(loc=2, prop={'size': 12},frameon=False) | ||
|
||
# function to show the plot | ||
plt.show() | ||
Fig. 2: Processing time with 32 CPUs on increasing numbers of randomly selected GEM MAGs. |
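
As a rough check of the ">40%" figure, the sketch below tabulates the minutes saved by v2 at each pool size, using the values from the Fig. 2 plot code. The dictionary layout and variable names are illustrative only.

```python
# Runtimes (minutes) with 32 CPUs, copied from the Fig. 2 plot code.
pool_size_list = [10, 50, 100, 200, 500, 1000, 2000, 5000]
split_list = [88, 150, 160, 169, 195, 235, 312, 558]    # GTDB-Tk v2
nosplit_list = [137, 151, 163, 180, 219, 280, 416, 934]  # GTDB-Tk v1

# Map each pool size to (minutes saved, percentage of v1 runtime saved).
savings = {
    n: (v1 - v2, 100.0 * (v1 - v2) / v1)
    for n, v1, v2 in zip(pool_size_list, nosplit_list, split_list)
}

for n, (minutes, percent) in savings.items():
    print(f"{n:>5} genomes: {minutes:>3} min saved ({percent:.1f}%)")
```

From 50 genomes upward the saving grows steadily with the number of genomes in these measurements, reaching just over 40% at 5,000 genomes.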