2021.01 (qiita-spots#3065)

sjanssen2 · Jan 22, 2021 · e6e5d80
1 parent 30d7f5e
commit e6e5d80
Showing 11 changed files with 1,913 additions and 32 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,6 +1,18 @@
 # Qiita changelog
 
 
+Version 2021.01
+---------------
+
+* Moved the qiita repo from biocore to [qiita-spots](https://github.com/qiita-spots/qiita/).
+* Created the [Qiita portal for the Cancer Microbiome](https://qiita.ucsd.edu/cancer/).
+* The EBI-ENA code now verifies that the sample information file has a description column; this wasn't previously required because it was automatically prefilled by the QIIME 1 mapping file.
+* It is now possible to download the per-preparation sample information file and the sample-preparation summary.
+* Added a faster metagenomic/metatranscriptomic adaptor and host removal step based on fastp and minimap2. The previous version, using atropos and bowtie2 for QC host filtering, is now deprecated.
+* Added qiime2.2020.11 to the system, which updated these plugins: qp-qiime2, qtp-biom, qtp-diversity, qtp-visualization.
+* Added [WoL](https://biocore.github.io/wol/) tree for phylogenetic analyses (/projects/wol/release/databases/qiime2/phylogeny.qza) with per-genome WoL artifacts.
+* Fixed the following issues: [#3060](https://github.com/qiita-spots/qiita/issues/3060), [#3049](https://github.com/qiita-spots/qiita/issues/3049), and [#2751](https://github.com/qiita-spots/qiita/issues/2751).
+
 Version 2020.11
 ---------------
 

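The EBI-ENA changelog entry above describes a check that the sample information file has a description column. As a rough illustration only (this is not the Qiita implementation), the sketch below tests a tab-separated file for that column. The function name `has_description_column`, the tab-separated layout with a header row, and the case-insensitive comparison are all assumptions made for the example.

```python
import csv
import io


def has_description_column(handle, column="description"):
    """Return True if the header row of a TSV file contains `column`."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])
    # Compare case-insensitively and ignore surrounding whitespace.
    return column in {name.strip().lower() for name in header}


# Two tiny in-memory files: one with the column, one without.
good = io.StringIO("sample_name\tdescription\nS1\tgut sample\n")
bad = io.StringIO("sample_name\thost\nS1\thuman\n")
print(has_description_column(good), has_description_column(bad))
```

A check like this would run before submission, so that a missing column is reported early rather than surfacing as a failure on the EBI-ENA side.
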
diff --git a/README.rst b/README.rst
@@ -1,7 +1,7 @@
 Qiita (canonically pronounced *cheetah*)
 ========================================
 
-|Build Status| |Coverage Status| |Gitter|
+|Build Status| |Coverage Status|
 
 Advances in sequencing, proteomics, transcriptomics and metabolomics are giving
 us new insights into the microbial world and dramatically improving our ability

diff --git a/logos/qiita_cancer.ai b/logos/qiita_cancer.ai
diff --git a/qiita_core/__init__.py b/qiita_core/__init__.py
@@ -6,4 +6,4 @@
 # The full license is in the file LICENSE, distributed with this software.
 # -----------------------------------------------------------------------------
 
-__version__ = "2020.11"
+__version__ = "2021.01"
diff --git a/qiita_db/__init__.py b/qiita_db/__init__.py
@@ -27,7 +27,7 @@
 from . import user
 from . import processing_job
 
-__version__ = "2020.11"
+__version__ = "2021.01"
 
 __all__ = ["analysis", "artifact",  "archive", "base", "commands",
            "environment_manager", "exceptions", "investigation", "logger",

diff --git a/qiita_pet/__init__.py b/qiita_pet/__init__.py
@@ -6,4 +6,4 @@
 # The full license is in the file LICENSE, distributed with this software.
 # -----------------------------------------------------------------------------
 
-__version__ = "2020.11"
+__version__ = "2021.01"
diff --git a/qiita_pet/handlers/api_proxy/__init__.py b/qiita_pet/handlers/api_proxy/__init__.py
@@ -38,7 +38,7 @@
 from .user import (user_jobs_get_req)
 from .util import check_access, check_fp
 
-__version__ = "2020.11"
+__version__ = "2021.01"
 
 __all__ = ['prep_template_summary_get_req', 'data_types_get_req',
            'study_get_req', 'sample_template_filepaths_get_req',

diff --git a/qiita_pet/support_files/doc/source/processingdata/processing-recommendations.rst b/qiita_pet/support_files/doc/source/processingdata/processing-recommendations.rst
@@ -25,9 +25,11 @@ gene data: sequence clustering and sequence deblur.
 Sequencing deblur (preferred)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-For this we use `deblur <https://github.com/biocore/deblur>`_. Here 2 BIOM tables are generated by default: fina.biom and final.only-16s.biom. The former is the full biom table, which can be used with any target gene and wetlab work;
-the latter is the trimmed version to those sequences that match Greengenes at 80% similarity, a really basic and naive filtering. Each of those BIOM tables, is accompanied by a FASTA that contains
-the representative sequences. The OTU IDs are given by the unique sequence.
+For this we use `deblur <https://github.com/biocore/deblur>`_. By default, it generates 2 BIOM tables:
+`deblur final table` and `deblur reference hit table`. The former is the full BIOM table, which can be used with any
+target gene and wetlab work; the latter is restricted to those sequences that match Greengenes at 80% similarity, a
+very basic and naive filter. Each of those BIOM tables is accompanied by a FASTA file that contains the representative sequences.
+The OTU IDs are given by the unique sequence.
 
 Note that deblur needs all sequences to be trimmed at the same length, thus the recommended pipeline is to trim everything at 150bp and then deblur.
 
@@ -49,25 +51,28 @@ Below you will find more information about each of these options.
 
 The current workflow is as follows:
 
-#. Removal of adapter sequence and quality control: `Atropos <https://github.com/jdidion/atropos/>`_
-#. Removal of host contamination using `Bowtie2 <http://bowtie-bio.sourceforge.net/bowtie2/index.shtml>`_
+#. A single per-sample step that performs adapter removal (via `fastp <https://academic.oup.com/bioinformatics/article/34/17/i884/5093234>`_) and host filtering (via `minimap2 <https://academic.oup.com/bioinformatics/article/34/18/3094/4994778>`_); more information below.
 #. Taxonomy profiling using bowtie2 as an aligner and two different reference databases; see sections below
 
 Note that we recommend only uploading sequences that have already been through QC and human sequence removal. However, we
-recommend that all sequence files go through adapter and quality control within the system to ensure they are ready for
-subsequent analyses. Currently, the command removes adaptor sequences (only KAPA HyperPlus with iTru, which are compatible
-with Illumina TruSeq).
-
-Sequences generated with an instrument that relies on two-color chemistry (NextSeq, NovaSeq), need to undergo an additional
-quality control step. This step removes trailing G nucleotides which signify that the instrument has finished capturing new
-information. Per Illumina's specification, NovaSeq instruments have 3 quality levels (11, 25 and 37), and
-high-quality trailing Gs need to be removed. Typically this can be done in conjunction with adapter removal, with Atropos
-we recommend using the `--nextseq-trim 30` parameter.
-
-For host removal we currently support *Danio Rerio* (zebrafish), *Drosophila Melanogaster* (fruit fly), *Mus Musculus* (mouse),
-*Rattus Norvegicus* (rat), and Enterobacteria phage phiX174 (the Illumina spike-in control).
+recommend that all sequence files go through adapter and host filtering within the system to ensure they are ready for
+subsequent meta-analyses. Currently, the `fastp` command is set to auto-detect adapters, so it works with any wetlab
+protocol. For your convenience, we provide the following host references:
+
+- auto-detect adapters and artifacts + phix filtering: This is a `deblur artifacts <https://github.com/biocore/deblur/blob/master/deblur/support_files/artifacts.fa>`_ reference, mainly for debugging and testing
+- auto-detect adapters and cheetah + phix filtering
+- auto-detect adapters and cow + phix filtering
+- auto-detect adapters and hamster + phix filtering
+- auto-detect adapters and horse + phix filtering
+- auto-detect adapters and merge_genomes + phix filtering: the combined genomes of cheetah, cow, hamster, horse, human, mouse, pig, rabbit, and rat
+- auto-detect adapters and mouse + phix filtering
+- auto-detect adapters and pig + phix filtering
+- auto-detect adapters and rabbit + phix filtering
+- auto-detect adapters and rat + phix filtering
+- auto-detect adapters only filtering [not recommended]
 
 Note that the command produces up to 6 output artifacts based on the aligner and database selected:
+
 - Alignment Profile: contains the raw alignment file and the no rank classification BIOM table
 - Taxonomic Prediction - phylum: contains the phylum level taxonomic predictions BIOM table
 - Taxonomic Prediction - genus: contains the genus level taxonomic predictions BIOM table
@@ -186,19 +191,18 @@ Note that some of these are legacy option but not available for new processing.
 Metatranscriptome processing
 ----------------------------
 
+Qiita currently has one active Metatranscriptome data analysis pipeline, as follows:
+
+#. Ribosomal read filtering via `SortMeRNA <https://pubmed.ncbi.nlm.nih.gov/23071270/>`_; details below. This produces a `Ribosomal reads` and a `Non-ribosomal reads` artifact.
+#. Taxonomic profiling via Woltka; for more information see details above.
+
 Sample processing guidelines for metatranscriptomic data
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-Total community RNA extracted from samples contain both coding and non-coding RNA. Typically, ribosomal RNA make up
->90% of the library if not depleted prior to library construction. Ribosomal depletion allows for mRNA enrichment. Even if
-you are dealing with ribosomal RNA subtracted cDNA libraries, there will be some
-residual ribosomal RNA in the libraries that you want to remove/separate from the non ribosomal RNA sequences.
-
 Ribosomal read filtering
 ^^^^^^^^^^^^^^^^^^^^^^^^
 
-`SortMeRNA <https://bioinfo.lifl.fr/RNA/sortmerna/>`_
-is used for removal of ribosomal reads from quality filtered Metatranscriptome data
+`SortMeRNA <https://pubmed.ncbi.nlm.nih.gov/23071270/>`_ is used to remove ribosomal reads from quality-filtered metatranscriptome data.
 
 Latest SortMeRNA version: v2.1
 
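The deblur section in this file notes that all sequences must be trimmed to the same length, with 150bp as the recommended value. A minimal sketch of that trimming step is shown below. The helper `trim_to_length` is hypothetical, and dropping reads shorter than the target length is an assumption of this example, not a statement about how Qiita's trimming command behaves.

```python
def trim_to_length(sequences, length=150):
    """Trim every sequence to `length` bases, dropping those that are shorter."""
    return [seq[:length] for seq in sequences if len(seq) >= length]


# One read longer than 150bp, one exactly 150bp, one too short.
reads = ["A" * 151, "C" * 150, "G" * 100]
trimmed = trim_to_length(reads)
print([len(seq) for seq in trimmed])  # [150, 150]
```

After this step every remaining read has the same length, which is the precondition deblur needs.
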

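The documentation change above describes a single step that chains adapter removal (fastp) and host filtering (minimap2). One way such a step could be assembled is sketched below as a Python helper that builds a shell pipeline string. This is an illustration under assumptions, not the command the plugin actually runs: the helper name `build_host_filter_command`, the file names, the thread count, and the use of `samtools fastq` to write the unmapped (non-host) pairs are all choices made for the example.

```python
import shlex


def build_host_filter_command(fwd, rev, reference, out_fwd, out_rev, threads=4):
    """Build a fastp | minimap2 | samtools shell pipeline (illustrative only)."""
    q = shlex.quote
    # fastp: auto-detect adapters for paired-end data and stream the
    # filtered, interleaved reads to stdout.
    fastp = (
        f"fastp -i {q(fwd)} -I {q(rev)} -w {threads} "
        "--detect_adapter_for_pe --stdout"
    )
    # minimap2: align the short reads read from stdin ("-") against the
    # host reference and emit SAM.
    minimap2 = f"minimap2 -a -x sr -t {threads} {q(reference)} -"
    # samtools: keep pairs where both mates are unmapped (-f 12) and skip
    # secondary alignments (-F 256); these are the non-host reads.
    samtools = (
        f"samtools fastq -@ {threads} -f 12 -F 256 "
        f"-1 {q(out_fwd)} -2 {q(out_rev)}"
    )
    return " | ".join([fastp, minimap2, samtools])


print(build_host_filter_command(
    "R1.fastq.gz", "R2.fastq.gz", "host.mmi",
    "R1.filtered.fastq.gz", "R2.filtered.fastq.gz"))
```

Streaming between the three tools avoids writing intermediate files, which is one plausible reason a combined step would be faster than separate adapter-trimming and host-filtering passes.
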
diff --git a/qiita_ware/__init__.py b/qiita_ware/__init__.py
@@ -6,4 +6,4 @@
 # The full license is in the file LICENSE, distributed with this software.
 # -----------------------------------------------------------------------------
 
-__version__ = "2020.11"
+__version__ = "2021.01"
diff --git a/scripts/qiita-auto-processing b/scripts/qiita-auto-processing
@@ -46,7 +46,7 @@ full_pipelines = [
      'steps': [
          {'previous-step': None,
           'plugin': 'qp-meta',
-          'version': '2020.11',
+          'version': '2021.01',
           'cmd_name': 'Atropos v1.1.24',
           'input_name': 'input',
           'ignore_parameters': ['Number of threads used'],

diff --git a/setup.py b/setup.py
@@ -10,7 +10,7 @@
 from setuptools import setup
 from glob import glob
 
-__version__ = "2020.11"
+__version__ = "2021.01"
 
 
 classes = """
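
This commit bumps the same `__version__` string in six files. A small consistency check, sketched below, could catch a file that was missed during a release. The file list mirrors the paths changed in this diff; the helper names (`read_version`, `collect_versions`, `versions_agree`) are hypothetical and not part of Qiita. The plugin version in `scripts/qiita-auto-processing` uses a different syntax and is not covered here.

```python
import re
from pathlib import Path

# Matches lines such as: __version__ = "2021.01"
VERSION_RE = re.compile(r'^__version__\s*=\s*["\']([^"\']+)["\']', re.MULTILINE)

# Files that carry a `__version__` assignment in this commit.
VERSION_FILES = [
    "qiita_core/__init__.py",
    "qiita_db/__init__.py",
    "qiita_pet/__init__.py",
    "qiita_pet/handlers/api_proxy/__init__.py",
    "qiita_ware/__init__.py",
    "setup.py",
]


def read_version(text):
    """Return the version string assigned to __version__, or None."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def collect_versions(root, files=VERSION_FILES):
    """Map each existing file under `root` to the version it declares."""
    root = Path(root)
    return {
        name: read_version((root / name).read_text())
        for name in files
        if (root / name).exists()
    }


def versions_agree(versions):
    """True when at least one file was found and all declare the same version."""
    return len(set(versions.values())) == 1


print(read_version('__version__ = "2021.01"\n'))  # 2021.01
```

Running `versions_agree(collect_versions("."))` from the repository root would then return `False` if any listed file still carried the previous release string.
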