Commit
0 parents
commit b0bb618
Showing 3 changed files with 280 additions and 0 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
|
||
testdata/illumina_reads_5_R1_trimmed.fq | ||
testdata/illumina_reads_5_R2_trimmed.fq | ||
testdata/maxbin/maxbin_bin_out.contig.tmp | ||
testdata/maxbin/maxbin_bin_out.contig.tmp.abund1 | ||
testdata/maxbin/maxbin_bin_out.contig.tmp.frag.faa | ||
testdata/maxbin/maxbin_bin_out.contig.tmp.hmmout | ||
testdata/maxbin/maxbin_bin_out.contig.tmp.hmmout.FINISH | ||
testdata/maxbin/maxbin_bin_out.log | ||
testdata/maxbin/maxbin_bin_out.tooshort | ||
testdata/maxbin/scaffolds.fasta.counts | ||
testdata/maxbin/scaffolds.fasta.idxstats | ||
testdata/metabat/metabat_depth.txt | ||
testdata/metabat/metabat_paired.txt | ||
testdata/scaffolds.fasta | ||
testdata/scaffolds.fasta.bam | ||
testdata/scaffolds.fasta.bam.bai | ||
testdata/scaffolds.fasta.index.1.bt2 | ||
testdata/scaffolds.fasta.index.2.bt2 | ||
testdata/scaffolds.fasta.index.3.bt2 | ||
testdata/scaffolds.fasta.index.4.bt2 | ||
testdata/scaffolds.fasta.index.rev.1.bt2 | ||
testdata/scaffolds.fasta.index.rev.2.bt2 | ||
testdata/scaffolds.fasta.sam |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,106 @@ | ||
binner | ||
========= | ||
|
||
binner is a wrapper script to run several metagenome binning programs using Docker. | ||
|
||
|
||
Supported Binners | ||
=========== | ||
|
||
Currently binner supports these metagenome binning programs: | ||
|
||
- [CONCOCT](https://github.com/BinPro/CONCOCT) | ||
- [MaxBin2](https://sourceforge.net/projects/maxbin2/) | ||
- [MetaBat](https://bitbucket.org/berkeleylab/metabat/src/master/) | ||
- [blobtools](https://github.com/DRL/blobtools) | ||
|
||
|
||
REQUIREMENTS | ||
============ | ||
|
||
- macOS or another Unix-like operating system | ||
- [Docker](https://www.docker.com/get-started) | ||
|
||
|
||
INSTALLATION | ||
======= | ||
Assuming Docker is installed and configured properly, it is straightforward to install binner: | ||
|
||
``` | ||
$ git clone https://github.com/reslp/binner.git | ||
$ cd binner | ||
$ chmod +x binner | ||
$ ./binner -h | ||
Welcome to binner. A script to quickly run metagenomic binning software using Docker. | ||
Usage: ./binner [-v] [-a <assembly_file>] [-f <read_file1>] [-r <read_file2>] [-m maxbin,metabat,blobtools,concoct] [-t nthreads] [[-b /path/to/diamonddb -p /path/to/prot.accession2taxid]] | ||
Options: | ||
-a <assembly_file> Assembly file in FASTA format (needs to be in current folder) | ||
-f <read_file1> Forward read file in FASTQ format (can be gzipped) | ||
-r <read_file2> Reverse read file in FASTQ format (can be gzipped) | ||
-m <maxbin,metabat,blobtools,concoct> specify binning software to run. | ||
Separate multiple options by a , (e.g. -m maxbin,blobtools). | ||
-t number of threads for multi threaded parts | ||
-v Display program version | ||
Options specific to blobtools: | ||
Options needed when blobtools should be run. The blobtools container used here uses diamond instead of blast to increase speed. | ||
-b full (absolute) path to diamond database | ||
-p full (absolute) path to the prot.accession2taxid file provided by NCBI | ||
``` | ||
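The value passed to `-m` is a comma-separated list, and the binner script shown later in this commit decides which tools to run by testing whether each tool name occurs in that string. The sketch below reproduces that substring test in isolation; the function name `wants_tool` and the variable `selected` are illustrative and not part of binner.

```shell
#!/bin/sh
# Hypothetical helper, not part of binner: succeeds if the tool name ($2)
# occurs anywhere in the -m style list ($1), mirroring a substring match.
wants_tool() {
    case "$1" in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

selected="maxbin,blobtools"   # example value as it would be passed via -m
for tool in maxbin metabat concoct blobtools; do
    if wants_tool "$selected" "$tool"; then
        echo "would run:  $tool"
    else
        echo "would skip: $tool"
    fi
done
```

Because this is a plain substring match, the order of the names in the list does not matter, and a misspelled tool name is silently ignored rather than reported.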
|
||
|
||
|
||
|
||
USAGE | ||
======== | ||
|
||
binner can run several binning programs. Each binner ships in its own Docker container, so it is not necessary to install them individually. Most metagenomic binners need an assembly and the read files that were used to create it. binner expects the assembly in FASTA format and the read files in FASTQ format. The assembly and the reads must be in the same directory, and binner should be executed from that directory. | ||
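Since the working directory is what gets mounted into the containers, a quick check that all input files really sit in the current directory can save a failed run. The following is only a sketch; `check_inputs` is a hypothetical helper and not something binner provides.

```shell
#!/bin/sh
# Hypothetical pre-flight helper, not part of binner: verifies that every
# file name given as an argument exists as a regular file in the current
# directory. Prints the missing ones to stderr and returns non-zero.
check_inputs() {
    status=0
    for f in "$@"; do
        if [ ! -f "./$f" ]; then
            echo "not found in $(pwd): $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

It could be combined with a binner call, for example `check_inputs metagenome.fasta forward_readfile.fq reverse_readfile.fq && binner -a metagenome.fasta -f forward_readfile.fq -r reverse_readfile.fq -m metabat`.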
|
||
**Running MetaBat with binner:** | ||
|
||
```$ binner -a metagenome.fasta -f forward_readfile.fq -r reverse_readfile.fq -m metabat``` | ||
|
||
**Running MaxBin with binner:** | ||
|
||
```$ binner -a metagenome.fasta -f forward_readfile.fq -r reverse_readfile.fq -m maxbin``` | ||
|
||
**Running CONCOCT with binner:** | ||
|
||
```$ binner -a metagenome.fasta -f forward_readfile.fq -r reverse_readfile.fq -m concoct``` | ||
|
||
**Running blobtools with binner:** | ||
|
||
Blobtools requires blast results to determine the taxonomic identity (via NCBI taxids) of individual contigs in the assembly. binner creates these results with diamond blastx. However, you will need a diamond-formatted sequence database (typically the NCBI nr database). If you don't already have one, you can set it up like this: | ||
|
||
``` | ||
$ wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz | ||
$ docker run --rm -v $(pwd):/data/ reslp/diamond diamond makedb --in /data/nr.gz -d /data/nr | ||
``` | ||
|
||
Building the database with this command has the advantage that it is compatible with the diamond Docker container used by binner, which is reslp/diamond. | ||
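`diamond makedb -d nr` writes a file called `nr.dmnd`, while the binner script in this commit checks for the file `$DIAMONDDB.dmnd`. The value given to `-b` therefore has to be the database path without the `.dmnd` extension. A small sketch of normalising a path accordingly (the helper name `diamond_db_prefix` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper, not part of binner: strips a trailing ".dmnd" so the
# result can be passed to -b, which expects the path without the extension.
diamond_db_prefix() {
    printf '%s\n' "${1%.dmnd}"
}

diamond_db_prefix /path/to/nr.dmnd   # prints /path/to/nr
```

A path that already lacks the extension is returned unchanged.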
|
||
Because diamond cannot output taxids directly, binner maps the ids retrieved by diamond blastx to NCBI taxids. This is done using the file `prot.accession2taxid` provided by NCBI. If you don't have this file already, download it by running the following commands: | ||
|
||
``` | ||
$ wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz | ||
$ gunzip prot.accession2taxid.gz | ||
``` | ||
|
||
```$ binner -a metagenome.fasta -f forward_readfile.fq -r reverse_readfile.fq -m blobtools -b /path/to/diamonddb -p /path/to/prot.accession2taxid``` | ||
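To illustrate the accession-to-taxid mapping described in this section, here is a rough sketch using awk. It is not the code inside the `reslp/get_taxids` container, whose implementation is not shown in this commit. It assumes that the diamond matches are in the default 12-column tabular format (query id, subject id, ..., bit score), that `prot.accession2taxid` has a header line followed by the tab-separated columns accession, accession.version, taxid and gi, and that a three-column output of contig, taxid and score is wanted. The function name `map_taxids` is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch, not the reslp/get_taxids implementation.
# $1: prot.accession2taxid (header + accession, accession.version, taxid, gi)
# $2: diamond tabular matches (subject id in column 2, bit score in column 12)
# Output: contig id, taxid, bit score (tab-separated); hits whose subject id
# has no entry in the mapping file are dropped.
map_taxids() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR { if (FNR > 1) taxid[$2] = $3; next }   # first file: build lookup, skip header
        ($2 in taxid) { print $1, taxid[$2], $12 }        # second file: annotate hits
    ' "$1" "$2"
}
```

The sketch holds the whole mapping table in memory, so it is only practical for a subset of `prot.accession2taxid`; the full file is very large.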
|
||
|
||
|
||
|
||
COPYRIGHT AND LICENSE | ||
===================== | ||
|
||
Copyright (C) 2019 Philipp Resl | ||
|
||
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. | ||
|
||
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. | ||
|
||
You should have received a copy of the GNU General Public License along with this program in the file LICENSE. If not, see http://www.gnu.org/licenses/. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,150 @@ | ||
#!/bin/bash | ||
# written by Philipp Resl, Oct. 2019, github.com/reslp/binner | ||
|
||
usage() { | ||
echo "Welcome to binner. A script to quickly run metagenomic binning software using Docker." | ||
echo | ||
echo "Usage: $0 [-v] [-a <assembly_file>] [-f <read_file1>] [-r <read_file2>] [-m maxbin,metabat,blobtools,concoct] [-t nthreads] [[-b /path/to/diamonddb -p /path/to/prot.accession2taxid]]" | ||
echo | ||
echo "Options:" | ||
echo " -a <assembly_file> Assembly file in FASTA format (needs to be in current folder)" | ||
echo " -f <read_file1> Forward read file in FASTQ format (can be gzipped)" | ||
echo " -r <read_file2> Reverse read file in FASTQ format (can be gzipped)" | ||
echo " -m <maxbin,metabat,blobtools,concoct> specify binning software to run." | ||
echo " Separate multiple options by a , (e.g. -m maxbin,blobtools)." | ||
echo " -t number of threads for multi threaded parts" | ||
echo " -v Display program version" | ||
echo | ||
echo "Options specific to blobtools:" | ||
echo " Options needed when blobtools should be run. The blobtools container used here uses diamond instead of blast to increase speed." | ||
echo " -b full (absolute) path to diamond database" | ||
echo " -p full (absolute) path to the prot.accession2taxid file provided by NCBI" | ||
exit 1; } 1>&2 # the redirection on the function sends the whole usage text to stderr | ||
|
||
version() { | ||
echo "binner version 0.1" | ||
exit 0 | ||
} | ||
|
||
while getopts ":t:m:a:f:r:vb:p:" option; | ||
do | ||
case "${option}" | ||
in | ||
a) ASSEMBLY=${OPTARG};; | ||
f) R1=${OPTARG};; | ||
r) R2=${OPTARG};; | ||
v) version;; | ||
m) OPTIONS=${OPTARG};; | ||
t) THREADS=${OPTARG};; | ||
b) DIAMONDDB=${OPTARG};; | ||
p) PROTID=${OPTARG};; | ||
*) usage;; | ||
?) usage;; | ||
esac | ||
done | ||
if [ $OPTIND -eq 1 ]; then usage; fi | ||
THREADS=${THREADS:-1} # fall back to a single thread when -t is not given | ||
#echo $OPTIONS | ||
|
||
# this needs to be set because on Linux docker created files will be owned by root by default. | ||
unset DOCKER_USER | ||
if [[ "$OSTYPE" == "linux-gnu" ]]; then | ||
DOCKER_USER="--user $(id -u):$(id -g)" | ||
elif [[ "$OSTYPE" == "darwin"* ]]; then #nothing to be done on MacOS | ||
DOCKER_USER="" | ||
fi | ||
|
||
if [[ ! -f "$ASSEMBLY".index.1.bt2 ]]; then | ||
echo "(binner) No Bowtie2 index file found. Creating Bowtie2 index..." | ||
docker run -t --rm $DOCKER_USER -v $(pwd):/data/ reslp/bowtie2 bowtie2-build /data/$ASSEMBLY /data/$ASSEMBLY.index -q | ||
fi | ||
|
||
if [[ ! -f "$ASSEMBLY".bam ]]; then | ||
echo "(binner) No BAM file found. Will perform read mapping with bowtie2 ..." | ||
docker run -t --rm $DOCKER_USER -v $(pwd):/data/ reslp/bowtie2 bowtie2 -p $THREADS -q --phred33 --fr -x /data/$ASSEMBLY.index -1 /data/$R1 -2 /data/$R2 -S /data/$ASSEMBLY.sam --quiet | ||
echo "(binner) Converting SAM to BAM ..." | ||
docker run -t --rm $DOCKER_USER -v $(pwd):/data/ reslp/samtools samtools view -bS /data/$ASSEMBLY.sam -o /data/$ASSEMBLY.bam | ||
docker run -t --rm $DOCKER_USER -v $(pwd):/data/ reslp/samtools samtools sort -o /data/$ASSEMBLY.bam /data/$ASSEMBLY.bam | ||
echo "(binner) Will index BAM file ..." | ||
docker run -t --rm $DOCKER_USER -v $(pwd):/data/ reslp/samtools samtools index /data/$ASSEMBLY.bam | ||
fi | ||
|
||
if [[ $OPTIONS == *"maxbin"* ]]; then | ||
echo "(binner) Will run MaxBin" | ||
mkdir -p maxbin | ||
# These docker commands are not optimal because they create files as the root user. | ||
# Passing UID and GID doesn't work in this case because of the way maxbin is set up. | ||
# I have not yet found a way around this. | ||
# no -t here: a pseudo-TTY would add carriage returns to the redirected output | ||
docker run -v $(pwd):/data/ reslp/samtools samtools idxstats /data/$ASSEMBLY.bam > maxbin/$ASSEMBLY.idxstats | ||
cut -f1,3 maxbin/$ASSEMBLY.idxstats > maxbin/$ASSEMBLY.counts | ||
docker run -t -v $(pwd):/data/ reslp/maxbin run_MaxBin.pl -contig /data/$ASSEMBLY -abund /data/maxbin/$ASSEMBLY.counts -thread $THREADS -out /data/maxbin/maxbin_bin_out | ||
fi | ||
|
||
if [[ $OPTIONS == *"metabat"* ]]; then | ||
echo "(binner) Will run MetaBat" | ||
mkdir -p metabat | ||
docker run -t $DOCKER_USER -v $(pwd):/data/ --rm reslp/metabat jgi_summarize_bam_contig_depths --outputDepth /data/metabat/metabat_depth.txt --pairedContigs /data/metabat/metabat_paired.txt /data/$ASSEMBLY.bam | ||
docker run -t $DOCKER_USER -v $(pwd):/data/ --rm reslp/metabat metabat2 -i /data/$ASSEMBLY -a /data/metabat/metabat_depth.txt -o metabat --sensitive -v | ||
fi | ||
|
||
if [[ $OPTIONS == *"concoct"* ]]; then | ||
echo "(binner) Will run concoct" | ||
mkdir -p concoct | ||
echo "(binner) Digesting FASTA file ..." | ||
# no -t on the redirected commands below: a pseudo-TTY would add carriage returns to the output files | ||
docker run $DOCKER_USER -v $(pwd):/data/ --rm reslp/concoct cut_up_fasta.py /data/"$ASSEMBLY" -c 10000 -o 0 --merge_last -b /data/"$ASSEMBLY"_contigs_10K.bed > "$ASSEMBLY"_contigs_10K.fa | ||
echo "(binner) Creating coverage table ..." | ||
docker run $DOCKER_USER -v $(pwd):/data/ --rm reslp/concoct concoct_coverage_table.py /data/"$ASSEMBLY"_contigs_10K.bed /data/"$ASSEMBLY".bam > concoct_coverage_table.tsv | ||
echo "(binner) running concoct ..." | ||
docker run -t $DOCKER_USER -v $(pwd):/data/ --rm reslp/concoct concoct --composition_file /data/"$ASSEMBLY"_contigs_10K.fa --coverage_file /data/concoct_coverage_table.tsv -b /data/concoct/"$ASSEMBLY"_concoct --threads $THREADS | ||
echo "(binner) Merging results ..." | ||
docker run $DOCKER_USER -v $(pwd):/data/ --rm reslp/concoct merge_cutup_clustering.py /data/concoct/"$ASSEMBLY"_concoct_clustering_gt1000.csv > concoct/"$ASSEMBLY"_concoct_clustering_merged.csv | ||
echo "(binner) Extract FASTA chunks ..." | ||
mkdir -p concoct/bins | ||
docker run -t -v $(pwd):/data/ --rm reslp/concoct extract_fasta_bins.py /data/"$ASSEMBLY" /data/concoct/"$ASSEMBLY"_concoct_clustering_merged.csv --output_path /data/concoct/bins | ||
cd concoct/bins | ||
rename "s/^/"$ASSEMBLY"_concoct_/" *.fa | ||
cd ../.. | ||
fi | ||
|
||
if [[ $OPTIONS == *"blobtools"* ]]; then | ||
echo "(binner) Will prepare for blobtools" | ||
if [ -z $DIAMONDDB ]; then | ||
echo "(binner) Error: Path to diamond db not set." | ||
exit 1 | ||
fi | ||
if [ -z $PROTID ]; then | ||
echo "(binner) Error: Path to prot.accession2taxid not set." | ||
exit 1 | ||
fi | ||
if [ ! -f $PROTID ]; then | ||
echo "(binner) Error: $PROTID does not exist. Is the path correct?" | ||
exit 1 | ||
fi | ||
if [ ! -f $DIAMONDDB".dmnd" ]; then | ||
echo "(binner) Error: $DIAMONDDB.dmnd does not exist. Is the path correct?" | ||
exit 1 | ||
fi | ||
|
||
echo "(binner) location of diamond db: "$DIAMONDDB | ||
echo "(binner) location of prot.accession2taxid: "$PROTID | ||
|
||
mkdir -p blobtools | ||
|
||
if [ ! -f blobtools/"$ASSEMBLY"_diamond_matches ]; then | ||
echo "(binner) No diamond results found. Will therefore run diamond" | ||
docker run -t $DOCKER_USER -v $(pwd):/data/ -v $(dirname $DIAMONDDB):/opt/database/ --rm reslp/diamond diamond blastx -d /opt/database/$(basename $DIAMONDDB) -q /data/$ASSEMBLY -o /data/blobtools/"$ASSEMBLY"_diamond_matches -p $THREADS | ||
fi | ||
echo "(binner) reformatting diamond results for use with blobtools..." | ||
docker run $DOCKER_USER -v $(pwd):/data/ -v $(dirname $PROTID):/opt/mapping --rm reslp/get_taxids /opt/mapping/prot.accession2taxid /data/blobtools/"$ASSEMBLY"_diamond_matches > blobtools/"$ASSEMBLY"_diamond_matches_formatted | ||
echo "(binner) Running blobtools" | ||
docker run -t $DOCKER_USER -v $(pwd):/data/ --rm reslp/blobtools create -i /data/$ASSEMBLY -b /data/$ASSEMBLY.bam -t /data/blobtools/"$ASSEMBLY"_diamond_matches_formatted -o /data/blobtools/"$ASSEMBLY"_blobtools | ||
docker run -t $DOCKER_USER -v $(pwd):/data/ --rm reslp/blobtools view -i /data/blobtools/"$ASSEMBLY"_blobtools.blobDB.json -o /data/blobtools/ | ||
docker run -t $DOCKER_USER -v $(pwd):/data/ --rm reslp/blobtools plot -i /data/blobtools/"$ASSEMBLY"_blobtools.blobDB.json -o /data/blobtools/ | ||
echo "(binner) extracting contigs from blobtools" | ||
cp $ASSEMBLY blobtools/ | ||
cd blobtools | ||
docker run -t $DOCKER_USER -v $(pwd):/data/ --rm reslp/extract_contigs /data/"$ASSEMBLY" /data/"$ASSEMBLY"_blobtools.blobDB.table.txt | ||
rm $ASSEMBLY | ||
cd .. | ||
fi | ||
|