binner initial commit
reslp committed Oct 3, 2019
0 parents commit b0bb618
Showing 3 changed files with 280 additions and 0 deletions.
24 changes: 24 additions & 0 deletions .gitignore
@@ -0,0 +1,24 @@

testdata/illumina_reads_5_R1_trimmed.fq
testdata/illumina_reads_5_R2_trimmed.fq
testdata/maxbin/maxbin_bin_out.contig.tmp
testdata/maxbin/maxbin_bin_out.contig.tmp.abund1
testdata/maxbin/maxbin_bin_out.contig.tmp.frag.faa
testdata/maxbin/maxbin_bin_out.contig.tmp.hmmout
testdata/maxbin/maxbin_bin_out.contig.tmp.hmmout.FINISH
testdata/maxbin/maxbin_bin_out.log
testdata/maxbin/maxbin_bin_out.tooshort
testdata/maxbin/scaffolds.fasta.counts
testdata/maxbin/scaffolds.fasta.idxstats
testdata/metabat/metabat_depth.txt
testdata/metabat/metabat_paired.txt
testdata/scaffolds.fasta
testdata/scaffolds.fasta.bam
testdata/scaffolds.fasta.bam.bai
testdata/scaffolds.fasta.index.1.bt2
testdata/scaffolds.fasta.index.2.bt2
testdata/scaffolds.fasta.index.3.bt2
testdata/scaffolds.fasta.index.4.bt2
testdata/scaffolds.fasta.index.rev.1.bt2
testdata/scaffolds.fasta.index.rev.2.bt2
testdata/scaffolds.fasta.sam
106 changes: 106 additions & 0 deletions README.md
@@ -0,0 +1,106 @@
binner
=========

binner is a wrapper script to run several metagenome binning programs using Docker.


Supported Binners
===========

Currently binner supports these metagenome binning programs:

- [CONCOCT](https://github.com/BinPro/CONCOCT)
- [MaxBin2](https://sourceforge.net/projects/maxbin2/)
- [MetaBat](https://bitbucket.org/berkeleylab/metabat/src/master/)
- [blobtools](https://github.com/DRL/blobtools)


REQUIREMENTS
============

- macOS or another Unix-like operating system
- [Docker](https://www.docker.com/get-started)
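
Before running binner it can help to confirm that Docker is actually usable. The following is a minimal sketch and not part of binner; the function names `have_cmd` and `check_docker` are hypothetical helpers introduced here for illustration.

```shell
# Hypothetical helper: return 0 if the named command is on PATH, 1 otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: report whether Docker looks usable.
check_docker() {
    if ! have_cmd docker; then
        echo "docker not found on PATH"
        return 1
    fi
    # `docker info` fails when the daemon is not running or not reachable.
    if ! docker info >/dev/null 2>&1; then
        echo "docker is installed but the daemon is not reachable"
        return 1
    fi
    echo "docker is ready"
}
```

Calling `check_docker` prints one of the three messages and returns a non-zero status when Docker cannot be used.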


INSTALLATION
=======
Assuming Docker is installed and configured properly, it is straightforward to install binner:

```
$ git clone https://github.com/reslp/binner.git
$ cd binner
$ chmod +x binner
$ ./binner -h
Welcome to binner. A script to quickly run metagenomic binning software using Docker.
Usage: ./binner [-v] [-a <assembly_file>] [-f <read_file1>] [-r <read_file2>] [-m maxbin,metabat,blobtools,concoct] [-t nthreads] [[-b /path/to/diamonddb -p /path/to/prot.accession2taxid]]
Options:
-a <assembly_file> Assembly file in FASTA format (needs to be in current folder)
-f <read_file1> Forward read file in FASTQ format (can be gzipped)
-r <read_file2> Reverse read file in FASTQ format (can be gzipped)
-m <maxbin,metabat,blobtools,concoct> specify binning software to run.
Separate multiple options by a , (e.g. -m maxbin,blobtools).
-t number of threads for multi threaded parts
-v Display program version
Options specific to blobtools:
The blobtools container used here uses diamond instead of blast to increase speed.
The following options are required when blobtools is run:
-b full (absolute) path to diamond database
-p full (absolute) path to the prot.accession2taxid file provided by NCBI
```
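
The `-m` option takes a comma-separated list. In this commit the script selects binners by substring matching (for example `[[ $OPTIONS == *"maxbin"* ]]`), so a misspelled name such as `maxbins` would still trigger MaxBin. As an alternative, and only as a sketch that is not the author's method, the list could be split and validated before use. The function name `parse_binners` is hypothetical.

```shell
# Hypothetical helper: validate a comma-separated binner list such as
# "maxbin,metabat". Prints one binner per line and returns 1 on the
# first unknown name. The body runs in a subshell, so `exit` only ends
# the function and `set -f` does not leak into the caller.
parse_binners() (
    set -f   # disable globbing while the list is word-split
    [ -n "$1" ] || { echo "no binner given" >&2; exit 1; }
    for name in $(printf '%s' "$1" | tr ',' ' '); do
        case "$name" in
            maxbin|metabat|blobtools|concoct) printf '%s\n' "$name" ;;
            *) echo "unknown binner: $name" >&2; exit 1 ;;
        esac
    done
)
```

For example, `parse_binners maxbin,blobtools` prints the two names on separate lines, while `parse_binners maxbin,foo` reports `unknown binner: foo` on stderr and returns 1.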




USAGE
========

binner can run several binning programs. Each binner and its dependencies ship as a separate Docker container, so none of them has to be installed individually. Most metagenomic binners need an assembly and the read files that were used to create it. binner expects the assembly to be binned in FASTA format and the read files in FASTQ format. The assembly and the reads should be in the same directory, and binner should be executed from that directory.
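
Because binner mounts the current directory into each container, the input files have to be given as bare file names that exist in that directory. A small pre-flight check could look like the sketch below; `check_inputs` is a hypothetical helper and not part of binner.

```shell
# Hypothetical helper: check that every argument is a bare file name
# (no directory part) and exists as a regular file in the current directory.
check_inputs() {
    for f in "$@"; do
        case "$f" in
            */*) echo "not a bare file name: $f" >&2; return 1 ;;
        esac
        [ -f "./$f" ] || { echo "missing in $(pwd): $f" >&2; return 1; }
    done
}
```

It could be used as `check_inputs metagenome.fasta forward_readfile.fq reverse_readfile.fq && binner ...`, so that binner only starts when all three files are in place.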

**Running MetaBat with binner:**

```$ binner -a metagenome.fasta -f forward_readfile.fq -r reverse_readfile.fq -m metabat```

**Running MaxBin with binner:**

```$ binner -a metagenome.fasta -f forward_readfile.fq -r reverse_readfile.fq -m maxbin```

**Running CONCOCT with binner:**

```$ binner -a metagenome.fasta -f forward_readfile.fq -r reverse_readfile.fq -m concoct```

**Running blobtools with binner:**

Blobtools requires blast results to assign a taxonomic identity (via NCBI taxids) to the individual contigs in the assembly. binner creates these results with diamond blastx. However, you will need a diamond-formatted sequence database (typically the NCBI nr database). If you don't already have one, you can set it up like this:

```
$ wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz
$ docker run --rm reslp/diamond diamond makedb --in nr.gz -d nr
```

Building the database this way has the advantage that it is compatible with the diamond Docker container used by binner, which is reslp/diamond.

Because diamond cannot output taxids directly, binner maps the ids retrieved by diamond blastx to NCBI taxids. This is done using the file `prot.accession2taxid` provided by NCBI. If you don't have this file already, download it by running the following commands:

```
$ wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz
$ gunzip prot.accession2taxid.gz
```
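
The mapping step described above can be illustrated with a short `awk` join. This is a sketch only: how the `reslp/get_taxids` container works internally is not shown in this commit. The sketch assumes that `prot.accession2taxid` is tab-separated with a header line and the columns accession, accession.version, taxid and gi, that the diamond results are in the default 12-column tabular format with the subject id in column 2 and the bit score in column 12, and that the desired output is a three-column table of contig id, taxid and score. The function name `map_taxids` is hypothetical.

```shell
# Hypothetical helper: map_taxids MAPPING_FILE DIAMOND_MATCHES
# Prints "contig <TAB> taxid <TAB> bitscore" for every hit whose subject
# accession.version is present in the mapping file.
# Note: the whole mapping is held in memory, so this is meant for small
# test files, not for the full NCBI table.
map_taxids() {
    awk -F '\t' '
        NR == FNR { if (FNR > 1) taxid[$2] = $3; next }   # first file, header skipped
        ($2 in taxid) { print $1 "\t" taxid[$2] "\t" $12 } # second file: diamond hits
    ' "$1" "$2"
}
```

Hits whose accession is missing from the mapping file are silently dropped in this sketch.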

```$ binner -a metagenome.fasta -f forward_readfile.fq -r reverse_readfile.fq -m blobtools -b /path/to/diamonddb -p /path/to/prot.accession2taxid```
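
In the script, binner checks that the file `$DIAMONDDB.dmnd` exists and mounts the directory part of the path into the container, so the value passed to `-b` should be an absolute path to the database without the `.dmnd` extension. The sketch below derives such a value from a database file name; `diamond_prefix` is a hypothetical helper and not part of binner.

```shell
# Hypothetical helper: turn the path of a diamond database file into the
# prefix expected by -b. Rejects relative paths and strips a trailing
# ".dmnd" if present.
diamond_prefix() {
    case "$1" in
        /*) ;;
        *) echo "path must be absolute: $1" >&2; return 1 ;;
    esac
    printf '%s\n' "${1%.dmnd}"
}
```

With this helper, `-b "$(diamond_prefix /databases/nr.dmnd)"` would pass `/databases/nr` to binner; the path `/databases/nr.dmnd` is only an example.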




COPYRIGHT AND LICENSE
=====================

Copyright (C) 2019 Philipp Resl

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program in the file LICENSE. If not, see http://www.gnu.org/licenses/.
150 changes: 150 additions & 0 deletions binner
@@ -0,0 +1,150 @@
#!/bin/bash
# written by Philipp Resl, Oct. 2019, github.com/reslp/binner

usage() {
echo "Welcome to binner. A script to quickly run metagenomic binning software using Docker."
echo
echo "Usage: $0 [-v] [-a <assembly_file>] [-f <read_file1>] [-r <read_file2>] [-m maxbin,metabat,blobtools,concoct] [-t nthreads] [[-b /path/to/diamonddb -p /path/to/prot.accession2taxid]]"
echo
echo "Options:"
echo " -a <assembly_file> Assembly file in FASTA format (needs to be in current folder)"
echo " -f <read_file1> Forward read file in FASTQ format (can be gzipped)"
echo " -r <read_file2> Reverse read file in FASTQ format (can be gzipped)"
echo " -m <maxbin,metabat,blobtools,concoct> specify binning software to run."
echo " Separate multiple options by a , (e.g. -m maxbin,blobtools)."
echo " -t number of threads for multi threaded parts"
echo " -v Display program version"
echo
echo "Options specific to blobtools:"
echo " The blobtools container used here uses diamond instead of blast to increase speed."
echo " The following options are required when blobtools is run:"
echo " -b full (absolute) path to diamond database"
echo " -p full (absolute) path to the prot.accession2taxid file provided by NCBI"
exit 1; } 1>&2 # redirect the whole usage text to stderr

version() {
echo "binner version 0.1"
exit 0
}

while getopts ":t:m:a:f:r:vb:p:" option;
do
case "${option}"
in
a) ASSEMBLY=${OPTARG};;
f) R1=${OPTARG};;
r) R2=${OPTARG};;
v) version;;
m) OPTIONS=${OPTARG};;
t) THREADS=${OPTARG};;
b) DIAMONDDB=${OPTARG};;
p) PROTID=${OPTARG};;
*) usage;; # also catches "?" (unknown option) and ":" (missing argument)
esac
done
if [ $OPTIND -eq 1 ]; then usage; fi
THREADS=${THREADS:-1} # default to one thread when -t is not given

# this needs to be set because on Linux docker created files will be owned by root by default.
unset DOCKER_USER
if [[ "$OSTYPE" == "linux-gnu" ]]; then
DOCKER_USER="--user $(id -u):$(id -g)"
elif [[ "$OSTYPE" == "darwin"* ]]; then #nothing to be done on MacOS
DOCKER_USER=""
fi

if [[ ! -f "$ASSEMBLY".index.1.bt2 ]]; then
echo "(binner) No Bowtie2 index file found. Creating Bowtie2 index..."
docker run -t --rm $DOCKER_USER -v $(pwd):/data/ reslp/bowtie2 bowtie2-build /data/$ASSEMBLY /data/$ASSEMBLY.index -q
fi

if [[ ! -f "$ASSEMBLY".bam ]]; then
echo "(binner) No BAM file found. Will perform read mapping with bowtie2 ..."
docker run -t --rm $DOCKER_USER -v $(pwd):/data/ reslp/bowtie2 bowtie2 -p $THREADS -q --phred33 --fr -x /data/$ASSEMBLY.index -1 /data/$R1 -2 /data/$R2 -S /data/$ASSEMBLY.sam --quiet
echo "(binner) Converting SAM to BAM ..."
docker run -t --rm $DOCKER_USER -v $(pwd):/data/ reslp/samtools samtools view -bS /data/$ASSEMBLY.sam -o /data/$ASSEMBLY.bam
docker run -t --rm $DOCKER_USER -v $(pwd):/data/ reslp/samtools samtools sort -o /data/$ASSEMBLY.bam /data/$ASSEMBLY.bam
echo "(binner) Will index BAM file ..."
docker run -t --rm $DOCKER_USER -v $(pwd):/data/ reslp/samtools samtools index /data/$ASSEMBLY.bam
fi

if [[ $OPTIONS == *"maxbin"* ]]; then
echo "(binner) Will run MaxBin"
mkdir -p maxbin
# these docker commands are not optimal because they create files as the root user.
# passing UID and GID doesn't work in this case because of the way maxbin is set up.
# I have not yet found a way around this.
docker run -t -v $(pwd):/data/ reslp/samtools samtools idxstats /data/$ASSEMBLY.bam > maxbin/$ASSEMBLY.idxstats
cut -f1,3 maxbin/$ASSEMBLY.idxstats > maxbin/$ASSEMBLY.counts
docker run -t -v $(pwd):/data/ reslp/maxbin run_MaxBin.pl -contig /data/$ASSEMBLY -abund /data/maxbin/$ASSEMBLY.counts -thread $THREADS -out /data/maxbin/maxbin_bin_out
fi

if [[ $OPTIONS == *"metabat"* ]]; then
echo "(binner) Will run MetaBat"
mkdir -p metabat
docker run -t $DOCKER_USER -v $(pwd):/data/ --rm reslp/metabat jgi_summarize_bam_contig_depths --outputDepth /data/metabat/metabat_depth.txt --pairedContigs /data/metabat/metabat_paired.txt /data/$ASSEMBLY.bam
docker run -t $DOCKER_USER -v $(pwd):/data/ --rm reslp/metabat metabat2 -i /data/$ASSEMBLY -a /data/metabat/metabat_depth.txt -o metabat --sensitive -v
fi

if [[ $OPTIONS == *"concoct"* ]]; then
echo "(binner) Will run concoct"
mkdir -p concoct
echo "(binner) Digesting FASTA file ..."
docker run -t $DOCKER_USER -v $(pwd):/data/ --rm reslp/concoct cut_up_fasta.py /data/"$ASSEMBLY" -c 10000 -o 0 --merge_last -b /data/"$ASSEMBLY"_contigs_10K.bed > "$ASSEMBLY"_contigs_10K.fa
echo "(binner) Creating coverage table ..."
docker run -t $DOCKER_USER -v $(pwd):/data/ --rm reslp/concoct concoct_coverage_table.py /data/"$ASSEMBLY"_contigs_10K.bed /data/"$ASSEMBLY".bam > concoct_coverage_table.tsv
echo "(binner) running concoct ..."
docker run -t $DOCKER_USER -v $(pwd):/data/ --rm reslp/concoct concoct --composition_file /data/"$ASSEMBLY"_contigs_10K.fa --coverage_file /data/concoct_coverage_table.tsv -b /data/concoct/"$ASSEMBLY"_concoct --threads $THREADS
echo "(binner) Merging results ..."
docker run $DOCKER_USER -v $(pwd):/data/ --rm reslp/concoct merge_cutup_clustering.py /data/concoct/"$ASSEMBLY"_concoct_clustering_gt1000.csv > concoct/"$ASSEMBLY"_concoct_clustering_merged.csv
echo "(binner) Extract FASTA chunks ..."
mkdir -p concoct/bins
docker run -t -v $(pwd):/data/ --rm reslp/concoct extract_fasta_bins.py /data/"$ASSEMBLY" /data/concoct/"$ASSEMBLY"_concoct_clustering_merged.csv --output_path /data/concoct/bins
cd concoct/bins
rename "s/^/"$ASSEMBLY"_concoct_/" *.fa
cd ../..
fi

if [[ $OPTIONS == *"blobtools"* ]]; then
echo "(binner) Will prepare for blobtools"
if [ -z "$DIAMONDDB" ]; then
echo "(binner) Error: Path to diamond db not set."
exit 1
fi
if [ -z "$PROTID" ]; then
echo "(binner) Error: Path to prot.accession2taxid not set."
exit 1
fi
if [ ! -f "$PROTID" ]; then
echo "(binner) Error: $PROTID does not exist. Is the path correct?"
exit 1
fi
if [ ! -f "$DIAMONDDB.dmnd" ]; then
echo "(binner) Error: $DIAMONDDB.dmnd does not exist. Is the path correct?"
exit 1
fi

echo "(binner) location of diamond db: "$DIAMONDDB
echo "(binner) location of prot.accession2taxid: "$PROTID

mkdir -p blobtools

if [ ! -f blobtools/"$ASSEMBLY"_diamond_matches ]; then
echo "(binner) No diamond results found. Will therefore run diamond"
docker run -t $DOCKER_USER -v $(pwd):/data/ -v $(dirname $DIAMONDDB):/opt/database/ --rm reslp/diamond diamond blastx -d /opt/database/$(basename $DIAMONDDB) -q /data/$ASSEMBLY -o /data/blobtools/"$ASSEMBLY"_diamond_matches -p $THREADS
fi
echo "(binner) reformatting diamond results for use with blobtools..."
docker run $DOCKER_USER -v $(pwd):/data/ -v $(dirname $PROTID):/opt/mapping --rm reslp/get_taxids /opt/mapping/prot.accession2taxid /data/blobtools/"$ASSEMBLY"_diamond_matches > blobtools/"$ASSEMBLY"_diamond_matches_formatted
echo "(binner) Running blobtools"
docker run -t $DOCKER_USER -v $(pwd):/data/ --rm reslp/blobtools create -i /data/$ASSEMBLY -b /data/$ASSEMBLY.bam -t /data/blobtools/"$ASSEMBLY"_diamond_matches_formatted -o /data/blobtools/"$ASSEMBLY"_blobtools
docker run -t $DOCKER_USER -v $(pwd):/data/ --rm reslp/blobtools view -i /data/blobtools/"$ASSEMBLY"_blobtools.blobDB.json -o /data/blobtools/
docker run -t $DOCKER_USER -v $(pwd):/data/ --rm reslp/blobtools plot -i /data/blobtools/"$ASSEMBLY"_blobtools.blobDB.json -o /data/blobtools/
echo "(binner) extracting contigs from blobtools"
cp $ASSEMBLY blobtools/
cd blobtools
docker run -t $DOCKER_USER -v $(pwd):/data/ --rm reslp/extract_contigs /data/"$ASSEMBLY" /data/"$ASSEMBLY"_blobtools.blobDB.table.txt
rm $ASSEMBLY
cd ..
fi
