Command line application to perform BLAST queries from multiple files against different databases at once.
We recommend using a packaged release of MassBlast available at this link. The only requirement is to have BLAST+ installed. See all the information in the section below.
The latest release can be downloaded here.
A pre-print of the manuscript describing this application is available at bioRxiv and can be accessed here.
Pre-requirements:
- Install BLAST+ available here
Important note for Windows users:
- Only the 32-bit version of BLAST+ can be installed; it can be downloaded here
  - the latest win32 version is 2.2.30
- If it gives an error, please delete the ncbi.ini file, located in a subdirectory of the AppData folder in the user directory. If the problem persists, submit an issue.
Note: Ruby and all other requirements are included in the package files, so there is no need to install them separately.
It supports all major operating systems: Linux, Mac OSX and Windows (on Windows, only 32-bit is supported).
- Place fasta files with queries in the db_and_queries/queries folder.
- Place blast databases in the db_and_queries/db folder.
  - Check "How to setup a Blast database for a transcriptome" below for more information on creating a Blast database.
- Edit the user.yml file to change options and the BLAST engine to be used.
- Run the mass-blast script (either double-click it on Windows or run it as a command in the command line).
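Before launching the script, it can help to confirm that the query folder actually contains FASTA files. The following is a minimal Ruby sketch, not part of MassBlast: the helper name 'list_queries' and the list of file extensions are assumptions made for illustration, and the folder path is the one named in the steps above.

```ruby
# Hypothetical helper (not part of MassBlast): lists FASTA files in a folder.
# The extension list is an assumption; adjust it to the files you use.
FASTA_EXTENSIONS = %w[.fasta .fa .fna .faa].freeze

def list_queries(queries_dir)
  # A missing folder is reported as "no queries" rather than as an error.
  return [] unless Dir.exist?(queries_dir)

  Dir.children(queries_dir)
     .select { |name| FASTA_EXTENSIONS.include?(File.extname(name).downcase) }
     .sort
end

files = list_queries(File.join('db_and_queries', 'queries'))
puts files.empty? ? 'No query files found' : files
```

An empty result usually means the files were placed in the wrong folder or use an extension the sketch does not know about.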
We do not recommend installing from source unless you plan to develop MassBlast further. The package available already has all dependencies pre-packaged and is ready to be used.
Requirements:
- Ruby interpreter
- Bundler gem
- run bundle install at root directory
Options are configurable via the config/user.yml file
- Change 'db_parent' and 'query_parent' to specify the parent directories for blast databases and queries
- Change 'dbs' and 'folder_queries' to specify the databases that should be used and which query folders should be crawled
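As an illustration, the four keys named above could be laid out and read as follows. This is a sketch only: the values are placeholders, any other keys in the real config/user.yml are omitted, and treating 'dbs' and 'folder_queries' as YAML lists is an assumption rather than something stated here. The helper names are hypothetical.

```ruby
require 'yaml'

# Placeholder values; only the four key names come from the text above.
EXAMPLE_CONFIG = <<~YAML
  db_parent: db_and_queries/db
  query_parent: db_and_queries/queries
  dbs:
    - my_transcriptome
  folder_queries:
    - my_queries
YAML

# Hypothetical helper: parses the example configuration into a Hash.
def load_example_config
  YAML.safe_load(EXAMPLE_CONFIG)
end

# Hypothetical helper: joins each database name onto its parent directory.
def database_paths(config)
  config['dbs'].map { |db| File.join(config['db_parent'], db) }
end

puts database_paths(load_example_config)
```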
$ ruby script.rb
The test blast database and the taxonomy database are no longer kept in the git tree. To get this auxiliary data, run the command below or call mass-blast via script.rb
$ rake bootstrap.rb
If you need to include it in your code, use:
require_relative 'src/download'
ExternalData.download(path_to_db_parent)
$ rake spec
- Blastn
- TBlastn
- TBlastx
All of these types implement two methods, blast and blast_folders:
- blast(qfile, db, out_file, query_parent=nil, db_parent=nil)
- qfile: query file path - string
- db: database name - string
- out_file: output file path (can be relative) - string
- query_parent: parent directory of query (optional) - string
- db_parent: parent directory of database (optional) - string
notes: 'qfile' and 'db' arguments can be relative to 'query_parent' and 'db_parent' (respectively).
- blast_folders( folders=nil, query_parent=nil, db_parent=nil )
- folders: list of folders (optional) - array of strings
- query_parent: parent directory of folders (optional) - string
- db_parent: parent directory of database (optional) - string
notes: the 'folders' argument can be relative to 'query_parent'. All optional parameters must be set in the config.yml file
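The notes about relative arguments can be pictured with a small sketch. It is not MassBlast's code: 'resolve_path' is a hypothetical helper showing one plausible way a 'qfile' or 'db' value could be combined with its optional parent directory.

```ruby
require 'pathname'

# Hypothetical helper (not part of MassBlast): joins a path onto an optional
# parent directory. Absolute paths and calls without a parent are returned
# unchanged.
def resolve_path(path, parent = nil)
  return path if parent.nil? || Pathname.new(path).absolute?

  File.join(parent, path)
end

# Relative query file with a parent directory.
puts resolve_path('genes.fasta', 'db_and_queries/queries')
# No parent given: the path is used as-is.
puts resolve_path('genes.fasta')
```

Under this reading, passing 'query_parent' or 'db_parent' simply saves repeating the same directory prefix in every call.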
Using the makeblastdb command that comes bundled with Blast+
- Open the command line in your operating system
- Navigate to directory
  - Go to the directory that has the fasta file with the assembly
- Run the makeblastdb command in that directory
  - nucleotides database
$ makeblastdb -in <filename> -dbtype nucl -out "<blast_db_new_name>" -title "<blast_db_new_name>"
  - protein database
$ makeblastdb -in <filename> -dbtype prot -out "<blast_db_new_name>" -title "<blast_db_new_name>"
  - note: do not use spaces in the <blast_db_new_name>
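The nucleotide and protein cases can share one command template, with -dbtype set to nucl or prot. The Ruby sketch below makes that explicit. 'makeblastdb_command' is a hypothetical helper, not part of MassBlast; it only builds the argument list, using the same -in, -dbtype, -out and -title flags shown above, and rejects database names that contain spaces. The file and database names in the usage lines are placeholders.

```ruby
# Database types accepted by the hypothetical helper below.
DB_TYPES = %w[nucl prot].freeze

# Hypothetical helper (not part of MassBlast): builds the makeblastdb
# argument list without running anything.
def makeblastdb_command(fasta_file, db_name, db_type)
  unless DB_TYPES.include?(db_type)
    raise ArgumentError, "db_type must be one of: #{DB_TYPES.join(', ')}"
  end
  raise ArgumentError, 'database name must not contain spaces' if db_name.match?(/\s/)

  ['makeblastdb', '-in', fasta_file, '-dbtype', db_type,
   '-out', db_name, '-title', db_name]
end

cmd = makeblastdb_command('assembly.fasta', 'my_transcriptome', 'nucl')
puts cmd.join(' ')
# To actually run it, pass the array to Kernel#system, which bypasses the
# shell and so avoids quoting problems:
#   system(*cmd)
```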
In Linux and OSX you can place the fasta files in db_and_queries/import_dbs directory and run the import_fastas.sh script
$ cd db_and_queries/import_dbs
$ sh import_fastas.sh [nucl|prot]
In Windows run the import_fastas.bat script
$ cd db_and_queries/import_dbs
$ import_fastas.bat [nucl|prot]
- Gene Extractor: can be used to extract genes from Kegg2 and GenBank using keyword search.
- ORF-Finder: Finds the longest Open Reading Frame from a nucleotide sequence.
- MassBlast package bundler: Creates a package that can be easily used in all main operating systems without having to install Ruby or any Ruby dependencies.
MassBlast was developed primarily by André Veríssimo and Dr. Jean-Etienne Bassard.
A pre-print of the manuscript is available at bioRxiv and can be accessed here
This work was supported by:
- European Union Framework Program 7, Project BacHBERRY (FP7-613793);
- FCT, through IDMEC, under LAETA, projects (UID/EMS/50022/2013);
We would like to thank Dr. Cathie Martin and Dr. Philippe Vain for reading the manuscript and providing us with important comments and insights. We would also like to thank Dr. Aldo Ricardo Almeida Robles and Dr. Nuno Mira for testing MassBlast.