Library to query multiple files against many databases
Pre-packaged releases of MassBlast are available on GitHub (download here) and support:
- Linux 32/64-bit
- Mac OSX (recent versions)
- Windows (the binaries are 32-bit because our packaging tool bundles Ruby and other dependencies)
- BLAST+ installed
- link to download latest version
- note for Windows users:
- Only the 32-bit version of BLAST+ can be installed (the latest win32 version is 2.2.30, which can be downloaded here)
- If it gives an error, delete the ncbi.ini file located in a subdirectory of the AppData folder. If the problem persists, submit an issue.
Default options can be changed in user.yml; see user.yml.example for more information (manual coming soon).
- Place fasta files with queries at folder.
- Place blast databases at folder.
- Check "How to setup a Blast database for a transcriptome" below for more information on creating a Blast database.
- Edit user.yml file to change options and BLAST engine to be used.
- Run the mass-blast script (either double-click it on Windows or run it as a command in the command line).
Ruby interpreter
Bundler gem
Run bundle install at the root directory
Options are configurable via the configuration file:
- Change 'db_parent' and 'query_parent' to specify the parent directories for blast databases and queries
- Change 'dbs' and 'folder_queries' to specify the databases that should be used and which query folders should be crawled
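As a rough illustration of how these four options fit together, the sketch below parses a YAML snippet and joins each database and query folder to its parent directory. Only the key names come from this README; the directory values, database names and the `expand` helper are hypothetical, and MassBlast's own loading code may work differently.

```ruby
require 'yaml'

# Hypothetical config: key names are from this README, all values are made up.
CONFIG_TEXT = <<~YAML
  db_parent: db_and_queries/db
  query_parent: db_and_queries/queries
  dbs:
    - transcriptome_a
    - transcriptome_b
  folder_queries:
    - enzymes
YAML

# Join each entry to its parent directory (illustrative helper, not MassBlast's API).
def expand(parent, entries)
  entries.map { |entry| File.join(parent, entry) }
end

config = YAML.safe_load(CONFIG_TEXT)

# Databases that would be searched and query folders that would be crawled.
puts expand(config['db_parent'], config['dbs'])
puts expand(config['query_parent'], config['folder_queries'])
```

Keeping the parent directories separate from the lists means a whole set of databases or queries can be moved by changing a single line.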
$ ruby script.rb
The test blast database and the taxonomy database are no longer kept in the git tree. To get this auxiliary data, run the command below or call mass-blast via script.rb.
$ rake bootstrap.rb
If you need to include it in your code, use:
require_relative 'src/download'
$ rake spec
- Blastn
- TBlastn
- TBlastx
All of these types implement two methods, blast and blast_folders:
- blast(qfile, db, out_file, query_parent=nil, db_parent=nil)
- qfile: query file path - string
- db: database name - string
- out_file: output file path (can be relative) - string
- query_parent: parent directory of query (optional) - string
- db_parent: parent directory of database (optional) - string
notes: 'qfile' and 'db' arguments can be relative to 'query_parent' and 'db_parent' (respectively).
- blast_folders( folders=nil, query_parent=nil, db_parent=nil )
- folders: list of folders (optional) - array of strings
- query_parent: parent directory of folders (optional) - string
- db_parent: parent directory of database (optional) - string
notes: entries in 'folders' can be relative to 'query_parent'. Optional parameters that are not passed must be set in the config.yml file.
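The relative-path notes for both methods can be pictured with a small sketch. `resolve_path` is a hypothetical helper written for illustration, not part of MassBlast, and the file and directory names are invented. It only shows one way a relative 'qfile' or 'db' could be combined with an optional parent directory; treating absolute paths as already resolved is an assumption.

```ruby
require 'pathname'

# Hypothetical helper (not MassBlast's API): combine a path with an optional parent.
def resolve_path(path, parent = nil)
  # Leave the path untouched when no parent is given or it is already absolute.
  return path if parent.nil? || Pathname.new(path).absolute?

  File.join(parent, path)
end

# Invented example values for the 'qfile' and 'db' arguments of blast.
puts resolve_path('enzymes/query1.fasta', 'db_and_queries/queries')
puts resolve_path('transcriptome_a', 'db_and_queries/db')
puts resolve_path('query1.fasta') # no parent: returned as given
```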
Using makeblastdb command that comes bundled with Blast+
Open the command line in your operating system
Navigate to the directory that has the fasta file with the assembly
Run makeblastdb command in that directory
nucleotides database
$ makeblastdb -in <filename> -dbtype nucl -out "<blast_db_new_name>" -title "<blast_db_new_name>"
protein database
$ makeblastdb -in <filename> -dbtype prot -out "<blast_db_new_name>" -title "<blast_db_new_name>"
note: do not use spaces in <blast_db_new_name>
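For scripting this step, a minimal Ruby sketch follows. It only assembles the makeblastdb argument list from the flags used in the commands here (`-in`, `-dbtype`, `-out`, `-title`) and rejects database names that contain spaces. The method name `makeblastdb_args` and the file names are hypothetical, and actually running the command assumes makeblastdb is installed and on your PATH.

```ruby
# Build the makeblastdb argument list for a fasta file (illustrative helper).
# type must be :nucl (nucleotides database) or :prot (protein database).
def makeblastdb_args(fasta, name, type)
  raise ArgumentError, "unknown dbtype: #{type}" unless %i[nucl prot].include?(type)
  raise ArgumentError, 'database name must not contain spaces' if name.match?(/\s/)

  ['makeblastdb', '-in', fasta, '-dbtype', type.to_s, '-out', name, '-title', name]
end

# Invented file and database names.
args = makeblastdb_args('assembly.fasta', 'my_transcriptome', :nucl)
puts args.join(' ')

# To actually create the database (requires BLAST+), pass the list to system:
# system(*args)
```

Passing the arguments to `system` as a list rather than a single string avoids shell quoting problems with unusual file names.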
In Linux and OSX you can place the fasta files in db_and_queries/import_dbs directory and run the script
$ cd db_and_queries/import_dbs
$ sh [nucl|prot]
In Windows run the import_fastas.bat script
$ cd db_and_queries/import_dbs
$ import_fastas.bat [nucl|prot]
- Gene Extractor: can be used to extract genes from Kegg2 and GenBank using keyword search.
- ORF-Finder: Finds the longest Open Reading Frame from a nucleotide sequence.
- MassBlast package bundler: Creates a package that can be used on all main operating systems without having to install Ruby or any Ruby dependencies.
This tool was created as part of FCT grant SFRH/BD/97415/2013 and the European Commission research project BacHBerry (FP7-613793).