Library to query multiple files against many databases
We recommend using a packaged release of MassBlast available at this link. The only requirement is to have BLAST+ installed. See all the information in the section below.
Pre-packaged releases of MassBlast are available on GitHub (download here) and support:
- Linux 32/64-bit
- Mac OSX (recent versions)
- Windows (the binaries are 32-bit, due to our packaging tool, which includes Ruby and other dependencies)
Requirements:
- BLAST+ installed
- link to download latest version
- note for Windows users:
- Can only install the 32-bit version of BLAST+ (the latest win32 version is 2.2.30, which can be downloaded here)
- If it gives an error, please delete the ncbi.ini file located in a subdirectory of the AppData folder. If the problem persists, submit an issue.
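To confirm that the BLAST+ requirement is met before running MassBlast, a small check along these lines may help. It is a minimal sketch and not part of MassBlast: `locate_tool` is a hypothetical helper that searches the directories in `PATH` for the BLAST+ executables.

```ruby
# Hypothetical helper (not part of MassBlast): look for an executable
# in the directories listed in a PATH-style string.
def locate_tool(name, search_path = ENV['PATH'])
  # On Windows, PATHEXT lists executable extensions such as .EXE
  extensions = [''] + ENV.fetch('PATHEXT', '').split(';')
  search_path.to_s.split(File::PATH_SEPARATOR).each do |dir|
    extensions.each do |ext|
      candidate = File.join(dir, "#{name}#{ext}")
      return candidate if File.file?(candidate) && File.executable?(candidate)
    end
  end
  nil
end

# Report where each BLAST+ program was found, if at all
%w[blastn tblastn tblastx makeblastdb].each do |tool|
  location = locate_tool(tool)
  puts "#{tool}: #{location || 'not found on PATH'}"
end
```

If any tool is reported as not found, BLAST+ is either not installed or its `bin` directory is not on the `PATH`.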
Default options can be changed in user.yml; check user.yml.example for more information (manual soon).
- Place fasta files with queries in the db_and_queries/queries folder.
- Place blast databases in the db_and_queries/db folder.
- Check "How to setup a Blast database for a transcriptome" below for more information on creating a Blast database.
- Edit the user.yml file to change options and the BLAST engine to be used.
- Run the mass-blast script (either double-click it on Windows or run it as a command in the command line).
We do not recommend installing from source unless you plan to develop MassBlast further. The package available already has all dependencies pre-packaged and is ready to be used.
Requirements:
- Ruby interpreter
- Bundler gem
- run bundle install at root directory
Options are configurable via the config/user.yml file:
- Change 'db_parent' and 'query_parent' to specify the parent directories for blast databases and queries
- Change 'dbs' and 'folder_queries' to specify the databases that should be used and which query folders should be crawled
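The relationship between these four options can be sketched in Ruby. The key names come from the list above, but the example values, their types (strings and arrays of strings) and the `expand_config` helper are assumptions made for illustration; check user.yml.example for the real format.

```ruby
require 'yaml'

# Hypothetical contents for config/user.yml.
# Key names are from this README; values and types are assumptions.
EXAMPLE_CONFIG = <<~YAML
  db_parent: db_and_queries/db
  query_parent: db_and_queries/queries
  dbs:
    - my_transcriptome
  folder_queries:
    - enzymes
YAML

# Hypothetical helper: join each database and query folder with its parent.
def expand_config(config)
  {
    dbs: Array(config['dbs']).map { |db| File.join(config['db_parent'], db) },
    queries: Array(config['folder_queries']).map { |dir| File.join(config['query_parent'], dir) }
  }
end

paths = expand_config(YAML.safe_load(EXAMPLE_CONFIG))
puts paths[:dbs]
puts paths[:queries]
```

In this sketch the parent options act as prefixes, so 'dbs' and 'folder_queries' only need to name entries inside those directories.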
$ ruby script.rb
The test blast database and the taxonomy database are no longer kept in the git tree. To get this auxiliary data, run the command below or call mass-blast via script.rb
$ rake bootstrap.rb
If you need to include it in your code, use:
require_relative 'src/download'
ExternalData.download(path_to_db_parent)
$ rake spec
- Blastn
- TBlastn
- TBlastx
All types implement two methods, blast and blast_folders:
- blast(qfile, db, out_file, query_parent=nil, db_parent=nil)
- qfile: query file path - string
- db: database name - string
- out_file: output file path (can be relative) - string
- query_parent: parent directory of query (optional) - string
- db_parent: parent directory of database (optional) - string
notes: 'qfile' and 'db' arguments can be relative to 'query_parent' and 'db_parent' (respectively).
- blast_folders( folders=nil, query_parent=nil, db_parent=nil )
- folders: list of folders (optional) - array of strings
- query_parent: parent directory of folders (optional) - string
- db_parent: parent directory of database (optional) - string
notes: 'folders' argument can be relative to 'query_parent'. All optional parameters must be set in the config.yml file
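The notes on relative arguments can be illustrated with a short sketch. `resolve_path` is a hypothetical helper and not MassBlast's own code; it only shows one plausible way a path such as 'qfile' could be combined with an optional parent directory such as 'query_parent'. How MassBlast resolves paths internally is an assumption here.

```ruby
require 'pathname'

# Hypothetical helper (not MassBlast's implementation): combine a path
# with an optional parent directory, leaving absolute paths untouched.
def resolve_path(path, parent = nil)
  return path if parent.nil? || Pathname.new(path).absolute?

  File.join(parent, path)
end

# A relative query file is looked up under the parent directory
puts resolve_path('enzymes/query.fasta', 'db_and_queries/queries')
# Without a parent, the path is used as given
puts resolve_path('query.fasta')
```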
To set up a Blast database for a transcriptome, use the makeblastdb command that comes bundled with BLAST+:
- Open the command line in your operating system.
- Navigate to the directory that has the fasta file with the assembly.
- Run the makeblastdb command in that directory:
  - nucleotide database
$ makeblastdb -in <filename> -dbtype nucl -out "<blast_db_new_name>" -title "<blast_db_new_name>"
  - protein database
$ makeblastdb -in <filename> -dbtype prot -out "<blast_db_new_name>" -title "<blast_db_new_name>"
  - note: do not use spaces in <blast_db_new_name>
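For scripting this step, the command can be assembled in Ruby as sketched below. `makeblastdb_command` is a hypothetical helper, and the file and database names are placeholders. It uses only the makeblastdb flags shown above and rejects database names that contain spaces.

```ruby
require 'shellwords'

# Hypothetical helper: build the makeblastdb argument list.
# dbtype must be 'nucl' (nucleotides) or 'prot' (proteins).
def makeblastdb_command(fasta, db_name, dbtype)
  raise ArgumentError, "dbtype must be 'nucl' or 'prot'" unless %w[nucl prot].include?(dbtype)
  raise ArgumentError, 'database name must not contain spaces' if db_name =~ /\s/

  ['makeblastdb', '-in', fasta, '-dbtype', dbtype, '-out', db_name, '-title', db_name]
end

# Placeholder file and database names
cmd = makeblastdb_command('assembly.fasta', 'my_transcriptome', 'nucl')
puts Shellwords.join(cmd)
# To actually run it (requires BLAST+ on the PATH): system(*cmd)
```

Passing the arguments to `system` as a list, rather than as a single string, avoids shell quoting problems with unusual file names.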
In Linux and OSX you can place the fasta files in db_and_queries/import_dbs directory and run the import_fastas.sh script
$ cd db_and_queries/import_dbs
$ sh import_fastas.sh [nucl|prot]
In Windows run the import_fastas.bat script
$ cd db_and_queries/import_dbs
$ import_fastas.bat [nucl|prot]
- Gene Extractor: can be used to extract genes from Kegg2 and GenBank using keyword search.
- ORF-Finder: Finds the longest Open Reading Frame from a nucleotide sequence.
- MassBlast package bundler: Creates a package that can be easily used on all main operating systems without having to install Ruby or any Ruby dependencies.
This tool was created as part of FCT grant SFRH/BD/97415/2013 and European Commission research project BacHBerry (FP7-613793).