SMRT pipe is a PacBio tool for secondary analysis of PacBio data. This program runs lima (for demultiplexing) and CCS2 on microbiome data from PacBio's new Sequel machine.
SMRT pipe comes installed with the SMRT analysis software suite. No additional installation is required to run this script.
Install Ruby v2.2.1 or greater.
For Fedora:
$ sudo dnf install ruby
For CentOS/RHEL:
$ sudo yum install ruby
Install the 'bundler' gem:
$ gem install bundler
To install the dependencies, run:
$ bundle
- Sequencing data from microbiome samples which were pooled and sequenced on the Sequel.
- Barcodes file with all the barcodes that were used for pooling. File should be in FASTA format. This script only works for symmetric barcodes.
- Sample file with information regarding each sample.
Preset files are JSON files that hold all the settings required to run the LIMA command for demultiplexing and the CCS2 command for finding the consensus sequence from subreads. Make sure to change the value of the id/setting "pbsmrtpipe.options.tmp_dir" in both JSON files so that it points to the path where you want your temporary files to be stored. Other settings, such as the number of CCS passes and the predicted accuracy, can also be adjusted to your requirements in the preset_ccs.json file.
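The tmp_dir change can also be scripted. The following is a minimal sketch, not part of smrtpipe.rb. It assumes the preset stores the setting either as a key named "pbsmrtpipe.options.tmp_dir" or as an entry with "id" and "value" fields; check your own preset files, since their exact layout is not described here. The helper names `set_option!` and `update_preset_file` and the path `/scratch/smrt_tmp` are hypothetical.

```ruby
require 'json'

# Walk a parsed preset and set the value of one option wherever it appears.
# ASSUMPTION: the option is stored in one of these two layouts:
#   { "pbsmrtpipe.options.tmp_dir": "/tmp" }                  (id used as key)
#   { "id": "pbsmrtpipe.options.tmp_dir", "value": "/tmp" }   (id/value entry)
# Returns the number of places that were updated.
def set_option!(node, option_id, new_value)
  updated = 0
  case node
  when Hash
    if node['id'] == option_id && node.key?('value')
      node['value'] = new_value
      updated += 1
    end
    if node.key?(option_id)
      node[option_id] = new_value
      updated += 1
    end
    node.each_value { |child| updated += set_option!(child, option_id, new_value) }
  when Array
    node.each { |child| updated += set_option!(child, option_id, new_value) }
  end
  updated
end

# Rewrite one preset file in place (hypothetical helper, not called here).
def update_preset_file(path, tmp_dir)
  preset = JSON.parse(File.read(path))
  count = set_option!(preset, 'pbsmrtpipe.options.tmp_dir', tmp_dir)
  raise "pbsmrtpipe.options.tmp_dir not found in #{path}" if count.zero?
  File.write(path, JSON.pretty_generate(preset))
  count
end

# In-memory demonstration using the assumed id/value layout.
preset = JSON.parse('{"options": [{"id": "pbsmrtpipe.options.tmp_dir", "value": "/tmp"}]}')
set_option!(preset, 'pbsmrtpipe.options.tmp_dir', '/scratch/smrt_tmp')
puts JSON.pretty_generate(preset)
```

Calling `update_preset_file` on each of the two preset files would apply the same change on disk. The file is rewritten with `JSON.pretty_generate`, so its whitespace may differ from the original.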
SMRTPIPE (-p)
- The path where smrtpipe is located. Use the full path; avoid relative paths.
OUTDIR (-o)
- Path where you want your intermediate (LIMA and CCS2) result files to be stored.
SAMPLE_INFO_FILE (-s)
- The file that lists all the PacBio jobs to be demultiplexed and run through CCS2. The header of this file (first row) must have the column names pool_id, path_for_lima, barcode, and sample. These column names must match exactly, because the program initializes the data in each column based on them. The data in each column is as follows:
- pool_id – The name of each pool, i.e., all the samples pooled together into one set have the same pool_id.
- path_for_lima – Path to where the subreadset.xml file is located for this particular pool. This is the path that is listed as "Data path" on SMRT Link.
- barcode – Name of the barcode used for this sample; only symmetric barcodes can be used at this point.
- sample – The name given to each sample. This name is added to the FASTQ sequence header with the tag “barcodelabel”, so put any information you want to keep track of in the sample name. Multiple items can be tracked in the sample name, separated by “_”. For example, to keep track of the patient ID and the sample ID, use the sample name “Pat123_Samp167”, where Pat123 is the patient ID and Samp167 is the sample ID. This way the information is associated with each sequence and can be tracked easily later.
BARCODE_FILE (-b)
- A FASTA file with all the barcode sequences.
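Because the program relies on exact column names and on barcode names, it can help to check both input files before starting a long run. The sketch below is one way to do that and is not taken from smrtpipe.rb. It assumes the sample file is tab-separated and that the barcode column holds the FASTA record names verbatim. The helper names and all sample values (bc_A, bc_B, /data/run1) are hypothetical.

```ruby
require 'csv'

# Column names the sample file must use, as listed above.
REQUIRED_COLUMNS = %w[pool_id path_for_lima barcode sample].freeze

# Parse the sample file text and group its rows by pool_id.
# ASSUMPTION: columns are tab-separated; pass another col_sep if yours differ.
def parse_sample_key(text, col_sep: "\t")
  table = CSV.parse(text, headers: true, col_sep: col_sep)
  missing = REQUIRED_COLUMNS - table.headers.compact
  raise ArgumentError, "missing column(s): #{missing.join(', ')}" unless missing.empty?
  table.map(&:to_h).group_by { |row| row['pool_id'] }
end

# Names of the records in a FASTA text (first word after '>').
def fasta_names(fasta_text)
  fasta_text.each_line
            .select { |line| line.start_with?('>') }
            .map { |line| line[1..-1].split.first }
            .compact
end

# Barcodes named in the sample file that have no record in the FASTA.
# ASSUMPTION: the barcode column holds FASTA record names verbatim.
def unknown_barcodes(pools, fasta_text)
  used = pools.values.flatten(1).map { |row| row['barcode'] }.uniq
  used - fasta_names(fasta_text)
end

# Hypothetical inputs, built in memory so the sketch runs without files.
sample_key = [
  %w[pool_id path_for_lima barcode sample],
  %w[pool1 /data/run1 bc_A Pat123_Samp167],
  %w[pool1 /data/run1 bc_B Pat124_Samp168]
].map { |fields| fields.join("\t") }.join("\n")
barcodes_fasta = ">bc_A\nACGTACGTACGTACGT\n"

pools = parse_sample_key(sample_key)
pools.each { |pool_id, rows| puts "#{pool_id}: #{rows.size} sample(s)" }
puts "Not in barcode file: #{unknown_barcodes(pools, barcodes_fasta).inspect}"
```

With real files, pass `File.read` of the sample file and of the barcode FASTA in place of the in-memory strings.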
Run the smrtpipe.rb script along with the arguments that are required as input.
$ ruby smrtpipe.rb -p xx/bin/pbsmrtpipe -s sample_key.txt -o out_dir_name -b pacbio_barcodes_96.fasta
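Before launching, the arguments can be checked with a few lines of Ruby. This is a sketch only and does not reflect what smrtpipe.rb itself validates. The helper name `preflight_errors` is hypothetical, and the paths are the placeholders from the command above.

```ruby
require 'pathname'

# Return a list of problems found in the arguments (empty if none).
def preflight_errors(smrtpipe:, sample_file:, barcode_file:)
  errors = []
  if !Pathname.new(smrtpipe).absolute?
    # -p should be a full path, not a relative one.
    errors << "-p must be a full path, got: #{smrtpipe}"
  elsif !(File.file?(smrtpipe) && File.executable?(smrtpipe))
    errors << "-p is not an executable file: #{smrtpipe}"
  end
  errors << "-s file not found: #{sample_file}" unless File.file?(sample_file)
  errors << "-b file not found: #{barcode_file}" unless File.file?(barcode_file)
  errors
end

problems = preflight_errors(
  smrtpipe: 'xx/bin/pbsmrtpipe',              # placeholder path from the command above
  sample_file: 'sample_key.txt',
  barcode_file: 'pacbio_barcodes_96.fasta'
)
problems.each { |message| warn message }
```

Returning the problems as a list, rather than raising on the first one, lets all of them be reported in a single pass.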
- reads folder - Consists of FASTQ files after running LIMA and CCS2. One FASTQ file is produced for every sample that was given in the sample key file.
- reads_2 folder - Consists of FASTQ files after running LIMA and CCS2; the FASTQ read headers also include the number of CCS passes and the barcode label. One FASTQ file is produced for every sample that was given in the sample key file.
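The sample name stored in the read headers can be recovered downstream. The sketch below assumes the tag is written as `barcodelabel=<sample name>` and ends at a `;` or at whitespace. The exact header layout is not documented here, so compare it with a header from your own reads_2 files first. The helper name `barcode_label` and the example header are hypothetical.

```ruby
# Extract the value of the "barcodelabel" tag from a FASTQ header line.
# ASSUMPTION: the tag looks like barcodelabel=<sample> and ends at ';' or whitespace.
# Returns nil when the tag is absent.
def barcode_label(header)
  match = header.match(/barcodelabel=([^;\s]+)/)
  match ? match[1] : nil
end

header = '@read_1;barcodelabel=Pat123_Samp167;' # hypothetical header line
label = barcode_label(header)

# Split the sample name back into the items joined with "_".
patient_id, sample_id = label.split('_', 2)
puts "patient=#{patient_id} sample=#{sample_id}"
```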