Erik K Flemington, Samuel A Flemington, Tina M O’Grady, Melody Baddoo, Trang Nguyen, Yan Dong, Nathan A Ungerleider, SpliceTools, a suite of downstream RNA splicing analysis tools to investigate mechanisms and impact of alternative splicing, Nucleic Acids Research, 2023;, gkad111, https://doi.org/10.1093/nar/gkad111
- Python 3
- Perl 5
Python libraries:
- Matplotlib
- Seaborn
- Pandas
Perl libraries:
- Parallel::ForkManager
git clone https://github.com/flemingtonlab/SpliceTools.git
cd SpliceTools/bin
File | Description |
---|---|
AS files | Must be in rMATS123 output tsv format. SpliceTools utilizes columns 2-11, 20 and 23 of A3SS, A5SS, RI, and SE files and 2-13, 22, and 25 of rMATS MXE files (see rMATS_file_formats.xlsx) for those interested in converting other file formats for use in SpliceTools. |
BED12 annotation | Bed12 file format should be used with Gene ID column (column 4) containing the ENSEMBL ID and gene ID separated by an underscore (for example, “ENST00000456328_DDX11L1“). |
Genome fasta | Standard genome fasta file (can be wrapped or unwrapped). |
Gene expression file | TSV file with gene ID in first column, control TPM values in following columns, test TPM values in the columns following control TPM values |
**Output files will be saved in the same directory as input AS files
Determines the fraction of expressed genes with statistically significant RI events at genes with a minimum input control or test TPM value.
Generates:
-
Lists of statistically significant negative and positive IncDiff RI events in expressed genes (includes gene expression values for each)
-
File with list of genes without statistically significant change in RI.
perl RIFractionExpressed.pl \
-r <retained intron file (rMATS JCEC)> \
-e <expression file> \
-TPM <min TPMs condition 1,min TPMs condition 2> \
-SN <sample number condition 1,sample number condition 2> \
-f <FDR>
-r <retained intron file>
-e <expression file>
-TPM <min TPMs condition 1,min TPMs condition 2>
Note: if condition 1 or 2 to be not considered, enter "-"
(e.g. TPM 3,- or TPM -,3)
-SN <sample number condition 1,sample number condition 2>
-f <FDR>
-h help
perl RIFractionExpressed.pl -r PATH/RIfile.txt -e PATH/expression.tsv -TPM 2,2 -SN 3,3 -f 0.05
perl RIFractionExpressedBatch.pl -r PATH/RI_files_directory -e PATH/expression.tsv -TPM 2,2 -SN 3,3 -f 0.05
Generates:
- Lists of sizes for upstream exon, retained intron, and downstream exon for statistically significant positive and negative IncDiff retained intron events
- Summary file with average sizes
perl RIIntronExonSizes.pl \
-r <retained intron file (rMATS JCEC)> \
-a <bed12 annotation file> \
-f <FDR>
-r <retained intron file>
-a <bed12 annotation file>
-f <FDR>
-h help
perl RIIntronExonSizes.pl -r PATH/RIfile.txt -a PATH/bed12_annotation_file -f 0.05
perl RIIntronExonSizesBatch.pl -r PATH/RI_input_files_directory -a PATH/bed12_annotation_file -f 0.05
Automatically runs the following on an rMATS RI JCEC file:
- RIFractionExpressed.pl
- RIIntronExonSizes.pl
- RISpliceSiteScoring.pl
perl RIMedley.pl \
-r <retained intron file (rMATS JCEC)> \
-a <bed12 annotation file> \
-g <genome fasta file> \
-e <expression file> \
-TPM <min TPMs condition 1,min TPMs condition 2> \
-SN <sample number condition 1,sample number condition2> \
-f <FDR>
-r <retained intron file>
-a <bed12 annotation file>
-g <genome fasta file>
-e <expression file>
-TPM <min TPMs condition 1,min TPMs condition 2>
-SN <sample number condition 1,sample number condition 2>
-f <FDR>
-h help
perl RIMedley.pl -r PATH/RIfile.txt -a PATH/bed12_annotation.bed -g PATH/genome.fa -e PATH/expression.tsv -TPM 2,2 -SN 3,3 -f 0.05
Generates:
- Lists of splice site scores (for plotting score distributions) for upstream donor and downstream acceptor sites for statistically significantly changed retained intron events
- Summary file with average scores
Note: Inclusion of annotation file is optional but will generate data for annotated events for comparison.
perl RISpliceSiteScoring.pl \
-r <retained intron file (rMATS JCEC)> \
-g <genome fasta file> \
-f <FDR> \
-a <bed12 annotation file>
-r <retained intron file>
-g <genome fasta file>
-f <FDR>
-a <bed12 annotation file> (optional)
-h help
perl RISpliceSiteScoring.pl -r PATH/RIfile.txt -g PATH/genome.fa -a PATH/bed12_annotation.bed -f 0.05
perl RISpliceSiteScoring.pl -r PATH/RI_input_files_directory -g PATH/genome.fa -a PATH/bed12_annotation.bed -f 0.05
Determines the fraction of expressed genes with statistically significant SE events at genes with a minimum input control or test TPM value.
Also generates:
- Lists of statistically significant negative and positive IncDiff SE events in expressed genes (includes gene expression values for each)
- File with list of genes without statistically significant exon skipping.
perl SEFractionExpressed.pl \
-s <skipped exon file (rMATS JCEC)> \
-e <expression file> \
-TPM <min TPMs condition 1,min TPMs condition 2> \
-SN <sample number condition 1,sample number condition 2> \
-f <FDR>
-s <skipped exon file>
-e <expression file>
-TPM <min TPMs condition 1,min TPMs condition 2>
Note: if condition 1 or 2 to be not considered, enter "-"
(e.g. TPM 3,- or TPM -,3)
-SN <sample number condition 1,sample number condition 2>
-f <FDR>
-h help
perl SEFractionExpressed.pl -s PATH/SEfile.txt -e PATH/expression.tsv -TPM 2,2 -SN 3,3 -f 0.05
perl SEFractionExpressedBatch.pl -s PATH/SE_input_files_directory -e PATH/expression.tsv -TPM 2,2 -SN 3,3 -f 0.05
Generates:
- Lists of sizes for upstream exon, upstream intron, skipped exon, downstream intron, and downstream exon for statistically significant positive and negative IncDiff skipped exon events.
- Summary file with average sizes (for comparison, includes average sizes for potential SE events based on input annotation file).
perl SEIntronExonSizes.pl \
-s <skipped exon file (rMATS JCEC)> \
-a <bed12 annotation file> \
-f <FDR>
-s <skipped exon file>
-a <bed12 annotation file>
-f <FDR>
-h help
perl SEIntronExonSizes.pl -s PATH/SEfile.txt -a PATH/bed12_annotation_file.bed -f 0.05
perl SEIntronExonSizesBatch.pl -s PATH/SE_input_files_directory -a PATH/bed12_annotation.bed -f 0.05
Automatically runs the following on an rMATS SE JCEC file:
- SEFractionExpressed.pl
- SEIntronExonSizes.pl
- SENumberSkipped.pl
- SESpliceSiteScoring.pl
- SETranslateNMD.pl
- SEUnannotated.pl
SEMedley.pl \
-s <skipped exon file (rMATS JCEC)> \
-a <bed12 annotation file> \
-g <genome fasta file> \
-e <expression file> \
-TPM <min TPMs condition 1,min TPMs condition 2> \
-SN <sample number condition 1,sample number condition2> \
-f <FDR>
-s <skipped exon file>
-a <bed12 annotation file>
-g <genome fasta file>
-e <expression file>
-TPM <min TPMs condition 1,min TPMs condition 2>
-SN <sample number condition 1,sample number condition 2>-f <FDR>
-h help
perl SEMedley.pl -s PATH/SEfile.txt -a PATH/bed12_annotation.bed -g PATH/genome.fa -e PATH/expression.tsv -TPM 2,2 -SN 3,3 -f 0.05
Generates:
- Lists of all predicted SE isoforms with number of exons skipped
- Lists of isoforms with maximum number of skipped exons
- Statistics for number of exons skipped
Note: Uses annotated exon information to determine number of exons skipped (0 exons skipped means that skipped exons were not in annotation file)
SENumberSkipped.pl \
-s <skipped exon file (rMATS JCEC)> \
-a <bed12 annotation file> \
-f <FDR>
-s <skipped exon file>
-a <bed12 annotation file>
-f <FDR>
-h help
perl SENumberSkipped.pl -s PATH/SEfile.txt -a PATH/bed12_annotation.bed -f 0.05
perl SENumberSkippedBatch.pl -s PATH/SE_input_files_directory -a PATH/bed12_annotation.bed -f 0.05
Generates:
- Lists of splice site scores (for plotting score distributions) for upstream donor, skipped exon, and downstream acceptor sites for statistically significantly changed exon skipping events.
- Summary file with average scores.
Note: Inclusion of annotation file is optional but will generate data for annotated events for comparison.
perl SESpliceSiteScoring.pl \
-s <skipped exon file (rMATS JCEC)> \
-g <genome fasta file> \
-f <FDR> \
-a <bed12 annotation file>
-s <skipped exon file>
-g <genome fasta file>
-f <FDR>
-a <bed12 annotation file> (optional)
-h help
perl SESpliceSiteScoring.pl -s PATH/SEfile.txt -g PATH/genome.fa -a PATH/bed12_annotation.bed -f 0.05
perl SESpliceSiteScoringBatch.pl -s PATH/SE_input_files_directory -g PATH/genome.fa -a PATH/bed12_annotation.bed -f 0.05
Program will:
- Generate predicted RNA(DNA) isoform sequences based on annotated exon structures upstream and downstream from skipping event
- Generate translated isoform sequences for all such events
- Generate separate listings of frameshifted protein isoforms
- Generate candidate neopeptides produced by frameshifts
- Generate lists of skipped exon protein sequences for in-frame events for BATCH submission to NCBI conserved domain search (https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi)
- Identify events predicted to undergo Nonsense Mediated RNA Decay (NMD) and generate statistics on skipping events predicted to undergo and not undergo NMD
perl SETranslateNMD.pl \
-s <skipped exon file (rMATS)> \
-a <bed12 annotation file> \
-g <genome fasta file> \
-f <FDR>
-s <skipped exon file>
-a <bed12 annotation file>
-g <genome fasta file>
-f <FDR>
-h help
perl SETranslateNMD.pl -s PATH/SEfile.txt -a PATH/bed12_annotation.bed -g PATH/genome.fa -f 0.05
perl SETranslateNMDBatch.pl -s PATH/SE_input_files_directory -a PATH/bed12_annotation.bed -g PATH/genome.fa -f 0.05
Generates statistics on the number of skipping events that are present (annotated) and not present (unannotated) in a bed12 input annotation file. Also outputs lists of annotated and unannotated events (pos IncDiff and neg IncDiff).
perl SEUnannotated.pl \
-s <skipped exon file (rMATS)> \
-a <bed12 annotation file> \
-f <FDR>
-s <skipped exon file>
-a <bed12 annotation file>
-f <FDR>
-h help
perl SEUnannotated.pl -s PATH/SEfile.txt -a PATH/bed12_annotation.bed -f 0.05
perl SEUnannotatedBatch.pl -s PATH/SE_input_files_directory -a PATH/bed12_annotation.bed -f 0.05
Takes groups of rMATS differential splicing files and outputs lists of statistically significant events with positive IncDiff and negative IncDiff and summary files. It then compares differential splicing changes across experiments to assess functional relationships (hypergeometric test), outputting pval matrices, -log10(pval) matrices, and .svg graphical representation files displaying -log10(pval) of all comparisons.
Important notes:
-
Input file names must end in one or more of the following suffixes: A3SS.MATS.JCEC.txt A5SS.MATS.JCEC.txt MXE.MATS.JCEC.txt RI.MATS.JCEC.txt SE.MATS.JCEC.txt
-
By default, SpliceCompare.pl generates a file of common splicing changes for each comparison. For large numbers of input files, this can lead to the generation of millions of comparison files. Use option -p 0 to suppress this output.
-
By default, -log10(pval) maximum is 280. For experiments with less significant hypergeometric test values, max -log10(pval) can be decreased using -m option.*
perl SpliceCompare.pl \
-i <input files directory (rMATS JCEC)> \
-o <path to output directory> \
-f <FDR>
-i <input files directory (rMATS JCEC)>
-o <path to output directory>
-f <FDR>
-p <1=output overlap files (default), 0=no overlap files>
-m <max cluster array value (default = 280)>
-h help
perl SpliceCompare.pl -i PATH/input_files_directory -o PATH/output_files_directory -m 280 -p 1 -f 0.05
Footnotes
-
Shen S., Park JW., Lu ZX., Lin L., Henry MD., Wu YN., Zhou Q., Xing Y. rMATS: Robust and Flexible Detection of Differential Alternative Splicing from Replicate RNA-Seq Data. PNAS, 111(51):E5593-601. doi: 10.1073/pnas.1419161111 ↩
-
Park JW., Tokheim C., Shen S., Xing Y. Identifying differential alternative splicing events from RNA sequencing data using RNASeq-MATS. Methods in Molecular Biology: Deep Sequencing Data Analysis, 2013;1038:171-179 doi: 10.1007/978-1-62703-514-9_10 ↩
-
Shen S., Park JW., Huang J., Dittmar KA., Lu ZX., Zhou Q., Carstens RP., Xing Y. MATS: A Bayesian Framework for Flexible Detection of Differential Alternative Splicing from RNA-Seq Data. Nucleic Acids Research, 2012;40(8):e61 doi: 10.1093/nar/gkr1291 ↩