-
Notifications
You must be signed in to change notification settings - Fork 2
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Steven Monger
committed
Oct 30, 2018
0 parents
commit 42de077
Showing
95 changed files
with
345,155 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
Copyright (c) 2003, The Institute for Genomic Research (TIGR), Rockville, | ||
Maryland, U.S.A. All rights reserved. | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,114 @@ | ||
The Artistic License | ||
|
||
Preamble | ||
|
||
The intent of this document is to state the conditions under which a | ||
Package may be copied, such that the Copyright Holder maintains some | ||
semblance of artistic control over the development of the package, | ||
while giving the users of the package the right to use and distribute | ||
the Package in a more-or-less customary fashion, plus the right to | ||
make reasonable modifications. | ||
|
||
Definitions: | ||
* "Package" refers to the collection of files distributed by the | ||
Copyright Holder, and derivatives of that collection of files | ||
created through textual modification. | ||
* "Standard Version" refers to such a Package if it has not been | ||
modified, or has been modified in accordance with the wishes of | ||
the Copyright Holder. | ||
* "Copyright Holder" is whoever is named in the copyright or | ||
copyrights for the package. | ||
* "You" is you, if you're thinking about copying or distributing | ||
this Package. | ||
* "Reasonable copying fee" is whatever you can justify on the | ||
basis of media cost, duplication charges, time of people | ||
involved, and so on. (You will not be required to justify it to | ||
the Copyright Holder, but only to the computing community at | ||
large as a market that must bear the fee.) | ||
* "Freely Available" means that no fee is charged for the item | ||
itself, though there may be fees involved in handling the | ||
item. It also means that recipients of the item may redistribute | ||
it under the same conditions they received it. | ||
|
||
1. You may make and give away verbatim copies of the source form of | ||
the Standard Version of this Package without restriction, provided | ||
that you duplicate all of the original copyright notices and | ||
associated disclaimers. | ||
|
||
2. You may apply bug fixes, portability fixes and other modifications | ||
derived from the Public Domain or from the Copyright Holder. A | ||
Package modified in such a way shall still be considered the | ||
Standard Version. | ||
|
||
3. You may otherwise modify your copy of this Package in any way, | ||
provided that you insert a prominent notice in each changed file | ||
stating how and when you changed that file, and provided that you | ||
do at least ONE of the following: | ||
|
||
a) place your modifications in the Public Domain or otherwise make | ||
them Freely Available, such as by posting said modifications to | ||
Usenet or an equivalent medium, or placing the modifications on a | ||
major archive site such as ftp.uu.net, or by allowing the | ||
Copyright Holder to include your modifications in the Standard | ||
Version of the Package. | ||
|
||
b) use the modified Package only within your corporation or | ||
organization. | ||
|
||
c) rename any non-standard executables so the names do not | ||
conflict with standard executables, which must also be provided, | ||
and provide a separate manual page for each non-standard | ||
executable that clearly documents how it differs from the Standard | ||
Version. | ||
|
||
d) make other distribution arrangements with the Copyright Holder. | ||
|
||
4. You may distribute the programs of this Package in object code or | ||
executable form, provided that you do at least ONE of the | ||
following: | ||
|
||
a) distribute a Standard Version of the executables and library | ||
files, together with instructions (in the manual page or | ||
equivalent) on where to get the Standard Version. | ||
|
||
b) accompany the distribution with the machine-readable source of | ||
the Package with your modifications. | ||
|
||
c) accompany any non-standard executables with their corresponding | ||
Standard Version executables, giving the non-standard executables | ||
non-standard names, and clearly documenting the differences in | ||
manual pages (or equivalent), together with instructions on where | ||
to get the Standard Version. | ||
|
||
d) make other distribution arrangements with the Copyright Holder. | ||
|
||
5. You may charge a reasonable copying fee for any distribution of | ||
this Package. You may charge any fee you choose for support of this | ||
Package. You may not charge a fee for this Package itself. However, | ||
you may distribute this Package in aggregate with other (possibly | ||
commercial) programs as part of a larger (possibly commercial) | ||
software distribution provided that you do not advertise this | ||
Package as a product of your own. | ||
|
||
6. The scripts and library files supplied as input to or produced as | ||
output from the programs of this Package do not automatically fall | ||
under the copyright of this Package, but belong to whomever | ||
generated them, and may be sold commercially, and may be aggregated | ||
with this Package. | ||
|
||
7. C or perl subroutines supplied by you and linked into this Package | ||
shall not be considered part of this Package. | ||
|
||
8. The name of the Copyright Holder may not be used to endorse or | ||
promote products derived from this software without specific prior | ||
written permission. | ||
|
||
9. THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED | ||
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES | ||
OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE. | ||
|
||
The End | ||
This license is approved by the Open Source Initiative | ||
(www.opensource.org) for certifying software as OSI Certified Open | ||
Source. | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,139 @@ | ||
#!/bin/bash | ||
#set default parameters | ||
POSITIONAL=() | ||
INPUTVCF="FALSE" | ||
INPUTBED="FALSE" | ||
INPUTFILES="" | ||
USEBP="FALSE" | ||
USEBPINDELS="FALSE" | ||
ANNOTATION="" | ||
FASTAPATH="" | ||
#parse command line args | ||
while [[ $# -gt 0 ]] | ||
do | ||
key="$1" | ||
case $key in | ||
-inputVCF) | ||
INPUTVCF="TRUE" | ||
INPUTFILES="$2" | ||
shift | ||
shift | ||
while [ "$1" ] && [[ ! $1 == *-* ]]; do | ||
INPUTFILES="$INPUTFILES $1" | ||
shift | ||
done | ||
;; | ||
-inputBED) | ||
INPUTBED="TRUE" | ||
INPUTFILES="$2" | ||
shift | ||
shift | ||
while [ "$1" ] && [[ ! $1 == *-* ]]; do | ||
INPUTFILES="$INPUTFILES $1" | ||
shift | ||
done | ||
;; | ||
-fasta) | ||
FASTAPATH="$2" | ||
shift | ||
shift ;; | ||
-gtf) | ||
ANNOTATION="$2" | ||
shift | ||
shift ;; | ||
-branchpointer) | ||
USEBP="TRUE" | ||
USEBPINDELS="FALSE" | ||
shift ;; | ||
-branchpointerIndels) | ||
USEBPINDELS="TRUE" | ||
USEBP="FALSE" | ||
shift ;; | ||
*) | ||
POSITIONAL+=("$1") | ||
shift ;; | ||
esac | ||
done | ||
set - "${POSITIONAL[@]}" | ||
#input error handling | ||
if [ ! -f $FASTAPATH ]; then | ||
echo -e "Fasta file not found: use -fasta ./path/to/hgXX.fa\nExiting..." | ||
exit 1 | ||
elif [ ! -f "$ANNOTATION" ]; then | ||
echo "GTF annotation file not found: use -annotation path/to/gencodeXX.gtf\nExiting..." | ||
exit 1 | ||
fi | ||
#prepare splice site intervals from annotation.gtf | ||
if [ ! -f data/annotationIntervals.txt ] || [[ "$ANNOTATION" -nt data/annotationIntervals.txt ]] ; then | ||
echo "Preparing splice site annotation..." | ||
grep '[[:blank:]]gene[[:blank:]]\|[[:blank:]]exon[[:blank:]]' data/"$ANNOTATION" | java -cp bin getSpliceSiteIntervalsFromGTF > data/annotationIntervals.txt | ||
fi | ||
#for each file | ||
for FILE in $INPUTFILES; do | ||
fileID=$(echo "$FILE" | xargs -n 1 basename) | ||
echo "Input file: $fileID" | ||
#remove temp files from any previous run | ||
rm temp/"$fileID"* 2> /dev/null | ||
#sort body of input file | ||
grep "^#" "$FILE" > temp/"$fileID"_sorted | ||
grep -v "^#" "$FILE" | sort -k1,1 -k2,2n >> temp/"$fileID"_sorted | ||
#bedtools intersect to get strand info | ||
echo "Retrieving strand info..." | ||
grep '[[:blank:]]gene[[:blank:]]' data/"$ANNOTATION" | sort -k1,1 -k4,4n | bedtools intersect -a temp/"$fileID"_sorted -b stdin -wa -wb -sorted > temp/"$fileID"unstrandedInput.txt | ||
#generate flanking intervals.bed for bedtools getfasta | ||
if [ ! "$INPUTVCF" = "FALSE" ]; then | ||
grep '[[:blank:]]+[[:blank:]]' temp/"$fileID"unstrandedInput.txt | awk -v OFS="\\t" '{print ".", $1, $2, "+", $4, $5}' | sort -u | java -cp bin getFastaIntervals > temp/"$fileID"fastaIntervals.bed | ||
grep '[[:blank:]]-[[:blank:]]' temp/"$fileID"unstrandedInput.txt | awk -v OFS="\\t" '{print ".", $1, $2, "-", $4, $5}' | sort -u | java -cp bin getFastaIntervals >> temp/"$fileID"fastaIntervals.bed | ||
elif [ ! "$INPUTBED" = "FALSE" ]; then | ||
grep '[[:blank:]]+[[:blank:]]' temp/"$fileID"unstrandedInput.txt | awk -v OFS="\\t" '{print ".", $1, $2, "+", $5, $6}' | sort -u | java -cp bin getFastaIntervals > temp/"$fileID"fastaIntervals.bed | ||
grep '[[:blank:]]-[[:blank:]]' temp/"$fileID"unstrandedInput.txt | awk -v OFS="\\t" '{print ".", $1, $2, "-", $5, $6}' | sort -u | java -cp bin getFastaIntervals >> temp/"$fileID"fastaIntervals.bed | ||
fi | ||
echo "Retrieving flanking FASTA sequence..." | ||
bedtools getfasta -fi $FASTAPATH -bed temp/"$fileID"fastaIntervals.bed -name -s > temp/"$fileID"seqToScan.FASTA | ||
#seqScan: generates input strings for maxentscan and genesplicer as well as ESRseq scores | ||
echo "Scanning for motifs..." | ||
java -cp bin seqScan temp/"$fileID"seqToScan.FASTA -useESR $fileID 1>&2 | ||
#run maxentscan | ||
echo "Running MaxEntScan..." | ||
perl score5.pl temp/"$fileID"mesDonorInput.txt | java -cp bin processScoresMES > temp/"$fileID"mesDonorScores.txt | ||
retVal=( ${PIPESTATUS[0]} ) | ||
if [ $retVal -ne 0 ]; then | ||
echo "MaxEntScan returned non-zero exit status. It is likely not all variants were processed. Exiting..." | ||
exit $retVal | ||
fi | ||
perl score3.pl temp/"$fileID"mesAcceptorInput.txt | java -cp bin processScoresMES > temp/"$fileID"mesAcceptorScores.txt | ||
retVal=( ${PIPESTATUS[0]} ) | ||
if [ $retVal -ne 0 ]; then | ||
echo "MaxEntScan returned non-zero exit status. It is likely not all variants were processed. Exiting..." | ||
exit $retVal | ||
fi | ||
#run genesplicer | ||
echo "Running GeneSplicer..." | ||
bin/linux/genesplicerAdapted temp/"$fileID"gsInput.FASTA human > temp/"$fileID"gsScores.txt | ||
#run branchpointer SNPs | ||
if [ "$USEBP" = "TRUE" ]; then | ||
echo "Running Branchpointer..." | ||
Rscript --slave --vanilla bin/bpProcessing.R temp/"$fileID"input.txt $(pwd) "$BPGTF" &> output/"$fileID"bpLog.txt | ||
#awk -v OFS=\\t '{print $2, $3, $4, $8, $9, $15, $16, $21, $22, $23, $24}' output/bpOutputSNPs.txt > output/bpSNPsSummarised.txt | ||
fi | ||
#run branchpointer indels | ||
if [ "$USEBPINDELS" = "TRUE" ] ; then | ||
echo "Running Branchpointer..." | ||
while read -r na chr start strand ref alt; do | ||
refLength=${#ref} | ||
altLength=${#ref} | ||
end=$((refLength+start)) | ||
if [ $altLength -gt 1 ] || [ $refLength -gt 1 ]; then | ||
echo -e ".\t$chr\t$start\t$end\t$strand\t$ref\t$alt" >> temp/"$fileID"bpInputIndels.txt | ||
fi | ||
done < temp/"$fileID"input.txt | ||
Rscript --slave --vanilla bin/bpProcessingINDELS.R temp/"$fileID"input.txt temp/"$fileID"bpInputIndels.txt $(pwd) "$BPGTF" &> output/"$fileID"bpLog.txt | ||
#awk -v OFS=\\t '{print $2, $3, $4, $8, $9, $16, $17, $23, $24, $25, $26}' output/bpOutputIndels.txt > output/bpIndelsSummarised.txt | ||
#awk -v OFS=\\t '{print $2, $3, $4, $8, $9, $15, $16, $22, $23, $24, $25}' $SNPpath"bpOutput_SNPs.txt" > output/bpSNPsSummarised.txt | ||
fi | ||
#merge scores into one line | ||
echo "Processing scores..." | ||
cat temp/"$fileID"mesDonorScores.txt temp/"$fileID"mesAcceptorScores.txt temp/"$fileID"gsScores.txt temp/"$fileID"ESRoutput.txt data/annotationIntervals.txt | sort -k1,1 -V -k 2,2n -k 3 -k 4 -s | java -cp bin mergeOutput > output/"$fileID"_out.txt | ||
#clean up temp files | ||
rm temp/"$fileID"* 2> /dev/null | ||
done |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
# Assess the effects of SNPs on branchpoint annotations (adapted from Branchpointer vignette) | ||
# read in fileName as Rscript argument | ||
args=(commandArgs(TRUE)) | ||
if(length(args)<3){ | ||
print("Error: Not all arguments supplied.") | ||
}else{ | ||
querySNPFile <- args[1] | ||
outputPath <- args[2] | ||
annotationPath <- args[3] | ||
} | ||
|
||
#load libraries and exon annotation | ||
suppressMessages(library(knitr)) | ||
suppressMessages(library(BSgenome.Hsapiens.UCSC.hg38)) | ||
suppressMessages(library(branchpointer)) | ||
g <- BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38 | ||
exons <- gtfToExons(annotationPath) | ||
#format and filter for branchpoint window SNPs | ||
querySNP <- readQueryFile(querySNPFile, | ||
queryType = "SNP", | ||
exons = exons, | ||
filter = TRUE, | ||
maxDist = 50) | ||
|
||
#predict branchpoints and specify number of cores | ||
branchpointPredictionsSNP <- predictBranchpoints(querySNP, | ||
queryType = "SNP", | ||
BSgenome = g, | ||
useParallel = TRUE, | ||
cores = 8) | ||
|
||
#summarise and define prediction thresholds | ||
querySNPSummary <- predictionsToSummary(querySNP,branchpointPredictionsSNP, probabilityCutoff = 0.52, probabilityChange = 0.15) | ||
|
||
#identify branchpoints created/removed by SNP | ||
if (FALSE) { | ||
l <- length(querySNPSummary) | ||
BPcreated_or_removed <- vector() | ||
for (i in 1:l) { | ||
if (querySNPSummary$created_n[i]==1 | querySNPSummary$deleted_n[i]==1) { | ||
BPcreated_or_removed <- c(BPcreated_or_removed, i) | ||
} | ||
} | ||
|
||
#write to file | ||
fileDestination <- "bpOutputSNPs.txt" | ||
write.table(querySNPSummary[BPcreated_or_removed], fileDestination , quote = FALSE, sep="\t") | ||
} | ||
|
||
setwd(outputPath) | ||
fileDestination <- "./output/bpOutputSNPs.txt" | ||
write.table(querySNPSummary, fileDestination , quote = FALSE, sep="\t") |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
# Assess the effects of SNPs and INDELS on branchpoint annotations (adapted from Branchpointer vignette) | ||
# read in fileName as Rscript argument | ||
args=(commandArgs(TRUE)) | ||
if(length(args)<3){ | ||
print("Error: Not all arguments supplied.") | ||
}else{ | ||
querySNPFile <- args[1] | ||
queryIndelFile <- args[2] | ||
outputPath <- args[3] | ||
annotationPath <- args[4] | ||
} | ||
|
||
#load libraries and exon annotation | ||
suppressMessages(library(knitr)) | ||
suppressMessages(library(BSgenome.Hsapiens.UCSC.hg38)) | ||
suppressMessages(library(branchpointer)) | ||
g <- BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38 | ||
exons <- gtfToExons(annotationPath) | ||
|
||
#format and filter for branchpoint window SNPs | ||
querySNP <- readQueryFile(querySNPFile, | ||
queryType = "SNP", | ||
exons = exons, | ||
filter = TRUE, | ||
maxDist = 50) | ||
|
||
#predict branchpoints and specify number of cores | ||
branchpointPredictionsSNP <- predictBranchpoints(querySNP, | ||
queryType = "SNP", | ||
BSgenome = g, | ||
useParallel = TRUE, | ||
cores = 8) | ||
|
||
#summarise and define prediction thresholds | ||
querySNPSummary <- predictionsToSummary(querySNP,branchpointPredictionsSNP, probabilityCutoff = 0.52, probabilityChange = 0.15) | ||
|
||
#format and filter for branchpoint window indels | ||
queryIndel <- readQueryFile(queryIndelFile, | ||
queryType = "indel", | ||
exons = exons) | ||
|
||
#predict branchpoints and specify number of cores | ||
branchpointPredictionsIndel <- predictBranchpoints(queryIndel, | ||
queryType = "indel", | ||
BSgenome = g, | ||
useParallel = TRUE, | ||
cores = 8) | ||
|
||
#summarise and define prediction thresholds | ||
queryIndelSummary <- predictionsToSummary(queryIndel,branchpointPredictionsIndel, probabilityCutoff = 0.52, probabilityChange = 0.15) | ||
|
||
#identify branchpoints created/removed by Indel | ||
if (FALSE) { | ||
l <- length(queryIndelSummary) | ||
BPcreated_or_removed <- vector() | ||
for (i in 1:l) { | ||
if (queryIndelSummary$created_n[i]>=1 | queryIndelSummary$deleted_n[i]>=1) { | ||
BPcreated_or_removed <- c(BPcreated_or_removed, i) | ||
} | ||
} | ||
} | ||
|
||
#write to file | ||
setwd(outputPath) | ||
outputFileSNPs <- "./output/bpOutputSNPs.txt" | ||
outputFileIndels <- "./output/bpOutputIndels.txt" | ||
write.table(querySNPSummary, outputFileSNPs , quote = FALSE, sep="\t") | ||
write.table(queryIndelSummary, outputFileIndels , quote = FALSE, sep="\t") |
Binary file not shown.
Oops, something went wrong.