NYU Spring 2021
-
Download_data.sh
a. Download FASTQ files from the SRA Run Selector
-
Fastqc.sh
a. Run FastQC and review the QC results
-
ref_genome.sh
a. Download the hg19 reference genome archive (https://gatk.broadinstitute.org/hc/en-us/articles/360035890711-GRCh37-hg19-b37-humanG1Kv37-Human-Reference-Discrepancies#hg19)
i. Manually upload from the local machine to the ...Final_Project/data/genome directory
ii. Extract on the command line: tar -zxvf chromFa.tar.gz
iii. Concatenate on the command line: cat chr*.fa > hg19.fa
b. Index the reference genome
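The concatenation step above can be sketched as a small shell function. One caveat worth noting: the glob chr*.fa expands in lexicographic order (chr1, chr10, chr11, ..., chr2) and also matches any other chr*.fa files in the directory. The sketch below instead concatenates the primary chromosomes in a fixed order. The function name concat_primary_chromosomes and the choice of chr1-22, X, Y, M are assumptions for illustration, not necessarily what ref_genome.sh does.

```shell
#!/bin/sh
# Sketch: concatenate per-chromosome FASTA files into one reference.
# Assumption: files are named chr1.fa ... chr22.fa, chrX.fa, chrY.fa, chrM.fa.

concat_primary_chromosomes() (
    dir="$1"
    out="$2"
    : > "$out"    # create or truncate the output file
    for name in $(seq 1 22) X Y M; do
        # Skip chromosomes that are not present instead of failing.
        if [ -f "$dir/chr${name}.fa" ]; then
            cat "$dir/chr${name}.fa" >> "$out"
        fi
    done
)

# Usage (paths are hypothetical):
#   tar -zxvf chromFa.tar.gz
#   concat_primary_chromosomes . hg19.fa
#   bwa index hg19.fa    # assumption: the index in step b is built for bwa
```

Whichever order is used, the same hg19.fa should be used for every later step, so that the alignment, the sequence dictionary, and the variant calls all agree on contig order.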
-
Align.sh
a. Using bwa
b. Output of the sequence alignment is a SAM file
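A minimal sketch of the alignment step, assuming paired-end reads and the bwa mem algorithm. The notes say only "bwa", so the algorithm, the thread count, the function name align_pair, and all file names are assumptions. The reference must already have been indexed with bwa index.

```shell
#!/bin/sh
# Sketch: align one paired-end sample with bwa mem and write a SAM file.
# Assumptions: paired-end FASTQ input, reference already indexed by bwa.

align_pair() (
    ref="$1"
    r1="$2"
    r2="$3"
    out_sam="$4"
    threads="${5:-4}"    # default of 4 threads is arbitrary
    # bwa mem writes SAM to standard output, so redirect it to a file.
    bwa mem -t "$threads" "$ref" "$r1" "$r2" > "$out_sam"
)

# Usage (file names are hypothetical):
#   align_pair hg19.fa sample_1.fastq sample_2.fastq sample.sam
```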
-
Sam_to_bam.sh
a. Sort the SAM file by coordinate and convert it to BAM (BAM = binary version of SAM)
b. Delete the SAM files
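The sort, convert, and delete steps can be sketched together so that the SAM file is removed only after a BAM file has actually been written. This assumes samtools does the sorting (the notes do not name the tool); the function name sam_to_sorted_bam and the .sorted.bam suffix are hypothetical.

```shell
#!/bin/sh
# Sketch: coordinate-sort a SAM into a BAM, then remove the SAM.
# Assumption: samtools is used for sorting and conversion.

sam_to_sorted_bam() (
    sam="$1"
    bam="${sam%.sam}.sorted.bam"
    # samtools sort orders by coordinate by default and writes BAM.
    samtools sort -o "$bam" "$sam" || exit 1
    # Delete the intermediate only when a non-empty BAM exists.
    if [ -s "$bam" ]; then
        rm -- "$sam"
    fi
)

# Usage (file name is hypothetical):
#   sam_to_sorted_bam sample.sam    # writes sample.sorted.bam
```

Checking the BAM before deleting avoids losing the alignment if the sort fails partway, for example when the disk fills up.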
-
Dedup.sh
a. Mark (PCR) duplicates using Picard/GATK
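A minimal sketch of duplicate marking, assuming the GATK4 launcher (gatk) with its bundled Picard MarkDuplicates tool. The function name mark_duplicates and the output file suffixes are hypothetical.

```shell
#!/bin/sh
# Sketch: mark duplicates in a coordinate-sorted BAM.
# Assumption: GATK4 is installed and `gatk` is on the PATH.

mark_duplicates() (
    in_bam="$1"
    out_bam="${in_bam%.bam}.dedup.bam"
    metrics="${in_bam%.bam}.dup_metrics.txt"
    # -M names the required duplication metrics file.
    gatk MarkDuplicates -I "$in_bam" -O "$out_bam" -M "$metrics"
)

# Usage (file name is hypothetical):
#   mark_duplicates sample.sorted.bam    # writes sample.sorted.dedup.bam
```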
-
Readgroups.sh
a. Add read groups
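A sketch of the read group step, assuming the Picard tool AddOrReplaceReadGroups run through the gatk launcher. The read group values below (lib1, ILLUMINA, unit1) are placeholders, not the values used in Readgroups.sh, and the function name add_read_groups is hypothetical.

```shell
#!/bin/sh
# Sketch: attach a single read group to every read in a BAM.
# Assumption: one sample per BAM; all RG values are placeholders.

add_read_groups() (
    in_bam="$1"
    sample="$2"
    out_bam="${in_bam%.bam}.rg.bam"
    gatk AddOrReplaceReadGroups \
        -I "$in_bam" -O "$out_bam" \
        --RGID "$sample" --RGSM "$sample" \
        --RGLB lib1 --RGPL ILLUMINA --RGPU unit1
)

# Usage (names are hypothetical):
#   add_read_groups sample.dedup.bam sampleA    # writes sample.dedup.rg.bam
```

The sample name (RGSM) matters downstream because HaplotypeCaller uses it to label the sample column in the VCF.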
-
Sequence dictionary from reference genome & index BAM files
a. Indexing
i. Create sequence dictionary from reference genome
ii. Index reference genome
iii. Index dedup.bam
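The three indexing steps above can be sketched as one function. This assumes that "index ref" means a FASTA index built with samtools faidx and that the BAM index is built with samtools index; the notes do not name the tools, and the function name index_inputs is hypothetical.

```shell
#!/bin/sh
# Sketch: build the auxiliary files GATK expects next to the reference
# and the BAM. Assumption: gatk and samtools are on the PATH.

index_inputs() (
    ref="$1"
    bam="$2"
    # Writes a .dict file alongside the reference.
    gatk CreateSequenceDictionary -R "$ref" || exit 1
    # Writes <ref>.fai
    samtools faidx "$ref" || exit 1
    # Writes <bam>.bai; the BAM must be coordinate-sorted.
    samtools index "$bam"
)

# Usage (file names are hypothetical):
#   index_inputs hg19.fa sample.dedup.bam
```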
-
VARIANT CALLING
a. GATK.sh
i. HaplotypeCaller
b. GATK_INDEL_SNP.sh
i. Separate indels and SNPs
c. GATK_var_filt.sh
i. Filtering
d. snpEff.sh
i. Annotation
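The four variant calling scripts can be sketched as a single function to show how the outputs chain together. This is an illustration under assumptions, not the contents of the actual scripts: the single QD < 2.0 filter is only an example threshold, the snpEff database name hg19 and the jar path are assumed, and the function name call_filter_annotate and all output names are hypothetical.

```shell
#!/bin/sh
# Sketch: call variants, split SNPs and indels, hard-filter, annotate.
# Assumptions: GATK4 (`gatk`) and Java on the PATH; snpEff jar path passed
# in by the caller; filter threshold is illustrative only.

call_filter_annotate() (
    ref="$1"
    bam="$2"
    prefix="$3"
    snpeff_jar="$4"

    # 1. Call variants on the whole BAM.
    gatk HaplotypeCaller -R "$ref" -I "$bam" -O "$prefix.raw.vcf" || exit 1

    for type in SNP INDEL; do
        # 2. Separate SNPs and indels into their own VCFs.
        gatk SelectVariants -R "$ref" -V "$prefix.raw.vcf" \
            --select-type-to-include "$type" \
            -O "$prefix.$type.vcf" || exit 1

        # 3. Flag (not remove) records that fail the example filter.
        gatk VariantFiltration -R "$ref" -V "$prefix.$type.vcf" \
            --filter-name "lowQD" --filter-expression "QD < 2.0" \
            -O "$prefix.$type.filtered.vcf" || exit 1

        # 4. Annotate; snpEff writes the annotated VCF to standard output.
        java -jar "$snpeff_jar" hg19 "$prefix.$type.filtered.vcf" \
            > "$prefix.$type.ann.vcf" || exit 1
    done
)

# Usage (paths are hypothetical):
#   call_filter_annotate hg19.fa sample.dedup.bam out/sample /path/to/snpEff.jar
```

Splitting SNPs from indels before filtering allows each type to get its own thresholds, which is the usual reason for the separate GATK_INDEL_SNP.sh step.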