forked from ospupegam/ngs2-assignment
Troubleshooting
- STAR recommends 100 GB of storage to index the whole genome, but I only have 33 GB, so I will index only chromosome 22.
- Because STAR mapping takes a long time, only a sample of 10000 reads was extracted from the SRA file.
- In the genome-generation command "STAR --runThreadN 1 --runMode genomeGenerate --genomeDir ~/assign2/STARmap --genomeFastaFiles chr22_with_ERCC92.fa --sjdbGTFfile chr22_with_ERCC92.gtf --sjdbOverhang 149", I used one thread because I thought I had only one processor core. Running nproc --all showed that I have 4 cores, so I changed the thread count from 1 to 4 in the STAR mapping command. This saved a lot of time.
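- According to the statistics calculated after mapping, the read length is 300 bp. I believe this combines the forward and reverse reads (2 x 150 bp). The STAR manual recommends setting --sjdbOverhang to the mate length minus 1 (for example, 99 for 2 x 100 bp paired-end reads). That means 149 in the command above is the appropriate value, not 299.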
- I thought 2-pass mapping required two separate commands. I ran the first mapping command and only then realized I should have added "--twopassMode Basic" to the mapping command. With 4 threads it took less time, and it overwrote the results of the first mapping.
- I don't think I needed to merge and sort with Picard, since I have only one sample from the SRA file, but I ran it anyway.
- Comparing the headers of the sorted file and the merged, sorted file, I see little difference because there was only one sample. Only the format version (VN) and the GO:none tag differ:
sorted:
@HD VN:1.4 SO:coordinate
@SQ SN:22 LN:50818468
merged sorted:
@HD VN:1.6 GO:none SO:coordinate
@SQ SN:22 LN:50818468
- After merging replicates and viewing the mapping statistics, I was surprised that only 75014 reads are shown out of 1 million reads! I am not sure whether I am doing this the right way, and I don't understand the resulting statistics.
- I don't understand why we index again for GATK, even though we already indexed with STAR.
- The updated GATK4 version uses a different command syntax for splitting N CIGAR reads (SplitNCigarReads).
- For base recalibration, I had to search Ensembl for a VCF of known variants on human chr22.
- Some meta-information from the downloaded VCF: somatic mutations found in human cancers from the COSMIC catalogue, and variants from the HGMD-PUBLIC dataset (December 2018).
- During recalibration, I ran into these errors:
A USER ERROR has occurred: Cannot read file:///home/nehal/assign2/STARmap/human_fam_chr22.vcf because no suitable codecs found
A USER ERROR has occurred: Couldn't read file. Error was: Aligned.out_merged.report with exception: Aligned.out_merged.report (No such file or directory)
I concluded that I should sort and index the VCF file. I added a sequence dictionary to the VCF file, then sorted and indexed it, but I still get the same errors.
- While running HaplotypeCaller, I received the following error: A USER ERROR has occurred: Argument --emit-ref-confidence has a bad value: Can only be used in single sample mode currently. Use the --sample-name argument to run on a single sample out of a multi-sample BAM file.
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace
I tried a different command: gatk --java-options "-Xmx4g" HaplotypeCaller \
-R chr22_with_ERCC92.fa \
-I Aligned.out_merged.dedup.bam \
-O output.g.vcf.gz
I got the following error: java.lang.IllegalArgumentException: samples cannot be empty. After searching for the error, I validated the dedup BAM and the sorted BAM using:
java -jar ~/miniconda3/envs/ngs1/share/picard-2.19.0-0/picard.jar ValidateSamFile \
I=Aligned.out_merged.dedup.bam \
R=chr22_with_ERCC92.fa \
MODE=VERBOSE
java -jar ~/miniconda3/envs/ngs1/share/picard-2.19.0-0/picard.jar ValidateSamFile \
I=Aligned.out_merged.sorted.bam \
R=chr22_with_ERCC92.fa \
MODE=VERBOSE
I got the following errors: ERROR: NM tag (nucleotide differences) is missing, Read groups is empty.
I used the following command to fix the NM tag error:
samtools calmd -bAr Aligned.out_merged.dedup.bam chr22_with_ERCC92.fa > Aligned.out_merged_fixeddedup.bam
I still have the same error: ERROR: Read groups is empty
I have no idea what to do.
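The header comparison in the notes can also be done mechanically. samtools view -H prints a BAM header. The sketch below uses the two @HD lines shown in the notes, with SAM's tab separators restored, and extracts their VN, SO, and GO tags with awk. This makes it easy to confirm that both files are coordinate-sorted and differ only in format version and the GO:none tag.

```bash
#!/usr/bin/env bash
# @HD lines copied from the notes; SAM header fields are tab-separated.
# On real files, get them with: samtools view -H sorted.bam | grep '^@HD'
sorted_hd=$'@HD\tVN:1.4\tSO:coordinate'
merged_hd=$'@HD\tVN:1.6\tGO:none\tSO:coordinate'

# Print the value of one TAG:value field from an @HD line (empty if absent).
hd_tag() {
  printf '%s\n' "$1" | awk -v tag="$2" -F '\t' '{
    for (i = 2; i <= NF; i++)
      if (index($i, tag ":") == 1) print substr($i, length(tag) + 2)
  }'
}

for tag in VN SO GO; do
  printf '%s\tsorted=%s\tmerged=%s\n' "$tag" \
    "$(hd_tag "$sorted_hd" "$tag")" "$(hd_tag "$merged_hd" "$tag")"
done
```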
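On the question of indexing again for GATK: as far as I can tell, STAR's genome index is only used by STAR. GATK tools instead expect a FASTA index (.fai) and a sequence dictionary (.dict) alongside the reference FASTA. Here is a minimal sketch that checks for both files next to chr22_with_ERCC92.fa. For whichever is missing, it prints the samtools faidx or gatk CreateSequenceDictionary command to create it.

```bash
#!/usr/bin/env bash
REF=chr22_with_ERCC92.fa
FAI="$REF.fai"            # written by: samtools faidx
DICT="${REF%.fa}.dict"    # GATK looks for the .fa extension replaced by .dict

# Collect the commands needed for whichever index files are missing.
missing=()
[ -f "$FAI" ]  || missing+=("samtools faidx $REF")
[ -f "$DICT" ] || missing+=("gatk CreateSequenceDictionary -R $REF -O $DICT")

if [ "${#missing[@]}" -eq 0 ]; then
  echo "Both $FAI and $DICT exist."
else
  echo "Create the missing reference indexes with:"
  printf '  %s\n' "${missing[@]}"
fi
```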
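On the SplitNCigarReads syntax change: GATK3 was invoked as java -jar GenomeAnalysisTK.jar -T SplitNCigarReads. GATK4 uses the gatk wrapper with the tool name as a subcommand, plus -R, -I, and -O arguments. The sketch below assumes the dedup BAM from these notes as input, and the output name is hypothetical. When gatk or the inputs are not available, the script prints the command instead of running it.

```bash
#!/usr/bin/env bash
REF=chr22_with_ERCC92.fa
IN_BAM=Aligned.out_merged.dedup.bam
OUT_BAM=Aligned.out_merged.split.bam   # hypothetical output name

# GATK4 syntax: the tool name is a subcommand of the gatk wrapper.
split_cmd=(gatk SplitNCigarReads -R "$REF" -I "$IN_BAM" -O "$OUT_BAM")

if command -v gatk >/dev/null 2>&1 && [ -f "$REF" ] && [ -f "$IN_BAM" ]; then
  "${split_cmd[@]}"
else
  echo "gatk or inputs not found; command that would run:"
  printf '%q ' "${split_cmd[@]}"; echo
fi
```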
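About the "no suitable codecs found" error: it means GATK could not match the file to a format it knows how to read. A cheap first check is what the downloaded file actually contains. The sketch below looks at the file's first bytes: gzip (and bgzip) data starts with 1f 8b, and a plain-text VCF starts with a ##fileformat=VCF line. The demo files are made up, so point vcf_kind at the real download (for example human_fam_chr22.vcf). This only classifies the file and does not by itself explain the error.

```bash
#!/usr/bin/env bash
# Classify a VCF download by its first bytes (a sanity check only).
vcf_kind() {
  local f=$1
  # gzip (and bgzip) streams start with the magic bytes 1f 8b.
  if [ "$(od -An -tx1 -N2 "$f" | tr -d ' \n')" = "1f8b" ]; then
    echo "gzip-compressed"
  elif head -n 1 "$f" | grep -q '^##fileformat=VCF'; then
    echo "plain-text VCF"
  else
    echo "unrecognised"
  fi
}

# Demo on made-up files in a temporary directory (removed on exit).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$tmp/demo.vcf"
gzip -c "$tmp/demo.vcf" > "$tmp/demo.vcf.gz"
echo '<html>not a vcf</html>' > "$tmp/page.vcf"

for f in "$tmp/demo.vcf" "$tmp/demo.vcf.gz" "$tmp/page.vcf"; do
  printf '%s: %s\n' "$(basename "$f")" "$(vcf_kind "$f")"
done
```

If the file turns out to be a valid VCF, GATK4's IndexFeatureFile tool can create its index before BaseRecalibrator reads it through --known-sites.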
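For the "Read groups is empty" and "samples cannot be empty" errors: as far as I understand, HaplotypeCaller takes sample names from the SM field of the @RG header lines, so a BAM without read groups has no samples. One way to add them is Picard's AddOrReplaceReadGroups, sketched below with the Picard jar path from these notes. The RGID, RGLB, RGPU, and RGSM values are hypothetical placeholders. The script only prints the command when the jar or the input BAM is missing.

```bash
#!/usr/bin/env bash
PICARD_JAR=~/miniconda3/envs/ngs1/share/picard-2.19.0-0/picard.jar
IN_BAM=Aligned.out_merged.dedup.bam
OUT_BAM=Aligned.out_merged.dedup.rg.bam   # hypothetical output name

# Read-group values are placeholders; SM is the sample name HaplotypeCaller uses.
# CREATE_INDEX=true also writes a BAM index for the output.
rg_cmd=(java -jar "$PICARD_JAR" AddOrReplaceReadGroups
  I="$IN_BAM" O="$OUT_BAM"
  RGID=SRR8797509 RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=SRR8797509
  CREATE_INDEX=true)

if [ -f "$PICARD_JAR" ] && [ -f "$IN_BAM" ]; then
  "${rg_cmd[@]}"
  # Afterwards, run HaplotypeCaller on "$OUT_BAM" instead of the dedup BAM.
else
  echo "Picard jar or input BAM not found; command that would run:"
  printf '%q ' "${rg_cmd[@]}"; echo
fi
```

Alternatively, STAR can write a read group at mapping time with --outSAMattrRGline (for example ID:SRR8797509 SM:SRR8797509, again hypothetical values).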
cd assign2
mkdir draft && cd draft
ln -s ~/assign2/SRR8797509.sra
fastq-dump --split-files -X 10000 ~/assign2/SRR8797509.sra
mkdir STAR && cd STAR
ln -s ~/workdir/sample_data/chr22_with_ERCC92.fa
ln -s ~/workdir/sample_data/chr22_with_ERCC92.gtf
STAR --runThreadN 1 --runMode genomeGenerate --genomeDir ~/assign2/draft/STAR --genomeFastaFiles chr22_with_ERCC92.fa --sjdbGTFfile chr22_with_ERCC92.gtf --sjdbOverhang 149
#pass-2 STAR mapping
STAR --runThreadN 1 --genomeDir ~/assign2/draft/STAR --readFilesIn ~/assign2/draft/SRR8797509_1.fastq ~/assign2/draft/SRR8797509_2.fastq --sjdbFileChrStartEnd ~/assign2/draft/STAR/sjdbList.out.tab
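This concerns the 10000-read sample extracted with fastq-dump -X 10000. -X sets the maximum spot ID to dump, so with --split-files each of SRR8797509_1.fastq and SRR8797509_2.fastq should hold at most 10000 reads. Counting reads directly is a quick sanity check before comparing with the mapping statistics. The sketch assumes the standard, unwrapped 4-line FASTQ record and runs on a made-up two-read file.

```bash
#!/usr/bin/env bash
# Count reads in a FASTQ file, assuming the standard 4-line record layout
# (header, sequence, "+", qualities) with no line wrapping.
count_fastq_reads() {
  local lines
  lines=$(wc -l < "$1")
  echo $(( lines / 4 ))
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Made-up two-read file; replace with ~/assign2/draft/SRR8797509_1.fastq.
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > "$tmp/demo.fastq"

n=$(count_fastq_reads "$tmp/demo.fastq")
echo "demo.fastq: $n reads"

# Taking the first N reads of such a FASTQ means taking its first 4*N lines.
head -n $(( 4 * 1 )) "$tmp/demo.fastq" > "$tmp/first1.fastq"
echo "first1.fastq: $(count_fastq_reads "$tmp/first1.fastq") reads"
```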
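The --sjdbOverhang 149 in the genomeGenerate command above can be derived from the data. The STAR manual describes the ideal value as the mate length minus 1 (99 for 2 x 100 bp paired-end reads). The sketch below reads the length of the first sequence in one mate's FASTQ. The 150 bp demo read is made up; in practice the input would be SRR8797509_1.fastq.

```bash
#!/usr/bin/env bash
# Suggest a value for STAR --sjdbOverhang: (length of one mate) - 1.
# Reads the sequence line of the first record only; assumes all mates share that length.
sjdb_overhang() {
  awk 'NR == 2 { print length($0) - 1; exit }' "$1"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Made-up mate-1 record with a 150 bp read.
seq150=$(printf 'A%.0s' $(seq 150))
qual150=$(printf 'I%.0s' $(seq 150))
printf '@r1/1\n%s\n+\n%s\n' "$seq150" "$qual150" > "$tmp/demo_1.fastq"

OVERHANG=$(sjdb_overhang "$tmp/demo_1.fastq")
echo "--sjdbOverhang $OVERHANG"
```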
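The draft script does two-pass mapping manually: genome generation, then mapping with --sjdbFileChrStartEnd. STAR's --twopassMode Basic instead runs the first pass, inserts the detected junctions, and runs the second pass in one invocation. nproc --all can supply the thread count. The sketch below reuses the draft script's paths, and the output prefix is hypothetical. STAR only runs when it is installed and the genome directory exists; otherwise the command is printed.

```bash
#!/usr/bin/env bash
GENOME_DIR=~/assign2/draft/STAR
R1=~/assign2/draft/SRR8797509_1.fastq
R2=~/assign2/draft/SRR8797509_2.fastq
PREFIX=~/assign2/draft/twopass_   # hypothetical output prefix

# Use every available core (4 on the machine described in these notes);
# getconf is a fallback where GNU nproc is unavailable.
THREADS=$(nproc --all 2>/dev/null || getconf _NPROCESSORS_ONLN)

# --twopassMode Basic: first pass, junction insertion, second pass in one run.
star_cmd=(STAR --runThreadN "$THREADS"
  --genomeDir "$GENOME_DIR"
  --readFilesIn "$R1" "$R2"
  --twopassMode Basic
  --outFileNamePrefix "$PREFIX")

if command -v STAR >/dev/null 2>&1 && [ -d "$GENOME_DIR" ]; then
  "${star_cmd[@]}"
else
  echo "STAR or genome directory not found; command that would run:"
  printf '%q ' "${star_cmd[@]}"; echo
fi
```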