Large DEL filtered by Sniffles #367

Open
ziphra opened this issue Nov 15, 2022 · 5 comments

ziphra commented Nov 15, 2022

Hello,

I am using Sniffles Version 2.0.7.

We used nanopore sequencing with 18X median coverage to recover an 8Mb deletion.

Sniffles did not call this deletion, either with default parameters or with --long-del-coverage 1.

But running Sniffles with the --no-qc option as suggested in #366 showed that this deletion was filtered because of COV_CHANGE:

chr2	220393162	Sniffles2.DEL.59CAS1	N	<DEL>	60	COV_CHANGE	PRECISE;SVTYPE=DEL;SVLEN=-8514152;END=228907314;SUPPORT=6;COVERAGE=14,8,16,10,15;STRAND=+-;AF=0.545;STDEV_LEN=0.000;STDEV_POS=0.000	GT:GQ:DR:DV	0/1:33:5:6

Indeed, as I understand the COV_CHANGE filter, the threshold (14+15)/2*1 = 14.5 is still smaller than the coverage near the center of this deletion, so the call is filtered (I assumed that the svcall.coverage_center used for COV_CHANGE filtering is the coverage near the center, so 16 here).
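
The check described above can be written out as a small sketch. It assumes, as the post does, that the five COVERAGE values are upstream, start, center, end and downstream, and that a call is filtered when the center coverage exceeds the mean of the two flanks multiplied by --long-del-coverage. The function names are illustrative, and this is not Sniffles' actual source code.

```python
def parse_coverage(info):
    """Extract the five COVERAGE values from a VCF INFO string."""
    for field in info.split(";"):
        if field.startswith("COVERAGE="):
            return [int(v) for v in field.split("=", 1)[1].split(",")]
    raise ValueError("INFO field has no COVERAGE entry")


def cov_change_filtered(info, long_del_coverage):
    """True if the call would be filtered under the rule as understood in this thread."""
    # Assumed order: upstream, start, center, end, downstream
    upstream, _start, center, _end, downstream = parse_coverage(info)
    threshold = (upstream + downstream) / 2 * long_del_coverage
    return center > threshold


# INFO column of the filtered 8 Mb deletion reported above
info = ("PRECISE;SVTYPE=DEL;SVLEN=-8514152;END=228907314;SUPPORT=6;"
        "COVERAGE=14,8,16,10,15;STRAND=+-;AF=0.545;STDEV_LEN=0.000;STDEV_POS=0.000")
print(cov_change_filtered(info, long_del_coverage=1))  # center 16 vs threshold 14.5
```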

In the future, I guess we could set a very high value for --long-del-coverage to not miss this kind of deletion.

However, I feel that large coverage variation is to be expected for such a large deletion. Maybe using the mean coverage would be more appropriate for large deletions? Here, the mean coverage would be about 11 (taking the start, center and end values 8, 16 and 10), so under the threshold in our case with --long-del-coverage=1.
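
To make the two ideas above concrete, the sketch below computes (a) the smallest --long-del-coverage value that would let this call through under the rule as understood here, and (b) the outcome if the mean of the start, center and end coverages were compared instead of the center alone. Both helpers are hypothetical illustrations of the proposal, not existing Sniffles behaviour.

```python
from statistics import mean

# COVERAGE values of the filtered call, assumed order:
# upstream, start, center, end, downstream
coverage = [14, 8, 16, 10, 15]


def flank_mean(cov):
    """Mean of the upstream and downstream coverage."""
    return (cov[0] + cov[4]) / 2


def min_factor_to_pass(cov):
    """Smallest factor f for which center <= flank_mean * f holds."""
    return cov[2] / flank_mean(cov)


def filtered_using_inner_mean(cov, factor):
    """Proposed variant: compare the mean of start/center/end with the flank threshold."""
    return mean(cov[1:4]) > flank_mean(cov) * factor


print(round(min_factor_to_pass(coverage), 3))   # factor needed with the center-only rule
print(round(mean(coverage[1:4]), 2))            # mean of start, center, end
print(filtered_using_inner_mean(coverage, 1))   # would the mean-based rule still filter?
```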

Also, STDEV_LEN and STDEV_POS are both 0 for this call, which could be taken into account for variant filtering.

Thank you,

@stefandiederich

Hi,
we are facing a similar problem with Sniffles v2.0.7, also using ONT data. The deletion

chr14 88391507 Sniffles2.DEL.181SE gttgcat...caatttagttcttt N 60 COV_MIN PRECISE;SVTYPE=DEL;SVLEN=-31666;END=88423173;SUPPORT=15;COVERAGE=13,0,0,0,14;STRAND=+-;STDEV_LEN=0.000;STDEV_POS=0.000 GT:GQ:DR:DV ./.:0:0:0

was filtered by Sniffles with COV_MIN filter (running with --no-qc).

I then tried --minsupport 1 and also --long-del-coverage 1 to see whether there is any chance of getting this variant into the resulting VCF, but it is always filtered out. Which parameter do we have to adjust so that the variant shows up in the output?

Bests
Stefan

@Phillip-a-richmond

I had a similar issue...for a 160kb DEL validated orthogonally (Illumina, PCR) and clearly visible in the reads...but I do see that the coverage fluctuates near the centre of the deletion. Trying with --no-qc to see if the variant is removed, but would also +1 to Stefan's request for other parameters to alter for this specific problem.

Thanks,
Phil

@Phillip-a-richmond

It turns out that my DEL is missed entirely by Sniffles but picked up by CuteSV, even after adding --no-qc as suggested above:

Code:

sniffles --input $Proband_BAM \
	--vcf ${Proband_ID}_noQC.vcf.gz \
	--reference $Fasta_Dir/$Fasta_File \
	--no-qc \
	--snf ${Proband_ID}_noQC.snf

The deletion in question:
[Screenshot: DH0808_CEP170]

(I know that in this snapshot I didn't expand feature visibility for the default Sniffles track, but the deletion is not there either.) It is also not present in the --no-qc file at all. The output VCF only contains these variants upstream and downstream of our de novo deletion of interest:

chr1	243109028	Sniffles2.INS.2713S0	N	AAAATGCCTTCTTTTGCCTATTTTATTAAGGATGTAATAACCCTAATGGCCTTTCATGAAGAGCATTCTCTCCAAATGCATTGCACTGGGACACTCCCGAGGGTCCTGGGCCAACACACACTTATAACATAAAATGTAAAAGGGG	60	SUPPORT_MIN	PRECISE;SVTYPE=INS;SVLEN=145;END=243109028;SUPPORT=1;COVERAGE=22,22,22,22,22;STRAND=-;AF=0.045;STDEV_LEN=0;STDEV_POS=0;SUPPORT_LONG=0	GT:GQ:DR:DV	0/0:48:21:1
chr1	243128302	Sniffles2.INS.2714S0	N	TGCAGGGAAAGCAATAACAAAAATTAGCCTACTTTTAGCTAAATGTTATCACTTTACAAGCAATGAATTTCACTCTCACTTTATTTGGAACACTTAATATTATCAT	60	SUPPORT_MIN	PRECISE;SVTYPE=INS;SVLEN=106;END=243128302;SUPPORT=1;COVERAGE=8,8,8,8,8;STRAND=-;AF=0.125;STDEV_LEN=0;STDEV_POS=0;SUPPORT_LONG=0	GT:GQ:DR:DV	0/0:9:7:1

From CuteSV (cutting out ref+alt cols because CuteSV puts the entire 164kb seq in the ref column...)

chr1 243119556 cuteSV.DEL.1178 CCAG... C . PASS PRECISE;SVTYPE=DEL;SVLEN=-164482;END=243284038;CIPOS=-0,0;CILEN=-0,0;RE=6;RNAMES=NULL;STRAND=+- GT:DR:DV:PL:GQ ./.:.:6:.,.,.:.
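
To double-check that a caller's VCF really contains no deletion over the cuteSV interval, a short script can scan for records of a given SVTYPE that overlap chr1:243119556-243284038. This is a minimal sketch using only the standard library. The function names are placeholders, and it assumes a tab-separated VCF with END present in INFO, as in the records shown in this thread.

```python
import gzip


def overlapping_calls(lines, chrom, start, end, svtype="DEL"):
    """Yield VCF records of the given SVTYPE whose [POS, END] span overlaps [start, end]."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8 or cols[0] != chrom:
            continue
        # Key=value INFO entries only; flags such as PRECISE are skipped
        info = dict(f.split("=", 1) for f in cols[7].split(";") if "=" in f)
        if info.get("SVTYPE") != svtype:
            continue
        pos = int(cols[1])
        rec_end = int(info.get("END", pos))
        if pos <= end and rec_end >= start:
            yield cols


def scan_file(path, chrom, start, end, svtype="DEL"):
    """Print CHROM, POS, ID and FILTER of overlapping records in a plain or gzipped VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for rec in overlapping_calls(handle, chrom, start, end, svtype):
            print(rec[0], rec[1], rec[2], rec[6], sep="\t")
```

For example, `scan_file("proband_noQC.vcf.gz", "chr1", 243119556, 243284038)` (file name is a placeholder) would print nothing if no overlapping DEL record exists.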

@fritzsedlazeck if there is a test version of Sniffles you'd like me to try to get this variant to be detected let me know.

Thanks,
Phil

@fritzsedlazeck
Owner

Dear all,
thanks for the clear reports!

We have identified some things over the past weeks and will soon make a new release.
In addition, we have identified the parameters that likely are causing these issues and will further optimize them. For this it would be fantastic if you could share some of these regions (bam file) with me: [email protected]. I know data sharing is often tricky, but I hope to obtain these regions (+/-2kbp) to make sure that won't happen anymore in the future!
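
For anyone preparing such an excerpt, one possible way to derive the padded region and a matching samtools call is sketched below. The 2,000 bp padding follows the request above. The file names are placeholders, and subsetting with `samtools view -b -o` on an indexed BAM is only one of several ways to do this. The command is built but not executed.

```python
import shlex


def padded_region(chrom, start, end, pad=2000):
    """Return a samtools-style region string covering [start - pad, end + pad]."""
    return f"{chrom}:{max(1, start - pad)}-{end + pad}"


def samtools_subset_command(bam, region, out_bam):
    """Build (but do not run) a samtools command writing reads in the region to a new BAM."""
    return ["samtools", "view", "-b", "-o", out_bam, bam, region]


# Coordinates of the chr14 deletion reported earlier in this thread
region = padded_region("chr14", 88391507, 88423173)
cmd = samtools_subset_command("sample.bam", region, "sample.region.bam")  # placeholder names
print(shlex.join(cmd))
```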

@smolkmo is also adding further debug options (e.g. read tracing) so that we can see more easily why Sniffles is ignoring certain reads/regions.

Thank you all
Fritz

@yajun1314

Hello,
This is my structural variant output. Why is the reference sequence (REF column) given as N for every record? Why does the FILTER column contain only PASS? I would also like to know where I can look up the filter options and conditions used by Sniffles.

##fileformat=VCFv4.2
##source=Sniffles2_2.0.7
##command="/data/dongjie/anaconda3/envs/dj/bin/sniffles -i /data/dongjie/ONT/output/JD17-HN35mapped.sorted-4.bam -v /data/dongjie/ONT/output/JD17-HN35variants-4.vcf"
##fileDate="2022/12/08 04:45:22"
##contig=<ID=Chr01,length=59293188>
##contig=<ID=Chr02,length=52595666>
##contig=<ID=Chr03,length=47832972>
##contig=<ID=Chr04,length=52830390>
##contig=<ID=Chr05,length=43637999>
##contig=<ID=Chr06,length=52199335>
##contig=<ID=Chr07,length=47014944>
##contig=<ID=Chr08,length=49700108>
##contig=<ID=Chr09,length=50106997>
##contig=<ID=Chr10,length=54367898>
##contig=<ID=Chr11,length=41062601>
##contig=<ID=Chr12,length=43377035>
##contig=<ID=Chr13,length=46482933>
##contig=<ID=Chr14,length=52685030>
##contig=<ID=Chr15,length=53594103>
##contig=<ID=Chr16,length=39861124>
##contig=<ID=Chr17,length=42923138>
##contig=<ID=Chr18,length=60455165>
##contig=<ID=Chr19,length=52277988>
##contig=<ID=Chr20,length=50612565>
##contig=<ID=Chr21,length=28839>
##contig=<ID=Chr22,length=54539>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
Chr01 4372 Sniffles2.DEL.1833S0 N <DEL> 58 PASS PRECISE;SVTYPE=DEL;SVLEN=-47;END=4419;SUPPORT=3;COVERAGE=4,4,4,4,4;STRAND=-;AF=0.750;STDEV_LEN=0.577;STDEV_POS=0.577 GT:GQ:DR:DV 0/1:1:1:3
Chr01 7271 Sniffles2.DEL.1838S0 N <DEL> 58 PASS PRECISE;SVTYPE=DEL;SVLEN=-428;END=7699;SUPPORT=3;COVERAGE=4,4,4,4,7;STRAND=-;AF=0.750;STDEV_LEN=2.082;STDEV_POS=12.662 GT:GQ:DR:DV 0/1:1:1:3
Chr01 9454 Sniffles2.DEL.183BS0 N <DEL> 58 PASS PRECISE;SVTYPE=DEL;SVLEN=-41;END=9495;SUPPORT=4;COVERAGE=7,6,6,6,4;STRAND=+-;AF=0.667;STDEV_LEN=0.000;STDEV_POS=0.000 GT:GQ:DR:DV 0/1:8:2:4
Chr01 84948 Sniffles2.DUP.7C85S0 N <DUP> 60 PASS PRECISE;SVTYPE=DUP;SVLEN=31856;END=116804;SUPPORT=13;COVERAGE=40,40,80,25,25;STRAND=+-;AF=0.310;STDEV_LEN=3.240;STDEV_POS=0.000 GT:GQ:DR:DV 0/1:49:29:13
Chr01 108653 Sniffles2.DEL.185ES0 N <DEL> 60 PASS PRECISE;SVTYPE=DEL;SVLEN=-6003;END=114656;SUPPORT=10;COVERAGE=62,27,29,25,38;STRAND=+-;AF=0.370;STDEV_LEN=2.340;STDEV_POS=2.340 GT:GQ:DR:DV 0/1:52:17:10
Chr01 738346 Sniffles2.INS.2BS0 N ATATATATATATATATATATATATATATATATATAT 60 PASS PRECISE;SVTYPE=INS;SVLEN=36;END=738346;SUPPORT=11;COVERAGE=22,22,20,21,21;STRAND=+-;AF=0.550;STDEV_LEN=2.498;STDEV_POS=20.410;SUPPORT_LONG=0 GT:GQ:DR:DV 0/1:59:9:11
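
A quick way to see which FILTER values a VCF actually contains is to tally the seventh column. As reported earlier in this thread, records carrying a filter such as COV_CHANGE, COV_MIN or SUPPORT_MIN only showed up when Sniffles was run with --no-qc, which would be consistent with a default-run VCF containing only PASS. The sketch below is a generic tally over a tab-separated VCF, not a Sniffles feature, and the function name is illustrative.

```python
from collections import Counter


def tally_filters(lines):
    """Count FILTER column values over the data lines of a tab-separated VCF."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) > 6:
            counts[cols[6]] += 1
    return counts
```

Passing an open file handle of the uncompressed VCF to `tally_filters` returns a Counter mapping each FILTER value to its number of records.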

Best wishes to you.
Sincerely  yours,
Dong Yajun
