This repository has been archived by the owner on Mar 2, 2021. It is now read-only.

Commit

Various updates to annotation
RCollins13 committed Sep 20, 2018
1 parent a211328 commit 215ef4a
Showing 3 changed files with 22 additions and 6 deletions.
22 changes: 19 additions & 3 deletions svtk/annotation/annotate.py
```diff
@@ -100,6 +100,7 @@ def annotate(sv, gencode, noncoding):
 '##INFO=<ID=COPY_GAIN,Number=.,Type=String,Description="Gene(s) on which the SV is predicted to have a copy-gain effect.">',
 '##INFO=<ID=INTRONIC,Number=.,Type=String,Description="Gene(s) where the SV was found to lie entirely within an intron.">',
 '##INFO=<ID=DUP_PARTIAL,Number=.,Type=String,Description="Gene(s) which are partially overlapped by an SV\'s duplication, such that an unaltered copy is preserved.">',
+'##INFO=<ID=MSV_EXON_OVR,Number=.,Type=String,Description="Gene(s) on which the multiallelic SV would be predicted to have a LOF, DUP_LOF, COPY_GAIN, or DUP_PARTIAL annotation if the SV were biallelic.">',
 '##INFO=<ID=INV_SPAN,Number=.,Type=String,Description="Gene(s) which are entirely spanned by an SV\'s inversion.">',
 '##INFO=<ID=UTR,Number=.,Type=String,Description="Gene(s) for which the SV is predicted to disrupt a UTR.">',
 '##INFO=<ID=NEAREST_TSS,Number=.,Type=String,Description="Nearest transcription start site to intragenic variants.">',
```
```diff
@@ -155,9 +156,24 @@ def annotate_vcf(vcf, gencode, noncoding, annotated_vcf):
             fout.write(record)
             continue

-        for info, genelist in anno.items():
-            if genelist != 'NA':
-                record.info[info] = genelist
+        #Handle general catch-all intersection for MULTIALLELIC variants
+        if 'MULTIALLELIC' in record.filter:
+            multi_ovr = []
+            for info, genelist in anno.items():
+                if info in 'LOF DUP_LOF COPY_GAIN DUP_PARTIAL'.split():
+                    if genelist != 'NA':
+                        for gene in genelist.split(','):
+                            if gene not in multi_ovr:
+                                multi_ovr.append(gene)
+                else:
+                    if genelist != 'NA':
+                        record.info[info] = genelist
+            if len(multi_ovr) > 0:
+                record.info['MSV_EXON_OVR'] = ','.join(multi_ovr)
+        else:
+            for info, genelist in anno.items():
+                if genelist != 'NA':
+                    record.info[info] = genelist

         if 'NEAREST_TSS' in record.info:
             record.info['INTERGENIC'] = True
```
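The MULTIALLELIC branch added in this hunk can be sketched outside pysam to make the collapsing rule easier to follow: for a record whose FILTER contains MULTIALLELIC, genes from the four exon-level effect classes are merged into one ordered, de-duplicated MSV_EXON_OVR list, while every other non-NA annotation is written as before. This is a minimal sketch, assuming `anno` behaves like a mapping from INFO key to a comma-separated gene string or `'NA'` (as the `.items()` and `.split(',')` calls in the diff suggest); `collapse_annotations` and `EXON_EFFECTS` are hypothetical names, not part of svtk, and the FILTER column is modelled as a plain list.

```python
from typing import Collection, Dict, List

# Exon-level effect classes that this commit folds into MSV_EXON_OVR
# for records whose FILTER contains MULTIALLELIC.
EXON_EFFECTS = ('LOF', 'DUP_LOF', 'COPY_GAIN', 'DUP_PARTIAL')


def collapse_annotations(anno: Dict[str, str],
                         filters: Collection[str]) -> Dict[str, str]:
    """Return the INFO fields a record would receive (hypothetical helper).

    anno maps an INFO key to a comma-separated gene list or 'NA';
    filters stands in for the record's FILTER column.
    """
    if 'MULTIALLELIC' not in filters:
        # Biallelic: every non-NA annotation is written unchanged.
        return {key: genes for key, genes in anno.items() if genes != 'NA'}

    info: Dict[str, str] = {}
    merged: List[str] = []  # union of exon-level genes, first-seen order
    for key, genes in anno.items():
        if genes == 'NA':
            continue
        if key in EXON_EFFECTS:
            for gene in genes.split(','):
                if gene not in merged:
                    merged.append(gene)
        else:
            # Non-exonic annotations (INTRONIC, UTR, ...) pass through.
            info[key] = genes
    if merged:
        info['MSV_EXON_OVR'] = ','.join(merged)
    return info


if __name__ == '__main__':
    anno = {'LOF': 'GENE_A', 'COPY_GAIN': 'GENE_A,GENE_B',
            'INTRONIC': 'GENE_C', 'UTR': 'NA'}
    print(collapse_annotations(anno, ['MULTIALLELIC']))
    # {'INTRONIC': 'GENE_C', 'MSV_EXON_OVR': 'GENE_A,GENE_B'}
```

Note that a gene hit by both LOF and COPY_GAIN appears only once in MSV_EXON_OVR, and the individual LOF/COPY_GAIN fields are not written for multiallelic records.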
2 changes: 1 addition & 1 deletion svtk/annotation/classify_effect.py
```diff
@@ -155,7 +155,7 @@ def classify_disrupt(disrupt_dict, svtype):

     if svtype == 'DEL':
         return classify_del(disrupt_dict)
-    if svtype == 'DUP':
+    if svtype in 'DUP MCNV'.split():
         return classify_dup(disrupt_dict)
     if svtype == 'INV':
         return classify_inv(disrupt_dict)
```
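This one-line change routes multiallelic CNVs (MCNV) through the same duplication rules as DUP. The mapping can also be written as a lookup table, which makes the "MCNV shares DUP rules" decision explicit in one place. A minimal sketch, assuming stub classifiers: the three `classify_*` functions below are hypothetical stand-ins that only report which rule set would apply, and raising `ValueError` for other SV types is a choice of this sketch, since the diff does not show how the real function handles them.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for svtk's per-type classifiers; each one only
# reports which rule set would be applied to the disruption summary.
def classify_del(disrupt_dict: dict) -> str:
    return 'deletion rules'


def classify_dup(disrupt_dict: dict) -> str:
    return 'duplication rules'


def classify_inv(disrupt_dict: dict) -> str:
    return 'inversion rules'


# Table-driven dispatch: MCNV reuses the duplication classifier.
CLASSIFIERS: Dict[str, Callable[[dict], str]] = {
    'DEL': classify_del,
    'DUP': classify_dup,
    'MCNV': classify_dup,
    'INV': classify_inv,
}


def classify_disrupt(disrupt_dict: dict, svtype: str) -> str:
    classifier = CLASSIFIERS.get(svtype)
    if classifier is None:
        # Behaviour for other SV types is not shown in the diff;
        # this sketch simply rejects them.
        raise ValueError(f'unsupported svtype: {svtype}')
    return classifier(disrupt_dict)


print(classify_disrupt({}, 'MCNV'))  # duplication rules
```

The membership test used in the commit (`svtype in 'DUP MCNV'.split()`) and the table above behave the same for DUP and MCNV; the table mainly helps if more SV types are later aliased to an existing classifier.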
4 changes: 2 additions & 2 deletions svtk/cli/annotate.py
```diff
@@ -13,9 +13,9 @@
 The following classes of genic effects are annotated as new VCF INFO fields if
 the SV meets the defined criteria:
-1) LOF - Loss of function.
+1) LOF (and DUP_LOF) - Loss of function.
 * Deletions are annotated LOF if they overlap any exon.
-* Duplications are annotated LOF if they reside entirely within
+* Duplications are annotated DUP_LOF if they reside entirely within
 a gene boundary and overlap any exon.
 * Inversions are annotated LOF if reside entirely within an exon, if
 one breakpoint falls within an exon, if they reside entirely within a
```
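The deletion and duplication bullets of the updated LOF rule reduce to two interval tests: "overlaps any exon" and "lies entirely within the gene boundary". A minimal sketch of just those two bullets, assuming half-open `[start, end)` coordinates on a single chromosome; `overlaps`, `contains`, and `lof_effect` are hypothetical helpers for illustration, not svtk's implementation, and the inversion rules and other effect classes (such as COPY_GAIN) are deliberately left out.

```python
from typing import List, Optional, Tuple

# Half-open [start, end) coordinates on one chromosome (an assumption of
# this sketch; svtk's own coordinate handling is not shown in the diff).
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def contains(outer: Interval, inner: Interval) -> bool:
    """True if inner lies entirely within outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def lof_effect(svtype: str, sv: Interval, gene: Interval,
               exons: List[Interval]) -> Optional[str]:
    """Apply the DEL and DUP bullets of the LOF rule to one gene."""
    hits_exon = any(overlaps(sv, exon) for exon in exons)
    if svtype == 'DEL' and hits_exon:
        return 'LOF'
    if svtype == 'DUP' and hits_exon and contains(gene, sv):
        return 'DUP_LOF'
    return None  # other outcomes (INV rules, COPY_GAIN, ...) not modelled


if __name__ == '__main__':
    gene, exons = (100, 1000), [(200, 300), (600, 700)]
    print(lof_effect('DEL', (250, 260), gene, exons))   # LOF
    print(lof_effect('DUP', (150, 650), gene, exons))   # DUP_LOF
    print(lof_effect('DUP', (50, 2000), gene, exons))   # None
```

The last call shows why the rename matters: a duplication that extends past the gene boundary does not qualify as DUP_LOF even though it overlaps exons, so it is no longer conflated with a deletion's LOF.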
