CHANGELOG

0.9.1 - November 30th 2020

Data updates: ClinVar, GWAS catalog, CIViC, CancerMine, dbNSFP, KEGG, ChEMBL/DGIdb, Disease Ontology, Experimental Factor Ontology

Added

Added possibility to configure the algorithm for TMB calculation through the optional argument tmb_algorithm: all coding variants (all_coding) or non-synonymous variants only (nonsyn)
R code is now subject to static analysis with lintr
Improved Conda recipe (i.e. meta.yaml) with version pinning of all package dependencies
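As a rough illustration of what the two tmb_algorithm settings mean, the sketch below estimates TMB as qualifying somatic variants per megabase of target. It is not PCGR's implementation: it assumes one VEP consequence string per variant (multiple terms joined with '&'), and the consequence sets and the helper estimate_tmb are hypothetical. The default target size of 34 Mb follows the 0.8.0 entry.

```python
# Hypothetical sketch: TMB = qualifying somatic variants / target size (Mb).
# The consequence sets are illustrative assumptions, not PCGR's own lists.

NONSYN_CONSEQUENCES = {
    "missense_variant",
    "stop_gained",
    "stop_lost",
    "start_lost",
    "frameshift_variant",
    "inframe_insertion",
    "inframe_deletion",
    "protein_altering_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}
# all_coding additionally counts synonymous coding variants
ALL_CODING_CONSEQUENCES = NONSYN_CONSEQUENCES | {"synonymous_variant"}


def estimate_tmb(consequences, target_size_mb=34.0, tmb_algorithm="all_coding"):
    """Return variants per Mb; `consequences` holds one VEP consequence string per variant."""
    if tmb_algorithm == "all_coding":
        allowed = ALL_CODING_CONSEQUENCES
    elif tmb_algorithm == "nonsyn":
        allowed = NONSYN_CONSEQUENCES
    else:
        raise ValueError(f"unknown tmb_algorithm: {tmb_algorithm}")
    if target_size_mb <= 0:
        raise ValueError("target_size_mb must be positive")
    # A variant counts if any of its '&'-separated consequence terms qualifies
    n_counted = sum(
        1 for csq in consequences if any(term in allowed for term in csq.split("&"))
    )
    return n_counted / target_size_mb


variants = ["missense_variant", "synonymous_variant", "intron_variant",
            "stop_gained&splice_region_variant"]
print(estimate_tmb(variants, target_size_mb=2.0, tmb_algorithm="all_coding"))  # 1.5
print(estimate_tmb(variants, target_size_mb=2.0, tmb_algorithm="nonsyn"))      # 1.0
```

The only difference between the two settings in this sketch is whether synonymous variants enter the numerator; the denominator (target size) is the same.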

Changed

Removed DisGeNET annotations from output (associations from Open Targets Platform serve the same purpose)
Version pinning of software dependencies in Dockerfile:
- All R packages necessary for PCGR are installed using the renv framework, ensuring improved versioning and reproducibility
- Other tools/utilities and Python libraries that have been version pinned:
  - bedtools, samtools, numpy, cython, scipy, cyvcf2, toml, pandas

0.9.0rc - September 24th 2020

Data updates: ClinVar, GWAS catalog, GENCODE, CIViC, CancerMine, UniProt KB, dbNSFP, Pfam, KEGG, Open Targets Platform
Software updates: VEP 101

Fixed

An extra comma was mistakenly present in the template for tier 2 variants, issue #96
Missing protein domain annotations for grch38, issue #116

Changed

All arguments to pcgr.py are now non-positional
Arguments to pcgr.py are divided into two groups: required and optional
Options allelic_support:tumor_dp_min, allelic_support:tumor_af_min, allelic_support:control_dp_min, allelic_support:control_af_max in PCGR configuration file are now optional arguments --tumor_dp_min, --tumor_af_min, --control_dp_min, --control_af_max in pcgr.py
Option mutational_burden:mutational_burden in PCGR configuration file is now optional argument --estimate_tmb in pcgr.py
Option msi:msi in PCGR configuration file is now optional argument --estimate_msi_status in pcgr.py
Option mutational_signatures:mutational_signatures in PCGR configuration file is now optional argument --estimate_signatures in pcgr.py
Options mutational_signatures:mutsignatures_signature_limit, mutational_signatures:mutsignatures_normalization, mutational_signatures:mutsignatures_mutation_limit, mutational_signatures:mutsignatures_cutoff are removed (used for deconstructSigs analysis, which is no longer in use)
Optional argument --cna_overlap_pct in pcgr.py replaces cna:cna_overlap_pct in PCGR configuration file
Optional argument --logr_gain in pcgr.py replaces cna:logr_gain in PCGR configuration file
Optional argument --logr_homdel in pcgr.py replaces cna:logr_homdel in PCGR configuration file
Removed mutational_burden:tmb_low_limit and mutational_burden:tmb_intermediate_limit - TMB is no longer interpreted in the context of thresholds
Classification of genes as tumor suppressors/oncogenes is now based on a combination of CancerMine citation count and presence in Network of Cancer Genes
Settings section of report is now divided into three:
- Metadata - sample and sequencing assay
- Report configuration
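The four allelic support arguments act as simple per-variant thresholds. The sketch below shows one plausible reading; it assumes inclusive comparisons (>= for the minimums, <= for control_af_max) and that a criterion is skipped when the VCF lacks the corresponding value. AlleleSupport and passes_allelic_support are hypothetical names, and the threshold values in the usage lines are arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlleleSupport:
    """Depth/allelic fraction for one variant; None when the VCF lacks the tag."""
    tumor_dp: Optional[int] = None
    tumor_af: Optional[float] = None
    control_dp: Optional[int] = None
    control_af: Optional[float] = None


def passes_allelic_support(v, tumor_dp_min=0, tumor_af_min=0.0,
                           control_dp_min=0, control_af_max=1.0):
    """Apply the four thresholds; a criterion is skipped if its value is missing."""
    checks = []
    if v.tumor_dp is not None:
        checks.append(v.tumor_dp >= tumor_dp_min)
    if v.tumor_af is not None:
        checks.append(v.tumor_af >= tumor_af_min)
    if v.control_dp is not None:
        checks.append(v.control_dp >= control_dp_min)
    if v.control_af is not None:
        checks.append(v.control_af <= control_af_max)
    return all(checks)


thresholds = dict(tumor_dp_min=20, tumor_af_min=0.05,
                  control_dp_min=10, control_af_max=0.01)
print(passes_allelic_support(AlleleSupport(50, 0.20, 30, 0.0), **thresholds))   # True
print(passes_allelic_support(AlleleSupport(50, 0.20, 30, 0.30), **thresholds))  # False
```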

Added

Optional argument --include_trials in pcgr.py - includes a section with annotated clinical trials for the tumor type in question
Optional argument --assay in pcgr.py - designates type of sequencing assay
Optional argument --cell_line in pcgr.py - designates runs of tumor cell lines (only for display, not used to configure any analysis)
Optional argument --min_mutations_signatures in pcgr.py - minimum number of required mutations for mutational signature analysis with MutationalPatterns
Optional argument --all_reference_signatures in pcgr.py - considers all reference signatures during fitting of mutational profile to known signatures
Optional argument --estimate_signatures now also includes detection of potential kataegis events (WGS/WES assays only) and a rainfall plot in the flexdashboard output
The user can now distinguish (through color codes) whether a biomarker has been mapped exactly (nucleotide change) or at a regional level (codon/exon)
All variant-associated biomarkers (regardless of assignment to TIER 1/2) are now found in a new section (SNVs/InDels)
For copy number amplifications, other putative drug targets in cancer are listed in a new section
Detailed documentation of report contents has been added to the Documentation section
References have been updated and are all provided with DOIs

0.8.4 - November 18th 2019

Data updates: ClinVar, CIViC, CancerMine, UniProt KB
Software updates: VEP 98.3

0.8.3 - October 14th 2019

Data updates: ClinVar, GWAS catalog, GENCODE, CIViC, CancerMine
Software updates: VEP 98.2, vcf2tsv

Fixed

Further improved mapping between Ensembl transcripts and UniProt accessions (using also RefSeq accessions where available)

Added

Possibility to filter evidence items by RATING in interactive data tables

Changed

Option target_size_mb in pcgr.py replaces target_size_mb in the configuration file (more convenient when configuring runs)
Option tumor_type in pcgr.py replaces tumor_type in configuration file

0.8.2 - Sep 29th 2019

Data updates: ClinVar, GWAS catalog, GENCODE, DiseaseOntology, CIViC, CancerMine, UniProt KB
Software updates: VEP 97.3, vcfanno 0.3.2, LOFTEE (VEP plugin) 1.0.3

Fixed

Bug in concatenation of clinical evidence items from different sources (CIViC + CBMDB) (issues #83, #87)
Silent variants that coincide with biomarkers reported at codon level are ignored
Distinction between clinical evidence items of different origins (somatic + germline)
Improved mapping between Ensembl transcripts and UniProt accessions (using also RefSeq accessions where available)
Bug in UpSetPlot for cases where filtering produces fewer than two intersecting sets

Added

New field ‘mane’ as criterion for pick order in configuration file (VEP)
Sample identifier added to copy number annotation output (convenient for concatenation of output from multiple samples)
Capturing allelic depth (t_depth, t_ref_count etc.) in vcf2maf output (enhancement #52)
Option tumor_only in pcgr.py replaces vcf_tumor_only in the configuration file (more convenient in terms of configuration)

0.8.1 - May 22nd 2019

Added

Cancer_NOS.toml as configuration file for unspecified tumor types

0.8.0 - May 20th 2019

Fixed

Bug in value box for Tier 2 variants (newline/carriage return), Issue #73

Added

Upgraded VEP to v96
- Skipping the --regulatory VEP option to avoid forking issues and to improve speed (see this issue)
- Added option to configure pick-order for choice of primary transcript in configuration file
Pre-made configuration files for each tumor type in conf folder
Possibility to append a CNA plot file (.png format) to the Somatic CNAs section of the report (previous feature request)
Added possibility to input estimates of tumor purity and ploidy
- shown as value boxes in Main results
Tumor mutational burden is now compared with the distribution of TMB observed for TCGA’s cohorts (organized by primary site)
- Default target size is now 34Mb (approx. estimate from exome-wide calculation of protein-coding parts of GENCODE)
Added flexibility for variant filtering in tumor-only input callsets
- Added additional options to exclude likely germline variants (both require the tumor VAF tag to be correctly specified in the input VCF)
  - exclude_likely_hom_germline - removes any variant with an allelic fraction of 1 (100%), a very unlikely somatic event
  - exclude_likely_het_germline - removes any variant with
    - an allelic fraction between 0.4 and 0.6, and
    - presence in dbSNP + gnomAD, and
    - no presence as somatic event in COSMIC/TCGA
- Added possibility to input a PANEL-OF-NORMALS VCF, to support the many labs that have sequenced a database/pool of healthy controls. This set of variants is used by PCGR to improve variant filtering when running in tumor-only mode. The PANEL-OF-NORMALS annotation works as follows:
  - all variants in the tumor that coincide with any variant listed in the PANEL-OF-NORMALS VCF are appended with a PANEL_OF_NORMALS flag in the query VCF with tumor variants
  - if the configuration parameter exclude_pon is set to True in tumor-only runs, all variants with a PANEL_OF_NORMALS flag are filtered/excluded
For tumor-only runs, added an UpSet plot showing how different filtering sources (gnomAD, 1KG Project, panel-of-normals etc) contribute in the germline filtering procedure
Variants in Tier 3 / Tier 4 / Noncoding are now sorted (and color-coded) according to the target (gene) association score to the cancer phenotype, as provided by the OpenTargets Platform
Added annotation of TCGA’s ten oncogenic signaling pathways
Added EXONIC_STATUS annotation tag (VCF and TSV)
- exonic denotes all protein-altering AND canonical splice site altering AND synonymous variants, nonexonic denotes the complement
Added CODING_STATUS annotation tag (VCF and TSV)
- coding denotes all protein-altering AND canonical splice site altering variants, noncoding denotes the complement
Added SYMBOL_ENTREZ annotation tag (VCF)
- Official gene symbol from NCBI Entrez (the SYMBOL provided by VEP can sometimes be a non-official alias, e.g. for GENCODE v19/grch37)
Added SIMPLEREPEATS_HIT annotation tag (VCF and TSV)
- Variant overlaps UCSC simpleRepeat sequence repeat track - used for MSI prediction
Added WINMASKER_HIT annotation tag (VCF and TSV)
- Variant overlaps UCSC windowmaskerSdust sequence repeat track - used for MSI prediction
Added PUTATIVE_DRIVER_MUTATION annotation tag (VCF and TSV)
- Putative cancer driver mutation discovered by multiple approaches from 9,423 tumor exomes in TCGA. Format: symbol:hgvsp:ensembl_transcript_id:discovery_approaches
Added OPENTARGETS_DISEASE_ASSOCS annotation tag (VCF and TSV)
- Associations between protein targets and disease based on multiple lines of evidence (mutations, affected pathways, GWAS, literature, etc.). Format: CUI:EFO_ID:IS_DIRECT:OVERALL_SCORE
Added OPENTARGETS_TRACTABILITY_COMPOUND annotation tag (VCF and TSV)
- Confidence for the existence of a modulator (small molecule) that interacts with the target (protein) to elicit a desired biological effect
Added OPENTARGETS_TRACTABILITY_ANTIBODY annotation tag (VCF and TSV)
- Confidence for the existence of a modulator (antibody) that interacts with the target (protein) to elicit a desired biological effect
Added CLINVAR_REVIEW_STATUS_STARS annotation tag
- Rating of the ClinVar variant (0-4 stars) with respect to level of review
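The tumor-only germline filters described in this release combine a few simple rules. The sketch below is one reading of them, not PCGR's code: it assumes an inclusive 0.4 to 0.6 window, treats "coincide" with a panel-of-normals variant as an exact (chrom, pos, ref, alt) match, and uses hypothetical names (TumorOnlyVariant, pon_flags, keep_variant) with precomputed database-membership booleans.

```python
from dataclasses import dataclass


@dataclass
class TumorOnlyVariant:
    tumor_af: float          # tumor VAF, read from the VCF tag
    in_dbsnp: bool
    in_gnomad: bool
    in_cosmic_or_tcga: bool  # previously observed as a somatic event
    panel_of_normals: bool = False  # e.g. set from pon_flags() below


def pon_flags(tumor_keys, pon_keys):
    """Flag tumor variants whose (chrom, pos, ref, alt) key occurs in the panel of normals."""
    pon = set(pon_keys)
    return [key in pon for key in tumor_keys]


def is_likely_hom_germline(v):
    return v.tumor_af >= 1.0  # allelic fraction of 1 (100%)


def is_likely_het_germline(v):
    return (0.4 <= v.tumor_af <= 0.6
            and v.in_dbsnp and v.in_gnomad
            and not v.in_cosmic_or_tcga)


def keep_variant(v, exclude_likely_hom_germline=False,
                 exclude_likely_het_germline=False, exclude_pon=False):
    """Return False if any enabled filter classifies the variant as likely germline."""
    if exclude_likely_hom_germline and is_likely_hom_germline(v):
        return False
    if exclude_likely_het_germline and is_likely_het_germline(v):
        return False
    if exclude_pon and v.panel_of_normals:
        return False
    return True


het = TumorOnlyVariant(0.5, in_dbsnp=True, in_gnomad=True, in_cosmic_or_tcga=False)
print(keep_variant(het))                                    # True
print(keep_variant(het, exclude_likely_het_germline=True))  # False
```

Each filter is opt-in, mirroring the separate configuration options; a known somatic hotspot in COSMIC/TCGA escapes the heterozygous rule even at an allelic fraction near 0.5.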

Changed

Moved from IntoGen’s driver mutation resource to TCGA’s putative driver mutation list in display of driver mutation status
Moved option for vcf_validation from configuration file to run script (--no_vcf_validate)

Removed

Original tier model ‘pcgr’

0.7.0 - Nov 27th 2018

Fixed

Bug in assignment of variants to tier1/tier2 Issue #61
Missing config option for maf_gnomad_asj in TOML file (also setting operator to <=) Issue #60
Bug in new CancerMine oncogene/tumor suppressor annotation Issue #53
vcfanno fix for empty Description (upgrade to vcfanno v0.3.1 Issue #49)
Bug in message showing too few variants for MSI prediction, Issue #55
Bug in appending of custom VCF tags
- Still unsolved: how to disambiguate identical FORMAT and INFO tags in vcf2tsv
Bug in SCNA value box display for multiple copy number hits (Issue #47)
Bug in vcf2tsv (handling INFO tags encoded with ‘Type = String’, Issue #39)
Bug in search of UniProt functional features (BED feature regions spanning exons are now handled)
Stripped HTML elements (TCGA_FREQUENCY, DBSNP) from TSV output
Some effect predictions from dbNSFP were not properly parsed (e.g. multiple prediction entries from multiple transcript isoforms), these should now be retrieved correctly
Removed ‘COSM’ prefix in COSMIC mutation links
Bug in retrieval of splice site predictions from dbscSNV

Added

Possibility to run PCGR in a non-Docker environment (e.g. using the --no-docker option). Thanks to an excellent contribution by Vlad Saveliev, Issue #35
- Added possibility to specify the Docker user ID
Possibility for MAF file output (converted with vcf2maf), must be configured by the user in the TOML file (i.e. vcf2maf = true, Issue #17)
Possibility for adding custom VCF INFO tags to PCGR output files (JSON/TSV), must be configured by the user in the TOML file (i.e. custom_tags)
Added MUTATION_HOTSPOT_CANCERTYPE in data tables (i.e. listing tumor types in which hotspot mutations have been found)
Included the ‘rs’ prefix for dbSNP identifiers (HTML and TSV output)
Individual entries/columns for variant effect predictions:
- Individual algorithms: SIFT_DBNSFP, M_CAP_DBNSFP, MUTPRED_DBNSFP, MUTATIONTASTER_DBNSFP, MUTATIONASSESSOR_DBNSFP, FATHMM_DBNSFP, FATHMM_MKL_DBNSFP, PROVEAN_DBNSFP
- Ensemble predictions (META_LR_DBNSFP), dbscSNV splice site predictions (SPLICE_SITE_RF_DBNSFP, SPLICE_SITE_ADA_DBNSFP)
Upgraded samtools to v1.9 (makes vcf2maf work properly)
Added Ensembl gene/transcript id and corresponding RefSeq mRNA id to TSV/JSON
Added for future implementation:
- SeqKat + karyoploteR for exploration of kataegis/hypermutation
- CELLector - genomics-guided selection of cancer cell lines
Upgraded VEP to v94

Changed

Changed CANCER_MUTATION_HOTSPOT to MUTATION_HOTSPOT
Moved from TSGene 2.0 to CancerMine for annotation of tumor suppressor genes and proto-oncogenes
- A minimum of n=3 citations was required to include literature-mined tumor suppressor genes and proto-oncogenes from CancerMine

0.6.2.1 - May 14th 2018

Fixed

Bug in copy number annotation (broad/focal)

0.6.2 - May 9th 2018

Fixed

Bug in copy number segment display (missing variable initialization, Issue #34)
Typo in gnomAD filter statistic (fraction, Issue #31)
Bug in mutational signature analysis for grch38 (forgot to pass BSgenome object, Issue #27)
Missing proper ASCII-encoding in vcf2tsv conversion, Issue #
Removed ‘Noncoding mutations’ section when no input VCF is present
Bug in annotation of copy number event type (focal/broad)
Bug in copy number annotation (missing protein-coding transcripts)
Updated MSI prediction (variable importance, performance measures)

Added

Genome assembly is appended to every output file
Issue a warning for copy number segments that go beyond the chromosomal lengths of the specified assembly (such segments are skipped)
Added missing subtypes for ‘Skin_Cancer_NOS’ in the cancer phenotype dataset

0.6.1 - May 2nd 2018

Fixed

Bug in tier assignment ‘pcgr_acmg’ (case for no variants in tier1,2,3)
Bug in tier assignment ‘pcgr_acmg’ (no tumor type specified, evidence items with weak support detected)
Bug: duplicated variants in ‘Tier 3’ resulting from genes encoded with dual roles as tumor suppressor genes/oncogenes
Bug: duplicated variants in ‘Tier 1/Noncoding variants’ resulting from rare cases of noncoding variants occurring in Tier 1 (synonymous variants with biomarker role)

0.6.0 - April 25th 2018

Added

New argument in pcgr.py
- assembly (grch37/grch38)
New option in pcgr.py
- --basic - run comprehensive VCF annotation only, skip report generation and additional analyses
New sections in HTML report
- Settings and annotation sources - now also listing key PCGR configuration settings
- Main findings - Six value boxes indicating the main findings of clinical relevance
New configuration options
- [tier_model](string) - choice between pcgr_acmg and pcgr
- [mutational_burden] - set TMB tertile limits
  - tmb_low_limit (float)
  - tmb_intermediate_limit (float)
- [tumor_type] - choose between 34 tumor types/classes:
  - Adrenal_Gland_Cancer_NOS (logical)
  - Ampullary_Carcinoma_NOS (logical)
  - Biliary_Tract_Cancer_NOS (logical)
  - Bladder_Urinary_Tract_Cancer_NOS (logical)
  - Blood_Cancer_NOS (logical)
  - Bone_Cancer_NOS (logical)
  - Breast_Cancer_NOS (logical)
  - CNS_Brain_Cancer_NOS (logical)
  - Colorectal_Cancer_NOS (logical)
  - Cervical_Cancer_NOS (logical)
  - Esophageal_Stomach_Cancer_NOS (logical)
  - Head_And_Neck_Cancer_NOS (logical)
  - Hereditary_Cancer_NOS (logical)
  - Kidney_Cancer_NOS (logical)
  - Leukemia_NOS (logical)
  - Liver_Cancer_NOS (logical)
  - Lung_Cancer_NOS (logical)
  - Lymphoma_Hodgkin_NOS (logical)
  - Lymphoma_Non_Hodgkin_NOS (logical)
  - Ovarian_Fallopian_Tube_Cancer_NOS (logical)
  - Pancreatic_Cancer_NOS (logical)
  - Penile_Cancer_NOS (logical)
  - Peripheral_Nervous_System_Cancer_NOS (logical)
  - Peritoneal_Cancer_NOS (logical)
  - Pleural_Cancer_NOS (logical)
  - Prostate_Cancer_NOS (logical)
  - Skin_Cancer_NOS (logical)
  - Soft_Tissue_Cancer_NOS (logical)
  - Stomach_Cancer_NOS (logical)
  - Testicular_Cancer_NOS (logical)
  - Thymic_Cancer_NOS (logical)
  - Thyroid_Cancer_NOS (logical)
  - Uterine_Cancer_NOS (logical)
  - Vulvar_Vaginal_Cancer_NOS (logical)
- [mutational_signatures]
  - mutsignatures_cutoff (float) - discard any signature contributions with a weight less than the cutoff
- [cna]
  - transcript_cna_overlap (float) - minimum percent overlap between copy number segment and transcripts (average) for tumor suppressor gene/proto-oncogene to be reported
- [allelic_support]
  - If input VCF has correctly formatted depth/allelic fraction as INFO tags, users can add thresholds on depth/support that are applied prior to report generation
    - tumor_dp_min (integer) - minimum sequencing depth for variant in tumor sample
    - tumor_af_min (float) - minimum allelic fraction for variant in tumor sample
    - normal_dp_min (integer) - minimum sequencing depth for variant in normal sample
    - normal_af_max (float) - maximum allelic fraction for variant in normal sample
- [visual]
  - report_theme (string) - visual theme of report (Bootstrap)
- [other]
  - vcf_validation (logical) - keep/skip VCF validation by vcf-validator
New output file - JSON output of HTML report content
New INFO tags of PCGR-annotated VCF
- CANCER_PREDISPOSITION
- PFAM_DOMAIN
- TCGA_FREQUENCY
- TCGA_PANCANCER_COUNT
- ICGC_PCAWG_OCCURRENCE
- ICGC_PCAWG_AFFECTED_DONORS
- CLINVAR_MEDGEN_CUI
New column entries in annotated SNV/InDel TSV file:
- CANCER_PREDISPOSITION
- ICGC_PCAWG_OCCURRENCE
- TCGA_FREQUENCY
New column in CNA output
- TRANSCRIPTS - aberration-overlapping transcripts (Ensembl transcript IDs)
- MEAN_TRANSCRIPT_CNA_OVERLAP - Mean overlap (%) between gene transcripts and aberration segment
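One way to read the transcript_cna_overlap option together with the MEAN_TRANSCRIPT_CNA_OVERLAP column is sketched below. The details are assumptions rather than PCGR's documented method: half-open coordinates, overlap measured as the percentage of each transcript covered by the segment, and an unweighted mean over the transcripts passed in. The helpers overlap_pct, mean_transcript_cna_overlap and report_gene are hypothetical.

```python
def overlap_pct(seg_start, seg_end, tx_start, tx_end):
    """Percent of transcript [tx_start, tx_end) covered by segment [seg_start, seg_end)."""
    tx_len = tx_end - tx_start
    if tx_len <= 0:
        raise ValueError("transcript must have positive length")
    covered = max(0, min(seg_end, tx_end) - max(seg_start, tx_start))
    return 100.0 * covered / tx_len


def mean_transcript_cna_overlap(segment, transcripts):
    """Unweighted mean overlap (%) between a segment and a gene's transcripts."""
    seg_start, seg_end = segment
    pcts = [overlap_pct(seg_start, seg_end, s, e) for s, e in transcripts]
    return sum(pcts) / len(pcts) if pcts else 0.0


def report_gene(segment, transcripts, transcript_cna_overlap):
    """Report the gene if the mean overlap reaches the configured minimum percent."""
    return mean_transcript_cna_overlap(segment, transcripts) >= transcript_cna_overlap


segment = (0, 1000)
transcripts = [(0, 500), (900, 1100)]  # 100% and 50% covered
print(mean_transcript_cna_overlap(segment, transcripts))  # 75.0
print(report_gene(segment, transcripts, 80.0))            # False
```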

Removed

Elements of databundle (now annotated directly through VEP):
- dbsnp
- gnomad/exac
- 1000G project
INFO tags of PCGR-annotated VCF
- DBSNPBUILDID
- DBSNP_VALIDATION
- DBSNP_SUBMISSIONS
- DBSNP_MAPPINGSTATUS
- GWAS_CATALOG_PMID
- GWAS_CATALOG_TRAIT_URI
- DOCM_DISEASE
Output files
- TSV files with mutational signature results and biomarkers (i.e. sample_id.pcgr.snvs_indels.biomarkers.tsv and sample_id.pcgr.mutational_signatures.tsv)
  - Data can still be retrieved - now from the JSON dump
- MAF file
  - The previous MAF output was generated in a custom fashion; a more accurate MAF output based on https://github.com/mskcc/vcf2maf will be incorporated in the next release

Changed

HTML report sections
- Tier statistics and Variant statistics are now grouped into the section Tier and variant statistics
- Tier 5 is now Noncoding mutations (i.e. not considered a tier per se)
- Sliders for allelic fraction in the Global variant browser are now fixed from 0 to 1 (0.05 intervals)
