- Data updates: ClinVar, GWAS catalog, CIViC, CancerMine, dbNSFP, KEGG, ChEMBL/DGIdb, Disease Ontology, Experimental Factor Ontology
- Added possibility to configure algorithm for TMB calculation, optional argument `tmb_algorithm`
  - all coding variants (`all_coding`) or non-synonymous variants only (`nonsyn`)
- R code subject to static analysis with lintr
- Improved Conda recipe (i.e. `meta.yaml`) with version pinning of all package dependencies
- Removed DisGeNET annotations from output (associations from Open Targets Platform serve same purpose)
- Version pinning of software dependencies in Dockerfile:
  - All R packages necessary for PCGR are installed using the renv framework, ensuring improved versioning and reproducibility
  - Other tools/utilities and Python libraries that have been version pinned:
    - bedtools, samtools, numpy, cython, scipy, cyvcf2, toml, pandas
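
The two `tmb_algorithm` settings listed above differ only in which variants are counted. The following is a minimal sketch, assuming TMB is expressed as counted mutations per megabase of target region. The function name `estimate_tmb` and the two consequence sets are hypothetical and chosen for illustration; they are not PCGR's actual definitions.

```python
# Illustrative consequence sets; assumptions for this sketch, not PCGR's definitions.
NONSYNONYMOUS = frozenset({
    "missense_variant", "stop_gained", "stop_lost", "start_lost",
    "frameshift_variant", "inframe_insertion", "inframe_deletion",
})
CODING = NONSYNONYMOUS | {"synonymous_variant"}


def estimate_tmb(consequences, target_size_mb, tmb_algorithm):
    """Return counted mutations per megabase for the chosen counting scheme."""
    if tmb_algorithm == "all_coding":
        counted = CODING
    elif tmb_algorithm == "nonsyn":
        counted = NONSYNONYMOUS
    else:
        raise ValueError(f"unknown tmb_algorithm: {tmb_algorithm!r}")
    if target_size_mb <= 0:
        raise ValueError("target_size_mb must be positive")
    n_counted = sum(1 for c in consequences if c in counted)
    return n_counted / target_size_mb


# Made-up calls: two non-synonymous, one synonymous, one non-coding.
calls = ["missense_variant", "synonymous_variant", "stop_gained", "intron_variant"]
tmb_all = estimate_tmb(calls, 2.0, "all_coding")  # counts 3 of the 4 calls
tmb_nonsyn = estimate_tmb(calls, 2.0, "nonsyn")   # counts 2 of the 4 calls
```

With the same callset, `nonsyn` can never give a higher estimate than `all_coding`, since the non-synonymous set is a subset of the coding set.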
- Data updates: ClinVar, GWAS catalog, GENCODE, CIViC, CancerMine, UniProt KB, dbNSFP, Pfam, KEGG, Open Targets Platform
- Software updates: VEP 101
- An extra comma was mistakenly present in the template for tier 2 variants, issue #96
- Missing protein domain annotations for grch38, issue #116
- All arguments to `pcgr.py` are now non-positional
- Arguments to `pcgr.py` are divided into two groups: required and optional
- Options allelic_support:tumor_dp_min, allelic_support:tumor_af_min, allelic_support:control_dp_min, allelic_support:control_af_max in PCGR configuration file are now optional arguments `--tumor_dp_min`, `--tumor_af_min`, `--control_dp_min`, `--control_af_max` in `cpsr.py`
- Option mutational_burden:mutational_burden in PCGR configuration file is now optional argument `--estimate_tmb` in `pcgr.py`
- Option msi:msi in PCGR configuration file is now optional argument `--estimate_msi_status` in `pcgr.py`
- Option mutational_signatures:mutational_signatures in PCGR configuration file is now optional argument `--estimate_signatures` in `pcgr.py`
- Options mutational_signatures:mutsignatures_signature_limit, mutational_signatures:mutsignatures_normalization, mutational_signatures:mutsignatures_mutation_limit, mutational_signatures:mutsignatures_cutoff are removed (used for deconstructSigs analysis, which is no longer in use)
- Optional argument `--cna_overlap_pct` in `pcgr.py` replaces cna:cna_overlap_pct in PCGR configuration file
- Optional argument `--logr_gain` in `pcgr.py` replaces cna:logr_gain in PCGR configuration file
- Optional argument `--logr_homdel` in `pcgr.py` replaces cna:logr_homdel in PCGR configuration file
- Removed mutational_burden:tmb_low_limit and mutational_burden:tmb_intermediate_limit - TMB is no longer interpreted in the context of thresholds
- Classifications of genes as tumor suppressors/oncogenes are now based on a combination of CancerMine citation count and presence in Network of Cancer Genes
- Settings section of report is now divided into three:
- Metadata - sample and sequencing assay
- Report configuration
- Optional argument `--include_trials` in `pcgr.py` - includes a section with annotated clinical trials for the tumor type in question
- Optional argument `--assay` in `pcgr.py` - designates type of sequencing assay
- Optional argument `--cell_line` in `pcgr.py` - designates runs of tumor cell lines (only for display, not used to configure any analysis)
- Optional argument `--min_mutations_signatures` in `pcgr.py` - minimum number of required mutations for mutational signature analysis with MutationalPatterns
- Optional argument `--all_reference_signatures` in `pcgr.py` - considers all reference signatures during fitting of mutational profile to known signatures
- Optional argument `--estimate_signatures` now also includes detection of potential kataegis events (WGS/WES assays only), and rainfall plot in the flexdashboard output
- The user can now distinguish (through color codes) whether a biomarker has been mapped exactly (nucleotide change) or at a regional level (codon/exon)
- All variant-associated biomarkers (regardless of assignment to TIER 1/2) are now found in a new section (SNVs/InDels)
- For copy number amplifications, other putative drug targets in cancer are listed in a new section
- Detailed documentation of report contents has been added to the Documentation section
- References are updated, and all are provided with a DOI
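
The move to non-positional arguments split into required and optional groups can be sketched with Python's `argparse`. This is an illustration only, not the actual `pcgr.py` parser: the flag names `--tumor_dp_min`, `--tumor_af_min`, `--estimate_tmb`, `--estimate_msi_status`, `--estimate_signatures`, and `--logr_gain` are taken from the entries above, while `--input_vcf`, `--sample_id`, the argument types, the defaults, and the use of boolean switches are assumptions.

```python
import argparse


def build_parser():
    """Build a parser with separate 'required' and 'optional' argument groups."""
    parser = argparse.ArgumentParser(prog="pcgr.py")
    required = parser.add_argument_group("required arguments")
    optional = parser.add_argument_group("optional arguments")

    # Hypothetical required flags; every argument is named, none is positional.
    required.add_argument("--input_vcf", required=True, help="input VCF (assumed name)")
    required.add_argument("--sample_id", required=True, help="sample identifier (assumed name)")

    # Flags that replaced configuration-file options (types/defaults are assumptions).
    optional.add_argument("--tumor_dp_min", type=int, default=0)
    optional.add_argument("--tumor_af_min", type=float, default=0.0)
    optional.add_argument("--estimate_tmb", action="store_true")
    optional.add_argument("--estimate_msi_status", action="store_true")
    optional.add_argument("--estimate_signatures", action="store_true")
    optional.add_argument("--logr_gain", type=float, default=None)
    return parser


args = build_parser().parse_args(
    ["--input_vcf", "tumor.vcf.gz", "--sample_id", "S1",
     "--estimate_tmb", "--tumor_dp_min", "10"]
)
```

Grouping affects only the layout of the `--help` output; whether an argument is mandatory is still controlled by `required=True` on the individual argument.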
- Data updates: ClinVar, CIViC, CancerMine, UniProt KB
- Software updates: VEP 98.3
- Data updates: ClinVar, GWAS catalog, GENCODE, CIViC, CancerMine
- Software updates: VEP 98.2, vcf2tsv
- Further improved mapping between Ensembl transcripts and UniProt accessions (using also RefSeq accessions where available)
- Possibility to filter evidence items by RATING in interactive data tables
- Option target_size_mb in pcgr.py replaces target_size_mb in configuration file, more convenient in terms of configuring runs
- Option tumor_type in pcgr.py replaces tumor_type in configuration file
- Data updates: ClinVar, GWAS catalog, GENCODE, DiseaseOntology, CIViC, CancerMine, UniProt KB
- Software updates: VEP 97.3, vcfanno 0.3.2, LOFTEE (VEP plugin) 1.0.3
- Bug in concatenation of clinical evidence items from different sources (CIViC + CBMDB) (issues #83, #87)
- Silent variants that coincide with biomarkers reported at codon level are ignored
- Distinction between clinical evidence items of different origins (somatic + germline)
- Improved mapping between Ensembl transcripts and UniProt accessions (using also RefSeq accessions where available)
- Bug in UpSetPlot for cases where filtering produces fewer than two intersecting sets
- New field ‘mane’ as criterion for pick order in configuration file (VEP)
- Sample identifier to copy number annotation output (convenient for concatenation of output from multiple samples)
- Capturing allelic depth (t_depth, t_ref_count etc.) in vcf2maf output (enhancement #52)
- Option tumor_only in pcgr.py, replaces vcf_tumor_only in configuration file, more convenient in terms of configuration
- Cancer_NOS.toml as configuration file for unspecified tumor types
- Bug in value box for Tier 2 variants (new line carriage) Issue #73
- Upgraded VEP to v96
- Skipping the `--regulatory` VEP option to avoid forking issues and to improve speed (See this issue)
- Added option to configure pick-order for choice of primary transcript in configuration file
- Pre-made configuration files for each tumor type in conf folder
- Possibility to append a CNA plot file (.png format) to the section of the report with Somatic CNAs (previous feature request)
- Added possibility to input estimates of tumor purity and ploidy
  - shown as value boxes in Main results
- Tumor mutational burden is now compared with the distribution of TMB observed for TCGA’s cohorts (organized by primary site)
- Default target size is now 34Mb (approx. estimate from exome-wide calculation of protein-coding parts of GENCODE)
- Added flexibility for variant filtering in tumor-only input callsets
- Added additional options to exclude likely germline variants (both require the tumor VAF tag to be correctly specified in the input VCF)
  - exclude_likely_hom_germline - removes any variant with an allelic fraction of 1 (100%) - very unlikely somatic event
  - exclude_likely_het_germline - removes any variant with
    - an allelic fraction between 0.4 and 0.6, and
    - presence in dbSNP + gnomAD, and
    - no presence as somatic event in COSMIC/TCGA
- Added possibility to input PANEL-OF-NORMALS VCF - this is to support the many labs that have sequenced a database/pool of healthy controls. This set of variants is utilized in PCGR to improve the variant filtering when running in tumor-only mode. The PANEL-OF-NORMALS annotation works as follows:
  - All variants in the tumor that coincide with any variant listed in the PANEL-OF-NORMALS VCF are appended with a PANEL_OF_NORMALS flag in the query VCF with tumor variants
  - If configuration parameter exclude_pon is set to True in tumor_only runs, all variants with a PANEL_OF_NORMALS flag are filtered/excluded
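
The tumor-only exclusion rules described above can be combined into a single predicate. The sketch below is illustrative and not PCGR's implementation: the option names `exclude_likely_hom_germline`, `exclude_likely_het_germline`, and `exclude_pon` come from the entries above, whereas the variant dictionary keys (`af`, `in_dbsnp`, `in_gnomad`, `somatic_in_cosmic_tcga`, `panel_of_normals`) are hypothetical, and treating the 0.4 to 0.6 range as inclusive is an assumption.

```python
def keep_variant(variant,
                 exclude_likely_hom_germline=False,
                 exclude_likely_het_germline=False,
                 exclude_pon=False):
    """Return True if the variant survives the enabled tumor-only filters."""
    af = variant.get("af")  # tumor allelic fraction; None if the VAF tag is missing

    # Allelic fraction of 1 (100%): very unlikely to be a somatic event.
    if exclude_likely_hom_germline and af is not None and af == 1.0:
        return False

    # Likely heterozygous germline: all three conditions must hold.
    if (exclude_likely_het_germline
            and af is not None
            and 0.4 <= af <= 0.6  # inclusive bounds are an assumption
            and variant.get("in_dbsnp", False)
            and variant.get("in_gnomad", False)
            and not variant.get("somatic_in_cosmic_tcga", False)):
        return False

    # Variant carries the PANEL_OF_NORMALS flag.
    if exclude_pon and variant.get("panel_of_normals", False):
        return False

    return True


# Made-up example variants.
het_germline = {"af": 0.5, "in_dbsnp": True, "in_gnomad": True}
known_somatic = {"af": 0.5, "in_dbsnp": True, "in_gnomad": True,
                 "somatic_in_cosmic_tcga": True}
pon_hit = {"af": 0.2, "panel_of_normals": True}
```

A variant without an allelic fraction is never removed by the two VAF-based filters in this sketch, which mirrors the note that both options depend on a correctly specified tumor VAF tag.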
- For tumor-only runs, added an UpSet plot showing how different filtering sources (gnomAD, 1KG Project, panel-of-normals etc) contribute in the germline filtering procedure
- Variants in Tier 3 / Tier 4 / Noncoding are now sorted (and color-coded) according to the target (gene) association score to the cancer phenotype, as provided by the OpenTargets Platform
- Added annotation of TCGA’s ten oncogenic signaling pathways
- Added EXONIC_STATUS annotation tag (VCF and TSV)
- exonic denotes all protein-altering AND canonical splicesite altering AND synonymous variants, nonexonic denotes the complement
- Added CODING_STATUS annotation tag (VCF and TSV)
- coding denotes all protein-altering AND canonical splicesite altering, noncoding denotes the complement
- Added SYMBOL_ENTREZ annotation tag (VCF)
- Official gene symbol from NCBI Entrez (SYMBOL provided by VEP can sometimes be non-official/alias (i.e. for GENCODE v19/grch37))
- Added SIMPLEREPEATS_HIT annotation tag (VCF and TSV)
- Variant overlaps UCSC simpleRepeat sequence repeat track - used for MSI prediction
- Added WINMASKER_HIT annotation tag (VCF and TSV)
- Variant overlaps UCSC windowmaskerSdust sequence repeat track - used for MSI prediction
- Added PUTATIVE_DRIVER_MUTATION annotation tag (VCF and TSV)
- Putative cancer driver mutation discovered by multiple approaches from 9,423 tumor exomes in TCGA. Format: symbol:hgvsp:ensembl_transcript_id:discovery_approaches
- Added OPENTARGETS_DISEASE_ASSOCS annotation tag (VCF and TSV)
- Associations between protein targets and disease based on multiple lines of evidence (mutations, affected pathways, GWAS, literature etc). Format: CUI:EFO_ID:IS_DIRECT:OVERALL_SCORE
- Added OPENTARGETS_TRACTABILITY_COMPOUND annotation tag (VCF and TSV)
- Confidence for the existence of a modulator (small molecule) that interacts with the target (protein) to elicit a desired biological effect
- Added OPENTARGETS_TRACTABILITY_ANTIBODY annotation tag (VCF and TSV)
- Confidence for the existence of a modulator (antibody) that interacts with the target (protein) to elicit a desired biological effect
- Added CLINVAR_REVIEW_STATUS_STARS annotation tag
- Rating of the ClinVar variant (0-4 stars) with respect to level of review
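
The colon-delimited format given above for OPENTARGETS_DISEASE_ASSOCS (CUI:EFO_ID:IS_DIRECT:OVERALL_SCORE) can be unpacked as sketched below. The names `DiseaseAssociation` and `parse_disease_assoc` are hypothetical. The sketch handles a single association only, since the changelog does not state how multiple associations are delimited, and it keeps IS_DIRECT as a raw string because its encoding is not specified. The example value is a made-up placeholder.

```python
from typing import NamedTuple


class DiseaseAssociation(NamedTuple):
    cui: str              # CUI field
    efo_id: str           # EFO_ID field
    is_direct: str        # IS_DIRECT field, kept as the raw string
    overall_score: float  # OVERALL_SCORE field


def parse_disease_assoc(entry):
    """Split one CUI:EFO_ID:IS_DIRECT:OVERALL_SCORE entry into named fields."""
    fields = entry.split(":")
    if len(fields) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got {len(fields)}")
    cui, efo_id, is_direct, score = fields
    return DiseaseAssociation(cui, efo_id, is_direct, float(score))


# Placeholder identifiers and score, for illustration only.
assoc = parse_disease_assoc("C0000000:EFO_0000000:True:0.5")
```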
- Moved from IntoGen’s driver mutation resource to TCGA’s putative driver mutation list in display of driver mutation status
- Moved option for vcf_validation from configuration file to run script (`--no_vcf_validate`)
- Original tier model ‘pcgr’
- Bug in assignment of variants to tier1/tier2 Issue #61
- Missing config option for maf_gnomad_asj in TOML file (also setting operator to `<=`) Issue #60
- Bug in new CancerMine oncogene/tumor suppressor annotation Issue #53
- vcfanno fix for empty Description (upgrade to vcfanno v0.3.1 Issue #49)
- Bug in message showing too few variants for MSI prediction, Issue #55
- Bug in appending of custom VCF tags
- Still unsolved: how to disambiguate identical FORMAT and INFO tags in vcf2tsv
- Bug in SCNA value box display for multiple copy number hits (Issue #47)
- Bug in vcf2tsv (handling INFO tags encoded with ‘Type = String’, Issue #39)
- Bug in search of UniProt functional features (BED feature regions spanning exons are now handled)
- Stripped off HTML elements (TCGA_FREQUENCY, DBSNP) in TSV output
- Some effect predictions from dbNSFP were not properly parsed (e.g. multiple prediction entries from multiple transcript isoforms), these should now be retrieved correctly
- Removed ‘COSM’ prefix in COSMIC mutation links
- Bug in retrieval of splice site predictions from dbscSNV
- Possibility to run PCGR in a non-Docker environment (e.g. using the `--no-docker` option). Thanks to an excellent contribution by Vlad Saveliev, Issue #35
- Added possibility to add docker user-id
- Possibility for MAF file output (converted with vcf2maf), must be configured by the user in the TOML file (i.e. vcf2maf = true, Issue #17)
- Possibility for adding custom VCF INFO tags to PCGR output files (JSON/TSV), must be configured by the user in the TOML file (i.e. custom_tags)
- Added MUTATION_HOTSPOT_CANCERTYPE in data tables (i.e. listing tumor types in which hotspot mutations have been found)
- Included the ‘rs’ prefix for dbSNP identifiers (HTML and TSV output)
- Individual entries/columns for variant effect predictions:
- Individual algorithms: SIFT_DBNSFP, M_CAP_DBNSFP, MUTPRED_DBNSFP, MUTATIONTASTER_DBNSFP, MUTATIONASSESSOR_DBNSFP, FATHMM_DBNSFP, FATHMM_MKL_DBNSFP, PROVEAN_DBNSFP
- Ensemble predictions (META_LR_DBNSFP), dbscSNV splice site predictions (SPLICE_SITE_RF_DBNSFP, SPLICE_SITE_ADA_DBNSFP)
- Upgraded samtools to v1.9 (makes vcf2maf work properly)
- Added Ensembl gene/transcript id and corresponding RefSeq mRNA id to TSV/JSON
- Added for future implementation:
- SeqKat + karyoploteR for exploration of kataegis/hypermutation
- CELLector - genomics-guided selection of cancer cell lines
- Upgraded VEP to v94
- Changed CANCER_MUTATION_HOTSPOT to MUTATION_HOTSPOT
- Moved from TSGene 2.0 to CancerMine for annotation of tumor suppressor genes and proto-oncogenes
- A minimum of n=3 citations were required to include literature-mined tumor suppressor genes and proto-oncogenes from CancerMine
- Bug in copy number annotation (broad/focal)
- Bug in copy number segment display (missing variable initialization, Issue #34)
- Typo in gnomAD filter statistic (fraction, Issue #31)
- Bug in mutational signature analysis for grch38 (forgot to pass BSgenome object, Issue #27)
- Missing proper ASCII-encoding in vcf2tsv conversion, Issue #
- Removed ‘Noncoding mutations’ section when no input VCF is present
- Bug in annotation of copy number event type (focal/broad)
- Bug in copy number annotation (missing protein-coding transcripts)
- Updated MSI prediction (variable importance, performance measures)
- Genome assembly is appended to every output file
- Issue warning for copy number segment that goes beyond chromosomal lengths of specified assembly (segments will be skipped)
- Added missing subtypes for ‘Skin_Cancer_NOS’ in the cancer phenotype dataset
- Bug in tier assignment ‘pcgr_acmg’ (case for no variants in tier1,2,3)
- Bug in tier assignment ‘pcgr_acmg’ (no tumor type specified, evidence items with weak support detected)
- Bug: duplicated variants in ‘Tier 3’ resulting from genes encoded with dual roles as tumor suppressor genes/oncogenes
- Bug: duplicated variants in ‘Tier 1/Noncoding variants’ resulting from rare cases of noncoding variants occurring in Tier 1 (synonymous variants with biomarker role)
- New argument in pcgr.py
- assembly (grch37/grch38)
- New option in pcgr.py
- `--basic` - run comprehensive VCF annotation only, skip report generation and additional analyses
- New sections in HTML report
- Settings and annotation sources - now also listing key PCGR configuration settings
- Main findings - Six value boxes indicating the main findings of clinical relevance
- New configuration options
- [tier_model] (string) - choice between pcgr_acmg and pcgr
- [mutational_burden] - set TMB tertile limits
- tmb_low_limit (float)
- tmb_intermediate_limit (float)
- [tumor_type] - choose between 34 tumor types/classes:
- Adrenal_Gland_Cancer_NOS (logical)
- Ampullary_Carcinoma_NOS (logical)
- Biliary_Tract_Cancer_NOS (logical)
- Bladder_Urinary_Tract_Cancer_NOS (logical)
- Blood_Cancer_NOS (logical)
- Bone_Cancer_NOS (logical)
- Breast_Cancer_NOS (logical)
- CNS_Brain_Cancer_NOS (logical)
- Colorectal_Cancer_NOS (logical)
- Cervical_Cancer_NOS (logical)
- Esophageal_Stomach_Cancer_NOS (logical)
- Head_And_Neck_Cancer_NOS (logical)
- Hereditary_Cancer_NOS (logical)
- Kidney_Cancer_NOS (logical)
- Leukemia_NOS (logical)
- Liver_Cancer_NOS (logical)
- Lung_Cancer_NOS (logical)
- Lymphoma_Hodgkin_NOS (logical)
- Lymphoma_Non_Hodgkin_NOS (logical)
- Ovarian_Fallopian_Tube_Cancer_NOS (logical)
- Pancreatic_Cancer_NOS (logical)
- Penile_Cancer_NOS (logical)
- Peripheral_Nervous_System_Cancer_NOS (logical)
- Peritoneal_Cancer_NOS (logical)
- Pleural_Cancer_NOS (logical)
- Prostate_Cancer_NOS (logical)
- Skin_Cancer_NOS (logical)
- Soft_Tissue_Cancer_NOS (logical)
- Stomach_Cancer_NOS (logical)
- Testicular_Cancer_NOS (logical)
- Thymic_Cancer_NOS (logical)
- Thyroid_Cancer_NOS (logical)
- Uterine_Cancer_NOS (logical)
- Vulvar_Vaginal_Cancer_NOS (logical)
- [mutational_signatures]
- mutsignatures_cutoff (float) - discard any signature contributions with a weight less than the cutoff
- [cna]
- transcript_cna_overlap (float) - minimum percent overlap between copy number segment and transcripts (average) for tumor suppressor gene/proto-oncogene to be reported
- [allelic_support]
- If input VCF has correctly formatted depth/allelic fraction as INFO tags, users can add thresholds on depth/support that are applied prior to report generation
- tumor_dp_min (integer) - minimum sequencing depth for variant in tumor sample
- tumor_af_min (float) - minimum allelic fraction for variant in tumor sample
- normal_dp_min (integer) - minimum sequencing depth for variant in normal sample
- normal_af_max (float) - maximum allelic fraction for variant in normal sample
- [visual]
- report_theme (string) - visual theme of report (Bootstrap)
- [other]
- vcf_validation (logical) - keep/skip VCF validation by vcf-validator
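
The [allelic_support] thresholds listed above act as a pre-report filter on depth and allelic fraction. A minimal sketch follows, assuming the thresholds are available as a dictionary keyed by the option names `tumor_dp_min`, `tumor_af_min`, `normal_dp_min`, and `normal_af_max`. The function name `passes_allelic_support` and the variant keys (`tumor_dp`, `tumor_af`, `normal_dp`, `normal_af`) are hypothetical, and letting missing values pass is an assumption rather than documented behavior.

```python
def passes_allelic_support(variant, thresholds):
    """Return True if the variant meets every configured depth/support threshold."""
    checks = [
        # (variant field, threshold option, comparison that must hold)
        ("tumor_dp", "tumor_dp_min", lambda value, limit: value >= limit),
        ("tumor_af", "tumor_af_min", lambda value, limit: value >= limit),
        ("normal_dp", "normal_dp_min", lambda value, limit: value >= limit),
        ("normal_af", "normal_af_max", lambda value, limit: value <= limit),
    ]
    for field, option, holds in checks:
        value = variant.get(field)
        limit = thresholds.get(option)
        if value is None or limit is None:
            continue  # assumption: a missing value or unset threshold never filters
        if not holds(value, limit):
            return False
    return True


# Made-up threshold values and variants.
thresholds = {"tumor_dp_min": 10, "tumor_af_min": 0.05,
              "normal_dp_min": 8, "normal_af_max": 0.02}
well_supported = {"tumor_dp": 40, "tumor_af": 0.21, "normal_dp": 35, "normal_af": 0.0}
low_depth = {"tumor_dp": 6, "tumor_af": 0.30, "normal_dp": 35, "normal_af": 0.0}
in_normal = {"tumor_dp": 40, "tumor_af": 0.21, "normal_dp": 35, "normal_af": 0.10}
```

Note that three of the thresholds are lower bounds while `normal_af_max` is an upper bound, which is why the comparison is stored alongside each option.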
- New output file - JSON output of HTML report content
- New INFO tags of PCGR-annotated VCF
- CANCER_PREDISPOSITION
- PFAM_DOMAIN
- TCGA_FREQUENCY
- TCGA_PANCANCER_COUNT
- ICGC_PCAWG_OCCURRENCE
- ICGC_PCAWG_AFFECTED_DONORS
- CLINVAR_MEDGEN_CUI
- New column entries in annotated SNV/InDel TSV file:
- CANCER_PREDISPOSITION
- ICGC_PCAWG_OCCURRENCE
- TCGA_FREQUENCY
- New column in CNA output
- TRANSCRIPTS - aberration-overlapping transcripts (Ensembl transcript IDs)
- MEAN_TRANSCRIPT_CNA_OVERLAP - Mean overlap (%) between gene transcripts and aberration segment
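
One way to read MEAN_TRANSCRIPT_CNA_OVERLAP is as the average, over a gene's transcripts, of the percentage of each transcript covered by the copy number segment. The sketch below follows that reading; it is an assumption, not PCGR's documented calculation. Coordinates are taken to be 1-based and inclusive, and the function names are hypothetical.

```python
def overlap_pct(tx_start, tx_end, seg_start, seg_end):
    """Percent of a transcript (1-based, inclusive) covered by a segment."""
    tx_len = tx_end - tx_start + 1
    overlap = min(tx_end, seg_end) - max(tx_start, seg_start) + 1
    return 100.0 * max(overlap, 0) / tx_len


def mean_transcript_overlap(transcripts, seg_start, seg_end):
    """Average the per-transcript overlap percentages; 0.0 if there are none."""
    pcts = [overlap_pct(start, end, seg_start, seg_end) for start, end in transcripts]
    return sum(pcts) / len(pcts) if pcts else 0.0


# Two made-up transcripts of one gene and a segment spanning positions 151-300:
# the first is half covered, the second fully covered.
transcripts = [(101, 200), (151, 250)]
mean_overlap = mean_transcript_overlap(transcripts, 151, 300)
```

Under this reading, a gene would be reported when the mean value reaches the configured minimum overlap (transcript_cna_overlap in the configuration options above).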
- Elements of databundle (now annotated directly through VEP):
- dbsnp
- gnomad/exac
- 1000G project
- INFO tags of PCGR-annotated VCF
- DBSNPBUILDID
- DBSNP_VALIDATION
- DBSNP_SUBMISSIONS
- DBSNP_MAPPINGSTATUS
- GWAS_CATALOG_PMID
- GWAS_CATALOG_TRAIT_URI
- DOCM_DISEASE
- Output files
- TSV files with mutational signature results and biomarkers (i.e. sample_id.pcgr.snvs_indels.biomarkers.tsv and sample_id.pcgr.mutational_signatures.tsv)
- Data can still be retrieved - now from the JSON dump
- MAF file
- The previous MAF output was generated in a custom fashion, a more accurate MAF output based on https://github.com/mskcc/vcf2maf will be incorporated in the next release
- HTML report sections
- Tier statistics and Variant statistics are now grouped into the section Tier and variant statistics
- Tier 5 is now Noncoding mutations (i.e. not considered a tier per se)
- Sliders for allelic fraction in the Global variant browser are now fixed from 0 to 1 (0.05 intervals)