.. image:: https://img.shields.io/pypi/v/vcfstats?style=flat-square
   :target: https://img.shields.io/pypi/v/vcfstats?style=flat-square
   :alt: Pypi

Several tools can plot statistics of VCF files, including `bcftools <https://samtools.github.io/bcftools/bcftools.html#stats>`_ and `jvarkit <http://lindenb.github.io/jvarkit/VcfStatsJfx.html>`_. However, none of them can:

- plot specific metrics
- customize the plots
- focus on variants with certain filters

The R package `vcfR <https://knausb.github.io/vcfR_documentation/visualization_1.html>`_ can do some of the above. However, it has to load the entire VCF into memory, which does not scale well to large VCF files.
``vcfstats`` requires `R <https://www.r-project.org/>`_ with `ggplot2 <https://ggplot2.tidyverse.org/>`_ to be installed. If you want to plot pie charts, `ggrepel <https://cran.r-project.org/web/packages/ggrepel/vignettes/ggrepel.html>`_ is also required.
pip install vcfstats
vcfstats --vcf examples/sample.vcf \
--outdir examples/ \
--formula 'COUNT(1) ~ CONTIG' \
--title 'Number of variants on each chromosome' \
--config examples/config.toml
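
As a rough picture of what the formula ``COUNT(1) ~ CONTIG`` tallies, the sketch below counts records per contig while reading VCF text one line at a time. It is a plain-Python illustration, not vcfstats' implementation; the function name ``count_by_contig`` and the inline sample records are made up for the example.

```python
import io
from collections import Counter

# Made-up records for illustration: CHROM POS ID REF ALT QUAL FILTER INFO
SAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\t.\tA\tT\t50\tPASS\t.\n"
    "1\t200\t.\tG\tC\t40\tq10\t.\n"
    "2\t150\t.\tC\tCT\t60\tPASS\t.\n"
)


def count_by_contig(handle):
    """Return a Counter mapping contig name -> number of records."""
    counts = Counter()
    for line in handle:
        if line.startswith("#"):  # skip meta and header lines
            continue
        chrom = line.split("\t", 1)[0]  # first column is CHROM
        counts[chrom] += 1
    return counts


counts = count_by_contig(io.StringIO(SAMPLE_VCF))
print(dict(counts))
```

Because only the running counter is kept, memory use does not grow with the number of records.
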
vcfstats --vcf examples/sample.vcf \
--outdir examples/ \
--formula 'COUNT(1) ~ CONTIG' \
--title 'Number of variants on each chromosome (modified)' \
--config examples/config.toml \
--ggs 'scale_x_discrete(name="Chromosome", limits=c("1","2","3","4","5","6","7","8","9","10","X")) + ylab("# Variants")'
vcfstats --vcf examples/sample.vcf \
--outdir examples/ \
--formula 'COUNT(1) ~ CONTIG[1,2,3,4,5]' \
--title 'Number of variants on each chromosome (first 5)' \
--config examples/config.toml
# or
vcfstats --vcf examples/sample.vcf \
--outdir examples/ \
--formula 'COUNT(1) ~ CONTIG[1-5]' \
--title 'Number of variants on each chromosome (first 5)' \
--config examples/config.toml
# or
# (this variant requires the VCF file to be tabix-indexed)
vcfstats --vcf examples/sample.vcf \
--outdir examples/ \
--formula 'COUNT(1) ~ CONTIG' \
--title 'Number of variants on each chromosome (first 5)' \
--config examples/config.toml -r 1 2 3 4 5
vcfstats --vcf examples/sample.vcf \
--outdir examples/ \
--formula 'COUNT(1, VARTYPE[snp]) ~ SUBST[A>T,A>G,A>C,T>A,T>G,T>C,G>A,G>T,G>C,C>A,C>T,C>G]' \
--title 'Number of substitutions of SNPs' \
--config examples/config.toml
vcfstats --vcf examples/sample.vcf \
--outdir examples/ \
--formula 'COUNT(1, VARTYPE[snp]) ~ SUBST[A>T,A>G,A>C,T>A,T>G,T>C,G>A,G>T,G>C,C>A,C>T,C>G]' \
--title 'Number of substitutions of SNPs (passed)' \
--config examples/config.toml \
--passed
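
The two commands above differ only in ``--passed``. A minimal sketch of the underlying tally follows, assuming that a SNP is a record whose REF and ALT are single bases and that "passed" means the FILTER column is exactly ``PASS``. Both are simplifying assumptions made for this example, not vcfstats' definitions, and ``substitution_counts`` is a hypothetical helper.

```python
from collections import Counter

# Made-up (REF, ALT, FILTER) triples for illustration
RECORDS = [
    ("A", "T", "PASS"),
    ("A", "T", "q10"),
    ("G", "C", "PASS"),
    ("C", "CT", "PASS"),  # insertion, not a SNP
]


def substitution_counts(records, passed_only=False):
    """Count REF>ALT substitutions among single-base records."""
    counts = Counter()
    for ref, alt, flt in records:
        if len(ref) != 1 or len(alt) != 1:
            continue  # keep SNPs only (assumed definition)
        if passed_only and flt != "PASS":
            continue  # drop records that did not pass (assumed definition)
        counts[f"{ref}>{alt}"] += 1
    return counts


all_counts = substitution_counts(RECORDS)
passed_counts = substitution_counts(RECORDS, passed_only=True)
print(dict(all_counts), dict(passed_counts))
```
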
# using a dark theme
vcfstats --vcf examples/sample.vcf \
--outdir examples/ \
--formula 'AAF ~ CONTIG' \
--title 'Allele frequency on each chromosome' \
--config examples/config.toml --ggs 'theme_dark()'
vcfstats --vcf examples/sample.vcf \
--outdir examples/ \
--formula 'AAF ~ CONTIG' \
--title 'Allele frequency on each chromosome (boxplot)' \
--config examples/config.toml \
--figtype boxplot
You can also plot the distribution as a density plot or a histogram:
vcfstats --vcf examples/sample.vcf \
--outdir examples/ \
--formula 'AAF ~ CONTIG[1,2]' \
--title 'Allele frequency on chromosome 1,2' \
--config examples/config.toml \
--figtype density
vcfstats --vcf examples/sample.vcf \
--outdir examples/ \
--formula 'AAF ~ 1' \
--title 'Overall allele frequency distribution' \
--config examples/config.toml
vcfstats --vcf examples/sample.vcf \
--outdir examples/ \
--formula 'AAF[0.05, 0.95] ~ 1' \
--title 'Overall allele frequency distribution (0.05-0.95)' \
--config examples/config.toml
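
For intuition about ``AAF[0.05, 0.95]``, the sketch below computes an alternate allele frequency from genotype strings and keeps the values that fall inside the range. It assumes that AAF is the fraction of called alleles that are non-reference and that both bounds are inclusive; neither detail is stated above, so treat them as assumptions. The helper names are hypothetical.

```python
import re


def alt_allele_freq(genotypes):
    """Fraction of called alleles that are non-reference.

    `genotypes` is a list of GT strings such as "0/1" or "1|1";
    missing alleles (".") are ignored.
    """
    alleles = []
    for gt in genotypes:
        alleles.extend(a for a in re.split(r"[/|]", gt) if a != ".")
    if not alleles:
        return None  # no called alleles at this site
    return sum(a != "0" for a in alleles) / len(alleles)


def in_range(value, low=0.05, high=0.95):
    """Inclusive range check (inclusivity is an assumption)."""
    return value is not None and low <= value <= high


# Made-up genotypes: three variants, two samples each
variants = [["0/1", "0/0"], ["1/1", "1/1"], ["0/1", "./."]]
freqs = [alt_allele_freq(gts) for gts in variants]
kept = [f for f in freqs if in_range(f)]
print(freqs, kept)
```
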
# the formula can also be written simply as 'VARTYPE ~ CHROM'
vcfstats --vcf examples/sample.vcf \
--outdir examples/ \
--formula 'COUNT(1, group=VARTYPE) ~ CHROM' \
--title 'Types of variants on each chromosome' \
--config examples/config.toml
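
The ``group=VARTYPE`` argument in the command above appears to split each chromosome's count by variant type. A hedged sketch of that grouping follows; the two-way ``snp``/``indel`` classification is deliberately crude and assumed for the example, and the records are made up.

```python
from collections import Counter


def var_type(ref, alt):
    """Rough classification: 'snp' if both alleles are a single base."""
    return "snp" if len(ref) == 1 and len(alt) == 1 else "indel"


# Made-up (CHROM, REF, ALT) records
RECORDS = [
    ("1", "A", "T"),
    ("1", "G", "GA"),
    ("1", "C", "G"),
    ("2", "TA", "T"),
]

# One count per (chromosome, variant type) pair
grouped = Counter((chrom, var_type(ref, alt)) for chrom, ref, alt in RECORDS)
print(dict(grouped))
```
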
# the formula can also be written simply as 'VARTYPE ~ CHROM[1]'
vcfstats --vcf examples/sample.vcf \
--outdir examples/ \
--formula 'COUNT(1, group=VARTYPE) ~ CHROM[1]' \
--title 'Types of variants on chromosome 1' \
--config examples/config.toml \
--figtype pie
# the formula can also be written simply as 'VARTYPE ~ 1'
vcfstats --vcf examples/sample.vcf \
--outdir examples/ \
--formula 'COUNT(1, group=VARTYPE) ~ 1' \
--title 'Types of variants on whole genome' \
--config examples/config.toml
# the formula can also be written simply as 'GTTYPEs[HET,HOM_ALT]{0} ~ CHROM'
vcfstats --vcf examples/sample.vcf \
--outdir examples/ \
--formula 'COUNT(1, group=GTTYPEs[HET,HOM_ALT]{0}) ~ CHROM' \
--title 'Mutant genotypes on each chromosome (sample 1)' \
--config examples/config.toml
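
The expression ``GTTYPEs[HET,HOM_ALT]{0}`` can be read as: take genotype types, keep ``HET`` and ``HOM_ALT``, for the sample at index 0 (which the title calls sample 1). The sketch below shows one way such a classification could work on GT strings. The labels ``HOM_REF`` and ``UNKNOWN`` and the helper ``gt_type`` are assumptions for illustration, not taken from vcfstats.

```python
import re
from collections import Counter


def gt_type(gt):
    """Classify a GT string such as "0/1" or "1|1"."""
    alleles = re.split(r"[/|]", gt)
    if "." in alleles:
        return "UNKNOWN"   # at least one allele is missing
    if all(a == "0" for a in alleles):
        return "HOM_REF"   # every allele is the reference
    if len(set(alleles)) == 1:
        return "HOM_ALT"   # identical non-reference alleles
    return "HET"           # alleles differ


# Made-up (CHROM, [GT per sample]) records
RECORDS = [
    ("1", ["0/1", "0/0"]),
    ("1", ["1/1", "0/1"]),
    ("1", ["0/0", "1/1"]),
    ("2", ["0|1", "./."]),
]

sample = 0                      # zero-based sample index, as in {0}
wanted = {"HET", "HOM_ALT"}     # the types listed in [...]
counts = Counter()
for chrom, gts in RECORDS:
    kind = gt_type(gts[sample])
    if kind in wanted:
        counts[(chrom, kind)] += 1
print(dict(counts))
```
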
vcfstats --vcf examples/sample.vcf \
--outdir examples/ \
--formula 'MEAN(GQs{0}) ~ MEAN(DEPTHs{0}, group=CHROM)' \
--title 'GQ vs depth (sample 1)' \
--config examples/config.toml
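
In the command above, both sides of the formula are aggregated, and ``group=CHROM`` suggests one aggregated point per chromosome. A small sketch of that aggregation follows, using made-up GQ and depth values for sample 0. It assumes the left-hand side of ``~`` is the y-axis, as in R formulas.

```python
from collections import defaultdict
from statistics import mean

# Made-up (CHROM, GQ of sample 0, depth of sample 0) values
RECORDS = [("1", 30, 10), ("1", 50, 20), ("2", 40, 12), ("2", 20, 8)]

# Collect the per-variant values of each chromosome
by_chrom = defaultdict(lambda: {"gq": [], "dp": []})
for chrom, gq, dp in RECORDS:
    by_chrom[chrom]["gq"].append(gq)
    by_chrom[chrom]["dp"].append(dp)

# One (x, y) = (mean depth, mean GQ) point per chromosome
points = {c: (mean(v["dp"]), mean(v["gq"])) for c, v in by_chrom.items()}
print(points)
```
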
vcfstats --vcf examples/sample.vcf \
--outdir examples/ \
--formula 'DEPTHs{0} ~ DEPTHs{1}' \
--title 'Depths between sample 1 and 2' \
--config examples/config.toml