
grenaud/VCFShark

 
 


VCFShark: how to squeeze a VCF file

VCFShark is a tool to compress any VCF file. It achieves compression ratios up to an order of magnitude better than the de facto standards (gzipped VCF and BCF).

As an input it takes a VCF (or VCF.GZ or BCF) file.

Requirements

VCFShark requires:

  • A modern, C++14-ready compiler, such as g++ version 7.2 or higher, or clang version 3.4 or higher.
  • A 64-bit operating system. Mac OS X and Linux are currently supported.

Installation

To download, build and install VCFShark use the following commands.

git clone https://github.com/refresh-bio/VCFShark.git
cd VCFShark
./install.sh 
make
make install

The install.sh script downloads the HTSlib library and installs it into the include and lib directories under the VCFShark/htslib directory. The compiled libbsc library is statically linked.

By default, VCFShark is installed in the bin directory under /usr/local/. A different location prefix can be specified with the prefix parameter:

make prefix=/usr/local install
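
As an illustration, an installation without root privileges might look like the sketch below. The $HOME/.local prefix is a hypothetical choice, not something the project prescribes. The only assumption taken from the text above is that the binary ends up in the bin directory under the chosen prefix.

```shell
# Hypothetical per-user prefix; any writable directory should work.
prefix="$HOME/.local"

# Run from the VCFShark source directory after ./install.sh and make:
#   make prefix="$prefix" install

# The binary is expected in the bin directory under the prefix,
# so add that directory to PATH if it is not already there.
case ":$PATH:" in
  *":$prefix/bin:"*) ;;
  *) PATH="$prefix/bin:$PATH" ;;
esac
export PATH
```

To remove such an installation later, the same prefix value has to be passed to make uninstall.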

To uninstall VCFShark:

make uninstall

This uninstalls VCFShark from the /usr/local directory. To uninstall from a different location, use the prefix parameter:

make prefix=/usr/local uninstall

To uninstall the HTSlib library use the provided uninstall script:

./uninstall.sh 

To clean the VCFShark build use:

make clean

Usage

  • Compress the input VCF/VCF.GZ/BCF file.
Input: <input_vcf> VCF (or VCF.GZ or BCF) file.
Output: <archive> output file with the compressed VCF.

Usage: 
vcfshark compress [options] <input_vcf> <archive>
Parameters:
  input_vcf - path to input VCF (or VCF.GZ or BCF) file
  archive   - path to the output compressed VCF
Options:
  -nl <value> - ignore rare variants; value is a limit of alternative alleles (default: 10)
  -t <value>  - max. no. of compressing threads (default: 8)

  • Decompress the archive.
Input: <archive> archive.
Output: <output_vcf> VCF/BCF file.

Usage: 
vcfshark decompress [options] <archive> <output_vcf>
Parameters:
 archive   - path to compressed VCF
 output_vcf - path to output VCF/BCF file
Options:
 -b          - output BCF file (VCF file by default)
 -c [0-9]    - set level of compression of the output BCF (number from 0 to 9; default: 1; 0 means no compression)
 -t <value>  - max. no. of compressing threads (default: 8)
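
The -t option can be matched to the machine instead of relying on the default of 8. The sketch below is one possible way to do that on the two supported platforms. The detect_threads helper and the file names input.vcf, input.vcfshark and output.bcf are hypothetical. The commands are only printed, so they can be checked before being run.

```shell
# Detect the number of logical CPUs on the two supported platforms.
detect_threads() {
  if command -v nproc >/dev/null 2>&1; then
    nproc                        # Linux (GNU coreutils)
  elif command -v sysctl >/dev/null 2>&1; then
    sysctl -n hw.ncpu            # Mac OS X
  else
    echo 8                       # fall back to the documented default
  fi
}

threads="$(detect_threads)"

# Print the commands rather than running them, so they can be inspected first.
# input.vcf, input.vcfshark and output.bcf are placeholder file names.
echo "vcfshark compress -t $threads input.vcf input.vcfshark"
echo "vcfshark decompress -b -c 9 -t $threads input.vcfshark output.bcf"
```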

Toy example

There is an example VCF file in the toy_ex folder: toy.vcf. It can be used to test VCFShark. The commands below should be run from within the toy_ex folder.

To compress the example VCF file and store the archive as toy.vcfshark in the toy_ex folder:

../vcfshark compress toy.vcf toy.vcfshark

This will create an archive toy.vcfshark.

To decompress the toy.vcfshark archive to a VCF file toy_decomp.vcf:

../vcfshark decompress toy.vcfshark toy_decomp.vcf
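
To see how much space the archive saves, the sizes of the input and the archive can be compared. The following is a minimal sketch and not part of VCFShark. The file_size and compression_ratio helpers are hypothetical names. It prints the ratio only if both toy files are present in the current directory.

```shell
# Size of a file in bytes (wc -c is POSIX; tr strips the padding some
# implementations add around the number).
file_size() {
  wc -c < "$1" | tr -d ' '
}

# Ratio of original size to archive size, to two decimal places.
compression_ratio() {
  awk -v a="$(file_size "$1")" -v b="$(file_size "$2")" \
    'BEGIN { if (b == 0) exit 1; printf "%.2f\n", a / b }'
}

# Run inside toy_ex after the compress step above.
if [ -f toy.vcf ] && [ -f toy.vcfshark ]; then
  compression_ratio toy.vcf toy.vcfshark
fi
```

The same function can be pointed at a gzipped VCF or a BCF of the same data for comparison.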

For more options, see the Usage section.

Dockerfile

The provided Dockerfiles can be used to build a Docker image with all necessary dependencies and the VCFShark compressor.

Two Dockerfiles are available: one image is based on Ubuntu 18.04 (Dockerfile_ubuntu), the other on CentOS 7 (Dockerfile_centos).

To build a Docker image and run a Docker container, you need Docker Desktop (https://www.docker.com).

Example commands (run them from the directory containing the Dockerfiles):

docker build -f Dockerfile_ubuntu -t ubuntu-vcfshark .
docker run -it ubuntu-vcfshark

or:

docker build -f Dockerfile_centos -t centos-vcfshark .
docker run -it centos-vcfshark

Note: The Docker image is not intended as a way of using VCFShark. It can be used to test the installation process and basic operation of the VCFShark tool.

Developers

The VCFShark algorithm was invented by Sebastian Deorowicz and Agnieszka Danek. The implementation is by Sebastian Deorowicz and Agnieszka Danek.
