/ genozip Public
forked from divonlan/genozip

Compressor for genomic files (VCF/BCF, SAM/BAM, fastq, fasta, GVF, 23andMe), up to 5x better than gzip and faster too

License

Unknown, Unknown licenses found

Licenses found

Unknown
license.c
Unknown
license.h

knmkr/genozip


genozip


genozip is a compressor for VCF genomic files. It accepts .vcf, .vcf.gz and .vcf.bz2 files as input.

It achieves compression ratios 2x to 5x better than gzip by exploiting properties of genomic data, such as linkage disequilibrium. It is also a lot faster than gzip.

The compression is lossless: the decompressed VCF file is 100% identical to the original VCF file.

The command line options are similar to those of gzip and bcftools, so if you are familiar with these tools, genozip works in much the same way. To get started, try: genozip --help

Commands:
genozip - compress one or more files
genounzip - decompress one or more files
genols - show metadata of files or the entire directory
genocat - view one or more files

Some advanced options:

Lookups:
genocat -r ^Y,MT file1.vcf -- displays all chromosomes except Y and MT
genocat -r -10000 file1.vcf -- displays positions up to 10000
genocat -s SMPL1,SMPL2 file1.vcf -- displays 2 samples
Note: there is no need for a separate indexing step or index file
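
To make the lookup semantics above concrete, here is a minimal Python sketch of what the three filters select from a plain-text VCF. It is not genozip's implementation and says nothing about how genozip avoids a separate index. The helper name filter_vcf, its parameters, the sample data, and the inclusive treatment of the position bound are all assumptions made for illustration.

```python
def filter_vcf(lines, exclude_chroms=(), max_pos=None, samples=None):
    """Yield VCF lines filtered the way the genocat examples describe.

    Hypothetical helper, for illustration only:
      exclude_chroms -- like `-r ^Y,MT` (drop these chromosomes)
      max_pos        -- like `-r -10000` (keep positions up to max_pos;
                        treating the bound as inclusive is an assumption)
      samples        -- like `-s SMPL1,SMPL2` (keep only these sample columns)
    """
    keep_cols = None  # column indices to keep, set when the #CHROM line is seen
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            if samples is not None:
                # The first 9 columns are fixed VCF columns; samples follow.
                keep_cols = [i for i, name in enumerate(fields)
                             if i < 9 or name in samples]
            yield "\t".join(fields[i] for i in keep_cols) if keep_cols else line
            continue
        chrom, pos = fields[0], int(fields[1])
        if chrom in exclude_chroms:
            continue
        if max_pos is not None and pos > max_pos:
            continue
        yield "\t".join(fields[i] for i in keep_cols) if keep_cols else line


# Made-up sample data: one meta line, the column header, three records.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSMPL1\tSMPL2\tSMPL3",
    "1\t5000\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1\t0|0",
    "1\t20000\t.\tC\tT\t.\tPASS\t.\tGT\t0|0\t0|1\t1|1",
    "Y\t3000\t.\tG\tA\t.\tPASS\t.\tGT\t1\t0\t1",
]

no_y_mt = list(filter_vcf(vcf, exclude_chroms={"Y", "MT"}))
up_to_10k = list(filter_vcf(vcf, max_pos=10000))
two_samples = list(filter_vcf(vcf, samples={"SMPL1", "SMPL2"}))
```

Each call corresponds to one of the three genocat lines: excluding chromosomes, bounding the position, and subsetting samples.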

Concatenating & splitting:
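genozip file1.vcf file2.vcf -o concat.vcf.genozip -- concatenates both files into one compressed file
genounzip concat.vcf.genozip -O -- splits the concatenated file back into its original files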

Calculating the MD5 of the VCF file:
genozip file.vcf --md5
genols file.vcf.genozip --md5
Note: genozip always calculates the MD5 internally, and genounzip verifies it automatically
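
To cross-check a digest independently of genozip, the MD5 of a file can be computed with Python's standard hashlib module. This is a minimal sketch: the helper name md5_of_file and the chunk size are illustrative choices. This README does not say whether, for .vcf.gz or .vcf.bz2 input, genozip hashes the compressed or the decompressed bytes, so a comparison is only clearly meaningful for a plain .vcf file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so that large VCF files do not need
    to fit in memory. (Hypothetical helper, not part of genozip.)
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling md5_of_file("file.vcf") returns a 32-character hexadecimal string that can be compared by eye with the value genozip reports.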

Encryption:
genozip file.vcf --password abc

Feature requests and bug reports: [email protected]

genozip is free for non-commercial use. For a commercial license, please contact [email protected]

Usage is subject to terms and conditions. The license can be viewed with genozip --license

THIS SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Languages

  • C 94.9%
  • C++ 3.1%
  • Makefile 1.0%
  • Shell 0.6%
  • Objective-C 0.4%
  • HTML 0.0%