forked from divonlan/genozip
Compressor for genomic files (VCF/BCF, SAM/BAM, fastq, fasta, GVF, 23andMe), up to 5x better than gzip and faster too
knmkr/genozip
genozip is a compressor for VCF genomic files (it compresses .vcf or .vcf.gz or .vcf.bz2 files).
It achieves compression ratios 2 to 5 times better than gzip by exploiting properties of genomic data, such as linkage disequilibrium. It is also considerably faster than gzip.
The compression is lossless - the decompressed VCF file is 100% identical to the original VCF file.
The command line options are similar to those of gzip and bcftools, so users familiar with either tool should find that genozip works much the same way. To get started, try: genozip --help
Commands:
genozip - compress one or more files
genounzip - decompress one or more files
genols - show metadata of files or the entire directory
genocat - view one or more files
Some advanced options:
Lookups:
genocat -r ^Y,MT file1.vcf -- displays all chromosomes except Y and MT
genocat -r -10000 file1.vcf -- displays positions up to 10000
genocat -s SMPL1,SMPL2 file1.vcf -- displays 2 samples
Note: there is no need for a separate indexing step or index file
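The selection semantics of the three lookups above can be illustrated with a small Python sketch that filters plain VCF text. This is not how genocat is implemented (genocat reads the compressed file directly and needs no index); the function name `filter_vcf` and its parameters are hypothetical, the position bound is assumed to be inclusive, and sample columns are assumed to stay in file order.

```python
def filter_vcf(lines, exclude_chroms=(), max_pos=None, samples=None):
    """Yield VCF lines that survive the given filters (illustrative only)."""
    keep_cols = None  # column indices to keep, set once the #CHROM line is seen

    def project(fields):
        # Drop unselected sample columns, if a sample subset was requested.
        if keep_cols is None:
            return "\t".join(fields)
        return "\t".join(fields[i] for i in keep_cols)

    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            if samples is not None:
                # The first 9 VCF columns are fixed; sample columns follow.
                keep_cols = list(range(9)) + [
                    i for i, name in enumerate(fields)
                    if i >= 9 and name in samples
                ]
            yield project(fields)
            continue
        chrom, pos = fields[0], int(fields[1])
        if chrom in exclude_chroms:
            continue  # analogous to -r ^Y,MT
        if max_pos is not None and pos > max_pos:
            continue  # analogous to -r -10000 (assumed inclusive)
        yield project(fields)
```

For example, `filter_vcf(lines, exclude_chroms={"Y", "MT"})` mirrors the first lookup, `filter_vcf(lines, max_pos=10000)` the second, and `filter_vcf(lines, samples={"SMPL1", "SMPL2"})` the third.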
Concatenating & splitting:
genozip file1.vcf file2.vcf -o concat.vcf.genozip
genounzip concat.vcf.genozip -O
Calculating the MD5 of the VCF file:
genozip file.vcf --md5
genols file.vcf.genozip --md5
Note: genozip always calculates the MD5 internally, and genounzip verifies it automatically
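To cross-check the digest independently of genozip, the MD5 of an uncompressed VCF file can be computed with any standard tool. A minimal Python sketch follows; the helper name `md5_of_file` and the chunk size are illustrative choices, and the format in which genozip and genols print their digest is not assumed here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large VCF files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `md5_of_file("file.vcf")` on the original file gives a value that can be compared by eye with the one reported by the commands above.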
Encryption:
genozip file.vcf --password abc
Feature requests and bug reports: [email protected]
genozip is free for non-commercial use. For a commercial license, please contact [email protected]
Usage is subject to terms and conditions. The license can be viewed with genozip --license
THIS SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.