forked from divonlan/genozip
Compressor for genomic files (VCF/BCF, SAM/BAM, fastq, fasta, GVF, 23andMe), up to 5x better than gzip and faster too
knmkr/genozip
(also available on Conda and Docker Hub)
genozip is a compressor for genomic files. It compresses VCF/BCF, SAM/BAM, fastq, fasta, GVF and 23andMe files, and it can compress them even if they are already compressed with .gz, .bz2 or .xz (for the full list of supported file types, see 'genozip --input --help').
It achieves compression ratios 2x to 5x better than gzip because it exploits properties specific to genomic data. It is also a lot faster than gzip.
The compression is lossless - the decompressed file is 100% identical to the original file.
The command-line options are similar to those of gzip and bcftools, so if you are familiar with those tools, genozip works in much the same way. To get started, try: genozip --help
Commands:
genozip - compress one or more files
genounzip - decompress one or more files
genols - show metadata of one or more files or the entire directory
genocat - view one or more files
Some advanced options:
Lookups:
genocat -r ^Y,MT file1.vcf -- displays all chromosomes except Y and MT
genocat -r -10000 file1.vcf -- displays positions up to 10000
genocat -s SMPL1,SMPL2 file1.vcf -- displays 2 samples
Note: there is no need for a separate indexing step or index file
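The region syntax in the lookups above can be read as a simple filter over (chromosome, position) pairs. The following Python sketch illustrates only the two forms shown here; `keep_record` is a hypothetical helper, not part of genozip, and it assumes that "up to 10000" is inclusive. genocat itself may support more forms and behave differently at the edges.

```python
def keep_record(chrom, pos, spec):
    """Return True if a record at (chrom, pos) would be shown for `spec`.

    Hypothetical illustration of two region forms from the README:
      "^Y,MT"  -> every chromosome except Y and MT
      "-10000" -> positions up to 10000 (assumed inclusive)
    """
    negate = spec.startswith("^")        # a leading ^ inverts the selection
    body = spec[1:] if negate else spec
    if body.startswith("-"):             # "-N": open-ended range ending at N
        matched = pos <= int(body[1:])
    else:                                # "A,B": comma-separated chromosome names
        matched = chrom in body.split(",")
    return matched != negate


records = [("1", 500), ("1", 20000), ("Y", 500), ("MT", 42)]
print([r for r in records if keep_record(*r, "^Y,MT")])
print([r for r in records if keep_record(*r, "-10000")])
```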
Concatenating & splitting:
genozip file1.vcf file2.vcf -o concat.vcf.genozip
genounzip concat.vcf.genozip -O
Calculating the MD5:
genozip file.vcf --md5
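To cross-check a digest independently of genozip, the MD5 of the original file can be computed with Python's standard library. This is a minimal sketch; `file_md5` is a hypothetical helper name, and it assumes that the digest of interest is that of the original, uncompressed file.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `file_md5("file.vcf")` before compression and again on the decompressed output gives two digests that should match if the round trip was lossless.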
Encryption:
genozip file.vcf --password abc
Even better compression, with some minor modifications of the data:
genozip file.vcf --optimize
Compress and then verify that the compressed file decompresses correctly:
genozip file.vcf --test
Do you find genozip helpful in your research? Please support continued development by citing: https://doi.org/10.1093/bioinformatics/btaa290
Feature requests and bug reports: [email protected]
genozip is free for non-commercial use. For a commercial license, please contact [email protected]
Usage is subject to terms and conditions. The non-commercial license can be viewed with genozip --license
THIS SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.