Unpredictable compression speed using "--optimal" level #11

Open
KirillKryukov opened this issue May 18, 2020 · 10 comments

@KirillKryukov

BriefLZ compression speed with "--optimal" varies wildly depending on input data.

I used "blzpack --optimal" to compress several biological datasets. Most of time it compresses at speed of about 1 to 3 MB/s (sometimes 6). However, when compressing human genome, its speed dropped to 33 kB/s. I tried it twice, both time it took ~27 hours to complete.

If this is normal and expected behaviour, I think it should be documented, so that users know the risks of using "--optimal" mode.

blzpack seems to work fine with other settings. I used BriefLZ 1.3.0, commit 0ab07a5, built using the instructions from the readme. I have only used the default block size so far, though I plan to test other block sizes too.

Test machine: Ubuntu 18.04.1 LTS, dual Xeon E5-2643v3, 128 GB RAM, no other tasks running.

This test was part of the Sequence Compression Benchmark: http://kirr.dyndns.org/sequence-compression-benchmark/ - the website includes all test data, commands, and measurements.

In particular, all test data is available at: http://kirr.dyndns.org/sequence-compression-benchmark/?page=Datasets

Compression and decompression speed of "blzpack --optimal" on all datasets:

C-Speed (MB/s) D-Speed (MB/s) DataSize (MB) DataName
1.184 3.033 0.051 Gordonia phage GAL1 GCF_001884535.1
2.526 24.85 0.522 WS1 bacterium JGI 0000059-K21 GCA_000398605.1
3.069 54.70 1.712 Astrammina rara GCA_000211355.2
3.116 108.8 5.809 Nosema ceranae GCA_000988165.1
3.052 135.5 9.217 Cryptosporidium parvum Iowa II GCA_000165345.1
3.152 151.1 13.14 Spironucleus salmonicida GCA_000497125.1
3.153 172.3 23.67 Tieghemostelium lacteum GCA_001606155.1
3.017 182.2 36.92 Fusarium graminearum PH-1 GCF_000240135.3
0.990 209.8 56.15 Salpingoeca rosetta GCA_000188695.1
6.379 237.9 67.61 PDB
2.734 311.1 73.24 Homo sapiens GRCh38 peptides all
3.273 228.8 106.4 Chondrus crispus GCA_000350225.2
6.384 184.8 122.4 NCBI Virus RefSeq Protein
2.903 256.0 245.3 Mitochondrion
2.787 310.9 340.4 UCSC hg38 7way knownCanonical-exonNuc
3.352 238.5 341.0 Kappaphycus alvarezii GCA_002205965.2
0.216 482.0 481.8 NCBI Virus Complete Nucleotide Human
0.840 505.8 610.3 SILVA 132 LSURef
2.719 410.3 968.8 UCSC hg38 20way knownCanonical-exonNuc
0.749 257.9 1008 Strongylocentrotus purpuratus GCF_000002235.4
2.587 404.6 1109 SILVA 132 SSURef Nr99
1.071 758.2 1215 Influenza
2.812 252.6 2756 Helicobacter
1.669 523.2 3282 SILVA 132 SSURef
0.033 255.4 3313 Homo sapiens GCA_000001405.28

Please let me know if you need any other details or help reproducing this issue.

@jibsen
Owner

jibsen commented May 19, 2020

Firstly, thank you for such a well written and detailed report!

I am afraid this behavior of the --optimal level is to be expected. In the current version it uses a binary tree match finder, which works well on many types of data, but has worst-case quadratic performance on certain types of data. The homo sapiens set in question contains parts that exhibit this.

The usual way to avoid such behavior is to limit the match search in some way (for instance stop when a certain number of potential matches have been checked, or when a match of acceptable length has been found -- this is what level 9 does). But in order to find the bit-optimal compressed representation for the BriefLZ format, it has to consider all possible matches.
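
As a rough illustration of that trade-off (a sketch of the general technique, not BriefLZ's actual match finder), a bounded match search over a chain of earlier positions could look like the following; both cutoffs are exactly what an optimal parse cannot use, because it has to know the best match at every position.

/*
 * Sketch only -- not BriefLZ's code.  chain[i] links position i to the
 * previous position with the same hash; SENTINEL ends the chain.  Two
 * level-1..9-style cutoffs: stop after max_depth candidates, or as soon
 * as a match of accept_len bytes is found.
 */
#define SENTINEL ((unsigned long) -1)

static unsigned long
find_match_limited(const unsigned char *buf, unsigned long pos,
                   unsigned long max_len, const unsigned long *chain,
                   unsigned long max_depth, unsigned long accept_len,
                   unsigned long *match_pos)
{
    unsigned long best_len = 0;
    unsigned long p = chain[pos];

    while (p != SENTINEL && max_depth-- > 0) {
        unsigned long len = 0;

        while (len < max_len && buf[p + len] == buf[pos + len]) {
            ++len;
        }

        if (len > best_len) {
            best_len = len;
            *match_pos = p;

            if (len >= accept_len) {
                break;          /* "good enough" -- stop searching */
            }
        }

        p = chain[p];           /* next (older) candidate */
    }

    return best_len;
}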

I would suggest sticking to levels 1-9 with the current version of BriefLZ if your data may trigger this.

The function description of blz_pack_level() in the include file, as well as the command-line help of the blzpack utility, warn that the optimal level may be very slow. Would it help if I add that it may be very slow depending on the type of data?
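
For anyone calling the library directly rather than going through blzpack, here is a minimal sketch of what that looks like. It assumes the brieflz.h declarations from 1.3.0 (blz_workmem_size_level(), blz_max_packed_size(), blz_pack_level(), BLZ_ERROR) and, based on the discussion in this thread, that the bounded levels are 1-9 while --optimal sits above them as level 10.

#include <stdlib.h>

#include "brieflz.h"

/* Compress one block at a given level; returns a malloc'd buffer with
 * the packed data, or NULL on failure.  With levels 1-9 the match
 * search is bounded; the --optimal parse (assumed to be level 10 here)
 * uses the same call but can hit the quadratic worst case. */
static unsigned char *
pack_block(const unsigned char *src, unsigned long src_size,
           unsigned long *packed_size, int level)
{
    void *workmem = malloc(blz_workmem_size_level(src_size, level));
    unsigned char *dst = malloc(blz_max_packed_size(src_size));

    if (workmem == NULL || dst == NULL) {
        free(workmem);
        free(dst);
        return NULL;
    }

    *packed_size = blz_pack_level(src, dst, src_size, workmem, level);

    free(workmem);

    if (*packed_size == BLZ_ERROR) {
        free(dst);
        return NULL;
    }

    return dst;
}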

@Sanmayce

Very glad to see BriefLZ performing in Kirill's SCB. I would just like to see its effectiveness in action, so Joergen, please provide the options you want to see tested.
For instance, using the default block size (by the way, what is its size?) is obviously far from what BriefLZ can deliver, since Kirill's machine is well suited to memory hogs like Nakamichi and BriefLZ (~45 GB for a 2 GB block):

timer64 BriefLZ_130_Intel_v19_64bit.exe --optimal -b2g "%1" "%1.blz"
  • What is the default block size?
  • What is the maximum block size, and roughly how much memory does it use (e.g. 22N)?
  • Does using -b4g make any sense with, say, level 1?
  • Is -9 -b4g currently the best combination for the human genome (~3.3 GB)?
  • Which options would you personally like to see?

And one quick console dump:

E:\TEXTUAL_MADNESS_bare-minimum_2020_Apr_28>timer64 bcrush_ICL150_64bit.exe --optimal ""Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar"" ""Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar".optimal.bcrush"

Kernel  Time =     0.250 =    1%
User    Time =    23.375 =   97%
Process Time =    23.625 =   98%    Virtual  Memory =   1420 MB
Global  Time =    23.911 =  100%    Physical Memory =    844 MB

E:\TEXTUAL_MADNESS_bare-minimum_2020_Apr_28>timer64 crush_xezz.exe -9 ""Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar"" ""Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar".9.crush_xezz"
Compressing Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar:
40303104 -> 9163198 in 18.43s

Kernel  Time =     0.234 =    1%
User    Time =    18.250 =   97%
Process Time =    18.484 =   98%    Virtual  Memory =    852 MB
Global  Time =    18.728 =  100%    Physical Memory =    545 MB

E:\TEXTUAL_MADNESS_bare-minimum_2020_Apr_28>timer64 BriefLZ_130_Intel_v19_64bit.exe --optimal -b2g ""Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar"" ""Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar".blz"

Kernel  Time =     1.078 =    3%
User    Time =    30.312 =   95%
Process Time =    31.390 =   99%    Virtual  Memory =  45402 MB
Global  Time =    31.650 =  100%    Physical Memory =   2829 MB

40,303,104 Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
 9,163,198 Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar.9.crush_xezz
 8,851,573 Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar.blz
 7,021,419 Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar.bro
 7,038,202 Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar.L22W2GB.zst
 8,619,461 Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar.method211.zpaq
 5,726,983 Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar.method511.zpaq
 7,123,784 Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar.MX9.bzip2
10,079,339 Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar.MX9.gzip
 6,943,680 Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar.MX9Dict1024.7z
 9,155,961 Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar.Nakamichi
 8,502,992 Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar.optimal.bcrush
 5,937,587 Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar.rz
 5,315,954 Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar.ST0Block1024.bsc
 6,849,828 Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar.ST6Block1024.bsc
 5,446,644 Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar.zv

As for the 'SILVA' ~1GB DNA:

http://kirr.dyndns.org/sequence-compression-benchmark/?d=SILVA+132+SSURef+Nr99+%281.11+GB%29&doagg=1&agg=average&cs=1&cg=1&cc=1&com=yes&src=all&nt=4&only-best=1&bn=1&bm=dspeed&sm=same&tn=10&bs=100&rr=gzip-9&c=brieflz&c=nakamichi&tm0=name&tm1=size&tm2=ratio&tm3=ctime&tm4=dtime&tm5=cmem&tm6=tdtime&tm7=empty&gm=size&cyl=lin&ccw=1500&cch=500&sxm=tdspeed&sxmin=0&sxmax=0&sxl=lin&sym=ctdspeed&symin=0&symax=0&syl=lin&button=Show+table

Compressor        Size(B)  Ratio(times)    C-Time(s)   D-Time(s)   C-Mem(MB) TD-Time(s)
copy-cat    1,108,994,702          1.00        0.666       0.234       0.690      88.95
lz4-9         233,573,222          4.75      155.0         0.592       5.290      19.28
nakamichi     137,427,694          8.07    9,751           1.105  88,408          12.10
brieflz-opt   168,241,177          6.59      428.6         2.741      21.97       16.20

On an i5-7200U at 2.7 GHz, my old test (from a year ago) shows:

D:\TEXTORAMIC_benchmarking_2019-Aug-06>lzbench173 -c4 -i1,15 -o3 -etornado,16/blosclz,9/brieflz/crush,2/csc,5/density,3/fastlz,2/gipfeli/lzo1b,999/libdeflate,1,12/lz4hc,1,12/lizard,19,29,39,49/lzf,1/lzfse/lzg,9/lzjb/lzlib,9/lzma,9/lzrw,5/lzsse2,17/lzsse4,17/lzsse8,17/lzvn/pithy,9/quicklz,3/snappy/slz_zlib,3/ucl_nrv2b,9/ucl_nrv2d,9/ucl_nrv2e,9/xpack,1,9/xz,9/yalz77,12/yappy,99/zlib,1,5,9/zling,4/shrinker/wflz/lzmat SILVA_132_SSURef_Nr99_tax_silva.fasta
lzbench 1.7.3 (64-bit Windows)   Assembled by P.Skibinski
Compressor name         Compress. Decompress.  Orig. size  Compr. size  Ratio Filename
lzlib 1.8 -9             0.93 MB/s   140 MB/s  1108994702     58323686   5.26 SILVA_132_SSURef_Nr99_tax_silva.fasta
tornado 0.6a -16         1.28 MB/s   419 MB/s  1108994702     60275681   5.44 SILVA_132_SSURef_Nr99_tax_silva.fasta
lzma 16.04 -9            1.38 MB/s   248 MB/s  1108994702     62229404   5.61 SILVA_132_SSURef_Nr99_tax_silva.fasta
xz 5.2.3 -9              1.57 MB/s   236 MB/s  1108994702     62230769   5.61 SILVA_132_SSURef_Nr99_tax_silva.fasta
csc 2016-10-13 -5        2.53 MB/s   184 MB/s  1108994702     73401349   6.62 SILVA_132_SSURef_Nr99_tax_silva.fasta
Nakamichi 'Ryuugan-ditto-1TB'       1725 MB/s                105649099        ! outside lzbench !
lizard 1.0 -29           1.00 MB/s  2063 MB/s  1108994702    105745551   9.54 SILVA_132_SSURef_Nr99_tax_silva.fasta
lizard 1.0 -49           0.92 MB/s  2047 MB/s  1108994702    110132321   9.93 SILVA_132_SSURef_Nr99_tax_silva.fasta
crush 1.0 -2             0.18 MB/s   570 MB/s  1108994702    143260107  12.92 SILVA_132_SSURef_Nr99_tax_silva.fasta
xpack 2016-06-02 -9      7.17 MB/s   979 MB/s  1108994702    168644300  15.21 SILVA_132_SSURef_Nr99_tax_silva.fasta
zling 2016-01-10 -4        46 MB/s   291 MB/s  1108994702    190047636  17.14 SILVA_132_SSURef_Nr99_tax_silva.fasta
libdeflate 0.7 -12       3.26 MB/s   944 MB/s  1108994702    196295435  17.70 SILVA_132_SSURef_Nr99_tax_silva.fasta
ucl_nrv2e 1.03 -9        0.37 MB/s   416 MB/s  1108994702    205317109  18.51 SILVA_132_SSURef_Nr99_tax_silva.fasta
zlib 1.2.11 -9           2.16 MB/s   385 MB/s  1108994702    206566505  18.63 SILVA_132_SSURef_Nr99_tax_silva.fasta
lzsse2 2016-05-14 -17    2.00 MB/s  4319 MB/s  1108994702    206623630  18.63 SILVA_132_SSURef_Nr99_tax_silva.fasta
...

D:\TEXTORAMIC_benchmarking_2019-Aug-06>zstd-v1.4.2-win64.exe -b1e22 -i9 --priority=rt "SILVA_132_SSURef_Nr99_tax_silva.fasta"
Note : switching to real-time priority .fasta...
 1#9_tax_silva.fasta :1108994702 -> 210161592 (5.277), 299.5 MB/s , 743.8 MB/s
 2#9_tax_silva.fasta :1108994702 -> 233260454 (4.754), 273.4 MB/s , 645.7 MB/s
 3#9_tax_silva.fasta :1108994702 -> 177601790 (6.244), 208.5 MB/s , 692.4 MB/s
 4#9_tax_silva.fasta :1108994702 -> 175148431 (6.332), 174.1 MB/s , 703.7 MB/s
 5#9_tax_silva.fasta :1108994702 -> 222428075 (4.986), 113.7 MB/s , 643.0 MB/s
 6#9_tax_silva.fasta :1108994702 -> 201332037 (5.508),  78.1 MB/s , 694.3 MB/s
 7#9_tax_silva.fasta :1108994702 -> 186684821 (5.940),  51.5 MB/s , 774.1 MB/s
 8#9_tax_silva.fasta :1108994702 -> 179492127 (6.179),  43.7 MB/s , 807.5 MB/s
 9#9_tax_silva.fasta :1108994702 -> 165048155 (6.719),  23.8 MB/s , 861.9 MB/s
10#9_tax_silva.fasta :1108994702 -> 161026970 (6.887),  25.1 MB/s , 881.0 MB/s
11#9_tax_silva.fasta :1108994702 -> 159628337 (6.947),  21.4 MB/s , 880.2 MB/s
12#9_tax_silva.fasta :1108994702 -> 146761536 (7.556),  11.3 MB/s , 931.6 MB/s
13#9_tax_silva.fasta :1108994702 -> 121097075 (9.158),  3.95 MB/s ,1025.9 MB/s
14#9_tax_silva.fasta :1108994702 -> 116047146 (9.556),  4.36 MB/s ,1057.0 MB/s
Nakamichi 'Ryuugan-ditto-1TB'       105649099                      1725   MB/s ! outside zstdbench !
15#9_tax_silva.fasta :1108994702 -> 102816474 (10.79),  2.05 MB/s ,1050.9 MB/s
16#9_tax_silva.fasta :1108994702 ->  88817385 (12.49),  2.35 MB/s ,1001.7 MB/s
17#9_tax_silva.fasta :1108994702 ->  80163392 (13.83),  2.03 MB/s ,1014.6 MB/s
18#9_tax_silva.fasta :1108994702 ->  78822655 (14.07),  1.87 MB/s , 978.7 MB/s
19#9_tax_silva.fasta :1108994702 ->  73759356 (15.04),  1.36 MB/s ,1010.7 MB/s
20#9_tax_silva.fasta :1108994702 ->  64128802 (17.29),  1.26 MB/s ,1050.8 MB/s
21#9_tax_silva.fasta :1108994702 ->  60540485 (18.32),  1.13 MB/s ,1093.3 MB/s
22#9_tax_silva.fasta :1108994702 ->  58543278 (18.94),  0.99 MB/s ,1118.4 MB/s

So, it would be nice to see the best of BriefLZ juxtaposed with the other performers, especially my toy.

@KirillKryukov
Author

> Firstly, thank you for such a well written and detailed report!

It's my pleasure, and thank you for the reply!

> I am afraid this behavior of the --optimal level is to be expected. In the current version it uses a binary tree match finder, which works well on many types of data, but has worst-case quadratic performance on certain types of data. The homo sapiens set in question contains parts that exhibit this.

> The usual way to avoid such behavior is to limit the match search in some way (for instance stop when a certain number of potential matches have been checked, or when a match of acceptable length has been found -- this is what level 9 does). But in order to find the bit-optimal compressed representation for the BriefLZ format, it has to consider all possible matches.

Thank you for the explanation. (Maybe this would be worth mentioning in the readme?)

> I would suggest sticking to levels 1-9 with the current version of BriefLZ if your data may trigger this.

The problem is that I did not know my data would trigger it. Testing with smaller data did not reveal it until I tried a 3 GB dataset. I'm afraid someone else might run into the same situation and not have the patience (or time budget) to wait for compression to finish.

> The function description of blz_pack_level() in the include file, as well as the command-line help of the blzpack utility, warn that the optimal level may be very slow. Would it help if I add that it may be very slow depending on the type of data?

I'm afraid it would not be enough. Strictly speaking, it is correct to say "very slow" or "very slow depending on the type of data", but that still does not let the user build an accurate model of what to expect.

The slowdown is ~100x compared to its normal performance on other data. From the user's point of view, it is indistinguishable from blzpack being frozen.

If I experienced this slowdown while using blzpack for actual data compression (rather than benchmarking), I would certainly kill the process (rather than wait 27 hours) and conclude that it is broken. (And I would reconsider whether I can rely on this compressor at all, even at other levels.)

I would recommend clearly explaining this possible slowdown in a way that's hard to miss. E.g. something like:

"Important note about --optimal mode: Please use it only for testing, and never in production. It's compression speed is unpredictable and can get 100 times slower depending on data. It should be only used with unlimited time budget."

I fully understand the curiosity about the ultimate limit of the compression ratio, hence the --optimal mode. I just think users will normally not expect the slowdown, so adding a warning about it would help.

@KirillKryukov
Author

KirillKryukov commented May 20, 2020

In addition, here is a chart comparing compression speed of several compressors:

http://kirr.dyndns.org/sequence-compression-benchmark/?d=all&cg=1&com=yes&src=all&nt=1&only-best=1&bn=1&bm=ratio&sm=same&tn=10&bs=100&rr=gzip-9&gm=same&cyl=lin&ccw=1500&cch=500&sxm=datasize&sxl=log&sym=cspeed&syl=log&button=Show+scatterplot

I included only general-purpose compressors, on all datasets. Only the strongest level (single thread) of each compressor is shown.

You can highlight each compressor by clicking on its marker in the legend.

This may help compare the variation in BriefLZ's compression speed against other popular compressors.

@Sanmayce

Your SCB is the best in many regards; it serves quite well for estimating effectiveness in real scenarios.

These old Nakamichi results are bettered by the 2020-May-09 revision. Personally, I would love to see a showdown of Lizard 49 vs Nakamichi vs BriefLZ (Joergen's favorite options) vs pigz-11-4t, with:
X axis: Test data size (MB)
Y axis: Transfer + decompression speed (MB/s)
at Link speed: 100 Mbit/s (for estimating transfer time):

http://kirr.dyndns.org/sequence-compression-benchmark/?d=Gordonia+phage+GAL1+GCF_001884535.1+%2850.7+kB%29&d=WS1+bacterium+JGI+0000059-K21+GCA_000398605.1+%28522+kB%29&d=Astrammina+rara+GCA_000211355.2+%281.71+MB%29&d=Nosema+ceranae+GCA_000988165.1+%285.81+MB%29&d=Cryptosporidium+parvum+Iowa+II+GCA_000165345.1+%289.22+MB%29&d=Spironucleus+salmonicida+GCA_000497125.1+%2813.1+MB%29&d=Tieghemostelium+lacteum+GCA_001606155.1+%2823.7+MB%29&d=Fusarium+graminearum+PH-1+GCF_000240135.3+%2836.9+MB%29&d=Salpingoeca+rosetta+GCA_000188695.1+%2856.2+MB%29&d=Chondrus+crispus+GCA_000350225.2+%28106+MB%29&d=Kappaphycus+alvarezii+GCA_002205965.2+%28341+MB%29&d=Strongylocentrotus+purpuratus+GCF_000002235.4+%281.01+GB%29&d=Homo+sapiens+GCA_000001405.28+%283.31+GB%29&d=Mitochondrion+%28245+MB%29&d=NCBI+Virus+Complete+Nucleotide+Human+%28482+MB%29&d=Influenza+%281.22+GB%29&d=Helicobacter+%282.76+GB%29&d=SILVA+132+LSURef+%28610+MB%29&d=SILVA+132+SSURef+Nr99+%281.11+GB%29&d=SILVA+132+SSURef+%283.28+GB%29&d=UCSC+hg38+7way+knownCanonical-exonNuc+%28340+MB%29&d=UCSC+hg38+20way+knownCanonical-exonNuc+%28969+MB%29&d=PDB+%2867.6+MB%29&d=Homo+sapiens+GRCh38+peptides+all+%2873.2+MB%29&d=NCBI+Virus+RefSeq+Protein+%28122+MB%29&d=UniProtKB+Reviewed+%28Swiss-Prot%29+%28277+MB%29&agg=average&com=yes&src=all&nt=4&only-best=1&bn=1&bm=ratio&sm=same&tn=10&bs=100&rr=gzip-9&c=brieflz&c=lizard&c=nakamichi&c=pigz&tm0=empty&tm1=empty&tm2=empty&tm3=empty&tm4=empty&tm5=empty&tm6=empty&tm7=empty&gm=same&cyl=lin&ccw=1500&cch=500&sxm=datasize&sxmin=0&sxmax=0&sxl=log&sym=tdspeed&symin=0&symax=0&syl=log&button=Show+scatterplot

Also, when decompressing from local rather than remote storage (a SATA 3 SSD at 6 Gbit/s), i.e. Link speed: 6000 Mbit/s (for estimating transfer time):

http://kirr.dyndns.org/sequence-compression-benchmark/?d=Gordonia+phage+GAL1+GCF_001884535.1+%2850.7+kB%29&d=WS1+bacterium+JGI+0000059-K21+GCA_000398605.1+%28522+kB%29&d=Astrammina+rara+GCA_000211355.2+%281.71+MB%29&d=Nosema+ceranae+GCA_000988165.1+%285.81+MB%29&d=Cryptosporidium+parvum+Iowa+II+GCA_000165345.1+%289.22+MB%29&d=Spironucleus+salmonicida+GCA_000497125.1+%2813.1+MB%29&d=Tieghemostelium+lacteum+GCA_001606155.1+%2823.7+MB%29&d=Fusarium+graminearum+PH-1+GCF_000240135.3+%2836.9+MB%29&d=Salpingoeca+rosetta+GCA_000188695.1+%2856.2+MB%29&d=Chondrus+crispus+GCA_000350225.2+%28106+MB%29&d=Kappaphycus+alvarezii+GCA_002205965.2+%28341+MB%29&d=Strongylocentrotus+purpuratus+GCF_000002235.4+%281.01+GB%29&d=Homo+sapiens+GCA_000001405.28+%283.31+GB%29&d=Mitochondrion+%28245+MB%29&d=NCBI+Virus+Complete+Nucleotide+Human+%28482+MB%29&d=Influenza+%281.22+GB%29&d=Helicobacter+%282.76+GB%29&d=SILVA+132+LSURef+%28610+MB%29&d=SILVA+132+SSURef+Nr99+%281.11+GB%29&d=SILVA+132+SSURef+%283.28+GB%29&d=UCSC+hg38+7way+knownCanonical-exonNuc+%28340+MB%29&d=UCSC+hg38+20way+knownCanonical-exonNuc+%28969+MB%29&d=PDB+%2867.6+MB%29&d=Homo+sapiens+GRCh38+peptides+all+%2873.2+MB%29&d=NCBI+Virus+RefSeq+Protein+%28122+MB%29&d=UniProtKB+Reviewed+%28Swiss-Prot%29+%28277+MB%29&agg=average&com=yes&src=all&nt=4&only-best=1&bn=1&bm=ratio&sm=same&tn=10&bs=6000&rr=gzip-9&c=brieflz&c=lizard&c=nakamichi&c=pigz&tm0=empty&tm1=empty&tm2=empty&tm3=empty&tm4=empty&tm5=empty&tm6=empty&tm7=empty&gm=same&cyl=lin&ccw=1500&cch=500&sxm=datasize&sxmin=0&sxmax=0&sxl=log&sym=tdspeed&symin=0&symax=0&syl=log&button=Show+scatterplot

@KirillKryukov
Author

Thanks, Georgi.

A quick note about the BriefLZ decompression speed results in these charts: currently it is handicapped by not supporting streaming output, i.e., blzpack always outputs to a file (as far as I understand). However, in my benchmark I always measure streaming decompression speed. Therefore, for compressors without a streaming mode (such as blzpack), I let them decompress to a file, then stream this file, and measure the total time.

If blzpack gets support for streaming mode, its decompression speed results will improve. (This happened with Nakamichi after Georgi recently added such a streaming mode.)

Streaming mode is important for many practical applications, where you don't want to store decompressed data in a file, but instead process it on the fly with another tool, as part of a data analysis pipeline.

(I also use only streaming mode during compression, but compression is much slower, so the impact on blzpack is comparatively smaller. However, its compression speed results should also improve with the addition of a streaming mode.)

(Note that only uncompressed data is streamed in my benchmark. For compressed data it's OK to work with a file directly).

@jibsen
Owner

jibsen commented May 21, 2020

Thank you both for the comments.

> What is the default block size?

The example program blzpack uses a default block size of 1 MiB.

> What is the maximum block size, and roughly how much memory does it use (e.g. 22N)?

The example program is limited to a block size of roughly 3.6 GiB (-b3600m) -- this is due to storing the compressed size in a 32-bit value in the block header, and the max expansion of uncompressible data.

The compression library itself is limited by the size of the type used to store offsets and lengths (blz_word), which is currently a 32-bit value to reduce memory usage.

The optimal level uses 5 * block size words of work memory, which for 32-bit values is 20N bytes. Add to that the current input and output blocks which are also held in memory and you get 22N.

For reference (in 1.3.0) levels 1-4 use a fixed amount of workmem, levels 5-7 use 12N bytes of workmem, and levels 8-10 use 20N bytes of workmem.
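
To put rough numbers on that (a back-of-envelope sketch, not part of blzpack): with a 32-bit blz_word, the optimal level's workmem is 5 words of 4 bytes per input byte, plus the input block and an output block of at least the same size held alongside it.

#include <stdio.h>

int main(void)
{
    unsigned long long n = 3600ULL << 20;         /* -b3600m block size in bytes */

    unsigned long long workmem = 5ULL * 4 * n;    /* 20N for levels 8-10 */
    unsigned long long total   = workmem + 2 * n; /* plus input + output blocks, ~22N */

    /* prints about 77 GiB; the output buffer can be somewhat larger than N
       to allow for expansion of incompressible data, which is roughly where
       the ~80 GiB figure mentioned below comes from */
    printf("~%.1f GiB\n", (double) total / (1ULL << 30));

    return 0;
}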

> Does using -b4g make any sense with, say, level 1?

Usually not. Lower compression levels limit the search depth (at level 1 only the most recent match is checked), so using larger block sizes may have less of an impact than on higher levels. That being said, you could of course construct inputs where it would matter hugely, like 1GiB of random data repeated.

Playing around with some genome data today, I must admit sometimes it gives surprising results compared to other types of data.

> Is -9 -b4g currently the best combination for the human genome (~3.3 GB)?
> Which options would you personally like to see?

If you have the ~80 GiB of RAM it would take, then -9 -b3600m should give the highest compression avoiding the behavior of --optimal on certain areas. -6 -b3600m might be a better trade-off between compression ratio and time.

From a few quick tests, it does not appear that the block size has a huge effect on the compression ratio on the genome data.

> The problem is that I did not know my data would trigger it. Testing with smaller data did not reveal it until I tried a 3 GB dataset. I'm afraid someone else might run into the same situation and not have the patience (or time budget) to wait for compression to finish.

That is a good point, I will try to come up with a suitable warning.

> Streaming mode is important for many practical applications, where you don't want to store decompressed data in a file, but instead process it on the fly with another tool, as part of a data analysis pipeline.

If by streaming you mean the ability to read blocks from stdin and write blocks to stdout then that would be possible to add. blzpack is really just a simple example to let people play around with the BriefLZ library.
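
For what it is worth, a rough sketch of what such a stdin-to-stdout block loop could look like. This is not blzpack's actual container format -- it writes a simple ad-hoc two-word length prefix per block instead of blzpack's block header -- and it assumes the brieflz.h calls from 1.3.0.

#include <stdio.h>
#include <stdlib.h>

#include "brieflz.h"

/* Read fixed-size blocks from `in`, compress each, and write it to
 * `out` behind a minimal {original size, packed size} prefix.
 * Illustration of the loop structure only. */
static int
stream_pack(FILE *in, FILE *out, unsigned long block_size, int level)
{
    unsigned char *src = malloc(block_size);
    unsigned char *dst = malloc(blz_max_packed_size(block_size));
    void *workmem = malloc(blz_workmem_size_level(block_size, level));
    int res = -1;
    size_t n;

    if (src == NULL || dst == NULL || workmem == NULL) {
        goto done;
    }

    while ((n = fread(src, 1, block_size, in)) > 0) {
        unsigned long sizes[2];

        sizes[0] = (unsigned long) n;
        sizes[1] = blz_pack_level(src, dst, sizes[0], workmem, level);

        if (sizes[1] == BLZ_ERROR) {
            goto done;
        }

        if (fwrite(sizes, sizeof sizes[0], 2, out) != 2
            || fwrite(dst, 1, sizes[1], out) != sizes[1]) {
            goto done;
        }
    }

    res = ferror(in) ? -1 : 0;

done:
    free(src);
    free(dst);
    free(workmem);
    return res;
}

The decompression side would mirror this loop, presumably around blz_depack_safe().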

@KirillKryukov
Author

> If you have the ~80 GiB of RAM it would take, then -9 -b3600m should give the highest compression avoiding the behavior of --optimal on certain areas. -6 -b3600m might be a better trade-off between compression ratio and time.

I'll try to add these settings to the benchmark in the future.

> I will try to come up with a suitable warning.

Thanks.

> If by streaming you mean the ability to read blocks from stdin and write blocks to stdout then that would be possible to add. blzpack is really just a simple example to let people play around with the BriefLZ library.

Yes, this is exactly what I mean, sorry that it was unclear.

BriefLZ decompression is fast enough for this to matter in my tests, so this would be a very welcome addition.

@Sanmayce

Joergen, you are welcome to see my latest post (in here, my old web browser prevents me from posting attachments and pictures). I am looking forward to seeing how BriefLZ performs with bigger blocks; for now, I included the default settings:
https://community.centminmod.com/posts/84082/

@jibsen
Owner

jibsen commented Jun 12, 2020

Regarding streaming, if you are on Linux you could try simply specifying stdin and stdout as filenames, like:

./blzpack -9 /dev/stdin /dev/stdout <foo >foo.blz
