forked from borisbrodski/sevenzipjbinding
-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathcompression_snippets.html
484 lines (384 loc) · 18.6 KB
/
compression_snippets.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
<h1>Archive compression and modification code snippets</h1>
<h2>Index</h2>
<ul>
<li><a href="#creating">Creating new archives</a></li>
<ul>
<li><a href="#archive-test-structure">Archive test structure</a></li>
<li><a href="#create-specific-api">Creating archives with the generic API</a></li>
<ul>
<li><a href="#create-specific-api-zip">Creating Zip archive using archive format specific API</a></li>
<li><a href="#create-specific-api-7z">Creating 7-Zip archive using archive format specific API</a></li>
<li><a href="#create-specific-api-tar">Creating Tar archive using archive format specific API</a></li>
<li><a href="#create-specific-api-gzip">Creating GZip archive using archive format specific API</a></li>
<li><a href="#create-specific-api-bzip2">Creating BZip2 archive using archive format specific API</a></li>
</ul>
<li><a href="#create-generic">Creating archives with the archive format specific API</a></li>
<li><a href="#create-encrypted">Creating password protected archives</a></li>
</ul>
<li><a href="#update">Modifying existing archives</a></li>
<ul>
<li><a href="#update-alter-items">Altering existing archive items</a></li>
<li><a href="#update-add-remove-items">Adding and removing archive items</a></li>
</ul>
<li><a href="#trace">Troubleshoot problems using tracing</a></li>
</ul>
<h1><a name="creating">Creating new archives</h1>
<p>
The are two slightly different APIs for creating new archives:
<ul>
<li>archive format specific API</li>
<li>archive format independent API</li>
</ul>
The first API is designed to work with one particular archive format, like Zip. The Second API
allows archive format independent programming.
</p>
<h2><a name="archive-test-structure">Archive test structure</h2>
<p>
Some archive formats like GZip only support compression of a single file, while other archive formats
allow multiple files and folders to be compressed.
In order to demonstrate how those archives can be created, some test file and folder structure is required.
The following snippets use a static structure defined by the CompressArchiveStructure class:
</p>
##INCLUDE_SNIPPET(CompressArchiveStructure)
<h2><a name="create-specific-api">Creating archives with the archive format specific API</h2>
<p>
The archive format specific API provides easy access to the archive configuration
methods (e.g. for setting the compression level).
Also it uses archive format specific item description interfaces (like IOutItemZip for Zip).
Different archive formats support different archive item properties.
Those interfaces provide access to the properties supported by the corresponding
archive format, whether the unsupported properties remain hidden.
</p>
<p>
Lets see how different archives can be created using archive format specific API
</p>
<h3><a name="create-specific-api-zip">Creating Zip archive using archive format specific API</h3>
<p>
Creating Zip archive using archive format specific API was already presented in the "first steps".
The key parts of the code are:
</p>
<ul>
<li>
Implementation of the IOutCreateCallback<IOutItemCallbackZip> interface specifying the structure of
the new archive. Also the progress of the compression/update operation get reported here.
</li>
<li>
Creating an instance of the IOutArchive interface and calling createArchive() method.
</li>
</ul>
##INCLUDE_SNIPPET(CompressNonGenericZip)
<p>
If you run this program with (on Linux)
</p>
<div class="fragment"><pre class="fragment">
$ java -cp ‹path-to-lib›\sevenzipjbinding.jar; \
‹path-to-lib›\sevenzipjbinding-Windows-x86.jar;. \
CompressNonGenericZip compressed.zip
</pre></div>
<p>
you will get the output
</p>
##INCLUDE_OUTPUT(CompressNonGenericZip)
<p>
The archive file compressed.zip should be created. It contains files and folders specified in the <a href="#archive-test-structure">CompressArchiveStructure</a> class.
</p>
<div class="fragment"><pre class="fragment">
$ 7z l compressed.zip
Listing archive: compressed.zip
--
Path = compressed.zip
Type = zip
Physical Size = 718
Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
2015-09-09 08:56:42 ..... 16 16 info.txt
2015-09-09 08:56:42 ..... 100 100 random-100-bytes.dump
2015-09-09 08:56:42 ..... 38 38 dir1/file1.txt
2015-09-09 08:56:42 D.... 0 0 dir2
2015-09-09 08:56:42 ..... 38 38 dir2/file2.txt
------------------- ----- ------------ ------------ ------------------------
192 192 4 files, 1 folders
</pre></div>
<h3><a name="create-specific-api-7z">Creating 7-Zip archive using archive format specific API</h3>
<p>
Creating 7z archive is a little bit easer that creating Zip archives. The main difference is the implementation
of the MyCreateCallback.getItemInformation() method. 7z doesn't need relative complex calculation of the attribute
property providing a nice default behavior.
</p>
##INCLUDE_SNIPPET(CompressNonGeneric7z)
<p>
For instructions on how to running the snippet and check the results see
<a href="#create-specific-api-zip">Creating Zip archive using archive format specific API</a>.
</p>
<h3><a name="create-specific-api-tar">Creating Tar archive using archive format specific API</h3>
<p>
Creating tar archives is pretty much the same, as creating 7z archives, since the default values for most properties
are good enough in most cases. Note, that the tar archive format do have attribute property.
But due to the Unix-nature of the tar, it was renamed to PosixAttributes. Also the meaning of the bits is different.
</p>
##INCLUDE_SNIPPET(CompressNonGenericTar)
<p>
For instructions on how to running the snippet and check the results see <a href="#create-specific-api-zip">Creating Zip archive using archive format specific API</a>.
</p>
<h3><a name="create-specific-api-gzip">Creating GZip archive using archive format specific API</h3>
<p>
GZip format is a stream archive format meaning, that it can only compress a single file. This simplifies
the programming quite a bit. In the following snippet a single message passed through the second command line
parameter get compressed. Note, that like non-stream archive formats GZip also supports optional Path and
LastModificationTime properties for the single archive item.
</p>
##INCLUDE_SNIPPET(CompressNonGenericGZip)
<p>
For instructions on how to running the snippet and check the results see <a href="#create-specific-api-zip">Creating Zip archive using archive format specific API</a>.
</p>
<h3><a name="create-specific-api-bzip2">Creating BZip2 archive using archive format specific API</h3>
<p>
BZip2 is like GZip a stream archive format. It compresses single archive item supporting no additional
archive item properties at all.
</p>
##INCLUDE_SNIPPET(CompressNonGenericBZip2)
<p>
For instructions on how to running the snippet and check the results see <a href="#create-specific-api-zip">Creating Zip archive using archive format specific API</a>.
</p>
<h2><a name="create-generic">Creating archives with the generic API</h2>
<p>
The one of the great features of the 7-Zip (and though of the 7-Zip-JBinding) is the ability to write archive format
independent code supporting most or even all of the archive formats, supported by 7-Zip. The following code snippet
accepts the required archive format as the first parameter compressing the test data in the specified archive format.
</p>
<p>
The key steps to write a generic compression code are
<ul>
<li>Use IOutItemAllFormats interface instead of the one of the archive specific interfaces, like IOutItem7z</li>
<li>Create out-archive object using generic create method SevenZip.openOutArchive(ArchiveFormat)</li>
</ul>
</p>
##INCLUDE_SNIPPET(CompressGeneric)
<p>
Now you can run the snippet with different parameters creating archives with different formats. The last parameter
specifies, how many archive items from the <a href="#archive-test-structure">CompressArchiveStructure</a> should be
compressed. This number should be between 1 and 5 for 7z, Zip and Tar, and must be 1 for the stream formats GZip and BZip2.
</p>
<ul class="list-with-pre-s">
<li>To create a 7z archive with 5 items:
<div class="fragment"><pre class="fragment">
C:\Test> java -cp ... CompressGeneric SEVEN_ZIP compressed_generic.zip 5
##INCLUDE_RAW_OUTPUT(CompressGeneric7z)
</pre></div>
</li>
<li>To create a Zip archive with 5 items:
<div class="fragment"><pre class="fragment">
C:\Test> java -cp ... CompressGeneric ZIP compressed_generic.zip 5
##INCLUDE_RAW_OUTPUT(CompressGenericZip)
</pre></div>
</li>
<li>To create a Tar archive with 5 items:
<div class="fragment"><pre class="fragment">
C:\Test> java -cp ... CompressGeneric TAR compressed_generic.tar 5
##INCLUDE_RAW_OUTPUT(CompressGenericTar)
</pre></div>
</li>
<li>To create a GZip archive with 1 item:
<div class="fragment"><pre class="fragment">
C:\Test> java -cp ... CompressGeneric GZIP compressed_generic.zip 1
##INCLUDE_RAW_OUTPUT(CompressGenericGZip)
</pre></div>
</li>
<li>To create a BZip2 archive with 1 item:
<div class="fragment"><pre class="fragment">
C:\Test> java -cp ... CompressGeneric BZIP2 compressed_generic.bz2 1
##INCLUDE_RAW_OUTPUT(CompressGenericBZip2)
</pre></div>
</li>
</ul>
<p>
Also a bunch of the compressed_generic.* archives should be created with the corresponding contents.
</p>
<h2><a name="create-encrypted">Creating password protected archives</h2>
<p>
Depending on the archive format it's possible to create and update password protected archives or even encrypt the archive header.
Encrypting archive items secures the content of the item leaving the item meta data (like archive item name, size, ...) unprotected.
The header encryption solves this problem by encrypting the entire archive.
</p>
<p>
To create a password protected archive the create- or update-callback class should additionally implement the ICryptoGetTextPassword interface.
The cryptoGetTextPassword() method of the interface is called to get the password for the encryption. The header encryption can be
activated using IOutFeatureSetEncryptHeader.setHeaderEncryption() method of the outArchive object, if the outArchive object implements the
IOutFeatureSetEncryptHeader interface an though supports the header encryption.
</p>
<p>
The following snippet takes the password as a second argument of the main() method. It uses the password to encrypt archive items.
It also activates the header encryption. It's supported by the selected 7z algorithm.
</p>
##INCLUDE_SNIPPET(CompressWithPassword)
<p>
You can test the snippet like this:
</p>
<div class="fragment"><pre class="fragment">
C:\Test> java -cp ... CompressWithPassword compressed_with_pass.zip my-pass
##INCLUDE_RAW_OUTPUT(CompressWithPassword)
</pre></div>
<br/>
<h1><a name="update">Modifying existing archives</h1>
<p>
7-Zip-JBinding provides API for archive modification. Especially by small changes the modification of an archive
is much faster compared to the extraction and the consequent compression.
The archive modification API (like the compression API) offers archive format specific
and archive format independent variants.
The process of the modification of an existing archive contains following steps:
</p>
<ul>
<li>Open existing archive obtaining an instance of the IInArchive</li>
<li>Call IInArchive.getConnectedOutArchive() to get an instance of the IOutUpdateArchive</li>
<li>Optionally cast it to an archive format specific API interface like IOutUpdateArchiveZip</li>
<li>Call IOutUpdateArchive.updateItems() passing the new count of items and a callback implementing the modification</li>
</ul>
<p>
The following snippets show the modification process in details using archive format independent API.
The archive to be modified is one of the Zip, 7z or Tar archives created by the corresponding compression
snippets on this page. The structure of those archives is specified in the <a href="#archive-test-structure">CompressArchiveStructure</a> class.
</p>
<h2><a name="update-alter-items">Altering existing archive items</h2>
<p>
The first snippet modifies one existing item with index 2 (info.txt, 16 bytes):
</p>
<ul>
<li>It changes the path (name of the item) to the "info2.txt"</li>
<li>It changes the content to the "More Info!" (10 bytes)</li>
</ul>
##INCLUDE_SNIPPET(UpdateAlterItems)
<p>
If you run this program with (on Linux)
</p>
<div class="fragment"><pre class="fragment">
$ java -cp ‹path-to-lib›\sevenzipjbinding.jar; \
‹path-to-lib›\sevenzipjbinding-Windows-x86.jar;. \
UpdateAlterItems <git-root>/testdata/snippets/to-update.7z updated.7z
</pre></div>
<p>
you will get the output
</p>
##INCLUDE_OUTPUT(UpdateAlterItems)
<p>
Now lets look at original and modified archives:
</p>
<div class="fragment"><pre class="fragment">
$ 7z l <git-root>/testdata/snippets/to-update.7z
...
Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
2015-09-14 07:57:09 ..... 38 159 dir1/file1.txt
2015-09-14 07:57:09 ..... 38 dir2/file2.txt
2015-09-14 07:57:09 ..... 16 info.txt
2015-09-14 07:57:09 ..... 100 random-100-bytes.dump
2015-09-14 07:57:09 D.... 0 0 dir2/
------------------- ----- ------------ ------------ ------------------------
192 159 4 files, 1 folders
</pre></div>
<div class="fragment"><pre class="fragment">
$ 7z l updated.7z
Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
2015-09-14 07:57:09 ..... 38 151 dir1/file1.txt
2015-09-14 07:57:09 ..... 38 dir2/file2.txt
2015-09-14 07:57:09 ..... 100 random-100-bytes.dump
2015-09-14 07:57:09 ..... 10 16 info2.txt
2015-09-14 07:57:09 D.... 0 0 dir2/
------------------- ----- ------------ ------------ ------------------------
186 167 4 files, 1 folders
</pre></div>
<p>
As you can see, the file "info.txt" (16 bytes) was replaces with the file "info2.txt" (10 bytes).
</p>
<h2><a name="update-add-remove-items">Adding and removing archive items</h2>
<p>
Now lets see how archive items can be added and removed.
In order to remove an archive item a reindexing is necessary. In the previous snippet for each archive
item the indexes in the old archive and the index in the new archive were the same. But after removing one item all consequent
indexes in the new archive will change and will be less, that corresponding indexes in the old archive. Here is an example of removing
an item C with index 2:
</p>
<pre>
Index: 0 1 2 3 4
Old archive: (A) (B) (C) (D) (E)
New archive: (A) (B) (D) (E)
</pre>
<p>
Here the index of D in the old archive is 3, but in the new archive is 2.
</p>
<p>
In order to add a new item the count of items in archive passed to the IOutArchive.updateItems() method should be increased.
In the callback the new item with the new index (that doesn't map to any old archive items) should be initialized exactly, like
new items get initialized during a compression operation. The next snippet
</p>
<ul>
<li>Removes "info.txt" file</li>
<li>Adds "data.dmp" file with 11 bytes of content</li>
</ul>
##INCLUDE_SNIPPET(UpdateAddRemoveItems)
<p>
If you run this program with (on Linux)
</p>
<div class="fragment"><pre class="fragment">
$ java -cp ‹path-to-lib›\sevenzipjbinding.jar; \
‹path-to-lib›\sevenzipjbinding-Windows-x86.jar;. \
UpdateAddRemoveItems ‹git›/testdata/snippets/to-update.7z updated.7z
</pre></div>
<p>
you will get the output
</p>
##INCLUDE_OUTPUT(UpdateAlterItems)
<p>
Now lets look at original and modified archives:
</p>
<div class="fragment"><pre class="fragment">
$ 7z l <git-root>/testdata/snippets/to-update.7z
...
Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
2015-09-14 07:57:09 ..... 38 159 dir1/file1.txt
2015-09-14 07:57:09 ..... 38 dir2/file2.txt
2015-09-14 07:57:09 ..... 16 info.txt
2015-09-14 07:57:09 ..... 100 random-100-bytes.dump
2015-09-14 07:57:09 D.... 0 0 dir2/
------------------- ----- ------------ ------------ ------------------------
192 159 4 files, 1 folders
</pre></div>
<div class="fragment"><pre class="fragment">
$ 7z l updated.7z
...
Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
2015-09-14 07:57:09 ..... 38 151 dir1/file1.txt
2015-09-14 07:57:09 ..... 38 dir2/file2.txt
2015-09-14 07:57:09 ..... 100 random-100-bytes.dump
2015-09-16 21:43:52 ..... 11 16 data.dmp
2015-09-14 07:57:09 D.... 0 0 dir2/
------------------- ----- ------------ ------------ ------------------------
187 167 4 files, 1 folders
</pre></div>
<p>
As you can see, the file info.txt is gone and the file data.dmp (11 bytes) appears in the archive.
</p>
<h1><a name="trace">Troubleshoot problems using tracing</h1>
<p>
One of the weak sides of the 7-zip compression engine is a rather simple error reporting. If
some provided data doesn't satisfy the compressor it fails without any descriptive error message.
One way to get an clue is to use 7-Zip-JBinding tracing feature. Here is the code passing invalid
data size for the item 1 and though failing.
</p>
##INCLUDE_SNIPPET(CompressWithError)
<p>
If you run this program you will get the following error message printed to the System.err:
</p>
##INCLUDE_OUTPUT(CompressWithErrorErr)
<p>
The error message provides no useful information for finding the bug. But since the snippet enables
tracing by calling IOutArchive.setTrace(true), the trace log get printed to the System.out.
</p>
##INCLUDE_OUTPUT(CompressWithError)
<p>
As you see, the tracing stops just after 7-zip retrieved the size of the data for the item 1. This
suggests, that the value for the size of the data of the item 1 may cause the failure. In this
small example, like in most other cases, this will help to find the problem.
</p>