forked from mozilla/gecko-dev
Bug 715113: Update Snappy to r56. r=bent
Showing 9 changed files with 221 additions and 55 deletions.
@@ -36,3 +36,6 @@ _OPT\.OBJ/

# Java HTML5 parser classes
^parser/html/java/(html|java)parser/

# SVN directories
\.svn/
@@ -0,0 +1,124 @@
Snappy framing format description
Last revised: 2011-12-15

This document describes a framing format for Snappy, allowing compressing to
files or streams that can then more easily be decompressed without having
to hold the entire stream in memory. It also provides data checksums to
help verify integrity. It does not provide metadata checksums, so it does
not protect against e.g. all forms of truncation.

Implementation of the framing format is optional for Snappy compressors and
decompressors; it is not part of the Snappy core specification.


1. General structure

The file consists solely of chunks, lying back-to-back with no padding
in between. Each chunk consists first of a single byte of chunk identifier,
then a two-byte little-endian length of the chunk in bytes (from 0 to 65535,
inclusive), and then the data if any. The three bytes of chunk header are not
counted in the data length.

The different chunk types are listed below. The first chunk must always
be the stream identifier chunk (see section 4.1, below). The stream
ends when the file ends -- there is no explicit end-of-file marker.
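
As an illustration of the header layout described above, here is a minimal
sketch in C of decoding one three-byte chunk header. The names chunk_header
and parse_chunk_header are hypothetical and do not come from the Snappy
sources; the sketch assumes the caller has already read the bytes into a
buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: one decoded three-byte chunk header. */
typedef struct {
    uint8_t  type;    /* chunk identifier byte */
    uint16_t length;  /* number of data bytes following the header */
} chunk_header;

/* Decodes the header stored at buf[0..2].
 * Returns 0 on success, or -1 if fewer than three bytes are available. */
int parse_chunk_header(const uint8_t *buf, size_t avail, chunk_header *out) {
    if (avail < 3) {
        return -1;
    }
    out->type = buf[0];
    /* Two-byte little-endian length: low byte first, then high byte. */
    out->length = (uint16_t)(buf[1] | ((uint16_t)buf[2] << 8));
    return 0;
}
```

A reader would call this repeatedly, advancing by 3 + length bytes after each
chunk, until the input is exhausted.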


2. File type identification

The following identifiers for this format are recommended where appropriate.
However, note that none have been registered officially, so this is only to
be taken as a guideline. We use "Snappy framed" to distinguish between this
format and raw Snappy data.

  File extension:         .sz
  MIME type:              application/x-snappy-framed
  HTTP Content-Encoding:  x-snappy-framed


3. Checksum format

Some chunks have data protected by a checksum (the ones that do will say so
explicitly). The checksums are always masked CRC-32Cs.

A description of CRC-32C can be found in RFC 3720, section 12.1, with
examples in section B.4.
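
For readers without a CRC-32C routine at hand, the following is a minimal,
unoptimized bitwise sketch in C. It is not taken from the Snappy sources; it
assumes the usual reflected form of the CRC-32C (Castagnoli) polynomial,
0x82f63b78, with an all-ones initial value and a final inversion. Real
implementations typically use lookup tables or hardware instructions instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C over len bytes of data (slow but simple). */
uint32_t crc32c(const uint8_t *data, size_t len) {
    uint32_t crc = 0xffffffffu;  /* initial value: all ones */
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* If the low bit is set, shift and XOR in the reflected
             * polynomial; otherwise just shift. */
            crc = (crc >> 1) ^ (0x82f63b78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;  /* final inversion */
}
```

The value returned here is the unmasked CRC-32C; it still has to be masked
before being stored in a chunk.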

Checksums are not stored directly, but masked, as checksumming data and
then its own checksum can be problematic. The masking is the same as used
in Apache Hadoop: Rotate the checksum right by 15 bits, then add the constant
0xa282ead8 (using wraparound as normal for unsigned integers). This is
equivalent to the following C code:

  uint32_t mask_checksum(uint32_t x) {
    return ((x >> 15) | (x << 17)) + 0xa282ead8;
  }

Note that the masking is reversible.

The checksum is always stored as a four-byte integer, in little-endian order.
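
Since the masking is reversible and the stored form is little-endian, a
decoder could recover the original CRC as sketched below. The names
unmask_checksum, store_checksum_le and load_checksum_le are hypothetical
helpers introduced for illustration; only mask_checksum appears in the
specification text.

```c
#include <stdint.h>

/* Masking as given in the specification. */
uint32_t mask_checksum(uint32_t x) {
    return ((x >> 15) | (x << 17)) + 0xa282ead8u;
}

/* Inverse of mask_checksum: subtract the constant (with unsigned
 * wraparound), then rotate left by 15 bits. */
uint32_t unmask_checksum(uint32_t masked) {
    uint32_t rot = masked - 0xa282ead8u;
    return (rot << 15) | (rot >> 17);
}

/* Writes a masked checksum as four little-endian bytes. */
void store_checksum_le(uint32_t masked, uint8_t out[4]) {
    out[0] = (uint8_t)(masked & 0xffu);
    out[1] = (uint8_t)((masked >> 8) & 0xffu);
    out[2] = (uint8_t)((masked >> 16) & 0xffu);
    out[3] = (uint8_t)((masked >> 24) & 0xffu);
}

/* Reads four little-endian bytes back into a masked checksum. */
uint32_t load_checksum_le(const uint8_t in[4]) {
    return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}
```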


4. Chunk types

The currently supported chunk types are described below. The list may
be extended in the future.


4.1. Stream identifier (chunk type 0xff)

The stream identifier is always the first element in the stream.
It is exactly six bytes long and contains "sNaPpY" in ASCII. This means that
a valid Snappy framed stream always starts with the bytes

  0xff 0x06 0x00 0x73 0x4e 0x61 0x50 0x70 0x59

The stream identifier chunk can come multiple times in the stream besides
the first; if such a chunk shows up, it should simply be ignored, assuming
it has the right length and contents. This allows for easy concatenation of
compressed files without the need for re-framing.
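
A reader might check for the nine leading bytes as in the following sketch.
The names kStreamHeader and starts_with_stream_identifier are hypothetical and
chosen only for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Chunk type 0xff, length 6 (little-endian), then "sNaPpY" in ASCII. */
static const uint8_t kStreamHeader[9] = {
    0xff, 0x06, 0x00, 0x73, 0x4e, 0x61, 0x50, 0x70, 0x59
};

/* Returns 1 if buf begins with a stream identifier chunk, 0 otherwise. */
int starts_with_stream_identifier(const uint8_t *buf, size_t len) {
    return len >= sizeof kStreamHeader &&
           memcmp(buf, kStreamHeader, sizeof kStreamHeader) == 0;
}
```

The same check can be applied to any later 0xff chunk before ignoring it, so
that a chunk with the wrong length or contents is not silently accepted.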


4.2. Compressed data (chunk type 0x00)

Compressed data chunks contain a normal Snappy compressed bitstream;
see the compressed format specification. The compressed data is preceded by
the masked CRC-32C (see section 3) of the _uncompressed_ data.

Note that the data portion of the chunk, i.e., the compressed contents,
can be at most 65531 bytes (2^16 - 1 = 65535, minus the four checksum bytes).
However, we place an additional restriction that the uncompressed data
in a chunk must be no longer than 32768 bytes. This allows consumers to
easily use small fixed-size buffers.
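
The 32768-byte restriction can be checked before decompressing anything. The
sketch below relies on one assumption from outside this document: that a raw
Snappy bitstream begins with the uncompressed length encoded as a
little-endian varint, as the compressed format specification describes. The
function name check_compressed_chunk and the macro name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_UNCOMPRESSED_PER_CHUNK 32768u

/* chunk points at the chunk data (4 checksum bytes, then the compressed
 * bitstream); chunk_len is the length taken from the chunk header.
 * Returns 0 and stores the declared uncompressed length on success, or
 * -1 if the chunk is malformed or declares more than 32768 bytes. */
int check_compressed_chunk(const uint8_t *chunk, size_t chunk_len,
                           uint32_t *uncompressed_len) {
    /* Need the checksum plus at least one preamble byte. */
    if (chunk_len < 5 || chunk_len > 65535) {
        return -1;
    }
    const uint8_t *payload = chunk + 4;
    size_t payload_len = chunk_len - 4;
    uint32_t value = 0;
    /* Assumed preamble: varint with 7 data bits per byte, low bits first. */
    for (size_t i = 0; i < payload_len && i < 5; i++) {
        uint32_t b = payload[i];
        if (i == 4 && (b & 0x70u) != 0) {
            return -1;  /* value would not fit in 32 bits */
        }
        value |= (b & 0x7fu) << (7 * i);
        if ((b & 0x80u) == 0) {  /* high bit clear: last varint byte */
            if (value > MAX_UNCOMPRESSED_PER_CHUNK) {
                return -1;
            }
            *uncompressed_len = value;
            return 0;
        }
    }
    return -1;  /* varint was truncated or too long */
}
```

A decoder would still need to verify the checksum against the decompressed
output afterwards; this check only guards buffer sizing.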


4.3. Uncompressed data (chunk type 0x01)

Uncompressed data chunks allow a compressor to send uncompressed,
raw data; this is useful if, for instance, incompressible or
near-incompressible data is detected, and faster decompression is desired.

As in the compressed chunks, the data is preceded by its own masked
CRC-32C (see section 3).

An uncompressed data chunk, like compressed data chunks, should contain
no more than 32768 data bytes, so the maximum legal chunk length with the
checksum is 32772.
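
Putting the length limit and the checksum rule together, an uncompressed
chunk could be validated roughly as follows. This is a sketch under stated
assumptions, not code from the Snappy sources: verify_uncompressed_chunk is a
hypothetical name, and crc32c is a simple bitwise routine using the reflected
Castagnoli polynomial 0x82f63b78.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (slow but simple). */
uint32_t crc32c(const uint8_t *data, size_t len) {
    uint32_t crc = 0xffffffffu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0x82f63b78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Masking as given in section 3. */
uint32_t mask_checksum(uint32_t x) {
    return ((x >> 15) | (x << 17)) + 0xa282ead8u;
}

/* chunk points at the chunk data (4 checksum bytes, then raw data);
 * chunk_len is the length from the chunk header.
 * Returns 0 if the chunk is well-formed and the checksum matches,
 * -1 otherwise. */
int verify_uncompressed_chunk(const uint8_t *chunk, size_t chunk_len) {
    /* At least the checksum, at most 32768 data bytes plus checksum. */
    if (chunk_len < 4 || chunk_len > 32772) {
        return -1;
    }
    uint32_t stored = (uint32_t)chunk[0] | ((uint32_t)chunk[1] << 8) |
                      ((uint32_t)chunk[2] << 16) | ((uint32_t)chunk[3] << 24);
    uint32_t actual = mask_checksum(crc32c(chunk + 4, chunk_len - 4));
    return stored == actual ? 0 : -1;
}
```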


4.4. Reserved unskippable chunks (chunk types 0x02-0x7f)

These are reserved for future expansion. A decoder that sees such a chunk
should immediately return an error, as it must assume it cannot decode the
stream correctly.

Future versions of this specification may define meanings for these chunks.


4.5. Reserved skippable chunks (chunk types 0x80-0xfe)

These are also reserved for future expansion, but unlike the chunks
described in 4.4, a decoder seeing these must skip them and continue
decoding.

Future versions of this specification may define meanings for these chunks.
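
The chunk type ranges from sections 4.1 through 4.5 can be summarized as a
single dispatch function. The enum chunk_action and the function
classify_chunk are hypothetical names used only for this sketch.

```c
#include <stdint.h>

/* What a decoder should do with a chunk, based on its type byte. */
typedef enum {
    CHUNK_COMPRESSED,    /* 0x00: Snappy-compressed data */
    CHUNK_UNCOMPRESSED,  /* 0x01: raw data */
    CHUNK_STREAM_ID,     /* 0xff: stream identifier */
    CHUNK_SKIP,          /* 0x80-0xfe: reserved, skippable */
    CHUNK_ERROR          /* 0x02-0x7f: reserved, unskippable */
} chunk_action;

chunk_action classify_chunk(uint8_t type) {
    if (type == 0x00) return CHUNK_COMPRESSED;
    if (type == 0x01) return CHUNK_UNCOMPRESSED;
    if (type == 0xff) return CHUNK_STREAM_ID;
    if (type >= 0x80) return CHUNK_SKIP;  /* 0xff was handled above */
    return CHUNK_ERROR;                   /* remaining range: 0x02-0x7f */
}
```

For CHUNK_SKIP a decoder advances past the chunk's data using the length from
the header; for CHUNK_ERROR it stops and reports a failure.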