===
Chronicle Wire supports a separation of describing what data you want to store and retrieve and how it should be rendered/parsed. Wire handles a variety of formatting options for a wide range of formats.
How do these options affect performance?
The tests were run with -Xmx1g -XX:MaxInlineSize=400 on an isolated CPU on an i7-3970X with 32 GB of memory.
These are latency test with the full Results Here
Something I found interesting is that while a typical time might be stable for a given run, you can get different results at different time.
For this reason I ran the tests with 10 forks. By looking at the high percentiles, we tend to pick up the results of the worst run.
Wire Format | Text encoding | Fixed width values? | Numeric Fields? | field-less? | Bytes | 99.9 %tile | 99.99 %tile | 99.999 %tile | worst |
---|---|---|---|---|---|---|---|---|---|
YAML (TextWire) | UTF-8 | false | false | false | 91 | 2.81 | 4.94 | 8.62 | 17.2 |
YAML (TextWire) | 8-bit | false | false | false | 91 | 2.59 | 4.70 | 8.58 | 16.8 |
JSONWire | 8-bit | false | false | false | 100 | 3.11 | 5.56 | 10.62 | 36.9 |
BinaryWire | UTF-8 | false | false | false | 70 | 1.57 | 3.42 | 7.14 | 35.1 |
BinaryWire | UTF-8 | false | true | false | 44 | 0.67 | 2.44 | 5.93 | 12.1 |
BinaryWire | UTF-8 | true | false | false | 84 | 1.51 | 3.32 | 7.22 | 37.4 |
BinaryWire | UTF-8 | true | true | false | 57 | 0.57 | 2.26 | 5.09 | 17.5 |
BinaryWire | UTF-8 | false | na | true | 32 | 0.65 | 2.42 | 5.53 | 8.6 |
RawWire | UTF-8 | true | na | true | 43 | 0.49 | 2.07 | 4.87 | 8.6 |
RawWire | 8-bit | true | na | true | 43 | 0.40 | 0.57 | 2.90 | 7.6 |
BytesMarshallable | 8-bit | true | na | true | 39 | 0.17 | 0.21 | 2.13 | 6.9 |
BytesMarshallable + stop bit encoding | 8-bit | true | na | true | 28 | 0.21 | 0.25 | 2.40 | 6.4 |
All times are in micro-seconds
"8-bit" encoding is ISO-8859-1 where characters 0 to 255 are mapped to bytes 0 to 255. I this case, (bytes) string.charAt(0)
In all cases, a 4 bytes header was used to determine the length of message.
The code to support Wire can be added to the class itself. These marshallers can also be added to another classes, but in general you would only do this if you can't modify the class you wish to serialize.
@Override
public void readMarshallable(WireIn wire) throws IllegalStateException {
wire.read(DataFields.price).float64(setPrice)
.read(DataFields.flag).bool(setFlag)
.read(DataFields.text).text(text)
.read(DataFields.side).asEnum(Side.class, setSide)
.read(DataFields.smallInt).int32(setSmallInt)
.read(DataFields.longInt).int64(setLongInt);
}
@Override
public void writeMarshallable(WireOut wire) {
wire.write(DataFields.price).float64(price)
.write(DataFields.flag).bool(flag)
.write(DataFields.text).text(text)
.write(DataFields.side).asEnum(side)
.write(DataFields.smallInt).int32(smallInt)
.write(DataFields.longInt).int64(longInt);
}
The code to support BytesMarshallable can eb added to the class marshalled. While this is faster than using Wire, there is no option to visualise the data without deserializing it, nor does it directly support schema changes.
@Override
public void readMarshallable(Bytes<?> bytes) {
price = bytes.readDouble();
longInt = bytes.readLong();
smallInt = bytes.readInt();
flag = bytes.readBoolean();
// side = bytes.readEnum(Side.class);
side = bytes.readBoolean() ? Side.Buy : Side.Sell;
bytes.read8bit(text);
}
@Override
public void writeMarshallable(Bytes bytes) {
bytes.writeDouble(price)
.writeLong(longInt)
.writeInt(smallInt)
// .writeEnum(side)
.writeBoolean(flag)
.writeBoolean(side == Side.Buy)
.write8bit(text);
}
There was a small advantage in encoding the side as a boolean rather than an Enum as the later writes the name() as a String.
Wire Format | Text encoding | Fixed width values? | Numeric Fields? | field-less? | Bytes | 99.9 %tile | 99.99 %tile | 99.999 %tile | worst |
---|---|---|---|---|---|---|---|---|---|
SBE | 8-bit | true | true | true | 43 | 0.31 | 0.44 | 4.11 | 9.2 |
Jackson | UTF-8 | false | false | false | 100 | 4.95 | 8.33 | 1,400 | 1,500 |
BSON | UTF-8 | true | false | false | 96 | 19.8 | 1,430 | 1,400 | 1,600 |
Snake YAML | UTF-8 | false | false | false | 89 | 80.3 | 4,067 | 16,000 | 24,000 |
BOON Json | UTF-8 | false | false | false | 100 | 20.7 | 32.5 | 11,000 | 69,000 |
Externalizable | UTF-8 | true | false | false | 197 | 25.0 | 29.7 | 85,000 | 120,000 |
Externalizable with Chronicle Bytes | UTF-8 | true | false | false | 197 | 24.5 | 29.3 | 85,000 | 118,000 |
All times are in micro-seconds
Wire Format | Bytes | 99.9 %tile | 99.99 %tile | 99.999 %tile | worst |
---|---|---|---|---|---|
JSONWire | 100* | 3.11 | 5.56 | 10.62 | 36.9 |
Jackson | 100 | 4.95 | 8.3 | 1,400 | 1,500 |
Jackson + C-Bytes | 100* | 2.87 | 10.1 | 1,300 | 1,400 |
Jackson + C-Bytes Reader/Writer | 100* | 3.06 | 10.3 | 883 | 1,500 |
BSON | 96 | 19.8 | 1,430 | 1,400 | 1,600 |
BSON + C-Bytes | 96* | 7.47 | 15.1 | 1,400 | 11,600 |
BOON Json | 100 | 20.7 | 32.5 | 11,000 | 69,000 |
"C-Bytes" means using a recycled Chronicle Bytes buffer.
Tests with "*" on Bytes mean this has been written to/read from direct memory and won't have a copy overhead when working with NIO such as TCP or Files.
SBE performs as well are BytesMarshallable. Even though it was slower in this test, the difference to too small to draw any conclusions. i.e. in a different use case, a different developer might find the difference reversed.
However, I didn't find SBE simple. Perhaps this is because I didn't use the generation for different languages, but I found it was non-trivial to use and setup.
For a flat class with just six fields it generated 9 classes, we have three support classes, and I ended up adding methods to the generated class to get it to perform as efficiently as I wanted.
This is likely to be a lack of understand on my part, though I might not be alone in this.
Snake YAML is a fully featured YAML 1.1 parser. The library has to do much more work to support all the features of the YAML standard which necessarily takes longer.
However, it takes a lot longer which means it may not be suitable if you need performance but you don't need a complete YAML parser.
If you need to decode YAML which could come from any sources, Snake YAML is a better choice.
Here some selected examples. The UTF-8 encoded and 8-bit encoded tests look the same in these cases as there isn't any characters >= 128.
The main advantage of YAML based wire format is it is easier to implement with, document and debug.
What you want is the ease of a text wire format but the speed of a binary wire format.
Being able to switch from one to the other can save you a lot of time in development and support, but still give you the speed you want.
This uses 91 bytes, Note: the "--- !!data" is added by the method to dump the data. This information is encoded in the first 4 bytes which contains the size.
--- !!data
price: 1234
flag: true
text: Hello World!
side: Sell
smallInt: 123
longInt: 1234567890
This wire produces a JSON style output. It has some YAML based extensions for typed data.
Test json8bit used 100 bytes.
--- !!data
"price":1234,"longInt":1234567890,"smallInt":123,"flag":true,"text":"Hello World!","side":"Sell"
This binary wire has be automatically decoded to text by Wires.fromSizePrefixedBlobs(Bytes)
This uses 70 bytes. Note: the "--- !!data #binary" is added by the method to dump the data. This information is encoded in the first 4 bytes which contains the size.
--- !!data #binary
price: 1234
flag: true
text: Hello World!
side: Sell
smallInt: 123
longInt: 1234567890
Fixed width fields support binding to values later and updating them atomically.
Note: if you want to bind to specific values, there is support for this which will also ensure the values are aligned.
This format uses 84 bytes
--- !!data #binary
price: 1234
flag: true
text: Hello World!
side: Sell
smallInt: 123
longInt: 1234567890
Numbered fields are more efficient to write and read, but are not as friendly to work with.
Test bwireFTF used 44 bytes.
--- !!data #binary
3: 1234
4: true
5: Hello World!
6: Sell
1: 123
2: 1234567890
Test bwireTTF used 58 bytes.
--- !!data #binary
3: 1234
4: true
5: Hello World!
6: Sell
1: 123
2: 1234567890
Test bwireFTT used 32 bytes.
--- !!data #binary
1234
true
Hello World!
Sell
123
1234567890
Raw wire format drops all meta data. It must be fixed width as there is no way to use compact types.
Test rwireUTF and rwire8bit used 43 bytes.
00000000 27 00 00 00 00 00 00 00 00 48 93 40 B1 0C 48 65 '······· ·H·@··He
00000010 6C 6C 6F 20 57 6F 72 6C 64 21 04 53 65 6C 6C 7B llo Worl d!·Sell{
00000020 00 00 00 D2 02 96 49 00 00 00 00 ······I· ···
The BytesMarshallable uses fixed width data types.
Test bytesMarshallable used 39 bytes.
00000000 23 00 00 00 00 00 00 00 00 48 93 40 D2 02 96 49 #······· ·H·@···I
00000010 00 00 00 00 7B 00 00 00 59 00 0C 48 65 6C 6C 6F ····{··· Y··Hello
00000020 20 57 6F 72 6C 64 21 World!
This example used stop bit encoding to reduce the size of the message.
Test bytesMarshallable used 28 bytes.
00000000 18 00 00 00 A0 A4 69 D2 85 D8 CC 04 7B 59 00 0C ······i· ····{Y··
00000010 48 65 6C 6C 6F 20 57 6F 72 6C 64 21 Hello Wo rld!
Snake YAML used the .0 on the end of the price to signify that it was a double.
This added two characters but is an elegant way of encoding that it should be a double.
flag: true
longInt: 1234567890
price: 1234.0
side: Sell
smallInt: 123
text: Hello World!
Test boon used 100 chars.
{"smallInt":123,"longInt":1234567890,"price":1234.0,"flag":true,"side":"Sell","text":"Hello World!"}
Test jackson used 100 chars.
{"price":1234.0,"flag":true,"text":"Hello World!","side":"Sell","smallInt ":123,"longInt":1234567890}
Test bson used 96 chars.
00000000 60 00 00 00 01 70 72 69 63 65 00 00 00 00 00 00 `····pri ce······
00000010 48 93 40 08 66 6C 61 67 00 01 02 74 65 78 74 00 H·@·flag ···text·
00000020 0D 00 00 00 48 65 6C 6C 6F 20 57 6F 72 6C 64 21 ····Hell o World!
00000030 00 02 73 69 64 65 00 05 00 00 00 53 65 6C 6C 00 ··side·· ···Sell·
00000040 10 73 6D 61 6C 6C 49 6E 74 00 7B 00 00 00 12 6C ·smallIn t·{····l
00000050 6F 6E 67 49 6E 74 00 D2 02 96 49 00 00 00 00 00 ongInt·· ··I·····
SBE has a method to extract the binary as text. It is likely this data structure could be optimised and made much shorter.
Test sbe used 43 chars.
00000000 29 00 7B 00 00 00 D2 02 96 49 00 00 00 00 00 00 )·{····· ·I······
00000010 00 00 00 48 93 40 01 0C 48 65 6C 6C 6F 20 57 6F ···H·@·· Hello Wo
00000020 72 6C 64 21 00 00 00 00 01 00 00 rld!···· ···
While Externalizable is more efficient than Serializable, it is still a heavy weight serialization.
Where Java Serialization does well is in serializing Object Graphs instead of Object Tree, i.e. objects with circular references.
Test externalizable used 293 chars.
00000000 AC ED 00 05 73 72 00 2A 6E 65 74 2E 6F 70 65 6E ····sr·* net.open
00000010 68 66 74 2E 63 68 72 6F 6E 69 63 6C 65 2E 77 69 hft.chro nicle.wi
00000020 72 65 2E 62 65 6E 63 68 6D 61 72 6B 73 2E 44 61 re.bench marks.Da
00000030 74 61 FB 5E C8 1F BA EB 33 6F 0C 00 00 78 70 77 ta·^···· 3o···xpw
00000040 15 40 93 48 00 00 00 00 00 00 00 00 00 49 96 02 ·@·H···· ·····I··
00000050 D2 00 00 00 7B 01 7E 72 00 2A 6E 65 74 2E 6F 70 ····{·~r ·*net.op
00000060 65 6E 68 66 74 2E 63 68 72 6F 6E 69 63 6C 65 2E enhft.ch ronicle.
00000070 77 69 72 65 2E 62 65 6E 63 68 6D 61 72 6B 73 2E wire.ben chmarks.
00000080 53 69 64 65 00 00 00 00 00 00 00 00 12 00 00 78 Side···· ·······x
00000090 72 00 0E 6A 61 76 61 2E 6C 61 6E 67 2E 45 6E 75 r··java. lang.Enu
000000a0 6D 00 00 00 00 00 00 00 00 12 00 00 78 70 74 00 m······· ····xpt·
000000b0 04 53 65 6C 6C 74 00 0C 48 65 6C 6C 6F 20 57 6F ·Sellt·· Hello Wo
000000c0 72 6C 64 21 78 60 00 00 00 01 70 72 69 63 65 00 rld!x`·· ··price·
000000d0 00 00 00 00 00 48 93 40 08 66 6C 61 67 00 01 02 ·····H·@ ·flag···
000000e0 74 65 78 74 00 0D 00 00 00 48 65 6C 6C 6F 20 57 text···· ·Hello W
000000f0 6F 72 6C 64 21 00 02 73 69 64 65 00 05 00 00 00 orld!··s ide·····
00000100 53 65 6C 6C 00 10 73 6D 61 6C 6C 49 6E 74 00 7B Sell··sm allInt·{
00000110 00 00 00 12 6C 6F 6E 67 49 6E 74 00 D2 02 96 49 ····long Int····I
00000120 00 00 00 00 00 ·····