Skip to content

A Java Serialisation Library that supports multiple formats

License

Notifications You must be signed in to change notification settings

low-latency/Chronicle-Wire

 
 

Repository files navigation

Chronicle-Wire

Version

badge

About

Chronicle Wire is a a Wire Format abstraction library, The purpose of Chronicle Wire is to address the following concerns in a consistent manner:

  • Application configuration. (Using YAML)

  • Data serialization (YAML, binary YAML, JSON, Raw binary data, CSV)

  • Accessing off-heap memory in a thread-safe manner. (Bind to shared off-heap memory)

  • High performance data exchange using binary formats. Only include as much meta data as you need.

Design

Chronicle Wire uses Chronicle Bytes for bytes manipulation, and Chronicle Core for low level JVM access.

Why are these concerns conflated?

Often you want to use these interchangeably.

  • Configuration includes aliased type information. This supports easy extension by adding new classes/versions, and cross-platform using type aliasing.

  • By supporting types, a configuration file can bootstrap itself. You control how the configuration file is decoded. See engine.yaml.

  • To send the configuration of a server to a client, or from a client to a server.

  • To store the configuration of a data store in its header.

  • In configuration, to be able to create any object or component.

  • Save a configuration after you have changed it.

  • To be able to share data in memory between processes in a thread-safe manner.

Features

Chronicle Wire supports a separation of describing what data you want to store and retrieve, and how it should be rendered/parsed. Chronicle Wire handles a variety of formatting options, for a wide range of formats.

A key aim of Chronicle Wire is to support schema changes. It should make reasonable attempts to handle:

  • optional fields

  • fields in a different order

  • fields that the consumer doesn’t expect; optionally parsing them, or ignoring them

  • more or less data than expected; in field-less formats

  • reading a different type to the one written

  • updating fixed-length fields, automatically where possible using a bound data structure.

Chronicle Wire will also be efficient where any, or all, of the following points are true:

  • fields are in the order expected

  • fields are the type expected

  • fields names/numbers are not used

  • self-describing types are not needed

  • random access of data values is supported.

Chronicle Wire is designed to make it easy to convert from one wire format to another. For example, you can use fixed-width binary data in memory for performance, and variable-width or text over the network. Different TCP connections could use different formats.

Chronicle Wire also supports hybrid wire formats. For example, you can have one format embedded in another.

Support

This library will require Java 8. Support for C++ and C# is planned.

Text Formats

The text formats include:

  • YAML - subset of mapping structures included

  • JSON - superset to support serialization

  • CSV - superset to support serialization

  • XML - planned

  • FIX - proposed

Options include:

  • field names (for example, JSON), or field numbers (for example, FIX)

  • optional fields with default values that can be dropped

  • zero-copy access to fields - planned

  • thread-safe operations in text - planned

To support wire format discovery, the first byte should be in the ASCII range; adding an ASCII whitespace if needed.

Binary Formats

The binary formats include:

  • binary YAML.

  • delta compressing Binary YAML. This is a Chronicle Wire Enterprise feature

  • typed data without fields

  • raw untyped fieldless data

  • BSON (Binary JSON) - planned

Options for Binary format:

  • field names or field numbers

  • variable width

  • optional fields with a default value can be dropped

  • fixed width data with zero copy support

  • thread-safe operations

Note: Chronicle Wire supports debug/transparent combinations like self-describing data with zero copy support.

To support wire format discovery, the first bytes should have the top bit set.

Using Wire

Simple use case.

First you need to have a buffer to write to. This can be a byte[], a ByteBuffer, off-heap memory, or even an address and length that you have obtained from some other library.

// Bytes which wraps a ByteBuffer which is resized as needed.
Bytes<ByteBuffer> bytes = Bytes.elasticByteBuffer();

Now you can choose which format you are using. As the wire formats are themselves unbuffered, you can use them with the same buffer, but in general using one wire format is easier.

Wire wire = new TextWire(bytes);
// or
WireType wireType = WireType.TEXT;
Wire wireB = wireType.apply(bytes);
// or
Bytes<ByteBuffer> bytes2 = Bytes.elasticByteBuffer();
Wire wire2 = new BinaryWire(bytes2);
// or
Bytes<ByteBuffer> bytes3 = Bytes.elasticByteBuffer();
Wire wire3 = new RawWire(bytes3);

So now you can write to the wire with a simple document.

wire.write(() -> "message").text("Hello World")
      .write(() -> "number").int64(1234567890L)
       .write(() -> "code").asEnum(TimeUnit.SECONDS)
      .write(() -> "price").float64(10.50);
System.out.println(bytes);

prints

message: Hello World
number: 1234567890
code: SECONDS
price: 10.5
// the same code as for text wire
wire2.write(() -> "message").text("Hello World")
        .write(() -> "number").int64(1234567890L)
        .write(() -> "code").asEnum(TimeUnit.SECONDS)
        .write(() -> "price").float64(10.50);
        System.out.println(bytes2.toHexString());

prints

00000000 C7 6D 65 73 73 61 67 65  EB 48 65 6C 6C 6F 20 57 ·message ·Hello W
00000010 6F 72 6C 64 C6 6E 75 6D  62 65 72 A3 D2 02 96 49 orld·num ber····I
00000020 C4 63 6F 64 65 E7 53 45  43 4F 4E 44 53 C5 70 72 ·code·SE CONDS·pr
00000030 69 63 65 90 00 00 28 41                          ice···(A

Using RawWire strips away all the meta data to reduce the size of the message, and improve speed. The down-side is that we cannot easily see what the message contains.

        // the same code as for text wire
        wire3.write(() -> "message").text("Hello World")
                .write(() -> "number").int64(1234567890L)
                .write(() -> "code").asEnum(TimeUnit.SECONDS)
                .write(() -> "price").float64(10.50);
        System.out.println(bytes3.toHexString());

prints in RawWire.

00000000 0B 48 65 6C 6C 6F 20 57  6F 72 6C 64 D2 02 96 49 ·Hello W orld···I
00000010 00 00 00 00 07 53 45 43  4F 4E 44 53 00 00 00 00 ·····SEC ONDS····
00000020 00 00 25 40                                      ··%@

For more examples see Examples Chapter1

A note on Wires.reset()

Chronicle Wire allows (and encourages) objects to be re-used in order to reduce allocation rates.

When a marshallable object is re-used or initialised by the framework, it is first reset by way of Wires.reset(). In the case of most DTOs with simple scalar values, this will not cause any issues. However, more complicated objects with object instance fields may experience undesired behaviour.

In order to reset a marshallable object, the process is as follows:

  1. create a new instance of the object to be reset

  2. copy all fields from the new instance to the existing instance

  3. the existing instance is now considered 'reset' back to default values

The object created in step 1 is cached for performance reasons, meaning that both the new and existing instance of the marshallable object could have a reference to the same object.

While this will not be a problem for primitive or immutable values (for example, int, Long, String), a mutable field such as ByteBuffer will cause problems. Consider the following case:

private static final class BufferContainer {
    private ByteBuffer b = ByteBuffer.allocate(16);
}

@Test
public void shouldDemonstrateMutableFieldIssue2() {
    // create 2 instances of a marshallable POJO
    final BufferContainer c1 = new BufferContainer();
    final BufferContainer c2 = new BufferContainer();
    // reset both instances - this will set each container's
    // b field to a 'default' value
    Wires.reset(c1);
    Wires.reset(c2);
    // write to the buffer in c1
    c1.b.putInt(42);
    // inspect the buffer in both c1 and c2
    System.out.println(c1.b.position());
    System.out.println(c2.b.position());
    System.out.println(c1.b == c2.b);
}

The output of the test above is:

4
4
true

showing that the field b of each container object is now referencing the same ByteBuffer instance.

In order to work around this, if necessary, the marshallable class should implement ResetOverride:

private static final class BufferContainer implements ResetOverride {
    private ByteBuffer b = ByteBuffer.allocate(16);

    @Override
    public void onReset() {
        // or acquire from a pool if allocation should
        // be kept to a minimum
        b = ByteBuffer.allocate(16);
    }
}

Binding to a field value

While serialized data can be updated by replacing a whole record, this might not be the most efficient option, nor thread-safe.

Chronicle Wire offers the ability to bind a reference to a fixed value of a field, and perform atomic operations on that field; for example, volatile read/write, and compare-and-swap.

   // field to cache the location and object used to reference a field.
   private LongValue counter = null;

   // find the field and bind an approritae wrapper for the wire format.
   wire.read(COUNTER).int64(counter, x -> counter = x);

   // thread safe across processes on the same machine.
   long id = counter.getAndAdd(1);

Other types are supported; for example, 32-bit integer values, and an array of 64-bit integer values.

Compression Options

  • No compression

  • Snappy compression - planned

  • LZW compression - planned

Bytes options

Chronicle Wire is built on top of the Bytes library, however Bytes, in turn, can wrap:

  • ByteBuffer - heap and direct

  • byte\[\] - using ByteBuffer

  • raw memory addresses.

Handling instance classes of an unknown type

This feature allows Chronicle Wire to de-serialize, manipulate, and serialize an instance class of an unknown type.

If the type is unknown at runtime, a proxy is created; assuming that the required type is an interface.

When the tuple is serialized, it will be give the same type as when it was deserialized, even if that class is not available.

Methods following our getter/setter convention will be treated as getters and setters.

This feature is needed for a service that stores and passes on data, for classes it might not have in its class path.

Note
This is not garbage collection free, but if the volume is low, this may be easier to work with.
Note
This only works when the expected type is not a class.

Example

@Test
public void unknownType() throws NoSuchFieldException {
    Marshallable marshallable = Wires.tupleFor(Marshallable.class, "UnknownType");
    marshallable.setField("one", 1);
    marshallable.setField("two", 2.2);
    marshallable.setField("three", "three");
    String toString = marshallable.toString();
    assertEquals("!UnknownType {\n" +
            "  one: !int 1,\n" +
            "  two: 2.2,\n" +
            "  three: three\n" +
            "}\n", toString);
    Object o = Marshallable.fromString(toString);
    assertEquals(toString, o.toString());
}

@Test
public void unknownType2() {
    String text = "!FourValues {\n" +
            "  string: Hello,\n" +
            "  num: 123,\n" +
            "  big: 1e6,\n" +
            "  also: extra\n" +
            "}\n";
    ThreeValues tv = Marshallable.fromString(ThreeValues.class, text);
    assertEquals(text, tv.toString());
    assertEquals("Hello", tv.string());
    tv.string("Hello World");
    assertEquals("Hello World", tv.string());

    assertEquals(123, tv.num());
    tv.num(1234);
    assertEquals(1234, tv.num());

    assertEquals(1e6, tv.big(), 0.0);
    tv.big(0.128);
    assertEquals(0.128, tv.big(), 0.0);

    assertEquals("!FourValues {\n" +
            "  string: Hello World,\n" +
            "  num: !int 1234,\n" +
            "  big: 0.128,\n" +
            "  also: extra\n" +
            "}\n", tv.toString());

}

interface ThreeValues {
    ThreeValues string(String s);
    String string();

    ThreeValues num(int n);
    int num();

    ThreeValues big(double d);
    double big();
}

Example with MethodReaders

@Test
public void testUnknownClass() {
    Wire wire2 = new TextWire(Bytes.elasticHeapByteBuffer(256));
    MRTListener writer2 = wire2.methodWriter(MRTListener.class);

    String text = "top: !UnknownClass {\n" +
            "  one: 1,\n" +
            "  two: 2.2,\n" +
            "  three: words\n" +
            "}\n" +
            "---\n" +
            "top: {\n" +
            "  one: 11,\n" +
            "  two: 22.2,\n" +
            "  three: many words\n" +
            "}\n" +
            "---\n";
    Wire wire = TextWire.from(text);
    MethodReader reader = wire.methodReader(writer2);
    assertTrue(reader.readOne());
    assertTrue(reader.readOne());
    assertFalse(reader.readOne());
    assertEquals(text, wire2.toString());
}

Filtering with MethodReaders

To support filtering, you need to make sure the first of multiple arguments can be used to filter the method call. If you have only one argument, you may need to add an additional argument to support efficient filtering.

This feature calls an implementation of MethodFilterOnFirstArg to see if the rest of the method call should be parsed. For example, today you have:

interface MyInterface {
    void method(ExpensiveDto dto);
}

This can be migrated to:

interface MyInterface extends MethodFilterOnFirstArg<String> {
    @Deprecated
    void method(ExpensiveDto dto);
    void method2(String filter, ExpensiveDto dto);
}

where the implementation can look like this:

class MyInterfaceImpl extends MyInterface {
    public void method(ExpensiveDto dto) {
       // something
    }

    public void method2(String filter, ExpensiveDto dto) {
        method(dto);
    }

    public boolean ignoreMethodBasedOnFirstArg(String methodName, String filter) {
        return someConditionOn(methodName, filter);
    }
}

For an example, see net.openhft.chronicle.wire.MethodFilterOnFirstArgTest

Uses of Chronicle Wire

Chronicle Wire can be used for:

  • file headers

  • TCP connection headers; where the optimal wire format taht is actually used can be negotiated

  • message/excerpt contents

  • Chronicle Queue version 4.x and later

  • the API for marshalling generated data types.

Similar projects

Simple Binary Encoding (SBE)

Simple Binary Encoding (SBE) is designed to be a more efficient replacement for FIX. It is not limited to FIX protocols, and can be easily extended by updating an XML schema. It is simple, binary, and it supports C++ and Java.

XML, when it first started, did not use XML for its own schema files, and it is not insignificant that SBE does not use SBE for its schema either. This is because it is not trying to be human readable. It has XML which, though standard, is not designed to be human readable either. Chronicle believes that it is a limitation that it does not naturally lend itself to a human readable form.

The encoding that SBE uses is similar to binary; with field numbers and fixed-width types.

SBE assumes the field types, which can be more compact than Chronicle Wire’s most similar option; though not as compact as others.

SBE has support for schema changes provided that the type of a field doesn’t change.

Message Pack (msgpack)

Message Pack is a packed binary wire format which also supports JSON for human readability and compatibility. It has many similarities to the binary (and JSON) formats of this library. Chronicle Wire is designed to be human readable first, based on YAML, and has a range of options to make it more efficient. The most extreme being fixed position binary.

Message Pack has support for embedded binary, whereas Chronicle Wire has support for comments and hints, to improve rendering for human consumption.

The documentation looks well thought out, and it is worth emulating.

Comparison with Cap’n’Proto

Feature

Wire Text

Wire Binary

Protobuf

Cap’n Proto

SBE

FlatBuffers

Schema evolution

yes

yes

yes

yes

caveats

yes

Zero-copy

yes

yes

no

yes

yes

yes

Random-access reads

yes

yes

no

yes

no

yes

Random-access writes

yes

yes

no

?

no

?

Safe against malicious input

yes

yes

yes

yes

yes

opt-in / upfront

Reflection / generic algorithms

yes

yes

yes

yes

yes

yes

Initialization order

any

any

any

any

preorder

bottom-up

Unknown field retention

yes

yes

yes

yes

no

no

Object-capability RPC system

yes

yes

no

yes

no

no

Schema language

no

no

custom

custom

XML

custom

Usable as mutable state

yes

yes

yes

no

no

no

Padding takes space on wire?

optional

optional

no

optional

yes

yes

Unset fields take space on wire?

optional

optional

no

yes

yes

no

Pointers take space on wire?

no

no

no

yes

no

yes

C++

planned

planned

yes

yes (C++11)*

yes

yes

Java

Java 8

Java 8

yes

yes*

yes

yes

C#

yes

yes

yes

yes*

yes

yes*

Go

no

no

yes

yes

no

yes*

Other languages

no

no

6+

others*

no

no

Authors' preferred use case

distributed computing

financial / trading

distributed computing

platforms / sandboxing

financial / trading

games

Note
The Binary YAML format can be automatically converted to YAML without any knowledge of the schema, because the messages are self-describing.
Note
You can parse all the expected fields (if any) and then parse any remaining fields. As YAML supports object field names (or keys), these could be strings or even objects as keys and values.
Note
It not clear what padding which does not take up space on the wire means.

Design notes.

Schema evolution.

Wire optionally supports:

  • field name changes

  • field order changes

  • capturing or ignoring unexpected fields

  • setting of fields to the default, if not available

  • raw messages can be longer or shorter than expected

The more flexibility, the larger the overhead in terms of CPU and memory. Chronicle Wire allows you to dynamically pick the optimal configuration, and convert between these options.

Zero copy.

Chronicle Wire supports zero-copy random access to fields, and direct-copy from in-memory to the network. It also supports translation from one wire format to another. For example, switching between fixed-length data and variable-length data.

Random Access.

You can access a random field in memory, For example, in a 2TB file, page-in/pull-into CPU cache, only the data relating to your read or write.

format access style

fixed-length binary

random access without parsing first

variable-length binary

random access with partial parsing allowing you to skip large portions

fixed-length text

random access with parsing

variable-length text

no random access

Chronicle Wire references are relative to the start of the data contained, to allow loading in an arbitrary point in memory.

Safe against malicious input.

Chronicle Wire has built in tiers of bounds checks to prevent accidental read/writing that corrupts the data. It is not complete enough for a security review.

Reflection / generic algorithms.

Chronicle Wire supports generic reading and writing of an arbitrary stream. This can be used in combination with predetermined fields. For example, you can read the fields you know about, and ask it to provide the fields that you do not. You can also give generic field names like keys to a map as YAML does.

Initialization order.

Chronicle Wire can handle unknown information like lengths, by using padding. It will go back and fill in any data that it was not aware of when it was writing the data. For example, when it writes an object, it does not know how long it is going to be, so it adds padding at the start. Once the object has been written, it goes back and overwrites the length. It can also handle situations where the length was more than needed; this is known as packing.

Unknown field retention?

Chronicle Wire can read data that it did not expect, interspersed with data it did expect. Rather than specify the expected field name, a StringBuilder is provided.

Note: There are times when you want to skip/copy an entire field or message, without reading any more of it. This is also supported.

Object-maximumLimit RPC system.

Chronicle Wire supports references based on name, number, or UUID. This is useful when including a reference to an object that the reader should look up by other means.

A common case is if you have a proxy to a remote object, and you want to pass or return this in an RPC call.

Schema language

Chronicle Wire’s schema is not externalised from the code. However it is planned to use YAML in a format that it can parse.

Usable as mutable state

Chronicle Wire supports storing an application’s internal state. This will not allow it to grow or shrink. You can’t free any of it without copying the pieces that you need, and discarding the original copy.

Padding takes space on the wire.

The Chronicle Wire format that is chosen determines if there is any padding on the wire. If you copy the in-memory data directly, its format does not change.

If you want to drop padding, you can copy the message to a wire format without padding. You can decide whether the original padding is to be preserved or not, if turned back into a format with padding.

We could look at supporting Cap’n’Proto's zero-byte removal compression.

Un-set fields take space on the wire?

Chronicle Wire supports fields with, and without, optional fields, and automatic means of removing them. Chronicle Wire does not support automatically adding them back in, because information has been lost.

Pointers take space on the wire.

Chronicle Wire does not have pointers, but it does have content-lengths which are a useful hint for random access and robustness; but these are optional.

Platform support

Chronicle Wire supports Java 8. Future versions may support Java 9, C++, and C#.


About

A Java Serialisation Library that supports multiple formats

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Java 100.0%