A specification language for MessagePack data schema


SoniGames/msgpack-schema

 
 


msgpack-schema

msgpack-schema is a schema language for describing data formats encoded in MessagePack. It provides two derive macros Serialize and Deserialize that allow you to transcode MessagePack binary data to/from Rust data structures in a type-directed way.

use msgpack_schema::{Deserialize, Serialize};

#[derive(Deserialize, Serialize)]
struct Human {
    #[tag = 0]
    name: String,
    #[tag = 2]
    #[optional]
    age: Option<u32>,
}

Compared with other schema languages like rmp-serde, msgpack-schema lets you specify a more compact data representation, e.g., fixints as field keys and fixints as variant keys.
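To make the size difference concrete, here is a minimal, std-only sketch that hand-encodes the same one-field map twice: once keyed by a field-name string and once keyed by a fixint. The helper names (`encode_fixstr`, `encode_with_name_key`, `encode_with_fixint_key`) are hypothetical and are not part of msgpack-schema; the byte layouts follow the MessagePack format (fixmap `0x80 | n`, fixstr `0xA0 | len`, positive fixint `0x00..=0x7F`).

```rust
/// Encode a short string (at most 31 bytes) as a MessagePack fixstr.
fn encode_fixstr(s: &str, out: &mut Vec<u8>) {
    assert!(s.len() <= 31, "fixstr holds at most 31 bytes");
    out.push(0xA0 | s.len() as u8);
    out.extend_from_slice(s.as_bytes());
}

/// `{ "name": <value> }`: the key is the field name as a string.
fn encode_with_name_key(value: &str) -> Vec<u8> {
    let mut out = vec![0x81]; // fixmap with one entry
    encode_fixstr("name", &mut out);
    encode_fixstr(value, &mut out);
    out
}

/// `{ 0: <value> }`: the key is a positive fixint tag.
fn encode_with_fixint_key(value: &str) -> Vec<u8> {
    let mut out = vec![0x81, 0x00]; // fixmap with one entry, key 0
    encode_fixstr(value, &mut out);
    out
}
```

For the three-character value `"Tom"`, the string-keyed form takes 10 bytes and the fixint-keyed form takes 6; the saving grows with the number and length of field names.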

Feature flags

  • proptest: Enable proptest::arbitrary::Arbitrary impls for msgpack_value::Value.

Behaviours of serializers and deserializers

Structs with named fields

Structs with named fields are serialized into a Map object where keys are fixints specified by #[tag] attributes. The current implementation serializes fields in order but one must not rely on this behavior.

The deserializer interprets Map objects to create such structs. Field order is irrelevant to the result. If a Map object contains extra key-value pairs that are not part of the struct definition, the deserializer simply ignores them. If a Map object contains two or more values with the same key, the last value overwrites the preceding ones.

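use msgpack_schema::{deserialize, serialize, Deserialize, Serialize};

// `Debug` and `PartialEq` are required by the `assert_eq!` calls below.
#[derive(Debug, PartialEq, Serialize, Deserialize)]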
struct S {
    #[tag = 0]
    x: u32,
    #[tag = 1]
    y: String,
}

let s = S {
  x: 42,
  y: "hello".to_owned(),
};

let b = b"\x82\x00\x2A\x01\xA5\x68\x65\x6c\x6c\x6f"; // 10 bytes; `{ 0: 42, 1: "hello" }`
assert_eq!(serialize(&s), b);
assert_eq!(s, deserialize(b).unwrap());

// ignores irrelevant key-value pairs
let b = b"\x83\x00\x2A\x02\xC3\x01\xA5\x68\x65\x6c\x6c\x6f"; // 12 bytes; `{ 0: 42, 2: true, 1: "hello" }`
assert_eq!(s, deserialize(b).unwrap());

// last value wins
let b = b"\x83\x00\xC3\x00\x2A\x01\xA5\x68\x65\x6c\x6c\x6f"; // 12 bytes; `{ 0: true, 0: 42, 1: "hello" }`
assert_eq!(s, deserialize(b).unwrap());
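The three cases above can be reproduced without the crate. The following is a minimal, std-only sketch of the described deserialization behaviour, not msgpack-schema's actual implementation. `Value`, `decode_value`, and `decode_s` are hypothetical names, only positive fixints, booleans, and fixstr values are handled, and `x` is narrowed to `u8` for brevity.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(u8),
    Bool(bool),
    Str(String),
}

/// Decode one value (positive fixint, bool, or fixstr) and advance `pos`.
fn decode_value(buf: &[u8], pos: &mut usize) -> Option<Value> {
    let head = *buf.get(*pos)?;
    *pos += 1;
    match head {
        0x00..=0x7F => Some(Value::Int(head)),
        0xC2 => Some(Value::Bool(false)),
        0xC3 => Some(Value::Bool(true)),
        0xA0..=0xBF => {
            let len = (head & 0x1F) as usize;
            let bytes = buf.get(*pos..*pos + len)?;
            *pos += len;
            Some(Value::Str(String::from_utf8(bytes.to_vec()).ok()?))
        }
        _ => None, // other MessagePack types are out of scope for this sketch
    }
}

/// Decode `{ 0: x, 1: y }` into `(x, y)`.
fn decode_s(buf: &[u8]) -> Option<(u8, String)> {
    let head = *buf.first()?;
    if (head & 0xF0) != 0x80 {
        return None; // not a fixmap
    }
    let mut pos = 1;
    let (mut x, mut y) = (None, None);
    for _ in 0..(head & 0x0F) {
        let key = decode_value(buf, &mut pos)?;
        let value = decode_value(buf, &mut pos)?;
        match key {
            Value::Int(0) => x = Some(value), // a later entry overwrites an earlier one
            Value::Int(1) => y = Some(value),
            _ => {} // unknown key: the pair is skipped
        }
    }
    // Both fields are required, and the surviving values must have the right types.
    match (x?, y?) {
        (Value::Int(x), Value::Str(y)) => Some((x, y)),
        _ => None,
    }
}
```

Because this sketch type-checks only the value that survives for each key, the `{ 0: true, 0: 42, 1: "hello" }` input still decodes; a missing required key yields `None`.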

Fields in named structs may be tagged with #[optional].

  • The tagged field must be of type Option<T>.
  • On serialization, the key-value pair will not be included in the result map object when the field data contains None.
  • On deserialization, the field of the result struct will be filled with None when the given MessagePack map object contains no corresponding key-value pair.

#[derive(Debug, PartialEq, Serialize, Deserialize)]
struct S {
    #[tag = 0]
    x: u32,
    #[optional]
    #[tag = 1]
    y: Option<String>,
}

let s = S {
  x: 42,
  y: Some("hello".to_owned()),
};
let b = b"\x82\x00\x2A\x01\xA5\x68\x65\x6c\x6c\x6f"; // 10 bytes; `{ 0: 42, 1: "hello" }`
assert_eq!(serialize(&s), b);
assert_eq!(s, deserialize(b).unwrap());

let s = S {
  x: 42,
  y: None,
};
let b = b"\x81\x00\x2A"; // 3 bytes; `{ 0: 42 }`
assert_eq!(serialize(&s), b);
assert_eq!(s, deserialize(b).unwrap());
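The serialization rule for `#[optional]` can be sketched by hand as follows. This is an illustration under stated assumptions rather than the crate's generated code: `encode_optional` is a hypothetical helper, `x` is limited to a positive fixint, and `y` to a fixstr.

```rust
/// Encode `S { x, y }` where `y` is optional.
/// Key 1 is written only when `y` is `Some`, and the map header counts
/// only the entries that are actually present.
fn encode_optional(x: u8, y: Option<&str>) -> Vec<u8> {
    assert!(x <= 0x7F, "this sketch only handles positive fixints");
    let entries = 1 + y.is_some() as u8;
    let mut out = vec![0x80 | entries, 0x00, x]; // fixmap header, key 0, value x
    if let Some(y) = y {
        assert!(y.len() <= 31, "this sketch only handles fixstr");
        out.push(0x01); // key 1
        out.push(0xA0 | y.len() as u8); // fixstr header
        out.extend_from_slice(y.as_bytes());
    }
    out
}
```

With `y = None` the output is the 3-byte map `{ 0: 42 }`; no nil placeholder is written for the missing field.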

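The #[flatten] attribute lets you factor a single named-struct definition out into multiple ones: the fields of a flattened struct are serialized as if they were declared directly in the enclosing struct.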

#[derive(Serialize)]
struct S1 {
    #[tag = 1]
    x: u32,
}

#[derive(Serialize)]
struct S2 {
    #[flatten]
    s1: S1,
    #[tag = 2]
    y: u32,
}

#[derive(Serialize)]
struct S3 {
    #[tag = 1]
    x: u32,
    #[tag = 2]
    y: u32,
}

assert_eq!(serialize(S2 { s1: S1 { x: 42 }, y: 43, }), serialize(S3 { x: 42, y: 43 }));
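One way to picture flattening is to model each struct as a list of `(tag, value)` pairs and splice the inner list into the outer one before encoding. The sketch below does this with std only; `fields_s1`, `fields_s2`, `fields_s3`, and `encode_map` are hypothetical names, and values are limited to positive fixints.

```rust
/// `S1 { x }` contributes a single pair with tag 1.
fn fields_s1(x: u8) -> Vec<(u8, u8)> {
    vec![(1, x)]
}

/// `S2 { #[flatten] s1, y }`: the pairs of `s1` are spliced into the outer list.
fn fields_s2(x: u8, y: u8) -> Vec<(u8, u8)> {
    let mut fields = fields_s1(x);
    fields.push((2, y));
    fields
}

/// `S3 { x, y }` declares both fields directly.
fn fields_s3(x: u8, y: u8) -> Vec<(u8, u8)> {
    vec![(1, x), (2, y)]
}

/// Encode the pairs as a fixmap with fixint keys and fixint values.
fn encode_map(fields: &[(u8, u8)]) -> Vec<u8> {
    assert!(fields.len() <= 15, "fixmap holds at most 15 entries");
    let mut out = vec![0x80 | fields.len() as u8];
    for &(tag, value) in fields {
        assert!(tag <= 0x7F && value <= 0x7F, "positive fixints only");
        out.push(tag);
        out.push(value);
    }
    out
}
```

Both `fields_s2(42, 43)` and `fields_s3(42, 43)` encode to the same five bytes, `{ 1: 42, 2: 43 }`.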

Structs with named fields may be annotated with #[untagged]. Untagged structs are serialized into an array and will not contain tags.

#[derive(Debug, PartialEq, Serialize, Deserialize)]
#[untagged]
struct S {
    x: u32,
    y: String,
}

let s = S {
  x: 42,
  y: "hello".to_owned(),
};
let b = b"\x92\x2A\xA5\x68\x65\x6c\x6c\x6f"; // 8 bytes; `[ 42, "hello" ]`

assert_eq!(serialize(&s), b);
assert_eq!(s, deserialize(b).unwrap());
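The following std-only sketch compares the tagged (map) and untagged (array) layouts for the same data. `push_fixstr`, `encode_tagged`, and `encode_untagged` are hypothetical helpers written for this illustration, limited to a positive fixint and a fixstr.

```rust
/// Append a string of at most 31 bytes as a MessagePack fixstr.
fn push_fixstr(s: &str, out: &mut Vec<u8>) {
    assert!(s.len() <= 31, "fixstr holds at most 31 bytes");
    out.push(0xA0 | s.len() as u8);
    out.extend_from_slice(s.as_bytes());
}

/// Tagged layout: `{ 0: x, 1: y }` as a fixmap with two entries.
fn encode_tagged(x: u8, y: &str) -> Vec<u8> {
    assert!(x <= 0x7F, "positive fixints only");
    let mut out = vec![0x82, 0x00, x, 0x01];
    push_fixstr(y, &mut out);
    out
}

/// Untagged layout: `[ x, y ]` as a fixarray with two elements.
fn encode_untagged(x: u8, y: &str) -> Vec<u8> {
    assert!(x <= 0x7F, "positive fixints only");
    let mut out = vec![0x92, x];
    push_fixstr(y, &mut out);
    out
}
```

The array form drops one key byte per field (10 bytes versus 8 here), at the cost of making field position part of the format.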

Newtype structs

Tuple structs with only one element are treated transparently: they are serialized exactly like their inner value.

#[derive(Debug, PartialEq, Serialize, Deserialize)]
struct S(u32);

let s = S(42);
let b = b"\x2A"; // 1 byte; `42`

assert_eq!(serialize(&s), b);
assert_eq!(s, deserialize(b).unwrap());

Unit structs and empty tuple structs

Serialization and deserialization of unit structs and empty tuple structs are intentionally unsupported.

// It is an error to derive `Serialize` / `Deserialize` for these types of structs.
struct S1;
struct S2();

Tuple structs

Tuple structs with more than one element are encoded as an array. Deserializing an array whose length does not match the number of elements is a validation error.

#[derive(Debug, PartialEq, Serialize, Deserialize)]
struct S(u32, bool);

let s = S(42, true);
let b = b"\x92\x2A\xC3"; // 3 bytes; `[ 42, true ]`

assert_eq!(serialize(&s), b);
assert_eq!(s, deserialize(b).unwrap());
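The length check can be sketched with std only. The code below is an assumption-laden illustration, not the crate's implementation: `DecodeError` and `decode_pair` are hypothetical, and the elements are limited to a positive fixint followed by a boolean.

```rust
#[derive(Debug, PartialEq)]
enum DecodeError {
    NotAnArray,
    LengthMismatch { expected: usize, found: usize },
    BadElement,
}

/// Decode `[ n, b ]` into `(n, b)`, rejecting arrays of any other length.
fn decode_pair(buf: &[u8]) -> Result<(u8, bool), DecodeError> {
    let head = *buf.first().ok_or(DecodeError::NotAnArray)?;
    if (head & 0xF0) != 0x90 {
        return Err(DecodeError::NotAnArray); // not a fixarray
    }
    let found = (head & 0x0F) as usize;
    if found != 2 {
        return Err(DecodeError::LengthMismatch { expected: 2, found });
    }
    match buf.get(1..) {
        Some(&[n @ 0x00..=0x7F, 0xC2]) => Ok((n, false)),
        Some(&[n @ 0x00..=0x7F, 0xC3]) => Ok((n, true)),
        _ => Err(DecodeError::BadElement),
    }
}
```

The array length is checked from the header before any element is read, so a three-element array fails with `LengthMismatch` rather than being silently truncated.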

Unit variants and empty tuple variants

Unit variants and empty tuple variants are serialized into a single fixint whose value is determined by the tag.

#[derive(Debug, PartialEq, Serialize, Deserialize)]
enum E {
    #[tag = 3]
    Foo
}

let e = E::Foo;
let b = b"\x03"; // 1 byte; `3`

assert_eq!(serialize(&e), b);
assert_eq!(e, deserialize(b).unwrap());

#[derive(Debug, PartialEq, Serialize, Deserialize)]
enum E {
    #[tag = 3]
    Foo()
}

let e = E::Foo();
let b = b"\x03"; // 1 byte; `3`

assert_eq!(serialize(&e), b);
assert_eq!(e, deserialize(b).unwrap());

Newtype variants

Newtype variants (one-element tuple variants) are serialized into an array of the tag and the inner value.

#[derive(Debug, PartialEq, Serialize, Deserialize)]
enum E {
    #[tag = 3]
    Foo(u32)
}

let e = E::Foo(42);
let b = b"\x92\x03\x2A"; // 3 bytes; `[ 3, 42 ]`

assert_eq!(serialize(&e), b);
assert_eq!(e, deserialize(b).unwrap());
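The two variant layouts described so far (a bare tag for unit variants, a two-element array for newtype variants) can be sketched together. The enum below is hypothetical: it assumes `Foo` carries tag 3 and `Bar` carries tag 4, and it limits the payload to a positive fixint.

```rust
#[derive(Debug, PartialEq)]
enum E {
    Foo,     // assumed tag 3, unit variant
    Bar(u8), // assumed tag 4, newtype variant
}

fn encode_e(e: &E) -> Vec<u8> {
    match e {
        E::Foo => vec![0x03], // just the tag
        E::Bar(v) => {
            assert!(*v <= 0x7F, "positive fixints only");
            vec![0x92, 0x04, *v] // fixarray of [tag, value]
        }
    }
}

fn decode_e(buf: &[u8]) -> Option<E> {
    match buf {
        [0x03] => Some(E::Foo),
        [0x92, 0x04, v @ 0x00..=0x7F] => Some(E::Bar(*v)),
        _ => None, // unknown tag or malformed input
    }
}
```

The first byte tells the decoder which shape to expect: a fixint means a unit variant, a fixarray header means a variant with a payload.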

Untagged variants

Enums may be annotated with #[untagged] when all variants are newtype variants. Serializing an untagged variant results in the same data layout as its inner type. The deserializer tries each variant in declaration order, from the first to the last, and returns the first one that deserializes successfully.

#[derive(Debug, PartialEq, Serialize, Deserialize)]
#[untagged]
enum E {
    Foo(String),
    Bar(u32),
}

let e = E::Bar(42);
let b = b"\x2A"; // 1 byte; `42`

assert_eq!(serialize(&e), b);
assert_eq!(e, deserialize(b).unwrap());
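The try-each-variant strategy can be sketched as a chain of fallible decoders. This is a std-only illustration of the described behaviour, not the crate's code: `try_string`, `try_uint`, and `decode_untagged` are hypothetical, and only fixstr and positive fixint payloads are recognised.

```rust
#[derive(Debug, PartialEq)]
enum E {
    Foo(String),
    Bar(u8),
}

/// Succeeds only if `buf` is exactly one fixstr holding valid UTF-8.
fn try_string(buf: &[u8]) -> Option<String> {
    let (&head, rest) = buf.split_first()?;
    if (head & 0xE0) != 0xA0 || rest.len() != (head & 0x1F) as usize {
        return None;
    }
    String::from_utf8(rest.to_vec()).ok()
}

/// Succeeds only if `buf` is exactly one positive fixint.
fn try_uint(buf: &[u8]) -> Option<u8> {
    match buf {
        [n @ 0x00..=0x7F] => Some(*n),
        _ => None,
    }
}

/// Try `Foo` first, then `Bar`, mirroring declaration order.
fn decode_untagged(buf: &[u8]) -> Option<E> {
    try_string(buf)
        .map(E::Foo)
        .or_else(|| try_uint(buf).map(E::Bar))
}
```

Declaration order matters whenever two variants accept the same input: the earlier variant wins, so more specific types are best listed first.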

Write your own implementation of Serialize and Deserialize

You may want to write your own implementation of Serialize and Deserialize in the following cases:

  1. You need an impl for a type that is defined elsewhere, such as in another crate.
  2. You need extreme efficiency.
  3. Both.

IpAddr is a type that satisfies (3). In the most efficient encoding, it always occupies 4 or 16 bytes plus one byte for a tag. This can be achieved with a hand-written implementation like the one below.

struct IpAddr(pub std::net::IpAddr);

impl Serialize for IpAddr {
    fn serialize(&self, serializer: &mut Serializer) {
        match self.0 {
            std::net::IpAddr::V4(v4) => {
                serializer.serialize_str(&v4.octets()); // 5 bytes
            }
            std::net::IpAddr::V6(v6) => {
                serializer.serialize_str(&v6.octets()); // 17 bytes
            }
        }
    }
}

impl Deserialize for IpAddr {
    fn deserialize(deserializer: &mut Deserializer) -> Result<Self, DeserializeError> {
        let Str(data) = deserializer.deserialize()?;
        let ipaddr = match data.len() {
            4 => std::net::IpAddr::V4(std::net::Ipv4Addr::from(
                <[u8; 4]>::try_from(data).unwrap(),
            )),
            16 => std::net::IpAddr::V6(std::net::Ipv6Addr::from(
                <[u8; 16]>::try_from(data).unwrap(),
            )),
            _ => return Err(ValidationError.into()),
        };
        Ok(Self(ipaddr))
    }
}
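The byte counts in the comments above (5 and 17) can be checked with a std-only sketch of the same layout. It assumes the octets are written as a str-family payload with a one-byte fixstr header (`0xA4` for 4 octets, `0xB0` for 16); `encode_ip` and `decode_ip` are hypothetical helpers, not msgpack-schema APIs.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Write the raw octets behind a one-byte fixstr header.
fn encode_ip(addr: &IpAddr) -> Vec<u8> {
    let octets: Vec<u8> = match addr {
        IpAddr::V4(v4) => v4.octets().to_vec(),
        IpAddr::V6(v6) => v6.octets().to_vec(),
    };
    let mut out = vec![0xA0 | octets.len() as u8]; // 0xA4 or 0xB0
    out.extend_from_slice(&octets);
    out
}

/// Accept only payloads of exactly 4 or 16 bytes.
fn decode_ip(buf: &[u8]) -> Option<IpAddr> {
    let (&head, data) = buf.split_first()?;
    if (head & 0xE0) != 0xA0 || data.len() != (head & 0x1F) as usize {
        return None; // not a well-formed fixstr
    }
    match data.len() {
        4 => {
            let mut octets = [0u8; 4];
            octets.copy_from_slice(data);
            Some(IpAddr::V4(Ipv4Addr::from(octets)))
        }
        16 => {
            let mut octets = [0u8; 16];
            octets.copy_from_slice(data);
            Some(IpAddr::V6(Ipv6Addr::from(octets)))
        }
        _ => None, // corresponds to the validation error in the impl above
    }
}
```

The payload length doubles as the discriminator between V4 and V6, which is why no separate variant tag is needed.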

Appendix: Cheatsheet

| schema | Rust | MessagePack (human readable) |
| --- | --- | --- |
| struct S { #[tag = 0] x: u32, #[tag = 1] y: bool } | S { x: 42, y: true } | { 0: 42, 1: true } |
| struct S { #[optional] #[tag = 0] x: Option<u32> } | S { x: Some(42) } | { 0: 42 } |
| struct S { #[optional] #[tag = 0] x: Option<u32> } | S { x: None } | {} |
| #[untagged] struct S { #[tag = 0] x: u32, #[tag = 1] y: bool } | S { x: 42, y: true } | [ 42, true ] |
| struct S(u32) | S(42) | 42 |
| struct S | S | UNSUPPORTED |
| struct S() | S() | UNSUPPORTED |
| struct S(u32, bool) | S(42, true) | [ 42, true ] |
| enum E { #[tag = 3] Foo } | E::Foo | 3 |
| enum E { #[tag = 3] Foo() } | E::Foo() | 3 |
| enum E { #[tag = 3] Foo(u32) } | E::Foo(42) | [ 3, 42 ] |
| #[untagged] enum E { Foo(u32), Bar(bool) } | E::Bar(true) | true |

License

Licensed under the MIT license.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in msgpack-schema by you shall be licensed as above, without any additional terms or conditions.
