Binlog Mserialize

A fast, platform dependent serialization library.

Overview
Serialization and Deserialization
- Adapting custom types
Visiting serialized values
Limitations
Design Rationale
Concepts
References
- Type tags
  - Type tag of Adapted Enum
  - Type tag of Adapted Struct
- Serialized format

Overview

Mserialize supports the serialization and deserialization of values, represented by objects of supported types, and the visitation of serialized values. The primary goal of the library is to make serialization as fast as possible, while still allowing correct deserialization and visitation. To achieve this goal, Mserialize adds little or no overhead to the serialized values: most of the time, the serialized format matches the original value, byte by byte. The format is not intended to be universal: it is not meant for exchanging messages between different platforms (e.g: between platforms of different endianness).
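
To illustrate what "byte by byte" means in practice, the following sketch copies a 32 bit integer into a buffer with std::memcpy. It is not the library's implementation, and raw_bytes and host_is_little_endian are hypothetical helpers introduced only for this example. The resulting byte order is that of the host, which is why the format is not portable across platforms of different endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical helper (not part of Mserialize): return the object
// representation of `value`, i.e. what a memcpy-style serializer writes.
template <typename T>
std::string raw_bytes(const T& value)
{
  static_assert(std::is_trivially_copyable<T>::value, "memcpy requires a trivially copyable type");
  std::string bytes(sizeof(T), '\0');
  std::memcpy(&bytes[0], &value, sizeof(T));
  return bytes;
}

// True if the host stores the least significant byte first.
inline bool host_is_little_endian()
{
  return raw_bytes(std::uint32_t{1})[0] == 1;
}
```

A reader on a host with the opposite byte order would interpret the same four bytes as a different number.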

Serialization and Deserialization

Objects of supported types can be easily serialized:

#include <mserialize/serialize.hpp>
#include <fstream>

const T my_value;
std::ofstream ostream(path);
mserialize::serialize(my_value, ostream);

The supported types (T) for serialization include:

Fundamental types (bool, char, integer and floating point types)
Enums
Strings (const char*, string, string_view)
Containers (vector, deque, set, map, etc., anything with begin/end and supported value_type)
Pairs and Tuples
Pointers and Smart Pointers (T*, unique_ptr, shared_ptr, weak_ptr, with a supported elem_type)
Optional like types
Variant like types

Additional types can be adapted to the library, as shown below. The target ostream can be anything that models the OutputStream concept.

Deserialization of serialized objects is equally easy:

#include <mserialize/deserialize.hpp>
#include <fstream>

T my_value;
std::ifstream istream(path);
istream.exceptions(std::ios_base::failbit);
mserialize::deserialize(my_value, istream);

Almost every type that can be serialized can also be deserialized into. The exceptions are types that do not own the underlying resource (e.g: T*, string_view, weak_ptr), see Design Rationale for details.

The source istream can be anything that models the InputStream concept. The concept requires read to throw on failure; therefore, standard streams must be configured to throw exceptions on failure, as shown above.
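
A minimal sketch of why the exception mask matters, using only the standard library: with failbit in the exception mask, a short read throws instead of silently leaving the destination partially filled. read_exact is a hypothetical helper written for this example, not part of Mserialize.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: try to read exactly `size` bytes from `content`
// through a stream configured to throw on failure.
// Returns true on success, false if the stream reported failure by throwing.
inline bool read_exact(const std::string& content, std::size_t size)
{
  std::istringstream stream(content);
  stream.exceptions(std::ios_base::failbit);

  std::string buffer(size, '\0');
  try
  {
    stream.read(&buffer[0], static_cast<std::streamsize>(size));
    return true;
  }
  catch (const std::exception&)
  {
    // A short read sets failbit, which now throws std::ios_base::failure.
    // It is caught as std::exception to stay independent of the ABI in use.
    return false;
  }
}
```

Without the call to exceptions, the same short read would only set the stream state, and the caller would have to check it after every read.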

The type of the value in the input stream is inferred from the destination (first) argument of deserialize. It is only allowed to deserialize a value into an object of a compatible type. Compatibility is defined in terms of type tags. Two types are compatible if their type tags match.

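If the deserialization fails (e.g: if the content of the stream cannot be interpreted as a serialized value of the type inferred from the destination), deserialize throws an exception derived from std::exception. Catch std::exception rather than std::runtime_error: as explained in Design Rationale, exceptions thrown by the stream are not guaranteed to derive from std::runtime_error.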

Adapting custom types

By default, serialization and deserialization of generic user defined types are not supported (unless they meet the requirements of some supported concept, e.g: a container). Such types can still be adapted by specializing extension points for them. The easiest way is to use macros:

#include <mserialize/make_struct_deserializable.hpp>
#include <mserialize/make_struct_serializable.hpp>

#include <cassert>
#include <sstream>
#include <string>

// Given a custom type:
struct Alpha { int a = 0; std::string b; };

// Serialization and deserialization can be enabled by macros:
MSERIALIZE_MAKE_STRUCT_SERIALIZABLE(Alpha, a, b)
MSERIALIZE_MAKE_STRUCT_DESERIALIZABLE(Alpha, a, b)

// At this point, objects of `Alpha` can be used
// together with mserialize::serialize and deserialize,
// the same way as by-default supported objects.
const Alpha in{30, "foo"};
std::stringstream stream;
mserialize::serialize(in, stream);

Alpha out;
stream.exceptions(std::ios_base::failbit);
mserialize::deserialize(out, stream);

assert(in.a == out.a && in.b == out.b);

The macros have to be invoked at the global scope, outside of any namespace. The first argument is the name of the type, the rest are the members (the member list can be empty). The member list does not have to enumerate every member of the given type: an omitted member is simply ignored during serialization and deserialization (e.g: a mutex member is typically not serialized). However, for a roundtrip to work, the member lists given to the two macros must correspond exactly: the same members, in the same order.

For serialization, a member can be either a non-static, non-reference, non-bitfield data member, or a getter, which is a const qualified, nullary member function, which returns a serializable object.

For deserialization, a member can be either a non-const, non-static, non-reference, non-bitfield data member, or a setter, which takes a single, deserializable argument.

// Given a custom type with getters and setters:
class Beta
{
  std::string c;
  float d;

public:
  const std::string& getC() const;
  void setC(std::string);

  float getD() const;
  void setD(float);
};

// Serialization and deserialization can be enabled the same way:
MSERIALIZE_MAKE_STRUCT_SERIALIZABLE(Beta, getC, getD)
MSERIALIZE_MAKE_STRUCT_DESERIALIZABLE(Beta, setC, setD)

If some of the data members, getters or setters are private, but serialization or deserialization is still preferred via those members, the following friend declarations can be added to the type:

class Gamma
{
  std::string e;  // private data member
  int f() const;  // private getter
  void f(int);    // private setter

  template <typename, typename>
  friend struct mserialize::CustomSerializer;

  template <typename, typename>
  friend struct mserialize::CustomDeserializer;
};

If a type publicly derives from serializable or deserializable bases, it can be made serializable and deserializable without repeating the fields of its bases:

#include <mserialize/make_derived_struct_deserializable.hpp>
#include <mserialize/make_derived_struct_serializable.hpp>

struct Zeta : Beta { int e = 0; };

MSERIALIZE_MAKE_DERIVED_STRUCT_SERIALIZABLE(Zeta, (Beta), e)
MSERIALIZE_MAKE_DERIVED_STRUCT_DESERIALIZABLE(Zeta, (Beta), e)

The same rules apply as above, with the addition that the second argument must be a non-empty parenthesised list of serializable or deserializable base classes.

Class templates can be made serializable and deserializable under the same conditions, except that a different macro must be called:

#include <mserialize/make_template_deserializable.hpp>
#include <mserialize/make_template_serializable.hpp>

template <typename A, typename B>
struct Pair { A a; B b; };

MSERIALIZE_MAKE_TEMPLATE_SERIALIZABLE((typename A, typename B), (Pair<A,B>), a, b)
MSERIALIZE_MAKE_TEMPLATE_DESERIALIZABLE((typename A, typename B), (Pair<A,B>), a, b)

The first argument of the macro must be the parameters of the template, with the necessary typename prefix where needed, as they appear after the template keyword in the definition, wrapped in parentheses. (The parentheses are required to prevent the preprocessor from splitting the arguments at the commas.)

The second argument is the template name with the template arguments, as it should appear in a specialization, wrapped by parentheses. The rest of the arguments are members, same as above.

Visiting serialized values

As an alternative to deserialization, serialized objects can be visited. Visitation is useful if the precise type of the serialized object is not known, the type is not available, or not deserializable.

While the precise type of the serialized object is not needed, a type tag still must be available for visitation to work. A type tag is a string that describes a serializable type to the extent that it can be visited.

The following example shows how serialization and visitation can work together:

#include <mserialize/serialize.hpp>
#include <mserialize/tag.hpp>
#include <fstream>

// serialize a T object
const T t;
const auto tag = mserialize::tag<T>();
std::ofstream ostream(path);
mserialize::serialize(tag, ostream);
mserialize::serialize(t, ostream);

#include <mserialize/deserialize.hpp>
#include <mserialize/visit.hpp>
#include <fstream>
#include <string>

// visit the object
std::ifstream istream(path);
istream.exceptions(std::ios_base::failbit);
std::string tag;
mserialize::deserialize(tag, istream);
Visitor visitor;
mserialize::visit(tag, visitor, istream);

Visitor can be any type that models the Visitor concept. visit throws std::exception if the visitation fails (e.g: the provided tag does not match the serialized object in the stream). In the example, the tag is serialized alongside the object. In general, the tag is not required to be in the stream, it can be sent to the visiting party by any other means. The tag given to visit must be a valid type tag: do not use tags coming from a potentially malicious source.
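
To show how a tag alone is enough to interpret serialized bytes, here is a simplified, self-contained sketch of tag-driven visitation. It is not the library's visit implementation: it supports only the tags c, i, [ and ( ), prints values instead of calling a Visitor, and every name in it (append_raw, read_raw, skip_tag, visit_sketch, describe) is hypothetical. The byte layout follows the Serialized format section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <stdexcept>
#include <string>

// Append the object representation of `value` to `out` (as if by memcpy).
template <typename T>
void append_raw(std::string& out, const T& value)
{
  char buf[sizeof(T)];
  std::memcpy(buf, &value, sizeof(T));
  out.append(buf, sizeof(T));
}

// Read a T from `in` as if by memcpy; throw if not enough bytes remain.
template <typename T>
T read_raw(std::istream& in)
{
  char buf[sizeof(T)];
  if (!in.read(buf, static_cast<std::streamsize>(sizeof(T))))
  {
    throw std::runtime_error("unexpected end of input");
  }
  T value;
  std::memcpy(&value, buf, sizeof(T));
  return value;
}

// Advance `pos` past one complete tag without consuming any input.
inline void skip_tag(const std::string& tag, std::size_t& pos)
{
  if (pos >= tag.size()) { throw std::runtime_error("truncated tag"); }
  const char t = tag[pos++];
  if (t == '[') { skip_tag(tag, pos); }
  else if (t == '(')
  {
    while (pos < tag.size() && tag[pos] != ')') { skip_tag(tag, pos); }
    if (pos >= tag.size()) { throw std::runtime_error("unterminated tuple tag"); }
    ++pos; // consume ')'
  }
  else if (t != 'c' && t != 'i') { throw std::runtime_error("unsupported tag"); }
}

// Interpret the bytes in `in` according to the tag starting at tag[pos].
inline void visit_sketch(const std::string& tag, std::size_t& pos, std::istream& in, std::ostream& out)
{
  if (pos >= tag.size()) { throw std::runtime_error("truncated tag"); }
  const char t = tag[pos++];
  if (t == 'c') { out << read_raw<char>(in); }
  else if (t == 'i') { out << read_raw<std::int32_t>(in); }
  else if (t == '[')
  {
    const std::uint32_t size = read_raw<std::uint32_t>(in);
    const std::size_t elem = pos; // every element shares the same tag
    out << '[';
    for (std::uint32_t i = 0; i < size; ++i)
    {
      if (i != 0) { out << ','; }
      pos = elem;
      visit_sketch(tag, pos, in, out);
    }
    pos = elem;
    skip_tag(tag, pos); // also correct for empty sequences
    out << ']';
  }
  else if (t == '(')
  {
    out << '(';
    bool first = true;
    while (pos < tag.size() && tag[pos] != ')')
    {
      if (!first) { out << ','; }
      first = false;
      visit_sketch(tag, pos, in, out);
    }
    if (pos >= tag.size()) { throw std::runtime_error("unterminated tuple tag"); }
    ++pos; // consume ')'
    out << ')';
  }
  else { throw std::runtime_error("unsupported tag"); }
}

// Convenience wrapper: render `bytes` as text, guided only by `tag`.
inline std::string describe(const std::string& tag, const std::string& bytes)
{
  std::istringstream in(bytes);
  std::ostringstream out;
  std::size_t pos = 0;
  visit_sketch(tag, pos, in, out);
  return out.str();
}
```

Like the real visit, this sketch trusts the tag: it recurses as deeply as the tag nests, which is one reason tags from untrusted sources should not be used.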

Adapting enums for visitation

By default, enums have no tag associated. A tag, suitable for visitation can be defined in the following way:

#include <mserialize/make_enum_tag.hpp>

enum Delta { a, b, c };
MSERIALIZE_MAKE_ENUM_TAG(Delta, a, b, c)

This works with both enums and enum classes, regardless of the underlying type of the enum. The macro has to be called in global scope (outside of any namespace). If an enumerator is omitted from the macro call, the tag will be incomplete: if the missing enumerator is encountered during visitation, only its underlying value will be available, and the enumerator name will be empty.
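
To make the effect of an omitted enumerator concrete, the sketch below builds an enum tag following the grammar given under Type tag of Adapted Enum, and looks up enumerator names by value. make_enum_tag_sketch, enumerator_name and the Enumerators alias are hypothetical; they only mimic the documented format for non-negative values and do not show how the macro is implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// (value, name) pairs, in declaration order.
using Enumerators = std::vector<std::pair<std::uint64_t, std::string>>;

// Build a tag of the form /<UnderlyingTypeTag>`Name'<ValueInHex>`Enumerator'...\ .
inline std::string make_enum_tag_sketch(char underlying_tag, const std::string& name,
                                        const Enumerators& enumerators)
{
  std::ostringstream tag;
  tag << '/' << underlying_tag << '`' << name << '\'';
  tag << std::uppercase << std::hex; // values are written in hex, e.g. 123 -> 7B
  for (const auto& e : enumerators)
  {
    tag << e.first << '`' << e.second << '\'';
  }
  tag << '\\';
  return tag.str();
}

// Name of the enumerator with the given value, or empty if the tag omits it.
inline std::string enumerator_name(const Enumerators& enumerators, std::uint64_t value)
{
  for (const auto& e : enumerators)
  {
    if (e.first == value) { return e.second; }
  }
  return std::string();
}
```

With a list that omits c, a lookup of the value 2 yields an empty name, which mirrors what a visitor would observe for the missing enumerator.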

Adapting user defined types for visitation

By default, user defined types generally have no tag associated. The exception is any type that models a specific supported concept (e.g: user defined containers), which does have a tag associated by default. A tag suitable for visitation can be defined in the following way:

#include <mserialize/make_struct_tag.hpp>

struct Epsilon { int a; std::string b; };
MSERIALIZE_MAKE_STRUCT_TAG(Epsilon, a, b)

The macro has to be called in global scope (outside of any namespace). The members can be data members or getters, just like for serialization. For private members, the following friend declaration can be added:

template <typename, typename>
friend struct mserialize::CustomTag;

The member list must be in sync with the MSERIALIZE_MAKE_STRUCT_SERIALIZABLE call, if visitation of objects serialized that way is desired. MSERIALIZE_MAKE_STRUCT_TAG cannot be used with recursive types. See Adapting user defined recursive types for visitation for a solution.

A tag can be assigned to a class that derives from a base or bases that are tagged already, without enumerating the members of the bases again:

#include <mserialize/make_derived_struct_tag.hpp>

MSERIALIZE_MAKE_DERIVED_STRUCT_TAG(Zeta, (Beta), e)

The same rules apply as above, with the addition that the second argument must be a non-empty parenthesised list of tagged base classes.

A tag can be assigned to class templates under the same conditions, except that a different macro must be called:

#include <mserialize/make_template_tag.hpp>

template <typename A, typename B, typename C>
struct Triplet { A a; B b; C c; };

MSERIALIZE_MAKE_TEMPLATE_TAG((typename A, typename B, typename C), (Triplet<A,B,C>), a, b, c)

The second argument is the template name with the template arguments, as it should appear in a specialization, wrapped by parentheses. The rest of the arguments are members, same as above.

Adapting user defined recursive types for visitation

From the tag generation point of view, a structure is recursive if one of its fields has a type tag that includes the type tag of the parent type. Currently, MSERIALIZE_MAKE_STRUCT_TAG is unable to deal with such recursive structures. As a workaround, such type tags can be manually assigned:

#include <mserialize/tag.hpp>

struct Node { int value; Node* next; };

namespace mserialize {

template <>
struct CustomTag<Node>
{
  static constexpr auto tag_string()
  {
    return make_cx_string("{Node`value'i`next'<0{Node}>}");
  }
};

} // namespace mserialize

A breakdown of the string literal:

{: Start structure tag
Node: Name of the structure
`value': Name of the first field
i: Type tag of the first field. Also see the type tag reference
`next': Name of the second field
<: Begin variant tag (pointers are modeled as either nothing or something)
0: The pointer is either null
{Node}: or points to a Node object. This is the important part: the structure definition here is not expanded again, as that would result in infinite recursion. The visitor will recognize that Node is not an empty type, but something defined earlier.
>: End variant tag
}: End structure tag
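
The structure of such a tag can also be expressed as a small string builder. The sketch below is only an illustration of the tag layout: make_struct_tag_sketch is a hypothetical helper that takes the field tags as ready-made strings, and it is not a replacement for the CustomTag specialization shown above.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Build {Name`field1'tag1`field2'tag2...} from (field name, field tag) pairs.
inline std::string make_struct_tag_sketch(
  const std::string& name,
  const std::vector<std::pair<std::string, std::string>>& fields)
{
  std::string tag = "{" + name;
  for (const auto& field : fields)
  {
    tag += '`' + field.first + '\'' + field.second;
  }
  return tag + "}";
}
```

Passing the shallow {Node} as part of the tag of next, instead of the full definition, is what keeps the resulting string finite.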

Limitations

To keep the implementation and interface simple, values cannot be deserialized into objects that do not own the underlying resource, e.g: T* or string_view. See Design Rationale for considered alternatives.
In the Serialized format the size of a serialized sequence is represented by a 32 bit unsigned integer. Therefore, sequences with more than 2^32 - 1 elements cannot be serialized. As a workaround, such sequences can be split into a sequence of smaller sequences.
In the Serialized format the discriminator of a variant is represented by an 8 bit unsigned integer. Therefore, variants with more than 256 alternatives cannot be serialized. As a workaround, such variants can be split into a variant of smaller variants.
Macros taking arbitrary number of arguments (e.g: member lists, enumerators) need to iterate over the given arguments. The iteration is done by loop unrolling, which is currently capped at 100. This limit can be increased by regenerating foreach.hpp, but MSVC does not support macros with more than 127 arguments.
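
The workaround for long sequences can be sketched as follows. split_sequence is a hypothetical helper, not part of the library; it turns one vector into a vector of chunks, each no longer than a chosen limit. In real use, the limit would be the largest size the 32 bit size field can hold, and the number of chunks is subject to the same limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Split `input` into consecutive chunks of at most `max_chunk` elements each.
template <typename T>
std::vector<std::vector<T>> split_sequence(const std::vector<T>& input, std::size_t max_chunk)
{
  if (max_chunk == 0) { throw std::invalid_argument("max_chunk must be positive"); }

  std::vector<std::vector<T>> chunks;
  std::size_t begin = 0;
  while (begin < input.size())
  {
    // Never read past the end of the input.
    const std::size_t count = std::min(max_chunk, input.size() - begin);
    const auto first = input.begin() + static_cast<std::ptrdiff_t>(begin);
    chunks.emplace_back(first, first + static_cast<std::ptrdiff_t>(count));
    begin += count;
  }
  return chunks;
}
```

The sketch copies the elements for simplicity; a production version might serialize each range in place to avoid the copy.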

Design Rationale

Library design tends to be arguable. Some decisions need to be explained.

How to signal errors when deserializing?

Leave the stream in bad state. This is common practice in standard library components, but does not give enough context about the nature of the error.
Set an error_code. This requires the ec to be propagated through every deserialization layer (which might or might not be good), and also requires several extra checks (to stop if the ec is set). As a deserialization error is considered exceptional, the nominal case should not be penalized with extra checks, which can be avoided with exceptions.
Throw an exception. Can provide enough context, fast if there are no errors, requires extra care. This is the chosen solution. The type of the exception should be std::runtime_error, but on platforms using the pre-C++11 ABI, std::ios_base::failure (thrown by streams) is not derived from std::runtime_error, therefore std::exception must be used.

How to deserialize non-owning types? Let's consider T*:

Simply allocate a T object on the heap, assign its address to the target pointer, and expect that the user will properly delete it later. This solution is simple, but hard to get right, especially with complicated structures.
Provide an overload, which takes a memory manager. This solution is memory safe and allows a wider range of types to be deserialized, but requires the introduction of yet another concept, with further complexity.
Do not allow direct deserialization of such types. This strict solution is simple to implement, but it restricts some common types (e.g: string_view), and prevents the user from using the same type on both ends. Because of its simplicity, this is the chosen solution.

What should the type tag of user defined types look like?

Type tags of user defined types should be shallow, e.g: {Person}, and the complete definition of the type has to be supplied via yet another side channel. This approach diverges from the original meaning of type tag, (as the shallow tag on its own doesn't allow visitation), and puts additional load on the user. On the other hand, it is easy to implement, even for recursive structures.
Type tags should always describe the complete type, e.g: {Person`age'i`name'[c}, and allow automatic generation of tags for recursive structures. This is a pure approach that fits nicely with the original concept of type tags. However, it is difficult to implement (in an efficient constexpr fashion) if recursive (including mutually recursive) types need to be supported.
Type tags should always describe the complete type. e.g: {Person`age'i`name'[c}, disallow automatic generation of tags for recursive structures. A pure approach with some restriction. It remains easy to use, while allowing clients to use more difficult ways if visitation of recursive structures is needed. This is the chosen solution.

Split making types serializable and generation of tags or not?

Combining them leads to slightly smaller source code, but ties the requirements and usage together.
Separating serialization and tag generation logic aligns with the "only pay for what you use" principle. It allows the two to have different requirements (e.g: whether recursive types are allowed), and deserializer programs to inspect tags without pulling in the serializer logic. On the other hand, the separate specialization logic needs slightly more code. This is the chosen solution.

Concepts

OutputStream

template <typename OutStr>
concept OutputStream = requires(OutStr ostream, const char* buf, std::streamsize size)
{
  // Append `size` bytes from `buf` to the stream
  { ostream.write(buf, size) } -> OutStr&;
};
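
As an illustration, here is a minimal type that would satisfy the requirement above: an in-memory sink with a chainable write. BufferOutputStream is a hypothetical name chosen for this sketch; it is not provided by the library.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <string>

// Hypothetical in-memory sink. Any type with a compatible write()
// member would model the concept equally well.
class BufferOutputStream
{
  std::string _buffer;

public:
  // Append `size` bytes from `buf`; return *this to allow chaining.
  BufferOutputStream& write(const char* buf, std::streamsize size)
  {
    _buffer.append(buf, static_cast<std::size_t>(size));
    return *this;
  }

  const std::string& buffer() const { return _buffer; }
};
```

Standard output streams such as std::ofstream and std::ostringstream already provide a write member of this shape.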

InputStream

template <typename InpStr>
concept InputStream = requires(InpStr istream, char* buf, std::streamsize size)
{
  // Consume `size` bytes from the stream and copy them to `buf`.
  // Throw std::exception on failure (i.e: not enough bytes available)
  { istream.read(buf, size) } -> InpStr&;
};
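
A matching sketch for the input side: a reader over an in-memory buffer that throws when fewer bytes are available than requested. BufferInputStream is a hypothetical name used only in this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ios>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical in-memory source that reports failure by throwing,
// as the concept requires.
class BufferInputStream
{
  std::string _buffer;
  std::size_t _pos = 0;

public:
  explicit BufferInputStream(std::string buffer) : _buffer(std::move(buffer)) {}

  // Copy `size` bytes to `buf` and advance; throw if not enough bytes remain.
  BufferInputStream& read(char* buf, std::streamsize size)
  {
    if (size < 0 || static_cast<std::size_t>(size) > _buffer.size() - _pos)
    {
      throw std::runtime_error("BufferInputStream: not enough bytes available");
    }
    std::memcpy(buf, _buffer.data() + _pos, static_cast<std::size_t>(size));
    _pos += static_cast<std::size_t>(size);
    return *this;
  }
};
```

Because the check happens before any byte is copied, a failed read leaves both the destination and the read position untouched.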

Visitor

template <typename V, typename InputStream>
concept Visitor = requires(V visitor)
{
  visitor.visit(bool          );
  visitor.visit(char          );
  visitor.visit(std::int8_t   );
  visitor.visit(std::int16_t  );
  visitor.visit(std::int32_t  );
  visitor.visit(std::int64_t  );
  visitor.visit(std::uint8_t  );
  visitor.visit(std::uint16_t );
  visitor.visit(std::uint32_t );
  visitor.visit(std::uint64_t );

  visitor.visit(float         );
  visitor.visit(double        );
  visitor.visit(long double   );

  visitor.visit(mserialize::Visitor::SequenceBegin, InputStream&) -> bool;
  visitor.visit(mserialize::Visitor::SequenceEnd   );

  visitor.visit(mserialize::Visitor::String        );

  visitor.visit(mserialize::Visitor::TupleBegin, InputStream&) -> bool;
  visitor.visit(mserialize::Visitor::TupleEnd      );

  visitor.visit(mserialize::Visitor::VariantBegin, InputStream&) -> bool;
  visitor.visit(mserialize::Visitor::VariantEnd    );
  visitor.visit(mserialize::Visitor::Null          );

  visitor.visit(mserialize::Visitor::StructBegin, InputStream&) -> bool;
  visitor.visit(mserialize::Visitor::StructEnd     );

  visitor.visit(mserialize::Visitor::FieldBegin    );
  visitor.visit(mserialize::Visitor::FieldEnd      );

  visitor.visit(mserialize::Visitor::Enum          );

  visitor.visit(mserialize::Visitor::RepeatBegin   );
  visitor.visit(mserialize::Visitor::RepeatEnd     );
};

References

Type tags

The table below describes the type tags of supported types. In the first column, T refers to any supported type, and T... to any pack of supported types. In the second column, t refers to the tag of T in the cell left of it, and t... to the concatenated tags of the T... pack.

Type	Type Tag
`bool`	`y`
`char`	`c`
`int8_t`	`b`
`int16_t`	`s`
`int32_t`	`i`
`int64_t`	`l`
`uint8_t`	`B`
`uint16_t`	`S`
`uint32_t`	`I`
`uint64_t`	`L`
`float`	`f`
`double`	`d`
`long double`	`D`
Array of `T`	`[t`
Tuple of `T...`	`(t...)`
Variant of `T...`	`<t...>`
`void` (only to indicate empty state of a variant)	`0`
Adapted `enum E : T { a, b = 123, c}`	/t`E'0`a'7B`b'7C`c'\ (see below)
Adapted `struct Foo { T1 a; T2 b; }`	{Foo`a't1`b't2} (see below)
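
The composition rules of the table can be sketched with a small set of template specializations. This is an illustration only: TagSketch is a hypothetical name, it covers just a few of the fundamental types, and it builds the tag at runtime, whereas the library computes tags at compile time.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Primary template left undefined: unsupported types fail to compile.
template <typename T> struct TagSketch;

// A few fundamental types, following the table.
template <> struct TagSketch<bool>          { static std::string get() { return "y"; } };
template <> struct TagSketch<char>          { static std::string get() { return "c"; } };
template <> struct TagSketch<std::int32_t>  { static std::string get() { return "i"; } };
template <> struct TagSketch<std::uint64_t> { static std::string get() { return "L"; } };
template <> struct TagSketch<double>        { static std::string get() { return "d"; } };

// Array of T: `[t`
template <typename T>
struct TagSketch<std::vector<T>>
{
  static std::string get() { return "[" + TagSketch<T>::get(); }
};

// A string is a sequence of char: `[c`
template <>
struct TagSketch<std::string>
{
  static std::string get() { return "[c"; }
};

// Tuple of T...: `(t...)`
template <typename... T>
struct TagSketch<std::tuple<T...>>
{
  static std::string get()
  {
    // The leading empty string keeps the array non-empty for std::tuple<>.
    const std::string parts[] = {std::string(), TagSketch<T>::get()...};
    std::string result = "(";
    for (const std::string& part : parts) { result += part; }
    return result + ")";
  }
};
```

Nested types compose naturally: the tag of a vector of tuples is the array marker followed by the tag of the tuple.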

Type tag of Adapted Enum

<EnumTag> ::= /<UnderlyingTypeTag><EnumName><Enumerator>*\
<UnderlyingTypeTag> ::= b|s|i|l|B|S|I|L
<EnumName> ::= `Typename'
<Enumerator> ::= ValueInHex `EnumeratorName'

Type tag of Adapted Struct

<StructTag> ::= {<StructName><StructField>*}
<StructName> ::= Typename
<StructField> ::= `FieldName' FieldTag

Serialized format

By default, serializable types are mapped to type tags, and serialized according to that type tag, as described below. User defined serializers are allowed to use different serialization schemas, not described here.

Type	Serialized format
Arithmetic types (`y,c,b,s,i,l,B,S,I,L,f,d,D`)	Serialized as if by memcpy
Array of `T`	4 bytes (host endian) size of the array, followed by the serialized array elements
Tuple of `T...`	Elements are serialized in order, without additional decoration
Variant of `T...`	1 byte discriminator, followed by the serialized active option
Adapted enum	Serialized as if by memcpy
Adapted user defined type	Members are serialized in order, without additional decoration
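
The following sketch writes a sequence and a nullable value according to the table above, using only the standard library. It is not the library's code: append_raw, append_sequence and append_nullable are hypothetical helpers. It also assumes that the variant discriminator is the zero-based index of the active option, so that for the tag <0i> the value 0 means empty and 1 means an int32_t follows; the table does not spell this out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Append the object representation of `value` to `out` (as if by memcpy).
template <typename T>
void append_raw(std::string& out, const T& value)
{
  char buf[sizeof(T)];
  std::memcpy(buf, &value, sizeof(T));
  out.append(buf, sizeof(T));
}

// Array of T (tag `[i`): 4 byte host endian size, then the elements.
inline void append_sequence(std::string& out, const std::vector<std::int32_t>& values)
{
  if (values.size() > std::numeric_limits<std::uint32_t>::max())
  {
    throw std::length_error("sequence too long for a 32 bit size field");
  }
  append_raw(out, static_cast<std::uint32_t>(values.size()));
  for (const std::int32_t v : values) { append_raw(out, v); }
}

// Variant (tag `<0i>`): 1 byte discriminator, then the active option.
// Assumption: discriminator 0 selects the empty state, which has no payload.
inline void append_nullable(std::string& out, const std::int32_t* value)
{
  append_raw(out, static_cast<std::uint8_t>(value != nullptr ? 1 : 0));
  if (value != nullptr) { append_raw(out, *value); }
}
```

A sequence of three 32 bit integers therefore occupies 4 + 3 * 4 = 16 bytes, with no padding or per-element decoration.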