A fast, platform-dependent serialization library.
Mserialize supports the serialization and deserialization of values, represented by objects of supported types, and the visitation of serialized values. The primary goal of the library is to make serialization as fast as possible, while allowing correct deserialization and visitation. To achieve this goal, Mserialize adds very little overhead, if any, to the serialized values. Most of the time, the serialized format matches the original value, byte by byte. The format is not intended to be universal: it does not serve the purpose of exchanging messages between different platforms (e.g: between platforms of different endianness).
Objects of supported types can be easily serialized:
#include <mserialize/serialize.hpp>
#include <fstream>
const T my_value;
std::ofstream ostream(path);
mserialize::serialize(my_value, ostream);
The supported types (T) for serialization include:

- Arithmetic types (bool, char, integer and floating point types)
- Strings (const char*, string, string_view)
- Containers with a supported value_type
- Pointers (T*, unique_ptr, shared_ptr, weak_ptr, with a supported elem_type)

Additional types can be adapted to the library, as shown below.
The target ostream can be anything that models the OutputStream concept.
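
As an illustration of what modeling OutputStream can look like, the sketch below defines a hypothetical in-memory sink. The name VectorOutputStream is made up for this example and is not part of Mserialize. Its write member appends the given bytes to a std::vector<char> and returns a reference to the stream, which is what the OutputStream concept listed later in this document asks for. Assuming that listing is accurate, an object of this type could be passed to mserialize::serialize in place of a std::ofstream.

```cpp
#include <cassert>
#include <ios>      // std::streamsize
#include <vector>

// Hypothetical in-memory sink; the name is made up for this example.
struct VectorOutputStream
{
  std::vector<char> bytes;

  // Append `size` bytes from `buf` to the stream and return the stream.
  VectorOutputStream& write(const char* buf, std::streamsize size)
  {
    bytes.insert(bytes.end(), buf, buf + size);
    return *this;
  }
};

// Exercise the sink directly, without involving the library.
inline std::vector<char> write_demo()
{
  VectorOutputStream out;
  out.write("abc", 3).write("de", 2);
  return out.bytes;
}
```

A sink like this can be handy in tests, where the serialized bytes need to be inspected without touching the file system.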
Deserialization of serialized objects is equally easy:
#include <mserialize/deserialize.hpp>
#include <fstream>
T my_value;
std::ifstream istream(path);
istream.exceptions(std::ios_base::failbit);
mserialize::deserialize(my_value, istream);
Almost any type that can be serialized can also be deserialized into.
The exceptions are types that do not own the underlying resource (e.g: T*, string_view, weak_ptr);
see Design Rationale for details.
The source istream can be anything that models the InputStream concept.
That concept requires read to throw on failure.
Therefore, standard streams must be configured to throw exceptions on failure.
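
The effect of that configuration can be observed with the standard library alone. In the sketch below, read_throws is a helper name made up for this example. It sets the failbit exception mask on a std::istringstream, as the snippet above does for std::ifstream, and reports whether reading n bytes throws.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <ios>
#include <sstream>
#include <string>

// Returns true if reading `n` bytes from a stream holding `content` throws.
// Hypothetical helper, not part of Mserialize.
inline bool read_throws(const std::string& content, std::streamsize n)
{
  std::istringstream in(content);
  in.exceptions(std::ios_base::failbit); // throw when a read fails

  std::string buf(static_cast<std::size_t>(n), '\0');
  try
  {
    in.read(&buf[0], n);
  }
  catch (const std::exception&)
  {
    return true; // not enough bytes were available
  }
  return false;
}
```

Without the call to exceptions, the short read would only set the failbit, and the caller would continue with a partially filled buffer.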
The type of the value in the input stream is inferred from the destination (first) argument of deserialize.
It is only allowed to deserialize a value into an object of a compatible type.
Compatibility is defined in terms of type tags.
Two types are compatible if their type tags match.
If the deserialization fails (e.g: if the content of the stream cannot be interpreted
as a serialized value of the type inferred from the destination), deserialize throws std::runtime_error.
By default, serialization and deserialization of generic user defined types are not supported (unless they meet the requirements of some supported concept, e.g: a container). Such types can be still adapted though, by specializing extension points for these types. The easiest way is to use macros:
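#include <mserialize/deserialize.hpp>
#include <mserialize/make_struct_deserializable.hpp>
#include <mserialize/make_struct_serializable.hpp>
#include <mserialize/serialize.hpp>
#include <cassert>
#include <sstream>
#include <string>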
// Given a custom type:
struct Alpha { int a = 0; std::string b; };
// Serialization and deserialization can be enabled by macros:
MSERIALIZE_MAKE_STRUCT_SERIALIZABLE(Alpha, a, b)
MSERIALIZE_MAKE_STRUCT_DESERIALIZABLE(Alpha, a, b)
// At this point, objects of `Alpha` can be used
// together with mserialize::serialize and deserialize,
// the same way as by-default supported objects.
const Alpha in{30, "foo"};
std::stringstream stream;
mserialize::serialize(in, stream);
Alpha out;
stream.exceptions(std::ios_base::failbit);
mserialize::deserialize(out, stream);
assert(in.a == out.a && in.b == out.b);
The macros have to be invoked at the global scope, outside of any namespace. The first argument is the name of the type, the rest are the members (the member list can be empty). The member list does not have to enumerate every member of the given type: if a member is omitted, it will be simply ignored during serialization/deserialization (e.g: a mutex member is typically not to be serialized). However, for a round trip to work, the member lists given to the two macros must match exactly.
For serialization, a member can be either a non-static, non-reference, non-bitfield data member, or a getter, which is a const qualified, nullary member function, which returns a serializable object.
For deserialization, a member can be either a non-const, non-static, non-reference, non-bitfield data member, or a setter, which takes a single, deserializable argument.
// Given a custom type with getters and setters:
class Beta
{
std::string c;
float d;
public:
const std::string& getC() const;
void setC(std::string);
float getD() const;
void setD(float);
};
// Serialization and deserialization can be enabled the same way:
MSERIALIZE_MAKE_STRUCT_SERIALIZABLE(Beta, getC, getD)
MSERIALIZE_MAKE_STRUCT_DESERIALIZABLE(Beta, setC, setD)
If some of the data members, getters or setters are private, but serialization or deserialization is still preferred via those members, the following friend declarations can be added to the type:
class Gamma
{
std::string e; // private data member
int f() const; // private getter
void f(int); // private setter
template <typename, typename>
friend struct mserialize::CustomSerializer;
template <typename, typename>
friend struct mserialize::CustomDeserializer;
};
If a type publicly derives from serializable or deserializable bases, it can be made serializable and deserializable without repeating the fields of its bases:
#include <mserialize/make_derived_struct_deserializable.hpp>
#include <mserialize/make_derived_struct_serializable.hpp>
struct Zeta : Beta { int e = 0; };
MSERIALIZE_MAKE_DERIVED_STRUCT_SERIALIZABLE(Zeta, (Beta), e)
MSERIALIZE_MAKE_DERIVED_STRUCT_DESERIALIZABLE(Zeta, (Beta), e)
The same rules apply as above, with the addition that the second argument must be a non-empty parenthesised list of serializable or deserializable base classes.
Class templates can be made serializable and deserializable on the same conditions, except that a different macro must be called:
#include <mserialize/make_template_deserializable.hpp>
#include <mserialize/make_template_serializable.hpp>
template <typename A, typename B>
struct Pair { A a; B b; };
MSERIALIZE_MAKE_TEMPLATE_SERIALIZABLE((typename A, typename B), (Pair<A,B>), a, b)
MSERIALIZE_MAKE_TEMPLATE_DESERIALIZABLE((typename A, typename B), (Pair<A,B>), a, b)
The first argument of the macro must be the arguments of the template, with the necessary typename prefix, where needed, as they appear after the template keyword in the definition, wrapped by parentheses. (The parentheses are required to avoid the preprocessor splitting the arguments at the commas)
The second argument is the template name with the template arguments, as it should appear in a specialization, wrapped by parentheses. The rest of the arguments are members, same as above.
As an alternative to deserialization, serialized objects can be visited. Visitation is useful if the precise type of the serialized object is not known, the type is not available, or not deserializable.
While the precise type of the serialized object is not needed, a type tag still must be available for visitation to work. A type tag is a string, that describes a serializable type to the extent that it can be visited.
The following example shows how serialization and visitation can work together:
#include <mserialize/serialize.hpp>
#include <mserialize/tag.hpp>
// serialize a T object
const T t;
const auto tag = mserialize::tag<T>();
std::ofstream ostream(path);
mserialize::serialize(tag, ostream);
mserialize::serialize(t, ostream);
#include <mserialize/deserialize.hpp>
#include <mserialize/visit.hpp>
// visit the object
std::ifstream istream(path);
istream.exceptions(std::ios_base::failbit);
std::string tag;
mserialize::deserialize(tag, istream);
Visitor visitor;
mserialize::visit(tag, visitor, istream);
Visitor can be any type that models the Visitor concept.
visit throws std::exception if the visitation fails
(e.g: the provided tag does not match the serialized object in the stream).
In the example, the tag is serialized alongside the object.
In general, the tag is not required to be in the stream:
it can be sent to the visiting party by any other means.
The tag given to visit must be a valid type tag:
do not use tags coming from a potentially malicious source.
By default, enums have no tag associated. A tag, suitable for visitation can be defined in the following way:
#include <mserialize/make_enum_tag.hpp>
enum Delta { a, b, c };
MSERIALIZE_MAKE_ENUM_TAG(Delta, a, b, c)
This works with both enums and enum classes, regardless of the underlying type of the enum. The macro has to be called in global scope (outside of any namespace). If an enumerator is omitted from the macro call, the tag will be incomplete: if the missing enumerator is encountered during visitation, only its underlying value will be available, and the enumerator name will be empty.
By default, user defined types generally have no tag associated. (Generally, because any type modeling a supported concept, e.g: a user defined container, does have a tag associated by default.) A tag suitable for visitation can be defined in the following way:
#include <mserialize/make_struct_tag.hpp>
struct Epsilon { int a; std::string b; };
MSERIALIZE_MAKE_STRUCT_TAG(Epsilon, a, b)
The macro has to be called in global scope (outside of any namespace). The members can be data members or getters, just like for serialization. For private members, the following friend declaration can be added:
template <typename, typename>
friend struct mserialize::CustomTag;
The member list must be in sync with the MSERIALIZE_MAKE_STRUCT_SERIALIZABLE call,
if visitation of objects serialized that way is desired.
MSERIALIZE_MAKE_STRUCT_TAG cannot be used with recursive types.
See Adapting user defined recursive types for visitation for a solution.
A tag can be assigned to a class that derives from a base or bases that are tagged already, without enumerating the members of the bases again:
#include <mserialize/make_derived_struct_tag.hpp>
MSERIALIZE_MAKE_DERIVED_STRUCT_TAG(Zeta, (Beta), e)
The same rules apply as above, with the addition that the second argument must be a non-empty parenthesised list of tagged base classes.
A tag can be assigned to class templates on the same conditions, except that a different macro must be called:
#include <mserialize/make_template_tag.hpp>
template <typename A, typename B, typename C>
struct Triplet { A a; B b; C c; };
MSERIALIZE_MAKE_TEMPLATE_TAG((typename A, typename B, typename C), (Triplet<A,B,C>), a, b, c)
The first argument of the macro must be the arguments of the template, with the necessary typename prefix, where needed, as they appear after the template keyword in the definition, wrapped by parentheses. (The parentheses are required to avoid the preprocessor splitting the arguments at the commas)
The second argument is the template name with the template arguments, as it should appear in a specialization, wrapped by parentheses. The rest of the arguments are members, same as above.
From the tag generation point of view, a structure is recursive
if one of its fields has a type tag that includes the type tag of the parent type.
Currently, MSERIALIZE_MAKE_STRUCT_TAG is unable to deal with such recursive structures.
As a workaround, such type tags can be manually assigned:
#include <mserialize/tag.hpp>
struct Node { int value; Node* next; };
namespace mserialize {
template <>
struct CustomTag<Node>
{
static constexpr auto tag_string()
{
return make_cx_string("{Node`value'i`next'<0{Node}>}");
}
};
} // namespace mserialize
A breakdown of the string literal:

- { : Start structure tag
- Node : Name of the structure
- `value' : Name of the first field
- i : Type tag of the first field. Also see the type tag reference
- `next' : Name of the second field
- < : Begin variant tag (pointers are modeled as either nothing or something)
- 0 : The pointer is either null
- {Node} : or points to a Node object. This is the important part: the structure definition here is not expanded again, as that would result in infinite recursion. The visitor will recognize that Node is not an empty type, but something defined earlier.
- > : End variant tag
- } : End structure tag

To keep the implementation and interface simple, values cannot be deserialized
into objects that do not own the underlying resource, e.g: T* or string_view.
See Design Rationale for considered alternatives.
In the Serialized format the size of a serialized sequence is represented by a 32 bit unsigned integer. Therefore, sequences with more than 2^32 - 1 elements cannot be serialized. As a workaround, such sequences can be split into a sequence of smaller sequences.
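
The splitting workaround can be sketched with the standard library alone. split_into_chunks below is a hypothetical helper, not part of Mserialize. It turns one vector into a vector of vectors, each holding at most max_chunk elements. In real use max_chunk would be chosen at or below the 32 bit size limit; a small value is enough to show the behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Split `input` into consecutive chunks of at most `max_chunk` elements.
// Hypothetical helper, not part of Mserialize.
template <typename T>
std::vector<std::vector<T>> split_into_chunks(const std::vector<T>& input, std::size_t max_chunk)
{
  assert(max_chunk > 0);
  std::vector<std::vector<T>> chunks;
  for (std::size_t i = 0; i < input.size(); i += max_chunk)
  {
    const std::size_t end = std::min(input.size(), i + max_chunk);
    chunks.emplace_back(
      input.begin() + static_cast<std::ptrdiff_t>(i),
      input.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return chunks;
}
```

The result is itself a sequence of sequences, so each inner sequence carries its own size prefix in the serialized form.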
In the Serialized format the discriminator of a variant is represented by an 8 bit unsigned integer. Therefore, variants with more than 256 alternatives cannot be serialized. As a workaround, such variants can be split into a variant of smaller variants.
Macros taking an arbitrary number of arguments (e.g: member lists, enumerators) need to iterate over the given arguments. The iteration is done by loop unrolling, which is currently capped at 100. This limit can be increased by regenerating foreach.hpp, but MSVC does not support macros with more than 127 arguments.
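
To show what iteration by loop unrolling means for a preprocessor macro, here is a minimal sketch capped at three arguments. All DEMO_ names are made up for this example; this is not the content of foreach.hpp, only the general technique, and it assumes a conforming preprocessor such as the ones in GCC and Clang.

```cpp
#include <cassert>
#include <string>

// One macro per argument count: each applies F to the first argument
// and hands the rest to the next smaller macro.
#define DEMO_FOREACH_1(F, a)      F(a)
#define DEMO_FOREACH_2(F, a, ...) F(a) DEMO_FOREACH_1(F, __VA_ARGS__)
#define DEMO_FOREACH_3(F, a, ...) F(a) DEMO_FOREACH_2(F, __VA_ARGS__)

// Select the DEMO_FOREACH_N that matches the number of arguments.
#define DEMO_PICK(_1, _2, _3, NAME, ...) NAME
#define DEMO_FOREACH(F, ...) \
  DEMO_PICK(__VA_ARGS__, DEMO_FOREACH_3, DEMO_FOREACH_2, DEMO_FOREACH_1, unused)(F, __VA_ARGS__)

// The operation applied to each argument: append its spelling to `names`.
#define DEMO_APPEND_NAME(member) names += #member; names += ';';

inline std::string member_names()
{
  std::string names;
  DEMO_FOREACH(DEMO_APPEND_NAME, a, b, c)
  return names;
}
```

Raising the cap in such a scheme means adding one more DEMO_FOREACH_N and one more slot in DEMO_PICK for each extra argument, which is why a generated header is a natural fit.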
Library design tends to be arguable. Some decisions need to be explained.
How to signal errors when deserializing?

- Leave the stream in a bad state. This is common practice in standard library components, but does not give enough context about the nature of the error.
- Set an error_code. This requires the ec to be propagated through every deserialization layer (which might or might not be good), and also requires several extra checks (to stop if the ec is set). As a deserialization error is considered exceptional, the nominal case should not be penalized with extra checks, which can be avoided with exceptions.
- Throw an exception. This can provide enough context and is fast if there are no errors, but requires extra care. This is the chosen solution. The type of the exception should be std::runtime_error, but on platforms using the pre-C++11 ABI, std::ios_base::failure (thrown by streams) is not derived from std::runtime_error, therefore std::exception must be used.

How to deserialize non-owning types? Let's consider T*:

- Simply allocate a T object on the heap, assign its address to the target pointer, and expect that the user will properly delete it later. This solution is simple, but hard to get right, especially with complicated structures.
- Provide an overload that takes a memory manager. This solution is memory safe and allows a wider range of types to be deserialized, but requires the introduction of yet another concept, with further complexity.
- Do not allow direct deserialization of such types. This sharp solution is simple to implement, but it restricts some common types (e.g: string_view), and prevents the user from using the same type on both ends. Because of its simplicity, this is the chosen solution.

What should the type tag of user defined types look like?

- Type tags of user defined types should be shallow, e.g: {Person}, and the complete definition of the type has to be supplied via yet another side channel. This approach diverges from the original meaning of type tag (as the shallow tag on its own doesn't allow visitation), and puts additional load on the user. On the other hand, it is easy to implement, even for recursive structures.
- Type tags should always describe the complete type, e.g: {Person`age'i`name'[c}, and allow automatic generation of tags for recursive structures. This is a pure approach that fits nicely with the original concept of type tags. However, it is difficult to implement (in an efficient constexpr fashion) if recursive (including mutually recursive) types need to be supported.
- Type tags should always describe the complete type, e.g: {Person`age'i`name'[c}, and disallow automatic generation of tags for recursive structures. A pure approach with some restriction. It remains easy to use, while allowing clients to use more difficult ways if visitation of recursive structures is needed. This is the chosen solution.

Split making types serializable and generation of tags or not?

- Combining them leads to slightly smaller source code, but ties the requirements and usage together.
- Separating serialization and tag generation logic aligns with the only pay for what you use principle. It allows the two to have different requirements (e.g: whether recursive types are allowed), and deserializer programs to inspect tags without pulling in the serializer logic. On the other hand, the separate specialization logic needs slightly more code. This is the chosen solution.

template <typename OutStr>
concept OutputStream = requires(OutStr ostream, const char* buf, std::streamsize size)
{
// Append `size` bytes from `buf` to the stream
{ ostream.write(buf, size) } -> OutStr&;
};
template <typename InpStr>
concept InputStream = requires(InpStr istream, char* buf, std::streamsize size)
{
// Consume `size` bytes from the stream and copy them to `buf`.
// Throw std::exception on failure (e.g: not enough bytes available)
{ istream.read(buf, size) } -> InpStr&;
};
template <typename V, typename InputStream>
concept Visitor = requires(V visitor)
{
visitor.visit(bool );
visitor.visit(char );
visitor.visit(std::int8_t );
visitor.visit(std::int16_t );
visitor.visit(std::int32_t );
visitor.visit(std::int64_t );
visitor.visit(std::uint8_t );
visitor.visit(std::uint16_t );
visitor.visit(std::uint32_t );
visitor.visit(std::uint64_t );
visitor.visit(float );
visitor.visit(double );
visitor.visit(long double );
visitor.visit(mserialize::Visitor::SequenceBegin, InputStream&) -> bool;
visitor.visit(mserialize::Visitor::SequenceEnd );
visitor.visit(mserialize::Visitor::String );
visitor.visit(mserialize::Visitor::TupleBegin, InputStream&) -> bool;
visitor.visit(mserialize::Visitor::TupleEnd );
visitor.visit(mserialize::Visitor::VariantBegin, InputStream&) -> bool;
visitor.visit(mserialize::Visitor::VariantEnd );
visitor.visit(mserialize::Visitor::Null );
visitor.visit(mserialize::Visitor::StructBegin, InputStream&) -> bool;
visitor.visit(mserialize::Visitor::StructEnd );
visitor.visit(mserialize::Visitor::FieldBegin );
visitor.visit(mserialize::Visitor::FieldEnd );
visitor.visit(mserialize::Visitor::Enum );
visitor.visit(mserialize::Visitor::RepeatBegin );
visitor.visit(mserialize::Visitor::RepeatEnd );
};
The table below describes the type tags of supported types.
In the first column, T refers to any supported type, and T... to any pack of supported types.
In the second column, t refers to the tag of T in the cell left of it, and t... to the
concatenated tags of the T... pack.

Type | Type Tag |
---|---|
bool | y |
char | c |
int8_t | b |
int16_t | s |
int32_t | i |
int64_t | l |
uint8_t | B |
uint16_t | S |
uint32_t | I |
uint64_t | L |
float | f |
double | d |
long double | D |
Array of T | [t |
Tuple of T... | (t...) |
Variant of T... | <t...> |
void (only to indicate empty state of a variant) | 0 |
Adapted enum E : T { a, b = 123, c} | /t`E'0`a'7B`b'7C`c'\ (see below) |
Adapted struct Foo { T1 a; T2 b; } | {Foo`a't1`b't2} (see below) |
<EnumTag> ::= /<UnderlyingTypeTag><EnumName><Enumerator>*\
<UnderlyingTypeTag> ::= b|s|i|l|B|S|I|L
<EnumName> ::= `Typename'
<Enumerator> ::= ValueInHex `EnumeratorName'
<StructTag> ::= {<StructName><StructField>*}
<StructName> ::= Typename
<StructField> ::= `FieldName' FieldTag
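
To make the struct tag layout concrete, the sketch below assembles a tag string at run time from a type name and a list of (field name, field tag) pairs. make_struct_tag_string is a hypothetical helper written for this illustration. The library generates tags at compile time through its macros, and nothing here reflects how it does so. The layout follows the {Person`age'i`name'[c} example that appears in this document.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Compose a struct tag from a type name and (field name, field tag) pairs.
// Hypothetical helper for illustration, not part of Mserialize.
inline std::string make_struct_tag_string(
  const std::string& name,
  const std::vector<std::pair<std::string, std::string>>& fields)
{
  std::string tag = "{" + name;
  for (const auto& field : fields)
  {
    // Field names are quoted with a backtick and an apostrophe,
    // and are directly followed by the tag of the field type.
    tag += "`" + field.first + "'" + field.second;
  }
  tag += "}";
  return tag;
}
```

Here the field tags come from the table above: i for int32_t, and [c for an array of char.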
By default, serializable types are mapped to type tags, and serialized according to that type tag, as described below. User defined serializers are allowed to use different serialization schemas, not described here.
Type | Serialized format |
---|---|
Arithmetic types (y,c,b,s,i,l,B,S,I,L,f,d,D) | Serialized as if by memcpy |
Array of T | 4 bytes (host endian) size of the array, followed by the serialized array elements |
Tuple of T... | Elements are serialized in order, without additional decoration |
Variant of T... | 1 byte discriminator, followed by the serialized active option |
Adapted enum | Serialized as if by memcpy |
Adapted user defined type | Members are serialized in order, without additional decoration |
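
The array row of the table can be reproduced by hand with memcpy. The sketch below is written against the table only and is not the library's implementation; append_bytes and encode_int32_array are names made up for this example. It lays out a sequence of int32_t values as a 4 byte host endian size followed by the elements.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Append the object representation of `value` to `out`, as if by memcpy.
template <typename T>
void append_bytes(std::string& out, const T& value)
{
  char raw[sizeof(T)];
  std::memcpy(raw, &value, sizeof(T));
  out.append(raw, sizeof(T));
}

// Lay out an array of int32_t as the table describes:
// a 4 byte host endian size, followed by the elements.
inline std::string encode_int32_array(const std::vector<std::int32_t>& values)
{
  std::string out;
  append_bytes(out, static_cast<std::uint32_t>(values.size()));
  for (const std::int32_t v : values) { append_bytes(out, v); }
  return out;
}
```

Because both the size and the elements are copied in host byte order, the resulting bytes differ between platforms of different endianness, which is the platform dependence mentioned at the top of this document.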