Serializing your data structures using immer::persist allows you to
preserve the structural sharing across sessions of your application.
This has multiple practical use cases, like storing the undo history or the clipboard of a complex application, or applying advanced logging techniques.
The library serializes multiple containers together via the notion of a pool. Pools are produced automatically and represent, in the JSON, the internal tree structures that implement the immer containers.
For this example, we’ll use a document type that contains two immer vectors.
// Set the BL constant to 1, so that only 2 elements are stored in leaves.
// This lets us demonstrate structural sharing even in vectors with just a
// few elements.
using vector_one =
    immer::vector<int, immer::default_memory_policy, immer::default_bits, 1>;
struct document
{
    vector_one ints;
    vector_one ints2;
    friend bool operator==(const document&, const document&) = default;
    // Make the struct serializable with cereal as usual, nothing special
    // related to immer::persist.
    template <class Archive>
    void serialize(Archive& ar)
    {
        ar(CEREAL_NVP(ints), CEREAL_NVP(ints2));
    }
};
using json_t = nlohmann::json;
Let’s say we have two vectors v1 and v2, where v2 is derived from v1
so that it shares data with it:
const auto v1 = vector_one{1, 2, 3};
const auto v2 = v1.push_back(4).push_back(5).push_back(6);
const auto value = document{v1, v2};
We can serialize the document using cereal with this:
auto os = std::ostringstream{};
{
    auto ar = cereal::JSONOutputArchive{os};
    ar(value);
}
return os.str();
This generates JSON like the following:
{"value0": {"ints": [1, 2, 3], "ints2": [1, 2, 3, 4, 5, 6]}}
As you can see, ints and ints2 contain the full linearization of each
vector. The structural sharing between these two data structures is
not represented in their serialized form.
First, let’s make the document struct compatible with boost::hana.
This way, the persist library can automatically determine which pool
types are needed and how to name the pools.
BOOST_HANA_ADAPT_STRUCT(document, ints, ints2);
Then, using immer::persist, we can serialize it with:
const auto policy =
    immer::persist::hana_struct_auto_member_name_policy(document{});
const auto str = immer::persist::cereal_save_with_pools(value, policy);
This generates JSON like the following:
const auto expected_json = json_t::parse(R"(
{
  "value0": {"ints": 0, "ints2": 1},
  "pools": {
    "ints": {
      "B": 5,
      "BL": 1,
      "inners": [
        [0, {"children": [2], "relaxed": false}],
        [3, {"children": [2, 5], "relaxed": false}]
      ],
      "leaves": [[1, [3]], [2, [1, 2]], [4, [5, 6]], [5, [3, 4]]],
      "vectors": [{"root": 0, "tail": 1}, {"root": 3, "tail": 4}]
    }
  }
}
)");
As you can see, the value is serialized with every immer container
replaced by an identifier. This identifier is a key into a pool,
which is serialized just after.
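To make the role of the identifier concrete, here is a minimal sketch that resolves a container ID against the "ints" pool from the JSON above and rebuilds the flat list of elements. It uses only the standard library. The names pool_sketch, vector_entry, make_ints_pool, append_node and resolve are hypothetical and not part of immer::persist. The sketch assumes that node IDs are unique across leaves and inners (as they are in this JSON) and ignores the B, BL and relaxed fields, so it is a reading aid rather than a description of how the library actually loads a pool.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical entry mirroring one element of the "vectors" array.
struct vector_entry
{
    int root; // ID of the root node of the tree
    int tail; // ID of the tail leaf
};

// Hypothetical, simplified mirror of one pool from the JSON above.
// These are illustrative types, not immer::persist's own.
struct pool_sketch
{
    std::map<int, std::vector<int>> leaves; // node ID -> leaf elements
    std::map<int, std::vector<int>> inners; // node ID -> child node IDs
    std::vector<vector_entry> vectors;      // container ID -> root and tail
};

// The data of the "ints" pool, copied by hand from the JSON above.
inline pool_sketch make_ints_pool()
{
    pool_sketch p;
    p.leaves  = {{1, {3}}, {2, {1, 2}}, {4, {5, 6}}, {5, {3, 4}}};
    p.inners  = {{0, {2}}, {3, {2, 5}}};
    p.vectors = {{0, 1}, {3, 4}};
    return p;
}

// Append the elements below node `id`, visiting children left to right.
inline void append_node(const pool_sketch& p, int id, std::vector<int>& out)
{
    if (auto leaf = p.leaves.find(id); leaf != p.leaves.end()) {
        out.insert(out.end(), leaf->second.begin(), leaf->second.end());
        return;
    }
    for (int child : p.inners.at(id))
        append_node(p, child, out);
}

// Resolve a container ID to its elements: the tree first, then the tail.
inline std::vector<int> resolve(const pool_sketch& p, std::size_t container_id)
{
    const auto& entry = p.vectors.at(container_id);
    std::vector<int> out;
    append_node(p, entry.root, out);
    append_node(p, entry.tail, out);
    return out;
}
```

With this sketch, resolve(make_ints_pool(), 0) walks root 0 and tail 1, while resolve(make_ints_pool(), 1) walks root 3 and tail 4. Both walks pass through leaf 2, which is the node the two vectors share.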
Note
Currently, immer-persist makes a distinction between pools used for
saving containers (output pools) and for loading containers (input
pools), similar to cereal with its InputArchive and OutputArchive
distinction.
Currently, immer-persist focuses on JSON as the serialization format
and uses the cereal library internally. In principle, other formats
and serialization libraries could be supported in the future.
You can see in the output that the nodes of the trees that make up the
immer containers are directly represented in the JSON and, because
we are representing all the containers as a whole, those nodes that
are referenced in multiple trees can be stored only once. That same
structure is preserved when reading the pool back from disk and
reconstructing the vectors (and other containers) from it, thus
allowing us to preserve the structural sharing across sessions.
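The idea of storing a shared node only once can be sketched with the standard library alone. In the sketch below, a pool assigns one ID per distinct node by keying on the node's address, so a node reachable from two trees is recorded a single time. The types node and node_pool and the functions make_node and count_stored_nodes are hypothetical and much simpler than immer's internal nodes or the pools that immer::persist builds. They only illustrate the deduplication principle.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified tree node; NOT immer's internal node type.
struct node;
using node_ptr = std::shared_ptr<const node>;

struct node
{
    std::vector<int> values;        // elements, used by leaf nodes
    std::vector<node_ptr> children; // child nodes, used by inner nodes
};

inline node_ptr make_node(std::vector<int> values, std::vector<node_ptr> children)
{
    return std::make_shared<node>(node{std::move(values), std::move(children)});
}

// Hypothetical pool: every distinct node gets one ID and is stored once.
struct node_pool
{
    std::unordered_map<const node*, std::size_t> ids; // node address -> ID
    std::vector<node_ptr> nodes;                      // index == ID

    std::size_t add(const node_ptr& n)
    {
        if (auto it = ids.find(n.get()); it != ids.end())
            return it->second; // already stored: reuse the existing ID
        for (const auto& child : n->children)
            add(child); // store children before their parent
        const std::size_t id = nodes.size();
        ids.emplace(n.get(), id);
        nodes.push_back(n);
        return id;
    }
};

// Two trees that share one leaf, loosely mirroring v1 and v2 above.
inline std::size_t count_stored_nodes()
{
    const auto shared_leaf = make_node({1, 2}, {});
    const auto other_leaf  = make_node({3, 4}, {});
    const auto root1 = make_node({}, {shared_leaf});
    const auto root2 = make_node({}, {shared_leaf, other_leaf});

    node_pool pool;
    pool.add(root1);
    pool.add(root2);
    return pool.nodes.size();
}
```

The two trees hold five node references in total (two in the first tree, three in the second), yet count_stored_nodes() reports four stored nodes because the shared leaf is added once. Serializing each tree independently would write that leaf twice.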
We can use a policy to control the names of the pools for each container.
For this example, let’s define a new document type doc_2. It will
also contain another type, extra_data, with a vector of strings in it.
To demonstrate the responsibilities of the policy, the doc_2 type will
not be a boost::hana::Struct and will not allow for compile-time
reflection.
using vector_str = immer::
    vector<std::string, immer::default_memory_policy, immer::default_bits, 1>;
struct extra_data
{
    vector_str comments;
    friend bool operator==(const extra_data&, const extra_data&) = default;
    template <class Archive>
    void serialize(Archive& ar)
    {
        ar(CEREAL_NVP(comments));
    }
};
struct doc_2
{
    vector_one ints;
    vector_one ints2;
    vector_str strings;
    extra_data extra;
    friend bool operator==(const doc_2&, const doc_2&) = default;
    template <class Archive>
    void serialize(Archive& ar)
    {
        ar(CEREAL_NVP(ints),
           CEREAL_NVP(ints2),
           CEREAL_NVP(strings),
           CEREAL_NVP(extra));
    }
};
We define the doc_2_policy as follows:
struct doc_2_policy
{
    auto get_pool_types(const auto&) const
    {
        return boost::hana::tuple_t<vector_one, vector_str>;
    }
    template <class Archive>
    void save(Archive& ar, const doc_2& doc2_value) const
    {
        ar(CEREAL_NVP(doc2_value));
    }
    template <class Archive>
    void load(Archive& ar, doc_2& doc2_value) const
    {
        ar(CEREAL_NVP(doc2_value));
    }
    auto get_pool_name(const vector_one&) const { return "vector_of_ints"; }
    auto get_pool_name(const vector_str&) const { return "vector_of_strings"; }
};
The get_pool_types function returns the types of containers that
should be serialized with pools, in this case both the vector of ints
and the vector of strings. The save and load functions control the
name of the document node, in this case doc2_value. The overloaded
get_pool_name functions supply the name of the pool for each
corresponding immer container. To create and serialize a value of
doc_2, you can use the following approach:
const auto v1 = vector_one{1, 2, 3};
const auto v2 = v1.push_back(4).push_back(5).push_back(6);
const auto str1 = vector_str{"one", "two"};
const auto str2 =
    str1.push_back("three").push_back("four").push_back("five");
const auto value = doc_2{v1, v2, str1, extra_data{str2}};
const auto str =
    immer::persist::cereal_save_with_pools(value, doc_2_policy{});
The serialized JSON looks like this:
const auto expected_json = json_t::parse(R"(
{
  "doc2_value": {"ints": 0, "ints2": 1, "strings": 0, "extra": {"comments": 1}},
  "pools": {
    "vector_of_ints": {
      "B": 5,
      "BL": 1,
      "leaves": [[1, [3]], [2, [1, 2]], [4, [5, 6]], [5, [3, 4]]],
      "inners": [
        [0, {"children": [2], "relaxed": false}],
        [3, {"children": [2, 5], "relaxed": false}]
      ],
      "vectors": [{"root": 0, "tail": 1}, {"root": 3, "tail": 4}]
    },
    "vector_of_strings": {
      "B": 5,
      "BL": 1,
      "leaves": [[1, ["one", "two"]], [3, ["five"]], [4, ["three", "four"]]],
      "inners": [
        [0, {"children": [], "relaxed": false}],
        [2, {"children": [1, 4], "relaxed": false}]
      ],
      "vectors": [{"root": 0, "tail": 1}, {"root": 2, "tail": 3}]
    }
  }
}
)");
And it can also be loaded from JSON like this:
const auto loaded_value =
    immer::persist::cereal_load_with_pools<doc_2>(str, doc_2_policy{});
This example also demonstrates a scenario in which the main document
type doc_2 contains another type, extra_data, that itself holds a
vector. As you can see in the resulting JSON, nested types are also
serialized with pools: "extra": {"comments": 1}. Only the ID of the
comments vector is serialized, not its content.