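Suppose we want to apply certain transforming functions to the immer containers inside a large document type. The most straightforward way would be to create new containers with the new data by running the transforming function over each element. However, this approach has disadvantages: each new container is built independently, so structural sharing between the original containers is lost, and the transforming function runs again for every element that the originals shared.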
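The cost of the element-by-element approach can be made concrete with a minimal standard-library sketch. It does not use immer: `leaf_t`, `toy_vector` and `naive_transform` are hypothetical names, and a vector is modeled simply as a list of shared leaf nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy model: a "vector" is a list of shared, immutable leaves.
using leaf_t     = std::vector<int>;
using toy_vector = std::vector<std::shared_ptr<const leaf_t>>;

// Naive transformation: rebuilds every leaf of one vector from scratch.
// `calls` counts how many times the transforming function runs.
template <class Fn>
toy_vector naive_transform(const toy_vector& in, Fn fn, std::size_t& calls)
{
    toy_vector out;
    for (const auto& leaf : in) {
        leaf_t fresh;
        for (int x : *leaf) {
            fresh.push_back(fn(x));
            ++calls;
        }
        // A brand-new leaf is allocated even if `leaf` is shared elsewhere.
        out.push_back(std::make_shared<leaf_t>(std::move(fresh)));
    }
    return out;
}
```

If two toy vectors share a leaf, transforming both with `naive_transform` produces two equal but distinct leaves, and the function is called once per element per vector rather than once per stored element.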
Let’s consider a simple case using the document from the first example. The desired transformation is to multiply each element of the immer::vector<int> by 10.
First, the document value would be created in the same way:
const auto v1 = vector_one{1, 2, 3};
const auto v2 = v1.push_back(4).push_back(5).push_back(6);
const auto value = document{v1, v2};
The next component we need is the pools of all the containers from the value:
const auto pools = immer::persist::get_output_pools(value);
The get_output_pools function returns the output pools of all immer containers that would be serialized using pools, as controlled by the policy. Here we use the default policy, hana_struct_auto_policy, which uses pools for all immer containers inside the document type, which must be a hana::Struct.
The other required component is the conversion_map:
const auto conversion_map = hana::make_map(hana::make_pair(
    hana::type_c<vector_one>, [](int val) { return val * 10; }));
This is a hana::map that describes the desired transformations to be applied. Each key of the map is an immer container type and the corresponding value is the function to be applied to each element of containers of that type. In this case, it applies [](int val) { return val * 10; } to each int of the vector_one type; the document contains two such vectors.
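To illustrate the idea of a map whose keys are types rather than values, here is a small hand-rolled sketch in standard C++17. It is only a stand-in for hana::map and does not reflect how Boost.Hana is implemented; `entry`, `type_map`, `make_entry` and `lookup` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One entry: associates a container type `Key` with a callable `Fn`.
template <class Key, class Fn>
struct entry
{
    Fn fn;
};

// The map inherits from all of its entries, so a lookup by `Key` is
// resolved at compile time through a derived-to-base conversion.
template <class... Entries>
struct type_map : Entries...
{};

template <class... Entries>
type_map(Entries...) -> type_map<Entries...>;

template <class Key, class Fn>
entry<Key, Fn> make_entry(Fn fn)
{
    return {std::move(fn)};
}

// `Key` is given explicitly; `Fn` is deduced from the matching base class.
template <class Key, class Fn>
const Fn& lookup(const entry<Key, Fn>& e)
{
    return e.fn;
}
```

With this sketch, `lookup<std::vector<int>>(m)` selects the function registered for `std::vector<int>`, much like the conversion_map associates vector_one with its transforming lambda.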
Having these two parts, we can create new pools with the transformations:
auto transformed_pools =
    immer::persist::transform_output_pool(pools, conversion_map);
At this point, we can start converting the immer containers and create the transformed document value, new_value, from them:
const auto new_v1 =
    immer::persist::convert_container(pools, transformed_pools, v1);
const auto expected_new_v1 = vector_one{10, 20, 30};
REQUIRE(new_v1 == expected_new_v1);
const auto new_v2 =
    immer::persist::convert_container(pools, transformed_pools, v2);
const auto expected_new_v2 = vector_one{10, 20, 30, 40, 50, 60};
REQUIRE(new_v2 == expected_new_v2);
const auto new_value = document{new_v1, new_v2};
To confirm that structural sharing has been preserved after applying the transformations, let’s serialize new_value and inspect the JSON:
const auto policy =
    immer::persist::hana_struct_auto_member_name_policy(document{});
const auto str =
    immer::persist::cereal_save_with_pools(new_value, policy);
const auto expected_json = json_t::parse(R"(
{
  "pools": {
    "ints": {
      "B": 5,
      "BL": 1,
      "inners": [
        [0, {"children": [2], "relaxed": false}],
        [3, {"children": [2, 5], "relaxed": false}]
      ],
      "leaves": [[1, [30]], [2, [10, 20]], [4, [50, 60]], [5, [30, 40]]],
      "vectors": [{"root": 0, "tail": 1}, {"root": 3, "tail": 4}]
    }
  },
  "value0": {"ints": 0, "ints2": 1}
}
)");
REQUIRE(json_t::parse(str) == expected_json);
And indeed, we can see in the JSON that the node [2, [10, 20]] is reused in both vectors.
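Why a shared node survives the transformation can be illustrated with a toy model in standard C++. This is a sketch of the general idea (transform each node once and remember the result), not the actual implementation of immer::persist; `leaf_ptr` and `toy_transformed_pool` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

template <class T>
using leaf_ptr = std::shared_ptr<const std::vector<T>>;

// Hypothetical stand-in for a transformed pool: remembers which new leaf
// was produced for each old leaf. Old leaves must outlive the pool, since
// they are identified by address.
template <class From, class To>
struct toy_transformed_pool
{
    std::map<const std::vector<From>*, leaf_ptr<To>> done;

    template <class Fn>
    leaf_ptr<To> convert(const leaf_ptr<From>& old_leaf, Fn fn)
    {
        auto it = done.find(old_leaf.get());
        if (it != done.end())
            return it->second; // already transformed: reuse the new leaf
        auto fresh = std::make_shared<std::vector<To>>();
        for (const From& x : *old_leaf)
            fresh->push_back(fn(x));
        done.emplace(old_leaf.get(), fresh);
        return fresh;
    }
};
```

Converting the same old leaf twice returns the same new leaf, so two vectors that shared a node before the transformation still share one afterwards.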
The transforming function can even return a different type. In the following example, vector<int> is transformed into vector<std::string>. The first two steps are the same as in the previous example:
const auto v1 = vector_one{1, 2, 3};
const auto v2 = v1.push_back(4).push_back(5).push_back(6);
const auto value = document{v1, v2};
const auto pools = immer::persist::get_output_pools(value);
Only this time the transforming function will convert an integer into a string:
const auto conversion_map = hana::make_map(hana::make_pair(
    hana::type_c<vector_one>,
    [](int val) -> std::string { return fmt::format("_{}_", val); }));
Then we convert the two vectors the same way as before:
auto transformed_pools =
    immer::persist::transform_output_pool(pools, conversion_map);
const auto new_v1 =
    immer::persist::convert_container(pools, transformed_pools, v1);
const auto expected_new_v1 = vector_str{"_1_", "_2_", "_3_"};
REQUIRE(new_v1 == expected_new_v1);
const auto new_v2 =
    immer::persist::convert_container(pools, transformed_pools, v2);
const auto expected_new_v2 =
    vector_str{"_1_", "_2_", "_3_", "_4_", "_5_", "_6_"};
REQUIRE(new_v2 == expected_new_v2);
To confirm that structural sharing has been preserved, we can introduce a new document type in which both vectors are vector<std::string>:
namespace {
struct document_str
{
    vector_str str;
    vector_str str2;
    friend bool operator==(const document_str&, const document_str&) = default;
    template <class Archive>
    void serialize(Archive& ar)
    {
        ar(CEREAL_NVP(str), CEREAL_NVP(str2));
    }
};
} // namespace
BOOST_HANA_ADAPT_STRUCT(document_str, str, str2);
And serialize it with pools:
const auto new_value = document_str{new_v1, new_v2};
const auto policy =
    immer::persist::hana_struct_auto_member_name_policy(document_str{});
const auto str =
    immer::persist::cereal_save_with_pools(new_value, policy);
const auto expected_json = json_t::parse(R"(
{
  "pools": {
    "str": {
      "B": 5,
      "BL": 1,
      "inners": [
        [0, {"children": [2], "relaxed": false}],
        [3, {"children": [2, 5], "relaxed": false}]
      ],
      "leaves": [
        [1, ["_3_"]],
        [2, ["_1_", "_2_"]],
        [4, ["_5_", "_6_"]],
        [5, ["_3_", "_4_"]]
      ],
      "vectors": [{"root": 0, "tail": 1}, {"root": 3, "tail": 4}]
    }
  },
  "value0": {"str": 0, "str2": 1}
}
)");
REQUIRE(json_t::parse(str) == expected_json);
In the resulting JSON we can confirm that the node [2, ["_1_", "_2_"]] is reused for both vectors.
As shown above, converting vectors is conceptually simple: the transforming function is applied to each element of each node, producing a new node with the transformed elements. For the hash-based containers (set, map and table), the structure is determined by the hash function in use, so defining the transformation can be a bit more verbose.
In the following example, we’ll start with a simple case of transforming a map. For a map, only the hash of the key matters, and we will not modify the key yet. We focus on transformations here rather than on structural sharing within the document, so we use the immer container itself as the document. Let’s define the following policy to indicate that we want to use pools only for our container:
template <class Container>
struct direct_container_policy : immer::persist::value0_serialize_t
{
    auto get_pool_types(const auto&) const
    {
        return boost::hana::tuple_t<Container>;
    }
};
By default, immer uses std::hash for the hash-based containers. While this hash is sufficient for runtime use, it can’t be used for persistence, as noted in the C++ reference:
Note
Hash functions are only required to produce the same result for the same input within a single execution of a program.
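To show what a hash suitable for persistence looks like, here is a minimal sketch of a hash whose result is fully determined by its input and two fixed constants. It uses 64-bit FNV-1a purely as a simple stand-in; it is not xxHash and not what immer::persist uses, and `fnv1a_64` is a name chosen for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// FNV-1a, 64-bit. The result depends only on the bytes of `text`, so it is
// identical across program runs, unlike what std::hash guarantees.
constexpr std::uint64_t fnv1a_64(std::string_view text)
{
    std::uint64_t hash = 14695981039346656037ull; // FNV offset basis
    for (const char c : text) {
        hash ^= static_cast<unsigned char>(c);
        hash *= 1099511628211ull; // FNV prime
    }
    return hash;
}

// Hashing no bytes yields the offset basis, checked at compile time.
static_assert(fnv1a_64("") == 14695981039346656037ull);
```

Because the value is fixed by the algorithm, a container serialized in one run can be validated against the same hashes when it is loaded in another.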
We will use xxHash as the hash for this example. Let’s create a small map like this:
using int_map_t =
    immer::map<std::string, int, immer::persist::xx_hash<std::string>>;
const auto value = int_map_t{{"one", 1}, {"two", 2}};
const auto pools = immer::persist::get_output_pools(
    value, direct_container_policy<int_map_t>{});
Our goal is to convert the map’s values from int to std::string. Let’s create the conversion_map like this:
namespace hana = boost::hana;
using string_map_t = immer::
    map<std::string, std::string, immer::persist::xx_hash<std::string>>;
const auto conversion_map = hana::make_map(hana::make_pair(
    hana::type_c<int_map_t>,
    hana::overload(
        [](const std::pair<std::string, int>& item) {
            return std::make_pair(item.first,
                                  fmt::format("_{}_", item.second));
        },
        [](immer::persist::target_container_type_request) {
            return string_map_t{};
        })));
A few important details to note:
- The transforming function receives each item of the map as a std::pair<std::string, int>.
- The transforming function must also be callable with an argument of type immer::persist::target_container_type_request. This is achieved by using hana::overload to combine two lambdas into one callable value. When called with that argument, it should return an empty container of the type we’re transforming to. This explicit approach is necessary because there is no reliable way to automatically determine the hash algorithm for the new container. Even though in this case the type of the key doesn’t change (and so the hash remains the same), in other scenarios it might.
Once the conversion_map is defined, the actual conversion is done as before:
auto transformed_pools =
    immer::persist::transform_output_pool(pools, conversion_map);
const auto new_value =
    immer::persist::convert_container(pools, transformed_pools, value);
const auto expected_new = string_map_t{{"one", "_1_"}, {"two", "_2_"}};
REQUIRE(new_value == expected_new);
And we can see that the original map’s values have been transformed into strings.
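The conversion_map above relies on one callable that answers two different questions: how to convert an item, and which container type to produce. The same pattern can be sketched in standard C++17 with the well-known "overloaded" idiom. This is only an analogy to hana::overload; `toy_target_request` and `make_toy_conversion` are hypothetical names, and std::map stands in for the immer map.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical tag type, playing the role that
// immer::persist::target_container_type_request plays in the text.
struct toy_target_request
{};

// The "overloaded" idiom: inherit the call operators of all lambdas.
template <class... Fs>
struct overloaded : Fs...
{
    using Fs::operator()...;
};
template <class... Fs>
overloaded(Fs...) -> overloaded<Fs...>;

// One callable that both converts an item and names the target container.
inline auto make_toy_conversion()
{
    return overloaded{
        [](const std::pair<std::string, int>& item) {
            return std::make_pair(item.first,
                                  "_" + std::to_string(item.second) + "_");
        },
        [](toy_target_request) {
            return std::map<std::string, std::string>{};
        },
    };
}
```

Overload resolution picks the first lambda for a key-value pair and the second for the tag, so the caller can ask for an empty target container without knowing its type in advance.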
For this example, we’ll change the type of a table element’s ID while keeping its hash the same. This can occur, for instance, if the member that serves as the ID gets wrapped in a wrapper type.
To begin, let’s define an item type for a table:
struct old_item
{
    std::string id;
    int data;
    template <class Archive>
    void serialize(Archive& ar)
    {
        ar(CEREAL_NVP(id), CEREAL_NVP(data));
    }
};
We can create a table value with some data and get the pools for it like this:
using table_t = immer::table<old_item,
                             immer::table_key_fn,
                             immer::persist::xx_hash<std::string>>;
const auto value = table_t{old_item{"one", 1}, old_item{"two", 2}};
const auto pools = immer::persist::get_output_pools(
    value, direct_container_policy<table_t>{});
In this example, we want to change the type of the old_item's ID, which is std::string, while keeping its hash the same. Let’s define a wrapper for std::string and a new_item type like this:
struct new_id_t
{
    std::string id;
    friend bool operator==(const new_id_t&, const new_id_t&) = default;
    friend std::size_t xx_hash_value(const new_id_t& value)
    {
        return immer::persist::xx_hash<std::string>{}(value.id);
    }
};
struct new_item
{
    new_id_t id;
    std::string data;
    friend bool operator==(const new_item&, const new_item&) = default;
};
We’re also changing the type of data from int to std::string, but this change doesn’t affect the structure of the table. We define the xx_hash_value function for the new_id_t type to make it compatible with the immer::persist::xx_hash<new_id_t> hash. Then, we can define the target new_table_t type and the conversion_map that describes how to convert an old_item into a new_item:
using new_table_t = immer::
    table<new_item, immer::table_key_fn, immer::persist::xx_hash<new_id_t>>;
const auto conversion_map = hana::make_map(hana::make_pair(
    hana::type_c<table_t>,
    hana::overload(
        [](const old_item& item) {
            return new_item{
                .id   = new_id_t{item.id},
                .data = fmt::format("_{}_", item.data),
            };
        },
        [](immer::persist::target_container_type_request) {
            return new_table_t{};
        })));
Finally, to convert the value table using the defined conversion_map, we prepare the converted pools with transform_output_pool and use convert_container:
auto transformed_pools =
    immer::persist::transform_output_pool(pools, conversion_map);
const auto new_value =
    immer::persist::convert_container(pools, transformed_pools, value);
const auto expected_new =
    new_table_t{new_item{{"one"}, "_1_"}, new_item{{"two"}, "_2_"}};
REQUIRE(new_value == expected_new);
We can see that the new_value table contains the transformed data from the original value table.
If the key of a map, the ID of a table item or an element of a set changes its hash due to a transformation, the transformed hash-based container can no longer keep its shape and it can’t be efficiently transformed by simply applying transformations to its nodes.
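The link between a key's hash and the shape of a hash-based container can be sketched as follows. The assumptions here are loud ones: the trie is modeled as consuming the hash 5 bits per level, FNV-1a stands in for the real hash, and `toy_id`, `toy_hash_bytes`, `toy_hash_id` and `trie_path` are hypothetical names. This is not the layout code of immer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Deterministic stand-in hash (64-bit FNV-1a).
constexpr std::uint64_t toy_hash_bytes(std::string_view text)
{
    std::uint64_t hash = 14695981039346656037ull;
    for (const char c : text) {
        hash ^= static_cast<unsigned char>(c);
        hash *= 1099511628211ull;
    }
    return hash;
}

// A wrapper around the ID that delegates hashing to the wrapped string,
// so wrapping does not change the hash.
struct toy_id
{
    std::string id;
};
inline std::uint64_t toy_hash_id(const toy_id& value)
{
    return toy_hash_bytes(value.id);
}

// Assumed model: each trie level is indexed by the next 5 bits of the hash.
inline std::array<unsigned, 4> trie_path(std::uint64_t hash)
{
    std::array<unsigned, 4> path{};
    for (std::size_t level = 0; level < path.size(); ++level)
        path[level] = static_cast<unsigned>((hash >> (5 * level)) & 0x1f);
    return path;
}
```

In this model, an element whose hash is unchanged follows the same path and can stay in the same node. Once the key's bytes change, the hash and therefore the path will in general differ, so the old node layout is no longer valid for the new elements.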
immer::persist validates every container it creates from a pool. If such a hash modification occurs, a runtime exception is thrown, because the issue cannot be detected at compile time. Let’s modify the previous example to also change the data of the ID:
const auto conversion_map = hana::make_map(hana::make_pair(
    hana::type_c<table_t>,
    hana::overload(
        [](const old_item& item) {
            return new_item{
                // the ID's data is changed and its hash won't be the
                // same
                .id   = new_id_t{item.id + "_key"},
                .data = fmt::format("_{}_", item.data),
            };
        },
        [](immer::persist::target_container_type_request) {
            return new_table_t{};
        })));
Now, if we attempt to convert the original table, an immer::persist::champ::hash_validation_failed_exception will be thrown:
auto transformed_pools =
    immer::persist::transform_output_pool(pools, conversion_map);
REQUIRE_THROWS_AS(
    immer::persist::convert_container(pools, transformed_pools, value),
    immer::persist::champ::hash_validation_failed_exception);
Even though such a transformation can’t be performed efficiently at the node level, we can still request that it be applied. It then runs for each value of the original container, creating a new independent container that doesn’t use structural sharing:
const auto conversion_map = hana::make_map(hana::make_pair(
    hana::type_c<table_t>,
    hana::overload(
        [](const old_item& item) {
            return new_item{
                // the ID's data is changed and its hash won't be the
                // same
                .id   = new_id_t{item.id + "_key"},
                .data = fmt::format("_{}_", item.data),
            };
        },
        [](immer::persist::target_container_type_request) {
            // We know that the hash is changing and requesting to
            // transform in a less efficient manner
            return immer::persist::incompatible_hash_wrapper<
                new_table_t>{};
        })));
We request such a container-level (as opposed to per-node) transformation by wrapping the desired new container type, new_table_t, in an immer::persist::incompatible_hash_wrapper as the result of the immer::persist::target_container_type_request call.
auto transformed_pools =
    immer::persist::transform_output_pool(pools, conversion_map);
const auto new_value =
    immer::persist::convert_container(pools, transformed_pools, value);
const auto expected_new = new_table_t{new_item{{"one_key"}, "_1_"},
                                      new_item{{"two_key"}, "_2_"}};
REQUIRE(new_value == expected_new);
We can see that the transformation has been applied: the keys now have the _key suffix.
Note
While different transformed containers will not have structural sharing, transforming the same container multiple times will reuse previously transformed data. In other words, the transformation is cached at the container level but not at the node level.
const auto new_value_2 =
    immer::persist::convert_container(pools, transformed_pools, value);
REQUIRE(new_value_2.impl().root == new_value.impl().root);
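The difference between container-level and node-level caching can be sketched with a toy model in standard C++. It is an illustration under stated assumptions, not the immer::persist implementation: a container is modeled as a shared list of shared leaves, and `toy_container` and `toy_container_cache` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

template <class T>
using leaf_ptr = std::shared_ptr<const std::vector<T>>;
template <class T>
using toy_container = std::shared_ptr<const std::vector<leaf_ptr<T>>>;

// Caches one converted result per source container (identified by address).
// Leaves are always rebuilt, so nothing is shared between two different
// converted containers.
class toy_container_cache
{
public:
    toy_container<std::string> convert(const toy_container<int>& source)
    {
        auto it = cache_.find(source.get());
        if (it != cache_.end())
            return it->second; // same container again: reuse the result
        auto result = std::make_shared<std::vector<leaf_ptr<std::string>>>();
        for (const auto& old_leaf : *source) {
            auto fresh = std::make_shared<std::vector<std::string>>();
            for (int x : *old_leaf)
                fresh->push_back("_" + std::to_string(x) + "_");
            result->push_back(fresh);
        }
        cache_.emplace(source.get(), result);
        return result;
    }

private:
    std::map<const std::vector<leaf_ptr<int>>*, toy_container<std::string>>
        cache_;
};
```

Converting the same container twice yields the same object, while two different containers that share a leaf end up with separate, equal copies of it after conversion.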
Let’s consider a scenario where a transforming function works on an item within an immer container and also needs to transform another immer container. We define the types as follows:
struct nested_t
{
    vector_one ints;
    friend bool operator==(const nested_t&, const nested_t&) = default;
    template <class Archive>
    void serialize(Archive& ar)
    {
        ar(CEREAL_NVP(ints));
    }
};
struct with_nested_t
{
    immer::vector<nested_t> nested;
    friend bool operator==(const with_nested_t&,
                           const with_nested_t&) = default;
    template <class Archive>
    void serialize(Archive& ar)
    {
        ar(CEREAL_NVP(nested));
    }
};
The important property here is that we have a vector<nested_t> where nested_t contains a vector<int>, so a vector is nested inside another vector. We can prepare a value with some structural sharing and then serialize it:
const auto v1    = vector_one{1, 2, 3};
const auto v2    = v1.push_back(4).push_back(5).push_back(6);
const auto value = with_nested_t{
    .nested =
        {
            nested_t{.ints = v1},
            nested_t{.ints = v2},
        },
};
const auto policy =
    immer::persist::hana_struct_auto_member_name_policy(with_nested_t{});
const auto str = immer::persist::cereal_save_with_pools(value, policy);
The resulting JSON looks like:
const auto expected_json = json_t::parse(R"(
{
  "pools": {
    "ints": {
      "B": 5,
      "BL": 1,
      "inners": [
        [0, {"children": [2], "relaxed": false}],
        [3, {"children": [2, 5], "relaxed": false}]
      ],
      "leaves": [[1, [3]], [2, [1, 2]], [4, [5, 6]], [5, [3, 4]]],
      "vectors": [{"root": 0, "tail": 1}, {"root": 3, "tail": 4}]
    },
    "nested": {
      "B": 5,
      "BL": 3,
      "inners": [[0, {"children": [], "relaxed": false}]],
      "leaves": [[1, [{"ints": 0}, {"ints": 1}]]],
      "vectors": [{"root": 0, "tail": 1}]
    }
  },
  "value0": {"nested": 0}
}
)");
Looking at the JSON we can confirm that the node [2, [1, 2]] is reused.
Let’s define a conversion_map like this:
const auto conversion_map = hana::make_map(
    hana::make_pair(
        hana::type_c<vector_one>,
        [](int val) -> std::string { return fmt::format("_{}_", val); }),
    hana::make_pair(
        hana::type_c<immer::vector<nested_t>>,
        [](const nested_t& item, const auto& convert_container) {
            return new_nested_t{
                .str =
                    convert_container(hana::type_c<vector_str>, item.ints),
            };
        }));
The transforming function for vector_one is simple: it transforms an int into a std::string. The function for vector<nested_t> is more involved. When we transform one item of that vector, a nested_t, we find a vector<int> inside it that needs converting too, which brings us back to the problems described at the beginning of the transformations-with-pools section. To solve this, immer::persist passes an optional second argument to the transforming function, a function called convert_container. It is called with two arguments: the desired container type and the immer container to convert. It applies the conversion_map we’re defining to the nested container, so that transformation is also performed using pools and preserves structural sharing as expected. Here new_nested_t is the transformed counterpart of nested_t, with a vector_str member str in place of ints.
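How a library can make such a second argument optional can be sketched in standard C++17: check at compile time whether the user's callable accepts it, and pass it only then. This is one possible technique, shown as an assumption; immer::persist may detect the extra parameter differently, and `call_with_optional_helper` is a hypothetical name.

```cpp
#include <cassert>
#include <type_traits>

// Calls `fn(item, helper)` when the callable accepts a second argument,
// and falls back to `fn(item)` otherwise.
template <class Fn, class Item, class Helper>
auto call_with_optional_helper(const Fn& fn,
                               const Item& item,
                               const Helper& helper)
{
    if constexpr (std::is_invocable_v<const Fn&, const Item&, const Helper&>) {
        return fn(item, helper);
    } else {
        return fn(item);
    }
}
```

A one-parameter lambda such as `[](int x) { return x + 1; }` is called without the helper, while a two-parameter lambda such as `[](int x, const auto& convert) { return convert(x) + 1; }` receives it and can use it to convert nested data.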
Having defined the conversion_map, we apply it in the usual way and get the new_value:
const auto pools = immer::persist::get_output_pools(value, policy);
auto transformed_pools =
    immer::persist::transform_output_pool(pools, conversion_map);
const auto new_value = with_new_nested_t{
    .nested = immer::persist::convert_container(
        pools, transformed_pools, value.nested),
};
We can verify that the new_value has the expected content:
const auto expected_new = with_new_nested_t{
    .nested =
        {
            new_nested_t{.str = {"_1_", "_2_", "_3_"}},
            new_nested_t{.str = {"_1_", "_2_", "_3_", "_4_", "_5_", "_6_"}},
        },
};
REQUIRE(new_value == expected_new);
And we can serialize it again to confirm that the structural sharing of the nested vectors has been preserved:
const auto transformed_str = immer::persist::cereal_save_with_pools(
    new_value,
    immer::persist::hana_struct_auto_member_name_policy(
        with_new_nested_t{}));
const auto expected_transformed_json = json_t::parse(R"(
{
  "pools": {
    "nested": {
      "B": 5,
      "BL": 3,
      "inners": [[0, {"children": [], "relaxed": false}]],
      "leaves": [[1, [{"str": 0}, {"str": 1}]]],
      "vectors": [{"root": 0, "tail": 1}]
    },
    "str": {
      "B": 5,
      "BL": 1,
      "inners": [
        [0, {"children": [2], "relaxed": false}],
        [3, {"children": [2, 5], "relaxed": false}]
      ],
      "leaves": [
        [1, ["_3_"]],
        [2, ["_1_", "_2_"]],
        [4, ["_5_", "_6_"]],
        [5, ["_3_", "_4_"]]
      ],
      "vectors": [{"root": 0, "tail": 1}, {"root": 3, "tail": 4}]
    }
  },
  "value0": {"nested": 0}
}
)");
We can see that the [2, ["_1_", "_2_"]] node is still being reused in the two vectors.