Persistence for the masses

RRB-Vectors in a systems language

Juanpe Bolívar
https://sinusoid.al

       

  …

Question

Can we implement
functional data structures
in a systems programming language?

Question

Can we implement
persistent vectors
in C++?

background

Radix Balanced Tree

$M=2^B$ $M=32$ $B=5$ $\left\lceil\log_{32}(2^{32})\right\rceil = 7$

Radix Balanced Search

v[17] 01 00 01 v.set(17, 'R')

Relaxed Radix Balanced Tree

Operations

$$\text{effective}\ O(1)$$
  • Random access
  • Update
  • Push back
  • Slicing
$$O(\log(n))$$
  • Concat
  • Insert
  • Push front

contributions

Embedding
Radix balanced tree

$$M=2^B$$ $$M_l=2^{B_l}$$
$$ B_l^{\mathit{opt}}(B, T) = \left\lfloor \log_2\left( \frac{\mathit{sizeof}_* \times 2^B} {\mathit{sizeof}_T} \right) \right\rfloor $$

incidental metadata

type tag, length position based navigation

template <
    typename HeapPolicy,       ⟵―― malloc_policy, gc_policy, ...
    typename RefCountPolicy,   ⟵―― no_refcount_policy, atomic_refcount_policy, ...
    typename TransiencePolicy, ⟵―― gc_transience_policy, ...
    ...
>
struct memory_policy;

vector<int> myiota(vector<int> v, int first, int last)
{
    for (auto i = first; i < last; ++i)
        v = v.push_back(i);
    return v;
}

vector<int> myiota(vector<int> v, int first, int last)
{
    auto t = v.transient();
    for (auto i = first; i < last; ++i)
        t.push_back(i);
    return t.persistent();
}
          

vector<int> myiota(vector<int> v,
                   int first,
                   int last)
{
    auto t = v.transient(); ⟵―― id: #
    for (auto i = first; i < last; ++i)
        t.push_back(i);
    return t.persistent();
}
          

vector<int> myiota(vector<int> v,
                   int first,
                   int last)
{
    auto t = v.transient();
    for (auto i = first; i < last; ++i)
        t.push_back(i);
    return t.persistent();
}
          
reference counting  ➡  anonymous objects are transients

vector<char> say_hi(vector<char> v)
{
    return v.push_back('h')        ⟵―― named value: v
    return move(v).push_back('h')  ⟵―― moved value
            .push_back('i')        ⟵―― anonymous value
            .push_back('!');       ⟵―― anonymous value
}
and they compose  ➡  say_hi(say_hi(...))
$$\oplus : \alpha_\tau \times \alpha_\tau \to \alpha_\tau$$
$$\begin{split} \oplus_l^{\bf !} &: \alpha_\tau^{\bf !} \times \alpha_\tau \to \alpha_\tau^{\bf !} \\ \oplus_r^{\bf !} &: \alpha_\tau \times \alpha_\tau^{\bf !} \to \alpha_\tau^{\bf !} \\ \oplus_b^{\bf !} &: \alpha_\tau^{\bf !} \times \alpha_\tau^{\bf !} \to \alpha_\tau^{\bf !} \end{split}$$
$${\bf function}\ \text{CAT!}(node_l, id_l, node_r, id_r, id_c, ...)$$


$$\begin{split} a_l \oplus_l^{\bf !} a_r &= \text{CAT!}(a_l, id(a_l), a_r, id_{noone}, id(a_l), ...)\\ a_l \oplus_r^{\bf !} a_r &= \text{CAT!}(a_l, id_{noone}, a_r, id(a_r), id(a_r), ...)\\ a_l \oplus_b^{\bf !} a_r &= \text{CAT!}(a_l, id(a_l), a_r, id(a_r), id_{choose}(a_l, a_r), \dots) \end{split}$$

Results

Question

Can we implement
persistent data structures
in a systems programming language?

very good performance


User controls trade-offs via policies

Embedding and transients are very effective


Iteration and appends batches similar to std::vector

only 1 X to 1.5 X overhead

Python and Guile bindings


1.5 X faster than other C libraries
github.com/arximboldi/ewig
https://github.com/arximboldi/immer
https://github.com/arximboldi/ewig
https://sinusoid.al
thanks.