\chapter{Generalizing the Framework} \begin{center} \emph{The following chapter is an adaptation of work completed in collaboration with Dr. Dong Xie and Dr. Zhuoyue Zhao and published in PVLDB Volume 17, Issue 11 (July 2024) under the title ``Towards Systematic Index Dynamization''. } \hrule \end{center} \label{chap:framework} \section{Introduction} In the previous chapter, we discussed how several of the limitations of dynamization could be overcome by proposing a systematic dynamization approach for sampling data structures. In doing so, we introduced a multi-stage query mechanism to overcome the non-decomposability of these queries, provided two mechanisms for supporting deletes along with specialized processing to integrate these with the query mechanism, and introduced some performance tuning capability inspired by the design space of modern LSM Trees. While promising, these results are highly specialized and remain useful only within the context of sampling queries. In this chapter, we develop new generalized query abstractions based on these specific results, and discuss a fully implemented framework based upon these abstractions. More specifically, in this chapter we propose \emph{extended decomposability} and \emph{iterative deletion decomposability} as two new, broader classes of search problem which are strict supersets of decomposability and deletion decomposability respectively, providing a more powerful interface to allow the efficient implementation of a larger set of search problems over a dynamized structure. We then implement a C++ library based upon these abstractions which is capable of adding support for inserts, deletes, and concurrency to static data structures automatically, and use it to provide dynamizations for independent range sampling, range queries with learned indices, string search with succinct tries, and high-dimensional vector search with metric indices. In each case, we compare our dynamized implementation with existing dynamic structures, and with standard Bentley-Saxe dynamizations where possible. \section{Beyond Decomposability} We begin our discussion of this generalized framework by proposing new classes of search problems based upon our results from examining sampling problems in the previous chapter. Our new classes will enable the support of new types of search problem, enable more efficient support for certain already supported problems, and allow for broader support of deletes. Based on this, we will develop a taxonomy of search problems that can be supported by our dynamization technique. \subsection{Extended Decomposability} \label{ssec:edsp} As discussed in Chapter~\ref{chap:background}, the standard query model used by dynamization techniques requires that a given query be broadcast, unaltered, to each block within the dynamized structure, and then that the results from these identical local queries be efficiently mergeable to obtain the final answer to the query. This model limits dynamization to decomposable search problems (Definition~\ref{def:dsp}). In the previous chapter, we considered various sampling problems as examples of non-decomposable search problems, and devised a technique for correctly answering queries of that type over a dynamized structure. In this section, we'll retread our steps with an eye towards a general solution that could be applied in other contexts. For convenience, we'll focus exclusively on independent range sampling.
As a reminder, this search problem is defined as, \begin{definitionIRS}[Independent Range Sampling~\cite{tao22}] Let $D$ be a set of $n$ points in $\mathbb{R}$. Given a query interval $q = [x, y]$ and an integer $k$, an independent range sampling query returns $k$ independent samples from $D \cap q$ with each point having equal probability of being sampled. \end{definitionIRS} We formalize this as a search problem $F_\text{IRS}:(\mathcal{D}, \mathcal{Q}) \to \mathcal{R}$ where the record domain is $\mathcal{D} = \mathbb{R}$, the query parameters domain consists of ordered triples containing the lower and upper bounds of the query interval, and the number of samples to draw, $\mathcal{Q} = \mathbb{R} \times \mathbb{R} \times \mathbb{Z}^+$, and the result domain contains subsets of the real numbers, $\mathcal{R} = \mathcal{PS}(\mathbb{R})$. $F_\text{IRS}$ can be solved using a variety of data structures, such as the static ISAM solution discussed in Section~\ref{ssec:irs-struct}. For our example here, we will use a simple sorted array. Let $\mathcal{I}$ be the sorted array data structure, with a specific instance $\mathscr{I} \in \mathcal{I}$ built over a set $D \subset \mathbb{R}$ having $|D| = n$ records. The problem $F_\text{IRS}(\mathscr{I}, (l, u, k))$ can be solved by binary searching $\mathscr{I}$ twice to obtain the index of the first element greater than or equal to $l$ ($i_l$) and the last element less than or equal to $u$ ($i_u$). With these two indices, $k$ random numbers can be generated on the interval $[i_l, i_u]$ and the records at these indices returned. This sampling procedure is described in Algorithm~\ref{alg:array-irs} and runs in $\mathscr{Q}_\text{irs} \in \Theta(\log n + k)$ time. \SetKwFunction{IRS}{IRS} \begin{algorithm} \caption{Solution to IRS on a sorted array} \label{alg:array-irs} \KwIn{$k$: sample size, $[l,u]$: lower and upper bound of records to sample} \KwOut{$S$: a sample set of size $k$} \Def{\IRS{$(\mathscr{I}, (l, u, k))$}}{ \Comment{Find the lower and upper bounds of the interval} $i_l \gets \text{binary\_search\_lb}(\mathscr{I}, l)$ \; $i_u \gets \text{binary\_search\_ub}(\mathscr{I}, u)$ \; \BlankLine \Comment{Initialize empty sample set} $S \gets \{\}$ \; \BlankLine \For {$i=1\ldots k$} { \Comment{Select a random record within the interval} $i_r \gets \text{randint}(i_l, i_u)$ \; \Comment{Add it to the sample set} $S \gets S \cup \{\text{get}(\mathscr{I}, i_r)\}$ \; } \BlankLine \Comment{Return the sample set} \Return $S$ \; } \end{algorithm} It becomes more difficult to answer $F_\text{IRS}$ over a data structure that has been decomposed into blocks, because the number of samples taken from each block must be appropriately weighted to correspond to the number of records within each block falling into the query range. In the classical model, there isn't a way to do this, and so the only solution is to answer $F_\text{IRS}$ against each block, asking for the full $k$ samples each time, and then down-sampling the results according to the relative weight of each block to obtain a final sample set. Using this idea, we can formulate $F_\text{IRS}$ as a $C(n)$-decomposable problem by changing the result set type to $\mathcal{R} = \mathcal{PS}(\mathbb{R}) \times \mathbb{R}$ where the first element in the tuple is the sample set and the second is the number of elements falling between $l$ and $u$ in the block being sampled from.
With this information, it is possible to implement $\mergeop$ using Bernoulli sampling over the two sample sets to be merged. This requires $\Theta(k)$ time, and thus $F_\text{IRS}$ can be said to be a $k$-decomposable search problem, allowing it to be answered over a Bentley-Saxe dynamization in $\Theta(\log^2 n + k \log n)$ time. This procedure is shown in Algorithm~\ref{alg:decomp-irs}. \SetKwFunction{IRSDecomp}{IRSDecomp} \SetKwFunction{IRSCombine}{IRSCombine} \begin{algorithm}[!h] \caption{$k$-Decomposable Independent Range Sampling} \label{alg:decomp-irs} \KwIn{$k$: sample size, $[l,u]$: lower and upper bound of records to sample} \KwOut{$(S, c)$: a sample set of size $k$ and a count of the number of records on the interval $[l,u]$} \Def{\IRSDecomp{$\mathscr{I}_i, (l, u, k)$}}{ \Comment{Find the lower and upper bounds of the interval} $i_l \gets \text{binary\_search\_lb}(\mathscr{I}_i, l)$ \; $i_u \gets \text{binary\_search\_ub}(\mathscr{I}_i, u)$ \; \BlankLine \Comment{Initialize empty sample set} $S \gets \{\}$ \; \BlankLine \For {$i=1\ldots k$} { \Comment{Select a random record within the interval} $i_r \gets \text{randint}(i_l, i_u)$ \; \Comment{Add it to the sample set} $S \gets S \cup \{\text{get}(\mathscr{I}_i, i_r)\}$ \; } \BlankLine \Comment{Return the sample set and record count} \Return ($S$, $i_u - i_l$) \; } \BlankLine \Def{\IRSCombine{$(S_1, c_1)$, $(S_2, c_2)$}}{ \Comment{The output set should be the same size as the input ones} $k \gets |S_1|$ \; \BlankLine \Comment{Calculate the weighting that should be applied to each set when sampling} $w_1 \gets \frac{c_1}{c_1 + c_2}$ \; $w_2 \gets \frac{c_2}{c_1 + c_2}$ \; \BlankLine \Comment{Initialize output set and count} $S \gets \{\}$\; $c \gets c_1 + c_2$ \; \BlankLine \Comment{Down-sample the input result sets} $S \gets S \cup \text{bernoulli}(S_1, w_1, k\times w_1)$ \; $S \gets S \cup \text{bernoulli}(S_2, w_2, k\times w_2)$ \; \BlankLine \Return $(S, c)$ } \end{algorithm} While this approach does allow sampling over a dynamized structure, it is asymptotically inferior to Olken's method, which allows for sampling in only $\Theta(k \log n)$ time~\cite{olken89}. However, we've already seen in the previous chapter how it is possible to modify the query procedure into a multi-stage process to enable more efficient solutions to the IRS problem. The core idea underlying our solution in that chapter was to introduce individualized local queries for each block, which were created after a pre-processing step to allow information about each block to be determined first. In that particular example, we established the weight each block should have during sampling, and then created custom sampling queries with variable $k$ values, following the weight distribution. We have determined a general interface that allows for this procedure to be expressed, and we define the term \emph{extended decomposability} to refer to search problems that can be answered in this way. More formally, consider a search problem $F(D, q)$ capable of being answered using a data structure instance $\mathscr{I} \in \mathcal{I}$ built over a set of records $D \in \mathcal{D}$ that has been decomposed into $m$ blocks, $\mathscr{I}_1, \mathscr{I}_2, \ldots, \mathscr{I}_m$, each corresponding to a partition of $D$: $D_1, D_2, \ldots, D_m$.
$F$ is an extended-decomposable search problem (eDSP) if it can be expressed using the following interface, \begin{itemize} \item $\mathbftt{local\_preproc}(\mathscr{I}_i, q) \to \mathscr{M}_i$ \\ Pre-process each partition, $D_i$, using its associated data structure, $\mathscr{I}_i$, and generate a meta-information object $\mathscr{M}_i$ for use in local query generation. \item $\mathbftt{distribute\_query}(\mathscr{M}_1, \ldots, \mathscr{M}_m, q) \to q_1, \ldots, q_m$\\ Process the set of meta-information about each block and produce individual local queries, $q_1, \ldots, q_m$, for each block. \item $\mathbftt{local\_query}(\mathscr{I}_i, q_i) \to r_i$ \\ Evaluate the local query with parameters $q_i$ over the data in $D_i$ using the data structure $\mathscr{I}_i$ and produce a partial query result, $r_i$. \item $\mathbftt{combine}(r_1, \ldots, r_m) \to R$ \\ Combine the list of local query results, $r_1, \ldots, r_m$, into a final query result, $R$. \end{itemize} Let $P(n)$ be the cost of $\mathbftt{local\_preproc}$, $D(n)$ be the cost of $\mathbftt{distribute\_query}$, $\mathscr{Q}_\ell(n)$ be the cost of $\mathbftt{local\_query}$, and $C_e(n)$ be the cost of $\mathbftt{combine}$. Solving a search problem with this interface requires calling $\mathbftt{local\_preproc}$ and $\mathbftt{local\_query}$ once per block, and $\mathbftt{distribute\_query}$ and $\mathbftt{combine}$ once. For a Bentley-Saxe dynamization then, with $O(\log_2 n)$ blocks, the worst-case cost of answering an eDSP is, \begin{equation} \label{eqn:edsp-cost} O \left( \log_2 n \cdot P(n) + D(n) + \log_2 n \cdot \mathscr{Q}_\ell(n) + C_e(n) \right) \end{equation} As an example, we'll express IRS using the above interface and analyze its complexity to show that the resulting solution has the same $\Theta(\log^2 n + k)$ cost as the specialized solution from Chapter~\ref{chap:sampling}. We use $\mathbftt{local\_preproc}$ to determine the number of records in each block falling on the interval $[l, u]$ and return this, as well as $i_l$ and $i_u$, as the meta-information. Then, $\mathbftt{distribute\_query}$ will perform weighted set sampling using a temporary alias structure over the weights of all of the blocks to calculate the appropriate value of $k$ for each local query, which will consist of $(i_{l,i}, i_{u,i}, k_i)$. With the appropriate value of $k$, as well as the indices of the upper and lower bounds, pre-calculated, $\mathbftt{local\_query}$ can simply generate $k_i$ random integers and return the corresponding records. $\mathbftt{combine}$ then merges all of the local results and returns the final result set. Algorithm~\ref{alg:edsp-irs} shows each of these operations in pseudo-code.
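Viewed from the dynamized structure's side, these four operations compose in a straightforward way, which is where the cost in Equation~\ref{eqn:edsp-cost} comes from. The listing below is a minimal C++ sketch of such a driver over $m$ blocks; the \texttt{QUERY} member types and function names are illustrative assumptions for this sketch, not the framework's actual interface (which is described in Section~\ref{sec:dyn-framework}).

\begin{lstlisting}[language=C++]
#include <cstddef>
#include <vector>

// Minimal sketch of the eDSP query flow. The QUERY type is assumed to
// expose the four operations above as static members; all member type
// names here are hypothetical and exist only for illustration.
template <typename QUERY, typename SHARD>
typename QUERY::Result edsp_query(const std::vector<SHARD*> &blocks,
                                  const typename QUERY::Parameters &q) {
    // 1. local_preproc: gather meta-information from each block
    std::vector<typename QUERY::Meta> meta;
    meta.reserve(blocks.size());
    for (auto *block : blocks) {
        meta.push_back(QUERY::local_preproc(block, q));
    }

    // 2. distribute_query: build one local query per block
    std::vector<typename QUERY::LocalQuery> local_queries =
        QUERY::distribute_query(meta, q);

    // 3. local_query: evaluate each local query against its block
    std::vector<typename QUERY::LocalResult> local_results;
    local_results.reserve(blocks.size());
    for (size_t i = 0; i < blocks.size(); i++) {
        local_results.push_back(QUERY::local_query(blocks[i], local_queries[i]));
    }

    // 4. combine: merge the local results into the final answer
    return QUERY::combine(local_results);
}
\end{lstlisting}

Calling $\mathbftt{local\_preproc}$ and $\mathbftt{local\_query}$ once per block, and $\mathbftt{distribute\_query}$ and $\mathbftt{combine}$ once, is exactly the accounting behind Equation~\ref{eqn:edsp-cost}.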
\SetKwFunction{preproc}{local\_preproc} \SetKwFunction{distribute}{distribute\_query} \SetKwFunction{query}{local\_query} \SetKwFunction{combine}{combine} \begin{algorithm}[t] \caption{IRS with Extended Decomposability} \label{alg:edsp-irs} \KwIn{$k$: sample size, $[l,u]$: lower and upper bound of records to sample} \KwOut{$R$: a sample set of size $k$} \Def{\preproc{$\mathscr{I}_i$, $q=(l,u,k)$}}{ \Comment{Find the indices for the upper and lower bounds of the query range} $i_l \gets \text{binary\_search\_lb}(\mathscr{I}_i, l)$ \; $i_u \gets \text{binary\_search\_ub}(\mathscr{I}_i, u)$ \; \BlankLine \Return $(i_l, i_u)$ \; } \BlankLine \Def{\distribute{$\mathscr{M}_1$, $\ldots$, $\mathscr{M}_m$, $q=(l,u,k)$}}{ \Comment{Determine number of records to sample from each block} $k_1, \ldots k_m \gets \mathtt{wss}(k, \mathscr{M}_1, \ldots \mathscr{M}_m)$ \; \BlankLine \Comment{Build local query objects} \For {$i=1..m$} { $q_i \gets (\mathscr{M}_i.i_l, \mathscr{M}_i.i_u, k_i)$ \; } \BlankLine \Return $q_1 \ldots q_m$ \; } \BlankLine \Def{\query{$\mathscr{I}_i$, $q_i = (i_{l,i},i_{u,i},k_i)$}}{ \Comment{Initialize empty sample set} $S \gets \{\}$ \; \For {$i=1\ldots k_i$} { \Comment{Select a random record within the interval} $i_r \gets \text{randint}(i_{l,i}, i_{u,i})$ \; \Comment{Add it to the sample set} $S \gets S \cup \{\text{get}(\mathscr{I}_i, i_r)\}$ \; } \Return $S$ \; } \BlankLine \Def{\combine{$r_1, \ldots, r_m$, $q=(l, u, k)$}}{ \Comment{Union results together} \Return $\bigcup_{i=1}^{m} r_i$ } \end{algorithm} These operations result in $P(n) \in \Theta(\log n)$, $D(n) \in \Theta(\log n)$, $\mathscr{Q}_\ell(n,k) \in \Theta(k)$, and $C_e(n) \in \Theta(1)$. At first glance, it would appear that we have arrived at a solution with a query cost of $O\left(\log_2^2 n + k\log_2 n\right)$, and have thus fallen short of our goal. However, Equation~\ref{eqn:edsp-cost} is only an upper bound on the cost. In the case of IRS, we can leverage an important problem-specific detail to obtain a better result: the total cost of the local queries is actually \emph{independent} of the number of blocks. For IRS, the cost of $\mathbftt{local\_query}$ is linear in the number of samples requested. Our initial asymptotic cost assumes that, in the worst case, each of the $\log_2 n$ blocks is sampled $k$ times. But this is not true of our algorithm. Rather, only $k$ samples are taken \emph{in total}, distributed across all of the blocks. Thus, regardless of how many blocks there are, there will only be $k$ samples drawn, requiring $k$ random number generations, etc. As a result, the total cost of the local query term in the cost function is actually $\Theta(k)$. Applying this result gives us a tighter bound of, \begin{equation*} \mathscr{Q}_\text{IRS} \in \Theta\left(\log_2^2 n + k\right) \end{equation*} which matches the result of Chapter~\ref{chap:sampling} for IRS in the absence of deletes. The other sampling problems considered in Chapter~\ref{chap:sampling} can be similarly implemented using this interface, with the same performance as their specialized implementations. \subsection{Iterative Deletion Decomposability} \label{ssec:dyn-idsp} We next turn our attention to support for deletes.
Efficient delete support in Bentley-Saxe dynamization is provably impossible~\cite{saxe79}, but, as discussed in Section~\ref{ssec:dyn-deletes}, it is possible to support them in restricted situations, where either the search problem is invertible (Definition~\ref{def:invert}) or the data structure and search problem combined are deletion decomposable (Definition~\ref{def:background-ddsp}). In Chapter~\ref{chap:sampling}, we considered a set of search problems which did \emph{not} satisfy any of these properties, and instead built a customized solution for deletes that required tight integration with the query process in order to function. While such a solution was acceptable for the goals of that chapter, it is not sufficient for our goal in this chapter of producing a generalized system. Additionally, of the two types of problem that can support deletes, the invertible case is preferable. This is because the amount of work necessary to support deletes for invertible search problems is very small: the data structure requires no modification (such as to implement weak deletes), and the query requires no modification (to ignore the weak deletes) aside from the addition of the $\Delta$ operator. This is appealing from a framework design standpoint. Thus, it is also worth considering approaches for expanding the range of search problems that can be answered using the ghost structure mechanism supported by invertible problems. A significant limitation of invertible problems is that the result set size cannot be controlled. We do not know how many records in our local results have been deleted until we reach the combine operation and they begin to cancel out, at which point we lack a mechanism to go back and retrieve more records. This presents difficulties for addressing important search problems such as top-$k$, $k$-NN, and sampling. In principle, these queries could be supported by repeating the query with larger and larger $k$ values until the desired number of records is returned, but in the eDSP model this requires throwing away a lot of useful work, as the state of the query must be rebuilt each time. We can resolve this problem by moving the decision to repeat the query into the query interface itself, allowing retries \emph{before} the result set is returned to the user and the local meta-information objects discarded. This allows us to preserve the pre-processing work, and repeat the local query process as many times as is necessary to achieve our desired number of records. From this observation, we propose another new class of search problem: \emph{iterative deletion decomposable} (IDSP). The IDSP definition expands eDSP with a fifth operation, \begin{itemize} \item $\mathbftt{repeat}(\mathcal{Q}, \mathcal{R}, \mathcal{Q}_1, \ldots, \mathcal{Q}_m) \to (\mathbb{B}, \mathcal{Q}_1, \ldots, \mathcal{Q}_m)$ \\ Evaluate the combined query result in light of the query. If a repetition is necessary to satisfy constraints in the query (e.g., result set size), optionally update the local queries as needed and return true. Otherwise, return false. \end{itemize} If this routine returns true, it must also modify the local queries as necessary to account for the work that remains to be completed (e.g., update the number of records to retrieve). Then, the query process resumes from the execution of the local queries. If it returns false, then the result is simply returned to the user.
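For a result-size-constrained search problem such as top-$k$, the $\mathbftt{repeat}$ operation amounts to checking whether the combined result is still short of $k$ records and, if so, adjusting the local queries before signaling a repetition. A minimal C++ sketch of this idea follows; the \texttt{Parameters}, \texttt{LocalQuery}, and \texttt{Result} types are hypothetical stand-ins used only for illustration.

\begin{lstlisting}[language=C++]
#include <cstddef>
#include <vector>

// Hypothetical types, for illustration only.
struct Parameters { size_t k; };                 // requested result size
struct LocalQuery { size_t k; /* ... */ };       // per-block query state
struct Result     { std::vector<long> records; };

// Sketch of an IDSP repeat operation: if deletes have left the combined
// result short of k records, ask the blocks for the shortfall and signal
// that the local query step should be run again.
bool repeat(const Parameters &q, const Result &result,
            std::vector<LocalQuery> &local_queries) {
    if (result.records.size() >= q.k) {
        return false;  // constraint satisfied; return the result to the user
    }

    size_t missing = q.k - result.records.size();
    for (auto &lq : local_queries) {
        lq.k = missing;  // only the missing records need to be re-queried
    }

    return true;  // resume from the local query step
}
\end{lstlisting}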
If the number of repetitions of the query is bounded by $R(n)$, then the following provides an upper bound on the worst-case query complexity of an IDSP, \begin{equation*} O\left(\log_2 n \cdot P(n) + D(n) + R(n) \left(\log_2 n \cdot Q_s(n) + C_e(n)\right)\right) \end{equation*} It is important that a bound on the number of repetitions exists, as without this the worst-case query complexity is unbounded. The details of providing and enforcing this bound are highly specific to the search problem. For problems like $k$-NN or top-$k$, the number of repetitions is a function of the number of deleted records within the structure, and so $R(n)$ can be bounded by placing a limit on the number of deleted records. This can be done, for example, using the full-reconstruction techniques in the literature~\cite{saxe79, merge-dsp, overmars83} or by proactively performing reconstructions, such as with the mechanism discussed in Section~\ref{sssec:sampling-rejection-bound}, depending on the particulars of how deletes are implemented. As an example of how the IDSP model can facilitate delete support for search problems, let's consider $k$-NN. This problem can be $C(n)$-deletion decomposable, depending upon the data structure used to answer it, but it is not invertible because it suffers from the problem of potentially returning fewer than $k$ records in the final result set after the results of the query against the primary and ghost structures have been combined. Worse, even if the query does return $k$ records as requested, it is possible that the result set could be incorrect, depending upon which records were deleted, what block those records are in, and the order in which the merge and inverse merge are applied. \begin{example} Consider the $k$-NN search problem, $F$, over some metric index $\mathcal{I}$. $\mathcal{I}$ has been dynamized, with a ghost structure for deletes, and consists of two blocks, $\mathscr{I}_1$ and $\mathscr{I}_2$ in the primary structure, and one block, $\mathscr{I}_G$ in the ghost structure. The structures contain the following records, \begin{align*} \mathscr{I}_1 &= \{ x_1, x_2, x_3, x_4, x_5\} \\ \mathscr{I}_2 &= \{ x_6, x_7, x_8 \} \\ \mathscr{I}_G &= \{x_1, x_2, x_3 \} \end{align*} where the subscript indicates the proximity to some point, $p$. Thus, the correct answer to the query $F(\mathscr{I}, (3, p))$ would be the set of points $\{x_4, x_5, x_6\}$. Querying each of the three blocks independently, however, will produce an incorrect answer. The partial results will be, \begin{align*} r_1 &= \{x_1, x_2, x_3\} \\ r_2 &= \{x_6, x_7, x_8\} \\ r_g &= \{x_1, x_2, x_3\} \end{align*} and, assuming that $\mergeop$ returns the $k$ elements closest to $p$ from the inputs, and $\Delta$ removes matching elements, performing $r_1~\mergeop~r_2~\Delta~r_g$ will give an answer of $\{\}$, which has insufficient records, and performing $r_1~\Delta~r_g~\mergeop~r_2$ will provide a result of $\{x_6, x_7, x_8\}$, which is wrong. \end{example} From this example, we can draw two conclusions about performing $k$-NN using a ghost structure for deletes. First, all of the local results from the primary structure must be merged prior to removing any deleted records, to ensure correctness. Second, once the ghost structure records have been removed, we may need to go back to the dynamized structure for more records to ensure that we have enough. Both of these requirements can be accommodated by the IDSP model, and the resulting query algorithm is shown in Algorithm~\ref{alg:idsp-knn}.
This algorithm assumes that the data structure in question can save the current traversal state in the meta-information object, and resume a $k$-NN query on the structure from that state at no cost. \SetKwFunction{repeat}{repeat} \afterpage{\clearpage} \begin{algorithm}[p] \caption{$k$-NN with Iterative Decomposability} \label{alg:idsp-knn} \KwIn{$k$: result size, $p$: query point} \Def{\preproc{$q=(k, p)$, $\mathscr{I}_i$}}{ \Return $\mathscr{I}_i.\text{initialize\_state}(k, p)$ \; } \BlankLine \Def{\distribute{$\mathscr{M}_1$, ..., $\mathscr{M}_m$, $q=(k,p)$}}{ \For {$i\gets1 \ldots m$} { $q_i \gets (k, p, \mathscr{M}_i)$ \; } \Return $q_1 \ldots q_m$ \; } \BlankLine \Def{\query{$\mathscr{I}_i$, $q_i=(k,p,\mathscr{M}_i)$}}{ $(r_i, \mathscr{M}_i) \gets \mathscr{I}_i.\text{knn\_from}(k, p, \mathscr{M}_i)$ \; \Comment{The local result stores records in a priority queue} \Return $(r_i, \mathscr{M}_i)$ \; } \BlankLine \Def{\combine{$r_1, \ldots, r_m, \ldots, r_n$, $q=(k,p)$}}{ $R \gets \{\}$ ; $pq \gets \text{PriorityQueue}()$ ; $gpq \gets \text{PriorityQueue}()$ \; \Comment{Results $1$ through $m$ are from the primary structure, and $m+1$ through $n$ are from the ghost structure.} \For {$i\gets 1 \ldots m$} { $pq.\text{enqueue}(i, r_i.\text{front}())$ \; } \For {$i \gets m+1 \ldots n$} { $gpq.\text{enqueue}(i, r_i.\text{front}())$ } \BlankLine \Comment{Process the primary local results} \While{$|R| < k \land \neg pq.\text{empty}()$} { $(i, d) \gets pq.\text{dequeue}()$ \; $R \gets R \cup \{r_i.\text{dequeue}()\}$ \; \If {$\neg r_i.\text{empty}()$} { $pq.\text{enqueue}(i, r_i.\text{front}())$ \; } } \BlankLine \Comment{Process the ghost local results} \While{$\neg gpq.\text{empty}()$} { $(i, d) \gets gpq.\text{dequeue}()$ \; \If {$r_i.\text{front}() \in R$} { $R \gets R \setminus \{r_i.\text{front}()\}$ \; $r_i.\text{dequeue}()$ \; \If {$\neg r_i.\text{empty}()$} { $gpq.\text{enqueue}(i, r_i.\text{front}())$ \; } } } \Return $R$ \; } \BlankLine \Def{\repeat{$q=(k,p), R, q_1,\ldots q_m$}} { $missing \gets k - R.\text{size}()$ \; \If {$missing > 0$} { \For {$i \gets 1\ldots m$} { $q_i \gets (missing, p, q_i.\mathscr{M}_i)$ \; } \Return $(True, q_1 \ldots q_m)$ \; } \Return $(False, q_1 \ldots q_m)$ \; } \end{algorithm} \subsection{Search Problem Taxonomy} Having defined two new classes of search problem, it seems sensible at this point to collect our definitions together with pre-existing ones from the classical literature, and present a cohesive taxonomy of the search problems for which our techniques can be used to support dynamization. This taxonomy is shown in the Venn diagrams of Figure~\ref{fig:taxonomy}. Note that, for convenience, the search problem classifications relevant for supporting deletes have been separated out into their own diagram. In principle, this deletion taxonomy can be thought of as being nested inside of each of the general search problem classifications, as the two sets of classifications are orthogonal: the classification of a search problem in the general taxonomy implies nothing about where that same problem falls in the deletion taxonomy. \begin{figure}[t] \subfloat[General Taxonomy]{\includegraphics[width=.49\linewidth]{diag/taxonomy} \label{fig:taxonomy-main}} \subfloat[Deletion Taxonomy]{\includegraphics[width=.49\linewidth]{diag/deletes} \label{fig:taxonomy-deletes}} \caption{An overview of the Taxonomy of Search Problems, as relevant to our discussion of data structure dynamization.
Our proposed extensions are marked with an asterisk (*) and colored yellow. } \label{fig:taxonomy} \end{figure} Figure~\ref{fig:taxonomy-main} illustrates the classifications of search problem that are not deletion-related, including standard decomposability (DSP), extended decomposability (eDSP), $C(n)$-decomposability ($C(n)$-DSP), and merge decomposability (MDSP). We consider ISAM, TrieSpline~\cite{plex}, and succinct trie~\cite{zhang18} to be examples of MDSPs because these data structures can be constructed more efficiently from sorted data, and so when building from existing blocks, the data is already sorted in each block and can be merged efficiently while maintaining sorted order. VP-trees~\cite{vptree} and alias structures~\cite{walker74}, in contrast, don't have a convenient way of merging, and so must be reconstructed in full each time. We have classified sampling queries in this taxonomy as eDSPs because this implementation is more efficient than the $C(n)$-decomposable variant we have also discussed. $k$-NN queries, for reasons discussed in Chapter~\ref{chap:background}, are classified as $C(n)$-decomposable. The classification of range scans is a bit trickier. It is not uncommon in the theoretical literature for range scans to be considered DSPs, with $\mergeop$ taken to be the set union operator. From an implementation standpoint, it is sometimes possible to perform a union in $\Theta(1)$ time. For example, in Chapter~\ref{chap:sampling} we accomplished this by placing sampled records directly into a shared buffer, and not having an explicit combine step at all. However, in the general case where we do need an explicit combine step, the union operation does require time linear in the size of the result sets to copy the records from the local results into the final result. The sizes of these results are functions of the selectivity of the range scan, but theoretically could be large relative to the data size, and so we've decided to err on the side of caution and classify range scans as $C(n)$-decomposable here. If the results of the range scan are expected to be returned in sorted order, then the problem is \emph{certainly} $C(n)$-decomposable. Range counts, on the other hand, are truly DSPs.\footnote{ Because of the explicit combine interface we use for eDSPs, the optimization of writing samples directly into the buffer that we used in the previous chapter to get a $\Theta(1)$ set union cannot be used for the eDSP implementation of IRS in this chapter. However, our eDSP sampling in Algorithm~\ref{alg:edsp-irs} samples \emph{exactly} $k$ records, and so the combination step still only requires $\Theta(k)$ work, and the complexity remains the same. } Point lookups are an example of a DSP as well, assuming that the lookup key is unique, or at least minimally duplicated. In the case where the number of results for the lookup becomes a substantial proportion of the total data size, this search problem could be considered $C(n)$-decomposable for the same reason as range scans. Figure~\ref{fig:taxonomy-deletes} shows the various classes of search problem relevant to delete support. We have made the decision to classify invertible problems (INV) as a subset of deletion decomposable problems (DDSP), because one could always embed the ghost structure directly into the block implementation, use the DDSP delete operation to insert into that block, and handle the $\Delta$ operator as part of $\mathbftt{local\_query}$.
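To illustrate this embedding, the following minimal C++ sketch shows a block for a range count query that carries its own ghost structure, applies deletes to it through a DDSP-style delete operation, and folds the $\Delta$ subtraction into its local query. The type and member names are illustrative, not part of our framework.

\begin{lstlisting}[language=C++]
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustration of embedding a ghost structure inside a block so that an
// invertible problem (range count) can be treated as a DDSP. Both the
// primary data and the ghost records are kept as sorted arrays.
struct RangeCountBlock {
    std::vector<uint64_t> data;    // records in this block (sorted)
    std::vector<uint64_t> ghosts;  // deleted records (sorted)

    // DDSP-style delete: record the deletion inside the block itself.
    void delete_record(uint64_t key) {
        ghosts.insert(std::lower_bound(ghosts.begin(), ghosts.end(), key), key);
    }

    // Local query: count the in-range records and subtract the in-range
    // ghosts (the Delta operator), so results merge by simple addition.
    size_t range_count(uint64_t lo, uint64_t hi) const {
        auto count = [&](const std::vector<uint64_t> &v) {
            return static_cast<size_t>(
                std::upper_bound(v.begin(), v.end(), hi) -
                std::lower_bound(v.begin(), v.end(), lo));
        };
        return count(data) - count(ghosts);
    }
};
\end{lstlisting}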
We consider range count to be invertible, with $\Delta$ taken to be subtraction. Range scans are also invertible, technically, but filtering out the deleted records during result set merging is relatively expensive, as it requires either performing a sorted merge of all of the records (rather than a simple union) to cancel out records with their ghosts, or doing a linear search for each ghost record to remove its corresponding data from the result set. As a result, we have classified them as DDSPs instead, as weak deletes are easily supported during range scans with no extra cost. Any records marked as deleted can simply be skipped over when copying into the local or final result sets. Similarly, $k$-NN queries admit a DDSP solution for certain data structures, but we've elected to classify them as IDSPs using Algorithm~\ref{alg:idsp-knn}, as this requires no modifications to the data structure to support weak deletes, and not all metric indexing structures support the efficient point lookups that would be necessary to implement them. We've also classified IRS as an IDSP, which is the only place in the taxonomy that it can fit. Note that IRS (and other sampling problems) are unique in this model in that they require the IDSP classification, but must actually support deletes using weak deletes. There's no way to support ghost-structure-based deletes in our general framework for sampling queries.\footnote{ This is in contrast to the specialized framework for sampling in Chapter~\ref{chap:sampling}, where we heavily modified the query process to make tombstone-based deletes (the tombstones there being analogous to a ghost structure) possible. } \section{Dynamization Framework} \label{sec:dyn-framework} With the previously discussed new classes of search problems devised, we can now present our generalized framework based upon those models. This framework takes the form of a header-only C++20 library which can automatically extend data structures with support for concurrent inserts and deletes, depending upon the classification of the problem in the taxonomy of Figure~\ref{fig:taxonomy}. The user provides the data structure and query implementations as template parameters, and the framework then provides an interface that allows for queries, inserts, and deletes against the new dynamic structure. Specifically, in addition to accessors for various structural information, the framework provides the following main operations, \begin{itemize} \item \texttt{int insert(RecordType);} \\ This function will insert a record into the dynamized structure, and will return $1$ if the record was successfully inserted, and $0$ if it was not. Insertion failure is part of the concurrency control mechanism, and failed inserts should be retried after a short delay. More details of this are in Section~\ref{ssec:dyn-concurrency}. \item \texttt{int erase(RecordType);} \\ This function will delete a record from the dynamized structure, returning $1$ on success and $0$ on failure. The meaning of a failure to delete is dependent upon the delete mechanism in use, and will be discussed in Section~\ref{sssec:dyn-deletes}. \item \texttt{std::future<QueryResult> query(QueryParameters);} \\ This function will execute a query with the specified parameters against the structure and return the result. This interface is asynchronous, and returns a future immediately, which can be used to access the query result once the query has finished executing.
\end{itemize} It can be configured with a template argument to run in single-threaded mode, or multi-threaded mode. In multi-threaded mode, the above routines can be called concurrently without any necessary synchronization in user code, and without requiring any special modification to the data structure and queries, beyond those changes necessary to use them in single-threaded mode. \subsection{Basic Principles} Before discussing the interfaces that the user must implement to use their code with our framework, it seems wise to discuss the high level functioning and structure of the framework, the details of which inform certain decisions about the necessary features that the user must implement to interface with it. The high level structure and organization of the framework is similar to that of Section~\ref{ssec:sampling-framework}. The framework requires the user to specify types to represent the record, query, and data structure (which we call a shard). The details of the interface requirements for these types are discussed in Section~\ref{ssec:dyn-interface}, and are enforced using C++20's concepts mechanism. \begin{figure} \centering %\vspace{-3mm} \subfloat[\small Leveling]{\includegraphics[width=.5\textwidth]{diag/leveling} \label{fig:dyn-leveling}} %\vspace{-3mm} \subfloat[\small Tiering]{\includegraphics[width=.5\textwidth]{diag/tiering} \label{fig:dyn-tiering}} %\vspace{-3mm} \caption{\small An overview of the general structure of the dynamization framework using (a) leveling and (b) tiering layout policies, with a scale factor 3. Each shard is shown as a dotted box, wrapping its associated dataset ($D_i$) and index ($I_i$). } \label{fig:dyn-framework} %\vspace{-3mm} \end{figure} Internally, the framework consists of a sequence of \emph{levels} with increasing record capacity, each containing one or more \emph{shards}. The layout of these levels is defined by a template argument, the \emph{layout policy}, and an integer called the \emph{scale factor}. The latter governs how quickly the record capacities of each level grow, and the former controls how those records are broken into shards on the level and the way in which records move from level to level during reconstructions. The details of layout policies, reconstruction, etc., will be discussed in a later section. Logically ``above'' these levels is a small unsorted array, called the \emph{mutable buffer}. The mutable buffer is of user-configurable size, and all inserts into the structure are first placed into it. When the buffer fills, it will be flushed into the structure, requiring reconstructions to occur in a manner consistent with the layout policy in order to make room. A simple graphical representation of the framework and two of its layout policies is shown in Figure~\ref{fig:dyn-framework}. The framework provides two mechanisms for supporting deletes: tagging and tombstones. These are identical to the mechanisms discussed in Section~\ref{ssec:sampling-deletes}, with tombstone deletes operating by inserting a record identical to the one to be deleted into the structure, with an indicator bit set in the header, and tagged deletes performing a lookup of the record to be deleted in the structure and setting a bit in its header directly. Tombstone deletes are used to support invertible search problems, and tagged deletes are used for deletion decomposable search problems. 
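To make the workflow concrete, the following sketch shows how user code might interact with a dynamized structure through the external interface described above. The \texttt{DYNAMIZED}, \texttt{REC}, and \texttt{PARAMS} names are placeholders for whatever types the framework is instantiated with; this is an illustration, not a verbatim excerpt of the library.

\begin{lstlisting}[language=C++]
#include <chrono>
#include <thread>

// Illustrative usage of the external interface. DYNAMIZED stands in for an
// instantiation of the framework, and REC / PARAMS for the user's record
// and query parameter types.
template <typename DYNAMIZED, typename REC, typename PARAMS>
void example_usage(DYNAMIZED &index, REC rec, PARAMS params) {
    // insert() returns 0 on failure as part of the concurrency control
    // mechanism; failed inserts are simply retried after a short delay.
    while (!index.insert(rec)) {
        std::this_thread::sleep_for(std::chrono::microseconds(10));
    }

    // erase() returns 0 if the delete could not be performed; what that
    // means depends on whether tombstones or tagging are in use.
    index.erase(rec);

    // query() is asynchronous: it returns a future immediately, which can
    // be waited on to obtain the final query result.
    auto result_future = index.query(params);
    auto result = result_future.get();
    (void) result;
}
\end{lstlisting}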
While the delete procedure itself is handled automatically by the framework based upon the specified mechanism, it is the user's responsibility to appropriately handle deleted records in their query and shard implementations. \subsection{Interfaces} \label{ssec:dyn-interface} In order to enforce interface requirements, our implementation takes advantage of C++20 concepts. There are three major sets of interfaces that the user of the framework must implement: records, shards, and queries. We'll discuss each of these in this section. \subsubsection{Record Interface} The record interface is the simplest of the three. The type used as a record only requires an implementation of an equality comparison operator, and is assumed to be of fixed length. Beyond this, the framework places no additional constraints and makes no assumptions about record contents, their ordering properties, etc. Though the records must be fixed length, variable length data can be supported using off-record storage and pointers if necessary. Each record is automatically wrapped by the framework with a header that is used to facilitate deletion support. The record concept is shown in Listing~\ref{lst:record}, along with the wrapped header type that is used to interact with records within the framework. \begin{lstfloat} \begin{lstlisting}[language=C++]
template <typename R>
concept RecordInterface = requires(R r, R s) {
    { r == s } -> std::convertible_to<bool>;
};

template <RecordInterface R>
struct Wrapped {
    uint32_t header;
    R rec;

    inline void set_delete();
    inline bool is_deleted() const;
    inline void set_tombstone(bool val);
    inline bool is_tombstone() const;
    inline bool operator==(const Wrapped &other) const;
};
\end{lstlisting} \caption{The required interface for record types in our dynamization framework.} \label{lst:record} \end{lstfloat} \subsubsection{Shard Interface} Our framework's underlying representation of the data structure is called a \emph{shard}. The user's shard type should contain either a full implementation of the data structure to be dynamized, or a shim around an existing implementation that provides the necessary functions for our framework to interact with it. Shards must provide two constructors: one from an unsorted set of records, and another from a set of other shards of the same type. The second of these constructors is to allow for efficient merging to be leveraged for merge decomposable search problems. Shards can also expose a point lookup operation for use in supporting deletes for DDSPs. This function is only used for DDSP deletes, and so can be omitted when this functionality isn't necessary. If a data structure doesn't natively support an efficient point lookup, then one can be added by including a hash table or other auxiliary structure in the shard if desired. This function accepts a record type as input, and should return a pointer to the record that exactly matches the input in storage, if one exists, or \texttt{nullptr} if it doesn't. It should also accept an optional boolean argument that the framework will pass \texttt{true} into if the lookup operation is being used to search for a tombstone record. This flag allows the shard to use various tombstone-related optimizations, such as maintaining a Bloom filter over tombstones, or storing them separately from the main records, etc. Shards should also expose some accessors for basic meta-data about their contents.
In particular, the framework relies upon a function that returns the number of records within the shard for planning reconstructions, and the number of deleted records or tombstones within the shard for use in proactive compaction to bound the number of deleted records. The interface also requires functions for accessing memory usage information, both the memory used for the main data structure being dynamized, and also any auxiliary memory (e.g., memory used for an auxiliary hash table). These memory functions are used only for informational purposes. The concept for shard types is shown in Listing~\ref{lst:shard}. Note that all records within shards are wrapped by the framework header. It is up to the shard to handle the removal of deleted records based on this information during reconstruction. \begin{lstfloat} \begin{lstlisting}[language=C++]
template <typename SHARD>
concept ShardInterface = RecordInterface<typename SHARD::RECORD> &&
    requires(SHARD shard, const std::vector<SHARD*> &shard_vector, bool b,
             BufferView<typename SHARD::RECORD> bv,
             typename SHARD::RECORD rec) {
    {SHARD(shard_vector)};
    {SHARD(std::move(bv))};
    { shard.point_lookup(rec, b) }
        -> std::same_as<Wrapped<typename SHARD::RECORD> *>;
    { shard.get_record_count() } -> std::convertible_to<size_t>;
    { shard.get_tombstone_count() } -> std::convertible_to<size_t>;
    { shard.get_memory_usage() } -> std::convertible_to<size_t>;
    { shard.get_aux_memory_usage() } -> std::convertible_to<size_t>;
};
\end{lstlisting} \caption{The required interface for shard types in our dynamization framework.} \label{lst:shard} \end{lstfloat} \subsubsection{Query Interface} The most complex interface required by the framework is for queries. The concept for query types is given in Listing~\ref{lst:query}. In effect, it requires implementing the full IDSP interface from the previous section, as well as versions of $\mathbftt{local\_preproc}$ and $\mathbftt{local\_query}$ for pre-processing and querying an unsorted set of records, which is necessary to allow the mutable buffer to be used as part of the query process.\footnote{ In the worst case, these routines could construct a temporary shard over the mutable buffer and use it to answer queries. } The $\mathbftt{repeat}$ function must be implemented even for normal eDSP problems, but should simply return \texttt{false} with no other action in those cases. The interface also allows the user to specify whether the query process should abort after the first result is obtained, which is a useful optimization for point lookups. This interface allows the local and overall query results to be specified independently, as different types. This can be used for a variety of purposes. For example, an invertible range count can have a local result that includes both the number of records and the number of tombstones, while the query result itself remains a single number. Additionally, the framework makes no decision about what, if any, collection type should be used for these results. A range scan, for example, could specify the result types as a vector of records, a map of records, etc., depending on the use case. There is one significant difference between the IDSP interface and the query concept implementation. For efficiency purposes, \texttt{combine} does not return the query result object. Instead, the framework itself initializes the object, and then passes it by reference into \texttt{combine}. This is necessary because \texttt{combine} can be called multiple times, depending on whether the query must be repeated.
Adding it as an argument to \texttt{combine}, rather than returning it, allows for the local query results to be discarded completely, and new results generated and added to the existing result set, in the case of a repetition. Without this modification, the user would either need to define an additional combination operation for final result types, or duplicate effort in the combine step on each repetition. \begin{lstfloat} \begin{lstlisting}[language=C++]
template <typename QUERY, typename SHARD,
          typename PARAMETERS, typename LOCAL, typename LOCAL_BUFFER,
          typename LOCAL_RESULT, typename RESULT>
concept QueryInterface = requires(PARAMETERS *parameters, LOCAL *local,
                                  LOCAL_BUFFER *buffer_query, SHARD *shard,
                                  std::vector<LOCAL*> &local_queries,
                                  std::vector<LOCAL_RESULT> &local_results,
                                  RESULT &result,
                                  BufferView<typename SHARD::RECORD> *bv) {
    { QUERY::local_preproc(shard, parameters) }
        -> std::convertible_to<LOCAL*>;
    { QUERY::local_preproc_buffer(bv, parameters) }
        -> std::convertible_to<LOCAL_BUFFER*>;
    { QUERY::distribute_query(parameters, local_queries, buffer_query) };
    { QUERY::local_query(shard, local) }
        -> std::convertible_to<LOCAL_RESULT>;
    { QUERY::local_query_buffer(buffer_query) }
        -> std::convertible_to<LOCAL_RESULT>;
    { QUERY::combine(local_results, parameters, result) };
    { QUERY::repeat(parameters, result, local_queries, buffer_query) }
        -> std::same_as<bool>;
    { QUERY::EARLY_ABORT } -> std::convertible_to<bool>;
};
\end{lstlisting} \caption{The required interface for query types in our dynamization framework.} \label{lst:query} \end{lstfloat} \subsection{Internal Mechanisms} Given user-provided query, shard, and record types, the framework will automatically provide support for inserts, as well as deletes for supported search problems, and concurrency if desired. This section will discuss the internal mechanisms that the framework uses to support these operations in a single-threaded context. Concurrency will be discussed in Section~\ref{ssec:dyn-concurrency}. \subsubsection{Inserts and Layout Policy} New records are inserted into the structure by appending them to the end of the mutable buffer. When the mutable buffer is filled, it must be flushed to make room for further inserts. This flush involves building a shard from the records in the buffer using the unsorted constructor, and then performing a series of reconstructions to integrate this new shard into the structure. Once these reconstructions are complete, the buffer can be marked as empty and the insertion performed. There are three layout policies supported by our framework, \begin{itemize} \item \textbf{Bentley-Saxe Method (BSM).} \\ Our framework supports the Bentley-Saxe method directly, which we used as a baseline for comparison in some benchmarking tests. This configuration requires that $N_b = 1$ and $s = 2$ to match the standard BSM exactly (a version of this approach that relaxes these restrictions is considered in the next chapter). Reconstructions are performed by finding the first empty level, $i$, (or adding one to the bottom if needed) and then constructing a new shard at that level including all of the records from all of the shards at levels $j \leq i$, as well as the newly created buffer shard. Then all levels $j < i$ are set to empty. Our implementation of BSM does not include any of the re-partitioning routines for bounding deviations in record counts from the exact binary decomposition in the face of deleted records. \item \textbf{Leveling.}\\ Our leveling policy is identical to the one discussed in Chapter~\ref{chap:sampling}. The capacity of level $i$ is $N_b \cdot s^{i+1}$ records. The first level ($i$) with available capacity to hold all the records from the level above it ($i-1$, or the buffer if $i = 0$) is found.
Then, for each level $j < i$, in decreasing order of $j$ (starting with $j = i-1$), the records in level $j$ are merged with the records in level $j+1$ and the resulting shard placed in level $j+1$. This procedure guarantees that level $0$ will have capacity for the shard from the buffer, which is then merged into it (if it is not empty) or replaces it (if the level is empty). \item \textbf{Tiering.}\\ Our tiering policy, again, is identical to the one discussed in Chapter~\ref{chap:sampling}. The capacity of each level is $s$ shards, each having $N_b \cdot s^i$ records at most. The first level ($i$) having fewer than $s$ shards is identified. Then, for each level $j < i$, in decreasing order of $j$, all of the shards in level $j$ are merged into a single shard, which is placed in level $j+1$. This leaves level $0$ empty, and the shard built from the buffer is placed there. \end{itemize} \subsubsection{Deletes} \label{sssec:dyn-deletes} As discussed above, the framework supports deletes using either tombstones or tagging. Tagged deletes require no special handling during reconstruction beyond optionally dropping the tagged records, but tombstone deletes require that a tombstone be \emph{canceled} against the record it deletes when both participate in the same reconstruction, and only against that record. Consider a record $r_i$ and a matching tombstone $r_j$ that coexist in a shard being constructed, where the subscripts indicate the order in which they were inserted. The tombstone should not cancel the record if $i > j$. But, if $i < j$, then a cancellation should occur. The case where the record is newer than the tombstone it coexists with covers the situation where a record is deleted, and then inserted again after the delete. In this case, there does exist a record $r_k$ with $k < j$ that the tombstone should cancel with, but that record may exist in a different shard. So the tombstone will \emph{eventually} cancel, but it would be incorrect to cancel it with the matching record $r_i$ that it coexists with in the shard being considered. This means that correct tombstone cancellation requires that the order in which records were inserted be known and accounted for during shard construction. To enable this, our framework implements two important features, \begin{enumerate} \item All records in the buffer contain a timestamp in their header, indicating insertion order. This can be cleared or discarded once the buffer shard has been constructed. \item All shards passed into the shard constructor are provided in chronological order. The first shard in the vector will be the oldest, and so on, with the final shard being the newest. \end{enumerate} The user can make use of these properties however they like during shard construction. The specific approach that we use in our shard implementations is to ensure that records are sorted by value, such that equal records are adjacent, and then by age, such that the newest record appears first, and the oldest last. By enforcing this order, a tombstone at index $i$ will cancel with a record if and only if that record is at index $i+1$. For structures that are constructed by a sorted-merge of data, this allows tombstone cancellation at no extra cost during the merge operation. Otherwise, it requires an extra linear pass after sorting to remove canceled records.\footnote{ For this reason, we use tagging-based deletes for structures which don't require sorting by value during construction. } \Paragraph{Erase Return Codes.} As noted in Section~\ref{sec:dyn-framework}, the external \texttt{erase} function can return $0$ on failure. The specific meaning of this failure, however, is a function of the delete policy being used. For tombstone deletes, a failure to delete means a failure to insert, and the request should be retried after a brief delay. Note that, for performance reasons, the framework makes no effort to ensure that the record being erased using tombstones is \emph{actually} present in the structure, so it is possible to insert a tombstone that can never be canceled. This won't affect correctness in any way, so long as queries are correctly implemented, but it will increase the size of the structure slightly. For tagging deletes, a failure to delete means that the record to be removed could not be located to tag it. Such failures should \emph{not} be retried immediately, as the situation will not automatically resolve itself before new records are inserted.
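Returning to the tombstone cancellation procedure described above: given the sort order we use (equal records adjacent, newest first), cancellation can be folded into a single linear pass over a sorted run of wrapped records. The following C++ sketch illustrates the idea, assuming a wrapped record type along the lines of Listing~\ref{lst:record}; it is illustrative rather than the framework's exact code.

\begin{lstlisting}[language=C++]
#include <cstddef>
#include <vector>

// Sketch of tombstone cancellation as a linear pass over a sorted run in
// which equal records are adjacent and newer entries precede older ones.
// WREC is assumed to behave like the wrapped record type described above:
// it exposes is_tombstone() and a payload (rec) that compares equal to the
// record it deletes.
template <typename WREC>
std::vector<WREC> cancel_tombstones(const std::vector<WREC> &sorted) {
    std::vector<WREC> out;
    out.reserve(sorted.size());

    for (size_t i = 0; i < sorted.size(); i++) {
        // A tombstone at index i cancels a matching record at index i + 1.
        if (sorted[i].is_tombstone() && i + 1 < sorted.size() &&
            !sorted[i + 1].is_tombstone() &&
            sorted[i].rec == sorted[i + 1].rec) {
            i++;  // drop both the tombstone and the record it deletes
            continue;
        }
        out.push_back(sorted[i]);
    }

    return out;
}
\end{lstlisting}

Note that a tombstone whose matching record lives in a different shard survives the pass unchanged, so it can cancel in a later reconstruction, as described above.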
\Paragraph{Tombstone Asymptotic Complexity.} Tombstone deletes reduce to inserts, and so they have the same asymptotic properties as inserts. Namely, \begin{align*} \mathscr{D}(n) &\in \Theta(B(n)) \\ \mathscr{D}_a(n) &\in \Theta\left( \frac{B(n)}{n} \cdot \log_s n\right) \end{align*} \Paragraph{Tagging Asymptotic Complexity.} Tagging deletes must perform a linear scan of the buffer, and a point lookup of every shard. If $L(n)$ is the worst-case cost of the shard's implementation of \texttt{point\_lookup}, then the worst-case cost of a delete under tagging is, \begin{equation*} \mathscr{D}(n) \in \Theta \left( N_b + L(n) \cdot \log_s n\right) \end{equation*} The \texttt{point\_lookup} interface requires an optional boolean argument that is set to true when the function is called as part of a delete process by the framework. This is to enable the use of Bloom filters, or other similar structures, to accelerate these operations if desired. \subsubsection{Queries} The framework processes queries using a direct implementation of the approach discussed in Section~\ref{ssec:dyn-idsp}, with modifications to account for the buffer. The buffer itself is treated in the procedure like any other shard, except with its own specialized query and preprocessing functions. The algorithm itself is shown in Algorithm~\ref{alg:dyn-query}. In order to appropriately account for deletes during result set combination, the query interfaces make similar ordering guarantees to the shard construction interface. Records from the buffer will have their insertion timestamp available, and shards, local queries, and local results are always passed in descending order of age. This is to allow tombstones to be accounted for during the query process using the same mechanisms described in Section~\ref{sssec:dyn-deletes}. \begin{algorithm}[t] \caption{Query with Dynamization Framework} \label{alg:dyn-query} \KwIn{$q$: query parameters, $b$: mutable buffer, $S$: static shards at all levels} \KwOut{$R$: query results} $\mathscr{S}_b \gets \texttt{local\_preproc}_{buffer}(b, q);\ \ \mathscr{S} \gets \{\}$ \; \For{$s \in S$}{$\mathscr{S} \gets \mathscr{S}\ \cup (s, \texttt{local\_preproc}(s, q))$\;} $(q_b, q_1, \ldots q_m) \gets \texttt{distribute\_query}(\mathscr{S}_b, \mathscr{S}, q)$ \; $\mathcal{R} \gets \{\}; \ \ \texttt{rpt} \gets \bot$ \; \Do{\texttt{rpt}}{ $locR \gets \{\}$ \; $locR \gets locR \cup \texttt{local\_query}_{buffer}(b, q_b)$ \; \For{$s \in S$}{$locR \gets locR \cup \texttt{local\_query}(s, q_s)$} $\mathcal{R} \gets \mathcal{R} \cup \texttt{combine}(locR, q)$\; $(\texttt{rpt}, q_b, q_1, \ldots, q_m) \gets \texttt{repeat}(q, \mathcal{R}, q_b, q_1,\ldots, q_m)$\; } \Return{$\mathcal{R}$} \end{algorithm} \Paragraph{Asymptotic Complexity.} The worst-case query cost of the framework follows the same basic cost function as discussed for IDSPs in Section~\ref{ssec:dyn-idsp}, with slight modifications to account for the different costs of buffer querying and preprocessing. The cost is, \begin{equation*} \mathscr{Q}(n) \in O \left(P_B(N_B) + \log_s n \cdot P(n) + D(n) + R(n)\left( Q_B(N_B) + \log_s n \cdot Q_s(n) + C_e(n)\right)\right) \end{equation*} where $P_B(N_B)$ is the cost of pre-processing the buffer, and $Q_B(N_B)$ is the cost of querying it.
As $N_B$ is a small constant relative to $n$, in some cases these terms can be omitted, but they are left here for generality. Also note that this is an upper bound, but isn't necessarily tight. As we saw with IRS in Section~\ref{ssec:edsp}, it is sometimes possible to leverage problem-specific details within this interface to get better asymptotic performance. \subsection{Concurrency Control} \label{ssec:dyn-concurrency} \section{Evaluation} Having described the framework in detail, we'll now turn to demonstrating its performance for a variety of search problems and associated data structures. We've predominantly selected problems for which an existing dynamic data structure also exists, to demonstrate that the performance of our dynamization techniques can match or exceed hand-built dynamic solutions to these problems. Specifically, we will consider IRS using ISAM tree, range scans using learned indices, high-dimensional $k$-NN using VPTree, and exact string matching using succinct tries. \subsection{Experimental Setup} All of our testing was performed using Ubuntu 20.04 LTS on a dual socket Intel Xeon Gold 6242 server with 384 GiB of physical memory and 40 physical cores. We ran our benchmarks pinned to a specific core, or to a specific NUMA node for multi-threaded testing. Our code was compiled using GCC version 11.3.0 with the \texttt{-O3} flag, and targeted to C++20.\footnote{ Aside from the ALEX benchmark; ALEX does not build in this configuration, and we used C++14 instead for that particular test. } Our testing methodology involved warming up the data structure by inserting 10\% of the dataset, and then measuring the throughput over the insertion of the rest of the records. During this second phase, a workload mixture of 95\% inserts and 5\% deletes was used for structures that supported deletes. Once the insertion phase was complete, we measured query performance by repeatedly querying the structure with a selection of pre-constructed queries and reporting the average latency. Reported query performance numbers are latencies, and insertion/update numbers are throughputs. For data structure size charts, we report the total size of the data structure and all auxiliary structures, minus the size of the raw data. All tests were run on a single thread without any background operations, unless otherwise specified. We used several datasets for testing the different structures. Specifically, \begin{itemize} \item For range and sampling problems, we used the \texttt{book}, \texttt{fb}, and \texttt{osm} datasets from SOSD~\cite{sosd-datasets}. Each has 200 million 64-bit keys (to which we added 64-bit values) following a variety of distributions. We omitted the \texttt{wiki} dataset because it contains duplicate keys, which were not supported by one of our dynamic baselines. \item For vector problems, we used the Spanish Billion Words (SBW) dataset~\cite{sbw}, containing about 1 million 300-dimensional vectors of doubles, and a sample of 10 million 128-dimensional vectors of unsigned longs from the BigANN dataset~\cite{bigann}. \item For string search, we used the genome of the brown bear (ursarc) broken into 30 million unique 70--80 character chunks~\cite{ursa}, and a list of about 400,000 English words (english)~\cite{english-words}. \end{itemize} \subsection{Design Space Evaluation} \label{ssec:dyn-ds-exp} \begin{figure} %\vspace{0pt} \centering \subfloat[Insertion Throughput \\ vs.
\begin{figure}
\centering
\subfloat[Insertion Throughput \\ vs. Buffer Size]{\includegraphics[width=.4\textwidth]{img/fig-ps-mt-insert} \label{fig:ins-buffer-size}}
\subfloat[Insertion Throughput \\ vs. Scale Factor]{\includegraphics[width=.4\textwidth]{img/fig-ps-sf-insert} \label{fig:ins-scale-factor}} \\
\subfloat[Query Latency vs. Buffer Size]{\includegraphics[width=.4\textwidth]{img/fig-ps-mt-query} \label{fig:q-buffer-size}}
\subfloat[Query Latency vs. Scale Factor]{\includegraphics[width=.4\textwidth]{img/fig-ps-sf-query} \label{fig:q-scale-factor}}
\caption{Design Space Evaluation (Triespline)}
\end{figure}
For our first set of experiments, we evaluated a dynamized version of the Triespline learned index~\cite{plex} for answering range count queries.\footnote{We tested range scans throughout this chapter by measuring the performance of a range count. We decided to go this route to ensure that the results across our baselines were comparable: different range structures provided different interfaces for accessing the result sets, some of which required making an extra copy and others of which didn't. Using a range count allowed us to measure only index traversal time, without needing to control for this difference in interface.} We tested different configurations of our framework to examine the effects that our configuration parameters have on query and insertion performance. We ran these tests using the SOSD \texttt{osm} dataset.
First, we'll consider the effect of buffer size on performance in Figures~\ref{fig:ins-buffer-size} and \ref{fig:q-buffer-size}. For all of these tests, we used a fixed scale factor of $8$ and the tombstone delete policy. Each plot shows the performance of our three supported layout policies (note that BSM uses a fixed $N_B=1$ and $s=2$ for all tests, to accurately reflect the performance of the classical Bentley-Saxe method). We first note that the insertion throughput appears to increase roughly linearly with the buffer size, regardless of layout policy (Figure~\ref{fig:ins-buffer-size}), whereas the query latency remains relatively flat up to $N_B=12000$, at which point it begins to increase for both leveling and tiering. It's worth noting that this is the point at which the buffer takes up roughly half of the L1 cache on our test machine.
It's interesting to compare these results with those in Figures~\ref{fig:insert_mt} and \ref{fig:sample_mt} in the previous chapter. Both of them show roughly similar insertion performance (though this is masked slightly by the log scaling of the y-axis and the larger range of x-values in Figure~\ref{fig:insert_mt}), but there's a clear difference in query performance. For the sampling structure in Figure~\ref{fig:sample_mt}, the query latency was largely independent of buffer size. In our sampling framework, we used rejection sampling on the buffer, and so it introduced only constant overhead. For range scans, though, we need to do a full linear scan of the buffer. Increasing the buffer slightly reduces the number of shards to be queried, and this effect appears to be enough to counterbalance the increasing scan cost up to a point, but there's clearly a cut-off at which larger buffers cease to make sense. We'll examine this situation in more detail in the next chapter.
Next, we consider the effect that scale factor has on performance. Figure~\ref{fig:ins-scale-factor} shows the change in insertion performance as the scale factor is increased. The pattern here is the same as we saw in the previous chapter, in Figure~\ref{fig:insert_sf}.
When leveling is used, enlarging the scale factor hurts insertion performance; when tiering is used, it improves performance. This is because a larger scale factor under tiering results in more, smaller structures, and thus reduced reconstruction time, whereas under leveling it increases write amplification. Figure~\ref{fig:q-scale-factor} shows that, as with Figure~\ref{fig:sample_sf} in the previous chapter, query latency is not strongly affected by the scale factor, though larger scale factors do tend to have a negative effect under tiering (due to there being more structures).
As a final note, these results demonstrate that, compared to the normal Bentley-Saxe method, our proposed design space is a strict improvement. There are points within the space that are equivalent to, or even strictly superior to, BSM in terms of both query and insertion performance. Beyond this, there are also clearly available trade-offs between insertion and query performance, particularly when it comes to selecting the layout policy.
\begin{figure*}
\centering
\subfloat[Update Throughput]{\includegraphics[width=.32\textwidth, trim=5mm 5mm 0 0]{img/fig-bs-irs-insert} \label{fig:irs-insert}}
\subfloat[Query Latency]{\includegraphics[width=.32\textwidth, trim=5mm 5mm 0 0]{img/fig-bs-irs-query} \label{fig:irs-query}}
\subfloat[Index Overhead]{\includegraphics[width=.32\textwidth, trim=5mm 5mm 0 0]{img/fig-bs-irs-space} \label{fig:irs-space}}
\caption{IRS Index Evaluation}
\label{fig:irs}
\end{figure*}
\subsection{Independent Range Sampling}
Next, we'll consider the independent range sampling problem using the ISAM tree. The functioning of this structure for answering IRS queries is discussed in more detail in Section~\ref{ssec:irs-struct}, and we use the query algorithm described in Algorithm~\ref{alg:decomp-irs}. We use the tagging mechanism to support deletes, and enable proactive compaction to ensure that rejection rates are bounded.
For our query class, we obtain the upper and lower bounds of the query range, and the weight of that range, using tree traversals in \texttt{local\_preproc}. We use rejection sampling on the buffer, and so the buffer preprocessing simply uses the number of records in the buffer as its weight. In \texttt{distribute\_query}, we build an alias structure over all of the weights and query it $k$ times to obtain the individual $k$ values for the local queries. To avoid extra work on repeat, we stash this alias structure in the buffer's local query object so it is available for re-use. \texttt{local\_query} simply generates the appropriate number of random indices within the query interval. For each of these, the record is checked to see whether it has been tagged as deleted, and added to the result set if it hasn't. No retries occur in the case of deleted records. \texttt{combine} simply merges all the result sets together, and \texttt{repeat} checks whether the total result set size matches the requested $k$. If it does not, then \texttt{repeat} updates $k$ to be the number of missing records and calls \texttt{distribute\_query} again, before returning true so that another round of local queries is issued.
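As a rough illustration of how \texttt{distribute\_query} divides the sample budget, the sketch below assigns per-source sample counts in proportion to the preprocessed weights. It substitutes \texttt{std::discrete\_distribution} for the alias structure described above, and the function name and weight layout are assumptions made for the example rather than the framework's actual implementation.
\begin{verbatim}
#include <cstddef>
#include <random>
#include <vector>

// weights[0] is the buffer's weight (its record count); weights[i > 0]
// is the in-range record count reported by shard i's preprocessing.
std::vector<std::size_t>
split_sample_budget(const std::vector<std::size_t> &weights,
                    std::size_t k, std::mt19937 &rng) {
    std::vector<std::size_t> per_source(weights.size(), 0);
    std::discrete_distribution<std::size_t> pick(weights.begin(),
                                                 weights.end());

    // Each draw assigns one of the k samples to a source, chosen with
    // probability proportional to its weight.
    for (std::size_t i = 0; i < k; i++) {
        per_source[pick(rng)]++;
    }
    return per_source;
}
\end{verbatim}
Each local query then draws its assigned number of samples uniformly from its portion of the query range, skipping records tagged as deleted, and \texttt{repeat} tops up any shortfall on a subsequent pass.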
The query algorithm and data structure just described result in a dynamized index with the following performance characteristics,
\begin{align*}
\text{Insert:} \quad &\Theta\left(\log_s n\right) \\
\text{Query:} \quad &\Theta\left(\log_s n \log_f n + \frac{k}{1 - \delta}\right) \\
\text{Delete:} \quad &\Theta\left(\log_s n \log_f n\right)
\end{align*}
where $f$ is the fanout of the ISAM tree and $\delta$ is the maximum proportion of deleted records that can exist on a level before a proactive compaction is triggered.
We configured our dynamized structure to use $s=8$, $N_B=12000$, $\delta = 0.05$, $f = 16$, and the tiering layout policy. We compared our method (\textbf{DE-IRS}) to Olken's method~\cite{olken89} on a B+Tree with aggregate weight counts (\textbf{AGG B+Tree}), as well as to our bespoke sampling solution from the previous chapter (\textbf{Bespoke}) and a single static instance of the ISAM tree (\textbf{ISAM}). Because IRS is neither INV nor DDSP, the standard Bentley-Saxe method has no way to support deletes for it, and so it was not tested. All of our tested sampling queries had a controlled selectivity of $\sigma = 0.01\%$ and $k=1000$.
The results of our performance benchmarking are in Figure~\ref{fig:irs}. Figure~\ref{fig:irs-insert} shows that our general framework has insertion performance comparable to the specialized one, though it loses slightly. This is to be expected, as \textbf{Bespoke} was hand-written specifically for this type of query and data structure, and has hard-coded data types, among other things. Despite losing to \textbf{Bespoke} slightly, \textbf{DE-IRS} still manages to defeat the dynamic baseline in all cases. Figure~\ref{fig:irs-query} shows the average query latencies of the three dynamic solutions, as well as a lower bound provided by querying a single instance of ISAM statically built over all of the records. This shows that our generalized solution actually manages to defeat \textbf{Bespoke} in query latency, coming in a bit closer to the static structure. Both \textbf{DE-IRS} and \textbf{Bespoke} manage to defeat the dynamic baseline. Finally, Figure~\ref{fig:irs-space} shows the space usage of the data structures, less the storage required for the raw data. The two dynamized solutions require \emph{significantly} less storage than the dynamic B+Tree, which must leave empty space in its nodes for inserts. This is a significant advantage of static data structures: they can pack data much more tightly and require less storage. Dynamization, at least in this case, doesn't add a significant amount of overhead over a single instance of the static structure.
\subsection{$k$-NN Search}
\label{ssec:dyn-knn-exp}
Next, we'll consider answering high-dimensional exact $k$-NN queries using a static Vantage Point Tree (VPTree)~\cite{vptree}. This is a binary search tree with internal nodes that partition records based on their distance to a selected point, called the vantage point. All of the points within a fixed distance of the vantage point are covered by one sub-tree, and the points outside of this distance are covered by the other. This results in a hard-to-update data structure that can be constructed in $\Theta(n \log n)$ time using repeated application of the \texttt{quickselect} algorithm~\cite{quickselect} to partition the points for each node. The structure can answer $k$-NN queries in $\Theta(k \log n)$ time.
Our dynamized query procedure is implemented based on Algorithm~\ref{alg:idsp-knn}, though using delete tagging instead of tombstones. VPTree doesn't support efficient point lookups, so to work around this we add a hash map to each shard, mapping each record to its location in storage, so that deletes can be performed efficiently. This also allows us to avoid canceling deleted records in the \texttt{combine} operation, as they can be skipped over directly during \texttt{local\_query}. Because $k$-NN doesn't have any of the distributional requirements of IRS, these local queries can return $k$ records even in the presence of deletes, by simply returning the next-closest record instead, so long as there are at least $k$ undeleted records in the shard. Thus, \texttt{repeat} isn't necessary. This algorithm and data structure result in a dynamization with the following performance characteristics,
\begin{align*}
\text{Insert:} \quad &\Theta\left(\log_s n\right) \\
\text{Query:} \quad &\Theta\left(N_B + \log n \log_s n\right) \\
\text{Delete:} \quad &\Theta\left(\log_s n \right)
\end{align*}
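The following sketch illustrates the role these per-shard hash maps play in delete tagging. The record layout, the use of a 64-bit record identifier as the map key, and the helper names are illustrative assumptions made for the example, not the framework's actual definitions.
\begin{verbatim}
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Illustrative sketch of delete tagging with a per-shard hash map.
struct Record {
    std::vector<double> point;   // the indexed vector
    std::uint32_t header = 0;    // bit 0 used here as the "deleted" tag
};

struct Shard {
    std::vector<Record> storage;                         // build order
    std::unordered_map<std::uint64_t, std::size_t> idx;  // id -> offset

    // Returns true if the record was found and tagged in this shard.
    bool tag_delete(std::uint64_t record_id) {
        auto it = idx.find(record_id);
        if (it == idx.end()) return false;
        storage[it->second].header |= 1u;  // O(1) once the shard is found
        return true;
    }
};

// The framework checks the buffer and then each shard from newest to
// oldest; with O(log n) shards, locating the right shard costs O(log n)
// hash lookups, matching the delete cost given above.
bool tagged_delete(std::vector<Shard> &shards, std::uint64_t record_id) {
    for (auto &s : shards) {
        if (s.tag_delete(record_id)) return true;
    }
    return false;
}
\end{verbatim}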
For testing, we considered a dynamized VPTree using $N_B = 1400$, $s = 8$, the tiering layout policy, and tagged deletes. Because $k$-NN is a standard DDSP, we compare against a Bentley-Saxe dynamization of the VPTree (\textbf{BSM-VPTree})\footnote{There is one deviation from pure BSM in our implementation: we use the same delete tagging scheme as the rest of our framework, meaning that the hash tables for record lookup are embedded alongside each block, rather than there being a single global table. This means that the lookup of the shard containing the record to be deleted runs in $\Theta(\log_2 n)$ time, rather than $\Theta(1)$ time. However, once the block has been identified, our approach allows the record to be deleted in $\Theta(1)$ time, rather than requiring an inefficient point lookup directly on the VPTree.} and a dynamic data structure for the same search problem called the M-Tree~\cite{mtree,mtree-impl} (\textbf{MTree}), which is an example of a so-called ``ball tree'' structure that partitions high-dimensional space using nodes representing spheres, which are merged and split to maintain balance in a manner not unlike a B+Tree. We also consider a static instance of a VPTree built over the same set of records (\textbf{VPTree}). We used the L2 distance as our metric, which is defined for vectors of $d$ dimensions as
\begin{equation*}
\text{dist}(r, s) = \sqrt{\sum_{i=0}^{d-1} \left(r_i - s_i\right)^2}
\end{equation*}
and ran the queries with $k=1000$ relative to a randomly selected point in the dataset.
\begin{figure*}
\subfloat[Update Throughput]{\includegraphics[width=.32\textwidth, trim=5mm 5mm 0 0]{img/fig-bs-knn-insert} \label{fig:knn-insert}}
\subfloat[Query Latency]{\includegraphics[width=.32\textwidth, trim=5mm 5mm 0 0]{img/fig-bs-knn-query} \label{fig:knn-query}}
\subfloat[Index Overhead]{\includegraphics[width=.32\textwidth, trim=5mm 5mm 0 0]{img/fig-bs-knn-space} \label{fig:knn-space}}
\caption{k-NN Index Evaluation}
\label{fig:knn-eval}
\end{figure*}
The results of this benchmarking are reported in Figure~\ref{fig:knn-eval}. Figure~\ref{fig:knn-query} shows that the VPTree, whether dynamized or not, \emph{vastly} out-performs the dynamic baseline in query performance; note that the y-axis of this figure is log-scaled. Interestingly, query performance is not severely degraded relative to the static baseline regardless of the dynamization scheme used, with \textbf{BSM-VPTree} performing slightly \emph{better} than our framework.
The reason for this gap is shown in Figure~\ref{fig:knn-insert}: our framework outperforms the Bentley-Saxe method in insertion performance. These results are attributable to our selection of framework configuration parameters, which are biased towards better insertion performance. Both dynamized structures also outperform the dynamic baseline. Finally, as is becoming a trend, Figure~\ref{fig:knn-space} shows that the storage requirements of the static data structures, dynamized or not, are significantly lower than those of M-Tree. M-Tree, like a B+Tree, requires leaving empty slots in its nodes to support insertion, and this results in a large amount of wasted space.
As a final note, metric indexing is an area where dynamized static structures have already been shown to work well, and our results here are in line with those of Naidan and Hetland, who applied BSM directly to metric data structures, including the VPTree, and showed similar performance advantages~\cite{naidan14}.
\subsection{Range Scan}
Next, we will consider applying our dynamization framework to learned indices for single-dimensional range scans. A learned index is a sorted data structure which attempts to index data by directly modeling a function mapping a key to its offset within a storage array. The result of a lookup against the index is an estimated location, along with a strict error bound, within which the record is guaranteed to be located. We apply our framework to create dynamized versions of two static learned indices, Triespline~\cite{plex} (\textbf{DE-TS}) and PGM~\cite{pgm} (\textbf{DE-PGM}), and compare with a standard Bentley-Saxe dynamization of Triespline (\textbf{BSM-TS}). Our dynamic baselines are ALEX~\cite{alex}, which is a dynamic learned index based on a B+Tree-like structure, and PGM (\textbf{PGM}), which itself provides support for a dynamic version based on Bentley-Saxe dynamization (which is why we have not included a BSM version of PGM in our testing). For our dynamized versions of Triespline and PGM, we configure the framework with $N_B = 12000$, $s=8$, and the tiering layout policy.
We consider range count queries, which traverse the range and return the number of records within it rather than returning the records themselves, to overcome differences in the query interfaces of our baselines, some of which make extra copies of the records. We consider traversing the range and counting to be a fairer comparison. Range counts are true invertible search problems, and so we use tombstone deletes. The query process itself performs no preprocessing. Local queries use the index to identify the first record in the query range and then traverse the range, counting the number of records and tombstones encountered. These counts are then combined by adding up the total record count from all shards, subtracting the total tombstone count, and returning the final count, as sketched below. No repeats are necessary. The buffer query simply scans the unsorted array and performs the same counting.
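A minimal sketch of this combination step follows; the \texttt{LocalCount} type and function name are illustrative placeholders rather than the framework's actual types.
\begin{verbatim}
#include <cstdint>
#include <vector>

// Illustrative local result for a range count: each local query reports
// how many live records and how many tombstones it saw in the range.
struct LocalCount {
    std::int64_t records = 0;
    std::int64_t tombstones = 0;
};

// Combine the per-shard (and buffer) counts into the final range count.
// Every tombstone cancels exactly one matching record elsewhere in the
// structure, which is what makes range counts invertible.
std::int64_t combine_range_counts(const std::vector<LocalCount> &locals) {
    std::int64_t total = 0;
    for (const auto &lc : locals) {
        total += lc.records - lc.tombstones;
    }
    return total;
}
\end{verbatim}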
We examine range count queries with a fixed selectivity of $\sigma = 0.1\%$.
\begin{figure*}
\centering
\subfloat[Update Throughput]{\includegraphics[width=.32\textwidth, trim=5mm 5mm 0 0]{img/fig-bs-rq-insert} \label{fig:rq-insert}}
\subfloat[Query Latency]{\includegraphics[width=.32\textwidth, trim=5mm 5mm 0 0]{img/fig-bs-rq-query} \label{fig:rq-query}}
\subfloat[Index Overhead]{\includegraphics[width=.32\textwidth, trim=5mm 5mm 0 0]{img/fig-bs-rq-space} \label{fig:rq-space}}
\caption{Learned Index Evaluation}
\label{fig:eval-learned-index}
\end{figure*}
The results of our evaluation are shown in Figure~\ref{fig:eval-learned-index}. Figure~\ref{fig:rq-insert} shows the insertion performance. DE-TS is the best in all cases, and the pure BSM version of Triespline is the worst by a substantial margin. Of particular interest in this chart is the inconsistent performance of ALEX, which does quite well on the \texttt{books} dataset and poorly on the others. It is worth noting that getting ALEX to run \emph{at all} in some cases required a great deal of trial and error and tuning, as its performance is highly distribution-dependent. Our dynamized version of PGM consistently out-performed the built-in dynamic support of the same structure. One shouldn't read \emph{too} much into this result, as PGM itself supports some performance tuning and can be adjusted to balance insertion against query performance. We ran it with the authors' suggested default values, but in principle it could be tuned to match our framework's performance here. The important take-away from this test is that our generalized framework can easily trade blows with a custom, integrated solution.
The query performance results in Figure~\ref{fig:rq-query} are a bit less interesting. All solutions perform similarly, with ALEX again showing itself to be fairly distribution-dependent, performing the best of all of the structures on the \texttt{books} dataset by a reasonable margin but falling in line with the others on the remaining datasets. The standout result here is the dynamic PGM, which performs horrendously compared to all of the other structures. The same caveat from the previous paragraph applies here: PGM can be configured for better performance. But it's notable that our framework-dynamized PGM is able to beat PGM slightly in insertion performance without seeing the massive degradation in query performance that PGM's native update support suffers in its update-optimized configuration.\footnote{It's also worth noting that PGM implements tombstone deletes by inserting a record whose key matches the record to be deleted and whose value is a reserved ``tombstone'' value, rather than using a header. This means that it cannot support duplicate keys when deletes are used, unlike our approach. It also means that its records are smaller, which should improve query performance, but we're able to beat it even including the header. PGM is the reason we excluded the \texttt{wiki} dataset from SOSD, as it has duplicate key values.}
Finally, Figure~\ref{fig:rq-space} shows the storage requirements for these data structures. All of the dynamic options require significantly more space than the static Triespline, but ALEX requires the most by a very large margin. This is in keeping with the previous experiments, which all included similar B+Tree-like structures that required significant additional storage space, compared to static structures, as part of their update support.
\subsection{String Search}
As a final example of a search problem, we consider exact string matching using the fast succinct trie~\cite{zhang18}. While dynamic tries aren't terribly unusual~\cite{m-bonsai,dynamic-trie}, succinct data structures, which attempt to approach an information-theoretic lower bound on the size of their binary representation of the data, are usually static, because implementing updates while maintaining these compact representations is difficult~\cite{dynamic-trie}. There are specialized approaches for dynamizing such structures~\cite{dynamize-succinct}, but in this section we consider the effectiveness of our generalized framework for them.
\begin{figure*}
\centering
\subfloat[Update Throughput]{\includegraphics[width=.32\textwidth, trim=5mm 2mm 0 0]{img/fig-bs-fst-insert} \label{fig:fst-insert}}
\subfloat[Query Latency]{\includegraphics[width=.32\textwidth, trim=5mm 2mm 0 0]{img/fig-bs-fst-query} \label{fig:fst-query}}
\subfloat[Index Overhead]{\includegraphics[width=.32\textwidth, trim=5mm 2mm 0 0]{img/fig-bs-fst-space} \label{fig:fst-space}}
\caption{FST Evaluation}
\label{fig:fst-eval}
\end{figure*}
Our shard type is a direct wrapper around an implementation of the fast succinct trie~\cite{fst-impl}. We store the strings in off-record storage, and the record type itself contains a pointer to the string in storage. Queries use no pre-processing, and the local queries directly search for a matching string. We use the framework's early-abort feature to stop as soon as the first result is found, and \texttt{combine} simply checks whether this record is a tombstone (a sketch of this check is given at the end of this subsection). If it is a tombstone, then the lookup is considered to have not found the search string; otherwise, the record is returned. This results in a dynamized structure with the following asymptotic costs,
\begin{align*}
\text{Insert:} \quad &\Theta\left(\log_s n\right) \\
\text{Query:} \quad &\Theta\left(N_B + \log n \log_s n\right) \\
\text{Delete:} \quad &\Theta\left(\log_s n \right)
\end{align*}
We compare our dynamized succinct trie (\textbf{DE-FST}), configured with $N_B = 1200$, $s = 8$, the tiering layout policy, and tombstone deletes, with a standard Bentley-Saxe dynamization (\textbf{BSM-FST}), as well as a single static instance of the structure (\textbf{FST}). The results are shown in Figure~\ref{fig:fst-eval}. As with range scans, the Bentley-Saxe method shows very poor insertion performance relative to our framework in Figure~\ref{fig:fst-insert}. Note that the significant difference in update throughput observed between the two datasets is largely attributable to their relative sizes: the \texttt{ursarc} set is far larger than \texttt{english}. Figure~\ref{fig:fst-query} shows that our write-optimized framework configuration is slightly out-performed in query latency by the standard Bentley-Saxe dynamization, and that both dynamized structures are quite a bit slower than the static structure for queries. Finally, the storage costs for the data structures are shown in Figure~\ref{fig:fst-space}. For the \texttt{english} dataset, the extra storage cost from decomposing the structure is quite significant, but for the \texttt{ursarc} set the sizes are quite comparable. It is not unexpected that dynamization would add storage cost for succinct (or any compressed) data structures, because splitting the records across multiple data structures reduces the ability of each structure to compress redundant data.
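As referenced above, the combine step for this search problem amounts to a single tombstone check. The sketch below illustrates it; the \texttt{Match} type and the function name are illustrative placeholders, and strings are held directly rather than by pointer for simplicity.
\begin{verbatim}
#include <optional>
#include <string>
#include <vector>

// Illustrative local result: at most one match per block, newest first.
struct Match {
    std::string key;
    bool is_tombstone = false;
};

// With early abort enabled there will be at most one entry in practice,
// but the newest-to-oldest ordering makes the general case simple: the
// first match decides the outcome, and a tombstone means the string was
// deleted after its most recent insertion.
std::optional<std::string> combine_string_lookup(
        const std::vector<Match> &locals) {
    for (const auto &m : locals) {
        if (m.is_tombstone) {
            return std::nullopt;  // deleted: report "not found"
        }
        return m.key;             // newest live match wins
    }
    return std::nullopt;          // no block contained the string
}
\end{verbatim}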
\subsection{Concurrency}
We also tested the preliminary concurrency support described in Section~\ref{ssec:dyn-concurrency}, using IRS as our test case, with our dynamization configured with $N_B = 1200$, $s=8$, and the tiering layout policy. Note that IRS only supports tagging, as it isn't invertible even under the IDSP model, while our current concurrency implementation only supports deletes with tombstones, so we eschewed deletes entirely for this test. In this benchmark, we used a single thread to insert records into the structure at a constant rate, while we deployed a variable number of additional threads that continuously issued sampling queries against the structure. We used an AGG B+Tree as our baseline. Note that, to accurately maintain the aggregate weight counts as records are inserted, it is necessary for each operation to obtain a lock on the root node of the tree~\cite{zhao22}. This makes this situation a good use case for the automatic concurrency support provided by our framework.
Figure~\ref{fig:irs-concurrency} shows the results of this benchmark for various numbers of concurrent query threads. As can be seen, our framework sustains a stable update throughput with up to 32 query threads, whereas the AGG B+Tree suffers from contention on the mutex and sees its performance degrade as the number of threads increases.
\begin{figure}
\centering
\includegraphics[width=.5\textwidth]{img/fig-bs-irs-concurrency}
\caption{IRS Thread Scaling}
\label{fig:irs-concurrency}
\end{figure}
\section{Conclusion}
In this chapter, we sought to develop a set of tools for generalizing some of the results from our study of sampling data structures in Chapter~\ref{chap:sampling} to a broader set of data structures. This resulted in the development of two new classes of search problem: extended decomposable search problems and iterative deletion decomposable search problems. The former class allows a pre-processing step to be used to generate individualized local queries for each block in a decomposed structure, and the latter allows the query process to be repeated as necessary, with possible modifications to the local queries each time, to build up the result set iteratively. We then implemented a C++ framework for automatically dynamizing static data structures for search problems falling into either of these classes, which includes an LSM-tree-inspired design space and support for concurrency. We used this framework to produce dynamized structures for a wide variety of search problems, and compared the results to existing dynamic baselines, as well as to the original Bentley-Saxe method where applicable. The results show that our framework is capable of creating dynamic structures that are competitive with, or superior to, custom-built dynamic structures, and that it has clear performance advantages over the classical Bentley-Saxe method.