-rw-r--r--  chapters/beyond-dsp.tex          | 1532
-rw-r--r--  chapters/dynamization.tex        |    1
-rw-r--r--  chapters/sigmod23/framework.tex  |    2
-rw-r--r--  cls/userlib.tex                  |   17
-rw-r--r--  paper.tex                        |    3
5 files changed, 719 insertions, 836 deletions
diff --git a/chapters/beyond-dsp.tex b/chapters/beyond-dsp.tex
index 50d6369..66b9d97 100644
--- a/chapters/beyond-dsp.tex
+++ b/chapters/beyond-dsp.tex
@@ -11,863 +11,729 @@
\label{chap:framework}
-The previous chapter demonstrated
-the possible utility of
-designing indexes based upon the dynamic extension of static data
-structures. However, the presented strategy falls short of a general
-framework, as it is specific to sampling problems. In this chapter,
-the techniques of that work will be discussed in more general terms,
-to arrive at a more broadly applicable solution. A general
-framework is proposed, which places only two requirements on supported data
-structures,
-
-\begin{itemize}
- \item Extended Decomposability
- \item Record Identity
-\end{itemize}
+\section{Introduction}
+
+In the previous chapter, we discussed how several of the limitations of
+dynamization could be overcome by proposing a systematic dynamization
+approach for sampling data structures. In doing so, we introduced
+a multi-stage query mechanism to overcome the non-decomposability of
+these queries, provided two mechanisms for supporting deletes along with
+specialized processing to integrate these with the query mechanism, and
+introduced some performance tuning capability inspired by the design space
+of modern LSM Trees. While promising, these results are highly specialized
+and remain useful only within the context of sampling queries. In this
+chapter, we develop new generalized query abstractions based on these
+specific results, and discuss a fully implemented framework based upon
+these abstractions.
+
+More specifically, in this chapter we propose \emph{extended
+decomposability} and \emph{iterative deletion decomposability} as two
+new, broader classes of search problem which are strict supersets of
+decomposability and deletion decomposability respectively, providing a
+more powerful interface to allow the efficient implementation of a larger
+set of search problems over a dynamized structure. We then implement
+a C++ library based upon these abstractions which is capable of adding
+support for inserts, deletes, and concurrency to static data structures
+automatically, and use it to provide dynamizations for independent range
+sampling, range queries with learned indices, string search with succinct
+tries, and high dimensional vector search with metric indices. In each
+case we compare our dynamized implementation with existing dynamic
+structures, and standard Bentley-Saxe dynamizations, where possible.
+
+\section{Beyond Decomposability}
+
+We begin our discussion of this generalized framework by proposing
+new classes of search problems based upon our results from examining
+sampling problems in the previous chapter. Our new classes will enable
+the support of new types of search problem, enable more efficient support
+for certain already supported problems, and allow for broader support
+of deletes. Based on this, we will develop a taxonomy of search problems
+that can be supported by our dynamization technique.
+
+
+\subsection{Extended Decomposability}
+
+As discussed in Chapter~\ref{chap:background}, the standard query model
+used by dynamization techniques requires that a given query be broadcast,
+unaltered, to each block within the dynamized structure, and then that
+the results from these identical local queries be efficiently mergeable
+to obtain the final answer to the query. This model limits dynamization
+to decomposable search problems (Definition~\ref{def:dsp}).
+
+In the previous chapter, we considered various sampling problems as
+examples of non-decomposable search problems, and devised a technique for
+correctly answering queries of that type over a dynamized structure. In
+this section, we'll retread our steps with an eye towards a general
+solution that could be applicable in other contexts. For convenience,
+we'll focus exclusively on independent range sampling. As a reminder, this
+search problem is defined as,
+
+\begin{definitionIRS}[Independent Range Sampling~\cite{tao22}]
+ Let $D$ be a set of $n$ points in $\mathbb{R}$. Given a query
+ interval $q = [x, y]$ and an integer $k$, an independent range sampling
+ query returns $k$ independent samples from $D \cap q$ with each
+ point having equal probability of being sampled.
+\end{definitionIRS}
+
+We formalize this as a search problem $F_\text{IRS}:(\mathcal{D},
+\mathcal{Q}) \to \mathcal{R}$ where the record domain is $\mathcal{D}
+= \mathbb{R}$, the query parameters domain consists of ordered triples
+containing the lower and upper bounds of the query interval, and the
+number of samples to draw, $\mathcal{Q} = \mathbb{R} \times \mathbb{R}
+\times \mathbb{Z}^+$, and the result domain contains subsets of the
+real numbers, $\mathcal{R} = \mathcal{PS}(\mathbb{R})$.
+
+$F_\text{IRS}$ can be solved using a variety of data structures, such as
+the static ISAM solution discussed in Section~\ref{ssec:irs-struct}. For
+our example here, we will use a simple sorted array. Let $\mathcal{I}$
+be the sorted array data structure, with a specific instance $\mathscr{I}
+\in \mathcal{I}$ built over a set $D \subset \mathbb{R}$ having $|D| =
+n$ records. The problem $F_\text{IRS}(\mathscr{I}, (l, u, k))$ can be
+solved by binary searching $\mathscr{I}$ twice to obtain the index of
+the first element greater than or equal to $l$ ($i_l$) and the last
+element less than or equal to $u$ ($i_u$). With these two indices,
+$k$ random numbers can be generated on the interval $[i_l, i_u]$ and the
+records at these indices returned. This sampling procedure is described
+in Algorithm~\ref{alg:array-irs} and runs in $\mathscr{Q}_\text{irs}
+\in \Theta(\log n + k)$ time.
+
+\SetKwFunction{IRS}{IRS}
+\begin{algorithm}
+\caption{Solution to IRS on a sorted array}
+\label{alg:array-irs}
+\KwIn{$k$: sample size, $[l,u]$: lower and upper bound of records to sample}
+\KwOut{$S$: a sample set of size $k$}
+\Def{\IRS{$(\mathscr{I}, (l, u, k))$}}{
+ \Comment{Find the lower and upper bounds of the interval}
+ $i_l \gets \text{binary\_search\_lb}(\mathscr{I}, l)$ \;
+ $i_u \gets \text{binary\_search\_ub}(\mathscr{I}, u)$ \;
+ \BlankLine
+ \Comment{Initialize empty sample set}
+ $S \gets \{\}$ \;
+ \BlankLine
+ \For {$i=1\ldots k$} {
+        \Comment{Select a random record within the interval}
+ $i_r \gets \text{randint}(i_l, i_u)$ \;
+
+ \Comment{Add it to the sample set}
+ $S \gets S \cup \{\text{get}(\mathscr{I}, i_r)\}$ \;
+ }
+ \BlankLine
+ \Comment{Return the sample set}
+ \Return $S$ \;
+}
+\end{algorithm}
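+
+As a point of reference, this procedure maps directly onto a few lines of
+C++ over a sorted \texttt{std::vector}. The following sketch mirrors
+Algorithm~\ref{alg:array-irs}; the function and variable names are
+illustrative only, and not drawn from any particular implementation,
+\begin{verbatim}
+#include <algorithm>
+#include <cstddef>
+#include <random>
+#include <vector>
+
+// IRS over a sorted (ascending) vector; see the algorithm above.
+std::vector<double> irs_sorted_array(const std::vector<double> &data,
+                                     double l, double u, size_t k,
+                                     std::mt19937 &rng) {
+    // binary search for the bounds of the query interval [l, u]
+    auto lb = std::lower_bound(data.begin(), data.end(), l);
+    auto ub = std::upper_bound(data.begin(), data.end(), u);
+
+    std::vector<double> sample;
+    if (lb >= ub) return sample;  // no records fall on [l, u]
+
+    // draw k independent, uniformly distributed indices on [i_l, i_u]
+    std::uniform_int_distribution<size_t> idx(lb - data.begin(),
+                                              (ub - data.begin()) - 1);
+    for (size_t i = 0; i < k; i++) {
+        sample.push_back(data[idx(rng)]);
+    }
+    return sample;
+}
+\end{verbatim}
+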
-In this chapter, first these two properties are defined. Then,
-a general dynamic extension framework is described which can
-be applied to any data structure supporting these properties. Finally,
-an experimental evaluation is presented that demonstrates the viability
-of this framework.
-
-\section{Extended Decomposability}
-
-Chapter~\ref{chap:sampling} demonstrated how non-DSPs can be efficiently
-addressed using Bentley-Saxe, so long as the query interface is
-modified to accommodate their needs. For Independent sampling
-problems, this involved a two-pass approach, where some pre-processing
-work was performed against each shard and used to construct a shard
-alias structure. This structure was then used to determine how many
-samples to draw from each shard.
-
-To generalize this approach, a new class of decomposability is proposed,
-called \emph{extended decomposability}. At present, its
-definition is tied tightly to the query interface, rather
-than a formal mathematical definition. In extended decomposability,
-rather than treating a search problem as a monolith, the algorithm
-is decomposed into multiple components.
-This allows
-for communication between shards as part of the query process.
-Additionally, rather than using a binary merge operator, extended
-decomposability uses a variadic function that merges all of the
-result sets in one pass, reducing the cost due to merging by a
-logarithmic factor without introducing any new restrictions.
-
-The basic interface that must be supported by a extended-decomposable
-search problem (eDSP) is,
-\begin{itemize}
+It becomes more difficult to answer $F_\text{IRS}$ over a data structure
+that has been decomposed into blocks, because the number of samples
+taken from each block must be appropriately weighted to correspond to the
+number of records within each block falling into the query range. In the
+classical model, there isn't a way to do this, and so the only solution
+is to answer $F_\text{IRS}$ against each block, asking for the full $k$
+samples each time, and then downsampling the results according to
+the relative weight of each block, to obtain a final sample set.
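+For example, if $900$ of the records on $[l, u]$ reside in one block and
+$100$ in another, the two local sample sets must be down-sampled with
+relative weights of $0.9$ and $0.1$, so that a final sample of $k = 100$
+records contains, in expectation, $90$ records from the first block and
+$10$ from the second.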
+
+Using this idea, we can formulate $F_\text{IRS}$ as a $C(n)$-decomposable
+problem by changing the result set type to $\mathcal{R} =
+\mathcal{PS}(\mathbb{R}) \times \mathbb{R}$ where the first element
+in the tuple is the sample set and the second element is the number
+of elements falling between $l$ and $u$ in the block being sampled
+from. With this information, it is possible to implement $\mergeop$
+using Bernoulli sampling over the two sample sets to be merged. This
+requires $\Theta(k)$ time, and thus $F_\text{IRS}$ can be said to be
+a $k$-decomposable search problem, which runs in $\Theta(\log^2 n + k
+\log n)$ time. This procedure is shown in Algorithm~\ref{alg:decomp-irs}.
+
+\SetKwFunction{IRSDecomp}{IRSDecomp}
+\SetKwFunction{IRSCombine}{IRSCombine}
+\begin{algorithm}[!h]
+ \caption{$k$-Decomposable Independent Range Sampling}
+ \label{alg:decomp-irs}
+ \KwIn{$k$: sample size, $[l,u]$: lower and upper bound of records to sample}
+ \KwOut{$(S, c)$: a sample set of size $k$ and a count of the number
+                of records on the interval $[l,u]$}
+ \Def{\IRSDecomp{$\mathscr{I}_i, (l, u, k)$}}{
+ \Comment{Find the lower and upper bounds of the interval}
+ $i_l \gets \text{binary\_search\_lb}(\mathscr{I}_i, l)$ \;
+ $i_u \gets \text{binary\_search\_ub}(\mathscr{I}_i, u)$ \;
+ \BlankLine
+ \Comment{Initialize empty sample set}
+ $S \gets \{\}$ \;
+ \BlankLine
+    \For {$j=1\ldots k$} {
+        \Comment{Select a random record within the interval}
+        $i_r \gets \text{randint}(i_l, i_u)$ \;
+
+ \Comment{Add it to the sample set}
+ $S \gets S \cup \{\text{get}(\mathscr{I}_i, i_r)\}$ \;
+ }
+ \BlankLine
+ \Comment{Return the sample set and record count}
+        \Return ($S$, $i_u - i_l + 1$) \;
+ }
+ \BlankLine
+
+ \Def{\IRSCombine{$(S_1, c_1)$, $(S_2, c_2)$}}{
+ \Comment{The output set should be the same size as the input ones}
+ $k \gets |S_1|$ \;
+ \BlankLine
+ \Comment{Calculate the weighting that should be applied to each set when sampling}
+ $w_1 \gets \frac{c_1}{c_1 + c_2}$ \;
+ $w_2 \gets \frac{c_2}{c_1 + c_2}$ \;
+ \BlankLine
+ \Comment{Initialize output set and count}
+ $S \gets \{\}$\;
+ $c \gets c_1 + c_2$ \;
+ \BlankLine
+ \Comment{Down-sample the input result sets}
+ $S \gets S \cup \text{bernoulli}(S_1, w_1, k\times w_1)$ \;
+ $S \gets S \cup \text{bernoulli}(S_2, w_2, k\times w_2)$ \;
+ \BlankLine
+        \Return $(S, c)$
+ }
+\end{algorithm}
- \item $\mathbftt{local\_preproc}(\mathcal{I}_i, \mathcal{Q}) \to
- \mathscr{S}_i$ \\
- Pre-processes each partition $\mathcal{D}_i$ using index
- $\mathcal{I}_i$ to produce preliminary information about the
- query result on this partition, encoded as an object
- $\mathscr{S}_i$.
-
- \item $\mathbftt{distribute\_query}(\mathscr{S}_1, \ldots,
- \mathscr{S}_m, \mathcal{Q}) \to \mathcal{Q}_1, \ldots,
- \mathcal{Q}_m$\\
- Processes the list of preliminary information objects
- $\mathscr{S}_i$ and emits a list of local queries
- $\mathcal{Q}_i$ to run independently on each partition.
-
- \item $\mathbftt{local\_query}(\mathcal{I}_i, \mathcal{Q}_i)
- \to \mathcal{R}_i$ \\
- Executes the local query $\mathcal{Q}_i$ over partition
- $\mathcal{D}_i$ using index $\mathcal{I}_i$ and returns a
- partial result $\mathcal{R}_i$.
-
- \item $\mathbftt{merge}(\mathcal{R}_1, \ldots \mathcal{R}_m) \to
- \mathcal{R}$ \\
- Merges the partial results to produce the final answer.
+While this approach does allow sampling over a dynamized structure, it is
+asymptotically inferior to Olken's method, which allows for sampling in
+only $\Theta(k \log n)$ time~\cite{olken89}. However, we've already seen
+in the previous chapter how it is possible to modify the query procedure
+into a multi-stage process to enable more efficient solutions to the IRS
+problem. The core idea underlying our solution in that chapter was to
+introduce individualized local queries for each block, which were created
+after a pre-processing step to allow information about each block to be
+determined first. In that particular example, we established the weight
+each block should have during sampling, and then created custom sampling
+queries with variable $k$ values, following the weight distribution. We
+have determined a general interface that allows for this procedure to be
+expressed, and we define the term \emph{extended decomposability} to refer
+to search problems that can be answered in this way.
+
+More formally, consider a search problem $F(D, q)$ capable of being
+answered using a data structure instance $\mathscr{I} \in \mathcal{I}$
+built over a set of records $D \subseteq \mathcal{D}$ that has been
+decomposed into $m$ blocks, $\mathscr{I}_1, \mathscr{I}_2, \ldots,
+\mathscr{I}_m$, each built over one part of a partition of $D$ into
+$D_1, D_2, \ldots, D_m$. $F$ is an extended-decomposable search problem
+(eDSP) if it can be expressed using the following interface,
+\begin{itemize}
+\item $\mathbftt{local\_preproc}(\mathscr{I}_i, q) \to \mathscr{M}_i$ \\
+ Pre-process each partition, $D_i$, using its associated data
+    structure, $\mathscr{I}_i$, and generate a meta-information object
+ $\mathscr{M}_i$ for use in local query generation.
+
+\item $\mathbftt{distribute\_query}(\mathscr{M}_1, \ldots, \mathscr{M}_m,
+ q) \to q_1, \ldots, q_m$\\
+ Process the set of meta-information about each block and produce
+ individual local queries, $q_1, \ldots, q_m$, for each block.
+
+\item $\mathbftt{local\_query}(\mathscr{I}_i, q_i) \to r_i$ \\
+ Evaluate the local query with parameters $q_i$ over the data
+ in $D_i$ using the data structure $\mathscr{I}_i$ and produce
+ a partial query result, $r_i$.
+
+\item $\mathbftt{combine}(r_1, \ldots, r_m) \to R$ \\
+ Combine the list of local query results, $r_1, \ldots, r_m$ into
+ a final query result, $R$.
\end{itemize}
-The pseudocode for the query algorithm using this interface is,
-\begin{algorithm}
- \DontPrintSemicolon
- \SetKwProg{Proc}{procedure}{ BEGIN}{END}
- \SetKwProg{For}{for}{ DO}{DONE}
+Let $P(n)$ be the cost of $\mathbftt{local\_preproc}$, $D(n)$ be
+the cost of $\mathbftt{distribute\_query}$, $\mathscr{Q}_\ell(n)$
+be the cost of $\mathbftt{local\_query}$, and $C_e(n)$ be the cost of
+$\mathbftt{combine}$. Solving a search problem with this interface
+requires calling $\mathbftt{local\_preproc}$ and $\mathbftt{local\_query}$
+once per block, and $\mathbftt{distribute\_query}$ and
+$\mathbftt{combine}$ once. For a Bentley-Saxe dynamization then, with
+$O(\log_2 n)$ blocks, the worst-case cost of answering an eDSP is,
+\begin{equation}
+\label{eqn:edsp-cost}
+O \left( \log_2 n \cdot P(n) + D(n) + \log_2 n \cdot \mathscr{Q}_\ell(n) + C_e(n) \right)
+\end{equation}
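+
+To make the control flow concrete, the following C++ sketch shows how a
+dynamization might drive this interface over its blocks. The
+\texttt{Query} type, its member type names, and the function name are
+hypothetical placeholders for the four operations above, not part of
+any existing API,
+\begin{verbatim}
+#include <cstddef>
+#include <vector>
+
+// Generic eDSP query procedure: one local_preproc and one local_query
+// call per block, and a single distribute_query and combine call.
+template <typename Shard, typename Query>
+typename Query::Result edsp_query(const std::vector<Shard*> &blocks,
+                                  const typename Query::Parameters &q) {
+    // local_preproc: gather per-block meta-information
+    std::vector<typename Query::Meta> meta;
+    for (auto *b : blocks) {
+        meta.push_back(Query::local_preproc(b, q));
+    }
+
+    // distribute_query: produce an individualized query for each block
+    auto local_queries = Query::distribute_query(meta, q);
+
+    // local_query: evaluate each local query against its block
+    std::vector<typename Query::LocalResult> local_results;
+    for (size_t i = 0; i < blocks.size(); i++) {
+        local_results.push_back(
+            Query::local_query(blocks[i], local_queries[i]));
+    }
+
+    // combine: merge the local results into the final answer
+    return Query::combine(local_results, q);
+}
+\end{verbatim}
+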
- \Proc{\mathbftt{QUERY}($D[]$, $\mathscr{Q}$)} {
- \For{$i \in [0, |D|)$} {
- $S[i] := \mathbftt{local\_preproc}(D[i], \mathscr{Q})$
- } \;
+As an example, we'll express IRS using the above interface and
+analyze its complexity to show that the resulting solution has the
+same $\Theta(\log^2 n + k)$ cost as the specialized solution from
+Chapter~\ref{chap:sampling}. We use $\mathbftt{local\_preproc}$
+to determine the number of records on each block falling on the
+interval $[l, u]$ and return this, as well as $i_l$ and $i_u$ as the
+meta-information. Then, $\mathbftt{distribute\_query}$ will perform
+weighted set sampling using a temporary alias structure over the
+weights of all of the blocks to calculate the appropriate value
+of $k$ for each local query, which will consist of $(k_i, i_{l,i},
+i_{u,i})$. With the appropriate value of $k$, as well as the indices of
+the upper and lower bounds, pre-calculated, $\mathbftt{local\_query}$
+can simply generate $k_i$ random integers and return the corresponding
+records. $\mathbftt{combine}$ simply combines all of the local results
+and returns the final result set. Algorithm~\ref{alg:edsp-irs} shows
+each of these operations in pseudocode.
+
+
+\SetKwFunction{preproc}{local\_preproc}
+\SetKwFunction{distribute}{distribute\_query}
+\SetKwFunction{query}{local\_query}
+\SetKwFunction{combine}{combine}
+\begin{algorithm}[t]
+ \caption{IRS with Extended Decomposability}
+ \label{alg:edsp-irs}
+ \KwIn{$k$: sample size, $[l,u]$: lower and upper bound of records to sample}
+ \KwOut{$R$: a sample set of size $k$}
+
+ \Def{\preproc{$\mathscr{I}_i$, $q=(l,u,k)$}}{
+ \Comment{Find the indices for the upper and lower bounds of the query range}
+ $i_l \gets \text{binary\_search\_lb}(\mathscr{I}_i, l)$ \;
+ $i_u \gets \text{binary\_search\_ub}(\mathscr{I}_i, u)$ \;
+ \BlankLine
+ \Return $(i_l, i_u)$ \;
+ }
+
+ \BlankLine
+ \Def{\distribute{$\mathscr{M}_1$, $\ldots$, $\mathscr{M}_m$, $q=(l,u,k)$}}{
+ \Comment{Determine number of records to sample from each block}
+ $k_1, \ldots k_m \gets \mathtt{wss}(k, \mathscr{M}_1, \ldots \mathscr{M}_m)$ \;
+ \BlankLine
+ \Comment{Build local query objects}
+ \For {$i=1..m$} {
+        $q_i \gets (\mathscr{M}_i.i_l, \mathscr{M}_i.i_u, k_i)$ \;
+ }
+
+ \BlankLine
+ \Return $q_1 \ldots q_m$ \;
+ }
- $ Q := \mathbftt{distribute\_query}(S, \mathscr{Q}) $ \; \;
+ \BlankLine
+ \Def{\query{$\mathscr{I}_i$, $q_i = (i_{l,i},i_{u,i},k_i)$}}{
+        $S \gets \{\}$ \;
+        \For {$j=1\ldots k_i$} {
+            \Comment{Select a random record within the interval}
+            $i_r \gets \text{randint}(i_{l,i}, i_{u,i})$ \;
- \For{$i \in [0, |D|)$} {
- $R[i] := \mathbftt{local\_query}(D[i], Q[i])$
- } \;
+ \Comment{Add it to the sample set}
+ $S \gets S \cup \{\text{get}(\mathscr{I}_i, i_r)\}$ \;
+ }
- $OUT := \mathbftt{merge}(R)$ \;
+ \Return $S$ \;
+ }
- \Return {$OUT$} \;
+ \BlankLine
+ \Def{\combine{$r_1, \ldots, r_m$, $q=(l, u, k)$}}{
+ \Comment{Union results together}
+ \Return $\bigcup_{i=1}^{m} r_i$
}
\end{algorithm}
-In this system, each query can report a partial result with
-\mathbftt{local\_preproc}, which can be used by
-\mathbftt{distribute\_query} to adjust the per-partition query
-parameters, allowing for direct communication of state between
-partitions. Queries which do not need this functionality can simply
-return empty $\mathscr{S}_i$ objects from \mathbftt{local\_preproc}.
+These operations result in $P(n) \in \Theta(\log n)$, $D(n) \in
+\Theta(\log n)$, $\mathscr{Q}_\ell(n,k) \in \Theta(k)$, and $C_e(n) \in
+\Theta(1)$. At first glance, it would appear that we have arrived at a
+solution with a query cost of $O\left(\log_2^2 n + k\log_2 n\right)$,
+and have thus fallen short of our goal. However, Equation~\ref{eqn:edsp-cost}
+is only an upper bound on the cost. In the case of IRS, we can leverage an
+important problem-specific detail to obtain a better result: the total
+cost of the local queries is actually \emph{independent} of the number
+of blocks.
+
+For IRS, the cost of $\mathbftt{local\_query}$ is linear in the number
+of samples requested. Our initial asymptotic cost assumes that, in the
+worst case, each of the $\log_2 n$ blocks is sampled $k$ times. But
+this is not true of our algorithm. Rather, only $k$ samples are taken
+\emph{in total}, distributed across all of the blocks. Thus, regardless
+of how many blocks there are, there will only be $k$ samples drawn,
+requiring $k$ random number generations, etc. As a result, the total
+cost of the local query term in the cost function is actually $\Theta(k)$.
+Applying this result gives us a tighter bound of,
+\begin{equation*}
+\mathscr{Q}_\text{IRS} \in \Theta\left(\log_2^2 n + k\right)
+\end{equation*}
+which matches the result of Chapter~\ref{chap:sampling} for IRS
+in the absence of deletes. The other sampling problems considered in
+Chapter~\ref{chap:sampling} can be similarly implemented using this
+interface, with the same performance as their specialized implementations.
+
+
+\subsection{Iterative Deletion Decomposability}
+
+We next turn our attention to support for deletes. Efficient delete
+support in Bentley-Saxe dynamization is provably impossible~\cite{saxe79},
+but, as discussed in Section~\ref{ssec:dyn-deletes}, it is possible
+to support deletes in restricted situations, where either the search
+problem is invertible (Definition~\ref{}) or the data structure and
+search problem combined are deletion decomposable (Definition~\ref{}).
+In Chapter~\ref{chap:sampling}, we considered a set of search problems
+which did \emph{not} satisfy any of these properties, and instead built a
+customized solution for deletes that required tight integration with the
+query process in order to function. While such a solution was acceptable
+for the goals of that chapter, it is not sufficient for our goal in this
+chapter of producing a generalized system.
+
+Additionally, of the two types of problem that can support deletes, the
+invertible case is preferable. This is because the amount of work necessary
+to support deletes for invertible search problems is very small. The data
+structure requires no modification (such as to implement weak deletes),
+and the query requires no modification (to ignore the weak deletes) aside
+from the addition of the $\Delta$ operator. This is appealing from a
+framework design standpoint. Thus, it is also worthwhile to consider
+approaches for expanding the range of search problems that can be answered
+using the ghost structure mechanism supported by invertible problems.
+
+A significant limitation of invertible problems is that the result set
+size cannot be controlled. We do not know how many records in our
+local results have been deleted until we reach the combine operation and
+they begin to cancel out, at which point we lack a mechanism to go back
+and retrieve more. This presents difficulties for addressing important
+search problems such as top-$k$, $k$-NN, and sampling. In principle, these
+queries could be supported by repeating the query with larger-and-larger
+$k$ values until the desired number of records is returned, but in the
+eDSP model this requires throwing away a lot of useful work, as the state
+of the query must be rebuilt each time.
+
+We can resolve this problem by moving the decision to repeat the query
+into the query interface itself, allowing retries \emph{before} the
+result set is returned to the user and the local meta-information objects
+discarded. This allows us to preserve the pre-processing work, and repeat
+the local query process as many times as is necessary to achieve our
+desired number of records. From this observation, we propose another new
+class of search problem: \emph{iterative deletion decomposable} (IDSP). The
+IDSP definition expands eDSP with a fifth operation,
-\subsection{Query Complexity}
+\begin{itemize}
+    \item $\mathbftt{repeat}(q, R, q_1, \ldots,
+        q_m) \to (\mathbb{B}, q_1, \ldots,
+        q_m)$ \\
+ Evaluate the combined query result in light of the query. If
+ a repetition is necessary to satisfy constraints in the query
+ (e.g., result set size), optionally update the local queries as
+ needed and return true. Otherwise, return false.
+\end{itemize}
-Before describing how to use this new interface and definition to
-support more efficient queries than standard decomposability, more
-more general expression for the cost of querying such a structure should
-be derived.
-Recall that Bentley-Saxe, when applied to a $C(n)$-decomposable
-problem, has the following query cost,
+If this routine returns true, then the query process is repeated
+from $\mathbftt{distribute\_query}$, and if it returns false then the
+result is returned to the user. If the number of repetitions of the
+query is bounded by $R(n)$, then the following provides an upper bound
+on the worst-case query complexity of an IDSP,
+
+\begin{equation*}
+    O\left(\log_2 n \cdot P(n) + R(n) \left(D(n) + \log_2 n \cdot \mathscr{Q}_\ell(n) +
+ C_e(n)\right)\right)
+\end{equation*}
+
+It is important that a bound on the number of repetitions exists,
+as without this the worst-case query complexity is unbounded. The
+details of providing and enforcing this bound are very search problem
+specific. For problems like $k$-NN or top-$k$, the number of repetitions
+is a function of the number of deleted records within the structure,
+and so $R(n)$ can be bounded by placing a limit on the number of deleted
+records. This can be done, for example, using the full-reconstruction
+techniques in the literature~\cite{saxe79, merge-dsp, overmars83}
+or through proactively performing reconstructions, such as with the
+mechanism discussed in Section~\ref{sssec:sampling-rejection-bound},
+depending on the particulars of how deletes are implemented.
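+
+Concretely, the query procedure for an IDSP differs from the eDSP
+procedure only by a retry loop. In the C++ sketch below (using the same
+hypothetical type names as the eDSP sketch above), \texttt{repeat}
+inspects the combined result and, if another pass is required, updates
+the local queries in place; a search problem could equally re-invoke
+$\mathbftt{distribute\_query}$ at the top of each pass,
+\begin{verbatim}
+#include <cstddef>
+#include <vector>
+
+// IDSP query procedure: the eDSP procedure wrapped in a retry loop in
+// which repeat() may update the local queries and request another
+// round of local_query and combine calls.
+template <typename Shard, typename Query>
+typename Query::Result idsp_query(const std::vector<Shard*> &blocks,
+                                  const typename Query::Parameters &q) {
+    std::vector<typename Query::Meta> meta;
+    for (auto *b : blocks) {
+        meta.push_back(Query::local_preproc(b, q));
+    }
+
+    auto local_queries = Query::distribute_query(meta, q);
+    typename Query::Result result;
+
+    bool again = true;
+    while (again) {
+        std::vector<typename Query::LocalResult> local_results;
+        for (size_t i = 0; i < blocks.size(); i++) {
+            local_results.push_back(
+                Query::local_query(blocks[i], local_queries[i]));
+        }
+        result = Query::combine(local_results, q);
+
+        // bounded number of repetitions, enforced by the search problem
+        again = Query::repeat(q, result, local_queries);
+    }
+    return result;
+}
+\end{verbatim}
+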
+
+As an example of how IDSP can facilitate delete support for search
+problems, let's consider $k$-NN. This problem can be $C(n)$-deletion
+decomposable, depending upon the data structure used to answer it, but
+it is not invertible because it suffers from the problem of potentially
+returning fewer than $k$ records in the final result set after the results
+of the query against the primary and ghost structures have been combined.
+Worse, even if the query does return $k$ records as requested, it is
+possible that the result set could be incorrect, depending upon which
+records were deleted, what block those records are in, and the order in
+which the merge and inverse merge are applied.
+
+\begin{example}
+Consider the $k$-NN search problem, $F$, over some metric index
+$\mathcal{I}$. $\mathcal{I}$ has been dynamized, with a ghost
+structure for deletes, and consists of two blocks, $\mathscr{I}_1$ and
+$\mathscr{I}_2$ in the primary structure, and one block, $\mathscr{I}_G$
+in the ghost structure. The structures contain the following records,
+\begin{align*}
+\mathscr{I}_1 &= \{ x_1, x_2, x_3, x_4, x_5\} \\
+\mathscr{I}_2 &= \{ x_6, x_7, x_8 \} \\
+\mathscr{I}_G &= \{x_1, x_2, x_3 \}
+\end{align*}
+where the subscript indicates the proximity to some point, $p$. Thus,
+the correct answer to the query $F(\mathscr{I}, (3, p))$ would be the
+set of points $\{x_4, x_5, x_6\}$.
+
+Querying each of the three blocks independently, however, will produce
+an incorrect answer. The partial results will be,
+\begin{align*}
+r_1 &= \{x_1, x_2, x_3\} \\
+r_2 &= \{x_6, x_7, x_8\} \\
+r_g &= \{x_1, x_2, x_3\}
+\end{align*}
+and, assuming that $\mergeop$ returns the $k$ elements closest to $p$
+from the inputs, and $\Delta$ removes matching elements, performing
+$r_1~\mergeop~r_2~\Delta~r_g$ will give an answer of $\{\}$, which
+has insufficient records, and performing $r_1~\Delta~r_g~\mergeop~r_2$
+will provide a result of $\{x_6, x_7, x_8\}$, which is wrong.
+\end{example}
+
+From this example, we can draw two conclusions about performing $k$-NN
+using a ghost structure for deletes. First, we must ensure that all of
+the local query results from the primary structure are merged, prior to
+removing any deleted records, to ensure correctness. Second, once the
+ghost structure records have been removed, we may need to go back to
+the dynamized structure for more records to ensure that we have enough.
+Both of these requirements can be accommodated by the IDSP model, and the
+resulting query algorithm is shown in Algorithm~\ref{alg:idsp-knn}. This
+algorithm assumes that the data structure in question can save the
+current traversal state in the meta-information object, and resume a
+$k$-NN query on the structure from that state at no cost.
+
+\SetKwFunction{repeat}{repeat}
+
+\begin{algorithm}[th]
+ \caption{$k$-NN with Iterative Decomposability}
+ \label{alg:idsp-knn}
+ \KwIn{$k$: result size, $p$: query point}
+ \Def{\preproc{$q=(k, p)$, $\mathscr{I}_i$}}{
+ \Return $\mathscr{I}_i.\text{initialize\_state}(k, p)$ \;
+ }
-\begin{equation}
- \label{eq3:Bentley-Saxe}
- O\left(\log n \cdot \left( Q_s(n) + C(n)\right)\right)
-\end{equation}
-where $Q_s(n)$ is the cost of the query against one partition, and
-$C(n)$ is the cost of the merge operator.
-
-Let $Q_s(n)$ represent the cost of \mathbftt{local\_query} and
-$C(n)$ the cost of \mathbftt{merge} in the extended decomposability
-case. Additionally, let $P(n)$ be the cost of $\mathbftt{local\_preproc}$
-and $\mathcal{D}(n)$ be the cost of \mathbftt{distribute\_query}.
-Additionally, recall that $|D| = \log n$ for the Bentley-Saxe method.
-In this case, the cost of a query is
-\begin{equation}
- O \left( \log n \cdot P(n) + \mathcal{D}(n) +
- \log n \cdot Q_s(n) + C(n) \right)
-\end{equation}
+ \BlankLine
+ \Def{\distribute{$\mathscr{M}_1$, ..., $\mathscr{M}_m$, $q=(k,p)$}}{
+ \For {$i\gets1 \ldots m$} {
+ $q_i \gets (k, p, \mathscr{M}_i)$ \;
+ }
-Superficially, this looks to be strictly worse than the Bentley-Saxe
-case in Equation~\ref{eq3:Bentley-Saxe}. However, the important
-thing to understand is that for $C(n)$-decomposable queries, $P(n)
-\in O(1)$ and $\mathcal{D}(n) \in O(1)$, as these steps are unneeded.
-Thus, for normal decomposable queries, the cost actually reduces
-to,
-\begin{equation}
- O \left( \log n \cdot Q_s(n) + C(n) \right)
-\end{equation}
-which is actually \emph{better} than Bentley-Saxe. Meanwhile, the
-ability perform state-sharing between queries can facilitate better
-solutions than would otherwise be possible.
-
-In light of this new approach, consider the two examples of
-non-decomposable search problems from Section~\ref{ssec:decomp-limits}.
-
-\subsection{k-Nearest Neighbor}
-\label{ssec:knn}
-The KNN problem is $C(n)$-decomposable, and Section~\ref{sssec-decomp-limits-knn}
-arrived at a Bentley-Saxe based solution to this problem based on
-VPTree, with a query cost of
-\begin{equation}
- O \left( k \log^2 n + k \log n \log k \right)
-\end{equation}
-by running KNN on each partition, and then merging the result sets
-with a heap.
-
-Applying the interface of extended-decomposability to this problem
-allows for some optimizations. Pre-processing is not necessary here,
-but the variadic merge function can be leveraged to get an asymptotically
-better solution. Simply dropping the existing algorithm into this
-interface will result in a merge algorithm with cost,
-\begin{equation}
- C(n) \in O \left( k \log n \left( \log k + \log\log n\right)\right)
-\end{equation}
-which results in a total query cost that is slightly \emph{worse}
-than the original,
+ \Return $q_1 \ldots q_m$ \;
+ }
-\begin{equation}
- O \left( k \log^2 n + k \log n \left(\log k + \log\log n\right) \right)
-\end{equation}
+ \BlankLine
+ \Def{\query{$\mathscr{I}_i$, $q_i=(k,p,\mathscr{M}_i)$}}{
+ $(r_i, \mathscr{M}_i) \gets \mathscr{I}_i.\text{knn\_from}(k, p, \mathscr{M}_i)$ \;
+ \BlankLine
+ \Comment{The local result includes the records stored in a priority queue and query state}
+ \Return $(r_i, \mathscr{M}_i)$ \;
+ }
-The problem is that the number of records considered in a given
-merge has grown from $O(k)$ in the binary merge case to $O(\log n
-\cdot k)$ in the variadic merge. However, because the merge function
-now has access to all of the data at once, the algorithm can be modified
-slightly for better efficiency by only pushing $\log n$ elements
-into the heap at a time. This trick only works if
-the $R_i$s are in sorted order relative to $f(x, q)$,
-however this condition is satisfied by the result sets returned by
-KNN against a VPTree. Thus, for each $R_i$, the first element in sorted
-order can be inserted into the heap,
-element in sorted order into the heap, tagged with a reference to
-which $R_i$ it was taken from. Then, when the heap is popped, the
-next element from the associated $R_i$ can be inserted.
-This allows the heap's size to be maintained at no larger
-than $O(\log n)$, and limits the algorithm to no more than
-$k$ pop operations and $\log n + k - 1$ pushes.
-
-This algorithm reduces the cost of KNN on this structure to,
-\begin{equation}
- O(k \log^2 n + \log n)
-\end{equation}
-which is strictly better than the original.
+ \BlankLine
+ \Def{\combine{$r_1, \ldots, r_m, \ldots, r_n$, $q=(k,p)$}}{
+ $R \gets \{\}$ \;
+        $pq \gets \text{PriorityQueue}()$ \;
+ $gpq \gets \text{PriorityQueue}()$ \;
+ \BlankLine
+ \Comment{Results $1$ through $m$ are from the primary structure,
+ and $m+1$ through $n$ are from the ghost structure.}
+ \For {$i\gets 1 \ldots m$} {
+ $pq.\text{enqueue}(i, r_i.\text{front}())$ \;
+ }
+
+ \For {$i \gets m+1 \ldots n$} {
+ $gpq.\text{enqueue}(i, r_i.\text{front}())$
+ }
+
+ \BlankLine
+ \Comment{Process the primary local results}
+ \While{$|R| < k \land \neg pq.\text{empty}()$} {
+ $(i, d) \gets pq.\text{dequeue}()$ \;
+
+ \BlankLine
+        $R \gets R \cup \{r_i.\text{dequeue}()\}$ \;
+ \If {$\neg r_i.\text{empty}()$} {
+ $pq.\text{enqueue}(i, r_i.\text{front}())$ \;
+ }
+ }
+
+ \BlankLine
+ \Comment{Process the ghost local results}
+    \While{$\neg gpq.\text{empty}()$} {
+        $(i, d) \gets gpq.\text{dequeue}()$ \;
+        $x \gets r_i.\text{dequeue}()$ \;
+
+        \BlankLine
+        \If {$x \in R$} {
+            $R \gets R \setminus \{x\}$ \;
+
+            \If {$\neg r_i.\text{empty}()$} {
+                $gpq.\text{enqueue}(i, r_i.\text{front}())$ \;
+            }
+        }
+    }
-\subsection{Independent Range Sampling}
+ \BlankLine
+ \Return $R$ \;
+ }
+ \BlankLine
+ \Def{\repeat{$q=(k,p), R, q_1,\ldots q_m$}} {
+ $missing \gets k - R.\text{size}()$ \;
+ \If {$missing > 0$} {
+ \For {$i \gets 1\ldots m$} {
+ $q_i \gets (missing, p, q_i.\mathscr{M}_i)$ \;
+ }
+
+ \Return $(True, q_1 \ldots q_m)$ \;
+ }
+
+ \Return $(False, q_1 \ldots q_m)$ \;
+ }
+\end{algorithm}
-The eDSP abstraction also provides sufficient features to implement
-IRS, using the same basic approach as was used in the previous
-chapter. Unlike KNN, IRS will take advantage of the extended query
-interface. Recall from the Chapter~\ref{chap:sampling} that the approach used
-for answering sampling queries (ignoring the buffer, for now) was,
-
-\begin{enumerate}
- \item Query each shard to establish the weight that should be assigned to the
- shard in sample size assignments.
- \item Build an alias structure over those weights.
- \item For each sample, reference the alias structure to determine which shard
- to sample from, and then draw the sample.
-\end{enumerate}
-
-This approach can be mapped easily onto the eDSP interface as follows,
-\begin{itemize}
- \item[\texttt{local\_preproc}] Determine and return the total weight of candidate records for
- sampling in the shard.
- \item[\texttt{distribute\_query}] Using the shard weights, construct an alias structure associating
- each shard with its total weight. Then, query this alias structure $k$ times. For shard $i$, the
- local query $\mathscr{Q}_i$ will have its sample size assigned based on how many times $i$ is returned
- during the alias querying.
- \item[\texttt{local\_query}] Process the local query using the underlying data structure's normal sampling
- procedure.
- \item[\texttt{merge}] Union all of the partial results together.
-\end{itemize}
-This division of the query maps closely onto the cost function,
-\begin{equation}
- O\left(P(n) + kS(n)\right)
-\end{equation}
-used in Chapter~\ref{chap:sampling}, where the $W(n) + P(n)$ pre-processing
-cost is associated with the cost of \texttt{local\_preproc} and the
-$kS(n)$ sampling cost is associated with $\texttt{local\_query}$.
-The \texttt{distribute\_query} operation will require $O(\log n)$
-time to construct the shard alias structure, and $O(k)$ time to
-query it. Accounting then for the fact that \texttt{local\_preproc}
-will be called once per shard ($\log n$ times), and a total of $k$
-records will be sampled as the cost of $S(n)$ each, this results
-in a total query cost of,
-\begin{equation}
- O\left(\left[W(n) + P(n)\right]\log n + k S(n)\right)
-\end{equation}
-which matches the cost in Equation~\ref{eq:sample-cost}.
-
-\section{Record Identity}
-
-Another important consideration for the framework is support for
-deletes, which are important in the contexts of database systems.
-The sampling extension framework supported two techniques
-for the deletion of records: tombstone-based deletes and tagging-based
-deletes. In both cases, the solution required that the shard support
-point lookups, either for checking tombstones or for finding the
-record to mark it as deleted. Implicit in this is an important
-property of the underlying data structure which was taken for granted
-in that work, but which will be made explicit here: record identity.
-
-Delete support requires that each record within the index be uniquely
-identifiable, and linkable directly to a location in storage. This
-property is called \emph{record identity}.
- In the context of database
-indexes, it isn't a particularly contentious requirement. Indexes
-already are designed to provide a mapping directly to a record in
-storage, which (at least in the context of RDBMS) must have a unique
-identifier attached. However, in more general contexts, this
-requirement will place some restrictions on the applicability of
-the framework.
-
-For example, approximate data structures or summaries, such as Bloom
-filters~\cite{bloom70} or count-min sketches~\cite{countmin-sketch}
-are data structures which don't necessarily store the underlying
-record. In principle, some summaries \emph{could} be supported by
-normal Bentley-Saxe as there exist mergeable
-summaries~\cite{mergeable-summaries}. But because these data structures
-violate the record identity property, they would not support deletes
-(either in the framework, or Bentley-Saxe). The framework considers
-deletes to be a first-class citizen, and this is formalized by
-requiring record identity as a property that supported data structures
-must have.
-
-\section{The General Framework}
-
-Based on these properties, and the work described in
-Chapter~\ref{chap:sampling}, dynamic extension framework has been devised with
-broad support for data structures. It is implemented in C++20, using templates
-and concepts to define the necessary interfaces. A user of this framework needs
-to provide a definition for their data structure with a prescribed interface
-(called a \texttt{shard}), and a definition for their query following an
-interface based on the above definition of an eDSP. These two classes can then
-be used as template parameters to automatically create a dynamic index, which
-exposes methods for inserting and deleting records, as well as executing
-queries.
-
-\subsection{Framework Design}
-
-\Paragraph{Structure.} The overall design of the general framework
-itself is not substantially different from the sampling framework
-discussed in the Chapter~\ref{chap:sampling}. It consists of a mutable buffer
-and a set of levels containing data structures with geometrically
-increasing capacities. The \emph{mutable buffer} is a small unsorted
-record array of fixed capacity that buffers incoming inserts. As
-the mutable buffer is kept sufficiently small (e.g. fits in L2 CPU
-cache), the cost of querying it without any auxiliary structures
-can be minimized, while still allowing better insertion performance
-than Bentley-Saxe, which requires rebuilding an index structure for
-each insertion. The use of an unsorted buffer is necessary to
-ensure that the framework doesn't require an existing dynamic version
-of the index structure being extended, which would defeat the purpose
-of the entire exercise.
-
-The majority of the data within the structure is stored in a sequence
-of \emph{levels} with geometrically increasing record capacity,
-such that the capacity of level $i$ is $s^{i+1}$, where $s$ is a
-configurable parameter called the \emph{scale factor}. Unlike
-Bentley-Saxe, these levels are permitted to be partially full, which
-allows significantly more flexibility in terms of how reconstruction
-is performed. This also opens up the possibility of allowing each
-level to allocate its record capacity across multiple data structures
-(named \emph{shards}) rather than just one. This decision is called
-the \emph{layout policy}, with the use of a single structure being
-called \emph{leveling}, and multiple structures being called
-\emph{tiering}.
-
-\begin{figure}
-\centering
-\subfloat[Leveling]{\includegraphics[width=.5\textwidth]{img/leveling} \label{fig:leveling}}
-\subfloat[Tiering]{\includegraphics[width=.5\textwidth]{img/tiering} \label{fig:tiering}}
- \caption{\textbf{An overview of the general structure of the
- dynamic extension framework} using leveling (Figure~\ref{fig:leveling}) and
-tiering (Figure~\ref{fig:tiering}) layout policies. The pictured extension has
-a scale factor of 3, with $L_0$ being at capacity, and $L_1$ being at
-one third capacity. Each shard is shown as a dotted box, wrapping its associated
-dataset ($D_i$), data structure ($I_i$), and auxiliary structures $(A_i)$. }
-\label{fig:framework}
-\end{figure}
-
-\Paragraph{Shards.} The basic building block of the dynamic extension
-is called a shard, defined as $\mathcal{S}_i = (\mathcal{D}_i,
-\mathcal{I}_i, A_i)$, which consists of a partition of the data
-$\mathcal{D}_i$, an instance of the static index structure being
-extended $\mathcal{I}_i$, and an optional auxiliary structure $A_i$.
-To ensure the viability of level reconstruction, the extended data
-structure should at least support a construction method
-$\mathtt{build}(\mathcal{D})$ that can build a new static index
-from a set of records $\mathcal{D}$ from scratch. This set of records
-may come from the mutable buffer, or from a union of underlying
-data of multiple other shards. It is also beneficial for $\mathcal{I}_i$
-to support efficient point-lookups, which can search for a record's
-storage location by its identifier (given by the record identify
-requirements of the framework). The shard can also be customized
-to provide any necessary features for supporting the index being
-extended. For example, auxiliary data structures like Bloom filters
-or hash tables can be added to improve point-lookup performance,
-or additional, specialized query functions can be provided for use
-by the query functions.
-
-From an implementation standpoint, the shard object provides a shim
-between the data structure and the framework itself. At minimum,
-it must support the following interface,
-\begin{itemize}
- \item $\mathbftt{construct}(B) \to S$ \\
- Construct a new shard from the contents of the mutable buffer, $B$.
- \item $\mathbftt{construct}(S_0, \ldots, S_n) \to S$
- Construct a new shard from the records contained within a list of already
- existing shards.
- \item $\mathbftt{point\_lookup}(r) \to *r$ \\
- Search for a record, $r$, by identity and return a reference to its
- location in storage.
-\end{itemize}
+\subsection{Search Problem Taxonomy}
-\Paragraph{Insertion \& deletion.} The framework supports inserting
-new records and deleting records already in the index. These two
-operations also allow for updates to existing records, by first
-deleting the old version and then inserting a new one. These
-operations are added by the framework automatically, and require
-only a small shim or minor adjustments to the code of the data
-structure being extended within the implementation of the shard
-object.
-
-Insertions are performed by first wrapping the record to be inserted
-with a framework header, and then appending it to the end of the
-mutable buffer. If the mutable buffer is full, it is flushed to
-create a new shard, which is combined into the first level of the
-structure. The level reconstruction process is layout policy
-dependent. In the case of leveling, the underlying data of the
-source shard and the target shard are combined, resulting a new
-shard replacing the target shard in the target level. When using
-tiering, the newly created shard is simply placed into the target
-level. If the target level is full, the framework first triggers a merge on the
-target level, which will create another shard at one higher level,
-and then inserts the former shard at the now empty target level.
-Note that each time a new shard is created, the framework must invoke
-$\mathtt{build}$ to construct a new index from scratch for this
-shard.
-
-The framework supports deletes using two approaches: either by
-inserting a special tombstone record or by performing a lookup for
-the record to be deleted and setting a bit in the header. This
-decision is called the \emph{delete policy}, with the former being
-called \emph{tombstone delete} and the latter \emph{tagged delete}.
-The framework will automatically filter deleted records from query
-results before returning them to the user, either by checking for
-the delete tag, or by performing a lookup of each record for an
-associated tombstone. The number of deleted records within the
-framework can be bounded by canceling tombstones and associated
-records when they meet during reconstruction, or by dropping all
-tagged records when a shard is reconstructed. The framework also
-supports aggressive reconstruction (called \emph{compaction}) to
-precisely bound the number of deleted records within the index,
-which can be helpful to improve the performance of certain types
-of query. This is useful for certain search problems, as was seen with
-sampling queries in Chapter~\ref{chap:sampling}, but is not
-generally necessary to bound query cost in most cases.
-
-\Paragraph{Design space.} The framework described in this section
-has a large design space. In fact, much of the design space has
-similar knobs to the well-known LSM Tree~\cite{dayan17}, albeit in
-a different environment: the framework targets in-memory static
-index structures for general extended decomposable queries without
-efficient index merging support, whereas the LSM-tree targets
-external range indexes that can be efficiently merged.
-
-The framework's design trades off among auxiliary memory usage, read performance,
-and write performance. The two most significant decisions are the
-choice of layout and delete policy. A tiering layout policy reduces
-write amplification compared to leveling, requiring each record to
-only be written once per level, but increases the number of shards
-within the structure, which can hurt query performance. As for
-delete policy, the use of tombstones turns deletes into insertions,
-which are typically faster. However, depending upon the nature of
-the query being executed, the delocalization of the presence
-information for a record may result in one extra point lookup for
-each record in the result set of a query, vastly reducing read
-performance. In these cases, tagging may make more sense. This
-results in each delete turning into a slower point-lookup, but
-always allows for constant-time visibility checks of records. The
-other two major parameters, scale factor and buffer size, can be
-used to tune the performance once the policies have been selected.
-Generally speaking, larger scale factors result in fewer shards,
-but can increase write amplification under leveling. Large buffer
-sizes can adversely affect query performance when an unsorted buffer
-is used, while allowing higher update throughput. Because the overall
-design of the framework remains largely unchanged, the design space
-exploration of Section~\ref{ssec:ds-exp} remains relevant here.
-
-\subsection{The Shard Interface}
-
-The shard object serves as a ``shim'' between a data structure and
-the extension framework, providing a set of mandatory functions
-which are used by the framework code to facilitate reconstruction
-and deleting records. The data structure being extended can be
-provided by a different library and included as an attribute via
-composition/aggregation, or can be directly implemented within the
-shard class. Additionally, shards can contain any necessary auxiliary
-structures, such as bloom filters or hash tables, as necessary to
-support the required interface.
-
-The require interface for a shard object is as follows,
-\begin{verbatim}
- new(MutableBuffer) -> Shard
- new(Shard[]) -> Shard
- point_lookup(Record, Boolean) -> Record
- get_data() -> Record
- get_record_count() -> Int
- get_tombstone_count() -> Int
- get_memory_usage() -> Int
- get_aux_memory_usage() -> Int
-\end{verbatim}
-
-The first two functions are constructors, necessary to build a new Shard
-from either an array of other shards (for a reconstruction), or from
-a mutable buffer (for a buffer flush).\footnote{
- This is the interface as it currently stands in the existing implementation, but
- is subject to change. In particular, we are considering changing the shard reconstruction
- procedure to allow for only one necessary constructor, with a more general interface. As
- we look to concurrency, being able to construct shards from arbitrary combinations of shards
- and buffers will become convenient, for example.
- }
-The \texttt{point\_lookup} operation is necessary for delete support, and is
-used either to locate a record for delete when tagging is used, or to search
-for a tombstone associated with a record when tombstones are used. The boolean
-is intended to be used to communicate to the shard whether the lookup is
-intended to locate a tombstone or a record, and is meant to be used to allow
-the shard to control whether a point lookup checks a filter before searching,
-but could also be used for other purposes. The \texttt{get\_data}
-function exposes a pointer to the beginning of the array of records contained
-within the shard--it imposes no restriction on the order of these records, but
-does require that all records can be accessed sequentially from this pointer,
-and that the order of records does not change. The rest of the functions are
-accessors for various shard metadata. The record and tombstone count numbers
-are used by the framework for reconstruction purposes.\footnote{The record
-count includes tombstones as well, so the true record count on a level is
-$\text{reccnt} - \text{tscnt}$.} The memory usage statistics are, at present,
-only exposed directly to the user and have no effect on the framework's
-behavior. In the future, these may be used for concurrency control and task
-scheduling purposes.
-
-Beyond these, a shard can expose any additional functions that are necessary
-for its associated query classes. For example, a shard intended to be used for
-range queries might expose upper and lower bound functions, or a shard used for
-nearest neighbor search might expose a nearest-neighbor function.
-
-\subsection{The Query Interface}
-\label{ssec:fw-query-int}
-
-The required interface for a query in the framework is a bit more
-complicated than the interface defined for an eDSP, because the
-framework needs to query the mutable buffer as well as the shards.
-As a result, there is some slight duplication of functions, with
-specialized query and pre-processing routines for both shards and
-buffers. Specifically, a query must define the following functions,
-\begin{verbatim}
- get_query_state(QueryParameters, Shard) -> ShardState;
- get_buffer_query_state(QueryParameters, Buffer) -> BufferState;
-
- process_query_states(QueryParameters, ShardStateList, BufferStateList) -> LocalQueryList;
-
- query(LocalQuery, Shard) -> ResultList
- buffer_query(LocalQuery, Buffer) -> ResultList
-
- merge(ResultList) -> FinalResult
-
- delete_query_state(ShardState)
- delete_buffer_query_state(BufferState)
-
- bool EARLY_ABORT;
- bool SKIP_DELETE_FILTER;
-\end{verbatim}
-
-The \texttt{get\_query\_state} and \texttt{get\_buffer\_query\_state} functions
-map to the \texttt{local\_preproc} operation of the eDSP definition for shards
-and buffers respectively. \texttt{process\_query\_states} serves the function
-of \texttt{distribute\_query}. Note that this function takes a list of buffer
-states; although the proposed framework above contains only a single buffer,
-future support for concurrency will require multiple buffers, and so the
-interface is set up with support for this. The \texttt{query} and
-\texttt{buffer\_query} functions execute the local query against the shard or
-buffer and return the intermediate results, which are merged using
-\texttt{merge} into a final result set. The \texttt{EARLY\_ABORT} parameter can
-be set to \texttt{true} to force the framework to immediately return as soon as
-the first result is found, rather than querying the entire structure, and the
-\texttt{SKIP\_DELETE\_FILTER} disables the framework's automatic delete
-filtering, allowing deletes to be manually handled within the \texttt{merge}
-function by the developer. These flags exist to allow for optimizations for
-certain types of query. For example, point-lookups can take advantage of
-\texttt{EARLY\_ABORT} to stop as soon as a match is found, and
-\texttt{SKIP\_DELETE\_FILTER} can be used for more efficient tombstone delete
-handling in range queries, where tombstones for results will always be in the
-\texttt{ResultList}s going into \texttt{merge}.
-
-The framework itself answers queries by simply calling these routines in
-a prescribed order,
-\begin{verbatim}
-query(QueryArguments qa) BEGIN
- FOR i < BufferCount DO
- BufferStates[i] = get_buffer_query_state(qa, Buffers[i])
- DONE
-
- FOR i < ShardCount DO
- ShardStates[i] = get_query_state(qa, Shards[i])
- DONE
-
- process_query_states(qa, ShardStates, BufferStates)
-
- FOR i < BufferCount DO
- temp = buffer_query(BufferStates[i], Buffers[i])
- IF NOT SKIP_DELETE_FILTER THEN
- temp = filter_deletes(temp)
- END
- Results[i] = temp;
-
- IF EARLY_ABORT AND Results[i].size() > 0 THEN
- delete_states(ShardStates, BufferStates)
- return merge(Results)
- END
- DONE
-
- FOR i < ShardCount DO
- temp = query(ShardStates[i], Shards[i])
- IF NOT SKIP_DELETE_FILTER THEN
- temp = filter_deletes(temp)
- END
- Results[i + BufferCount] = temp
- IF EARLY_ABORT AD Results[i + BufferCount].size() > 0 THEN
- delete_states(ShardStates, BufferStates)
- return merge(Results)
- END
- DONE
-
- delete_states(ShardStates, BufferStates)
- return merge(Results)
-END
-\end{verbatim}
-
-\subsubsection{Standardized Queries}
-
-Provided with the framework are several "standardized" query classes, including
-point lookup, range query, and IRS. These queries can be freely applied to any
-shard class that implements the necessary optional interfaces. For example, the
-provided IRS and range query both require the shard to implement a
-\texttt{lower\_bound} and \texttt{upper\_bound} function that returns an index.
-They then use this index to access the record array exposed via
-\texttt{get\_data}. This is convenient, because it helps to separate the search
-problem from the data structure, and moves towards presenting these two objects
-as orthogonal.
-
-In the next section the framework is evaluated by producing a number of indexes
-for three different search problems. Specifically, the framework is applied to
-a pair of learned indexes, as well as an ISAM-tree. All three of these shards
-provide the bound interface described above, meaning that the same range query
-class can be used for all of them. It also means that the learned indexes
-automatically have support for IRS. And, of course, they also all can be used
-with the provided point-lookup query, which simply uses the required
-\texttt{point\_lookup} function of the shard.
-
-At present, the framework only supports associating a single query class with
-an index. However, this is simply a limitation of implementation. In the future,
-approaches will be considered for associating arbitrary query classes to allow
-truly multi-purpose indexes to be constructed. This is not to say that every
-data structure will necessarily be efficient at answering every type of query
-that could be answered using their interface--but in a database system, being
-able to repurpose an existing index to accelerate a wide range of query types
-would certainly seem worth considering.
-
-\section{Framework Evaluation}
-
-The framework was evaluated using three different types of search problem:
-range-count, high-dimensional k-nearest neighbor, and independent range
-sampling. In all three cases, an extended static data structure was compared
-with dynamic alternatives for the same search problem to demonstrate the
-framework's competitiveness.
-
-\subsection{Methodology}
-
-All tests were performed using Ubuntu 22.04
-LTS on a dual-socket Intel Xeon Gold 6242R server with 384 GiB of
-installed memory and 40 physical cores. Benchmark code was compiled
-using \texttt{gcc} version 11.3.0 at the \texttt{-O3} optimization level.
-
-
-\subsection{Range Queries}
-
-The first test evaluates the performance of the framework in the context of
-range queries against learned indexes. In Chapter~\ref{chap:intro}, the
-lengthy development cycle of this sort of data structure was discussed,
-and so learned indexes were selected as an evaluation candidate to demonstrate
-how this framework could allow such development lifecycles to be largely
-bypassed.
-
-Specifically, the framework is used to produce dynamic learned indexes based on
-TrieSpline~\cite{plex} (DE-TS) and the static version of PGM~\cite{pgm} (DE-PGM). These
-are both single-pass construction static learned indexes, and thus well suited for use
-within this framework compared to more complex structures like RMI~\cite{RMI}, which have
-more expensive construction algorithms. The two framework-extended data structures are
-compared with dynamic learned indexes, namely ALEX~\cite{ALEX} and the dynamic version of
-PGM~\cite{pgm}. PGM provides an interesting comparison, as its native
-dynamic version was implemented using a slightly modified version of the Bentley-Saxe method.
-
-When performing range queries over large data sets, the
-copying of query results can introduce significant overhead. Because the four
-tested structures have different data copy behaviors, a range count query was
-used for testing, rather than a pure range query. This search problem exposes
-the searching performance of the data structures, while controlling for different
-data copy behaviors, and so should provide more directly comparable results.
-
-Range count
-queries were executed with a selectivity of $0.01\%$ against three datasets
-from the SOSD benchmark~\cite{sosd-datasets}: \texttt{book}, \texttt{fb}, and
-\texttt{osm}, each of which contains 200 million 64-bit keys following a
-variety of distributions; these keys were paired with uniquely generated
-64-bit values. There
-is a fourth dataset in SOSD, \texttt{wiki}, which was excluded from testing
-because it contained duplicate keys, which are not supported by dynamic
-PGM.\footnote{The dynamic version of PGM supports deletes using tombstones,
-but doesn't wrap records with a header to accomplish this. Instead it reserves
-one possible value to represent a tombstone. Records are deleted by inserting a
-record having the same key, but with this reserved value. This means that duplicate
-keys, even if they have different values, are unsupported, as two records with
-the same key will be treated as a delete by the index.~\cite{pgm} }
-
-The shard implementations for DE-PGM and DE-TS required about 300 lines of
-C++ code each, and no modification to the data structures themselves. For both
-data structures, the framework was configured with a buffer of 12,000 records, a scale
-factor of 8, the tombstone delete policy, and tiering. Each shard stored $D_i$
-as a sorted array of records, used an instance of the learned index for
-$\mathcal{I}_i$, and had no auxiliary structures. The local query routine used
-the learned index to locate the first key in the query range and then iterated
-over the sorted array until the end of the range was reached, counting the
-number of records and tombstones encountered. The mutable buffer query performed
-the counting over a full scan. No local preprocessing was needed, and the merge
-operation simply summed the record and tombstone counts, and returned their
-difference.
-
-\begin{figure*}[t]
- \centering
- \subfloat[Update Throughput]{\includegraphics[width=.5\textwidth]{img/fig-bs-rq-insert} \label{fig:rq-insert}}
- \subfloat[Query Latency]{\includegraphics[width=.5\textwidth]{img/fig-bs-rq-query} \label{fig:rq-query}} \\
- \subfloat[Index Sizes]{\includegraphics[width=.5\textwidth, trim=5mm 5mm 0 0 ]{img/fig-bs-rq-space} \label{fig:idx-space}}
- \caption{Range Count Evaluation}
- \label{fig:results1}
-\end{figure*}
-
-Figure~\ref{fig:rq-insert} shows the update throughput of all competitors. ALEX
-performs the worst in all cases, and PGM performs the best, with the extended
-indexes falling in the middle. It is not unexpected that PGM performs better
-than the framework, because the Bentley-Saxe extension in PGM is custom-built,
-and thus has a tighter integration than a general framework would allow.
-However, even with this advantage, DE-PGM still reaches up to 85\% of PGM's
-insertion throughput. Additionally, Figure~\ref{fig:rq-query} shows that PGM
-pays a large cost in query latency for its advantage in insertion, with the
-framework extended indexes significantly outperforming it. Further, DE-TS even
-outperforms ALEX for query latency in some cases. Finally,
-Figure~\ref{fig:idx-space} shows the storage cost of the indexes, without
-counting the space necessary to store the records themselves. The storage cost
-of a learned index is fairly variable, as it is largely a function of the
-distribution of the data, but in all cases, the extended learned
-indexes, which build compact data arrays without gaps, occupy three orders of
-magnitude less storage than ALEX, which requires leaving gaps
-in the data arrays.
-
-\subsection{High-Dimensional k-Nearest Neighbor}
-The next test evaluates the framework for the extension of high-dimensional
-metric indexes for the k-nearest neighbor search problem. An M-tree~\cite{mtree}
-was used as the dynamic baseline,\footnote{
- Specifically, the M-tree implementation tested can be found at \url{https://github.com/dbrumbaugh/M-Tree}
- and is a fork of a structure written originally by Eduardo D'Avila, modified to compile under C++20. The
- tree uses a random selection algorithm for ball splitting.
-} and a VPTree~\cite{vptree} as the static structure. The framework was used to
-extend VPTree to produce the dynamic version, DE-VPTree.
-An M-Tree is a tree that partitions records based on
-high-dimensional spheres and supports updates by splitting and merging these
-partitions.
-A VPTree is a binary tree that is produced by recursively selecting
-a point, called the vantage point, and partitioning records based on their
-distance from that point. This results in a difficult-to-modify structure that
-can be constructed in $O(n \log n)$ time and can answer KNN queries in $O(k
-\log n)$ time.
-
-DE-VPTree used a buffer of 12,000 records, a scale factor of 6, tiering, and
-delete tagging. The query was implemented without a pre-processing step, using
-the standard VPTree algorithm for KNN queries against each shard. All $k$
-records were determined for each shard, and then the merge operation used a
-heap to merge the results sets together and return the $k$ nearest neighbors
-from the $k\log(n)$ intermediate results. This is a type of query that pays a
-non-constant merge cost, even with the framework's expanded query interface, of
-$O(k \log k)$. In effect, the kNN query must be answered twice: once for each
-shard to get the intermediate result sets, and then a second time within the
-merge operation to select the kNN from the result sets.
-
-\begin{figure}
- \centering
- \includegraphics[width=.75\textwidth, trim=5mm 5mm 0 0]{img/fig-bs-knn}
- \caption{KNN Index Evaluation}
- \label{fig:knn}
-\end{figure}
-Euclidean distance was used as the metric for both structures, and $k=1000$ was
-used for all queries. The reference point for each query was selected randomly
-from points within the dataset. Tests were run using the Spanish Billion Words
-dataset~\cite{sbw}, of 300-dimensional vectors. The results are shown in
-Figure~\ref{fig:knn}. In this case, the static nature of the VPTree allows it
-to dominate the M-Tree in query latency, and the simpler reconstruction
-procedure shows a significant insertion performance improvement as well.
-
-\subsection{Independent Range Sampling}
-Finally, the
-framework was tested using one-dimensional IRS queries. As before,
-a static ISAM-tree was used as the data structure to be extended,
-however the sampling query was implemented using the query interface from
-Section~\ref{ssec:fw-query-int}. The pre-processing step identifies the first
-and last record falling into the range to be sampled from, and determines the
-total weight based on this range, for each shard. Then, in the local query
-generation step, these weights are used to construct an alias structure, which
-is used to assign sample sizes to each shard based on weight to avoid
-introducing skew into the results. After this, the query routine generates
-random numbers between the established bounds to sample records, and the merge
-operation appends the individual result sets together. This static procedure
-only requires a pair of tree traversals per shard, regardless of how many
-samples are taken.
-
-\begin{figure}
- \centering
- \subfloat[Query Latency]{\includegraphics[width=.5\textwidth, trim=5mm 5mm 0 0]{img/fig-bs-irs-query} \label{fig:irs-query}}
- \subfloat[Update Throughput]{\includegraphics[width=.5\textwidth, trim=5mm 5mm 0 0]{img/fig-bs-irs-insert} \label{fig:irs-insert}}
- \caption{IRS Index Evaluation}
- \label{fig:results2}
-\end{figure}
-
-The extended ISAM structure (DE-IRS) was compared to a B$^+$-Tree
-with aggregate weight tags on internal nodes (AGG B+Tree) for sampling
-and insertion performance, and to a single instance of the static ISAM-tree (ISAM),
-which does not support updates. DE-IRS was configured with a buffer size
-of 12,000 records, a scale factor of 6, tiering, and delete tagging. The IRS
-queries had a selectivity of $0.1\%$ with a sample size of $k=1000$. Testing
-was performed using the same datasets as were used for range queries.
-
-Figure~\ref{fig:irs-query}
-shows the significant latency advantage that the dynamically extended ISAM tree
-enjoys compared to a B+Tree. DE-IRS is up to 23 times faster than the B$^+$-Tree at
-answering sampling queries, and only about 3 times slower than the fully static
-solution. In this case, the extra query cost caused by needing to query
-multiple structures is more than balanced by the query efficiency of each of
-those structures, relative to tree sampling. Interestingly, the framework also
-results in better update performance compared to the B$^+$-Tree, as shown in
-Figure~\ref{fig:irs-insert}. This is likely because the ISAM shards can be
-efficiently constructed using a combination of sorted-merge operations and
-bulk-loading, and avoid expensive structural modification operations that are
-necessary for maintaining a B$^+$-Tree.
-
-\subsection{Discussion}
-
-
-The results demonstrate not only that the framework's update support is
-competitive with custom-built dynamic data structures, but that the framework
-is even able to, in many cases, retain some of the query performance advantage
-of its extended static data structure. This is particularly evident in the k-nearest
-neighbor and independent range sampling tests, where the static version of the
-structure was directly tested as well. These tests demonstrate one of the advantages
-of static data structures: they are able to maintain much tighter inter-record relationships
-than dynamic ones, because update support typically requires relaxing these relationships
-to make it easier to update them. While the framework introduces the overhead of querying
-multiple structures and merging their result sets, it is clear from the results that this overhead
-is generally less than the overhead incurred by the update support techniques used
-in the dynamic structures. The only case where the framework failed to win outright in query
-performance was against ALEX, where the resulting query latencies were comparable.
-
-It is also evident that the update support provided by the framework is on par with, if not
-superior to, that provided by the dynamic baselines, at least in terms of throughput. The
-framework will certainly suffer from larger tail latency spikes, which weren't measured in
-this round of testing, due to the larger scale of the reconstructions, but the amortization
-of these costs over a large number of inserts allows for the maintenance of a respectable
-level of throughput. In fact, the only case where the framework loses in insertion throughput
-is against the dynamic PGM. However, an examination of the query latencies reveals that this
-is likely because the standard configuration of the Bentley-Saxe variant used by PGM is highly
-tuned for insertion performance: its query latencies are far worse than those of any other
-learned index tested. Even this result, then, shouldn't be taken as a ``clear'' defeat of the
-framework's implementation.
-
-Overall, it is clear from this evaluation that the dynamic extension framework is a
-promising alternative to manual index redesign for accommodating updates. In almost
-all cases, the framework-extended static data structures provided superior insertion
-throughput, and query latencies that either matched or improved upon those of
-the dynamic baselines. Additionally, though it is hard to quantify, the code complexity
-of the framework-extended data structures was much lower, with the shard implementations
-requiring only a small amount of relatively straightforward code to interface with pre-existing
-static data structures, or with the necessary data structure implementations themselves being
-simpler.
+Having defined two new classes of search problem, it seems sensible
+at this point to collect our definitions together with pre-existing
+ones from the classical literature, and present a cohesive taxonomy
+of the search problems for which our techniques can be used to
+support dynamization. This taxonomy is shown in the Venn diagrams of
+Figure~\ref{fig:taxonomy}. Note that, for convenience, the search problem
+classifications relevant for supporting deletes have been separated out
+into a separate diagram. In principle, this deletion taxonomy can be
+thought of as being nested inside each of the general search problem
+classifications, as the two sets of classifications are orthogonal:
+the classification of a search problem in the general taxonomy implies
+nothing about where in the deletion taxonomy that same problem falls.
-\section{Conclusion}
+\begin{figure}[t]
+ \subfloat[General Taxonomy]{\includegraphics[width=.49\linewidth]{diag/taxonomy}
+ \label{fig:taxonomy-main}}
+ \subfloat[Deletion Taxonomy]{\includegraphics[width=.49\linewidth]{diag/deletes} \label{fig:taxonomy-deletes}}
+ \caption{An overview of the Taxonomy of Search Problems, as relevant to
+ our discussion of data structure dynamization. Our proposed extensions
+ are marked with an asterisk (*) and colored yellow.
+ }
+ \label{fig:taxonomy}
+\end{figure}
+
+Figure~\ref{fig:taxonomy-main} illustrates the classifications of search
+problems that are not deletion-related, including standard decomposability
+(DSP), extended decomposability (eDSP), $C(n)$-decomposability
+($C(n)$-DSP), and merge decomposability (MDSP). We consider ISAM trees,
+TrieSpline~\cite{plex}, and the succinct trie~\cite{zhang18} to be examples
+of MDSPs because these data structures can be constructed more efficiently
+from sorted data: when building from existing blocks, each block's data
+is already sorted, and the blocks can be merged efficiently while
+maintaining sorted order. VP-trees~\cite{vptree} and alias
+structures~\cite{walker74}, in contrast, don't have a convenient
+way of merging, and so must be reconstructed in full each time. We
+have classified sampling queries in this taxonomy as eDSPs because
+the eDSP implementation is more efficient than the $C(n)$-decomposable
+variant we have also discussed. $k$-NN queries, for reasons discussed in
+Chapter~\ref{chap:background}, are classified as $C(n)$-decomposable.
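+
+As a sketch of why merge decomposability matters in practice
+(illustrative only; the function below is not part of our framework,
+and assumes that blocks expose their records as sorted arrays), a new
+block for an MDSP structure can be built from existing blocks with a
+linear-time sorted merge followed by a bulk load, rather than a full
+$O(n \log n)$ rebuild:
+
+\begin{lstlisting}[language=C++]
+#include <algorithm>
+#include <iterator>
+#include <vector>
+
+// Hypothetical helper: each existing block exposes its records in
+// sorted order, so the new block's data array can be produced with a
+// linear-time merge and handed to a bulk-loading constructor,
+// avoiding a full sort.
+template <typename R, typename Cmp>
+std::vector<R> merge_sorted_blocks(const std::vector<R> &a,
+                                   const std::vector<R> &b, Cmp cmp) {
+    std::vector<R> out;
+    out.reserve(a.size() + b.size());
+    std::merge(a.begin(), a.end(), b.begin(), b.end(),
+               std::back_inserter(out), cmp);
+    return out;
+}
+\end{lstlisting}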
+
+The classification of range scans is a bit trickier. It is not uncommon
+in the theoretical literature for range scans to be considered DSPs, with
+$\mergeop$ taken to be the set union operator. From an implementation
+standpoint, it is sometimes possible to perform a union in $\Theta(1)$
+time. For example, in Chapter~\ref{chap:sampling} we accomplished this by
+placing sampled records directly into a shared buffer, and not having an
+explicit combine step at all. However, in the general case where we do
+need an explicit combine step, the union operation does require linear
+time in the size of the result sets to copy the records from the local
+result into the final result. The sizes of these results are functions
+of the selectivity of the range scan, but theoretically could be large
+relative to the data size, and so we've decided to err on the side of
+caution and classify range scans as $C(n)$-decomposable here. If the
+results of the range scan are expected to be returned in sorted order,
+then the problem is \emph{certainly} $C(n)$-decomposable.
+Range
+counts, on the other hand, are truly DSPs.\footnote{
+ Because of the explicit combine interface we use for eDSPs, the
+ optimization of writing samples directly into the buffer that we used
+ in the previous chapter to get a $\Theta(1)$ set union cannot be used
+ for the eDSP implementation of IRS in this chapter. However, our eDSP
+ sampling in Algorithm~\ref{alg:edsp-irs} samples \emph{exactly} $k$
+ records, and so the combination step still only requires $\Theta(k)$
+ work, and the complexity remains the same.
+} Point lookups are an example of a DSP as well, assuming that the lookup
+key is unique, or at least minimally duplicated. In the case where
+the number of results for the lookup becomes a substantial proportion
+of the total data size, this search problem could be considered
+$C(n)$-decomposable for the same reason as range scans.
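+
+To make this distinction concrete, the sketch below (an illustration
+under assumed types, not code from our framework) contrasts the combine
+step for range counts, which requires only constant work per block,
+with the combine step for unsorted range scans, which must copy every
+record out of the local result sets:
+
+\begin{lstlisting}[language=C++]
+#include <cstddef>
+#include <vector>
+
+// Range count: each block contributes a single integer, so the
+// combine operator costs constant time per block (a true DSP).
+size_t combine_range_counts(const std::vector<size_t> &local_counts) {
+    size_t total = 0;
+    for (size_t c : local_counts) total += c;
+    return total;
+}
+
+// Range scan: the combine step copies every record from the local
+// result sets, costing time linear in the total result size.
+template <typename R>
+std::vector<R> combine_range_scans(const std::vector<std::vector<R>> &local) {
+    std::vector<R> out;
+    for (const auto &rs : local) out.insert(out.end(), rs.begin(), rs.end());
+    return out;
+}
+\end{lstlisting}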
+
+Figure~\ref{fig:taxonomy-deletes} shows the various classes of search
+problem relevant to delete support. We have made the decision to
+classify invertible problems (INV) as a subset of deletion decomposable
+problems (DDSP), because one could always embed the ghost structure
+directly into the block implementation, use the DDSP delete operation
+to insert into that block, and handle the $\Delta$ operator as part of
+$\mathbftt{local\_query}$. We consider range count to be invertible,
+with $\Delta$ taken to be subtraction. Range scans are also invertible,
+technically, but filtering out the deleted records during
+result set merging is relatively expensive, as it requires either
+performing a sorted merge of all of the records (rather than a simple
+union) to cancel out records with their ghosts, or doing a linear
+search for each ghost record to remove its corresponding data from the
+result set. As a result, we have classified them as DDSPs instead,
+as weak deletes are easily supported during range scans with no extra
+cost. Any records marked as deleted can simply be skipped over when
+copying into the local or final result sets. Similarly, $k$-NN queries
+admit a DDSP solution for certain data structures, but we've elected to
+classify them as IDSPs using Algorithm~\ref{alg:idsp-knn}, because this
+approach requires no modifications to the data structure, and not all
+metric indexing structures support the efficient point lookups that
+would be necessary to implement weak deletes. We've also
+classified IRS as an IDSP, which is the only place in the taxonomy that
+it can fit. Note that IRS (and other sampling problems) are unique in this
+model in that they require the IDSP classification, but must actually
+support deletes using weak deletes. There's no way to support
+ghost-structure-based deletes in our general framework for sampling queries.\footnote{
+ This is in contrast to the specialized framework for sampling in
+ Chapter~\ref{chap:sampling}, where we heavily modified the query
+ process to make tombstone (which is analogous to ghost structure)
+ based deletes possible.
+}
+
+\section{Dynamization Framework}
+
+With these new classes of search problem defined, we
+can now present our generalized framework based upon them. This
+framework takes the form of a header-only C++20 library which can
+automatically extend data structures with support for concurrent inserts
+and deletes, depending upon the classification of the problem in the
+taxonomy of Figure~\ref{fig:taxonomy}. The user provides the data
+structure and query implementations as template parameters, and the
+framework then provides an interface that allows for queries, inserts,
+and deletes against the new dynamic structure.
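+
+As a rough illustration of this usage pattern (a sketch only; the type
+and function names here, such as \texttt{DynamicExtension}, are
+placeholders and should not be read as the framework's exact API), a
+user composes a record type, a shard type, and a query type, and then
+interacts with the resulting dynamized structure directly:
+
+\begin{lstlisting}[language=C++]
+// Sketch of intended usage. DynamicExtension stands in for the
+// framework's top-level template, and MyRecord, MyShard, and MyQuery
+// for the user-provided record, shard, and query implementations.
+DynamicExtension<MyRecord, MyShard, MyQuery> index;
+
+MyRecord r { /* record fields */ };
+index.insert(r);   // inserts are buffered; reconstructions happen as needed
+index.erase(r);    // deletes are realized via tombstones or tagging
+
+auto result = index.query(args);  // args: the query class's argument type
+\end{lstlisting}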
+
+\subsection{Interfaces}
+
+In order to enforce interface requirements, our implementation takes
+advantage of C++20 concepts. There are three major sets of interfaces
+that the user of the framework must implement: records, shards, and
+queries. We'll discuss each of these in this section.
+
+\subsubsection{Record Interface}
+
+
+The record interface is the simplest of the three. Records are C++
+structs, and they must implement an equality comparison operator. Beyond
+this, the framework places no additional constraints and makes
+no assumptions about record contents, their ordering properties,
+etc. Though the records must be fixed-length (as they are structs),
+variable-length data can be supported using off-record storage and
+pointers if necessary. Each record is automatically wrapped by the
+framework with a header that is used to facilitate deletion support.
+The record concept is shown in Listing~\ref{lst:record}.
+
+\begin{lstfloat}
+\begin{lstlisting}[language=C++]
+template <typename R>
+concept RecordInterface = requires(R r, R s) {
+ { r == s } -> std::convertible_to<bool>;
+};
+\end{lstlisting}
+\caption{The required interface for record types in our dynamization framework.}
+\label{lst:record}
+\end{lstfloat}
+
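+As an illustrative example (not a type provided by the framework), a
+simple key-value record satisfying this concept might look as follows:
+
+\begin{lstlisting}[language=C++]
+#include <cstdint>
+
+// A minimal example record: a fixed-length key-value pair with the
+// equality comparison required by RecordInterface.
+struct KVRecord {
+    uint64_t key;
+    uint64_t value;
+
+    bool operator==(const KVRecord &other) const {
+        return key == other.key && value == other.value;
+    }
+};
+
+static_assert(RecordInterface<KVRecord>);
+\end{lstlisting}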
+
+\subsubsection{Shard Interface}
+\subsubsection{Query Interface}
+
+\subsection{Configurability}
+
+\subsection{Concurrency}
+
+\section{Evaluation}
+\subsection{Experimental Setup}
+\subsection{Design Space Evaluation}
+\subsection{Independent Range Sampling}
+\subsection{k-NN Search}
+\subsection{Range Scan}
+\subsection{String Search}
+\subsection{Concurrency}
-In this chapter, a generalized version of the framework originally proposed in
-Chapter~\ref{chap:sampling} was presented. This framework is based on two
-key properties: extended decomposability and record identity. It is capable
-of extending any data structure and search problem supporting these two properties
-with support for inserts and deletes. An evaluation of this framework was performed
-by extending several static data structures, and comparing the resulting structures'
-performance against dynamic baselines capable of answering the same type of search
-problem. The extended structures generally performed as well as, if not better than,
-their dynamic baselines in query performance, insert performance, or both. This demonstrates
-the capability of this framework to produce viable indexes in a variety of contexts. However,
-the framework is not yet complete. In the next chapter, the work required to bring this
-framework to completion will be described.
+\section{Conclusion}
diff --git a/chapters/dynamization.tex b/chapters/dynamization.tex
index edd3014..c21bfbc 100644
--- a/chapters/dynamization.tex
+++ b/chapters/dynamization.tex
@@ -462,6 +462,7 @@ is $\Theta\left(\mathscr{Q}(n)\right)$.~\cite{saxe79}
\subsection{Merge Decomposable Search Problems}
\subsection{Delete Support}
+\label{ssec:dyn-deletes}
Classical dynamization techniques have also been developed with
support for deleting records. In general, the same technique of global
diff --git a/chapters/sigmod23/framework.tex b/chapters/sigmod23/framework.tex
index 2f2515b..ad250cc 100644
--- a/chapters/sigmod23/framework.tex
+++ b/chapters/sigmod23/framework.tex
@@ -396,7 +396,7 @@ Section~\ref{sec:sampling-implementation}.
\subsubsection{Bounding Rejection Probability}
-
+\label{sssec:sampling-rejection-bound}
When a sampled record has been rejected, it must be re-sampled. This
introduces performance overhead resulting from extra memory access and
random number generations, and hurts our ability to provide performance
diff --git a/cls/userlib.tex b/cls/userlib.tex
index c64d25a..277d997 100644
--- a/cls/userlib.tex
+++ b/cls/userlib.tex
@@ -8,6 +8,8 @@
\newtheorem{example}{Example}
\newtheorem{claim}{Claim}
+\newtheorem*{definitionIRS}{Definition 16}
+
\DeclareMathOperator{\op}{op}
\DeclareMathOperator{\args}{arg}
\DeclareMathOperator{\cost}{cost}
@@ -20,3 +22,18 @@
\newcommand\mathbftt[1]{\textnormal{\ttfamily\bfseries #1}}
\newcommand\note[1]{\marginpar{\color{red}\tiny #1}}
+
+\def\mergeop{\square}
+\def\bigmergeop{\mathop{\scalebox{1}[1]{\scalerel*{\Box}{\strut}}}}
+
+
+\SetAlgoSkip{}
+\SetKwProg{Def}{def}{}{}
+\SetKw{Break}{break}
+\SetKw{Lambda}{lambda\,}
+\SetKwComment{Comment}{//}{}
+\SetKwRepeat{Do}{do}{while}
+
+
+\newfloat{lstfloat}{htbp}{lop}
+\floatname{lstfloat}{Listing}
diff --git a/paper.tex b/paper.tex
index 57d1585..ea0cd55 100644
--- a/paper.tex
+++ b/paper.tex
@@ -116,6 +116,7 @@
\usepackage{xcolor}
\usepackage{mathrsfs}
\usepackage{scalerel}
+\usepackage{float}
\setstretch{1.24}
@@ -233,8 +234,6 @@ of Engineering Science and Mechanics
\titleformat{\section}[block]{\Large\bfseries\sffamily}{\thesection}{12pt}{}{}
\titleformat{\subsection}[block]{\large\bfseries\sffamily}{\thesubsection}{12pt}{}{}
-\def\mergeop{\square}
-\def\bigmergeop{\mathop{\scalebox{1}[1]{\scalerel*{\Box}{\strut}}}}
% Makes use of LaTeX's include facility. Add as many chapters
% and appendices as you like.