\chapter{Exploring the Design Space} \label{chap:design-space} \section{Introduction} In the previous two chapters, we introduced an LSM tree-inspired design space into the Bentley-Saxe method to allow for more flexibility in performance tuning. However, aside from some general comments about how these parameters affect insertion and query performance, and some limited experimental evaluation, we have not performed a systematic analysis of this space, its capabilities, and its limitations. We will rectify this situation in this chapter, performing both a detailed mathematical analysis of the design parameter space, as well as an experimental evaluation, to explore the space and its trade-offs and demonstrate their practical effectiveness. Before diving into the details of the design space we have introduced, it's worth taking some time to motivate this endeavor. There is a large body of theoretical work in the area of data structure dynamization, and, to the best of our knowledge, none of these papers have introduced a design space of the sort that we have introduced here. Despite this, some papers which \emph{use} these techniques have introduced similar design elements into their own implementations~\cite{pgm}, with some even going so far as to describe these elements as part of the Bentley-Saxe method~\cite{almodaresi23}. This situation is best understood in terms of the ultimate goals of the respective lines of work. In the classical literature on dynamization, the focus is directed at proving theoretical asymptotic bounds. In this context, the LSM tree design space is of limited utility, because its tuning parameters adjust constant factors, and thus don't play a major role in the asymptotics. Where the theoretical literature does introduce configurability, such as with the equal block method~\cite{overmars-art-of-dyn} or more complex schemes that nest the equal block method \emph{inside} of a binary decomposition~\cite{overmars81}, the intention is to produce asymptotically relevant trade-offs between insert, query, and delete performance for deletion decomposable search problems~\cite[pg. 117]{overmars83}. This also explains why the equal block method is described in terms of a function, rather than a constant value: doing so enables the parameter to appear in the asymptotics. On the other hand, in practical scenarios, constant-factor tuning of performance can be very relevant. We've already shown in Sections~\ref{ssec:ds-exp} and \ref{ssec:dyn-ds-exp} how tuning parameters, particularly adjusting the number of shards per level, can have measurable real-world effects on the performance characteristics of dynamized structures. In fact, sometimes this tuning is \emph{necessary} to achieve reasonable performance. It's quite telling that the two most direct implementations of the Bentley-Saxe method that we have identified in the literature are both in the context of metric indices~\cite{naidan14,bkdtree}, a class of data structure and search problem for which we saw very good performance from standard Bentley-Saxe in Section~\ref{ssec:dyn-knn-exp}. Our experiments in Chapter~\ref{chap:framework} show that, for other types of problems, the technique does not fare quite so well in its unmodified form. \section{Asymptotic Analysis} \label{sec:design-asymp} Before beginning the derivations of the cost functions of dynamized structures within the context of our proposed design space, we should make a few comments about the assumptions and techniques that we will use in our analysis.
We will generally neglect buffering in our analysis, both in terms of the additional cost of querying the buffer, and in terms of the buffer's effect on the reconstruction process. Buffering isn't fundamental to the techniques we are considering, and including it would needlessly complicate the analysis. However, we will include the scale factor, $s$, which directly governs the number of blocks within the dynamized structures. Additionally, we will perform the query cost analysis assuming a decomposable search problem. Deletes will be entirely neglected, and we won't make any assumptions about mergeability. \subsection{Generalized Bentley-Saxe Method} As a first step, we will derive a modified version of the Bentley-Saxe method that has been adjusted to support arbitrary scale factors. There's nothing fundamental to the technique that prevents such modifications, and it's likely that they have not been analyzed in this way before simply out of a lack of interest in constant factors in theoretical asymptotic analysis. When generalizing the Bentley-Saxe method for arbitrary scale factors, we decided to maintain the core concept of the binary decomposition. One interesting mathematical property of a Bentley-Saxe dynamization is that the internal layout of levels exactly matches the binary representation of the record count contained within the index. For example, a dynamization containing $n=20$ records will have 4 records in the third level, and 16 in the fifth, with all other levels being empty. If we represent a full level with a 1 and an empty level with a 0, then we'd have $10100$, which is $20$ in base 2. \begin{algorithm} \caption{The Generalized BSM Layout Policy} \label{alg:design-bsm} \KwIn{$r$: set of records to be inserted, $\mathscr{I}$: a dynamized structure, $n$: number of records in $\mathscr{I}$} \BlankLine \Comment{Find the first non-full level} $target \gets -1$ \; \For{$i=0\ldots \log_s n$} { \If {$|\mathscr{I}_i| < N_B (s - 1)\cdot s^i$} { $target \gets i$ \; break \; } } \BlankLine \Comment{If the structure is full, we need to grow it} \If {$target = -1$} { $target \gets 1 + (\log_s n)$ \; } \BlankLine \Comment{Build the new structure} $\mathscr{I}_{target} \gets \text{build}(\text{unbuild}(\mathscr{I}_0) \cup \ldots \cup \text{unbuild}(\mathscr{I}_{target}) \cup r)$ \; \BlankLine \Comment{Empty the levels used to build the new shard} \For{$i=0\ldots target-1$} { $\mathscr{I}_i \gets \emptyset$ \; } \end{algorithm} Our generalization, then, is to represent the data as an $s$-ary decomposition, where the scale factor represents the base of the representation. To accomplish this, we set the capacity of level $i$ to be $N_B (s - 1) \cdot s^i$, where $N_B$ is the size of the buffer. The resulting structure will have at most $\log_s n$ shards. The resulting policy is described in Algorithm~\ref{alg:design-bsm}. Unfortunately, the approach used by Bentley and Saxe to calculate the amortized insertion cost of the BSM does not generalize to larger bases, and so we will need to derive this result using a different approach. \begin{theorem} The amortized insertion cost for generalized BSM with a scale factor of $s$ is $\Theta\left(\frac{B(n)}{n} \cdot s\log_s n\right)$. \end{theorem} \begin{proof} In order to calculate the amortized insertion cost, we will first determine the average number of times that a record is involved in a reconstruction, and then amortize those reconstructions over the records in the structure.
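To make the counting concrete, consider level $0$ with a scale factor of $s=4$, which has a capacity of $N_B(s-1) = 3N_B$ records and therefore absorbs three buffer flushes before spilling. The records from the first flush are written three times (once on arrival and once more for each subsequent flush), those from the second flush twice, and those from the final flush once, for a total of
\begin{equation*}
1 + 2 + 3 = 6 = \frac{1}{2}\left(4^2 - 4\right)
\end{equation*}
writes on the level.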
In general, if we consider only the first level of the structure, the reconstruction count associated with each record in that level will follow the pattern $1, 2, 3, \ldots, s-1$ when the level is full. Thus, the total number of reconstructions associated with records on level $i=0$ is the sum of that sequence, or \begin{equation*} W(0) = \sum_{j=1}^{s-1} j = \frac{1}{2}\left(s^2 - s\right) \end{equation*} Considering the next level, $i=1$, each reconstruction involving this level will copy down the entirety of the structure above it, adding one more write per record, as well as one extra write for the new record. More specifically, the first ``batch'' of records in level $i=1$ will have the write counts $1, 2, 3, \ldots, s$; the second ``batch'' of records will increment all of the existing write counts by one, and then introduce another copy of $1, 2, 3, \ldots, s$ writes; and so on. Thus, each new ``batch'' written to level $i$ will introduce $W(i-1) + 1$ writes from the previous level into level $i$, as well as rewriting all of the records currently on level $i$. The net result of this is that the number of writes on level $i$ is given by the following recurrence relation (combined with the $W(0)$ base case), \begin{equation*} W(i) = sW(i-1) + \frac{1}{2}\left(s-1\right)^2 \cdot s^i \end{equation*} which can be solved to give the following closed-form expression, \begin{equation*} W(i) = s^i \cdot \left(\frac{1}{2} (s-1) \cdot (s(i+1) - i)\right) \end{equation*} which provides the total number of reconstructions that records in level $i$ of the structure have participated in. As each record is involved in a different number of reconstructions, we'll consider the average number by dividing $W(i)$ by the number of records in level $i$. From here, the proof proceeds in the standard way for this sort of analysis. The worst-case cost of a reconstruction is $B(n)$, and there are $\log_s(n)$ total levels, so the total reconstruction cost associated with a record can be upper-bounded by $B(n) \cdot \frac{W(\log_s(n))}{n}$, and this cost is then amortized over the $n$ insertions necessary to get the record into the last level. We'll also condense the multiplicative constants and drop the additive ones to more clearly represent the relationship we're looking to show. This results in an amortized insertion cost of, \begin{equation*} \frac{B(n)}{n} \cdot s \log_s n \end{equation*} \end{proof} \begin{theorem} The worst-case insertion cost for generalized BSM with a scale factor of $s$ is $\Theta(B(n))$. \end{theorem} \begin{proof} The Bentley-Saxe method finds the smallest non-full block and performs a reconstruction including all of the records from that block, as well as all blocks smaller than it, and the new records to be added. The worst case, then, will occur when all of the existing blocks in the structure are full, and a new, larger, block must be added. In this case, the reconstruction will involve every record currently in the dynamized structure, and will thus have a cost of $I(n) \in \Theta(B(n))$. \end{proof} \begin{theorem} The worst-case query cost for generalized BSM for a decomposable search problem with cost $\mathscr{Q}_S(n)$ is $O(\log_s(n) \cdot \mathscr{Q}_S(n))$. \end{theorem} \begin{proof} The worst-case scenario for queries in BSM occurs when every existing level is full. In this case, there will be $\log_s n$ levels that must be queried, with the $i$th level containing $(s - 1) \cdot s^i$ records.
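(Note that, when every level is full, these capacities sum to $\sum_{i=0}^{L-1} (s-1)\cdot s^i = s^L - 1$ for an $L$-level structure, the $s$-ary analogue of a binary number whose digits are all ones; neglecting the buffer, this is essentially the entire record count $n$.)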
Thus, the total cost of querying the structure will be, \begin{equation*} \mathscr{Q}(n) = \sum_{i=0}^{\log_s n} \mathscr{Q}_S\left((s - 1) \cdot s^i\right) \end{equation*} The number of records per shard will be upper-bounded by $O(n)$, so \begin{equation*} \mathscr{Q}(n) \in O\left(\sum_{i=0}^{\log_s n} \mathscr{Q}_S(n)\right) \in O\left(\log_s n \cdot \mathscr{Q}_S(n)\right) \end{equation*} \end{proof} \begin{theorem} The best-case query cost for generalized BSM for a decomposable search problem with a cost of $\mathscr{Q}_S(n)$ is $\mathscr{Q}_B(n) \in \Theta(\mathscr{Q}_S(n))$. \end{theorem} \begin{proof} The best-case scenario for queries in BSM occurs when a new level is added, which results in every record being compacted into a single structure. In this case, there is only a single data structure in the dynamization, and so the query cost over the dynamized structure is identical to the query cost of a single static instance of the structure. Thus, the best-case query cost in BSM is, \begin{equation*} \mathscr{Q}_B(n) \in \Theta \left( 1 \cdot \mathscr{Q}_S(n) \right) \in \Theta\left(\mathscr{Q}_S(n)\right) \end{equation*} \end{proof} \subsection{Leveling} \begin{algorithm} \caption{The Leveling Policy} \label{alg:design-leveling} \KwIn{$r$: set of records to be inserted, $\mathscr{I}$: a dynamized structure, $n$: number of records in $\mathscr{I}$} \BlankLine \Comment{Find the first non-full level} $target \gets -1$ \; \For{$i=0\ldots \log_s n$} { \If {$|\mathscr{I}_i| < N_B \cdot s^{i+1}$} { $target \gets i$ \; break \; } } \BlankLine \Comment{If the target is $0$, then just merge the buffer into it} \If{$target = 0$} { $\mathscr{I}_0 \gets \text{build}(\text{unbuild}(\mathscr{I}_0) \cup r)$ \; \Return } \BlankLine \Comment{If the structure is full, we need to grow it} \If {$target = -1$} { $target \gets 1 + (\log_s n)$ \; } \BlankLine \Comment{Perform the reconstruction} $\mathscr{I}_{target} \gets \text{build}(\text{unbuild}(\mathscr{I}_{target}) \cup \text{unbuild}(\mathscr{I}_{target - 1}))$ \; \BlankLine \Comment{Shift the remaining levels down to free up $\mathscr{I}_0$} \For{$i=target-1 \ldots 1$} { $\mathscr{I}_i \gets \mathscr{I}_{i-1}$ \; } \BlankLine \Comment{Flush the buffer into $\mathscr{I}_0$} $\mathscr{I}_0 \gets \text{build}(r)$ \; \Return \; \end{algorithm} Our leveling layout policy is described in Algorithm~\ref{alg:design-leveling}. Each level $i$ contains a single structure with a capacity of $N_B\cdot s^{i+1}$ records. When a reconstruction occurs, the smallest level, $i$, with space to contain the records from level $i-1$, in addition to the records currently within it, is located. Then, a new structure is built at level $i$ containing all of the records in levels $i$ and $i-1$, and the structure at level $i-1$ is deleted. Finally, all levels $j < (i - 1)$ are shifted to level $j+1$. This process clears space in level $0$ to contain the buffer flush. \begin{theorem} The amortized insertion cost of leveling with a scale factor of $s$ is \begin{equation*} I_A(n) \in \Theta\left(\frac{B(n)}{n} \cdot s \log_s n\right) \end{equation*} \end{theorem} \begin{proof} As with generalized BSM, the records in each level will be rewritten up to $s$ times before they move down to the next level. Thus, the amortized insertion cost for leveling can be found by determining how many times a record is expected to be rewritten on a single level, and how many levels there are in the structure.
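For intuition, consider a single level under leveling with $s = 3$. The level is rebuilt three times before its contents are pushed down: the first batch of records to arrive is written during all three of those rebuilds, the second batch during two, and the final batch during one. Counting in units of one batch of records, the level therefore accumulates
\begin{equation*}
3 + 2 + 1 = 6 = \frac{1}{2}s(s+1)
\end{equation*}
batches worth of writes before it is merged into the level below.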
On any given level, the total number of writes required to fill the level is given by the expression, \begin{equation*} B(s + (s - 1) + (s - 2) + \ldots + 1) \end{equation*} where $B$ is the number of records added to the level during each reconstruction (i.e., $N_B$ for level $0$ and $N_B\cdot s^{i}$ for any other level $i$). This is because the first batch of records entering the level will be rewritten each of the $s$ times that the level is rebuilt before the records are merged into the level below. The next batch will be rewritten one fewer time, and so on. Thus, the total number of writes is, \begin{equation*} B\sum_{i=0}^{s-1} (s - i) = B\left(s^2 - \sum_{i=0}^{s-1} i\right) = B\left(s^2 - \frac{(s-1)s}{2}\right) \end{equation*} which can be simplified to get, \begin{equation*} \frac{1}{2}s(s+1)\cdot B \end{equation*} writes occurring on each level.\footnote{ This write count is not cumulative over the entire structure. It only accounts for the number of writes occurring on this specific level. } To obtain the total number of times records are rewritten, we need to calculate the average number of times a record is rewritten per level, and sum this over all of the levels. Letting $B_i$ denote the batch size for level $i$, this is \begin{equation*} \sum_{i=0}^{\log_s n} \frac{\frac{1}{2}B_i s (s+1)}{s B_i} = \frac{1}{2} \sum_{i=0}^{\log_s n} (s + 1) = \frac{1}{2} (s+1) \log_s n \end{equation*} To calculate the amortized insertion cost, we multiply this write amplification factor by the cost of rebuilding the structures, and divide by the total number of records. We'll condense the constant into a single $s$, as this best expresses the nature of the relationship we're looking for, \begin{equation*} I_A(n) \in \Theta\left(\frac{B(n)}{n}\cdot s \log_s n\right) \end{equation*} \end{proof} \begin{theorem} The worst-case insertion cost for leveling with a scale factor of $s$ is \begin{equation*} \Theta\left(B\left(\frac{s-1}{s} \cdot n\right)\right) \end{equation*} \end{theorem} \begin{proof} Unlike in BSM, where the worst-case reconstruction involves all of the records within the structure, in leveling it includes only the records in the last two levels. In particular, the worst-case behavior occurs when the last level is one reconstruction away from its capacity, and the level above it is full. In this case, the reconstruction will involve the full capacity of the last level, or $N_B \cdot s^{\log_s n +1}$ records. We can relate this to $n$ by finding the ratio of the number of records contained in the last level of the structure to the number in the entire structure. This is given by, \begin{equation*} \frac{N_B \cdot s^{\log_s n + 1}}{\sum_{i=0}^{\log_s n} N_B \cdot s^{i + 1}} = \frac{(s - 1)n}{sn - 1} \end{equation*} This fraction can be simplified by noting that the $1$ subtracted in the denominator is negligible and dropping it, allowing the $n$ to be canceled and giving a ratio of $\frac{s-1}{s}$.
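For example, at $s = 2$ the last level holds roughly half of the records in the structure, while at $s = 8$ it holds roughly $\frac{7}{8}$ of them, so larger scale factors push the worst-case leveling reconstruction closer to a full rebuild of the structure.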
Thus, the worst-case reconstruction will involve $\frac{s - 1}{s} \cdot n$ records, with all of the other levels simply shifting down at no cost, resulting in a worst-case insertion cost of, \begin{equation*} I(n) \in \Theta\left(B\left(\frac{s-1}{s} \cdot n\right)\right) \end{equation*} \end{proof} \begin{theorem} The worst-case query cost for leveling for a decomposable search problem with cost $\mathscr{Q}_S(n)$ is \begin{equation*} O\left(\mathscr{Q}_S(n) \cdot \log_s n \right) \end{equation*} \end{theorem} \begin{proof} The worst-case scenario for leveling occurs right before the structure gains a new level, at which point there will be $\log_s n$ data structures, each with $O(n)$ records. Thus, the worst-case cost will be the cost of querying each of these structures, \begin{equation*} O\left(\mathscr{Q}_S(n) \cdot \log_s n \right) \end{equation*} \end{proof} \begin{theorem} The best-case query cost for leveling for a decomposable search problem with cost $\mathscr{Q}_S(n)$ is \begin{equation*} \mathscr{Q}_B(n) \in O(\mathscr{Q}_S(n) \cdot \log_s n) \end{equation*} \end{theorem} \begin{proof} Unlike BSM, leveling will never have empty levels. The policy ensures that there is always a data structure on every level. As a result, the best-case query must still query $\log_s n$ structures, and so has a cost of, \begin{equation*} \mathscr{Q}_B(n) \in O\left(\mathscr{Q}_S(n) \cdot \log_s n\right) \end{equation*} \end{proof} \subsection{Tiering} \begin{algorithm} \caption{The Tiering Policy} \label{alg:design-tiering} \KwIn{$r$: set of records to be inserted, $\mathscr{L}_0 \ldots \mathscr{L}_{\log_s n}$: the levels of $\mathscr{I}$, $n$: the number of records in $\mathscr{I}$} \BlankLine \Comment{Find the first non-full level} $target \gets -1$ \; \For{$i=0\ldots \log_s n$} { \If {$|\mathscr{L}_i| < s$} { $target \gets i$ \; break \; } } \BlankLine \Comment{If the structure is full, we need to grow it} \If {$target = -1$} { $target \gets 1 + (\log_s n)$ \; } \BlankLine \Comment{Walk the structure backwards, applying reconstructions} \For {$i \gets target \ldots 1$} { $\mathscr{L}_i \gets \mathscr{L}_i \cup \text{build}(\text{unbuild}(\mathscr{L}_{i-1, 0}) \cup \ldots \cup \text{unbuild}(\mathscr{L}_{i-1, s-1}))$ \; } \BlankLine \Comment{Add the buffered records to $\mathscr{L}_0$} $\mathscr{L}_0 \gets \mathscr{L}_0 \cup \text{build}(r)$ \; \Return \; \end{algorithm} Our tiering layout policy is described in Algorithm~\ref{alg:design-tiering}. In this policy, each level $i$ contains up to $s$ shards, each with a capacity of $N_B\cdot s^i$ records. When a reconstruction occurs, the first level with fewer than $s$ shards is selected as the target, $t$. Then, for every level $i < t$, all of the shards in level $i$ are merged into a single shard using a reconstruction and placed in level $i+1$. These reconstructions are performed in reverse order, starting with level $t-1$ and working back up towards level $0$. Finally, the shard created by the buffer flush is placed in level $0$. \begin{theorem} The amortized insertion cost of tiering with a scale factor of $s$ is, \begin{equation*} I_A(n) \in \Theta\left(\frac{B(n)}{n} \cdot \log_s n \right) \end{equation*} \end{theorem} \begin{proof} For tiering, each record is written \emph{exactly} once per level. As a result, each record will be involved in exactly $\log_s n$ reconstructions over the lifetime of the structure.
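For instance, with $s = 8$ and $n = 10^9$ (neglecting the buffer), each record participates in only $\log_8 10^9 \approx 10$ reconstructions.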
Each reconstruction has a cost of at most $B(n)$, and thus the amortized insertion cost must be, \begin{equation*} I_A(n) \in \Theta\left(\frac{B(n)}{n} \cdot \log_s n\right) \end{equation*} \end{proof} \begin{theorem} The worst-case insertion cost of tiering with a scale factor of $s$ is, \begin{equation*} I(n) \in \Theta\left(\sum_{i=0}^{\log_s n} B(s^i)\right) \end{equation*} \end{theorem} \begin{proof} The worst-case insertion in tiering involves performing a reconstruction on each level, with the reconstruction into level $i$ operating over roughly $s^i$ records. More formally, the total cost of this worst case will be, \begin{equation*} I(n) \in \Theta\left(\sum_{i=0}^{\log_s n} B(s^i)\right) \end{equation*} For any build cost that is at least linear in $n$, this sum is dominated by the reconstruction into the final level, and so the worst-case cost is $\Theta(B(n))$. \end{proof} \begin{theorem} The worst-case query cost for tiering for a decomposable search problem with cost $\mathscr{Q}_S(n)$ is \begin{equation*} \mathscr{Q}(n) \in O( \mathscr{Q}_S(n) \cdot s \log_s n) \end{equation*} \end{theorem} \begin{proof} As with the previous two policies, the worst-case query occurs when the structure is completely full. In the case of tiering, this means that there will be $\log_s n$ levels, each containing $s$ shards of size bounded by $O(n)$. Thus, there will be $s \log_s n$ structures to query, and the query cost must be, \begin{equation*} \mathscr{Q}(n) \in O \left(\mathscr{Q}_S(n) \cdot s \log_s n \right) \end{equation*} \end{proof} \begin{theorem} The best-case query cost for tiering for a decomposable search problem with cost $\mathscr{Q}_S(n)$ is $O(\mathscr{Q}_S(n) \cdot \log_s n)$. \end{theorem} \begin{proof} The tiering policy ensures that there are no internal empty levels, and as a result the best-case scenario for tiering occurs when each level is populated by exactly one shard. In this case, there will only be $\log_s n$ shards to query, resulting in a best-case query cost of, \begin{equation*} \mathscr{Q}_B(n) \in O\left(\mathscr{Q}_S(n) \cdot \log_s n \right) \end{equation*} \end{proof} \section{General Observations} The asymptotic results from the previous section are summarized in Table~\ref{tab:policy-comp}. When the scale factor is accounted for in the analysis, we can see that possible trade-offs begin to manifest within the space. We've seen some of these in action directly in the experimental sections of previous chapters. Most notably, we can see directly in these cost functions why tiering and leveling experience opposite effects as the scale factor changes. In both policies, increasing the scale factor increases the base of the logarithm governing the structure's height, and so, were the constants dropped from the analysis, it would superficially appear as though both policies should see the same effects. With the constants retained, however, we can see that this is not the case. For tiering, increasing the scale factor does reduce the number of levels, but it also increases the number of shards per level. Because the level reduction enters only through the base of the logarithm, while the shard-count increase is linear, the shard-count effect dominates and query performance degrades as the scale factor increases. Leveling, however, does not include this linear term and sees only a reduction in height. When considering insertion, we see a similar situation in reverse. For both leveling and tiering, increasing the scale factor shrinks the logarithmic term; in tiering there are no other terms at play, so insertion performance improves. Leveling, however, also has a linear dependency on the scale factor, as increasing the scale factor increases the write amplification.
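To put rough numbers on these trends, take $n = 10^9$ and neglect the buffer. Under leveling, raising the scale factor from $s=2$ to $s=32$ reduces the number of shards a query must visit from $\log_2 n \approx 30$ to $\log_{32} n \approx 6$, while under tiering the worst-case shard count grows from $s\log_s n \approx 60$ to approximately $192$. On the insertion side, the per-record rewrite count under tiering falls from roughly $30$ to roughly $6$, while under leveling the $\frac{1}{2}(s+1)$ factor in the amortized cost pushes it from roughly $45$ up to roughly $99$.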
This linear dependency is why leveling's insertion performance degrades as the scale factor increases. The generalized Bentley-Saxe method follows the same general trends as leveling for worst-case query cost and amortized insertion cost. Also of note is the fact that leveling has slightly better worst-case insertion performance. This is because leveling only ever reconstructs one level at a time, with the other levels simply shifting down in constant time. Bentley-Saxe and tiering have strictly worse worst-case insertion costs, as their worst-case reconstructions involve all of the levels. In the Bentley-Saxe method, this worst case manifests as a single, large reconstruction; in tiering, it involves $\log_s n$ reconstructions, one per level. \begin{table*} \centering \small \renewcommand{\arraystretch}{1.6} \begin{tabular}{|l l l l|} \hline & \textbf{Gen. BSM} & \textbf{Leveling} & \textbf{Tiering} \\ \hline $I(n)$ & $\Theta(B(n))$ & $\Theta\left(B\left(\frac{s-1}{s} \cdot n\right)\right)$ & $ \Theta\left(\sum_{i=0}^{\log_s n} B(s^i)\right)$ \\ \hline $I_A(n)$ & $\Theta\left(\frac{B(n)}{n} s\log_s n\right)$ & $\Theta\left(\frac{B(n)}{n} s\log_s n\right)$& $\Theta\left(\frac{B(n)}{n} \log_s n\right)$ \\ \hline $\mathscr{Q}(n)$ &$O\left(\log_s n \cdot \mathscr{Q}_S(n)\right)$ & $O\left(\log_s n \cdot \mathscr{Q}_S(n)\right)$ & $O\left(s \log_s n \cdot \mathscr{Q}_S(n)\right)$\\ \hline $\mathscr{Q}_B(n)$ & $\Theta(\mathscr{Q}_S(n))$ & $O(\log_s n \cdot \mathscr{Q}_S(n))$ & $O(\log_s n \cdot \mathscr{Q}_S(n))$ \\ \hline \end{tabular} \caption{Comparison of cost functions for various layout policies for DSPs} \label{tab:policy-comp} \end{table*} \section{Experimental Evaluation} In the previous sections, we mathematically proved various claims about the performance characteristics of our three layout policies to assess the trade-offs that exist within the design space. While this analysis is informative, the effects we are examining are at the level of constant factors, and so experimental testing is needed to validate that these claimed performance characteristics manifest in practice. In this section, we will do just that, running various benchmarks to explore the real-world performance implications of the configuration parameter space of our framework. \subsection{Asymptotic Insertion Performance} We'll begin by validating our results for the insertion performance characteristics of the three layout policies. For this test, we consider two data structures: the ISAM tree and the VPTree. The ISAM tree structure is merge-decomposable using a sorted-array merge, with a build cost of $B_M(n, k) \in \Theta(n \log k)$, where $k$ is the number of structures being merged. The VPTree, by contrast, is \emph{not} merge-decomposable, and is built in $B(n) \in \Theta(n \log n)$ time. We use the $200$ million record SOSD \texttt{OSM} dataset~\cite{sosd-datasets} for ISAM tree testing, and the one million record, $300$-dimensional Spanish Billion Words (\texttt{SBW}) dataset~\cite{sbw} for VPTree testing. For our first experiment, we will examine the latency distribution for inserts into our structures. We tested the three layout policies, using a common scale factor of $s=2$. This scale factor was selected to minimize its influence on the results (we've seen in Sections~\ref{ssec:ds-exp} and \ref{ssec:dyn-ds-exp} that the scale factor affects leveling and tiering in opposite ways) and to isolate the influence of the layout policy alone to as great a degree as possible.
We used a buffer size of $N_B=12000$ for the ISAM tree structure, and $N_B=1000$ for the VPTree. We generated this distribution by inserting $30\%$ of the records from the dataset to ``warm up'' the dynamized structure, and then measuring the insertion latency of each individual insert for the remaining $70\%$ of the data. Note that, due to timer resolution issues at nanosecond scales, the specific latency values associated with the faster end of the insertion distribution are not precise. However, it is our intention to examine the latency distribution, not the values themselves, and so this is not a significant limitation for our analysis. The resulting distributions are shown in Figure~\ref{fig:design-policy-ins-latency}. These distributions are represented using a ``reversed'' CDF with log scaling on both axes. \begin{figure} \centering \subfloat[ISAM Tree Insertion Latencies]{\includegraphics[width=.5\textwidth]{img/design-space/isam-insert-dist.pdf} \label{fig:design-isam-ins-dist}} \subfloat[VPTree Insertion Latencies]{\includegraphics[width=.5\textwidth]{img/design-space/vptree-insert-dist.pdf} \label{fig:design-vptree-ins-dist}} \\ \caption{Insertion Latency Distributions for Layout Policies} \label{fig:design-policy-ins-latency} \end{figure} The first notable point is that, for both the ISAM tree in Figure~\ref{fig:design-isam-ins-dist} and the VPTree in Figure~\ref{fig:design-vptree-ins-dist}, the leveling policy results in a measurably lower worst-case insertion latency. This result is in line with our theoretical analysis in Section~\ref{sec:design-asymp}. However, there is a major deviation from the theory in the worst-case performance of tiering and BSM. Both of these should have similar worst-case latencies, as the worst-case reconstruction in both cases involves every record in the structure. Yet we see tiering consistently performing better, particularly for the ISAM tree. The reason for this has to do with the way that the records are partitioned in these worst-case reconstructions. In tiering with a scale factor of $s=2$, the worst-case insertion consists of $\Theta(\log_2 n)$ distinct reconstructions, each involving exactly $2$ structures. BSM, on the other hand, uses exactly $1$ reconstruction involving $\Theta(\log_2 n)$ structures. This explains why the ISAM tree performs much better under tiering than BSM, as its actual reconstruction cost function is $\Theta(n \log_2 k)$, where $k$ is the number of structures being merged. For tiering, this results in $\Theta(n)$ cost in the worst case. BSM, by contrast, has $\Theta(n \log_2 \log_2 n)$, as many more distinct structures must be merged in the reconstruction, and it is thus asymptotically worse off. The VPTree sees less of a difference because it is \emph{not} merge-decomposable, and so the number of structures involved in a reconstruction matters less. Having the records more heavily partitioned still hurts performance, most likely due to cache effects, but less so than in the MDSP case. \begin{figure} \centering \subfloat[ISAM Tree]{\includegraphics[width=.5\textwidth]{img/design-space/isam-tput.pdf} \label{fig:design-isam-tput}} \subfloat[VPTree]{\includegraphics[width=.5\textwidth]{img/design-space/vptree-tput.pdf} \label{fig:design-vptree-tput}} \\ \caption{Insertion Throughput for Layout Policies} \label{fig:design-ins-tput} \end{figure} Next, in Figure~\ref{fig:design-ins-tput}, we show the overall insertion throughput of the three policies for both the ISAM tree and the VPTree.
This result should correlate with the amortized insertion costs for each policy derived in Section~\ref{sec:design-asymp}. At a scale factor of $s=2$, all three policies have similar insertion performance. This makes sense, as both leveling and Bentley-Saxe experience write amplification proportional to the scale factor, and at $s=2$ this isn't significantly larger than tiering's write amplification, particularly compared to the other factors influencing insertion performance, such as reconstruction time. However, for larger scale factors, tiering shows \emph{significantly} higher insertion throughput, and leveling and Bentley-Saxe show greatly degraded performance due to the large amount of additional write amplification. These results are fully in line with the mathematical analysis of the previous section. \subsection{General Insert vs. Query Trends} For our next experiment, we will consider the trade-offs between insertion and query performance that exist within this design space. We benchmarked each layout policy for a range of scale factors, measuring both their respective insertion throughputs and query latencies for both the ISAM tree and the VPTree. \begin{figure} \centering \subfloat[ISAM Tree Range Count]{\includegraphics[width=.5\textwidth]{img/design-space/isam-parm-sweep.pdf} \label{fig:design-isam-tradeoff}} \subfloat[VPTree $k$-NN]{\includegraphics[width=.5\textwidth]{img/design-space/knn-parm-sweep.pdf} \label{fig:design-knn-tradeoff}} \\ \caption{Insertion Throughput vs. Query Latency for varying scale factors} \label{fig:design-tradeoff} \end{figure} Figure~\ref{fig:design-isam-tradeoff} shows the trade-off curve between insertion throughput and query latency for range count queries executed against a dynamized ISAM tree. This test was run with a dataset of 500 million uniform integer keys and a selectivity of $\sigma = 0.0000001$; the scale factor associated with each point is annotated on the plot. These results show a very direct relationship between the scale factor, the layout policy, and insertion throughput. Leveling almost universally has lower insertion throughput, but also lower query latency, than tiering, though at a scale factor of $s=2$ the two are fairly similar. Tiering gains insertion throughput at the cost of query performance as the scale factor increases, although the rate at which insertion performance improves diminishes at larger scale factors, while the rate at which query performance declines grows dramatically. One interesting note is that leveling sees very little improvement in query latency as the scale factor is increased. This is due to the fact that, asymptotically, the scale factor only affects leveling's query performance by increasing the base of a logarithm. Thus, small increases in the scale factor have very little effect. However, leveling's insertion performance degrades linearly with the scale factor, and this is well demonstrated in the plot. The story is a bit clearer in Figure~\ref{fig:design-knn-tradeoff}. The VPTree has a much greater construction time, both asymptotically and in absolute terms, and its average query latency is also significantly greater. As a result, configuration changes produce much larger differences in performance, presenting a far clearer trade-off space. The same general trends hold as for the ISAM tree, just amplified. Leveling has better query performance than tiering, and sees improved query performance and reduced insert performance as the scale factor increases.
Tiering has better insertion performance and worse query performance than leveling, and sees improved insert and worsening query performance as the scale factor is increased. The Bentley-Saxe method follows a very similar trend to that of leveling, albeit with even more dramatic performance degradation as the scale factor is increased and slightly better query performance across the board. Generally, it appears to be a strictly worse alternative to leveling in all but its best-case query cost, and we will omit it from our tests moving forward. \subsection{Buffer Size} \begin{figure} \centering \subfloat[ISAM Tree Range Count]{\includegraphics[width=.5\textwidth]{img/design-space/isam-bs-sweep.pdf} \label{fig:buffer-isam-tradeoff}} \subfloat[VPTree $k$-NN]{\includegraphics[width=.5\textwidth]{img/design-space/knn-bs-sweep.pdf} \label{fig:buffer-knn-tradeoff}} \\ \caption{Insertion Throughput vs. Query Latency for varying buffer sizes} \label{fig:buffer-size} \end{figure} In the previous section, we considered the effect of various scale factors on the trade-off between insertion and query performance. Our framework also supports varying buffer sizes, and so we will examine this next. Figure~\ref{fig:buffer-size} shows the same insertion throughput vs. query latency curves, for fixed layout policy and scale factor configurations, at varying buffer sizes, under the same experimental conditions as the previous test. Unlike with the scale factor, there is a significant difference in the behavior of the two tested structures under buffer size variation. For the ISAM tree, shown in Figure~\ref{fig:buffer-isam-tradeoff}, we see that all layout policies follow a similar pattern. Increasing the buffer size increases insertion throughput for little to no additional query cost up to a certain point, after which query performance degrades substantially. This isn't terribly surprising: growing the buffer size will increase the number of records on each level, and therefore decrease the number of shards, while at the same time reducing the number of reconstructions that must be performed. However, the query must be answered against the buffer too, and once the buffer gets sufficiently large, this increased cost will exceed any query latency benefit from the decreased shard count. We see this pattern fairly clearly in all tested configurations, though BSM sees the least insertion benefit from an increased buffer size. The VPTree, shown in Figure~\ref{fig:buffer-knn-tradeoff}, is another story. This plot is far more chaotic; in fact, there are no particularly strong patterns to draw from it. This is likely because the time scales associated with the VPTree, in terms of both reconstruction and query latency, are significantly larger, and so the relatively small constant associated with adjusting the buffer size does not have as strong an influence on performance as it does for the ISAM tree. \subsection{Query Size Effects} One potentially interesting aspect of decomposition-based dynamization techniques is that, asymptotically, the additional cost added by decomposing the data structure vanishes for sufficiently expensive queries. Bentley and Saxe proved that for query costs of the form $\mathscr{Q}_S(n) \in \Omega(n^\epsilon)$ for $\epsilon > 0$, the overall query cost is unaffected (asymptotically) by the decomposition.
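To see why, consider a binary decomposition ($s=2$) and two hypothetical single-shard cost functions. If $\mathscr{Q}_S(n) \in \Theta(\sqrt{n})$, then querying every block costs
\begin{equation*}
\sum_{i=0}^{\log_2 n} \sqrt{2^i} \in \Theta\left(\sqrt{n}\right)
\end{equation*}
and the decomposition adds only a constant-factor overhead. If instead $\mathscr{Q}_S(n) \in \Theta(\log n)$, the corresponding sum is $\sum_{i=0}^{\log_2 n} \log\left(2^i\right) \in \Theta(\log^2 n)$, and the decomposition costs an extra logarithmic factor.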
This would seem to suggest that, as the cost of the query over a single shard increases, the effectiveness of our design space for tuning query performance should diminish. This is because our tuning space consists of adjusting the number of shards within the structure, and so, as the effect of the decomposition on the query cost shrinks, all configurations should approach a similar query performance. In order to evaluate this effect, we tested the query latency of range queries of varying selectivity against various configurations of our framework to see at what point the query latencies begin to converge. We also tested $k$-NN queries with varying values of $k$. For these tests, we used a synthetic dataset of 500 million 64-bit key-value pairs for the ISAM tree testing, and the SBW dataset for $k$-NN. Query latencies were measured by executing the queries after all records were inserted into the structure. \begin{figure} \centering \subfloat[ISAM Tree Range Count]{\includegraphics[width=.5\textwidth]{img/design-space/selectivity-sweep.pdf} \label{fig:design-isam-sel}} \subfloat[VPTree $k$-NN]{\includegraphics[width=.5\textwidth]{img/design-space/selectivity-sweep-knn.pdf} \label{fig:design-knn-sel}} \\ \caption{Query Result Size Effect Analysis} \label{fig:design-query-sze} \end{figure} Interestingly, for the range of selectivities tested for range counts, the overall query latency failed to converge, and there remains a consistent, albeit slight, stratification amongst the tested policies, as shown in Figure~\ref{fig:design-isam-sel}. As the selectivity rises above those shown in the chart, the relative ordering of the policies remains the same, but the relative differences between them begin to shrink. This result makes sense given the asymptotics: there is still \emph{some} overhead associated with the decomposition, but as the cost of the query approaches linear, this overhead makes up an increasingly irrelevant portion of the run time. The $k$-NN results in Figure~\ref{fig:design-knn-sel} show a slightly different story. This is also not surprising, because $k$-NN is a $C(n)$-decomposable problem, and the cost of result combination grows with $k$. Thus, larger $k$ values will \emph{increase} the effect that the decomposition has on the query run time, unlike the range count queries, where the total cost of the result combination is constant. % \section{Asymptotically Relevant Trade-offs} % Thus far, we have considered a configuration system that trades in % constant factors only. In general asymptotic analysis, all possible % configurations of our framework in this scheme collapse to the same basic % cost functions when the constants are removed. While we have demonstrated % that, in practice, the effects of this configuration are measurable, there % do exist techniques in the classical literature that provide asymptotically % relevant trade-offs, such as the equal block method~\cite{maurer80} and % the mixed method~\cite[pp. 117-118]{overmars83}. These techniques have % cost functions that are derived from arbitrary, positive, monotonically % increasing functions of $n$ that govern various ways in which the data % structure is partitioned, and changing the selection of function allows % for "tuning" the performance. However, to the best of our knowledge, % these techniques have never been implemented, and no useful guidance in % the literature exists for selecting these functions. % However, it is useful to consider the general approach of these % techniques.
They accomplish asymptotically relevant trade-offs by tying % the decomposition of the data structure directly to a function of $n$, % the number of records, in a user-configurable way. We can import a similar % concept into our already existing configuration framework for dynamization % to enable similar trade-offs, by replacing the constant scale factor, % $s$, with some function $s(n)$. However, we must take extreme care when % doing this to select a function that doesn't catastrophically impair % query performance. % Recall that, generally speaking, our dynamization technique requires % multiplying the cost function for the data structure being dynamized by % the number of shards that the data structure has been decomposed into. For % search problems that are solvable in sub-polynomial time, this results in % a worst-case query cost of, % \begin{equation} % \mathscr{Q}(n) \in O(S(n) \cdot \mathscr{Q}_S(n)) % \end{equation} % where $S(n)$ is the number of shards and, for our framework, is $S(n) \in % O(s \log_s n)$. The user can adjust $s$, but this tuning does not have % asymptotically relevant consequences. Unfortunately, there is not much % room, practically, for adjustment. If, for example, we were to allow the % user to specify $S(n) \in \Theta(n)$, rather than $\Theta(\log n)$, then % query performance would be greatly impaired. We need a function that is % sub-linear to ensure useful performance. % To accomplish this, we proposed adding a second scaling factor, $k$, such % that the number of records on level $i$ is given by, % \begin{equation} % \label{eqn:design-k-expr} % N_B \cdot \left(s \log_2^k(n)\right)^{i} % \end{equation} % with $k=0$ being equivalent to the configuration space we have discussed % thus far. The addition of $k$ allows for the dependency of the number of % shards on $n$ to be slightly biased upwards or downwards, in a way that % \emph{does} show up in the asymptotic analysis for inserts and queries, % but also ensures sub-polynomial additional query cost. % In particular, we prove the following asymptotic properties of this % configuration. % \begin{theorem} % The worst-case query latency of a dynamization scheme where the % capacity of each level is provided by Equation~\ref{eqn:design-k-expr} is % \begin{equation} % \mathscr{Q}(n) \in O\left(\left(\frac{\log n}{\log (k \log n))}\right) \cdot \mathscr{Q}_S(n)\right) % \end{equation} % \end{theorem} % \begin{proof} % The number of levels within the structure is given by $\log_s (n)$, % where $s$ is the scale factor. The addition of $k$ to the parametrization % replaces this scale factor with $s \log^k n$, and so we have % \begin{equation*} % \log_{s \log^k n}n = \frac{\log n}{\log\left(s \log^k n\right)} = \frac{\log n}{\log s + \log\left(k \log n\right)} \in O\left(\frac{\log n}{\log (k \log n)}\right) % \end{equation*} % by the application of various logarithm rules and change-of-base formula. % The cost of a query against a decomposed structure is $O(S(n) \cdot \mathscr{Q}_S(n))$, and % there are $\Theta(1)$ shards per level. 
Thus, the worst case query cost is % \begin{equation*} % \mathscr{Q}(n) \in O\left(\left(\frac{\log n}{\log (k \log n))}\right) \cdot \mathscr{Q}_S(n)\right) % \end{equation*} % \end{proof} % \begin{theorem} % The amortized insertion cost of a dynamization scheme where the capacity of % each level is provided by Equation~\ref{eqn:design-k-expr} is, % \begin{equation*} % I_A(n) \in \Theta\left(\frac{B(n)}{n} \cdot \frac{\log n}{\log ( k \log n)}\right) % \end{equation*} % \end{theorem} % \begin{proof} % \end{proof} % \subsection{Evaluation} % In this section, we'll access the effect that modifying $k$ in our % new parameter space has on the insertion and query performance of our % dynamization framework. \section{Conclusion} In this chapter, we considered the proposed design space for our dynamization framework both mathematically and experimentally, and derived some general principles for configuration within the space. We generalized the Bentley-Saxe method to support scale factors and buffering, but found that the result was generally worse than leveling in all but its best-case query performance. We also showed that there does exist a trade-off, mediated by the scale factor, between insertion performance and query performance, though it does not manifest for every combination of layout policy and data structure. For example, when testing the ISAM tree with the leveling or BSM policies, scale factor adjustments do not yield a particularly useful trade-off, because the query performance gained by increasing the scale factor is dwarfed by the loss of insertion performance. This is because the insertion cost grows far faster than any query performance benefit, due to the way the two effects scale in the cost functions for these policies. Broadly speaking, we can draw a few general conclusions. First, the leveling and BSM policies are fairly similar, with BSM having slightly better query performance in general, owing to its better best-case query cost. Both of these policies are better than tiering in terms of query performance, but generally worse in terms of insertion performance. The one exception to this trend is worst-case insertion performance, where leveling has a slight advantage over the other policies, because the way it performs reconstructions ensures that the worst-case reconstruction is smaller. Adjusting the scale factor trades between insert and query performance, though leveling and BSM respond in the opposite direction from tiering: for these two policies, increasing the scale factor reduces insert performance and improves query performance, while tiering does the opposite. The mutable buffer can also be increased in size to improve insert performance (in all cases), but query cost increases as a result, and once the buffer gets sufficiently large, the query performance penalty becomes severe. While this trade-off space does provide the desired configurability, the experimental results show that the trade-off curves are not particularly smooth, and their effectiveness can vary quite a bit depending on the properties of the data structure and search problem being dynamized. Additionally, there is not a particularly good way to control insertion tail latencies in this model, as leveling is only slightly better on this metric. In the next chapter, we'll consider methods for controlling tail latency, which will, as a side benefit, also provide a more desirable configuration space than the one considered here.