\chapter{Exploring the Design Space} \label{chap:design-space} \section{Introduction} In the previous two chapters, we introduced an LSM tree-inspired design space into the Bentley-Saxe method to allow for more flexibility in performance tuning. However, aside from some general comments about how these parameters affect insertion and query performance, and some limited experimental evaluation, we have not performed a systematic analysis of this space, its capabilities, and its limitations. We will rectify this situation in this chapter, performing both a detailed mathematical analysis of the design parameter space, as well as an experimental evaluation, to explore the space and its trade-offs and demonstrate their practical effectiveness. Before diving into the details of the design space we have introduced, it's worth taking some time to motivate this endeavor. There is a large body of theoretical work in the area of data structure dynamization, and, to the best of our knowledge, none of these papers have introduced a design space of the sort that we have introduced here. Despite this, some papers which \emph{use} these techniques have introduced similar design elements into their own implementations~\cite{pgm}, with some even going so far as to describe these elements as part of the Bentley-Saxe method~\cite{almodaresi23}. This situation is best understood in terms of the ultimate goals of the respective lines of work. In the classical literature on dynamization, the focus is directed at proving theoretical asymptotic bounds. In this context, the LSM tree design space is of limited utility, because its tuning parameters adjust constant factors, and thus don't play a major role in the asymptotics. Where the theoretical literature does introduce configurability, such as with the equal block method~\cite{overmars-art-of-dyn} or more complex schemes that nest the equal block method \emph{inside} of a binary decomposition~\cite{overmars81}, the intention is to produce asymptotically relevant trade-offs between insert, query, and delete performance for deletion decomposable search problems~\cite[pg. 117]{overmars83}. This also explains why the equal block method is described in terms of a function, rather than a constant value: doing so enables the parameter to appear in the asymptotics. On the other hand, in practical scenarios, constant-factor tuning of performance can be very relevant. We've already shown in Sections~\ref{ssec:ds-exp} and \ref{ssec:dyn-ds-exp} how tuning parameters, particularly adjusting the number of shards per level, can have measurable real-world effects on the performance characteristics of dynamized structures. In fact, sometimes this tuning is \emph{necessary} to achieve reasonable performance. It's quite telling that the two most direct implementations of the Bentley-Saxe method that we have identified in the literature are both in the context of metric indices~\cite{naidan14,bkdtree}, a class of data structure and search problem for which we saw very good performance from standard Bentley-Saxe in Section~\ref{ssec:dyn-knn-exp}. Our experiments in Chapter~\ref{chap:framework} show that, for other types of problems, the technique does not fare quite so well in its unmodified form. \section{Asymptotic Analysis} \label{sec:design-asymp} Before beginning the derivations of the cost functions of dynamized structures within the context of our proposed design space, we should make a few comments about the assumptions and techniques that we will use in our analysis.
We will generally neglect buffering in our analysis, both in terms of the additional cost of querying the buffer, and in terms of the buffer's effect on the reconstruction process. Buffering isn't fundamental to the techniques we are considering, and including it would needlessly complicate the analysis. However, we will include the scale factor, $s$, which directly governs the number of blocks within the dynamized structures. Additionally, we will perform the query cost analysis assuming a decomposable search problem. Deletes will be entirely neglected, and we won't make any assumptions about mergeability. \subsection{Generalized Bentley-Saxe Method} As a first step, we will derive a modified version of the Bentley-Saxe method that has been adjusted to support arbitrary scale factors. There's nothing fundamental to the technique that prevents such modifications, and it's likely that they have not been analyzed in this way before simply out of a lack of interest in constant factors in theoretical asymptotic analysis. When generalizing the Bentley-Saxe method for arbitrary scale factors, we decided to maintain the core concept of the binary decomposition. One interesting mathematical property of a Bentley-Saxe dynamization is that the internal layout of levels exactly matches the binary representation of the record count contained within the index. For example, a dynamization containing $n=20$ records will have 4 records in the third level, and 16 in the fifth, with all other levels being empty. If we represent a full level with a 1 and an empty level with a 0, then we'd have $10100$, which is $20$ in base 2. \begin{algorithm} \caption{The Generalized BSM Layout Policy} \label{alg:design-bsm} \KwIn{$r$: set of records to be inserted, $\mathscr{I}$: a dynamized structure, $n$: number of records in $\mathscr{I}$} \BlankLine \Comment{Find the first non-full level} $target \gets -1$ \; \For{$i=0\ldots \log_s n$} { \If {$|\mathscr{I}_i| < N_B (s - 1)\cdot s^i$} { $target \gets i$ \; break \; } } \BlankLine \Comment{If the structure is full, we need to grow it} \If {$target = -1$} { $target \gets 1 + (\log_s n)$ \; } \BlankLine \Comment{Build the new structure} $\mathscr{I}_{target} \gets \text{build}(\text{unbuild}(\mathscr{I}_0) \cup \ldots \cup \text{unbuild}(\mathscr{I}_{target}) \cup r)$ \; \BlankLine \Comment{Empty the levels used to build the new shard} \For{$i=0\ldots target-1$} { $\mathscr{I}_i \gets \emptyset$ \; } \end{algorithm} Our generalization, then, is to represent the data as an $s$-ary decomposition, where the scale factor represents the base of the representation. To accomplish this, we set the capacity of level $i$ to be $N_B (s - 1) \cdot s^i$, where $N_B$ is the size of the buffer. The resulting structure will have at most $\log_s n$ shards. The resulting policy is described in Algorithm~\ref{alg:design-bsm}. Unfortunately, the approach used by Bentley and Saxe to calculate the amortized insertion cost of the BSM does not generalize to larger bases, and so we will need to derive this result using a different approach. \begin{theorem} The amortized insertion cost for generalized BSM with a scale factor of $s$ is $\Theta\left(\frac{B(n)}{n} \cdot s\log_s n\right)$. \end{theorem} \begin{proof} In order to calculate the amortized insertion cost, we will first determine the average number of times that a record is involved in a reconstruction, and then amortize those reconstructions over the records in the structure.
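To make the counting concrete, consider level $0$ with a scale factor of $s=4$, which has a capacity of $N_B(s-1) = 3N_B$ records and therefore absorbs three buffer flushes before spilling. The records from the first flush are written three times (once on arrival and once more for each subsequent flush), those from the second flush twice, and those from the final flush once, for a total of
\begin{equation*}
1 + 2 + 3 = 6 = \frac{1}{2}\left(4^2 - 4\right)
\end{equation*}
writes on the level.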
In general, if we consider only the first level of the structure, the reconstruction count associated with each record in that level will follow the pattern $1, 2, 3, \ldots, s-1$ when the level is full. Thus, the total number of reconstructions associated with records on level $i=0$ is the sum of that sequence, or \begin{equation*} W(0) = \sum_{j=1}^{s-1} j = \frac{1}{2}\left(s^2 - s\right) \end{equation*} Considering the next level, $i=1$, each reconstruction involving this level will copy down the entirety of the structure above it, adding one more write per record, as well as one extra write for the new record. More specifically, the first ``batch'' of records in level $i=1$ will have the write counts $1, 2, 3, \ldots, s$; the second ``batch'' of records will increment all of the existing write counts by one, and then introduce another copy of $1, 2, 3, \ldots, s$ writes; and so on. Thus, each new ``batch'' written to level $i$ will introduce $W(i-1) + 1$ writes from the previous level into level $i$, as well as rewriting all of the records currently on level $i$. The net result of this is that the number of writes on level $i$ is given by the following recurrence relation (combined with the $W(0)$ base case), \begin{equation*} W(i) = sW(i-1) + \frac{1}{2}\left(s-1\right)^2 \cdot s^i \end{equation*} which can be solved to give the following closed-form expression, \begin{equation*} W(i) = s^i \cdot \left(\frac{1}{2} (s-1) \cdot (s(i+1) - i)\right) \end{equation*} which provides the total number of reconstructions that records in level $i$ of the structure have participated in. As each record is involved in a different number of reconstructions, we'll consider the average number by dividing $W(i)$ by the number of records in level $i$. From here, the proof proceeds in the standard way for this sort of analysis. The worst-case cost of a reconstruction is $B(n)$, and there are $\log_s(n)$ total levels, so the total reconstruction cost associated with a record can be upper-bounded by $B(n) \cdot \frac{W(\log_s(n))}{n}$, and this cost is then amortized over the $n$ insertions necessary to get the record into the last level. We'll also condense the multiplicative constants and drop the additive ones to more clearly represent the relationship we're looking to show. This results in an amortized insertion cost of, \begin{equation*} \frac{B(n)}{n} \cdot s \log_s n \end{equation*} \end{proof} \begin{theorem} The worst-case insertion cost for generalized BSM with a scale factor of $s$ is $\Theta(B(n))$. \end{theorem} \begin{proof} The Bentley-Saxe method finds the smallest non-full block and performs a reconstruction including all of the records from that block, as well as all blocks smaller than it, and the new records to be added. The worst case, then, will occur when all of the existing blocks in the structure are full, and a new, larger, block must be added. In this case, the reconstruction will involve every record currently in the dynamized structure, and will thus have a cost of $I(n) \in \Theta(B(n))$. \end{proof} \begin{theorem} The worst-case query cost for generalized BSM for a decomposable search problem with cost $\mathscr{Q}_S(n)$ is $O(\log_s(n) \cdot \mathscr{Q}_S(n))$. \end{theorem} \begin{proof} The worst-case scenario for queries in BSM occurs when every existing level is full. In this case, there will be $\log_s n$ levels that must be queried, with the $i$th level containing $(s - 1) \cdot s^i$ records.
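(Note that, when every level is full, these capacities sum to $\sum_{i=0}^{L-1} (s-1)\cdot s^i = s^L - 1$ for an $L$-level structure, the $s$-ary analogue of a binary number whose digits are all ones; neglecting the buffer, this is essentially the entire record count $n$.)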
Thus, the total cost of querying the structure will be, \begin{equation*} \mathscr{Q}(n) = \sum_{i=0}^{\log_s n} \mathscr{Q}_S\left((s - 1) \cdot s^i\right) \end{equation*} The number of records per shard will be upper-bounded by $O(n)$, so \begin{equation*} \mathscr{Q}(n) \in O\left(\sum_{i=0}^{\log_s n} \mathscr{Q}_S(n)\right) \in O\left(\log_s n \cdot \mathscr{Q}_S(n)\right) \end{equation*} \end{proof} \begin{theorem} The best-case query cost for generalized BSM for a decomposable search problem with a cost of $\mathscr{Q}_S(n)$ is $\mathscr{Q}_B(n) \in \Theta(\mathscr{Q}_S(n))$. \end{theorem} \begin{proof} The best-case scenario for queries in BSM occurs when a new level is added, which results in every record being compacted into a single structure. In this case, there is only a single data structure in the dynamization, and so the query cost over the dynamized structure is identical to the query cost of a single static instance of the structure. Thus, the best-case query cost in BSM is, \begin{equation*} \mathscr{Q}_B(n) \in \Theta \left( 1 \cdot \mathscr{Q}_S(n) \right) \in \Theta\left(\mathscr{Q}_S(n)\right) \end{equation*} \end{proof} \subsection{Leveling} \begin{algorithm} \caption{The Leveling Policy} \label{alg:design-leveling} \KwIn{$r$: set of records to be inserted, $\mathscr{I}$: a dynamized structure, $n$: number of records in $\mathscr{I}$} \BlankLine \Comment{Find the first non-full level} $target \gets -1$ \; \For{$i=0\ldots \log_s n$} { \If {$|\mathscr{I}_i| < N_B \cdot s^{i+1}$} { $target \gets i$ \; break \; } } \BlankLine \Comment{If the target is $0$, then just merge the buffer into it} \If{$target = 0$} { $\mathscr{I}_0 \gets \text{build}(\text{unbuild}(\mathscr{I}_0) \cup r)$ \; \Return } \BlankLine \Comment{If the structure is full, we need to grow it} \If {$target = -1$} { $target \gets 1 + (\log_s n)$ \; } \BlankLine \Comment{Perform the reconstruction} $\mathscr{I}_{target} \gets \text{build}(\text{unbuild}(\mathscr{I}_{target}) \cup \text{unbuild}(\mathscr{I}_{target - 1}))$ \; \BlankLine \Comment{Shift the remaining levels down to free up $\mathscr{I}_0$} \For{$i=target-1 \ldots 1$} { $\mathscr{I}_i \gets \mathscr{I}_{i-1}$ \; } \BlankLine \Comment{Flush the buffer into $\mathscr{I}_0$} $\mathscr{I}_0 \gets \text{build}(r)$ \; \Return \; \end{algorithm} Our leveling layout policy is described in Algorithm~\ref{alg:design-leveling}. Each level $i$ contains a single structure with a capacity of $N_B\cdot s^{i+1}$ records. When a reconstruction occurs, the smallest level, $i$, with space to contain the records from level $i-1$, in addition to the records currently within it, is located. Then, a new structure is built at level $i$ containing all of the records in levels $i$ and $i-1$, and the structure at level $i-1$ is deleted. Finally, all levels $j < (i - 1)$ are shifted to level $j+1$. This process clears space in level $0$ to contain the buffer flush. \begin{theorem} The amortized insertion cost of leveling with a scale factor of $s$ is \begin{equation*} I_A(n) \in \Theta\left(\frac{B(n)}{n} \cdot s \log_s n\right) \end{equation*} \end{theorem} \begin{proof} As with generalized BSM, the records in each level will be rewritten up to $s$ times before they move down to the next level. Thus, the amortized insertion cost for leveling can be found by determining how many times a record is expected to be rewritten on a single level, and how many levels there are in the structure.
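For intuition, consider a single level under leveling with $s = 3$. The level is rebuilt three times before its contents are pushed down: the first batch of records to arrive is written during all three of those rebuilds, the second batch during two, and the final batch during one. Counting in units of one batch of records, the level therefore accumulates
\begin{equation*}
3 + 2 + 1 = 6 = \frac{1}{2}s(s+1)
\end{equation*}
batches worth of writes before it is merged into the level below.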
On any given level, the total number of writes required to fill the level is given by the expression, \begin{equation*} B(s + (s - 1) + (s - 2) + \ldots + 1) \end{equation*} where $B$ is the number of records added to the level during each reconstruction (i.e., $N_B$ for level $0$ and $N_B\cdot s^{i}$ for any other level $i$). This is because the first batch of records entering the level will be rewritten each of the $s$ times that the level is rebuilt before the records are merged into the level below. The next batch will be rewritten one fewer time, and so on. Thus, the total number of writes is, \begin{equation*} B\sum_{i=0}^{s-1} (s - i) = B\left(s^2 - \sum_{i=0}^{s-1} i\right) = B\left(s^2 - \frac{(s-1)s}{2}\right) \end{equation*} which can be simplified to get, \begin{equation*} \frac{1}{2}s(s+1)\cdot B \end{equation*} writes occurring on each level.\footnote{ This write count is not cumulative over the entire structure. It only accounts for the number of writes occurring on this specific level. } To obtain the total number of times records are rewritten, we need to calculate the average number of times a record is rewritten per level, and sum this over all of the levels. Letting $B_i$ denote the batch size for level $i$, this is \begin{equation*} \sum_{i=0}^{\log_s n} \frac{\frac{1}{2}B_i s (s+1)}{s B_i} = \frac{1}{2} \sum_{i=0}^{\log_s n} (s + 1) = \frac{1}{2} (s+1) \log_s n \end{equation*} To calculate the amortized insertion cost, we multiply this write amplification factor by the cost of rebuilding the structures, and divide by the total number of records. We'll condense the constant into a single $s$, as this best expresses the nature of the relationship we're looking for, \begin{equation*} I_A(n) \in \Theta\left(\frac{B(n)}{n}\cdot s \log_s n\right) \end{equation*} \end{proof} \begin{theorem} The worst-case insertion cost for leveling with a scale factor of $s$ is \begin{equation*} \Theta\left(B\left(\frac{s-1}{s} \cdot n\right)\right) \end{equation*} \end{theorem} \begin{proof} Unlike in BSM, where the worst-case reconstruction involves all of the records within the structure, in leveling it includes only the records in the last two levels. In particular, the worst-case behavior occurs when the last level is one reconstruction away from its capacity, and the level above it is full. In this case, the reconstruction will involve the full capacity of the last level, or $N_B \cdot s^{\log_s n +1}$ records. We can relate this to $n$ by finding the ratio of the number of records contained in the last level of the structure to the number in the entire structure. This is given by, \begin{equation*} \frac{N_B \cdot s^{\log_s n + 1}}{\sum_{i=0}^{\log_s n} N_B \cdot s^{i + 1}} = \frac{(s - 1)n}{sn - 1} \end{equation*} This fraction can be simplified by noting that the $1$ subtracted in the denominator is negligible and dropping it, allowing the $n$ to be canceled and giving a ratio of $\frac{s-1}{s}$.
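For example, at $s = 2$ the last level holds roughly half of the records in the structure, while at $s = 8$ it holds roughly $\frac{7}{8}$ of them, so larger scale factors push the worst-case leveling reconstruction closer to a full rebuild of the structure.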
Thus, the worst-case reconstruction will involve $\frac{s - 1}{s} \cdot n$ records, with all of the other levels simply shifting down at no cost, resulting in a worst-case insertion cost of, \begin{equation*} I(n) \in \Theta\left(B\left(\frac{s-1}{s} \cdot n\right)\right) \end{equation*} \end{proof} \begin{theorem} The worst-case query cost for leveling for a decomposable search problem with cost $\mathscr{Q}_S(n)$ is \begin{equation*} O\left(\mathscr{Q}_S(n) \cdot \log_s n \right) \end{equation*} \end{theorem} \begin{proof} The worst-case scenario for leveling occurs right before the structure gains a new level, at which point there will be $\log_s n$ data structures, each with $O(n)$ records. Thus, the worst-case cost will be the cost of querying each of these structures, \begin{equation*} O\left(\mathscr{Q}_S(n) \cdot \log_s n \right) \end{equation*} \end{proof} \begin{theorem} The best-case query cost for leveling for a decomposable search problem with cost $\mathscr{Q}_S(n)$ is \begin{equation*} \mathscr{Q}_B(n) \in O(\mathscr{Q}_S(n) \cdot \log_s n) \end{equation*} \end{theorem} \begin{proof} Unlike BSM, leveling will never have empty levels. The policy ensures that there is always a data structure on every level. As a result, the best-case query must still query $\log_s n$ structures, and so has a cost of, \begin{equation*} \mathscr{Q}_B(n) \in O\left(\mathscr{Q}_S(n) \cdot \log_s n\right) \end{equation*} \end{proof} \subsection{Tiering} \begin{algorithm} \caption{The Tiering Policy} \label{alg:design-tiering} \KwIn{$r$: set of records to be inserted, $\mathscr{L}_0 \ldots \mathscr{L}_{\log_s n}$: the levels of $\mathscr{I}$, $n$: the number of records in $\mathscr{I}$} \BlankLine \Comment{Find the first non-full level} $target \gets -1$ \; \For{$i=0\ldots \log_s n$} { \If {$|\mathscr{L}_i| < s$} { $target \gets i$ \; break \; } } \BlankLine \Comment{If the structure is full, we need to grow it} \If {$target = -1$} { $target \gets 1 + (\log_s n)$ \; } \BlankLine \Comment{Walk the structure backwards, applying reconstructions} \For {$i \gets target \ldots 1$} { $\mathscr{L}_i \gets \mathscr{L}_i \cup \text{build}(\text{unbuild}(\mathscr{L}_{i-1, 0}) \cup \ldots \cup \text{unbuild}(\mathscr{L}_{i-1, s-1}))$ \; } \BlankLine \Comment{Add the buffered records to $\mathscr{L}_0$} $\mathscr{L}_0 \gets \mathscr{L}_0 \cup \text{build}(r)$ \; \Return \; \end{algorithm} Our tiering layout policy is described in Algorithm~\ref{alg:design-tiering}. In this policy, each level $i$ contains up to $s$ shards, each with a capacity of $N_B\cdot s^i$ records. When a reconstruction occurs, the first level with fewer than $s$ shards is selected as the target, $t$. Then, for every level $i < t$, all of the shards in level $i$ are merged into a single shard using a reconstruction and placed in level $i+1$. These reconstructions are performed in reverse order, starting with level $t-1$ and working back up towards level $0$. Finally, the shard created by the buffer flush is placed in level $0$. \begin{theorem} The amortized insertion cost of tiering with a scale factor of $s$ is, \begin{equation*} I_A(n) \in \Theta\left(\frac{B(n)}{n} \cdot \log_s n \right) \end{equation*} \end{theorem} \begin{proof} For tiering, each record is written \emph{exactly} once per level. As a result, each record will be involved in exactly $\log_s n$ reconstructions over the lifetime of the structure.
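For instance, with $s = 8$ and $n = 10^9$ (neglecting the buffer), each record participates in only $\log_8 10^9 \approx 10$ reconstructions.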
Each reconstruction has a cost of at most $B(n)$, and thus the amortized insertion cost must be, \begin{equation*} I_A(n) \in \Theta\left(\frac{B(n)}{n} \cdot \log_s n\right) \end{equation*} \end{proof} \begin{theorem} The worst-case insertion cost of tiering with a scale factor of $s$ is, \begin{equation*} I(n) \in \Theta\left(\sum_{i=0}^{\log_s n} B(s^i)\right) \end{equation*} \end{theorem} \begin{proof} The worst-case insertion in tiering involves performing a reconstruction on each level, with the reconstruction into level $i$ operating over roughly $s^i$ records. More formally, the total cost of this worst case will be, \begin{equation*} I(n) \in \Theta\left(\sum_{i=0}^{\log_s n} B(s^i)\right) \end{equation*} For any build cost that is at least linear in $n$, this sum is dominated by the reconstruction into the final level, and so the worst-case cost is $\Theta(B(n))$. \end{proof} \begin{theorem} The worst-case query cost for tiering for a decomposable search problem with cost $\mathscr{Q}_S(n)$ is \begin{equation*} \mathscr{Q}(n) \in O( \mathscr{Q}_S(n) \cdot s \log_s n) \end{equation*} \end{theorem} \begin{proof} As with the previous two policies, the worst-case query occurs when the structure is completely full. In the case of tiering, this means that there will be $\log_s n$ levels, each containing $s$ shards of size bounded by $O(n)$. Thus, there will be $s \log_s n$ structures to query, and the query cost must be, \begin{equation*} \mathscr{Q}(n) \in O \left(\mathscr{Q}_S(n) \cdot s \log_s n \right) \end{equation*} \end{proof} \begin{theorem} The best-case query cost for tiering for a decomposable search problem with cost $\mathscr{Q}_S(n)$ is $O(\mathscr{Q}_S(n) \cdot \log_s n)$. \end{theorem} \begin{proof} The tiering policy ensures that there are no internal empty levels, and as a result the best-case scenario for tiering occurs when each level is populated by exactly one shard. In this case, there will only be $\log_s n$ shards to query, resulting in a best-case query cost of, \begin{equation*} \mathscr{Q}_B(n) \in O\left(\mathscr{Q}_S(n) \cdot \log_s n \right) \end{equation*} \end{proof} \section{General Observations} The asymptotic results from the previous section are summarized in Table~\ref{tab:policy-comp}. When the scale factor is accounted for in the analysis, we can see that possible trade-offs begin to manifest within the space. We've seen some of these in action directly in the experimental sections of previous chapters. Most notably, we can see directly in these cost functions why tiering and leveling experience opposite effects as the scale factor changes. In both policies, increasing the scale factor increases the base of the logarithm governing the structure's height, and so, were the constants dropped from the analysis, it would superficially appear as though both policies should see the same effects. With the constants retained, however, we can see that this is not the case. For tiering, increasing the scale factor does reduce the number of levels, but it also increases the number of shards per level. Because the level reduction enters only through the base of the logarithm, while the shard-count increase is linear, the shard-count effect dominates and query performance degrades as the scale factor increases. Leveling, however, does not include this linear term and sees only a reduction in height. When considering insertion, we see a similar situation in reverse. For both leveling and tiering, increasing the scale factor shrinks the logarithmic term; in tiering there are no other terms at play, so insertion performance improves. Leveling, however, also has a linear dependency on the scale factor, as increasing the scale factor increases the write amplification.
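To put rough numbers on these trends, take $n = 10^9$ and neglect the buffer. Under leveling, raising the scale factor from $s=2$ to $s=32$ reduces the number of shards a query must visit from $\log_2 n \approx 30$ to $\log_{32} n \approx 6$, while under tiering the worst-case shard count grows from $s\log_s n \approx 60$ to approximately $192$. On the insertion side, the per-record rewrite count under tiering falls from roughly $30$ to roughly $6$, while under leveling the $\frac{1}{2}(s+1)$ factor in the amortized cost pushes it from roughly $45$ up to roughly $99$.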
This linear dependency is why leveling's insertion performance degrades as the scale factor increases. The generalized Bentley-Saxe method follows the same general trends as leveling for worst-case query cost and amortized insertion cost. Also of note is the fact that leveling has slightly better worst-case insertion performance. This is because leveling only ever reconstructs one level at a time, with the other levels simply shifting down in constant time. Bentley-Saxe and tiering have strictly worse worst-case insertion costs, as their worst-case reconstructions involve all of the levels. In the Bentley-Saxe method, this worst case manifests as a single, large reconstruction; in tiering, it involves $\log_s n$ reconstructions, one per level. \begin{table*} \centering \small \renewcommand{\arraystretch}{1.6} \begin{tabular}{|l l l l|} \hline & \textbf{Gen. BSM} & \textbf{Leveling} & \textbf{Tiering} \\ \hline $I(n)$ & $\Theta(B(n))$ & $\Theta\left(B\left(\frac{s-1}{s} \cdot n\right)\right)$ & $ \Theta\left(\sum_{i=0}^{\log_s n} B(s^i)\right)$ \\ \hline $I_A(n)$ & $\Theta\left(\frac{B(n)}{n} s\log_s n\right)$ & $\Theta\left(\frac{B(n)}{n} s\log_s n\right)$& $\Theta\left(\frac{B(n)}{n} \log_s n\right)$ \\ \hline $\mathscr{Q}(n)$ &$O\left(\log_s n \cdot \mathscr{Q}_S(n)\right)$ & $O\left(\log_s n \cdot \mathscr{Q}_S(n)\right)$ & $O\left(s \log_s n \cdot \mathscr{Q}_S(n)\right)$\\ \hline $\mathscr{Q}_B(n)$ & $\Theta(\mathscr{Q}_S(n))$ & $O(\log_s n \cdot \mathscr{Q}_S(n))$ & $O(\log_s n \cdot \mathscr{Q}_S(n))$ \\ \hline \end{tabular} \caption{Comparison of cost functions for various layout policies for DSPs} \label{tab:policy-comp} \end{table*} \section{Experimental Evaluation} In the previous sections, we mathematically proved various claims about the performance characteristics of our three layout policies to assess the trade-offs that exist within the design space. While this analysis is informative, the effects we are examining are at the level of constant factors, and so experimental testing is needed to validate that these claimed performance characteristics manifest in practice. In this section, we will do just that, running various benchmarks to explore the real-world performance implications of the configuration parameter space of our framework. \subsection{Asymptotic Insertion Performance} We'll begin by validating our results for the insertion performance characteristics of the three layout policies. For this test, we consider two data structures: the ISAM tree and the VPTree. The ISAM tree structure is merge-decomposable using a sorted-array merge, with a build cost of $B_M(n, k) \in \Theta(n \log k)$, where $k$ is the number of structures being merged. The VPTree, by contrast, is \emph{not} merge-decomposable, and is built in $B(n) \in \Theta(n \log n)$ time. We use the $200$ million record SOSD \texttt{OSM} dataset~\cite{sosd-datasets} for ISAM tree testing, and the one million record, $300$-dimensional Spanish Billion Words (\texttt{SBW}) dataset~\cite{sbw} for VPTree testing. For our first experiment, we will examine the latency distribution for inserts into our structures. We tested the three layout policies, using a common scale factor of $s=2$. This scale factor was selected to minimize its influence on the results (we've seen in Sections~\ref{ssec:ds-exp} and \ref{ssec:dyn-ds-exp} that the scale factor affects leveling and tiering in opposite ways) and to isolate the influence of the layout policy alone to as great a degree as possible.
We used a buffer size of $N_B=12000$ for the ISAM tree structure, and $N_B=1000$ for the VPTree. We generated this distribution by inserting $30\%$ of the records from the dataset to ``warm up'' the dynamized structure, and then measuring the insertion latency of each individual insert for the remaining $70\%$ of the data. Note that, due to timer resolution issues at nanosecond scales, the specific latency values associated with the faster end of the insertion distribution are not precise. However, it is our intention to examine the latency distribution, not the values themselves, and so this is not a significant limitation for our analysis. The resulting distributions are shown in Figure~\ref{fig:design-policy-ins-latency}. These distributions are represented using a ``reversed'' CDF with log scaling on both axes. \begin{figure} \centering \subfloat[ISAM Tree Insertion Latencies]{\includegraphics[width=.5\textwidth]{img/design-space/isam-insert-dist.pdf} \label{fig:design-isam-ins-dist}} \subfloat[VPTree Insertion Latencies]{\includegraphics[width=.5\textwidth]{img/design-space/vptree-insert-dist.pdf} \label{fig:design-vptree-ins-dist}} \\ \caption{Insertion Latency Distributions for Layout Policies} \label{fig:design-policy-ins-latency} \end{figure} The first notable point is that, for both the ISAM tree in Figure~\ref{fig:design-isam-ins-dist} and the VPTree in Figure~\ref{fig:design-vptree-ins-dist}, the leveling policy results in a measurably lower worst-case insertion latency. This result is in line with our theoretical analysis in Section~\ref{sec:design-asymp}. However, there is a major deviation from the theory in the worst-case performance of tiering and BSM. Both of these should have similar worst-case latencies, as the worst-case reconstruction in both cases involves every record in the structure. Yet we see tiering consistently performing better, particularly for the ISAM tree. The reason for this has to do with the way that the records are partitioned in these worst-case reconstructions. In tiering with a scale factor of $s=2$, the worst-case insertion consists of $\Theta(\log_2 n)$ distinct reconstructions, each involving exactly $2$ structures. BSM, on the other hand, uses exactly $1$ reconstruction involving $\Theta(\log_2 n)$ structures. This explains why the ISAM tree performs much better under tiering than BSM, as its actual reconstruction cost function is $\Theta(n \log_2 k)$, where $k$ is the number of structures being merged. For tiering, this results in $\Theta(n)$ cost in the worst case. BSM, by contrast, has $\Theta(n \log_2 \log_2 n)$, as many more distinct structures must be merged in the reconstruction, and it is thus asymptotically worse off. The VPTree sees less of a difference because it is \emph{not} merge-decomposable, and so the number of structures involved in a reconstruction matters less. Having the records more heavily partitioned still hurts performance, most likely due to cache effects, but less so than in the MDSP case. \begin{figure} \centering \subfloat[ISAM Tree]{\includegraphics[width=.5\textwidth]{img/design-space/isam-tput.pdf} \label{fig:design-isam-tput}} \subfloat[VPTree]{\includegraphics[width=.5\textwidth]{img/design-space/vptree-tput.pdf} \label{fig:design-vptree-tput}} \\ \caption{Insertion Throughput for Layout Policies} \label{fig:design-ins-tput} \end{figure} Next, in Figure~\ref{fig:design-ins-tput}, we show the overall insertion throughput of the three policies for both the ISAM tree and the VPTree.
This result should correlate with the amortized insertion costs for each policy derived in Section~\ref{sec:design-asymp}. At a scale factor of $s=2$, all three policies have similar insertion performance. This makes sense, as both leveling and Bentley-Saxe experience write amplification proportional to the scale factor, and at $s=2$ this isn't significantly larger than tiering's write amplification, particularly compared to the other factors influencing insertion performance, such as reconstruction time. However, for larger scale factors, tiering shows \emph{significantly} higher insertion throughput, and leveling and Bentley-Saxe show greatly degraded performance due to the large amount of additional write amplification. These results are fully in line with the mathematical analysis of the previous section. \subsection{General Insert vs. Query Trends} For our next experiment, we will consider the trade-offs between insertion and query performance that exist within this design space. We benchmarked each layout policy for a range of scale factors, measuring both their respective insertion throughputs and query latencies for both the ISAM tree and the VPTree. \begin{figure} \centering \subfloat[ISAM Tree Range Count]{\includegraphics[width=.5\textwidth]{img/design-space/isam-parm-sweep.pdf} \label{fig:design-isam-tradeoff}} \subfloat[VPTree $k$-NN]{\includegraphics[width=.5\textwidth]{img/design-space/knn-parm-sweep.pdf} \label{fig:design-knn-tradeoff}} \\ \caption{Insertion Throughput vs. Query Latency for varying scale factors} \label{fig:design-tradeoff} \end{figure} Figure~\ref{fig:design-isam-tradeoff} shows the trade-off curve between insertion throughput and query latency for range count queries executed against a dynamized ISAM tree. This test was run with a dataset of 500 million uniform integer keys and a selectivity of $\sigma = 0.0000001$; the scale factor associated with each point is annotated on the plot. These results show a very direct relationship between the scale factor, the layout policy, and insertion throughput. Leveling almost universally has lower insertion throughput, but also lower query latency, than tiering, though at a scale factor of $s=2$ the two are fairly similar. Tiering gains insertion throughput at the cost of query performance as the scale factor increases, although the rate at which insertion performance improves diminishes at larger scale factors, while the rate at which query performance declines grows dramatically. One interesting note is that leveling sees very little improvement in query latency as the scale factor is increased. This is due to the fact that, asymptotically, the scale factor only affects leveling's query performance by increasing the base of a logarithm. Thus, small increases in the scale factor have very little effect. However, leveling's insertion performance degrades linearly with the scale factor, and this is well demonstrated in the plot. The story is a bit clearer in Figure~\ref{fig:design-knn-tradeoff}. The VPTree has a much greater construction time, both asymptotically and in absolute terms, and its average query latency is also significantly greater. As a result, configuration changes produce much larger differences in performance, presenting a far clearer trade-off space. The same general trends hold as for the ISAM tree, just amplified. Leveling has better query performance than tiering, and sees improved query performance and reduced insert performance as the scale factor increases.
Tiering has better insertion performance and worse query performance than leveling, and sees improved insert and worsening query performance as the scale factor is increased. The Bentley-Saxe method follows a very similar trend to that of leveling, albeit with even more dramatic performance degradation as the scale factor is increased and slightly better query performance across the board. Generally, it appears to be a strictly worse alternative to leveling in all but its best-case query cost, and we will omit it from our tests moving forward. \subsection{Buffer Size} \begin{figure} \centering \subfloat[ISAM Tree Range Count]{\includegraphics[width=.5\textwidth]{img/design-space/isam-bs-sweep.pdf} \label{fig:buffer-isam-tradeoff}} \subfloat[VPTree $k$-NN]{\includegraphics[width=.5\textwidth]{img/design-space/knn-bs-sweep.pdf} \label{fig:buffer-knn-tradeoff}} \\ \caption{Insertion Throughput vs. Query Latency for varying buffer sizes} \label{fig:buffer-size} \end{figure} In the previous section, we considered the effect of various scale factors on the trade-off between insertion and query performance. Our framework also supports varying buffer sizes, and so we will examine this next. Figure~\ref{fig:buffer-size} shows the same insertion throughput vs. query latency curves, for fixed layout policy and scale factor configurations, at varying buffer sizes, under the same experimental conditions as the previous test. Unlike with the scale factor, there is a significant difference in the behavior of the two tested structures under buffer size variation. For the ISAM tree, shown in Figure~\ref{fig:buffer-isam-tradeoff}, we see that all layout policies follow a similar pattern. Increasing the buffer size increases insertion throughput for little to no additional query cost up to a certain point, after which query performance degrades substantially. This isn't terribly surprising: growing the buffer size will increase the number of records on each level, and therefore decrease the number of shards, while at the same time reducing the number of reconstructions that must be performed. However, the query must be answered against the buffer too, and once the buffer gets sufficiently large, this increased cost will exceed any query latency benefit from the decreased shard count. We see this pattern fairly clearly in all tested configurations, though BSM sees the least insertion benefit from an increased buffer size. The VPTree, shown in Figure~\ref{fig:buffer-knn-tradeoff}, is another story. This plot is far more chaotic; in fact, there are no particularly strong patterns to draw from it. This is likely because the time scales associated with the VPTree, in terms of both reconstruction and query latency, are significantly larger, and so the relatively small constant associated with adjusting the buffer size does not have as strong an influence on performance as it does for the ISAM tree. \subsection{Query Size Effects} One potentially interesting aspect of decomposition-based dynamization techniques is that, asymptotically, the additional cost added by decomposing the data structure vanishes for sufficiently expensive queries. Bentley and Saxe proved that for query costs of the form $\mathscr{Q}_S(n) \in \Omega(n^\epsilon)$ for $\epsilon > 0$, the overall query cost is unaffected (asymptotically) by the decomposition.
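To see why, consider a binary decomposition ($s=2$) and two hypothetical single-shard cost functions. If $\mathscr{Q}_S(n) \in \Theta(\sqrt{n})$, then querying every block costs
\begin{equation*}
\sum_{i=0}^{\log_2 n} \sqrt{2^i} \in \Theta\left(\sqrt{n}\right)
\end{equation*}
and the decomposition adds only a constant-factor overhead. If instead $\mathscr{Q}_S(n) \in \Theta(\log n)$, the corresponding sum is $\sum_{i=0}^{\log_2 n} \log\left(2^i\right) \in \Theta(\log^2 n)$, and the decomposition costs an extra logarithmic factor.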
This would seem to suggest that, as the cost of the query over a single shard increases, the effectiveness of our design space for tuning query performance should diminish. This is because our tuning space consists of adjusting the number of shards within the structure, and so, as the effect of the decomposition on the query cost shrinks, all configurations should approach a similar query performance. In order to evaluate this effect, we tested the query latency of range queries of varying selectivity against various configurations of our framework to see at what point the query latencies begin to converge. We also tested $k$-NN queries with varying values of $k$. For these tests, we used a synthetic dataset of 500 million 64-bit key-value pairs for the ISAM tree testing, and the SBW dataset for $k$-NN. Query latencies were measured by executing the queries after all records were inserted into the structure. \begin{figure} \centering \subfloat[ISAM Tree Range Count]{\includegraphics[width=.5\textwidth]{img/design-space/selectivity-sweep.pdf} \label{fig:design-isam-sel}} \subfloat[VPTree $k$-NN]{\includegraphics[width=.5\textwidth]{img/design-space/selectivity-sweep-knn.pdf} \label{fig:design-knn-sel}} \\ \caption{Query Result Size Effect Analysis} \label{fig:design-query-sze} \end{figure} Interestingly, for the range of selectivities tested for range counts, the overall query latency failed to converge, and there remains a consistent, albeit slight, stratification amongst the tested policies, as shown in Figure~\ref{fig:design-isam-sel}. As the selectivity rises above those shown in the chart, the relative ordering of the policies remains the same, but the relative differences between them begin to shrink. This result makes sense given the asymptotics: there is still \emph{some} overhead associated with the decomposition, but as the cost of the query approaches linear, this overhead makes up an increasingly irrelevant portion of the run time. The $k$-NN results in Figure~\ref{fig:design-knn-sel} show a slightly different story. This is also not surprising, because $k$-NN is a $C(n)$-decomposable problem, and the cost of result combination grows with $k$. Thus, larger $k$ values will \emph{increase} the effect that the decomposition has on the query run time, unlike the range count queries, where the total cost of the result combination is constant. % \section{Asymptotically Relevant Trade-offs} % Thus far, we have considered a configuration system that trades in % constant factors only. In general asymptotic analysis, all possible % configurations of our framework in this scheme collapse to the same basic % cost functions when the constants are removed. While we have demonstrated % that, in practice, the effects of this configuration are measurable, there % do exist techniques in the classical literature that provide asymptotically % relevant trade-offs, such as the equal block method~\cite{maurer80} and % the mixed method~\cite[pp. 117-118]{overmars83}. These techniques have % cost functions that are derived from arbitrary, positive, monotonically % increasing functions of $n$ that govern various ways in which the data % structure is partitioned, and changing the selection of function allows % for "tuning" the performance. However, to the best of our knowledge, % these techniques have never been implemented, and no useful guidance in % the literature exists for selecting these functions. % However, it is useful to consider the general approach of these % techniques.
They accomplish asymptotically relevant trade-offs by tying % the decomposition of the data structure directly to a function of $n$, % the number of records, in a user-configurable way. We can import a similar % concept into our already existing configuration framework for dynamization % to enable similar trade-offs, by replacing the constant scale factor, % $s$, with some function $s(n)$. However, we must take extreme care when % doing this to select a function that doesn't catastrophically impair % query performance. % Recall that, generally speaking, our dynamization technique requires % multiplying the cost function for the data structure being dynamized by % the number of shards that the data structure has been decomposed into. For % search problems that are solvable in sub-polynomial time, this results in % a worst-case query cost of, % \begin{equation} % \mathscr{Q}(n) \in O(S(n) \cdot \mathscr{Q}_S(n)) % \end{equation} % where $S(n)$ is the number of shards and, for our framework, is $S(n) \in % O(s \log_s n)$. The user can adjust $s$, but this tuning does not have % asymptotically relevant consequences. Unfortunately, there is not much % room, practically, for adjustment. If, for example, we were to allow the % user to specify $S(n) \in \Theta(n)$, rather than $\Theta(\log n)$, then % query performance would be greatly impaired. We need a function that is % sub-linear to ensure useful performance. % To accomplish this, we proposed adding a second scaling factor, $k$, such % that the number of records on level $i$ is given by, % \begin{equation} % \label{eqn:design-k-expr} % N_B \cdot \left(s \log_2^k(n)\right)^{i} % \end{equation} % with $k=0$ being equivalent to the configuration space we have discussed % thus far. The addition of $k$ allows for the dependency of the number of % shards on $n$ to be slightly biased upwards or downwards, in a way that % \emph{does} show up in the asymptotic analysis for inserts and queries, % but also ensures sub-polynomial additional query cost. % In particular, we prove the following asymptotic properties of this % configuration. % \begin{theorem} % The worst-case query latency of a dynamization scheme where the % capacity of each level is provided by Equation~\ref{eqn:design-k-expr} is % \begin{equation} % \mathscr{Q}(n) \in O\left(\left(\frac{\log n}{\log (k \log n))}\right) \cdot \mathscr{Q}_S(n)\right) % \end{equation} % \end{theorem} % \begin{proof} % The number of levels within the structure is given by $\log_s (n)$, % where $s$ is the scale factor. The addition of $k$ to the parametrization % replaces this scale factor with $s \log^k n$, and so we have % \begin{equation*} % \log_{s \log^k n}n = \frac{\log n}{\log\left(s \log^k n\right)} = \frac{\log n}{\log s + \log\left(k \log n\right)} \in O\left(\frac{\log n}{\log (k \log n)}\right) % \end{equation*} % by the application of various logarithm rules and change-of-base formula. % The cost of a query against a decomposed structure is $O(S(n) \cdot \mathscr{Q}_S(n))$, and % there are $\Theta(1)$ shards per level. 
Thus, the worst case query cost is % \begin{equation*} % \mathscr{Q}(n) \in O\left(\left(\frac{\log n}{\log (k \log n))}\right) \cdot \mathscr{Q}_S(n)\right) % \end{equation*} % \end{proof} % \begin{theorem} % The amortized insertion cost of a dynamization scheme where the capacity of % each level is provided by Equation~\ref{eqn:design-k-expr} is, % \begin{equation*} % I_A(n) \in \Theta\left(\frac{B(n)}{n} \cdot \frac{\log n}{\log ( k \log n)}\right) % \end{equation*} % \end{theorem} % \begin{proof} % \end{proof} % \subsection{Evaluation} % In this section, we'll access the effect that modifying $k$ in our % new parameter space has on the insertion and query performance of our % dynamization framework. \section{Conclusion} In this chapter, we considered the proposed design space for our dynamization framework both mathematically and experimentally, and derived some general principles for configuration within the space. We generalized the Bentley-Saxe method to support scale factors and buffering, but found that the result was generally worse than leveling in all but its best-case query performance. We also showed that there does exist a trade-off, mediated by the scale factor, between insertion performance and query performance, though it does not manifest for every combination of layout policy and data structure. For example, when testing the ISAM tree with the leveling or BSM policies, scale factor adjustments do not yield a particularly useful trade-off, because the query performance gained by increasing the scale factor is dwarfed by the loss of insertion performance. This is because the insertion cost grows far faster than any query performance benefit, due to the way the two effects scale in the cost functions for these policies. Broadly speaking, we can draw a few general conclusions. First, the leveling and BSM policies are fairly similar, with BSM having slightly better query performance in general, owing to its better best-case query cost. Both of these policies are better than tiering in terms of query performance, but generally worse in terms of insertion performance. The one exception to this trend is worst-case insertion performance, where leveling has a slight advantage over the other policies, because the way it performs reconstructions ensures that the worst-case reconstruction is smaller. Adjusting the scale factor trades between insert and query performance, though leveling and BSM respond in the opposite direction from tiering: for these two policies, increasing the scale factor reduces insert performance and improves query performance, while tiering does the opposite. The mutable buffer can also be increased in size to improve insert performance (in all cases), but query cost increases as a result, and once the buffer gets sufficiently large, the query performance penalty becomes severe. While this trade-off space does provide the desired configurability, the experimental results show that the trade-off curves are not particularly smooth, and their effectiveness can vary quite a bit depending on the properties of the data structure and search problem being dynamized. Additionally, there is not a particularly good way to control insertion tail latencies in this model, as leveling is only slightly better on this metric. In the next chapter, we'll consider methods for controlling tail latency, which will, as a side benefit, also provide a more desirable configuration space than the one considered here.