authorDouglas Rumbaugh <dbr4@psu.edu>2025-06-20 17:24:18 -0400
committerDouglas Rumbaugh <dbr4@psu.edu>2025-06-20 17:24:18 -0400
commit7700f2818cca731cadac034322a28f19e9ac3a17 (patch)
tree86e29639d5067bc047ee2f36471eda0ce8c7a291 /chapters/design-space.tex
parent903055812fa35e0533b940ddb2d8db8c2a20af2b (diff)
downloaddissertation-7700f2818cca731cadac034322a28f19e9ac3a17.tar.gz
updates
Diffstat (limited to 'chapters/design-space.tex')
-rw-r--r--  chapters/design-space.tex  237
1 files changed, 129 insertions, 108 deletions
diff --git a/chapters/design-space.tex b/chapters/design-space.tex
index 321c638..c8876de 100644
--- a/chapters/design-space.tex
+++ b/chapters/design-space.tex
@@ -5,15 +5,14 @@
In the previous two chapters, we introduced an LSM tree inspired design
space into the Bentley-Saxe method to allow for more flexibility in
-tuning the performance. However, aside from some general comments
-about how these parameters operator in relation to insertion and
-query performance, and some limited experimental evaluation, we haven't
-performed a systematic analysis of this space, its capabilities, and its
-limitations. We will rectify this situation in this chapter, performing
-both a detailed mathematical analysis of the design parameter space,
-as well as experiments to demonstrate these trade-offs exist in practice.
-
-\subsection{Why bother?}
+performance tuning. However, aside from some general comments about how
+these parameters affect insertion and query performance, and some limited
+experimental evaluation, we have not performed a systematic analysis of
+this space, its capabilities, and its limitations. We will rectify this
+situation in this chapter, performing both a detailed mathematical
+analysis of the design parameter space and an experimental evaluation,
+in order to explore the space and its trade-offs and to demonstrate
+their practical effectiveness.
+
Before diving into the details of the design space we have introduced, it's
worth taking some time to motivate this entire endeavor. There is a large
@@ -21,78 +20,73 @@ body of theoretical work in the area of data structure dynamization,
and, to the best of our knowledge, none of these papers have introduced
a design space of the sort that we have introduced here. Despite this,
some papers which \emph{use} these techniques have introduced similar
-design elements into their own implementations~\cite{pgm}, with some
-even going so far as to (inaccurately) describe these elements as part
-of the Bentley-Saxe method~\cite{almodaresi23}.
-
-This situation is best understood, we think, in terms of the ultimate
-goals of the respective lines of work. In the classical literature on
-dynamization, the focus is mostly on proving theoretical asymptotic
-bounds about the techniques. In this context, the LSM tree design space
-is of limited utility, because its tuning parameters adjust constant
-factors only, and thus don't play a major role in asymptotics. Where
+design elements into their own implementations~\cite{pgm}, with some even
+going so far as to describe these elements as part of the Bentley-Saxe
+method~\cite{almodaresi23}.
+
+This situation is best understood in terms of the ultimate goals
+of the respective lines of work. In the classical literature
+on dynamization, the focus is directed at proving theoretical
+asymptotic bounds. In this context, the LSM tree design space is
+of limited utility, because its tuning parameters adjust constant
+factors, and thus don't play a major role in asymptotics. Where
the theoretical literature does introduce configurability, such as
with the equal blocks method~\cite{overmars-art-of-dyn} or more
complex schemes that nest the equal block method \emph{inside}
of a binary decomposition~\cite{overmars81}, the intention is
to produce asymptotically relevant trade-offs between insert,
query, and delete performance for deletion decomposable search
-problems~\cite[pg. 117]{overmars83}. This is why the equal block method
-is described in terms of a function, rather than a constant value,
+problems~\cite[pg. 117]{overmars83}. This explains why the equal block
+method is described in terms of a function, rather than a constant value,
to enable it to appear in the asymptotics.
On the other hand, in practical scenarios, constant-factor performance tuning
can be very relevant. We've already shown in Sections~\ref{ssec:ds-exp}
-and \ref{ssec:dyn-ds-exp} how tuning parameters, particularly the
-number of shards per level, can have measurable real-world effects on the
-performance characteristics of dynamized structures, and in fact sometimes
+and \ref{ssec:dyn-ds-exp} how adjusting tuning parameters, particularly
+the number of shards per level, can have measurable real-world effects on
+the performance characteristics of dynamized structures. In fact, sometimes
this tuning is \emph{necessary} to enable reasonable performance. It's
quite telling that the two most direct implementations of the Bentley-Saxe
method that we have identified in the literature are both in the context
of metric indices~\cite{naidan14,bkdtree}, a class of data structure
and search problem for which we saw very good performance from standard
-Bentley-Saxe in Section~\ref{ssec:dyn-knn-exp}. The other experiments
-in Chapter~\ref{chap:framework} show that, for other types of problem,
-the technique does not fare quite so well.
+Bentley-Saxe in Section~\ref{ssec:dyn-knn-exp}. Our experiments in
+Chapter~\ref{chap:framework} show that, for other types of problem,
+the technique does not fare quite so well in its unmodified form.
\section{Asymptotic Analysis}
\label{sec:design-asymp}
-Before beginning with derivations for
-the cost functions of dynamized structures within the context of our
-proposed design space, we should make a few comments about the assumptions
-and techniques that we will use in our analysis. As this design space
-involves adjusting constants, we will leave the design-space related
-constants within our asymptotic expressions. Additionally, we will
-perform the analysis for a simple decomposable search problem. Deletes
-will be entirely neglected, and we won't make any assumptions about
-mergeability. We will also neglect the buffer size, $N_B$, during this
-analysis. Buffering isn't fundamental to the techniques we are examining
-in this chapter, and including it would increase the complexity of the
-analysis without contributing any useful insights.\footnote{
- The contribution of the buffer size is simply to replace each of the
- individual records considered in the analysis with batches of $N_B$
- records. The same patterns hold.
-}
+Before beginning with derivations for the cost functions of dynamized
+structures within the context of our proposed design space, we should
+make a few comments about the assumptions and techniques that we will use
+in our analysis. We will generally neglect buffering in our analysis,
+both in terms of the additional cost of querying the buffer, and
+in terms of the buffer's effect on the reconstruction process. Buffering
+isn't fundamental to the techniques we are considering, and including it
+would needlessly complicate the analysis. However, we will include the
+scale factor, $s$, which directly governs the number of blocks within
+the dynamized structures. Additionally, we will perform the query cost
+analysis assuming a decomposable search problem. Deletes will be entirely
+neglected, and we won't make any assumptions about mergeability.
\subsection{Generalized Bentley-Saxe Method}
As a first step, we will derive a modified version of the Bentley-Saxe
-method that has been adjusted to support arbitrary scale factors, and
-buffering. There's nothing fundamental to the technique that prevents
-such modifications, and its likely that they have not been analyzed
-like this before simply out of a lack of interest in constant factors in
-theoretical asymptotic analysis. During our analysis, we'll intentionally
-leave these constant factors in place.
-
-When generalizing the Bentley-Saxe method for arbitrary scale factors, we
-decided to maintain the core concept of binary decomposition. One interesting
-mathematical property of a Bentley-Saxe dynamization is that the internal
-layout of levels exactly matches the binary representation of the record
-count contained within the index. For example, a dynamization containing
-$n=20$ records will have 4 records in the third level, and 16 in the fifth,
-with all other levels being empty. If we represent a full level with a 1
-and an empty level with a 0, then we'd have $10100$, which is $20$ in
-base 2.
+method that has been adjusted to support arbitrary scale factors. There's
+nothing fundamental to the technique that prevents such modifications,
+and it's likely that they have not been analyzed in this way before simply
+out of a lack of interest in constant factors in theoretical asymptotic
+analysis.
+
+When generalizing the Bentley-Saxe method for arbitrary scale factors,
+we decided to maintain the core concept of binary decomposition. One
+interesting mathematical property of a Bentley-Saxe dynamization is that
+the internal layout of levels exactly matches the binary representation of
+the record count contained within the index. For example, a dynamization
+containing $n=20$ records will have 4 records in the third level, and
+16 in the fifth, with all other levels being empty. If we represent a
+full level with a 1 and an empty level with a 0, then we'd have $10100$,
+which is $20$ in base 2.
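
To make this correspondence concrete, the following short Python sketch
(purely illustrative, and not part of our implementation) computes the
per-level record counts of a classic ($s=2$) Bentley-Saxe decomposition
directly from the bits of $n$:
\begin{verbatim}
def bsm_layout(n):
    # Level i is either empty or holds exactly 2^i records,
    # matching the i-th binary digit of n.
    levels = []
    i = 0
    while (1 << i) <= n:
        levels.append(1 << i if (n >> i) & 1 else 0)
        i += 1
    return levels

print(bsm_layout(20))  # [0, 0, 4, 0, 16], i.e., 10100 in base 2
\end{verbatim}
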
\begin{algorithm}
\caption{The Generalized BSM Layout Policy}
@@ -137,7 +131,6 @@ Unfortunately, the approach used by Bentley and Saxe to calculate the
amortized insertion cost of the BSM does not generalize to larger bases,
and so we will need to derive this result by other means.
-
\begin{theorem}
The amortized insertion cost for generalized BSM with a growth factor of
$s$ is $\Theta\left(\frac{B(n)}{n} \cdot s\log_s n\right)$.
@@ -318,7 +311,7 @@ $j+1$. This process clears space in level $0$ to contain the buffer flush.
\begin{theorem}
The amortized insertion cost of leveling with a scale factor of $s$ is
\begin{equation*}
-I_A(n) \in \Theta\left(\frac{B(n)}{n} \cdot \frac{1}{2}(s+1)\log_s n\right)
+I_A(n) \in \Theta\left(\frac{B(n)}{n} \cdot s \log_s n\right)
\end{equation*}
\end{theorem}
\begin{proof}
@@ -592,10 +585,10 @@ reconstructions, one per level.
\begin{tabular}{|l l l l|}
\hline
& \textbf{Gen. BSM} & \textbf{Leveling} & \textbf{Tiering} \\ \hline
+$I(n)$ & $\Theta(B(n))$ & $\Theta\left(B\left(\frac{s-1}{s} \cdot n\right)\right)$ & $ \Theta\left(\sum_{i=0}^{\log_s n} B(s^i)\right)$ \\ \hline
$I_A(n)$ & $\Theta\left(\frac{B(n)}{n} s\log_s n\right)$ & $\Theta\left(\frac{B(n)}{n} s\log_s n\right)$ & $\Theta\left(\frac{B(n)}{n} \log_s n\right)$ \\ \hline
$\mathscr{Q}(n)$ &$O\left(\log_s n \cdot \mathscr{Q}_S(n)\right)$ & $O\left(\log_s n \cdot \mathscr{Q}_S(n)\right)$ & $O\left(s \log_s n \cdot \mathscr{Q}_S(n)\right)$\\ \hline
-%$\mathscr{Q}_B(n)$ & $\Theta(\mathscr{Q}_S(n))$ & $O(\log_s n \cdot \mathscr{Q}_S(n))$ & $O(\log_s n \cdot \mathscr{Q}_S(n))$ \\ \hline
-$I(n)$ & $\Theta(B(n))$ & $\Theta\left(B\left(\frac{s-1}{s} \cdot n\right)\right)$ & $ \Theta\left(\sum_{i=0}^{\log_s n} B(s^i)\right)$ \\ \hline
+$\mathscr{Q}_B(n)$ & $\Theta(\mathscr{Q}_S(n))$ & $O(\log_s n \cdot \mathscr{Q}_S(n))$ & $O(\log_s n \cdot \mathscr{Q}_S(n))$ \\ \hline
\end{tabular}
\caption{Comparison of cost functions for various layout policies for DSPs}
@@ -620,12 +613,12 @@ space of our framework.
We'll begin by validating our results for the insertion performance
characteristics of the three layout policies. For this test, we
consider two data structures: the ISAM tree and the VP tree. The ISAM
-tree structure is merge-decomposable using a sorted-array merge, with
-a build cost of $B_M(n) \in \Theta(n \log k)$, where $k$ is the number
-of structures being merged. The VPTree, by contrast, is \emph{not}
-merge decomposable, and is built in $B(n) \in \Theta(n \log n)$ time. We
-use the $200,000,000$ record SOSD \texttt{OSM} dataset~\cite{sosd-datasets} for
-ISAM testing, and the $1,000,000$ record, $300$-dimensional Spanish
+tree structure is merge-decomposable using a sorted-array merge, with a
+build cost of $B_M(n, k) \in \Theta(n \log k)$, where $k$ is the number of
+structures being merged. The VPTree, by contrast, is \emph{not}
+merge-decomposable, and is built in $B(n) \in \Theta(n \log n)$ time. We use
+the $200$ million record SOSD \texttt{OSM} dataset~\cite{sosd-datasets}
+for ISAM testing, and the one million record, $300$-dimensional Spanish
Billion Words (\texttt{SBW}) dataset~\cite{sbw} for VPTree testing.
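
The $\Theta(n \log k)$ cost of the merge-based build comes from the
standard heap-based $k$-way merge of sorted runs, a minimal sketch of
which is shown below in Python (for illustration only; this is not the
code used in our framework):
\begin{verbatim}
import heapq

def kway_merge(runs):
    # Seed a min-heap with the first element of each non-empty run.
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        key, r, pos = heapq.heappop(heap)  # O(log k) per record
        out.append(key)
        if pos + 1 < len(runs[r]):
            heapq.heappush(heap, (runs[r][pos + 1], r, pos + 1))
    return out

print(kway_merge([[1, 4, 9], [2, 3, 8], [5, 6, 7]]))
\end{verbatim}
Each of the $n$ output records passes through a heap of size at most
$k$, which is where the $\log k$ factor originates.
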
For our first experiment, we will examine the latency distribution
@@ -639,13 +632,20 @@ buffer size of $N_B=12000$ for the ISAM tree structure, and $N_B=1000$
for the VPTree.
We generated this distribution by inserting $30\%$ of the records from
-the set to ``warm up'' the dynamized structure, and then measuring the
-insertion latency for each individual insert for the remaining $70\%$
-of the data. Note that, due to timer resolution issues at nanosecond
-scales, the specific latency values associated with the faster end of
-the insertion distribution are not precise. However, it is our intention
-to examine the latency distribution, not the values themselves, and so
-this is not a significant limitation for our analysis.
+the set to ``warm up'' the dynamized structure, and then measuring
+the insertion latency for each individual insert for the remaining
+$70\%$ of the data. Note that, due to timer resolution issues at
+nanosecond scales, the specific latency values associated with the
+faster end of the insertion distribution are not precise. However,
+it is our intention to examine the latency distribution, not the
+values themselves, and so this is not a significant limitation
+for our analysis. The resulting distributions are shown in
+Figure~\ref{fig:design-policy-ins-latency}. These distributions
+are represented using a ``reversed'' CDF with log scaling on both
+axes. This representation has proven very useful for interpreting the
+latency distributions that we see in evaluating dynamization, but is
+slightly unusual, and so we've included a guide to interpreting these
+charts in Appendix~\ref{append:rcdf}.
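
Concretely, the ``reversed'' CDF we plot is the empirical survival
function of the latency sample: for each latency value $x$, the fraction
of operations that took at least $x$ to complete. A minimal Python
sketch of the computation (illustrative only; our actual plotting
scripts are not reproduced here) is:
\begin{verbatim}
import numpy as np

def reversed_cdf(latencies):
    # For each observed latency x, report the fraction of
    # operations whose latency was at least x.
    x = np.sort(np.asarray(latencies, dtype=float))
    y = 1.0 - np.arange(len(x)) / len(x)
    return x, y  # plotted on log-log axes
\end{verbatim}
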
\begin{figure}
\centering
@@ -655,32 +655,25 @@ this is not a significant limitation for our analysis.
\label{fig:design-policy-ins-latency}
\end{figure}
-The resulting distributions are shown in
-Figure~\ref{fig:design-policy-ins-latency}. These distributions are
-representing using a "reversed" CDF with log scaling on both axes. This
-representation has proven very useful for interpreting the latency
-distributions that we see in evaluating dynamization, but are slightly
-unusual, and so we've included a guide to interpreting these charts
-in Appendix~\ref{append:rcdf}.
The first notable point is that, for both the ISAM
tree in Figure~\ref{fig:design-isam-ins-dist} and VPTree in
-Figure~\ref{fig:design-vptree-ins-dist}, the Leveling policy results in a
+Figure~\ref{fig:design-vptree-ins-dist}, the leveling policy results in a
measurably lower worst-case insertion latency. This result is in line with
our theoretical analysis in Section~\ref{sec:design-asymp}. However, there
is a major deviation from the theoretical predictions in the worst-case performance of
-Tiering and BSM. Both of these should have similar worst-case latencies,
+tiering and BSM. Both of these should have similar worst-case latencies,
as the worst-case reconstruction in both cases involves every record
in the structure. Yet, we see tiering consistently performing better,
particularly for the ISAM tree.
The reason for this has to do with the way that the records are
-partitioned in these worst-case reconstructions. In Tiering, with a scale
+partitioned in these worst-case reconstructions. In tiering, with a scale
factor of $s$, the worst-case reconstruction consists of $\Theta(\log_2
n)$ distinct reconstructions, each involving exactly $2$ structures. BSM,
on the other hand, will use exactly $1$ reconstruction involving
$\Theta(\log_2 n)$ structures. This explains why ISAM performs much better
-in Tiering than BSM, as the actual reconstruction cost function there is
+in tiering than BSM, as the actual reconstruction cost function there is
$\Theta(n \log_2 k)$. For tiering, this results in $\Theta(n)$ cost in
the worst case. BSM, on the other hand, has $\Theta(n \log_2 \log_2 n)$,
as many more distinct structures must be merged in the reconstruction,
@@ -699,7 +692,7 @@ due to cache effects most likely, but less so than in the MDSP case.
\end{figure}
Next, in Figure~\ref{fig:design-ins-tput}, we show the overall insertion
-throughput for the three policies for both ISAM Tree and VPTree. This
+throughput for the three policies for both ISAM tree and VPTree. This
result should correlate with the amortized insertion costs for each
policy derived in Section~\ref{sec:design-asymp}. At a scale factor of
$s=2$, all three policies have similar insertion performance. This makes
@@ -708,7 +701,7 @@ proportional to the scale factor, and at $s=2$ this isn't significantly
larger than tiering's write amplification, particularly compared
to the other factors influencing insertion performance, such as
reconstruction time. However, for larger scale factors, tiering shows
-\emph{significantly} higher insertion throughput, and Leveling and
+\emph{significantly} higher insertion throughput, and leveling and
Bentley-Saxe show greatly degraded performance due to the large amount
of additional write amplification. These results are perfectly in line
with the mathematical analysis of the previous section.
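
This gap also follows directly from the amortized cost expressions
derived in Section~\ref{sec:design-asymp}. As a rough, constants-ignored
comparison (a back-of-the-envelope illustration rather than a formal
bound), the ratio of leveling's amortized insertion cost to tiering's is
\begin{equation*}
\frac{\frac{B(n)}{n} \cdot s \log_s n}{\frac{B(n)}{n} \cdot \log_s n} = s,
\end{equation*}
so at $s=2$ the two policies differ by only a small constant factor,
while at larger scale factors the additional write amplification of
leveling (and of generalized BSM, which has the same amortized cost)
grows in direct proportion to $s$.
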
@@ -718,7 +711,7 @@ with the mathematical analysis of the previous section.
For our next experiment, we will consider the trade-offs between insertion
and query performance that exist within this design space. We benchmarked
each layout policy for a range of scale factors, measuring both their
-respective insertion throughputs and query latencies for both ISAM Tree
+respective insertion throughputs and query latencies for both ISAM tree
and VPTree.
\begin{figure}
@@ -786,7 +779,8 @@ factors on the trade-off between insertion and query performance. Our
framework also supports varying buffer sizes, and so we will examine this
next. Figure~\ref{fig:buffer-size} shows the same insertion throughput
vs. query latency curves for fixed layout policy and scale factor
-configurations at varying buffer sizes.
+configurations at varying buffer sizes, under the same experimental
+conditions as the previous test.
Unlike with the scale factor, there is a significant difference in the
behavior of the two tested structures under buffer size variation. For
@@ -830,7 +824,11 @@ configurations approaching a similar query performance.
In order to evaluate this effect, we tested the query latency of range
queries of varying selectivity against various configurations of our
framework to see at what points the query latencies begin to converge. We
-also tested $k$-NN queries with varying values of $k$.
+also tested $k$-NN queries with varying values of $k$. For these tests,
+we used a synthetic dataset of 500 million 64-bit key-value pairs for
+the ISAM testing, and the SBW dataset for $k$-NN. Query latencies were
+measured by executing the queries after all records were inserted into
+the structure.
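
For clarity, a range query with selectivity $\sigma$ is one whose
predicate matches a fraction $\sigma$ of the records. A query generator
along the following lines (an illustrative sketch; the exact generator
used in our benchmarks is not shown here) produces such ranges over a
sorted key set:
\begin{verbatim}
import random

def range_query(sorted_keys, selectivity):
    # Choose a random contiguous run of keys covering roughly
    # the requested fraction of the records.
    n = len(sorted_keys)
    span = max(1, int(n * selectivity))
    start = random.randint(0, n - span)
    return sorted_keys[start], sorted_keys[start + span - 1]
\end{verbatim}
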
\begin{figure}
\centering
@@ -961,24 +959,47 @@ In this chapter, we considered the proposed design space for our
dynamization framework both mathematically and experimentally, and derived
some general principles for configuration within the space. We generalized
the Bentley-Saxe method to support scale factors and buffering, but
-found that the result was strictly worse than leveling in all but its
+found that the result was generally worse than leveling in all but its
best-case query performance. We also showed that there does exist a
trade-off, mediated by scale factor, between insertion performance and
-query performance for the tiering layout policy. Unfortunately, the
-leveling layout policy does not have a particularly useful trade-off
-in this area because the cost in insertion performance grows far faster
-than any query performance benefit, due to the way to two effects scale
-in the cost functions for the method.
+query performance, though it doesn't manifest for every layout policy
+and data structure combination. For example, when testing the ISAM tree
+structure with the leveling or BSM policies, there is not a particularly
+useful trade-off resulting from scale factor adjustments, because the
+amount of extra query performance gained by increasing the scale
+factor is dwarfed by the reduction in insertion performance. This is
+because the cost in insertion performance grows far faster than any
+query performance benefit, due to the way the two effects scale in the
+cost functions for these policies.
Broadly speaking, we can draw a few general conclusions. First, the
-leveling layout policy is better than tiering for query latency in
-all configurations, but worse in insertion performance. Leveling also
-has the best insertion tail latency performance by a small margin,
-owing to the way it performs reconstructions. Tiering, however,
-has significantly better insertion performance and can be configured
-with query performance that is similar to leveling. These results are
-aligned with the smaller-scale parameter testing done in the previous
-chapters, which landed on tiering as a good general solution for most
-cases. Tiering also has the advantage of meaningful tuning through scale
-factor adjustment.
+leveling and BSM policies are fairly similar, with BSM having slightly
+better query performance in general, owing to its better best-case query
+cost. Both of these policies are better than tiering in terms of query
+performance, but generally worse for insertion performance. The one
+slight exception to this trend is in worst-case insertion performance,
+where leveling has a slight advantage over the other policies because
+the way it performs reconstructions ensures that the worst-case
+reconstruction cost is smaller. Adjusting the scale factor can trade
+between insert and query performance, though its effect for leveling
+and BSM is the opposite of its effect for tiering: for the former two
+policies, increasing the scale factor reduces insert performance and
+improves query performance, while for tiering it does the opposite. The
+mutable buffer can also be increased in size to improve insert
+performance in all cases, but the query cost increases as a result, and
+once the buffer gets sufficiently large, this trade-off becomes severe.
+
+While this trade-off space does provide us with the desired
+configurability, the experimental results show that the trade-off curves
+are not particularly smooth, and the effectiveness can vary quite a bit
+depending on the properties of the data structure and search problem being
+dynamized. Additionally, there isn't a particularly good way to control
+insertion tail latencies in this model, as leveling is only slightly
+better in this metric. In the next chapter, we'll consider methods for
+controlling tail latency, which will, as a side benefit, also provide
+a more desirable configuration space than the one considered here.
+