From 6354e60f106a89f5bf807082561ed5efd9be0f4f Mon Sep 17 00:00:00 2001
From: "Douglas B. Rumbaugh"
Date: Sun, 1 Jun 2025 11:59:11 -0400
Subject: updates

---
 chapters/dynamization.tex | 179 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 154 insertions(+), 25 deletions(-)

diff --git a/chapters/dynamization.tex b/chapters/dynamization.tex
index fce0d9f..053fb46 100644
--- a/chapters/dynamization.tex
+++ b/chapters/dynamization.tex
@@ -315,7 +315,7 @@ bit simpler, and so we will begin our discussion of decomposition-based
 technique for dynamization of decomposable search problems with it.
 There have been several proposed variations of this
 concept~\cite{maurer79, maurer80}, but we will focus on the most
 developed form as described by
-Overmars and von Leeuwan~\cite{overmars-art-of-dyn, overmars83}. The core
+Overmars and van Leeuwen~\cite{overmars-art-of-dyn, overmars83}. The core
 concept of the equal block method is to decompose the data structure
 into several smaller data structures, called blocks, over partitions
 of the data. This decomposition is performed such that each block is of
@@ -478,7 +478,7 @@ called a \emph{merge decomposable search problem} (MDSP)~\cite{merge-dsp}.
 Note that in~\cite{merge-dsp}, Overmars considers a \emph{very} specific
 definition where the data structure is built in two stages. An initial
 sorting phase, requiring $O(n \log n)$ time, and then a construction
-phase requiring $O(n)$ time. Overmar's proposed mechanism for leveraging
+phase requiring $O(n)$ time. Overmars's proposed mechanism for leveraging
 this property is to include with each block a linked list storing the
 records in sorted order (presumably to account for structures where the
 records must be sorted, but aren't necessarily kept that way). During
@@ -489,7 +489,7 @@ the amortized insertion cost is less than would have been the case paying
 the $O( n \log n)$ cost for each reconstruction.~\cite{merge-dsp}

 While Overmars's definition for MDSP does capture a large number of
-mergable data structures (including all of the mergable structures
+mergeable data structures (including all of the mergeable structures
 considered in this work), we modify his definition to consider a broader
 class of problems. We will be using the term to refer to any search
 problem with a data structure that can be merged more efficiently than
@@ -515,7 +515,7 @@ the above definition.\footnote{
     In the equal block method, all reconstructions are due to either
     inserting a record, in which case the reconstruction consists of
     adding a single record to a structure, not merging two structures,
-    or due to repartitioning, occurs when $f(n)$ increases sufficiently
+    or due to re-partitioning, which occurs when $f(n)$ increases sufficiently
     that the existing structures must be made \emph{smaller}, and so,
     again, no merging is done.
 }
@@ -846,8 +846,54 @@ into the $s$ blocks.~\cite{overmars-art-of-dyn}

 \subsection{Worst-Case Optimal Techniques}
 \label{ssec:bsm-worst-optimal}
-
-
+Dynamization based upon amortized global reconstruction has a
+significant gap between its \emph{amortized} insertion performance
+and its \emph{worst-case} insertion performance. When using the
+Bentley-Saxe method, the logarithmic decomposition ensures that the
+majority of inserts involve rebuilding only small data structures,
+and thus are relatively fast. However, the worst-case insertion cost
+is still $\Theta(B(n))$, no better than unamortized global
+reconstruction, because the worst-case insert requires a
+reconstruction using all of the records in the structure.
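+
+To make the source of this worst-case behavior concrete, the following
+minimal sketch (in Python) shows the flavor of the Bentley-Saxe insertion
+cascade described above. It assumes only a user-supplied \texttt{build}
+routine that constructs a static block from a list of records in $B(n)$
+time; the names and representation used here are purely illustrative.
+\begin{verbatim}
+# Illustrative sketch of a Bentley-Saxe style insertion cascade.
+# 'levels[i]' is either None or a pair (block, records) holding
+# exactly 2^i records; 'build' constructs a static block in B(n) time.
+def insert(levels, record, build):
+    carry = [record]                 # records awaiting placement
+    for i in range(len(levels)):
+        if levels[i] is None:
+            # only 2^i records need to be rebuilt -- the common case
+            levels[i] = (build(carry), carry)
+            return
+        # level i is full: empty it and carry its records upwards
+        carry = carry + levels[i][1]
+        levels[i] = None
+    # every level was full: this rebuild touches all of the records
+    # in the structure, which is the Theta(B(n)) worst case
+    levels.append((build(carry), carry))
+\end{verbatim}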
+
+Overmars and van Leeuwen~\cite{overmars81, overmars83} proposed an
+alteration to the Bentley-Saxe method that is capable of bringing the
+worst-case insertion cost in line with the amortized cost, $I(n) \in
+\Theta\left(\frac{B(n)}{n} \log n\right)$. To accomplish this, they
+introduce a structure that is capable of spreading the work of
+reconstructions out across multiple inserts. Their structure consists
+of $\log_2 n$ levels, like the Bentley-Saxe method, but each level
+contains four data structures, rather than one, called $Oldest_i$,
+$Older_i$, $Old_i$, and $New_i$, respectively.\footnote{
+    We are here adopting nomenclature used by Erickson in his lecture
+    notes on the topic~\cite{erickson-bsm-notes}, which is a bit clearer
+    than the more mathematical notation in the original source material.
+} The $Old$, $Older$, and $Oldest$ structures represent completely built
+versions of the data structure on each level, and each will be either
+full ($2^i$ records) or empty. If $Oldest$ is empty, then so is $Older$,
+and if $Older$ is empty, then so is $Old$. The fourth structure,
+$New$, represents a partially built structure on the level. A record
+in the structure will be present in exactly one old structure, and may
+additionally appear in a new structure as well.
+
+When inserting into this structure, the algorithm first examines every
+level, $i$. If both $Older_{i-1}$ and $Oldest_{i-1}$ are full, then the
+algorithm will execute $\frac{B(2^i)}{2^i}$ steps of the construction
+of $New_i$ from $\text{unbuild}(Older_{i-1}) \cup
+\text{unbuild}(Oldest_{i-1})$. Once enough inserts have been performed
+to completely build some block, $New_i$, the source blocks for the
+reconstruction, $Oldest_{i-1}$ and $Older_{i-1}$, are deleted, $Old_{i-1}$
+becomes $Oldest_{i-1}$, and $New_i$ is assigned to the oldest empty block
+on level $i$.
+
+This approach means that, in the worst case, partial reconstructions will
+be executed on every level in the structure, resulting in
+\begin{equation*}
+    I(n) \in \Theta\left(\sum_{i=0}^{\log_2 n - 1} \frac{B(2^i)}{2^i}\right) \in \Theta\left(\frac{B(n)}{n} \log_2 n\right)
+\end{equation*}
+time. Additionally, if $B(n) \in \Omega(n^{1 + \epsilon})$ for $\epsilon
+> 0$, then the bottom level dominates the reconstruction cost, and the
+worst-case bound drops to $I(n) \in \Theta\left(\frac{B(n)}{n}\right)$.
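+
+The bookkeeping involved is somewhat intricate, so the following Python
+sketch illustrates the core scheduling idea: each pending reconstruction
+is advanced by a bounded amount of work on every insert. This is a
+simplification rather than the original algorithm in full detail; the
+\texttt{PartialBuild} helper, the \texttt{unbuild} and \texttt{build\_cost}
+arguments, and the omission of how a newly inserted record enters level
+$0$ are all illustrative assumptions of our own.
+\begin{verbatim}
+class PartialBuild:
+    """Hypothetical helper that performs construction work in chunks."""
+    def __init__(self, records, total_steps):
+        self.records = records
+        self.remaining = total_steps
+
+    def do_steps(self, k):
+        # Perform k units of work; return the finished block (here the
+        # record list stands in for a built structure) once done.
+        self.remaining -= k
+        return self.records if self.remaining <= 0 else None
+
+def insert_work(levels, unbuild, build_cost):
+    # levels[i] is a dict with 'oldest', 'older', 'old' and 'new' slots.
+    for i in range(1, len(levels)):
+        below, here = levels[i - 1], levels[i]
+        if below['oldest'] is None or below['older'] is None:
+            continue                 # nothing to merge up from level i-1
+        if here['new'] is None:      # begin building New_i
+            records = unbuild(below['oldest']) + unbuild(below['older'])
+            here['new'] = PartialBuild(records, build_cost(len(records)))
+        # spread the B(2^i) cost over the 2^i inserts arriving here
+        done = here['new'].do_steps(build_cost(2 ** i) // (2 ** i))
+        if done is not None:
+            below['oldest'], below['older'], below['old'] = below['old'], None, None
+            for slot in ('oldest', 'older', 'old'):
+                if here[slot] is None:
+                    here[slot] = done    # oldest empty slot on level i
+                    break
+            here['new'] = None
+\end{verbatim}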

 \section{Limitations of Classical Dynamization Techniques}
 \label{sec:bsm-limits}
@@ -1062,30 +1108,113 @@ the partitions.
 \end{example}

-The problem is that the number of samples drawn from each partition needs to be
-weighted based on the number of elements satisfying the query predicate in that
-partition. In the above example, by drawing $4$ samples from $D_1$, more weight
-is given to $3$ than exists within the base dataset. This can be worked around
-by sampling a full $k$ records from each partition, returning both the sample
-and the number of records satisfying the predicate as that partition's query
-result, and then performing another pass of IRS as the merge operator, but this
-is the same approach as was used for k-NN above. This leaves IRS firmly in the
-$C(n)$-decomposable camp. If it were possible to pre-calculate the number of
-samples to draw from each partition, then a constant-time merge operation could
-be used.
+The problem is that the number of samples drawn from each partition
+needs to be weighted based on the number of elements satisfying the
+query predicate in that partition. In the above example, by drawing $4$
+samples from $D_1$, more weight is given to $3$ than exists within
+the base dataset. This can be worked around by sampling a full $k$
+records from each partition, returning both the sample and the number
+of records satisfying the predicate as that partition's query result,
+and then performing another pass of IRS as the merge operator, but this
+is the same approach as was used for k-NN above. This leaves IRS firmly
+in the $C(n)$-decomposable camp. If it were possible to pre-calculate
+the number of samples to draw from each partition, then a constant-time
+merge operation could be used.
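+
+As an illustration of the workaround described above, a merge operator
+of this kind might look like the following sketch, which assumes that
+each partition's query returns a full $k$-record sample along with a
+count of the records in that partition satisfying the predicate. The
+interface and names used here are purely illustrative.
+\begin{verbatim}
+import random
+
+def irs_merge(partial_results, k):
+    # partial_results: list of (sample, matching_count) pairs, where
+    # each sample holds k records drawn from a single partition
+    total = sum(count for _, count in partial_results)
+    if total == 0:
+        return []
+    merged = []
+    for _ in range(k):
+        # pick a partition with probability proportional to its count
+        r = random.randrange(total)
+        for sample, count in partial_results:
+            if r < count:
+                # then draw one of that partition's sampled records
+                merged.append(random.choice(sample))
+                break
+            r -= count
+    return merged
+\end{verbatim}
+Note that this second sampling pass must touch the partial results from
+every block, which is what prevents the merge from being performed in
+constant time.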
+
+We examine this problem in detail in Chapters~\ref{chap:sampling} and
+\ref{chap:framework} and propose techniques for efficiently expanding
+support of dynamization systems to non-decomposable search problems, as
+well as addressing some additional difficulties introduced by supporting
+deletes, which can complicate query processing.

 \subsection{Configurability}

+Amortized global reconstruction is built upon a fundamental trade-off
+between insertion and query performance that is governed by the number
+of blocks a structure is decomposed into. The equal block method makes
+this trade-off directly configurable by exposing $f(n)$, the number of
+blocks, as a parameter. However, this technique suffers from poor
+insertion performance~\cite{overmars83} compared to the Bentley-Saxe
+method, owing to the larger average reconstructions that result from
+keeping all of the blocks the same size. In fact, we'll show in
+Chapter~\ref{chap:tail-latency} that, under experimental conditions,
+the equal block method occupies a strictly worse position in this
+trade-off space than the Bentley-Saxe method for a given query latency.
+A technique that attempts to address this limitation by nesting the
+Bentley-Saxe method inside of the equal block method, called the
+\emph{mixed method}, has appeared in the theoretical
+literature~\cite{overmars83}, but it is clunky and doesn't provide the
+user with a meaningful design space for configuring the system beyond
+specifying arbitrary functions.
+
+The reason for this lack of simple configurability in existing
+dynamization literature seems to stem from the theoretical nature of
+the work. Many ``obvious'' options for tweaking the method, such as
+changing the rate at which levels grow, adding buffering, etc., result in
+constant-factor trade-offs, and thus are not relevant to the asymptotic
+bounds that these works are concerned with. It's worth noting that some
+works based on \emph{applying} the Bentley-Saxe method introduce some
+form of configurability~\cite{pgm,almodaresi23}, usually inspired by
+the design space of LSM trees~\cite{oneil96}, but the full consequences
+of this parametrization in the context of dynamization have, to the
+best of our knowledge, not been explored. We will discuss this topic
+in Chapter~\ref{chap:design-space}.

 \subsection{Insertion Tail Latency}
 \label{ssec:bsm-tail-latency-problem}

+One of the largest problems associated with classical dynamization
+techniques is their poor worst-case insertion performance, which
+results in massive insertion tail latencies. Unfortunately, solving
+this problem within the Bentley-Saxe method itself is not a trivial
+undertaking. Maintaining the strict binary decomposition of the
+structure, as Bentley-Saxe does, ensures that reconstructions cannot
+be performed in advance, as the worst-case reconstruction requires
+access to all of the records in the structure. This limits the ability
+to use parallelism to hide these latencies.
+
+The worst-case optimized approach proposed by Overmars and van Leeuwen
+abandons the strict binary decomposition of the Bentley-Saxe method,
+and is thus able to limit this worst-case insertion bound, but it has
+a number of serious problems:
+
+\begin{enumerate}
+    \item It assumes that the reconstruction process for a data structure
+    can be divided \textit{a priori} into a small number of independent
+    operations that can be executed in batches during each insert. It
+    is not always possible to do this efficiently, particularly for
+    structures whose construction involves multiple stages (e.g.,
+    a sorting phase followed by a recursive node construction phase,
+    like in a B+Tree) whose operation counts are difficult to predict
+    in advance.
+
+    \item Even if the reconstruction process can be efficiently
+    sub-divided, implementing the technique requires \emph{significant}
+    and highly specialized modification of the construction procedures
+    for a data structure, and tight integration of these procedures into
+    the insertion process as a whole. This makes it poorly suited for use
+    in a generalized framework of the sort we are attempting to create.
+
+\end{enumerate}
+
+We tackle the problem of insertion tail latency in
+Chapter~\ref{chap:tail-latency} and propose a new system that resolves
+these difficulties and allows for significant improvements in insertion
+tail latency without seriously degrading the other performance
+characteristics of the dynamized structure.

 \section{Conclusion}

-This chapter discussed the necessary background information pertaining to
-queries and search problems, indexes, and techniques for dynamic extension. It
-described the potential for using custom indexes for accelerating particular
-kinds of queries, as well as the challenges associated with constructing these
-indexes. The remainder of this document will seek to address these challenges
-through modification and extension of the Bentley-Saxe method, describing work
-that has already been completed, as well as the additional work that must be
-done to realize this vision.
+
+In this chapter, we introduced the concept of a search problem, and
+showed how amortized global reconstruction can be used to dynamize data
+structures associated with search problems having certain properties. We
+examined several theoretical approaches for dynamization, including the
+equal block method, the Bentley-Saxe method, and a worst-case insertion
+optimized approach. Additionally, we considered several more classes of
+search problem, and saw how additional properties could be used to enable
+more efficient reconstruction and support for efficiently deleting
+records from the structure. Ultimately, however, these techniques
+have several deficiencies that must be overcome before a practical,
+general system can be built upon them. Namely, they lack support for
+several important types of search problem (particularly when deletes
+are required), they are not easily configurable by the user, and they
+suffer from poor insertion tail latency. The rest of this work will be
+dedicated to approaches for resolving these deficiencies.
-- 
cgit v1.2.3