Updates

author: Douglas Rumbaugh <dbr4@psu.edu> 2025-05-13 17:29:40 -0400
committer: Douglas Rumbaugh <dbr4@psu.edu> 2025-05-13 17:29:40 -0400
commit: 40bff24fc2e2da57f382e4f49a5ffb7c826bbcfb (patch)
tree: c00441b058255de08a32d227ce7af46bf11d8eb8 /chapters/background.tex
parent: 5ffc53e69e956054fdefd1fe193e00eee705dcab (diff)
download: dissertation-40bff24fc2e2da57f382e4f49a5ffb7c826bbcfb.tar.gz
1 files changed, 374 insertions, 43 deletions
diff --git a/chapters/background.tex b/chapters/background.tex
index 69436c8..8ad92a8 100644
--- a/chapters/background.tex
+++ b/chapters/background.tex
@@ -95,7 +95,7 @@ and later work by Overmars lifted this constraint and considered a more
 general class of search problems called \emph{$C(n)$-decomposable search
 problems},
 
-\begin{definition}[$C(n)$-decomposable Search Problem~\cite{overmars83}]
+\begin{definition}[$C(n)$-decomposable Search Problem~\cite{overmars-cn-decomp}]
     A search problem $F: (\mathcal{D}, \mathcal{Q}) \to \mathcal{R}$ is $C(n)$-decomposable
     if and only if there exists an $O(C(n))$-time computable, associative,
     and commutative binary operator $\mergeop$ such that,
@@ -113,12 +113,14 @@ decomposable even in cases with more than two partial results.
 
 As an example, consider  range scans,
 \begin{definition}[Range Count]
+    \label{def:range-count}
     Let $d$ be a set of $n$ points in $\mathbb{R}$. Given an interval,
     $ q = [x, y],\quad x,y \in \mathbb{R}$, a range count returns
     the cardinality, $|d \cap q|$.
 \end{definition}
 
 \begin{theorem}
+\label{ther:decomp-range-count}
 Range Count is a decomposable search problem.
 \end{theorem}
 
@@ -130,7 +132,7 @@ Definition~\ref{def:dsp}, gives
 \end{align*}
 which is true by the distributive property of union and
 intersection. Addition is an associative and commutative
-operator that can be calculated in $O(1)$ time. Therefore, range counts
+operator that can be calculated in $\Theta(1)$ time. Therefore, range counts
 are DSPs.
 \end{proof}
 
@@ -376,15 +378,18 @@ database indices. We refer to a data structure with update support as
     contain header information (like visibility) that is updated in place.
 }
 
-This section discusses \emph{dynamization}, the construction of a dynamic
-data structure based on an existing static one. When certain conditions
-are satisfied by the data structure and its associated search problem,
-this process can be done automatically, and with provable asymptotic
-bounds on amortized insertion performance, as well as worst case query
-performance. We will first discuss the necessary data structure
-requirements, and then examine several classical dynamization techniques.
-The section will conclude with a discussion of delete support within the
-context of these techniques.
+This section discusses \emph{dynamization}, the construction of a
+dynamic data structure based on an existing static one. When certain
+conditions are satisfied by the data structure and its associated
+search problem, this process can be done automatically, and with
+provable asymptotic bounds on amortized insertion performance, as well
+as worst case query performance. This is in contrast to the manual
+design of dynamic data structures, which involves techniques based on
+partially rebuilding small portions of a single data structure (called
+\emph{local reconstruction})~\cite{overmars83}.  This is a very high-cost
+intervention that requires significant effort on the part of the data
+structure designer, whereas conventional dynamization can be performed
+with little-to-no modification of the underlying data structure at all.
 
 It is worth noting that there are a variety of techniques
 discussed in the literature for dynamizing structures with specific
@@ -395,6 +400,18 @@ of insert and query operations~\cite{batched-decomposable}. This
 section discusses techniques that are more general, and don't require
 workload-specific assumptions.
 
+We will first discuss the necessary data structure requirements, and
+then examine several classical dynamization techniques.  The section
+will conclude with a discussion of delete support within the context
+of these techniques. For more detail than is included in this chapter,
+see the book by Overmars, which provides a comprehensive survey of techniques for
+creating dynamic data structures, including not only the dynamization
+techniques discussed here, but also local reconstruction based
+techniques and more~\cite{overmars83}.\footnote{
+    Sadly, this book isn't readily available in
+    digital format as of the time of writing.
+}
+
 
 \subsection{Global Reconstruction}
 
@@ -412,7 +429,7 @@ possible if $\mathcal{I}$ supports the following two operations,
 \end{align*}
 where $\mathtt{build}$ constructs an instance $\mathscr{I}\in\mathcal{I}$
 over the data structure over a set of records $d \subseteq \mathcal{D}$
-in $C(|d|)$ time, and $\mathtt{unbuild}$ returns the set of records $d
+in $B(|d|)$ time, and $\mathtt{unbuild}$ returns the set of records $d
 \subseteq \mathcal{D}$ used to construct $\mathscr{I} \in \mathcal{I}$ in
 $\Theta(1)$ time,\footnote{
     There isn't any practical reason why $\mathtt{unbuild}$ must run
@@ -428,7 +445,7 @@ data structure $\mathscr{I} \in \mathcal{I}$ can be defined by,
 \end{align*}
 
 It goes without saying that this operation is sub-optimal, as the
-insertion cost is $\Theta(C(n))$, and $C(n) \in \Omega(n)$ at best for
+insertion cost is $\Theta(B(n))$, and $B(n) \in \Omega(n)$ at best for
 most data structures. However, this global reconstruction strategy can
 be used as a primitive for more sophisticated techniques that can provide
 reasonable performance.
@@ -438,7 +455,7 @@ reasonable performance.
 
 The problem with global reconstruction is that each insert must rebuild
 the entire data structure, involving all of its records. This results
-in a worst-case insert cost of $\Theta(C(n))$. However, opportunities
+in a worst-case insert cost of $\Theta(B(n))$. However, opportunities
 for improving this scheme can present themselves when considering the
 \emph{amortized} insertion cost.
 
@@ -446,11 +463,11 @@ Consider the cost accrued by the dynamized structure under global
 reconstruction over the lifetime of the structure. Each insert will result
 in all of the existing records being rewritten, so at worst each record
 will be involved in $\Theta(n)$ reconstructions, each reconstruction
-having $\Theta(C(n))$ cost. We can amortize this cost over the $n$ records
+having $\Theta(B(n))$ cost. We can amortize this cost over the $n$ records
 inserted to get an amortized insertion cost for global reconstruction of,
 
 \begin{equation*}
-I_a(n) = \frac{C(n) \cdot n}{n} = C(n)
+I_a(n) = \frac{B(n) \cdot n}{n} = B(n)
 \end{equation*}
 
 This doesn't improve things as is, however it does present two
@@ -459,9 +476,9 @@ the reconstructions, or the number of times a record is reconstructed,
 then we could reduce the amortized insertion cost.
 
 The key insight, first discussed by Bentley and Saxe, is that
-this goal can be accomplished by \emph{decomposing} the data
-structure into multiple, smaller structures, each built from a
-disjoint partition of the data. As long as the search problem
+both of these goals can be accomplished by \emph{decomposing} the
+data structure into multiple, smaller structures, each built from
+a disjoint partition of the data. As long as the search problem
 being considered is decomposable, queries can be answered from
 this structure with bounded worst-case overhead, and the amortized
 insertion cost can be improved~\cite{saxe79}. Significant theoretical
@@ -470,21 +487,34 @@ data structure~\cite{saxe79, overmars81, overmars83} and for leveraging
 specific efficiencies of the data structures being considered to improve
 these reconstructions~\cite{merge-dsp}.
 
-There are two general decomposition techniques that emerged from this
-work. The earliest of these is the logarithmic method, often called
-the Bentley-Saxe method in modern literature, and is the most commonly
-discussed technique today. A later technique, the equal block method,
-was also examined. It is generally not as effective as the Bentley-Saxe
-method, but it has some useful properties for explanatory purposes and
-so will be discussed here as well.
-
-\subsection{Equal Block Method~\cite[pp.~96-100]{overmars83}}
+There are two general decomposition techniques that emerged from
+this work. The earliest of these is the logarithmic method, often
+called the Bentley-Saxe method in modern literature, and is the most
+commonly discussed technique today. The Bentley-Saxe method has been
+directly applied in a few instances in the literature, such as to
+metric indexing structures~\cite{naidan14} and spatial structures~\cite{bkdtree},
+and has also been used in a modified form for genetic sequence search
+structures~\cite{almodaresi23} and graphs~\cite{lsmgraph}, to cite a few
+examples.
+
+A later technique, the equal block method, was also developed. It is
+generally not as effective as the Bentley-Saxe method, and as a result we
+have not identified any specific applications of this technique outside
+of the theoretical literature. However, we will discuss it as well in
+the interest of completeness, and because it does lend itself well to
+demonstrating certain properties of decomposition-based dynamization
+techniques.
+
+\subsection{Equal Block Method}
 \label{ssec:ebm}
 
 Though chronologically later, the equal block method is theoretically a
 bit simpler, and so we will begin our discussion of decomposition-based
-technique for dynamization of decomposable search problems with it. The
-core concept of the equal block method is to decompose the data structure
+technique for dynamization of decomposable search problems with it. There
+have been several proposed variations of this concept~\cite{maurer79,
+maurer80}, but we will focus on the most developed form as described by
+Overmars and van Leeuwen~\cite{overmars-art-of-dyn, overmars83}. The core
+concept of the equal block method is to decompose the data structure
 into several smaller data structures, called blocks, over partitions
 of the data. This decomposition is performed such that each block is of
 roughly equal size.
@@ -499,7 +529,7 @@ to be governed by a smooth, monotonically increasing function $f(n)$ such
 that, at any point, the following two constraints are obeyed.
 \begin{align}
     f\left(\frac{n}{2}\right) \leq s \leq f(2n) \label{ebm-c1}\\
-    \forall_{1 \leq j \leq s} \quad | \mathscr{I}_j |  \leq \frac{2n}{i} \label{ebm-c2}
+    \forall_{1 \leq j \leq s} \quad | \mathscr{I}_j |  \leq \frac{2n}{s} \label{ebm-c2}
 \end{align}
 where $|\mathscr{I}_j|$ is the number of records in the block,
 $|\text{unbuild}(\mathscr{I}_j)|$.
@@ -528,16 +558,16 @@ where $F(\mathscr{I}, q)$ is a slight abuse of notation, referring to
 answering the query over $d$ using the data structure $\mathscr{I}$.
 
 This technique provides better amortized performance bounds than global
-reconstruction, at the possible cost of increased query performance for
+reconstruction, at the possible cost of worse query performance for
 sub-linear queries. We'll omit the details of the proof of performance
 for brevity and streamline some of the original notation (full details
 can be found in~\cite{overmars83}), but this technique ultimately
 results in a data structure with the following performance characteristics,
 \begin{align*}
-\text{Amortized Insertion Cost:}&\quad \Theta\left(\frac{C(n)}{n} + C\left(\frac{n}{f(n)}\right)\right) \\
+\text{Amortized Insertion Cost:}&\quad \Theta\left(\frac{B(n)}{n} + B\left(\frac{n}{f(n)}\right)\right) \\
 \text{Worst-case Query Cost:}& \quad \Theta\left(f(n) \cdot \mathscr{Q}\left(\frac{n}{f(n)}\right)\right) \\
 \end{align*}
-where $C(n)$ is the cost of statically building $\mathcal{I}$, and
+where $B(n)$ is the cost of statically building $\mathcal{I}$, and
 $\mathscr{Q}(n)$ is the cost of answering $F$ using $\mathcal{I}$.
 
 %TODO: example?
@@ -599,18 +629,35 @@ structure in the same way that incrementing the binary number by $1$ does.
 By applying BSM to a data structure, a dynamized structure can be created
 with the following performance characteristics,
 \begin{align*}
-\text{Amortized Insertion Cost:}&\quad \Theta\left(\left(\frac{C(n)}{n}\cdot \log_2 n\right)\right) \\
-\text{Worst Case Insertion Cost:}&\quad \Theta\left(C(n)\right) \\
+\text{Amortized Insertion Cost:}&\quad \Theta\left(\left(\frac{B(n)}{n}\cdot \log_2 n\right)\right) \\
+\text{Worst Case Insertion Cost:}&\quad \Theta\left(B(n)\right) \\
 \text{Worst-case Query Cost:}& \quad \Theta\left(\log_2 n\cdot \mathscr{Q}\left(n\right)\right) \\
 \end{align*}
 This is a particularly attractive result because, for example, a data
-structure having $C(n) \in \Theta(n)$ will have an amortized insertion
-cost of $\log_2 (n)$, which is quite reasonable. The cost is an extra
-logarithmic multiple attached to the query complexity. It is also worth
-noting that the worst-case insertion cost remains the same as global
-reconstruction, but this case arises only very rarely. If you consider the
-binary decomposition representation, the worst-case behavior is triggered
-each time the existing number overflows, and a new digit must be added.
+structure having $B(n) \in \Theta(n)$ will have an amortized insertion
+cost of $\Theta(\log_2 n)$, which is quite reasonable. The trade-off for this
+is an extra logarithmic multiple attached to the query complexity. It is
+also worth noting that the worst-case insertion cost remains the same
+as global reconstruction, but this case arises only very rarely. If
+you consider the binary decomposition representation, the worst-case
+behavior is triggered each time the existing number overflows, and a
+new digit must be added.
+
+As a final note about the query performance of this structure, because
+the overhead due to querying the blocks is logarithmic, under certain
+circumstances this cost can be absorbed, resulting in no effect on the
+asymptotic worst-case query performance. As an example, consider a linear
+scan of the data running in $\Theta(n)$ time. In this case, every record
+must be considered, and so there isn't any performance penalty\footnote{
+  From an asymptotic perspective. There will still be measurable performance
+  effects from caching, etc., even in this case.
+} to breaking the records out into multiple chunks and scanning them
+individually. More formally, for any query running in $\mathscr{Q}(n) \in
+\Omega\left(n^\epsilon\right)$ time where $\epsilon > 0$, the worst-case
+cost of answering a decomposable search problem from a BSM dynamization
+is $\Theta\left(\mathscr{Q}(n)\right)$~\cite{saxe79}.
+
+\subsection{Merge Decomposable Search Problems}
 
 \subsection{Delete Support}
 
@@ -651,6 +698,290 @@ This presents several problems,
           require additional work to fix.
 \end{itemize}
 
+To resolve these difficulties, two very different approaches have been
+proposed for supporting deletes, each of which relies on certain properties
+of the search problem and data structure. These are the use of a ghost
+structure and weak deletes.
+
+\subsubsection{Ghost Structure for Invertible Search Problems}
+
+The first proposed mechanism for supporting deletes was discussed
+alongside the Bentley-Saxe method in Bentley and Saxe's original
+paper. This technique applies to a class of search problems called
+\emph{invertible} (also called \emph{decomposable counting problems}
+in later literature~\cite{overmars83}). Invertible search problems
+are decomposable, and also support an ``inverse'' merge operator, $\Delta$,
+that is able to remove records from the result set. More formally,
+\begin{definition}[Invertible Search Problem~\cite{saxe79}]
+\label{def:invert}
+A decomposable search problem, $F$, is invertible if and only if there
+exists a constant time computable operator, $\Delta$, such that
+\begin{equation*}
+F(A / B, q) = F(A, q)~\Delta~F(B, q)
+\end{equation*}
+for all $A, B \in \mathcal{PS}(\mathcal{D})$ where $B \subseteq A$.
+\end{definition}
+
+Given a search problem with this property, it is possible to perform
+deletes by creating a secondary ``ghost'' structure. When a record
+is to be deleted, it is inserted into this structure. Then, when the
+dynamization is queried, this ghost structure is queried as well as the
+main one. The results from the ghost structure can be removed from the
+result set using the inverse merge operator. This simulates the result
+that would have been obtained had the records been physically removed
+from the main structure.
+
+Two examples of invertible search problems are set membership
+and range count. Range count was formally defined in
+Definition~\ref{def:range-count}.
+
+\begin{theorem}
+Range count is an invertible search problem.
+\end{theorem}
+
+\begin{proof}
+To prove that range count is an invertible search problem, it must be
+decomposable and have a $\Delta$ operator. That it is a DSP has already
+been proven in Theorem~\ref{ther:decomp-range-count}.
+
+Let $\Delta$ be subtraction $(-)$. Applying this to Definition~\ref{def:invert}
+gives,
+\begin{equation*}
+|(A / B) \cap q | = |(A \cap q) / (B \cap q)| = |(A \cap q)| - |(B \cap q)|
+\end{equation*}
+which is true by the distributive property of set difference and
+intersection. Subtraction is computable in constant time, therefore
+range count is an invertible search problem using subtraction as $\Delta$.
+\end{proof}
+
+The set membership search problem is defined as follows,
+\begin{definition}[Set Membership]
+\label{def:set-membership}
+Consider a set of elements $d \subseteq \mathcal{D}$ from some domain,
+and a single element $r \in \mathcal{D}$. A test of set membership is a
+search problem of the form $F: (\mathcal{PS}(\mathcal{D}), \mathcal{D})
+\to \mathbb{B}$ such that $F(d, r) = r \in d$, which maps to $0$ if $r
+\not\in d$ and $1$ if $r \in d$.
+\end{definition}
+
+\begin{theorem}
+Set membership is an invertible search problem.
+\end{theorem}
+
+\begin{proof}
+To prove that set membership is invertible, it is necessary to establish
+that it is a decomposable search problem, and that a $\Delta$ operator
+exists. We'll begin with the former.
+\begin{lemma}
+    \label{lem:set-memb-dsp}
+    Set membership is a decomposable search problem.
+\end{lemma}
+\begin{proof}
+Let $\mergeop$ be the logical disjunction ($\lor$). This yields,
+\begin{align*}
+F(A \cup B, r) &= F(A, r) \lor F(B, r) \\
+r \in (A \cup B) &= (r \in A) \lor (r \in B)
+\end{align*}
+which is true, following directly from the definition of union. The
+logical disjunction is an associative, commutative operator that can
+be calculated in $\Theta(1)$ time. Therefore, set membership is a
+decomposable search problem.
+\end{proof}
+
+For the inverse merge operator, $\Delta$, it is necessary that $F(A,
+r) ~\Delta~F(B, r)$ be true \emph{only} if $r \in A$ and $r \not\in
+B$. Thus, it could be directly implemented as $F(A, r)~\Delta~F(B, r) =
+F(A, r) \land \neg F(B, r)$, which is constant time if
+the operands are already known.
+
+Thus, we have shown that set membership is a decomposable search problem,
+and that a constant time $\Delta$ operator exists. Therefore, it is an
+invertible search problem.
+\end{proof}
+
+For search problems such as these, this technique allows for deletes to be
+supported with the same cost as an insert. Unfortunately, it suffers from
+write amplification because each deleted record is recorded twice: once in
+the main structure, and once in the ghost structure. This means that $n$
+is, in effect, the total number of records and deletes. This can lead
+to some serious problems. For example, if every record in a structure
+of $n$ records is deleted, the net result will be an ``empty'' dynamized
+data structure containing $2n$ physical records within it. To circumvent
+this problem, Bentley and Saxe proposed a mechanism of setting a maximum
+threshold for the size of the ghost structure relative to the main one,
+and performing a complete re-partitioning of the data once this threshold
+is reached, removing all deleted records from the main structure,
+emptying the ghost structure, and rebuilding blocks with the records
+that remain according to the invariants of the technique.
+
+\subsubsection{Weak Deletes for Deletion Decomposable Search Problems}
+
+Another approach for supporting deletes was proposed later, by Overmars
+and van Leeuwen, for a class of search problems called \emph{deletion
+decomposable}. These are decomposable search problems for which the
+underlying data structure supports a delete operation. More formally,
+
+\begin{definition}[Deletion Decomposable Search Problem~\cite{merge-dsp}]
+    A decomposable search problem, $F$, and its data structure,
+    $\mathcal{I}$, is deletion decomposable if and only if, for some
+    instance $\mathscr{I} \in \mathcal{I}$, containing $n$ records,
+    there exists a deletion routine $\mathtt{delete}(\mathscr{I},
+    r)$ that removes some $r \in \mathcal{D}$ in time $D(n)$ without
+    increasing the query time, deletion time, or storage requirement
+    for $\mathscr{I}$.
+\end{definition}
+
+Superficially, this doesn't appear very useful. If the underlying data
+structure already supports deletes, there isn't much reason to use a
+dynamization technique to add deletes to it. However, one point worth
+mentioning is that it is possible, in many cases, to easily \emph{add}
+delete support to a static structure. If it is possible to locate a
+record and somehow mark it as deleted, without removing it from the
+structure, and then efficiently ignore these records while querying,
+then the given structure and its search problem can be said to be
+deletion decomposable. This technique for deleting records is called
+\emph{weak deletes}.
+
+\begin{definition}[Weak Deletes~\cite{overmars81}]
+\label{def:weak-delete}
+A data structure is said to support weak deletes if it provides a
+routine, \texttt{delete}, that guarantees that after $\alpha \cdot n$
+deletions, where $\alpha < 1$, the query cost is bounded by $k_\alpha
+\mathscr{Q}(n)$ for some constant $k_\alpha$ dependent only upon $\alpha$,
+where $\mathscr{Q}(n)$ is the cost of answering the query against a
+structure upon which no weak deletes were performed.\footnote{
+    This paper also provides a similar definition for weak updates,
+    but these aren't of interest to us in this work, and so the above
+    definition was adapted from the original with the weak update
+    constraints removed.
+} The results of the query of a block containing weakly deleted records
+should be the same as the results would be against a block with those
+records removed.
+\end{definition}
+
+As an example of a deletion decomposable search problem, consider the set
+membership problem considered above (Definition~\ref{def:set-membership})
+where $\mathcal{I}$, the data structure used to answer queries of the
+search problem, is a hash map.\footnote{
+  While most hash maps are already dynamic, and so wouldn't need
+  dynamization to be applied, there do exist static ones too. For example,
+  the hash map being considered could be implemented using perfect
+  hashing~\cite{perfect-hashing}, which has many static implementations.
+}
+
+\begin{theorem}
+ The set membership problem, answered using a static hash map, is
+ deletion decomposable.
+\end{theorem}
+
+\begin{proof}
+We've already shown in Lemma~\ref{lem:set-memb-dsp} that set membership
+is a decomposable search problem. For it to be deletion decomposable,
+we must demonstrate that the hash map, $\mathcal{I}$, supports deleting
+records without hurting its query performance, delete performance, or
+storage requirements. Assume that an instance $\mathscr{I} \in
+\mathcal{I}$ having $|\mathscr{I}| = n$ can answer queries in
+$\mathscr{Q}(n) \in \Theta(1)$ time and requires $\Omega(n)$ storage.
+
+Such a structure can support weak deletes. Each record within the
+structure has a single bit attached to it, indicating whether it has
+been deleted or not. These bits will require $\Theta(n)$ storage and
+be initialized to 0 when the structure is constructed. A delete can
+be performed by querying the structure for the record to be deleted in
+$\Theta(1)$ time, and setting the bit to 1 if the record is found. This
+operation has $D(n) \in \Theta(1)$ cost.
+
+\begin{lemma}
+\label{lem:weak-deletes}
+The delete procedure as described above satisfies the requirements of
+Definition~\ref{def:weak-delete} for weak deletes.
+\end{lemma}
+\begin{proof}
+Per Definition~\ref{def:weak-delete}, there must exist some constant
+dependent only on $\alpha$, $k_\alpha$, such that after $\alpha \cdot
+n$ deletes against $\mathscr{I}$ with $\alpha < 1$, the query cost is
+bounded by $\Theta(k_\alpha \mathscr{Q}(n))$.
+
+In this case, $\mathscr{Q}(n) \in \Theta(1)$, and therefore our final
+query cost must be bounded by $\Theta(k_\alpha)$. When a query is
+executed against $\mathscr{I}$, there are three possible cases,
+\begin{enumerate}
+\item The record being searched for does not exist in $\mathscr{I}$. In
+this case, the query result is 0.
+\item The record being searched for does exist in $\mathscr{I}$  and has
+a delete bit value of 0. In this case, the query result is 1.
+\item The record being searched for does exist in $\mathscr{I}$ and has
+a delete bit value of 1 (i.e., it has been deleted). In this case, the
+query result is 0.
+\end{enumerate}
+In all three cases, the addition of deletes requires only $\Theta(1)$
+extra work at most. Therefore, set membership over a static hash map
+using our proposed deletion mechanism satisfies the requirements for
+weak deletes, with $k_\alpha = 1$.
+\end{proof}
+
+Finally, we note that the cost of one of these weak deletes is $D(n)
+= \mathscr{Q}(n)$. By Lemma~\ref{lem:weak-deletes}, the delete cost is
+not asymptotically harmed by deleting records.
+
+Thus, we've shown that set membership using a static hash map is a
+decomposable search problem, the storage cost remains $\Omega(n)$ and the
+query and delete costs are unaffected by the presence of deletes using the
+proposed mechanism. All of the requirements of deletion decomposability
+are satisfied, therefore set membership using a static hash map is a
+deletion decomposable search problem.
+\end{proof}
+
+For such problems, deletes can be supported by first identifying the
+block in the dynamization containing the record to be deleted, and
+then calling $\mathtt{delete}$ on it. In order to allow this block to
+be easily located, it is possible to maintain a hash table over all
+of the records, alongside the dynamization, which maps each record
+onto the block containing it. This table must be kept up to date as
+reconstructions occur, but this can be done at no extra asymptotic cost
+for any data structures having $B(n) \in \Omega(n)$, as it requires only
+linear time. This allows for deletes to be performed in $\mathscr{D}(n)
+\in \Theta(D(n))$ time.
+
+The presence of deleted records within the structure does introduce a
+new problem, however. Over time, the number of records in each block will
+drift away from the requirements imposed by the dynamization technique. It
+will eventually become necessary to re-partition the records to restore
+these invariants, which are necessary for bounding the number of blocks,
+and thereby the query performance. The particular invariant maintenance
+rules depend upon the decomposition scheme used.
+
+\Paragraph{Bentley-Saxe Method.} When creating a BSM dynamization for
+a deletion decomposable search problem, the $i$th block where $i \geq 2$\footnote{
+ Block $i=0$ will only ever have one record, so no special maintenance must be
+ done for it. A delete will simply empty it completely.
+},
+in the absence of deletes, will contain $2^{i-1} + 1$ records. When a
+delete occurs in block $i$, no special action is taken until the number
+of records in that block falls below $2^{i-2}$. Once this threshold is
+reached, a reconstruction can be performed to restore the appropriate
+record counts in each block~\cite{merge-dsp}.
+
+\Paragraph{Equal Block Method.} For the equal block method, there are
+two cases in which a delete may cause a block to fail to obey the method's
+size invariants,
+\begin{enumerate}
+    \item If enough records are deleted, it is possible for the number
+    of blocks to exceed $f(2n)$, violating Invariant~\ref{ebm-c1}.
+    \item The deletion of records may cause the maximum size of each
+    block to shrink, causing some blocks to exceed the maximum capacity
+    of $\nicefrac{2n}{s}$. This is a violation of Invariant~\ref{ebm-c2}.
+\end{enumerate}
+In both cases, it should be noted that $n$ is decreased as records are
+deleted. Should either of these cases emerge as a result of a delete,
+the entire structure must be reconfigured to ensure that its invariants
+are maintained. This reconfiguration follows the same procedure as when
+an insert results in a violation: $s$ is updated to be exactly $f(n)$, all
+existing blocks are unbuilt, and then the records are evenly redistributed
+into the $s$ blocks~\cite{overmars-art-of-dyn}.
+
+
+\subsection{Worst-Case Optimal Techniques}
 
 
 \section{Limitations of Classical Dynamization Techniques}
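
As an illustration of the techniques this patch describes, the following is a minimal sketch, not code from the dissertation. It applies the Bentley-Saxe method to range count, with deletes handled by a ghost structure and subtraction as the inverse merge operator. The class names (`SortedArray`, `BentleySaxe`, `GhostDynamization`), the `block_sizes` helper, and the choice of a sorted array as the static structure are all assumptions made for the example. The sketch assumes that only records present in the main structure are ever deleted, and it omits the threshold-triggered re-partitioning of the ghost structure.

```python
from bisect import bisect_left, bisect_right


class SortedArray:
    """Hypothetical static structure: built once, never updated in place."""

    def __init__(self, records):
        # build: sorting gives B(n) in O(n log n)
        self.data = sorted(records)

    def unbuild(self):
        # Return the records used to construct this instance.
        return list(self.data)

    def range_count(self, lo, hi):
        # Number of records in the closed interval [lo, hi], in O(log n).
        return bisect_right(self.data, hi) - bisect_left(self.data, lo)


class BentleySaxe:
    """Logarithmic method: level i is either empty or holds 2**i records."""

    def __init__(self):
        self.levels = []  # None marks an empty level

    def insert(self, record):
        # Behaves like incrementing a binary counter: full levels are
        # unbuilt and carried upward until an empty level is found.
        carry = [record]
        for i, block in enumerate(self.levels):
            if block is None:
                self.levels[i] = SortedArray(carry)
                return
            carry.extend(block.unbuild())
            self.levels[i] = None
        self.levels.append(SortedArray(carry))

    def range_count(self, lo, hi):
        # Decomposability: partial counts are merged with addition.
        return sum(b.range_count(lo, hi) for b in self.levels if b is not None)

    def block_sizes(self):
        return [0 if b is None else len(b.data) for b in self.levels]


class GhostDynamization:
    """Deletes for an invertible search problem via a ghost structure."""

    def __init__(self):
        self.main = BentleySaxe()
        self.ghost = BentleySaxe()

    def insert(self, record):
        self.main.insert(record)

    def delete(self, record):
        # Assumption: the record is present in the main structure.
        self.ghost.insert(record)

    def range_count(self, lo, hi):
        # Inverse merge operator: subtract the ghost structure's count.
        return self.main.range_count(lo, hi) - self.ghost.range_count(lo, hi)
```

Inserting the integers 1 through 10 leaves blocks of 2 and 8 records, matching the binary representation of 10. Deleting 4 and 5 then reduces the count over [3, 7] from 5 to 3, although both records remain physically present in the main structure.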