path: root/chapters/dynamization.tex
author	Douglas Rumbaugh <dbr4@psu.edu>	2025-06-27 18:10:23 -0400
committer	Douglas Rumbaugh <dbr4@psu.edu>	2025-06-27 18:10:23 -0400
commit	692e6185988fde5e20b883ac3d9d8f0847d96958 (patch)
tree	dcaf5013ba4eff22877f8cf06de4387882c3e627 /chapters/dynamization.tex
parent	fcdbcbcd45dc567792429bb314df53b42ed9f22e (diff)
updates
Diffstat (limited to 'chapters/dynamization.tex')
-rw-r--r--	chapters/dynamization.tex	297
1 file changed, 155 insertions(+), 142 deletions(-)
diff --git a/chapters/dynamization.tex b/chapters/dynamization.tex
index 085ce65..738a436 100644
--- a/chapters/dynamization.tex
+++ b/chapters/dynamization.tex
@@ -115,16 +115,21 @@ data structures must support the following three operations,
\item $\mathbftt{query}: \left(\mathcal{I}, \mathcal{Q}\right) \to \mathcal{R}$ \\
$\mathbftt{query}(\mathscr{I}, q)$ answers the query
$F(\mathscr{I}, q)$ and returns the result. This operation runs
- in $\mathscr{Q}_S(n)$ time in the worst-case and \emph{cannot alter
- the state of $\mathscr{I}$}.
+ in $\mathscr{Q}_S(n)$ time in the worst-case and cannot alter
+ the state of $\mathscr{I}$.
-\item $\mathbftt{build}:\left(\mathcal{PS}(\mathcal{D})\right) \to \mathcal{I}$ \\
+\item $\mathbftt{build}:\mathcal{PS}(\mathcal{D}) \to \mathcal{I}$ \\
$\mathbftt{build}(d)$ constructs a new instance of $\mathcal{I}$
using the records in set $d$. This operation runs in $B(n)$ time in
- the worst case.
-
-\item $\mathbftt{unbuild}\left(\mathcal{I}\right) \to \mathcal{PS}(\mathcal{D})$ \\
- $\mathbftt{unbuild}(\mathscr{I})$ recovers the set of records, $d$
+ the worst case.\footnote{
+ We use the notation $\mathcal{PS}(\mathcal{D})$ to indicate the
+ power set of $\mathcal{D}$, i.e. the set containing all possible
+ subsets of $\mathcal{D}$. Thus, $d \in \mathcal{PS}(\mathcal{D})
+ \iff d \subseteq \mathcal{D}$.
+ }
+
+\item $\mathbftt{unbuild}: \mathcal{I} \to \mathcal{PS}(\mathcal{D})$ \\
+ $\mathbftt{unbuild}(\mathscr{I})$ recovers the set of records, $d$,
used to construct $\mathscr{I}$. The literature on dynamization
generally assumes that this operation runs in $\Theta(1)$
time~\cite{saxe79}, and we will adopt the same assumption in our
@@ -133,38 +138,41 @@ data structures must support the following three operations,
\end{definition}
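+
+As a minimal illustration of this interface, consider a sorted array
+answering one-dimensional range-count queries. The sketch below is
+purely illustrative; the class and method names are not part of the
+formalism, and its $\mathbftt{unbuild}$ copies the records in linear
+time rather than the constant time assumed above.
+\begin{verbatim}
+import bisect
+
+class SortedArray:
+    """A static structure: the record layout is fixed at build time."""
+    def __init__(self, records):          # build: O(n log n)
+        self._data = sorted(records)
+
+    def query(self, q):
+        """Range count F(I, q) for q = (lo, hi); does not alter I."""
+        lo, hi = q
+        return (bisect.bisect_right(self._data, hi)
+                - bisect.bisect_left(self._data, lo))
+
+    def unbuild(self):
+        """Recover the record set used to construct the structure."""
+        return list(self._data)
+\end{verbatim}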
-Note that the term static is distinct from immutable. Static refers
-to the layout of records within the data structure, whereas immutable
-refers to the data stored within those records. This distinction will
-become relevant when we discuss different techniques for adding delete
-support to data structures. The data structures used are always static,
-but not necessarily immutable, because the records may contain header
-information (like visibility) that is updated in place.
+Note that the property of being static is distinct from that of being
+immutable. Static refers to the layout of records within the data
+structure, whereas immutable refers to the data stored within those
+records. This distinction will become relevant when we discuss different
+techniques for adding delete support to data structures. The data
+structures used are always static, but not necessarily immutable,
+because the records may contain header information (like visibility)
+that is updated in place.
\begin{definition}[Half-dynamic Data Structure~\cite{overmars-art-of-dyn}]
\label{def:half-dynamic-ds}
A half-dynamic data structure requires the three operations of a static
-data structure, as well as the ability to efficiently insert new data into
-a structure built over an existing data set, $d$.
+data structure, as well as the ability to efficiently insert new data
+into a structure built over an existing data set.
\begin{itemize}
\item $\mathbftt{insert}: \left(\mathcal{I}, \mathcal{D}\right) \to \mathcal{I}$ \\
$\mathbftt{insert}(\mathscr{I}, r)$ returns a data structure,
$\mathscr{I}^\prime$, such that $\mathbftt{query}(\mathscr{I}^\prime,
- q) = F(d \cup r, q)$, for some $r \in \mathcal{D}$. This operation
- runs in $I(n)$ time in the worst-case.
+ q) = F(\mathbftt{unbuild}(\mathscr{I}) \cup \{r\}, q)$, for some
+ $r \in \mathcal{D}$. This operation runs in $I(n)$ time in the
+ worst-case.
\end{itemize}
\end{definition}
The important aspect of insertion in this model is that the effect of
-the new record on the query result is observed, not necessarily that
-the result is a structure exactly identical to the one that would be
-obtained by building a new structure over $d \cup r$. Also, though the
-formalism used implies a functional operation where the original data
-structure is unmodified, this is not actually a requirement. $\mathscr{I}$
-could be sightly modified in place, and returned as $\mathscr{I}^\prime$,
-as is conventionally done with native dynamic data structures.
+the new record on the query result is observed, not necessarily that the
+result is a structure exactly identical to the one that would be obtained
+by building a new structure over $\mathbftt{unbuild}(\mathscr{I}) \cup
+\{r\}$. Also, though the formalism used implies a functional operation
+where the original data structure is unmodified, this is not actually
+a requirement. $\mathscr{I}$ could be slightly modified in place, and
+returned as $\mathscr{I}^\prime$, as is conventionally done with native
+dynamic data structures.
\begin{definition}[Full-dynamic Data Structure~\cite{overmars-art-of-dyn}]
\label{def:full-dynamic-ds}
@@ -175,7 +183,7 @@ has support for deleting records from the dataset.
\item $\mathbftt{delete}: \left(\mathcal{I}, \mathcal{D}\right) \to \mathcal{I}$ \\
$\mathbftt{delete}(\mathscr{I}, r)$ returns a data structure, $\mathscr{I}^\prime$,
such that $\mathbftt{query}(\mathscr{I}^\prime,
- q) = F(d - r, q)$, for some $r \in \mathcal{D}$. This operation
+ q) = F(\mathbftt{unbuild}(\mathscr{I}) - \{r\}, q)$, for some $r \in \mathcal{D}$. This operation
runs in $D(n)$ time in the worst-case.
\end{itemize}
@@ -199,7 +207,7 @@ data structures cannot be statically queried--the act of querying them
mutates their state. This is the case for structures like heaps, stacks,
and queues, for example.
-\section{Decomposition-based Dynamization}
+\section{Dynamization Basics}
\emph{Dynamization} is the process of transforming a static data structure
into a dynamic one. When certain conditions are satisfied by the data
@@ -226,10 +234,10 @@ section discusses techniques that are more general, and don't require
workload-specific assumptions. For more detail than is included in
this section, Overmars wrote a book providing a comprehensive survey of
techniques for creating dynamic data structures, including not only the
-dynamization techniques discussed here, but also local reconstruction
-based techniques and more~\cite{overmars83}.\footnote{
- Sadly, this book isn't readily available in
- digital format as of the time of writing.
+techniques discussed here, but also local reconstruction based techniques
+and more~\cite{overmars83}.\footnote{
+ Sadly, this book isn't readily available in digital format as of
+ the time of writing.
}
@@ -271,30 +279,34 @@ requiring $B(n)$ time, it will require $B(\sqrt{n})$ time.}
\end{figure}
The problem with global reconstruction is that each insert or delete
-must rebuild the entire data structure, involving all of its records. The
-key insight, first discussed by Bentley and Saxe~\cite{saxe79}, is that
-the cost associated with global reconstruction can be reduced by be
-accomplished by \emph{decomposing} the data structure into multiple,
-smaller structures, each built from a disjoint partition of the data.
-These smaller structures are called \emph{blocks}. It is possible to
-devise decomposition schemes that result in asymptotic improvements
-of insertion performance when compared to global reconstruction alone.
+must rebuild the entire data structure. The key insight that enables
+dynamization based on global reconstruction, first discussed by
+Bentley and Saxe~\cite{saxe79}, is that the cost associated with global
+reconstruction can be reduced by \emph{decomposing} the data structure
+into multiple, smaller structures, called \emph{blocks}, each built from
+a disjoint partition of the data. The process by which the structure is
+broken into blocks is called a decomposition method, and various methods
+have been proposed that result in asymptotic improvements of insertion
+performance when compared to global reconstruction alone.
\begin{example}[Data Structure Decomposition]
-Consider a data structure that can be constructed in $B(n) \in \Theta
-(n \log n)$ time with $|\mathscr{I}| = n$. Inserting a new record into
-this structure using global reconstruction will require $I(n) \in \Theta
-(n \log n)$ time. However, if the data structure is decomposed into
-blocks, such that each block contains $\Theta(\sqrt{n)})$ records, as shown
-in Figure~\ref{fig:bg-decomp}, then only a single block must be reconstructed
-to accommodate the insert, requiring $I(n) \in \Theta(\sqrt{n} \log \sqrt{n})$ time.
+Consider a data structure that can be constructed in $B(n) \in \Theta (n
+\log n)$ time with $|\mathscr{I}| = n$. Inserting a new record into this
+structure using global reconstruction will require $I(n) \in \Theta (n
+\log n)$ time. However, if the data structure is decomposed into blocks,
+such that each block contains $\Theta(\sqrt{n})$ records, as shown in
+Figure~\ref{fig:bg-decomp}, then only a single block must be reconstructed
+to accommodate the insert, requiring $I(n) \in \Theta(\sqrt{n} \log
+\sqrt{n})$ time. If this structure contains $m = \frac{n}{\sqrt{n}} = \sqrt{n}$
+blocks, we represent it with the notation $\mathscr{I} = \{\mathscr{I}_1,
+\ldots, \mathscr{I}_m\}$, where $\mathscr{I}_i$ is the $i$th block.
\end{example}
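+
+To put rough numbers to this example (an illustrative calculation only,
+taking $B(n) = n \log_2 n$ exactly), let $n = 2^{20}$,
+\begin{align*}
+B(n) &= 2^{20} \cdot 20 \approx 2.1 \times 10^{7} \\
+B(\sqrt{n}) &= 2^{10} \cdot 10 \approx 1.0 \times 10^{4}
+\end{align*}
+so a single-block reconstruction performs roughly three orders of
+magnitude less work than a full rebuild, at the cost of maintaining
+$m = 2^{10}$ blocks.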
Much of the existing work on dynamization has considered different
-approaches to decomposing data structures, and the effects that these
-approaches have on insertion and query performance. However, before we can
-discuss these approaches, we must first address the problem of answering
-search problems over these decomposed structures.
+decomposition methods for static data structures, and the effects that
+these methods have on insertion and query performance. However, before
+we can discuss these approaches, we must first address the problem of
+answering search problems over these decomposed structures.
\subsection{Decomposable Search Problems}
@@ -335,12 +347,12 @@ search problems},
for all $A, B \in \mathcal{PS}(\mathcal{D})$ where $A \cap B = \emptyset$.
\end{definition}
-\Paragraph{Examples.} To demonstrate that a search problem is
-decomposable, it is necessary to show the existence of the merge operator,
-$\mergeop$, with the necessary properties, and to show that $F(A \cup
-B, q) = F(A, q)~ \mergeop ~F(B, q)$. With these two results, induction
-demonstrates that the problem is decomposable even in cases with more
-than two partial results.
+\subsubsection{Examples}
+To demonstrate that a search problem is decomposable, it is necessary to
+prove the existence of the merge operator, $\mergeop$, with the necessary
+properties, and to show that $F(A \cup B, q) = F(A, q)~ \mergeop ~F(B,
+q)$. With these two results, induction demonstrates that the problem is
+decomposable even in cases with more than two partial results.
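+
+Computationally, this inductive argument corresponds to folding
+$\mergeop$ across the list of partial results. A sketch (the helper
+names are illustrative and assume $\mergeop$ is supplied as an ordinary
+binary function):
+\begin{verbatim}
+from functools import reduce
+
+def merge_partials(partials, merge_op):
+    """Combine partial results F(A_1, q), ..., F(A_m, q) pairwise."""
+    return reduce(merge_op, partials)  # assumes at least one partial result
+\end{verbatim}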
As an example, consider the range counting problem, which seeks to
identify the number of elements in a set of 1-dimensional points that
@@ -395,13 +407,14 @@ taking $\nicefrac{s}{c}$. Therefore, calculating the average of a set
of numbers is a DSP.
\end{proof}
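+
+A sketch of this construction (the function names are illustrative and
+not part of the proof): each partial result is an $(s, c)$ pair, and
+the merge operator combines two pairs in constant time.
+\begin{verbatim}
+def avg_partial(block):
+    """F(d_i, q): the per-block partial result as a (sum, count) pair."""
+    return (sum(block), len(block))
+
+def avg_merge(a, b):
+    """The constant-time merge operator: add sums and counts."""
+    return (a[0] + b[0], a[1] + b[1])
+
+# the final answer is s / c once all partial results are merged
+s, c = avg_merge(avg_partial([1, 2, 3]), avg_partial([4, 5]))
+assert s / c == 3.0
+\end{verbatim}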
-\Paragraph{Answering Queries for DSPs.} Queries for a decomposable
-search problem can be answered over a decomposed structure by
-individually querying each block, and then merging the results together
-using $\mergeop$. In many cases, this process will introduce some
-overhead in the query cost. Given a decomposed data structure $\mathscr{I}
-= \{\mathscr{I}_1, \mathscr{I}_2, \ldots, \mathscr{I}_m\}$,
-a query for a $C(n)$-decomposable search problem can be answered using,
+\subsubsection{Answering Queries for DSPs}
+Queries for a decomposable search problem can be answered over a
+decomposed structure by individually querying each block, and then merging
+the results together using $\mergeop$. In many cases, this process
+will introduce some overhead in the query cost. Given a decomposed
+data structure $\mathscr{I} = \{\mathscr{I}_1, \mathscr{I}_2, \ldots,
+\mathscr{I}_m\}$, a query for a $C(n)$-decomposable search problem can
+be answered using,
\begin{equation*}
\mathbftt{query}\left(\mathscr{I}, q\right) \triangleq \bigmergeop_{i=1}^{m} F(\mathscr{I}_i, q)
\end{equation*}
@@ -420,11 +433,11 @@ better. Under certain circumstances, the costs of querying multiple
blocks can be absorbed, resulting in no worst-case overhead, at least
asymptotically. As an example, consider a linear scan of the data running
in $\Theta(n)$ time. In this case, every record must be considered,
-and so there isn't any performance penalty\footnote{
+and so there isn't any performance penalty to breaking the records into
+multiple blocks and scanning them individually.\footnote{
From an asymptotic perspective. There will still be measurable
performance effects from caching, etc., even in this case.
-} to breaking the records out into multiple chunks and scanning them
-individually. More formally, for any query running in $\mathscr{Q}_S(n) \in
+} More formally, for any query running in $\mathscr{Q}_S(n) \in
\Omega\left(n^\epsilon\right)$ time where $\epsilon > 0$, the worst-case
cost of answering a decomposable search problem from a decomposed
structure is $\Theta\left(\mathscr{Q}_S(n)\right)$.~\cite{saxe79}
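+
+As an illustrative special case (not the general argument
+from~\cite{saxe79}), take $\mathscr{Q}_S(n) \in \Theta(n)$ over the
+logarithmic decomposition of Section~\ref{ssec:bsm}, whose block sizes
+grow geometrically. The total cost of scanning every block is then
+\begin{equation*}
+\sum_{i} \mathscr{Q}_S\left(|\mathscr{I}_i|\right) \in
+\Theta\left(1 + 2 + 4 + \cdots + 2^{\lceil \log_2 n \rceil - 1}\right)
+= \Theta(n) = \Theta\left(\mathscr{Q}_S(n)\right).
+\end{equation*}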
@@ -445,37 +458,38 @@ half-dynamic data structures, and the next section will discuss similar
considerations for full-dynamic structures.
Of the decomposition techniques, we will focus on the three most important
-from a practical standpoint.\footnote{
- There are, in effect, two main methods for decomposition:
+methods.\footnote{
+ There are two main classes of method for decomposition:
decomposing based on some counting scheme (logarithmic and
$k$-binomial)~\cite{saxe79} or decomposing into equally sized blocks
(equal block method)~\cite{overmars-art-of-dyn}. Other, more complex,
methods do exist, but they are largely compositions of these two
- simpler ones. These composed decompositions (heh) are of largely
- theoretical interest, as they are sufficiently complex to be of
- questionable practical utility.~\cite{overmars83}
-} The earliest of these is the logarithmic method, often called the
-Bentley-Saxe method in modern literature, and is the most commonly
-discussed technique today. The logarithmic method has been directly
-applied in a few instances in the literature, such as to metric indexing
-structures~\cite{naidan14} and spatial structures~\cite{bkdtree},
+ simpler ones. These decompositions are of largely theoretical
+ interest, as they are sufficiently complex to be of questionable
+ practical utility.~\cite{overmars83}
+} The earliest of these is the logarithmic method~\cite{saxe79}, often
+called the Bentley-Saxe method in modern literature, and is the most
+commonly discussed technique today. The logarithmic method has been
+directly applied in a few instances in the literature, such as to metric
+indexing structures~\cite{naidan14} and spatial structures~\cite{bkdtree},
and has also been used in a modified form for genetic sequence search
structures~\cite{almodaresi23} and graphs~\cite{lsmgraph}, to cite
a few examples. Bentley and Saxe also proposed a second approach, the
$k$-binomial method, that slightly alters the exact decomposition approach
-used by the logarithmic method to allow for flexibility in whether the
-performance of inserts or queries should be favored. A later technique,
-the equal block method, was also developed, which also seeks to introduce
-a mechanism for performance tuning. Of the three, the logarithmic method
-is the most generally effective, and we have not identified any specific
-applications of either $k$-binomial decomposition or the equal block method
-outside of the theoretical literature.
+used by the logarithmic method to allow for flexibility in whether
+the performance of inserts or queries should be favored~\cite{saxe79}.
+A later technique, the equal block method~\cite{overmars-art-of-dyn},
+likewise seeks to introduce a mechanism for
+performance tuning. Of the three, the logarithmic method is the most
+generally effective, and we have not identified any specific applications
+of either $k$-binomial decomposition or the equal block method outside
+of the theoretical literature.
\subsection{The Logarithmic Method}
\label{ssec:bsm}
The original, and most frequently used, decomposition technique is the
-logarithmic method, also called Bentley-Saxe method (BSM) in more recent
+logarithmic method, also called the Bentley-Saxe method in more recent
literature. This technique decomposes the structure into logarithmically
many blocks of exponentially increasing size. More specifically, the
data structure is decomposed into $h = \lceil \log_2 n \rceil$ blocks,
@@ -483,16 +497,16 @@ $\mathscr{I}_1, \mathscr{I}_2, \ldots, \mathscr{I}_h$. A given block
$\mathscr{I}_i$ will be either empty, or contain exactly $2^{i-1}$ records
within it.
-The procedure for inserting a record, $r \in \mathcal{D}$, into
-a logarithmic decomposition is as follows. If the block $\mathscr{I}_1$
-is empty, then $\mathscr{I}_1 = \mathbftt{build}{\{r\}}$. If it is not
-empty, then there will exist a maximal sequence of non-empty blocks
-$\mathscr{I}_1, \mathscr{I}_1, \ldots, \mathscr{I}_i$ for some $i \geq
+The procedure for inserting a record, $r \in \mathcal{D}$, into a
+logarithmic decomposition is as follows. If the block $\mathscr{I}_1$
+is empty, then $\mathscr{I}_1 = \mathbftt{build}(\{r\})$. If it is
+not empty, then there will exist a maximal sequence of non-empty
+blocks $\mathscr{I}_1, \ldots, \mathscr{I}_i$ for some $i \geq
1$, terminated by an empty block $\mathscr{I}_{i+1}$. In this case,
$\mathscr{I}_{i+1}$ is set to $\mathbftt{build}(\{r\} \cup \bigcup_{l=1}^i
\mathbftt{unbuild}(\mathscr{I}_l))$ and blocks $\mathscr{I}_1$ through
-$\mathscr{I}_i$ are emptied. New empty blocks can be freely added to the
-end of the structure as needed.
+$\mathscr{I}_i$ are emptied. New empty blocks can be freely added to
+the end of the structure as needed.
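+
+A sketch of this procedure, with the block list represented as an
+ordinary array and \texttt{build} and \texttt{unbuild} standing in for
+the operations defined above (the names are illustrative):
+\begin{verbatim}
+def bsm_insert(blocks, r, build, unbuild):
+    """Insert r into a logarithmic decomposition.
+    blocks[0] is the smallest block; None marks an empty block."""
+    records = [r]
+    i = 0
+    # collect the maximal prefix of non-empty blocks, emptying them
+    while i < len(blocks) and blocks[i] is not None:
+        records.extend(unbuild(blocks[i]))
+        blocks[i] = None
+        i += 1
+    if i == len(blocks):       # add a new empty block if needed
+        blocks.append(None)
+    blocks[i] = build(records)  # one reconstruction absorbs all records
+\end{verbatim}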
%FIXME: switch the x's to r's for consistency
\begin{figure}
@@ -509,18 +523,20 @@ and $2$ to be merged, along with the $r_{12}$, to create the new block.
\label{fig:bsm-example}
\end{figure}
+\begin{example}[Insertion into a Logarithmic Decomposition]
Figure~\ref{fig:bsm-example} demonstrates this insertion procedure. The
-dynamization is built over a set of records $x_1, x_2, \ldots,
-x_{10}$ initially, with eight records in $\mathscr{I}_4$ and two in
-$\mathscr{I}_2$. The first new record, $x_{11}$, is inserted directly
-into $\mathscr{I}_1$. For the next insert following this, $x_{12}$, the
+dynamization is built over a set of records $r_1, r_2, \ldots,
+r_{10}$ initially, with eight records in $\mathscr{I}_4$ and two in
+$\mathscr{I}_2$. The first new record, $r_{11}$, is inserted directly
+into $\mathscr{I}_1$. For the next insert following this, $r_{12}$, the
first empty block is $\mathscr{I}_3$, and so the insert is performed by
-doing $\mathscr{I}_3 = \text{build}\left(\{x_{12}\} \cup
+doing $\mathscr{I}_3 = \text{build}\left(\{r_{12}\} \cup
\text{unbuild}(\mathscr{I}_1) \cup \text{unbuild}(\mathscr{I}_2)\right)$
and then emptying $\mathscr{I}_1$ and $\mathscr{I}_2$.
+\end{example}
-This technique is called a \emph{binary decomposition} of the data
-structure. Considering a logarithmic decomposition of a structure
+This technique is also called a \emph{binary decomposition} of the
+data structure. Considering a logarithmic decomposition of a structure
containing $n$ records, labeling each block with a $0$ if it is empty and
a $1$ if it is full will result in the binary representation of $n$. For
example, the final state of the structure in Figure~\ref{fig:bsm-example}
@@ -529,8 +545,8 @@ in $0\text{b}1100$, which is $12$ in binary. Inserts affect this
representation of the structure in the same way that incrementing the
binary number by $1$ does.
-By applying this method to a data structure, a dynamized structure can
-be created with the following performance characteristics,
+By applying this method to a static data structure, a half-dynamic
+structure can be created with the following performance characteristics,
\begin{align*}
\text{Amortized Insertion Cost:}&\quad I_A(n) \in \Theta\left(\frac{B(n)}{n}\cdot \log_2 n\right) \\
\text{Worst Case Insertion Cost:}&\quad I(n) \in \Theta\left(B(n)\right) \\
@@ -568,12 +584,11 @@ entire structure is compacted into a single block.
One of the significant limitations of the logarithmic method is that it
is incredibly rigid. In our earlier discussion of decomposition we noted
that there exists a clear trade-off between insert and query performance
-for half-dynamic structures mediate by the number of blocks into which
-the structure is decomposed. However, the logarithmic method does not
-allow any navigation of this trade-off. In their original paper on the
-topic, Bentley and Saxe proposed a different decomposition scheme that
-does expose this trade-off, however, which they called the $k$-binomial
-transform.~\cite{saxe79}
+for half-dynamic structures mediated by the number of blocks into which the
+structure is decomposed. However, the logarithmic method does not allow
+any navigation of this trade-off. In their original paper on the topic,
+Bentley and Saxe proposed a different decomposition scheme that does
+expose this trade-off, called the $k$-binomial transform.~\cite{saxe79}
In this transform, rather than decomposing the data structure based on
powers of two, the structure is decomposed based on a sum of $k$ binomial
@@ -772,17 +787,17 @@ which such a data structure exists is called a \emph{merge decomposable
search problem} (MDSP)~\cite{merge-dsp}.
Note that in~\cite{merge-dsp}, Overmars considers a \emph{very} specific
-definition where the data structure is built in two stages. An initial
-sorting phase, requiring $O(n \log n)$ time, and then a construction
-phase requiring $O(n)$ time. Overmars's proposed mechanism for leveraging
-this property is to include with each block a linked list storing the
-records in sorted order (presumably to account for structures where the
-records must be sorted, but aren't necessarily kept that way). During
-reconstructions, these sorted lists can first be merged, and then the
-data structure built from the resulting merged list. Using this approach,
-even accounting for the merging of the list, he is able to prove that
-the amortized insertion cost is less than would have been the case paying
-the $O( n \log n)$ cost for each reconstruction.~\cite{merge-dsp}
+definition where the data structure is built in two stages: an initial
+sorting phase, requiring $O(n \log n)$ time, and then a construction phase
+requiring $O(n)$ time. Overmars's proposed mechanism for leveraging this
+property attaches a linked list to each block, which stores the records
+in sorted order (to account for structures where the records must be
+sorted, but aren't necessarily kept that way). During reconstructions,
+these sorted lists can first be merged, and then the data structure built
+from the resulting merged list. Using this approach, even accounting
+for the merging of the list, he is able to prove that the amortized
+insertion cost is less than would have been the case paying the $O(
+n \log n)$ cost for each reconstruction.~\cite{merge-dsp}
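+
+A sketch of this style of reconstruction (the names are placeholders,
+and the per-block linked list is represented here as an ordinary
+sorted list):
+\begin{verbatim}
+import heapq
+
+def rebuild_via_merge(sorted_lists, build_from_sorted):
+    """Merge the blocks' sorted record lists in O(n log k), then build
+    the new block in O(n), avoiding a fresh O(n log n) sort."""
+    merged = list(heapq.merge(*sorted_lists))
+    return build_from_sorted(merged), merged  # keep the list for later merges
+\end{verbatim}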
While Overmars's definition for MDSP does capture a large number of
mergeable data structures (including all of the mergeable structures
@@ -793,12 +808,12 @@ built from an unsorted set of records. More formally,
\begin{definition}[Merge Decomposable Search Problem~\cite{merge-dsp}]
\label{def:mdsp}
A search problem $F: (\mathcal{D}, \mathcal{Q}) \to \mathcal{R}$
- is decomposable if and only if there exists a solution to the
+ is merge decomposable if and only if there exists a solution to the
search problem (i.e., a data structure) that is static, and also
supports the operation,
\begin{itemize}
\item $\mathbftt{merge}: \mathcal{I}^k \to \mathcal{I}$ \\
- $\mathbftt{merge}(\mathscr{I}_1, \ldots \mathscr{I}_k)$ returns a
+ $\mathbftt{merge}(\mathscr{I}_1, \ldots, \mathscr{I}_k)$ returns a
static data structure, $\mathscr{I}^\prime$, constructed
from the input data structures, with cost $B_M(n, k) \leq B(n)$,
such that for any set of search parameters $q$,
@@ -812,8 +827,8 @@ The value of $k$ can be upper-bounded by the decomposition technique
used. For example, in the logarithmic method there will be $\log n$
structures to merge in the worst case, and so to gain benefit from the
merge routine, the merging of $\log n$ structures must be less expensive
-than building the new structure using the standard $\mathtt{unbuild}$
-and $\mathtt{build}$ mechanism. Note that the availability of an efficient merge
+than building the new structure using the standard $\mathbftt{unbuild}$
+and $\mathbftt{build}$ mechanism. The availability of an efficient merge
operation isn't helpful in the equal block method, which doesn't
perform data structure merges.\footnote{
In the equal block method, all reconstructions are due to either
@@ -860,8 +875,8 @@ additionally appear in a new structure as well.
When inserting into this structure, the algorithm first examines every
level, $i$. If both $Older_{i-1}$ and $Oldest_{i-1}$ are full, then the
algorithm will execute $\frac{B(2^i)}{2^i}$ steps of the algorithm
-to construct $New_i$ from $\text{unbuild}(Older_{i-1}) \cup
-\text{unbuild}(Oldest_{i-1})$. Once enough inserts have been performed
+to construct $New_i$ from $\mathbftt{unbuild}(Older_{i-1}) \cup
+\mathbftt{unbuild}(Oldest_{i-1})$. Once enough inserts have been performed
to completely build some block, $New_i$, the source blocks for the
reconstruction, $Oldest_{i-1}$ and $Older_{i-1}$ are deleted, $Old_{i-1}$
becomes $Oldest_{i-1}$, and $New_i$ is assigned to the oldest empty block
@@ -880,18 +895,16 @@ worst-case bound drops to $I(n) \in \Theta\left(\frac{B(n)}{n}\right)$.
\label{ssec:dyn-deletes}
Full-dynamic structures are those with support for deleting records,
-as well as inserting. As it turns out, supporting deletes efficiently
-is significantly more challenging than inserts, but there are some
-results in the theoretical literature for efficient delete support in
-restricted cases.
-
-While, as discussed earlier, it is in principle possible to support
-deletes using global reconstruction, with the operation defined as
+as well as inserting. As it turns out, supporting deletes efficiently is
+significantly more challenging than inserts, but there are some results
+in the theoretical literature for efficient delete support in restricted
+cases. In principle, it is possible to support deletes using
+global reconstruction, with the operation defined as,
\begin{equation*}
\mathbftt{delete}(\mathscr{I}, r) \triangleq \mathbftt{build}(\mathbftt{unbuild}(\mathscr{I}) - \{r\})
\end{equation*}
-the extension of this procedure to a decomposed data structure is less
-than trivial. Unlike inserts, where the record can (in principle) be
+However, the extension of this procedure to a decomposed data structure is
+less than trivial. Unlike inserts, where the record can (in principle) be
placed into whatever block we like, deletes must be applied specifically
to the block containing the record. As a result, there must be a means to
locate the block containing a specified record before it can be deleted.
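+
+As an illustration of the difficulty, the most direct extension simply
+searches every block for the record and rebuilds the block that contains
+it (a sketch only; locating the record this way already requires
+examining the blocks' contents):
+\begin{verbatim}
+def naive_delete(blocks, r, build, unbuild):
+    """Delete r by rebuilding only the block that contains it."""
+    for i, block in enumerate(blocks):
+        if block is None:
+            continue
+        records = unbuild(block)
+        if r in records:
+            records.remove(r)
+            blocks[i] = build(records) if records else None
+            return True
+    return False  # r was not present in any block
+\end{verbatim}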
@@ -940,7 +953,7 @@ exists a constant time computable operator, $\Delta$, such that
\begin{equation*}
F(A - B, q) = F(A, q)~\Delta~F(B, q)
\end{equation*}
-for all $A, B \in \mathcal{PS}(\mathcal{D})$ where $A \cap B = \emptyset$.
+for all $A, B \in \mathcal{PS}(\mathcal{D})$.
\end{definition}
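+
+Range counting is one such problem: taking $\Delta$ to be ordinary
+subtraction, and considering the case that arises for deletes (where
+the removed records are actually present, $B \subseteq A$),
+\begin{equation*}
+F(A - B, q) = \bigl|\{x \in A - B : x \in q\}\bigr| = F(A, q) - F(B, q),
+\end{equation*}
+since every point of $B$ counted by $F(B, q)$ is also counted by $F(A, q)$.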
Given a search problem with this property, it is possible to emulate
@@ -1305,15 +1318,15 @@ the $k$ nearest elements,
This can be thought of as solving the nearest-neighbor problem $k$ times,
each time removing the returned result from $D$ prior to solving the
problem again. Unlike the single nearest-neighbor case (which can be
-thought of as k-NN with $k=1$), this problem is \emph{not} decomposable.
+thought of as $k$-NN with $k=1$), this problem is \emph{not} decomposable.
\begin{theorem}
- k-NN is not a decomposable search problem.
+ $k$-NN is not a decomposable search problem.
\end{theorem}
\begin{proof}
To prove this, consider the query $KNN(D, q, k)$ against some partitioned
-dataset $D = D_1 \cup D_2 \ldots \cup D_\ell$. If k-NN is decomposable,
+dataset $D = D_1 \cup D_2 \cup \ldots \cup D_\ell$. If $k$-NN is decomposable,
then there must exist some constant-time, commutative, and associative
binary operator $\mergeop$, such that $R = \mergeop_{1 \leq i \leq l}
R_i$ where $R_i$ is the result of evaluating the query $KNN(D_i, q,
@@ -1321,22 +1334,22 @@ k)$. Consider the evaluation of the merge operator against two arbitrary
result sets, $R = R_i \mergeop R_j$. It is clear that $|R| = |R_i| =
|R_j| = k$, and that the contents of $R$ must be the $k$ records from
$R_i \cup R_j$ that are nearest to $q$. Thus, $\mergeop$ must solve the
-problem $KNN(R_i \cup R_j, q, k)$. However, k-NN cannot be solved in $O(1)$
-time. Therefore, k-NN is not a decomposable search problem.
+problem $KNN(R_i \cup R_j, q, k)$. However, $k$-NN cannot be solved in $O(1)$
+time. Therefore, $k$-NN is not a decomposable search problem.
\end{proof}
With that said, it is clear that there isn't any fundamental restriction
preventing the merging of the result sets; it is only the case that an
arbitrary performance requirement wouldn't be satisfied. It is possible
to merge the result sets in non-constant time, and so it is the case
-that k-NN is $C(n)$-decomposable. Unfortunately, this classification
+that $k$-NN is $C(n)$-decomposable. Unfortunately, this classification
brings with it a reduction in query performance as a result of the way
result merges are performed.
As a concrete example of these costs, consider using the logarithmic
method to extend the VPTree~\cite{vptree}. The VPTree is a static,
-metric index capable of answering k-NN queries in $KNN(D, q, k) \in O(k
-\log n)$. One possible merge algorithm for k-NN would be to push all
+metric index capable of answering $k$-NN queries in $KNN(D, q, k) \in O(k
+\log n)$. One possible merge algorithm for $k$-NN would be to push all
of the elements in the two arguments onto a min-heap, and then pop off
the first $k$. In this case, the cost of the merge operation would be
$C(k) = k \log k$. Were $k$ assumed to be constant, then the operation
@@ -1346,7 +1359,7 @@ general. Evaluating the total query cost for the extended structure,
this would yield,
\begin{equation}
- k-NN(D, q, k) \in O\left(k\log n \left(\log n + \log k\right) \right)
+ KNN(D, q, k) \in O\left(k\log n \left(\log n + \log k\right) \right)
\end{equation}
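+
+One way to implement the merge described above (a sketch only;
+\texttt{dist} and the list representation of result sets are
+assumptions, not part of the VPTree interface):
+\begin{verbatim}
+import heapq
+
+def knn_merge(r_i, r_j, q, k, dist):
+    """Keep the k results nearest to q from two partial result sets;
+    costs O(k log k) when |r_i| = |r_j| = k."""
+    return heapq.nsmallest(k, r_i + r_j, key=lambda x: dist(x, q))
+\end{verbatim}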
The reason for this large increase in cost is the repeated application