path: root/chapters/dynamization.tex
author	Douglas Rumbaugh <dbr4@psu.edu>	2025-06-27 18:10:23 -0400
committer	Douglas Rumbaugh <dbr4@psu.edu>	2025-06-27 18:10:23 -0400
commit	692e6185988fde5e20b883ac3d9d8f0847d96958 (patch)
tree	dcaf5013ba4eff22877f8cf06de4387882c3e627 /chapters/dynamization.tex
parent	fcdbcbcd45dc567792429bb314df53b42ed9f22e (diff)
updates
Diffstat (limited to 'chapters/dynamization.tex')
-rw-r--r--	chapters/dynamization.tex	297
1 file changed, 155 insertions(+), 142 deletions(-)
diff --git a/chapters/dynamization.tex b/chapters/dynamization.tex
index 085ce65..738a436 100644
--- a/chapters/dynamization.tex
+++ b/chapters/dynamization.tex
@@ -115,16 +115,21 @@ data structures must support the following three operations,
\item $\mathbftt{query}: \left(\mathcal{I}, \mathcal{Q}\right) \to \mathcal{R}$ \\
$\mathbftt{query}(\mathscr{I}, q)$ answers the query
$F(\mathscr{I}, q)$ and returns the result. This operation runs
- in $\mathscr{Q}_S(n)$ time in the worst-case and \emph{cannot alter
- the state of $\mathscr{I}$}.
+ in $\mathscr{Q}_S(n)$ time in the worst-case and cannot alter
+ the state of $\mathscr{I}$.
-\item $\mathbftt{build}:\left(\mathcal{PS}(\mathcal{D})\right) \to \mathcal{I}$ \\
+\item $\mathbftt{build}:\mathcal{PS}(\mathcal{D}) \to \mathcal{I}$ \\
$\mathbftt{build}(d)$ constructs a new instance of $\mathcal{I}$
using the records in set $d$. This operation runs in $B(n)$ time in
- the worst case.
-
-\item $\mathbftt{unbuild}\left(\mathcal{I}\right) \to \mathcal{PS}(\mathcal{D})$ \\
- $\mathbftt{unbuild}(\mathscr{I})$ recovers the set of records, $d$
+ the worst case.\footnote{
+ We use the notation $\mathcal{PS}(\mathcal{D})$ to indicate the
+ power set of $\mathcal{D}$, i.e. the set containing all possible
+ subsets of $\mathcal{D}$. Thus, $d \in \mathcal{PS}(\mathcal{D})
+ \iff d \subseteq \mathcal{D}$.
+ }
+
+\item $\mathbftt{unbuild}: \mathcal{I} \to \mathcal{PS}(\mathcal{D})$ \\
+ $\mathbftt{unbuild}(\mathscr{I})$ recovers the set of records, $d$,
used to construct $\mathscr{I}$. The literature on dynamization
generally assumes that this operation runs in $\Theta(1)$
time~\cite{saxe79}, and we will adopt the same assumption in our
@@ -133,38 +138,41 @@ data structures must support the following three operations,
\end{definition}
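+
+As a minimal illustration of this interface, consider a sorted array
+answering one-dimensional range-count queries. The sketch below is
+purely illustrative; the class and method names are not part of the
+formalism, and its $\mathbftt{unbuild}$ copies the records in linear
+time rather than the constant time assumed above.
+\begin{verbatim}
+import bisect
+
+class SortedArray:
+    """A static structure: the record layout is fixed at build time."""
+    def __init__(self, records):          # build: O(n log n)
+        self._data = sorted(records)
+
+    def query(self, q):
+        """Range count F(I, q) for q = (lo, hi); does not alter I."""
+        lo, hi = q
+        return (bisect.bisect_right(self._data, hi)
+                - bisect.bisect_left(self._data, lo))
+
+    def unbuild(self):
+        """Recover the record set used to construct the structure."""
+        return list(self._data)
+\end{verbatim}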
-Note that the term static is distinct from immutable. Static refers
-to the layout of records within the data structure, whereas immutable
-refers to the data stored within those records. This distinction will
-become relevant when we discuss different techniques for adding delete
-support to data structures. The data structures used are always static,
-but not necessarily immutable, because the records may contain header
-information (like visibility) that is updated in place.
+Note that the property of being static is distinct from that of being
+immutable. Static refers to the layout of records within the data
+structure, whereas immutable refers to the data stored within those
+records. This distinction will become relevant when we discuss different
+techniques for adding delete support to data structures. The data
+structures used are always static, but not necessarily immutable,
+because the records may contain header information (like visibility)
+that is updated in place.
\begin{definition}[Half-dynamic Data Structure~\cite{overmars-art-of-dyn}]
\label{def:half-dynamic-ds}
A half-dynamic data structure requires the three operations of a static
-data structure, as well as the ability to efficiently insert new data into
-a structure built over an existing data set, $d$.
+data structure, as well as the ability to efficiently insert new data
+into a structure built over an existing data set.
\begin{itemize}
\item $\mathbftt{insert}: \left(\mathcal{I}, \mathcal{D}\right) \to \mathcal{I}$ \\
$\mathbftt{insert}(\mathscr{I}, r)$ returns a data structure,
$\mathscr{I}^\prime$, such that $\mathbftt{query}(\mathscr{I}^\prime,
- q) = F(d \cup r, q)$, for some $r \in \mathcal{D}$. This operation
- runs in $I(n)$ time in the worst-case.
+ q) = F(\mathbftt{unbuild}(\mathscr{I}) \cup \{r\}, q)$, for some
+ $r \in \mathcal{D}$. This operation runs in $I(n)$ time in the
+ worst-case.
\end{itemize}
\end{definition}
The important aspect of insertion in this model is that the effect of
-the new record on the query result is observed, not necessarily that
-the result is a structure exactly identical to the one that would be
-obtained by building a new structure over $d \cup r$. Also, though the
-formalism used implies a functional operation where the original data
-structure is unmodified, this is not actually a requirement. $\mathscr{I}$
-could be sightly modified in place, and returned as $\mathscr{I}^\prime$,
-as is conventionally done with native dynamic data structures.
+the new record on the query result is observed, not necessarily that the
+result is a structure exactly identical to the one that would be obtained
+by building a new structure over $\mathbftt{unbuild}(\mathscr{I}) \cup
+\{r\}$. Also, though the formalism used implies a functional operation
+where the original data structure is unmodified, this is not actually
+a requirement. $\mathscr{I}$ could be slightly modified in place, and
+returned as $\mathscr{I}^\prime$, as is conventionally done with native
+dynamic data structures.
\begin{definition}[Full-dynamic Data Structure~\cite{overmars-art-of-dyn}]
\label{def:full-dynamic-ds}
@@ -175,7 +183,7 @@ has support for deleting records from the dataset.
\item $\mathbftt{delete}: \left(\mathcal{I}, \mathcal{D}\right) \to \mathcal{I}$ \\
$\mathbftt{delete}(\mathscr{I}, r)$ returns a data structure, $\mathscr{I}^\prime$,
such that $\mathbftt{query}(\mathscr{I}^\prime,
- q) = F(d - r, q)$, for some $r \in \mathcal{D}$. This operation
+ q) = F(\mathbftt{unbuild}(\mathscr{I}) - \{r\}, q)$, for some $r \in \mathcal{D}$. This operation
runs in $D(n)$ time in the worst-case.
\end{itemize}
@@ -199,7 +207,7 @@ data structures cannot be statically queried--the act of querying them
mutates their state. This is the case for structures like heaps, stacks,
and queues, for example.
-\section{Decomposition-based Dynamization}
+\section{Dynamization Basics}
\emph{Dynamization} is the process of transforming a static data structure
into a dynamic one. When certain conditions are satisfied by the data
@@ -226,10 +234,10 @@ section discusses techniques that are more general, and don't require
workload-specific assumptions. For more detail than is included in
this section, Overmars wrote a book providing a comprehensive survey of
techniques for creating dynamic data structures, including not only the
-dynamization techniques discussed here, but also local reconstruction
-based techniques and more~\cite{overmars83}.\footnote{
- Sadly, this book isn't readily available in
- digital format as of the time of writing.
+techniques discussed here, but also local reconstruction based techniques
+and more~\cite{overmars83}.\footnote{
+ Sadly, this book isn't readily available in digital format as of
+ the time of writing.
}
@@ -271,30 +279,34 @@ requiring $B(n)$ time, it will require $B(\sqrt{n})$ time.}
\end{figure}
The problem with global reconstruction is that each insert or delete
-must rebuild the entire data structure, involving all of its records. The
-key insight, first discussed by Bentley and Saxe~\cite{saxe79}, is that
-the cost associated with global reconstruction can be reduced by be
-accomplished by \emph{decomposing} the data structure into multiple,
-smaller structures, each built from a disjoint partition of the data.
-These smaller structures are called \emph{blocks}. It is possible to
-devise decomposition schemes that result in asymptotic improvements
-of insertion performance when compared to global reconstruction alone.
+must rebuild the entire data structure. The key insight that enables
+dynamization based on global reconstruction, first discussed by
+Bentley and Saxe~\cite{saxe79}, is that the cost associated with global
+reconstruction can be reduced by \emph{decomposing} the data structure
+into multiple, smaller structures, called \emph{blocks}, each built from
+a disjoint partition of the data. The process by which the structure is
+broken into blocks is called a decomposition method, and various methods
+have been proposed that result in asymptotic improvements of insertion
+performance when compared to global reconstruction alone.
\begin{example}[Data Structure Decomposition]
-Consider a data structure that can be constructed in $B(n) \in \Theta
-(n \log n)$ time with $|\mathscr{I}| = n$. Inserting a new record into
-this structure using global reconstruction will require $I(n) \in \Theta
-(n \log n)$ time. However, if the data structure is decomposed into
-blocks, such that each block contains $\Theta(\sqrt{n)})$ records, as shown
-in Figure~\ref{fig:bg-decomp}, then only a single block must be reconstructed
-to accommodate the insert, requiring $I(n) \in \Theta(\sqrt{n} \log \sqrt{n})$ time.
+Consider a data structure that can be constructed in $B(n) \in \Theta (n
+\log n)$ time with $|\mathscr{I}| = n$. Inserting a new record into this
+structure using global reconstruction will require $I(n) \in \Theta (n
+\log n)$ time. However, if the data structure is decomposed into blocks,
+such that each block contains $\Theta(\sqrt{n})$ records, as shown in
+Figure~\ref{fig:bg-decomp}, then only a single block must be reconstructed
+to accommodate the insert, requiring $I(n) \in \Theta(\sqrt{n} \log
+\sqrt{n})$ time. If this structure contains $m = \frac{n}{\sqrt{n}} = \sqrt{n}$
+blocks, we represent it with the notation $\mathscr{I} = \{\mathscr{I}_1,
+\ldots, \mathscr{I}_m\}$, where $\mathscr{I}_i$ is the $i$th block.
\end{example}
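+
+To put rough numbers to this example (an illustrative calculation only,
+taking $B(n) = n \log_2 n$ exactly), let $n = 2^{20}$,
+\begin{align*}
+B(n) &= 2^{20} \cdot 20 \approx 2.1 \times 10^{7} \\
+B(\sqrt{n}) &= 2^{10} \cdot 10 \approx 1.0 \times 10^{4}
+\end{align*}
+so a single-block reconstruction performs roughly three orders of
+magnitude less work than a full rebuild, at the cost of maintaining
+$m = 2^{10}$ blocks.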
Much of the existing work on dynamization has considered different
-approaches to decomposing data structures, and the effects that these
-approaches have on insertion and query performance. However, before we can
-discuss these approaches, we must first address the problem of answering
-search problems over these decomposed structures.
+decomposition methods for static data structures, and the effects that
+these methods have on insertion and query performance. However, before
+we can discuss these approaches, we must first address the problem of
+answering search problems over these decomposed structures.
\subsection{Decomposable Search Problems}
@@ -335,12 +347,12 @@ search problems},
for all $A, B \in \mathcal{PS}(\mathcal{D})$ where $A \cap B = \emptyset$.
\end{definition}
-\Paragraph{Examples.} To demonstrate that a search problem is
-decomposable, it is necessary to show the existence of the merge operator,
-$\mergeop$, with the necessary properties, and to show that $F(A \cup
-B, q) = F(A, q)~ \mergeop ~F(B, q)$. With these two results, induction
-demonstrates that the problem is decomposable even in cases with more
-than two partial results.
+\subsubsection{Examples}
+To demonstrate that a search problem is decomposable, it is necessary to
+prove the existence of the merge operator, $\mergeop$, with the necessary
+properties, and to show that $F(A \cup B, q) = F(A, q)~ \mergeop ~F(B,
+q)$. With these two results, induction demonstrates that the problem is
+decomposable even in cases with more than two partial results.
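+
+Computationally, this inductive argument corresponds to folding
+$\mergeop$ across the list of partial results. A sketch (the helper
+names are illustrative and assume $\mergeop$ is supplied as an ordinary
+binary function):
+\begin{verbatim}
+from functools import reduce
+
+def merge_partials(partials, merge_op):
+    """Combine partial results F(A_1, q), ..., F(A_m, q) pairwise."""
+    return reduce(merge_op, partials)  # assumes at least one partial result
+\end{verbatim}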
As an example, consider the range counting problem, which seeks to
identify the number of elements in a set of 1-dimensional points that
@@ -395,13 +407,14 @@ taking $\nicefrac{s}{c}$. Therefore, calculating the average of a set
of numbers is a DSP.
\end{proof}
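+
+A sketch of this construction (the function names are illustrative and
+not part of the proof): each partial result is an $(s, c)$ pair, and
+the merge operator combines two pairs in constant time.
+\begin{verbatim}
+def avg_partial(block):
+    """F(d_i, q): the per-block partial result as a (sum, count) pair."""
+    return (sum(block), len(block))
+
+def avg_merge(a, b):
+    """The constant-time merge operator: add sums and counts."""
+    return (a[0] + b[0], a[1] + b[1])
+
+# the final answer is s / c once all partial results are merged
+s, c = avg_merge(avg_partial([1, 2, 3]), avg_partial([4, 5]))
+assert s / c == 3.0
+\end{verbatim}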
-\Paragraph{Answering Queries for DSPs.} Queries for a decomposable
-search problem can be answered over a decomposed structure by
-individually querying each block, and then merging the results together
-using $\mergeop$. In many cases, this process will introduce some
-overhead in the query cost. Given a decomposed data structure $\mathscr{I}
-= \{\mathscr{I}_1, \mathscr{I}_2, \ldots, \mathscr{I}_m\}$,
-a query for a $C(n)$-decomposable search problem can be answered using,
+\subsubsection{Answering Queries for DSPs}
+Queries for a decomposable search problem can be answered over a
+decomposed structure by individually querying each block, and then merging
+the results together using $\mergeop$. In many cases, this process
+will introduce some overhead in the query cost. Given a decomposed
+data structure $\mathscr{I} = \{\mathscr{I}_1, \mathscr{I}_2, \ldots,
+\mathscr{I}_m\}$, a query for a $C(n)$-decomposable search problem can
+be answered using,
\begin{equation*}
\mathbftt{query}\left(\mathscr{I}, q\right) \triangleq \bigmergeop_{i=1}^{m} F(\mathscr{I}_i, q)
\end{equation*}
@@ -420,11 +433,11 @@ better. Under certain circumstances, the costs of querying multiple
blocks can be absorbed, resulting in no worst-case overhead, at least
asymptotically. As an example, consider a linear scan of the data running
in $\Theta(n)$ time. In this case, every record must be considered,
-and so there isn't any performance penalty\footnote{
+and so there isn't any performance penalty to breaking the records into
+multiple blocks and scanning them individually.\footnote{
From an asymptotic perspective. There will still be measurable
performance effects from caching, etc., even in this case.
-} to breaking the records out into multiple chunks and scanning them
-individually. More formally, for any query running in $\mathscr{Q}_S(n) \in
+} More formally, for any query running in $\mathscr{Q}_S(n) \in
\Omega\left(n^\epsilon\right)$ time where $\epsilon > 0$, the worst-case
cost of answering a decomposable search problem from a decomposed
structure is $\Theta\left(\mathscr{Q}_S(n)\right)$.~\cite{saxe79}
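+
+As an illustrative special case (not the general argument
+from~\cite{saxe79}), take $\mathscr{Q}_S(n) \in \Theta(n)$ over the
+logarithmic decomposition of Section~\ref{ssec:bsm}, whose block sizes
+grow geometrically. The total cost of scanning every block is then
+\begin{equation*}
+\sum_{i} \mathscr{Q}_S\left(|\mathscr{I}_i|\right) \in
+\Theta\left(1 + 2 + 4 + \cdots + 2^{\lceil \log_2 n \rceil - 1}\right)
+= \Theta(n) = \Theta\left(\mathscr{Q}_S(n)\right).
+\end{equation*}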
@@ -445,37 +458,38 @@ half-dynamic data structures, and the next section will discuss similar
considerations for full-dynamic structures.
Of the decomposition techniques, we will focus on the three most important
-from a practical standpoint.\footnote{
- There are, in effect, two main methods for decomposition:
+methods.\footnote{
+ There are two main classes of method for decomposition:
decomposing based on some counting scheme (logarithmic and
$k$-binomial)~\cite{saxe79} or decomposing into equally sized blocks
(equal block method)~\cite{overmars-art-of-dyn}. Other, more complex,
methods do exist, but they are largely compositions of these two
- simpler ones. These composed decompositions (heh) are of largely
- theoretical interest, as they are sufficiently complex to be of
- questionable practical utility.~\cite{overmars83}
-} The earliest of these is the logarithmic method, often called the
-Bentley-Saxe method in modern literature, and is the most commonly
-discussed technique today. The logarithmic method has been directly
-applied in a few instances in the literature, such as to metric indexing
-structures~\cite{naidan14} and spatial structures~\cite{bkdtree},
+ simpler ones. These decompositions are of largely theoretical
+ interest, as they are sufficiently complex to be of questionable
+ practical utility.~\cite{overmars83}
+} The earliest of these is the logarithmic method~\cite{saxe79}, often
+called the Bentley-Saxe method in modern literature, and is the most
+commonly discussed technique today. The logarithmic method has been
+directly applied in a few instances in the literature, such as to metric
+indexing structures~\cite{naidan14} and spatial structures~\cite{bkdtree},
and has also been used in a modified form for genetic sequence search
structures~\cite{almodaresi23} and graphs~\cite{lsmgraph}, to cite
a few examples. Bentley and Saxe also proposed a second approach, the
$k$-binomial method, that slightly alters the exact decomposition approach
-used by the logarithmic method to allow for flexibility in whether the
-performance of inserts or queries should be favored. A later technique,
-the equal block method, was also developed, which also seeks to introduce
-a mechanism for performance tuning. Of the three, the logarithmic method
-is the most generally effective, and we have not identified any specific
-applications of either $k$-binomial decomposition or the equal block method
-outside of the theoretical literature.
+used by the logarithmic method to allow for flexibility in whether
+the performance of inserts or queries should be favored~\cite{saxe79}.
+A later technique, the equal block method~\cite{overmars-art-of-dyn},
+likewise seeks to introduce a mechanism for
+performance tuning. Of the three, the logarithmic method is the most
+generally effective, and we have not identified any specific applications
+of either $k$-binomial decomposition or the equal block method outside
+of the theoretical literature.
\subsection{The Logarithmic Method}
\label{ssec:bsm}
The original, and most frequently used, decomposition technique is the
-logarithmic method, also called Bentley-Saxe method (BSM) in more recent
+logarithmic method, also called the Bentley-Saxe method in more recent
literature. This technique decomposes the structure into logarithmically
many blocks of exponentially increasing size. More specifically, the
data structure is decomposed into $h = \lceil \log_2 n \rceil$ blocks,
@@ -483,16 +497,16 @@ $\mathscr{I}_1, \mathscr{I}_2, \ldots, \mathscr{I}_h$. A given block
$\mathscr{I}_i$ will be either empty, or contain exactly $2^{i-1}$ records
within it.
-The procedure for inserting a record, $r \in \mathcal{D}$, into
-a logarithmic decomposition is as follows. If the block $\mathscr{I}_1$
-is empty, then $\mathscr{I}_1 = \mathbftt{build}{\{r\}}$. If it is not
-empty, then there will exist a maximal sequence of non-empty blocks
-$\mathscr{I}_1, \mathscr{I}_1, \ldots, \mathscr{I}_i$ for some $i \geq
+The procedure for inserting a record, $r \in \mathcal{D}$, into a
+logarithmic decomposition is as follows. If the block $\mathscr{I}_1$
+is empty, then $\mathscr{I}_1 = \mathbftt{build}(\{r\})$. If it is
+not empty, then there will exist a maximal sequence of non-empty
+blocks $\mathscr{I}_1, \ldots, \mathscr{I}_i$ for some $i \geq
1$, terminated by an empty block $\mathscr{I}_{i+1}$. In this case,
$\mathscr{I}_{i+1}$ is set to $\mathbftt{build}(\{r\} \cup \bigcup_{l=1}^i
\mathbftt{unbuild}(\mathscr{I}_l))$ and blocks $\mathscr{I}_1$ through
-$\mathscr{I}_i$ are emptied. New empty blocks can be freely added to the
-end of the structure as needed.
+$\mathscr{I}_i$ are emptied. New empty blocks can be freely added to
+the end of the structure as needed.
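+
+A sketch of this procedure, with the block list represented as an
+ordinary array and \texttt{build} and \texttt{unbuild} standing in for
+the operations defined above (the names are illustrative):
+\begin{verbatim}
+def bsm_insert(blocks, r, build, unbuild):
+    """Insert r into a logarithmic decomposition.
+    blocks[0] is the smallest block; None marks an empty block."""
+    records = [r]
+    i = 0
+    # collect the maximal prefix of non-empty blocks, emptying them
+    while i < len(blocks) and blocks[i] is not None:
+        records.extend(unbuild(blocks[i]))
+        blocks[i] = None
+        i += 1
+    if i == len(blocks):       # add a new empty block if needed
+        blocks.append(None)
+    blocks[i] = build(records)  # one reconstruction absorbs all records
+\end{verbatim}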
%FIXME: switch the x's to r's for consistency
\begin{figure}
@@ -509,18 +523,20 @@ and $2$ to be merged, along with the $r_{12}$, to create the new block.
\label{fig:bsm-example}
\end{figure}
+\begin{example}[Insertion into a Logarithmic Decomposition]
Figure~\ref{fig:bsm-example} demonstrates this insertion procedure. The
-dynamization is built over a set of records $x_1, x_2, \ldots,
-x_{10}$ initially, with eight records in $\mathscr{I}_4$ and two in
-$\mathscr{I}_2$. The first new record, $x_{11}$, is inserted directly
-into $\mathscr{I}_1$. For the next insert following this, $x_{12}$, the
+dynamization is built over a set of records $r_1, r_2, \ldots,
+r_{10}$ initially, with eight records in $\mathscr{I}_4$ and two in
+$\mathscr{I}_2$. The first new record, $r_{11}$, is inserted directly
+into $\mathscr{I}_1$. For the next insert following this, $r_{12}$, the
first empty block is $\mathscr{I}_3$, and so the insert is performed by
-doing $\mathscr{I}_3 = \text{build}\left(\{x_{12}\} \cup
+doing $\mathscr{I}_3 = \text{build}\left(\{r_{12}\} \cup
\text{unbuild}(\mathscr{I}_1) \cup \text{unbuild}(\mathscr{I}_2)\right)$
and then emptying $\mathscr{I}_1$ and $\mathscr{I}_2$.
+\end{example}
-This technique is called a \emph{binary decomposition} of the data
-structure. Considering a logarithmic decomposition of a structure
+This technique is also called a \emph{binary decomposition} of the
+data structure. Considering a logarithmic decomposition of a structure
containing $n$ records, labeling each block with a $0$ if it is empty and
a $1$ if it is full will result in the binary representation of $n$. For
example, the final state of the structure in Figure~\ref{fig:bsm-example}
@@ -529,8 +545,8 @@ in $0\text{b}1100$, which is $12$ in binary. Inserts affect this
representation of the structure in the same way that incrementing the
binary number by $1$ does.
-By applying this method to a data structure, a dynamized structure can
-be created with the following performance characteristics,
+By applying this method to a static data structure, a half-dynamic
+structure can be created with the following performance characteristics,
\begin{align*}
\text{Amortized Insertion Cost:}&\quad I_A(n) \in \Theta\left(\frac{B(n)}{n}\cdot \log_2 n\right) \\
\text{Worst Case Insertion Cost:}&\quad I(n) \in \Theta\left(B(n)\right) \\
@@ -568,12 +584,11 @@ entire structure is compacted into a single block.
One of the significant limitations of the logarithmic method is that it
is incredibly rigid. In our earlier discussion of decomposition we noted
that there exists a clear trade-off between insert and query performance
-for half-dynamic structures mediate by the number of blocks into which
-the structure is decomposed. However, the logarithmic method does not
-allow any navigation of this trade-off. In their original paper on the
-topic, Bentley and Saxe proposed a different decomposition scheme that
-does expose this trade-off, however, which they called the $k$-binomial
-transform.~\cite{saxe79}
+for half-dynamic structures mediated by the number of blocks into which the
+structure is decomposed. However, the logarithmic method does not allow
+any navigation of this trade-off. In their original paper on the topic,
+Bentley and Saxe proposed a different decomposition scheme that does
+expose this trade-off, called the $k$-binomial transform.~\cite{saxe79}
In this transform, rather than decomposing the data structure based on
powers of two, the structure is decomposed based on a sum of $k$ binomial
@@ -772,17 +787,17 @@ which such a data structure exists is called a \emph{merge decomposable
search problem} (MDSP)~\cite{merge-dsp}.
Note that in~\cite{merge-dsp}, Overmars considers a \emph{very} specific
-definition where the data structure is built in two stages. An initial
-sorting phase, requiring $O(n \log n)$ time, and then a construction
-phase requiring $O(n)$ time. Overmars's proposed mechanism for leveraging
-this property is to include with each block a linked list storing the
-records in sorted order (presumably to account for structures where the
-records must be sorted, but aren't necessarily kept that way). During
-reconstructions, these sorted lists can first be merged, and then the
-data structure built from the resulting merged list. Using this approach,
-even accounting for the merging of the list, he is able to prove that
-the amortized insertion cost is less than would have been the case paying
-the $O( n \log n)$ cost for each reconstruction.~\cite{merge-dsp}
+definition where the data structure is built in two stages: an initial
+sorting phase, requiring $O(n \log n)$ time, and then a construction phase
+requiring $O(n)$ time. Overmars's proposed mechanism for leveraging this
+property attaches a linked list to each block, which stores the records
+in sorted order (to account for structures where the records must be
+sorted, but aren't necessarily kept that way). During reconstructions,
+these sorted lists can first be merged, and then the data structure built
+from the resulting merged list. Using this approach, even accounting
+for the merging of the list, he is able to prove that the amortized
+insertion cost is less than would have been the case paying the $O(
+n \log n)$ cost for each reconstruction.~\cite{merge-dsp}
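+
+A sketch of this style of reconstruction (the names are placeholders,
+and the per-block linked list is represented here as an ordinary
+sorted list):
+\begin{verbatim}
+import heapq
+
+def rebuild_via_merge(sorted_lists, build_from_sorted):
+    """Merge the blocks' sorted record lists in O(n log k), then build
+    the new block in O(n), avoiding a fresh O(n log n) sort."""
+    merged = list(heapq.merge(*sorted_lists))
+    return build_from_sorted(merged), merged  # keep the list for later merges
+\end{verbatim}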
While Overmars's definition for MDSP does capture a large number of
mergeable data structures (including all of the mergeable structures
@@ -793,12 +808,12 @@ built from an unsorted set of records. More formally,
\begin{definition}[Merge Decomposable Search Problem~\cite{merge-dsp}]
\label{def:mdsp}
A search problem $F: (\mathcal{D}, \mathcal{Q}) \to \mathcal{R}$
- is decomposable if and only if there exists a solution to the
+ is merge decomposable if and only if there exists a solution to the
search problem (i.e., a data structure) that is static, and also
supports the operation,
\begin{itemize}
\item $\mathbftt{merge}: \mathcal{I}^k \to \mathcal{I}$ \\
- $\mathbftt{merge}(\mathscr{I}_1, \ldots \mathscr{I}_k)$ returns a
+ $\mathbftt{merge}(\mathscr{I}_1, \ldots, \mathscr{I}_k)$ returns a
static data structure, $\mathscr{I}^\prime$, constructed
from the input data structures, with cost $B_M(n, k) \leq B(n)$,
such that for any set of search parameters $q$,
@@ -812,8 +827,8 @@ The value of $k$ can be upper-bounded by the decomposition technique
used. For example, in the logarithmic method there will be $\log n$
structures to merge in the worst case, and so to gain benefit from the
merge routine, the merging of $\log n$ structures must be less expensive
-than building the new structure using the standard $\mathtt{unbuild}$
-and $\mathtt{build}$ mechanism. Note that the availability of an efficient merge
+than building the new structure using the standard $\mathbftt{unbuild}$
+and $\mathbftt{build}$ mechanism. The availability of an efficient merge
operation isn't helpful in the equal block method, which doesn't
perform data structure merges.\footnote{
In the equal block method, all reconstructions are due to either
@@ -860,8 +875,8 @@ additionally appear in a new structure as well.
When inserting into this structure, the algorithm first examines every
level, $i$. If both $Older_{i-1}$ and $Oldest_{i-1}$ are full, then the
algorithm will execute $\frac{B(2^i)}{2^i}$ steps of the algorithm
-to construct $New_i$ from $\text{unbuild}(Older_{i-1}) \cup
-\text{unbuild}(Oldest_{i-1})$. Once enough inserts have been performed
+to construct $New_i$ from $\mathbftt{unbuild}(Older_{i-1}) \cup
+\mathbftt{unbuild}(Oldest_{i-1})$. Once enough inserts have been performed
to completely build some block, $New_i$, the source blocks for the
reconstruction, $Oldest_{i-1}$ and $Older_{i-1}$ are deleted, $Old_{i-1}$
becomes $Oldest_{i-1}$, and $New_i$ is assigned to the oldest empty block
@@ -880,18 +895,16 @@ worst-case bound drops to $I(n) \in \Theta\left(\frac{B(n)}{n}\right)$.
\label{ssec:dyn-deletes}
Full-dynamic structures are those with support for deleting records,
-as well as inserting. As it turns out, supporting deletes efficiently
-is significantly more challenging than inserts, but there are some
-results in the theoretical literature for efficient delete support in
-restricted cases.
-
-While, as discussed earlier, it is in principle possible to support
-deletes using global reconstruction, with the operation defined as
+as well as inserting. As it turns out, supporting deletes efficiently is
+significantly more challenging than inserts, but there are some results
+in the theoretical literature for efficient delete support in restricted
+cases. In principle, it is possible to support deletes using
+global reconstruction, with the operation defined as,
\begin{equation*}
\mathbftt{delete}(\mathscr{I}, r) \triangleq \mathbftt{build}(\mathbftt{unbuild}(\mathscr{I}) - \{r\})
\end{equation*}
-the extension of this procedure to a decomposed data structure is less
-than trivial. Unlike inserts, where the record can (in principle) be
+However, the extension of this procedure to a decomposed data structure is
+less than trivial. Unlike inserts, where the record can (in principle) be
placed into whatever block we like, deletes must be applied specifically
to the block containing the record. As a result, there must be a means to
locate the block containing a specified record before it can be deleted.
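+
+As an illustration of the difficulty, the most direct extension simply
+searches every block for the record and rebuilds the block that contains
+it (a sketch only; locating the record this way already requires
+examining the blocks' contents):
+\begin{verbatim}
+def naive_delete(blocks, r, build, unbuild):
+    """Delete r by rebuilding only the block that contains it."""
+    for i, block in enumerate(blocks):
+        if block is None:
+            continue
+        records = unbuild(block)
+        if r in records:
+            records.remove(r)
+            blocks[i] = build(records) if records else None
+            return True
+    return False  # r was not present in any block
+\end{verbatim}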
@@ -940,7 +953,7 @@ exists a constant time computable operator, $\Delta$, such that
\begin{equation*}
F(A - B, q) = F(A, q)~\Delta~F(B, q)
\end{equation*}
-for all $A, B \in \mathcal{PS}(\mathcal{D})$ where $A \cap B = \emptyset$.
+for all $A, B \in \mathcal{PS}(\mathcal{D})$.
\end{definition}
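+
+Range counting is one such problem: taking $\Delta$ to be ordinary
+subtraction, and considering the case that arises for deletes (where
+the removed records are actually present, $B \subseteq A$),
+\begin{equation*}
+F(A - B, q) = \bigl|\{x \in A - B : x \in q\}\bigr| = F(A, q) - F(B, q),
+\end{equation*}
+since every point of $B$ counted by $F(B, q)$ is also counted by $F(A, q)$.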
Given a search problem with this property, it is possible to emulate
@@ -1305,15 +1318,15 @@ the $k$ nearest elements,
This can be thought of as solving the nearest-neighbor problem $k$ times,
each time removing the returned result from $D$ prior to solving the
problem again. Unlike the single nearest-neighbor case (which can be
-thought of as k-NN with $k=1$), this problem is \emph{not} decomposable.
+thought of as $k$-NN with $k=1$), this problem is \emph{not} decomposable.
\begin{theorem}
- k-NN is not a decomposable search problem.
+ $k$-NN is not a decomposable search problem.
\end{theorem}
\begin{proof}
To prove this, consider the query $KNN(D, q, k)$ against some partitioned
-dataset $D = D_1 \cup D_2 \ldots \cup D_\ell$. If k-NN is decomposable,
+dataset $D = D_1 \cup D_2 \cup \ldots \cup D_\ell$. If $k$-NN is decomposable,
then there must exist some constant-time, commutative, and associative
binary operator $\mergeop$, such that $R = \mergeop_{1 \leq i \leq l}
R_i$ where $R_i$ is the result of evaluating the query $KNN(D_i, q,
@@ -1321,22 +1334,22 @@ k)$. Consider the evaluation of the merge operator against two arbitrary
result sets, $R = R_i \mergeop R_j$. It is clear that $|R| = |R_i| =
|R_j| = k$, and that the contents of $R$ must be the $k$ records from
$R_i \cup R_j$ that are nearest to $q$. Thus, $\mergeop$ must solve the
-problem $KNN(R_i \cup R_j, q, k)$. However, k-NN cannot be solved in $O(1)$
-time. Therefore, k-NN is not a decomposable search problem.
+problem $KNN(R_i \cup R_j, q, k)$. However, $k$-NN cannot be solved in $O(1)$
+time. Therefore, $k$-NN is not a decomposable search problem.
\end{proof}
With that said, it is clear that there isn't any fundamental restriction
preventing the merging of the result sets; it is only the case that an
arbitrary performance requirement wouldn't be satisfied. It is possible
to merge the result sets in non-constant time, and so it is the case
-that k-NN is $C(n)$-decomposable. Unfortunately, this classification
+that $k$-NN is $C(n)$-decomposable. Unfortunately, this classification
brings with it a reduction in query performance as a result of the way
result merges are performed.
As a concrete example of these costs, consider using the logarithmic
method to extend the VPTree~\cite{vptree}. The VPTree is a static,
-metric index capable of answering k-NN queries in $KNN(D, q, k) \in O(k
-\log n)$. One possible merge algorithm for k-NN would be to push all
+metric index capable of answering $k$-NN queries in $KNN(D, q, k) \in O(k
+\log n)$. One possible merge algorithm for $k$-NN would be to push all
of the elements in the two arguments onto a min-heap, and then pop off
the first $k$. In this case, the cost of the merge operation would be
$C(k) = k \log k$. Were $k$ assumed to be constant, then the operation
@@ -1346,7 +1359,7 @@ general. Evaluating the total query cost for the extended structure,
this would yield,
\begin{equation}
- k-NN(D, q, k) \in O\left(k\log n \left(\log n + \log k\right) \right)
+ KNN(D, q, k) \in O\left(k\log n \left(\log n + \log k\right) \right)
\end{equation}
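+
+One way to implement the merge described above (a sketch only;
+\texttt{dist} and the list representation of result sets are
+assumptions, not part of the VPTree interface):
+\begin{verbatim}
+import heapq
+
+def knn_merge(r_i, r_j, q, k, dist):
+    """Keep the k results nearest to q from two partial result sets;
+    costs O(k log k) when |r_i| = |r_j| = k."""
+    return heapq.nsmallest(k, r_i + r_j, key=lambda x: dist(x, q))
+\end{verbatim}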
The reason for this large increase in cost is the repeated application