Diffstat (limited to 'chapters/design-space.tex')
-rw-r--r--  chapters/design-space.tex  31
1 file changed, 16 insertions(+), 15 deletions(-)
diff --git a/chapters/design-space.tex b/chapters/design-space.tex
index cfea75e..f788a7a 100644
--- a/chapters/design-space.tex
+++ b/chapters/design-space.tex
@@ -53,7 +53,7 @@ of metric indices~\cite{naidan14,bkdtree}, a class of data structure
and search problem for which we saw very good performance from standard
Bentley-Saxe in Section~\ref{ssec:dyn-knn-exp}. The other experiments
in Chapter~\ref{chap:framework} show that, for other types of problem,
-the technique does not fair quite so well.
+the technique does not fare quite so well.
\section{Asymptotic Analysis}
\label{sec:design-asymp}
@@ -61,7 +61,7 @@ the technique does not fair quite so well.
Before beginning with derivations for
the cost functions of dynamized structures within the context of our
proposed design space, we should make a few comments about the assumptions
-and techniques that we will us in our analysis. As this design space
+and techniques that we will use in our analysis. As this design space
involves adjusting constants, we will leave the design-space-related
constants within our asymptotic expressions. Additionally, we will
perform the analysis for a simple decomposable search problem. Deletes
@@ -89,9 +89,9 @@ decided to maintain the core concept of binary decomposition. One interesting
mathematical property of a Bentley-Saxe dynamization is that the internal
layout of levels exactly matches the binary representation of the record
count contained within the index. For example, a dynamization containing
-$n=20$ records will have 4 records in the third level, and 16 in the fourth,
+$n=20$ records will have 4 records in the third level, and 16 in the fifth,
with all other levels being empty. If we represent a full level with a 1
-and an empty level with a 0, then we'd have $1100$, which is $20$ in
+and an empty level with a 0, then we'd have $10100$, which is $20$ in
base 2.
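
As a concrete illustration of this correspondence, the following minimal Python sketch (illustrative only; the bsm_layout helper and the 0-indexed level numbering are our own, with the buffer abstracted away) recovers the occupied levels of a standard Bentley-Saxe dynamization directly from the bits of $n$:

def bsm_layout(n):
    # Level i (0-indexed) holds 2**i records exactly when bit i of n is set,
    # and is empty otherwise.
    return {i: 1 << i for i in range(n.bit_length()) if (n >> i) & 1}

# n = 20 = 0b10100: 4 records in the capacity-4 level and 16 in the
# capacity-16 level (the "third" and "fifth" levels counting from one).
print(bin(20), bsm_layout(20))   # 0b10100 {2: 4, 4: 16}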
\begin{algorithm}
@@ -129,7 +129,7 @@ $\mathscr{I}_{target} \gets \text{build}(\text{unbuild}(\mathscr{I}_0) \cup \ld
Our generalization, then, is to represent the data as an $s$-ary
decomposition, where the scale factor represents the base of the
representation. To accomplish this, we set the capacity of level $i$ to
-be $N_B (s - 1) \cdot s^i$, where $N_b$ is the size of the buffer. The
+be $N_B (s - 1) \cdot s^i$, where $N_B$ is the size of the buffer. The
resulting structure will have at most $\log_s n$ shards. This layout
policy is described in Algorithm~\ref{alg:design-bsm}.
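
To make the capacity arithmetic behind this policy concrete, here is a small Python sketch (the level_capacity and levels_needed helpers are ours; buffer flushing and per-level shard counts are not modeled). The capacities form a geometric series, so the cumulative capacity of levels $0$ through $k-1$ telescopes to $N_B(s^k - 1)$, which is what keeps the number of levels logarithmic in $n$:

def level_capacity(i, s, n_b):
    # Capacity of level i under the s-ary generalization: N_B * (s - 1) * s**i.
    return n_b * (s - 1) * s**i

def levels_needed(n, s, n_b):
    # Smallest k such that levels 0..k-1 can hold n records; the cumulative
    # capacity is N_B * (s**k - 1), so k grows roughly as log_s(n / N_B).
    k, total = 0, 0
    while total < n:
        total += level_capacity(k, s, n_b)
        k += 1
    return k

# e.g. s = 4, N_B = 1000: capacities 3000, 12000, 48000, 192000, ...
print([level_capacity(i, 4, 1000) for i in range(4)])
print(levels_needed(10**6, 4, 1000))   # 5, since 1000 * (4**5 - 1) >= 10**6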
@@ -192,7 +192,7 @@ analysis. The worst-case cost of a reconstruction is $B(n)$, and there
are $\log_s(n)$ total levels, so the total reconstruction costs associated
with a record can be upper-bounded by $B(n) \cdot
\frac{W(\log_s(n))}{n}$, with this cost amortized over the $n$
-insertions necessary to get the record into the last level. We'lll also
+insertions necessary to get the record into the last level. We'll also
condense the multiplicative constants and drop the additive ones to more
clearly represent the relationship we're looking to show. This results
in an amortized insertion cost of,
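
The role of the $\log_s(n)$ factor can also be checked empirically: under the standard binary ($s=2$) scheme, a record is rewritten at most once per level it passes through. A rough simulation sketch in Python (the max_rebuilds helper and its list-based bookkeeping are assumptions of ours, using a one-record buffer):

import math

def max_rebuilds(n):
    # Simulate standard Bentley-Saxe insertion (binary-counter cascade) and
    # count how many reconstructions each record participates in.
    levels = []                 # levels[i]: list of record ids, or None if empty
    rebuilds = [0] * n
    for r in range(n):
        carry, i = [r], 0
        while i < len(levels) and levels[i] is not None:
            carry += levels[i]  # absorb the full level into the new structure
            levels[i] = None
            i += 1
        for rec in carry:
            rebuilds[rec] += 1  # every carried record is rewritten once
        if i == len(levels):
            levels.append(None)
        levels[i] = carry
    return max(rebuilds)

print(max_rebuilds(1000), math.ceil(math.log2(1000)))   # both are 10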
@@ -306,12 +306,13 @@ $\mathscr{I}_0 \gets \text{build}(r)$ \;
\end{algorithm}
Our leveling layout policy is described in
-Algorithm~\ref{alg:design-leveling}. Each level contains a single structure
-with a capacity of $N_B\cdot s^{i+1}$ records. When a reconstruction occurs,
-the first level $i$ that has enough space to have the records in the
-level $i-1$ stored inside of it is selected as the target, and then a new
-structure is built at level $i$ containing the records in it and level
-$i-1$. Then, all levels $j < (i - 1)$ are shifted by one level to level
+Algorithm~\ref{alg:design-leveling}. Each level contains a single
+structure with a capacity of $N_B\cdot s^{i+1}$ records. When a
+reconstruction occurs, the smallest level, $i$, with space to contain the
+records from level $i-1$, in addition to the records currently within
+it, is located. Then, a new structure is built at level $i$ containing
+all of the records in levels $i$ and $i-1$, and the structure at level
+$i-1$ is deleted. Finally, all levels $j < (i - 1)$ are shifted to level
$j+1$. This process clears space in level $0$ to contain the buffer flush.
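
A compact Python sketch of this reconstruction step, tracking record counts only (the reconstruct helper, the list representation, and the grow-by-one-level behavior are ours; the actual build and delete of structures is abstracted away):

def reconstruct(levels, s, n_b):
    # levels[i] is the record count of the single structure at level i,
    # which has a capacity of N_B * s**(i + 1) records.
    cap = lambda i: n_b * s ** (i + 1)
    # Find the smallest level i that can absorb level i-1 alongside its own records.
    i = 1
    while i < len(levels) and levels[i] + levels[i - 1] > cap(i):
        i += 1
    if i == len(levels):
        levels.append(0)            # add a new, empty level if none has room
    levels[i] += levels[i - 1]      # build the merged structure at level i
    # Shift every level j < i-1 down to j+1, emptying level 0 for the buffer flush.
    for j in range(i - 1, 0, -1):
        levels[j] = levels[j - 1]
    levels[0] = 0
    return levels

# e.g. s = 2, N_B = 1000, with levels 0 and 1 at capacity:
print(reconstruct([2000, 4000, 0], 2, 1000))   # [0, 2000, 4000]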
\begin{theorem}
@@ -634,7 +635,7 @@ to minimize its influence on the results (we've seen before in
Sections~\ref{ssec:ds-exp} and \ref{ssec:dyn-ds-exp} that scale factor
affects leveling and tiering in opposite ways) and isolate the influence
of the layout policy alone to as great a degree as possible. We used a
-buffer size of $N_b=12000$ for the ISAM tree structure, and $N_B=1000$
+buffer size of $N_B=12000$ for the ISAM tree structure, and $N_B=1000$
for the VPTree.
We generated this distribution by inserting $30\%$ of the records from
in scale factor have very little effect. However, leveling's insertion
performance degrades linearly with scale factor, and this is well
demonstrated in the plot.
-The store is a bit clearer in Figure~\ref{fig:design-knn-tradeoff}. The
+The story is a bit clearer in Figure~\ref{fig:design-knn-tradeoff}. The
VPTree has a much greater construction time, both asymptotically and
in absolute terms, and the average query latency is also significantly
greater. These result in the configuration changes showing much more
@@ -759,7 +760,7 @@ trade-off space. The same general trends hold as in ISAM, just amplified.
Leveling has better query performance than tiering and sees increased
query performance and decreased insert performance as the scale factor
increases. Tiering has better insertion performance and worse query
-performance than leveling, and sees improved insert and worstening
+performance than leveling, and sees improved insert and worsening
query performance as the scale factor is increased. The Bentley-Saxe
method shows similar trends to leveling.