author    Douglas Rumbaugh <dbr4@psu.edu>    2025-05-29 19:36:41 -0400
committer Douglas Rumbaugh <dbr4@psu.edu>    2025-05-29 19:36:41 -0400
commit    228be229a831ad082e8310a6d247f1153fb475b8 (patch)
tree      8ff8ab4ce2363cfa5f11c01ca47485217bf23741 /chapters/design-space.tex
parent    3474aa14fdaec66152ab999a1d3c4b0ec8315a3c (diff)
updates
Diffstat (limited to 'chapters/design-space.tex')
-rw-r--r--    chapters/design-space.tex    32
1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/chapters/design-space.tex b/chapters/design-space.tex
index f639999..98c5bb2 100644
--- a/chapters/design-space.tex
+++ b/chapters/design-space.tex
@@ -4,11 +4,11 @@
\section{Introduction}
In the previous two chapters, we introduced an LSM tree inspired design
-space into the Bentley-Saxe method to allow for more flexilibity in
+space into the Bentley-Saxe method to allow for more flexibility in
tuning the performance. However, aside from some general comments
about how these parameters operate in relation to insertion and
query performance, and some limited experimental evaluation, we haven't
-performed a systematic analsyis of this space, its capabilities, and its
+performed a systematic analysis of this space, its capabilities, and its
limitations. We will rectify this situation in this chapter, performing
both a detailed mathematical analysis of the design parameter space
and experiments to demonstrate that these trade-offs exist in practice.
@@ -16,7 +16,7 @@ as well as experiments to demonstrate these trade-offs exist in practice.
\subsection{Why bother?}
Before diving into the design space we have introduced in detail, it's
-worth taking some time to motivate this entire endevour. There is a large
+worth taking some time to motivate this entire endeavor. There is a large
body of theoretical work in the area of data structure dynamization,
and, to the best of our knowledge, none of these papers have introduced
a design space of the sort that we have introduced here. Despite this,
@@ -55,7 +55,7 @@ Bentley-Saxe in Section~\ref{ssec:dyn-knn-exp}. The other experiments
in Chapter~\ref{chap:framework} show that, for other types of problem,
the technique does not fare quite so well.
-\section{Asymptotic Analsyis}
+\section{Asymptotic Analysis}
\label{sec:design-asymp}
Before beginning with derivations for
@@ -172,7 +172,7 @@ writes from the previous level into level $i$, as well as rewriting all
of the records currently on level $i$.
The net result of this is that the number of writes on level $i$ is given
-by the following recurrance relation (combined with the $W(0)$ base case),
+by the following recurrence relation (combined with the $W(0)$ base case),
\begin{equation*}
W(i) = sW(i-1) + \frac{1}{2}\left(s-1\right)^2 \cdot s^i
@@ -297,13 +297,13 @@ one fewer times, and so on. Thus, the total number of writes is,
\begin{equation*}
B\sum_{i=0}^{s-1} (s - i) = B\left(s^2 - \sum_{i=0}^{s-1} i\right) = B\left(s^2 - \frac{(s-1)s}{2}\right)
\end{equation*}
-which can be simplied to get,
+which can be simplified to get,
\begin{equation*}
\frac{1}{2}s(s+1)\cdot B
\end{equation*}
-writes occuring on each level.\footnote{
- This write count is not cummulative over the entire structure. It only
- accounts for the number of writes occuring on this specific level.
+writes occurring on each level.\footnote{
+ This write count is not cumulative over the entire structure. It only
+ accounts for the number of writes occurring on this specific level.
}
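As a quick sanity check (not part of the dissertation itself), the simplification above can be verified numerically by comparing the direct summation against the closed form for a range of scale factors $s$ and buffer sizes $B$:

```python
# Numeric check that B * sum_{i=0}^{s-1} (s - i) equals (1/2) s (s+1) B.
# The parameter names mirror the text: s is the scale factor, B the
# buffer capacity in records.

def writes_per_level_sum(s: int, B: int) -> int:
    """Direct evaluation of B * sum_{i=0}^{s-1} (s - i)."""
    return B * sum(s - i for i in range(s))

def writes_per_level_closed(s: int, B: int) -> int:
    """Closed form (1/2) s (s+1) B from the text."""
    return s * (s + 1) * B // 2

# Compare the two forms over a sweep of plausible configurations.
for s in range(1, 50):
    for B in (1, 64, 4096):
        assert writes_per_level_sum(s, B) == writes_per_level_closed(s, B)
print("per-level write counts match for all tested (s, B)")
```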
To obtain the total number of times records are rewritten, we need to
@@ -517,7 +517,7 @@ characteristics of the three layout policies. For this test, we
consider two data structures: the ISAM tree and the VP tree. The ISAM
tree structure is merge-decomposable using a sorted-array merge, with
a build cost of $B_M(n) \in \Theta(n \log k)$, where $k$ is the number
-of structures being merged. The VPTree, by constrast, is \emph{not}
+of structures being merged. The VPTree, by contrast, is \emph{not}
merge decomposable, and is built in $B(n) \in \Theta(n \log n)$ time. We
use the $200,000,000$ record SOSD \texttt{OSM} dataset~\cite{sosd} for
ISAM testing, and the $1,000,000$ record, $300$-dimensional Spanish
@@ -551,9 +551,9 @@ this is not a significant limitation for our analysis.
The resulting distributions are shown in
Figure~\ref{design-policy-ins-latency}. These distributions are
represented using a ``reversed'' CDF with log scaling on both axes. This
-representation has proven very useful for interpretting the latency
+representation has proven very useful for interpreting the latency
distributions that we see in evaluating dynamization, but is slightly
-unusual, and so we've included a guide to interpretting these charts
+unusual, and so we've included a guide to interpreting these charts
in Appendix~\ref{append:rcdf}.
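For readers unfamiliar with the representation, a minimal sketch of how such a reversed CDF can be computed is shown below. This is illustrative code, not the dissertation's actual measurement harness, and the latency samples are hypothetical; when charted, both axes would be log-scaled as described above.

```python
# Sketch of a "reversed" CDF: for each observed latency x, compute the
# fraction of operations whose latency is at least x. Tail behavior
# (rare, slow inserts) then appears in the lower-right of a log-log plot.

def reversed_cdf(latencies):
    """Return (latency, fraction of samples >= latency) pairs."""
    xs = sorted(latencies)
    n = len(xs)
    # The i-th smallest value has (n - i) samples at or above it.
    return [(x, (n - i) / n) for i, x in enumerate(xs)]

# Hypothetical insert latencies in microseconds.
points = reversed_cdf([1, 2, 4, 8, 100])
print(points)
```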
The first notable point is that, for both the ISAM tree
@@ -592,7 +592,7 @@ due to cache effects most likely, but less so than in the MDSP case.
Next, in Figure~\ref{fig:design-ins-tput}, we show the overall
insertion throughput for the three policies. This result should
-correlate with the amorized insertion costs for each policy derived in
+correlate with the amortized insertion costs for each policy derived in
Section~\ref{sec:design-asym}. As expected, tiering has the highest
throughput.
@@ -618,7 +618,7 @@ techniques is that, asymptotically, the additional cost added by
decomposing the data structure vanishes for sufficiently expensive
queries. Bentley and Saxe proved that for query costs of the form
$\mathscr{Q}_B(n) \in \Omega(n^\epsilon)$ for $\epsilon > 0$, the
-overal query cost is unaffected (asymptotically) by the decomposition.
+overall query cost is unaffected (asymptotically) by the decomposition.
This would seem to suggest that, as the cost of the query over a single
shard increases, the effectiveness of our design space for tuning query
performance should reduce. This is because our tuning space consists
@@ -643,7 +643,7 @@ constant factors only. In general asymptotic analysis, all possible
configurations of our framework in this scheme collapse to the same basic
cost functions when the constants are removed. While we have demonstrated
that, in practice, the effects of this configuration are measurable, there
-do exist techniques in the classical literature that provide asympotically
+do exist techniques in the classical literature that provide asymptotically
relevant trade-offs, such as the equal block method~\cite{maurer80} and
the mixed method~\cite[pp. 117-118]{overmars83}. These techniques have
cost functions that are derived from arbitrary, positive, monotonically
@@ -702,7 +702,7 @@ capacity of each level is provided by Equation~\ref{eqn:design-k-expr} is
\end{theorem}
\begin{proof}
The number of levels within the structure is given by $\log_s (n)$,
-where $s$ is the scale factor. The addition of $k$ to the parameterization
+where $s$ is the scale factor. The addition of $k$ to the parametrization
replaces this scale factor with $s \log^k n$, and so we have
\begin{equation*}
\log_{s \log^k n}n = \frac{\log n}{\log\left(s \log^k n\right)} = \frac{\log n}{\log s + k \log\log n} \in O\left(\frac{\log n}{k \log\log n}\right)