author    Douglas Rumbaugh <dbr4@psu.edu>    2025-05-29 19:36:41 -0400
committer Douglas Rumbaugh <dbr4@psu.edu>    2025-05-29 19:36:41 -0400
commit    228be229a831ad082e8310a6d247f1153fb475b8 (patch)
tree      8ff8ab4ce2363cfa5f11c01ca47485217bf23741 /chapters/design-space.tex
parent    3474aa14fdaec66152ab999a1d3c4b0ec8315a3c (diff)
updates
Diffstat (limited to 'chapters/design-space.tex')
-rw-r--r--    chapters/design-space.tex    32
1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/chapters/design-space.tex b/chapters/design-space.tex
index f639999..98c5bb2 100644
--- a/chapters/design-space.tex
+++ b/chapters/design-space.tex
@@ -4,11 +4,11 @@
\section{Introduction}
In the previous two chapters, we introduced an LSM tree inspired design
-space into the Bentley-Saxe method to allow for more flexilibity in
+space into the Bentley-Saxe method to allow for more flexibility in
tuning the performance. However, aside from some general comments
about how these parameters operate in relation to insertion and
query performance, and some limited experimental evaluation, we haven't
-performed a systematic analsyis of this space, its capabilities, and its
+performed a systematic analysis of this space, its capabilities, and its
limitations. We will rectify this situation in this chapter, performing
both a detailed mathematical analysis of the design parameter space
and experiments to demonstrate that these trade-offs exist in practice.
@@ -16,7 +16,7 @@ as well as experiments to demonstrate these trade-offs exist in practice.
\subsection{Why bother?}
Before diving into the design space we have introduced in detail, it's
-worth taking some time to motivate this entire endevour. There is a large
+worth taking some time to motivate this entire endeavor. There is a large
body of theoretical work in the area of data structure dynamization,
and, to the best of our knowledge, none of these papers have introduced
a design space of the sort that we have introduced here. Despite this,
@@ -55,7 +55,7 @@ Bentley-Saxe in Section~\ref{ssec:dyn-knn-exp}. The other experiments
in Chapter~\ref{chap:framework} show that, for other types of problem,
the technique does not fare quite so well.
-\section{Asymptotic Analsyis}
+\section{Asymptotic Analysis}
\label{sec:design-asymp}
Before beginning with derivations for
@@ -172,7 +172,7 @@ writes from the previous level into level $i$, as well as rewriting all
of the records currently on level $i$.
The net result of this is that the number of writes on level $i$ is given
-by the following recurrance relation (combined with the $W(0)$ base case),
+by the following recurrence relation (combined with the $W(0)$ base case),
\begin{equation*}
W(i) = sW(i-1) + \frac{1}{2}\left(s-1\right)^2 \cdot s^i
@@ -297,13 +297,13 @@ one fewer times, and so on. Thus, the total number of writes is,
\begin{equation*}
B\sum_{i=0}^{s-1} (s - i) = B\left(s^2 - \sum_{i=0}^{s-1} i\right) = B\left(s^2 - \frac{(s-1)s}{2}\right)
\end{equation*}
-which can be simplied to get,
+which can be simplified to get,
\begin{equation*}
\frac{1}{2}s(s+1)\cdot B
\end{equation*}
-writes occuring on each level.\footnote{
- This write count is not cummulative over the entire structure. It only
- accounts for the number of writes occuring on this specific level.
+writes occurring on each level.\footnote{
+ This write count is not cumulative over the entire structure. It only
+ accounts for the number of writes occurring on this specific level.
}
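As a quick sanity check (not part of the dissertation itself), the simplification above can be verified numerically by comparing the direct summation against the closed form for a range of scale factors $s$ and buffer sizes $B$:

```python
# Numeric check that B * sum_{i=0}^{s-1} (s - i) equals (1/2) s (s+1) B.
# The parameter names mirror the text: s is the scale factor, B the
# buffer capacity in records.

def writes_per_level_sum(s: int, B: int) -> int:
    """Direct evaluation of B * sum_{i=0}^{s-1} (s - i)."""
    return B * sum(s - i for i in range(s))

def writes_per_level_closed(s: int, B: int) -> int:
    """Closed form (1/2) s (s+1) B from the text."""
    return s * (s + 1) * B // 2

# Compare the two forms over a sweep of plausible configurations.
for s in range(1, 50):
    for B in (1, 64, 4096):
        assert writes_per_level_sum(s, B) == writes_per_level_closed(s, B)
print("per-level write counts match for all tested (s, B)")
```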
To obtain the total number of times records are rewritten, we need to
@@ -517,7 +517,7 @@ characteristics of the three layout policies. For this test, we
consider two data structures: the ISAM tree and the VP tree. The ISAM
tree structure is merge-decomposable using a sorted-array merge, with
a build cost of $B_M(n) \in \Theta(n \log k)$, where $k$ is the number
-of structures being merged. The VPTree, by constrast, is \emph{not}
+of structures being merged. The VPTree, by contrast, is \emph{not}
merge decomposable, and is built in $B(n) \in \Theta(n \log n)$ time. We
use the $200,000,000$ record SOSD \texttt{OSM} dataset~\cite{sosd} for
ISAM testing, and the $1,000,000$ record, $300$-dimensional Spanish
@@ -551,9 +551,9 @@ this is not a significant limitation for our analysis.
The resulting distributions are shown in
Figure~\ref{design-policy-ins-latency}. These distributions are
represented using a ``reversed'' CDF with log scaling on both axes. This
-representation has proven very useful for interpretting the latency
+representation has proven very useful for interpreting the latency
distributions that we see in evaluating dynamization, but is slightly
-unusual, and so we've included a guide to interpretting these charts
+unusual, and so we've included a guide to interpreting these charts
in Appendix~\ref{append:rcdf}.
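For readers unfamiliar with the representation, a minimal sketch of how such a reversed CDF can be computed is shown below. This is illustrative code, not the dissertation's actual measurement harness, and the latency samples are hypothetical; when charted, both axes would be log-scaled as described above.

```python
# Sketch of a "reversed" CDF: for each observed latency x, compute the
# fraction of operations whose latency is at least x. Tail behavior
# (rare, slow inserts) then appears in the lower-right of a log-log plot.

def reversed_cdf(latencies):
    """Return (latency, fraction of samples >= latency) pairs."""
    xs = sorted(latencies)
    n = len(xs)
    # The i-th smallest value has (n - i) samples at or above it.
    return [(x, (n - i) / n) for i, x in enumerate(xs)]

# Hypothetical insert latencies in microseconds.
points = reversed_cdf([1, 2, 4, 8, 100])
print(points)
```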
The first notable point is that, for both the ISAM tree
@@ -592,7 +592,7 @@ due to cache effects most likely, but less so than in the MDSP case.
Next, in Figure~\ref{fig:design-ins-tput}, we show the overall
insertion throughput for the three policies. This result should
-correlate with the amorized insertion costs for each policy derived in
+correlate with the amortized insertion costs for each policy derived in
Section~\ref{sec:design-asym}. As expected, tiering has the highest
throughput.
@@ -618,7 +618,7 @@ techniques is that, asymptotically, the additional cost added by
decomposing the data structure vanishes for sufficiently expensive
queries. Bentley and Saxe proved that for query costs of the form
$\mathscr{Q}_B(n) \in \Omega(n^\epsilon)$ for $\epsilon > 0$, the
-overal query cost is unaffected (asymptotically) by the decomposition.
+overall query cost is unaffected (asymptotically) by the decomposition.
This would seem to suggest that, as the cost of the query over a single
shard increases, the effectiveness of our design space for tuning query
performance should reduce. This is because our tuning space consists
@@ -643,7 +643,7 @@ constant factors only. In general asymptotic analysis, all possible
configurations of our framework in this scheme collapse to the same basic
cost functions when the constants are removed. While we have demonstrated
that, in practice, the effects of this configuration are measurable, there
-do exist techniques in the classical literature that provide asympotically
+do exist techniques in the classical literature that provide asymptotically
relevant trade-offs, such as the equal block method~\cite{maurer80} and
the mixed method~\cite[pp. 117-118]{overmars83}. These techniques have
cost functions that are derived from arbitrary, positive, monotonically
@@ -702,7 +702,7 @@ capacity of each level is provided by Equation~\ref{eqn:design-k-expr} is
\end{theorem}
\begin{proof}
The number of levels within the structure is given by $\log_s (n)$,
-where $s$ is the scale factor. The addition of $k$ to the parameterization
+where $s$ is the scale factor. The addition of $k$ to the parametrization
replaces this scale factor with $s \log^k n$, and so we have
\begin{equation*}
\log_{s \log^k n}n = \frac{\log n}{\log\left(s \log^k n\right)} = \frac{\log n}{\log s + k \log\log n} \in O\left(\frac{\log n}{k \log\log n}\right)