Diffstat (limited to 'chapters/design-space.tex')
| -rw-r--r-- | chapters/design-space.tex | 34 |
1 files changed, 16 insertions, 18 deletions
diff --git a/chapters/design-space.tex b/chapters/design-space.tex
index 22773e5..2aecede 100644
--- a/chapters/design-space.tex
+++ b/chapters/design-space.tex
@@ -54,7 +54,7 @@ Bentley-Saxe in Section~\ref{ssec:dyn-knn-exp}. Our experiments in
 Chapter~\ref{chap:framework} show that, for other types of problem, the
 technique does not fare quite so well in its unmodified form.
-\section{Asymptotic Analysis}
+\section{Theoretical Performance Analysis}
 \label{sec:design-asymp}
 Before beginning with derivations for the cost functions of dynamized
@@ -75,8 +75,7 @@ As a first step, we will derive a modified version of the Bentley-Saxe
 method that has been adjusted to support arbitrary scale factors. There's
 nothing fundamental to the technique that prevents such modifications,
 and its likely that they have not been analyzed like this before simply
-out of a lack of interest in constant factors in theoretical asymptotic
-analysis.
+out of a lack of interest in constant factors in asymptotic analysis.
 When generalizing the Bentley-Saxe method for arbitrary scale factors,
 we decided to maintain the core concept of binary decomposition. One
@@ -555,7 +554,7 @@ best-case query cost.
 \end{proof}
 \section{General Observations}
-The asymptotic results from the previous section are summarized in
+The theoretical results from the previous section are summarized in
 Table~\ref{tab:policy-comp}. When the scale factor is accounted for
 in the analysis, we can see that possible trade-offs begin to manifest
 within the space. We've seen some of these in action directly in
@@ -625,7 +624,7 @@ the real-world performance implications of the configuration parameter
 space of our framework.
-\subsection{Asymptotic Insertion Performance}
+\subsection{Insertion Performance}
 We'll begin by validating our results for the insertion performance
 characteristics of the three layout policies. For this test, we
@@ -638,6 +637,7 @@ the $200$ million record SOSD \texttt{OSM} dataset~\cite{sosd-datasets}
 for ISAM testing, and the one million record, $300$-dimensional Spanish
 Billion Words (\texttt{SBW}) dataset~\cite{sbw} for VPTree testing.
+\Paragraph{Worst-case Insertion Performance.}
 For our first experiment, we will examine the latency distribution for
 inserts into our structures. We tested the three layout policies, using a
 common scale factor of $s=2$. This scale factor was selected
@@ -705,6 +705,7 @@ due to cache effects most likely, but less so than in the MDSP case.
 \label{fig:design-ins-tput}
 \end{figure}
+\Paragraph{Insertion Throughput.}
 Next, in Figure~\ref{fig:design-ins-tput}, we show the overall insertion
 throughput for the three policies for both ISAM tree and VPTree. This
 result should correlate with the amortized insertion costs for each
@@ -778,8 +779,6 @@ performance across the board. Generally it seems to be a strictly worse
 alternative to leveling in all but its best-case query cost, and we will
 omit it from our tests moving forward.
-\subsection{Buffer Size}
-
 \begin{figure}
 \centering
 \subfloat[ISAM Tree Range Count]{\includegraphics[width=.5\textwidth]{img/design-space/isam-bs-sweep.pdf} \label{fig:buffer-isam-tradeoff}}
@@ -788,13 +787,12 @@ omit it from our tests moving forward.
 \label{fig:buffer-size}
 \end{figure}
-In the previous section, we considered the effect of various scale
-factors on the trade-off between insertion and query performance. Our
-framework also supports varying buffer sizes, and so we will examine this
-next. Figure~\ref{fig:buffer-size} shows the same insertion throughput
-vs. query latency curves for fixed layout policy and scale factor
-configurations at varying buffer sizes, under the same experimental
-conditions as the previous test.
+We will next turn our attention to the effect that buffer size
+has on the trade-off between insertion and query performance.
+Figure~\ref{fig:buffer-size} shows the same insertion throughput vs. query
+latency curves for fixed layout policy and scale factor configurations
+at varying buffer sizes, under the same experimental conditions as the
+previous test.
 Unlike with the scale factor, there is a significant difference in the
 behavior of the two tested structures under buffer size variation. For
@@ -858,10 +856,10 @@ albeit slight, stratification amongst the tested policies, as shown in
 Figure~\ref{fig:design-isam-sel}. As the selectivity continues to rise
 above those shown in the chart, the relative ordering of the policies
 remains the same, but the relative differences between them begin to
-shrink. This result makes sense given the asymptotics--there is still
-\emph{some} overhead associated with the decomposition, but as the cost
-of the query approaches linear, it makes up an increasingly irrelevant
-portion of the run time.
+shrink. This result makes sense given the theoretical analysis--there
+is still \emph{some} overhead associated with the decomposition, but
+as the cost of the query approaches linear, it makes up an increasingly
+irrelevant portion of the run time.
 The $k$-NN results in Figure~\ref{fig:design-knn-sel} show a slightly
 different story. This is also not surprising, because $k$-NN is a