Diffstat (limited to 'chapters/design-space.tex')
| -rw-r--r-- | chapters/design-space.tex | 31 |
1 file changed, 16 insertions(+), 15 deletions(-)
diff --git a/chapters/design-space.tex b/chapters/design-space.tex
index cfea75e..f788a7a 100644
--- a/chapters/design-space.tex
+++ b/chapters/design-space.tex
@@ -53,7 +53,7 @@ of metric indices~\cite{naidan14,bkdtree}, a class of data structure
 and search problem for which we saw very good performance from standard
 Bentley-Saxe in Section~\ref{ssec:dyn-knn-exp}. The other experiments
 in Chapter~\ref{chap:framework} show that, for other types of problem,
-the technique does not fair quite so well.
+the technique does not fare quite so well.
 
 \section{Asymptotic Analysis}
 \label{sec:design-asymp}
@@ -61,7 +61,7 @@ the technique does not fair quite so well.
 Before beginning with derivations for the cost functions of dynamized
 structures within the context of our proposed design space, we should
 make a few comments about the assumptions
-and techniques that we will us in our analysis. As this design space
+and techniques that we will use in our analysis. As this design space
 involves adjusting constants, we will leave the design-space related
 constants within our asymptotic expressions. Additionally, we will
 perform the analysis for a simple decomposable search problem. Deletes
@@ -89,9 +89,9 @@ decided to maintain the core concept of binary decomposition.
 One interesting mathematical property of a Bentley-Saxe dynamization is that
 the internal layout of levels exactly matches the binary representation of the
 record count contained within the index. For example, a dynamization containing
-$n=20$ records will have 4 records in the third level, and 16 in the fourth,
+$n=20$ records will have 4 records in the third level, and 16 in the fifth,
 with all other levels being empty. If we represent a full level with a 1
-and an empty level with a 0, then we'd have $1100$, which is $20$ in
+and an empty level with a 0, then we'd have $10100$, which is $20$ in
 base 2.
 
 \begin{algorithm}
@@ -129,7 +129,7 @@ $\mathscr{I}_{target} \gets \text{build}(\text{unbuild}(\mathscr{I}_0) \cup \ld
 Our generalization, then, is to represent the data as an $s$-ary
 decomposition, where the scale factor represents the base of the
 representation. To accomplish this, we set of capacity of level $i$ to
-be $N_B (s - 1) \cdot s^i$, where $N_b$ is the size of the buffer. The
+be $N_B (s - 1) \cdot s^i$, where $N_B$ is the size of the buffer. The
 resulting structure will have at most $\log_s n$ shards. The resulting
 policy is described in Algorithm~\ref{alg:design-bsm}.
 
@@ -192,7 +192,7 @@ analysis.
 The worst-case cost of a reconstruction is $B(n)$, and there are $\log_s(n)$
 total levels, so the total reconstruction costs associated with a record can be
 upper-bounded by, $B(n) \cdot \frac{W(\log_s(n))}{n}$, and then this cost amortized over the $n$
-insertions necessary to get the record into the last level. We'lll also
+insertions necessary to get the record into the last level. We'll also
 condense the multiplicative constants and drop the additive ones to more
 clearly represent the relationship we're looking to show. This results
 in an amortized insertion cost of,
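
Before the leveling hunks, the corrected layout example above is worth making
concrete. The following sketch is our illustration only, not code from the
patched chapter; the name level_occupancy, the choice of Python, and the
assumption that the n mod N_B leftover records stay in the buffer are all ours.

    # Sketch (ours, not from the patched chapter) of the s-ary decomposition
    # in the hunks above: level i has capacity N_B * (s - 1) * s^i, so the
    # per-level occupancy is given by the base-s digits of n / N_B.
    def level_occupancy(n: int, s: int = 2, n_b: int = 1) -> list[int]:
        """Records held on each level for n records, scale factor s, buffer N_B."""
        levels = []
        digits = n // n_b        # records below a full buffer stay buffered
        i = 0
        while digits > 0:
            digit = digits % s                  # base-s digit for level i
            levels.append(digit * n_b * s**i)   # level i stores this many records
            digits //= s
            i += 1
        return levels

    # n = 20 with N_B = 1 and s = 2: the binary digits of 20 are 10100, so
    # levels 2 and 4 (zero-indexed) hold 4 and 16 records -- the third and
    # fifth levels, as the corrected hunk says.
    assert level_occupancy(20) == [0, 0, 4, 0, 16]

For $s = 2$ and $N_B = 1$ this is exactly the binary layout; for larger $s$,
digit $d_i$ contributes $d_i \cdot N_B \cdot s^i$ records to level $i$, and the
largest possible digit, $s - 1$, accounts for the $(s-1)$ factor in the level
capacity $N_B (s - 1) \cdot s^i$.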
@@ -306,12 +306,13 @@ $\mathscr{I}_0 \gets \text{build}(r)$ \;
 \end{algorithm}
 
 Our leveling layout policy is described in
-Algorithm~\ref{alg:design-leveling}. Each level contains a single structure
-with a capacity of $N_B\cdot s^{i+1}$ records. When a reconstruction occurs,
-the first level $i$ that has enough space to have the records in the
-level $i-1$ stored inside of it is selected as the target, and then a new
-structure is built at level $i$ containing the records in it and level
-$i-1$. Then, all levels $j < (i - 1)$ are shifted by one level to level
+Algorithm~\ref{alg:design-leveling}. Each level contains a single
+structure with a capacity of $N_B\cdot s^{i+1}$ records. When a
+reconstruction occurs, the smallest level, $i$, with space to contain the
+records from level $i-1$, in addition to the records currently within
+it, is located. Then, a new structure is built at level $i$ containing
+all of the records in levels $i$ and $i-1$, and the structure at level
+$i-1$ is deleted. Finally, all levels $j < (i - 1)$ are shifted to level
 $j+1$. This process clears space in level $0$ to contain the buffer flush.
 
 \begin{theorem}
@@ -634,7 +635,7 @@ to minimize its influence on the results
 (we've seen before in Sections~\ref{ssec:ds-exp} and \ref{ssec:dyn-ds-exp}
 that scale factor affects leveling and tiering in opposite ways) and isolate
 the influence of the layout policy alone to as great a degree as possible. We used a
-buffer size of $N_b=12000$ for the ISAM tree structure, and $N_B=1000$
+buffer size of $N_B=12000$ for the ISAM tree structure, and $N_B=1000$
 for the VPTree.
 
 We generated this distribution by inserting $30\%$ of the records from
@@ -750,7 +751,7 @@ in scale factor have very little effect. However,
 level's insertion performance degrades linearly with scale factor, and
 this is well demonstrated in the plot.
 
-The store is a bit clearer in Figure~\ref{fig:design-knn-tradeoff}. The
+The story is a bit clearer in Figure~\ref{fig:design-knn-tradeoff}. The
 VPTree has a much greater construction time, both asymptotically and in
 absolute terms, and the average query latency is also significantly
 greater. These result in the configuration changes showing much more
@@ -759,7 +760,7 @@ trade-off space.
 The same general trends hold as in ISAM, just amplified. Leveling
 has better query performance than tiering and sees increased query
 performance and decreased insert performance as the scale factor
 increases. Tiering has better insertion performance and worse query
-performance than leveling, and sees improved insert and worstening
+performance than leveling, and sees improved insert and worsening
 query performance as the scale factor is increased. The Bentley-Saxe
 method shows similar trends to leveling.
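
Read procedurally, the rewritten leveling description in the @@ -306 hunk
amounts to something like the sketch below. This is our reading of the prose,
not the thesis's Algorithm~\ref{alg:design-leveling}, which the diff does not
reproduce; the name leveling_flush and the flat-list representation of levels
are invented for illustration.

    # Sketch of the leveling reconstruction as described in the rewritten
    # prose: levels[i] is the record count of the single structure on
    # level i, whose capacity is N_B * s^(i+1).
    def leveling_flush(levels: list[int], n_b: int, s: int) -> list[int]:
        levels = levels + [0]  # room to grow by one level if all are full
        # Locate the smallest level i with space for its own records plus
        # those of level i - 1.
        i = 1
        while levels[i] + levels[i - 1] > n_b * s ** (i + 1):
            i += 1
        # Build a new structure at level i over the records of levels i and
        # i - 1; the old structure at level i - 1 is deleted.
        levels[i] += levels[i - 1]
        # Shift every level j < i - 1 to level j + 1, leaving level 0 empty
        # for the incoming buffer flush.
        levels[1:i] = levels[0:i - 1]
        levels[0] = 0
        return levels

    # Example with N_B = 1000 and s = 2 (capacities 2000, 4000, 8000, ...):
    # level 0's 2000 records merge into level 1, exactly filling its
    # capacity of 4000, and level 0 is cleared for the flush.
    assert leveling_flush([2000, 2000, 0], n_b=1000, s=2) == [0, 4000, 0, 0]

When $i = 1$, as in the example, the shift step is a no-op and level $0$ is
simply emptied, ready to receive the buffered records.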