\chapter{Exploring the Design Space}
\label{chap:design-space}
\section{Introduction}
In the previous two chapters, we introduced an LSM tree inspired design
space into the Bentley-Saxe method to allow for more flexibility in
tuning performance. However, aside from some general comments about how
these parameters operate in relation to insertion and query performance,
and some limited experimental evaluation, we have not performed a
systematic analysis of this space, its capabilities, and its limitations.
We rectify that situation in this chapter with both a detailed
mathematical analysis of the design parameter space and experiments
demonstrating that these trade-offs exist in practice.
\subsection{Why bother?}
Before diving into the design space we have introduced in detail, it's
worth taking some time to motivate this entire endeavor. There is a large
body of theoretical work in the area of data structure dynamization,
and, to the best of our knowledge, none of these papers have proposed
a design space of the sort that we have introduced here. Despite this,
some papers which \emph{use} these techniques have incorporated similar
design elements into their own implementations~\cite{pgm}, with some
even going so far as to (inaccurately) describe these elements as part
of the Bentley-Saxe method~\cite{almodaresi23}.
This situation is best understood, we think, in terms of the ultimate
goals of the respective lines of work. In the classical literature on
dynamization, the focus is mostly on proving theoretical asymptotic
bounds about the techniques. In this context, the LSM tree design space
is of limited utility, because its tuning parameters adjust constant
factors only, and thus don't play a major role in asymptotics. Where
the theoretical literature does introduce configurability, such as
with the equal block method~\cite{overmars-art-of-dyn} or more
complex schemes that nest the equal block method \emph{inside}
of a binary decomposition~\cite{overmars81}, the intention is
to produce asymptotically relevant trade-offs between insert,
query, and delete performance for deletion decomposable search
problems~\cite[pg. 117]{overmars83}. This is why the equal block method
is described in terms of a function, rather than a constant value,
to enable it to appear in the asymptotics.
On the other hand, in practical scenarios, constant-factor tuning of performance
can be very relevant. We've already shown in Sections~\ref{ssec:ds-exp}
and \ref{ssec:dyn-ds-exp} how tuning parameters, particularly the
number of shards per level, can have measurable real-world effects on the
performance characteristics of dynamized structures, and in fact sometimes
this tuning is \emph{necessary} to enable reasonable performance. It's
quite telling that the two most direct implementations of the Bentley-Saxe
method that we have identified in the literature are both in the context
of metric indices~\cite{naidan14,bkdtree}, a class of data structure
and search problem for which we saw very good performance from standard
Bentley-Saxe in Section~\ref{ssec:dyn-knn-exp}. The other experiments
in Chapter~\ref{chap:framework} show that, for other types of problem,
the technique does not fare quite so well.
\section{Asymptotic Analysis}
\label{sec:design-asymp}
Before beginning with derivations for
the cost functions of dynamized structures within the context of our
proposed design space, we should make a few comments about the assumptions
and techniques that we will use in our analysis. As this design space
involves adjusting constants, we will leave the design-space related
constants within our asymptotic expressions. Additionally, we will
perform the analysis for a simple decomposable search problem. Deletes
will be entirely neglected, and we won't make any assumptions about
mergability. We will also neglect the buffer size, $N_B$, during this
analysis. Buffering isn't fundamental to the techniques we are examining
in this chapter, and including it would increase the complexity of the
analysis without contributing any useful insights.\footnote{
The contribution of the buffer size is simply to replace each of the
individual records considered in the analysis with batches of $N_B$
records. The same patterns hold.
}
\subsection{Generalized Bentley-Saxe Method}
As a first step, we will derive a modified version of the Bentley-Saxe
method that has been adjusted to support arbitrary scale factors, and
buffering. There's nothing fundamental to the technique that prevents
such modifications, and it is likely that they have not been analyzed
like this before simply out of a lack of interest in constant factors in
theoretical asymptotic analysis. During our analysis, we'll intentionally
leave these constant factors in place.
When generalizing the Bentley-Saxe method for arbitrary scale factors, we
decided to maintain the core concept of binary decomposition. One interesting
mathematical property of a Bentley-Saxe dynamization is that the internal
layout of levels exactly matches the binary representation of the record
count contained within the index. For example, a dynamization containing
$n=20$ records will have 4 records in the third level and 16 in the fifth,
with all other levels being empty. If we represent a full level with a 1
and an empty level with a 0, reading from the largest level down to the
smallest, then we'd have $10100$, which is $20$ in base 2.
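To make this concrete, the following minimal Python sketch (illustrative
only, and not part of our framework) simulates the standard binary
decomposition, treating each insertion as a binary-counter increment,
and confirms that the occupied levels after $n=20$ insertions spell out
the binary representation of $20$:
\begin{verbatim}
# Standard Bentley-Saxe layout (s = 2, no buffer): level i holds either
# 0 or 2**i records, so occupancy mirrors the binary digits of n.
def bsm_levels(n):
    levels = []                     # levels[i] = record count in level i
    for _ in range(n):
        incoming = 1                # the newly inserted record
        target = 0
        # accumulate full levels until the first empty one is found
        while target < len(levels) and levels[target] != 0:
            incoming += levels[target]
            levels[target] = 0
            target += 1
        if target == len(levels):
            levels.append(0)
        levels[target] = incoming   # rebuild level `target`
    return levels

occupancy = bsm_levels(20)
print(occupancy)                    # [0, 0, 4, 0, 16]
bits = ''.join('1' if c else '0' for c in reversed(occupancy))
print(bits, int(bits, 2))           # 10100 20
\end{verbatim}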
\begin{algorithm}
\caption{The Generalized BSM Layout Policy}
\label{alg:design-bsm}
\KwIn{$r$: set of records to be inserted, $\mathscr{I}$: a dynamized structure, $n$: number of records in $\mathscr{I}$}
\BlankLine
\Comment{Find the first non-full level}
$target \gets -1$ \;
\For{$i=0\ldots \log_s n$} {
\If {$|\mathscr{I}_i| < N_B (s - 1)\cdot s^i$} {
$target \gets i$ \;
break \;
}
}
\BlankLine
\Comment{If the structure is full, we need to grow it}
\If {$target = -1$} {
$target \gets 1 + (\log_s n)$ \;
}
\BlankLine
\Comment{Build the new structure}
$\mathscr{I}_{target} \gets \text{build}(\text{unbuild}(\mathscr{I}_0) \cup \ldots \cup \text{unbuild}(\mathscr{I}_{target}) \cup r)$ \;
\BlankLine
\Comment{Empty the levels used to build the new shard}
\For{$i=0\ldots target-1$} {
$\mathscr{I}_i \gets \emptyset$ \;
}
\end{algorithm}
Our generalization, then, is to represent the data as an $s$-ary
decomposition, where the scale factor represents the base of the
representation. To accomplish this, we set the capacity of level $i$ to
be $N_B (s - 1) \cdot s^i$ records, where $N_B$ is the size of the buffer. The
resulting structure will have at most $\log_s n$ shards, and the
corresponding layout policy is described in Algorithm~\ref{alg:design-bsm}.
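The following Python sketch (which tracks only record counts per level,
rather than actual shards, and whose names and parameters are purely
illustrative) demonstrates the behavior of Algorithm~\ref{alg:design-bsm}:
\begin{verbatim}
# Generalized BSM layout policy, tracking record counts only.
# Level i has capacity N_B * (s - 1) * s**i.
def flush(levels, batch, s, n_buf):
    cap = lambda i: n_buf * (s - 1) * s**i
    # find the first non-full level
    target = next((i for i, c in enumerate(levels) if c < cap(i)), None)
    if target is None:              # structure is full: grow by one level
        target = len(levels)
        levels.append(0)
    # rebuild level `target` from levels 0..target plus the new batch
    levels[target] = sum(levels[:target + 1]) + batch
    for i in range(target):         # empty the levels that were consumed
        levels[i] = 0
    return levels

levels = []
for _ in range(100):                # 100 unit-sized flushes with s = 3
    flush(levels, batch=1, s=3, n_buf=1)
print(levels)   # [1, 0, 18, 0, 81]: level i holds d_i * 3**i, where d_i
                # is the i-th digit of 100 in base 3 (10201)
\end{verbatim}
As expected, the occupancy of the levels follows the base-$s$
representation of the number of records inserted.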
Unfortunately, the approach used by Bentley and Saxe to calculate the
amortized insertion cost of the BSM does not generalize to larger bases,
and so we will need to derive this result using a different approach.
\begin{theorem}
The amortized insertion cost for generalized BSM with a growth factor of
$s$ is $\Theta\left(\frac{B(n)}{n} \cdot \frac{1}{2}(s-1) \cdot ( (s-1)\log_s n + s)\right)$.
\end{theorem}
\begin{proof}
In order to calculate the amortized insertion cost, we will first
determine the average number of times that a record is involved in a
reconstruction, and then amortize those reconstructions over the records
in the structure.
If we consider only the first level of the structure, it's clear that
the reconstruction count associated with each record in that structure
will follow the pattern, $1, 2, 3, 4, ..., s-1$ when the level is full.
Thus, the total number of reconstructions associated with records on level
$i=0$ is the sum of that sequence, or
\begin{equation*}
W(0) = \sum_{j=1}^{s-1} j = \frac{1}{2}\left(s^2 - s\right)
\end{equation*}
Considering the next level, $i=1$, each reconstruction involving this
level will copy down the entirety of the structure above it, adding
one more write per record, as well as one extra write for the new record.
More specifically, in the above example, the first "batch" of records in
level $i=1$ will have the following write counts: $1, 2, 3, 4, 5, ..., s$,
the second "batch" of records will increment all of the existing write
counts by one, and then introduce another copy of $1, 2, 3, 4, 5, ..., s$
writes, and so on.
Thus, each new "batch" written to level $i$ will introduce $W(i-1) + 1$
writes from the previous level into level $i$, as well as rewriting all
of the records currently on level $i$.
The net result of this is that the number of writes on level $i$ is given
by the following recurrence relation (combined with the $W(0)$ base case),
\begin{equation*}
W(i) = sW(i-1) + \frac{1}{2}\left(s-1\right)^2 \cdot s^i
\end{equation*}
which can be solved to give the following closed-form expression,
\begin{equation*}
W(i) = s^i \cdot \left(\frac{1}{2} (s-1) \cdot (s(i+1) - i)\right)
\end{equation*}
which provides the total number of reconstructions that records in
level $i$ of the structure have participated in. As each record
is involved in a different number of reconstructions, we'll consider the
average number by dividing $W(i)$ by the number of records in level $i$.
From here, the proof proceeds in the standard way for this sort of
analysis. The worst-case cost of a reconstruction is $B(n)$, and there
are $\log_s(n)$ total levels, so the total reconstruction costs associated
with a record can be upper-bounded by, $B(n) \cdot
\frac{W(\log_s(n))}{n}$, and then this cost amortized over the $n$
insertions necessary to get the record into the last level, resulting
in an amortized insertion cost of,
\begin{equation*}
\frac{B(n)}{n} \cdot \frac{1}{2}(s-1) \cdot ( (s-1)\log_s n + s)
\end{equation*}
Note that, in the case of $s=2$, this expression reduces to the same amortized
insertion cost as was derived using the binomial theorem in the original BSM
paper~\cite{saxe79}.
\end{proof}
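As a quick sanity check on the algebra (not a substitute for the proof
itself), the closed form for $W(i)$ can be verified numerically against
the base case and the recurrence for a range of scale factors:
\begin{verbatim}
# Check that W(i) = s^i * (1/2)(s-1)(s(i+1) - i) satisfies the base case
# W(0) = (s^2 - s)/2 and the recurrence W(i) = s W(i-1) + (1/2)(s-1)^2 s^i.
def W(i, s):
    return s**i * 0.5 * (s - 1) * (s * (i + 1) - i)

for s in (2, 3, 4, 8):
    assert W(0, s) == 0.5 * (s**2 - s)
    for i in range(1, 12):
        assert W(i, s) == s * W(i - 1, s) + 0.5 * (s - 1)**2 * s**i
print("closed form agrees with the recurrence")
\end{verbatim}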
\begin{theorem}
The worst-case insertion cost for generalized BSM with a scale factor
of $s$ is $\Theta(B(n))$.
\end{theorem}
\begin{proof}
The Bentley-Saxe method finds the smallest non-full block and performs
a reconstruction including all of the records from that block, as well
as all blocks smaller than it, and the new records to be added. The
worst case, then, will occur when all of the existing blocks in the
structure are full, and a new, larger, block must be added.
In this case, the reconstruction will involve every record currently
in the dynamized structure, and will thus have a cost of $I(n) \in
\Theta(B(n))$.
\end{proof}
\begin{theorem}
The worst-case query cost for generalized BSM for a decomposable
search problem with cost $\mathscr{Q}_S(n)$ is $O(\log_s(n) \cdot
\mathscr{Q}_S(n))$.
\end{theorem}
\begin{proof}
The worst-case scenario for queries in BSM occurs when every existing
level is full. In this case, there will be $\log_s n$ levels that must
be queried, with the $i$th level containing $(s - 1) \cdot s^i$ records.
Thus, the total cost of querying the structure will be,
\begin{equation}
\mathscr{Q}(n) = \sum_{i=0}^{\log_s n} \mathscr{Q}_S\left((s - 1) \cdot s^i\right)
\end{equation}
The number of records per shard will be upper bounded by $O(n)$, so
\begin{equation}
\mathscr{Q}(n) \in O\left(\sum_{i=0}^{\log_s n} \mathscr{Q}_S(n)\right)
\in O\left(\log_s n \cdot \mathscr{Q}_S(n)\right)
\end{equation}
\end{proof}
\begin{theorem}
The best-case query cost for generalized BSM for a decomposable
search problem with a cost of $\mathscr{Q}_S(n)$ is $\mathscr{Q}(n)
\in \Theta(\mathscr{Q}_S(n))$.
\end{theorem}
\begin{proof}
The best case scenario for queries in BSM occurs when a new level is
added, which results in every record in the structure being compacted
into a single structure. In this case, there is only a single data
structure in the dynamization, and so the query cost over the dynamized
structure is identical to the query cost of a single static instance of
the structure. Thus, the best case query cost in BSM is,
\begin{equation*}
\mathscr{Q}_B(n) \in \Theta \left( 1 \cdot \mathscr{Q}_S(n) \right) \in \Theta\left(\mathscr{Q}_S(n)\right)
\end{equation*}
\end{proof}
\subsection{Leveling}
Our leveling layout policy is described in
Algorithm~\ref{alg:design-level}. Each level contains a single structure
with a capacity of $N_B\cdot s^i$ records. When a reconstruction occurs,
the first level $i$ with enough free space to accommodate the records in
level $i-1$ is selected as the target, and a new structure is built at
level $i$ containing the records from both levels $i$ and $i-1$. All
levels $j < i - 1$ are then shifted down by one, from level $j$ to level
$j+1$, which clears space in level $0$ to receive the buffer flush.
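A minimal Python sketch of this policy, again tracking only record
counts per level and assuming a level capacity of $N_B \cdot s^i$, is
shown below; it is illustrative only and elides the actual structure
builds:
\begin{verbatim}
# Leveling layout policy, tracking record counts only.
# Level i holds a single structure with capacity N_B * s**i.
def level_flush(levels, s, n_buf):
    cap = lambda i: n_buf * s**i
    if not levels:
        levels.append(0)
    if cap(0) - levels[0] < n_buf:
        # find the first level able to absorb the records of the level above
        i = 1
        while i < len(levels) and cap(i) - levels[i] < levels[i - 1]:
            i += 1
        if i == len(levels):
            levels.append(0)          # grow the structure by one level
        levels[i] += levels[i - 1]    # rebuild level i from levels i and i-1
        levels[1:i] = levels[0:i - 1] # shift levels j < i-1 down to j+1...
        levels[0] = 0                 # ...which empties level 0
    levels[0] += n_buf                # land the buffer flush in level 0
    return levels

levels = []
for _ in range(50):
    level_flush(levels, s=4, n_buf=1000)
print(levels)                         # record count per level after 50 flushes
\end{verbatim}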
\begin{theorem}
The amortized insertion cost of leveling with a scale factor of $s$ is
\begin{equation*}
I_A(n) \in \Theta\left(\frac{B(n)}{n} \cdot \frac{1}{2}(s+1)\log_s n\right)
\end{equation*}
\end{theorem}
\begin{proof}
Similarly to generalized BSM, the records in each level will be rewritten
up to $s$ times before they move down to the next level. Thus, the
amortized insertion cost for leveling can be found by determining how
many times a record is expected to be rewritten on a single level, and
how many levels there are in the structure.
On any given level, the total number of writes required to fill the level
is given by the expression,
\begin{equation*}
B(s + (s - 1) + (s - 2) + \ldots + 1)
\end{equation*}
where $B$ is the number of records added to the level during each
reconstruction (i.e., $N_B$ for level $0$ and $N_B \cdot s^{i-1}$ for any
level $i > 0$).
This is because the first batch of records entering the level will be
rewritten each of the $s$ times that the level is rebuilt before the
records are merged into the level below. The next batch will be rewritten
one fewer times, and so on. Thus, the total number of writes is,
\begin{equation*}
B\sum_{i=0}^{s-1} (s - i) = B\left(s^2 - \sum_{i=0}^{s-1} i\right) = B\left(s^2 - \frac{(s-1)s}{2}\right)
\end{equation*}
which can be simplified to get,
\begin{equation*}
\frac{1}{2}s(s+1)\cdot B
\end{equation*}
writes occurring on each level.\footnote{
This write count is not cumulative over the entire structure. It only
accounts for the number of writes occurring on this specific level.
}
To obtain the total number of times records are rewritten, we need to
calculate the average number of times a record is rewritten per level,
and sum this over all of the levels.
\begin{equation*}
\sum_{i=0}^{\log_s n} \frac{\frac{1}{2}B_i s (s+1)}{s B_i} = \frac{1}{2} \sum_{i=0}^{\log_s n} (s + 1) = \frac{1}{2} (s+1) \log_s n
\end{equation*}
To calculate the amortized insertion cost, we multiply this write amplification
number by the cost of rebuilding the structures, and divide by the total number
of records,
\begin{equation*}
I_A(n) \in \Theta\left(\frac{B(n)}{n}\cdot \frac{1}{2} (s+1) \log_s n\right)
\end{equation*}
\end{proof}
\begin{theorem}
The worst-case insertion cost for leveling with a scale factor of $s$ is
\begin{equation*}
\Theta\left(\frac{s-1}{s} \cdot B(n)\right)
\end{equation*}
\end{theorem}
\begin{proof}
Unlike in BSM, where the worst case reconstruction involves all of the
records within the structure, in leveling it only includes the records
in the last two levels. In particular, the worst case behavior occurs
when the last level is one reconstruction away from its capacity, and the
level above it is full. In this case, the reconstruction will involve,
\begin{equation*}
\left(s^{\log_s n} - s^{\log_s n - 1}\right) + s^{\log_s n - 1}
\end{equation*}
records, where the first parenthesized term represents the records in
the last level, and the second the records in the level above it.
\end{proof}
\begin{theorem}
The worst-case query cost for leveling for a decomposable search
problem with cost $\mathscr{Q}_S(n)$ is
\begin{equation*}
O\left(\mathscr{Q}_S(n) \cdot \log_s n \right)
\end{equation*}
\end{theorem}
\begin{proof}
The worst-case scenario for leveling is right before the structure gains
a new level, at which point there will be $\log_s n$ data structures
each with $O(n)$ records. Thus the worst-case cost will be the cost
of querying each of these structures,
\begin{equation*}
O\left(\mathscr{Q}_S(n) \cdot \log_s n \right)
\end{equation*}
\end{proof}
\begin{theorem}
The best-case query cost for leveling for a decomposable search
problem with cost $\mathscr{Q}_S(n)$ is
\begin{equation*}
\mathscr{Q}_B(n) \in O(\mathscr{Q}_S(n) \cdot \log_s n)
\end{equation*}
\end{theorem}
\begin{proof}
Unlike BSM, leveling will never have empty levels. The policy ensures
that there is always a data structure on every level. As a result, the
best-case query still must query $\log_s n$ structures, and so has a
best-case cost of,
\begin{equation*}
\mathscr{Q}_B(n) \in O\left(\mathscr{Q}_S(n) \cdot \log_s n\right)
\end{equation*}
\end{proof}
\subsection{Tiering}
Under the tiering layout policy, each level contains up to $s$ shards,
and when a level fills, its shards are merged into a single shard that
is placed on the next level down, leaving the original level empty.
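The following sketch (shard sizes only, with simplified merge triggers
that we assume for illustration) captures this behavior:
\begin{verbatim}
# Tiering layout policy, tracking shard sizes only. Each level holds up
# to s shards; a full level is merged into a single shard one level down.
def tier_flush(levels, s, n_buf):
    if not levels:
        levels.append([])
    # locate the first level with spare shard capacity; every full level
    # above it must be merged down, deepest first, to make room
    j = next((i for i, lvl in enumerate(levels) if len(lvl) < s), None)
    if j is None:
        j = len(levels)
        levels.append([])
    for i in range(j - 1, -1, -1):
        levels[i + 1].append(sum(levels[i]))   # merge level i's shards
        levels[i] = []
    levels[0].append(n_buf)                    # land the buffer flush
    return levels

levels = []
for _ in range(40):
    tier_flush(levels, s=3, n_buf=1000)
print([len(lvl) for lvl in levels])            # shard count per level
\end{verbatim}
Note that each record is copied exactly once per level as it migrates
down the structure, which is the property used in the analysis below.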
\begin{theorem}
The amortized insertion cost of tiering with a scale factor of $s$ is,
\begin{equation*}
I_A(n) \in \Theta\left(\frac{B(n)}{n} \cdot \log_s n \right)
\end{equation*}
\end{theorem}
\begin{proof}
For tiering, each record is written \emph{exactly} one time per
level. As a result, each record will be involved in exactly $\log_s n$
reconstructions over the lifetime of the structure. Each reconstruction
will have cost $B(n)$, and thus the amortized insertion cost must be,
\begin{equation*}
I_A(n) \in \Theta\left(\frac{B(n)}{n} \cdot \log_s n\right)
\end{equation*}
\end{proof}
\begin{theorem}
The worst-case insertion cost of tiering with a scale factor of $s$ is,
\begin{equation*}
I(n) \in \Theta\left(B(n)\right)
\end{equation*}
\end{theorem}
\begin{proof}
The worst-case reconstruction in tiering involves performing a
reconstruction on each level. Of these, the largest level will
contain $\Theta(n)$ records, and thus dominates the cost of the
reconstruction. More formally, the total cost of this reconstruction
will be,
\begin{equation*}
I(n) = \sum_{i=0}^{\log_s n} B(s^i) = B(1) + B(s) + B(s^2) + \ldots + B(s^{\log_s n})
\end{equation*}
Of these, the final term $B(s^{\log_s n}) = B(n)$ dominates the others,
resulting in an asymptotic worst-case cost of,
\begin{equation*}
I(n) \in \Theta\left(B(n)\right)
\end{equation*}
\end{proof}
\begin{theorem}
The worst-case query cost for tiering for a decomposable search
problem with cost $\mathscr{Q}_S(n)$ is
\begin{equation*}
\mathscr{Q}(n) \in O( \mathscr{Q}_S(n) \cdot s \log_s n)
\end{equation*}
\end{theorem}
\begin{proof}
As with the previous two policies, the worst-case query occurs when the
structure is completely full. In the case of tiering, this means that there
will be $\log_s n$ levels, each containing $s$ shards with a size bounded
by $O(n)$. Thus, there will be $s \log_s n$ structures to query, and the
query cost must be,
\begin{equation*}
\mathscr{Q}(n) \in O \left(\mathscr{Q}_S(n) \cdot s \log_s n \right)
\end{equation*}
\end{proof}
\begin{theorem}
The best-case query cost for tiering for a decomposable search problem
with cost $\mathscr{Q}_S(n)$ is $O(\mathscr{Q}_S(n) \cdot \log_s n)$.
\end{theorem}
\begin{proof}
The tiering policy ensures that there are no internal empty levels, and
as a result the best case scenario for tiering occurs when each level is
populated by exactly $1$ shard. In this case, there will only be $\log_s n$
shards to query, resulting in,
\begin{equation*}
\mathscr{Q}_B(n) \in O\left(\mathscr{Q}_S(n) \cdot \log_s n \right)
\end{equation*}
best-case query cost.
\end{proof}
\section{General Observations}
The asymptotic results from the previous section are summarized in
Table~\ref{tab:policy-comp}. When the scale factor is accounted for
in the analysis, we can see that possible trade-offs begin to manifest
within the space. We've seen some of these in action directly in
the experimental sections of previous chapters.
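To make these constant-factor differences concrete, the short sketch
below evaluates the write-amplification terms from
Table~\ref{tab:policy-comp} (the factor multiplying $B(n)/n$ in each
amortized insertion cost) for a fixed, arbitrarily chosen $n$; it is
purely illustrative:
\begin{verbatim}
# Write-amplification factors (the multiplier of B(n)/n) in the
# amortized insertion costs of the three layout policies.
import math

def bsm(s, n):      return 0.5 * (s - 1) * ((s - 1) * math.log(n, s) + s)
def leveling(s, n): return 0.5 * (s + 1) * math.log(n, s)
def tiering(s, n):  return math.log(n, s)

n = 10**8
for s in (2, 4, 8, 16):
    print(f"s={s:2d}  bsm={bsm(s, n):8.1f}  "
          f"leveling={leveling(s, n):6.1f}  tiering={tiering(s, n):5.1f}")
\end{verbatim}
Increasing $s$ reduces the write amplification of tiering while
increasing it for leveling and, more sharply, for generalized BSM,
consistent with the observation that the scale factor affects leveling
and tiering in opposite directions.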
\begin{table*}
\centering
\small
\renewcommand{\arraystretch}{1.6}
\begin{tabular}{|l l l l|}
\hline
& \textbf{Gen. BSM} & \textbf{Leveling} & \textbf{Tiering} \\ \hline
$\mathscr{Q}(n)$ &$O\left(\log_s n \cdot \mathscr{Q}_S(n)\right)$ & $O\left(\log_s n \cdot \mathscr{Q}_S(n)\right)$ & $O\left(s \log_s n \cdot \mathscr{Q}_S(n)\right)$\\ \hline
$\mathscr{Q}_B(n)$ & $\Theta(\mathscr{Q}_S(n))$ & $O(\log_s n \cdot \mathscr{Q}_S(n))$ & $O(\log_s n \cdot \mathscr{Q}_S(n))$ \\ \hline
$I(n)$ & $\Theta(B(n))$ & $\Theta(\frac{s - 1}{s}\cdot B(n))$ & $\Theta(B(n))$\\ \hline
$I_A(n)$ & $\Theta\left(\frac{B(n)}{n} \frac{1}{2}(s-1)\cdot((s-1)\log_s n +s)\right)$ & $\Theta\left(\frac{B(n)}{n} \frac{1}{2}(s+1)\log_s n\right)$& $\Theta\left(\frac{B(n)}{n} \log_s n\right)$ \\ \hline
\end{tabular}
\caption{Comparison of cost functions for various layout policies for DSPs}
\label{tab:policy-comp}
\end{table*}
\section{Experimental Evaluation}
In the previous sections, we mathematically proved various claims about
the performance characteristics of our three layout policies to assess
the trade-offs that exist within the design space. While this analysis is
useful, the effects we are examining are at the level of constant factors,
and so it would be useful to perform experimental testing to validate
that these claimed performance characteristics manifest in practice. In
this section, we will do just that, running various benchmarks to explore
the real-world performance implications of the configuration parameter
space of our framework.
\subsection{Asymptotic Insertion Performance}
We'll begin by validating our results for the insertion performance
characteristics of the three layout policies. For this test, we
consider two data structures: the ISAM tree and the VPTree. The ISAM
tree structure is merge-decomposable using a sorted-array merge, with
a build cost of $B_M(n) \in \Theta(n \log k)$, where $k$ is the number
of structures being merged. The VPTree, by contrast, is \emph{not}
merge decomposable, and is built in $B(n) \in \Theta(n \log n)$ time. We
use the $200,000,000$ record SOSD \texttt{OSM} dataset~\cite{sosd} for
ISAM testing, and the $1,000,000$ record, $300$-dimensional Spanish
Billion Words (\texttt{SBW}) dataset~\cite{sbw} for VPTree testing.
For our first experiment, we will examine the latency distribution for
inserts into our structures. We tested the three layout policies, using a
common scale factor of $s=2$. This scale factor was selected to minimize
its influence on the results (we've seen before in Sections~\ref{}
and \ref{} that scale factor affects leveling and tiering in opposite
ways) and isolate the influence of the layout policy alone to as great
a degree as possible. We used a buffer size of $N_B=12000$ for the ISAM
tree structure, and $N_B=1000$ for the VPTree.
We generated this distribution by inserting $30\%$ of the records from
the set to ``warm up'' the dynamized structure, and then measuring the
insertion latency for each individual insert for the remaining $70\%$
of the data. Note that, due to timer resolution issues at nanosecond
scales, the specific latency values associated with the faster end of
the insertion distribution are not precise. However, it is our intention
to examine the latency distribution, not the values themselves, and so
this is not a significant limitation for our analysis.
\begin{figure}
\subfloat[ISAM Tree Insertion Latencies]{\includegraphics[width=.5\textwidth]{img/design-space/isam-insert-dist.pdf} \label{fig:design-isam-ins-dist}}
\subfloat[VPTree Insertion Latencies]{\includegraphics[width=.5\textwidth]{img/design-space/vptree-insert-dist.pdf} \label{fig:design-vptree-ins-dist}} \\
\caption{Insertion Latency Distributions for Layout Policies}
\label{fig:design-policy-ins-latency}
\end{figure}
The resulting distributions are shown in
Figure~\ref{fig:design-policy-ins-latency}. These distributions are
represented using a ``reversed'' CDF with log scaling on both axes. This
representation has proven very useful for interpreting the latency
distributions that we see in evaluating dynamization, but it is slightly
unusual, and so we've included a guide to interpreting these charts
in Appendix~\ref{append:rcdf}.
The first notable point is that, for both the ISAM tree
in Figure~\ref{fig:design-isam-ins-dist} and VPTree in
Figure~\ref{fig:design-vptree-ins-dist}, the Leveling
policy results in a measurably lower worst-case insertion
latency. This result is in line with our theoretical analysis in
Section~\ref{ssec:design-leveling-proofs}. However, there is a major
deviation from the theoretical predictions in the worst-case performance of Tiering
and BSM. Both of these should have similar worst-case latencies, as
the worst-case reconstruction in both cases involves every record in
the structure. Yet, we see tiering consistently performing better,
particularly for the ISAM tree.
The reason for this has to do with the way that the records are
partitioned in these worst-case reconstructions. In Tiering with the
scale factor of $s=2$ used here, the worst-case reconstruction consists
of $\Theta(\log_2 n)$ distinct reconstructions, each involving exactly $2$ structures. BSM,
on the other hand, will use exactly $1$ reconstruction involving
$\Theta(\log_2 n)$ structures. This explains why ISAM performs much better
in Tiering than BSM, as the actual reconstruction cost function there is
$\Theta(n \log_2 k)$. For tiering, this results in $\Theta(n)$ cost in
the worst case. BSM, on the other hand, has $\Theta(n \log_2 \log_2 n)$,
as many more distinct structures must be merged in the reconstruction,
and is thus asymptotically worse off. VPTree, on the other hand, sees
less of a difference because it is \emph{not} merge decomposable, and so
the number of structures involved in a reconstruction matters far
less. Having the records more heavily partitioned still hurts performance,
most likely due to cache effects, but less so than in the MDSP case.
\begin{figure}
\caption{Insertion Throughput for Layout Policies}
\label{fig:design-ins-tput}
\end{figure}
Next, in Figure~\ref{fig:design-ins-tput}, we show the overall
insertion throughput for the three policies. This result should
correlate with the amortized insertion costs for each policy derived in
Section~\ref{sec:design-asymp}. As expected, tiering has the highest
throughput.
\subsection{General Insert vs. Query Trends}
For our next experiment, we will consider the trade-offs between insertion
and query performance that exist within this design space. We benchmarked
each layout policy for a range of scale factors, measuring both their
respective insertion throughputs and query latencies for both ISAM Tree
and VPTree.
\begin{figure}
\caption{Insertion Throughput vs. Query Latency}
\label{fig:design-tradeoff}
\end{figure}
\subsection{Query Size Effects}
One potentially interesting aspect of decomposition-based dynamization
techniques is that, asymptotically, the additional cost added by
decomposing the data structure vanishes for sufficiently expensive
queries. Bentley and Saxe proved that for query costs of the form
$\mathscr{Q}_S(n) \in \Omega(n^\epsilon)$ for $\epsilon > 0$, the
overall query cost is unaffected (asymptotically) by the decomposition.
This would seem to suggest that, as the cost of the query over a single
shard increases, the effectiveness of our design space for tuning query
performance should diminish. This is because our tuning space consists
of adjusting the number of shards within the structure, and so as the
effects of decomposition on the query cost shrink, we should see all
configurations approach similar query performance.
In order to evaluate this effect, we tested the query latency of range
queries of varying selectivity against various configurations of our
framework to see at what points the query latencies begin to converge. We
also tested $k$-NN queries with varying values of $k$.
\begin{figure}
\caption{Query "Size" Effect Analysis}
\label{fig:design-query-sze}
\end{figure}
\section{Asymptotically Relevant Trade-offs}
Thus far, we have considered a configuration system that trades in
constant factors only. In general asymptotic analysis, all possible
configurations of our framework in this scheme collapse to the same basic
cost functions when the constants are removed. While we have demonstrated
that, in practice, the effects of this configuration are measurable, there
do exist techniques in the classical literature that provide asymptotically
relevant trade-offs, such as the equal block method~\cite{maurer80} and
the mixed method~\cite[pp. 117-118]{overmars83}. These techniques have
cost functions that are derived from arbitrary, positive, monotonically
increasing functions of $n$ that govern various ways in which the data
structure is partitioned, and changing the selection of function allows
for "tuning" the performance. However, to the best of our knowledge,
these techniques have never been implemented, and no useful guidance in
the literature exists for selecting these functions.
However, it is useful to consider the general approach of these
techniques. They accomplish asymptotically relevant trade-offs by tying
the decomposition of the data structure directly to a function of $n$,
the number of records, in a user-configurable way. We can import a similar
concept into our already existing configuration framework for dynamization
to enable similar trade-offs, by replacing the constant scale factor,
$s$, with some function $s(n)$. However, we must take extreme care when
doing this to select a function that doesn't catastrophically impair
query performance.
Recall that, generally speaking, our dynamization technique requires
multiplying the cost function for the data structure being dynamized by
the number of shards that the data structure has been decomposed into. For
search problems that are solvable in sub-polynomial time, this results in
a worst-case query cost of,
\begin{equation}
\mathscr{Q}(n) \in O(S(n) \cdot \mathscr{Q}_S(n))
\end{equation}
where $S(n)$ is the number of shards and, for our framework, is $S(n) \in
O(s \log_s n)$. The user can adjust $s$, but this tuning does not have
asymptotically relevant consequences. Unfortunately, there is not much
room, practically, for adjustment. If, for example, we were to allow the
user to specify $S(n) \in \Theta(n)$, rather than $\Theta(\log n)$, then
query performance would be greatly impaired. We need a function that is
sub-linear to ensure useful performance.
To accomplish this, we propose adding a second scaling factor, $k$, such
that the number of records on level $i$ is given by,
\begin{equation}
\label{eqn:design-k-expr}
N_B \cdot \left(s \log_2^k(n)\right)^{i}
\end{equation}
with $k=0$ being equivalent to the configuration space we have discussed
thus far. The addition of $k$ allows for the dependency of the number of
shards on $n$ to be slightly biased upwards or downwards, in a way that
\emph{does} show up in the asymptotic analysis for inserts and queries,
but also ensures sub-polynomial additional query cost.
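To illustrate the intended effect, the sketch below (record counts only,
with parameter values chosen arbitrarily) computes how many levels are
required to hold $n$ records when level capacities follow
Equation~\ref{eqn:design-k-expr} for a few values of $k$:
\begin{verbatim}
# Number of levels needed to hold n records when level i has capacity
# N_B * (s * log2(n)**k)**i, for several values of k.
import math

def num_levels(n, s, k, n_buf):
    growth = s * math.log2(n) ** k      # effective per-level growth factor
    levels, capacity = 0, 0
    while capacity < n:
        capacity += n_buf * growth ** levels
        levels += 1
    return levels

n, s, n_buf = 10**9, 2, 1000
for k in (0, 0.5, 1, 2):
    print(f"k={k}: {num_levels(n, s, k, n_buf)} levels")
\end{verbatim}
Larger values of $k$ reduce the number of levels, and thus the number of
shards, as a function of $n$.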
In particular, we prove the following asymptotic properties of this
configuration.
\begin{theorem}
The worst-case query latency of a dynamization scheme where the
capacity of each level is provided by Equation~\ref{eqn:design-k-expr} is
\begin{equation}
\mathscr{Q}(n) \in O\left(\frac{\log n}{k \log \log n} \cdot \mathscr{Q}_S(n)\right)
\end{equation}
\end{theorem}
\begin{proof}
The number of levels within the structure is given by $\log_s (n)$,
where $s$ is the scale factor. The addition of $k$ to the parametrization
replaces this scale factor with $s \log^k n$, and so we have
\begin{equation*}
\log_{s \log^k n}n = \frac{\log n}{\log\left(s \log^k n\right)} = \frac{\log n}{\log s + k \log\log n} \in O\left(\frac{\log n}{k \log \log n}\right)
\end{equation*}
by the application of standard logarithm rules and the change-of-base formula.
The cost of a query against a decomposed structure is $O(S(n) \cdot \mathscr{Q}_S(n))$, and
there are $\Theta(1)$ shards per level. Thus, the worst case query cost is
\begin{equation*}
\mathscr{Q}(n) \in O\left(\frac{\log n}{k \log \log n} \cdot \mathscr{Q}_S(n)\right)
\end{equation*}
\end{proof}
\begin{theorem}
The amortized insertion cost of a dynamization scheme where the capacity of
each level is provided by Equation~\ref{eqn:design-k-expr} is,
\begin{equation*}
I_A(n) \in \Theta\left(\frac{B(n)}{n} \cdot \frac{\log n}{k \log \log n}\right)
\end{equation*}
\end{theorem}
\begin{proof}
\end{proof}
\subsection{Evaluation}
In this section, we'll assess the effect that modifying $k$ in our
new parameter space has on the insertion and query performance of our
dynamization framework.
\section{Conclusion}