\chapter{Exploring the Design Space}
\label{chap:design-space}

\section{Introduction}

In the previous two chapters, we introduced an LSM tree inspired design
space into the Bentley-Saxe method to allow for more flexibility in
performance tuning. However, aside from some general comments about how
these parameters affect insertion and query performance, and some limited
experimental evaluation, we have not performed a systematic analysis of
this space, its capabilities, and its limitations. We will rectify this
situation in this chapter, performing both a detailed mathematical
analysis of the design parameter space and an experimental
evaluation, to explore the space and its trade-offs and demonstrate
their practical effectiveness.

Before diving into the design space we have introduced in detail, it's
worth taking some time to motivate this entire endeavor. There is a large
body of theoretical work in the area of data structure dynamization,
and, to the best of our knowledge, none of these papers have introduced
a design space of the sort that we have introduced here. Despite this,
some papers which \emph{use} these techniques have introduced similar
design elements into their own implementations~\cite{pgm}, with some even
going so far as to describe these elements as part of the Bentley-Saxe
method~\cite{almodaresi23}.

This situation is best understood in terms of the ultimate goals
of the respective lines of work. In the classical literature
on dynamization, the focus is directed at proving theoretical
asymptotic bounds. In this context, the LSM tree design space is
of limited utility, because its tuning parameters adjust constant
factors, and thus don't play a major role in asymptotics. Where
the theoretical literature does introduce configurability, such as
with the equal block method~\cite{overmars-art-of-dyn} or more
complex schemes that nest the equal block method \emph{inside}
of a binary decomposition~\cite{overmars81}, the intention is
to produce asymptotically relevant trade-offs between insert,
query, and delete performance for deletion decomposable search
problems~\cite[pg. 117]{overmars83}. This explains why the equal block
method is described in terms of a function, rather than a constant value,
to enable it to appear in the asymptotics.

On the other hand, in practical scenarios, constant tuning of performance
can be very relevant. We've already shown in Sections~\ref{ssec:ds-exp}
and \ref{ssec:dyn-ds-exp} how tuning parameters, particularly adjusting
the number of shards per level, can have measurable real-world effects on
the performance characteristics of dynamized structures. In fact, sometimes
this tuning is \emph{necessary} to enable reasonable performance. It's
quite telling that the two most direct implementations of the Bentley-Saxe
method that we have identified in the literature are both in the context
of metric indices~\cite{naidan14,bkdtree}, a class of data structure
and search problem for which we saw very good performance from standard
Bentley-Saxe in Section~\ref{ssec:dyn-knn-exp}. Our experiments in
Chapter~\ref{chap:framework} show that, for other types of problems,
the technique does not fare quite so well in its unmodified form.

\section{Asymptotic Analysis}
\label{sec:design-asymp}

Before beginning with derivations for the cost functions of dynamized
structures within the context of our proposed design space, we should
make a few comments about the assumptions and techniques that we will use
in our analysis. We will generally neglect buffering in our analysis,
both in terms of the additional query cost of querying the buffer, and
in terms of the buffer's effect on the reconstruction process. Buffering
isn't fundamental to the techniques we are considering, and including it
would needlessly complicate the analysis. However, we will include the
scale factor, $s$, which directly governs the number of blocks within
the dynamized structures.  Additionally, we will perform the query cost
analysis assuming a decomposable search problem.  Deletes will be entirely
neglected, and we won't make any assumptions about mergeability.

\subsection{Generalized Bentley Saxe Method}
As a first step, we will derive a modified version of the Bentley-Saxe
method that has been adjusted to support arbitrary scale factors. There's
nothing fundamental to the technique that prevents such modifications,
and it's likely that they have not been analyzed in this way before simply
because constant factors are of little interest in theoretical asymptotic
analysis.

When generalizing the Bentley-Saxe method for arbitrary scale factors,
we decided to maintain the core concept of binary decomposition. One
interesting mathematical property of a Bentley-Saxe dynamization is that
the internal layout of levels exactly matches the binary representation of
the record count contained within the index. For example, a dynamization
containing $n=20$ records will have 4 records in level $i=2$ and 16 in
level $i=4$, with all other levels being empty. If we represent a full
level with a 1 and an empty level with a 0, reading from the largest
level down to level $0$, then we'd have $10100$, which is $20$ in base 2.

\begin{algorithm}
\caption{The Generalized BSM Layout Policy}
\label{alg:design-bsm}

\KwIn{$r$: set of records to be inserted, $\mathscr{I}$: a dynamized structure, $n$: number of records in $\mathscr{I}$}

\BlankLine
\Comment{Find the first non-full level}
$target \gets -1$ \;
\For{$i=0\ldots \log_s n$} {
	\If {$|\mathscr{I}_i| < N_B (s - 1)\cdot s^i$} {
		$target \gets i$ \;
		break \;
	}
}

\BlankLine
\Comment{If the structure is full, we need to grow it}
\If {$target = -1$} {
	$target \gets 1 + (\log_s n)$ \;
}

\BlankLine
\Comment{Build the new structure}
$\mathscr{I}_{target} \gets \text{build}(\text{unbuild}(\mathscr{I}_0) \cup \ldots \cup \text{unbuild}(\mathscr{I}_{target}) \cup r)$ \;
\BlankLine
\Comment{Empty the levels used to build the new shard}
\For{$i=0\ldots target-1$} {
	$\mathscr{I}_i \gets \emptyset$ \;
}
\end{algorithm}

Our generalization, then, is to represent the data as an $s$-ary
decomposition, where the scale factor represents the base of the
representation. To accomplish this, we set the capacity of level $i$ to
be $N_B (s - 1) \cdot s^i$, where $N_B$ is the size of the buffer. The
resulting structure will have at most $\log_s n$ shards. The resulting
policy is described in Algorithm~\ref{alg:design-bsm}.
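
To make the layout concrete, the short Python sketch below computes
the per-level capacities implied by this policy for a given record
count. It is purely illustrative (the function name
\texttt{level\_capacities} and its parameter defaults are ours, not part
of the framework), but it shows how the buffer size and scale factor
jointly determine the number and sizes of the blocks.
\begin{verbatim}
import math

# Per-level capacities for the generalized Bentley-Saxe layout: level i
# holds at most N_B * (s - 1) * s^i records, so the smallest k levels
# hold N_B * (s^k - 1) records in total.
def level_capacities(n, N_B=1000, s=4):
    # smallest k with N_B * (s^k - 1) >= n
    k = max(1, math.ceil(math.log(n / N_B + 1, s)))
    return [N_B * (s - 1) * s**i for i in range(k)]

# With N_B = 1000 and s = 4, capacities of 3000, 12000, and 48000
# records suffice for n = 50,000.
print(level_capacities(50_000))  # [3000, 12000, 48000]
\end{verbatim}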

Analyzing the amortized insertion performance of BSM is slightly
complicated by the fact that each record is \emph{not} necessarily written on
every level. For the purposes of our analysis, establishing a reasonable
upper bound on the amortized insertion cost is sufficient, however, so
we will settle for a looser bound to keep things simple.

\begin{theorem}
The amortized insertion cost for generalized BSM with a scale factor of
$s$ is $O\left(\frac{B(n)}{n} \cdot s\log_s n\right)$.
\end{theorem}
\begin{proof}
In generalized BSM, each record will be written at most $s$ times per
level before moving to a larger level, and there are $O(\log_s n)$ levels
in the decomposition, so each record participates in $O(s \log_s n)$
reconstructions over its lifetime. The worst-case cost associated with
a reconstruction in BSM is a full compaction of the structure, which
requires $B(n)$ time, or $O(B(n)/n)$ time per record involved. As a
result, the amortized insertion cost can be bounded above by,
\begin{equation}
I_A(n) \in O\left(\frac{B(n)}{n} \cdot s \log_s n\right)
\end{equation}
\end{proof}
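
As a point of reference (an instantiation of the bound above under the
stated assumptions, not a new result): for a structure with a build cost
of $B(n) \in \Theta(n \log n)$, the bound becomes
\begin{equation*}
I_A(n) \in O\left(\frac{n\log n}{n} \cdot s \log_s n\right) = O\left(s \log n \log_s n\right),
\end{equation*}
which is polylogarithmic per insert for any fixed $s$. Increasing $s$
shrinks the $\log_s n$ factor while growing the linear $s$ factor, which
is precisely the constant-factor trade-off this chapter explores.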



% \begin{proof}

% In order to calculate the amortized insertion cost, we will first
% determine the average number of times that a record is involved in a
% reconstruction, and then amortize those reconstructions over the records
% in the structure.

% If we consider only the first level of the structure, it's clear that
% the reconstruction count associated with each record in that structure
% will follow the pattern, $1, 2, 3, 4, ..., s-1$ when the level is full.
% Thus, the total number of reconstructions associated with records on level
% $i=0$ is the sum of that sequence, or
% \begin{equation*}
% W(0) = \sum_{j=1}^{s-1} j = \frac{1}{2}\left(s^2 - s\right)
% \end{equation*}

% Considering the next level, $i=1$, each reconstruction involving this
% level will copy down the entirety of the structure above it, adding
% one more write per record, as well as one extra write for the new record.
% More specifically, in the above example, the first ``batch'' of records in
% level $i=1$ will have the following write counts: $1, 2, 3, 4, 5, ..., s$,
% the second ``batch'' of records will increment all of the existing write
% counts by one, and then introduce another copy of $1, 2, 3, 4, 5, ..., s$
% writes, and so on.

% Thus, each new ``batch'' written to level $i$ will introduce $W(i-1) + 1$
% writes from the previous level into level $i$, as well as rewriting all
% of the records currently on level $i$.

% The net result of this is that the number of writes on level $i$ is given
% by the following recurrence relation (combined with the $W(0)$ base case),

% \begin{equation*}
% W(i) = sW(i-1) + \frac{1}{2}\left(s-1\right)^2 \cdot s^i
% \end{equation*}

% which can be solved to give the following closed-form expression,
% \begin{equation*}
% W(i) = s^i \cdot \left(\frac{1}{2} (s-1) \cdot (s(i+1) - i)\right)
% \end{equation*}
% which provides the total number of reconstructions that records in
% level $i$ of the structure have participated in. As each record
% is involved in a different number of reconstructions, we'll consider the
% average number by dividing $W(i)$ by the number of records in level $i$.

% From here, the proof proceeds in the standard way for this sort of
% analysis. The worst-case cost of a reconstruction is $B(n)$, and there
% are $\log_s(n)$ total levels, so the total reconstruction costs associated
% with a record can be upper-bounded by, $B(n) \cdot
% \frac{W(\log_s(n))}{n}$, and then this cost amortized over the $n$
% insertions necessary to get the record into the last level. We'll also
% condense the multiplicative constants and drop the additive ones to more
% clearly represent the relationship we're looking to show. This results
% in an amortized insertion cost of,
% \begin{equation*}
% \frac{B(n)}{n} \cdot s \log_s n
% \end{equation*}
% \end{proof}

\begin{theorem}
The worst-case insertion cost for generalized BSM with a scale factor
of $s$ is $\Theta(B(n))$.
\end{theorem}
\begin{proof}
The Bentley-Saxe method finds the smallest non-full block and performs
a reconstruction including all of the records from that block, as well
as all blocks smaller than it, and the new records to be added. The
worst case, then, will occur when all of the existing blocks in the
structure are full, and a new, larger, block must be added.

In this case, the reconstruction will involve every record currently
in the dynamized structure, and will thus have a cost of $I(n) \in
\Theta(B(n))$.
\end{proof}

\begin{theorem}
The worst-case query cost for generalized BSM for a decomposable
search problem with cost $\mathscr{Q}_S(n)$ is $O(\log_s(n) \cdot
\mathscr{Q}_S(n))$.
\end{theorem}
\begin{proof}
The worst-case scenario for queries in BSM occurs when every existing
level is full. In this case, there will be $\log_s n$ levels that must
be queried, with the $i$th level containing $(s - 1) \cdot s^i$ records.
Thus, the total cost of querying the structure will be,
\begin{equation}
\mathscr{Q}(n) = \sum_{i=0}^{\log_s n} \mathscr{Q}_S\left((s - 1) \cdot s^i\right)
\end{equation}
The number of records per shard will be upper bounded by $O(n)$, so
\begin{equation}
\mathscr{Q}(n) \in O\left(\sum_{i=0}^{\log_s n} \mathscr{Q}_S(n)\right)
	\in O\left(\log_s n \cdot \mathscr{Q}_S(n)\right)
\end{equation}
\end{proof}

\begin{theorem}
The best-case query cost for generalized BSM for a decomposable
search problem with a cost of $\mathscr{Q}_S(n)$ is $\mathscr{Q}(n)
\in \Theta(\mathscr{Q}_S(n))$.
\end{theorem}
\begin{proof}
The best case scenario for queries in BSM occurs when a new level is
added, which results in every record in the structure being compacted
into a single structure. In this case, there is only a single data
structure in the dynamization, and so the query cost over the dynamized
structure is identical to the query cost of a single static instance of
the structure. Thus, the best case query cost in BSM is,
\begin{equation*}
\mathscr{Q}_B(n) \in \Theta \left( 1 \cdot \mathscr{Q}_S(n) \right) \in \Theta\left(\mathscr{Q}_S(n)\right)
\end{equation*}

\end{proof}


\subsection{Leveling}


\begin{algorithm}
\caption{The Leveling Policy}
\label{alg:design-leveling}

\KwIn{$r$: set of records to be inserted, $\mathscr{I}$: a dynamized structure, $n$: number of records in $\mathscr{I}$}

\BlankLine
\Comment{Find the first non-full level}
$target \gets -1$ \;
\For{$i=0\ldots \log_s n$} {
	\If {$|\mathscr{I}_i| < N_B \cdot s^{i+1}$} {
		$target \gets i$ \;
		break \;
	}
}

\BlankLine
\Comment{If the target is $0$, then just merge the buffer into it}
\If{$target = 0$} {
	$\mathscr{I}_0 \gets \text{build}(\text{unbuild}(\mathscr{I}_0) \cup r)$ \;
	\Return
}

\BlankLine
\Comment{If the structure is full, we need to grow it}
\If {$target = -1$} {
	$target \gets 1 + (\log_s n)$ \;
}

\BlankLine
\Comment{Perform the reconstruction}
$\mathscr{I}_{target} \gets \text{build}(\text{unbuild}(\mathscr{I}_{target}) \cup \text{unbuild}(\mathscr{I}_{target - 1}))$ \;

\BlankLine
\Comment{Shift the remaining levels down to free up $\mathscr{I}_0$}
\For{$i=target-1 \ldots 1$} {
	$\mathscr{I}_i \gets \mathscr{I}_{i-1}$ \;
}

\BlankLine
\Comment{Flush the buffer in $\mathscr{I}_0$}
$\mathscr{I}_0 \gets \text{build}(r)$ \;

\Return \;
\end{algorithm}

Our leveling layout policy is described in
Algorithm~\ref{alg:design-leveling}. Each level contains a single
structure with a capacity of $N_B\cdot s^{i+1}$ records. When a
reconstruction occurs, the smallest level, $i$, with space to contain the
records from level $i-1$, in addition to the records currently within
it, is located. Then, a new structure is built at level $i$ containing
all of the records in levels $i$ and $i-1$, and the structure at level
$i-1$ is deleted. Finally, all levels $j < (i - 1)$ are shifted to level
$j+1$. This process clears space in level $0$ to contain the buffer flush.
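
To make these capacities concrete (numbers chosen purely for
illustration): with $N_B = 1000$ and $s = 4$, level $0$ holds a single
structure of up to $4{,}000$ records, level $1$ up to $16{,}000$, and
level $2$ up to $64{,}000$. Each buffer flush adds $1{,}000$ records to
level $0$, and once level $0$ is full after four flushes, the next flush
forces its contents to be merged into level $1$, after which level $0$
is rebuilt from the buffered records.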

\begin{theorem}
The amortized insertion cost of leveling with a scale factor of $s$ is
\begin{equation*}
I_A(n) \in \Theta\left(\frac{B(n)}{n} \cdot s \log_s n\right)
\end{equation*}
\end{theorem}
\begin{proof}
Similarly to generalized BSM, the records in each level will be rewritten
up to $s$ times before they move down to the next level. Thus, the
amortized insertion cost for leveling can be found by determining how
many times a record is expected to be rewritten on a single level, and
how many levels there are in the structure.

On any given level, the total number of writes required to fill the level
is given by the expression,
\begin{equation*}
B(s + (s - 1) + (s - 2) + \ldots + 1)
\end{equation*}
where $B$ is the number of records added to the level during each
reconstruction (i.e., $N_B$ for level $0$ and $N_B\cdot s^{i}$ for level
$i > 0$).

This is because the first batch of records entering the level will be
rewritten each of the $s$ times that the level is rebuilt before the
records are merged into the level below. The next batch will be rewritten
one fewer times, and so on. Thus, the total number of writes is,
\begin{equation*}
B\sum_{i=0}^{s-1} (s - i) = B\left(s^2 - \sum_{i=0}^{s-1} i\right) = B\left(s^2 - \frac{(s-1)s}{2}\right)
\end{equation*}
which can be simplified to get,
\begin{equation*}
\frac{1}{2}s(s+1)\cdot B
\end{equation*}
writes occurring on each level.\footnote{
	This write count is not cumulative over the entire structure. It only
	accounts for the number of writes occurring on this specific level.
}

To obtain the total number of times records are rewritten, we need to
calculate the average number of times a record is rewritten per level,
and sum this over all of the levels.
\begin{equation*}
\sum_{i=0}^{\log_s n} \frac{\frac{1}{2}B_i s (s+1)}{s B_i} = \frac{1}{2} \sum_{i=0}^{\log_s n} (s + 1) = \frac{1}{2} (s+1) \log_s n 
\end{equation*}
To calculate the amortized insertion cost, we multiply this write amplification
by the cost of rebuilding the structures, and divide by the total number
of records. We'll condense the constant into a single $s$, as this best
expresses the nature of the relationship we're looking for,
\begin{equation*}
I_A(n) \in \Theta\left(\frac{B(n)}{n}\cdot s \log_s n\right)
\end{equation*}
\end{proof}
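
As a quick sanity check of the write count above (a worked instance of
the formula, not a new result): with $s = 4$, filling a level requires
$B(4 + 3 + 2 + 1) = 10B$ writes, which matches $\frac{1}{2}s(s+1)\cdot B
= \frac{1}{2}\cdot 4 \cdot 5 \cdot B$, and dividing by the $sB = 4B$
records on the full level gives an average of $2.5 = \frac{s+1}{2}$
rewrites per record per level.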

\begin{theorem}
The worst-case insertion cost for leveling with a scale factor of $s$ is
\begin{equation*}
\Theta\left(B\left(\frac{s-1}{s} \cdot n\right)\right)
\end{equation*}
\end{theorem}
\begin{proof}
Unlike in BSM, where the worst case reconstruction involves all of the
records within the structure, in leveling it only includes the records
in the last two levels. In particular, the worst case behavior occurs
when the last level is one reconstruction away from its capacity, and the
level above it is full. In this case, the reconstruction will involve the
full capacity of the last level, or $N_B \cdot s^{\log_s n +1}$ records.

We can relate this to $n$ by finding the ratio of elements contained in
the last level of the structure to the entire structure. This is given
by,
\begin{equation*}
\frac{N_B \cdot s^{\log_s n + 1}}{\sum_{i=0}^{\log_s n} N_B \cdot s^{i + 1}} = \frac{(s - 1)n}{sn - 1}
\end{equation*}
This fraction can be simplified by noting that the $1$ subtracted in
the denominator is negligible and dropping it, allowing the $n$ to be
canceled and giving a ratio of $\frac{s-1}{s}$. Thus the worst case reconstruction
will involve $\frac{s - 1}{s} \cdot n$ records, with all the other levels
simply shifting down at no cost, resulting in a worst-case insertion cost
of,
\begin{equation*}
I(n) \in \Theta\left(B\left(\frac{s-1}{s} \cdot n\right)\right)
\end{equation*}
\end{proof}
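
To put this bound in concrete terms (an illustration only, with no new
assumptions): at $s=2$ the worst-case reconstruction touches roughly
$\frac{1}{2}n$ records, while at $s=8$ it touches roughly $\frac{7}{8}n$,
so larger scale factors push leveling's worst-case reconstruction closer
to the full-structure rebuild of BSM.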

\begin{theorem}
The worst-case query cost for leveling for a decomposable search
problem with cost $\mathscr{Q}_S(n)$ is
\begin{equation*}
O\left(\mathscr{Q}_S(n) \cdot \log_s n \right)
\end{equation*}
\end{theorem}
\begin{proof}
The worst-case scenario for leveling is right before the structure gains
a new level, at which point there will be $\log_s n$ data structures
each with $O(n)$ records. Thus the worst-case cost will be the cost
of querying each of these structures,
\begin{equation*}
O\left(\mathscr{Q}_S(n) \cdot \log_s n \right)
\end{equation*}
\end{proof}

\begin{theorem}
The best-case query cost for leveling for a decomposable search
problem with cost $\mathscr{Q}_S(n)$ is
\begin{equation*}
\mathscr{Q}_B(n) \in O(\mathscr{Q}_S(n) \cdot \log_s n)
\end{equation*}

\end{theorem}
\begin{proof}
Unlike BSM, leveling will never have empty levels. The policy ensures
that there is always a data structure on every level. As a result, the
best-case query still must query $\log_s n$ structures, and so has a
best-case cost of,
\begin{equation*}
\mathscr{Q}_B(n) \in O\left(\mathscr{Q}_S(n) \cdot \log_s n\right)
\end{equation*}
\end{proof}

\subsection{Tiering}


\begin{algorithm}
\caption{The Tiering Policy}
\label{alg:design-tiering}

\KwIn{$r$: set of records to be inserted, $\mathscr{L}_0 \ldots \mathscr{L}_{\log_s n}$: the levels of $\mathscr{I}$, $n$: the number of records in $\mathscr{I}$}
\BlankLine
\Comment{Find the first non-full level}
$target \gets -1$ \;
\For{$i=0\ldots \log_s n$} {
	\If {$|\mathscr{L}_i| < s$} {
		$target \gets i$ \;
		break \;
	}
}

\BlankLine
\Comment{If the structure is full, we need to grow it}
\If {$target = -1$} {
	$target \gets 1 + (\log_s n)$ \;
}

\BlankLine
\Comment{Walk the structure backwards, applying reconstructions}
\For {$i \gets target \ldots 1$} {
	$\mathscr{L}_i \gets \mathscr{L}_i \cup \text{build}(\text{unbuild}(\mathscr{L}_{i-1, 0}) \cup \ldots \cup \text{unbuild}(\mathscr{L}_{i-1, s-1}))$ \;
}
\BlankLine
\Comment{Add the buffered records to $\mathscr{L}_0$}
$\mathscr{L}_0 \gets \mathscr{L}_0 \cup \text{build}(r)$ \;

\Return  \;
\end{algorithm}

Our tiering layout policy is described in Algorithm~\ref{alg:design-tiering}. In
this policy, each level contains up to $s$ shards, each with a capacity of
$N_B\cdot s^i$ records. When a reconstruction occurs, the first level
with fewer than $s$ shards is selected as the target, $t$. Then, for
every level $i < t$, all of the shards in level $i$ are merged into a
single shard using a reconstruction and placed in level $i+1$. These
reconstructions are performed backwards, starting with level $t-1$ and moving
back up towards level $0$. Finally, the shard created by the buffer flush is
placed in level $0$.
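
For example (a purely illustrative configuration): with $N_B = 1000$
and $s = 4$, level $0$ holds up to four shards of $1{,}000$ records each,
level $1$ up to four shards of $4{,}000$ records each, and so on. A full
structure with $\log_4 n$ levels therefore contains up to $4\log_4 n$
shards, in contrast to leveling's single shard per level.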

\begin{theorem}
The amortized insertion cost of tiering with a scale factor of $s$ is,
\begin{equation*}
I_A(n) \in \Theta\left(\frac{B(n)}{n} \cdot \log_s n \right)
\end{equation*}
\end{theorem}
\begin{proof}
For tiering, each record is written \emph{exactly} once per
level. As a result, each record will be involved in exactly $\log_s n$
reconstructions over the lifetime of the structure. Each reconstruction
costs at most $B(n)$, or $B(n)/n$ per record involved, and thus the amortized insertion cost must be,
\begin{equation*}
I_A(n) \in \Theta\left(\frac{B(n)}{n} \cdot \log_s n\right)
\end{equation*}
\end{proof}

\begin{theorem}
The worst-case insertion cost of tiering with a scale factor of $s$ is,
\begin{equation*}
I(n) \in \Theta\left(B(n)\right)
\end{equation*}
\end{theorem}
\begin{proof}
The worst-case reconstruction in tiering involves performing a
reconstruction on each level. More formally, the total cost of this
reconstruction will be,
\begin{equation*}
I(n) \in \Theta\left(\sum_{i=0}^{\log_s n} B(s^i)\right)
\end{equation*}
Assuming that $B(n)/n$ is non-decreasing, this sum is dominated by its
largest term, $B(s^{\log_s n}) = B(n)$, and so $I(n) \in \Theta(B(n))$.
\end{proof}

\begin{theorem}
The worst-case query cost for tiering for a decomposable search
problem with cost $\mathscr{Q}_S(n)$ is
\begin{equation*}
\mathscr{Q}(n) \in O( \mathscr{Q}_S(n) \cdot s \log_s n)
\end{equation*}
\end{theorem}
\begin{proof}
As with the previous two policies, the worst-case query occurs when the
structure is completely full. In the case of tiering, that means that there
will be $\log_s n$ levels, each containing $s$ shards with a size bounded
by $O(n)$. Thus, there will be $s \log_s n$ structures to query, and the
query cost must be,
\begin{equation*}
\mathscr{Q}(n) \in O \left(\mathscr{Q}_S(n) \cdot s \log_s n \right)
\end{equation*}
\end{proof}

\begin{theorem}
The best-case query cost for tiering for a decomposable search problem
with cost $\mathscr{Q}_S(n)$ is $O(\mathscr{Q}_S(n) \cdot \log_s n)$.
\end{theorem}
\begin{proof}
The tiering policy ensures that there are no internal empty levels, and
as a result the best case scenario for tiering occurs when each level is
populated by exactly $1$ shard. In this case, there will only be $\log_s n$
shards to query, resulting in,
\begin{equation*}
\mathscr{Q}_B(n) \in O\left(\mathscr{Q}_S(n) \cdot \log_s n \right)
\end{equation*}
best-case query cost.
\end{proof}
\section{General Observations}

The asymptotic results from the previous section are summarized in
Table~\ref{tab:policy-comp}. When the scale factor is accounted for
in the analysis, we can see that possible trade-offs begin to manifest
within the space. We've seen some of these in action directly in
the experimental sections of previous chapters.

Most notably, we can directly see in these cost functions the reason why
tiering and leveling experience opposite effects as the scale factor
changes. In both policies, increasing the scale factor increases the
base of the logarithm governing the height, and so in the absence of
the additional constants in the analysis, it would superficially appear
as though both policies should see the same effects. But, with the other
constants retained, we can see that this is in fact not the case. For
tiering, increasing the scale factor does reduce the number of levels;
however, it also increases the number of shards per level. Because the
level reduction enters through the base of the logarithm, while the
shard count grows linearly with $s$, the shard count effect dominates
and query performance degrades as the scale factor increases. Leveling,
however, does not include this linear term and sees only a reduction in height.

When considering insertion, we see a similar situation in reverse. For
both leveling and tiering, increasing the scale factor reduces the size of
the logarithmic term, and there are no other terms at play in tiering, so we
see an improvement in insertion performance. However, leveling also
has a linear dependency on the scale factor, as increasing the scale
factor also increases the write amplification. This is why leveling sees
its insertion performance degrade with scale factor. The generalized
Bentley-Saxe method follows the same general trends as leveling for
worst-case query cost and for amortized insertion cost.
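
These relationships are simple enough to evaluate numerically. The
Python sketch below is purely illustrative (it plugs a few scale factors
into the cost-model factors from Table~\ref{tab:policy-comp}; nothing
here touches the framework itself): it prints the shard-count factor
that multiplies query cost and the write-amplification factor that
multiplies the amortized insertion cost for leveling and tiering.
\begin{verbatim}
import math

# Evaluate the constant-factor terms from the cost model: shard count
# governs query cost, write amplification governs amortized insert cost.
def factors(n=10**8, scale_factors=(2, 4, 8, 16)):
    for s in scale_factors:
        levels = math.log(n, s)
        print(f"s={s:2d}  "
              f"leveling: shards~{levels:5.1f}, writes~{s * levels:6.1f}  "
              f"tiering: shards~{s * levels:6.1f}, writes~{levels:5.1f}")

factors()
# As s grows, leveling's shard count (query cost) shrinks while its
# write amplification (insert cost) grows; tiering shows the reverse.
\end{verbatim}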

Of note as well is the fact that leveling has slightly better worst-case
insertion performance. This is because leveling only ever reconstructs
one level at a time, with the other levels simply shifting around in
constant time. Bentley-Saxe and tiering have strictly worse worst-case
insertion  cost as their worst-case reconstructions involve all of the
levels. In the Bentley-Saxe method, this worst-case cost is manifest
in a single, large reconstruction. In tiering, it involves $\log_s n$
reconstructions, one per level.


\begin{table*}
\centering
\small
\renewcommand{\arraystretch}{1.6}
\begin{tabular}{|l l l l|}
\hline
& \textbf{Gen. BSM} & \textbf{Leveling} & \textbf{Tiering} \\ \hline
$I(n)$ & $\Theta(B(n))$ & $\Theta\left(B\left(\frac{s-1}{s} \cdot n\right)\right)$ & $ \Theta\left(\sum_{i=0}^{\log_s n} B(s^i)\right)$ \\ \hline
$I_A(n)$ & $O\left(\frac{B(n)}{n} s\log_s n\right)$ & $\Theta\left(\frac{B(n)}{n} s\log_s n\right)$ & $\Theta\left(\frac{B(n)}{n} \log_s n\right)$ \\ \hline
$\mathscr{Q}(n)$ &$O\left(\log_s n \cdot \mathscr{Q}_S(n)\right)$ & $O\left(\log_s n \cdot \mathscr{Q}_S(n)\right)$ & $O\left(s \log_s n  \cdot \mathscr{Q}_S(n)\right)$\\ \hline
$\mathscr{Q}_B(n)$ & $\Theta(\mathscr{Q}_S(n))$ & $O(\log_s n \cdot \mathscr{Q}_S(n))$ & $O(\log_s n \cdot \mathscr{Q}_S(n))$ \\ \hline
\end{tabular}

\caption{Comparison of cost functions for various layout policies for DSPs}
\label{tab:policy-comp}
\end{table*}

\section{Experimental Evaluation}

In the previous sections, we mathematically proved various claims about
the performance characteristics of our three layout policies to assess
the trade-offs that exist within the design space. While this analysis is
useful, the effects we are examining are at the level of constant factors,
and so experimental testing is needed to validate
that these claimed performance characteristics manifest in practice. In
this section, we will do just that, running various benchmarks to explore
the real-world performance implications of the configuration parameter
space of our framework.


\subsection{Asymptotic Insertion Performance}

We'll begin by validating our results for the insertion performance
characteristics of the three layout policies. For this test, we
consider two data structures: the ISAM tree and the VPTree. The ISAM
tree structure is merge-decomposable using a sorted-array merge, with a
build cost of $B_M(n, k) \in \Theta(n \log k)$, where $k$ is the number of
structures being merged. The VPTree, by contrast, is \emph{not} merge
decomposable, and is built in $B(n) \in \Theta(n \log n)$ time. We use
the $200$ million record SOSD \texttt{OSM} dataset~\cite{sosd-datasets}
for ISAM testing, and the one million record, $300$-dimensional Spanish
Billion Words (\texttt{SBW}) dataset~\cite{sbw} for VPTree testing.

For our first experiment, we will examine the latency distribution
for inserts into our structures. We tested the three layout policies,
using a common scale factor of $s=2$. This scale factor was selected
to minimize its influence on the results (we've seen before in
Sections~\ref{ssec:ds-exp} and \ref{ssec:dyn-ds-exp} that scale factor
affects leveling and tiering in opposite ways) and isolate the influence
of the layout policy alone to as great a degree as possible. We used a
buffer size of $N_B=12000$ for the ISAM tree structure, and $N_B=1000$
for the VPTree.

We generated this distribution by inserting $30\%$ of the records from
the set to ``warm up'' the dynamized structure, and then measuring
the insertion latency for each individual insert for the remaining
$70\%$ of the data.  Note that, due to timer resolution issues at
nanosecond scales, the specific latency values associated with the
faster end of the insertion distribution are not precise. However,
it is our intention to examine the latency distribution, not the
values themselves, and so this is not a significant limitation
for our analysis.  The resulting distributions are shown in
Figure~\ref{fig:design-policy-ins-latency}. These distributions
are represented using a ``reversed'' CDF with log scaling on both
axes.
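
For reference, the sketch below shows one way such a ``reversed'' CDF can
be produced, reading it as the complementary CDF (the fraction of inserts
at least as slow as a given latency). This is an illustrative plotting
recipe only, assuming NumPy and matplotlib and a hypothetical array of
per-insert latencies; it is not our measurement harness.
\begin{verbatim}
import numpy as np
import matplotlib.pyplot as plt

def reversed_cdf(latencies_ns):
    """Plot the fraction of inserts at least as slow as each latency."""
    x = np.sort(np.asarray(latencies_ns))
    # Complementary CDF: y[i] is the fraction of samples >= x[i].
    y = 1.0 - np.arange(len(x)) / len(x)
    plt.loglog(x, y, drawstyle="steps-post")
    plt.xlabel("insertion latency (ns)")
    plt.ylabel("fraction of inserts at or above latency")
    plt.show()

# Synthetic latencies standing in for measured values.
reversed_cdf(np.random.lognormal(mean=6.0, sigma=1.5, size=100_000))
\end{verbatim}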

\begin{figure}
\centering
\subfloat[ISAM Tree Insertion Latencies]{\includegraphics[width=.5\textwidth]{img/design-space/isam-insert-dist.pdf} \label{fig:design-isam-ins-dist}} 
\subfloat[VPTree Insertion Latencies]{\includegraphics[width=.5\textwidth]{img/design-space/vptree-insert-dist.pdf} \label{fig:design-vptree-ins-dist}} \\
\caption{Insertion Latency Distributions for Layout Policies}
\label{fig:design-policy-ins-latency}
\end{figure}


The first notable point is that, for both the ISAM
tree in Figure~\ref{fig:design-isam-ins-dist} and VPTree in
Figure~\ref{fig:design-vptree-ins-dist}, the leveling policy results in a
measurably lower worst-case insertion latency. This result is in line with
our theoretical analysis in Section~\ref{sec:design-asymp}. However, there
is a major deviation from the theoretical predictions in the worst-case
performance of tiering and BSM. Both of these should have similar worst-case latencies,
as the worst-case reconstruction in both cases involves every record
in the structure. Yet, we see tiering consistently performing better,
particularly for the ISAM tree.

The reason for this has to do with the way that the records are
partitioned in these worst-case reconstructions. In tiering with a scale
factor of $s=2$, the worst-case reconstruction consists of $\Theta(\log_2
n)$ distinct reconstructions, each involving exactly $2$ structures. BSM,
on the other hand, will use exactly $1$ reconstruction involving
$\Theta(\log_2 n)$ structures. This explains why the ISAM tree performs
much better under tiering than BSM, as its reconstruction cost function is
$\Theta(n \log_2 k)$ for $k$ input structures. For tiering, this results
in $\Theta(n)$ cost in the worst case. BSM, on the other hand, has
$\Theta(n \log_2 \log_2 n)$, as many more distinct structures must be
merged in the reconstruction, and is thus asymptotically worse off. The
VPTree sees less of a difference because it is \emph{not} merge
decomposable, and so the number of structures involved in a
reconstruction matters less. Having the records more partitioned still
hurts performance, most likely due to cache effects, but less so than in the MDSP case.
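
To give a rough sense of the gap (a back-of-the-envelope instantiation
of these cost functions, not a measurement): for $n = 2^{27}$ records,
tiering's worst case performs $27$ pairwise merges whose total cost is
$\Theta(n)$, while the single BSM reconstruction merges roughly $27$
sorted runs at once, at a cost of roughly $n \log_2 27 \approx 4.8\,n$.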

\begin{figure}
\centering
\subfloat[ISAM Tree]{\includegraphics[width=.5\textwidth]{img/design-space/isam-tput.pdf} \label{fig:design-isam-tput}} 
\subfloat[VPTree]{\includegraphics[width=.5\textwidth]{img/design-space/vptree-tput.pdf} \label{fig:design-vptree-tput}} \\
\caption{Insertion Throughput for Layout Policies}
\label{fig:design-ins-tput}
\end{figure}

Next, in Figure~\ref{fig:design-ins-tput}, we show the overall insertion
throughput for the three policies for both ISAM tree and VPTree. This
result should correlate with the amortized insertion costs for each
policy derived in Section~\ref{sec:design-asymp}. At a scale factor of
$s=2$, all three policies have similar insertion performance. This makes
sense, as both leveling and Bentley-Saxe experience write-amplification
proportional to the scale factor, and at $s=2$ this isn't significantly
larger than tiering's write amplification, particularly compared
to the other factors influencing insertion performance, such as
reconstruction time. However, for larger scale factors, tiering shows
\emph{significantly} higher insertion throughput, and leveling and
Bentley-Saxe show greatly degraded performance due to the large amount
of additional write amplification. These results are perfectly in line
with the mathematical analysis of the previous section.

\subsection{General Insert vs. Query Trends}

For our next experiment, we will consider the trade-offs between insertion
and query performance that exist within this design space. We benchmarked
each layout policy for a range of scale factors, measuring both their
respective insertion throughputs and query latencies for both ISAM tree
and VPTree.

\begin{figure}
\centering
\subfloat[ISAM Tree Range Count]{\includegraphics[width=.5\textwidth]{img/design-space/isam-parm-sweep.pdf} \label{fig:design-isam-tradeoff}} 
\subfloat[VPTree $k$-NN]{\includegraphics[width=.5\textwidth]{img/design-space/knn-parm-sweep.pdf} \label{fig:design-knn-tradeoff}} \\
\caption{Insertion Throughput vs. Query Latency for varying scale factors}
\label{fig:design-tradeoff}
\end{figure}

Figure~\ref{fig:design-isam-tradeoff} shows the trade-off curve between
insertion throughput and query latency for range count queries executed
against a dynamized ISAM tree. This test was run with a dataset
of 500 million uniform integer keys and a selectivity of $\sigma =
0.0000001$; the scale factor associated with each point is annotated on
the plot. These results show that there is a very direct relationship
between scale factor, layout policy, and insertion throughput. Leveling
almost universally has lower insertion throughput but also lower
query latency than tiering does, though at scale factor $s=2$ they are
fairly similar. Tiering gains insertion throughput at the cost of query
performance as the scale factor increases, although the rate at which
the insertion performance improves decreases for larger scale factors,
and the rate at which query performance declines increases dramatically.

One interesting note is that leveling sees very little improvement in
query latency as the scale factor is increased. This is due to the fact
that, asymptotically, the scale factor only affects leveling's query
performance by increasing the base of a logarithm. Thus, small increases
in scale factor have very little effect. However, leveling's insertion
performance degrades linearly with scale factor, and this is well
demonstrated in the plot.

The story is a bit clearer in Figure~\ref{fig:design-knn-tradeoff}. The
VPTree has a much greater construction time, both asymptotically and
in absolute terms, and the average query latency is also significantly
greater. As a result, configuration changes produce much more
significant shifts in performance, and present us with a far clearer
trade-off space. The same general trends hold as for the ISAM tree, just amplified.
Leveling has better query performance than tiering and sees improved
query performance and decreased insert performance as the scale factor
increases. Tiering has better insertion performance and worse query
performance than leveling, and sees improved insert and worsening
query performance as the scale factor is increased. The Bentley-Saxe
method shows similar trends to leveling.

In general, the Bentley-Saxe method appears to follow a very similar
trend to that of leveling, albeit with even more dramatic performance
degradation as the scale factor is increased and slightly better query
performance across the board. Generally it seems to be a strictly worse
alternative to leveling in all but its best-case query cost, and we will
omit it from our tests moving forward.

\subsection{Buffer Size}

\begin{figure}
\centering
\subfloat[ISAM Tree Range Count]{\includegraphics[width=.5\textwidth]{img/design-space/isam-bs-sweep.pdf} \label{fig:buffer-isam-tradeoff}} 
\subfloat[VPTree $k$-NN]{\includegraphics[width=.5\textwidth]{img/design-space/knn-bs-sweep.pdf} \label{fig:buffer-knn-tradeoff}} \\
\caption{Insertion Throughput vs. Query Latency for varying buffer sizes}
\label{fig:buffer-size}
\end{figure}

In the previous section, we considered the effect of various scale
factors on the trade-off between insertion and query performance. Our
framework also supports varying buffer sizes, and so we will examine this
next. Figure~\ref{fig:buffer-size} shows the same insertion throughput
vs. query latency curves for fixed layout policy and scale factor
configurations at varying buffer sizes, under the same experimental
conditions as the previous test.

Unlike with the scale factor, there is a significant difference in the
behavior of the two tested structures under buffer size variation. For
the ISAM tree, shown in Figure~\ref{fig:buffer-isam-tradeoff}, we see that
all layout policies follow a similar pattern. Increasing the buffer size
increases insertion throughput for little to no additional query cost up
to a certain point, after which query performance degrades substantially.
This isn't terribly surprising: growing the buffer size will increase
the number of records on each level, and therefore decrease the number
of shards, while at the same time reducing the number of reconstructions
that must be performed. However, the query must be answered against the
buffer too, and once the buffer gets sufficiently large, this increased
cost will exceed any query latency benefit from decreased shard count.
We see this pattern fairly clearly in all tested configurations; however,
BSM sees the least benefit from an increased buffer size in terms of
insertion performance.

VPTree is another story, shown in Figure~\ref{fig:buffer-knn-tradeoff}.
This plot is far more chaotic; in fact there aren't any particularly
strong patterns to draw from it. This is likely due to the fact that the
time scales associated with the VPTree in terms of both reconstruction
and query latency are significantly larger, and so the relatively small
constant associated with adjusting the buffer size doesn't have as strong
an influence on performance as it does for the ISAM tree.

\subsection{Query Size Effects}

One potentially interesting aspect of decomposition-based dynamization
techniques is that, asymptotically, the additional cost added by
decomposing the data structure vanishes for sufficiently expensive
queries. Bentley and Saxe proved that for query costs of the form
$\mathscr{Q}_S(n) \in \Omega(n^\epsilon)$ for $\epsilon > 0$, the
overall query cost is unaffected (asymptotically) by the decomposition.
This would seem to suggest that, as the cost of the query over a single
shard increases, the effectiveness of our design space for tuning query
performance should diminish. This is because our tuning space consists
of adjusting the number of shards within the structure, and so as the
effects of decomposition on the query cost shrink, we should see all
configurations approach a similar query performance.
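
The intuition behind the Bentley-Saxe result can be seen with a short
calculation (a sketch of the standard argument, stated here for our
$s$-ary decomposition). If $\mathscr{Q}_S(n) \in \Theta(n^\epsilon)$ for
some $\epsilon > 0$, then summing over the blocks of a full structure
gives
\begin{equation*}
\sum_{i=0}^{\log_s n} \mathscr{Q}_S\left(\Theta(s^i)\right) \in
\Theta\left(\sum_{i=0}^{\log_s n} s^{\epsilon i}\right) =
\Theta\left(\frac{s^{\epsilon(\log_s n + 1)} - 1}{s^\epsilon - 1}\right)
\in \Theta\left(n^\epsilon\right)
\end{equation*}
so the decomposition contributes only a constant factor, and that
constant shrinks as $\epsilon$ grows.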

In order to evaluate this effect, we tested the query latency of range
queries of varying selectivity against various configurations of our
framework to see at what points the query latencies begin to converge. We
also tested $k$-NN queries with varying values of $k$. For these tests,
we used a synthetic dataset of 500 million 64-bit key-value pairs for
the ISAM testing, and the SBW dataset for $k$-NN. Query latencies were
measured by executing the queries after all records were inserted into
the structure.

\begin{figure}
\centering
\subfloat[ISAM Tree Range Count]{\includegraphics[width=.5\textwidth]{img/design-space/selectivity-sweep.pdf} \label{fig:design-isam-sel}} 
\subfloat[VPTree $k$-NN]{\includegraphics[width=.5\textwidth]{img/design-space/selectivity-sweep-knn.pdf} \label{fig:design-knn-sel}} \\
\caption{Query Result Size Effect Analysis}
\label{fig:design-query-sze}
\end{figure}

Interestingly, for the range of selectivities tested for range counts, the
overall query latency failed to converge, and there remains a consistent,
albeit slight, stratification amongst the tested policies, as shown in
Figure~\ref{fig:design-isam-sel}. As the selectivity continues to rise
above those shown in the chart, the relative ordering of the policies
remains the same, but the relative differences between them begin to
shrink. This result makes sense given the asymptotics: there is still
\emph{some} overhead associated with the decomposition, but as the cost
of the query approaches linear in $n$, that overhead makes up an
increasingly small portion of the run time.

The $k$-NN results in Figure~\ref{fig:design-knn-sel} show a slightly
different story. This is also not surprising, because $k$-NN is a
$C(n)$-decomposable problem, and the cost of result combination grows
with $k$. Thus, larger $k$ values will \emph{increase} the effect that
the decomposition has on the query run time, unlike the
range count queries, where the total cost of the combination is constant.

% \section{Asymptotically Relevant Trade-offs}

% Thus far, we have considered a configuration system that trades in
% constant factors only. In general asymptotic analysis, all possible
% configurations of our framework in this scheme collapse to the same basic
% cost functions when the constants are removed. While we have demonstrated
% that, in practice, the effects of this configuration are measurable, there
% do exist techniques in the classical literature that provide asymptotically
% relevant trade-offs, such as the equal block method~\cite{maurer80} and
% the mixed method~\cite[pp. 117-118]{overmars83}.  These techniques have
% cost functions that are derived from arbitrary, positive, monotonically
% increasing functions of $n$ that govern various ways in which the data
% structure is partitioned, and changing the selection of function allows
% for "tuning" the performance. However, to the best of our knowledge,
% these techniques have never been implemented, and no useful guidance in
% the literature exists for selecting these functions. 

% However, it is useful to consider the general approach of these
% techniques.  They accomplish asymptotically relevant trade-offs by tying
% the decomposition of the data structure directly to a function of $n$,
% the number of records, in a user-configurable way. We can import a similar
% concept into our already existing configuration framework for dynamization
% to enable similar trade-offs, by replacing the constant scale factor,
% $s$, with some function $s(n)$. However, we must take extreme care when
% doing this to select a function that doesn't catastrophically impair
% query performance.

% Recall that, generally speaking, our dynamization technique requires
% multiplying the cost function for the data structure being dynamized by
% the number of shards that the data structure has been decomposed into. For
% search problems that are solvable in sub-polynomial time, this results in
% a worst-case query cost of,
% \begin{equation}
% \mathscr{Q}(n) \in O(S(n) \cdot \mathscr{Q}_S(n))
% \end{equation}
% where $S(n)$ is the number of shards and, for our framework, is $S(n) \in
% O(s \log_s n)$. The user can adjust $s$, but this tuning does not have
% asymptotically relevant consequences. Unfortunately, there is not much
% room, practically, for adjustment. If, for example, we were to allow the
% user to specify $S(n) \in \Theta(n)$, rather than $\Theta(\log n)$, then
% query performance would be greatly impaired. We need a function that is
% sub-linear to ensure useful performance.

% To accomplish this, we proposed adding a second scaling factor, $k$, such
% that the number of records on level $i$ is given by,
% \begin{equation}
% \label{eqn:design-k-expr}
% N_B \cdot \left(s \log_2^k(n)\right)^{i}
% \end{equation}
% with $k=0$ being equivalent to the configuration space we have discussed
% thus far. The addition of $k$ allows for the dependency of the number of
% shards on $n$ to be slightly biased upwards or downwards, in a way that
% \emph{does} show up in the asymptotic analysis for inserts and queries,
% but also ensures sub-polynomial additional query cost.

% In particular, we prove the following asymptotic properties of this
% configuration.
% \begin{theorem}
% The worst-case query latency of a dynamization scheme where the
% capacity of each level is provided by Equation~\ref{eqn:design-k-expr} is
% \begin{equation}
% \mathscr{Q}(n) \in O\left(\left(\frac{\log n}{\log (k \log n))}\right) \cdot \mathscr{Q}_S(n)\right)
% \end{equation}
% \end{theorem}
% \begin{proof}
% The number of levels within the structure is given by $\log_s (n)$,
% where $s$ is the scale factor. The addition of $k$ to the parametrization
% replaces this scale factor with $s \log^k n$, and so we have
% \begin{equation*}
% \log_{s \log^k n}n = \frac{\log n}{\log\left(s \log^k n\right)} = \frac{\log n}{\log s + \log\left(k \log n\right)} \in O\left(\frac{\log n}{\log (k \log n)}\right)
% \end{equation*}
% by the application of various logarithm rules and change-of-base formula.

% The cost of a query against a decomposed structure is $O(S(n) \cdot \mathscr{Q}_S(n))$, and
% there are $\Theta(1)$ shards per level. Thus, the worst case query cost is
% \begin{equation*}
% \mathscr{Q}(n) \in O\left(\left(\frac{\log n}{\log (k \log n))}\right) \cdot \mathscr{Q}_S(n)\right)
% \end{equation*}
% \end{proof}

% \begin{theorem}
% The amortized insertion cost of a dynamization scheme where the capacity of
% each level is provided by Equation~\ref{eqn:design-k-expr} is,
% \begin{equation*}
% I_A(n) \in \Theta\left(\frac{B(n)}{n} \cdot \frac{\log n}{\log ( k \log n)}\right)
% \end{equation*}
% \end{theorem}
% \begin{proof}
% \end{proof}

% \subsection{Evaluation}

% In this section, we'll access the effect that modifying $k$ in our
% new parameter space has on the insertion and query performance of our
% dynamization framework.


\section{Conclusion}

In this chapter, we considered the proposed design space for our
dynamization framework both mathematically and experimentally, and derived
some general principles for configuration within the space. We generalized
the Bentley-Saxe method to support scale factors and buffering, but
found that the result was generally worse than leveling in all but its
best case query performance. We also showed that there does exist a
trade-off, mediated by scale factor, between insertion performance and
query performance, though it doesn't manifest for every layout policy
and data structure combination. For example, when testing the ISAM tree
structure with the leveling or BSM policies, there is not a particularly
useful trade-off resulting from scale factor adjustments: the modest
query performance gained by increasing the scale factor is dwarfed by
the reduction in insertion performance, because the insertion cost grows
far faster than any query benefit, due to the way the two effects scale
in the cost functions for these policies.

Broadly speaking, we can draw a few general conclusions. First, the
leveling and BSM policies are fairly similar, with BSM having slightly
better query performance in general owing to its better best-case query
cost. Both of these policies are better than tiering in terms of query
performance, but generally worse for insertion performance. The one
slight exception to this trend is in worst-case insertion performance,
where leveling has a slight advantage over the other policies because
the way it performs reconstructions ensures that the worst-case
reconstruction cost is smaller. Adjusting the scale factor can trade
between insert and query performance, though leveling and BSM have an
opposite effect from tiering. For these policies, increasing the scale
factor reduces insert performance and improves query performance. Tiering
does the opposite. The mutable buffer can be increased in size to improve
insert performance as well (in all cases), but the query cost increases
as a result. Once the buffer gets sufficiently large, the trade-off in
query performance becomes severe.

While this trade-off space does provide us with the desired
configurability, the experimental results show that the trade-off curves
are not particularly smooth, and the effectiveness can vary quite a bit
depending on the properties of the data structure and search problem being
dynamized. Additionally, there isn't a particularly good way to control
insertion tail latencies in this model, as leveling is only slightly
better in this metric. In the next chapter, we'll consider methods for
controlling tail latency, which will, as a side benefit, also provide
a more desirable configuration space than the one considered here.