\chapter{Related Work}
\label{chap:related-work}

While we have already discussed the most directly relevant background
work on dynamization at length in Chapter~\ref{chap:background}, there
are a number of other lines of work that relate, either directly or
superficially, to our ultimate goal of automating some or all of the
process of constructing a database index. In this chapter, we discuss
the most notable of these works.

\section{Existing Applications of Dynamization}

We will begin with the most directly relevant topic: other papers which
use dynamization to construct specific dynamic data structures. There
are a few works which introduce a new static data structure and simply
apply dynamization to it for the purpose of adding update support. For
example, both the PGM learned index~\cite{pgm} and the KDS
tree~\cite{xie21} propose static structures and then apply dynamization
techniques to support updates. In this section, however, we will focus
on works in which dynamization is a major focus of the paper, not
simply an incidental tool.

One of the older applications of the Bentley-Saxe method is the
Bkd-tree~\cite{bkd-tree}. This structure is a multi-dimensional search
tree, based on the kd-tree~\cite{kd-tree}, designed for use on external
storage. While it was not the first external kd-tree, existing
implementations struggled to support updates, which were typically both
inefficient and prone to producing poorly utilized node structures
(i.e., many nodes were mostly empty, resulting in less efficient
searches). To resolve these problems, the authors took a statically
constructed external structure, the K-D-B-tree,\footnote{To clarify,
the K-D-B-tree supports updates, but poorly. For the Bkd-tree, the
authors exclusively used the K-D-B-tree's static bulk-loading, which is
highly efficient.} and combined it with the logarithmic method to
create a full-dynamic structure (the K-D-B-tree supports deletes
natively, so the problem is deletion decomposable). The major
contribution of this paper, per the authors, was not the structure
itself so much as the demonstration of the viability of the logarithmic
method: they showed extensively that their dynamized static structure
was able to outperform natively dynamic implementations on external
storage.
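To make the technique concrete, the logarithmic method that the
Bkd-tree builds upon can be sketched in a few lines. This is a generic
sketch under simplifying assumptions, not the Bkd-tree itself: a sorted
array stands in for the statically bulk-loaded K-D-B-tree, and all
class names are illustrative.

```python
import bisect

class StaticRun:
    """Stand-in for a statically bulk-loaded structure (e.g., a K-D-B-tree)."""
    def __init__(self, records):
        self.records = sorted(records)

    def contains(self, key):
        i = bisect.bisect_left(self.records, key)
        return i < len(self.records) and self.records[i] == key

class LogarithmicMethod:
    """Binary decomposition: block i holds 2^i records or is empty."""
    def __init__(self):
        self.blocks = []  # blocks[i] is None or a StaticRun of 2^i records

    def insert(self, key):
        carry = [key]
        for i, blk in enumerate(self.blocks):
            if blk is None:
                self.blocks[i] = StaticRun(carry)  # empty slot: stop here
                return
            carry += blk.records  # full slot: absorb its records and cascade
            self.blocks[i] = None
        self.blocks.append(StaticRun(carry))

    def query(self, key):
        # Decomposable search problem: combine answers across all blocks.
        return any(b.contains(key) for b in self.blocks if b is not None)
```

Inserting $n$ records triggers rebuilds whose total work telescopes,
giving the familiar logarithmic amortized overhead per insert for
structures with linear build cost.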

A more recent application of the logarithmic method to a specific
structure is its use with Mantis for large-scale DNA sequence
search~\cite{mantis-dyn}. Mantis~\cite{mantis} is one of the fastest
and most space-efficient structures for sequence search, but it is
static. To create a half-dynamic version of Mantis, the authors first
design an algorithm to efficiently merge multiple Mantis structures
together (turning their problem into an MDSP, though they do not use
this terminology). They then apply a modified version of the
logarithmic method, including LSM-tree-inspired features such as
leveling, tiering, and a scale factor. The resulting structure was
shown to perform quite well compared to other existing solutions.

Another notable work on dynamization techniques applies the logarithmic
method to produce full-dynamic versions of various metric indexing
structures~\cite{naidan14}. In this paper, the logarithmic method is
directly applied to two static metric indexing structures, VPTree and
SSS-tree, to create full-dynamic versions (using weak deletes). These
dynamized structures are compared to two dynamic baselines, DSA-tree
and EGNAT, for both multi-dimensional range scans and $k$-NN searches.
The paper contains extensive benchmarks demonstrating that the
dynamized versions perform quite well, even beating the dynamic
structures in query performance under certain circumstances. It is
worth noting that we also tested a dynamized VPTree for $k$-NN in
Chapter~\ref{chap:framework}, and obtained results in line with theirs.

Finally, LSMGraph is a recently proposed system which applies
dynamization techniques\footnote{
	The authors make a point of saying that they are \emph{not}
	applying dynamization, but instead embedding their structure
	inside of an LSM-tree, noting the challenges associated
	with applying dynamization directly to graphs. While the
	specific techniques they are using are not directly taken
	from any of the dynamization techniques we discussed in
	Chapter~\ref{chap:background}, we nonetheless consider this
	work to be an example of dynamization, at least in principle,
	because they decompose a static structure into smaller blocks
	and handle inserts by rebuilding these blocks systematically.} 
to the compressed sparse row (CSR) matrix representation of graphs to
produce a dynamic, external graph storage system~\cite{lsmgraph}. This
is a particularly interesting example, because graphs and graph algorithms
are \emph{not} remotely decomposable. Adjacent vertices in the graph may
be spread across many levels, and this means that graph algorithms cannot
be decomposed, as traversals must access adjacent vertices, regardless
of which block they are contained within. To resolve this problem, the
authors discard the general query model and build a tightly integrated
system which uses an index to map each vertex to the block containing
it, and implement a vertex-adjacency aware reconstruction process which
helps ensure that adjacent vertices are compacted into the same blocks
during reconstruction.

\section{LSM Tree}

The Log-structured Merge-tree (LSM tree)~\cite{oneil96} is a data
structure proposed by O'Neil \emph{et al.} in the mid-90s that is
designed to optimize for write throughput in external indexing
contexts. While O'Neil \emph{et al.} never cite any of the dynamization
work we have considered here, the structure they proposed is eerily
similar to a decomposed data structure, and subsequent developments
have driven it in a direction that looks remarkably similar to Bentley
and Saxe's logarithmic method. In fact, several of the examples of
dynamization in the previous section (as well as this work) either
borrow concepts from modern LSM trees, or go so far as to use the term
``LSM'' as a synonym for what we call dynamization. However, the work
on LSM trees is distinct from general dynamization because it leans
heavily on very specific aspects of the search problems (point lookup
and single-dimensional range search) and data structure in ways that do
not generalize well. In this section, we'll discuss a few of the
relevant works on LSM trees and attempt to differentiate them from
dynamization.

\subsection{The Structure}

The modern LSM tree is a single-dimensional range data structure that is
commonly used in key-value stores such as RocksDB. It consists of a small,
dynamic in-memory structure called a memtable, and a sequence of static,
external structures on disk of geometrically increasing size. These
structures are organized into levels, which can contain either one
structure or several, with the former strategy being called leveling and
the latter tiering. The individual structures are often simple sorted
arrays (with some attached metadata) called runs, which can be further
decomposed into smaller files called sorted string tables (SSTs). Records
are inserted into the memtable initially. When the memtable is filled, it
is flushed and the records are merged into the top level of the structure,
with reconstructions proceeding according to various merge policies
to make room as necessary. LSM trees typically support point lookup
and range queries, and answer these by searching all of the runs in the
structure and merging the results together. To accelerate point lookups,
Bloom filters~\cite{bloom70} are often built over the records in each run
to allow skipping of some of the runs that don't contain the key being
searched for. Deletes are typically handled using tombstones.
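The organization described above can be sketched as a toy in-memory
model. This is a deliberate simplification, not any production system:
the memtable is a dictionary, runs are sorted lists, levels follow a
tiering policy with a fixed scale factor, and deletes use tombstones;
Bloom filters, SSTs, and external storage are omitted, and the capacity
constants are arbitrary.

```python
TOMBSTONE = object()   # sentinel marking a deleted key
SCALE = 4              # scale factor: runs a level holds before merging down
MEMTABLE_CAP = 4       # records the memtable holds before flushing

class ToyLSM:
    def __init__(self):
        self.memtable = {}
        self.levels = []  # levels[i] = list of sorted runs, newest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= MEMTABLE_CAP:
            self._flush()

    def delete(self, key):
        self.put(key, TOMBSTONE)  # deletes are inserted as tombstone records

    def get(self, key):
        # Search newest to oldest so the most recent version wins.
        if key in self.memtable:
            v = self.memtable[key]
            return None if v is TOMBSTONE else v
        for level in self.levels:
            for run in level:
                d = dict(run)
                if key in d:
                    return None if d[key] is TOMBSTONE else d[key]
        return None

    def _flush(self):
        run = sorted(self.memtable.items())
        self.memtable = {}
        self._install(0, run)

    def _install(self, i, run):
        if i == len(self.levels):
            self.levels.append([])
        self.levels[i].insert(0, run)
        if len(self.levels[i]) > SCALE:  # tiering: merge the whole level down
            merged = {}
            for r in reversed(self.levels[i]):  # oldest first, newest wins
                merged.update(dict(r))
            self.levels[i] = []
            self._install(i + 1, sorted(merged.items()))
```

With a scale factor of 4, the fifth flush into level 0 triggers a merge
of all of that level's runs down into level 1, mirroring the tiering
policy described above.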

\subsection{Design Space}

The bulk of the work on LSM trees that is of interest to us focuses on
the structure's associated design space and performance tuning. A very
large number of papers discuss different ways of decomposing the
structure, performing reconstructions, and allocating resources to
filters and other auxiliary structures, with the goal of optimizing
resource usage and enabling performance tuning. We'll summarize a few
of these works here.

One major line of work in LSM trees involves optimizing the memory
allocation of Bloom filters to the sorted runs within the structure. Bloom
filters are commonly used in LSM trees to accelerate point lookups,
because these queries must examine each run, from top to bottom, until a
matching key is found. Bloom filters can be used to improve performance by
allowing some runs to be skipped over in this searching process. Because
LSM trees are an external data structure, the savings from doing this can
be quite large. There are a number of works in this area~\cite{dayan18-1,
zhu21}, but we will highlight Monkey~\cite{dayan17} specifically.
Monkey is a system that optimizes the allocation of Bloom filter memory
across the levels of an LSM tree, based on the observation that the
worst-case lookup cost (i.e., the cost of a point-lookup on a key that
doesn't exist within the LSM tree) is directly proportional to the sum
of the false positive rates across all levels in the tree. Thus, memory
can be allocated to filters in a way that minimizes this sum. These works
could be useful in the context of dynamization for problems which allow
similar optimizations, such as a dynamized structure for point lookups
using Bloom filters, or possibly range scans using range filters such
as SuRF~\cite{zhang18} or Rosetta~\cite{siqiang20}, but aren't directly
applicable to the general problem of dynamization.
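Monkey's observation can be illustrated with a simple greedy allocator.
This is a hypothetical sketch, not Monkey's actual (analytical)
solution: memory is handed out in fixed chunks to whichever level's
filter yields the largest drop in the sum of per-level false positive
rates, the quantity that bounds worst-case lookup cost.

```python
import math

def fpr(bits, n_keys):
    """Standard approximation for an optimally configured Bloom filter."""
    return math.exp(-(bits / n_keys) * (math.log(2) ** 2))

def allocate(level_sizes, total_bits, chunk=1024):
    """Greedily split total_bits across filters, one chunk at a time."""
    bits = [0] * len(level_sizes)
    remaining = total_bits
    while remaining >= chunk:
        # Pick the level where one more chunk reduces its FPR the most.
        gains = [fpr(bits[i], n) - fpr(bits[i] + chunk, n)
                 for i, n in enumerate(level_sizes)]
        best = max(range(len(level_sizes)), key=lambda i: gains[i])
        bits[best] += chunk
        remaining -= chunk
    return bits
```

Running this on geometrically growing level sizes assigns more bits per
key to the smaller levels, which is the qualitative shape of Monkey's
analytical allocation.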

Other work in LSM trees considers different merge
policies. Dostoevsky~\cite{dayan18} introduces two new merge policies, and
the so-called Wacky Continuum~\cite{dayan19} introduces a general design
space that includes Dostoevsky's policies, the traditional LSM policies,
and a new policy called an LSM bush. As it encompasses all of these,
we'll exclusively summarize the Wacky Continuum here. Wacky defines a
merge policy based on four parameters: the capping ratio ($C$), growth
exponential ($X$), merge greed ($K$), and largest-level merge greed
($Z$). The merge greed parameters are used to define the merge threshold,
which effectively allows merge policies that sit between leveling and
tiering. Each level contains an ``active'' run, into which new records
will be merged. Once this run contains a specified fraction of the level's
total capacity (determined by the merge greed parameters), a new run
will be added to the level and made active. Leveling can be simulated in
this model by setting this merge greed parameter to 1, so that 100\% of
the level's capacity is allocated to a single run. Tiering is simulated
by setting this parameter such that each active run can only sustain a
single set of records, and a new run is created each time records are
merged into the level. The Wacky continuum allows configuring the merge
greed of the last level independently from the inner levels, to allow it
to support lazy leveling, a policy where the largest level in the LSM
tree contains a single run, but the inner levels are tiered. The other
design parameters are simpler: the capping ratio allows the size ratio
of the last level to be varied independently of the inner levels, and
the growth exponential allows the size ratio between adjacent levels to
grow as the levels get larger. This work also introduces an optimization
system for determining good values for all of these parameters for a 
given workload.
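The merge-greed idea can be reduced to a single flush rule. This is a
hypothetical simplification of the continuum (Wacky's actual model is
richer): the parameter sets what fraction of a level's capacity the
active run may reach before a new run is started, so $K = 1$ behaves
like leveling and a small $K$ behaves like tiering.

```python
def apply_flush(level_runs, incoming, capacity, merge_greed):
    """level_runs: list of run sizes on one level, active run last.

    If the active run can absorb the incoming records without exceeding
    merge_greed * capacity, merge into it; otherwise open a new run.
    """
    threshold = merge_greed * capacity
    if level_runs and level_runs[-1] + incoming <= threshold:
        level_runs[-1] += incoming   # merge into the active run (leveling-like)
    else:
        level_runs.append(incoming)  # start a new active run (tiering-like)
    return level_runs
```

Flushing ten batches of 10 records into a level of capacity 100 yields
a single run of 100 with merge greed 1.0, and ten runs of 10 with merge
greed 0.1.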

It's worth taking a moment to address why we did not consider the Wacky
Continuum design space in our attempts to introduce a design space into
dynamization. It appears that these concepts would be useful to us, given
that we imported the basic leveling and tiering concepts from LSM trees.
However, we believe that this particular set of design parameters is
not broadly useful outside of the LSM context. The reason for this is
shown within the experimental section of the Wacky paper itself. For
workloads involving range reads, the standard leveling/tiering designs
show perfectly reasonable (and sometimes even superior) performance
trade-offs.  In large part, the Wacky Continuum work is an extension of
the authors' earlier work on Monkey, as it is most effective at improving
trade-offs for point-lookup performance, which are strongly influenced by
Bloom filters. The range scan results are the ones most closely related to
the general dynamization case we have been considering in this work, where
filters cannot be assumed and filter memory allocation isn't an important
consideration. And, in that case, the new merge policies available within
the Wacky Continuum didn't provide enough advantage to be considered here.

Another aspect of the LSM tree design space is compaction granularity,
which was studied in Spooky~\cite{dayan22}. Because LSM trees are built
over sorted runs of data, it is possible to perform partial merges
between levels, moving only a small portion of the data from one level
to another. This can improve write performance by making
reconstructions smaller, and can also reduce the transient storage
requirements of the structure, but it comes at the cost of additional
write amplification, as files are re-written on the same level more
frequently. The storage benefit comes from the fact that LSM trees
require extra working storage to perform reconstructions, and making
the reconstructions smaller reduces this space requirement. Spooky is a
method for determining effective approaches to performing partial
reconstructions. It works by partitioning the largest level into
equally sized files, and then dynamically partitioning the other levels
in the structure based on the key ranges of the last level's files.
This approach is shown to provide a good balance between write
amplification and reconstruction storage requirements. Compaction
granularity is another example of an LSM tree specific design element
that does not generalize well: it could be used in the context of
dynamization, but only for single-dimensional sorted data, and so we
have not considered it as part of our general techniques.
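The partitioning scheme can be sketched over plain sorted key lists.
This is a simplification of Spooky's design, and the function names are
illustrative: the largest level is cut into equally sized files, and an
upper level is then bucketed along those files' key-range boundaries so
that a compaction can move one key range down without rewriting whole
levels.

```python
import bisect

def partition_last_level(keys, num_files):
    """Split the sorted largest level into num_files equally sized files."""
    step = len(keys) // num_files
    files = [keys[i * step : (i + 1) * step] for i in range(num_files - 1)]
    files.append(keys[(num_files - 1) * step :])  # last file takes remainder
    return files

def partition_upper_level(keys, last_level_files):
    """Bucket an upper level's sorted keys by the last level's boundaries."""
    boundaries = [f[0] for f in last_level_files[1:]]  # start key of each file
    buckets = [[] for _ in last_level_files]
    for k in keys:
        buckets[bisect.bisect_right(boundaries, k)].append(k)
    return buckets
```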

\subsection{Tail Latency Control}
LSM trees are susceptible to insertion tail latency problems similar to
those faced by dynamization. While tail latency can be controlled using
partial reconstructions, as in Spooky~\cite{dayan22} (though this is
not a focus of that work), there is also some work specifically on
controlling tail latency. One notable result in this area is
SILK~\cite{balmau19,silk-plus}, which focuses on reducing tail latency
using intelligent scheduling of reconstruction operations.

After performing an experimental evaluation of various LSM tree based
key-value systems, the authors identify three main principles for
designing their tail latency control system:
\begin{enumerate}
	\item I/O bandwidth should be opportunistically allocated to 
	      reconstructions.
	\item Reconstructions at smaller levels in the tree should be 
	      prioritized.
	\item Reconstructions at larger levels in the tree should be 
	      preemptable by those on lower levels.
\end{enumerate}
The resulting system prioritizes buffer flushes, which are given dedicated
threads and priority access to I/O bandwidth. The next highest priority
operations are reconstructions involving levels $0$ and $1$, which must be
completed to allow flushes to proceed. These reconstructions are able to
preempt any other running compaction if there is not an available thread
when one is scheduled. All other reconstructions run with lower priority,
and may need to be wholly discarded if a high priority reconstruction
invalidates them. The system also includes sophisticated I/O bandwidth
controls, as this is a constrained resource in external contexts.
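The priority ordering above can be sketched as a small scheduler. This
is a hypothetical illustration, not SILK's implementation: preemption
and the I/O bandwidth controls are omitted, and only the three-tier
priority (flushes, then level-0/1 reconstructions, then everything
else) is modeled.

```python
import heapq

FLUSH, L0_L1, OTHER = 0, 1, 2   # lower value = higher priority

class CompactionScheduler:
    def __init__(self):
        self.queue = []
        self.seq = 0  # tie-breaker preserving FIFO order within a priority

    def submit(self, job, level_from=None):
        """level_from is None for a buffer flush, else the source level."""
        if level_from is None:
            prio = FLUSH
        elif level_from <= 1:
            prio = L0_L1
        else:
            prio = OTHER
        heapq.heappush(self.queue, (prio, self.seq, job))
        self.seq += 1

    def next_job(self):
        return heapq.heappop(self.queue)[2] if self.queue else None
```

A flush submitted after a large compaction still runs first, reflecting
principle 2: work at the smaller levels is always dispatched ahead of
work at the larger ones.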

Some of the core concepts underlying the SILK system
inspired the tail latency control system we have proposed in
Chapter~\ref{chap:tail-latency}, but our system is quite distinct from
it. SILK leverages some consequences of the LSM tree design space in ways
that our system cannot rely upon.  For example, SILK uses a two-version
buffer (like we do), but is able to allocate enough I/O bandwidth to
ensure that one of the buffer versions can be flushed before the other
one fills up. Given the constraints of our dynamization system, with an
unsorted buffer, this was not possible to do. Additionally, SILK uses
partial compactions to reduce the size of reconstructions. These factors
let SILK maintain the LSM tree structure without having to resort to
insertion throttling, as we do in our system. 

\section{GiST and GIN}

\section{Automated Index Composition}
\subsection{Periodic Table of Data Structures, etc.}
\subsection{Gene}