\chapter{Introduction}
It probably goes without saying that database systems are heavily
dependent upon data structures, both for auxiliary use within the system
itself, and for indexing the data in storage to facilitate faster access.
As a result of this, the design of novel data structures constitutes a
significant sub-field within the database community. However, there is a
stark divide between theoretical work and so-called ``practical'' work in
this area, with many theoretically oriented data structures not seeing
much, if any, use in real systems. I would go so far as to assert that
many of these published data structures have \emph{never} actually been
used.

This situation exists with reason, of course. Fundamentally, the rules
of engagement within the theory community differ from those within the
systems community. Asymptotic analysis, which eschews constant factors,
dominates theoretical analysis of data structures, whereas the systems
community cares a great deal about these constants. We'll see within
this document itself just how significant a divide this is in terms of
real performance numbers. But perhaps an even more significant barrier
to the adoption of theoretical data structures is feature support.

A data structure, technically speaking, only needs to define algorithms
for constructing and querying it. I'll describe such minimal structures
as \emph{static data structures} within this document. Many theoretical
structures that seem potentially useful fall into this category. Examples
include alias-augmented structures for independent sampling, vantage-point
trees for multi-dimensional similarity search, ISAM trees for traditional
one-dimensional indexing, the vast majority of learned indexes, etc.
These structures allow for highly efficient answering of their associated
types of query, but have either fallen out of use (ISAM trees) or have
yet to see widespread adoption in database systems. This is because the
minimal interface provided by a static data structure is usually not
sufficient to address the real-world engineering challenges associated
with database systems. Instead, data structures used by such systems must
support a variety of additional features: updates to the underlying data,
concurrent access, fault-tolerance, etc. This lack of feature support
is a major barrier to the adoption of such structures.

In the current data structure design paradigm, support for such features
requires extensive redesign of the static data structure, often over a
lengthy development cycle. Learned indexes provide a good case study for
this. The first learned index, RMI, was proposed by Kraska \emph{et al.}
in 2017~\cite{kraska-rmi}. As groundbreaking as this data structure,
and the idea behind it, was, it lacked support for updates and thus was
of very limited practical utility. Work then proceeded to develop an
updatable data structure based on the concepts of RMI, culminating in
ALEX~\cite{alex}, which first appeared on arXiv a year and a half
later. The next several years saw the
development of a wide range of learned indexes, promising support for
updates and concurrency. However, a recent survey found that all of them
were still largely inferior to more mature indexing techniques, at least
on certain workloads.

These adventures in learned index design represent much of the modern
index design process in microcosm. It is not unreasonable to expect
that, as the technology matures, learned indexes may one day become
commonplace. But the amount of development and research effort to get
there is, clearly, vast.

On the opposite end of the spectrum, theoretical data structure papers
also attempt to extend their structures with update support using a
variety of techniques. However, the differing rules of engagement often
result in solutions to this problem that are horribly impractical in
database systems. As an example, Hu, Qiao, and Tao have proposed a data
structure for efficient range sampling, and included in their design a
discussion of efficient support for updates~\cite{irs}. Without getting
into details, they need to add multiple auxiliary data structures alongside
their sampling structure to facilitate this, including a hash table and
multiple linked lists. Asymptotically, this approach doesn't affect space
or time complexity as there is a constant number of extra structures,
and the costs of maintaining and accessing them are on par with those
associated with their main structure. But it's clear that the space
and time costs of these extra data structures would have relevance in
a real system. A similar problem arises in a recent attempt to create a
dynamic alias structure, which uses multiple auxiliary data structures,
and further assumes that the key space size is a constant that can be
neglected~\cite{that-paper}.

Further, update support is only one of many features that a data
structure must provide for use in database systems. Given these challenges
associated with just update support, one can imagine the amount of work
required to get a data structure fully ``production ready''!

However, all of these tribulations are, I'm going to argue, not
fundamental to data structure design, but rather a consequence of the
modern design paradigm. Rather than this process of manually
integrating features into the data structure itself, I propose a
new paradigm: \emph{Framework-driven Data Structure Design}. Under this
paradigm, the process of designing a data structure is reduced to the
static case: an algorithm for querying the structure and an algorithm
for building it from a set of elements. Once these are defined, a
high-level framework can be used to automatically add support for other
desirable features, such as updates, concurrency, and fault-tolerance,
in a manner that is mostly transparent to the static structure itself.

This idea is not without precedent. For example, a similar approach
is used to provide fault-tolerance to indexes within a traditional,
disk-based RDBMS. The RDBMS provides a storage engine which has its own
fault tolerance systems. Any data structure built on top of this storage
engine can benefit from its crash recovery, requiring only a small amount
of effort to integrate with it. As a result, crash recovery and fault
tolerance are not handled at the level of the data structure in such
systems: a B+Tree index doesn't have these mechanisms built into it,
but instead relies upon the framework provided by the RDBMS.

Similarly, there is an existing technique, commonly called the
Bentley-Saxe method, which uses an analogous process to add support
for updates to static structures.
\section{Research Objectives}
The proposed project has four major objectives:
\begin{enumerate}
\item \textbf{Automatic Dynamic Extension.}

The first phase of this project has seen the development of a
\emph{dynamic extension framework}, which is capable of adding
support for inserts and deletes of data to otherwise static data
structures, so long as a few basic assumptions about the structure
and associated queries are satisfied. This framework is based on
the core principles of the Bentley-Saxe method (BSM), and is implemented
using C++ templates for ease of use.

As part of the extension of BSM, a large design space has been added,
giving the framework a trade-off space between memory usage, insert
performance, and query performance. This allows for the performance
characteristics of the framework-extended data structure to be tuned
for particular use cases, and provides a large degree of flexibility
to the technique.
\item \textbf{Automatic Concurrency Support.}

Because the Bentley-Saxe method is based on the reconstruction
of otherwise immutable blocks, a basic concurrency implementation
is straightforward. While there are hard blocking points when a
reconstruction requires the results of an as-of-yet incomplete
reconstruction, all other operations can be easily performed
concurrently, so long as the destruction of blocks can be deferred
until all operations actively using them are complete. This lends itself
to a simple epoch-based system, where a particular configuration of
blocks constitutes an epoch, and the reconstruction of one or more
blocks triggers a shift to a new epoch upon its completion. Each
query will see exactly one epoch, and that epoch will remain in
existence until all queries using it have terminated.

With this strategy, the problem of adding support for concurrent
operations is largely converted into one of resource management.
Retaining old epochs, adding more buffers, and running reconstruction
operations all require storage. Further, large reconstructions
consume memory bandwidth and CPU resources, which must be shared
with active queries. And, at least some reconstructions will actively
block others, which will lead to tail latency spikes.

The objective of this phase of the project is the creation of a
scheduling system, built into the framework, that will schedule
queries and merges so as to ensure that the system operates within
specific tail latency and resource utilization constraints. In
particular, it is important to effectively hide the large insertion
tail latencies caused by reconstructions, and to limit the storage
required to retain old versions of the structure. Alongside
scheduling, the use of admission control will be considered to help
maintain latency guarantees even in adversarial conditions.
\item \textbf{Automatic Multi-node Support.}

It is increasingly the case that the requirements for data management
systems exceed the capacity of a single node, requiring horizontal
scaling. Unfortunately, the design of data structures that work
effectively in a distributed, multi-node environment is non-trivial.
However, the same design elements that make it straightforward to
implement a framework-driven concurrency system should also lend
themselves to adding multi-node support to a data structure. The
framework uses immutable blocks of data, which are periodically
reconstructed by combining them with other blocks. This system is
superficially similar to the RDDs used by Apache Spark, for example.

What is not so straightforward, however, are the implementation
decisions that underlie this framework. It is not obvious that the
geometric block sizing technique used by BSM is well suited to this
task, and so a comprehensive evaluation of block sizing techniques
will be required. Additionally, there are significant challenges
to be overcome regarding block placement on nodes, fault-tolerance
and recovery, how best to handle buffering, and the effect of block
sizing strategies and placement on end-to-end query performance. All
of these problems will be studied during this phase of the project.
\item \textbf{Automatic Performance Tuning.}

During all phases of the project, various tunable parameters will
be introduced that allow for trade-offs between insertion
performance, query performance, and memory usage. These allow for a
user to fine-tune the performance characteristics of the framework
to suit her use-cases. However, this tunability may introduce an
obstacle to adoption for the system, as it is not necessarily trivial
to arrive at an effective configuration of the system, given a set of
performance requirements. Thus, the final phase of the project will
consider systems to automatically tune the framework. As a further
benefit, such a system could allow dynamic adjustment of the tunable
parameters of the framework during execution, enabling automatic
and transparent evolution in the face of changing workloads.
\end{enumerate}

This work will be pursued through the following research thrusts:
\begin{enumerate}
\item \textbf{Thrust 1: Automatic Concurrency and Scheduling.}

The design of the framework lends itself to a straightforward, data
structure independent, concurrency implementation, but ensuring good
performance of this implementation will require intelligent scheduling.
In this thrust, we will study the problem of scheduling operations
within the framework to meet certain tail latency guarantees, within a
particular set of resource constraints.
\begin{itemize}
\item RQ1: How best to parameterize merge and query operations?
\item RQ2: Develop a real-time (or nearly real-time) scheduling
system to make decisions about when to merge, while
meeting certain tail latency requirements within a
set of resource constraints.
\end{itemize}
\item \textbf{Thrust 2: Temporal and Spatial Data Partitioning.}

The framework is based upon a temporal partitioning of data; however,
there are opportunities to improve the performance of certain
operations by introducing a spatial partitioning scheme as well. In
this thrust, we will expand the framework to support arbitrary
partitioning schemes, and assess the efficacy of spatial partitioning
in a variety of contexts.
\begin{itemize}
\item RQ1: What effect does spatial partitioning within levels have on
the performance of inserts and queries?
\item RQ2: Does a trade-off exist between spatial and temporal
partitioning?
\item RQ3: To what degree do results about spatial partitioning
generalize across different types of index (particularly
multi-dimensional ones)?
\end{itemize}
\item \textbf{Thrust 3: Dynamic Performance Tuning.}

The framework contains a large number of tunable parameters which allow
for trade-offs between memory usage, read performance, and write
performance. In this thrust, we will comprehensively evaluate this
design space, and develop a system for automatically adjusting these
parameters during system operation. This will allow the system to
dynamically change its own configuration when the workload changes.
\begin{itemize}
\item RQ1: Quantify and model the effects of framework tuning parameters
on various performance metrics.
\item RQ2: Evaluate the utility of a heterogeneous configuration, with
different parameter values on different levels.
\item RQ3: Develop a system for dynamically adjusting these values based
on current performance data.
\end{itemize}
\end{enumerate}