\chapter{Summary and Future Work}
\label{chap:conclusion}
One of the perennial problems in database systems is the design of new
indices to support new data types and search problems. While there exist
numerous data structures that could be used as the basis for such indices,
there is a mismatch between the feature set required of an index and
that provided by a data structure. Implementing the missing features
requires significant effort. To circumvent this problem, past efforts
have created systems for automating some, or all, of the index design
process in certain contexts. These existing efforts fall short of a truly general
solution to the problem of automatic index generation. Automatic index
composition assumes a particular search problem and a set of data
structure primitives, and then composes those primitives into a custom
structure that is optimized for a particular workload. Generalized index
templates assume a solution structure, and attempt to solve a search
problem within that structure. In both cases, the core methodology of
the approach restricts the types of problems to which it can be
applied. Thus, neither is a viable approach to creating indices for
arbitrary search problems.

We propose a system based on a third technique: automatic feature
extension. Starting with an existing data structure for the search problem
of interest, various general techniques can be used to automatically
add the features that the structure lacks, creating an index. A special
case of this approach is well studied in the theoretical literature:
dynamization. Dynamization seeks to automatically add support for
inserts, and sometimes deletes, to a static data structure for a search
problem that satisfies certain constraints. Dynamization has a number
of limitations that prevent it from standing on its own as a solution
to this problem, and so this work has concentrated on overcoming these
shortcomings.

By introducing new classifications of search problems, along with
mechanisms to support solving them over a dynamized structure, we extended
the applicability of dynamization techniques to a broader set of data
structures and search problems, and increased the number of search
problems for which deletes can be efficiently supported. We considered
the design space of the similarly structured LSM Tree data structure,
and borrowed certain applicable elements to introduce a configurable
design space to allow for trade-offs between insertion and query
performance. We then devised a system for controlling the worst-case
insertion performance of dynamized structures, leveraging concurrency to
match the lowest existing worst-case bound in the theoretical literature,
and then parallelism to beat it.

Through this effort, we have managed to resolve what we saw as the most
significant barriers to the use of dynamization in the context of database
indexing.
\section{Future Work}
While this is a significant step forward, there remains substantial
work to be done before the ultimate goal of a general, automatic index
generation framework has been reached. We have resolved a number
of existing problems to make dynamization viable in the context of
database systems, as well as expanded the scope of dynamization to
include concurrency, but a database index requires more features than
update support. In particular, our framework must also support the
following additional features:
\begin{enumerate}
\item \textbf{Automatic tuning of insertion rejection rate.} \\
The tail latency control system discussed
in Chapter~\ref{chap:tail-latency} is based upon setting a
rejection rate parameter for inserts, which must be tuned for
the data structure being dynamized. The current version treats
this as a user-specified constant parameter, but it would be
ideal for this parameter to be automatically determined based
on the performance of the framework. In particular, we noted in
Chapter~\ref{chap:tail-latency} that having it fixed to a single
value is sub-optimal for some data structures, and there also
exist opportunities to dynamically adjust it based on the actual
rate of inserts into the system to achieve better throughput. The
design of a system for doing this automatic rejection rate tuning is
an important next step for the framework.
\item \textbf{Support for external storage.} \\
While we did have an implementation of the sampling framework
discussed in Chapter~\ref{chap:sampling} that used an external
data structure, the general framework discussed in the subsequent
chapters was considered for in-memory structures only. We will need
to extend it with support for external structures, as well as evaluate
whether our proposed techniques still function effectively in this
context.
\item \textbf{Crash recovery.} \\
It is critical for a database index to support crash recovery,
so that it can be recovered to a state consistent with the rest of
the database in the event of a system fault. Because our dynamized
indices are append-only, and can be viewed as a log of sorts,
a naive form of crash recovery is straightforward: all operations
can be logged and replayed in the event of a crash. This is highly
inefficient, however, and so a better scheme must be devised.
\item \textbf{Distributed systems support.} \\
The append-only and decomposed nature of dynamized indices makes
them seem a natural fit in a distributed systems context. This was
briefly discussed in Section~\ref{ssec:ext-distributed}. While
not required for all, or even most, applications, support for
automatically distributing an index over multiple nodes in a
distributed system would be desirable.
\end{enumerate}
Once the full set of necessary index features can be supported by the
framework, we plan to integrate the system into a database to allow
user-defined indexing. To accommodate this, it will also be necessary
to devise a mechanism for allowing the query optimizer to use these
arbitrary, user-defined indices when generating query plans.