
\chapter{Conclusion}
\label{chap:conclusion}

Data structures can be used to accelerate a wide range of analytical
queries against large data sets. Unfortunately, because the underlying
data is frequently subject to change, these data structures must be
concurrently updatable to ensure timely results. This requirement
excludes many otherwise suitable data structures from use in these
contexts, and designing a data structure with update support is
non-trivial.

The framework proposed by this work would allow for existing data
structures to be automatically extended with tunable support for
concurrent updates, with potential for future work to add even more
features. It is based on an extension of the Bentley-Saxe method,
which supports updates in static structures by splitting the data
structure into multiple partitions and systematically reconstructing
them. The Bentley-Saxe method has been adjusted to use a different
query interface, based on the newly proposed extended decomposability,
which brings with it more efficient support for many types of search
problems not well served by the original technique. The framework also
introduces two approaches for handling deletes, buffering of inserts,
a more tunable reconstruction strategy, and support for concurrency,
none of which were present in the original method.
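
For reference, the guarantees of the original Bentley-Saxe method on
which the framework builds can be summarized as follows. The notation
is assumed here purely for illustration and is not taken from the
preceding chapters: $F(D, q)$ denotes the answer to a query $q$ over a
set of records $D$, $\oplus$ is a merge operator computable in constant
time, and $B(n)$ and $Q(n)$ are the construction and query costs of the
static structure over $n$ records. A search problem is decomposable if
\[
    F(A \cup B, q) = F(A, q) \oplus F(B, q)
\]
for any disjoint sets of records $A$ and $B$. Under the classic binary
decomposition, $n$ records are stored in at most
$\lceil \log_2 (n + 1) \rceil$ partitions whose sizes are distinct
powers of two. Under the usual assumptions that $Q(n)$ and $B(n)/n$ are
non-decreasing, this gives
\[
    Q_{\mathrm{dyn}}(n) \in O\left(Q(n) \log n\right),
    \qquad
    I_{\mathrm{amortized}}(n) \in O\left(\frac{B(n)}{n} \log n\right).
\]
These bounds describe only the original method and should not be read
as the costs of the framework itself.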

Using this framework, many data structures and search problems can be
used as the basis of an index, provided only that they support the
eDSP abstraction and can uniquely identify and locate each record.
Creating an index requires only a small amount of shim code, called a
shard, between the structure and the framework.

The current version of the framework supports tunable, single-threaded
updates. It has been experimentally validated to extend static data
structures with update support while maintaining performance on par
with or better than existing dynamic alternatives for a number of
complex search problems, including $k$-nearest neighbor and a variety
of independent sampling problems. Beyond presenting these results,
this work proposes extending the framework with support for
concurrency with tail-latency mitigations and for online, fine-grained
tuning, and examining more sophisticated data partitioning schemes to
ease certain challenges associated with large-scale reconstructions.
The completion of this framework would be a major milestone in a larger
project to vastly expand the capabilities of database management systems
through the use of more complex data access primitives.