Modern data systems must cope with a wider variety of data than ever
before, and as a result many highly specialized data management
systems, such as vector and graph databases, have proliferated. These
systems are built upon data structures specialized for a particular
query or class of queries, and consequently are effective only within
a narrow range of workloads. Beyond this, they are difficult to develop
because of the requirements they place upon the data structures at
their core, such as support for concurrent updates. Thus, many
potentially useful data structures are either excluded from such
systems or require substantial development effort before they can be
used.

This work addresses this difficulty by introducing a framework for
automatic data structure dynamization. Given a static data structure and
an associated query that together satisfy certain requirements, the
framework automatically adds support for concurrent updates with
minimal modification to the data structure itself. It is based on a
body of theoretical work on dynamization, often called the ``Bentley-Saxe
Method'', which partitions the data across a number of smaller static
structures and periodically rebuilds them as records are inserted or
deleted, in a manner that preserves asymptotic bounds on worst-case
query time and amortized insertion time. In their existing form, these
techniques are of limited use because they perform poorly in practice
and lack support for concurrency. Nonetheless, they provide a solid
theoretical foundation on which a new system addressing these
shortcomings can be built.
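For reference, the classical guarantees of the Bentley-Saxe method for a
decomposable search problem can be summarized as follows. Here $B(n)$ and
$Q_S(n)$ denote the cost of building and querying the static structure
over $n$ records, and $Q(n)$ and $I_a(n)$ denote the worst-case query cost
and amortized insertion cost of the dynamized structure (notation chosen
here for illustration). Partial query results are assumed to be mergeable
in constant time, and $B(n)/n$ and $Q_S(n)$ are assumed to be
non-decreasing:
\[
  Q(n) \in O\!\left(Q_S(n)\log n\right),
  \qquad
  I_a(n) \in O\!\left(\frac{B(n)}{n}\log n\right).
\]
For instance, a sorted array that is rebuilt from scratch by sorting and
queried by binary search has $B(n) \in O(n \log n)$ and
$Q_S(n) \in O(\log n)$, so its dynamized form supports both queries and
amortized insertions in $O(\log^2 n)$ time.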

To develop this framework, sampling queries, which are not well served
by existing dynamic data structures, are considered first. The results
of this analysis are then generalized into a framework for
single-threaded dynamization that applies to a broad range of data
structures and query types, and this framework is evaluated across a
variety of them. The resulting dynamized static structures are shown to
match or exceed existing specialized dynamic structures in both update
and query performance.

Finally, the general framework is extended to support concurrent
operations (inserts and queries), and the use of scheduling and
parallelism is studied as a means of providing worst-case insertion
guarantees and a rich trade-off space between query and insertion
performance.