Modern data systems must cope with a wider variety of data than ever
before, and as a result we have seen a proliferation of highly
specialized data management systems, such as vector and graph
databases. These systems are built upon data structures specialized for
a particular query, or class of queries, and as a result are effective
only for a narrow range of uses. Beyond this, they are difficult to develop
because of the requirements that they place upon the data structures at
their core, including requiring support for concurrent updates. As a
result, a large number of potentially useful data structures are excluded
from use in such systems, or at the very least require a large amount of
development time to be made useful.

This work seeks to address this difficulty by introducing a framework
for automatic data structure dynamization. Given a static data structure
and an associated query that satisfy certain requirements, the proposed
framework will automatically add support for concurrent updates,
with minimal modification to the data structure itself. It is based on a
body of theoretical work on dynamization, often called the ``Bentley-Saxe
Method'', which partitions data into a number of small data structures,
and periodically rebuilds these as records are inserted or deleted, in a
manner that maintains asymptotic bounds on worst-case query time, as well
as amortized insertion time. These techniques, as they currently exist,
are of limited usefulness: they can be applied only in a restricted set
of situations, lack support for configuration and concurrency, and have
poor insertion tail latency. Despite these shortcomings,
these techniques can serve as a solid theoretical base upon which a
novel system can be built to address these concerns.
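
To make the bounds mentioned above concrete, consider the classical
binary decomposition, sketched here under assumptions that go beyond
what is stated in this abstract: the query is decomposable with a
constant-time merge operator, the static structure can be built over
$n$ records in $B(n)$ time with $B(n)/n$ non-decreasing, and it answers
a query in $Q(n)$ time with $Q(n)$ non-decreasing. The symbols $B$ and
$Q$ are introduced purely for illustration. If the $n$ records are kept
in static structures of pairwise distinct sizes $2^i$, then at most
$O(\log n)$ structures exist at any time and each record takes part in
at most $O(\log n)$ rebuilds, which suggests costs of roughly
\[
  Q_{\mathrm{dyn}}(n) \in O\bigl(Q(n) \log n\bigr),
  \qquad
  I_{\mathrm{amortized}}(n) \in O\Bigl(\frac{B(n)}{n} \log n\Bigr).
\]
A single insertion may still trigger a rebuild of all $n$ records at a
cost of $B(n)$, which is one way to see where the poor insertion tail
latency comes from.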

To develop this framework, we first consider dynamizing data structures
for sampling queries (which are not well served by existing dynamic data
structures). We then generalize these results to produce a framework
that is able to provide concurrent insertion and deletion support for
a wide range of data structures and query types. Next, we examine the
design space of our framework and show that it supports useful trade-offs
between insertion and query performance. Finally, we examine the use
of concurrency and parallelism to provide better worst-case insertion
performance for our system.