From 5e4ad2777acc4c2420514e39fb98b7cf2e200996 Mon Sep 17 00:00:00 2001
From: Douglas Rumbaugh
Date: Sun, 27 Apr 2025 17:36:57 -0400
Subject: Initial commit

---
 chapters/abstract.tex | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)
 create mode 100644 chapters/abstract.tex

(limited to 'chapters/abstract.tex')

diff --git a/chapters/abstract.tex b/chapters/abstract.tex
new file mode 100644
index 0000000..5ddfd37
--- /dev/null
+++ b/chapters/abstract.tex
@@ -0,0 +1,42 @@
+Modern data systems must cope with a wider variety of data than ever
+before, and as a result we've seen the proliferation of a large number of
+highly specialized data management systems, such as vector and graph
+databases. These systems are built upon specialized data structures for
+a particular query, or class of queries, and as a result have a very
+specific range of efficacy. Beyond this, they are difficult to develop
+because of the requirements that they place upon the data structures at
+their core, including requiring support for concurrent updates. As a
+result, a large number of potentially useful data structures are excluded
+from use in such systems, or at the very least require a large amount of
+development time to be made useful.
+
+This work seeks to address this difficulty by introducing a framework for
+automatic data structure dynamization. Given a static data structure and
+an associated query that satisfy certain requirements, the framework
+will automatically add support for concurrent updates, with
+minimal modification to the data structure itself. It is based on a
+body of theoretical work on dynamization, often called the ``Bentley-Saxe
+Method'', which partitions data into a number of small data structures,
+and periodically rebuilds these as records are inserted or deleted, in
+a manner that maintains asymptotic bounds on worst-case query time,
+as well as amortized insertion time. These techniques, as they currently
+exist, are of limited use because they exhibit poor performance in
+practice and lack support for concurrency. However, they serve as a solid
+theoretical base upon which a novel system can be built to address
+these concerns.
+
+To develop this framework, sampling queries (which are not well served
+by existing dynamic data structures) are first considered. The results
+of this analysis are then generalized to produce a framework for
+single-threaded dynamization that is applicable to a large number
+of possible data structures and query types, and the general framework
+is evaluated across a number of data structures and query types. These
+dynamized static structures are shown to match or exceed existing
+specialized dynamic structures in both update and query
+performance.
+
+Finally, this general framework is expanded with support for concurrent
+operations (inserts and queries), and the use of scheduling and
+parallelism is studied to provide worst-case insertion guarantees,
+as well as a rich trade-off space between query and insertion performance.
+
--
cgit v1.2.3