From 7700f2818cca731cadac034322a28f19e9ac3a17 Mon Sep 17 00:00:00 2001
From: Douglas Rumbaugh
Date: Fri, 20 Jun 2025 17:24:18 -0400
Subject: updates

---
 chapters/conclusion.tex | 113 +++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 93 insertions(+), 20 deletions(-)

diff --git a/chapters/conclusion.tex b/chapters/conclusion.tex
index 8f29e96..13457b5 100644
--- a/chapters/conclusion.tex
+++ b/chapters/conclusion.tex
@@ -1,24 +1,97 @@
-\chapter{Conclusion}
+\chapter{Summary and Future Work}
 \label{chap:conclusion}
 
-In this work, we have considered approaches for automatically
-adding support for concurrent updates to static data structures,
-for the purpose of reducing the amount of work necessary to produce a
-dynamic index. Classical dynamization techniques suffered from several
-limitations on supported data structures, as well as performance problems
-stemming from a lack of configurability and poor worst-case insertion
-performance. We have attempted to address these limitations.
-
-The result of these efforts is a generalized dynamization framework built
-upon a set of novel mathematical results that allows for many static
-data structures to be automatically extended with tunable, concurrent
-insertion and deletion support, with bounded additional query cost. The
-technique expands on the base Bentley-Saxe method with new query interfaces
-to enable support for search problems that are not traditional decomposable,
-a tunable design space including buffering and alternative block layout
-polices to allow for trade-offs between insertion and query performance,
-and support for parallel reconstructions in a manner that effectively
-reduces the worst-case insertion cost while maintaining similar query
-performance.
+One of the perennial problems in database systems is the design of new
+indices to support new data types and search problems. While there exist
+numerous data structures that could be used as the basis for such indices,
+there is a mismatch between the required feature set of an index and
+that of a data structure. This requires a significant amount of effort
+to be expended in order to implement the missing features. In order
+to circumvent this problem, there have been past efforts at creating
+systems for automating some, or all, of the index design process in
+certain contexts. These existing efforts fall short of a truly general
+solution to the problem of automatic index generation. Automatic index
+composition assumes a particular search problem and a set of data
+structure primitives, and then composes those primitives into a custom
+structure that is optimized for a particular workload. Generalized index
+templates assume a solution structure, and attempt to solve a search
+problem within that structure. In both cases, the core methodology of
+the approach imposes restrictions on the types of problems to which it
+can be applied. Thus, neither is a truly viable approach to creating
+indices for arbitrary search problems in the general case.
+
+We propose a system based on a third technique: automatic feature
+extension. Starting with an existing data structure for the search problem
+of interest, various general techniques can be used to automatically
+add the features missing from the structure to create an index. A special
+case of this approach is well studied in the theoretical literature:
+dynamization. Dynamization seeks to automatically add support for
+inserts, and sometimes deletes, to a static data structure for a search
+problem that satisfies certain constraints. Dynamization has a number
+of limitations that prevent it from standing on its own as a solution
+to this problem, and so this work has concentrated on overcoming these
+shortcomings.
+
+By introducing new classifications of search problem, along with
+mechanisms to support solving them over a dynamized structure, we extended
+the applicability of dynamization techniques to a broader set of data
+structures and search problems, as well as increased the number of search
+problems for which deletes can be efficiently supported. We considered
+the design space of the similarly structured LSM Tree data structure,
+and borrowed certain applicable elements to introduce a configurable
+design space to allow for trade-offs between insertion and query
+performance. We then devised a system for controlling the worst-case
+insertion performance of dynamized structures, leveraging concurrency to
+match the lowest existing worst-case bound in the theoretical literature,
+and then parallelism to beat it.
+
+Through this effort, we have managed to resolve what we saw as the most
+significant barriers to the use of dynamization in the context of database
+indexing.
+
+
+\section{Future Work}
+While this is a significant step forward, there remains substantial
+work to be done before the ultimate goal of a general, automatic index
+generation framework has been reached. We have resolved a number
+of existing problems to make dynamization viable in the context of
+database systems, as well as expanded the scope of dynamization to
+include concurrency, but a database index requires more features than
+update support. In particular, our framework must also support the
+following additional features:
+
+\begin{enumerate}
+    \item \textbf{Support for external storage.} \\
+    While we did have an implementation of the sampling framework
+    discussed in Chapter~\ref{chap:sampling} that used an external
+    data structure, the general framework discussed in the subsequent
+    chapters was considered for in-memory structures only. We will need
+    to extend it with support for external structures, as well as evaluate
+    whether our proposed techniques still function effectively in this
+    context.
+    \item \textbf{Crash recovery.} \\
+    It is critical for a database index to support crash recovery,
+    so that it can be recovered to a state consistent with the rest of
+    the database in the event of a system fault. Because our dynamized
+    indices are append-only, and can be viewed as a log of sorts,
+    a naive form of crash recovery is straightforward: all operations
+    can be logged and replayed in the event of a crash. But this is
+    highly inefficient, and so a better scheme must be devised.
+    \item \textbf{Distributed systems support.} \\
+    The append-only and decomposed nature of dynamized indices makes
+    them seem a natural fit in a distributed systems context. This was
+    briefly discussed in Section~\ref{ssec:ext-distributed}. While
+    not required for all, or even most, applications, support for
+    automatically distributing an index over multiple nodes in a
+    distributed system would be desirable.
+\end{enumerate}
+
+Once the full set of necessary index features can be supported by the
+framework, we plan to integrate the system into a database to allow
+user-defined indexing. To accommodate this, it will also be necessary
+to devise a mechanism for allowing the query optimizer to use these
+arbitrary, user-defined indices when generating query plans.
+
+
 
 