Diffstat (limited to 'chapters/conclusion.tex')
-rw-r--r-- chapters/conclusion.tex 113
1 files changed, 93 insertions, 20 deletions
diff --git a/chapters/conclusion.tex b/chapters/conclusion.tex
index 8f29e96..13457b5 100644
--- a/chapters/conclusion.tex
+++ b/chapters/conclusion.tex
@@ -1,24 +1,97 @@
-\chapter{Conclusion}
+\chapter{Summary and Future Work}
\label{chap:conclusion}
-In this work, we have considered approaches for automatically
-adding support for concurrent updates to static data structures,
-for the purpose of reducing the amount of work necessary to produce a
-dynamic index. Classical dynamization techniques suffered from several
-limitations on supported data structures, as well as performance problems
-stemming from a lack of configurability and poor worst-case insertion
-performance. We have attempted to address these limitations.
-
-The result of these efforts is a generalized dynamization framework built
-upon a set of novel mathematical results that allows for many static
-data structures to be automatically extended with tunable, concurrent
-insertion and deletion support, with bounded additional query cost. The
-technique expands on the base Bentley-Saxe method with new query interfaces
-to enable support for search problems that are not traditional decomposable,
-a tunable design space including buffering and alternative block layout
-polices to allow for trade-offs between insertion and query performance,
-and support for parallel reconstructions in a manner that effectively
-reduces the worst-case insertion cost while maintaining similar query
-performance.
+One of the perennial problems in database systems is the design of new
+indices to support new data types and search problems. While there exist
+numerous data structures that could be used as the basis for such indices,
+there is a mismatch between the required feature set of an index and
+that of a data structure. This requires a significant amount of effort
+to be expended in order to implement the missing features. In order
+to circumvent this problem, there have been past efforts at creating
+systems for automating some, or all, of the index design process in
+certain contexts. These existing efforts fall short of a truly general
+solution to the problem of automatic index generation. Automatic index
+composition assumes a particular search problem and a set of data
+structure primitives, and then composes those primitives into a custom
+structure that is optimized for a particular workload. Generalized index
+templates assume a solution structure, and attempt to solve a search
+problem within that structure. In both cases, the core methodology of
+the approach imposes restrictions on the types of problems to which it
+can be applied. Thus, neither is a truly viable approach to creating
+indices for arbitrary search problems in the general case.
+
+We propose a system based on a third technique: automatic feature
+extension. Starting with an existing data structure for the search problem
+of interest, various general techniques can be used to automatically
+add the features missing from the structure to create an index. A special
+case of this approach is well studied in the theoretical literature:
+dynamization. Dynamization seeks to automatically add support for
+inserts, and sometimes deletes, to a static data structure for a search
+problem that satisfies certain constraints. Dynamization has a number
+of limitations that prevent it from standing on its own as a solution
+to this problem, and so this work has concentrated on overcoming these
+shortcomings.
+
+By introducing new classifications of search problem, along with
+mechanisms to support solving them over a dynamized structure, we extended
+the applicability of dynamization techniques to a broader set of data
+structures and search problems, as well as increased the number of search
+problems for which deletes can be efficiently supported. We considered
+the design space of the similarly structured LSM Tree data structure,
+and borrowed certain applicable elements to introduce a configurable
+design space to allow for trade-offs between insertion and query
+performance. We then devised a system for controlling the worst-case
+insertion performance of dynamized structures, leveraging concurrency to
+match the lowest existing worst-case bound in the theoretical literature,
+and then parallelism to beat it.
+
+Through this effort, we have managed to resolve what we saw as the most
+significant barriers to the use of dynamization in the context of database
+indexing.
+
+
+\section{Future Work}
+While this is a significant step forward, there remains substantial
+work to be done before the ultimate goal of a general, automatic index
+generation framework has been reached. We have resolved a number
+of existing problems to make dynamization viable in the context of
+database systems, as well as expanded the scope of dynamization to
+include concurrency, but a database index requires more features than
+update support. In particular, our framework must also support the
+following additional features:
+
+\begin{enumerate}
+ \item \textbf{Support for external storage.} \\
+ While we did have an implementation of the sampling framework
+ discussed in Chapter~\ref{chap:sampling} that used an external
+ data structure, the general framework discussed in the subsequent
+ chapters was considered for in-memory structures only. We will need
+ to extend it with support for external structures, as well as evaluate
+ whether our proposed techniques still function effectively in this
+ context.
+ \item \textbf{Crash recovery.} \\
+ It is critical for a database index to support crash recovery,
+ so that it can be recovered to a state consistent with the rest of
+ the database in the event of a system fault. Because our dynamized
+ indices are append-only, and can be viewed as a log of sorts,
+ a naive form of crash recovery is straightforward: all operations
+ can be logged and replayed in the event of a crash. But this is
+ highly inefficient, and so a better scheme must be devised.
+ \item \textbf{Distributed systems support.} \\
+ The append-only and decomposed nature of dynamized indices makes
+ them seem a natural fit in a distributed systems context. This was
+ briefly discussed in Section~\ref{ssec:ext-distributed}. While
+ not required for all, or even most, applications, support for
+ automatically distributing an index over multiple nodes in a
+ distributed system would be desirable.
+\end{enumerate}
+
+Once the full set of necessary index features can be supported by the
+framework, we plan to integrate the system into a database to allow
+user-defined indexing. To accommodate this, it will also be necessary
+to devise a mechanism for allowing the query optimizer to use these
+arbitrary, user-defined indices when generating query plans.
+
+