-rw-r--r-- chapters/future-work.tex | 174
1 file changed, 174 insertions(+), 0 deletions(-)
diff --git a/chapters/future-work.tex b/chapters/future-work.tex
new file mode 100644
index 0000000..d4ddd52
--- /dev/null
+++ b/chapters/future-work.tex
@@ -0,0 +1,174 @@
+\chapter{Proposed Work}
+\label{chap:proposed}
+
+The previous two chapters described work that has already been completed;
+however, several tasks remain to be done as part of this project. Update
+support is only one of the important features that an index requires of
+its data structure. This chapter briefly discusses the remaining research
+problems, to lay out a set of criteria for project completion.
+
+\section{Concurrency Support}
+
+Database management systems are designed to hide the latency of IO
+operations, and one of the techniques they use is a high degree of
+concurrency. As a result, any data structure used to build a database
+index must also support concurrent updates and queries. The sampling
+extension framework described in Chapter~\ref{chap:sampling} had basic
+concurrency support, but work is ongoing to integrate a superior system
+into the framework of Chapter~\ref{chap:framework}.
+
+Because the framework is based on the Bentley-Saxe method, it has a number
+of desirable properties for making concurrency management simpler. With
+the exception of the buffer, the vast majority of the data resides in
+static data structures. When using tombstones, these static structures
+become fully immutable. This turns concurrency control into a resource
+management problem, and suggests a simple multi-version concurrency
+control scheme. Each version of the structure, defined as being the
+state between two reconstructions, is tagged with an epoch number. A
+query, then, will read only a single epoch, which will be preserved
+in storage until all queries accessing it have terminated. Because the
+mutable buffer is append-only, a consistent view of it can be obtained
+by storing the tail of the log at the start of query execution. Thus,
+a fixed snapshot of the index can be represented as a two-tuple containing
+the epoch number and buffer tail index.
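As a minimal sketch of this scheme (all class and method names here are illustrative assumptions, not the framework's actual API), each epoch can be pinned by a reference count while queries read it, and the snapshot is identified by the epoch together with the buffer tail:

```python
import threading

class Epoch:
    """One immutable version of the static structure hierarchy."""
    def __init__(self, number):
        self.number = number
        self.refcount = 0      # queries currently reading this epoch
        self.retired = False   # set once a reconstruction supersedes it

class VersionManager:
    def __init__(self):
        self._lock = threading.Lock()
        self._current = Epoch(0)
        self.buffer = []       # append-only mutable buffer

    def take_snapshot(self):
        """Pin the current epoch; (epoch.number, tail) identifies the snapshot."""
        with self._lock:
            self._current.refcount += 1
            return (self._current, len(self.buffer))

    def release_snapshot(self, epoch):
        """Unpin; a retired epoch with no remaining readers may be reclaimed."""
        with self._lock:
            epoch.refcount -= 1
            return epoch.retired and epoch.refcount == 0

    def advance_epoch(self):
        """Install a new epoch after a reconstruction completes."""
        with self._lock:
            self._current.retired = True
            self._current = Epoch(self._current.number + 1)
```

A query would read only `buffer[:tail]` together with the shards belonging to its pinned epoch, so later inserts and reconstructions are invisible to it.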
+
+The major limitation of the Chapter~\ref{chap:sampling} system was
+the handling of buffer expansion. While the mutable buffer itself is
+an unsorted array, and thus supports concurrent inserts using a simple
+fetch-and-add operation, the real hurdle to insert performance is managing
+reconstruction. During a reconstruction, the buffer is full and cannot
+support any new inserts. Since active queries may be using the buffer,
+it cannot be immediately flushed, and so inserts are blocked. Consequently,
+it is necessary to use multiple buffers to sustain insertions. When
+a buffer is filled, a background thread is used to perform the
+reconstruction, and a new buffer is added to continue inserting while that
+reconstruction occurs. In Chapter~\ref{chap:sampling}, the solution used
+was limited by its restriction to only two buffers (and as a result,
+a maximum of two active epochs at any point in time). Any sustained
+insertion workload would quickly fill up the pair of buffers, and then
+be forced to block until one of the buffers could be emptied. This
+emptying of the buffer was contingent on \emph{both} all queries using
+the buffer finishing, \emph{and} on the reconstruction using that buffer
+to finish. As a result, the length of the block on inserts could be long
+(multiple seconds, or even minutes for particularly large reconstructions)
+and indeterminate (a given index could be involved in a very long running
+query, and the buffer would be blocked until the query completed).
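The slot-claiming insert path described above can be sketched as follows. Python's `itertools.count` stands in for the atomic fetch-and-add instruction, and the buffer-chaining policy is a simplified illustration rather than the system's actual implementation:

```python
import itertools

class UnsortedBuffer:
    """Fixed-capacity append-only array; slots are claimed by fetch-and-add."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self._next = itertools.count()   # stands in for an atomic counter

    def try_insert(self, record):
        idx = next(self._next)           # atomically claim slot `idx`
        if idx >= self.capacity:
            return False                 # buffer full
        self.slots[idx] = record
        return True

class BufferChain:
    """Grow a chain of buffers so inserts need not block on reconstruction."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.buffers = [UnsortedBuffer(capacity)]

    def insert(self, record):
        if not self.buffers[-1].try_insert(record):
            # The full buffer is left for background reconstruction and
            # for any queries still pinning it; inserts continue in a
            # freshly allocated buffer.
            self.buffers.append(UnsortedBuffer(self.capacity))
            self.buffers[-1].try_insert(record)
```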
+
+Thus, a more effective concurrency solution would need to support
+dynamically adding mutable buffers as needed, allowing insertion
+throughput to be maintained so long as memory for more buffer space
+is available.\footnote{For the
+in-memory indexes considered thus far, it isn't clear that running out of
+memory for buffers is a recoverable error in all cases. The system would
+require the same amount of memory for storing records (technically more,
+considering index overhead) in a shard as it does in the buffer. In the
+case of an external storage system, the calculus would be different,
+of course.} It would also ensure that a long-running query could only
+block insertion if there is insufficient memory to create a new buffer or to
+run a reconstruction. However, as the number of buffered records grows,
+there is the potential for query performance to suffer, which leads to
+another important aspect of an effective concurrency control scheme.
+
+\subsection{Tail Latency Control}
+
+The concurrency control scheme discussed thus far maintains insertion
+throughput by allowing an unbounded portion of the new data to remain
+buffered in an unsorted fashion. Over time, this buffered
+data will be moved into data structures in the background, as the
+system performs merges (which are moved off of the critical path for
+most operations). While this system allows for fast inserts, it has the
+potential to damage query performance. This is because the more buffered
+data there is, the more a query must fall back on its inefficient
+scan-based buffer path, as opposed to using the data structure.
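The two query paths can be illustrated with a hypothetical point-lookup routine; the names and record layout here are assumptions made for the sketch, not the framework's interface:

```python
from collections import namedtuple

Record = namedtuple("Record", ["key", "value", "tombstone"])

class Shard:
    """Stand-in for a static structure; lookup is the efficient path."""
    def __init__(self, records):
        self._index = {r.key: r for r in records}

    def lookup(self, key):
        return self._index.get(key)

def query_point(key, shards, buffer, tail):
    """Point lookup over a snapshot, newest data first. The buffer is
    scanned linearly (the slow path); shards answer via their structure."""
    # Scan-based buffer path: newest records first, so later writes win.
    for record in reversed(buffer[:tail]):
        if record.key == key:
            return None if record.tombstone else record
    # Structure-based path: consult shards from newest to oldest.
    for shard in shards:
        record = shard.lookup(key)
        if record is not None:
            return None if record.tombstone else record
    return None
```

The cost of the first loop grows linearly with `tail`, which is why an unbounded buffered fraction degrades query performance.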
+
+Unfortunately, reconstructions can be incredibly lengthy (recall that
+the worst-case scenario involves rebuilding a static structure over
+all of the records; this is, thankfully, quite rare). This implies that
+it may be necessary in certain circumstances to throttle insertions to
+maintain certain levels of query performance. Additionally, it may be
+worth preemptively performing large reconstructions during periods of
+low utilization, similar to systems like Silk designed for mitigating
+tail latency spikes in LSM-tree based systems~\cite{balmau19}.
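One simple form such throttling might take (the policy shape and all parameter values here are purely illustrative) is back-pressure proportional to the fraction of records that remain buffered:

```python
def insert_delay(buffered_records, total_records,
                 target_fraction=0.05, max_delay_us=100.0):
    """Return a per-insert delay in microseconds that grows as the
    unsorted (buffered) fraction of the data exceeds a target, keeping
    the scan-based query path bounded."""
    if total_records == 0:
        return 0.0
    fraction = buffered_records / total_records
    if fraction <= target_fraction:
        return 0.0   # within budget: no throttling
    # Linear back-pressure beyond the target, capped at max_delay_us.
    overshoot = (fraction - target_fraction) / target_fraction
    return min(max_delay_us, overshoot * max_delay_us)
```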
+
+Additionally, it is possible that large reconstructions may have a
+negative effect on query performance, due to system resource utilization.
+Reconstructions can use a large amount of memory bandwidth, which must
+be shared by queries. The effects of parallel reconstruction on query
+performance will need to be assessed, and strategies for mitigating this
+effect, whether scheduling-based or resource-throttling, considered if
+necessary.
+
+
+\section{Fine-Grained Online Performance Tuning}
+
+The framework has a large number of configurable parameters, and
+introducing concurrency control will add even more. The parameter sweeps
+in Section~\ref{ssec:ds-exp} show that there are trade-offs between
+read and write performance across this space. Unfortunately, the current
+framework applies these configuration parameters globally, and does not
+allow them to be changed after the index is constructed. It seems apparent
+that better performance might be obtained by adjusting this approach.
+
+First, there is nothing preventing these parameters from being configured
+on a per-level basis: different layout policies on different levels (for
+example, tiering on higher levels and leveling on lower ones), different
+scale factors, and so on. More index-specific tuning, such as controlling
+the memory budget for auxiliary structures, could also be considered.
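A per-level configuration might look something like the following sketch; the specific knobs, names, and defaults are illustrative assumptions, not the framework's current interface:

```python
from dataclasses import dataclass

@dataclass
class LevelConfig:
    """Per-level tuning knobs (names illustrative)."""
    layout: str             # "tiering" or "leveling"
    scale_factor: int       # growth factor relative to the level above
    aux_memory_bytes: int   # budget for auxiliary structures

def default_levels(n_levels):
    """Example policy: tier the upper (smaller) levels for write
    throughput, level the lower ones for read performance."""
    return [
        LevelConfig(
            layout="tiering" if i < n_levels // 2 else "leveling",
            scale_factor=8,
            aux_memory_bytes=1 << 20,
        )
        for i in range(n_levels)
    ]
```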
+
+This fine-grained tuning will open up an even broader design space,
+which has the benefit of improving the configurability of the system,
+but the disadvantage of making configuration more difficult. Additionally,
+it does nothing to address the problem of workload drift: a configuration
+may be optimal now, but will it remain effective in the future as the
+read/write mix of the workload changes? Both of these challenges can be
+addressed using dynamic tuning.
+
+The idea is that the framework could be augmented with workload and
+performance statistics tracking. Based on these statistics, during
+reconstruction, the framework could decide to adjust the configuration
+of one or more levels in an online fashion, to lean more towards read
+or write performance, or to dial back memory budgets as the system's
+memory usage increases. Additionally, buffer-related parameters could
+be tweaked in real time as well. If insertion throughput is high, it
+might be worth it to temporarily increase the buffer size, rather than
+spawning multiple smaller buffers.
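As a hypothetical sketch of such a policy (the thresholds and knob adjustments are invented for illustration), a reconstruction-time retuning step might examine the observed read/write ratio and nudge a level's configuration accordingly:

```python
from dataclasses import dataclass

@dataclass
class LevelTuning:
    layout: str
    scale_factor: int

def retune(config, reads, writes, read_heavy=4.0, write_heavy=0.25):
    """Nudge a level's configuration toward the observed workload,
    applied during reconstruction so no in-place restructuring is needed."""
    ratio = reads / max(writes, 1)
    if ratio >= read_heavy:
        # Read-heavy: leveling and a smaller scale factor favor queries.
        config.layout = "leveling"
        config.scale_factor = max(2, config.scale_factor // 2)
    elif ratio <= write_heavy:
        # Write-heavy: tiering and a larger scale factor favor inserts.
        config.layout = "tiering"
        config.scale_factor = min(16, config.scale_factor * 2)
    return config
```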
+
+A system like this would allow for more consistent performance of the
+system in the face of changing workloads, and also increase the ease
+of use of the framework by removing the burden of configuration from
+the user.
+
+
+\section{Alternative Data Partitioning Schemes}
+
+One problem with Bentley-Saxe or LSM-tree derived systems is temporary
+memory usage spikes. When performing a reconstruction, the system needs
+enough storage to hold both the shards involved in the reconstruction
+and the newly constructed shard. This is made worse in the face
+of multi-version concurrency, where multiple older versions of shards
+may be retained in memory at once. It's well known that, in the worst
+case, such a system may temporarily require double its current memory
+usage~\cite{dayan22}.
+
+One approach to addressing this problem in LSM-tree based systems is
+to adjust the compaction granularity~\cite{dayan22}. In the terminology
+associated with this framework, the idea is to further sub-divide each
+shard into smaller chunks, partitioned based on keys. That way, when a
+reconstruction is triggered, rather than reconstructing an entire shard,
+these smaller partitions can be used instead. One of the partitions in
+the source shard can be selected, and then merged with the partitions
+in the next level down having overlapping key ranges. The amount of
+memory required for reconstruction (and also reconstruction time costs)
+can then be controlled by adjusting these partitions.
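This partition-selection step can be sketched as follows, with each partition modeled as a (min key, max key, records) triple; the representation is an assumption made for illustration:

```python
def overlapping(source, next_level):
    """Partitions in `next_level` whose key ranges intersect `source`.
    A partition is a (min_key, max_key, sorted_records) triple."""
    lo, hi = source[0], source[1]
    return [p for p in next_level if p[0] <= hi and lo <= p[1]]

def partial_reconstruction(source, next_level):
    """Merge one source partition into the next level, touching only the
    partitions it overlaps; memory cost is bounded by their sizes rather
    than by the sizes of the full shards."""
    victims = overlapping(source, next_level)
    merged = sorted(source[2] + [r for v in victims for r in v[2]])
    lo = min([source[0]] + [v[0] for v in victims])
    hi = max([source[1]] + [v[1] for v in victims])
    survivors = [p for p in next_level if p not in victims]
    return sorted(survivors + [(lo, hi, merged)])
```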
+
+Unfortunately, while this system works incredibly well for LSM-tree
+based systems which store one-dimensional data in sorted arrays, it
+encounters some problems in the context of a general index. It isn't
+clear how to effectively partition multi-dimensional data in the same
+way. Additionally, in the general case, each partition would need to
+contain its own instance of the index, as the framework supports data
+structures that don't themselves support effective partitioning in the
+way that a simple sorted array would. These challenges will need to be
+overcome to devise effective, general schemes for data partitioning to
+address the problems of reconstruction size and memory usage.