Diffstat (limited to 'chapters/tail-latency.tex')
 chapters/tail-latency.tex | 329 ++++++++++++++++++++++++------------------
 1 file changed, 171 insertions(+), 158 deletions(-)
diff --git a/chapters/tail-latency.tex b/chapters/tail-latency.tex
index 1d707b4..ee578a1 100644
--- a/chapters/tail-latency.tex
+++ b/chapters/tail-latency.tex
@@ -6,7 +6,7 @@
\begin{figure}
\subfloat[Insertion Throughput]{\includegraphics[width=.5\textwidth]{img/tail-latency/btree-tput.pdf} \label{fig:tl-btree-isam-tput}}
\subfloat[Insertion Latency Distribution]{\includegraphics[width=.5\textwidth]{img/tail-latency/btree-dist.pdf} \label{fig:tl-btree-isam-lat}} \\
-\caption{Insertion Performance of Dynamized ISAM vs. B+Tree}
+\caption{Insertion Performance of Dynamized ISAM vs. B+tree}
\label{fig:tl-btree-isam}
\end{figure}
@@ -17,7 +17,7 @@ structures with good overall insertion throughput, the latency of
individual inserts is highly variable. To illustrate this problem,
consider the insertion performance in Figure~\ref{fig:tl-btree-isam},
which compares the insertion latencies of a dynamized ISAM tree with
-that of its most direct dynamic analog: a B+Tree. While, as shown
+that of its most direct dynamic analog: a B+tree. While, as shown
in Figure~\ref{fig:tl-btree-isam-tput}, the dynamized structure has
superior average performance to the native dynamic structure, the
latency distributions, shown in Figure~\ref{fig:tl-btree-isam-lat}, are
@@ -25,10 +25,10 @@ quite different. The dynamized structure has much better best-case
performance, but the worst-case performance is exceedingly poor.
This poor worst-case performance is a direct consequence of the different
-approaches used by the dynamized structure and B+Tree to support updates.
-B+Trees use a form of amortized local reconstruction, whereas the
+approaches used by the dynamized structure and B+tree to support updates.
+B+trees use a form of amortized local reconstruction, whereas the
dynamized ISAM tree uses amortized global reconstruction. Because the
-B+Tree only reconstructs the portions of the structure ``local'' to the
+B+tree only reconstructs the portions of the structure ``local'' to the
update, even in the worst case only a small part of the data structure
will need to be adjusted. However, when using global reconstruction
based techniques, the worst-case insert requires rebuilding either the
@@ -37,8 +37,8 @@ proportion of it (for leveling). The fact that our dynamization technique
uses buffering, and most of the shards involved in reconstruction are
kept small by the logarithmic decomposition technique used to partition
it, ensures that the majority of inserts are low cost compared to the
-B+Tree. At the extreme end of the latency distribution, though, the
-local reconstruction strategy used by the B+Tree results in significantly
+B+tree. At the extreme end of the latency distribution, though, the
+local reconstruction strategy used by the B+tree results in significantly
better worst-case performance.
Unfortunately, the design space that we have been considering is
@@ -216,14 +216,14 @@ of which will have exactly $N_B$ records.\footnote{
for simplicity.
}
-Applying this technique to an ISAM Tree, and compared against a
-B+Tree, yields the insertion and query latency distributions shown
+Applying this technique to an ISAM tree, and compared against a
+B+tree, yields the insertion and query latency distributions shown
in Figure~\ref{fig:tl-floodl0}. Figure~\ref{fig:tl-floodl0-insert}
shows that it is possible to obtain insertion latency distributions
using amortized global reconstruction that are directly comparable to
dynamic structures based on amortized local reconstruction. However,
this performance comes at the cost of queries, which are incredibly slow
-compared to B+Trees, as shown in Figure~\ref{fig:tl-floodl0-query}.
+compared to B+trees, as shown in Figure~\ref{fig:tl-floodl0-query}.
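+
+The reason for both effects is direct (a sketch of the reasoning,
+assuming the fixed-size $N_B$-block decomposition described above):
+a worst-case insert rebuilds only a single block, but a query must
+now consult every block, giving
+\[
+\text{worst-case insert} \in O\!\left(B(N_B)\right), \qquad
+\text{blocks searched per query} \in \Theta\!\left(\frac{n}{N_B}\right),
+\]
+where the latter grows linearly with $n$ for any fixed $N_B$.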
\begin{figure}
\subfloat[Insertion Latency Distribution]{\includegraphics[width=.5\textwidth]{img/tail-latency/floodl0-insert.pdf} \label{fig:tl-floodl0-insert}}
@@ -1109,7 +1109,44 @@ to prioritize independence in our implementation.
In this section, we perform several experiments to evaluate the ability of
the system proposed in Section~\ref{sec:tl-impl} to control tail latencies.
-\subsection{Stall Proportion Sweep}
+\subsection{Stall Rate Sweep}
+
+
+As a first test, we will evaluate the ability of our insertion stall
+mechanism to control insertion tail latencies, as well as to maintain
+decomposed structures similar to those of strict tiering. We consider the shard
+count directly in this test, rather than query latencies, because our
+intention is to show that this technique is capable of controlling the
+number of shards in the decomposition. The shard count also serves as
+an indirect measure of query latency, but we will consider this metric
+directly in a later section.
+
+Recall that, when using insertion stalling, our framework does \emph{not}
+block inserts to maintain a shard bound. The buffer is always flushed
+immediately, regardless of the number of shards in the structure. Thus,
+the rate of insertion is controlled by the cost of flushing the
+buffer (we still block when the buffer is full) and the insertion
+stall rate. The structure is maintained fully in the background, with
+maintenance reconstructions being scheduled for all levels exceeding a
+specified shard count. Thus, the number of shards within the structure
+is controlled indirectly by limiting the insertion rate. We ran these
+tests with 32 background threads on a system with 40 physical cores to
+ensure sufficient resources to fully parallelize all reconstructions
+(we'll consider resource constrained situations later).
+
+We tested an ISAM tree with the 200 million record SOSD \texttt{OSM}
+dataset~\cite{sosd}, as well as a VPTree with the one million,
+300-dimensional, \texttt{SBW} dataset~\cite{sbw}. For each test,
+we inserted $30\%$ of the records to warm up the structure, and then
+measured the individual latency of each insert after that. We measured
+the count of shards in the structure each time the buffer flushed
+(including during the warmup period). Note that a stall rate of $\delta
+= 1$ indicates no stalling at all, and values less than one indicate
+a $1 - \delta$ probability of an insert being rejected, after which the
+insert thread sleeps for about a microsecond. A lower stall rate means
+more stalls are introduced. The tiering policy is strict tiering with
+a scale factor of $s=6$ using the concurrency control scheme described
+in Section~\ref{ssec:dyn-concurrency}.
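+
+As a point of reference, the stall mechanism is essentially a Bernoulli
+rejection loop in front of each insert. A minimal C++ sketch (the names
+\texttt{rng}, \texttt{buffer}, \texttt{Record}, and \texttt{delta} are
+illustrative, not the framework's actual interface) is:
+\begin{verbatim}
+#include <chrono>
+#include <random>
+#include <thread>
+
+// Reject each attempt with probability 1 - delta; a rejected
+// thread sleeps ~1us and retries. The append itself blocks only
+// while a full buffer is being flushed.
+thread_local std::mt19937 rng{std::random_device{}()};
+
+bool insert(const Record &rec) {
+    std::bernoulli_distribution stall(1.0 - delta);
+    while (stall(rng)) {
+        std::this_thread::sleep_for(std::chrono::microseconds(1));
+    }
+    return buffer.append(rec);
+}
+\end{verbatim}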
\begin{figure}
\centering
@@ -1119,38 +1156,10 @@ the system proposed in Section~\ref{sec:tl-impl} to control tail latencies.
\label{fig:tl-stall-200m}
\end{figure}
-First, we will consider the insertion and query performance of our
-system at a variety of stall proportions. The purpose of this testing
-is to demonstrate that inserting stalls into the insertion process is
-able to reduce the insertion tail latency, while being able to match the
-general insertion and query performance of a strict tiering policy. Recall
-that, in the insertion stall case, no explicit shard capacity limits are
-enforced by the framework. Reconstructions are triggered with each buffer
-flush on all levels exceeding a specified shard count ($s = 6$ in these
-tests) and the buffer flushes immediately when full with no regard to the
-state of the structure. Thus, limiting the insertion latency is the only
-means the system uses to maintain its shard count at a manageable level.
-These tests were run on a system with sufficient available resources to
-fully parallelize all reconstructions.
-
-First, Figure~\ref{fig:tl-stall-200m} shows the results of testing
-insertion of the 200 million record SOSD \texttt{OSM} dataset in a
-dynamized ISAM tree, using both our insertion stalling technique and
-strict tiering. We inserted $30\%$ of the records, and then measured
-the individual latency of each insert after that point to produce
-Figure~\ref{fig:tl-stall-200m-dist}. Figure~\ref{fig:tl-stall-200m-shard}
-was produced by recording the number of shards in the dynamized structure
-each time the buffer flushed. Note that a stall value of one indicates
-no stalling at all, and values less than one indicate $1 - \delta$
-probability of an insert being rejected. Thus, a lower stall value means
-more stalls are introduced. The tiering policy is strict tiering with a
-scale factor of $s=6$. It uses the concurrency control scheme described
-in Section~\ref{ssec:dyn-concurrency}.
-
-
-Figure~\ref{fig:tl-stall-200m-dist} clearly shows that all insertion
-rejection probabilities succeed in greatly reducing tail latency relative
-to tiering. Additionally, it shows a small amount of available tuning of
+We'll begin by considering the ISAM
+tree. Figure~\ref{fig:tl-stall-200m-dist} clearly shows that all
+stall rates succeed in greatly reducing tail latency relative to
+tiering. Additionally, it shows a small amount of available tuning of
the worst-case insertion latencies, with higher stall amounts reducing
the tail latencies slightly at various points in the distribution. This
latter effect results from the buffer flush latency hiding mechanism,
@@ -1163,23 +1172,19 @@ resulting in a stall.
Of course, if the query latency is severely affected by the
use of this mechanism, it may not be worth using. Thus, in
-Figure~\ref{fig:tl-stall-200m-shard} we show the probability density of
-various shard counts within the dynamized structure for each stalling
-amount, as well as strict tiering. We have elected to examine the shard
-count, rather than the query latencies, for this purpose because our
-intention with this technique is to directly control the number of
-shards, and our intention is to show that this is possible. Of course,
-the shard count control is necessary for the sake of query latencies,
-and we will consider query latency directly later.
-
-This figure shows that, even for no insertion throttle at all, the shard
-count within the structure remains well behaved and normally distributed,
-albeit with a slightly longer tail and a higher average value. Once
-stalls are introduced, though, it is possible to both reduce the tail,
-and shift the peak of the distribution through a variety of points. In
-particular, we see that a stall of $.99$ is sufficient to move the peak
-to very close to tiering, and lower stalls are able to further shift the
-peak of the distribution to even lower counts.
+Figure~\ref{fig:tl-stall-200m-shard} we show the probability density
+of various shard counts within the decomposed structure for each stall
+rate, as well as strict tiering. This figure shows that, even for no
+insertion stalling, the shard count within the structure remains well
+behaved, albeit with a slightly longer tail and a higher average value
+compared to tiering. Once stalls are introduced, it is possible to
+both reduce the tail, and shift the peak of the distribution through
+a variety of points. In particular, we see that a stall of $.99$ is
+sufficient to move the peak to very close to tiering, and lower stall
+rates are able to further shift the peak of the distribution to even
+lower counts. This result implies that this stall mechanism may be able
+to produce a trade-off space for insertion and query performance, which
+is a question we will examine in Section~\ref{ssec:tl-design-space}.
\begin{figure}
\centering
@@ -1189,15 +1194,14 @@ peak of the distribution to even lower counts.
\label{fig:tl-stall-4b}
\end{figure}
-To validate that these results were not simply a result of the relatively
-small size of the data set used, we repeated the exact same testing
-using a set of four billion uniform integers, and these results are
-shown in Figure~\ref{fig:tl-stall-4b}. These results are aligned with
-the smaller data set, with Figure~\ref{fig:tl-stall-4b-dist} showing
-the same improvements in insertion tail latency for all stall amounts,
-and Figure~\ref{fig:tl-stall-4b-shard} showing similar trends in the
-shard count. If anything, the gap between strict tiering and un-throttled
-insertion is narrower with the larger data set than the smaller one.
+To validate that these results were not due to the relatively
+small size of the data set used, we repeated the exact same
+testing using a set of four billion uniform integers, shown in
+Figure~\ref{fig:tl-stall-4b}. These results are aligned with the
+smaller data set, with Figure~\ref{fig:tl-stall-4b-dist} showing the
+same improvements in insertion tail latency for all stall amounts, and
+Figure~\ref{fig:tl-stall-4b-shard} showing similar trends in the shard
+count.
\begin{figure}
\subfloat[Insertion Latency Distribution]{\includegraphics[width=.5\textwidth]{img/tail-latency/knn-stall-insert-dist.pdf} \label{fig:tl-stall-knn-dist}}
@@ -1207,11 +1211,17 @@ insertion is narrower with the larger data set than the smaller one.
\end{figure}
Finally, we considered our dynamized VPTree in
-Figure~\ref{fig:tl-stall-knn}, using the \texttt{SBW} dataset of
-about one million 300-dimensional vectors. This test shows some of
-the possible limitations of our fixed rejection rate. The ISAM Tree
-tested above is constructable in roughly linear time, being an MDSP
-with $B_M(n, k) \in \Theta(n \log k)$. Thus, the ratio $\frac{B_M(n,
+Figure~\ref{fig:tl-stall-knn}. This test shows some of the possible
+limitations of our fixed stall rate mechanism. The ISAM tree tested
+above is constructable in roughly linear time, being an MDSP with
+$B_M(n, k) \in \Theta(n \log k)$, where $k$ is the number of shards, and
+thus roughly constant.\footnote{
+ For strict tiering, $k=s$ in all cases. Because we don't enforce
+ the level shard capacity directly, however, in the insertion
+    stalling case $k \in \Omega(s)$. Based on the experimental results
+    above, however, it is clear that $k$ is typically quite close to
+    $s$ in practice for the ISAM tree.
+} Thus, the ratio $\frac{B_M(n,
k)}{n}$ used to determine the optimal insertion stall rate is
asymptotically a constant. For VPTree, however, the construction
cost is super-linear, with $B(n) \in \Theta(n \log n)$, and also
@@ -1231,70 +1241,74 @@ the tail latency substantially compared to strict tiering, with the same
latency distribution effects for larger stall rates as was seen in the
ISAM examples.
-Thus, we've shown that introducing even a fixed stall while allowing
-the internal structure of the dynamization to develop naturally is able
-to match the shard count distribution of strict tiering, while having
-significantly lower insertion tail latencies.
+These tests show that, for the ISAM tree at least, introducing a constant
+stall rate while allowing the decomposition to develop naturally
+with background reconstructions only is able to match the shard count
+distribution of tiering, which strictly enforces the shard count bound
+using blocking, while achieving significantly better insertion tail
+latencies. VPTree is able to achieve the same results too, albeit
+requiring significantly higher stall rates to match the shard bound.
+
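+Stated directly (restating the construction costs above, with $k$
+treated as effectively constant per the footnote), the per-record
+reconstruction work that the stall rate must absorb differs
+asymptotically between the two structures:
+\[
+\frac{B_M(n,k)}{n} \in \Theta(\log k) \subseteq O(1) \quad \text{(ISAM tree)},
+\qquad
+\frac{B(n)}{n} \in \Theta(\log n) \quad \text{(VPTree)},
+\]
+and so any fixed stall rate will eventually under-throttle the VPTree
+as $n$ grows.
+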
\subsection{Insertion Stall Trade-off Space}
+\label{ssec:tl-design-space}
+
+\begin{figure}
+\centering
+\subfloat[ISAM w/ Point Lookup]{\includegraphics[width=.5\textwidth]{img/tail-latency/stall-latency-curve.pdf} \label{fig:tl-latency-curve-isam}}
+\subfloat[VPTree w/ $k$-NN]{\includegraphics[width=.5\textwidth]{img/tail-latency/knn-stall-latency-curve.pdf} \label{fig:tl-latency-curve-knn}} \\
+\caption{Insertion Throughput vs. Query Latency}
+\label{fig:tl-latency-curve}
+\end{figure}
-While we have shown that introducing insertion stalls accomplishes the
-goal of reducing tail latencies while being able to match the shard count
-of a strict tiering reconstruction strategy, we've not yet addressed
-what the actual performance of this structure is. By throttling inserts,
-we potentially reduce the insertion throughput. And, further, it isn't
-immediately obvious just how much query performance suffers as the shard
-count distribution shifts. In this test, we examine the average values
-of insertion throughput and query latency over a variety of stall rates.
+We have shown that introducing insertion stalls accomplishes our stated
+goal of reducing insertion tail latencies while simultaneously maintaining
+a shard count in line with strict tiering. However, we have not addressed
+the actual performance of the structure in terms of average throughput
+or query latency. By throttling insertion, we potentially reduce the
+throughput. Further, it isn't clear in practice how much query latency
+suffers as the shard count distribution changes. In this experiment, we
+address these concerns by directly measuring the insertion throughput
+and query latency over a variety of stall rates and compare the results
+to strict tiering.
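+
+As a point of reference for the throughput cost, the Bernoulli model
+gives a simple prediction (a sketch, assuming the roughly one
+microsecond sleep per rejection described earlier): the number of
+attempts per insert is geometric with mean $1/\delta$, so the expected
+delay added to each insert is approximately
+\[
+E[\text{stall}] \approx \left(\frac{1}{\delta} - 1\right) \cdot t_{\text{sleep}},
+\]
+which is negligible for $\delta$ near one and grows sharply as
+$\delta$ decreases.
+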
The results of this test for ISAM with the SOSD \texttt{OSM} dataset are
shown in Figure~\ref{fig:tl-latency-curve-isam}, which shows the insertion
throughput plotted against the average query latency for our system at
-various stall rates, and with tiering configured with an equivalent
-scale factor marked as red point for reference. This plot shows two
-interesting features of the insertion stall mechanism. First, it is
-possible to introduce stalls that do not significantly affect the write
-throughput, but do improve query latency. This is seen by the difference
-between the two points at the far right of the curve, where introducing
-a slight stall improves query performance at virtually no cost. This
-represents the region of the curve where the stalling introduces delay
-that doesn't exceed the cost of a buffer flush, and so the amount of
-time spent stalling by the system doesn't change much.
-
-The second, and perhaps more notable, point that this plot shows is
-that introducing the stall rate provides a beautiful design trade-off
-between query and insert performance. In fact, this space is far more
-useful than the trade-off space represented by layout policy and scale
-factor selection using strict reconstruction schemes that we examined
-in Chapter~\ref{chap:design-space}. At the upper end of the insertion
-optimized region, we see more than double the insertion throughput of
-tiering (with significantly lower tail latencies at well) at the cost
-of a slightly larger than 2x increase in query latency. Moving down the
-curve, we see that we are able to roughly match the performance of tiering
-within this space, and even shift to more query optimized configurations.
+various stall rates, and with tiering configured with an equivalent scale
+factor marked as a red point for reference. The most interesting point
+demonstrated by this plot is that introducing the stall rate provides a
+beautiful design trade-off between query and insert performance. In fact,
+this space is far more useful than the trade-off space represented by
+layout policy and scale factor selection using strict reconstruction
+schemes that we examined in Chapter~\ref{chap:design-space}. At the
+upper end of the insertion optimized region, we see more than double
+the insertion throughput of tiering (with significantly lower tail
+latencies as well) at the cost of a slightly larger than 2x increase
+in query latency. Moving down the curve, we see that we are able to
+roughly match the performance of tiering within this space, and even
+shift to more query optimized configurations. Also, this trade-off curve
+falls \emph{below} the equivalently configured tiering on the chart,
+indicating that its performance is strictly superior.
We also performed the same testing for $k$-NN queries using
VPTree and the \texttt{SBW} dataset. The results are shown in
Figure~\ref{fig:tl-latency-curve-knn}. Because the run time of $k$-NN
-queries is significantly longer than the point lookups in the ISAM test,
-we additionally applied a rate limit to the query thread, issuing new
-queries every 100 milliseconds, and configured query preemption with a
-trigger point of approximately 40 milliseconds. We applied the same
-parameters for the tiering test, and counted any additional latency
-associated with query preemption towards the average query latency figures
-reported. This test shows that, like with ISAM, we have access to a
-similarly clear trade-off space by adjusting the insertion throughput,
-however in this case the standard tiering policy did perform better in
-terms of average insertion throughput and query latency.
-
-
-\begin{figure}
-\centering
-\subfloat[ISAM w/ Point Lookup]{\includegraphics[width=.5\textwidth]{img/tail-latency/stall-latency-curve.pdf} \label{fig:tl-latency-curve-isam}}
-\subfloat[VPTree w/ $k$-NN]{\includegraphics[width=.5\textwidth]{img/tail-latency/knn-stall-latency-curve.pdf} \label{fig:tl-latency-curve-knn}} \\
-\caption{Insertion Throughput vs. Query Latency}
-\label{fig:tl-latency-curve}
-\end{figure}
+queries is significantly longer than the point lookups in the ISAM
+test, we additionally applied a rate limit to the query thread, issuing
+new queries every 100 milliseconds, and configured query preemption
+with a trigger point of approximately 40 milliseconds. We applied
+the same parameters for the tiering test, and counted any additional
+latency associated with query preemption towards the average query
+latency figures reported. This test shows that, like with ISAM, we have
+access to a similarly clear trade-off space by adjusting the insertion
+throughput; however, in this case the standard tiering policy did perform
+better in terms of average insertion throughput and query latency. The
+fact that stalling was outperformed by strict tiering for VPTree
+isn't a surprising result, given the observations made in the previous
+test. VPTree requires significantly higher insertion throttling to keep
+up with the longer reconstruction times, and the amount of throttling
+per record is asymptotically not constant as the structure grows.
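+
+For reference, the query driver in this test behaves roughly like the
+following sketch (hypothetical names; \texttt{preempt\_after} and
+\texttt{record\_latency} are illustrative, not the framework's actual
+API):
+\begin{verbatim}
+using namespace std::chrono;
+for (const auto &q : knn_queries) {
+    auto start = steady_clock::now();
+    // A query running past ~40ms triggers preemption; any extra
+    // latency that preemption adds is charged to the measurement.
+    auto result = index.query(q, /* preempt_after= */ milliseconds(40));
+    record_latency(steady_clock::now() - start);
+    // Rate limit: issue a new query every 100ms.
+    std::this_thread::sleep_until(start + milliseconds(100));
+}
+\end{verbatim}
+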
This shows a very interesting result. Not only is our approach able
to match a strict reconstruction policy in terms of average query and
@@ -1310,7 +1324,7 @@ the previous version, although these have very different performance
implications given our different compaction strategy. In this test, we
examine the effects of these parameters on the insertion-query trade-off
curves noted above, as well as on insertion tail latency. The results
-are shown in Figure~\ref{fig:tl-design-space}, for a dynamized ISAM Tree
+are shown in Figure~\ref{fig:tl-design-space}, for a dynamized ISAM tree
using the SOSD \texttt{OSM} dataset and point lookup queries.
\begin{figure}
@@ -1362,7 +1376,7 @@ In the previous tests, we ran our system configured with 32 available
threads, which was more than enough to run all reconstructions and
queries fully in parallel. However, it's important to determine how well
the system works in more resource constrained environments. The system
-shares internal threads between reconstructions and queries, and that
+shares internal threads between reconstructions and queries, and
flushing occurs on a dedicated thread separate from these. During the
benchmark, one client thread issued queries continuously and another
issued inserts. The index accumulated a total of five levels, so
@@ -1381,12 +1395,12 @@ results of this test are shown in Figure~\ref{fig:tl-latency-threads}. The
first note is that the change in the number of available internal
threads has little effect on the insertion throughput, as shown by the
clustering of the points on the curve. This is to be expected, as insert
-throughput is limited only by the stall amount, and by the buffer flushing
-operation. As flushing occurs on a dedicated thread, it is unaffected
-by changes in the internal thread configuration of the system.
+throughput is limited only by the stall amount, and by buffer flushing.
+As flushing occurs on a dedicated thread, it is unaffected by changes
+in the internal thread configuration of the system.
In terms of query performance, there are two general effects that can be
-observed. The first effect is that the previously noted effect of reduced
+observed. The first effect is that the previously noted reduction in
query performance as the insertion throughput is increased is observed
in all cases, irrespective of thread count. However, interestingly,
the thread count itself has little effect on the curve outside of the
@@ -1398,17 +1412,16 @@ capable of significantly higher insertion throughput at a given query
latency. But, at very low insertion throughputs, this effect vanishes
and all thread counts are roughly equivalent in performance.
-A large part of the reason for this significant deviation in
-behavior between one thread and multiple is likely that queries and
-reconstructions share the same pool of background threads in this
-framework. Our testing involved issuing queries continuously on a
-single thread, while performing inserts, and so two threads background
-threads ensures that a reconstruction and query can be run in parallel,
-whereas a single thread will force queries to wait behind long running
-reconstructions. Once this bottleneck is overcome, a reduction in the
-amount of parallel reconstruction seems to have only a minor influence
-on overall performance. This is likely because, although in the worst
-case the system requires $\log_s n$ threads to fully parallelize
+A large part of the reason for this significant deviation in behavior
+between one thread and multiple is likely that queries and reconstructions
+share the same pool of background threads. Our testing involved issuing
+queries continuously on a single thread, while performing inserts, and so
+two background threads ensure that a reconstruction and query can
+be run in parallel, whereas a single thread will force queries to wait
+behind long running reconstructions. Once this bottleneck is overcome,
+a reduction in the amount of parallel reconstruction seems to have only a
+minor influence on overall performance. This is because, although in the
+worst case the system requires $\log_s n$ threads to fully parallelize
reconstructions, this worst case is fairly rare. The vast majority of
reconstructions only require a fraction of this total parallel capacity.
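+
+A back-of-the-envelope argument suggests why (a sketch, under the
+assumption that a reconstruction on level $i$ triggers roughly once
+every $s^i$ buffer flushes): the expected number of levels
+reconstructing concurrently after any given flush is about
+\[
+\sum_{i=1}^{\log_s n} s^{-i} < \frac{1}{s-1} \in O(1),
+\]
+so a small number of threads suffices for all but the rare cascading
+reconstruction.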
@@ -1425,20 +1438,20 @@ within our framework, including a significantly improved architecture
for scheduling and executing parallel and background reconstructions,
and a system for rate limiting by rejecting inserts via Bernoulli sampling.
-We evaluated this system for fixed insertion rejection rates, and found
-significant improvements in tail latencies, approaching the practical lower
-bound we established using the equal block method, without requiring
-significant degradation of query performance. In fact, we found that
-this rate limiting mechanism provides a design space with more effective
-trade-offs than the one we examined in Chapter~\ref{chap:design-space},
-with the system being able to exceed the query performance of an
-equivalently configured tiering system for certain rate limiting
-configurations. The method has limitations, assigning a fixed rejection
-rate of inserts works well for linear time constructable structures like
-the ISAM Tree, but was significantly less effective for the VPTree, which
-requires $\Theta(n \log n)$ time to construct. For structures like this,
-it will be necessary to dynamically scale the amount of throttling based
-on the record count and size of reconstruction. Additionally, our current
+We evaluated this system for fixed stall rates, and found significant
+improvements in tail latencies, approaching the practical lower bound we
+established using the equal block method, without requiring significant
+degradation of query performance. In fact, we found that this rate
+limiting mechanism provides a design space with more effective trade-offs
+than the one we examined in Chapter~\ref{chap:design-space}, with the
+system being able to exceed the query performance of an equivalently
+configured tiering system for certain rate limiting configurations. The
+method has limitations: assigning a fixed stall rate to inserts
+works well for linear-time constructable structures like the ISAM tree,
+but was significantly less effective for the VPTree, which requires
+$\Theta(n \log n)$ time to construct. For structures like this, it will
+be necessary to dynamically scale the amount of throttling based on
+the record count and size of reconstruction. Additionally, our current
system isn't easily capable of reaching the ``ideal'' goal of being able
to reliably trade query performance and insertion latency at a fixed
throughput. Nonetheless, the mechanisms for supporting such features