2 files changed, 40 insertions, 10 deletions
diff --git a/chapters/design-space.tex b/chapters/design-space.tex
index 32fe546..32d9b9c 100644
--- a/chapters/design-space.tex
+++ b/chapters/design-space.tex
@@ -66,7 +66,7 @@ involves adjusting constants, we will leave the design-space related
 constants within our asymptotic expressions. Additionally, we will
 perform the analysis for a simple decomposable search problem. Deletes
 will be entirely neglected, and we won't make any assumptions about
-mergability. We will also neglect the buffer size, $N_B$, during this
+mergeability. We will also neglect the buffer size, $N_B$, during this
 analysis. Buffering isn't fundamental to the techniques we are examining
 in this chapter, and including it would increase the complexity of the
 analysis without contributing any useful insights.\footnote{
@@ -709,14 +709,14 @@ throughput for the three policies for both ISAM Tree and VPTree. This
 result should correlate with the amortized insertion costs for each
 policy derived in Section~\ref{sec:design-asymp}. At a scale factor of
 $s=2$, all three policies have similar insertion performance. This makes
-sense, as both leveling and Bentley-Saxe experience write-amplificiation
-proprotional to the scale factor, and at $s=2$ this isn't significantly
-larger than tiering's write amplificiation, particularly compared
+sense, as both leveling and Bentley-Saxe experience write-amplification
+proportional to the scale factor, and at $s=2$ this isn't significantly
+larger than tiering's write amplification, particularly compared
 to the other factors influencing insertion performance, such as
 reconstruction time. However, for larger scale factors, tiering shows
 \emph{significantly} higher insertion throughput, and Leveling and
 Bentley-Saxe show greatly degraded performance due to the large amount
-of additional write amplification. These reuslts are perfectly in line
+of additional write amplification. These results are perfectly in line
 with the mathematical analysis of the previous section.
 
 \subsection{General Insert vs. Query Trends}
@@ -758,7 +758,7 @@ performance degrades linearly with scale factor, and this is well
 demonstrated in the plot.
 
 The Bentley-Saxe method appears to follow a very similar trend to that
-of leveling, albiet with even more dramatic performance degredation as
+of leveling, albeit with even more dramatic performance degradation as
 the scale factor is increased. Generally it seems to be a strictly worse
 alternative to leveling in all but its best-case query cost, and we will
 omit it from our tests moving forward as a result.
@@ -793,21 +793,21 @@ also tested $k$-NN queries with varying values of $k$.
 \end{figure}
 
 Interestingly, for the range of selectivities tested for range counts, the
-overall query latency failed to converge, and there remains a consistant,
-albiet slight, stratification amongst the tested policies, as shown in
+overall query latency failed to converge, and there remains a consistent,
+albeit slight, stratification amongst the tested policies, as shown in
 Figure~\ref{fig:design-isam-sel}. As the selectivity continues to rise
 above those shown in the chart, the relative ordering of the policies
 remains the same, but the relative differences between them begin to
 shrink. This result makes sense given the asymptotics--there is still
 \emph{some} overhead associated with the decomposition, but as the cost
 of the query approaches linear, it makes up an increasingly irrelevant
-portion of the runtime.
+portion of the run time.
 
 The $k$-NN results in Figure~\ref{fig:design-knn-sel} show a slightly
 different story. This is also not surprising, because $k$-NN is a
 $C(n)$-decomposable problem, and the cost of result combination grows
 with $k$. Thus, larger $k$ values will \emph{increase} the effect that
-the decomposition has on the query runtime, unlike was the case in the
+the decomposition has on the query run time, unlike in the
 range count queries, where the total cost of the combination is constant.
 
 % \section{Asymptotically Relevant Trade-offs}
diff --git a/chapters/tail-latency.tex b/chapters/tail-latency.tex
index 0cdeeab..e63f3c9 100644
--- a/chapters/tail-latency.tex
+++ b/chapters/tail-latency.tex
@@ -960,3 +960,33 @@ at a variety of stall proportions.
 
 
 \section{Conclusion}
+
+In this chapter, we addressed the last of the three major problems of
+dynamization: tail latency. We proposed a technique for limiting the
+rate of insertions to match the rate of reconstruction that is able to
+match the worst-case optimized approach of Overmars~\cite{overmars81} on
+a single thread, and able to exceed it given multiple parallel threads.
+We then implemented the necessary mechanisms to support this technique
+within our framework, including a significantly improved architecture
+for scheduling and executing parallel and background reconstructions,
+and a system for rate limiting by rejecting inserts via Bernoulli sampling.
+
+We evaluated this system for fixed insertion rejection rates, and found
+significant improvements in tail latencies, approaching the practical lower
+bound we established using the equal block method, without requiring
+significant degradation of query performance. In fact, we found that
+this rate limiting mechanism provides a design space with more effective
+trade-offs than the one we examined in Chapter~\ref{chap:design-space},
+with the system being able to exceed the query performance of an
+equivalently configured tiering system for certain rate limiting
+configurations. The method has limitations, however: a fixed insert rejection
+rate works well for structures that can be constructed in linear time, like
+the ISAM Tree, but is significantly less effective for the VPTree, which
+requires $\Theta(n \log n)$ time to construct. For structures like this,
+it will be necessary to dynamically scale the amount of throttling based
+on the record count and size of reconstruction. Additionally, our current
+system isn't easily capable of reaching the ``ideal'' goal of being able
+to reliably trade query performance and insertion latency at a fixed
+throughput. Nonetheless, the mechanisms for supporting such features
+are present, and even this simple implementation represents a marked
+improvement in terms of both insertion tail latency and configurability.