able to reduce the insertion tail latency, while being able to match the
general insertion and query performance of a strict tiering policy.
Recall that, in the insertion stall case, no explicit shard capacity
limits are enforced by the framework. Reconstructions are triggered with
each buffer flush on all levels exceeding a specified shard count ($s =
6$ in these tests), and the buffer flushes immediately when full with no
regard to the state of the structure. Thus, limiting the insertion
latency is the only means the system uses to maintain its shard count at
a manageable level.

each time the buffer flushed. Note that a stall value of one indicates
no stalling at all, and values less than one indicate a $1 - \delta$
probability of an insert being rejected. Thus, a lower stall value means
that more stalls are introduced. The tiering policy is strict tiering
with a scale factor of $s=6$. It uses the concurrency control scheme
described in Section~\ref{ssec:dyn-concurrency}.
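To make the stall mechanism concrete, the sketch below illustrates one
way it could be implemented (in C++; the names and interface here are
hypothetical, for illustration only, and are not the framework's actual
code). An insert is admitted with probability $\delta$, the stall value,
and rejected otherwise, so $\delta = 1$ disables stalling entirely and
smaller values introduce more stalls.

\begin{verbatim}
#include <random>

/* Hypothetical stall gate: admits an insert with probability delta
 * (the stall value) and rejects it otherwise. */
class StallPolicy {
public:
    explicit StallPolicy(double delta)
        : m_delta(delta), m_rng(std::random_device{}()),
          m_dist(0.0, 1.0) {}

    /* Returns true if the insert may proceed; returns false (a
     * stall) with probability 1 - delta. */
    bool admit() {
        if (m_delta >= 1.0) {
            return true;  /* a stall value of one means no stalling */
        }
        return m_dist(m_rng) < m_delta;
    }

private:
    double m_delta;
    std::mt19937 m_rng;
    std::uniform_real_distribution<double> m_dist;
};

/* Hypothetical insertion path: a rejected insert is retried until
 * it is admitted, which throttles the client and gives background
 * reconstructions time to keep the shard count down. */
template <typename Index, typename Record>
void insert_with_stalls(Index &idx, const Record &rec, StallPolicy &p) {
    while (!p.admit()) {
        /* stall: back off before retrying */
    }
    idx.insert(rec);
}
\end{verbatim}

Under this scheme, the expected number of admission attempts per insert
is $1/\delta$, which is how a lower stall value translates directly into
a lower insertion throughput.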
to provide a superior set of design trade-offs compared to the strict
policies, at least in environments where sufficient parallel processing
and memory are available to leverage parallel reconstructions.

\subsection{Thread Scaling}

\begin{figure}
\centering
\subfloat[Insertion Throughput vs. Query Latency]{\includegraphics[width=.5\textwidth]{img/tail-latency/recon-thread-scale.pdf} \label{fig:tl-latency-threads}}
\subfloat[Insertion Query Interference]{\includegraphics[width=.5\textwidth]{img/tail-latency/knn-stall-shard-dist.pdf} \label{fig:tl-query-scaling}} \\

\caption{Framework Thread Scaling}
\label{fig:tl-threads}

\end{figure}

In the previous tests, we ran our system configured with 32 available
threads, which was more than enough to run all reconstructions and
queries fully in parallel. However, it is important to determine how
well the system works in more resource-constrained environments. The
system shares its internal threads between reconstructions and queries,
and flushing occurs on a dedicated thread separate from these. During
the benchmark, one client thread issued queries continuously and another
issued inserts. The index accumulated a total of five levels, so the
maximum amount of parallelism available during the testing was four
parallel reconstructions, along with the dedicated flushing thread and
any concurrent queries. In these tests, we used the SOSD \texttt{OSM}
dataset (200M records) and point-lookup queries without early abort
against a dynamized ISAM tree.

For our first test, we considered the insertion throughput vs. query
latency trade-off for various stall amounts with several internal thread
counts. We inserted 30\% of the dataset first, and then measured the
insertion throughput over the insertion of the rest of the data on a
client thread, while another client thread continuously issued queries
against the structure. The results of this test are shown in
Figure~\ref{fig:tl-latency-threads}. The first thing to note is that the
change in the number of available internal threads has little effect on
the insertion throughput. This is to be expected, as insertion
throughput is limited only by the stall amount and by the buffer
flushing operation. As flushing occurs on a dedicated thread, it is
unaffected by changes in the internal thread configuration of the
system.

Query latency, however, does show a difference at the upper end of
insertion throughput. Insufficient parallel threads can affect the query
latency in two ways:
\begin{enumerate}
	\item As queries and reconstructions share threads, if all threads
	are occupied by long-running reconstructions, then queries must wait
	for a reconstruction to complete before they can execute.
	\item Reduced capacity for parallel reconstructions means that
	shards are merged less rapidly, resulting in an overall increase in
	the shard count.
\end{enumerate}
Interestingly, at least in this test, both of these effects are largely
suppressed with only a moderate reduction in insertion throughput.
However, insufficient parallelism does result in the higher-throughput
configurations suffering a significant increase in query latency.



\section{Conclusion}