Diffstat (limited to 'chapters/tail-latency.tex')
| -rw-r--r-- | chapters/tail-latency.tex | 88 |
1 files changed, 88 insertions, 0 deletions
diff --git a/chapters/tail-latency.tex b/chapters/tail-latency.tex
index 9094e26..361dde0 100644
--- a/chapters/tail-latency.tex
+++ b/chapters/tail-latency.tex
@@ -2,3 +2,91 @@
\label{chap:tail-latency}
\section{Introduction}

\begin{figure}
\subfloat[Insertion Throughput]{\includegraphics[width=.5\textwidth]{img/design-space/isam-insert-dist.pdf} \label{fig:tl-btree-isam-tput}}
\subfloat[Insertion Latency Distribution]{\includegraphics[width=.5\textwidth]{img/design-space/vptree-insert-dist.pdf} \label{fig:tl-btree-isam-lat}} \\
\caption{Insertion Performance of Dynamized ISAM vs. B+Tree}
\label{fig:tl-btree-isam}
\end{figure}

Up to this point in our investigation, we have not directly addressed
one of the largest problems associated with dynamization: insertion
tail latency. While these techniques result in structures with
reasonable, or even good, insertion throughput, the latency of each
individual insert is wildly variable. To illustrate this problem,
consider the insertion performance in Figure~\ref{fig:tl-btree-isam},
which compares the insertion latencies of a dynamized ISAM tree with
those of its most direct dynamic analog: a B+Tree. As shown in
Figure~\ref{fig:tl-btree-isam-tput}, the dynamized structure has
average performance comparable to the native dynamic structure, but
the latency distributions are quite different.
Figure~\ref{fig:tl-btree-isam-lat} shows the two distributions. While
the dynamized structure has much better ``best-case'' performance, its
worst-case performance is exceedingly poor. That the structure exhibits
reasonable performance on average is the result of these two ends of
the distribution balancing each other out.

The reason for this poor tail latency is reconstruction. To provide
tight bounds on the number of shards within the structure, our
techniques must block inserts once the buffer has filled, until
sufficient room has been cleared in the structure to accommodate the
new records. This results in the worst-case insertion behavior that we
described mathematically in the previous chapter.

Unfortunately, the design space that we have considered thus far
offers little ability to meaningfully alter this worst-case insertion
performance. While the choice between leveling and tiering has some
effect, the benefit in terms of tail latency is small: leveling can
buy a modest reduction in the worst case, but at the cost of making
the majority of inserts slower through increased write amplification,
which is why it lags behind tiering in average insertion performance.

\begin{figure}
\subfloat[Scale Factor Sweep]{\includegraphics[width=.5\textwidth]{img/design-space/isam-insert-dist.pdf} \label{fig:tl-parm-sf}}
\subfloat[Buffer Size Sweep]{\includegraphics[width=.5\textwidth]{img/design-space/vptree-insert-dist.pdf} \label{fig:tl-parm-bs}} \\
\caption{Design Space Effects on Latency Distribution}
\label{fig:tl-parm-sweep}
\end{figure}

Additionally, the other tuning knobs available to us are of limited
use in controlling worst-case behavior. Figure~\ref{fig:tl-parm-sweep}
shows the latency distributions of our framework as we vary the scale
factor (Figure~\ref{fig:tl-parm-sf}) and the buffer size
(Figure~\ref{fig:tl-parm-bs}). There is no clear trend in worst-case
performance. This is to be expected: regardless of scale factor or
buffer size, the worst-case reconstruction is largely the same,
involving $\Theta(n)$ records. The choice of configuration parameters
can influence \emph{when} these reconstructions occur, and slightly
influence their size, but the question of ``which configuration has the
best tail-latency performance'' is ultimately more a question of how
many insertions the latency is measured over than of any fundamental
trade-off within the design space.
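The root of this insensitivity is visible in the shape of the insert
path itself. The sketch below is a C++-style illustration only: the
type names and interfaces are hypothetical rather than those of our
framework, and the reconstruction is collapsed into a single full merge
instead of the leveled or tiered layouts of
Chapter~\ref{chap:design-space}. It shows the essential structure of
the problem: most inserts are cheap buffer appends, but the insert that
fills the buffer blocks on a reconstruction whose worst-case cost is
determined by the total number of records rather than by the
configuration parameters.

\begin{verbatim}
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// A shard is an immutable sorted run of records, standing in
// for an ISAM-tree shard. (Illustrative sketch, not the
// framework's actual types.)
struct Shard {
    std::vector<int> records;
};

class DynamizedStructure {
public:
    explicit DynamizedStructure(std::size_t buffer_cap)
        : buffer_cap_(buffer_cap) {}

    // Most inserts are a cheap buffer append. The insert that
    // fills the buffer blocks while a reconstruction runs; in
    // the worst case that reconstruction rewrites every record
    // in the structure, i.e., Theta(n) work.
    void insert(int rec) {
        buffer_.push_back(rec);
        if (buffer_.size() >= buffer_cap_) {
            reconstruct();   // the source of the latency tail
        }
    }

private:
    // Simplified to a single full merge: combine the buffer and
    // all existing shards into one new shard. Leveling and
    // tiering change how often a merge of a given size occurs,
    // not the fact that a structure-wide merge eventually happens.
    void reconstruct() {
        Shard merged;
        for (const Shard &s : shards_) {
            merged.records.insert(merged.records.end(),
                                  s.records.begin(),
                                  s.records.end());
        }
        merged.records.insert(merged.records.end(),
                              buffer_.begin(), buffer_.end());
        std::sort(merged.records.begin(), merged.records.end());

        shards_.clear();
        shards_.push_back(std::move(merged));
        buffer_.clear();
    }

    std::size_t buffer_cap_;
    std::vector<int> buffer_;
    std::vector<Shard> shards_;
};
\end{verbatim}

Varying the buffer size or scale factor changes how often the
reconstruction path is taken and how large the intermediate
reconstructions are, but the occasional structure-wide merge remains,
and it is that merge which dominates the tail of the latency
distributions in Figure~\ref{fig:tl-parm-sweep}.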
\begin{example}
Consider two dynamized structures, $\mathscr{I}_A$ and $\mathscr{I}_B$,
with slightly different configurations. Regardless of the layout
policy used (of those discussed in Chapter~\ref{chap:design-space}),
the worst-case insertion will occur when the structure is completely
full, i.e., after
\begin{equation*}
n_\text{worst} = N_B + \sum_{i=0}^{\log_s n} N_B \cdot s^{i+1}
\end{equation*}
insertions. Let this be $n_a$ for $\mathscr{I}_A$ and $n_b$ for
$\mathscr{I}_B$, and let $\mathscr{I}_A$ be configured with scale
factor $s_a$ and $\mathscr{I}_B$ with scale factor $s_b$, such that
$s_a < s_b$. The differing scale factors mean that $n_a \neq n_b$, and
so the two structures reach their worst-case insertions at different
points in the insertion sequence, but in both cases that insertion
triggers a reconstruction involving every record currently in the
structure. Neither configuration meaningfully improves on the other's
tail latency; the choice merely shifts where the worst case is
observed.
\end{example}

The upshot of this discussion is that tail latencies are caused by the
worst-case reconstructions associated with this method, and that the
proposed design space does not provide the tools necessary to avoid or
reduce these costs.

\section{The Insertion-Query Trade-off}