\chapter{Controlling Insertion Tail Latency}
\label{chap:tail-latency}

\section{Introduction}

\begin{figure}
\subfloat[Insertion Throughput]{\includegraphics[width=.5\textwidth]{img/design-space/isam-insert-dist.pdf} \label{fig:tl-btree-isam-tput}}
\subfloat[Insertion Latency Distribution]{\includegraphics[width=.5\textwidth]{img/design-space/vptree-insert-dist.pdf} \label{fig:tl-btree-isam-lat}} \\
\caption{Insertion Performance of Dynamized ISAM vs. B+Tree}
\label{fig:tl-btree-isam}
\end{figure}

Up to this point in our investigation, we have not directly addressed one of the largest problems associated with dynamization: insertion tail latency. While these techniques result in structures with reasonable, or even good, insertion throughput, the latency of each individual insert is wildly variable. To illustrate this problem, consider the insertion performance in Figure~\ref{fig:tl-btree-isam}, which compares the insertion latencies of a dynamized ISAM tree with those of its most direct dynamic analog: a B+Tree. While, as shown in Figure~\ref{fig:tl-btree-isam-tput}, the dynamized structure has average performance comparable to the native dynamic structure, the latency distributions are quite different. Figure~\ref{fig:tl-btree-isam-lat} shows these distributions. While the dynamized structure has much better ``best-case'' performance, its worst-case performance is exceedingly poor. That the structure exhibits reasonable performance on average is the result of these two ends of the distribution balancing each other out.

The reason for this poor tail latency is reconstructions. In order to provide tight bounds on the number of shards within the structure, our techniques must block inserts once the buffer has filled, until sufficient room is cleared in the structure to accommodate the new records. This results in the worst-case insertion behavior that we described mathematically in the previous chapter.

Unfortunately, the design space that we have been considering thus far is very limited in its ability to meaningfully alter the worst-case insertion performance. While we have seen that the choice between leveling and tiering can have some effect, the actual benefit in terms of tail latency is quite small. The situation is made worse by the fact that leveling, which can have better worst-case insertion performance, lags behind tiering in terms of average insertion performance: it allows a small reduction in the worst case, but at the cost of making the majority of inserts more expensive because of increased write amplification.

\begin{figure}
\subfloat[Scale Factor Sweep]{\includegraphics[width=.5\textwidth]{img/design-space/isam-insert-dist.pdf} \label{fig:tl-parm-sf}}
\subfloat[Buffer Size Sweep]{\includegraphics[width=.5\textwidth]{img/design-space/vptree-insert-dist.pdf} \label{fig:tl-parm-bs}} \\
\caption{Design Space Effects on Latency Distribution}
\label{fig:tl-parm-sweep}
\end{figure}

Additionally, the other tuning knobs that are available to us are of limited usefulness for controlling worst-case behavior. Figure~\ref{fig:tl-parm-sweep} shows the latency distributions of our framework as we vary the scale factor (Figure~\ref{fig:tl-parm-sf}) and buffer size (Figure~\ref{fig:tl-parm-bs}). There is no clear trend in worst-case performance to be seen here.
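To make the source of this behavior concrete, the following minimal simulation sketches a buffered, tiering-style dynamized structure and reports the largest reconstruction (measured in records touched) triggered by any single insert, for several scale factors. This is an illustrative sketch only: the \texttt{Simulator} type, its parameters, and its merge policy (the buffer and every full level are combined into a single shard on the shallowest level with room) are simplifications introduced for this example and do not reproduce our framework's actual implementation.

\begin{verbatim}
#include <cstdio>
#include <cstdint>
#include <vector>

// Minimal simulation of a buffered, tiering-style dynamized structure.
// Illustrative sketch only: shard sizes are tracked as plain counts, and
// the merge policy is simplified so that the buffer and every full level
// are combined into one shard on the shallowest level with room.
struct Simulator {
    uint64_t buffer_cap;    // N_B: records the mutable buffer can hold
    uint64_t scale_factor;  // s:   shards a level can hold before merging
    uint64_t buffered = 0;  // records currently sitting in the buffer
    std::vector<std::vector<uint64_t>> levels;  // shard sizes, per level

    Simulator(uint64_t nb, uint64_t sf) : buffer_cap(nb), scale_factor(sf) {}

    // Insert one record and return the number of records rewritten by any
    // reconstruction this insert triggered (0 if the record simply landed
    // in the buffer).
    uint64_t insert() {
        if (++buffered < buffer_cap) {
            return 0;
        }

        // Find the shallowest level with room for one more shard.
        size_t target = 0;
        while (target < levels.size() &&
               levels[target].size() >= scale_factor) {
            target++;
        }
        if (target == levels.size()) {
            levels.emplace_back();
        }

        // Merge the buffer and all (full) levels above the target into a
        // single new shard on the target level.
        uint64_t merged = buffered;
        for (size_t i = 0; i < target; i++) {
            for (uint64_t shard : levels[i]) {
                merged += shard;
            }
            levels[i].clear();
        }
        levels[target].push_back(merged);
        buffered = 0;

        return merged;
    }
};

int main() {
    const uint64_t n = 10'000'000;
    const uint64_t buffer_cap = 1000;

    for (uint64_t s : {2, 4, 8, 16}) {
        Simulator sim(buffer_cap, s);
        uint64_t worst = 0;
        for (uint64_t i = 0; i < n; i++) {
            uint64_t cost = sim.insert();
            if (cost > worst) {
                worst = cost;
            }
        }
        printf("s = %2lu: largest reconstruction touched %lu of %lu records\n",
               (unsigned long) s, (unsigned long) worst, (unsigned long) n);
    }

    return 0;
}
\end{verbatim}

Under these simplified assumptions, every scale factor eventually triggers a reconstruction touching a sizable fraction of all records inserted; which scale factor happens to report the largest value depends mostly on where the measurement window ends relative to that configuration's level boundaries, echoing the behavior seen in Figure~\ref{fig:tl-parm-sweep}.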
This is to be expected: ultimately, the worst-case reconstructions in both cases are largely the same regardless of scale factor or buffer size, namely a reconstruction involving $\Theta(n)$ records. The selection of configuration parameters can influence \emph{when} these reconstructions occur, as well as slightly influence their size, but ultimately the question of ``which configuration has the best tail-latency performance'' is more a question of how many insertions the latency is measured over than of any fundamental trade-off within the design space.

\begin{example}
Consider two dynamized structures, $\mathscr{I}_A$ and $\mathscr{I}_B$, with slightly different configurations. Regardless of the layout policy used (of those discussed in Chapter~\ref{chap:design-space}), the worst-case insertion will occur when the structure is completely full, i.e., after
\begin{equation*}
n_\text{worst} = N_B + \sum_{i=0}^{\log_s n} N_B \cdot s^{i+1}
\end{equation*}
records have been inserted. Let this quantity be $n_a$ for $\mathscr{I}_A$ and $n_b$ for $\mathscr{I}_B$, and let $\mathscr{I}_A$ be configured with scale factor $s_a$ and $\mathscr{I}_B$ with scale factor $s_b$, such that $s_a < s_b$. Both structures eventually perform a reconstruction involving $\Theta(n)$ records; which of the two exhibits the larger observed tail latency depends on whether the measurement window covers $n_a$ or $n_b$ insertions, not on any fundamental difference between the configurations.
\end{example}

The upshot of this discussion is that tail latencies are due to the worst-case reconstructions associated with this method, and that the proposed design space does not provide the necessary tools to avoid or reduce these costs.

\section{The Insertion-Query Trade-off}