|           |                                                                      |                           |
|-----------|----------------------------------------------------------------------|---------------------------|
| author    | Douglas Rumbaugh <dbr4@psu.edu>                                      | 2025-06-08 15:04:00 -0400 |
| committer | Douglas Rumbaugh <dbr4@psu.edu>                                      | 2025-06-08 15:04:00 -0400 |
| commit    | 33bc7e620276f4269ee5f1820e5477135e020b3f (patch)                     |                           |
| tree      | 03a7bb2ccbf7f1d2943871a69bca18006270bd20 /chapters/tail-latency.tex  |                           |
| parent    | 50adf588694170699adfa75cd2d1763263085165 (diff)                      |                           |
| download  | dissertation-33bc7e620276f4269ee5f1820e5477135e020b3f.tar.gz         |                           |
Julia updates v2
Diffstat (limited to 'chapters/tail-latency.tex')
|            |                            |    |
|------------|----------------------------|----|
| -rw-r--r-- | chapters/tail-latency.tex  | 32 |

1 file changed, 16 insertions, 16 deletions
```diff
diff --git a/chapters/tail-latency.tex b/chapters/tail-latency.tex
index 8ec8d26..5c0e0ba 100644
--- a/chapters/tail-latency.tex
+++ b/chapters/tail-latency.tex
@@ -60,7 +60,7 @@ majority of inserts worse because of increased write amplification.
 \label{fig:tl-parm-sweep}
 \end{figure}
 
-The other tuning nobs that are available to us are of limited usefulness
+The other tuning knobs that are available to us are of limited usefulness
 in tuning the worst case behavior. Figure~\ref{fig:tl-parm-sweep}
 shows the latency distributions of our framework as we vary the
 scale factor (Figure~\ref{fig:tl-parm-sf}) and buffer size
@@ -227,7 +227,7 @@ Figure~\ref{fig:tl-floodl0-insert} shows that it is possible to obtain
 insertion latency distributions using amortized global reconstruction
 that are directly comparable to dynamic structures based on amortized
 local reconstruction, at least in some cases. In particular, the
-worst-case insertion tail latency in this model is direct function
+worst-case insertion tail latency in this model is a direct function
 of the buffer size, as the worst-case insert occurs when the buffer
 must be flushed to a shard. However, this performance comes at the
 cost of queries, which are incredibly slow compared to B+Trees, as
@@ -947,10 +947,10 @@ These tests were run on a system with sufficient available resources to
 fully parallelize all reconstructions.
 
 First, Figure~\ref{fig:tl-stall-200m} shows the results of testing
-insertion of the 200 million record SOSD \texttt{OSM} dataset in
-a dynamized ISAM tree. Using our insertion stalling technique, as
-well as strict tiering. We inserted $30\%$ of the records, and then
-measured the individual latency of each insert after that point to produce
+insertion of the 200 million record SOSD \texttt{OSM} dataset in a
+dynamized ISAM tree, using both our insertion stalling technique and
+strict tiering. We inserted $30\%$ of the records, and then measured
+the individual latency of each insert after that point to produce
 Figure~\ref{fig:tl-stall-200m-dist}. Figure~\ref{fig:tl-stall-200m-shard}
 was produced by recording the number of shards in the dynamized structure
 each time the buffer flushed. Note that a stall value of one indicates
@@ -961,14 +961,14 @@ scale factor of $s=6$. It uses the concurrency control scheme described
 in Section~\ref{ssec:dyn-concurrency}.
 
-Figure~\ref{fig:tl-stall-200m-dist} clearly shows that all of insertion
+Figure~\ref{fig:tl-stall-200m-dist} clearly shows that all insertion
 rejection probabilities succeed in greatly reducing tail latency relative
 to tiering. Additionally, it shows a small amount of available tuning
 of the worst-case insertion latencies, with higher stall amounts reducing
 the tail latencies slightly at various points in the distribution. This
 latter effect results from the buffer flush latency hiding mechanism,
 which was retained from Chapter~\ref{chap:framework}. The buffer actually
-has space to two two versions, and the second version can be filled while
+has space for two versions, and the second version can be filled while
 the first is flushing. This means that, for more aggressive stalling,
 some of the time spent blocking on the buffer flush is redistributed
 over the inserts into the second version of the buffer, rather than
@@ -1121,7 +1121,7 @@ are available to leverage parallel reconstructions.
 Our new system retains the concept of buffer size and scale factor from
 the previous version, although these have very different performance
 implications given our different compaction strategy. In this test, we
-examine the effects of these parameters on the insertion-query tradeoff
+examine the effects of these parameters on the insertion-query trade-off
 curves noted above, as well as on insertion tail latency. The results
 are shown in Figure~\ref{fig:tl-design-space}, for a dynamized ISAM Tree
 using the SOSD \texttt{OSM} dataset and point lookup queries.
@@ -1140,13 +1140,13 @@ Figure~\ref{fig:tl-sf-curve}. Recall that our system of reconstruction in
 this chapter does not explicitly enforce any structural invariants, and
 so the scale factor's only role is in determining at what point a given
 level will have a reconstruction scheduled for it. Lower scale factors will
-more aggresively compact shards, while higher scale factors will allow
+more aggressively compact shards, while higher scale factors will allow
 more shards to accumulate before attempting to perform a reconstruction.
 Interestingly, there are clear differences in the curves, particularly at
 higher insertion throughputs. For lower throughputs, a scale factor of
 $s=2$ appears strictly inferior, while the other tested scale factors result
 in roughly equivalent curves. However, as the insertion throughput is
-increased, the curves begin to seperate more, with $s = 6$ emerging as
+increased, the curves begin to separate more, with $s = 6$ emerging as
 the superior option for the majority of the space.
 
 Next, we consider the effect that buffer size has on insertion
@@ -1176,10 +1176,10 @@ threads, which was more than enough to run all reconstructions and
 queries fully in parallel. However, it's important to determine how well
 the system works in more resource constrained environments. The system
 shares internal threads between reconstructions and queries, and that
-flushing occurs on a dedicated thread seperate from these. During the
-benchmark, one client thread issued queries continously and another
+flushing occurs on a dedicated thread separate from these. During the
+benchmark, one client thread issued queries continuously and another
 issued inserts. The index accumulated a total of five levels, so
-the maximum amount of parralelism available during the testing was 4
+the maximum amount of parallelism available during the testing was 4
 parallel reconstructions, along with the dedicated flushing thread and
 any concurrent queries. In these tests, we used the SOSD \texttt{OSM}
 dataset (200M records) and point-lookup queries without early abort
@@ -1189,7 +1189,7 @@ For our first test, we considered the insertion throughput vs. query
 latency trade-off for various stall amounts with several internal thread
 counts. We inserted 30\% of the dataset first, and then measured the
 insertion throughput over the insertion of the rest of the data
-on a client thread, while another client thread continously issued
+on a client thread, while another client thread continuously issued
 queries against the structure. The results of this test are shown in
 Figure~\ref{fig:tl-latency-threads}. The first note is that the change
 in the number of available internal threads has little effect on the
@@ -1210,7 +1210,7 @@ query latency in two ways,
 shard count.
 \end{enumerate}
 Interestingly, at least in this test, both of these effects are largely
-supressed with only a moderate reduction in insertion throughput. But,
+suppressed with only a moderate reduction in insertion throughput. But,
 insufficient parallelism does result in the higher-throughput
 configurations suffering a significant query latency increase in general.
```
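A note on the mechanism behind the @@ -947 and @@ -961 hunks: the insertion stalling technique described there amounts to probabilistically rejecting inserts so that ingestion slows just enough for background reconstructions to keep pace. The C++ sketch below is illustrative only; the class name, retry interval, and buffer representation are invented here and are not taken from the dissertation's framework.

```cpp
#include <chrono>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Hypothetical sketch: each insert is rejected with probability `stall`
// and retried after a short pause. Throttling ingestion this way gives
// background reconstructions time to keep pace, trading a little average
// throughput for much shorter worst-case insert latencies.
class ThrottledInserter {
public:
    explicit ThrottledInserter(double stall) : reject_(stall) {}

    // Blocks (in short increments) until the record is accepted.
    void insert(int64_t record) {
        thread_local std::mt19937 rng{std::random_device{}()};
        while (reject_(rng)) {
            // Rejected: back off briefly and flip the coin again. The
            // stall cost is spread across many inserts instead of being
            // concentrated in rare, long buffer-flush blocks.
            std::this_thread::sleep_for(std::chrono::microseconds(5));
        }
        buffer_.push_back(record);  // stand-in for the real buffer append
    }

private:
    std::bernoulli_distribution reject_;
    std::vector<int64_t> buffer_;
};
```

This matches the shape of the results in the hunk: higher rejection probabilities shave the tail at the cost of per-insert overhead in the common case.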
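The corrected sentence in the @@ -961 hunk (the buffer "has space for two versions, and the second version can be filled while the first is flushing") describes a double-buffering scheme. Below is a minimal single-writer sketch under assumed names (`DoubleBuffer`, `build_shard`) with no concurrency control, unlike the framework itself:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <future>
#include <vector>

// Single-writer sketch of the two-version buffer: inserts land in the
// active version, and when it fills it is flushed on a background thread
// while the other version immediately begins accepting inserts. The
// writer only blocks if it fills the second version before the first
// version's flush has finished.
class DoubleBuffer {
public:
    explicit DoubleBuffer(std::size_t capacity) : capacity_(capacity) {}

    void insert(int64_t record) {
        auto &active = versions_[active_idx_];
        active.push_back(record);
        if (active.size() < capacity_) return;

        // Before swapping to the other version, make sure its previous
        // flush (if any) has completed. This is the only wait, and it is
        // where the residual tail latency in the figures comes from.
        if (flush_.valid()) flush_.wait();

        // Flush the full version asynchronously and swap; inserts can
        // continue into the other version right away.
        flush_ = std::async(std::launch::async, [this, idx = active_idx_] {
            build_shard(versions_[idx]);  // hypothetical shard construction
            versions_[idx].clear();
        });
        active_idx_ ^= 1;
    }

private:
    static void build_shard(const std::vector<int64_t> &) { /* ... */ }

    std::size_t capacity_;
    std::array<std::vector<int64_t>, 2> versions_;
    int active_idx_ = 0;
    std::future<void> flush_;
};
```

Read alongside the stalling sketch above, this shows why more aggressive stalling redistributes flush-blocking time: throttled inserts arrive slowly enough that the first version's flush usually finishes before the second version fills.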
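Finally, the scale factor semantics described in the @@ -1140 hunk, where no structural invariant is enforced and the scale factor only determines when a level gets a reconstruction scheduled, can be pictured roughly as follows. The `Level` struct and the scheduling hook are hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: with no structural invariant enforced, the scale
// factor s simply sets the shard count at which a level has a merge
// scheduled. Shard counts can transiently exceed s while internal
// threads are busy, unlike under strict tiering.
struct Level {
    std::size_t shard_count = 0;
    bool reconstruction_scheduled = false;
};

void maybe_schedule(std::vector<Level> &levels, std::size_t s) {
    for (std::size_t i = 0; i < levels.size(); ++i) {
        if (!levels[i].reconstruction_scheduled &&
            levels[i].shard_count >= s) {
            levels[i].reconstruction_scheduled = true;
            // Enqueue a merge of level i's shards here; the actual
            // scheduler hook is omitted from this sketch.
        }
    }
}
```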