author    Douglas Rumbaugh <dbr4@psu.edu>    2025-06-04 16:01:20 -0400
committer Douglas Rumbaugh <dbr4@psu.edu>    2025-06-04 16:01:20 -0400
commit    0dab92f9b14e75f68dda8c556398ea2d55e27494
tree      0e53f3dbc9c22b3eab050a1fae1644fecc8cbfd0
parent    432a7fd7da164841a2ad755b839de1e65244944d
updates
Diffstat (limited to 'chapters/tail-latency.tex')
-rw-r--r--  chapters/tail-latency.tex | 61
1 file changed, 59 insertions, 2 deletions
diff --git a/chapters/tail-latency.tex b/chapters/tail-latency.tex
index a88fe0c..4e79cff 100644
--- a/chapters/tail-latency.tex
+++ b/chapters/tail-latency.tex
@@ -939,7 +939,7 @@ able to reduce the insertion tail latency, while being able to match the
general insertion and query performance of a strict tiering policy. Recall
that, in the insertion stall case, no explicit shard capacity limits are
enforced by the framework. Reconstructions are triggered with each buffer
-flush on all levels exceeding a specified shard count ($s = 4$ in these
+flush on all levels exceeding a specified shard count ($s = 6$ in these
tests) and the buffer flushes immediately when full with no regard to the
state of the structure. Thus, limiting the insertion latency is the only
means the system uses to maintain its shard count at a manageable level.
@@ -957,7 +957,7 @@ each time the buffer flushed. Note that a stall value of one indicates
no stalling at all, and values less than one indicate $1 - \delta$
probability of an insert being rejected. Thus, a lower stall value means
more stalls are introduced. The tiering policy is strict tiering with a
-scale factor of $s=4$. It uses the concurrency control scheme described
+scale factor of $s=6$. It uses the concurrency control scheme described
in Section~\ref{ssec:dyn-concurrency}.
@@ -1101,6 +1101,63 @@ to provide a superior set of design trade-offs than the strict policies,
at least in environments where sufficient parallel processing and memory
are available to leverage parallel reconstructions.
+\subsection{Thread Scaling}
+
+\begin{figure}
+\centering
+\subfloat[Insertion Throughput vs. Query Latency]{\includegraphics[width=.5\textwidth]{img/tail-latency/recon-thread-scale.pdf} \label{fig:tl-latency-threads}}
+\subfloat[Insertion-Query Interference]{\includegraphics[width=.5\textwidth]{img/tail-latency/knn-stall-shard-dist.pdf} \label{fig:tl-query-scaling}} \\
+
+\caption{Framework Thread Scaling}
+\label{fig:tl-threads}
+
+\end{figure}
+
+In the previous tests, we ran our system configured with 32 available
+threads, which was more than enough to run all reconstructions and
+queries fully in parallel. However, it is important to determine how
+well the system works in more resource-constrained environments. The
+system shares its internal threads between reconstructions and queries,
+while flushing occurs on a dedicated thread separate from these. During
+the benchmark, one client thread issued queries continuously and another
+issued inserts. The index accumulated a total of five levels, so the
+maximum amount of parallelism available during testing was four parallel
+reconstructions, along with the dedicated flushing thread and any
+concurrent queries. In these tests, we used the SOSD \texttt{OSM}
+dataset (200M records) and point-lookup queries without early abort
+against a dynamized ISAM tree.
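+
+The client-side structure of this benchmark can be sketched as
+follows. This is a simplified illustration only: \texttt{DynamizedISAM}
+and its methods are hypothetical stand-ins for the framework's actual
+interface, and the shared internal worker threads live inside the
+framework itself rather than in the benchmark driver.
+
+\begin{verbatim}
+#include <atomic>
+#include <thread>
+
+// Hypothetical stand-in for the dynamized ISAM tree under test.
+struct DynamizedISAM {
+    void insert(long key) { /* may stall, per the stall parameter */ }
+    bool lookup(long key) { return false; /* point lookup, no early abort */ }
+};
+
+int main() {
+    DynamizedISAM index;
+    std::atomic<bool> done{false};
+
+    // One client thread issues inserts...
+    std::thread inserter([&] {
+        for (long k = 0; k < 1000000; k++) index.insert(k);
+        done = true;
+    });
+
+    // ...while a second client thread queries continuously.
+    std::thread querier([&] {
+        for (long k = 0; !done.load(); k++) index.lookup(k);
+    });
+
+    inserter.join();
+    querier.join();
+}
+\end{verbatim}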
+
+For our first test, we considered the insertion throughput vs. query
+latency trade-off for various stall amounts at several internal thread
+counts. We first inserted 30\% of the dataset, and then measured the
+insertion throughput over the insertion of the rest of the data on one
+client thread, while another client thread continuously issued queries
+against the structure. The results of this test are shown in
+Figure~\ref{fig:tl-latency-threads}. The first observation is that
+changing the number of available internal threads has little effect on
+insertion throughput. This is to be expected, as insertion throughput
+is limited only by the stall amount and by the buffer flushing
+operation. As flushing occurs on a dedicated thread, it is unaffected
+by changes in the internal thread configuration of the system.
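+
+This behavior follows directly from the stall mechanism: each insert
+attempt is admitted with probability equal to the stall value $\delta$,
+so if rejected inserts are simply retried, the expected number of
+attempts per successful insert is $1/\delta$, and throughput is roughly
+$\delta$ times the flush-bound maximum regardless of the internal
+thread count. A minimal sketch of such an admission check (our own
+illustration, with a hypothetical \texttt{admit\_insert} helper, not
+the framework's exact code) is:
+
+\begin{verbatim}
+#include <random>
+
+// Probabilistic stall: admit an insert attempt with probability delta
+// (the stall value, in (0, 1]). A stall value of 1.0 never stalls;
+// lower values reject, and therefore stall, a larger share of attempts.
+bool admit_insert(double delta, std::mt19937 &rng) {
+    return std::bernoulli_distribution(delta)(rng);
+}
+\end{verbatim}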
+
+Query latency, however, does show a difference at the upper end of
+insertion throughput. Insufficient parallel threads can affect query
+latency in two ways:
+\begin{enumerate}
+    \item As queries and reconstructions share threads, if all threads
+    are occupied by a long-running reconstruction, then queries must
+    wait for the reconstruction to complete before they can execute
+    (see the sketch below).
+    \item Increased capacity for parallel reconstructions allows shards
+    to be merged more rapidly, resulting in an overall reduction in the
+    shard count, and thus in the number of shards each query must
+    examine.
+\end{enumerate}
+Interestingly, at least in this test, both of these effects are largely
+suppressed by accepting only a moderate reduction in insertion
+throughput. However, at the higher-throughput configurations,
+insufficient parallelism does result in a significant increase in query
+latency.
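+
+To make the first of these effects concrete, the sketch below shows a
+generic shared worker pool; it is our own illustration (the class
+\texttt{SharedPool} is hypothetical), not the framework's concurrency
+implementation. When all $n$ workers are executing long-running
+reconstruction tasks, a subsequently submitted query task waits in the
+queue until a worker becomes free.
+
+\begin{verbatim}
+#include <condition_variable>
+#include <functional>
+#include <mutex>
+#include <queue>
+#include <thread>
+#include <vector>
+
+// Both reconstruction and query tasks are submitted to the same pool,
+// so they compete for the same n worker threads.
+class SharedPool {
+  public:
+    explicit SharedPool(int n) {
+        for (int i = 0; i < n; i++)
+            workers_.emplace_back([this] { run(); });
+    }
+
+    ~SharedPool() {
+        { std::lock_guard<std::mutex> g(mtx_); stop_ = true; }
+        cv_.notify_all();
+        for (auto &w : workers_) w.join();
+    }
+
+    void submit(std::function<void()> task) {
+        { std::lock_guard<std::mutex> g(mtx_); tasks_.push(std::move(task)); }
+        cv_.notify_one();
+    }
+
+  private:
+    void run() {
+        for (;;) {
+            std::function<void()> task;
+            {
+                std::unique_lock<std::mutex> lk(mtx_);
+                cv_.wait(lk, [this] { return stop_ || !tasks_.empty(); });
+                if (stop_ && tasks_.empty()) return;
+                // A query submitted while every worker is busy with a
+                // reconstruction sits in this queue until one returns.
+                task = std::move(tasks_.front());
+                tasks_.pop();
+            }
+            task();
+        }
+    }
+
+    std::vector<std::thread> workers_;
+    std::queue<std::function<void()>> tasks_;
+    std::mutex mtx_;
+    std::condition_variable cv_;
+    bool stop_ = false;
+};
+\end{verbatim}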
+
+
+
\section{Conclusion}