minor tweaks

author: Douglas B. Rumbaugh <doug@douglasrumbaugh.com> 2025-07-15 11:53:28 -0400
committer: Douglas B. Rumbaugh <doug@douglasrumbaugh.com> 2025-07-15 11:53:28 -0400
commit: fe7842aa6177ad61b4ff6c97925918d02f1e72c0 (patch)
tree: f12190ef1bec43a1e4a98b86884b8ffb536a1042
parent: 05aab7bd45e691a0b0f527d0ab4dd7cae0b3ec55 (diff)
download: dissertation-fe7842aa6177ad61b4ff6c97925918d02f1e72c0.tar.gz
2 files changed, 26 insertions, 25 deletions
diff --git a/chapters/tail-latency.tex b/chapters/tail-latency.tex
index dbe867c..bba0081 100644
--- a/chapters/tail-latency.tex
+++ b/chapters/tail-latency.tex
@@ -346,8 +346,8 @@ according to the tiering policy. When a level contains $s$ blocks,
 a reconstruction will immediately be triggered to merge these blocks
 and push the result down to the next level. To ensure that the number
 of blocks in the structure remains bounded by $\Theta(\log_s n)$, we
-will throttle the insertion rate by adding a stall time, $\delta$, to
-each insert. $\delta$ will be determined such that it is sufficiently
+will throttle the insertion rate by adding a stall time, $\gamma$, to
+each insert. $\gamma$ will be determined such that it is sufficiently
 large to ensure that any scheduled reconstructions have enough time to
 complete before the shard count on any level exceeds $s$. This process
 is summarized in Algorithm~\ref{alg:tl-relaxed-recon}. Note that
@@ -360,10 +360,10 @@ the appropriate amount of stalling occurs.
 \begin{algorithm}
 \caption{Insertion Algorithm with Stalling}
 \label{alg:tl-relaxed-recon}
-\KwIn{$r$: a record to be inserted, $\mathscr{I} = (\mathcal{B}, \mathscr{L}_0 \ldots \mathscr{L}_\ell)$: a dynamized structure, $\delta$: insertion stall amount}
+\KwIn{$r$: a record to be inserted, $\mathscr{I} = (\mathcal{B}, \mathscr{L}_0 \ldots \mathscr{L}_\ell)$: a dynamized structure, $\gamma$: insertion stall amount}
 
 \Comment{Stall insertion process by specified amount}
-sleep($\delta$) \;
+sleep($\gamma$) \;
 \BlankLine
 \Comment{Append to the buffer if possible}
 \If {$|\mathcal{B}| < N_B$} {
@@ -411,8 +411,8 @@ record counts--each level has an increasing number of records per block.}}
 \end{figure}
 
 To ensure the correctness of this algorithm, it is necessary to show
-that there exists a value for $\delta$ that ensures that the structural
-invariants can be maintained. Logically, this $\delta$ can be thought
+that there exists a value for $\gamma$ that ensures that the structural
+invariants can be maintained. Logically, this $\gamma$ can be thought
 of as the amount of time needed to perform the active reconstruction
 operation, amortized over the inserts between when this reconstruction
 can be scheduled, and when it needs to be complete. We'll consider how
@@ -452,13 +452,13 @@ to ensure that no more than $s$ shards exist on the last level.
 Assume that all inserts run on a single thread that can be scheduled
 alongside the reconstructions, and let each insert have a cost of
 \begin{equation*}
-I(n) \in \Theta(1 + \delta)
+I(n) \in \Theta(1 + \gamma)
 \end{equation*}
-where $1$ is the cost of appending to the buffer, and $\delta$
+where $1$ is the cost of appending to the buffer, and $\gamma$
 is a calculated stall time. During the stalling, the insert
 thread will be idle and reconstructions can be run on the execution unit.
 To ensure the last-level reconstruction is complete by the time that
-$\Theta(n)$ inserts have finished, it is necessary that $\delta \in
+$\Theta(n)$ inserts have finished, it is necessary that $\gamma \in
 \Theta\left(\frac{B(n)}{n}\right)$.
 
 However, this amount of stall is insufficient to maintain exactly $s$
@@ -473,7 +473,7 @@ for the time to complete these reconstructions as well. In the worst-case,
 there will be one active reconstruction on each of the $\log_s n$ levels,
 and thus we must introduce stalls such that,
 \begin{equation*}
-I(n) \in \Theta(1 + \delta_0 + \delta_1 + \ldots \delta_{\log n - 1})
+I(n) \in \Theta(1 + \gamma_0 + \gamma_1 + \ldots \gamma_{\log n - 1})
 \end{equation*}
 All of these internal reconstructions will be strictly less than the size
 of the last-level reconstruction, and so we can bound them all above by
@@ -516,7 +516,7 @@ To see why this is important, consider an implementation that, contrary
 to Theorem~\ref{theo:worst-case-optimal}, only stalls enough to cover
 the last-level reconstruction. All other reconstructions are blocked
 until the last-level one has been completed.  This approach would
-result in $\delta = \frac{B(n)}{n}$ stall and complete the last
+result in $\gamma = \frac{B(n)}{n}$ stall and complete the last
 level reconstruction after $\Theta(n)$ inserts. During this time,
 $\Theta(\frac{n}{N_B})$ blocks would accumulate in L0, ultimately
 resulting in a bound of $\Theta(n)$ blocks in the structure, rather than
@@ -641,18 +641,18 @@ level $i$, divided by the number of inserts that can occur before the
 reconstruction must be done (i.e., the capacity of the index above this
 point). This gives,
 \begin{equation*}
-\delta_i \in O\left( \frac{B(N_B \cdot s^{i+1})}{\sum_{j=0}^{i-1} N_B\cdot s^{j+1}} \right)
+\gamma_i \in O\left( \frac{B(N_B \cdot s^{i+1})}{\sum_{j=0}^{i-1} N_B\cdot s^{j+1}} \right)
 \end{equation*}
 stall for each level. Noting that $s > 1$, $s \in \Theta(1)$, and that
 the denominator is the sum of a geometric progression, we have
 \begin{align*}
-\delta_i \in &O\left( \frac{B(N_B\cdot s^{i+1})}{s\cdot N_B \sum_{j=0}^{i-1} s^{j}} \right) \\
-             &O\left( \frac{(1-s) B(N_B\cdot s^{i+1})}{N_B\cdot (s - s^{i+1})} \right) \\
-			 &O\left( \frac{B(N_B\cdot s^{i+1})}{N_B \cdot s^{i+1}}\right)
+\gamma_i \in &~O\left( \frac{B(N_B\cdot s^{i+1})}{s\cdot N_B \sum_{j=0}^{i-1} s^{j}} \right) \\
+             &~O\left( \frac{(1-s) B(N_B\cdot s^{i+1})}{N_B\cdot (s - s^{i+1})} \right) \\
+			 &~O\left( \frac{B(N_B\cdot s^{i+1})}{N_B \cdot s^{i+1}}\right)
 \end{align*}
 
 For $B(n) \in \Omega(n)$, the numerator of the fraction will grow at
-least as rapidly as the denominator, meaning that $\delta_\ell$ will
+least as rapidly as the denominator, meaning that $\gamma_\ell$ will
 always be the largest. Thus, the stall necessary to cover the last-level
 reconstruction will be at least as much as is necessary for the internal
 reconstructions.
@@ -1080,7 +1080,7 @@ that arise from direct throughput monitoring, and has a few additional
 benefits.  It is based on a single parameter that can be readily updated
 on demand using atomics. Our current prototype uses a single, fixed value
 for the probability, but ultimately it should be dynamically tuned to
-approximate the $\delta$ value from Theorem~\ref{theo:worst-case-optimal}
+approximate the $\gamma$ value from Theorem~\ref{theo:worst-case-optimal}
 as closely as possible. It also doesn't require significant modification
 of the existing client interfaces, and can easily support multiple threads
 of insertion without needing an explicit serialization process to ensure
@@ -1129,16 +1129,16 @@ tests with 32 background threads on a system with 40 physical cores to
 ensure sufficient resources to fully parallelize all reconstructions
 (we'll consider resource constrained situations later).
 
-We tested -ISAM tree with the 200 million record SOSD \texttt{OSM}
+We tested ISAM tree with the 200 million record SOSD \texttt{OSM}
 dataset~\cite{sosd-datasets}, as well as VPTree with the one million,
 300-dimensional, \texttt{SBW} dataset~\cite{sbw}. For each test,
 we inserted $30\%$ of the records to warm up the structure, and then
-measured the individual latency of each insert after that. We measured
-the count of shards in the structure each time the buffer flushed
-(including during the warmup period).  Note that a rejection rate of
-$1$ indicates no stalling at all, and values less than one indicate
-$1 - \delta$ probability of an insert being rejected, after which the
-insert thread sleeps for about a microsecond. A lower rejection rate means
+measured the individual latency of each insert after that. We measured the
+count of shards in the structure each time the buffer flushed (including
+during the warmup period).  Note that a rejection rate, $r$, of $r =
+1$ indicates no stalling at all, and values less than one indicate $1
+- r$ probability of an insert being rejected, after which the insert
+thread sleeps for about a microsecond. A lower rejection rate means
 more stalls are introduced. The tiering policy is strict tiering with
 a scale factor of $s=6$ using the concurrency control scheme described
 in Section~\ref{ssec:dyn-concurrency}.
@@ -1360,7 +1360,7 @@ relatively small.
 \begin{figure}
 \centering
 \subfloat[Insertion Throughput vs. Query Latency]{\includegraphics[width=.5\textwidth]{img/tail-latency/recon-thread-scale.pdf} \label{fig:tl-latency-threads}} 
-\subfloat[Maximum Insertion Throughput for a Given Query Latency]{\includegraphics[width=.5\textwidth]{img/tail-latency/constant-query.pdf} \label{fig:tl-query-scaling}} \\
+\subfloat[Maximum Insertion Throughput for Query Latency Target]{\includegraphics[width=.5\textwidth]{img/tail-latency/constant-query.pdf} \label{fig:tl-query-scaling}} \\
 
 \caption{Framework Thread Scaling}
 \label{fig:tl-threads}
diff --git a/supplementary/nomenclature.tex b/supplementary/nomenclature.tex
index 2846491..848a615 100644
--- a/supplementary/nomenclature.tex
+++ b/supplementary/nomenclature.tex
@@ -31,5 +31,6 @@
 	$\mathscr{L}_i$ & The $i$th level in a decomposition \\ \hline
 	$\mathcal{V}_i$ & The $i$th version of a decomposed structure \\ \hline
 	$f(n)$ & Number of blocks in an equal block size decomposition \\ \hline
+	$\gamma$ & Amount of extra delay added to insertions \\ \hline
 \end{tabular}
 \end{center}
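
Note on the per-level stall bound touched by this patch: the following is a minimal Python sketch, not code from the dissertation or its framework, that evaluates the \gamma_i expression numerically and checks the geometric-series simplification used in the align* block. The function names, the example values N_B = 1000 and s = 6, and the n log n build cost are all assumptions chosen for illustration. Levels start at i = 1 here because the sum in the denominator is empty for i = 0.

```python
import math


def capacity_above(level, buffer_size, scale):
    """Denominator of the gamma_i bound: sum of N_B * s^(j+1) for j in [0, level)."""
    return sum(buffer_size * scale ** (j + 1) for j in range(level))


def capacity_above_closed_form(level, buffer_size, scale):
    """Same quantity via the geometric series: N_B * (s^(i+1) - s) / (s - 1)."""
    # s^(i+1) - s = s * (s^i - 1) is always divisible by (s - 1), so // is exact.
    return buffer_size * (scale ** (level + 1) - scale) // (scale - 1)


def stall_per_insert(level, buffer_size, scale, build_cost):
    """gamma_i: cost of rebuilding N_B * s^(i+1) records, spread over the
    inserts that the levels above can absorb. Requires level >= 1."""
    records = buffer_size * scale ** (level + 1)
    return build_cost(records) / capacity_above(level, buffer_size, scale)


def simplification_ratio(level, buffer_size, scale):
    """Ratio between gamma_i and the simplified bound B(n_i) / n_i,
    where n_i = N_B * s^(i+1). It depends only on s and i."""
    records = buffer_size * scale ** (level + 1)
    return records / capacity_above(level, buffer_size, scale)


if __name__ == "__main__":
    # Hypothetical build cost B(n) = n * log2(n); any B(n) in Omega(n) would do.
    def build_cost(n):
        return n * math.log2(n)

    for level in range(1, 6):
        gamma = stall_per_insert(level, 1000, 6, build_cost)
        ratio = simplification_ratio(level, 1000, 6)
        print(f"level={level} gamma={gamma:.2f} ratio={ratio:.4f}")
```

The ratio stays between s - 1 and s for every level i >= 1, which is why treating s as a constant lets the final line of the derivation drop it and keep only B(N_B s^{i+1}) / (N_B s^{i+1}).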