author Douglas B. Rumbaugh <doug@douglasrumbaugh.com> 2025-07-15 11:53:28 -0400
committer Douglas B. Rumbaugh <doug@douglasrumbaugh.com> 2025-07-15 11:53:28 -0400
commit fe7842aa6177ad61b4ff6c97925918d02f1e72c0
tree f12190ef1bec43a1e4a98b86884b8ffb536a1042 /chapters/tail-latency.tex
parent 05aab7bd45e691a0b0f527d0ab4dd7cae0b3ec55
minor tweaks (HEAD, master)
Diffstat (limited to 'chapters/tail-latency.tex')
-rw-r--r-- chapters/tail-latency.tex 50
1 files changed, 25 insertions, 25 deletions
diff --git a/chapters/tail-latency.tex b/chapters/tail-latency.tex
index dbe867c..bba0081 100644
--- a/chapters/tail-latency.tex
+++ b/chapters/tail-latency.tex
@@ -346,8 +346,8 @@ according to the tiering policy. When a level contains $s$ blocks,
a reconstruction will immediately be triggered to merge these blocks
and push the result down to the next level. To ensure that the number
of blocks in the structure remains bounded by $\Theta(\log_s n)$, we
-will throttle the insertion rate by adding a stall time, $\delta$, to
-each insert. $\delta$ will be determined such that it is sufficiently
+will throttle the insertion rate by adding a stall time, $\gamma$, to
+each insert. $\gamma$ will be determined such that it is sufficiently
large to ensure that any scheduled reconstructions have enough time to
complete before the shard count on any level exceeds $s$. This process
is summarized in Algorithm~\ref{alg:tl-relaxed-recon}. Note that
@@ -360,10 +360,10 @@ the appropriate amount of stalling occurs.
\begin{algorithm}
\caption{Insertion Algorithm with Stalling}
\label{alg:tl-relaxed-recon}
-\KwIn{$r$: a record to be inserted, $\mathscr{I} = (\mathcal{B}, \mathscr{L}_0 \ldots \mathscr{L}_\ell)$: a dynamized structure, $\delta$: insertion stall amount}
+\KwIn{$r$: a record to be inserted, $\mathscr{I} = (\mathcal{B}, \mathscr{L}_0 \ldots \mathscr{L}_\ell)$: a dynamized structure, $\gamma$: insertion stall amount}
\Comment{Stall insertion process by specified amount}
-sleep($\delta$) \;
+sleep($\gamma$) \;
\BlankLine
\Comment{Append to the buffer if possible}
\If {$|\mathcal{B}| < N_B$} {
@@ -411,8 +411,8 @@ record counts--each level has an increasing number of records per block.}}
\end{figure}
To ensure the correctness of this algorithm, it is necessary to show
-that there exists a value for $\delta$ that ensures that the structural
-invariants can be maintained. Logically, this $\delta$ can be thought
+that there exists a value for $\gamma$ that ensures that the structural
+invariants can be maintained. Logically, this $\gamma$ can be thought
of as the amount of time needed to perform the active reconstruction
operation, amortized over the inserts between when this reconstruction
can be scheduled, and when it needs to be complete. We'll consider how
@@ -452,13 +452,13 @@ to ensure that no more than $s$ shards exist on the last level.
Assume that all inserts run on a single thread that can be scheduled
alongside the reconstructions, and let each insert have a cost of
\begin{equation*}
-I(n) \in \Theta(1 + \delta)
+I(n) \in \Theta(1 + \gamma)
\end{equation*}
-where $1$ is the cost of appending to the buffer, and $\delta$
+where $1$ is the cost of appending to the buffer, and $\gamma$
is a calculated stall time. During the stalling, the insert
thread will be idle and reconstructions can be run on the execution unit.
To ensure the last-level reconstruction is complete by the time that
-$\Theta(n)$ inserts have finished, it is necessary that $\delta \in
+$\Theta(n)$ inserts have finished, it is necessary that $\gamma \in
\Theta\left(\frac{B(n)}{n}\right)$.
However, this amount of stall is insufficient to maintain exactly $s$
@@ -473,7 +473,7 @@ for the time to complete these reconstructions as well. In the worst-case,
there will be one active reconstruction on each of the $\log_s n$ levels,
and thus we must introduce stalls such that,
\begin{equation*}
-I(n) \in \Theta(1 + \delta_0 + \delta_1 + \ldots + \delta_{\log_s n - 1})
+I(n) \in \Theta(1 + \gamma_0 + \gamma_1 + \ldots + \gamma_{\log_s n - 1})
\end{equation*}
All of these internal reconstructions will be strictly less than the size
of the last-level reconstruction, and so we can bound them all above by
@@ -516,7 +516,7 @@ To see why this is important, consider an implementation that, contrary
to Theorem~\ref{theo:worst-case-optimal}, only stalls enough to cover
the last-level reconstruction. All other reconstructions are blocked
until the last-level one has been completed. This approach would
-result in $\delta = \frac{B(n)}{n}$ stall and complete the last
+result in $\gamma = \frac{B(n)}{n}$ stall and complete the last
level reconstruction after $\Theta(n)$ inserts. During this time,
$\Theta(\frac{n}{N_B})$ blocks would accumulate in L0, ultimately
resulting in a bound of $\Theta(n)$ blocks in the structure, rather than
@@ -641,18 +641,18 @@ level $i$, divided by the number of inserts that can occur before the
reconstruction must be done (i.e., the capacity of the index above this
point). This gives,
\begin{equation*}
-\delta_i \in O\left( \frac{B(N_B \cdot s^{i+1})}{\sum_{j=0}^{i-1} N_B\cdot s^{j+1}} \right)
+\gamma_i \in O\left( \frac{B(N_B \cdot s^{i+1})}{\sum_{j=0}^{i-1} N_B\cdot s^{j+1}} \right)
\end{equation*}
stall for each level. Noting that $s > 1$, $s \in \Theta(1)$, and that
the denominator is the sum of a geometric progression, we have
\begin{align*}
-\delta_i \in &O\left( \frac{B(N_B\cdot s^{i+1})}{s\cdot N_B \sum_{j=0}^{i-1} s^{j}} \right) \\
- &O\left( \frac{(1-s) B(N_B\cdot s^{i+1})}{N_B\cdot (s - s^{i+1})} \right) \\
- &O\left( \frac{B(N_B\cdot s^{i+1})}{N_B \cdot s^{i+1}}\right)
+\gamma_i \in &~O\left( \frac{B(N_B\cdot s^{i+1})}{s\cdot N_B \sum_{j=0}^{i-1} s^{j}} \right) \\
+ &~O\left( \frac{(1-s) B(N_B\cdot s^{i+1})}{N_B\cdot (s - s^{i+1})} \right) \\
+ &~O\left( \frac{B(N_B\cdot s^{i+1})}{N_B \cdot s^{i+1}}\right)
\end{align*}
For $B(n) \in \Omega(n)$, the numerator of the fraction will grow at
-least as rapidly as the denominator, meaning that $\delta_\ell$ will
+least as rapidly as the denominator, meaning that $\gamma_\ell$ will
always be the largest. Thus, the stall necessary to cover the last-level
reconstruction will be at least as much as is necessary for the internal
reconstructions.
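As a hedged illustration of this claim, suppose $B(n) = c \cdot n \log_2 n$
for some constant $c > 0$. This cost function is an assumption chosen only
for the example, and is not taken from the analysis above. Substituting it
into the final bound gives
\begin{equation*}
\gamma_i \in O\left( \frac{c \cdot N_B \cdot s^{i+1} \log_2 (N_B \cdot s^{i+1})}{N_B \cdot s^{i+1}} \right) = O\left( \log_2 N_B + (i+1) \log_2 s \right)
\end{equation*}
which is non-decreasing in $i$, so the largest stall is the one associated
with the last level, $\gamma_\ell$. If instead $B(n) = c \cdot n$, every
$\gamma_i$ collapses to $O(1)$ and all levels require the same asymptotic
stall.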
@@ -1080,7 +1080,7 @@ that arise from direct throughput monitoring, and has a few additional
benefits. It is based on a single parameter that can be readily updated
on demand using atomics. Our current prototype uses a single, fixed value
for the probability, but ultimately it should be dynamically tuned to
-approximate the $\delta$ value from Theorem~\ref{theo:worst-case-optimal}
+approximate the $\gamma$ value from Theorem~\ref{theo:worst-case-optimal}
as closely as possible. It also doesn't require significant modification
of the existing client interfaces, and can easily support multiple threads
of insertion without needing an explicit serialization process to ensure
@@ -1129,16 +1129,16 @@ tests with 32 background threads on a system with 40 physical cores to
ensure sufficient resources to fully parallelize all reconstructions
(we'll consider resource constrained situations later).
-We tested -ISAM tree with the 200 million record SOSD \texttt{OSM}
+We tested ISAM tree with the 200 million record SOSD \texttt{OSM}
dataset~\cite{sosd-datasets}, as well as VPTree with the one million,
300-dimensional, \texttt{SBW} dataset~\cite{sbw}. For each test,
we inserted $30\%$ of the records to warm up the structure, and then
-measured the individual latency of each insert after that. We measured
-the count of shards in the structure each time the buffer flushed
-(including during the warmup period). Note that a rejection rate of
-$1$ indicates no stalling at all, and values less than one indicate
-$1 - \delta$ probability of an insert being rejected, after which the
-insert thread sleeps for about a microsecond. A lower rejection rate means
+measured the individual latency of each insert after that. We measured the
+count of shards in the structure each time the buffer flushed (including
+during the warmup period). Note that a rejection rate, $r$, of $r =
+1$ indicates no stalling at all, and values less than one indicate $1
+- r$ probability of an insert being rejected, after which the insert
+thread sleeps for about a microsecond. A lower rejection rate means
more stalls are introduced. The tiering policy is strict tiering with
a scale factor of $s=6$ using the concurrency control scheme described
in Section~\ref{ssec:dyn-concurrency}.
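One way to relate the rejection rate to an expected stall is sketched
here. It rests on assumptions that are not stated above: that a rejected
insert is retried until it is accepted, that attempts are rejected
independently with probability $1 - r$, and that each rejection costs a
fixed sleep of $t_s \approx 1\,\mu\text{s}$ (the symbol $t_s$ is introduced
only for this example). Under these assumptions the number of rejections
before acceptance is geometrically distributed, so the expected stall per
insert is
\begin{equation*}
\mathrm{E}[\text{stall}] = t_s \sum_{k=0}^{\infty} k \cdot r (1-r)^{k} = t_s \cdot \frac{1-r}{r}
\end{equation*}
For example, $r = 0.5$ would give an expected stall of about $t_s$ per
insert, while $r = 1$ gives no stall at all, consistent with the
description above.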
@@ -1360,7 +1360,7 @@ relatively small.
\begin{figure}
\centering
\subfloat[Insertion Throughput vs. Query Latency]{\includegraphics[width=.5\textwidth]{img/tail-latency/recon-thread-scale.pdf} \label{fig:tl-latency-threads}}
-\subfloat[Maximum Insertion Throughput for a Given Query Latency]{\includegraphics[width=.5\textwidth]{img/tail-latency/constant-query.pdf} \label{fig:tl-query-scaling}} \\
+\subfloat[Maximum Insertion Throughput for Query Latency Target]{\includegraphics[width=.5\textwidth]{img/tail-latency/constant-query.pdf} \label{fig:tl-query-scaling}} \\
\caption{Framework Thread Scaling}
\label{fig:tl-threads}