-rw-r--r--  chapters/beyond-dsp.tex                  |  10
-rw-r--r--  chapters/design-space.tex                |   1
-rw-r--r--  chapters/dynamic-extension-sampling.tex  |   2
-rw-r--r--  chapters/sigmod23/background.tex         |  23
-rw-r--r--  chapters/sigmod23/framework.tex          | 575
-rw-r--r--  chapters/tail-latency.tex                |   1
-rw-r--r--  paper.tex                                |   3
7 files changed, 374 insertions(+), 241 deletions(-)
diff --git a/chapters/beyond-dsp.tex b/chapters/beyond-dsp.tex
index 77f5fb4..50d6369 100644
--- a/chapters/beyond-dsp.tex
+++ b/chapters/beyond-dsp.tex
@@ -1,4 +1,14 @@
\chapter{Generalizing the Framework}
+
+\begin{center}
+ \emph{The following chapter is an adaptation of work completed in collaboration with Dr. Dong Xie and Dr. Zhuoyue Zhao
+ and published
+	in PVLDB Volume 17, Issue 11 (July 2024) under the title ``Towards Systematic Index Dynamization''.
+ }
+ \hrule
+\end{center}
+
+
\label{chap:framework}
The previous chapter demonstrated
diff --git a/chapters/design-space.tex b/chapters/design-space.tex
new file mode 100644
index 0000000..7ee98bd
--- /dev/null
+++ b/chapters/design-space.tex
@@ -0,0 +1 @@
+\chapter{Exploring the Design Space}
diff --git a/chapters/dynamic-extension-sampling.tex b/chapters/dynamic-extension-sampling.tex
index 58db672..738c962 100644
--- a/chapters/dynamic-extension-sampling.tex
+++ b/chapters/dynamic-extension-sampling.tex
@@ -1,4 +1,4 @@
-\chapter{Dynamic Extension Framework for Sampling Indexes}
+\chapter{Dynamization of Static Sampling Indices}
\label{chap:sampling}
\begin{center}
diff --git a/chapters/sigmod23/background.tex b/chapters/sigmod23/background.tex
index ad89e03..b4ccbf1 100644
--- a/chapters/sigmod23/background.tex
+++ b/chapters/sigmod23/background.tex
@@ -124,7 +124,13 @@ query, and selecting or rejecting it for inclusion within the sample
with a fixed probability~\cite{db2-doc}. This process requires that each
record in the result set be considered, and thus provides no performance
benefit relative to the query being sampled from, as it must be answered
-in full anyway before returning only some of the results.
+in full anyway before returning only some of the results.\footnote{
+ To clarify, this is not to say that Bernoulli sampling isn't
+ useful. It \emph{can} be used to improve the performance of queries
+ by limiting the cardinality of intermediate results, etc. But it is
+ not particularly useful for improving the performance of IQS queries,
+ where the sampling is performed on the final result set of the query.
+}
For performance, the statistical guarantees can be discarded and
systematic or block sampling used instead. Systematic sampling considers
@@ -230,6 +236,7 @@ structures attached to the nodes. More examples of alias augmentation
applied to different IQS problems can be found in a recent survey by
Tao~\cite{tao22}.
+\Paragraph{Miscellanea.}
There also exist specialized data structures with support for both
efficient sampling and updates~\cite{hu14}, but these structures have
poor constant factors and are very complex, rendering them of little
@@ -252,7 +259,19 @@ only once per sample set (if at all), but fail to support updates. Thus,
there appears to be a general dichotomy of sampling techniques: existing
sampling data structures support either updates, or efficient sampling,
but generally not both. It will be the purpose of this chapter to resolve
-this dichotomy.
+this dichotomy. In particular, we seek to develop structures with the
+following desiderata,
+
+\begin{enumerate}
+ \item Support data updates (including deletes) with similar average
+ performance to a standard B+Tree.
+ \item Support IQS queries that do not pay a per-sample cost
+ proportional to some function of the data size. In other words,
+	$k$ should \emph{not} be multiplied by any function of $n$
+ in the query cost function.
+ %FIXME: this guy comes out of nowhere...
+ \item Provide the user with some basic performance tuning capability.
+\end{enumerate}
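+
+To make the second desideratum concrete with a hypothetical example,
+compare drawing $k$ samples by repeated traversals of a tree-based index
+against a structure that pays a one-time setup cost and then draws each
+sample in constant time,
+\begin{equation*}
+\Theta\left(k \log_2 n\right) \quad \text{vs.} \quad \Theta\left(\log_2 n + k\right),
+\end{equation*}
+where the former pays a per-sample cost that grows with the data size,
+and the latter isolates all of the data-dependent work in the setup term.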
diff --git a/chapters/sigmod23/framework.tex b/chapters/sigmod23/framework.tex
index 88ac1ac..c878d93 100644
--- a/chapters/sigmod23/framework.tex
+++ b/chapters/sigmod23/framework.tex
@@ -16,10 +16,10 @@ there in the context of IRS apply equally to the other sampling problems
considered in this chapter. In this section, we will discuss approaches
for resolving these problems.
-\subsection{Sampling over Partitioned Datasets}
+\subsection{Sampling over Decomposed Structures}
The core problem facing any attempt to dynamize SSIs is that independently
-sampling from a partitioned dataset is difficult. As discussed in
+sampling from a decomposed structure is difficult. As discussed in
Section~\ref{ssec:background-irs}, accomplishing this task within the
DSP model used by the Bentley-Saxe method requires drawing a full $k$
samples from each of the blocks, and then repeatedly down-sampling each
@@ -27,14 +27,11 @@ of the intermediate sample sets. However, it is possible to devise a
more efficient query process if we abandon the DSP model and consider
a slightly more complicated procedure.
-First, we need to resolve a minor definitional problem. As noted before,
-the DSP model is based on deterministic queries. The definition doesn't
-apply for sampling queries, because it assumes that the result sets of
-identical queries should also be identical. For general IQS, we also need
-to enforce conditions on the query being sampled from.
+First, we'll define the IQS problem in terms of the notation and concepts
+used in Chapter~\ref{chap:background} for search problems,
-\begin{definition}[Query Sampling Problem]
- Given a search problem, $F$, a sampling problem is function
+\begin{definition}[Independent Query Sampling Problem]
+	Given a search problem, $F$, a query sampling problem is a function
of the form $X: (F, \mathcal{D}, \mathcal{Q}, \mathbb{Z}^+)
\to \mathcal{R}$ where $\mathcal{D}$ is the domain of records
and $\mathcal{Q}$ is the domain of query parameters of $F$. The
@@ -42,8 +39,14 @@ to enforce conditions on the query being sampled from.
of records from the solution to $F$ drawn independently such that,
$|R| = k$ for some $k \in \mathbb{Z}^+$.
\end{definition}
-With this in mind, we can now define the decomposability conditions for
-a query sampling problem,
+
+To consider the decomposability of such problems, we need to resolve a
+minor definitional issue. As noted before, the DSP model is based on
+deterministic queries. The definition doesn't apply to sampling queries,
+because it assumes that the result sets of identical queries should
+also be identical. For general IQS, we also need to enforce conditions
+on the query being sampled from. Based on these observations, we can
+define the decomposability conditions for a query sampling problem,
\begin{definition}[Decomposable Sampling Problem]
A query sampling problem, $X: (F, \mathcal{D}, \mathcal{Q},
@@ -95,7 +98,7 @@ structures query cost functions are of the form,
\end{equation*}
-Consider an arbitrary decomposable sampling query with a cost function
+Consider an arbitrary decomposable sampling problem with a cost function
of the above form, $X(\mathscr{I}, F, q, k)$, which draws a sample
of $k$ records from $d \subseteq \mathcal{D}$ using an instance of
an SSI $\mathscr{I} \in \mathcal{I}$. Applying dynamization results
@@ -146,9 +149,10 @@ query results in a naturally two-phase process, but DSPs are assumed to
be single phase. We can construct a more effective process for answering
such queries based on a multi-stage process, summarized in Figure~\ref{fig:sample}.
\begin{enumerate}
- \item Determine each block's respective weight under a given
- query to be sampled from (e.g., the number of records falling
- into the query range for IRS).
+ \item Perform the query pre-processing work, and determine each
+ block's respective weight under a given query to be sampled
+ from (e.g., the number of records falling into the query range
+ for IRS).
\item Build a temporary alias structure over these weights.
@@ -156,7 +160,8 @@ such queries based on a multi-stage process, summarized in Figure~\ref{fig:sampl
samples to draw from each block.
\item Draw the appropriate number of samples from each block and
- merge them together to form the final query result.
+ merge them together to form the final query result, using any
+ necessary pre-processing results in the process.
\end{enumerate}
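+
+To make this procedure concrete, the following C++ sketch shows how the
+four steps might fit together for a single query, ignoring deletes for
+the moment. The \texttt{Block} interface, the use of
+\texttt{std::discrete\_distribution} in place of the temporary alias
+structure, and all identifiers are illustrative assumptions rather than
+the framework's actual interface.
+\begin{verbatim}
+#include <cstddef>
+#include <random>
+#include <vector>
+
+struct Record { long key; long value; };
+struct Query  { long lo;  long hi; };   // e.g., an IRS key range
+
+// Hypothetical per-block interface; a real SSI exposes analogous operations.
+struct Block {
+    // Step 1: pre-processing / weight determination (e.g., the number of
+    // records falling into the query range for IRS).
+    virtual double weight(const Query &q) const = 0;
+    // Step 4: draw one record, reusing any cached pre-processing state.
+    virtual Record sample(const Query &q, std::mt19937 &rng) const = 0;
+    virtual ~Block() = default;
+};
+
+std::vector<Record> sample_query(const std::vector<const Block*> &blocks,
+                                 const Query &q, std::size_t k,
+                                 std::mt19937 &rng) {
+    // Step 1: determine each block's weight under the query.
+    std::vector<double> weights;
+    double total = 0;
+    for (const Block *b : blocks) {
+        weights.push_back(b->weight(q));
+        total += weights.back();
+    }
+    std::vector<Record> result;
+    if (total == 0) return result;   // no records satisfy the query
+
+    // Step 2: build a temporary weighted-selection structure over the
+    // block weights (standing in for the alias structure).
+    std::discrete_distribution<std::size_t> pick_block(weights.begin(),
+                                                       weights.end());
+
+    // Steps 3 and 4: select blocks in proportion to their weights, draw
+    // one sample per selection, and merge everything into one result set.
+    while (result.size() < k) {
+        std::size_t idx = pick_block(rng);
+        result.push_back(blocks[idx]->sample(q, rng));
+    }
+    return result;
+}
+\end{verbatim}
+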
It is possible that some of the records sampled in Step 4 must be
rejected, either because of deletes or some other property of the sampling
@@ -184,182 +189,340 @@ $k$ records have been sampled.
Assuming a Bentley-Saxe decomposition with $\log n$ blocks and assuming
a constant number of repetitions, the cost of answering a decomposable
-sampling query having a pre-processing cost of $P(n)$ and a per-sample
-cost of $S(n)$ will be,
+sampling query having a pre-processing cost of $P(n)$, a weight-determination
+cost of $W(n)$, and a per-sample cost of $S(n)$ will be,
\begin{equation}
\label{eq:dsp-sample-cost}
\boxed{
-\mathscr{Q}(n, k) \in \Theta \left( P(n) \log_2 n + k S(n) \right)
+\mathscr{Q}(n, k) \in \Theta \left( (P(n) + W(n)) \log_2 n + k S(n) \right)
}
\end{equation}
where the cost of building the alias structure is $\Theta(\log_2 n)$
and thus absorbed into the pre-processing cost. For the SSIs discussed
in this chapter, which have $S(n) \in \Theta(1)$, this model provides us
with the desired decoupling of the data size ($n$) from the per-sample
-cost.
+cost. Additionally, for all of the SSIs considered in this chapter,
+the weights either can be determined in $W(n) \in \Theta(1)$ time
+or are determined naturally as part of the pre-processing, and thus the
+$W(n)$ term can be merged into $P(n)$.
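+
+As a concrete example, suppose a hypothetical SSI answers IRS queries
+with $P(n) \in \Theta(\log_2 n)$ pre-processing per block, $W(n) \in
+\Theta(1)$, and $S(n) \in \Theta(1)$ per sample. Equation~\ref{eq:dsp-sample-cost}
+then evaluates to
+\begin{equation*}
+\mathscr{Q}(n, k) \in \Theta\left(\log_2^2 n + k\right),
+\end{equation*}
+in which the data size contributes only to the one-time setup term and
+each sample is drawn in constant time.
+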
\subsection{Supporting Deletes}
-Because the shards are static, records cannot be arbitrarily removed from them.
-This requires that deletes be supported in some other way, with the ultimate
-goal being the prevention of deleted records' appearance in sampling query
-result sets. This can be realized in two ways: locating the record and marking
-it, or inserting a new record which indicates that an existing record should be
-treated as deleted. The framework supports both of these techniques, the
-selection of which is called the \emph{delete policy}. The former policy is
-called \emph{tagging} and the latter \emph{tombstone}.
-
-Tagging a record is straightforward. Point-lookups are performed against each
-shard in the index, as well as the buffer, for the record to be deleted. When
-it is found, a bit in a header attached to the record is set. When sampling,
-any records selected with this bit set are automatically rejected. Tombstones
-represent a lazy strategy for deleting records. When a record is deleted using
-tombstones, a new record with identical key and value, but with a ``tombstone''
-bit set, is inserted into the index. A record's presence can be checked by
-performing a point-lookup. If a tombstone with the same key and value exists
-above the record in the index, then it should be rejected when sampled.
-
-Two important aspects of performance are pertinent when discussing deletes: the
-cost of the delete operation, and the cost of verifying the presence of a
-sampled record. The choice of delete policy represents a trade-off between
-these two costs. Beyond this simple trade-off, the delete policy also has other
-implications that can affect its applicability to certain types of SSI. Most
-notably, tombstones do not require any in-place updating of records, whereas
-tagging does. This means that using tombstones is the only way to ensure total
-immutability of the data within shards, which avoids random writes and eases
-concurrency control. The tombstone delete policy, then, is particularly
-appealing in external and concurrent contexts.
-
-\Paragraph{Deletion Cost.} The cost of a delete under the tombstone policy is
-the same as an ordinary insert. Tagging, by contrast, requires a point-lookup
-of the record to be deleted, and so is more expensive. Assuming a point-lookup
-operation with cost $L(n)$, a tagged delete must search each level in the
-index, as well as the buffer, requiring $O\left(N_b + L(n)\log_s n\right)$
-time.
-
-\Paragraph{Rejection Check Costs.} In addition to the cost of the delete
-itself, the delete policy affects the cost of determining if a given record has
-been deleted. This is called the \emph{rejection check cost}, $R(n)$. When
-using tagging, the information necessary to make the rejection decision is
-local to the sampled record, and so $R(n) \in O(1)$. However, when using tombstones
-it is not; a point-lookup must be performed to search for a given record's
-corresponding tombstone. This look-up must examine the buffer, and each shard
-within the index. This results in a rejection check cost of $R(n) \in O\left(N_b +
-L(n) \log_s n\right)$. The rejection check process for the two delete policies is
-summarized in Figure~\ref{fig:delete}.
-
-Two factors contribute to the tombstone rejection check cost: the size of the
-buffer, and the cost of performing a point-lookup against the shards. The
-latter cost can be controlled using the framework's ability to associate
-auxiliary structures with shards. For SSIs which do not support efficient
-point-lookups, a hash table can be added to map key-value pairs to their
-location within the SSI. This allows for constant-time rejection checks, even
-in situations where the index would not otherwise support them. However, the
-storage cost of this intervention is high, and in situations where the SSI does
-support efficient point-lookups, it is not necessary. Further performance
-improvements can be achieved by noting that the probability of a given record
-having an associated tombstone in any particular shard is relatively small.
-This means that many point-lookups will be executed against shards that do not
-contain the tombstone being searched for. In this case, these unnecessary
-lookups can be partially avoided using Bloom filters~\cite{bloom70} for
-tombstones. By inserting tombstones into these filters during reconstruction,
-point-lookups against some shards which do not contain the tombstone being
-searched for can be bypassed. Filters can be attached to the buffer as well,
-which may be even more significant due to the linear cost of scanning it. As
-the goal is a reduction of rejection check costs, these filters need only be
-populated with tombstones. In a later section, techniques for bounding the
-number of tombstones on a given level are discussed, which will allow for the
-memory usage of these filters to be tightly controlled while still ensuring
-precise bounds on filter error.
-
-\Paragraph{Sampling with Deletes.} The addition of deletes to the framework
-alters the analysis of sampling costs. A record that has been deleted cannot
-be present in the sample set, and therefore the presence of each sampled record
-must be verified. If a record has been deleted, it must be rejected. When
-retrying samples rejected due to delete, the process must restart from shard
-selection, as deleted records may be counted in the weight totals used to
-construct that structure. This increases the cost of sampling to,
-\begin{equation}
-\label{eq:sampling-cost}
- O\left([W(n) + P(n)]\log_s n + \frac{kS(n)}{1 - \mathbf{Pr}[\text{rejection}]} \cdot R(n)\right)
-\end{equation}
-where $R(n)$ is the cost of checking if a sampled record has been deleted, and
-$\nicefrac{k}{1 -\mathbf{Pr}[\text{rejection}]}$ is the expected number of sampling
-attempts required to obtain $k$ samples, given a fixed rejection probability.
-The rejection probability itself is a function of the workload, and is
-unbounded.
-
-\Paragraph{Bounding the Rejection Probability.} Rejections during sampling
-constitute wasted memory accesses and random number generations, and so steps
-should be taken to minimize their frequency. The probability of a rejection is
-directly related to the number of deleted records, which is itself a function
-of workload and dataset. This means that, without building counter-measures
-into the framework, tight bounds on sampling performance cannot be provided in
-the presence of deleted records. It is therefore critical that the framework
-support some method for bounding the number of deleted records within the
-index.
-
-While the static nature of shards prevents the direct removal of records at the
-moment they are deleted, it doesn't prevent the removal of records during
-reconstruction. When using tagging, all tagged records encountered during
-reconstruction can be removed. When using tombstones, however, the removal
-process is non-trivial. In principle, a rejection check could be performed for
-each record encountered during reconstruction, but this would increase
-reconstruction costs and introduce a new problem of tracking tombstones
-associated with records that have been removed. Instead, a lazier approach can
-be used: delaying removal until a tombstone and its associated record
-participate in the same shard reconstruction. This delay allows both the record
-and its tombstone to be removed at the same time, an approach called
-\emph{tombstone cancellation}. In general, this can be implemented using an
-extra linear scan of the input shards before reconstruction to identify
-tombstones and associated records for cancellation, but potential optimizations
-exist for many SSIs, allowing it to be performed during the reconstruction
-itself at no extra cost.
-
-The removal of deleted records passively during reconstruction is not enough to
-bound the number of deleted records within the index. It is not difficult to
-envision pathological scenarios where deletes result in unbounded rejection
-rates, even with this mitigation in place. However, the dropping of deleted
-records does provide a useful property: any specific deleted record will
-eventually be removed from the index after a finite number of reconstructions.
-Using this fact, a bound on the number of deleted records can be enforced. A
-new parameter, $\delta$, is defined, representing the maximum proportion of
-deleted records within the index. Each level, and the buffer, tracks the number
-of deleted records it contains by counting its tagged records or tombstones.
-Following each buffer flush, the proportion of deleted records is checked
-against $\delta$. If any level is found to exceed it, then a proactive
-reconstruction is triggered, pushing its shards down into the next level. The
-process is repeated until all levels respect the bound, allowing the number of
-deleted records to be precisely controlled, which, by extension, bounds the
-rejection rate. This process is called \emph{compaction}.
-
-Assuming every record is equally likely to be sampled, this new bound can be
-applied to the analysis of sampling costs. The probability of a record being
-rejected is $\mathbf{Pr}[\text{rejection}] = \delta$. Applying this result to
-Equation~\ref{eq:sampling-cost} yields,
+As discussed in Section~\ref{ssec:background-deletes}, the Bentley-Saxe
+method can support deleting records through the use of either weak
+deletes or a secondary ghost structure, assuming certain properties are
+satisfied by the search problem or data structure. Unfortunately,
+neither approach works as a ``drop-in'' solution in the context of
+sampling problems, because of the way that deleted records interact with
+the sampling process itself. Sampling problems, as formalized here,
+are neither invertible nor deletion decomposable. In this section,
+we'll discuss our mechanisms for supporting deletes, as well as how
+deleted records can be handled during sampling while maintaining correctness.
+
+Because both deletion policies have their advantages in certain
+contexts, we decided to support both. Specifically, we propose the
+following two delete mechanisms,
+
+\begin{enumerate}
+\item \textbf{Tagged Deletes.} Each record in the structure includes a
+header containing a visibility bit. On delete, the structure is searched
+for the record, and the bit is set to indicate that it has been deleted.
+This mechanism is used to support \emph{weak deletes}.
+\item \textbf{Tombstone Deletes.} On delete, a new record is inserted into
+the structure with a tombstone bit set in the header. This mechanism is
+used to support \emph{ghost structure} based deletes.
+\end{enumerate}
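+
+As a minimal illustration of what these mechanisms require of the record
+format, the following C++ sketch shows a wrapped record carrying the two
+header bits. The layout and identifiers are illustrative assumptions,
+not the framework's actual record format.
+\begin{verbatim}
+#include <cstdint>
+
+// Sketch of a record wrapper carrying the per-record header bits used by
+// the two delete mechanisms.
+template <typename R>
+struct Wrapped {
+    static constexpr std::uint32_t DELETE_BIT    = 0x1;  // tagged delete
+    static constexpr std::uint32_t TOMBSTONE_BIT = 0x2;  // tombstone record
+
+    R             rec;
+    std::uint32_t header = 0;
+
+    void set_delete()         { header |= DELETE_BIT; }      // tagging
+    bool is_deleted()   const { return (header & DELETE_BIT) != 0; }
+    void set_tombstone()      { header |= TOMBSTONE_BIT; }   // tombstone insert
+    bool is_tombstone() const { return (header & TOMBSTONE_BIT) != 0; }
+};
+\end{verbatim}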
+
+Broadly speaking, tombstone deletes cause a number of difficulties
+because \emph{sampling problems are not invertible}; however,
+this limitation can be worked around during the query process if desired.
+Tagging is much more natural for these search problems, but the
+flexibility of selecting either option is desirable because of their
+different performance characteristics.
+
+While tagging is a fairly direct method of implementing weak deletes,
+tombstones are sufficiently different from the traditional ghost structure
+approach that it is worth motivating the decision to use them here. One
+of the major limitations of the ghost structure approach for handling
+deletes is that there is no principled method for removing deleted
+records from the decomposed structure. The standard approach is to set an
+arbitrary threshold on the number of deleted records, and rebuild the
+entire structure when this threshold is crossed~\cite{saxe79}. Mixing the
+``ghost'' records into the same structures as the original records allows
+deleted records to be cleaned up naturally over time, as they meet their
+tombstones during reconstructions. This is an important property that will
+be discussed in more detail in Section~\ref{ssec:sampling-delete-bounding}.
+
+There are two relevant aspects of performance that the two mechanisms
+trade off between: the cost of performing the delete, and the cost of
+checking whether a sampled record has been deleted. In addition to these,
+the use of tombstones makes supporting concurrency and external
+data structures far easier. This is because tombstone deletes are simple
+inserts, and thus they leave the individual structures immutable. Tagging
+requires in-place updates of the record headers in the structures,
+resulting in possible race conditions and random I/O operations on
+disk. This makes tombstone deletes particularly attractive in these
+contexts.
+
+
+\subsubsection{Deletion Cost}
+We will first consider the cost of performing a delete using either
+mechanism.
+
+\Paragraph{Tombstone Deletes.}
+The cost of a tombstone delete in a Bentley-Saxe dynamization is
+the same as that of a simple insert,
+\begin{equation*}
+\mathscr{D}_A(n) \in \Theta\left(\frac{B(n)}{n} \log_2 (n)\right)
+\end{equation*}
+with the worst-case cost being $\Theta(B(n))$. Note that there is also
+a minor performance effect resulting from deleted records appearing
+twice within the structure, once for the original record and once for
+the tombstone, inflating the overall size of the structure.
+
+\Paragraph{Tagged Deletes.} In contrast to tombstone deletes, tagged
+deletes are not simple inserts, and so have their own cost function. The
+process of deleting a record under tagging consists of first searching
+the entire structure for the record to be deleted, and then setting a
+bit in its header. As a result, the performance of this operation is
+a function of how expensive it is to locate an individual record within
+the decomposed data structure.
+
+In the theoretical literature, this lookup operation is provided
+by a global hash table built over every record in the structure,
+mapping each record to the block that contains it. Then, the data
+structure's weak delete operation can be applied to the relevant
+block~\cite{merge-dsp}. While this is certainly an option for us, we
+note that the SSIs we are currently considering all support a reasonably
+efficient $\Theta(\log n)$ lookup operation as it is. We have thus elected
+to design tagged deletes to leverage this operation when it is
+available, rather than maintaining a global hash table.
+If a given SSI has a point-lookup cost of $L(n)$, then a tagged delete
+on a Bentley-Saxe decomposition of that SSI will require, at worst,
+executing a point-lookup on each block, with a total cost of
+
+\begin{equation*}
+\mathscr{D}(n) \in \Theta\left( L(n) \log_2 (n)\right)
+\end{equation*}
+
+If the SSI being considered does \emph{not} support an efficient
+point-lookup operation, then a hash table can be used instead. We consider
+individual hash tables associated with each block, rather than a single
+global one, for simplicity of implementation and analysis. So, in these
+cases, the same procedure as above can be used, with $L(n) \in \Theta(1)$.
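+
+A sketch of this procedure in C++ follows, reusing the \texttt{Wrapped}
+record type from the earlier sketch. The \texttt{point\_lookup} interface
+on the buffer and blocks is an illustrative assumption.
+\begin{verbatim}
+#include <vector>
+
+// Tagged delete over a decomposed structure: search the mutable buffer
+// and then each block for the record, and set its delete bit when found.
+template <typename R, typename Buffer, typename Block>
+bool tagged_delete(Buffer &buffer, std::vector<Block*> &blocks, const R &rec) {
+    if (Wrapped<R> *w = buffer.point_lookup(rec)) {
+        w->set_delete();
+        return true;
+    }
+    for (Block *b : blocks) {              // one L(n) lookup per block
+        if (Wrapped<R> *w = b->point_lookup(rec)) {
+            w->set_delete();
+            return true;
+        }
+    }
+    return false;                          // record is not present
+}
+\end{verbatim}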
+
+
+\begin{figure}
+ \centering
+ \subfloat[Tombstone Rejection Check]{\includegraphics[width=.75\textwidth]{img/sigmod23/delete-tombstone} \label{fig:delete-tombstone}}\\
+ \subfloat[Tagging Rejection Check]{\includegraphics[width=.75\textwidth]{img/sigmod23/delete-tagging} \label{fig:delete-tag}}
+
+ \caption{\textbf{Overview of the rejection check procedure for deleted records.} First,
+ a record is sampled (1).
+ When using the tombstone delete policy
+ (Figure~\ref{fig:delete-tombstone}), the rejection check starts by (2) querying
+	the Bloom filter of the mutable buffer. The filter indicates the record is
+ not present, so (3) the filter on $L_0$ is queried next. This filter
+ returns a false positive, so (4) a point-lookup is executed against $L_0$.
+ The lookup fails to find a tombstone, so the search continues and (5) the
+ filter on $L_1$ is checked, which reports that the tombstone is present.
+ This time, it is not a false positive, and so (6) a lookup against $L_1$
+ (7) locates the tombstone. The record is thus rejected. When using the
+ tagging policy (Figure~\ref{fig:delete-tag}), (1) the record is sampled and
+ (2) checked directly for the delete tag. It is set, so the record is
+ immediately rejected.}
+
+ \label{fig:delete}
+
+\end{figure}
+
+\subsubsection{Rejection Check Costs}
+
+Because sampling queries are neither invertible nor deletion decomposable,
+the query process must be modified to support deletes using either of the
+above mechanisms. This modification requires that each sampled
+record be checked to confirm that it has not been deleted, prior
+to adding it to the sample set. We call the cost of this operation the
+\emph{rejection check cost}, $R(n)$. The process differs between the
+two deletion mechanisms, and the two procedures are summarized in
+Figure~\ref{fig:delete}.
+
+For tagged deletes, this is a simple process. The information about the
+deletion status of a given record is stored directly alongside the record,
+within its header. So, once a record has been sampled, this check can be
+performed immediately, in $R(n) \in \Theta(1)$ time.
+
+Tombstone deletes, however, introduce a significant difficulty in
+performing the rejection check. The information about whether a record
+has been deleted is not local to the record itself, and therefore a
+point-lookup is required to search for the tombstone associated with
+each sample. Thus, the rejection check cost when using tombstones to
+implement deletes over a Bentley-Saxe decomposition of an SSI is,
\begin{equation}
-%\label{eq:sampling-cost-del}
- O\left([W(n) + P(n)]\log_s n + \frac{kS(n)}{1 - \delta} \cdot R(n)\right)
+R(n) \in \Theta( L(n) \log_2 n)
\end{equation}
+This cost seems catastrophically bad, considering that it must be
+paid per sample, but there are ways to mitigate it. We will discuss
+these mitigations in more detail alongside our implementation in
+Section~\ref{sec:sampling-implementation}.
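+
+The following C++ sketch summarizes the rejection check for both
+mechanisms, again reusing the \texttt{Wrapped} record type. The Bloom
+filter and tombstone-lookup interfaces are illustrative assumptions
+(see Figure~\ref{fig:delete}).
+\begin{verbatim}
+#include <vector>
+
+// Rejection check for a sampled record. Tagging reads a header bit; the
+// tombstone policy searches the buffer and then each block, consulting a
+// per-block Bloom filter to skip blocks that cannot contain the tombstone.
+template <typename R, typename Buffer, typename Block>
+bool is_rejected(const Wrapped<R> &s, const Buffer &buffer,
+                 const std::vector<const Block*> &blocks, bool tagging) {
+    if (tagging) {
+        return s.is_deleted();                     // R(n) in Theta(1)
+    }
+
+    if (buffer.tombstone_filter_contains(s.rec) &&
+        buffer.contains_tombstone(s.rec)) {
+        return true;
+    }
+    for (const Block *b : blocks) {                // R(n) in Theta(L(n) log n)
+        if (!b->tombstone_filter_contains(s.rec)) {
+            continue;                              // filter rules this block out
+        }
+        if (b->contains_tombstone(s.rec)) {
+            return true;
+        }
+    }
+    return false;
+}
+\end{verbatim}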
+
+
+\subsubsection{Bounding Rejection Probability}
+
+When a sampled record has been rejected, it must be resampled. This
+introduces performance overhead in the form of extra memory accesses and
+random number generations, and hurts our ability to provide performance
+bounds on our sampling operations. In the worst case, a structure
+may consist mostly or entirely of deleted records, resulting in
+a potentially unbounded number of rejections during sampling. Thus,
+in order to maintain sampling performance bounds, the probability of a
+rejection during sampling must be bounded.
+
+The reconstructions associated with Bentley-Saxe dynamization give us
+a natural way of controlling the number of deleted records within the
+structure, and thereby bounding the rejection rate. During reconstruction,
+we have the opportunity to remove deleted records. This will cause the
+record counts associated with each block of the structure to gradually
+drift out of alignment with the "perfect" powers of two associated with
+the Bentley-Saxe method, however. In the theoretical literature on this
+topic, the solution to this problem is to periodically repartition all of
+the records to re-align the block sizes~\cite{merge-dsp, saxe79}. This
+approach could also be easily applied here if desired, though we
+do not do so in our implementation, for reasons that will be discussed in
+Section~\ref{sec:sampling-implementation}.
+
+The process of removing these deleted records during reconstructions is
+different for the two mechanisms. Tagged deletes are straightforward,
+because all tagged records can simply be dropped when they are involved
+in a reconstruction. Tombstones, however, require a slightly more complex
+approach: a deleted record can only be removed when it and its tombstone
+are involved in the \emph{same} reconstruction, at which point both can
+be dropped. We call this process \emph{tombstone cancellation}. In the
+general case, it can be implemented using a preliminary linear pass over
+the records involved in a reconstruction to identify the records to be
+dropped. In many cases, however, reconstruction involves sorting the
+records anyway; by taking care with the ordering semantics, tombstones
+and their associated records can be sorted into adjacent positions,
+allowing them to be dropped during reconstruction without any extra
+overhead.
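+
+The sorted-merge variant of this process might look as follows in C++,
+again reusing the \texttt{Wrapped} record type. The ordering predicate
+that places a tombstone adjacent to its record is assumed and omitted.
+\begin{verbatim}
+#include <cstddef>
+#include <vector>
+
+// Single pass over merged, sorted input: drop tagged records outright and
+// cancel each record/tombstone pair that has been sorted into adjacent slots.
+template <typename R>
+std::vector<Wrapped<R>> cancel_deletes(const std::vector<Wrapped<R>> &sorted) {
+    std::vector<Wrapped<R>> out;
+    out.reserve(sorted.size());
+    for (std::size_t i = 0; i < sorted.size(); ++i) {
+        if (sorted[i].is_deleted()) {
+            continue;                              // tagged record: drop it
+        }
+        bool cancels = i + 1 < sorted.size()
+            && sorted[i].rec.key == sorted[i + 1].rec.key
+            && sorted[i].rec.value == sorted[i + 1].rec.value
+            && sorted[i].is_tombstone() != sorted[i + 1].is_tombstone();
+        if (cancels) {
+            ++i;                                   // skip record and tombstone
+            continue;
+        }
+        out.push_back(sorted[i]);
+    }
+    return out;
+}
+\end{verbatim}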
+
+While the dropping of deleted records during reconstruction helps, it is
+not sufficient on its own to ensure a particular bound on the number of
+deleted records within the structure. Pathological scenarios resulting in
+unbounded rejection rates, even in the presence of this mitigation, are
+possible. For example, tagging alone will never trigger reconstructions,
+and so it would be possible to delete every single record within the
+structure without a single reconstruction occurring; with tombstones,
+records could be deleted in the reverse of the order in which they were
+inserted. In either case, a passive system of dropping records during
+reconstruction is not sufficient on its own.
+
+Fortunately, this passive mechanism can be used as the basis for a
+system that does provide a bound. This is because it guarantees,
+whether tagging or tombstones are used, that any given deleted
+record will \emph{eventually} be cancelled out after a finite number
+of reconstructions. If the number of deleted records gets too high,
+some or all of these deleted records can be cleared out by proactively
+performing reconstructions. We call these proactive reconstructions
+\emph{compactions}.
+
+The basic strategy, then, is to define a maximum allowable proportion
+of deleted records, $\delta \in [0, 1]$. Each block in the decomposition
+tracks the number of tombstones or tagged records within it. This count
+can be easily maintained by incrementing a counter when a record in the
+block is tagged, and by counting tombstones during reconstructions. These
+counts on each block are then monitored, and if the proportion of deletes
+in a block ever exceeds $\delta$, a proactive reconstruction including
+this block and one or more blocks below it in the structure can be
+triggered. The proportion of the newly compacted block can then be checked
+again, and this process repeated until all blocks respect the bound.
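+
+In C++, the check that runs after each buffer flush might be sketched as
+follows. The per-block counters and the \texttt{compact\_into\_next\_level}
+helper are illustrative assumptions.
+\begin{verbatim}
+#include <cstddef>
+#include <vector>
+
+// After each buffer flush, walk the blocks and trigger a proactive
+// reconstruction ("compaction") for any block whose proportion of deleted
+// records (tombstones or tagged records) exceeds the bound delta.
+template <typename Block>
+void enforce_delete_bound(std::vector<Block*> &blocks, double delta) {
+    for (std::size_t i = 0; i < blocks.size(); ++i) {
+        Block *b = blocks[i];
+        if (b->record_count() == 0) {
+            continue;
+        }
+        double proportion = static_cast<double>(b->delete_count())
+                          / static_cast<double>(b->record_count());
+        if (proportion > delta) {
+            // Reconstruct this block together with one or more blocks below
+            // it; the loop then re-checks the blocks further down, which
+            // covers any cascading compactions.
+            compact_into_next_level(blocks, i);
+        }
+    }
+}
+\end{verbatim}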
+
+For tagging, a single round of compaction will always suffice, because all
+deleted records involved in the reconstruction will be dropped. Tombstones
+may require multiple cascading rounds of compaction to occur, because a
+tombstone record will only cancel when it encounters the record that it
+deletes. However, because tombstones always follow the record they
+delete in insertion order, and will therefore always be ``above'' that
+record in the structure, each reconstruction will move every tombstone
+involved closer to the record it deletes, ensuring that eventually the
+bound will be satisfied.
+
+Asymptotically, this compaction process will not affect the amortized
+insertion cost of the structure. This is because the cost is based on
+the number of reconstructions that a given record is involved in over
+the lifetime of the structure. Preemptive compaction does not increase
+the number of reconstructions; it only changes \emph{when} they occur.
+
+\subsubsection{Sampling Procedure with Deletes}
+
+Because sampling is neither deletion decomposable nor invertible,
+the presence of deletes will have an effect on the query costs. As
+already mentioned, the basic cost associated with deletes is the
+rejection check performed on each sampled record: when a record is
+sampled, it must be checked to determine whether it has been deleted,
+and rejected if it has. Note that when such a rejection occurs, the
+sample cannot be retried immediately on the same block; instead, a new
+block must be selected to sample from. This is because deleted records
+are not accounted for in the weight calculations, and so retrying on the
+same block could introduce bias. As a straightforward example of this
+problem, consider a block that contains only deleted records. Any sample
+drawn from this block will be rejected, and so retrying samples against
+this block will result in an infinite loop.
+
+Assuming the compaction strategy described in the previous section is
+applied, bounding the proportion of deleted records in the structure
+by $\delta$, and assuming all records have an equal probability of
+being sampled, the cost of answering sampling queries, accounting for
+rejections, is,
-Asymptotically, this proactive compaction does not alter the analysis of
-insertion costs. Each record is still written at most $s$ times on each level,
-there are at most $\log_s n$ levels, and the buffer insertion and SSI
-construction costs are all unchanged, and so on. This results in the amortized
-insertion cost remaining the same.
+\begin{equation*}
+%\label{eq:sampling-cost-del}
+    \mathscr{Q}(n, k) \in \Theta\left([W(n) + P(n)]\log_2 n + \frac{kS(n)}{1 - \delta} \cdot R(n)\right)
+\end{equation*}
+where $\frac{k}{1 - \delta}$ is the expected number of samples that must
+be taken to obtain a sample set of size $k$.
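+For a sense of scale, with a bound of $\delta = 0.1$ the expected number
+of sampling attempts needed to produce $k$ results is only
+\begin{equation*}
+\frac{k}{1 - \delta} = \frac{k}{0.9} \approx 1.11\,k,
+\end{equation*}
+so, for modest values of $\delta$, the rejection overhead is small; the
+bound exists primarily to rule out pathological cases.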
-This compaction strategy is based upon tombstone and record counts, and the
-bounds assume that every record is equally likely to be sampled. For certain
-sampling problems (such as WSS), there are other conditions that must be
-considered to provide a bound on the rejection rate. To account for these
-situations in a general fashion, the framework supports problem-specific
-compaction triggers that can be tailored to the SSI being used. These allow
-compactions to be triggered based on other properties, such as rejection rate
-of a level, weight of deleted records, and the like.
+\subsection{Performance Tuning and Configuration}
+\subsubsection{LSM Tree Imports}
+\subsection{Insertion}
+\label{ssec:insert}
+The framework supports inserting new records by first appending them to the end
+of the mutable buffer. When it is full, the buffer is flushed into a sequence
+of levels containing shards of increasing capacity, using a procedure
+determined by the layout policy as discussed in Section~\ref{sec:framework}.
+This method allows for the cost of repeated shard reconstruction to be
+effectively amortized.
+Let the cost of constructing the SSI from an arbitrary set of $n$ records be
+$C_c(n)$ and the cost of reconstructing the SSI given two or more shards
+containing $n$ records in total be $C_r(n)$. The cost of an insert is composed
+of three parts: appending to the mutable buffer, constructing a new
+shard from the buffered records during a flush, and the total cost of
+reconstructing shards containing the record over the lifetime of the index. The
+cost of appending to the mutable buffer is constant, and the cost of constructing a
+shard from the buffer can be amortized across the records participating in the
+buffer flush, giving $\nicefrac{C_c(N_b)}{N_b}$. These costs are paid exactly once for
+each record. To derive an expression for the cost of repeated reconstruction,
+first note that each record will participate in at most $s$ reconstructions on
+a given level, resulting in a worst-case amortized cost of $O\left(s\cdot
+\nicefrac{C_r(n)}{n}\right)$ paid per level. The index itself will contain at most
+$\log_s n$ levels. Thus, over the lifetime of the index a given record
+will pay $O\left(s\cdot \nicefrac{C_r(n)}{n}\log_s n\right)$ cost in repeated
+reconstruction.
-\subsection{Performance Tuning and Configuration}
+Combining these results, the total amortized insertion cost is
+\begin{equation}
+O\left(\frac{C_c(N_b)}{N_b} + s \cdot \frac{C_r(n)}{n} \log_s n\right)
+\end{equation}
+This can be simplified by noting that $s$ is a constant, and that $N_b \ll n$ is also
+a constant. Neglecting these terms, the amortized insertion cost of the
+framework is,
+\begin{equation}
+O\left(\frac{C_r(n)}{n}\log_s n\right)
+\end{equation}
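+
+As a hypothetical example, for an SSI that can be rebuilt by merging
+sorted shards in $C_r(n) \in \Theta(n)$ time, this expression evaluates to
+\begin{equation*}
+O\left(\frac{C_r(n)}{n}\log_s n\right) = O\left(\log_s n\right),
+\end{equation*}
+which is comparable to the amortized insertion cost of a B+Tree.
+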
\captionsetup[subfloat]{justification=centering}
@@ -378,8 +541,8 @@ of a level, weight of deleted records, and the like.
\end{figure*}
+\section{Framework Implementation}
-\subsection{Framework Overview}
Our framework has been designed to work efficiently with any SSI, so long
as it has the following properties.
@@ -491,70 +654,6 @@ close with a detailed discussion of the trade-offs within the framework's
design space (Section~\ref{ssec:design-space}).
-\subsection{Insertion}
-\label{ssec:insert}
-The framework supports inserting new records by first appending them to the end
-of the mutable buffer. When it is full, the buffer is flushed into a sequence
-of levels containing shards of increasing capacity, using a procedure
-determined by the layout policy as discussed in Section~\ref{sec:framework}.
-This method allows for the cost of repeated shard reconstruction to be
-effectively amortized.
-
-Let the cost of constructing the SSI from an arbitrary set of $n$ records be
-$C_c(n)$ and the cost of reconstructing the SSI given two or more shards
-containing $n$ records in total be $C_r(n)$. The cost of an insert is composed
-of three parts: appending to the mutable buffer, constructing a new
-shard from the buffered records during a flush, and the total cost of
-reconstructing shards containing the record over the lifetime of the index. The
-cost of appending to the mutable buffer is constant, and the cost of constructing a
-shard from the buffer can be amortized across the records participating in the
-buffer flush, giving $\nicefrac{C_c(N_b)}{N_b}$. These costs are paid exactly once for
-each record. To derive an expression for the cost of repeated reconstruction,
-first note that each record will participate in at most $s$ reconstructions on
-a given level, resulting in a worst-case amortized cost of $O\left(s\cdot
-\nicefrac{C_r(n)}{n}\right)$ paid per level. The index itself will contain at most
-$\log_s n$ levels. Thus, over the lifetime of the index a given record
-will pay $O\left(s\cdot \nicefrac{C_r(n)}{n}\log_s n\right)$ cost in repeated
-reconstruction.
-
-Combining these results, the total amortized insertion cost is
-\begin{equation}
-O\left(\frac{C_c(N_b)}{N_b} + s \cdot \frac{C_r(n)}{n} \log_s n\right)
-\end{equation}
-This can be simplified by noting that $s$ is constant, and that $N_b \ll n$ and also
-a constant. By neglecting these terms, the amortized insertion cost of the
-framework is,
-\begin{equation}
-O\left(\frac{C_r(n)}{n}\log_s n\right)
-\end{equation}
-
-\begin{figure}
- \centering
- \subfloat[Tombstone Rejection Check]{\includegraphics[width=.75\textwidth]{img/sigmod23/delete-tombstone} \label{fig:delete-tombstone}}\\
- \subfloat[Tagging Rejection Check]{\includegraphics[width=.75\textwidth]{img/sigmod23/delete-tagging} \label{fig:delete-tag}}
-
- \caption{\textbf{Overview of the rejection check procedure for deleted records.} First,
- a record is sampled (1).
- When using the tombstone delete policy
- (Figure~\ref{fig:delete-tombstone}), the rejection check starts by (2) querying
- the bloom filter of the mutable buffer. The filter indicates the record is
- not present, so (3) the filter on $L_0$ is queried next. This filter
- returns a false positive, so (4) a point-lookup is executed against $L_0$.
- The lookup fails to find a tombstone, so the search continues and (5) the
- filter on $L_1$ is checked, which reports that the tombstone is present.
- This time, it is not a false positive, and so (6) a lookup against $L_1$
- (7) locates the tombstone. The record is thus rejected. When using the
- tagging policy (Figure~\ref{fig:delete-tag}), (1) the record is sampled and
- (2) checked directly for the delete tag. It is set, so the record is
- immediately rejected.}
-
- \label{fig:delete}
-
-\end{figure}
-
-
-\subsection{Deletion}
-\label{ssec:delete}
\subsection{Trade-offs on Framework Design Space}
diff --git a/chapters/tail-latency.tex b/chapters/tail-latency.tex
new file mode 100644
index 0000000..110069d
--- /dev/null
+++ b/chapters/tail-latency.tex
@@ -0,0 +1 @@
+\chapter{Controlling Insertion Tail Latency}
diff --git a/paper.tex b/paper.tex
index 37ea0ef..d1f7a47 100644
--- a/paper.tex
+++ b/paper.tex
@@ -364,8 +364,11 @@ of Engineering Science and Mechanics
\input{chapters/background}
\input{chapters/dynamic-extension-sampling}
\input{chapters/beyond-dsp}
+\input{chapters/design-space}
+\input{chapters/tail-latency}
\input{chapters/future-work}
\input{chapters/conclusion}
+
%\include{Chapter-2/Chapter-2}
%\include{Chapter-3/Chapter-3}
%\include{Chapter-4/Chapter-4}