| author | Douglas Rumbaugh <dbr4@psu.edu> | 2025-05-05 16:23:25 -0400 |
|---|---|---|
| committer | Douglas Rumbaugh <dbr4@psu.edu> | 2025-05-05 16:23:25 -0400 |
| commit | ac1244fced7e6c6ba93d4292dd9a18ce293236eb (patch) | |
| tree | 671696721d572a9e9ec2b92f94e1ff347ac26760 /chapters/sigmod23 | |
| parent | eb519d35d7f11427dd5fc877130b02478f0da80d (diff) | |
Updates
Diffstat (limited to 'chapters/sigmod23')
| -rw-r--r-- | chapters/sigmod23/background.tex | 23 |
| -rw-r--r-- | chapters/sigmod23/framework.tex | 575 |
2 files changed, 358 insertions, 240 deletions
diff --git a/chapters/sigmod23/background.tex b/chapters/sigmod23/background.tex index ad89e03..b4ccbf1 100644 --- a/chapters/sigmod23/background.tex +++ b/chapters/sigmod23/background.tex @@ -124,7 +124,13 @@ query, and selecting or rejecting it for inclusion within the sample with a fixed probability~\cite{db2-doc}. This process requires that each record in the result set be considered, and thus provides no performance benefit relative to the query being sampled from, as it must be answered -in full anyway before returning only some of the results. +in full anyway before returning only some of the results.\footnote{ + To clarify, this is not to say that Bernoulli sampling isn't + useful. It \emph{can} be used to improve the performance of queries + by limiting the cardinality of intermediate results, etc. But it is + not particularly useful for improving the performance of IQS queries, + where the sampling is performed on the final result set of the query. +} For performance, the statistical guarantees can be discarded and systematic or block sampling used instead. Systematic sampling considers @@ -230,6 +236,7 @@ structures attached to the nodes. More examples of alias augmentation applied to different IQS problems can be found in a recent survey by Tao~\cite{tao22}. +\Paragraph{Miscellanea.} There also exist specialized data structures with support for both efficient sampling and updates~\cite{hu14}, but these structures have poor constant factors and are very complex, rendering them of little @@ -252,7 +259,19 @@ only once per sample set (if at all), but fail to support updates. Thus, there appears to be a general dichotomy of sampling techniques: existing sampling data structures support either updates, or efficient sampling, but generally not both. It will be the purpose of this chapter to resolve -this dichotomy. +this dichotomy. In particular, we seek to develop structures with the +following desiderata, + +\begin{enumerate} + \item Support data updates (including deletes) with similar average + performance to a standard B+Tree. + \item Support IQS queries that do not pay a per-sample cost + proportional to some function of the data size. In other words, + $k$ should \emph{not} be multiplied by any function of $n$ + in the query cost function. + %FIXME: this guy comes out of nowhere... + \item Provide the user with some basic performance tuning capability. +\end{enumerate} diff --git a/chapters/sigmod23/framework.tex b/chapters/sigmod23/framework.tex index 88ac1ac..c878d93 100644 --- a/chapters/sigmod23/framework.tex +++ b/chapters/sigmod23/framework.tex @@ -16,10 +16,10 @@ there in the context of IRS apply equally to the other sampling problems considered in this chapter. In this section, we will discuss approaches for resolving these problems. -\subsection{Sampling over Partitioned Datasets} +\subsection{Sampling over Decomposed Structures} The core problem facing any attempt to dynamize SSIs is that independently -sampling from a partitioned dataset is difficult. As discussed in +sampling from a decomposed structure is difficult. As discussed in Section~\ref{ssec:background-irs}, accomplishing this task within the DSP model used by the Bentley-Saxe method requires drawing a full $k$ samples from each of the blocks, and then repeatedly down-sampling each @@ -27,14 +27,11 @@ of the intermediate sample sets. However, it is possible to devise a more efficient query process if we abandon the DSP model and consider a slightly more complicated procedure.
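To make the gap concrete, the following is a rough editorial sketch comparing the naive approach described above, in which the full sampling query is answered on each of the $\Theta(\log_2 n)$ blocks and the results repeatedly down-sampled, against the bound that the multi-stage procedure developed below will achieve. Here $P(n)$ is the SSI's query pre-processing cost, $S(n)$ its per-sample cost, and $k$ the number of samples requested; the exact form of the naive bound is an assumption based on the description above and ignores the cost of the down-sampling passes themselves.

```latex
% Editorial sketch: naive per-block approach vs. the target bound of the
% multi-stage procedure (cf. eq:dsp-sample-cost later in this section).
\begin{align*}
  \text{na\"ive, per-block approach:} \quad
    & \Theta\!\left(\log_2 n \cdot \bigl(P(n) + k\,S(n)\bigr)\right) \\
  \text{multi-stage procedure:} \quad
    & \Theta\!\left(P(n)\,\log_2 n + k\,S(n)\right)
\end{align*}
```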
-First, we need to resolve a minor definitional problem. As noted before, -the DSP model is based on deterministic queries. The definition doesn't -apply for sampling queries, because it assumes that the result sets of -identical queries should also be identical. For general IQS, we also need -to enforce conditions on the query being sampled from. +First, we'll define the IQS problem in terms of the notation and concepts +used in Chapter~\ref{chap:background} for search problems, -\begin{definition}[Query Sampling Problem] - Given a search problem, $F$, a sampling problem is function +\begin{definition}[Independent Query Sampling Problem] + Given a search problem, $F$, a query sampling problem is a function of the form $X: (F, \mathcal{D}, \mathcal{Q}, \mathbb{Z}^+) \to \mathcal{R}$ where $\mathcal{D}$ is the domain of records and $\mathcal{Q}$ is the domain of query parameters of $F$. The @@ -42,8 +39,14 @@ to enforce conditions on the query being sampled from. of records from the solution to $F$ drawn independently such that, $|R| = k$ for some $k \in \mathbb{Z}^+$. \end{definition} -With this in mind, we can now define the decomposability conditions for -a query sampling problem, + +To consider the decomposability of such problems, we need to resolve a +minor definitional issue. As noted before, the DSP model is based on +deterministic queries. The definition doesn't apply for sampling queries, +because it assumes that the result sets of identical queries should +also be identical. For general IQS, we also need to enforce conditions +on the query being sampled from. Based on these observations, we can +define the decomposability conditions for a query sampling problem, \begin{definition}[Decomposable Sampling Problem] A query sampling problem, $X: (F, \mathcal{D}, \mathcal{Q}, @@ -95,7 +98,7 @@ structures query cost functions are of the form, \end{equation*} -Consider an arbitrary decomposable sampling query with a cost function +Consider an arbitrary decomposable sampling problem with a cost function of the above form, $X(\mathscr{I}, F, q, k)$, which draws a sample of $k$ records from $d \subseteq \mathcal{D}$ using an instance of an SSI $\mathscr{I} \in \mathcal{I}$. Applying dynamization results @@ -146,9 +149,10 @@ query results in a naturally two-phase process, but DSPs are assumed to be single phase. We can construct a more effective process for answering such queries based on a multi-stage process, summarized in Figure~\ref{fig:sample}. \begin{enumerate} - \item Determine each block's respective weight under a given - query to be sampled from (e.g., the number of records falling - into the query range for IRS). + \item Perform the query pre-processing work, and determine each + block's respective weight under a given query to be sampled + from (e.g., the number of records falling into the query range + for IRS). \item Build a temporary alias structure over these weights. @@ -156,7 +160,8 @@ such queries based on a multi-stage process, summarized in Figure~\ref{fig:sampl samples to draw from each block. \item Draw the appropriate number of samples from each block and - merge them together to form the final query result. + merge them together to form the final query result, using any + necessary pre-processing results in the process. \end{enumerate} It is possible that some of the records sampled in Step 4 must be rejected, either because of deletes or some other property of the sampling @@ -184,182 +189,340 @@ $k$ records have been sampled.
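As a concrete illustration of the four steps above and of the restart-on-rejection behavior just described, the following C++ sketch shows one way the procedure could be structured. It is a sketch under stated assumptions rather than the framework's actual interface: the `Block` type and its `query_weight()` and `sample_one()` members are hypothetical stand-ins, `std::discrete_distribution` substitutes for a purpose-built alias structure, and for brevity a block is re-selected for every sample instead of pre-computing per-block counts as in Step 3.

```cpp
// Illustrative sketch only: Block, query_weight(), and sample_one() are
// hypothetical stand-ins, and std::discrete_distribution plays the role of
// the temporary alias structure built over the per-block weights.
#include <cstddef>
#include <numeric>
#include <optional>
#include <random>
#include <vector>

struct Block {
    // Step 1: per-query weight, e.g. the number of records falling into
    // the query range for IRS (computed during query pre-processing).
    virtual double query_weight() const = 0;
    // Step 4: draw one record; nullopt signals a rejection (e.g. the
    // sampled record turned out to be deleted).
    virtual std::optional<int> sample_one(std::mt19937 &rng) const = 0;
    virtual ~Block() = default;
};

std::vector<int> sample_query(const std::vector<const Block *> &blocks,
                              std::size_t k, std::mt19937 &rng) {
    std::vector<double> weights;
    weights.reserve(blocks.size());
    for (const Block *b : blocks) {
        weights.push_back(b->query_weight());            // Step 1
    }
    if (std::accumulate(weights.begin(), weights.end(), 0.0) <= 0.0) {
        return {};  // no records satisfy the query predicate
    }
    // Steps 2 and 3: weighted selection of a block for each sample.
    std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());

    std::vector<int> result;
    result.reserve(k);
    while (result.size() < k) {                           // Step 4
        const Block *b = blocks[pick(rng)];
        if (auto rec = b->sample_one(rng)) {
            result.push_back(*rec);
        }
        // On rejection, control falls through and a block is re-selected,
        // so the retry does not bias the sample toward the rejecting block.
    }
    return result;
}
```

With SSIs whose per-sample cost is $S(n) \in \Theta(1)$, everything inside the sampling loop is constant-time, which is what yields the decoupling of $k$ from $n$ analyzed next.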
Assuming a Bentley-Saxe decomposition with $\log n$ blocks and assuming a constant number of repetitions, the cost of answering a decomposable -sampling query having a pre-processing cost of $P(n)$ and a per-sample -cost of $S(n)$ will be, +sampling query having a pre-processing cost of $P(n)$, a weight-determination +cost of $W(n)$ and a per-sample cost of $S(n)$ will be, \begin{equation} \label{eq:dsp-sample-cost} \boxed{ -\mathscr{Q}(n, k) \in \Theta \left( P(n) \log_2 n + k S(n) \right) +\mathscr{Q}(n, k) \in \Theta \left( (P(n) + W(n)) \log_2 n + k S(n) \right) } \end{equation} where the cost of building the alias structure is $\Theta(\log_2 n)$ and thus absorbed into the pre-processing cost. For the SSIs discussed in this chapter, which have $S(n) \in \Theta(1)$, this model provides us with the desired decoupling of the data size ($n$) from the per-sample -cost. +cost. Additionally, for all of the SSIs considered in this chapter, +the weights either can be determined in $W(n) \in \Theta(1)$ time +or are produced naturally during the pre-processing step, and thus the +$W(n)$ term can be merged into $P(n)$. \subsection{Supporting Deletes} -Because the shards are static, records cannot be arbitrarily removed from them. -This requires that deletes be supported in some other way, with the ultimate -goal being the prevention of deleted records' appearance in sampling query -result sets. This can be realized in two ways: locating the record and marking -it, or inserting a new record which indicates that an existing record should be -treated as deleted. The framework supports both of these techniques, the -selection of which is called the \emph{delete policy}. The former policy is -called \emph{tagging} and the latter \emph{tombstone}. - -Tagging a record is straightforward. Point-lookups are performed against each -shard in the index, as well as the buffer, for the record to be deleted. When -it is found, a bit in a header attached to the record is set. When sampling, -any records selected with this bit set are automatically rejected. Tombstones -represent a lazy strategy for deleting records. When a record is deleted using -tombstones, a new record with identical key and value, but with a ``tombstone'' -bit set, is inserted into the index. A record's presence can be checked by -performing a point-lookup. If a tombstone with the same key and value exists -above the record in the index, then it should be rejected when sampled. - -Two important aspects of performance are pertinent when discussing deletes: the -cost of the delete operation, and the cost of verifying the presence of a -sampled record. The choice of delete policy represents a trade-off between -these two costs. Beyond this simple trade-off, the delete policy also has other -implications that can affect its applicability to certain types of SSI. Most -notably, tombstones do not require any in-place updating of records, whereas -tagging does. This means that using tombstones is the only way to ensure total -immutability of the data within shards, which avoids random writes and eases -concurrency control. The tombstone delete policy, then, is particularly -appealing in external and concurrent contexts. - -\Paragraph{Deletion Cost.} The cost of a delete under the tombstone policy is -the same as an ordinary insert. Tagging, by contrast, requires a point-lookup -of the record to be deleted, and so is more expensive.
Assuming a point-lookup -operation with cost $L(n)$, a tagged delete must search each level in the -index, as well as the buffer, requiring $O\left(N_b + L(n)\log_s n\right)$ -time. - -\Paragraph{Rejection Check Costs.} In addition to the cost of the delete -itself, the delete policy affects the cost of determining if a given record has -been deleted. This is called the \emph{rejection check cost}, $R(n)$. When -using tagging, the information necessary to make the rejection decision is -local to the sampled record, and so $R(n) \in O(1)$. However, when using tombstones -it is not; a point-lookup must be performed to search for a given record's -corresponding tombstone. This look-up must examine the buffer, and each shard -within the index. This results in a rejection check cost of $R(n) \in O\left(N_b + -L(n) \log_s n\right)$. The rejection check process for the two delete policies is -summarized in Figure~\ref{fig:delete}. - -Two factors contribute to the tombstone rejection check cost: the size of the -buffer, and the cost of performing a point-lookup against the shards. The -latter cost can be controlled using the framework's ability to associate -auxiliary structures with shards. For SSIs which do not support efficient -point-lookups, a hash table can be added to map key-value pairs to their -location within the SSI. This allows for constant-time rejection checks, even -in situations where the index would not otherwise support them. However, the -storage cost of this intervention is high, and in situations where the SSI does -support efficient point-lookups, it is not necessary. Further performance -improvements can be achieved by noting that the probability of a given record -having an associated tombstone in any particular shard is relatively small. -This means that many point-lookups will be executed against shards that do not -contain the tombstone being searched for. In this case, these unnecessary -lookups can be partially avoided using Bloom filters~\cite{bloom70} for -tombstones. By inserting tombstones into these filters during reconstruction, -point-lookups against some shards which do not contain the tombstone being -searched for can be bypassed. Filters can be attached to the buffer as well, -which may be even more significant due to the linear cost of scanning it. As -the goal is a reduction of rejection check costs, these filters need only be -populated with tombstones. In a later section, techniques for bounding the -number of tombstones on a given level are discussed, which will allow for the -memory usage of these filters to be tightly controlled while still ensuring -precise bounds on filter error. - -\Paragraph{Sampling with Deletes.} The addition of deletes to the framework -alters the analysis of sampling costs. A record that has been deleted cannot -be present in the sample set, and therefore the presence of each sampled record -must be verified. If a record has been deleted, it must be rejected. When -retrying samples rejected due to delete, the process must restart from shard -selection, as deleted records may be counted in the weight totals used to -construct that structure. 
This increases the cost of sampling to, -\begin{equation} -\label{eq:sampling-cost} - O\left([W(n) + P(n)]\log_s n + \frac{kS(n)}{1 - \mathbf{Pr}[\text{rejection}]} \cdot R(n)\right) -\end{equation} -where $R(n)$ is the cost of checking if a sampled record has been deleted, and -$\nicefrac{k}{1 -\mathbf{Pr}[\text{rejection}]}$ is the expected number of sampling -attempts required to obtain $k$ samples, given a fixed rejection probability. -The rejection probability itself is a function of the workload, and is -unbounded. - -\Paragraph{Bounding the Rejection Probability.} Rejections during sampling -constitute wasted memory accesses and random number generations, and so steps -should be taken to minimize their frequency. The probability of a rejection is -directly related to the number of deleted records, which is itself a function -of workload and dataset. This means that, without building counter-measures -into the framework, tight bounds on sampling performance cannot be provided in -the presence of deleted records. It is therefore critical that the framework -support some method for bounding the number of deleted records within the -index. - -While the static nature of shards prevents the direct removal of records at the -moment they are deleted, it doesn't prevent the removal of records during -reconstruction. When using tagging, all tagged records encountered during -reconstruction can be removed. When using tombstones, however, the removal -process is non-trivial. In principle, a rejection check could be performed for -each record encountered during reconstruction, but this would increase -reconstruction costs and introduce a new problem of tracking tombstones -associated with records that have been removed. Instead, a lazier approach can -be used: delaying removal until a tombstone and its associated record -participate in the same shard reconstruction. This delay allows both the record -and its tombstone to be removed at the same time, an approach called -\emph{tombstone cancellation}. In general, this can be implemented using an -extra linear scan of the input shards before reconstruction to identify -tombstones and associated records for cancellation, but potential optimizations -exist for many SSIs, allowing it to be performed during the reconstruction -itself at no extra cost. - -The removal of deleted records passively during reconstruction is not enough to -bound the number of deleted records within the index. It is not difficult to -envision pathological scenarios where deletes result in unbounded rejection -rates, even with this mitigation in place. However, the dropping of deleted -records does provide a useful property: any specific deleted record will -eventually be removed from the index after a finite number of reconstructions. -Using this fact, a bound on the number of deleted records can be enforced. A -new parameter, $\delta$, is defined, representing the maximum proportion of -deleted records within the index. Each level, and the buffer, tracks the number -of deleted records it contains by counting its tagged records or tombstones. -Following each buffer flush, the proportion of deleted records is checked -against $\delta$. If any level is found to exceed it, then a proactive -reconstruction is triggered, pushing its shards down into the next level. The -process is repeated until all levels respect the bound, allowing the number of -deleted records to be precisely controlled, which, by extension, bounds the -rejection rate. This process is called \emph{compaction}. 
-Assuming every record is equally likely to be sampled, this new bound can be -applied to the analysis of sampling costs. The probability of a record being -rejected is $\mathbf{Pr}[\text{rejection}] = \delta$. Applying this result to -Equation~\ref{eq:sampling-cost} yields, +As discussed in Section~\ref{ssec:background-deletes}, the Bentley-Saxe +method can support deleting records through the use of either weak +deletes or a secondary ghost structure, assuming certain properties are +satisfied by either the search problem or the data structure. Unfortunately, +neither approach can work as a ``drop-in'' solution in the context of +sampling problems, because of the way that deleted records interact with +the sampling process itself. Sampling problems, as formalized here, +are neither invertible nor deletion decomposable. In this section, +we'll discuss our mechanisms for supporting deletes, as well as how +these can be handled during sampling while maintaining correctness. + +Because both deletion policies have their advantages in certain +contexts, we decided to support both. Specifically, we propose two +mechanisms for deletes, which are + +\begin{enumerate} +\item \textbf{Tagged Deletes.} Each record in the structure includes a +header with a visibility bit. On delete, the structure is searched +for the record, and the bit is set to indicate that it has been deleted. +This mechanism is used to support \emph{weak deletes}. +\item \textbf{Tombstone Deletes.} On delete, a new record is inserted into +the structure with a tombstone bit set in the header. This mechanism is +used to support \emph{ghost structure} based deletes. +\end{enumerate} + +Broadly speaking, tombstone deletes cause a number of difficulties +for sampling because \emph{sampling problems are not invertible}. However, +this limitation can be worked around during the query process if desired. +Tagging is much more natural for these search problems. However, the +flexibility of selecting either option is desirable because of their +different performance characteristics. + +While tagging is a fairly direct method of implementing weak deletes, +tombstones are sufficiently different from the traditional ghost structure +system that it is worth motivating the decision to use them here. One +of the major limitations of the ghost structure approach for handling +deletes is that there is not a principled method for removing deleted +records from the decomposed structure. The standard approach is to set an +arbitrary threshold on the number of deleted records, and rebuild the entire +structure when this threshold is crossed~\cite{saxe79}. Mixing the ``ghost'' +records into the same structures as the original records allows deleted records +to naturally be cleaned up over time as they meet their tombstones during +reconstructions. This is an important consequence that will be discussed +in more detail in Section~\ref{ssec-sampling-delete-bounding}. + +There are two relevant aspects of performance that the two mechanisms +trade off between: the cost of performing the delete, and the cost of +checking if a sampled record has been deleted. In addition to these, +the use of tombstones also makes supporting concurrency and external +data structures far easier. This is because tombstone deletes are simple +inserts, and thus they leave the individual structures immutable. Tagging +requires in-place updates of the record header in the structures, +resulting in possible race conditions and random IO operations on +disk.
This makes tombstone deletes particularly attractive in these +contexts. + + +\subsubsection{Deletion Cost} +We will first consider the cost of performing a delete using either +mechanism. + +\Paragraph{Tombstone Deletes.} +The cost of a tombstone delete in a Bentley-Saxe dynamization is +the same as a simple insert, +\begin{equation*} +\mathscr{D}_A(n) \in \Theta\left(\frac{B(n)}{n} \log_2 (n)\right) +\end{equation*} +with the worst-case cost being $\Theta(B(n))$. Note that there is also +a minor performance effect resulting from deleted records appearing +twice within the structure, once for the original record and once for +the tombstone, inflating the overall size of the structure. + +\Paragraph{Tagged Deletes.} In contrast to tombstone deletes, tagged +deletes are not simple inserts, and so have their own cost function. The +process of deleting a record under tagging consists of first searching +the entire structure for the record to be deleted, and then setting a +bit in its header. As a result, the performance of this operation is +a function of how expensive it is to locate an individual record within +the decomposed data structure. + +In the theoretical literature, this lookup operation is provided +by a global hash table built over every record in the structure, +mapping each record to the block that contains it. Then, the data +structure's weak delete operation can be applied to the relevant +block~\cite{merge-dsp}. While this is certainly an option for us, the +SSIs we are currently considering all support a reasonably efficient +$\Theta(\log n)$ lookup operation as it is, and so we have elected +to design tagged deletes to leverage this operation when available, +rather than having to maintain a global hash table. +If a given SSI has a point-lookup cost of $L(n)$, then a tagged delete +on a Bentley-Saxe decomposition of that SSI will require, at worst, +executing a point-lookup on each block, with a total cost of + +\begin{equation*} +\mathscr{D}(n) \in \Theta\left( L(n) \log_2 (n)\right) +\end{equation*} + +If the SSI being considered does \emph{not} support an efficient +point-lookup operation, then a hash table can be used instead. We consider +individual hash tables associated with each block, rather than a single +global one, for simplicity of implementation and analysis. So, in these +cases, the same procedure as above can be used, with $L(n) \in \Theta(1)$. + + +\begin{figure} + \centering + \subfloat[Tombstone Rejection Check]{\includegraphics[width=.75\textwidth]{img/sigmod23/delete-tombstone} \label{fig:delete-tombstone}}\\ + \subfloat[Tagging Rejection Check]{\includegraphics[width=.75\textwidth]{img/sigmod23/delete-tagging} \label{fig:delete-tag}} + + \caption{\textbf{Overview of the rejection check procedure for deleted records.} First, + a record is sampled (1). + When using the tombstone delete policy + (Figure~\ref{fig:delete-tombstone}), the rejection check starts by (2) querying + the Bloom filter of the mutable buffer. The filter indicates the record is + not present, so (3) the filter on $L_0$ is queried next. This filter + returns a false positive, so (4) a point-lookup is executed against $L_0$. + The lookup fails to find a tombstone, so the search continues and (5) the + filter on $L_1$ is checked, which reports that the tombstone is present. + This time, it is not a false positive, and so (6) a lookup against $L_1$ + (7) locates the tombstone. The record is thus rejected.
When using the + tagging policy (Figure~\ref{fig:delete-tag}), (1) the record is sampled and + (2) checked directly for the delete tag. It is set, so the record is + immediately rejected.} + + \label{fig:delete} + +\end{figure} + +\subsubsection{Rejection Check Costs} + +Because sampling queries are neither invertible nor deletion decomposable, +the query process must be modified to support deletes using either of the +above mechanisms. This modification entails requiring that each sampled +record be manually checked to confirm that it hasn't been deleted, prior +to adding it to the sample set. We call the cost of this operation the +\emph{rejection check cost}, $R(n)$. The process differs between the +two deletion mechanisms, and the two procedures are summarized in +Figure~\ref{fig:delete}. + +For tagged deletes, this is a simple process. The information about the +deletion status of a given record is stored directly alongside the record, +within its header. So, once a record has been sampled, this check can be +performed immediately in $R(n) \in \Theta(1)$ time. + +Tombstone deletes, however, introduce a significant difficulty in +performing the rejection check. The information about whether a record +has been deleted is not local to the record itself, and therefore a +point-lookup is required to search for the tombstone associated with +each sample. Thus, the rejection check cost when using tombstones to +implement deletes over a Bentley-Saxe decomposition of an SSI is, \begin{equation} R(n) \in \Theta( L(n) \log_2 n) \end{equation} This performance cost seems catastrophically bad, considering +it must be paid per sample, but there are ways to mitigate +it. We will discuss these mitigations in more detail later, +during our discussion of the implementation of these results in +Section~\ref{sec:sampling-implementation}. + + +\subsubsection{Bounding Rejection Probability} + +When a sampled record has been rejected, it must be resampled. This +introduces performance overhead resulting from extra memory accesses and +random number generations, and hurts our ability to provide performance +bounds on our sampling operations. In the worst case, a structure +may consist mostly or entirely of deleted records, resulting in +a potentially unbounded number of rejections during sampling. Thus, +in order to maintain sampling performance bounds, the probability of a +rejection during sampling must be bounded. + +The reconstructions associated with Bentley-Saxe dynamization give us +a natural way of controlling the number of deleted records within the +structure, and thereby bounding the rejection rate. During reconstruction, +we have the opportunity to remove deleted records. This will, however, +cause the record counts associated with each block of the structure to +gradually drift out of alignment with the ``perfect'' powers of two +associated with the Bentley-Saxe method. In the theoretical literature on this +topic, the solution to this problem is to periodically repartition all of +the records to re-align the block sizes~\cite{merge-dsp, saxe79}. This +approach could also be easily applied here, if desired, though we +do not do so in our implementations, for reasons that will be discussed in +Section~\ref{sec:sampling-implementation}. + +The process of removing these deleted records during reconstructions is +different for the two mechanisms.
Tagged deletes are straightforward, +because all tagged records can simply be dropped when they are involved +in a reconstruction. Tombstones, however, require a slightly more complex +approach. Rather than being able to drop deleted records immediately, +during reconstructions the records can only be dropped when the +tombstone and its associated record are involved in the \emph{same} +reconstruction, at which point both can be dropped. We call this +process \emph{tombstone cancellation}. In the general case, it can be +implemented using a preliminary linear pass over the records involved +in a reconstruction to identify the records to be dropped, but in many +cases reconstruction involves sorting the records anyway, and by taking +care with ordering semantics, tombstones and their associated records can +be sorted into adjacent spots, allowing them to be efficiently dropped +during reconstruction without any extra overhead. + +While the dropping of deleted records during reconstruction helps, it is +not sufficient on its own to ensure a particular bound on the number of +deleted records within the structure. Pathological scenarios resulting in +unbounded rejection rates, even in the presence of this mitigation, are +possible. For example, tagging alone will never trigger reconstructions, +and so it would be possible to delete every single record within the +structure without triggering a reconstruction; with tombstones, records +could be deleted in the reverse order of their insertion. In either +case, a passive system of dropping records naturally during reconstruction +is not sufficient. + +Fortunately, this passive system can be used as the basis for a +system that does provide a bound. This is because it guarantees, +whether tagging or tombstones are used, that any given deleted +record will \emph{eventually} be cancelled out after a finite number +of reconstructions. If the number of deleted records gets too high, +some or all of these deleted records can be cleared out by proactively +performing reconstructions. We call these proactive reconstructions +\emph{compactions}. + +The basic strategy, then, is to define a maximum allowable proportion +of deleted records, $\delta \in [0, 1]$. Each block in the decomposition +tracks the number of tombstones or tagged records within it. This count +can be easily maintained by incrementing a counter when a record in the +block is tagged, and by counting tombstones during reconstructions. These +counts on each block are then monitored, and if the proportion of deletes +in a block ever exceeds $\delta$, a proactive reconstruction including +this block and one or more blocks below it in the structure can be +triggered. The proportion of the newly compacted block can then be checked +again, and this process repeated until all blocks respect the bound. + +For tagging, a single round of compaction will always suffice, because all +deleted records involved in the reconstruction will be dropped. Tombstones +may require multiple cascading rounds of compaction, because a +tombstone record will only cancel when it encounters the record that it +deletes. However, because tombstones always follow the record they +delete in insertion order, and will therefore always be ``above'' that +record in the structure, each reconstruction will move every tombstone +involved closer to the record it deletes, ensuring that eventually the +bound will be satisfied.
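A minimal sketch of this trigger logic is given below, under stated assumptions: the per-block bookkeeping type and the `compact_downward()` callback are hypothetical names rather than the framework's API, and the sketch shows only the control flow of checking each block's delete proportion against $\delta$ after a flush and letting compactions cascade downward.

```cpp
// Sketch of the delta-based compaction trigger (hypothetical types/names).
// After each buffer flush, any block whose proportion of tagged records or
// tombstones exceeds delta is proactively reconstructed into the block
// below it; the check then continues downward to handle cascades.
#include <cstddef>
#include <functional>
#include <vector>

struct BlockStats {
    std::size_t record_count = 0;
    std::size_t delete_count = 0;   // tagged records or tombstones
};

static bool violates_bound(const BlockStats &b, double delta) {
    return b.record_count > 0 &&
           static_cast<double>(b.delete_count) / b.record_count > delta;
}

// compact_downward(i) is assumed to merge block i into block i+1, dropping
// tagged records and cancelling tombstones, and to update `blocks`.
void enforce_delete_bound(std::vector<BlockStats> &blocks, double delta,
                          const std::function<void(std::size_t)> &compact_downward) {
    for (std::size_t i = 0; i < blocks.size(); i++) {
        if (violates_bound(blocks[i], delta)) {
            compact_downward(i);
            // Tombstones pushed into block i+1 may put it over the bound in
            // turn; later iterations of this loop handle the cascade.
        }
    }
}
```

Under tagging a single pass suffices, since every tagged record involved in a reconstruction is dropped; with tombstones, the later iterations of the loop pick up any cascading violations, matching the behavior described above.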
+ +Asymptotically, this compaction process will not affect the amortized +insertion cost of the structure. This is because the cost is based on +the number of reconstructions that a given record is involved in over +the lifetime of the structure. Preemptive compaction does not increase +the number of reconstructions, only \emph{when} they occur. + +\subsubsection{Sampling Procedure with Deletes} + +Because sampling is neither deletion decomposable nor invertible, +the presence of deletes will have an effect on the query costs. As +already mentioned, the basic cost associated with deletes is a rejection +check associated with each sampled record. When a record is sampled, +it must be checked to determine whether it has been deleted or not. If +it has, then it must be rejected. Note that when this rejection occurs, +it cannot be retried immediately on the same block, but rather a new +block must be selected to sample from. This is because deleted records +aren't accounted for in the weight calculations, and so could introduce +bias. As a straightforward example of this problem, consider a block +that contains only deleted records. Any sample drawn from this block will +be rejected, and so retrying samples against this block will result in +an infinite loop. + +Assuming the compaction strategy mentioned in the previous section is +applied, ensuring a bound of at most $\delta$ proportion of deleted +records in the structure, and assuming all records have an equal +probability of being sampled, the cost of answering sampling queries +accounting for rejections is, -Asymptotically, this proactive compaction does not alter the analysis of -insertion costs. Each record is still written at most $s$ times on each level, -there are at most $\log_s n$ levels, and the buffer insertion and SSI -construction costs are all unchanged, and so on. This results in the amortized -insertion cost remaining the same. +\begin{equation*} +%\label{eq:sampling-cost-del} + \mathscr{Q}(n, k) = \Theta\left([W(n) + P(n)]\log_2 n + \frac{kS(n)}{1 - \delta} \cdot R(n)\right) +\end{equation*} +Where $\frac{k}{1 - \delta}$ is the expected number of samples that must +be taken to obtain a sample set of size $k$. -This compaction strategy is based upon tombstone and record counts, and the -bounds assume that every record is equally likely to be sampled. For certain -sampling problems (such as WSS), there are other conditions that must be -considered to provide a bound on the rejection rate. To account for these -situations in a general fashion, the framework supports problem-specific -compaction triggers that can be tailored to the SSI being used. These allow -compactions to be triggered based on other properties, such as rejection rate -of a level, weight of deleted records, and the like. +\subsection{Performance Tuning and Configuration} +\subsubsection{LSM Tree Imports} +\subsection{Insertion} +\label{ssec:insert} +The framework supports inserting new records by first appending them to the end +of the mutable buffer. When it is full, the buffer is flushed into a sequence +of levels containing shards of increasing capacity, using a procedure +determined by the layout policy as discussed in Section~\ref{sec:framework}. +This method allows for the cost of repeated shard reconstruction to be +effectively amortized. +Let the cost of constructing the SSI from an arbitrary set of $n$ records be +$C_c(n)$ and the cost of reconstructing the SSI given two or more shards +containing $n$ records in total be $C_r(n)$. 
The cost of an insert is composed +of three parts: appending to the mutable buffer, constructing a new +shard from the buffered records during a flush, and the total cost of +reconstructing shards containing the record over the lifetime of the index. The +cost of appending to the mutable buffer is constant, and the cost of constructing a +shard from the buffer can be amortized across the records participating in the +buffer flush, giving $\nicefrac{C_c(N_b)}{N_b}$. These costs are paid exactly once for +each record. To derive an expression for the cost of repeated reconstruction, +first note that each record will participate in at most $s$ reconstructions on +a given level, resulting in a worst-case amortized cost of $O\left(s\cdot +\nicefrac{C_r(n)}{n}\right)$ paid per level. The index itself will contain at most +$\log_s n$ levels. Thus, over the lifetime of the index a given record +will pay $O\left(s\cdot \nicefrac{C_r(n)}{n}\log_s n\right)$ cost in repeated +reconstruction. -\subsection{Performance Tuning and Configuration} +Combining these results, the total amortized insertion cost is +\begin{equation} +O\left(\frac{C_c(N_b)}{N_b} + s \cdot \frac{C_r(n)}{n} \log_s n\right) +\end{equation} +This can be simplified by noting that $s$ is constant, and that $N_b \ll n$ and also +a constant. By neglecting these terms, the amortized insertion cost of the +framework is, +\begin{equation} +O\left(\frac{C_r(n)}{n}\log_s n\right) +\end{equation} \captionsetup[subfloat]{justification=centering} @@ -378,8 +541,8 @@ of a level, weight of deleted records, and the like. \end{figure*} +\section{Framework Implementation} -\subsection{Framework Overview} Our framework has been designed to work efficiently with any SSI, so long as it has the following properties. @@ -491,70 +654,6 @@ close with a detailed discussion of the trade-offs within the framework's design space (Section~\ref{ssec:design-space}). -\subsection{Insertion} -\label{ssec:insert} -The framework supports inserting new records by first appending them to the end -of the mutable buffer. When it is full, the buffer is flushed into a sequence -of levels containing shards of increasing capacity, using a procedure -determined by the layout policy as discussed in Section~\ref{sec:framework}. -This method allows for the cost of repeated shard reconstruction to be -effectively amortized. - -Let the cost of constructing the SSI from an arbitrary set of $n$ records be -$C_c(n)$ and the cost of reconstructing the SSI given two or more shards -containing $n$ records in total be $C_r(n)$. The cost of an insert is composed -of three parts: appending to the mutable buffer, constructing a new -shard from the buffered records during a flush, and the total cost of -reconstructing shards containing the record over the lifetime of the index. The -cost of appending to the mutable buffer is constant, and the cost of constructing a -shard from the buffer can be amortized across the records participating in the -buffer flush, giving $\nicefrac{C_c(N_b)}{N_b}$. These costs are paid exactly once for -each record. To derive an expression for the cost of repeated reconstruction, -first note that each record will participate in at most $s$ reconstructions on -a given level, resulting in a worst-case amortized cost of $O\left(s\cdot -\nicefrac{C_r(n)}{n}\right)$ paid per level. The index itself will contain at most -$\log_s n$ levels. 
Thus, over the lifetime of the index a given record -will pay $O\left(s\cdot \nicefrac{C_r(n)}{n}\log_s n\right)$ cost in repeated -reconstruction. - -Combining these results, the total amortized insertion cost is -\begin{equation} -O\left(\frac{C_c(N_b)}{N_b} + s \cdot \frac{C_r(n)}{n} \log_s n\right) -\end{equation} -This can be simplified by noting that $s$ is constant, and that $N_b \ll n$ and also -a constant. By neglecting these terms, the amortized insertion cost of the -framework is, -\begin{equation} -O\left(\frac{C_r(n)}{n}\log_s n\right) -\end{equation} - -\begin{figure} - \centering - \subfloat[Tombstone Rejection Check]{\includegraphics[width=.75\textwidth]{img/sigmod23/delete-tombstone} \label{fig:delete-tombstone}}\\ - \subfloat[Tagging Rejection Check]{\includegraphics[width=.75\textwidth]{img/sigmod23/delete-tagging} \label{fig:delete-tag}} - - \caption{\textbf{Overview of the rejection check procedure for deleted records.} First, - a record is sampled (1). - When using the tombstone delete policy - (Figure~\ref{fig:delete-tombstone}), the rejection check starts by (2) querying - the bloom filter of the mutable buffer. The filter indicates the record is - not present, so (3) the filter on $L_0$ is queried next. This filter - returns a false positive, so (4) a point-lookup is executed against $L_0$. - The lookup fails to find a tombstone, so the search continues and (5) the - filter on $L_1$ is checked, which reports that the tombstone is present. - This time, it is not a false positive, and so (6) a lookup against $L_1$ - (7) locates the tombstone. The record is thus rejected. When using the - tagging policy (Figure~\ref{fig:delete-tag}), (1) the record is sampled and - (2) checked directly for the delete tag. It is set, so the record is - immediately rejected.} - - \label{fig:delete} - -\end{figure} - - -\subsection{Deletion} -\label{ssec:delete} \subsection{Trade-offs on Framework Design Space} |