\section{Dynamization of SSIs} \label{sec:framework}
Our goal, then, is to design a solution to independent sampling that is able to achieve \emph{both} efficient updates and efficient sampling, while also maintaining statistical independence both within and between IQS queries, and to do so in a generalized fashion without needing to design new dynamic data structures for each problem. Given the range of SSIs already available, it seems reasonable to attempt to apply dynamization techniques to accomplish this goal. Using the Bentley-Saxe method would allow us to support inserts and deletes without requiring any modification of the SSIs. Unfortunately, as discussed in Section~\ref{ssec:background-irs}, there are problems with directly applying BSM to sampling problems. All of the considerations discussed there in the context of IRS apply equally to the other sampling problems considered in this chapter. In this section, we will discuss approaches for resolving these problems.

\subsection{Sampling over Partitioned Datasets}
The core problem facing any attempt to dynamize SSIs is that independently sampling from a partitioned dataset is difficult. As discussed in Section~\ref{ssec:background-irs}, accomplishing this task within the DSP model used by the Bentley-Saxe method requires drawing a full $k$ samples from each of the blocks, and then repeatedly down-sampling each of the intermediate sample sets. However, it is possible to devise a more efficient query process if we abandon the DSP model and consider a slightly more complicated procedure. First, we need to resolve a minor definitional problem. As noted before, the DSP model is based on deterministic queries. The definition does not apply to sampling queries, because it assumes that the result sets of identical queries should also be identical. For general IQS, we also need to enforce conditions on the query being sampled from.
\begin{definition}[Query Sampling Problem]
Given a search problem, $F$, a sampling problem is a function of the form $X: (F, \mathcal{D}, \mathcal{Q}, \mathbb{Z}^+) \to \mathcal{R}$ where $\mathcal{D}$ is the domain of records and $\mathcal{Q}$ is the domain of query parameters of $F$. The solution to a sampling problem, $R \in \mathcal{R}$, will be a subset of records from the solution to $F$ drawn independently such that $|R| = k$ for some $k \in \mathbb{Z}^+$.
\end{definition}
With this in mind, we can now define the decomposability conditions for a query sampling problem,
\begin{definition}[Decomposable Sampling Problem]
A query sampling problem, $X: (F, \mathcal{D}, \mathcal{Q}, \mathbb{Z}^+) \to \mathcal{R}$, is decomposable if and only if the following conditions are met for all $q \in \mathcal{Q}, k \in \mathbb{Z}^+$,
\begin{enumerate}
\item There exists a $\Theta(C(n,k))$ time computable, associative, and commutative binary operator $\mergeop$ such that,
\begin{equation*}
X(F, A \cup B, q, k) \sim X(F, A, q, k)~ \mergeop ~X(F, B, q, k)
\end{equation*}
for all $A, B \in \mathcal{PS}(\mathcal{D})$ where $A \cap B = \emptyset$.
\item For any dataset $D \subseteq \mathcal{D}$ that has been decomposed into $m$ partitions such that $D = \bigcup_{i=1}^m D_i$ and $D_i \cap D_j = \emptyset \quad \forall i,j \leq m, i \neq j$,
\begin{equation*}
F(D, q) = \bigcup_{i=1}^m F(D_i, q)
\end{equation*}
\end{enumerate}
\end{definition}
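
To make the first condition concrete, the following sketch shows one possible $\Theta(k)$ merge operator $\mergeop$ for two partial results, under the assumption that each partial result carries both its $k$ samples and the total weight of the partition they were drawn from (for IRS, this weight would simply be the number of records falling within the query range). The function and field names here are illustrative only, not part of any particular SSI's interface.
\begin{verbatim}
import random

def downsample_merge(partial_a, partial_b, k):
    """One possible merge operator: combine two partial sample sets.

    partial_a and partial_b are (samples, weight) pairs, where samples
    holds k records drawn i.i.d. from one partition and weight is that
    partition's total weight under the query.
    """
    samples_a, w_a = partial_a
    samples_b, w_b = partial_b
    merged, i, j = [], 0, 0
    for _ in range(k):
        # Pick the source partition in proportion to its weight, then
        # consume the next unused sample from that partition's set.
        if random.random() < w_a / (w_a + w_b):
            merged.append(samples_a[i]); i += 1
        else:
            merged.append(samples_b[j]); j += 1
    return merged, w_a + w_b
\end{verbatim}
Because each input already contains $k$ independent samples from its own partition, choosing the source of each output sample in proportion to the partition weights produces a result distributed as though all $k$ samples had been drawn from $A \cup B$ directly.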
These two conditions warrant further explanation. The first condition is simply a redefinition of the standard decomposability criteria to consider matching the distribution, rather than the exact records in $R$, as the correctness condition for the merge process. The second condition handles a necessary property of the underlying search problem being sampled from. Note that this condition is \emph{stricter} than normal decomposability for $F$, and essentially requires that the query being sampled from return a set of records, rather than an aggregate value or some other result that cannot be meaningfully sampled from. This condition is satisfied by predicate-filtering style database queries, among others.

With these definitions in mind, we turn to solving these query sampling problems. We begin by noting that many SSIs have a sampling procedure that naturally involves two phases: first, some preliminary work is done to determine metadata concerning the set of records to sample from, and then $k$ samples are drawn from the structure, taking advantage of this metadata. If we represent the time cost of the preliminary work with $P(n)$ and the cost of drawing a sample with $S(n)$, then these structures' query cost functions are of the form,
\begin{equation*}
\mathscr{Q}(n, k) = P(n) + k S(n)
\end{equation*}
Consider an arbitrary decomposable sampling query with a cost function of the above form, $X(\mathscr{I}, F, q, k)$, which draws a sample of $k$ records from $d \subseteq \mathcal{D}$ using an instance of an SSI $\mathscr{I} \in \mathcal{I}$. Applying dynamization results in $d$ being split across $m$ disjoint instances of $\mathcal{I}$ such that $d = \bigcup_{i=1}^m \text{unbuild}(\mathscr{I}_i)$ and $\text{unbuild}(\mathscr{I}_i) \cap \text{unbuild}(\mathscr{I}_j) = \emptyset \quad \forall i, j \leq m, i \neq j$. If we consider a Bentley-Saxe dynamization of such a structure, the $\mergeop$ operation would be a $\Theta(k)$ down-sampling. Thus, the total query cost of such a structure would be,
\begin{equation*}
\Theta\left(\log_2 n \left( P(n) + k S(n) + k\right)\right)
\end{equation*}
This cost function is sub-optimal for two reasons. First, we pay extra cost to merge the result sets together because of the down-sampling combination operator. Secondly, this formulation fails to avoid a per-sample dependence on $n$, even in the case where $S(n) \in \Theta(1)$. This gets even worse when considering rejections that may occur as a result of deleted records. Recall from Section~\ref{ssec:background-deletes} that deletion can be supported using weak deletes or a shadow structure in a Bentley-Saxe dynamization. Using either approach, it is not possible to avoid deleted records in advance when sampling, and so these will need to be rejected and retried. In the DSP model, this retry will need to reprocess every block a second time; samples cannot be retried in place without introducing bias into the result set. We will discuss this more in Section~\ref{ssec:sampling-deletes}.
\begin{figure}
\centering
\includegraphics[width=\textwidth]{img/sigmod23/sampling}
\caption{\textbf{Overview of the multiple-block query sampling process} for Example~\ref{ex:sample} with $k=1000$. First, (1) the normalized weights of the shards are determined, then (2) these weights are used to construct an alias structure. Next, (3) the alias structure is queried $k$ times to determine per-shard sample sizes, and then (4) sampling is performed.
Finally, (5) any rejected samples are retried starting from the alias structure, and the process is repeated until the desired number of samples has been retrieved.}
\label{fig:sample}
\end{figure}
The key insight that allowed us to solve this particular problem was that there is a mismatch between the structure of the sampling query process and the structure assumed by DSPs. Using an SSI to answer a sampling query results in a naturally two-phase process, but DSPs are assumed to be single phase. We can construct a more effective multi-stage procedure for answering such queries, summarized in Figure~\ref{fig:sample}.
\begin{enumerate}
\item Determine each block's respective weight under a given query to be sampled from (e.g., the number of records falling into the query range for IRS).
\item Build a temporary alias structure over these weights.
\item Query the alias structure $k$ times to determine how many samples to draw from each block.
\item Draw the appropriate number of samples from each block and merge them together to form the final query result.
\end{enumerate}
It is possible that some of the records sampled in Step 4 must be rejected, either because of deletes or some other property of the sampling procedure being used. If $r$ records are rejected, the above procedure can be repeated from Step 3, taking $k - r$ as the number of times to query the alias structure, without needing to redo any of the preprocessing steps. This can be repeated as many times as necessary until the required $k$ records have been sampled.
\begin{example}
\label{ex:sample}
Consider executing a WSS query, with $k=1000$, across three blocks containing integer keys with unit weight. $\mathscr{I}_1$ contains only the key $-2$, $\mathscr{I}_2$ contains all integers on $[1,100]$, and $\mathscr{I}_3$ contains all integers on $[101, 200]$. These structures are shown in Figure~\ref{fig:sample}. Sampling is performed by first determining the normalized weights for each block: $w_1 = 0.005$, $w_2 = 0.4975$, $w_3 = 0.4975$, which are then used to construct a block alias structure. The block alias structure is then queried $k$ times, resulting in a distribution of $k_i$s that is commensurate with the relative weights of each block. Finally, each block is queried in turn to draw the appropriate number of samples.
\end{example}
Assuming a Bentley-Saxe decomposition with $\log n$ blocks and a constant number of repetitions, the cost of answering a decomposable sampling query having a pre-processing cost of $P(n)$ and a per-sample cost of $S(n)$ will be,
\begin{equation}
\label{eq:dsp-sample-cost}
\boxed{
\mathscr{Q}(n, k) \in \Theta \left( P(n) \log_2 n + k S(n) \right)
}
\end{equation}
where the cost of building the alias structure is $\Theta(\log_2 n)$ and thus absorbed into the pre-processing cost. For the SSIs discussed in this chapter, which have $S(n) \in \Theta(1)$, this model provides us with the desired decoupling of the data size ($n$) from the per-sample cost.
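
This multi-stage procedure can be sketched as follows. The per-shard operations \texttt{get\_weight} and \texttt{sample} are hypothetical stand-ins for the preliminary weight-determination work and the per-sample operation of whatever SSI is being used (with costs $P(n)$ and $S(n)$, respectively), and \texttt{sample} is assumed to return \texttt{None} whenever an attempt must be rejected. For brevity, \texttt{random.choices} stands in for the temporary alias structure.
\begin{verbatim}
import random
from collections import Counter

def sample_query(shards, q, k):
    """Draw k independent samples from the union of several shards."""
    # (1) Determine each shard's weight under the query q.
    weights = [s.get_weight(q) for s in shards]
    if sum(weights) == 0:
        return []

    result = []
    needed = k
    while needed > 0:
        # (2)/(3) Select a source shard for each outstanding sample, in
        # proportion to the shard weights. (random.choices is used in
        # place of a temporary alias structure; it costs O(log m) per
        # draw rather than the alias structure's O(1).)
        picks = Counter(random.choices(range(len(shards)),
                                       weights=weights, k=needed))
        # (4) Draw the assigned number of samples from each shard.
        needed = 0
        for i, shard in enumerate(shards):
            for _ in range(picks[i]):
                r = shard.sample(q)
                if r is None:        # rejected: retry from Step 3
                    needed += 1
                else:
                    result.append(r)
    return result
\end{verbatim}
Note that the weight determination of Step 1 is performed only once per query; rejected samples are retried from Step 3 onward, which is what allows the pre-processing cost to appear only once per block in Equation~\ref{eq:dsp-sample-cost}.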
\subsection{Supporting Deletes}
\label{ssec:delete}
Because the shards are static, records cannot be arbitrarily removed from them. This requires that deletes be supported in some other way, with the ultimate goal being the prevention of deleted records' appearance in sampling query result sets. This can be realized in two ways: locating the record and marking it, or inserting a new record which indicates that an existing record should be treated as deleted. The framework supports both of these techniques, the selection of which is called the \emph{delete policy}. The former policy is called \emph{tagging} and the latter \emph{tombstones}. Tagging a record is straightforward: point-lookups are performed against each shard in the index, as well as the buffer, to locate the record to be deleted. When it is found, a bit in a header attached to the record is set. When sampling, any records selected with this bit set are automatically rejected. Tombstones represent a lazy strategy for deleting records. When a record is deleted using tombstones, a new record with identical key and value, but with a ``tombstone'' bit set, is inserted into the index. Whether a record has been deleted can be checked by performing a point-lookup: if a tombstone with the same key and value exists above the record in the index (i.e., in the buffer or in a more recent shard), then the record should be rejected when sampled.

Two important aspects of performance are pertinent when discussing deletes: the cost of the delete operation, and the cost of verifying the presence of a sampled record. The choice of delete policy represents a trade-off between these two costs. Beyond this simple trade-off, the delete policy also has other implications that can affect its applicability to certain types of SSI. Most notably, tombstones do not require any in-place updating of records, whereas tagging does. This means that using tombstones is the only way to ensure total immutability of the data within shards, which avoids random writes and eases concurrency control. The tombstone delete policy, then, is particularly appealing in external and concurrent contexts.

\Paragraph{Deletion Cost.} The cost of a delete under the tombstone policy is the same as an ordinary insert. Tagging, by contrast, requires a point-lookup of the record to be deleted, and so is more expensive. Assuming a point-lookup operation with cost $L(n)$, a tagged delete must search each level in the index, as well as the buffer, requiring $O\left(N_b + L(n)\log_s n\right)$ time.

\Paragraph{Rejection Check Costs.} In addition to the cost of the delete itself, the delete policy affects the cost of determining if a given record has been deleted. This is called the \emph{rejection check cost}, $R(n)$. When using tagging, the information necessary to make the rejection decision is local to the sampled record, and so $R(n) \in O(1)$. However, when using tombstones it is not; a point-lookup must be performed to search for a given record's corresponding tombstone. This look-up must examine the buffer and each shard within the index. This results in a rejection check cost of $R(n) \in O\left(N_b + L(n) \log_s n\right)$. The rejection check process for the two delete policies is summarized in Figure~\ref{fig:delete}. Two factors contribute to the tombstone rejection check cost: the size of the buffer, and the cost of performing a point-lookup against the shards. The latter cost can be controlled using the framework's ability to associate auxiliary structures with shards. For SSIs which do not support efficient point-lookups, a hash table can be added to map key-value pairs to their location within the SSI. This allows for constant-time rejection checks, even in situations where the index would not otherwise support them. However, the storage cost of this intervention is high, and in situations where the SSI does support efficient point-lookups, it is not necessary.
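
The two rejection-check paths summarized in Figure~\ref{fig:delete} can be illustrated with the following sketch. The record fields and the per-shard \texttt{point\_lookup} operation (assumed to cost $L(n)$) are hypothetical names used only for exposition; in particular, \texttt{source\_shard} is an illustrative way of identifying which shards are newer than the sampled record.
\begin{verbatim}
def rejection_check_tagging(record):
    # Tagging: the delete bit is stored with the record itself, so
    # the rejection decision is local and R(n) is O(1).
    return record.delete_tag

def rejection_check_tombstone(record, buffer, levels):
    # Tombstones: search the buffer and every shard newer than the
    # record for a matching tombstone; R(n) is O(N_b + L(n) log_s n).
    for r in buffer:                     # linear scan of the buffer
        if r.is_tombstone and (r.key, r.value) == (record.key, record.value):
            return True
    for level in levels:                 # levels ordered newest to oldest
        for shard in level:
            if shard is record.source_shard:
                return False             # only newer tombstones matter
            t = shard.point_lookup(record.key, record.value, tombstone=True)
            if t is not None:
                return True
    return False
\end{verbatim}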
Further performance improvements can be achieved by noting that the probability of a given record having an associated tombstone in any particular shard is relatively small. This means that many point-lookups will be executed against shards that do not contain the tombstone being searched for. Many of these unnecessary lookups can be avoided using Bloom filters~\cite{bloom70} for tombstones. By inserting tombstones into these filters during reconstruction, point-lookups against some shards that do not contain the tombstone being searched for can be bypassed. Filters can be attached to the buffer as well, which may be even more significant due to the linear cost of scanning it. As the goal is a reduction of rejection check costs, these filters need only be populated with tombstones. In a later section, techniques for bounding the number of tombstones on a given level are discussed, which will allow for the memory usage of these filters to be tightly controlled while still ensuring precise bounds on filter error.

\Paragraph{Sampling with Deletes.} The addition of deletes to the framework alters the analysis of sampling costs. A record that has been deleted cannot be present in the sample set, and therefore the presence of each sampled record must be verified. If a record has been deleted, it must be rejected. When retrying samples rejected due to deletes, the process must restart from shard selection, as deleted records may be counted in the weight totals used to construct that structure. This increases the cost of sampling to,
\begin{equation}
\label{eq:sampling-cost}
O\left([W(n) + P(n)]\log_s n + \frac{kS(n)}{1 - \mathbf{Pr}[\text{rejection}]} \cdot R(n)\right)
\end{equation}
where $R(n)$ is the cost of checking if a sampled record has been deleted, and $\nicefrac{k}{1 -\mathbf{Pr}[\text{rejection}]}$ is the expected number of sampling attempts required to obtain $k$ samples, given a fixed rejection probability. The rejection probability itself is a function of the workload, and is unbounded.

\Paragraph{Bounding the Rejection Probability.} Rejections during sampling constitute wasted memory accesses and random number generations, and so steps should be taken to minimize their frequency. The probability of a rejection is directly related to the number of deleted records, which is itself a function of workload and dataset. This means that, without building counter-measures into the framework, tight bounds on sampling performance cannot be provided in the presence of deleted records. It is therefore critical that the framework support some method for bounding the number of deleted records within the index. While the static nature of shards prevents the direct removal of records at the moment they are deleted, it does not prevent the removal of records during reconstruction. When using tagging, all tagged records encountered during reconstruction can be removed. When using tombstones, however, the removal process is non-trivial. In principle, a rejection check could be performed for each record encountered during reconstruction, but this would increase reconstruction costs and introduce a new problem of tracking tombstones associated with records that have been removed. Instead, a lazier approach can be used: delaying removal until a tombstone and its associated record participate in the same shard reconstruction. This delay allows both the record and its tombstone to be removed at the same time, an approach called \emph{tombstone cancellation}. In general, this can be implemented using an extra linear scan of the input shards before reconstruction to identify tombstones and associated records for cancellation, but potential optimizations exist for many SSIs, allowing it to be performed during the reconstruction itself at no extra cost.
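
A minimal sketch of such a cancellation pass is given below. It assumes that the records participating in a reconstruction can be iterated in sorted order, with each tombstone ordered directly after the record it deletes (identical key and value), and that at most one tombstone exists per record; the helper name is illustrative.
\begin{verbatim}
def cancel_tombstones(sorted_records):
    """Drop record/tombstone pairs that meet in the same reconstruction."""
    output = []
    for r in sorted_records:
        prev = output[-1] if output else None
        if (r.is_tombstone and prev is not None
                and not prev.is_tombstone
                and (prev.key, prev.value) == (r.key, r.value)):
            output.pop()      # cancel: drop the deleted record...
            continue          # ...and skip its tombstone
        output.append(r)
    return output             # records to feed to the SSI constructor
\end{verbatim}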
The passive removal of deleted records during reconstruction is not, on its own, enough to bound the number of deleted records within the index. It is not difficult to envision pathological scenarios where deletes result in unbounded rejection rates, even with this mitigation in place. However, the dropping of deleted records does provide a useful property: any specific deleted record will eventually be removed from the index after a finite number of reconstructions. Using this fact, a bound on the number of deleted records can be enforced. A new parameter, $\delta$, is defined, representing the maximum proportion of deleted records within the index. Each level, and the buffer, tracks the number of deleted records it contains by counting its tagged records or tombstones. Following each buffer flush, the proportion of deleted records is checked against $\delta$. If any level is found to exceed it, then a proactive reconstruction is triggered, pushing its shards down into the next level. The process is repeated until all levels respect the bound, allowing the number of deleted records to be precisely controlled, which, by extension, bounds the rejection rate. This process is called \emph{compaction}.

Assuming every record is equally likely to be sampled, this new bound can be applied to the analysis of sampling costs. The probability of a sampled record being rejected is then at most $\delta$, i.e., $\mathbf{Pr}[\text{rejection}] \leq \delta$. Applying this result to Equation~\ref{eq:sampling-cost} yields,
\begin{equation}
%\label{eq:sampling-cost-del}
O\left([W(n) + P(n)]\log_s n + \frac{kS(n)}{1 - \delta} \cdot R(n)\right)
\end{equation}
Asymptotically, this proactive compaction does not alter the analysis of insertion costs. Each record is still written at most $s$ times on each level, there are at most $\log_s n$ levels, and the buffer insertion and SSI construction costs are all unchanged. This results in the amortized insertion cost remaining the same.

This compaction strategy is based upon tombstone and record counts, and the bounds assume that every record is equally likely to be sampled. For certain sampling problems (such as WSS), there are other conditions that must be considered to provide a bound on the rejection rate. To account for these situations in a general fashion, the framework supports problem-specific compaction triggers that can be tailored to the SSI being used. These allow compactions to be triggered based on other properties, such as the rejection rate of a level or the total weight of deleted records.

\captionsetup[subfloat]{justification=centering}
\begin{figure*}
\centering
\subfloat[Leveling]{\includegraphics[width=.75\textwidth]{img/sigmod23/merge-leveling} \label{fig:leveling}}\\
\subfloat[Tiering]{\includegraphics[width=.75\textwidth]{img/sigmod23/merge-tiering} \label{fig:tiering}}
\caption{\textbf{A graphical overview of the sampling framework and its insert procedure.} A mutable buffer (MB) sits atop two levels (L0, L1) containing shards (pairs of SSIs and auxiliary structures [A]) using the leveling (Figure~\ref{fig:leveling}) and tiering (Figure~\ref{fig:tiering}) layout policies.
Records are represented as black/colored squares, and grey squares represent unused capacity. An insertion requiring a multi-level reconstruction is illustrated.}
\label{fig:framework}
\end{figure*}

\subsection{Framework Overview}
Our framework has been designed to work efficiently with any SSI, so long as it has the following properties.
\begin{enumerate}
\item The underlying full query $Q$ supported by the SSI from whose results samples are drawn satisfies the following property: for any dataset $D = \bigcup_{i = 1}^{m}D_i$ where $D_i \cap D_j = \emptyset$ for all $i \neq j$, $Q(D) = \bigcup_{i = 1}^{m}Q(D_i)$.
\item \emph{(Optional)} The SSI supports efficient point-lookups.
\item \emph{(Optional)} The SSI is capable of efficiently reporting the total weight of all records returned by the underlying full query.
\end{enumerate}
The first property applies to the query being sampled from, and is essential for the correctness of sample sets reported by extended sampling indexes.\footnote{This condition is stricter than the definition of a decomposable search problem in the Bentley-Saxe method, which allows for \emph{any} constant-time merge operation, not just union. However, this condition is satisfied by many common types of database query, such as predicate-based filtering queries.} The latter two properties are optional, but reduce deletion and sampling costs respectively. Should the SSI fail to support point-lookups, an auxiliary hash table can be attached to each shard. Should it fail to support query result weight reporting, rejection sampling can be used in place of the more efficient scheme discussed in Section~\ref{ssec:sample}. The analysis of this framework will generally assume that all three conditions are satisfied.

Given an SSI with these properties, a dynamic extension can be produced as shown in Figure~\ref{fig:framework}. The extended index consists of disjoint shards, each containing an instance of the SSI being extended and optional auxiliary data structures. The auxiliary structures allow acceleration of certain operations that are required by the framework, but which the SSI being extended does not itself support efficiently. Examples of possible auxiliary structures include hash tables, Bloom filters~\cite{bloom70}, and range filters~\cite{zhang18,siqiang20}. The shards are arranged into levels of increasing record capacity, with either one shard, or up to a fixed maximum number of shards, per level. The decision to place one or many shards per level is called the \emph{layout policy}. The policy names are borrowed from the literature on the LSM tree, with the former called \emph{leveling} and the latter called \emph{tiering}. To avoid a reconstruction on every insert, an unsorted array of fixed capacity ($N_b$), called the \emph{mutable buffer}, is used to buffer updates. Because it is unsorted, it is kept small to maintain reasonably efficient sampling and point-lookup performance. All updates are performed by appending new records to the tail of this buffer. If a record currently within the index is to be updated to a new value, it must first be deleted, and then a record with the new value inserted. This ensures that old versions of records are properly filtered from query results. When the buffer is full, it is flushed to make room for new records. The flushing procedure is based on the layout policy in use. When using leveling (Figure~\ref{fig:leveling}), a new SSI is constructed using both the records in $L_0$ and those in the buffer.
This is used to create a new shard, which replaces the one previously in $L_0$. When using tiering (Figure~\ref{fig:tiering}), a new shard is built using only the records from the buffer, and placed into $L_0$ without altering the existing shards. Each level $i$ has a record capacity of $N_b \cdot s^{i+1}$, controlled by a configurable parameter, $s$, called the scale factor. Records are organized in one large shard under leveling, or in $s$ shards of $N_b \cdot s^i$ capacity each under tiering. When a level reaches its capacity, it must be emptied to make room for the records flushed into it. This is accomplished by moving its records down to the next level of the index. Under leveling, this requires constructing a new shard containing all records from both the source and target levels, and placing this shard into the target, leaving the source empty. Under tiering, the shards in the source level are combined into a single new shard that is placed into the target level. Should the target be full, it is first emptied by applying the same procedure. New empty levels are dynamically added as necessary to accommodate these reconstructions. Note that shard reconstructions are not necessarily performed using merging, though merging can be used as an optimization of the reconstruction procedure where such an algorithm exists. In general, reconstruction requires only pooling the records of the shards being combined and then applying the SSI's standard construction algorithm to this set of records.

\begin{table}[t]
\caption{Frequently Used Notation}
\centering
\begin{tabular}{|p{2.5cm} p{5cm}|}
\hline
\textbf{Variable} & \textbf{Description} \\ \hline
$N_b$ & Capacity of the mutable buffer \\ \hline
$s$ & Scale factor \\ \hline
$C_c(n)$ & SSI initial construction cost \\ \hline
$C_r(n)$ & SSI reconstruction cost \\ \hline
$L(n)$ & SSI point-lookup cost \\ \hline
$P(n)$ & SSI sampling pre-processing cost \\ \hline
$S(n)$ & SSI per-sample sampling cost \\ \hline
$W(n)$ & Shard weight determination cost \\ \hline
$R(n)$ & Shard rejection check cost \\ \hline
$\delta$ & Maximum delete proportion \\ \hline
%$\rho$ & Maximum rejection rate \\ \hline
\end{tabular}
\label{tab:nomen}
\end{table}

Table~\ref{tab:nomen} lists frequently used notation for the various parameters of the framework, which will be used in the coming analysis of the costs and trade-offs associated with operations within the framework's design space. The remainder of this section discusses the performance characteristics of insertion into this structure (Section~\ref{ssec:insert}) and closes with a detailed discussion of the trade-offs within the framework's design space (Section~\ref{ssec:design-space}); the procedures for answering sampling queries and supporting deletes (Section~\ref{ssec:delete}) were covered in the preceding subsections.

\subsection{Insertion} \label{ssec:insert}
The framework supports inserting new records by first appending them to the end of the mutable buffer. When it is full, the buffer is flushed into a sequence of levels containing shards of increasing capacity, using a procedure determined by the layout policy as discussed in Section~\ref{sec:framework}. This method allows for the cost of repeated shard reconstruction to be effectively amortized.
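
The insertion and flush logic described above can be sketched as follows. The \texttt{build}, \texttt{records}, and \texttt{record\_count} operations are hypothetical stand-ins for the SSI's construction algorithm and record access, and the capacity bookkeeping is simplified; this is a sketch, not the framework's actual interface.
\begin{verbatim}
def insert(index, record):
    index.buffer.append(record)
    if len(index.buffer) >= index.N_b:
        flush(index)

def flush(index):
    """Flush the mutable buffer into L0, cascading reconstructions
    down the levels as capacity limits are reached."""
    if not index.levels:
        index.levels.append([])
    make_room(index, 0, index.N_b)
    if index.policy == "leveling":
        # Rebuild L0's single shard from its records plus the buffer.
        records = pool_records(index.levels[0]) + list(index.buffer)
        index.levels[0] = [index.SSI.build(records)]
    else:  # tiering: build a new shard from the buffer alone
        index.levels[0].append(index.SSI.build(list(index.buffer)))
    index.buffer.clear()

def make_room(index, i, incoming):
    """Empty level i into level i+1 if `incoming` records would
    exceed its capacity of N_b * s^(i+1)."""
    count = sum(shard.record_count() for shard in index.levels[i])
    if count + incoming <= index.N_b * index.s ** (i + 1):
        return
    if i + 1 == len(index.levels):
        index.levels.append([])          # grow the index as needed
    make_room(index, i + 1, count)       # the target may itself be full
    if index.policy == "leveling":
        merged = (pool_records(index.levels[i])
                  + pool_records(index.levels[i + 1]))
        index.levels[i + 1] = [index.SSI.build(merged)]
    else:  # tiering: combine the source shards into one new shard
        index.levels[i + 1].append(
            index.SSI.build(pool_records(index.levels[i])))
    index.levels[i] = []

def pool_records(level):
    # Reconstruction only requires pooling the shards' records and
    # re-running the SSI's construction algorithm over them.
    return [r for shard in level for r in shard.records()]
\end{verbatim}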
Let the cost of constructing the SSI from an arbitrary set of $n$ records be $C_c(n)$ and the cost of reconstructing the SSI given two or more shards containing $n$ records in total be $C_r(n)$. The cost of an insert is composed of three parts: appending to the mutable buffer, constructing a new shard from the buffered records during a flush, and the total cost of reconstructing shards containing the record over the lifetime of the index. The cost of appending to the mutable buffer is constant, and the cost of constructing a shard from the buffer can be amortized across the records participating in the buffer flush, giving $\nicefrac{C_c(N_b)}{N_b}$. These costs are paid exactly once for each record. To derive an expression for the cost of repeated reconstruction, first note that each record will participate in at most $s$ reconstructions on a given level, resulting in a worst-case amortized cost of $O\left(s\cdot \nicefrac{C_r(n)}{n}\right)$ paid per level. The index itself will contain at most $\log_s n$ levels. Thus, over the lifetime of the index a given record will pay $O\left(s\cdot \nicefrac{C_r(n)}{n}\log_s n\right)$ cost in repeated reconstruction. Combining these results, the total amortized insertion cost is
\begin{equation}
O\left(\frac{C_c(N_b)}{N_b} + s \cdot \frac{C_r(n)}{n} \log_s n\right)
\end{equation}
This can be simplified by noting that $s$ and $N_b$ are both constants, with $N_b \ll n$. Neglecting these terms, the amortized insertion cost of the framework is,
\begin{equation}
O\left(\frac{C_r(n)}{n}\log_s n\right)
\end{equation}

\begin{figure}
\centering
\subfloat[Tombstone Rejection Check]{\includegraphics[width=.75\textwidth]{img/sigmod23/delete-tombstone} \label{fig:delete-tombstone}}\\
\subfloat[Tagging Rejection Check]{\includegraphics[width=.75\textwidth]{img/sigmod23/delete-tagging} \label{fig:delete-tag}}
\caption{\textbf{Overview of the rejection check procedure for deleted records.} First, a record is sampled (1). When using the tombstone delete policy (Figure~\ref{fig:delete-tombstone}), the rejection check starts by (2) querying the Bloom filter of the mutable buffer. The filter indicates the record is not present, so (3) the filter on $L_0$ is queried next. This filter returns a false positive, so (4) a point-lookup is executed against $L_0$. The lookup fails to find a tombstone, so the search continues and (5) the filter on $L_1$ is checked, which reports that the tombstone is present. This time, it is not a false positive, and so (6) a lookup against $L_1$ (7) locates the tombstone. The record is thus rejected. When using the tagging policy (Figure~\ref{fig:delete-tag}), (1) the record is sampled and (2) checked directly for the delete tag. It is set, so the record is immediately rejected.}
\label{fig:delete}
\end{figure}

\subsection{Trade-offs in the Framework Design Space} \label{ssec:design-space}
The framework has several tunable parameters, allowing it to be tailored for specific applications. This design space contains trade-offs among three major performance characteristics: update cost, sampling cost, and auxiliary memory usage. The two most significant decisions when implementing this framework are the selection of the layout and delete policies. The asymptotic analysis of the previous sections obscures some of the differences between these policies, but they do have significant practical performance implications.

\Paragraph{Layout Policy.} The choice of layout policy represents a clear trade-off between update and sampling performance. Leveling results in fewer shards of larger size, whereas tiering results in a larger number of smaller shards.
As a result, leveling reduces the costs associated with point-lookups and sampling query preprocessing by a constant factor, compared to tiering. However, it results in more write amplification: a given record may be involved in up to $s$ reconstructions on a single level, as opposed to the single reconstruction per level under tiering.

\Paragraph{Delete Policy.} The choice of delete policy represents a trade-off between delete performance and sampling performance. Tagging requires a point-lookup when performing a delete, which is more expensive than the insert required by tombstones. However, it also allows constant-time rejection checks, unlike tombstones, which require a point-lookup of each sampled record. In situations where deletes are common and write-throughput is critical, tombstones may be more useful. Tombstones are also ideal in situations where immutability is required, or random writes must be avoided. Generally speaking, however, tagging is superior when using SSIs that support it, because sampling rejection checks will usually be more common than deletes.

\Paragraph{Mutable Buffer Capacity and Scale Factor.} The mutable buffer capacity and scale factor both influence the number of levels within the index, and by extension the number of distinct shards. Sampling and point-lookups have better performance with fewer shards. Smaller shards are also faster to reconstruct, although the same adjustments that reduce shard size also result in a larger number of reconstructions, so the trade-off here is less clear. The scale factor has an interesting interaction with the layout policy: when using leveling, the scale factor directly controls the amount of write amplification per level. Larger scale factors mean more time is spent reconstructing shards on a level, reducing update performance. Tiering does not have this problem and should see its update performance benefit directly from a larger scale factor, as this reduces the number of reconstructions. The buffer capacity also influences the number of levels, but is more significant in its effects on point-lookup performance: a lookup must perform a linear scan of the buffer. Likewise, the unstructured nature of the buffer will also contribute negatively to sampling performance, irrespective of which buffer sampling technique is used. As a result, although a large buffer will reduce the number of shards, it will also hurt sampling and delete (under tagging) performance. It is important to minimize the cost of these buffer scans, and so it is preferable to keep the buffer small, ideally small enough to fit within the CPU's L2 cache. The number of shards within the index is, then, better controlled by changing the scale factor, rather than the buffer capacity. Using a smaller buffer will result in more compactions and shard reconstructions; however, the empirical evaluation in Section~\ref{ssec:ds-exp} demonstrates that this is not a serious performance problem when the scale factor is chosen appropriately. When the shards are in memory, frequent small reconstructions do not have a significant performance penalty compared to less frequent, larger ones.

\Paragraph{Auxiliary Structures.} The framework's support for arbitrary auxiliary data structures allows for memory to be traded in exchange for insertion or sampling performance. The use of Bloom filters for accelerating tombstone rejection checks has already been discussed, but many other options exist.
Bloom filters could also be used to accelerate point-lookups for delete tagging, though such filters would require much more memory than tombstone-only ones to be effective. An auxiliary hash table could be used to accelerate point-lookups, or range filters such as SuRF~\cite{zhang18} or Rosetta~\cite{siqiang20} could be added to accelerate pre-processing for range-based sampling problems such as IRS or WIRS.
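
To illustrate how auxiliary structures attach to a shard, the sketch below bundles an arbitrary SSI instance with an optional key-value hash table and an optional tombstone-only filter. The class and method names are illustrative, and a plain set stands in for a Bloom filter; a real implementation would use an actual Bloom filter (or a range filter such as SuRF or Rosetta) sized appropriately for the shard.
\begin{verbatim}
class Shard:
    """An SSI instance plus optional auxiliary structures."""

    def __init__(self, ssi, with_hash_table=False,
                 with_tombstone_filter=False):
        self.ssi = ssi
        # Hash table over (key, value, tombstone-flag) for SSIs that
        # lack efficient native point-lookups.
        self.lookup_table = (
            {(r.key, r.value, r.is_tombstone): r for r in ssi.records()}
            if with_hash_table else None
        )
        # Tombstone-only membership filter (a set stands in for a
        # Bloom filter here).
        self.tombstone_filter = (
            {(r.key, r.value) for r in ssi.records() if r.is_tombstone}
            if with_tombstone_filter else None
        )

    def point_lookup(self, key, value, tombstone=False):
        if (tombstone and self.tombstone_filter is not None
                and (key, value) not in self.tombstone_filter):
            return None            # filter rules out the tombstone
        if self.lookup_table is not None:
            return self.lookup_table.get((key, value, tombstone))
        return self.ssi.point_lookup(key, value, tombstone)
\end{verbatim}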