\section{Dynamization of SSIs} \label{sec:framework} Our goal, then, is to design a solution to independent sampling that achieves \emph{both} efficient updates and efficient sampling, while also maintaining statistical independence both within and between IQS queries, and to do so in a generalized fashion without needing to design new dynamic data structures for each problem. Given the range of SSIs already available, it seems reasonable to attempt to apply dynamization techniques to accomplish this goal. Using the Bentley-Saxe method would allow us to support inserts and deletes without requiring any modification of the SSIs. Unfortunately, as discussed in Section~\ref{ssec:background-irs}, there are problems with directly applying BSM to sampling problems. All of the considerations discussed there in the context of IRS apply equally to the other sampling problems considered in this chapter. In this section, we will discuss approaches for resolving these problems. \begin{table}[t] \centering \begin{tabular}{|l l|} \hline \textbf{Variable} & \textbf{Description} \\ \hline $N_B$ & Capacity of the mutable buffer \\ \hline $s$ & Scale factor \\ \hline $B(n)$ & SSI construction cost from unsorted records \\ \hline $B_M(n)$ & SSI reconstruction cost from existing SSI instances\\ \hline $L(n)$ & SSI point-lookup cost \\ \hline $P(n)$ & SSI sampling pre-processing cost \\ \hline $S(n)$ & SSI per-sample sampling cost \\ \hline $W(n)$ & SSI weight determination cost \\ \hline $R(n)$ & Rejection check cost \\ \hline $\delta$ & Maximum delete proportion \\ \hline \end{tabular} \caption{\textbf{Nomenclature.} A reference of variables and functions used in this chapter.} \label{tab:nomen} \end{table} \subsection{Sampling over Decomposed Structures} \label{ssec:decomposed-structure-sampling} The core problem facing any attempt to dynamize SSIs is that independently sampling from a decomposed structure is difficult. As discussed in Section~\ref{ssec:background-irs}, accomplishing this task within the DSP model used by the Bentley-Saxe method requires drawing a full $k$ samples from each of the blocks, and then repeatedly down-sampling each of the intermediate sample sets. However, it is possible to devise a more efficient query process if we abandon the DSP model and consider a slightly more complicated procedure. First, we'll define the IQS problem in terms of the notation and concepts used in Chapter~\ref{chap:background} for search problems, \begin{definition}[Independent Query Sampling Problem] Given a search problem, $F$, a query sampling problem is a function of the form $X: (F, \mathcal{D}, \mathcal{Q}, \mathbb{Z}^+) \to \mathcal{R}$ where $\mathcal{D}$ is the domain of records and $\mathcal{Q}$ is the domain of query parameters of $F$. The solution to a sampling problem, $R \in \mathcal{R}$, is a subset of records from the solution to $F$, drawn independently, such that $|R| = k$ for some $k \in \mathbb{Z}^+$. \end{definition} To consider the decomposability of such problems, we need to resolve a minor definitional issue. As noted before, the DSP model is based on deterministic queries. The definition doesn't apply to sampling queries, because it assumes that the result sets of identical queries should also be identical. For general IQS, we also need to enforce conditions on the query being sampled from.
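For instance (using $F_\text{range}$ as ad hoc notation for the underlying range-reporting problem, following the IRS discussion in Section~\ref{ssec:background-irs}), IRS fits this form: the search problem being sampled from is
\begin{equation*}
F_\text{range}(D, (l, u)) = \{x \in D \mid l \leq x \leq u\}
\end{equation*}
and the query sampling problem $X(F_\text{range}, D, (l, u), k)$ must return $k$ records drawn independently and uniformly from $F_\text{range}(D, (l, u))$. Two executions with identical parameters should produce results drawn from the same distribution, but not, in general, the same $k$ records.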
Based on these observations, we can define the decomposability conditions for a query sampling problem, \begin{definition}[Decomposable Sampling Problem] \label{def:decomp-sampling} A query sampling problem, $X: (F, \mathcal{D}, \mathcal{Q}, \mathbb{Z}^+) \to \mathcal{R}$, is decomposable if and only if the following conditions are met for all $q \in \mathcal{Q}, k \in \mathbb{Z}^+$, \begin{enumerate} \item There exists a $\Theta(C(n,k))$ time computable, associative, and commutative binary operator $\mergeop$ such that, \begin{equation*} X(F, A \cup B, q, k) \sim X(F, A, q, k)~ \mergeop ~X(F, B, q, k) \end{equation*} for all $A, B \in \mathcal{PS}(\mathcal{D})$ where $A \cap B = \emptyset$. \item For any dataset $D \subseteq \mathcal{D}$ that has been decomposed into $m$ partitions such that $D = \bigcup_{i=1}^m D_i$ and $D_i \cap D_j = \emptyset$ for all $i \neq j$, \begin{equation*} F(D, q) = \bigcup_{i=1}^m F(D_i, q) \end{equation*} \end{enumerate} \end{definition} These two conditions warrant further explanation. The first condition is simply a redefinition of the standard decomposability criteria to consider matching the distribution, rather than the exact records in $R$, as the correctness condition for the merge process. The second condition handles a necessary property of the underlying search problem being sampled from. Note that this condition is \emph{stricter} than normal decomposability for $F$, and essentially requires that the query being sampled from return a set of records, rather than an aggregate value or some other result that cannot be meaningfully sampled from. This condition is satisfied by predicate-filtering style database queries, among others. With these definitions in mind, let's turn to solving these query sampling problems. First, we note that many SSIs have a sampling procedure that naturally involves two phases: some preliminary work is done to determine metadata concerning the set of records to sample from, and then $k$ samples are drawn from the structure, taking advantage of this metadata. If we represent the time cost of the preliminary work with $P(n)$ and the cost of drawing a sample with $S(n)$, then these structures' query cost functions are of the form, \begin{equation*} \mathscr{Q}(n, k) = P(n) + k S(n) \end{equation*} Consider an arbitrary decomposable sampling problem with a cost function of the above form, $X(\mathscr{I}, F, q, k)$, which draws a sample of $k$ records from $d \subseteq \mathcal{D}$ using an instance of an SSI $\mathscr{I} \in \mathcal{I}$. Applying dynamization results in $d$ being split across $m$ disjoint instances of $\mathcal{I}$ such that $d = \bigcup_{i=1}^m \text{unbuild}(\mathscr{I}_i)$ and $\text{unbuild}(\mathscr{I}_i) \cap \text{unbuild}(\mathscr{I}_j) = \emptyset$ for all $i \neq j$. If we consider a Bentley-Saxe dynamization of such a structure, the $\mergeop$ operation would be a $\Theta(k)$ down-sampling. Thus, the total query cost of such a structure would be, \begin{equation*} \Theta\left(\log_2 n \left( P(n) + k S(n) + k\right)\right) \end{equation*} This cost function is sub-optimal for two reasons. First, we pay extra cost to merge the result sets together because of the down-sampling combination operator. Second, this formulation fails to avoid a per-sample dependence on $n$, even in the case where $S(n) \in \Theta(1)$. This gets even worse when considering rejections that may occur as a result of deleted records.
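Before describing how these costs can be avoided, it is useful to make the assumed two-phase interface concrete. The following C++ sketch is purely illustrative, with hypothetical names rather than the interface of any particular SSI: \texttt{preprocess} corresponds to the $P(n)$-cost phase (and exposes the total weight of the records eligible for the query, corresponding to the weight-determination cost $W(n)$ in Table~\ref{tab:nomen}), while \texttt{sample\_one} corresponds to the $S(n)$-cost per-sample phase.
\begin{verbatim}
// Illustrative two-phase sampling interface (hypothetical names).
#include <random>

template <typename Record, typename QueryParams>
class SamplingSSI {
public:
    // Metadata produced by the P(n)-cost preprocessing phase.
    struct Preproc {
        double weight;  // total weight of records matching the query
        // ... structure-specific state, e.g., a subtree root for IRS
    };

    // P(n): identify the portion of the structure relevant to the query.
    virtual Preproc preprocess(const QueryParams& q) const = 0;

    // S(n): draw a single record, independently, using that metadata.
    virtual Record sample_one(const Preproc& p, std::mt19937& rng) const = 0;

    virtual ~SamplingSSI() = default;
};
\end{verbatim}
The important structural point is that \texttt{preprocess} is paid once per block per query, while \texttt{sample\_one} is paid once per sample; the procedure described next exploits this separation.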
Recall from Section~\ref{ssec:background-deletes} that deletion can be supported using weak deletes or a shadow structure in a Bentley-Saxe dynamization. Using either approach, it isn't possible to avoid deleted records in advance when sampling, and so these will need to be rejected and retried. In the DSP model, this retry will need to reprocess every block a second time; a rejected sample cannot simply be retried in place without introducing bias into the result set. We will discuss this further in Section~\ref{ssec:sampling-deletes}. \begin{figure} \centering \includegraphics[width=\textwidth]{img/sigmod23/sampling} \caption{\textbf{Overview of the multiple-block query sampling process} for Example~\ref{ex:sample} with $k=1000$. First, (1) the normalized weights of the shards are determined, then (2) these weights are used to construct an alias structure. Next, (3) the alias structure is queried $k$ times to determine per-shard sample sizes, and then (4) sampling is performed. Finally, (5) any rejected samples are retried starting from the alias structure, and the process is repeated until the desired number of samples has been retrieved.} \label{fig:sample} \end{figure} The key insight that allowed us to solve this particular problem was that there is a mismatch between the structure of the sampling query process and the structure assumed by DSPs. Using an SSI to answer a sampling query results in a naturally two-phase process, but DSPs are assumed to be single-phase. We can construct a more effective process for answering such queries based on a multi-stage process, summarized in Figure~\ref{fig:sample}. \begin{enumerate} \item Perform the query pre-processing work, and determine each block's respective weight under the query to be sampled from (e.g., the number of records falling into the query range for IRS). \item Build a temporary alias structure over these weights. \item Query the alias structure $k$ times to determine how many samples to draw from each block. \item Draw the appropriate number of samples from each block and merge them together to form the final query result, using any necessary pre-processing results in the process. \end{enumerate} It is possible that some of the records sampled in Step 4 must be rejected, either because of deletes or some other property of the sampling procedure being used. If $r$ records are rejected, the above procedure can be repeated from Step 3, taking $r$ as the number of times to query the alias structure, without needing to redo any of the pre-processing steps. This can be repeated as many times as necessary until the required $k$ records have been sampled. \begin{example} \label{ex:sample} Consider executing a WSS query, with $k=1000$, across three blocks containing integer keys with unit weight. $\mathscr{I}_1$ contains only the key $-2$, $\mathscr{I}_2$ contains all integers on $[1,100]$, and $\mathscr{I}_3$ contains all integers on $[101, 200]$. These structures are shown in Figure~\ref{fig:sample}. Sampling is performed by first determining the normalized weights for each block: $w_1 = 0.005$, $w_2 = 0.4975$, $w_3 = 0.4975$, which are then used to construct a block alias structure. The block alias structure is then queried $k$ times, resulting in a distribution of $k_i$s that is commensurate with the relative weights of each block. Finally, each block is queried in turn to draw the appropriate number of samples.
\end{example} Assuming a Bentley-Saxe decomposition with $\log n$ blocks and a constant number of repetitions, the cost of answering a decomposable sampling query having a pre-processing cost of $P(n)$, a weight-determination cost of $W(n)$, and a per-sample cost of $S(n)$ will be, \begin{equation} \label{eq:dsp-sample-cost} \boxed{ \mathscr{Q}(n, k) \in \Theta \left( (P(n) + W(n)) \log_2 n + k S(n) \right) } \end{equation} where the cost of building the alias structure is $\Theta(\log_2 n)$ and thus absorbed into the pre-processing cost. For the SSIs discussed in this chapter, which have $S(n) \in \Theta(1)$, this model provides us with the desired decoupling of the data size ($n$) from the per-sample cost. Additionally, for all of the SSIs considered in this chapter, the weights either can be determined in $W(n) \in \Theta(1)$ time, or are naturally determined as part of the pre-processing, and thus the $W(n)$ term can be merged into $P(n)$. \subsection{Supporting Deletes} \label{ssec:sampling-deletes} As discussed in Section~\ref{ssec:background-deletes}, the Bentley-Saxe method can support deleting records through the use of either weak deletes or a secondary ghost structure, assuming certain properties are satisfied by either the search problem or the data structure. Unfortunately, neither approach can work as a ``drop-in'' solution in the context of sampling problems, because of the way that deleted records interact with the sampling process itself. Sampling problems, as formalized here, are neither invertible nor deletion decomposable. In this section, we'll discuss our mechanisms for supporting deletes, as well as how these can be handled during sampling while maintaining correctness. Because both deletion policies have advantages in certain contexts, we decided to support both. Specifically, we propose two mechanisms for deletes, which are \begin{enumerate} \item \textbf{Tagged Deletes.} Each record in the structure includes a header with a visibility bit. On delete, the structure is searched for the record, and the bit is set to indicate that it has been deleted. This mechanism is used to support \emph{weak deletes}. \item \textbf{Tombstone Deletes.} On delete, a new record is inserted into the structure with a tombstone bit set in the header. This mechanism is used to support \emph{ghost structure} based deletes. \end{enumerate} Broadly speaking, tombstone deletes cause a number of problems for sampling because \emph{sampling problems are not invertible}. This limitation can be worked around during the query process if desired. Tagging is much more natural for these search problems. However, the flexibility of selecting either option is desirable because of their different performance characteristics. While tagging is a fairly direct method of implementing weak deletes, tombstones are sufficiently different from the traditional ghost structure system that it is worth motivating the decision to use them here. One of the major limitations of the ghost structure approach for handling deletes is that there is not a principled method for removing deleted records from the decomposed structure. The standard approach is to set an arbitrary threshold on the number of deleted records, and rebuild the entire structure when that threshold is crossed~\cite{saxe79}. Mixing the ``ghost'' records into the same structures as the original records allows deleted records to be cleaned up naturally over time, as they meet their tombstones during reconstructions.
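Both mechanisms can be implemented by attaching a small header to each record. The following C++ sketch is purely illustrative (hypothetical names and layout, not the framework's actual record format):
\begin{verbatim}
#include <cstdint>

// Illustrative wrapped record with a per-record header (hypothetical layout).
template <typename Record>
struct Wrapped {
    static constexpr uint32_t TOMBSTONE = 0x1;  // record is a tombstone
    static constexpr uint32_t DELETED   = 0x2;  // record has been tagged deleted

    uint32_t header = 0;
    Record   rec;

    bool is_tombstone() const { return (header & TOMBSTONE) != 0; }
    bool is_deleted()   const { return (header & DELETED) != 0; }
    void set_delete_tag()     { header |= DELETED; }
};
\end{verbatim}
Under the tombstone policy, a delete inserts a second copy of the record with the \texttt{TOMBSTONE} bit set, leaving existing structures untouched; under tagging, the \texttt{DELETED} bit of the already-stored record is set in place.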
This gradual, in-structure cancellation of deleted records has important consequences that will be discussed in more detail in Section~\ref{sssec:sampling-rejection-bound}. The two mechanisms trade off between two relevant aspects of performance: the cost of performing the delete, and the cost of checking whether a sampled record has been deleted. In addition to these, the use of tombstones also makes supporting concurrency and external data structures far easier. This is because tombstone deletes are simple inserts, and thus they leave the individual structures immutable. Tagging requires in-place updates of the record header in the structures, resulting in possible race conditions and random I/O operations on disk. This makes tombstone deletes particularly attractive in these contexts. \subsubsection{Deletion Cost} \label{sssec:sampling-delete-cost} We will first consider the cost of performing a delete using either mechanism. \Paragraph{Tombstone Deletes.} The cost of a tombstone delete in a Bentley-Saxe dynamization is the same as that of a simple insert, \begin{equation*} \mathscr{D}_A(n) \in \Theta\left(\frac{B(n)}{n} \log_2 n\right) \end{equation*} with the worst-case cost being $\Theta(B(n))$. Note that there is also a minor performance effect resulting from deleted records appearing twice within the structure, once for the original record and once for the tombstone, inflating the overall size of the structure. \Paragraph{Tagged Deletes.} In contrast to tombstone deletes, tagged deletes are not simple inserts, and so have their own cost function. The process of deleting a record under tagging consists of first searching the entire structure for the record to be deleted, and then setting a bit in its header. As a result, the performance of this operation is a function of how expensive it is to locate an individual record within the decomposed data structure. In the theoretical literature, this lookup operation is provided by a global hash table built over every record in the structure, mapping each record to the block that contains it. The data structure's weak delete operation can then be applied to the relevant block~\cite{merge-dsp}. While this is certainly an option for us, we note that the SSIs we are currently considering all support a reasonably efficient $\Theta(\log n)$ lookup operation already, and so we have elected to design tagged deletes to leverage this operation when available, rather than maintaining a global hash table. If a given SSI has a point-lookup cost of $L(n)$, then a tagged delete on a Bentley-Saxe decomposition of that SSI will require, at worst, executing a point-lookup on each block, with a total cost of \begin{equation*} \mathscr{D}(n) \in \Theta\left( L(n) \log_2 n\right) \end{equation*} If the SSI being considered does \emph{not} support an efficient point-lookup operation, then a hash table can be used instead. We consider individual hash tables associated with each block, rather than a single global one, for simplicity of implementation and analysis. In these cases, the same procedure as above can be used, with $L(n) \in \Theta(1)$. \begin{figure} \centering \subfloat[Tombstone Rejection Check]{\includegraphics[width=.5\textwidth]{img/sigmod23/delete-tombstone} \label{fig:delete-tombstone}} \subfloat[Tagging Rejection Check]{\includegraphics[width=.5\textwidth]{img/sigmod23/delete-tagging} \label{fig:delete-tag}} \caption{\textbf{Overview of the rejection check procedure for deleted records.} First, a record is sampled (1).
When using the tombstone delete policy (Figure~\ref{fig:delete-tombstone}), the rejection check starts by (2) querying the Bloom filter of the mutable buffer. The filter indicates the record is not present, so (3) the filter on $L_0$ is queried next. This filter returns a false positive, so (4) a point-lookup is executed against $L_0$. The lookup fails to find a tombstone, so the search continues and (5) the filter on $L_1$ is checked, which reports that the tombstone is present. This time, it is not a false positive, and so (6) a lookup against $L_1$ (7) locates the tombstone. The record is thus rejected. When using the tagging policy (Figure~\ref{fig:delete-tag}), (1) the record is sampled and (2) checked directly for the delete tag. It is set, so the record is immediately rejected.} \label{fig:delete} \end{figure} \subsubsection{Rejection Check Costs} Because sampling queries are neither invertible nor deletion decomposable, the query process must be modified to support deletes using either of the above mechanisms. This modification requires that each sampled record be checked to confirm that it hasn't been deleted, prior to adding it to the sample set. We call the cost of this operation the \emph{rejection check cost}, $R(n)$. The process differs between the two deletion mechanisms, and the two procedures are summarized in Figure~\ref{fig:delete}. For tagged deletes, this is a simple process. The information about the deletion status of a given record is stored directly alongside the record, within its header. So, once a record has been sampled, this check can be performed immediately in $R(n) \in \Theta(1)$ time. Tombstone deletes, however, introduce a significant difficulty in performing the rejection check. The information about whether a record has been deleted is not local to the record itself, and therefore a point-lookup is required to search for the tombstone associated with each sample. Thus, the rejection check cost when using tombstones to implement deletes over a Bentley-Saxe decomposition of an SSI is, \begin{equation} R(n) \in \Theta( L(n) \log_2 n) \end{equation} This performance cost seems catastrophically bad, considering it must be paid per sample, but there are ways to mitigate it. We will discuss these mitigations in more detail during our discussion of the implementation of these results in Section~\ref{sec:sampling-implementation}. \subsubsection{Bounding Rejection Probability} \label{sssec:sampling-rejection-bound} When a sampled record has been rejected, it must be re-sampled. This introduces performance overhead resulting from extra memory accesses and random number generation, and hurts our ability to provide performance bounds on our sampling operations. In the worst case, a structure may consist mostly or entirely of deleted records, resulting in a potentially unbounded number of rejections during sampling. Thus, in order to maintain sampling performance bounds, the probability of a rejection during sampling must be bounded. The reconstructions associated with Bentley-Saxe dynamization give us a natural way of controlling the number of deleted records within the structure, and thereby bounding the rejection rate: during reconstructions, we have the opportunity to remove deleted records. Doing so will, however, cause the record counts associated with each block of the structure to gradually drift out of alignment with the ``perfect'' powers of two associated with the Bentley-Saxe method.
In the theoretical literature on this topic, the solution to this size drift is to periodically re-partition all of the records to re-align the block sizes~\cite{merge-dsp, saxe79}. This approach could also be easily applied here, if desired, though we do not do so in our implementations, for reasons that will be discussed in Section~\ref{sec:sampling-implementation}. The process of removing these deleted records during reconstructions is different for the two mechanisms. Tagged deletes are straightforward, because all tagged records can simply be dropped when they are involved in a reconstruction. Tombstones, however, require a slightly more complex approach. Rather than being dropped immediately, a deleted record can only be removed when it and its tombstone are involved in the \emph{same} reconstruction, at which point both can be dropped. We call this process \emph{tombstone cancellation}. In the general case, it can be implemented using a preliminary linear pass over the records involved in a reconstruction to identify the records to be dropped, but in many cases reconstruction involves sorting the records anyway, and by taking care with ordering semantics, tombstones and their associated records can be sorted into adjacent spots, allowing them to be dropped efficiently during reconstruction without any extra overhead. While the dropping of deleted records during reconstruction helps, it is not sufficient on its own to ensure a particular bound on the number of deleted records within the structure. Pathological scenarios resulting in unbounded rejection rates, even in the presence of this mitigation, are possible. For example, tagging alone will never trigger reconstructions, and so it would be possible to delete every single record within the structure without a reconstruction ever occurring; with tombstones, records could be deleted in the reverse order of their insertion. In either case, a passive system of dropping records during reconstruction is not sufficient. Fortunately, this passive system can be used as the basis for a system that does provide a bound. This is because it guarantees, whether tagging or tombstones are used, that any given deleted record will \emph{eventually} be canceled out after a finite number of reconstructions. If the number of deleted records gets too high, some or all of them can be cleared out by proactively performing reconstructions. We call these proactive reconstructions \emph{compactions}. The basic strategy, then, is to define a maximum allowable proportion of deleted records, $\delta \in [0, 1]$. Each block in the decomposition tracks the number of tombstones or tagged records within it. This count can easily be maintained by incrementing a counter when a record in the block is tagged, and by counting tombstones during reconstructions. These counts are then monitored, and if the proportion of deletes in a block ever exceeds $\delta$, a proactive reconstruction including this block and one or more blocks below it in the structure is triggered. The proportion of the newly compacted block can then be checked again, and this process repeated until all blocks respect the bound. For tagging, a single round of compaction will always suffice, because all deleted records involved in the reconstruction will be dropped.
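The trigger logic itself amounts to a simple loop over the blocks, repeated until every one respects the bound. The following C++ sketch is illustrative only: \texttt{Level} and \texttt{compact\_into\_next\_level} are hypothetical stand-ins, and the reconstruction itself is elided.
\begin{verbatim}
#include <cstddef>
#include <vector>

// Minimal stand-in for a level's delete-tracking counters (illustrative).
struct Level {
    std::size_t deletes = 0;  // tombstones or tagged records on this level
    std::size_t records = 0;  // total records on this level
};

// Hypothetical proactive reconstruction of level i into the level below it.
void compact_into_next_level(std::vector<Level>& levels, std::size_t i);

// Repeat proactive compaction until every level satisfies the bound delta.
void enforce_delete_bound(std::vector<Level>& levels, double delta) {
    bool over_bound = true;
    while (over_bound) {
        over_bound = false;
        for (std::size_t i = 0; i < levels.size(); i++) {
            if (levels[i].records == 0) continue;
            double prop = static_cast<double>(levels[i].deletes)
                          / levels[i].records;
            if (prop > delta) {
                compact_into_next_level(levels, i);  // may need to cascade
                over_bound = true;
                break;  // re-check from the top after the structure changes
            }
        }
    }
}
\end{verbatim}
How many iterations this loop requires depends on the delete mechanism in use.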
Tombstones may require multiple cascading rounds of compaction, because a tombstone record will only cancel when it encounters the record that it deletes. However, because tombstones always follow the record they delete in insertion order, and will therefore always be ``above'' that record in the structure, each reconstruction will move every tombstone involved closer to the record it deletes, ensuring that eventually the bound will be satisfied. Asymptotically, this compaction process will not affect the amortized insertion cost of the structure. This is because the cost is based on the number of reconstructions that a given record is involved in over the lifetime of the structure. Proactive compaction does not increase the number of reconstructions, only \emph{when} they occur. \subsubsection{Sampling Procedure with Deletes} \label{ssec:sampling-with-deletes} Because sampling is neither deletion decomposable nor invertible, the presence of deletes will have an effect on the query costs. As already mentioned, the basic cost of supporting deletes is a rejection check for each sampled record. When a record is sampled, it must be checked to determine whether it has been deleted; if it has, then it must be rejected. Note that when this rejection occurs, the sample cannot be retried immediately on the same block; rather, a new block must be selected to sample from. This is because deleted records are not accounted for in the weight calculations, and so retrying in place could introduce bias. As a straightforward example of this problem, consider a block that contains only deleted records. Any sample drawn from this block will be rejected, and so retrying samples against this block will result in an infinite loop. Assuming the compaction strategy mentioned in the previous section is applied, ensuring that at most a $\delta$ proportion of the records in the structure are deleted, and assuming all records have an equal probability of being sampled, the cost of answering sampling queries accounting for rejections is,
\begin{equation*}
%\label{eq:sampling-cost-del}
\mathscr{Q}(n, k) = \Theta\left(\left[W(n) + P(n)\right]\log_2 n + \frac{k}{1 - \delta}\left[S(n) + R(n)\right]\right)
\end{equation*}
where $\frac{k}{1 - \delta}$ is the expected number of samples that must be drawn to obtain a sample set of size $k$. \subsection{Performance Tuning and Configuration} The last of the desiderata referenced earlier in this chapter for our dynamized sampling indices is tunable performance. The base Bentley-Saxe method has a highly rigid reconstruction policy that, while theoretically convenient, does not lend itself to performance tuning. However, it can be readily modified to form a more relaxed policy that is both tunable and generally more performant, at the cost of some additional theoretical complexity. There has been some theoretical work in this area, based upon nesting instances of the equal block method within the Bentley-Saxe method~\cite{overmars81}, but these methods are unwieldy and are targeted at tuning the worst case at the expense of the common case. We will take a different approach to adding configurability to our dynamization system. Though it has thus far gone unmentioned, some readers may have noted the astonishing similarity between decomposition-based dynamization techniques and a data structure called the Log-structured Merge-tree (LSM Tree). First proposed by O'Neil in the mid '90s~\cite{oneil96}, the LSM Tree was designed to optimize write throughput for external data structures.
It accomplished this by buffering inserted records in a small in-memory AVL Tree, and then flushing this buffer to disk when it filled up. The flush process itself would fully rebuild the on-disk structure (a B+Tree), including all of the currently existing records on external storage. O'Neil also proposed a version which used several layered external structures to reduce the cost of reconstruction. In more recent times, the LSM Tree has seen significant development and been used as the basis for key-value stores like RocksDB~\cite{dong21} and LevelDB~\cite{leveldb}. This work has produced an incredibly large and well-explored parametrization of the reconstruction procedures of LSM Trees, a good summary of which can be found in a recent tutorial paper~\cite{sarkar23}. Examples of this design space exploration include: different ways to organize each ``level'' of the tree~\cite{dayan19, dostoevsky, autumn}, different growth rates, buffering, sub-partitioning of structures to allow finer-grained reconstruction~\cite{dayan22}, and approaches for allocating resources to auxiliary structures attached to the main ones for accelerating certain types of query~\cite{dayan18-1, zhu21, monkey}. Many of the elements within the LSM Tree design space are based upon the specifics of the data structure itself, and are not generally applicable. However, some of the higher-level concepts can be imported and applied in the context of dynamization. Specifically, we have decided to import the following four elements for use in our dynamization technique, \begin{itemize} \item A small dynamic buffer into which new records are inserted \item A variable growth rate, called the \emph{scale factor} \item The ability to attach auxiliary structures to each block \item Two different strategies for reconstructing data structures \end{itemize} This design space and its associated trade-offs will be discussed in more detail in Chapter~\ref{chap:design-space}, but we'll describe it briefly here. \Paragraph{Buffering.} In the standard Bentley-Saxe method, each insert triggers a reconstruction. Many of these are quite small, but it still makes most insertions somewhat expensive. By adding a small buffer, a large number of inserts can be performed without requiring any reconstructions at all. For generality, we elected to use an unsorted array as our buffer, as dynamic versions of the structures we are dynamizing may not exist. This introduces some query cost, as queries must be answered from these unsorted records as well, but in the case of sampling this isn't a serious problem. The implications of this will be discussed in Section~\ref{ssec:sampling-cost-funcs}. The size of this buffer, $N_B$, is a user-specified constant, and all block capacities are multiplied by it. In the Bentley-Saxe method, the $i$th block contains $2^i$ records; in our scheme, with buffering, this becomes $N_B \cdot 2^i$ records in the $i$th block. We call this unsorted array the \emph{mutable buffer}. \Paragraph{Scale Factor.} In the Bentley-Saxe method, each block is twice as large as the block that precedes it. There is, however, no reason why this growth rate cannot be adjusted. In our system, we make the growth rate a user-specified constant called the \emph{scale factor}, $s$, such that the $i$th level contains $N_B \cdot s^i$ records. \Paragraph{Auxiliary Structures.} In Section~\ref{ssec:sampling-deletes}, we encountered two problems relating to supporting deletes that can be resolved through the use of auxiliary structures.
First, regardless of whether tagging or tombstones are used, the data structure requires support for an efficient point-lookup operation. Many SSIs are tree-based and thus support this, but not all data structures do. In such cases, the point-lookup operation could be provided by attaching an auxiliary hash table to the data structure that maps records to their location in the SSI. We use the term \emph{shard} to refer to the combination of a block with these optional auxiliary structures. In addition, the tombstone deletion mechanism requires performing a point lookup for every record sampled, to validate that it has not been deleted. This introduces a large amount of overhead into the sampling process, as it requires searching each block in the structure. One approach that can be used to improve the performance of these searches, without requiring as much storage as adding auxiliary hash tables to every block, is to include Bloom filters~\cite{bloom70}. A Bloom filter is an approximate data structure that answers tests of set membership with bounded, single-sided error. These are commonly used in LSM Trees to accelerate point lookups by allowing levels that don't contain the record being searched for to be skipped. In our case, we only care about tombstone records, so rather than building these filters over all records, we can build them over tombstones. This approach can greatly improve the sampling performance of the structure when tombstone deletes are used. \Paragraph{Layout Policy.} The Bentley-Saxe method considers blocks individually, without any other organization beyond increasing size. In contrast, LSM Trees have multiple layers of structural organization. The top-level structural unit is the level, upon which record capacity restrictions are applied. Levels are then partitioned into individual structures, which can be further organized by key range. Because our intention is to support general data structures, which may or may not be easily partitioned by a key, we will not consider the finest grain of partitioning. However, we can borrow the concept of levels, and lay out shards in these levels according to different strategies. Specifically, we consider two layout policies. First, we can allow a single shard per level, a policy called \emph{Leveling}. This approach is traditionally read-optimized, as it generally results in fewer shards within the overall structure for a given scale factor. Under leveling, the $i$th level has a capacity of $N_B \cdot s^{i+1}$ records. We can also allow multiple shards per level, resulting in a write-optimized policy called \emph{Tiering}. In tiering, each level can hold up to $s$ shards, each with up to $N_B \cdot s^i$ records. Note that this doesn't alter the overall record capacity of each level relative to leveling, only the way the records are divided up into shards. \section{Practical Dynamization Framework} Based upon the results discussed in the previous section, we are now ready to discuss the dynamization framework that we have produced for adding update support to SSIs. This framework allows us to achieve all three of our desiderata, at least for certain configurations, and provides a wide range of performance tuning options to the user. \subsection{Requirements} The requirements that the framework places upon SSIs are rather modest. The sampling problem being considered must be a decomposable sampling problem (Definition~\ref{def:decomp-sampling}) and the SSI must support the \texttt{build} and \texttt{unbuild} operations.
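For concreteness, the shape of the expected interface is sketched below in C++. The names are hypothetical and the sketch is illustrative only; the optional operations mentioned next are shown alongside the required ones.
\begin{verbatim}
#include <vector>

// Illustrative shape of the operations the framework expects from an SSI
// (hypothetical names, declarations only).
template <typename Record>
struct ShardInterface {
    // Required: construct a static instance from unsorted records (B(n)).
    static ShardInterface build(const std::vector<Record>& records);

    // Required: return the records contained in this instance.
    std::vector<Record> unbuild() const;

    // Optional: construct directly from existing instances, if this is
    // cheaper than unbuild-then-build (B_M(n)).
    static ShardInterface build_from(
        const std::vector<const ShardInterface*>& shards);

    // Optional: point lookup, used for tagged deletes and tombstone
    // rejection checks (L(n)).
    const Record* point_lookup(const Record& rec) const;
};
\end{verbatim}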
Optionally, if the SSI supports point lookups, or if the SSI can be constructed from multiple instances of the SSI more efficiently than by its normal static construction, these two operations can be leveraged by the framework. However, these are not requirements, as the framework provides facilities to work around their absence. \captionsetup[subfloat]{justification=centering} \begin{figure*} \centering \subfloat[Leveling]{\includegraphics[width=.5\textwidth]{img/sigmod23/merge-leveling} \label{fig:leveling}} \subfloat[Tiering]{\includegraphics[width=.5\textwidth]{img/sigmod23/merge-tiering} \label{fig:tiering}} \caption{\textbf{A graphical overview of our dynamization framework.} A mutable buffer (MB) sits atop two levels (L0, L1) containing shards (pairs of SSIs and auxiliary structures [A]) using the leveling (Figure~\ref{fig:leveling}) and tiering (Figure~\ref{fig:tiering}) layout policies. Records are represented as black/colored squares, and grey squares represent unused capacity. An insertion requiring a multi-level reconstruction is illustrated.} \label{fig:sampling-framework} \end{figure*} \subsection{Framework Construction} The framework itself is shown in Figure~\ref{fig:sampling-framework}, along with some of its configuration parameters and its insert procedure (which will be discussed in the next section). It consists of an unsorted array with a capacity of $N_B$ records, sitting atop a sequence of \emph{levels}, each containing SSIs according to the layout policy. If leveling is used, level $i$ will contain a single SSI with up to $N_B \cdot s^{i+1}$ records. If tiering is used, level $i$ will contain up to $s$ SSIs, each with up to $N_B \cdot s^i$ records. The scale factor, $s$, controls the rate at which the capacity of each level grows. The framework supports deletes using either the tombstone or tagging policy, which can be selected by the user according to her preference. To support these delete mechanisms, each record contains an attached header with bits to indicate its tombstone or delete status. \subsection{Supported Operations and Cost Functions} \Paragraph{Insert.} Inserting a record into the dynamization involves appending it to the mutable buffer, which requires $\Theta(1)$ time. When the buffer reaches its capacity, it must be flushed into the structure itself before any further records can be inserted. First, a shard is constructed from the records in the buffer using the SSI's \texttt{build} operation, with $B(N_B)$ cost. This shard is then merged into the levels below it, which may require further reconstructions to make room. The manner in which these reconstructions proceed depends on the layout policy, \begin{itemize} \item \textbf{Leveling.} When a buffer flush occurs under the leveling policy, the system scans the existing levels to find the first level which has sufficient empty space to store the contents of the level above it. More formally, if the number of records in level $i$ is $N_i$, then $i$ is determined such that $N_i + N_B\cdot s^{i} \leq N_B \cdot s^{i+1}$. If no level exists that satisfies this record count constraint, then an empty level is added and $i$ is set to the index of this new level. Then, a reconstruction is executed containing all of the records in levels $i$ and $i - 1$ (where level $-1$ denotes the temporary shard built from the buffer). Following this reconstruction, all levels $j < i$ are shifted by one level to $j + 1$.
\item \textbf{Tiering.} When using tiering, the system will locate the first level, $i$, containing fewer than $s$ shards. If no such level exists, then a new empty level is added and $i$ is set to the index of that level. Then, for each level $j < i$, a reconstruction is performed involving all $s$ shards on that level. The resulting new shard is then placed into level $j + 1$, and level $j$ is emptied. Following this, the newly created shard from the buffer is appended to level $0$. \end{itemize} In either case, the reconstructions all use existing shards as input, and so if the SSI supports more efficient construction from existing instances (with $B_M(n)$ cost), this routine can be used here. Once all of the necessary reconstructions have been performed, each level is checked to verify that the proportion of tombstones or deleted records is less than $\delta$. If this condition fails, then a proactive compaction is triggered. This compaction involves performing the reconstructions necessary to move the shard violating the delete bound down one level. Once the compaction is complete, the delete proportions are checked again, and this process is repeated until all levels satisfy the bound. Following this procedure, inserts have a worst-case cost of $I(n) \in \Theta(B_M(n))$, equivalent to the Bentley-Saxe method. The amortized cost can be determined by finding the total cost of reconstructions involving each record and amortizing it over each insert. The cost of an insert is composed of three parts, \begin{enumerate} \item The cost of appending to the buffer \item The cost of flushing the buffer to a shard \item The total cost of the reconstructions the record is involved in over the lifetime of the structure \end{enumerate} The first cost is constant and the second is $B(N_B)$. Regardless of layout policy, there will be $\Theta(\log_s(n))$ total levels, and the record will, at worst, be written a constant number of times to each level, resulting in a maximum of $\Theta(\log_s(n)B_M(n))$ cost associated with these reconstructions. Thus, the total cost associated with each record in the structure is, \begin{equation*} \Theta(1) + \Theta(B(N_B)) + \Theta(\log_s(n)B_M(n)) \end{equation*} Assuming that $N_B \ll n$, the first two terms of this expression are constant. Dropping them and amortizing the result over $n$ records gives us the amortized insertion cost, \begin{equation*} I_a(n) \in \Theta\left(\frac{B_M(n)}{n}\log_s(n)\right) \end{equation*} If the SSI being considered does not support a more efficient construction procedure from other instances of the same SSI, and the general Bentley-Saxe \texttt{unbuild} and \texttt{build} operations must be used, then the cost becomes $I_a(n) \in \Theta\left(\frac{B(n)}{n}\log_s(n)\right)$ instead. \Paragraph{Delete.} The framework supports both tombstone and tagged deletes, each with different performance. Using tombstones, the cost of a delete is identical to that of an insert. When using tagging, the cost of a delete is the same as the cost of a point lookup, as the ``delete'' itself simply sets a bit in the header of the record once it has been located. There will be $\Theta(\log_s n)$ total shards in the structure, each with a look-up cost of $L(n)$ using either the SSI's native point-lookup or an auxiliary hash table, and the lookup must also scan the buffer in $\Theta(N_B)$ time.
Thus, the worst-case cost of a tagged delete is, \begin{equation*} D(n) = \Theta(N_B + L(n)\log_s(n)) \end{equation*} \Paragraph{Update.} Given the above definitions of insert and delete, in-place updates of records can be supported by first deleting the record to be updated, and then inserting the updated value as a new record. Thus, the update cost is $\Theta(I(n) + D(n))$. \Paragraph{Sampling.} Answering sampling queries from this structure is largely the same as was discussed for a standard Bentley-Saxe dynamization in Section~\ref{ssec:sampling-with-deletes}, with the addition of a need to sample from the unsorted buffer as well. There are two approaches for sampling from the buffer. The most general approach is to temporarily build an SSI over the records within the buffer, and then treat it as a normal shard for the remainder of the sampling procedure. In this case, the sampling algorithm remains identical to the algorithm discussed in Section~\ref{ssec:decomposed-structure-sampling}, following the construction of the temporary shard. This results in a worst-case sampling cost of,
\begin{equation*}
\mathscr{Q}(n, k) = \Theta\left(B(N_B) + \left[W(n) + P(n)\right]\log_2 n + \frac{k}{1 - \delta}\left[S(n) + R(n)\right]\right)
\end{equation*}
In practice, however, it is often possible to perform rejection sampling against the buffer, without needing to do any additional work to prepare it. In this case, the full weight of the buffer can be used to determine how many samples to draw from it, and then these samples can be obtained using standard rejection sampling to both control the weight and enforce any necessary predicates. Because $N_B \ll n$, the probability of sampling from the buffer is quite low and the cost of doing so is constant, so this procedure introduces no more than constant overhead into the sampling process. The overall query cost when rejection sampling is possible is therefore,
\begin{equation*}
\mathscr{Q}(n, k) = \Theta\left(\left[W(n) + P(n)\right]\log_2 n + \frac{k}{1 - \delta}\left[S(n) + R(n)\right]\right)
\end{equation*}
In both cases, $R(n) \in \Theta(1)$ for tagged deletes, and $R(n) \in \Theta(N_B + L(n) \log_s n)$ for tombstone deletes (including the cost of searching the buffer for the tombstone).
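To summarize the query path, the following C++ sketch shows roughly how the multi-shard sampling procedure with rejections might be organized. It is a simplified illustration under the assumptions of this section, not the framework's implementation: the shard interface and rejection check are hypothetical, buffer handling is omitted, and \texttt{std::discrete\_distribution} stands in for the alias structure (both provide weighted selection). Samples are also drawn one at a time here, whereas the batched formulation described earlier is equivalent in distribution.
\begin{verbatim}
#include <cstddef>
#include <random>
#include <vector>

// Sketch of the multi-shard sampling procedure with rejections.
// ShardT is assumed to provide preprocess(q), weight(meta), and
// sample(meta, rng); RejectFn returns true if a record must be rejected.
template <typename RecordT, typename ShardT, typename QueryT, typename RejectFn>
std::vector<RecordT> sample_query(const std::vector<ShardT>& shards,
                                  const QueryT& q, std::size_t k,
                                  RejectFn is_deleted, std::mt19937& rng) {
    using Meta = decltype(shards[0].preprocess(q));

    // (1) Per-shard preprocessing (P(n)) and weight determination (W(n)).
    std::vector<Meta> meta;
    std::vector<double> weights;
    for (const auto& s : shards) {
        meta.push_back(s.preprocess(q));
        weights.push_back(s.weight(meta.back()));
    }

    // (2) Weighted distribution over shards (stand-in for the alias structure).
    std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());

    // (3)-(5) Assign samples to shards, draw them (S(n) each), and reject
    // deleted records (R(n) each), retrying until k samples are collected.
    std::vector<RecordT> result;
    while (result.size() < k) {
        std::size_t i = pick(rng);
        RecordT r = shards[i].sample(meta[i], rng);
        if (!is_deleted(r)) {
            result.push_back(r);
        }
    }
    return result;
}
\end{verbatim}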