\section{Applications of the Framework}
\label{sec:instance}

Using the framework from the previous section, we can create dynamizations of SSIs for various sampling problems. In this section, we consider three different decomposable sampling problems and their associated SSIs, discussing the implementation details necessary to ensure that they work efficiently.

\subsection{Weighted Set Sampling (Alias Structure)}
\label{ssec:wss-struct}

As a first example, we consider the alias structure~\cite{walker74} for weighted set sampling. This is a static data structure that is constructible in $B(n) \in \Theta(n)$ time and is capable of answering sampling queries in $\Theta(1)$ time per sample. The structure does \emph{not} directly support point lookups, nor is it naturally sorted to allow for convenient tombstone cancellation. However, it places no requirements on the ordering of the underlying data, and so both of these limitations can be addressed by building it over a sorted array. This pre-sorting raises the cost of building from the buffer to $B(n) \in \Theta(n \log n)$; afterwards, however, a sorted merge can be used to perform reconstructions from the shards themselves. As the maximum number of shards involved in a reconstruction under either layout policy in our framework is $\Theta(1)$, reconstructions can be performed in $B_M(n) \in \Theta(n)$ time, including tombstone cancellation. The total weight of the structure can also be calculated at no additional cost during construction, allowing $W(n) \in \Theta(1)$ as well. Point lookups over the sorted data can be performed using a binary search in $L(n) \in \Theta(\log n)$ time, and sampling queries require no pre-processing, so $P(n) \in \Theta(1)$. The mutable buffer can be sampled using rejection sampling. This results in the following cost functions for the various operations supported by the dynamization,
\begin{align*}
\text{Amortized Insertion/Tombstone Delete:} \quad &\Theta\left(\log_s n\right) \\
\text{Worst-case Sampling:} \quad &\Theta\left(\log_s n + \frac{k}{1 - \delta}\cdot R(n)\right) \\
\text{Worst-case Tagged Delete:} \quad &\Theta\left(\log_s n \log n\right)
\end{align*}
where $R(n) \in \Theta(1)$ for tagging and $R(n) \in \Theta(\log_s n \log n)$ for tombstones.

\Paragraph{Sampling Rejection Rate Bound.} Bounding the number of deleted records is not, on its own, sufficient to bound the rejection rate of weighted sampling queries, because it does not account for the weights of the deleted records. Recall that, in our discussion of this bound, we assumed that all records had equal weight. Without this assumption, it is possible to construct adversarial cases in which a very heavily weighted record is deleted, resulting in it being preferentially sampled and repeatedly rejected. To ensure that our solution is robust even in the face of such adversarial workloads, for the weighted sampling case we introduce another compaction trigger based on the measured rejection rate of each level. We define the rejection rate of level $i$ as
\begin{equation*}
\rho_i = \frac{\text{rejections}}{\text{sampling attempts}}
\end{equation*}
and allow the user to specify a maximum rejection rate, $\rho$. If $\rho_i > \rho$ on a given level, then a proactive compaction is triggered. In the case of tagged deletes, the rejection rate of a level is based on the rejections resulting from sampling attempts on that level. This approach will \emph{not} work when using tombstones, however, as compacting the level containing the rejected record makes no progress towards eliminating that record from the structure. Instead, when using tombstones, the rejection rate is attributed to the level containing the tombstone that caused the rejection. This ensures that the tombstone is moved towards its associated record, and that the compaction makes progress towards removing it.
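The bookkeeping required by this trigger is simple. As one possible sketch, the following C++ fragment tracks per-level rejection rates and flags levels for proactive compaction; the \texttt{RejectionMonitor} interface and counter placement are illustrative rather than a description of our implementation.
\begin{verbatim}
#include <cstddef>
#include <vector>

// Per-level statistics backing the rejection-rate compaction trigger.
struct LevelStats {
    std::size_t attempts = 0;    // sampling attempts attributed to this level
    std::size_t rejections = 0;  // rejections attributed to this level

    double rejection_rate() const {
        return attempts == 0 ? 0.0
                             : static_cast<double>(rejections) / attempts;
    }
};

class RejectionMonitor {
public:
    explicit RejectionMonitor(double max_rate) : max_rate_(max_rate) {}

    void record_attempt(std::size_t level) {
        grow(level);
        stats_[level].attempts++;
    }

    // For tagged deletes, `level` is the level that was sampled from; for
    // tombstone deletes, it is the level containing the tombstone that
    // caused the rejection, so that compacting it makes progress.
    void record_rejection(std::size_t level) {
        grow(level);
        stats_[level].rejections++;
    }

    // True if rho_i exceeds the user-specified bound rho, signalling that
    // a proactive compaction of level i should be scheduled.
    bool needs_compaction(std::size_t level) const {
        return level < stats_.size() &&
               stats_[level].rejection_rate() > max_rate_;
    }

private:
    void grow(std::size_t level) {
        if (level >= stats_.size()) stats_.resize(level + 1);
    }

    double max_rate_;                // maximum rejection rate, rho
    std::vector<LevelStats> stats_;  // rho_i numerator and denominator
};
\end{verbatim}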
\subsection{Independent Range Sampling (ISAM Tree)}
\label{ssec:irs-struct}

We next consider independent range sampling. For this decomposable sampling problem, we use the ISAM tree as the SSI. Because our shards are static, we can build highly compact and efficient ISAM trees by storing the records directly in a sorted array. So long as the leaf node size is a multiple of the record size, this array can be treated as a sequence of leaf nodes in the tree, and internal nodes can be built above it using array indices as pointers. These internal nodes can also be constructed contiguously in an array, maximizing cache efficiency. Building this structure from the buffer requires first sorting the records and then performing a linear-time bulk load, and hence $B(n) \in \Theta(n \log n)$. However, sorted-array merges can be used for subsequent reconstructions, meaning that $B_M(n) \in \Theta(n)$. The data structure itself supports point lookups in $L(n) \in \Theta(\log n)$ time. IRS queries can be answered by first using two tree traversals to identify the minimum and maximum array indices associated with the query range in $\Theta(\log n)$ time, and then generating array indices within this range uniformly at random for each sample. The initial traversals can be considered preprocessing, so $P(n) \in \Theta(\log n)$. The weight of the shard for a given query is simply the difference between the upper and lower indices of the range (i.e., the number of records in the range), so $W(n) \in \Theta(1)$, and the per-sample cost is a single random number generation, so $S(n) \in \Theta(1)$. The mutable buffer can be sampled using rejection sampling. Accounting for all of these costs, the time complexities of the various operations are,
\begin{align*}
\text{Amortized Insertion/Tombstone Delete:} \quad &O\left(\log_s n\right) \\
\text{Worst-case Sampling:} \quad &O\left(\log_s n \log_f n + \frac{k}{1 - \delta}\cdot R(n)\right) \\
\text{Worst-case Tagged Delete:} \quad &O\left(\log_s n \log_f n\right)
\end{align*}
where $R(n) \in \Theta(1)$ for tagging, $R(n) \in \Theta(\log_s n \log_f n)$ for tombstones, and $f$ is the fanout of the ISAM tree.
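As an illustrative sketch of this query path, the following C++ fragment answers an IRS query against a single static shard whose records are stored in a sorted array (the leaf level of the ISAM tree). For brevity, the tree traversal that locates the range is replaced by a binary search over the leaf array, and the names and types are assumptions rather than a description of our implementation.
\begin{verbatim}
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Query state produced by per-shard preprocessing, P(n) in Theta(log n).
struct IRSQueryState {
    std::size_t lo;  // index of the first record >= the lower bound
    std::size_t hi;  // index one past the last record <= the upper bound
    std::size_t weight() const { return hi - lo; }  // W(n): records in range
};

// Locate the query range once, up front, over the sorted leaf array.
IRSQueryState preprocess(const std::vector<std::int64_t> &keys,
                         std::int64_t lower, std::int64_t upper) {
    std::size_t lo =
        std::lower_bound(keys.begin(), keys.end(), lower) - keys.begin();
    std::size_t hi =
        std::upper_bound(keys.begin(), keys.end(), upper) - keys.begin();
    return IRSQueryState{lo, hi};
}

// S(n) in Theta(1): each sample is a single uniform index draw in [lo, hi).
// Assumes the range is non-empty (weight() > 0).
std::size_t sample_one(const IRSQueryState &q, std::mt19937_64 &rng) {
    std::uniform_int_distribution<std::size_t> dist(q.lo, q.hi - 1);
    return dist(rng);
}
\end{verbatim}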
\subsection{Weighted Independent Range Sampling (Alias-augmented B+Tree)}
\label{ssec:wirs-struct}

As a final example of applying this framework, we consider WIRS. This is a decomposable sampling problem that can be answered using the alias-augmented B+tree structure~\cite{tao22,afshani17,hu14}. This data structure is built over sorted data, but can be bulk-loaded from that data in linear time, resulting in costs of $B(n) \in \Theta(n \log n)$ and $B_M(n) \in \Theta(n)$, though the constant factors associated with these functions are quite high, as each bulk load requires multiple linear-time passes to build both the B+tree and the alias structures, among other things. As it is built on a B+tree, the structure supports point lookups in $L(n) \in \Theta(\log n)$ time. Answering sampling queries requires $P(n) \in \Theta(\log n)$ pre-processing time to establish the query interval, during which the weight of the interval can also be calculated in $W(n) \in \Theta(1)$ time using the aggregate weight tags in the tree's internal nodes. After this, samples can be drawn in $S(n) \in \Theta(1)$ time. This results in the following costs,
\begin{align*}
\text{Amortized Insertion/Tombstone Delete:} \quad &O\left(\log_s n\right) \\
\text{Worst-case Sampling:} \quad &O\left(\log_s n \log_f n + \frac{k}{1 - \delta} \cdot R(n)\right) \\
\text{Worst-case Tagged Delete:} \quad &O\left(\log_s n \log_f n\right)
\end{align*}
where $R(n) \in O(1)$ for tagging, $R(n) \in O(\log_s n \log_f n)$ for tombstones, and $f$ is the fanout of the tree. As this is another weighted sampling problem, we also apply the same rejection-rate-based compaction trigger discussed in Section~\ref{ssec:wss-struct} for the dynamized alias structure.
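To show how these per-shard primitives compose, the following C++ sketch outlines a hypothetical query driver that distributes $k$ samples across shards in proportion to their weights and rejects deleted records. The \texttt{Shard} interface, the use of \texttt{std::discrete\_distribution} in place of a shard-selection alias structure, and the omission of the mutable buffer are simplifying assumptions; the retry loop corresponds to the $\frac{k}{1-\delta} \cdot R(n)$ term in the cost expressions above.
\begin{verbatim}
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

struct Record { std::int64_t key; double weight; };

// Minimal per-shard interface, assuming only the operations costed above:
// preprocess (P), weight (W), draw (S), and a deleted-record check (R).
struct Shard {
    virtual void preprocess(std::int64_t lower, std::int64_t upper) = 0;
    virtual double weight() const = 0;                     // W(n)
    virtual Record draw(std::mt19937_64 &rng) const = 0;   // S(n)
    virtual bool is_deleted(const Record &r) const = 0;    // R(n)
    virtual ~Shard() = default;
};

std::vector<Record> sample_query(std::vector<Shard *> &shards,
                                 std::int64_t lower, std::int64_t upper,
                                 std::size_t k, std::mt19937_64 &rng) {
    std::vector<double> weights;
    for (auto *s : shards) {           // P(n) + W(n) once per shard
        s->preprocess(lower, upper);
        weights.push_back(s->weight());
    }
    // Weighted shard selection per sample; an alias structure over the
    // shard weights could be used here, std::discrete_distribution is a
    // stand-in for illustration. Assumes at least one non-zero weight.
    std::discrete_distribution<std::size_t> pick(weights.begin(),
                                                 weights.end());

    std::vector<Record> samples;
    while (samples.size() < k) {       // expected k / (1 - delta) iterations
        Shard *s = shards[pick(rng)];
        Record r = s->draw(rng);
        if (!s->is_deleted(r)) {       // reject deleted records (R(n) check)
            samples.push_back(r);
        }
    }
    return samples;
}
\end{verbatim}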