\section{Framework Instantiations}
\label{sec:instance}

In this section, the framework is applied to three sampling problems and their associated SSIs. All three sampling problems draw random samples from records satisfying a simple predicate, and so the result set for each can be constructed by directly merging the result sets of the queries executed against the individual shards, which is the primary requirement for the application of the framework. The SSIs used for each problem are discussed, including their support for the remaining two optional requirements for framework application.

\subsection{Dynamically Extended WSS Structure}
\label{ssec:wss-struct}

As a first example of applying this framework for dynamic extension, the alias structure for answering WSS queries is considered. This is a static structure that can be constructed in $O(n)$ time and supports WSS queries in $O(1)$ time. The alias structure will be used as the SSI, with each shard containing an alias structure paired with a sorted array of records; a sketch of this shard layout is given at the end of this subsection. The use of sorted arrays for storing the records allows for more efficient point-lookups, without requiring any additional space. The total weight associated with a query against a given alias structure is the total weight of all of its records, which can be tracked at the shard level and retrieved in constant time.

Using the formulae from Section~\ref{sec:framework}, the worst-case costs of insertion, sampling, and deletion are easily derived. The initial construction cost from the buffer is $C_c(N_b) \in O(N_b \log N_b)$, requiring the sorting of the buffer followed by alias construction. After this point, the shards can be reconstructed in linear time while maintaining sorted order, so the reconstruction cost is $C_r(n) \in O(n)$. As each shard contains a sorted array, the point-lookup cost is $L(n) \in O(\log n)$. The total weight can be tracked with the shard, requiring $W(n) \in O(1)$ time to access, and no preprocessing is necessary, so $P(n) \in O(1)$. Samples can be drawn in $S(n) \in O(1)$ time. Plugging these results into the formulae for insertion, sampling, and deletion costs gives,
\begin{align*}
\text{Insertion:} \quad &O\left(\log_s n\right) \\
\text{Sampling:} \quad &O\left(\log_s n + \frac{k}{1 - \delta}\cdot R(n)\right) \\
\text{Tagged Delete:} \quad &O\left(\log_s n \log n\right)
\end{align*}
where $R(n) \in O(1)$ for tagging and $R(n) \in O(\log_s n \log n)$ for tombstones.

\Paragraph{Bounding Rejection Rate.} In the weighted sampling case, the framework's generic record-based compaction trigger mechanism is insufficient to bound the rejection rate. This is because the probability of a given record being sampled depends upon its weight, as well as the number of records in the index. If a highly weighted record is deleted, it will be preferentially sampled, resulting in a larger number of rejections than record counts alone would suggest. This problem can be rectified using the framework's user-specified compaction trigger mechanism. In addition to tracking record counts, each level also tracks its rejection rate,
\begin{equation*}
\rho_i = \frac{\text{rejections}}{\text{sampling attempts}}
\end{equation*}
A configurable rejection rate cap, $\rho$, is then defined. If $\rho_i > \rho$ on a level, a compaction is triggered. Under the tombstone delete policy, it is not the level containing the sampled record, but rather the level containing its tombstone, that is considered the source of the rejection. This is necessary to ensure that the compaction moves the tombstone closer to canceling its associated record.
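To make the shard layout described above concrete, the following is a minimal C++ sketch of a WSS shard: a key-sorted record array paired with an alias structure built via Vose's method. It is a sketch under simplifying assumptions (a non-empty input, a simple record type); the names \texttt{WSSShard}, \texttt{Record}, and so on are illustrative rather than drawn from the actual implementation.
\begin{verbatim}
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

// One WSS shard: a key-sorted record array plus an alias
// structure (Vose's method) over the record weights.
struct Record {
    uint64_t key;
    double   weight;
};

class WSSShard {
public:
    // Assumes `records` is non-empty and already sorted by key.
    // Alias construction is a constant number of O(n) passes.
    explicit WSSShard(std::vector<Record> records)
        : recs_(std::move(records)), prob_(recs_.size()),
          alias_(recs_.size()), total_weight_(0.0) {
        const size_t n = recs_.size();
        for (const auto &r : recs_) total_weight_ += r.weight;

        // Normalize so the average entry has probability 1, then
        // split entries into under-full and over-full worklists.
        std::vector<double> p(n);
        std::vector<size_t> small, large;
        for (size_t i = 0; i < n; i++) {
            p[i] = recs_[i].weight * n / total_weight_;
            (p[i] < 1.0 ? small : large).push_back(i);
        }
        // Pair each under-full entry with an over-full one.
        while (!small.empty() && !large.empty()) {
            size_t s = small.back(); small.pop_back();
            size_t l = large.back(); large.pop_back();
            prob_[s]  = p[s];
            alias_[s] = l;
            p[l] = (p[l] + p[s]) - 1.0;
            (p[l] < 1.0 ? small : large).push_back(l);
        }
        for (size_t i : small) prob_[i] = 1.0;  // leftovers
        for (size_t i : large) prob_[i] = 1.0;
    }

    // S(n) in O(1): one uniform bucket draw, one biased coin flip.
    const Record &sample(std::mt19937_64 &rng) const {
        std::uniform_int_distribution<size_t> b(0, recs_.size() - 1);
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        size_t i = b(rng);
        return (coin(rng) < prob_[i]) ? recs_[i] : recs_[alias_[i]];
    }

    // L(n) in O(log n): point lookup against the sorted array.
    const Record *lookup(uint64_t key) const {
        auto it = std::lower_bound(
            recs_.begin(), recs_.end(), key,
            [](const Record &r, uint64_t k) { return r.key < k; });
        return (it != recs_.end() && it->key == key) ? &*it : nullptr;
    }

    // W(n) in O(1): the shard's total weight.
    double total_weight() const { return total_weight_; }

private:
    std::vector<Record> recs_;
    std::vector<double> prob_;
    std::vector<size_t> alias_;
    double total_weight_;
};
\end{verbatim}
Construction performs a constant number of linear passes after the initial weight sum, and each sample costs one uniform draw plus one biased coin flip, consistent with the $C_r(n) \in O(n)$ and $S(n) \in O(1)$ costs used above.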
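The rejection-rate trigger can be sketched in the same style. The counters and the \texttt{needs\_compaction} predicate below are hypothetical; the framework is assumed to consult such a predicate for each level after sampling and, under the tombstone policy, to charge each rejection to the level holding the tombstone rather than the level the record was sampled from.
\begin{verbatim}
#include <cstddef>

// Hypothetical per-level statistics backing the custom
// compaction trigger: rho_i = rejections / sampling attempts.
struct LevelStats {
    size_t attempts   = 0;
    size_t rejections = 0;

    // Under tombstones, called on the level holding the
    // tombstone, not the level the rejection occurred on.
    void record_attempt(bool rejected) {
        attempts += 1;
        if (rejected) rejections += 1;
    }

    double rejection_rate() const {
        return attempts == 0
            ? 0.0 : double(rejections) / double(attempts);
    }
};

// A level is compacted when either the usual record-count bound
// or the configurable rejection-rate cap `rho` is exceeded.
bool needs_compaction(const LevelStats &stats, size_t record_count,
                      size_t record_capacity, double rho) {
    return record_count > record_capacity
        || stats.rejection_rate() > rho;
}
\end{verbatim}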
\subsection{Dynamically Extended IRS Structure}
\label{ssec:irs-struct}

Another sampling problem to which the framework can be applied is independent range sampling (IRS). The SSI in this example is the in-memory ISAM tree. The ISAM tree supports efficient point-lookups directly, and the total weight of an IRS query can be easily obtained by counting the number of records within the query range, which is determined as part of the preprocessing of the query.

The static nature of shards in the framework allows for an ISAM tree to be constructed with adjacent nodes positioned contiguously in memory. By selecting a leaf node size that is a multiple of the record size, and avoiding placing any headers within leaf nodes, the set of leaf nodes can be treated as a sorted array of records with direct indexing, with the internal nodes allowing for faster searching of this array. Because of this layout, per-sample tree traversals are avoided: the start and end of the range from which to sample can be determined using a pair of traversals, and records can then be sampled from this range using random number generation and array indexing (a sketch of this procedure is given at the end of this subsection). Assuming a sorted set of input records, the ISAM tree can be bulk-loaded in linear time.

The insertion analysis proceeds as in the WSS example discussed previously. The initial construction cost is $C_c(N_b) \in O(N_b \log N_b)$ and the reconstruction cost is $C_r(n) \in O(n)$. The ISAM tree supports point-lookups in $L(n) \in O(\log_f n)$ time, where $f$ is the fanout of the tree.

The process for performing range sampling against the ISAM tree involves two stages. First, the tree is traversed twice: once to establish the index of the first record greater than or equal to the lower bound of the query, and again to find the index of the last record less than or equal to the upper bound of the query. This process also yields the number of records within the query range, which is used as the weight of the shard in the shard alias structure. Its cost is $P(n) \in O(\log_f n)$. Once the bounds are established, samples can be drawn by generating uniform random integers between the lower and upper bounds, in $S(n) \in O(1)$ time each. This results in the extended version of the ISAM tree having the following insert, sampling, and delete costs,
\begin{align*}
\text{Insertion:} \quad &O\left(\log_s n\right) \\
\text{Sampling:} \quad &O\left(\log_s n \log_f n + \frac{k}{1 - \delta}\cdot R(n)\right) \\
\text{Tagged Delete:} \quad &O\left(\log_s n \log_f n\right)
\end{align*}
where $R(n) \in O(1)$ for tagging and $R(n) \in O(\log_s n \log_f n)$ for tombstones.
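As a sketch of the per-shard sampling procedure described above, the following C++ fragment treats the ISAM tree's contiguous leaf area as a sorted array; the two bounding traversals are stood in for by binary searches (the internal nodes would perform the same job in $O(\log_f n)$). The names are illustrative, not taken from the implementation.
\begin{verbatim}
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

struct Record { uint64_t key; };

// Half-open index range [lo, hi) of in-range records.
struct IRSBounds { size_t lo, hi; };

// P(n): the two bounding traversals, here over the contiguous
// leaf array; hi - lo is the shard's weight for this query.
IRSBounds preprocess(const std::vector<Record> &leaves,
                     uint64_t lower, uint64_t upper) {
    size_t lo = std::lower_bound(
        leaves.begin(), leaves.end(), lower,
        [](const Record &r, uint64_t k) { return r.key < k; })
        - leaves.begin();
    size_t hi = std::upper_bound(
        leaves.begin(), leaves.end(), upper,
        [](uint64_t k, const Record &r) { return k < r.key; })
        - leaves.begin();
    return {lo, hi};
}

// S(n) in O(1): one uniform index draw per sample, no tree
// traversal.  Assumes a non-empty range (lo < hi).
const Record &sample(const std::vector<Record> &leaves,
                     const IRSBounds &b, std::mt19937_64 &rng) {
    std::uniform_int_distribution<size_t> idx(b.lo, b.hi - 1);
    return leaves[idx(rng)];
}
\end{verbatim}
The difference $\texttt{hi} - \texttt{lo}$ is exactly the count of in-range records, which is the weight assigned to this shard in the shard alias structure.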
\subsection{Dynamically Extended WIRS Structure}
\label{ssec:wirs-struct}

As a final example of applying this framework, the WIRS problem will be considered. Specifically, the alias-augmented B+tree approach described by Tao \cite{tao22}, which generalizes work by Afshani and Wei \cite{afshani17} and Hu et al. \cite{hu14}, will be extended. This structure allows for efficient point-lookups, as it is based on the B+tree, and the total weight of a given WIRS query can be calculated from the query range using aggregate weight tags within the tree.

The alias-augmented B+tree is a static structure occupying linear space. It can be built initially in $C_c(N_b) \in O(N_b \log N_b)$ time, bulk-loaded from sorted lists of records in $C_r(n) \in O(n)$ time, and answers WIRS queries in $O(\log_f n + k)$ time. This query cost consists of preliminary work to identify the sampling range and calculate the total weight, with $P(n) \in O(\log_f n)$, followed by constant-time draws of samples from that range, with $S(n) \in O(1)$. This results in the following costs,
\begin{align*}
\text{Insertion:} \quad &O\left(\log_s n\right) \\
\text{Sampling:} \quad &O\left(\log_s n \log_f n + \frac{k}{1 - \delta} \cdot R(n)\right) \\
\text{Tagged Delete:} \quad &O\left(\log_s n \log_f n\right)
\end{align*}
where $R(n) \in O(1)$ for tagging and $R(n) \in O(\log_s n \log_f n)$ for tombstones. Because this is a weighted sampling structure, the custom compaction trigger discussed in Section~\ref{ssec:wss-struct} is applied to maintain a bounded rejection rate during sampling.
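As a closing sanity check, and following the pattern of the previous two instantiations, these bounds can be recovered by substituting the per-shard costs into the general formulae of Section~\ref{sec:framework}. Taking $L(n) \in O(\log_f n)$ for the B+tree point-lookup,
\begin{align*}
\text{Sampling:} \quad &O\left(\log_s n \cdot P(n) + \frac{k}{1 - \delta} \cdot \left(S(n) + R(n)\right)\right) = O\left(\log_s n \log_f n + \frac{k}{1 - \delta} \cdot R(n)\right) \\
\text{Tagged Delete:} \quad &O\left(\log_s n \cdot L(n)\right) = O\left(\log_s n \log_f n\right)
\end{align*}
where, under tombstones, each rejection itself requires a point-lookup across all $O(\log_s n)$ shards, giving $R(n) \in O(\log_s n \log_f n)$.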