\section{Dynamic Sampling Index Framework}
\label{sec:framework}

This work is an attempt to design a solution to independent sampling that achieves \emph{both} efficient updates and near-constant cost per sample. Because the goal is to tackle the problem in a generalized fashion, rather than to design problem-specific data structures for use as the basis of an index, a framework is created that allows existing static data structures to be extended into sampling indexes by automatically adding support for data updates using a modified version of the Bentley-Saxe method.

Unfortunately, Bentley-Saxe as described in Section~\ref{ssec:bsm} cannot be directly applied to sampling problems. The concept of decomposability is not cleanly applicable to sampling, because the distribution of records in the result set, rather than the records themselves, must be matched following the result merge. Efficiently controlling the distribution requires each sub-query to access information external to the structure against which it is being processed, a contingency unaccounted for by Bentley-Saxe. Further, the process of reconstruction used in Bentley-Saxe provides poor worst-case complexity bounds~\cite{saxe79}, and attempts to modify the procedure to provide better worst-case performance are complex and have worse performance in the common case~\cite{overmars81}.

Despite these limitations, this chapter will argue that the core principles of the Bentley-Saxe method can be profitably applied to sampling indexes, once a system for controlling result set distributions and a more effective reconstruction scheme have been devised. The solution to the former will be discussed in Section~\ref{ssec:sample}. For the latter, inspiration is drawn from the literature on the LSM tree.

The LSM tree~\cite{oneil96} is a data structure proposed to optimize write throughput in disk-based storage engines. It consists of a memory table of bounded size, used to buffer recent changes, and a hierarchy of external levels containing indexes of exponentially increasing size. When the memory table reaches capacity, it is emptied into the external levels. Random writes are avoided by treating the data within the external levels as immutable; all writes go through the memory table. This introduces write amplification but maximizes sequential writes, which is important for maintaining high throughput in disk-based systems. The LSM tree is associated with a broad and well-studied design space~\cite{dayan17,dayan18,dayan22,balmau19,dayan18-1} containing trade-offs between three key performance metrics: read performance, write performance, and auxiliary memory usage.

The challenges faced in reconstructing predominantly in-memory indexes are quite different from those the LSM tree is intended to address, having little to do with disk-based systems and sequential IO operations. However, the LSM tree possesses a rich design space for managing the periodic reconstruction of data structures in a manner that is both more practical and more flexible than that of Bentley-Saxe. By borrowing from this design space, this preexisting body of work can be leveraged and many of Bentley-Saxe's limitations addressed.
\captionsetup[subfloat]{justification=centering}
\begin{figure*}
\centering
\subfloat[Leveling]{\includegraphics[width=.75\textwidth]{img/sigmod23/merge-leveling} \label{fig:leveling}}\\
\subfloat[Tiering]{\includegraphics[width=.75\textwidth]{img/sigmod23/merge-tiering} \label{fig:tiering}}
\caption{\textbf{A graphical overview of the sampling framework and its insert procedure.} A mutable buffer (MB) sits atop two levels (L0, L1) containing shards (pairs of SSIs and auxiliary structures [A]) using the leveling (Figure~\ref{fig:leveling}) and tiering (Figure~\ref{fig:tiering}) layout policies. Records are represented as black/colored squares, and grey squares represent unused capacity. An insertion requiring a multi-level reconstruction is illustrated.}
\label{fig:framework}
\end{figure*}

\subsection{Framework Overview}
The goal of this chapter is to build a general framework that extends most SSIs with efficient support for updates by splitting the index into small data structures to reduce reconstruction costs, and then distributing the sampling process over these smaller structures. The framework is designed to work efficiently with any SSI, so long as it has the following properties:
\begin{enumerate}
\item The underlying full query $Q$ supported by the SSI, from whose results samples are drawn, satisfies the following property: for any dataset $D = \cup_{i = 1}^{n}D_i$ where $D_i \cap D_j = \emptyset$ for all $i \neq j$, $Q(D) = \cup_{i = 1}^{n}Q(D_i)$.
\item \emph{(Optional)} The SSI supports efficient point-lookups.
\item \emph{(Optional)} The SSI is capable of efficiently reporting the total weight of all records returned by the underlying full query.
\end{enumerate}
The first property applies to the query being sampled from, and is essential for the correctness of sample sets reported by extended sampling indexes.\footnote{This condition is stricter than the definition of a decomposable search problem in the Bentley-Saxe method, which allows for \emph{any} constant-time merge operation, not just union. However, this condition is satisfied by many common types of database query, such as predicate-based filtering queries.} The latter two properties are optional, but reduce deletion and sampling costs, respectively. Should the SSI fail to support point-lookups, an auxiliary hash table can be attached to the data structures. Should it fail to support query result weight reporting, rejection sampling can be used in place of the more efficient scheme discussed in Section~\ref{ssec:sample}. The analysis of this framework will generally assume that all three conditions are satisfied.

Given an SSI with these properties, a dynamic extension can be produced as shown in Figure~\ref{fig:framework}. The extended index consists of disjoint shards, each containing an instance of the SSI being extended along with optional auxiliary data structures. The auxiliary structures allow acceleration of certain operations that are required by the framework, but which the SSI being extended does not itself support efficiently. Examples of possible auxiliary structures include hash tables, Bloom filters~\cite{bloom70}, and range filters~\cite{zhang18,siqiang20}. The shards are arranged into levels of increasing record capacity, with either one shard, or up to a fixed maximum number of shards, per level. The decision to place one or many shards per level is called the \emph{layout policy}. The policy names are borrowed from the literature on the LSM tree, with the former called \emph{leveling} and the latter called \emph{tiering}.
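To make this organization concrete, the sketch below models the extended index as a hierarchy of levels, each holding one or more shards that pair an SSI instance with its optional auxiliary structures. The sketch is illustrative only: the type names (\texttt{Shard}, \texttt{Level}, \texttt{DynamicIndex}) and their fields are hypothetical and elide most implementation detail.

\begin{verbatim}
from dataclasses import dataclass, field
from enum import Enum

class LayoutPolicy(Enum):
    LEVELING = 1   # a single, larger shard per level
    TIERING = 2    # up to `scale_factor` shards per level

@dataclass
class Shard:
    ssi: object                                 # static sampling index over the records
    aux: dict = field(default_factory=dict)     # optional: Bloom filter, hash table, ...

@dataclass
class Level:
    capacity: int                               # N_b * s^(i+1) records for level i
    shards: list = field(default_factory=list)  # one shard (leveling) or several (tiering)

@dataclass
class DynamicIndex:
    buffer_capacity: int                        # N_b
    scale_factor: int                           # s
    policy: LayoutPolicy
    buffer: list = field(default_factory=list)  # unsorted, append-only mutable buffer
    levels: list = field(default_factory=list)  # levels of exponentially growing capacity
\end{verbatim}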
To avoid a reconstruction on every insert, an unsorted array of fixed capacity ($N_b$), called the \emph{mutable buffer}, is used to buffer updates. Because it is unsorted, it is kept small to maintain reasonably efficient sampling and point-lookup performance. All updates are performed by appending new records to the tail of this buffer. If a record currently within the index is to be updated to a new value, it must first be deleted, and then a record with the new value inserted. This ensures that old versions of records are properly filtered from query results.

When the buffer is full, it is flushed to make room for new records. The flushing procedure is based on the layout policy in use. When using leveling (Figure~\ref{fig:leveling}), a new SSI is constructed using both the records in $L_0$ and those in the buffer. This is used to create a new shard, which replaces the one previously in $L_0$. When using tiering (Figure~\ref{fig:tiering}), a new shard is built using only the records from the buffer, and placed into $L_0$ without altering the existing shards. Each level $L_i$ has a record capacity of $N_b \cdot s^{i+1}$, where $s$ is a configurable parameter called the scale factor. Within a level, records are organized into one large shard under leveling, or into up to $s$ shards of capacity $N_b \cdot s^i$ each under tiering. When a level reaches its capacity, it must be emptied to make room for the records flushed into it. This is accomplished by moving its records down to the next level of the index. Under leveling, this requires constructing a new shard containing all records from both the source and target levels, and placing this shard into the target, leaving the source empty. Under tiering, the shards in the source level are combined into a single new shard that is placed into the target level. Should the target be full, it is first emptied by applying the same procedure. New empty levels are dynamically added as necessary to accommodate these reconstructions. Note that shard reconstructions are not necessarily performed using merging, though merging can be used as an optimization of the reconstruction procedure where such an algorithm exists. In general, reconstruction requires only pooling the records of the shards being combined and then applying the SSI's standard construction algorithm to this set of records.

\begin{table}[t]
\caption{Frequently Used Notation}
\centering
\begin{tabular}{|p{2.5cm} p{5cm}|}
\hline
\textbf{Variable} & \textbf{Description} \\ \hline
$N_b$ & Capacity of the mutable buffer \\ \hline
$s$ & Scale factor \\ \hline
$C_c(n)$ & SSI initial construction cost \\ \hline
$C_r(n)$ & SSI reconstruction cost \\ \hline
$L(n)$ & SSI point-lookup cost \\ \hline
$P(n)$ & SSI sampling pre-processing cost \\ \hline
$S(n)$ & SSI per-sample sampling cost \\ \hline
$W(n)$ & Shard weight determination cost \\ \hline
$R(n)$ & Shard rejection check cost \\ \hline
$\delta$ & Maximum delete proportion \\ \hline
%$\rho$ & Maximum rejection rate \\ \hline
\end{tabular}
\label{tab:nomen}
\end{table}

Table~\ref{tab:nomen} lists frequently used notation for the various parameters of the framework, which will be used in the coming analysis of the costs and trade-offs associated with operations within the framework's design space.
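Continuing the sketch above, the flush and cascading reconstruction logic described in this section might look as follows. Here \texttt{build} stands in for the SSI's standard construction algorithm, a shard's record set is exposed directly through its \texttt{ssi} field, and details such as auxiliary structures and delete handling are omitted; this is a minimal model of the procedure under those assumptions, not an implementation of it.

\begin{verbatim}
def build(records):
    # Stand-in for the SSI's standard construction algorithm: pool the records
    # of the structures being combined and build a new static structure.
    return Shard(ssi=list(records))

def records_of(level):
    return [r for shard in level.shards for r in shard.ssi]

def place(index, i, records):
    """Place `records` into level i, first emptying the level downward
    (recursively) if it cannot absorb them."""
    if i == len(index.levels):                  # grow the index as needed
        index.levels.append(Level(capacity=index.buffer_capacity
                                  * index.scale_factor ** (i + 1)))
    level = index.levels[i]
    if len(records_of(level)) + len(records) > level.capacity:
        spill = records_of(level)               # level is full: push it down first
        level.shards.clear()
        place(index, i + 1, spill)
    if index.policy is LayoutPolicy.LEVELING:   # rebuild the level's single shard
        level.shards[:] = [build(records_of(level) + records)]
    else:                                       # tiering: add a new shard alongside
        level.shards.append(build(records))

def flush_buffer(index):
    """Empty the mutable buffer into L0 according to the layout policy."""
    place(index, 0, list(index.buffer))
    index.buffer.clear()
\end{verbatim}

Note that under leveling the target level's single shard is rebuilt from the union of its old records and the incoming ones, which is the source of the per-level write amplification analyzed in Section~\ref{ssec:insert}.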
The remainder of this section will discuss the performance characteristics of insertion into this structure (Section~\ref{ssec:insert}), how it can be used to correctly answer sampling queries (Section~\ref{ssec:sample}), and efficient approaches for supporting deletes (Section~\ref{ssec:delete}). Finally, it will close with a detailed discussion of the trade-offs within the framework's design space (Section~\ref{ssec:design-space}).

\subsection{Insertion}
\label{ssec:insert}
The framework supports inserting new records by first appending them to the end of the mutable buffer. When the buffer is full, it is flushed into a sequence of levels containing shards of increasing capacity, using a procedure determined by the layout policy as discussed in Section~\ref{sec:framework}. This method allows the cost of repeated shard reconstruction to be effectively amortized.

Let the cost of constructing the SSI from an arbitrary set of $n$ records be $C_c(n)$, and the cost of reconstructing the SSI given two or more shards containing $n$ records in total be $C_r(n)$. The cost of an insert is composed of three parts: appending to the mutable buffer, constructing a new shard from the buffered records during a flush, and the total cost of reconstructing shards containing the record over the lifetime of the index. The cost of appending to the mutable buffer is constant, and the cost of constructing a shard from the buffer can be amortized across the records participating in the buffer flush, giving $\nicefrac{C_c(N_b)}{N_b}$. These costs are paid exactly once for each record. To derive an expression for the cost of repeated reconstruction, first note that each record will participate in at most $s$ reconstructions on a given level, resulting in a worst-case amortized cost of $O\left(s\cdot \nicefrac{C_r(n)}{n}\right)$ paid per level. The index itself will contain at most $\log_s n$ levels. Thus, over the lifetime of the index a given record will pay $O\left(s\cdot \nicefrac{C_r(n)}{n}\log_s n\right)$ cost in repeated reconstruction. Combining these results, the total amortized insertion cost is
\begin{equation}
O\left(\frac{C_c(N_b)}{N_b} + s \cdot \frac{C_r(n)}{n} \log_s n\right)
\end{equation}
This can be simplified by noting that $s$ is a constant, and that $N_b$ is also a constant with $N_b \ll n$. Neglecting these terms, the amortized insertion cost of the framework is
\begin{equation}
O\left(\frac{C_r(n)}{n}\log_s n\right)
\end{equation}

\subsection{Sampling}
\label{ssec:sample}
\begin{figure}
\centering
\includegraphics[width=\textwidth]{img/sigmod23/sampling}
\caption{\textbf{Overview of the multiple-shard sampling query process} for Example~\ref{ex:sample} with $k=1000$. First, (1) the normalized weights of the shards are determined, then (2) these weights are used to construct an alias structure. Next, (3) the alias structure is queried $k$ times to determine per-shard sample sizes, and then (4) sampling is performed. Finally, (5) any rejected samples are retried starting from the alias structure, and the process is repeated until the desired number of samples has been retrieved.}
\label{fig:sample}
\end{figure}

For many SSIs, sampling queries are completed in two stages. Some preliminary processing is done to identify the range of records from which to sample, and then samples are drawn from that range.
For example, IRS over a sorted list of records can be performed by first identifying the upper and lower bounds of the query range in the list, and then sampling records by randomly generating indexes within those bounds. The general cost of a sampling query can be modeled as $P(n) + k S(n)$, where $P(n)$ is the cost of preprocessing, $k$ is the number of samples drawn, and $S(n)$ is the cost of sampling a single record.

When sampling from multiple shards, the situation grows more complex: for each sample, the shard from which to select the record must first be decided. Consider an arbitrary sampling query $X(D, k)$ asking for a sample set of size $k$ against dataset $D$. The framework splits $D$ across $m$ disjoint shards, such that $D = \bigcup_{i=1}^m D_i$ and $D_i \cap D_j = \emptyset$ for all $i \neq j$. The framework must ensure that $X(D, k)$ and $\bigcup_{i=1}^m X(D_i, k_i)$ follow the same distribution, by selecting appropriate values for the $k_i$s. If care is not taken to balance the number of samples drawn from a shard with the total weight of the shard under $X$, then bias can be introduced into the sample set's distribution.

The selection of the $k_i$s can be viewed as an instance of WSS, and solved using the alias method. When sampling using the framework, the weight of each shard under the sampling query is first determined, and a \emph{shard alias structure} is built over these weights. Then, for each sample, the shard alias is used to determine the shard from which to draw the sample. Let $W(n)$ be the cost of determining this total weight for a single shard under the query. The initial setup cost, prior to drawing any samples, will be $O\left([W(n) + P(n)]\log_s n\right)$, as the preliminary sampling work must be performed for each shard, the weights determined, and the alias structure constructed. In many cases, however, the preliminary work will also determine the total weight, and so the relevant operation need only be applied once to accomplish both tasks.

To ensure that all records appear in the sample set with the appropriate probability, the mutable buffer itself must also be a valid target for sampling. There are two generally applicable techniques for this, both of which are supported by the framework. The query being sampled from can be executed directly against the buffer and the result set used to build a temporary SSI, which can then be sampled. Alternatively, rejection sampling can be used to sample directly from the buffer, without executing the query. In this case, the total weight of the buffer is used for its entry in the shard alias structure. This can result in the buffer being over-represented in the shard selection process, and so any rejections during buffer sampling must be retried starting from shard selection. These same considerations apply to rejection sampling used against shards as well.
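The sketch below illustrates this procedure, treating the mutable buffer as one more entry in the list of shards. It assumes a hypothetical shard interface consisting of \texttt{preprocess(query)}, which performs the preliminary work and reports the shard's total weight, and \texttt{draw(state)}, which draws a single sample; the alias structure follows Vose's formulation of the alias method, and any rejected draw is retried from shard selection.

\begin{verbatim}
import random

class AliasStructure:
    """Alias method (Vose's variant): O(m) construction, O(1) selection
    among m weighted choices."""
    def __init__(self, weights):
        m, total = len(weights), sum(weights)
        scaled = [w * m / total for w in weights]
        self.prob, self.alias = [1.0] * m, list(range(m))
        small = [i for i, p in enumerate(scaled) if p < 1.0]
        large = [i for i, p in enumerate(scaled) if p >= 1.0]
        while small and large:
            s, l = small.pop(), large.pop()
            self.prob[s], self.alias[s] = scaled[s], l
            scaled[l] -= 1.0 - scaled[s]
            (small if scaled[l] < 1.0 else large).append(l)

    def select(self):
        i = random.randrange(len(self.prob))
        return i if random.random() < self.prob[i] else self.alias[i]

def sample(shards, query, k, rejected=lambda rec: False):
    # (1) per-shard preliminary work, which here also yields each shard's
    #     total weight under the query
    states = [shard.preprocess(query) for shard in shards]
    # (2) build the shard alias structure over the shard weights
    alias = AliasStructure([st.total_weight for st in states])
    samples = []
    while len(samples) < k:
        i = alias.select()                 # (3) choose a shard
        rec = shards[i].draw(states[i])    # (4) draw one sample from it
        # (5) rejections (e.g. a rejection-sampling miss on the buffer)
        #     are retried starting from shard selection
        if rec is not None and not rejected(rec):
            samples.append(rec)
    return samples
\end{verbatim}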
\begin{example}
\label{ex:sample}
Consider executing a WSS query, with $k=1000$, across three shards containing integer keys with unit weight. $S_1$ contains only the key $-2$, $S_2$ contains all integers on $[1,100]$, and $S_3$ contains all integers on $[101, 200]$. These structures are shown in Figure~\ref{fig:sample}. Sampling is performed by first determining the normalized weights for each shard: $w_1 = 0.005$, $w_2 = 0.4975$, $w_3 = 0.4975$, which are then used to construct a shard alias structure. The shard alias structure is then queried $k$ times, resulting in a distribution of $k_i$s that is commensurate with the relative weights of each shard. Finally, each shard is queried in turn to draw the appropriate number of samples.
\end{example}

Assuming that rejection sampling is used on the mutable buffer, the worst-case time complexity for drawing $k$ samples from an index containing $n$ elements with a sampling cost of $S(n)$ is
\begin{equation}
\label{eq:sample-cost}
O\left(\left[W(n) + P(n)\right]\log_s n + kS(n)\right)
\end{equation}
%If instead a temporary SSI is constructed, the cost of sampling
%becomes: $O\left(N_b + C_c(N_b) + (W(n) + P(n))\log_s n + kS(n)\right)$.

\begin{figure}
\centering
\subfloat[Tombstone Rejection Check]{\includegraphics[width=.75\textwidth]{img/sigmod23/delete-tombstone} \label{fig:delete-tombstone}}\\
\subfloat[Tagging Rejection Check]{\includegraphics[width=.75\textwidth]{img/sigmod23/delete-tagging} \label{fig:delete-tag}}
\caption{\textbf{Overview of the rejection check procedure for deleted records.} First, a record is sampled (1). When using the tombstone delete policy (Figure~\ref{fig:delete-tombstone}), the rejection check starts by (2) querying the Bloom filter of the mutable buffer. The filter indicates the record is not present, so (3) the filter on $L_0$ is queried next. This filter returns a false positive, so (4) a point-lookup is executed against $L_0$. The lookup fails to find a tombstone, so the search continues and (5) the filter on $L_1$ is checked, which reports that the tombstone is present. This time, it is not a false positive, and so (6) a lookup against $L_1$ (7) locates the tombstone. The record is thus rejected. When using the tagging policy (Figure~\ref{fig:delete-tag}), (1) the record is sampled and (2) checked directly for the delete tag. It is set, so the record is immediately rejected.}
\label{fig:delete}
\end{figure}

\subsection{Deletion}
\label{ssec:delete}
Because the shards are static, records cannot be arbitrarily removed from them. Deletes must therefore be supported in some other way, with the ultimate goal of preventing deleted records from appearing in sampling query result sets. This can be realized in two ways: locating the record and marking it, or inserting a new record which indicates that an existing record should be treated as deleted. The framework supports both of these techniques, the selection of which is called the \emph{delete policy}. The former policy is called \emph{tagging} and the latter \emph{tombstone}.

Tagging a record is straightforward. Point-lookups are performed against each shard in the index, as well as the buffer, for the record to be deleted. When it is found, a bit in a header attached to the record is set. When sampling, any records selected with this bit set are automatically rejected.

Tombstones represent a lazy strategy for deleting records. When a record is deleted using tombstones, a new record with identical key and value, but with a ``tombstone'' bit set, is inserted into the index. Whether a record has been deleted can then be checked by performing a point-lookup: if a tombstone with the same key and value exists above the record in the index, the record should be rejected when sampled.

Two important aspects of performance are pertinent when discussing deletes: the cost of the delete operation, and the cost of verifying whether a sampled record has been deleted. The choice of delete policy represents a trade-off between these two costs.
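Continuing the simplified model of the earlier sketches (shards as plain collections of records), the two delete policies might be implemented along the following lines, with \texttt{point\_lookup} standing in for the SSI's lookup operation or an auxiliary hash table probe; the field and function names are illustrative assumptions.

\begin{verbatim}
from dataclasses import dataclass

@dataclass
class Record:
    key: object
    value: object
    tombstone: bool = False    # set on tombstone records
    deleted: bool = False      # set by the tagging policy

def point_lookup(container, key, value):
    """Stand-in for an SSI point-lookup or an auxiliary hash table probe."""
    for rec in container:
        if rec.key == key and rec.value == value and not rec.tombstone:
            return rec
    return None

def delete_by_tagging(buffer, levels, key, value):
    """Search the buffer and every shard for the record; set its delete bit."""
    for container in [buffer] + [shard for level in levels for shard in level]:
        rec = point_lookup(container, key, value)
        if rec is not None:
            rec.deleted = True
            return True
    return False

def delete_by_tombstone(buffer, key, value):
    """Insert a tombstone record; the cost is that of an ordinary insert."""
    buffer.append(Record(key, value, tombstone=True))
\end{verbatim}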
Beyond this simple trade-off, the delete policy has other implications that can affect its applicability to certain types of SSI. Most notably, tombstones do not require any in-place updating of records, whereas tagging does. This means that using tombstones is the only way to ensure total immutability of the data within shards, which avoids random writes and eases concurrency control. The tombstone delete policy, then, is particularly appealing in external and concurrent contexts.

\Paragraph{Deletion Cost.} The cost of a delete under the tombstone policy is the same as an ordinary insert. Tagging, by contrast, requires a point-lookup of the record to be deleted, and so is more expensive. Assuming a point-lookup operation with cost $L(n)$, a tagged delete must search each level in the index, as well as the buffer, requiring $O\left(N_b + L(n)\log_s n\right)$ time.

\Paragraph{Rejection Check Costs.} In addition to the cost of the delete itself, the delete policy affects the cost of determining whether a given record has been deleted. This is called the \emph{rejection check cost}, $R(n)$. When using tagging, the information necessary to make the rejection decision is local to the sampled record, and so $R(n) \in O(1)$. When using tombstones, however, it is not; a point-lookup must be performed to search for a given record's corresponding tombstone. This lookup must examine the buffer and, in the worst case, each shard within the index, resulting in a rejection check cost of $R(n) \in O\left(N_b + L(n) \log_s n\right)$. The rejection check process for the two delete policies is summarized in Figure~\ref{fig:delete}.

Two factors contribute to the tombstone rejection check cost: the size of the buffer, and the cost of performing a point-lookup against the shards. The latter cost can be controlled using the framework's ability to associate auxiliary structures with shards. For SSIs which do not support efficient point-lookups, a hash table can be added to map key-value pairs to their location within the SSI. This allows for constant-time rejection checks, even in situations where the index would not otherwise support them. However, the storage cost of this intervention is high, and it is unnecessary in situations where the SSI does support efficient point-lookups.

Further performance improvements can be achieved by noting that the probability of a given record having an associated tombstone in any particular shard is relatively small. This means that many point-lookups will be executed against shards that do not contain the tombstone being searched for. Many of these unnecessary lookups can be avoided by attaching Bloom filters~\cite{bloom70} for tombstones. By inserting tombstones into these filters during reconstruction, point-lookups against some shards which do not contain the tombstone being searched for can be bypassed. Filters can be attached to the buffer as well, which may be even more significant due to the linear cost of scanning it. As the goal is a reduction of rejection check costs, these filters need only be populated with tombstones. In a later section, techniques for bounding the number of tombstones on a given level are discussed, which allow the memory usage of these filters to be tightly controlled while still ensuring precise bounds on filter error.
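The tombstone rejection check of Figure~\ref{fig:delete} can be sketched as follows, reusing the \texttt{Record} type from the previous sketch. The \texttt{tombstone\_filter} set is a stand-in for a Bloom filter keyed on key-value pairs (a real filter admits false positives, which simply trigger an unnecessary point-lookup); all names here are illustrative assumptions rather than part of the framework.

\begin{verbatim}
from dataclasses import dataclass, field

@dataclass
class Container:                       # the mutable buffer or a single shard
    records: list
    tombstone_filter: set = field(default_factory=set)  # Bloom-filter stand-in,
                                       # keyed on the (key, value) of each tombstone

def find_tombstone(container, key, value):
    """Point-lookup (or auxiliary hash-table probe) for a matching tombstone."""
    return any(r.tombstone and r.key == key and r.value == value
               for r in container.records)

def is_deleted_tombstone(key, value, source, containers):
    """Scan the buffer and shards from newest to oldest, stopping at the
    container the record was sampled from; reject if a tombstone is found."""
    for c in containers:               # containers[0] is the mutable buffer
        if c is source:
            return False               # nothing above the record matched: keep it
        if (key, value) not in c.tombstone_filter:
            continue                   # filter rules this container out: skip lookup
        if find_tombstone(c, key, value):
            return True                # tombstone located above the record: reject
    return False

def is_deleted_tagged(record):
    """Under tagging, the rejection decision is local to the sampled record."""
    return record.deleted
\end{verbatim}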
\Paragraph{Sampling with Deletes.} The addition of deletes to the framework alters the analysis of sampling costs. A record that has been deleted cannot appear in the sample set, and therefore each sampled record must be checked and, if it has been deleted, rejected. When retrying samples rejected due to deletes, the process must restart from shard selection, as deleted records may be counted in the weight totals used to construct the shard alias structure. This increases the cost of sampling to
\begin{equation}
\label{eq:sampling-cost}
O\left([W(n) + P(n)]\log_s n + \frac{kS(n)}{1 - \mathbf{Pr}[\text{rejection}]} \cdot R(n)\right)
\end{equation}
where $R(n)$ is the cost of checking whether a sampled record has been deleted, and $\nicefrac{k}{1 -\mathbf{Pr}[\text{rejection}]}$ is the expected number of sampling attempts required to obtain $k$ samples, given a fixed rejection probability. The rejection probability itself is a function of the workload, and is unbounded.

\Paragraph{Bounding the Rejection Probability.} Rejections during sampling constitute wasted memory accesses and random number generations, and so steps should be taken to minimize their frequency. The probability of a rejection is directly related to the number of deleted records, which is itself a function of the workload and dataset. This means that, without building countermeasures into the framework, tight bounds on sampling performance cannot be provided in the presence of deleted records. It is therefore critical that the framework support some method for bounding the number of deleted records within the index.

While the static nature of shards prevents the direct removal of records at the moment they are deleted, it does not prevent their removal during reconstruction. When using tagging, all tagged records encountered during reconstruction can be removed. When using tombstones, however, the removal process is non-trivial. In principle, a rejection check could be performed for each record encountered during reconstruction, but this would increase reconstruction costs and introduce a new problem of tracking tombstones associated with records that have already been removed. Instead, a lazier approach can be used: delaying removal until a tombstone and its associated record participate in the same shard reconstruction. This delay allows both the record and its tombstone to be removed at the same time, an approach called \emph{tombstone cancellation}. In general, this can be implemented using an extra linear scan of the input shards before reconstruction to identify tombstones and associated records for cancellation, but potential optimizations exist for many SSIs, allowing it to be performed during the reconstruction itself at no extra cost.

Passively removing deleted records during reconstruction is not, by itself, enough to bound the number of deleted records within the index. It is not difficult to envision pathological scenarios where deletes result in unbounded rejection rates, even with this mitigation in place. However, the dropping of deleted records does provide a useful property: any specific deleted record will eventually be removed from the index after a finite number of reconstructions. Using this fact, a bound on the number of deleted records can be enforced. A new parameter, $\delta$, is defined, representing the maximum proportion of deleted records within the index. Each level, and the buffer, tracks the number of deleted records it contains by counting its tagged records or tombstones. Following each buffer flush, the proportion of deleted records is checked against $\delta$.
If any level is found to exceed it, a proactive reconstruction is triggered, pushing its shards down into the next level. The process is repeated until all levels respect the bound, allowing the number of deleted records to be precisely controlled, which, by extension, bounds the rejection rate. This process is called \emph{compaction}.

Assuming every record is equally likely to be sampled, this new bound can be applied to the analysis of sampling costs. The probability of a record being rejected is $\mathbf{Pr}[\text{rejection}] = \delta$. Applying this result to Equation~\ref{eq:sampling-cost} yields
\begin{equation}
%\label{eq:sampling-cost-del}
O\left([W(n) + P(n)]\log_s n + \frac{kS(n)}{1 - \delta} \cdot R(n)\right)
\end{equation}
Asymptotically, this proactive compaction does not alter the analysis of insertion costs: each record is still written at most $s$ times on each level, there are at most $\log_s n$ levels, and the buffer insertion and SSI construction costs are unchanged, so the amortized insertion cost remains the same.

This compaction strategy is based upon tombstone and record counts, and the bounds assume that every record is equally likely to be sampled. For certain sampling problems (such as WSS), there are other conditions that must be considered to provide a bound on the rejection rate. To account for these situations in a general fashion, the framework supports problem-specific compaction triggers that can be tailored to the SSI being used. These allow compactions to be triggered based on other properties, such as the rejection rate of a level or the total weight of deleted records.

\subsection{Trade-offs in the Framework Design Space}
\label{ssec:design-space}
The framework has several tunable parameters, allowing it to be tailored for specific applications. This design space contains trade-offs among three major performance characteristics: update cost, sampling cost, and auxiliary memory usage. The two most significant decisions when implementing this framework are the selection of the layout and delete policies. The asymptotic analysis of the previous sections obscures some of the differences between these policies, but they do have significant practical performance implications.

\Paragraph{Layout Policy.} The choice of layout policy represents a clear trade-off between update and sampling performance. Leveling results in fewer shards of larger size, whereas tiering results in a larger number of smaller shards. As a result, leveling reduces the costs associated with point-lookups and sampling query preprocessing by a constant factor, compared to tiering. However, it results in more write amplification: a given record may be involved in up to $s$ reconstructions on a single level, as opposed to the single reconstruction per level under tiering.

\Paragraph{Delete Policy.} The choice of delete policy trades delete performance against sampling performance. Tagging requires a point-lookup when performing a delete, which is more expensive than the insert required by tombstones. However, it also allows constant-time rejection checks, unlike tombstones, which require a point-lookup of each sampled record. In situations where deletes are common and write throughput is critical, tombstones may be more useful. Tombstones are also ideal in situations where immutability is required, or random writes must be avoided.
Generally speaking, however, tagging is superior when using SSIs that support it, because sampling rejection checks will usually be more common than deletes.

\Paragraph{Mutable Buffer Capacity and Scale Factor.} The mutable buffer capacity and scale factor both influence the number of levels within the index, and by extension the number of distinct shards. Sampling and point-lookups perform better with fewer shards. Smaller shards are also faster to reconstruct, although the same adjustments that reduce shard size also result in a larger number of reconstructions, so the trade-off here is less clear. The scale factor has an interesting interaction with the layout policy: when using leveling, the scale factor directly controls the amount of write amplification per level. Larger scale factors mean more time is spent reconstructing shards on a level, reducing update performance. Tiering does not have this problem and should see its update performance benefit directly from a larger scale factor, as this reduces the number of reconstructions.

The buffer capacity also influences the number of levels, but is more significant in its effects on point-lookup performance: a lookup must perform a linear scan of the buffer. Likewise, the unstructured nature of the buffer will also contribute negatively to sampling performance, irrespective of which buffer sampling technique is used. As a result, although a large buffer will reduce the number of shards, it will also hurt sampling and delete (under tagging) performance. It is important to minimize the cost of these buffer scans, and so it is preferable to keep the buffer small, ideally small enough to fit within the CPU's L2 cache. The number of shards within the index is then better controlled by the scale factor than by the buffer capacity. Using a smaller buffer will result in more compactions and shard reconstructions; however, the empirical evaluation in Section~\ref{ssec:ds-exp} demonstrates that this is not a serious performance problem when the scale factor is chosen appropriately. When the shards are in memory, frequent small reconstructions do not carry a significant performance penalty compared to less frequent, larger ones.

\Paragraph{Auxiliary Structures.} The framework's support for arbitrary auxiliary data structures allows memory to be traded for insertion or sampling performance. The use of Bloom filters for accelerating tombstone rejection checks has already been discussed, but many other options exist. Bloom filters could also be used to accelerate point-lookups for delete tagging, though such filters would require much more memory than tombstone-only ones to be effective. An auxiliary hash table could be used to accelerate point-lookups, or range filters such as SuRF~\cite{zhang18} or Rosetta~\cite{siqiang20} could be added to accelerate the pre-processing step of range-based queries such as IRS or WIRS.
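As a final illustration of the auxiliary-structure mechanism, the sketch below attaches a hash table (and a tombstone set) to a shard at construction time, providing constant-time point-lookups for SSIs that lack them. It reuses the \texttt{Record} type from the earlier sketches; the shard layout and names are hypothetical, and a sorted list again stands in for the SSI.

\begin{verbatim}
def build_shard_with_aux(records):
    """Build a shard and attach auxiliary structures: a hash table mapping
    (key, value) pairs to positions in the SSI, and a tombstone set."""
    ssi = sorted(records, key=lambda r: r.key)   # stand-in for SSI construction
    aux = {
        "lookup": {(r.key, r.value): i for i, r in enumerate(ssi)},
        "tombstones": {(r.key, r.value) for r in ssi if r.tombstone},
    }
    return {"ssi": ssi, "aux": aux}

def aux_point_lookup(shard, key, value):
    """Constant-time point-lookup through the auxiliary hash table."""
    i = shard["aux"]["lookup"].get((key, value))
    return shard["ssi"][i] if i is not None else None
\end{verbatim}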