From f1fcf8426764b2e8fc8de08a6d74968d2fbc1b27 Mon Sep 17 00:00:00 2001
From: Douglas Rumbaugh
Date: Tue, 6 May 2025 17:41:03 -0400
Subject: Updates to chapter 3

---
 chapters/sigmod23/framework.tex | 538 +++++++++++++++++++++++-----------------
 1 file changed, 314 insertions(+), 224 deletions(-)

(limited to 'chapters/sigmod23/framework.tex')

diff --git a/chapters/sigmod23/framework.tex b/chapters/sigmod23/framework.tex
index c878d93..89f15c3 100644
--- a/chapters/sigmod23/framework.tex
+++ b/chapters/sigmod23/framework.tex
@@ -16,7 +16,32 @@ there in the context of IRS
apply equally to the other sampling problems considered in this chapter. In this section, we will discuss approaches for resolving these problems.
+
+\begin{table}[t]
+\centering
+
+\begin{tabular}{|l l|}
+    \hline
+    \textbf{Variable} & \textbf{Description} \\ \hline
+    $N_B$ & Capacity of the mutable buffer \\ \hline
+    $s$ & Scale factor \\ \hline
+    $B(n)$ & SSI construction cost from unsorted records \\ \hline
+    $B_M(n)$ & SSI reconstruction cost from existing SSI instances \\ \hline
+    $L(n)$ & SSI point-lookup cost \\ \hline
+    $P(n)$ & SSI sampling pre-processing cost \\ \hline
+    $S(n)$ & SSI per-sample sampling cost \\ \hline
+    $W(n)$ & SSI weight determination cost \\ \hline
+    $R(n)$ & Rejection check cost \\ \hline
+    $\delta$ & Maximum delete proportion \\ \hline
+\end{tabular}
+
+\caption{\textbf{Nomenclature.} A reference of variables and functions used in this chapter.}
+\label{tab:nomen}
+\end{table}
+
\subsection{Sampling over Decomposed Structures}
+\label{ssec:decomposed-structure-sampling}

The core problem facing any attempt to dynamize SSIs is that independently sampling from a decomposed structure is difficult. As discussed in
@@ -266,6 +291,7 @@ contexts.

\subsubsection{Deletion Cost}
+\label{ssec:sampling-deletes}

We will first consider the cost of performing a delete using either mechanism.
@@ -314,8 +340,8 @@ cases, the same procedure as above can be used, with $L(n) \in \Theta(1)$.

\begin{figure}
    \centering
-    \subfloat[Tombstone Rejection Check]{\includegraphics[width=.75\textwidth]{img/sigmod23/delete-tombstone} \label{fig:delete-tombstone}}\\
-    \subfloat[Tagging Rejection Check]{\includegraphics[width=.75\textwidth]{img/sigmod23/delete-tagging} \label{fig:delete-tag}}
+    \subfloat[Tombstone Rejection Check]{\includegraphics[width=.5\textwidth]{img/sigmod23/delete-tombstone} \label{fig:delete-tombstone}}
+    \subfloat[Tagging Rejection Check]{\includegraphics[width=.5\textwidth]{img/sigmod23/delete-tagging} \label{fig:delete-tag}}

    \caption{\textbf{Overview of the rejection check procedure for deleted records.} First, a record is sampled (1).
@@ -456,6 +482,7 @@ the lifetime of the structure.
Preemptive compaction does not increase the number of reconstructions, only \emph{when} they occur.

\subsubsection{Sampling Procedure with Deletes}
+\label{ssec:sampling-with-deletes}

Because sampling is neither deletion decomposable nor invertible, the presence of deletes will have an effect on the query costs. As
@@ -486,244 +513,307 @@ be taken to obtain a sample set of size $k$.

\subsection{Performance Tuning and Configuration}
-\subsubsection{LSM Tree Imports}
-\subsection{Insertion}
-\label{ssec:insert}
-The framework supports inserting new records by first appending them to the end of the mutable buffer. When it is full, the buffer is flushed into a sequence of levels containing shards of increasing capacity, using a procedure determined by the layout policy as discussed in Section~\ref{sec:framework}.
-This method allows for the cost of repeated shard reconstruction to be effectively amortized.
-
-Let the cost of constructing the SSI from an arbitrary set of $n$ records be $C_c(n)$ and the cost of reconstructing the SSI given two or more shards containing $n$ records in total be $C_r(n)$. The cost of an insert is composed of three parts: appending to the mutable buffer, constructing a new shard from the buffered records during a flush, and the total cost of reconstructing shards containing the record over the lifetime of the index. The cost of appending to the mutable buffer is constant, and the cost of constructing a shard from the buffer can be amortized across the records participating in the buffer flush, giving $\nicefrac{C_c(N_b)}{N_b}$. These costs are paid exactly once for each record. To derive an expression for the cost of repeated reconstruction, first note that each record will participate in at most $s$ reconstructions on a given level, resulting in a worst-case amortized cost of $O\left(s\cdot \nicefrac{C_r(n)}{n}\right)$ paid per level. The index itself will contain at most $\log_s n$ levels. Thus, over the lifetime of the index a given record will pay $O\left(s\cdot \nicefrac{C_r(n)}{n}\log_s n\right)$ cost in repeated reconstruction.
-
-Combining these results, the total amortized insertion cost is
-\begin{equation}
-O\left(\frac{C_c(N_b)}{N_b} + s \cdot \frac{C_r(n)}{n} \log_s n\right)
-\end{equation}
-This can be simplified by noting that $s$ is constant, and that $N_b \ll n$ and also a constant. By neglecting these terms, the amortized insertion cost of the framework is,
-\begin{equation}
-O\left(\frac{C_r(n)}{n}\log_s n\right)
-\end{equation}
-\captionsetup[subfloat]{justification=centering}
+The final desideratum referenced earlier in this chapter for our dynamized sampling indices is tunable performance. The base Bentley-Saxe method has a highly rigid reconstruction policy that, while theoretically convenient, does not lend itself to performance tuning. However, it can be readily modified to form a more relaxed policy that is both tunable and generally more performant, at the cost of some additional theoretical complexity. There has been some theoretical work in this area, based upon nesting instances of the equal block method within the Bentley-Saxe method~\cite{overmars81}, but these methods are unwieldy and are targeted at tuning the worst case at the expense of the common case. We will take a different approach to adding configurability to our dynamization system.
+
+Though it has thus far gone unmentioned, readers familiar with LSM Trees may have noted the striking similarity between decomposition-based dynamization techniques and a data structure called the Log-structured Merge-tree. First proposed by O'Neil in the mid '90s~\cite{oneil96}, the LSM Tree was designed to optimize write throughput for external data structures. It accomplished this by buffering inserted records in a small in-memory AVL tree and then flushing this buffer to disk when it filled up. The flush process itself would fully rebuild the on-disk structure (a B+Tree), including all of the records already residing on external storage. O'Neil also proposed a version which used several layered external structures to reduce the cost of reconstruction.
+
+In more recent times, the LSM Tree has seen significant development and has been used as the basis for key-value stores such as RocksDB~\cite{dong21} and LevelDB~\cite{leveldb}.
+This work has produced an incredibly large and well-explored parameterization of the reconstruction procedures of LSM Trees, a good summary of which can be found in a recent tutorial paper~\cite{sarkar23}. Examples of this design space exploration include: different ways to organize each ``level'' of the tree~\cite{dayan19, dostoevsky, autumn}, different growth rates, buffering, sub-partitioning of structures to allow finer-grained reconstruction~\cite{dayan22}, and approaches for allocating resources to auxiliary structures attached to the main ones for accelerating certain types of query~\cite{dayan18-1, zhu21, monkey}.
+
+Many of the elements within the LSM Tree design space are based upon the specifics of the data structure itself and are not generally applicable. However, some of the higher-level concepts can be imported and applied in the context of dynamization. Specifically, we have decided to import the following four elements for use in our dynamization technique,
+\begin{itemize}
+    \item A small dynamic buffer into which new records are inserted
+    \item A variable growth rate, called the \emph{scale factor}
+    \item The ability to attach auxiliary structures to each block
+    \item Two different strategies for reconstructing data structures
+\end{itemize}
+This design space and its associated trade-offs will be discussed in more detail in Chapter~\ref{chap:design-space}, but we will describe it briefly here.
+
+\Paragraph{Buffering.} In the standard Bentley-Saxe method, each insert triggers a reconstruction. Many of these are quite small, but this still makes most insertions somewhat expensive. By adding a small buffer, a large number of inserts can be performed without requiring any reconstructions at all. For generality, we elected to use an unsorted array as our buffer, as dynamic versions of the structures we are dynamizing may not exist. This introduces some query cost, as queries must be answered from these unsorted records as well, but in the case of sampling this is not a serious problem. The implications of this will be discussed in Section~\ref{ssec:sampling-cost-funcs}. The size of this buffer, $N_B$, is a user-specified constant, and all block capacities are multiplied by it. In the Bentley-Saxe method, the $i$th block contains $2^i$ records. In our scheme, with buffering, this becomes $N_B \cdot 2^i$ records in the $i$th block. We call this unsorted array the \emph{mutable buffer}.
+
+\Paragraph{Scale Factor.} In the Bentley-Saxe method, each block is twice as large as the block that precedes it. There is, however, no reason why this growth rate could not be adjusted. In our system, we make the growth rate a user-specified constant called the \emph{scale factor}, $s$, such that the $i$th level contains $N_B \cdot s^i$ records.
+
+\Paragraph{Auxiliary Structures.} In Section~\ref{ssec:sampling-deletes}, we encountered two problems relating to supporting deletes that can be resolved through the use of auxiliary structures. First, regardless of whether tagging or tombstones are used, the data structure requires support for an efficient point-lookup operation. Many SSIs are tree-based and thus support this, but not all data structures do. In such cases, the point-lookup operation could be provided by attaching an auxiliary hash table to the data structure that maps records to their location in the SSI. We use the term \emph{shard} to refer to the combination of a block with these optional auxiliary structures.
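+As an illustration of this idea, the sketch below pairs a statically built SSI with an optional auxiliary hash table so that constant-time point lookups are available even when the underlying structure does not provide them. It is a minimal Python sketch assuming a hypothetical \texttt{build}/\texttt{unbuild}/\texttt{point\_lookup} interface for the SSI, not the framework's actual implementation.
+\begin{verbatim}
+# Hypothetical sketch: a shard couples a static SSI instance with an
+# optional auxiliary hash table for point lookups. The SSI interface
+# used here (build, unbuild, point_lookup) is assumed for illustration.
+class Shard:
+    def __init__(self, records, ssi_class, aux_lookup_table=True):
+        self.ssi = ssi_class.build(records)   # static structure, built once
+        self.aux = None
+        if aux_lookup_table:
+            # map each (hashable) record to its position in the SSI's record set
+            self.aux = {rec: idx for idx, rec in enumerate(self.ssi.unbuild())}
+
+    def point_lookup(self, rec):
+        if self.aux is not None:
+            return self.aux.get(rec)          # O(1) expected time
+        return self.ssi.point_lookup(rec)     # fall back to the SSI's own lookup
+\end{verbatim}
+Whether such a table is attached is a per-SSI decision; structures that already support efficient point lookups can omit it and avoid the extra memory cost.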
+
+In addition, the tombstone deletion mechanism requires performing a point lookup for every record sampled, to validate that it has not been deleted. This introduces a large amount of overhead into the sampling process, as it requires searching each block in the structure. One approach that can be used to improve the performance of these searches, without requiring as much storage as adding auxiliary hash tables to every block, is to include Bloom filters~\cite{bloom70}. A Bloom filter is an approximate data structure that answers tests of set membership with bounded, single-sided error. These are commonly used in LSM Trees to accelerate point lookups by allowing levels that do not contain the record being searched for to be skipped. In our case, we only care about tombstone records, so rather than building these filters over all records, we can build them over tombstones alone. This approach can greatly improve the sampling performance of the structure when tombstone deletes are used.
+
+\Paragraph{Layout Policy.} The Bentley-Saxe method considers blocks individually, without any other organization beyond increasing size. In contrast, LSM Trees have multiple layers of structural organization. The top-level structural unit is the level, upon which record capacity restrictions are applied. These levels are then partitioned into individual structures, which can be further organized by key range. Because our intention is to support general data structures, which may or may not be easily partitioned by a key, we will not consider this finest granularity of partitioning. However, we can borrow the concept of levels and lay out shards in these levels according to different strategies.
+
+Specifically, we consider two layout policies. First, we can allow a single shard per level, a policy called \emph{Leveling}. This approach is traditionally read-optimized, as it generally results in fewer shards within the overall structure for a given scale factor. Under leveling, the $i$th level has a capacity of $N_B \cdot s^{i+1}$ records. We can also allow multiple shards per level, resulting in a write-optimized policy called \emph{Tiering}. In tiering, each level can hold up to $s$ shards, each with up to $N_B \cdot s^i$ records. Note that this does not alter the overall record capacity of each level relative to leveling, only the way the records are divided up into shards.
+
+\section{Practical Dynamization Framework}
+
+Based upon the results discussed in the previous section, we are now ready to discuss the dynamization framework that we have produced for adding update support to SSIs. This framework allows us to achieve all three of our desiderata, at least for certain configurations, and provides a wide range of performance tuning options to the user.
+
+\subsection{Requirements}
+
+The requirements that the framework places upon SSIs are rather modest. The sampling problem being considered must be a decomposable sampling problem (Definition~\ref{def:decomp-sampling}) and the SSI must support the \texttt{build} and \texttt{unbuild} operations. Optionally, if the SSI supports point lookups, or can be constructed from multiple existing instances of the SSI more efficiently than by its normal static construction, these two operations can be leveraged by the framework. However, they are not requirements, as the framework provides facilities to work around their absence.
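+To make these requirements concrete, the sketch below shows one way the assumed interface could be expressed in Python. The method names are illustrative rather than the framework's actual API; only \texttt{build}, \texttt{unbuild}, and the sampling operation are mandatory, while the remaining methods are the optional hooks described above.
+\begin{verbatim}
+from abc import ABC, abstractmethod
+
+class SSI(ABC):
+    """Interface assumed of a static sampling index (names are illustrative)."""
+
+    @classmethod
+    @abstractmethod
+    def build(cls, records):
+        """Construct the static structure from an unsorted set of records."""
+
+    @abstractmethod
+    def unbuild(self):
+        """Return the records stored in the structure."""
+
+    @abstractmethod
+    def sample(self, query, k, rng):
+        """Draw k independent samples for the supported sampling problem."""
+
+    # Optional: if absent, the framework can attach an auxiliary hash table.
+    def point_lookup(self, rec):
+        raise NotImplementedError
+
+    # Optional: a more efficient reconstruction from existing instances
+    # (cost B_M(n)); this default falls back to the generic build/unbuild path.
+    @classmethod
+    def build_from(cls, instances):
+        return cls.build([r for inst in instances for r in inst.unbuild()])
+\end{verbatim}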
+
+\captionsetup[subfloat]{justification=centering}
\begin{figure*}
    \centering
-    \subfloat[Leveling]{\includegraphics[width=.75\textwidth]{img/sigmod23/merge-leveling} \label{fig:leveling}}\\
-    \subfloat[Tiering]{\includegraphics[width=.75\textwidth]{img/sigmod23/merge-tiering} \label{fig:tiering}}
+    \subfloat[Leveling]{\includegraphics[width=.5\textwidth]{img/sigmod23/merge-leveling} \label{fig:leveling}}
+    \subfloat[Tiering]{\includegraphics[width=.5\textwidth]{img/sigmod23/merge-tiering} \label{fig:tiering}}

-    \caption{\textbf{A graphical overview of the sampling framework and its insert procedure.} A
+    \caption{\textbf{A graphical overview of our dynamization framework.} A
    mutable buffer (MB) sits atop two levels (L0, L1) containing shards (pairs of SSIs and auxiliary structures [A]) using the leveling (Figure~\ref{fig:leveling}) and tiering (Figure~\ref{fig:tiering}) layout policies. Records are represented as black/colored squares, and grey squares represent unused capacity. An insertion requiring a multi-level
-    reconstruction is illustrated.} \label{fig:framework}
+    reconstruction is illustrated.} \label{fig:sampling-framework}
\end{figure*}

-\section{Framework Implementation}
-
-Our framework has been designed to work efficiently with any SSI, so long as it has the following properties.
-
+\subsection{Framework Construction}
+
+The framework itself is shown in Figure~\ref{fig:sampling-framework}, along with some of its configuration parameters and its insert procedure (which will be discussed in the next section). It consists of an unsorted array with a capacity of $N_B$ records, sitting atop a sequence of \emph{levels}, each containing SSIs according to the layout policy. If leveling is used, each level will contain a single SSI with up to $N_B \cdot s^{i+1}$ records. If tiering is used, each level will contain up to $s$ SSIs, each with up to $N_B \cdot s^i$ records. The scale factor, $s$, controls the rate at which the capacity of each level grows. The framework supports deletes using either the tombstone or tagging policy, which can be selected by the user according to her preference. To support these delete mechanisms, each record contains an attached header with bits to indicate its tombstone or delete status.
+
+\subsection{Supported Operations and Cost Functions}
+\Paragraph{Insert.} Inserting a record into the dynamized structure involves appending it to the mutable buffer, which requires $\Theta(1)$ time. When the buffer reaches its capacity, it must be flushed into the structure itself before any further records can be inserted. First, a shard will be constructed from the records in the buffer using the SSI's \texttt{build} operation, with $B(N_B)$ cost. This shard will then be merged into the levels below it, which may require further reconstructions to occur to make room. The manner in which these reconstructions proceed depends upon the selected layout policy (a sketch of both flush procedures follows the cost analysis below),
+\begin{itemize}
+\item[\textbf{Leveling}] When a buffer flush occurs in the leveling policy, the system scans the existing levels to find the first level which has sufficient empty space to store the contents of the level above it. More formally, if the number of records in level $i$ is $N_i$, then $i$ is determined such that $N_i + N_B\cdot s^{i} \leq N_B \cdot s^{i+1}$. If no level exists that satisfies the record count constraint, then an empty level is added and $i$ is set to the index of this new level.
+Then, a reconstruction is executed containing all of the records in levels $i$ and $i - 1$ (where level $-1$ denotes the temporary shard built from the buffer). Following this reconstruction, all levels $j < i$, including the temporary buffer shard, are shifted down by one level.
+\item[\textbf{Tiering}] When using tiering, the system will locate the first level, $i$, containing fewer than $s$ shards. If no such level exists, then a new empty level is added and $i$ is set to the index of that level. Then, for each level $j < i$, proceeding from $j = i - 1$ down to $j = 0$, a reconstruction is performed involving all $s$ shards on that level. The resulting new shard is then placed into level $j + 1$ and level $j$ is emptied. Following this, the newly created shard from the buffer will be appended to level $0$.
+\end{itemize}
+
+In either case, the reconstructions all use existing instances of the SSI as input, and so if the SSI supports more efficient construction in this case (with $B_M(n)$ cost), then this routine can be used here. Once all of the necessary reconstructions have been performed, each level is checked to verify that the proportion of tombstones or deleted records is less than $\delta$. If this condition fails, then a preemptive compaction is triggered. This compaction involves performing the reconstructions necessary to move the shard violating the delete bound down one level. Once the compaction is complete, the delete proportions are checked again, and this process is repeated until all levels satisfy the bound.
+
+Following this procedure, inserts have a worst-case cost of $I(n) \in \Theta(B_M(n))$, equivalent to the Bentley-Saxe method. The amortized cost can be determined by finding the total reconstruction cost associated with each record and amortizing it over the inserts. The cost of an insert is composed of three parts,
\begin{enumerate}
-    \item The underlying full query $Q$ supported by the SSI from whose results
-    samples are drawn satisfies the following property:
-    for any dataset $D = \cup_{i = 1}^{n}D_i$
-    where $D_i \cap D_j = \emptyset$, $Q(D) = \cup_{i = 1}^{n}Q(D_i)$.
-    \item \emph{(Optional)} The SSI supports efficient point-lookups.
-    \item \emph{(Optional)} The SSI is capable of efficiently reporting the total weight of all records
-    returned by the underlying full query.
+\item The cost of appending to the buffer
+\item The cost of flushing the buffer to a shard
+\item The total cost of the reconstructions the record is involved in over the lifetime of the structure
\end{enumerate}
+The first cost is constant and the second is $B(N_B)$. Regardless of layout policy, there will be $\Theta(\log_s(n))$ total levels, and the record will, at worst, be written a constant number of times to each level, resulting in a maximum of $\Theta(\log_s(n)B_M(n))$ reconstruction cost in which a given record participates. Thus, the total cost associated with each record in the structure is,
+\begin{equation*}
+\Theta(1) + \Theta(B(N_B)) + \Theta(\log_s(n)B_M(n))
+\end{equation*}
+Assuming that $N_B \ll n$, the first two terms of this expression are constant. Dropping them and amortizing the shared reconstruction cost over the $n$ records that participate in it gives the amortized insertion cost,
+\begin{equation*}
+I_a(n) \in \Theta\left(\frac{B_M(n)}{n}\log_s(n)\right)
+\end{equation*}
+If the SSI being considered does not support a more efficient construction procedure from other instances of the same SSI, and the general Bentley-Saxe \texttt{unbuild} and \texttt{build} operations must be used, then the cost becomes $I_a(n) \in \Theta\left(\frac{B(n)}{n}\log_s(n)\right)$ instead.
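+The sketch below illustrates the two flush procedures described above. It is a simplified, in-memory Python rendering that reuses the illustrative \texttt{build\_from} and \texttt{unbuild} operations from the earlier interface sketch; the delete-proportion check and preemptive compaction are omitted, and it is not the framework's actual implementation.
+\begin{verbatim}
+def flush_leveling(levels, buffer_shard, ssi_cls, n_b, s):
+    # levels[i] holds one SSI (or None); capacity of level i is n_b * s**(i+1)
+    size = lambda ssi: 0 if ssi is None else len(ssi.unbuild())
+    i = 0
+    while i < len(levels) and size(levels[i]) + n_b * s**i > n_b * s**(i + 1):
+        i += 1                              # level i cannot absorb the level above
+    if i == len(levels):
+        levels.append(None)                 # grow the structure by one empty level
+    src = buffer_shard if i == 0 else levels[i - 1]
+    parts = [x for x in (src, levels[i]) if x is not None]
+    levels[i] = ssi_cls.build_from(parts)   # reconstruction, cost B_M(n)
+    for j in range(i - 1, 0, -1):           # shift shallower levels down by one
+        levels[j] = levels[j - 1]
+    if i > 0:
+        levels[0] = buffer_shard            # the buffer shard becomes level 0
+
+def flush_tiering(levels, buffer_shard, ssi_cls, s):
+    # levels[i] holds a list of at most s SSIs
+    i = 0
+    while i < len(levels) and len(levels[i]) >= s:
+        i += 1                              # first level with a free shard slot
+    if i == len(levels):
+        levels.append([])
+    for j in range(i - 1, -1, -1):          # deepest full level first
+        levels[j + 1].append(ssi_cls.build_from(levels[j]))
+        levels[j] = []
+    levels[0].append(buffer_shard)          # buffer shard lands in level 0
+\end{verbatim}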
+
+\Paragraph{Delete.} The framework supports both tombstone and tagged deletes, each with different performance. Using tombstones, the cost of a delete is identical to that of an insert. When using tagging, the cost of a delete is the same as the cost of a point lookup, as the ``delete'' itself simply sets a bit in the header of the record once it has been located. There will be $\Theta(\log_s n)$ total shards in the structure, each with a look-up cost of $L(n)$ using either the SSI's native point-lookup or an auxiliary hash table, and the lookup must also scan the buffer in $\Theta(N_B)$ time. Thus, the worst-case cost of a tagged delete is,
+\begin{equation*}
+D(n) \in \Theta(N_B + L(n)\log_s(n))
+\end{equation*}
-The first property applies to the query being sampled from, and is essential for the correctness of sample sets reported by extended sampling indexes.\footnote{ This condition is stricter than the definition of a decomposable search problem in the Bentley-Saxe method, which allows for \emph{any} constant-time merge operation, not just union. However, this condition is satisfied by many common types of database query, such as predicate-based filtering queries.} The latter two properties are optional, but reduce deletion and sampling costs respectively. Should the SSI fail to support point-lookups, an auxiliary hash table can be attached to the data structures. Should it fail to support query result weight reporting, rejection sampling can be used in place of the more efficient scheme discussed in Section~\ref{ssec:sample}. The analysis of this framework will generally assume that all three conditions are satisfied.
-
-Given an SSI with these properties, a dynamic extension can be produced as shown in Figure~\ref{fig:framework}. The extended index consists of disjoint shards containing an instance of the SSI being extended, and optional auxiliary data structures. The auxiliary structures allow acceleration of certain operations that are required by the framework, but which the SSI being extended does not itself support efficiently. Examples of possible auxiliary structures include hash tables, Bloom filters~\cite{bloom70}, and range filters~\cite{zhang18,siqiang20}. The shards are arranged into levels of increasing record capacity, with either one shard, or up to a fixed maximum number of shards, per level. The decision to place one or many shards per level is called the \emph{layout policy}. The policy names are borrowed from the literature on the LSM tree, with the former called \emph{leveling} and the latter called \emph{tiering}.
-
-To avoid a reconstruction on every insert, an unsorted array of fixed capacity ($N_b$), called the \emph{mutable buffer}, is used to buffer updates. Because it is unsorted, it is kept small to maintain reasonably efficient sampling and point-lookup performance. All updates are performed by appending new records to the tail of this buffer. If a record currently within the index is to be updated to a new value, it must first be deleted, and then a record with the new value inserted. This ensures that old versions of records are properly filtered from query results.
-
-When the buffer is full, it is flushed to make room for new records. The flushing procedure is based on the layout policy in use. When using leveling (Figure~\ref{fig:leveling}) a new SSI is constructed using both the records in $L_0$ and those in the buffer.
-This is used to create a new shard, which replaces the one previously in $L_0$. When using tiering (Figure~\ref{fig:tiering}) a new shard is built using only the records from the buffer, and placed into $L_0$ without altering the existing shards. Each level has a record capacity of $N_b \cdot s^{i+1}$, controlled by a configurable parameter, $s$, called the scale factor. Records are organized in one large shard under leveling, or in $s$ shards of $N_b \cdot s^i$ capacity each under tiering. When a level reaches its capacity, it must be emptied to make room for the records flushed into it. This is accomplished by moving its records down to the next level of the index. Under leveling, this requires constructing a new shard containing all records from both the source and target levels, and placing this shard into the target, leaving the source empty. Under tiering, the shards in the source level are combined into a single new shard that is placed into the target level. Should the target be full, it is first emptied by applying the same procedure. New empty levels are dynamically added as necessary to accommodate these reconstructions. Note that shard reconstructions are not necessarily performed using merging, though merging can be used as an optimization of the reconstruction procedure where such an algorithm exists. In general, reconstruction requires only pooling the records of the shards being combined and then applying the SSI's standard construction algorithm to this set of records.
+\Paragraph{Update.} Given the above definitions of insert and delete, updates of records can be supported by first deleting the record to be updated, and then inserting the updated value as a new record. Thus, the update cost is $\Theta(I(n) + D(n))$.
+
+\Paragraph{Sampling.} Answering sampling queries from this structure is largely the same as was discussed for a standard Bentley-Saxe dynamization in Section~\ref{ssec:sampling-with-deletes}, with the addition of a need to sample from the unsorted buffer as well. There are two approaches for sampling from the buffer. The most general approach would be to temporarily build an SSI over the records within the buffer, and then treat it as a normal shard for the remainder of the sampling procedure. In this case, the sampling algorithm remains identical to the algorithm discussed in Section~\ref{ssec:decomposed-structure-sampling}, following the construction of the temporary shard. This results in a worst-case sampling cost of,
+\begin{equation*}
+    \mathscr{Q}(n, k) \in \Theta\left(B(N_B) + [W(n) + P(n)]\log_2 n + \frac{kS(n)}{1 - \delta} \cdot R(n)\right)
+\end{equation*}
-\begin{table}[t]
-\caption{Frequently Used Notation}
-\centering
+In practice, however, it is often possible to perform rejection sampling against the buffer, without needing to do any additional work to prepare it. In this case, the full weight of the buffer can be used to determine how many samples to draw from it, and these samples can then be obtained using standard rejection sampling to both control for weight and enforce any necessary predicates.
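+As an illustration, the sketch below shows how a query might distribute its samples between the buffer and the shards by weight, using rejection sampling against the unsorted buffer. It is a hedged Python sketch: the \texttt{total\_weight} and \texttt{sample} operations on shards, and the \texttt{is\_deleted} rejection check, are assumed helpers standing in for the mechanisms described earlier, not the framework's actual interfaces.
+\begin{verbatim}
+import random
+
+def sample_query(shards, buffer, k, predicate, rng=random):
+    # buffer is a list of (record, weight) pairs; shards expose total_weight()
+    # and sample() for the predicate; is_deleted() stands in for the tombstone
+    # or tagging rejection check discussed above.
+    buf_weight = sum(w for _, w in buffer)        # full (pessimistic) buffer weight
+    buf_max = max((w for _, w in buffer), default=1.0)
+    weights = [buf_weight] + [sh.total_weight(predicate) for sh in shards]
+
+    results = []
+    while len(results) < k:
+        # assign the still-needed samples to sources in proportion to weight
+        picks = rng.choices(range(len(weights)), weights=weights, k=k - len(results))
+        for src in picks:
+            if src == 0:                          # sample from the mutable buffer
+                rec, w = buffer[rng.randrange(len(buffer))]
+                if not predicate(rec) or rng.random() * buf_max > w:
+                    continue                      # rejected; retried in a later pass
+            else:                                 # sample from a shard's SSI
+                rec = shards[src - 1].sample(predicate, 1, rng)[0]
+            if not is_deleted(rec, shards, buffer):   # delete rejection check
+                results.append(rec)
+    return results
+\end{verbatim}
+Rejections of any kind are simply retried in later passes; the bounded proportion of deleted records, $\delta$, is what gives rise to the $\nicefrac{1}{1-\delta}$ factor in the cost expressions above and below.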
+Because $N_B \ll n$, the probability of sampling from the buffer is quite low and the cost of doing so is constant, so this procedure introduces no more than constant overhead into the sampling process. The overall query cost when rejection sampling against the buffer is possible is therefore,
-\begin{tabular}{|p{2.5cm} p{5cm}|}
-    \hline
-    \textbf{Variable} & \textbf{Description} \\ \hline
-    $N_b$ & Capacity of the mutable buffer \\ \hline
-    $s$ & Scale factor \\ \hline
-    $C_c(n)$ & SSI initial construction cost \\ \hline
-    $C_r(n)$ & SSI reconstruction cost \\ \hline
-    $L(n)$ & SSI point-lookup cost \\ \hline
-    $P(n)$ & SSI sampling pre-processing cost \\ \hline
-    $S(n)$ & SSI per-sample sampling cost \\ \hline
-    $W(n)$ & Shard weight determination cost \\ \hline
-    $R(n)$ & Shard rejection check cost \\ \hline
-    $\delta$ & Maximum delete proportion \\ \hline
-    %$\rho$ & Maximum rejection rate \\ \hline
-\end{tabular}
-\label{tab:nomen}
-
-\end{table}
+\begin{equation*}
+    \mathscr{Q}(n, k) \in \Theta\left([W(n) + P(n)]\log_2 n + \frac{kS(n)}{1 - \delta} \cdot R(n)\right)
+\end{equation*}
-Table~\ref{tab:nomen} lists frequently used notation for the various parameters of the framework, which will be used in the coming analysis of the costs and trade-offs associated with operations within the framework's design space. The remainder of this section will discuss the performance characteristics of insertion into this structure (Section~\ref{ssec:insert}), how it can be used to correctly answer sampling queries (Section~\ref{ssec:insert}), and efficient approaches for supporting deletes (Section~\ref{ssec:delete}). Finally, it will close with a detailed discussion of the trade-offs within the framework's design space (Section~\ref{ssec:design-space}).
-
-
-
-
-\subsection{Trade-offs on Framework Design Space}
-\label{ssec:design-space}
-The framework has several tunable parameters, allowing it to be tailored for specific applications. This design space contains trade-offs among three major performance characteristics: update cost, sampling cost, and auxiliary memory usage. The two most significant decisions when implementing this framework are the selection of the layout and delete policies. The asymptotic analysis of the previous sections obscures some of the differences between these policies, but they do have significant practical performance implications.
-
-\Paragraph{Layout Policy.} The choice of layout policy represents a clear trade-off between update and sampling performance. Leveling results in fewer shards of larger size, whereas tiering results in a larger number of smaller shards. As a result, leveling reduces the costs associated with point-lookups and sampling query preprocessing by a constant factor, compared to tiering. However, it results in more write amplification: a given record may be involved in up to $s$ reconstructions on a single level, as opposed to the single reconstruction per level under tiering.
-
-\Paragraph{Delete Policy.} There is a trade-off between delete performance and sampling performance that exists in the choice of delete policy. Tagging requires a point-lookup when performing a delete, which is more expensive than the insert required by tombstones. However, it also allows constant-time rejection checks, unlike tombstones which require a point-lookup of each sampled record. In situations where deletes are common and write-throughput is critical, tombstones may be more useful.
-Tombstones are also ideal in situations where immutability is required, or random writes must be avoided. Generally speaking, however, tagging is superior when using SSIs that support it, because sampling rejection checks will usually be more common than deletes.
-
-\Paragraph{Mutable Buffer Capacity and Scale Factor.} The mutable buffer capacity and scale factor both influence the number of levels within the index, and by extension the number of distinct shards. Sampling and point-lookups have better performance with fewer shards. Smaller shards are also faster to reconstruct, although the same adjustments that reduce shard size also result in a larger number of reconstructions, so the trade-off here is less clear.
-
-The scale factor has an interesting interaction with the layout policy: when using leveling, the scale factor directly controls the amount of write amplification per level. Larger scale factors mean more time is spent reconstructing shards on a level, reducing update performance. Tiering does not have this problem and should see its update performance benefit directly from a larger scale factor, as this reduces the number of reconstructions.
-
-The buffer capacity also influences the number of levels, but is more significant in its effects on point-lookup performance: a lookup must perform a linear scan of the buffer. Likewise, the unstructured nature of the buffer also will contribute negatively towards sampling performance, irrespective of which buffer sampling technique is used. As a result, although a large buffer will reduce the number of shards, it will also hurt sampling and delete (under tagging) performance. It is important to minimize the cost of these buffer scans, and so it is preferable to keep the buffer small, ideally small enough to fit within the CPU's L2 cache. The number of shards within the index is, then, better controlled by changing the scale factor, rather than the buffer capacity. Using a smaller buffer will result in more compactions and shard reconstructions; however, the empirical evaluation in Section~\ref{ssec:ds-exp} demonstrates that this is not a serious performance problem when a scale factor is chosen appropriately. When the shards are in memory, frequent small reconstructions do not have a significant performance penalty compared to less frequent, larger ones.
-
-\Paragraph{Auxiliary Structures.} The framework's support for arbitrary auxiliary data structures allows for memory to be traded in exchange for insertion or sampling performance. The use of Bloom filters for accelerating tombstone rejection checks has already been discussed, but many other options exist. Bloom filters could also be used to accelerate point-lookups for delete tagging, though such filters would require much more memory than tombstone-only ones to be effective. An auxiliary hash table could be used for accelerating point-lookups, or range filters like SuRF \cite{zhang18} or Rosetta \cite{siqiang20} added to accelerate pre-processing for range queries like in IRS or WIRS.
+In both cases, $R(n) \in \Theta(1)$ for tagging deletes, and $R(n) \in \Theta(N_B + L(n) \log_s n)$ for tombstones (including the cost of searching the buffer for the tombstone).
-- 
cgit v1.2.3