path: root/chapters/sigmod23/framework.tex
Diffstat (limited to 'chapters/sigmod23/framework.tex')
-rw-r--r--  chapters/sigmod23/framework.tex  538
1 files changed, 314 insertions, 224 deletions
diff --git a/chapters/sigmod23/framework.tex b/chapters/sigmod23/framework.tex
index c878d93..89f15c3 100644
--- a/chapters/sigmod23/framework.tex
+++ b/chapters/sigmod23/framework.tex
@@ -16,7 +16,32 @@ there in the context of IRS apply equally to the other sampling problems
considered in this chapter. In this section, we will discuss approaches
for resolving these problems.
+
+\begin{table}[t]
+\centering
+
+\begin{tabular}{|l l|}
+ \hline
+ \textbf{Variable} & \textbf{Description} \\ \hline
+ $N_B$ & Capacity of the mutable buffer \\ \hline
+ $s$ & Scale factor \\ \hline
+ $B_c(n)$ & SSI construction cost from unsorted records \\ \hline
+ $B_r(n)$ & SSI reconstruction cost from existing SSI instances\\ \hline
+ $L(n)$ & SSI point-lookup cost \\ \hline
+ $P(n)$ & SSI sampling pre-processing cost \\ \hline
+ $S(n)$ & SSI per-sample sampling cost \\ \hline
+ $W(n)$ & SSI weight determination cost \\ \hline
+ $R(n)$ & Rejection check cost \\ \hline
+ $\delta$ & Maximum delete proportion \\ \hline
+\end{tabular}
+\caption{\textbf{Nomenclature.} A reference of variables and functions
+used in this chapter.}
+\label{tab:nomen}
+\end{table}
+
\subsection{Sampling over Decomposed Structures}
+\label{ssec:decomposed-structure-sampling}
The core problem facing any attempt to dynamize SSIs is that independently
sampling from a decomposed structure is difficult. As discussed in
@@ -266,6 +291,7 @@ contexts.
\subsubsection{Deletion Cost}
+\label{ssec:sampling-deletes}
We will first consider the cost of performing a delete using either
mechanism.
@@ -314,8 +340,8 @@ cases, the same procedure as above can be used, with $L(n) \in \Theta(1)$.
\begin{figure}
\centering
- \subfloat[Tombstone Rejection Check]{\includegraphics[width=.75\textwidth]{img/sigmod23/delete-tombstone} \label{fig:delete-tombstone}}\\
- \subfloat[Tagging Rejection Check]{\includegraphics[width=.75\textwidth]{img/sigmod23/delete-tagging} \label{fig:delete-tag}}
+ \subfloat[Tombstone Rejection Check]{\includegraphics[width=.5\textwidth]{img/sigmod23/delete-tombstone} \label{fig:delete-tombstone}}
+ \subfloat[Tagging Rejection Check]{\includegraphics[width=.5\textwidth]{img/sigmod23/delete-tagging} \label{fig:delete-tag}}
\caption{\textbf{Overview of the rejection check procedure for deleted records.} First,
a record is sampled (1).
@@ -456,6 +482,7 @@ the lifetime of the structure. Preemptive compaction does not increase
the number of reconstructions, only \emph{when} they occur.
\subsubsection{Sampling Procedure with Deletes}
+\label{ssec:sampling-with-deletes}
Because sampling is neither deletion decomposable nor invertible,
the presence of deletes will have an effect on the query costs. As
@@ -486,244 +513,307 @@ be taken to obtain a sample set of size $k$.
\subsection{Performance Tuning and Configuration}
-\subsubsection{LSM Tree Imports}
-\subsection{Insertion}
-\label{ssec:insert}
-The framework supports inserting new records by first appending them to the end
-of the mutable buffer. When it is full, the buffer is flushed into a sequence
-of levels containing shards of increasing capacity, using a procedure
-determined by the layout policy as discussed in Section~\ref{sec:framework}.
-This method allows for the cost of repeated shard reconstruction to be
-effectively amortized.
-
-Let the cost of constructing the SSI from an arbitrary set of $n$ records be
-$C_c(n)$ and the cost of reconstructing the SSI given two or more shards
-containing $n$ records in total be $C_r(n)$. The cost of an insert is composed
-of three parts: appending to the mutable buffer, constructing a new
-shard from the buffered records during a flush, and the total cost of
-reconstructing shards containing the record over the lifetime of the index. The
-cost of appending to the mutable buffer is constant, and the cost of constructing a
-shard from the buffer can be amortized across the records participating in the
-buffer flush, giving $\nicefrac{C_c(N_b)}{N_b}$. These costs are paid exactly once for
-each record. To derive an expression for the cost of repeated reconstruction,
-first note that each record will participate in at most $s$ reconstructions on
-a given level, resulting in a worst-case amortized cost of $O\left(s\cdot
-\nicefrac{C_r(n)}{n}\right)$ paid per level. The index itself will contain at most
-$\log_s n$ levels. Thus, over the lifetime of the index a given record
-will pay $O\left(s\cdot \nicefrac{C_r(n)}{n}\log_s n\right)$ cost in repeated
-reconstruction.
-
-Combining these results, the total amortized insertion cost is
-\begin{equation}
-O\left(\frac{C_c(N_b)}{N_b} + s \cdot \frac{C_r(n)}{n} \log_s n\right)
-\end{equation}
-This can be simplified by noting that $s$ is constant, and that $N_b \ll n$ and also
-a constant. By neglecting these terms, the amortized insertion cost of the
-framework is,
-\begin{equation}
-O\left(\frac{C_r(n)}{n}\log_s n\right)
-\end{equation}
-\captionsetup[subfloat]{justification=centering}
+The final desideratum identified earlier in this chapter for our
+dynamized sampling indices is tunable performance. The base
+Bentley-Saxe method has a highly rigid reconstruction policy that,
+while theoretically convenient, does not lend itself to performance
+tuning. However, it can be readily modified into a more relaxed policy
+that is both tunable and generally more performant, at the cost of some
+additional theoretical complexity. There has been some theoretical work
+in this area, based upon nesting instances of the equal block method
+within the Bentley-Saxe method~\cite{overmars81}, but these methods are
+unwieldy and target the worst case at the expense of the
+common case. We will take a different approach to adding configurability
+to our dynamization system.
+
+Though it has thus far gone unmentioned, readers familiar with LSM Trees
+may have noted the striking similarity between decomposition-based
+dynamization techniques and a data structure called the Log-structured
+Merge-tree. First proposed by O'Neil in the mid-1990s~\cite{oneil96},
+the LSM Tree was designed to optimize write throughput for external data
+structures. It accomplished this by buffering inserted records in a
+small in-memory AVL Tree, and then flushing this buffer to disk when
+it filled up. The flush process itself would fully rebuild the on-disk
+structure (a B+Tree), including all of the records already residing
+on external storage. O'Neil also proposed a variant which used several
+layered external structures to reduce the cost of reconstruction.
+
+In more recent times, the LSM Tree has seen significant development and
+been used as the basis for key-value stores like RocksDB~\cite{dong21}
+and LevelDB~\cite{leveldb}. This work has produced a large and
+well-explored parameterization of the reconstruction procedures of
+LSM Trees, a good summary of which can be found in a recent tutorial
+paper~\cite{sarkar23}. Examples of this design space exploration include
+different ways to organize each ``level'' of the tree~\cite{dayan19,
+dostoevsky, autumn}, different growth rates, buffering, sub-partitioning
+of structures to allow finer-grained reconstruction~\cite{dayan22}, and
+approaches for allocating resources to auxiliary structures attached to
+the main ones to accelerate certain types of queries~\cite{dayan18-1,
+zhu21, monkey}.
+
+Many of the elements within the LSM Tree design space are based upon the
+specifics of the data structure itself, and are not generally applicable.
+However, some of the higher-level concepts can be imported and applied in
+the context of dynamization. Specifically, we have decided to import the
+following four elements for use in our dynamization technique:
+\begin{itemize}
+ \item A small dynamic buffer into which new records are inserted
+ \item A variable growth rate, called the \emph{scale factor}
+ \item The ability to attach auxiliary structures to each block
+ \item Two different strategies for reconstructing data structures
+\end{itemize}
+This design space and its associated trade-offs will be discussed in
+more detail in Chapter~\ref{chap:design-space}, but we will describe it
+briefly here.
+
+\Paragraph{Buffering.} In the standard Bentley-Saxe method, each
+insert triggers a reconstruction. Many of these are quite small, but
+it still makes most insertions somewhat expensive. By adding a small
+buffer, a large number of inserts can be performed without requiring
+any reconstructions at all. For generality, we elected to use an
+unsorted array as our buffer, as dynamic versions of the structures
+we are dynamizing may not exist. This introduces some query cost, as
+queries must be answered from these unsorted records as well, but in
+the case of sampling this is not a serious problem. The implications of
+this will be discussed in Section~\ref{ssec:sampling-cost-funcs}. The
+size of this buffer, $N_B$, is a user-specified constant, and all block
+capacities are multiplied by it. In the Bentley-Saxe method, the $i$th
+block contains $2^i$ records. In our scheme, with buffering, this becomes
+$N_B \cdot 2^i$ records in the $i$th block. We call this unsorted array
+the \emph{mutable buffer}.
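+
+As a minimal illustration of this buffering scheme (the class and method
+names here are hypothetical, not those of the actual implementation),
+the mutable buffer can be thought of as nothing more than a bounded,
+append-only array:
+\begin{verbatim}
+# Minimal sketch of the mutable buffer: a bounded, unsorted append log.
+class MutableBuffer:
+    def __init__(self, capacity):
+        self.capacity = capacity   # N_B, a user-specified constant
+        self.records = []          # unsorted; no structure is maintained
+
+    def is_full(self):
+        return len(self.records) >= self.capacity
+
+    def append(self, record):
+        """O(1) insert; the caller must flush once the buffer is full."""
+        if self.is_full():
+            raise RuntimeError("buffer full: flush before inserting")
+        self.records.append(record)
+\end{verbatim}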
+
+\Paragraph{Scale Factor.} In the Bentley-Saxe method, each block is
+twice as large as the block that precedes it. There is, however, no reason
+why this growth rate could not be adjusted. In our system, we make the
+growth rate a user-specified constant called the \emph{scale factor},
+$s$, such that the $i$th level contains $N_B \cdot s^i$ records.
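+As a purely illustrative example (these particular values are not
+prescribed by the framework), taking $N_B = 1000$ and $s = 4$ yields
+level capacities of $N_B \cdot s^i = 1000, 4000, 16000, \ldots$ for
+$i = 0, 1, 2, \ldots$, in contrast to the fixed doubling of the
+Bentley-Saxe method.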
+
+\Paragraph{Auxiliary Structures.} In Section~\ref{ssec:sampling-deletes},
+we encountered two problems relating to supporting deletes that can be
+resolved through the use of auxiliary structures. First, regardless
+of whether tagging or tombstones are used, the data structure requires
+support for an efficient point-lookup operation. Many SSIs are tree-based
+and thus support this, but not all data structures do. In such cases,
+the point-lookup operation can be provided by attaching an auxiliary
+hash table to the data structure that maps records to their locations in
+the SSI. We use the term \emph{shard} to refer to the combination of a
+block with these optional auxiliary structures.
+
+In addition, the tombstone deletion mechanism requires performing a point
+lookup for every record sampled, to validate that it has not been deleted.
+This introduces a large amount of overhead into the sampling process,
+as each such lookup may require searching every block in the structure.
+One approach to improving the performance of these searches, without
+requiring as much storage as adding auxiliary hash tables to
+every block, is to include Bloom filters~\cite{bloom70}. A Bloom filter
+is an approximate data structure that answers tests of set membership
+with bounded, one-sided error. These are commonly used in LSM Trees
+to accelerate point lookups by allowing levels that do not contain the
+record being searched for to be skipped. In our case, we only care about
+tombstone records, so rather than building these filters over all records,
+we can build them over tombstones alone. This approach can greatly improve
+the sampling performance of the structure when tombstone deletes are used.
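+
+The sketch below illustrates how such a per-shard tombstone filter might
+be consulted during the rejection check. The filter shown is a toy
+stand-in built on Python's \texttt{hash}, and the \texttt{tombstone\_filter}
+and \texttt{contains\_tombstone} members are assumed names; for simplicity
+it also consults every shard, whereas only shards that could hold a newer
+tombstone actually need to be checked.
+\begin{verbatim}
+class TombstoneFilter:
+    """Toy Bloom filter built over tombstone keys only."""
+    def __init__(self, nbits=8192, nhashes=3):
+        self.nbits, self.nhashes = nbits, nhashes
+        self.bits = bytearray(nbits // 8)
+
+    def _positions(self, key):
+        return [hash((i, key)) % self.nbits for i in range(self.nhashes)]
+
+    def add(self, key):
+        for p in self._positions(key):
+            self.bits[p // 8] |= 1 << (p % 8)
+
+    def may_contain(self, key):
+        return all(self.bits[p // 8] & (1 << (p % 8))
+                   for p in self._positions(key))
+
+def tombstone_rejection_check(record_key, shards):
+    """Reject a sampled record if some shard holds a tombstone for it."""
+    for shard in shards:
+        if shard.tombstone_filter.may_contain(record_key):
+            # A filter hit may be a false positive, so fall back to the
+            # (more expensive) point lookup to confirm.
+            if shard.contains_tombstone(record_key):
+                return True
+    return False
+\end{verbatim}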
+
+\Paragraph{Layout Policy.} The Bentley-Saxe method considers blocks
+individually, without any other organization beyond increasing size. In
+contrast, LSM Trees have multiple layers of structural organization. The
+top-level unit of organization is the level, upon which record capacity
+restrictions are applied. Each level is then partitioned into individual
+structures, which can be further organized by key range. Because our
+intention is to support general data structures, which may or may not be
+easily partitioned by a key, we will not consider the finest grain of
+partitioning. However,
+we can borrow the concept of levels, and lay out shards in these levels
+according to different strategies.
+
+Specifically, we consider two layout policies. First, we can allow a
+single shard per level, a policy called \emph{leveling}. This approach
+is traditionally read-optimized, as it generally results in fewer shards
+within the overall structure for a given scale factor. Under leveling,
+the $i$th level has a capacity of $N_B \cdot s^{i+1}$ records. We can
+also allow multiple shards per level, resulting in a write-optimized
+policy called \emph{tiering}. In tiering, each level can hold up to $s$
+shards, each with up to $N_B \cdot s^i$ records. Note that this does not
+alter the overall record capacity of each level relative to leveling,
+only the way the records are divided up into shards.
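+For concreteness (the values are again purely illustrative), with $N_B =
+1000$ and $s = 4$, level $1$ holds a single shard of up to $N_B \cdot s^2
+= 16000$ records under leveling, or up to $s = 4$ shards of $N_B \cdot
+s^1 = 4000$ records each under tiering; the level's total capacity is
+$16000$ records in either case.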
+
+\section{Practical Dynamization Framework}
+
+Based upon the results discussed in the previous section, we are now ready
+to discuss the dynamization framework that we have produced for adding
+update support to SSIs. This framework allows us to achieve all three
+of our desiderata, at least for certain configurations, and provides a
+wide range of performance tuning options to the user.
+
+\subsection{Requirements}
+
+The requirements that the framework places upon SSIs are rather
+modest. The sampling problem being considered must be a decomposable
+sampling problem (Definition~\ref{def:decomp-sampling}), and the SSI must
+support the \texttt{build} and \texttt{unbuild} operations. Optionally,
+if the SSI supports point lookups, or if it can be constructed from
+multiple existing instances more efficiently than by its normal static
+construction routine, the framework can take advantage of these two
+operations. However, they are not requirements, as the framework provides
+facilities to work around their absence.
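+
+To make these requirements concrete, the sketch below shows the rough
+shape of the interface the framework expects. The operation names follow
+the terminology used above, but the signatures (and the optional
+\texttt{build\_from\_ssis}, \texttt{point\_lookup}, and
+\texttt{total\_weight} methods) are illustrative assumptions rather than
+the framework's actual API.
+\begin{verbatim}
+from abc import ABC, abstractmethod
+
+class SSI(ABC):
+    """Interface assumed of a static sampling index being dynamized."""
+
+    @classmethod
+    @abstractmethod
+    def build(cls, records):
+        """Construct the index from an unsorted record set (cost B_c(n))."""
+
+    @abstractmethod
+    def unbuild(self):
+        """Return the stored records, for use in reconstructions."""
+
+    @abstractmethod
+    def sample(self, query, k):
+        """Draw k independent samples from records matching the query."""
+
+    # --- Optional operations the framework exploits when present ---
+    @classmethod
+    def build_from_ssis(cls, ssis):
+        """Reconstruct from existing instances (cost B_r(n)); the default
+        falls back to unbuild() followed by build()."""
+        return cls.build([r for s in ssis for r in s.unbuild()])
+
+    def point_lookup(self, key):
+        """Locate a record by key; if unsupported, the framework attaches
+        an auxiliary hash table instead."""
+        raise NotImplementedError
+
+    def total_weight(self, query):
+        """Total weight of records matching the query (cost W(n))."""
+        raise NotImplementedError
+\end{verbatim}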
+
+\captionsetup[subfloat]{justification=centering}
\begin{figure*}
\centering
- \subfloat[Leveling]{\includegraphics[width=.75\textwidth]{img/sigmod23/merge-leveling} \label{fig:leveling}}\\
- \subfloat[Tiering]{\includegraphics[width=.75\textwidth]{img/sigmod23/merge-tiering} \label{fig:tiering}}
+ \subfloat[Leveling]{\includegraphics[width=.5\textwidth]{img/sigmod23/merge-leveling} \label{fig:leveling}}
+ \subfloat[Tiering]{\includegraphics[width=.5\textwidth]{img/sigmod23/merge-tiering} \label{fig:tiering}}
- \caption{\textbf{A graphical overview of the sampling framework and its insert procedure.} A
+ \caption{\textbf{A graphical overview of our dynamization framework.} A
mutable buffer (MB) sits atop two levels (L0, L1) containing shards (pairs
of SSIs and auxiliary structures [A]) using the leveling
(Figure~\ref{fig:leveling}) and tiering (Figure~\ref{fig:tiering}) layout
policies. Records are represented as black/colored squares, and grey
squares represent unused capacity. An insertion requiring a multi-level
- reconstruction is illustrated.} \label{fig:framework}
+ reconstruction is illustrated.} \label{fig:sampling-framework}
\end{figure*}
-\section{Framework Implementation}
-
-Our framework has been designed to work efficiently with any SSI, so long
-as it has the following properties.
-
+\subsection{Framework Construction}
+
+The framework itself is shown in Figure~\ref{fig:sampling-framework},
+along with some of its configuration parameters and its insert procedure
+(which will be discussed in the next section). It consists of an unsorted
+array with a capacity of $N_B$ records, sitting atop a sequence of
+\emph{levels}, each containing shards arranged according to the layout
+policy. If leveling is used, level $i$ will contain a single shard with
+up to $N_B \cdot s^{i+1}$ records. If tiering is used, each level will
+contain up to $s$ shards, each with up to $N_B \cdot s^i$ records. The
+scale factor, $s$, controls the rate at which the capacity of each level
+grows. The framework supports deletes using either the tombstone or
+tagging policy, which can be selected by the user according to her
+preference. To support these delete mechanisms, each record contains an
+attached header with bits to indicate its tombstone or delete status.
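+
+The sketch below gives one possible in-memory representation of these
+components: a per-record header with tombstone and delete-tag bits, a
+shard pairing an SSI with its optional auxiliary structures, and a level
+holding one or more shards. The field and class names are assumptions
+made for exposition, not the framework's actual types, and later sketches
+in this section build on them.
+\begin{verbatim}
+from dataclasses import dataclass, field
+
+TOMBSTONE_BIT = 0x1   # record is a tombstone for an earlier insert
+DELETED_BIT   = 0x2   # record has been delete-tagged in place
+
+@dataclass
+class Record:
+    key: int
+    value: float
+    weight: float = 1.0
+    header: int = 0                # per-record header bits
+
+    def is_tombstone(self):
+        return bool(self.header & TOMBSTONE_BIT)
+
+    def is_deleted(self):
+        return bool(self.header & DELETED_BIT)
+
+@dataclass
+class Shard:
+    ssi: object                    # static sampling index over the records
+    aux: dict = field(default_factory=dict)  # e.g. hash table, Bloom filter
+
+@dataclass
+class Level:
+    shards: list = field(default_factory=list)  # 1 (leveling) or up to s (tiering)
+\end{verbatim}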
+
+\subsection{Supported Operations and Cost Functions}
+\Paragraph{Insert.} Inserting a record into the dynamization involves
+appending it to the mutable buffer, which requires $\Theta(1)$ time. When
+the buffer reaches its capacity, it must be flushed into the structure
+itself before any further records can be inserted. First, a shard will be
+constructed from the records in the buffer using the SSI's \texttt{build}
+operation, with $B_c(N_B)$ cost. This shard is then merged into the
+levels below it, which may require further reconstructions to
+make room. The manner in which these reconstructions proceed depends
+upon the selected layout policy:
+\begin{itemize}
+\item[\textbf{Leveling}] When a buffer flush occurs in the leveling
+policy, the system scans the existing levels to find the first level
+which has sufficient empty space to store the contents of the level above
+it. More formally, if the number of records in level $i$ is $N_i$, then
+$i$ is determined such that $N_i + N_B\cdot s^{i} \leq N_B \cdot s^{i+1}$.
+If no level exists that satisfies this record count constraint, then an
+empty level is added and $i$ is set to the index of this new level. Then,
+a reconstruction is executed containing all of the records in levels $i$
+and $i - 1$ (where level $-1$ denotes the temporary shard built from the
+buffer). Following this reconstruction, the shards in all levels $j < i$
+are each shifted down by one level.
+\item[\textbf{Tiering}] When using tiering, the system will locate
+the first level, $i$, containing fewer than $s$ shards. If no such
+level exists, then a new empty level is added and $i$ is set to the
+index of that level. Then, for each level $j < i$, processed in order
+of decreasing $j$, a reconstruction is performed involving all $s$
+shards on that level. The resulting new shard is placed into level
+$j + 1$ and level $j$ is emptied. Following this, the newly created
+shard from the buffer is appended to level $0$.
+\end{itemize}
+
+In either case, the reconstructions all use existing shards as input,
+and so if the SSI supports a more efficient construction routine in this
+case (with $B_r(n)$ cost), then that routine can be used here. Once all of
+the necessary reconstructions have been performed, each level is checked
+to verify that the proportion of tombstones or deleted records is less
+than $\delta$. If this condition fails, then a preemptive compaction is
+triggered. This compaction performs the reconstructions necessary
+to move the shard violating the delete bound down one level. Once the
+compaction is complete, the delete proportions are checked again, and
+this process is repeated until all levels satisfy the bound.
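+
+The sketch below illustrates the two flush procedures, building on the
+\texttt{Level}, \texttt{Shard}, and \texttt{SSI} sketches given earlier.
+It is a simplified rendering of the logic described above: the
+\texttt{record\_count} and \texttt{build\_shard\_from} helpers are
+assumptions, record counts would be tracked rather than recomputed in
+practice, and the delete-proportion check and preemptive compaction are
+omitted.
+\begin{verbatim}
+def record_count(level):
+    return sum(len(sh.ssi.unbuild()) for sh in level.shards)
+
+def build_shard_from(shards):
+    """Rebuild one shard from existing shards, preferring the SSI's
+    instance-to-instance construction (B_r) when it is available."""
+    ssis = [sh.ssi for sh in shards]
+    cls = type(ssis[0])
+    if hasattr(cls, "build_from_ssis"):
+        return Shard(cls.build_from_ssis(ssis))
+    return Shard(cls.build([r for s in ssis for r in s.unbuild()]))
+
+def flush_leveling(levels, buffer_shard, N_B, s):
+    # Find the first level i that can absorb the contents of level i - 1
+    # (N_i + N_B * s^i <= N_B * s^(i+1)); add an empty level if none can.
+    i = next((j for j, lvl in enumerate(levels)
+              if record_count(lvl) + N_B * s**j <= N_B * s**(j + 1)), None)
+    if i is None:
+        levels.append(Level())
+        i = len(levels) - 1
+    # Merge level i - 1 (or the buffer shard, when i == 0) into level i,
+    source = levels[i - 1].shards if i > 0 else [buffer_shard]
+    levels[i] = Level([build_shard_from(levels[i].shards + source)])
+    # then shift every shallower level down by one and install the buffer.
+    for j in range(i - 1, 0, -1):
+        levels[j] = levels[j - 1]
+    if i > 0:
+        levels[0] = Level([buffer_shard])
+
+def flush_tiering(levels, buffer_shard, s):
+    # Find the first level with a free shard slot, adding one if needed.
+    i = next((j for j, lvl in enumerate(levels)
+              if len(lvl.shards) < s), None)
+    if i is None:
+        levels.append(Level())
+        i = len(levels) - 1
+    # Working from the deepest affected level upward, merge each level's
+    # s shards into one shard and push it down a level.
+    for j in range(i - 1, -1, -1):
+        levels[j + 1].shards.append(build_shard_from(levels[j].shards))
+        levels[j] = Level()
+    levels[0].shards.append(buffer_shard)
+\end{verbatim}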
+
+Following this procedure, inserts have a worst-case cost of $I(n) \in
+\Theta(B_r(n))$, equivalent to the Bentley-Saxe method. The amortized cost
+can be determined by finding the total cost of the reconstructions
+involving each record and amortizing it over the number of inserts. The
+cost of an insert is composed of three parts,
\begin{enumerate}
- \item The underlying full query $Q$ supported by the SSI from whose results
- samples are drawn satisfies the following property:
- for any dataset $D = \cup_{i = 1}^{n}D_i$
- where $D_i \cap D_j = \emptyset$, $Q(D) = \cup_{i = 1}^{n}Q(D_i)$.
- \item \emph{(Optional)} The SSI supports efficient point-lookups.
- \item \emph{(Optional)} The SSI is capable of efficiently reporting the total weight of all records
- returned by the underlying full query.
+\item The cost of appending to the buffer
+\item The cost of flushing the buffer to a shard
+\item The total cost of the reconstructions the record is involved
+ in over the lifetime of the structure
\end{enumerate}
+The first cost is constant and the second is $B_c(N_B)$. Regardless of
+layout policy, there will be $\Theta(\log_s(n))$ total levels, and
+the record will, at worst, be written a constant number of times to
+each level, resulting in a maximum of $\Theta(B_r(n)\log_s(n))$ cost
+associated with these reconstructions. Thus, the total cost associated
+with each record in the structure is,
+\begin{equation*}
+\Theta(1) + \Theta(B_c(N_B)) + \Theta(B_r(n)\log_s(n))
+\end{equation*}
+Assuming that $N_B \ll n$, the first two terms of this expression are
+constant. Dropping them and amortizing the result over $n$ records gives
+us the amortized insertion cost,
+\begin{equation*}
+I_a(n) \in \Theta\left(\frac{B_r(n)}{n}\log_s(n)\right)
+\end{equation*}
+If the SSI being considered does not support a more efficient
+construction procedure from other instances of the same SSI, and
+the general Bentley-Saxe \texttt{unbuild} and \texttt{build}
+operations must be used, then the cost becomes $I_a(n) \in
+\Theta\left(\frac{B_c(n)}{n}\log_s(n)\right)$ instead.
+
+\Paragraph{Delete.} The framework supports both tombstone and tagged
+deletes, each with different performance. Using tombstones, the cost
+of a delete is identical to that of an insert. When using tagging, the
+cost of a delete is the same as the cost of a point lookup, as the
+``delete'' itself simply sets a bit in the header of the record,
+once it has been located. There will be $\Theta(\log_s n)$ total shards
+in the structure, each with a look-up cost of $L(n)$ using either the
+SSI's native point-lookup or an auxiliary hash table, and the lookup
+must also scan the buffer in $\Theta(N_B)$ time. Thus, the worst-case
+cost of a tagged delete is,
+\begin{equation*}
+D(n) = \Theta(N_B + L(n)\log_s(n))
+\end{equation*}
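+
+The following is a simplified sketch of the tagged delete path, reusing
+the \texttt{Record}, \texttt{Shard}, and \texttt{Level} sketches from
+above. For brevity it calls the SSI's optional \texttt{point\_lookup}
+directly, where the real structure would fall back to a shard's auxiliary
+hash table when that operation is unsupported.
+\begin{verbatim}
+def delete_tagged(key, buffer, levels):
+    # Scan the unsorted buffer first: Theta(N_B).
+    for rec in buffer.records:
+        if rec.key == key and not rec.is_deleted():
+            rec.header |= DELETED_BIT
+            return True
+    # Then probe each shard, at L(n) per point lookup.
+    for level in levels:
+        for shard in level.shards:
+            rec = shard.ssi.point_lookup(key)
+            if rec is not None and not rec.is_deleted():
+                rec.header |= DELETED_BIT
+                return True
+    return False
+\end{verbatim}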
-The first property applies to the query being sampled from, and is essential
-for the correctness of sample sets reported by extended sampling
-indexes.\footnote{ This condition is stricter than the definition of a
-decomposable search problem in the Bentley-Saxe method, which allows for
-\emph{any} constant-time merge operation, not just union.
-However, this condition is satisfied by many common types of database
-query, such as predicate-based filtering queries.} The latter two properties
-are optional, but reduce deletion and sampling costs respectively. Should the
-SSI fail to support point-lookups, an auxiliary hash table can be attached to
-the data structures.
-Should it fail to support query result weight reporting, rejection
-sampling can be used in place of the more efficient scheme discussed in
-Section~\ref{ssec:sample}. The analysis of this framework will generally
-assume that all three conditions are satisfied.
-
-Given an SSI with these properties, a dynamic extension can be produced as
-shown in Figure~\ref{fig:framework}. The extended index consists of disjoint
-shards containing an instance of the SSI being extended, and optional auxiliary
-data structures. The auxiliary structures allow acceleration of certain
-operations that are required by the framework, but which the SSI being extended
-does not itself support efficiently. Examples of possible auxiliary structures
-include hash tables, Bloom filters~\cite{bloom70}, and range
-filters~\cite{zhang18,siqiang20}. The shards are arranged into levels of
-increasing record capacity, with either one shard, or up to a fixed maximum
-number of shards, per level. The decision to place one or many shards per level
-is called the \emph{layout policy}. The policy names are borrowed from the
-literature on the LSM tree, with the former called \emph{leveling} and the
-latter called \emph{tiering}.
-
-To avoid a reconstruction on every insert, an unsorted array of fixed capacity
-($N_b$), called the \emph{mutable buffer}, is used to buffer updates. Because it is
-unsorted, it is kept small to maintain reasonably efficient sampling
-and point-lookup performance. All updates are performed by appending new
-records to the tail of this buffer.
-If a record currently within the index is
-to be updated to a new value, it must first be deleted, and then a record with
-the new value inserted. This ensures that old versions of records are properly
-filtered from query results.
-
-When the buffer is full, it is flushed to make room for new records. The
-flushing procedure is based on the layout policy in use. When using leveling
-(Figure~\ref{fig:leveling}) a new SSI is constructed using both the records in
-$L_0$ and those in the buffer. This is used to create a new shard, which
-replaces the one previously in $L_0$. When using tiering
-(Figure~\ref{fig:tiering}) a new shard is built using only the records from the
-buffer, and placed into $L_0$ without altering the existing shards. Each level
-has a record capacity of $N_b \cdot s^{i+1}$, controlled by a configurable
-parameter, $s$, called the scale factor. Records are organized in one large
-shard under leveling, or in $s$ shards of $N_b \cdot s^i$ capacity each under
-tiering. When a level reaches its capacity, it must be emptied to make room for
-the records flushed into it. This is accomplished by moving its records down to
-the next level of the index. Under leveling, this requires constructing a new
-shard containing all records from both the source and target levels, and
-placing this shard into the target, leaving the source empty. Under tiering,
-the shards in the source level are combined into a single new shard that is
-placed into the target level. Should the target be full, it is first emptied by
-applying the same procedure. New empty levels
-are dynamically added as necessary to accommodate these reconstructions.
-Note that shard reconstructions are not necessarily performed using
-merging, though merging can be used as an optimization of the reconstruction
-procedure where such an algorithm exists. In general, reconstruction requires
-only pooling the records of the shards being combined and then applying the SSI's
-standard construction algorithm to this set of records.
+\Paragraph{Update.} Given the above definitions of insert and delete,
+updates to existing records can be supported by first deleting the record
+to be updated, and then inserting the updated value as a new record. Thus,
+the update cost is $\Theta(I(n) + D(n))$.
+
+\Paragraph{Sampling.} Answering sampling queries from this structure is
+largely the same as was discussed for a standard Bentley-Saxe dynamization
+in Section~\ref{ssec:sampling-with-deletes}, with the addition of a need
+to sample from the unsorted buffer as well. There are two approaches
+for sampling from the buffer. The most general approach is to
+temporarily build an SSI over the records within the buffer, and then
+treat it as a normal shard for the remainder of the sampling procedure.
+In this case, the sampling algorithm remains identical to the algorithm
+discussed in Section~\ref{ssec:decomposed-structure-sampling}, following
+the construction of the temporary shard. This results in a worst-case
+sampling cost of,
+\begin{equation*}
+ \mathscr{Q}(n, k) = \Theta\left(B_c(N_B) + [W(n) + P(n)]\log_2 n + \frac{kS(n)}{1 - \delta} \cdot R(n)\right)
+\end{equation*}
-\begin{table}[t]
-\caption{Frequently Used Notation}
-\centering
+In practice, however, it is often possible to perform rejection sampling
+against the buffer, without needing to do any additional work to prepare
+it. In this case, the full weight of the buffer can be used to determine
+how many samples to draw from it, and then these samples can be obtained
+using standard rejection sampling to both control the weight and enforce
+any necessary predicates. Because $N_B \ll n$, the probability of sampling
+from the buffer is quite low and the cost of doing so is constant, so this
+procedure introduces at most constant overhead into the sampling
+process. The overall query cost when rejection sampling is possible is
+therefore,
-\begin{tabular}{|p{2.5cm} p{5cm}|}
- \hline
- \textbf{Variable} & \textbf{Description} \\ \hline
- $N_b$ & Capacity of the mutable buffer \\ \hline
- $s$ & Scale factor \\ \hline
- $C_c(n)$ & SSI initial construction cost \\ \hline
- $C_r(n)$ & SSI reconstruction cost \\ \hline
- $L(n)$ & SSI point-lookup cost \\ \hline
- $P(n)$ & SSI sampling pre-processing cost \\ \hline
- $S(n)$ & SSI per-sample sampling cost \\ \hline
- $W(n)$ & Shard weight determination cost \\ \hline
- $R(n)$ & Shard rejection check cost \\ \hline
- $\delta$ & Maximum delete proportion \\ \hline
- %$\rho$ & Maximum rejection rate \\ \hline
-\end{tabular}
-\label{tab:nomen}
-
-\end{table}
+\begin{equation*}
+ \mathscr{Q}(n, k) = \Theta\left([W(n) + P(n)]\log_2 n + \frac{kS(n)}{1 - \delta} \cdot R(n)\right)
+\end{equation*}
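+
+The sketch below illustrates this second approach, building on the
+earlier sketches in this section. For clarity it draws one sample at a
+time, selecting a source in proportion to its weight, whereas the actual
+procedure batches the samples assigned to each shard; the
+\texttt{matches} predicate and the \texttt{rejection\_check} argument
+(which would implement the tagging or tombstone check) are assumed names.
+\begin{verbatim}
+import random
+
+def sample_query(query, k, buffer, shards, rejection_check):
+    # Weight determination: the buffer contributes its total record
+    # weight; each shard reports its weight via the SSI (W(n) each).
+    sources = [None] + list(shards)          # None stands for the buffer
+    weights = [sum(r.weight for r in buffer.records)] + \
+              [sh.ssi.total_weight(query) for sh in shards]
+    max_w = max((r.weight for r in buffer.records), default=1.0)
+    results = []
+    while len(results) < k:
+        src = random.choices(sources, weights=weights, k=1)[0]
+        if src is None:
+            # Rejection sampling against the unsorted buffer: pick a
+            # record uniformly, then accept it in proportion to its
+            # weight and only if it matches the query predicate.
+            rec = random.choice(buffer.records)
+            if random.random() * max_w > rec.weight:
+                continue
+            if not query.matches(rec):
+                continue
+        else:
+            rec = src.ssi.sample(query, 1)[0]    # S(n) per drawn sample
+        if rec.is_tombstone() or rejection_check(rec):   # R(n)
+            continue
+        results.append(rec)
+    return results
+\end{verbatim}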
-Table~\ref{tab:nomen} lists frequently used notation for the various parameters
-of the framework, which will be used in the coming analysis of the costs and
-trade-offs associated with operations within the framework's design space. The
-remainder of this section will discuss the performance characteristics of
-insertion into this structure (Section~\ref{ssec:insert}), how it can be used
-to correctly answer sampling queries (Section~\ref{ssec:insert}), and efficient
-approaches for supporting deletes (Section~\ref{ssec:delete}). Finally, it will
-close with a detailed discussion of the trade-offs within the framework's
-design space (Section~\ref{ssec:design-space}).
-
-
-
-
-\subsection{Trade-offs on Framework Design Space}
-\label{ssec:design-space}
-The framework has several tunable parameters, allowing it to be tailored for
-specific applications. This design space contains trade-offs among three major
-performance characteristics: update cost, sampling cost, and auxiliary memory
-usage. The two most significant decisions when implementing this framework are
-the selection of the layout and delete policies. The asymptotic analysis of the
-previous sections obscures some of the differences between these policies, but
-they do have significant practical performance implications.
-
-\Paragraph{Layout Policy.} The choice of layout policy represents a clear
-trade-off between update and sampling performance. Leveling
-results in fewer shards of larger size, whereas tiering results in a larger
-number of smaller shards. As a result, leveling reduces the costs associated
-with point-lookups and sampling query preprocessing by a constant factor,
-compared to tiering. However, it results in more write amplification: a given
-record may be involved in up to $s$ reconstructions on a single level, as
-opposed to the single reconstruction per level under tiering.
-
-\Paragraph{Delete Policy.} There is a trade-off between delete performance and
-sampling performance that exists in the choice of delete policy. Tagging
-requires a point-lookup when performing a delete, which is more expensive than
-the insert required by tombstones. However, it also allows constant-time
-rejection checks, unlike tombstones which require a point-lookup of each
-sampled record. In situations where deletes are common and write-throughput is
-critical, tombstones may be more useful. Tombstones are also ideal in
-situations where immutability is required, or random writes must be avoided.
-Generally speaking, however, tagging is superior when using SSIs that support
-it, because sampling rejection checks will usually be more common than deletes.
-
-\Paragraph{Mutable Buffer Capacity and Scale Factor.} The mutable buffer
-capacity and scale factor both influence the number of levels within the index,
-and by extension the number of distinct shards. Sampling and point-lookups have
-better performance with fewer shards. Smaller shards are also faster to
-reconstruct, although the same adjustments that reduce shard size also result
-in a larger number of reconstructions, so the trade-off here is less clear.
-
-The scale factor has an interesting interaction with the layout policy: when
-using leveling, the scale factor directly controls the amount of write
-amplification per level. Larger scale factors mean more time is spent
-reconstructing shards on a level, reducing update performance. Tiering does not
-have this problem and should see its update performance benefit directly from a
-larger scale factor, as this reduces the number of reconstructions.
-
-The buffer capacity also influences the number of levels, but is more
-significant in its effects on point-lookup performance: a lookup must perform a
-linear scan of the buffer. Likewise, the unstructured nature of the buffer also
-will contribute negatively towards sampling performance, irrespective of which
-buffer sampling technique is used. As a result, although a large buffer will
-reduce the number of shards, it will also hurt sampling and delete (under
-tagging) performance. It is important to minimize the cost of these buffer
-scans, and so it is preferable to keep the buffer small, ideally small enough
-to fit within the CPU's L2 cache. The number of shards within the index is,
-then, better controlled by changing the scale factor, rather than the buffer
-capacity. Using a smaller buffer will result in more compactions and shard
-reconstructions; however, the empirical evaluation in Section~\ref{ssec:ds-exp}
-demonstrates that this is not a serious performance problem when a scale factor
-is chosen appropriately. When the shards are in memory, frequent small
-reconstructions do not have a significant performance penalty compared to less
-frequent, larger ones.
-
-\Paragraph{Auxiliary Structures.} The framework's support for arbitrary
-auxiliary data structures allows for memory to be traded in exchange for
-insertion or sampling performance. The use of Bloom filters for accelerating
-tombstone rejection checks has already been discussed, but many other options
-exist. Bloom filters could also be used to accelerate point-lookups for delete
-tagging, though such filters would require much more memory than tombstone-only
-ones to be effective. An auxiliary hash table could be used for accelerating
-point-lookups, or range filters like SuRF \cite{zhang18} or Rosetta
-\cite{siqiang20} added to accelerate pre-processing for range queries like in
-IRS or WIRS.
+In both cases, $R(n) \in \Theta(1)$ for tagging deletes, and $R(n) \in
+\Theta(N_B + L(n) \log_s n)$ for tombstones (including the cost of
+searching the buffer for the tombstone).