Diffstat (limited to 'chapters/sigmod23/framework.tex')
| -rw-r--r-- | chapters/sigmod23/framework.tex | 134 |
1 file changed, 69 insertions(+), 65 deletions(-)
diff --git a/chapters/sigmod23/framework.tex b/chapters/sigmod23/framework.tex
index 256d127..804194b 100644
--- a/chapters/sigmod23/framework.tex
+++ b/chapters/sigmod23/framework.tex
@@ -50,6 +50,7 @@ on the query being sampled from.
 Based on these observations, we can define the decomposability
 conditions for a query sampling problem,
 \begin{definition}[Decomposable Sampling Problem]
+ \label{def:decomp-sampling}
 A query sampling problem, $X: (F, \mathcal{D}, \mathcal{Q}, \mathbb{Z}^+)
 \to \mathcal{R}$ is decomposable if and only if the following conditions
 are met for all $q \in \mathcal{Q},
@@ -78,12 +79,14 @@ These two conditions warrant further explanation.
 The first condition is simply a redefinition of the standard decomposability
 criteria to consider matching the distribution, rather than the exact records
 in $R$, as the correctness condition for the merge process. The second condition
-handles a necessary property of the underlying search problem being
-sampled from. Note that this condition is \emph{stricter} than normal
-decomposability for $F$, and essentially requires that the query being
-sampled from return a set of records, rather than an aggregate value or
-some other result that cannot be meaningfully sampled from. This condition
-is satisfied by predicate-filtering style database queries, among others.
+addresses the search problem from which results are to be sampled. Not all
+search problems admit sampling of this sort--for example, an aggregation
+query that returns a single result. This condition essentially requires
+that the search problem being sampled from return a set of records, rather
+than an aggregate value or some other result that cannot be meaningfully
+sampled from. This condition is satisfied by predicate-filtering style
+database queries, among others. However, it should be noted that this
+condition is \emph{stricter} than normal decomposability.

 With these definitions in mind, let's turn to solving these query sampling
 problems. First, we note that many SSIs have a sampling procedure that
@@ -120,7 +123,7 @@ down-sampling combination operator.
 Secondly, this formulation fails to avoid a per-sample dependence on $n$,
 even in the case where $S(n) \in \Theta(1)$. This gets even worse when
 considering rejections that may occur as a result of deleted records. Recall from
-Section~\ref{ssec:background-deletes} that deletion can be supported
+Section~\ref{ssec:dyn-deletes} that deletion can be supported
 using weak deletes or a shadow structure in a Bentley-Saxe dynamization.
 Using either approach, it isn't possible to avoid deleted records in
 advance when sampling, and so these will need to be rejected and retried.
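
To make the rejection-and-retry behavior described in this hunk concrete, the sketch below draws k samples from a set of decomposed blocks by choosing a block with probability proportional to its weight and rejecting any draw that lands on a deleted record. The types and names are illustrative only and are not taken from the framework's interface; a real implementation must also bound the number of rejections, as discussed further on.

    // Illustrative sketch only; hypothetical types, not the framework's API.
    // Draw k samples across decomposed blocks, rejecting deleted records.
    #include <cstddef>
    #include <random>
    #include <vector>

    struct Record {
        int  key;
        bool deleted;   // simplified stand-in for the visibility header bits
    };

    struct Block {
        std::vector<Record> records;
        std::size_t weight() const { return records.size(); }
        const Record &draw(std::mt19937 &rng) const {
            std::uniform_int_distribution<std::size_t> d(0, records.size() - 1);
            return records[d(rng)];
        }
    };

    // Pick a block with probability proportional to its weight, sample within
    // it, and reject (and retry) any draw that hits a deleted record.
    std::vector<Record> sample_union(const std::vector<Block> &blocks,
                                     std::size_t k, std::mt19937 &rng) {
        std::vector<double> weights;
        for (const auto &b : blocks) weights.push_back(double(b.weight()));
        std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());

        std::vector<Record> out;
        while (out.size() < k) {
            const Record &r = blocks[pick(rng)].draw(rng);
            if (!r.deleted) out.push_back(r);   // rejected draws are simply retried
        }
        return out;
    }
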
@@ -208,9 +211,8 @@ or are naturally determined as part of the pre-processing, and thus the
 $W(n)$ term can be merged into $P(n)$.

 \subsection{Supporting Deletes}
-\ref{ssec:sampling-deletes}
-
-As discussed in Section~\ref{ssec:background-deletes}, the Bentley-Saxe
+\label{ssec:sampling-deletes}
+As discussed in Section~\ref{ssec:dyn-deletes}, the Bentley-Saxe
 method can support deleting records through the use of either weak
 deletes, or a secondary ghost structure, assuming certain properties
 are satisfied by either the search problem or data structure. Unfortunately,
@@ -222,13 +224,14 @@ we'll discuss our mechanisms for supporting deletes, as well as how
 these can be handled during sampling while maintaining correctness.

 Because both deletion policies have their advantages under certain
-contexts, we decided to support both. Specifically, we propose two
-mechanisms for deletes, which are
+contexts, we decided to support both. We require that each record contain
+a small header, which is used to store visibility metadata. Given this,
+we propose two mechanisms for deletes,
 \begin{enumerate}
 \item \textbf{Tagged Deletes.} Each record in the structure includes a
-header with a visibility bit set. On delete, the structure is searched
-for the record, and the bit is set in indicate that it has been deleted.
+visibility bit in its header. On delete, the structure is searched
+for the record, and the bit is set to indicate that it has been deleted.
 This mechanism is used to support \emph{weak deletes}.
 \item \textbf{Tombstone Deletes.} On delete, a new record is inserted into
 the structure with a tombstone bit set in the header. This mechanism is
@@ -252,8 +255,9 @@ arbitrary number of delete records, and rebuild the entire structure when
 this threshold is crossed~\cite{saxe79}. Mixing the ``ghost'' records into
 the same structures as the original records allows for deleted records
 to naturally be cleaned up over time as they meet their tombstones during
-reconstructions. This is an important consequence that will be discussed
-in more detail in Section~\ref{ssec-sampling-delete-bounding}.
+reconstructions using a technique called tombstone cancellation. This
+technique, and its important consequences related to sampling, will be
+discussed in Section~\ref{sssec:sampling-rejection-bound}.

 There are two relevant aspects of performance that the two mechanisms
 trade off between: the cost of performing the delete, and the cost of
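
The tombstone cancellation mentioned in the lines above can be pictured with a short sketch. The record layout and matching rule here are simplified assumptions (a tombstone is matched to a record purely by its key), not the chapter's actual implementation; the point is only that when shards are merged during a reconstruction, a record and its tombstone meet and neither is carried into the new shard.

    // Illustrative sketch (hypothetical layout): cancel record/tombstone pairs
    // while merging shards during a reconstruction. A tombstone annihilates one
    // matching record; anything left over is carried into the new shard.
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    struct Record {
        int  key;         // assumed to identify the record being deleted
        bool tombstone;   // header bit: true if this record is a delete marker
    };

    std::vector<Record> merge_with_cancellation(std::vector<Record> input) {
        // Sort so that a record and its tombstone end up adjacent, with the
        // tombstone ordered directly after the record it deletes.
        std::sort(input.begin(), input.end(),
                  [](const Record &a, const Record &b) {
                      return a.key != b.key ? a.key < b.key
                                            : a.tombstone < b.tombstone;
                  });

        std::vector<Record> out;
        for (std::size_t i = 0; i < input.size(); ++i) {
            if (i + 1 < input.size() && !input[i].tombstone &&
                input[i + 1].tombstone && input[i + 1].key == input[i].key) {
                ++i;        // record meets its tombstone: emit neither
                continue;
            }
            out.push_back(input[i]);  // unmatched records and tombstones survive
        }
        return out;
    }
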
@@ -368,7 +372,7 @@ This performance cost seems catastrophically bad, considering it must
 be paid per sample, but there are ways to mitigate it. We will
 discuss these mitigations in more detail later, during our discussion
 of the implementation of these results in
-Section~\ref{sec:sampling-implementation}.
+Section~\ref{ssec:sampling-framework}.

 \subsubsection{Bounding Rejection Probability}

@@ -392,8 +396,7 @@ the Bentley-Saxe method, however. In the theoretical literature on
 this topic, the solution to this problem is to periodically re-partition
 all of the records to re-align the block sizes~\cite{merge-dsp, saxe79}.
 This approach could also be easily applied here, if desired, though we
-do not in our implementations, for reasons that will be discussed in
-Section~\ref{sec:sampling-implementation}.
+do not in our implementations.

 The process of removing these deleted records during reconstructions
 is different for the two mechanisms. Tagged deletes are straightforward,
@@ -411,16 +414,16 @@ care with ordering semantics, tombstones and their associated records
 can be sorted into adjacent spots, allowing them to be efficiently
 dropped during reconstruction without any extra overhead.

-While the dropping of deleted records during reconstruction helps, it is
-not sufficient on its own to ensure a particular bound on the number of
-deleted records within the structure. Pathological scenarios resulting in
-unbounded rejection rates, even in the presence of this mitigation, are
-possible. For example, tagging alone will never trigger reconstructions,
-and so it would be possible to delete every single record within the
-structure without triggering a reconstruction, or records could be deleted
-in the reverse order that they were inserted using tombstones. In either
-case, a passive system of dropping records naturally during reconstruction
-is not sufficient.
+While the dropping of deleted records during reconstruction helps,
+it is not sufficient on its own to ensure a particular bound on the
+number of deleted records within the structure. Pathological scenarios
+resulting in unbounded rejection rates, even in the presence of this
+mitigation, are possible. For example, tagging alone will never trigger
+reconstructions, and so it would be possible to delete every single
+record within the structure without triggering a reconstruction. Or,
+when using tombstones, records could be deleted in the reverse order
+that they were inserted. In either case, a passive system of dropping
+records naturally during reconstruction is not sufficient.

 Fortunately, this passive system can be used as the basis for
 a system that does provide a bound. This is because it guarantees,
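
One way such a bound can be pictured is sketched below: if each shard tracks how many of its records are tagged or shadowed by tombstones, then any shard whose deleted fraction exceeds a configured maximum can be proactively compacted, which keeps the per-draw rejection probability below that maximum. The structure and names here are illustrative assumptions; the chapter develops its actual bounding mechanism in the referenced subsection.

    // Illustrative sketch (hypothetical names): enforce an upper bound on the
    // fraction of deleted/tombstone records per shard, so that the probability
    // of a sampling rejection stays below a configured threshold.
    #include <cstddef>
    #include <vector>

    struct ShardStats {
        std::size_t record_count;    // live + deleted records in the shard
        std::size_t deleted_count;   // tagged records or unresolved tombstones
    };

    // Returns the indices of shards whose deleted fraction exceeds the bound
    // and which should therefore be proactively compacted or rebuilt.
    std::vector<std::size_t> shards_needing_compaction(
            const std::vector<ShardStats> &shards, double max_delete_fraction) {
        std::vector<std::size_t> out;
        for (std::size_t i = 0; i < shards.size(); ++i) {
            const auto &s = shards[i];
            if (s.record_count == 0) continue;
            double fraction = double(s.deleted_count) / double(s.record_count);
            if (fraction > max_delete_fraction) out.push_back(i);
        }
        return out;
    }
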
@@ -490,6 +493,7 @@ be taken to obtain a sample set of size $k$.

 \subsection{Performance Tuning and Configuration}
+\label{ssec:sampling-design-space}

 The last of the desiderata referenced earlier in this chapter for our
 dynamized sampling indices is having tunable performance. The base
@@ -508,7 +512,7 @@ Though it has thus far gone unmentioned, some readers may have noted
 the astonishing similarity between decomposition-based dynamization
 techniques, and a data structure called the Log-structured Merge-tree.
 First proposed by O'Neil in the mid '90s\cite{oneil96},
-the LSM Tree was designed to optimize write throughout for external data
+the LSM Tree was designed to optimize write throughput for external data
 structures. It accomplished this task by buffering inserted records in a
 small in-memory AVL Tree, and then flushing this buffer to disk when it
 filled up. The flush process itself would fully rebuild the on-disk
@@ -518,22 +522,23 @@ layered, external structures, to reduce the cost of reconstruction.

 In more recent times, the LSM Tree has seen significant development
 and been used as the basis for key-value stores like RocksDB~\cite{dong21}
-and LevelDB~\cite{leveldb}. This work has produced an incredibly large
-and well explored parametrization of the reconstruction procedures of
-LSM Trees, a good summary of which can be bound in this recent tutorial
-paper~\cite{sarkar23}. Examples of this design space exploration include:
-different ways to organize each "level" of the tree~\cite{dayan19,
-dostoevsky, autumn}, different growth rates, buffering, sub-partitioning
-of structures to allow finer-grained reconstruction~\cite{dayan22}, and
-approaches for allocating resources to auxiliary structures attached to
-the main ones for accelerating certain types of query~\cite{dayan18-1,
-zhu21, monkey}.
+and LevelDB~\cite{leveldb}. This work has produced an incredibly
+large and well explored parametrization of the reconstruction
+procedures of LSM Trees, a good summary of which can be found in
+this recent tutorial paper~\cite{sarkar23}. Examples of this design
+space exploration include: different ways to organize each ``level''
+of the tree~\cite{dayan19, dostoevsky, autumn}, different growth
+rates, buffering, sub-partitioning of structures to allow finer-grained
+reconstruction~\cite{dayan22}, and approaches for allocating resources to
+auxiliary structures attached to the main ones for accelerating certain
+types of query~\cite{dayan18-1, zhu21, monkey}. This work is discussed
+in greater depth in Chapter~\ref{chap:related-work}.

 Many of the elements within the LSM Tree design space are based upon the
-specifics of the data structure itself, and are not generally applicable.
-However, some of the higher-level concepts can be imported and applied in
-the context of dynamization. Specifically, we have decided to import the
-following four elements for use in our dynamization technique,
+specifics of the data structure itself, and are not applicable to our
+use case. However, some of the higher-level concepts can be imported and
+applied in the context of dynamization. Specifically, we have decided to
+import the following four elements for use in our dynamization technique,
 \begin{itemize}
 \item A small dynamic buffer into which new records are inserted
 \item A variable growth rate, called the \emph{scale factor}
@@ -554,11 +559,11 @@ we are dynamizing may not exist.
 This introduces some query cost, as queries must be answered from these
 unsorted records as well, but in the case of sampling this isn't a serious
 problem. The implications of this will be discussed in Section~\ref{ssec:sampling-cost-funcs}. The
-size of this buffer, $N_B$ is a user-specified constant, and all block
-capacities are multiplied by it. In the Bentley-Saxe method, the $i$th
-block contains $2^i$ records. In our scheme, with buffering, this becomes
-$N_B \cdot 2^i$ records in the $i$th block. We call this unsorted array
-the \emph{mutable buffer}.
+size of this buffer, $N_B$, is a user-specified constant. Block capacities
+are defined in terms of multiples of $N_B$, such that each buffer flush
+corresponds to an insert in the traditional Bentley-Saxe method. Thus,
+rather than the $i$th block containing $2^i$ records, it contains $N_B
+\cdot 2^i$ records. We call this unsorted array the \emph{mutable buffer}.

 \Paragraph{Scale Factor.} In the Bentley-Saxe method, each block is twice
 as large as the block that precedes it. There is, however, no reason
@@ -593,19 +598,19 @@ we can build them over tombstones. This approach can greatly improve the
 sampling performance of the structure when tombstone deletes are used.

 \Paragraph{Layout Policy.} The Bentley-Saxe method considers blocks
-individually, without any other organization beyond increasing size. In
-contrast, LSM Trees have multiple layers of structural organization. The
-top level structure is a level, upon which record capacity restrictions
-are applied. These levels are then partitioned into individual structures,
-which can be further organized by key range. Because our intention is to
-support general data structures, which may or may not be easily partition
-by a key, we will not consider the finest grain of partitioning. However,
-we can borrow the concept of levels, and lay out shards in these levels
-according to different strategies.
+individually, without any other organization beyond increasing
+size. In contrast, LSM Trees have multiple layers of structural
+organization. Record capacity restrictions are enforced on structures
+called \emph{levels}, which are partitioned into individual data
+structures, and then further organized into non-overlapping key ranges.
+Because our intention is to support general data structures, which may
+or may not be easily partitioned by a key, we will not consider the finest
+grain of partitioning. However, we can borrow the concept of levels,
+and lay out shards in these levels according to different strategies.

 Specifically, we consider two layout policies. First, we can allow a
 single shard per level, a policy called \emph{Leveling}. This approach
-is traditionally read optimized, as it generally results in fewer shards
+is traditionally read-optimized, as it generally results in fewer shards
 within the overall structure for a given scale factor. Under leveling,
 the $i$th level has a capacity of $N_B \cdot s^{i+1}$ records. We can
 also allow multiple shards per level, resulting in a write-optimized
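
The interaction between the buffer size, scale factor, and layout policy described in these hunks can be summarized with a small sketch. The capacity formula for leveling follows the text above ($N_B \cdot s^{i+1}$ records in the $i$th level); treating a tiering level as holding up to $s$ shards of equal size is a common formulation and is an assumption here rather than something stated in the excerpt.

    // Illustrative sketch (hypothetical names): per-level capacities implied
    // by the buffer size N_B, scale factor s, and layout policy. Under
    // leveling a level holds one shard; under tiering (a common formulation,
    // assumed here) it holds up to s shards of equal size.
    #include <cstddef>
    #include <cstdio>

    enum class LayoutPolicy { Leveling, Tiering };

    struct Config {
        std::size_t buffer_size;   // N_B: records held in the mutable buffer
        std::size_t scale_factor;  // s: growth rate between levels
        LayoutPolicy policy;
    };

    // Total record capacity of level i (0-indexed): N_B * s^(i+1).
    std::size_t level_capacity(const Config &cfg, std::size_t i) {
        std::size_t cap = cfg.buffer_size;
        for (std::size_t j = 0; j <= i; ++j) cap *= cfg.scale_factor;
        return cap;
    }

    // Number of shards level i may contain under the chosen layout policy.
    std::size_t shards_per_level(const Config &cfg) {
        return cfg.policy == LayoutPolicy::Leveling ? 1 : cfg.scale_factor;
    }

    int main() {
        Config cfg{1000, 4, LayoutPolicy::Tiering};
        for (std::size_t i = 0; i < 4; ++i)
            std::printf("level %zu: %zu records in up to %zu shards\n",
                        i, level_capacity(cfg, i), shards_per_level(cfg));
    }
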
@@ -628,12 +633,10 @@ The requirements that the framework places upon SSIs are rather
 modest. The sampling problem being considered must be a decomposable
 sampling problem (Definition \ref{def:decomp-sampling}) and the SSI
 must support the \texttt{build} and \texttt{unbuild} operations. Optionally,
-if the SSI supports point lookups or if the SSI can be constructed
-from multiple instances of the SSI more efficiently than its normal
-static construction, these two operations can be leveraged by the
-framework. However, these are not requirements, as the framework provides
-facilities to work around their absence.
-
+if the SSI supports point lookups or if the SSI is merge decomposable,
+then these two operations can be leveraged by the framework. However,
+these are not requirements, as the framework provides facilities to work
+around their absence.

 \captionsetup[subfloat]{justification=centering}
 \begin{figure*}
@@ -669,6 +672,7 @@ these delete mechanisms, each record contains an attached header with
 bits to indicate its tombstone or delete status.

 \subsection{Supported Operations and Cost Functions}
+\label{ssec:sampling-cost-funcs}
 \Paragraph{Insert.} Inserting a record into the dynamization involves
 appending it to the mutable buffer, which requires $\Theta(1)$ time. When
 the buffer reaches its capacity, it must be flushed into the structure
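
The insert path described in the final hunk (append to the mutable buffer in $\Theta(1)$ time, flush into the structure when the buffer fills) can be illustrated with a short sketch. The class and member names are hypothetical, and the flush shown builds a shard from the buffer without carrying out the level-by-level reconstruction that a full implementation would perform according to the layout policy.

    // Illustrative sketch (hypothetical interface): constant-time inserts into
    // a mutable buffer, with a flush that builds a new shard once the buffer
    // reaches capacity.
    #include <cstddef>
    #include <vector>

    struct Record { int key; };

    struct Shard {
        std::vector<Record> records;   // stand-in for a statically built SSI
    };

    class Dynamization {
    public:
        explicit Dynamization(std::size_t buffer_capacity)
            : capacity_(buffer_capacity) {}

        void insert(const Record &r) {
            buffer_.push_back(r);            // Theta(1) append
            if (buffer_.size() >= capacity_) flush();
        }

    private:
        void flush() {
            // Build a shard from the buffered records; a full implementation
            // would then merge it down the levels per the layout policy.
            shards_.push_back(Shard{buffer_});
            buffer_.clear();
        }

        std::size_t capacity_;
        std::vector<Record> buffer_;
        std::vector<Shard> shards_;
    };
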