Diffstat (limited to 'chapters')
-rw-r--r--   chapters/beyond-dsp.tex                     | 126
-rw-r--r--   chapters/dynamization.tex                   | 124
-rw-r--r--   chapters/related-works.tex                  |   1
-rw-r--r--   chapters/sigmod23/background.tex            |  18
-rw-r--r--   chapters/sigmod23/examples.tex              |   2
-rw-r--r--   chapters/sigmod23/exp-baseline.tex          |   2
-rw-r--r--   chapters/sigmod23/exp-parameter-space.tex   |  12
-rw-r--r--   chapters/sigmod23/experiment.tex            |   2
-rw-r--r--   chapters/sigmod23/extensions.tex            |   4
-rw-r--r--   chapters/sigmod23/framework.tex             | 134
10 files changed, 220 insertions, 205 deletions
diff --git a/chapters/beyond-dsp.tex b/chapters/beyond-dsp.tex index 87f44ba..73f8174 100644 --- a/chapters/beyond-dsp.tex +++ b/chapters/beyond-dsp.tex @@ -202,7 +202,7 @@ problem. The core idea underlying our solution in that chapter was to introduce individualized local queries for each block, which were created after a pre-processing step to allow information about each block to be determined first. In that particular example, we established the weight -each block should have during sampling, and then creating custom sampling +each block should have during sampling, and then created custom sampling queries with variable $k$ values, following the weight distributions. We have determined a general interface that allows for this procedure to be expressed, and we define the term \emph{extended decomposability} to refer @@ -379,12 +379,12 @@ A significant limitation of invertible problems is that the result set size is not able to be controlled. We do not know how many records in our local results have been deleted until we reach the combine operation and they begin to cancel out, at which point we lack a mechanism to go back -and retrieve more. This presents difficulties for addressing important -search problems such as top-$k$, $k$-NN, and sampling. In principle, these -queries could be supported by repeating the query with larger-and-larger -$k$ values until the desired number of records is returned, but in the -eDSP model this requires throwing away a lot of useful work, as the state -of the query must be rebuilt each time. +and retrieve more records. This presents difficulties for addressing +important search problems such as top-$k$, $k$-NN, and sampling. In +principle, these queries could be supported by repeating the query with +larger-and-larger $k$ values until the desired number of records is +returned, but in the eDSP model this requires throwing away a lot of +useful work, as the state of the query must be rebuilt each time. We can resolve this problem by moving the decision to repeat the query into the query interface itself, allowing retries \emph{before} the @@ -700,7 +700,7 @@ the following main operations, This function will delete a record from the dynamized structure, returning $1$ on success and $0$ on failure. The meaning of a failure to delete is dependent upon the delete mechanism in use, - and will be discussed in Section~\ref{ssec:dyn-deletes}. + and will be discussed in Section~\ref{sssec:dyn-deletes}. \item \texttt{std::future<QueryResult> query(QueryParameters); } \\ This function will execute a query with the specified parameters @@ -838,17 +838,18 @@ shards of the same type. The second of these constructors is to allow for efficient merging to be leveraged for merge decomposable search problems. Shards can also expose a point lookup operation for use in supporting -deletes for DDSPs. This function is only used for DDSP deletes, and so can -be left off when this functionality isn't necessary. If a data structure -doesn't natively support an efficient point-lookup, then it can be added -by including a hash table or other data structure in the shard if desired. -This function accepts a record type as input, and should return a pointer -to the record that exactly matches the input in storage, if one exists, -or \texttt{nullptr} if it doesn't. It should also accept an optional -boolean argument that the framework will pass \texttt{true} into if it -is don't a lookup for a tombstone. 
This flag is to allow the shard to -use various tombstone-related optimization, such as using a Bloom filter -for them, or storing them separately from the main records, etc. +deletes for DDSPs. This function is only used for DDSP deletes, and +so can be left off when this functionality isn't necessary. If a data +structure doesn't natively support an efficient point-lookup, then it +can be added by including a hash table or other data structure in the +shard if desired. This function accepts a record type as input, and +should return a pointer to the record that exactly matches the input in +storage, if one exists, or \texttt{nullptr} if it doesn't. It should +also accept an optional boolean argument that the framework will pass +\texttt{true} into if the lookup operation is being used to search for +a tombstone records. This flag is to allow the shard to use various +tombstone-related optimization, such as using a Bloom filter for them, +or storing them separately from the main records, etc. Shards should also expose some accessors for basic meta-data about its contents. In particular, the framework is reliant upon a function @@ -888,19 +889,19 @@ concept ShardInterface = RecordInterface<typename SHARD::RECORD> }; \end{lstlisting} -\label{listing:shard} \caption{The required interface for shard types in our dynamization framework.} +\label{lst:shard} \end{lstfloat} \subsubsection{Query Interface} The most complex interface required by the framework is for queries. The -concept for query types is given in Listing~\ref{listing:query}. In +concept for query types is given in Listing~\ref{lst:query}. In effect, it requires implementing the full IDSP interface from the previous section, as well as versions of $\mathbftt{local\_preproc}$ -and $\mathbftt{local\query}$ for pre-processing and querying an unsorted +and $\mathbftt{local\_query}$ for pre-processing and querying an unsorted set of records, which is necessary to allow the mutable buffer to be used as part of the query process.\footnote{ In the worst case, these routines could construct temporary shard @@ -918,7 +919,7 @@ a local result that includes both the number of records and the number of tombstones, while the query result itself remains a single number. Additionally, the framework makes no decision about what, if any, collection type should be used for these results. A range scan, for -example, could specified the result types as a vector of records, map +example, could specify the result types as a vector of records, map of records, etc., depending on the use case. There is one significant difference between the IDSP interface and the @@ -935,7 +936,6 @@ to define an additional combination operation for final result types, or duplicate effort in the combine step on each repetition. \begin{lstfloat} - \begin{lstlisting}[language=C++] template <typename QUERY, typename SHARD, @@ -979,9 +979,9 @@ requires(PARAMETERS *parameters, LOCAL *local, }; \end{lstlisting} -\label{listing:query} \caption{The required interface for query types in our dynamization framework.} +\label{lst:query} \end{lstfloat} @@ -1029,7 +1029,7 @@ all the records from the level above it ($i-1$ or the buffer, if $i merged with the records in $j+1$ and the resulting shard placed in level $j+1$. This procedure guarantees that level $0$ will have capacity for the shard from the buffer, which is then merged into it (if it is not -empty) or because it (if the level is empty). +empty) or replaces it (if the level is empty). 
\item \textbf{Tiering.}\\ @@ -1152,16 +1152,16 @@ reconstruction. Consider a record $r_i$ and its corresponding tombstone $t_j$, where the subscript is the insertion time, with $i < j$ meaning that $r_i$ was inserted \emph{before} $t_j$. Then, if we are to apply tombstone cancellations, we must obey the following invariant within -each shard: A record $r_i$ and tombstone $r_j$ can exist in the same +each shard: A record $r_i$ and tombstone $t_j$ can exist in the same shard if $i > j$. But, if $i < j$, then a cancellation should occur. The case where the record and tombstone coexist covers the situation where a record is deleted, and then inserted again after the delete. In this case, there does exist a record $r_k$ with $k < j$ that the tombstone should cancel with, but that record may exist in a different shard. So -the tombstone will \emph{eventually} cancel, but it would be technically -incorrect to cancel it with the matching record $r_i$ that it coexists -with in the shard being considered. +the tombstone will \emph{eventually} cancel, but it would be incorrect +to cancel it with the matching record $r_i$ that it coexists with in +the shard being considered. This means that correct tombstone cancellation requires that the order that records have been inserted be known and accounted for during @@ -1186,7 +1186,7 @@ at index $i$ will cancel with a record if and only if that record is in index $i+1$. For structures that are constructed by a sorted-merge of data, this allows tombstone cancellation at no extra cost during the merge operation. Otherwise, it requires an extra linear pass after -sorting to remove cancelled records.\footnote{ +sorting to remove canceled records.\footnote{ For this reason, we use tagging based deletes for structures which don't require sorting by value during construction. } @@ -1200,7 +1200,7 @@ For tombstone deletes, a failure to delete means a failure to insert, and the request should be retried after a brief delay. Note that, for performance reasons, the framework makes no effort to ensure that the record being erased using tombstones is \emph{actually} there, so it -is possible to insert a tombstone that can never be cancelled. This +is possible to insert a tombstone that can never be canceled. This won't affect correctness in any way, so long as queries are correctly implemented, but it will increase the size of the structure slightly. @@ -1271,7 +1271,7 @@ same mechanisms described in Section~\ref{sssec:dyn-deletes}. \Paragraph{Asymptotic Complexity.} The worst-case query cost of the framework follows the same basic cost function as discussed for IDSPs -in Section~\ref{asec:dyn-idsp}, with slight modifications to account for +in Section~\ref{ssec:dyn-idsp}, with slight modifications to account for the different cost function of buffer querying and preprocessing. The cost is, \begin{equation*} @@ -1280,7 +1280,7 @@ cost is, \end{equation*} where $P_B(n)$ is the cost of pre-processing the buffer, and $Q_B(n)$ is the cost of querying it. As $N_B$ is a small constant relative to $n$, -in some cases these terms can be ommitted, but they are left here for +in some cases these terms can be omitted, but they are left here for generality. Also note that this is an upper bound, but isn't necessarily tight. 
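To make the adjacent-index cancellation rule described above concrete, the following is a minimal sketch of the linear cancellation pass over a sorted run, assuming the sort has already placed each tombstone immediately before the record it should cancel. The record layout and function names here are illustrative assumptions, not the framework's actual types.

\begin{lstlisting}[language=C++]
#include <cstddef>
#include <cstdint>
#include <vector>

struct Record {
    uint64_t key;
    uint64_t value;
    bool     tombstone;   // set on delete markers

    bool matches(const Record &other) const {
        return key == other.key && value == other.value;
    }
};

// One linear pass over a value-sorted run in which each tombstone sorts
// immediately before the record it cancels. Matched pairs are dropped;
// everything else is copied through unchanged.
std::vector<Record> cancel_tombstones(const std::vector<Record> &sorted) {
    std::vector<Record> out;
    out.reserve(sorted.size());
    for (size_t i = 0; i < sorted.size(); i++) {
        if (sorted[i].tombstone && i + 1 < sorted.size() &&
            !sorted[i + 1].tombstone && sorted[i].matches(sorted[i + 1])) {
            i++;        // skip the cancelled record
            continue;   // and drop the tombstone itself
        }
        out.push_back(sorted[i]);
    }
    return out;
}
\end{lstlisting}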
As we saw with IRS in Section~\ref{ssec:edsp}, it is sometimes possible to leverage problem-specific details within this interface to @@ -1307,7 +1307,7 @@ All of our testing was performed using Ubuntu 20.04 LTS on a dual socket Intel Xeon Gold 6242 server with 384 GiB of physical memory and 40 physical cores. We ran our benchmarks pinned to a specific core, or specific NUMA node for multi-threaded testing. Our code was compiled
-using GCC version 11.3.0 with the \texttt{-O3} flag, and targetted to
+using GCC version 11.3.0 with the \texttt{-O3} flag, and targeted to
C++20.\footnote{ Aside from the ALEX benchmark. ALEX does not build in this configuration, and we used C++13 instead for that particular test. @@ -1335,7 +1335,7 @@ structures. Specifically, \texttt{fb}, and \texttt{osm} datasets from SOSD~\cite{sosd-datasets}. Each has 200 million 64-bit keys (to which we added 64-bit values) following a variety of
- distributions. We ommitted the \texttt{wiki} dataset because it
+ distributions. We omitted the \texttt{wiki} dataset because it
contains duplicate keys, which were not supported by one of our dynamic baselines. @@ -1371,7 +1371,7 @@ For our first set of experiments, we evaluated a dynamized version of the Triespline learned index~\cite{plex} for answering range count queries.\footnote{ We tested range scans throughout this chapter by measuring the performance of a range count. We decided to go this route to ensure
- that the results across our baselines were comprable. Different range
+ that the results across our baselines were comparable. Different range
structures provided different interfaces for accessing the result sets, some of which required making an extra copy and others which didn't. Using a range count instead allowed us to measure only index @@ -1383,7 +1383,7 @@ performance. We ran these tests using the SOSD \texttt{OSM} dataset. First, we'll consider the effect of buffer size on performance in Figures~\ref{fig:ins-buffer-size} and \ref{fig:q-buffer-size}. For all
-of these tests, we used a fixe scale factor of $8$ and the tombstone
+of these tests, we used a fixed scale factor of $8$ and the tombstone
delete policy. Each plot shows the performance of our three supported layout policies (note that BSM uses a fixed $N_B=1$ and $s=2$ for all tests, to accurately reflect the performance of the classical Bentley-Saxe @@ -1419,17 +1419,17 @@ improves performance. This is because a larger scale factor in tiering results in more, smaller structures, and thus reduced reconstruction time. But for leveling it increases the write amplification, hurting performance. Figure~\ref{fig:q-scale-factor} shows that, like with
-Figure~\ref{fig:query_sf} in the previous chapter, query latency is not
-strong affected by the scale factor, but larger scale factors due tend
+Figure~\ref{fig:sample_sf} in the previous chapter, query latency is not
+strongly affected by the scale factor, but larger scale factors do tend
to have a negative effect under tiering (due to having more structures). As a final note, these results demonstrate that, compared to the normal Bentley-Saxe method, our proposed design space is a strict
-improvement. There are points within the space that are equivilant to,
+improvement. There are points within the space that are equivalent to,
or even strictly superior to, BSM in terms of both query and insertion
-performance, as well as clearly available trade-offs between insertion and
-query performance, particular when it comes to selecting layout policy.
- +performance. Beyond this, there are also clearly available trade-offs +between insertion and query performance, particular when it comes to +selecting layout policy. \begin{figure*} @@ -1446,7 +1446,7 @@ query performance, particular when it comes to selecting layout policy. \subsection{Independent Range Sampling} -Next, we'll consider the indepedent range sampling problem using ISAM +Next, we'll consider the independent range sampling problem using ISAM tree. The functioning of this structure for answering IRS queries is discussed in more detail in Section~\ref{ssec:irs-struct}, and we use the query algorithm described in Algorithm~\ref{alg:decomp-irs}. We use the @@ -1456,7 +1456,7 @@ obtain the upper and lower bounds of the query range, and the weight of that range, using tree traversals in \texttt{local\_preproc}. We use rejection sampling on the buffer, and so the buffer preprocessing simply uses the number of records in the buffer for its weight. In -\texttt{distribute\_query}, we build and alias structure over all of +\texttt{distribute\_query}, we build an alias structure over all of the weights and query it $k$ times to obtain the individual $k$ values for the local queries. To avoid extra work on repeat, we stash this alias structure in the buffer's local query object so it is available @@ -1485,8 +1485,8 @@ compaction is triggered. We configured our dynamized structure to use $s=8$, $N_B=12000$, $\delta = .05$, $f = 16$, and the tiering layout policy. We compared our method (\textbf{DE-IRS}) to Olken's method~\cite{olken89} on a B+Tree with -aggregate weight counts (\textbf{AGG B+Tree}), as well as our besoke -sampling solution from the previous chapter (\textbf{Besoke}) and a +aggregate weight counts (\textbf{AGG B+Tree}), as well as our bespoke +sampling solution from the previous chapter (\textbf{Bespoke}) and a single static instance of the ISAM Tree (\textbf{ISAM}). Because IRS is neither INV nor DDSP, the standard Bentley-Saxe Method has no way to support deletes for it, and was not tested. All of our tested sampling @@ -1494,7 +1494,7 @@ queries had a controlled selectivity of $\sigma = 0.01\%$ and $k=1000$. The results of our performance benchmarking are in Figure~\ref{fig:irs}. Figure~\ref{fig:irs-insert} shows that our general framework has -comperable insertion performance to the specialized one, though loses +comparable insertion performance to the specialized one, though loses slightly. This is to be expected, as \textbf{Bespoke} was hand-written for specifically this type of query and data structure, and has hard-coded data types, among other things. Despite losing to \textbf{Bespoke} @@ -1525,7 +1525,7 @@ using a static Vantage Point Tree (VPTree)~\cite{vptree}. This is a binary search tree with internal nodes that partition records based on their distance to a selected point, called the vantage point. All of the points within a fixed distance of the vantage point are covered -by one subtree, and the points outside of this distance are covered by +by one sub-tree, and the points outside of this distance are covered by the other. This results in a hard-to-update data structure that can be constructed in $\Theta(n \log n)$ time using repeated application of the \texttt{quickselect} algorithm~\cite{quickselect} to partition the @@ -1537,7 +1537,7 @@ Algorithm~\cite{alg:idsp-knn}, though using delete tagging instead of tombstones. 
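As a concrete illustration of the weight-based query distribution used for IRS above, the sketch below splits the sample budget $k$ across shards in proportion to their weights. The names here are illustrative assumptions; the framework's \texttt{distribute\_query} builds an alias structure over the weights (stashed in the buffer's local query object so that retries can reuse it), while this sketch substitutes \texttt{std::discrete\_distribution}, which has the same sampling semantics.

\begin{lstlisting}[language=C++]
#include <cstddef>
#include <random>
#include <vector>

// Decide how many of the k requested samples each shard should produce,
// given the per-shard weights gathered during local_preproc. Drawing a
// shard index k times from the weighted distribution mirrors querying an
// alias structure k times.
std::vector<size_t> distribute_k(const std::vector<double> &shard_weights,
                                 size_t k, std::mt19937 &rng) {
    std::vector<size_t> per_shard_k(shard_weights.size(), 0);
    std::discrete_distribution<size_t> dist(shard_weights.begin(),
                                            shard_weights.end());
    for (size_t i = 0; i < k; i++) {
        per_shard_k[dist(rng)] += 1;
    }
    return per_shard_k;
}
\end{lstlisting}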
VPTree doesn't support efficient point lookups, and so to work around this we add a hash map to each shard, mapping each record to its location in storage, to ensure that deletes can be done efficiently -in this way. This allows us to avoid cancelling deleted records in +in this way. This allows us to avoid canceling deleted records in the \texttt{combine} operation, as they can be skipped over during \texttt{local\_query} directly. Because $k$-NN doesn't have any of the distributional requirements of IRS, these local queries can return $k$ @@ -1599,7 +1599,7 @@ scheme used, with \textbf{BSM-VPTree} performing slightly \emph{better} than our framework for query performance. The reason for this is shown in Figure~\ref{fig:knn-insert}, where our framework outperforms the Bentley-Saxe method in insertion performance. These results are -atributible to our selection of framework configuration parameters, +attributable to our selection of framework configuration parameters, which are biased towards better insertion performance. Both dynamized structures also outperform the dynamic baseline. Finally, as is becoming a trend, Figure~\ref{fig:knn-space} shows that the storage requirements @@ -1667,7 +1667,7 @@ The results of our evaluation are shown in Figure~\ref{fig:eval-learned-index}. Figure~\ref{fig:rq-insert} shows the insertion performance. DE-TS is the best in all cases, and the pure BSM version of Triespline is the worst by a substantial margin. Of -particular interest in this chart is the inconsisent performance of +particular interest in this chart is the inconsistent performance of ALEX, which does quite well on the \texttt{books} dataset, and poorly on the others. It is worth noting that getting ALEX to run \emph{at all} in some cases required a lot of trial and error and tuning, as its @@ -1691,8 +1691,8 @@ performs horrendously compared to all of the other structures. The same caveat from the previous paragraph applies here--PGM can be configured for better performance. But it's notable that our framework-dynamized PGM is able to beat PGM slightly in insertion performance without seeing the -same massive degredation in query performance that PGM's native update -suport does in its own update-optmized configuration.\footnote{ +same massive degradation in query performance that PGM's native update +support does in its own update-optimized configuration.\footnote{ It's also worth noting that PGM implements tombstone deletes by inserting a record with a matching key to the record to be deleted, and a particular "tombstone" value, rather than using a header. This @@ -1712,7 +1712,7 @@ update support. \subsection{String Search} As a final example of a search problem, we consider exact string matching -using the fast succinct trie~\cite{zhang18}. While updatable +using the fast succinct trie~\cite{zhang18}. While dynamic tries aren't terribly unusual~\cite{m-bonsai,dynamic-trie}, succinct data structures, which attempt to approach an information-theoretic lower-bound on their binary representation of the data, are usually static because @@ -1725,7 +1725,7 @@ we consider the effectiveness of our generalized framework for them. 
\centering \subfloat[Update Throughput]{\includegraphics[width=.32\textwidth, trim=5mm 2mm 0 0]{img/fig-bs-fst-insert} \label{fig:fst-insert}} \subfloat[Query Latency]{\includegraphics[width=.32\textwidth, trim=5mm 2mm 0 0]{img/fig-bs-fst-query} \label{fig:fst-query}} - \subfloat[Index Overhead]{\includegraphics[width=.32\textwidth, trim=5mm 2mm 0 0]{img/fig-bs-fst-space} \label{fig:fst-size}} + \subfloat[Index Overhead]{\includegraphics[width=.32\textwidth, trim=5mm 2mm 0 0]{img/fig-bs-fst-space} \label{fig:fst-space}} %\vspace{-3mm} \caption{FST Evaluation} \label{fig:fst-eval} @@ -1739,9 +1739,9 @@ storage. Queries use no pre-processing and the local queries directly search for a matching string. We use the framework's early abort feature to stop as soon as the first result is found, and combine simply checks whether this record is a tombstone or not. If it's a tombstone, then -the lookup is considered to have no found the search string. Otherwise, +the lookup is considered to have not found the search string. Otherwise, the record is returned. This results in a dynamized structure with the -following asympotic costs, +following asymptotic costs, \begin{align*} @@ -1759,7 +1759,7 @@ The results are show in Figure~\ref{fig:fst-eval}. As with range scans, the Bentley-Saxe method shows horrible insertion performance relative to our framework in Figure~\ref{fig:fst-insert}. Note that the significant observed difference in update throughput for the two data sets is -largely attributable to the relative sizes. The \texttt{usra} set is +largely attributable to the relative sizes. The \texttt{US} set is far larger than \texttt{english}. Figure~\ref{fig:fst-query} shows that our write-optimized framework configuration is slightly out-performed in query latency by the standard Bentley-Saxe dynamization, and that both @@ -1767,7 +1767,7 @@ dynamized structures are quite a bit slower than the static structure for queries. Finally, the storage costs for the data structures are shown in Figure~\ref{fig:fst-space}. For the \texttt{english} data set, the extra storage cost from decomposing the structure is quite significant, -but the for \texttt{ursarc} set the sizes are quite comperable. It is +but the for \texttt{ursarc} set the sizes are quite comparable. It is not unexpected that dynamization would add storage cost for succinct (or any compressed) data structures, because the splitting of the records across multiple data structures reduces the ability of the structure to @@ -1792,10 +1792,10 @@ are inserted, it is necessary that each operation obtain a lock on the root node of the tree~\cite{zhao22}. This makes this situation a good use-case for the automatic concurrency support provided by our framework. Figure~\ref{fig:irs-concurrency} shows the results of this -benchmark for various numbers of concurreny query threads. As can be seen, +benchmark for various numbers of concurrency query threads. As can be seen, our framework supports a stable update throughput up to 32 query threads, whereas the AGG B+Tree suffers from contention for the mutex and sees -is performance degrade as the number of threads increases. +its performance degrade as the number of threads increases. \begin{figure} \centering diff --git a/chapters/dynamization.tex b/chapters/dynamization.tex index 053fb46..a2277c3 100644 --- a/chapters/dynamization.tex +++ b/chapters/dynamization.tex @@ -67,15 +67,17 @@ terms for these two concepts. 
\subsection{Decomposable Search Problems}
-Dynamization techniques require the partitioning of one data structure
-into several, smaller ones. As a result, these techniques can only
-be applied in situations where the search problem to be answered can
-be answered from this set of smaller data structures, with the same
-answer as would have been obtained had all of the data been used to
-construct a single, large structure. This requirement is formalized in
-the definition of a class of problems called \emph{decomposable search
-problems (DSP)}. This class was first defined by Bentley and Saxe in
-their work on dynamization, and we will adopt their definition,
+The dynamization techniques we will be considering require decomposing
+one data structure into several, smaller ones, called blocks, each built
+over a disjoint partition of the data. As a result, these techniques
+can only be applied in situations where the search problem can be
+answered from this set of decomposed blocks. The answer to the search
+problem from the decomposition should be the same as would have been
+obtained had all of the data been stored in a single data structure. This
+requirement is formalized in the definition of a class of problems called
+\emph{decomposable search problems (DSP)}. This class was first defined
+by Bentley and Saxe in their work on dynamization, and we will adopt
+their definition,
\begin{definition}[Decomposable Search Problem~\cite{saxe79}] \label{def:dsp} @@ -180,15 +182,17 @@ database indices. We refer to a data structure with update support as contain header information (like visibility) that is updated in place. }
-This section discusses \emph{dynamization}, the construction of a
-dynamic data structure based on an existing static one. When certain
-conditions are satisfied by the data structure and its associated
-search problem, this process can be done automatically, and with
-provable asymptotic bounds on amortized insertion performance, as well
-as worst case query performance. This is in contrast to the manual
-design of dynamic data structures, which involve techniques based on
-partially rebuilding small portions of a single data structure (called
-\emph{local reconstruction})~\cite{overmars83}. This is a very high cost
+This section discusses \emph{dynamization}, the construction of a dynamic
+data structure based on an existing static one. When certain conditions
+are satisfied by the data structure and its associated search problem,
+this process can be done automatically, and with provable asymptotic
+bounds on amortized insertion performance, as well as worst case
+query performance. This automatic approach is in contrast with the
+manual design of a dynamic data structure, which involves altering
+the data structure itself to natively support updates. This process
+usually involves implementing techniques that partially rebuild small
+portions of the structure to accommodate new records, which is called
+\emph{local reconstruction}~\cite{overmars83}. This is a very high cost
intervention that requires significant effort on the part of the data structure designer, whereas conventional dynamization can be performed with little-to-no modification of the underlying data structure at all. @@ -345,7 +349,7 @@ then an insert is done by, Following an insert, it is possible that Constraint~\ref{ebm-c1} is violated.\footnote{ Constraint~\ref{ebm-c2} cannot be violated by inserts, but may be violated by deletes.
We're omitting deletes from the discussion at - this point, but will circle back to them in Section~\ref{sec:deletes}. + this point, but will circle back to them in Section~\ref{ssec:dyn-deletes}. } In this case, the constraints are enforced by "re-configuring" the structure. $s$ is updated to be exactly $f(n)$, all of the existing blocks are unbuilt, and then the records are redistributed evenly into @@ -584,14 +588,13 @@ F(A / B, q) = F(A, q)~\Delta~F(B, q) for all $A, B \in \mathcal{PS}(\mathcal{D})$ where $A \cap B = \emptyset$. \end{definition} -Given a search problem with this property, it is possible to perform -deletes by creating a secondary ``ghost'' structure. When a record -is to be deleted, it is inserted into this structure. Then, when the -dynamization is queried, this ghost structure is queried as well as the -main one. The results from the ghost structure can be removed from the -result set using the inverse merge operator. This simulates the result -that would have been obtained had the records been physically removed -from the main structure. +Given a search problem with this property, it is possible to emulate +removing a record from the structure by instead inserting into a +secondary ``ghost'' structure. When the dynamization is queried, +this ghost structure is queried as well as the main one. The results +from the ghost structure can be removed from the result set using the +inverse merge operator. This simulates the result that would have been +obtained had the records been physically removed from the main structure. Two examples of invertible search problems are set membership and range count. Range count was formally defined in @@ -670,11 +673,13 @@ to some serious problems, for example if every record in a structure of $n$ records is deleted, the net result will be an "empty" dynamized data structure containing $2n$ physical records within it. To circumvent this problem, Bentley and Saxe proposed a mechanism of setting a maximum -threshold for the size of the ghost structure relative to the main one, -and performing a complete re-partitioning of the data once this threshold -is reached, removing all deleted records from the main structure, -emptying the ghost structure, and rebuilding blocks with the records -that remain according to the invariants of the technique. +threshold for the size of the ghost structure relative to the main one. +Once this threshold was reached, a complete re-partitioning of the data +can be performed. During this re-paritioning, all deleted records can +be removed from the main structure, and the ghost structure emptied +completely. Then all of the blocks can be rebuilt from the remaining +records, partitioning them according to the strict binary decomposition +of the Bentley-Saxe method. \subsubsection{Weak Deletes for Deletion Decomposable Search Problems} @@ -694,16 +699,16 @@ underlying data structure supports a delete operation. More formally, for $\mathscr{I}$. \end{definition} -Superficially, this doesn't appear very useful. If the underlying data -structure already supports deletes, there isn't much reason to use a -dynamization technique to add deletes to it. However, one point worth -mentioning is that it is possible, in many cases, to easily \emph{add} -delete support to a static structure. 
If it is possible to locate a -record and somehow mark it as deleted, without removing it from the -structure, and then efficiently ignore these records while querying, -then the given structure and its search problem can be said to be -deletion decomposable. This technique for deleting records is called -\emph{weak deletes}. +Superficially, this doesn't appear very useful, because if the underlying +data structure already supports deletes, there isn't much reason to +use a dynamization technique to add deletes to it. However, even in +structures that don't natively support deleting, it is possible in many +cases to \emph{add} delete support without significant alterations. +If it is possible to locate a record and somehow mark it as deleted, +without removing it from the structure, and then efficiently ignore these +records while querying, then the given structure and its search problem +can be said to be deletion decomposable. This technique for deleting +records is called \emph{weak deletes}. \begin{definition}[Weak Deletes~\cite{overmars81}] \label{def:weak-delete} @@ -815,10 +820,10 @@ and thereby the query performance. The particular invariant maintenance rules depend upon the decomposition scheme used. \Paragraph{Bentley-Saxe Method.} When creating a BSM dynamization for -a deletion decomposable search problem, the $i$th block where $i \geq 2$\footnote{ +a deletion decomposable search problem, the $i$th block where $i \geq 2$,\footnote{ Block $i=0$ will only ever have one record, so no special maintenance must be done for it. A delete will simply empty it completely. -}, +} in the absence of deletes, will contain $2^{i-1} + 1$ records. When a delete occurs in block $i$, no special action is taken until the number of records in that block falls below $2^{i-2}$. Once this threshold is @@ -1076,12 +1081,13 @@ matching of records in result sets. To work around this, a slight abuse of definition is in order: assume that the equality conditions within the DSP definition can be interpreted to mean ``the contents in the two sets are drawn from the same distribution''. This enables the category -of DSP to apply to this type of problem. +of DSP to apply to this type of problem, while maintaining the spirit of +the definition. Even with this abuse, however, IRS cannot generally be considered decomposable; it is at best $C(n)$-decomposable. The reason for this is that matching the distribution requires drawing the appropriate number -of samples from each each partition of the data. Even in the special +of samples from each partition of the data. Even in the special case that $|D_0| = |D_1| = \ldots = |D_\ell|$, the number of samples from each partition that must appear in the result set cannot be known in advance due to differences in the selectivity of the predicate across @@ -1102,7 +1108,7 @@ the partitions. probability of a $4$. The second and third result sets can only be ${3, 3, 3, 3}$ and ${4, 4, 4, 4}$ respectively. Merging these together, we'd find that the probability distribution of the sample - would be $p(3) = 0.5$ and $p(4) = 0.5$. However, were were to perform + would be $p(3) = 0.5$ and $p(4) = 0.5$. However, were we to perform the same sampling operation over the full dataset (not partitioned), the distribution would be $p(3) = 0.25$ and $p(4) = 0.75$. @@ -1111,21 +1117,23 @@ the partitions. The problem is that the number of samples drawn from each partition needs to be weighted based on the number of elements satisfying the query predicate in that partition. 
In the above example, by drawing $4$ -samples from $D_1$, more weight is given to $3$ than exists within -the base dataset. This can be worked around by sampling a full $k$ -records from each partition, returning both the sample and the number -of records satisfying the predicate as that partition's query result, -and then performing another pass of IRS as the merge operator, but this -is the same approach as was used for k-NN above. This leaves IRS firmly +samples from $D_1$, more weight is given to $3$ than exists within the +base dataset. This can be worked around by sampling a full $k$ records +from each partition, returning both the sample and the number of records +satisfying the predicate as that partition's query result. This allows for +the relative weights of each block to be controlled for during the merge, +by doing weighted sampling of each partial result. This approach requires +$\Theta(k)$ time for the merge operation, however, leaving IRS firmly in the $C(n)$-decomposable camp. If it were possible to pre-calculate the number of samples to draw from each partition, then a constant-time merge operation could be used. -We examine this problem in detail in Chapters~\ref{chap:sampling} and -\ref{chap:framework} and propose techniques for efficiently expanding -support of dynamization systems to non-decomposable search problems, as -well as addressing some additional difficulties introduced by supporting -deletes, which can complicate query processing. +We examine expanding support for non-decomposable search problems +in Chapters~\ref{chap:sampling} and \ref{chap:framework} and propose +techniques for efficiently expanding support of dynamization systems to +non-decomposable search problems, as well as addressing some additional +difficulties introduced by supporting deletes, which can complicate +query processing. \subsection{Configurability} diff --git a/chapters/related-works.tex b/chapters/related-works.tex index 2ed466a..7a42003 100644 --- a/chapters/related-works.tex +++ b/chapters/related-works.tex @@ -1,4 +1,5 @@ \chapter{Related Work} +\label{chap:related-work} \section{Implementations of Bentley-Saxe} diff --git a/chapters/sigmod23/background.tex b/chapters/sigmod23/background.tex index af3b80a..d600c27 100644 --- a/chapters/sigmod23/background.tex +++ b/chapters/sigmod23/background.tex @@ -19,16 +19,16 @@ is used to indicate the selection of either a single sample or a sample set; the specific usage should be clear from context. In each of the problems considered, sampling can be performed either -with replacement or without replacement. Sampling with replacement +with-replacement or without-replacement. Sampling with-replacement means that a record that has been included in the sample set for a given sampling query is "replaced" into the dataset and allowed to be sampled -again. Sampling without replacement does not "replace" the record, +again. Sampling without-replacement does not "replace" the record, and so each individual record can only be included within the a sample set once for a given query. The data structures that will be discussed -support sampling with replacement, and sampling without replacement can -be implemented using a constant number of with replacement sampling +support sampling with-replacement, and sampling without-replacement can +be implemented using a constant number of with-replacement sampling operations, followed by a deduplication step~\cite{hu15}, so this chapter -will focus exclusive on the with replacement case. 
+will focus exclusive on the with-replacement case. \subsection{Independent Sampling Problem} @@ -115,8 +115,10 @@ of problems that will be directly addressed within this chapter. Relational database systems often have native support for IQS using SQL's \texttt{TABLESAMPLE} operator~\cite{postgress-doc}. However, the -algorithms used to implement this operator have significant limitations: -users much choose between statistical independence or performance. +algorithms used to implement this operator have significant limitations +and do not allow users to maintain statistical independence of the results +without also running the query to be sampled from in full. Thus, users must +choose between independece and performance. To maintain statistical independence, Bernoulli sampling is used. This technique requires iterating over every record in the result set of the @@ -240,7 +242,7 @@ Tao~\cite{tao22}. There also exist specialized data structures with support for both efficient sampling and updates~\cite{hu14}, but these structures have poor constant factors and are very complex, rendering them of little -practical utility. Additionally, efforts have been made to extended +practical utility. Additionally, efforts have been made to extend the alias structure with support for weight updates over a fixed set of elements~\cite{hagerup93,matias03,allendorf23}. These approaches do not allow the insertion or removal of new records, however, only in-place diff --git a/chapters/sigmod23/examples.tex b/chapters/sigmod23/examples.tex index 38df04d..4e7f9ac 100644 --- a/chapters/sigmod23/examples.tex +++ b/chapters/sigmod23/examples.tex @@ -25,7 +25,7 @@ number of shards involved in a reconstruction using either layout policy is $\Theta(1)$ using our framework, this means that we can perform reconstructions in $B_M(n) \in \Theta(n)$ time, including tombstone cancellation. The total weight of the structure can also be calculated -at no time when it is constructed, allows $W(n) \in \Theta(1)$ time +at no time cost when it is constructed, allows $W(n) \in \Theta(1)$ time as well. Point lookups over the sorted data can be done using a binary search in $L(n) \in \Theta(\log_2 n)$ time, and sampling queries require no pre-processing, so $P(n) \in \Theta(1)$. The mutable buffer can be diff --git a/chapters/sigmod23/exp-baseline.tex b/chapters/sigmod23/exp-baseline.tex index 5585c36..d0e1ce0 100644 --- a/chapters/sigmod23/exp-baseline.tex +++ b/chapters/sigmod23/exp-baseline.tex @@ -73,7 +73,7 @@ being introduced by the dynamization. \subfloat[Insertion Throughput vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-irs-insert} \label{fig:irs-insert1}} \subfloat[Sampling Latency vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-irs-sample} \label{fig:irs-sample1}} \\ - \subfloat[Delete Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-irs-delete} \label{fig:irs-delete}} + \subfloat[Delete Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-irs-delete} \label{fig:irs-delete-s}} \subfloat[Sampling Latency vs. 
Sample Size]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-irs-samplesize} \label{fig:irs-samplesize}} \caption{Framework Comparison to Baselines for IRS} diff --git a/chapters/sigmod23/exp-parameter-space.tex b/chapters/sigmod23/exp-parameter-space.tex index 9583312..1e51d8c 100644 --- a/chapters/sigmod23/exp-parameter-space.tex +++ b/chapters/sigmod23/exp-parameter-space.tex @@ -2,11 +2,11 @@ \label{ssec:ds-exp} Our proposed framework has a large design space, which we briefly -described in Section~\ref{ssec:design-space}. The contents of this -space will be described in much more detail in Chapter~\ref{chap:design-space}, -but as part of this work we did perform an experimental examination of our -framework to compare insertion throughput and query latency over various -points within the space. +described in Section~\ref{ssec:sampling-design-space}. The +contents of this space will be described in much more detail in +Chapter~\ref{chap:design-space}, but as part of this work we did perform +an experimental examination of our framework to compare insertion +throughput and query latency over various points within the space. We examined this design space by considering \texttt{DE-WSS} specifically, using a random sample of $500,000,000$ records from the \texttt{OSM} @@ -48,7 +48,7 @@ performance, with tiering outperforming leveling for both delete policies. The next largest effect was the delete policy selection, with tombstone deletes outperforming tagged deletes in insertion performance. This result aligns with the asymptotic analysis of the two -approaches in Section~\ref{sampling-deletes}. It is interesting to note +approaches in Section~\ref{ssec:sampling-deletes}. It is interesting to note however that the effect of layout policy was more significant in these particular tests,\footnote{ Although the largest performance gap in absolute terms was between diff --git a/chapters/sigmod23/experiment.tex b/chapters/sigmod23/experiment.tex index 727284a..1eb704c 100644 --- a/chapters/sigmod23/experiment.tex +++ b/chapters/sigmod23/experiment.tex @@ -53,7 +53,7 @@ uninteresting key distributions. \Paragraph{Structures Compared.} As a basis of comparison, we tested both our dynamized SSI implementations, and existing dynamic baselines, -for each sampling problem considered. Specifically, we consider a the +for each sampling problem considered. Specifically, we consider the following dynamized structures, \begin{itemize} diff --git a/chapters/sigmod23/extensions.tex b/chapters/sigmod23/extensions.tex index 3a3cba3..3304b76 100644 --- a/chapters/sigmod23/extensions.tex +++ b/chapters/sigmod23/extensions.tex @@ -56,7 +56,7 @@ structure using in XDB~\cite{li19}. Because our dynamization technique is built on top of static data structures, a limited form of concurrency support is straightforward to -implement. To that end, created a proof-of-concept dynamization of an +implement. To that end, we created a proof-of-concept dynamization of an ISAM Tree for IRS based on a simplified version of a general concurrency controlled scheme for log-structured data stores~\cite{golan-gueta15}. @@ -79,7 +79,7 @@ accessing them have finished. The buffer itself is an unsorted array, so a query can capture a consistent and static version by storing the tail pointer at the time the query begins. New inserts can be performed concurrently by doing -a fetch-and-and on the tail. By using multiple buffers, inserts and +a fetch-and-add on the tail. 
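A minimal sketch of the buffer behavior described above is shown below: inserts claim a slot with an atomic fetch-and-add on the tail, and a query captures a consistent view by reading the tail once at the start and only scanning records below that index. The class and member names are illustrative assumptions, and the sketch omits record headers, flushing, and the memory-ordering or publication scheme a real implementation would need to guarantee that a claimed slot's contents are visible to readers.

\begin{lstlisting}[language=C++]
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <vector>

template <typename R>
class MutableBuffer {
public:
    explicit MutableBuffer(size_t capacity) : m_data(capacity), m_tail(0) {}

    // Claim a slot with a fetch-and-add on the tail. A false return means
    // the buffer is full and must be flushed before this insert proceeds.
    bool append(const R &rec) {
        size_t idx = m_tail.fetch_add(1);
        if (idx >= m_data.size()) {
            return false;
        }
        m_data[idx] = rec;
        return true;
    }

    // A query records the tail once when it begins and treats the prefix
    // [0, snapshot()) as an immutable view for its lifetime.
    size_t snapshot() const {
        return std::min(m_tail.load(), m_data.size());
    }

    const R &record_at(size_t idx) const { return m_data[idx]; }

private:
    std::vector<R>      m_data;
    std::atomic<size_t> m_tail;
};
\end{lstlisting}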
By using multiple buffers, inserts and reconstructions can proceed, to some extent, in parallel, which helps to hide some of the insertion tail latency due to blocking on reconstructions during a buffer flush. diff --git a/chapters/sigmod23/framework.tex b/chapters/sigmod23/framework.tex index 256d127..804194b 100644 --- a/chapters/sigmod23/framework.tex +++ b/chapters/sigmod23/framework.tex @@ -50,6 +50,7 @@ on the query being sampled from. Based on these observations, we can define the decomposability conditions for a query sampling problem, \begin{definition}[Decomposable Sampling Problem] + \label{def:decomp-sampling} A query sampling problem, $X: (F, \mathcal{D}, \mathcal{Q}, \mathbb{Z}^+ \to \mathcal{R}$) is decomposable if and only if the following conditions are met for all $q \in \mathcal{Q}, @@ -78,12 +79,14 @@ These two conditions warrant further explanation. The first condition is simply a redefinition of the standard decomposability criteria to consider matching the distribution, rather than the exact records in $R$, as the correctness condition for the merge process. The second condition -handles a necessary property of the underlying search problem being -sampled from. Note that this condition is \emph{stricter} than normal -decomposability for $F$, and essentially requires that the query being -sampled from return a set of records, rather than an aggregate value or -some other result that cannot be meaningfully sampled from. This condition -is satisfied by predicate-filtering style database queries, among others. +addresses the search problem from which results are to be sampled. Not all +search problems admit sampling of this sort--for example, an aggregation +query that returns a single result. This condition essentially requires +that the search problem being sampled from return a set of records, rather +than an aggregate value or some other result that cannot be meaningfully +sampled from. This condition is satisfied by predicate-filtering style +database queries, among others. However, it should be noted that this +condition is \emph{stricter} than normal decomposability. With these definitions in mind, let's turn to solving these query sampling problems. First, we note that many SSIs have a sampling procedure that @@ -120,7 +123,7 @@ down-sampling combination operator. Secondly, this formulation fails to avoid a per-sample dependence on $n$, even in the case where $S(n) \in \Theta(1)$. This gets even worse when considering rejections that may occur as a result of deleted records. Recall from -Section~\ref{ssec:background-deletes} that deletion can be supported +Section~\ref{ssec:dyn-deletes} that deletion can be supported using weak deletes or a shadow structure in a Bentley-Saxe dynamization. Using either approach, it isn't possible to avoid deleted records in advance when sampling, and so these will need to be rejected and retried. @@ -208,9 +211,8 @@ or are naturally determined as part of the pre-processing, and thus the $W(n)$ term can be merged into $P(n)$. \subsection{Supporting Deletes} -\ref{ssec:sampling-deletes} - -As discussed in Section~\ref{ssec:background-deletes}, the Bentley-Saxe +\label{ssec:sampling-deletes} +As discussed in Section~\ref{ssec:dyn-deletes}, the Bentley-Saxe method can support deleting records through the use of either weak deletes, or a secondary ghost structure, assuming certain properties are satisfied by either the search problem or data structure. 
Unfortunately, @@ -222,13 +224,14 @@ we'll discuss our mechanisms for supporting deletes, as well as how these can be handled during sampling while maintaining correctness. Because both deletion policies have their advantages under certain -contexts, we decided to support both. Specifically, we propose two -mechanisms for deletes, which are +contexts, we decided to support both. We require that each record contain +a small header, which is used to store visibility metadata. Given this, +we propose two mechanisms for deletes, \begin{enumerate} \item \textbf{Tagged Deletes.} Each record in the structure includes a -header with a visibility bit set. On delete, the structure is searched -for the record, and the bit is set in indicate that it has been deleted. +visibility bit in its header. On delete, the structure is searched +for the record, and the bit is set to indicate that it has been deleted. This mechanism is used to support \emph{weak deletes}. \item \textbf{Tombstone Deletes.} On delete, a new record is inserted into the structure with a tombstone bit set in the header. This mechanism is @@ -252,8 +255,9 @@ arbitrary number of delete records, and rebuild the entire structure when this threshold is crossed~\cite{saxe79}. Mixing the "ghost" records into the same structures as the original records allows for deleted records to naturally be cleaned up over time as they meet their tombstones during -reconstructions. This is an important consequence that will be discussed -in more detail in Section~\ref{ssec-sampling-delete-bounding}. +reconstructions using a technique called tombstone cancellation. This +technique, and its important consequences related to sampling, will be +discussed in Section~\ref{sssec:sampling-rejection-bound}. There are two relevant aspects of performance that the two mechanisms trade-off between: the cost of performing the delete, and the cost of @@ -368,7 +372,7 @@ This performance cost seems catastrophically bad, considering it must be paid per sample, but there are ways to mitigate it. We will discuss these mitigations in more detail later, during our discussion of the implementation of these results in -Section~\ref{sec:sampling-implementation}. +Section~\ref{ssec:sampling-framework}. \subsubsection{Bounding Rejection Probability} @@ -392,8 +396,7 @@ the Bentley-Saxe method, however. In the theoretical literature on this topic, the solution to this problem is to periodically re-partition all of the records to re-align the block sizes~\cite{merge-dsp, saxe79}. This approach could also be easily applied here, if desired, though we -do not in our implementations, for reasons that will be discussed in -Section~\ref{sec:sampling-implementation}. +do not in our implementations. The process of removing these deleted records during reconstructions is different for the two mechanisms. Tagged deletes are straightforward, @@ -411,16 +414,16 @@ care with ordering semantics, tombstones and their associated records can be sorted into adjacent spots, allowing them to be efficiently dropped during reconstruction without any extra overhead. -While the dropping of deleted records during reconstruction helps, it is -not sufficient on its own to ensure a particular bound on the number of -deleted records within the structure. Pathological scenarios resulting in -unbounded rejection rates, even in the presence of this mitigation, are -possible. 
For example, tagging alone will never trigger reconstructions, -and so it would be possible to delete every single record within the -structure without triggering a reconstruction, or records could be deleted -in the reverse order that they were inserted using tombstones. In either -case, a passive system of dropping records naturally during reconstruction -is not sufficient. +While the dropping of deleted records during reconstruction helps, +it is not sufficient on its own to ensure a particular bound on the +number of deleted records within the structure. Pathological scenarios +resulting in unbounded rejection rates, even in the presence of this +mitigation, are possible. For example, tagging alone will never trigger +reconstructions, and so it would be possible to delete every single +record within the structure without triggering a reconstruction. Or, +when using tombstones, records could be deleted in the reverse order +that they were inserted. In either case, a passive system of dropping +records naturally during reconstruction is not sufficient. Fortunately, this passive system can be used as the basis for a system that does provide a bound. This is because it guarantees, @@ -490,6 +493,7 @@ be taken to obtain a sample set of size $k$. \subsection{Performance Tuning and Configuration} +\label{ssec:sampling-design-space} The final of the desiderata referenced earlier in this chapter for our dynamized sampling indices is having tunable performance. The base @@ -508,7 +512,7 @@ Though it has thus far gone unmentioned, some readers may have noted the astonishing similarity between decomposition-based dynamization techniques, and a data structure called the Log-structured Merge-tree. First proposed by O'Neil in the mid '90s\cite{oneil96}, -the LSM Tree was designed to optimize write throughout for external data +the LSM Tree was designed to optimize write throughput for external data structures. It accomplished this task by buffer inserted records in a small in-memory AVL Tree, and then flushing this buffer to disk when it filled up. The flush process itself would fully rebuild the on-disk @@ -518,22 +522,23 @@ layered, external structures, to reduce the cost of reconstruction. In more recent times, the LSM Tree has seen significant development and been used as the basis for key-value stores like RocksDB~\cite{dong21} -and LevelDB~\cite{leveldb}. This work has produced an incredibly large -and well explored parametrization of the reconstruction procedures of -LSM Trees, a good summary of which can be bound in this recent tutorial -paper~\cite{sarkar23}. Examples of this design space exploration include: -different ways to organize each "level" of the tree~\cite{dayan19, -dostoevsky, autumn}, different growth rates, buffering, sub-partitioning -of structures to allow finer-grained reconstruction~\cite{dayan22}, and -approaches for allocating resources to auxiliary structures attached to -the main ones for accelerating certain types of query~\cite{dayan18-1, -zhu21, monkey}. +and LevelDB~\cite{leveldb}. This work has produced an incredibly +large and well explored parametrization of the reconstruction +procedures of LSM Trees, a good summary of which can be bounded in +this recent tutorial paper~\cite{sarkar23}. 
Examples of this design
+space exploration include: different ways to organize each "level"
+of the tree~\cite{dayan19, dostoevsky, autumn}, different growth
+rates, buffering, sub-partitioning of structures to allow finer-grained
+reconstruction~\cite{dayan22}, and approaches for allocating resources to
+auxiliary structures attached to the main ones for accelerating certain
+types of query~\cite{dayan18-1, zhu21, monkey}. This work is discussed
+in greater depth in Chapter~\ref{chap:related-work}.
Many of the elements within the LSM Tree design space are based upon the
-specifics of the data structure itself, and are not generally applicable.
-However, some of the higher-level concepts can be imported and applied in
-the context of dynamization. Specifically, we have decided to import the
-following four elements for use in our dynamization technique,
+specifics of the data structure itself, and are not applicable to our
+use case. However, some of the higher-level concepts can be imported and
+applied in the context of dynamization. Specifically, we have decided to
+import the following four elements for use in our dynamization technique,
\begin{itemize} \item A small dynamic buffer into which new records are inserted \item A variable growth rate, called a \emph{scale factor} @@ -554,11 +559,11 @@ we are dynamizing may not exist. This introduces some query cost, as queries must be answered from these unsorted records as well, but in the case of sampling this isn't a serious problem. The implications of this will be discussed in Section~\ref{ssec:sampling-cost-funcs}. The
-size of this buffer, $N_B$ is a user-specified constant, and all block
-capacities are multiplied by it. In the Bentley-Saxe method, the $i$th
-block contains $2^i$ records. In our scheme, with buffering, this becomes
-$N_B \cdot 2^i$ records in the $i$th block. We call this unsorted array
-the \emph{mutable buffer}.
+size of this buffer, $N_B$, is a user-specified constant. Block capacities
+are defined in terms of multiples of $N_B$, such that each buffer flush
+corresponds to an insert in the traditional Bentley-Saxe method. Thus,
+rather than the $i$th block containing $2^i$ records, it contains $N_B
+\cdot 2^i$ records. We call this unsorted array the \emph{mutable buffer}.
\Paragraph{Scale Factor.} In the Bentley-Saxe method, each block is twice as large as the block that precedes it. There is, however, no reason @@ -593,19 +598,19 @@ we can build them over tombstones. This approach can greatly improve the sampling performance of the structure when tombstone deletes are used. \Paragraph{Layout Policy.} The Bentley-Saxe method considers blocks
-individually, without any other organization beyond increasing size. In
-contrast, LSM Trees have multiple layers of structural organization. The
-top level structure is a level, upon which record capacity restrictions
-are applied. These levels are then partitioned into individual structures,
-which can be further organized by key range. Because our intention is to
-support general data structures, which may or may not be easily partition
-by a key, we will not consider the finest grain of partitioning. However,
-we can borrow the concept of levels, and lay out shards in these levels
-according to different strategies.
+individually, without any other organization beyond increasing
+size. In contrast, LSM Trees have multiple layers of structural
+organization.
Record capacity restrictions are enforced on structures +called \emph{levels}, which are partitioned into individual data +structures, and then further organized into non-overlapping key ranges. +Because our intention is to support general data structures, which may +or may not be easily partitioned by a key, we will not consider the finest +grain of partitioning. However, we can borrow the concept of levels, +and lay out shards in these levels according to different strategies. Specifically, we consider two layout policies. First, we can allow a single shard per level, a policy called \emph{Leveling}. This approach -is traditionally read optimized, as it generally results in fewer shards +is traditionally read-optimized, as it generally results in fewer shards within the overall structure for a given scale factor. Under leveling, the $i$th level has a capacity of $N_B \cdot s^{i+1}$ records. We can also allow multiple shards per level, resulting in a write-optimized @@ -628,12 +633,10 @@ The requirements that the framework places upon SSIs are rather modest. The sampling problem being considered must be a decomposable sampling problem (Definition \ref{def:decomp-sampling}) and the SSI must support the \texttt{build} and \texttt{unbuild} operations. Optionally, -if the SSI supports point lookups or if the SSI can be constructed -from multiple instances of the SSI more efficiently than its normal -static construction, these two operations can be leveraged by the -framework. However, these are not requirements, as the framework provides -facilities to work around their absence. - +if the SSI supports point lookups or if the SSI is merge decomposable, +then these two operations can be leveraged by the framework. However, +these are not requirements, as the framework provides facilities to work +around their absence. \captionsetup[subfloat]{justification=centering} \begin{figure*} @@ -669,6 +672,7 @@ these delete mechanisms, each record contains an attached header with bits to indicate its tombstone or delete status. \subsection{Supported Operations and Cost Functions} +\label{ssec:sampling-cost-funcs} \Paragraph{Insert.} Inserting a record into the dynamization involves appending it to the mutable buffer, which requires $\Theta(1)$ time. When the buffer reaches its capacity, it must be flushed into the structure |