| author | Douglas Rumbaugh <dbr4@psu.edu> | 2025-04-27 17:36:57 -0400 |
|---|---|---|
| committer | Douglas Rumbaugh <dbr4@psu.edu> | 2025-04-27 17:36:57 -0400 |
| commit | 5e4ad2777acc4c2420514e39fb98b7cf2e200996 (patch) | |
| tree | 276c075048e85426436db8babf0ca1f37e9fdba2 /chapters/sigmod23 | |
| download | dissertation-5e4ad2777acc4c2420514e39fb98b7cf2e200996.tar.gz | |
Initial commit
Diffstat (limited to 'chapters/sigmod23')
| -rw-r--r-- | chapters/sigmod23/abstract.tex | 29 |
| -rw-r--r-- | chapters/sigmod23/background.tex | 182 |
| -rw-r--r-- | chapters/sigmod23/conclusion.tex | 17 |
| -rw-r--r-- | chapters/sigmod23/examples.tex | 143 |
| -rw-r--r-- | chapters/sigmod23/exp-baseline.tex | 98 |
| -rw-r--r-- | chapters/sigmod23/exp-extensions.tex | 40 |
| -rw-r--r-- | chapters/sigmod23/exp-parameter-space.tex | 105 |
| -rw-r--r-- | chapters/sigmod23/experiment.tex | 48 |
| -rw-r--r-- | chapters/sigmod23/extensions.tex | 57 |
| -rw-r--r-- | chapters/sigmod23/framework.tex | 573 |
| -rw-r--r-- | chapters/sigmod23/introduction.tex | 20 |
| -rw-r--r-- | chapters/sigmod23/relatedwork.tex | 33 |
12 files changed, 1345 insertions, 0 deletions
diff --git a/chapters/sigmod23/abstract.tex b/chapters/sigmod23/abstract.tex
new file mode 100644
index 0000000..3ff0c08
--- /dev/null
+++ b/chapters/sigmod23/abstract.tex
@@ -0,0 +1,29 @@
+\begin{abstract}
+
+ The execution of analytical queries on massive datasets presents challenges
+ due to long response times and high computational costs. As a result, the
+ analysis of representative samples of data has emerged as an attractive
+ alternative; this avoids the cost of processing queries against the entire
+ dataset, while still producing statistically valid results. Unfortunately,
+ the sampling techniques in common use sacrifice either sample quality or
+ performance, and so are poorly suited for this task. However, it is
+ possible to build high quality sample sets efficiently with the assistance
+ of indexes. This introduces a new challenge: real-world data is subject to
+ continuous update, and so the indexes must be kept up to date. This is
+ difficult, because existing sampling indexes present a dichotomy; efficient
+ sampling indexes are difficult to update, while easily updatable indexes
+ have poor sampling performance. This chapter seeks to address this gap by
+ proposing a general and practical framework for extending most sampling
+ indexes with efficient update support, based on splitting indexes into
+ smaller shards, combined with a systematic approach to their periodic
+ reconstruction. The framework's design space is examined, with an eye
+ towards exploring trade-offs between update performance, sampling
+ performance, and memory usage. Three existing static sampling indexes are
+ extended using this framework to support updates, and the generalization of
+ the framework to concurrent operations and larger-than-memory data is
+ discussed. Through a comprehensive suite of benchmarks, the extended
+ indexes are shown to match or exceed the update throughput of
+ state-of-the-art dynamic baselines, while presenting significant
+ improvements in sampling latency.
+
+\end{abstract}
diff --git a/chapters/sigmod23/background.tex b/chapters/sigmod23/background.tex new file mode 100644 index 0000000..58324bd --- /dev/null +++ b/chapters/sigmod23/background.tex @@ -0,0 +1,182 @@ +\section{Background} +\label{sec:background} + +This section formalizes the sampling problem and describes relevant existing +solutions. Before discussing these topics, though, a clarification of +definition is in order. The nomenclature used to describe sampling varies +slightly throughout the literature. In this chapter, the term \emph{sample} is +used to indicate a single record selected by a sampling operation, and a +collection of these samples is called a \emph{sample set}; the number of +samples within a sample set is the \emph{sample size}. The term \emph{sampling} +is used to indicate the selection of either a single sample or a sample set; +the specific usage should be clear from context. + + +\Paragraph{Independent Sampling Problem.} When conducting sampling, it is often +desirable for the drawn samples to have \emph{statistical independence}. This +requires that the sampling of a record does not affect the probability of any +other record being sampled in the future. Independence is a requirement for the +application of statistical tools such as the Central Limit +Theorem~\cite{bulmer79}, which is the basis for many concentration bounds. +A failure to maintain independence in sampling invalidates any guarantees +provided by these statistical methods. + +In each of the problems considered, sampling can be performed either with +replacement (WR) or without replacement (WoR). It is possible to answer any WoR +sampling query using a constant number of WR queries, followed by a +deduplication step~\cite{hu15}, and so this chapter focuses exclusively on WR +sampling. + +A basic version of the independent sampling problem is \emph{weighted set +sampling} (WSS),\footnote{ + This nomenclature is adopted from Tao's recent survey of sampling + techniques~\cite{tao22}. This problem is also called + \emph{weighted random sampling} (WRS) in the literature. +} +in which each record is associated with a weight that determines its +probability of being sampled. More formally, WSS is defined +as: +\begin{definition}[Weighted Set Sampling~\cite{walker74}] + Let $D$ be a set of data whose members are associated with positive + weights $w: D \to \mathbb{R}^+$. Given an integer $k \geq 1$, a weighted + set sampling query returns $k$ independent random samples from $D$ with + each data point $d \in D$ having a probability of $\frac{w(d)}{\sum_{p\in + D}w(p)}$ of being sampled. +\end{definition} +Each query returns a sample set of size $k$, rather than a +single sample. Queries returning sample sets are the common case, because the +robustness of analysis relies on having a sufficiently large sample +size~\cite{ben-eliezer20}. The common \emph{simple random sampling} (SRS) +problem is a special case of WSS, where every element has unit weight. + +In the context of databases, it is also common to discuss a more general +version of the sampling problem, called \emph{independent query sampling} +(IQS)~\cite{hu14}. An IQS query samples a specified number of records from the +result set of a database query. In this context, it is insufficient to merely +ensure individual records are sampled independently; the sample sets returned +by repeated IQS queries must be independent as well. This provides a variety of +useful properties, such as fairness and representativeness of query +results~\cite{tao22}. 
As a concrete example, consider simple random sampling on +the result set of a single-dimensional range reporting query. This is +called independent range sampling (IRS), and is formally defined as: + +\begin{definition}[Independent Range Sampling~\cite{tao22}] + Let $D$ be a set of $n$ points in $\mathbb{R}$. Given a query + interval $q = [x, y]$ and an integer $k$, an independent range sampling + query returns $k$ independent samples from $D \cap q$ with each + point having equal probability of being sampled. +\end{definition} +A generalization of IRS exists, called \emph{Weighted Independent Range +Sampling} (WIRS)~\cite{afshani17}, which is similar to WSS. Each point in $D$ +is associated with a positive weight $w: D \to \mathbb{R}^+$, and samples are +drawn from the range query results $D \cap q$ such that each data point has a +probability of $\nicefrac{w(d)}{\sum_{p \in D \cap q}w(p)}$ of being sampled. + + +\Paragraph{Existing Solutions.} While many sampling techniques exist, +few are supported in practical database systems. The existing +\texttt{TABLESAMPLE} operator provided by SQL in all major DBMS +implementations~\cite{postgres-doc} requires either a linear scan (e.g., +Bernoulli sampling) that results in high sample retrieval costs, or relaxed +statistical guarantees (e.g., block sampling~\cite{postgres-doc} used in +PostgreSQL). + +Index-assisted sampling solutions have been studied +extensively. Olken's method~\cite{olken89} is a classical solution to +independent sampling problems. This algorithm operates upon traditional search +trees, such as the B+tree used commonly as a database index. It conducts a +random walk on the tree uniformly from the root to a leaf, resulting in a +$O(\log n)$ sampling cost for each returned record. Should weighted samples be +desired, rejection sampling can be performed. A sampled record, $r$, is +accepted with probability $\nicefrac{w(r)}{w_{max}}$, with an expected +number of $\nicefrac{w_{max}}{w_{avg}}$ samples to be taken per element in the +sample set. Olken's method can also be extended to support general IQS by +rejecting all sampled records failing to satisfy the query predicate. It can be +accelerated by adding aggregated weight tags to internal +nodes~\cite{olken-thesis,zhao22}, allowing rejection sampling to be performed +during the tree-traversal to abort dead-end traversals early. + +\begin{figure} + \centering + \includegraphics[width=.5\textwidth]{img/sigmod23/alias.pdf} + \caption{\textbf{A pictorial representation of an alias + structure}, built over a set of weighted records. Sampling is performed by + first (1) selecting a cell by uniformly generating an integer index on + $[0,n)$, and then (2) selecting an item by generating a + second uniform float on $[0,1]$ and comparing it to the cell's normalized + cutoff values. In this example, the first random number is $0$, + corresponding to the first cell, and the second is $.7$. This is larger + than $\nicefrac{.15}{.25}$, and so $3$ is selected as the result of the + query. 
+ This allows $O(1)$ independent weighted set sampling, but adding a new
+ element requires a weight adjustment to every element in the structure, and
+ so is not generally possible without performing a full reconstruction.}
+ \label{fig:alias}
+
+\end{figure}
+
+There also exist static data structures, referred to in this chapter as static
+sampling indexes (SSIs)\footnote{
+The name SSI was established in the published version of this paper prior to the
+realization that a distinction between the terms index and data structure would
+be useful. We'll continue to use the term SSI for the remainder of this chapter,
+to maintain consistency with the published work, but technically an SSI refers to
+ a data structure, not an index, in the nomenclature established in the previous
+ chapter.
+ }, that are capable of answering sampling queries in
+near-constant time\footnote{
+ The designation
+``near-constant'' is \emph{not} used in the technical sense of being constant
+to within a polylogarithmic factor (i.e., $\tilde{O}(1)$). It is instead used to mean
+constant to within an additive polylogarithmic term, i.e., $f(x) \in O(\log n +
+1)$.
+%For example, drawing $k$ samples from $n$ records using a near-constant
+%approach would require $O(\log n + k)$ time. This is in contrast to a
+%tree-traversal approach, which would require $O(k\log n)$ time.
+} relative to the size of the dataset. An example of such a
+structure is used in Walker's alias method \cite{walker74,vose91}, a technique
+for answering WSS queries with $O(1)$ query cost per sample, but requiring
+$O(n)$ time to construct. It distributes the weight of items across $n$ cells,
+where each cell is partitioned between at most two items, such that the total
+proportion of the cells assigned to each item reflects its total weight. A query
+selects one cell uniformly at random, then chooses one of the two items in the
+cell by weight; thus, it selects items with probability proportional to their
+weight in $O(1)$ time. A pictorial representation of this structure is shown in
+Figure~\ref{fig:alias}.
+
+The alias method can also be used as the basis for creating SSIs capable of
+answering general IQS queries using a technique called alias
+augmentation~\cite{tao22}. As a concrete example, previous
+papers~\cite{afshani17,tao22} have proposed solutions for WIRS queries using $O(\log n
++ k)$ time, where the $\log n$ cost is paid only once per query, after which
+elements can be sampled in constant time. This structure is built by breaking
+the data up into disjoint chunks of size $\nicefrac{n}{\log n}$, called
+\emph{fat points}, each with an alias structure. A B+tree is then constructed,
+using the fat points as its leaf nodes. The internal nodes are augmented with
+an alias structure over the total weight of each child. This alias structure
+is used instead of rejection sampling to determine the traversal path to take
+through the tree, and then the alias structure of the fat point is used to
+sample a record. Because rejection sampling is not used during the traversal,
+two traversals suffice to establish the valid range of records for sampling,
+after which samples can be collected without requiring per-sample traversals.
+More examples of alias augmentation applied to different IQS problems can be
+found in a recent survey by Tao~\cite{tao22}.
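To make the constant-time query concrete, the following is a minimal C++ sketch of an alias structure using Vose's construction; the class name and interface are illustrative rather than taken from the chapter's implementation, and numerical edge cases are handled only crudely. Construction distributes the scaled weights across $n$ cells in $O(n)$ time, and each query consumes exactly one uniform integer and one uniform real, matching the two-step procedure in Figure~\ref{fig:alias}.

\begin{verbatim}
#include <cstddef>
#include <random>
#include <vector>

// Illustrative sketch of Walker/Vose alias sampling: O(n) build, O(1) per sample.
class AliasStructure {
public:
    explicit AliasStructure(const std::vector<double>& weights)
        : prob_(weights.size()), alias_(weights.size()) {
        const size_t n = weights.size();
        double total = 0;
        for (double w : weights) total += w;

        // Scale each weight so that the average cell weight is 1.
        std::vector<double> scaled(n);
        std::vector<size_t> small, large;
        for (size_t i = 0; i < n; i++) {
            scaled[i] = weights[i] * n / total;
            (scaled[i] < 1.0 ? small : large).push_back(i);
        }

        // Pair each under-full cell with an over-full item to fill it exactly.
        while (!small.empty() && !large.empty()) {
            size_t s = small.back(); small.pop_back();
            size_t l = large.back(); large.pop_back();
            prob_[s] = scaled[s];
            alias_[s] = l;
            scaled[l] -= (1.0 - scaled[s]);
            (scaled[l] < 1.0 ? small : large).push_back(l);
        }
        for (size_t i : large) prob_[i] = 1.0;
        for (size_t i : small) prob_[i] = 1.0;  // numerical leftovers

    }

    // Returns the index of the sampled item.
    template <typename RNG>
    size_t sample(RNG& rng) const {
        std::uniform_int_distribution<size_t> cell(0, prob_.size() - 1);
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        size_t c = cell(rng);
        return coin(rng) < prob_[c] ? c : alias_[c];
    }

private:
    std::vector<double> prob_;   // probability of keeping the cell's own item
    std::vector<size_t> alias_;  // the cell's second ("alias") item
};
\end{verbatim}

Because the cell boundaries depend on every weight, inserting or deleting a single record invalidates the entire table; this is the update limitation that the remainder of this chapter works around.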
+
+There do exist specialized sampling indexes~\cite{hu14} with both efficient
+sampling and support for updates, but these are restricted to specific query
+types and are often very complex structures, with poor constant factors
+associated with sampling and update costs, and so are of limited practical
+utility. There has also been work~\cite{hagerup93,matias03,allendorf23} on
+extending the alias structure to support weight updates over a fixed set of
+elements. However, these solutions do not allow insertion or deletion in the
+underlying dataset, and so are not well suited to database sampling
+applications.
+
+\Paragraph{The Dichotomy.} Among these techniques, there exists a
+clear trade-off between efficient sampling and support for updates. Tree-traversal
+based sampling solutions pay a per-sample cost that grows with the dataset size, in exchange for
+update support. The static solutions lack support for updates, but support
+near-constant time sampling. While some data structures exist with support for
+both, these are restricted to highly specialized query types. Thus, in the
+general case there exists a dichotomy: existing sampling indexes can support
+either data updates or efficient sampling, but not both.
diff --git a/chapters/sigmod23/conclusion.tex b/chapters/sigmod23/conclusion.tex
new file mode 100644
index 0000000..de6bffc
--- /dev/null
+++ b/chapters/sigmod23/conclusion.tex
@@ -0,0 +1,17 @@
+\section{Conclusion}
+\label{sec:conclusion}
+
+This chapter discussed the creation of a framework for the dynamic extension of
+static indexes designed for various sampling problems. Specifically, extensions
+were created for the alias structure (WSS), the in-memory ISAM tree (IRS), and
+the alias-augmented B+tree (WIRS). In each case, the SSIs were extended
+successfully with support for updates and deletes, without compromising their
+sampling performance advantage relative to existing dynamic baselines. This was
+accomplished by leveraging ideas borrowed from the Bentley-Saxe method and the
+design space of the LSM tree to divide the static index into multiple shards,
+which could be individually reconstructed in a systematic fashion to
+accommodate new data. This framework provides a large design space for trading
+between update performance, sampling performance, and memory usage, which was
+explored experimentally. The resulting extended indexes were shown to approach
+or match the insertion performance of the B+tree, while simultaneously
+performing sampling operations significantly faster in most situations.
diff --git a/chapters/sigmod23/examples.tex b/chapters/sigmod23/examples.tex
new file mode 100644
index 0000000..cdbc398
--- /dev/null
+++ b/chapters/sigmod23/examples.tex
@@ -0,0 +1,143 @@
+\section{Framework Instantiations}
+\label{sec:instance}
+In this section, the framework is applied to three sampling problems and their
+associated SSIs. All three sampling problems draw random samples from records
+satisfying a simple predicate, and so result sets for all three can be
+constructed by directly merging the result sets of the queries executed against
+individual shards, the primary requirement for the application of the
+framework. The SSIs used for each problem are discussed, including their
+support of the remaining two optional requirements for framework application.
+
+\subsection{Dynamically Extended WSS Structure}
+\label{ssec:wss-struct}
+As a first example of applying this framework for dynamic extension,
+the alias structure for answering WSS queries is considered. This is a
+static structure that can be constructed in $O(n)$ time and supports WSS
+queries in $O(1)$ time. The alias structure will be used as the SSI, with
+the shards containing an alias structure paired with a sorted array of
+records. { The use of sorted arrays for storing the records
+allows for more efficient point-lookups, without requiring any additional
+space. The total weight associated with a query for
+a given alias structure is the total weight of all of its records,
+and can be tracked at the shard level and retrieved in constant time. }
+
+Using the formulae from Section~\ref{sec:framework}, the worst-case
+costs of insertion, sampling, and deletion are easily derived. The
+initial construction cost from the buffer is $C_c(N_b) \in O(N_b
+\log N_b)$, requiring the sorting of the buffer followed by alias
+construction. After this point, the shards can be reconstructed in
+linear time while maintaining sorted order. Thus, the reconstruction
+cost is $C_r(n) \in O(n)$. As each shard contains a sorted array,
+the point-lookup cost is $L(n) \in O(\log n)$. The total weight can
+be tracked with the shard, requiring $W(n) \in O(1)$ time to access,
+and there is no necessary preprocessing, so $P(n) \in O(1)$. Samples
+can be drawn in $S(n) \in O(1)$ time. Plugging these results into the
+formulae for insertion, sampling, and deletion costs gives,
+
+\begin{align*}
+    \text{Insertion:} \quad &O\left(\log_s n\right) \\
+    \text{Sampling:} \quad &O\left(\log_s n + \frac{k}{1 - \delta}\cdot R(n)\right) \\
+    \text{Tagged Delete:} \quad &O\left(\log_s n \log n\right)
+\end{align*}
+where $R(n) \in O(1)$ for tagging and $R(n) \in O(\log_s n \log n)$ for
+tombstones.
+
+\Paragraph{Bounding Rejection Rate.} In the weighted sampling case,
+the framework's generic record-based compaction trigger mechanism
+is insufficient to bound the rejection rate. This is because the
+probability of a given record being sampled is dependent upon its
+weight, as well as the number of records in the index. If a highly
+weighted record is deleted, it will be preferentially sampled, resulting
+in a larger number of rejections than would be expected based on record
+counts alone. This problem can be rectified using the framework's user-specified
+compaction trigger mechanism.
+In addition to
+tracking record counts, each level also tracks its rejection rate,
+\begin{equation*}
+\rho_i = \frac{\text{rejections}}{\text{sampling attempts}}
+\end{equation*}
+A configurable rejection rate cap, $\rho$, is then defined. If $\rho_i
+> \rho$ on a level, a compaction is triggered. In the case of
+the tombstone delete policy, it is not the level containing the sampled
+record, but rather the level containing its tombstone, that is considered
+the source of the rejection. This is necessary to ensure that the tombstone
+is moved closer to canceling its associated record by the compaction.
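As an illustration of how such a trigger might be wired in, the following C++ sketch tracks per-level rejection statistics and reports the shallowest level whose rate exceeds $\rho$. The type and method names are hypothetical and do not correspond to the published implementation; consistent with the text, a rejection caused by a tombstone is charged to the level holding the tombstone rather than the level holding the record.

\begin{verbatim}
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative per-level statistics for the rejection-rate compaction trigger.
struct LevelStats {
    uint64_t sampling_attempts = 0;
    uint64_t rejections = 0;

    double rejection_rate() const {
        return sampling_attempts == 0
                   ? 0.0
                   : static_cast<double>(rejections) / sampling_attempts;
    }
};

class CompactionTrigger {
public:
    CompactionTrigger(size_t levels, double max_rejection_rate)
        : stats_(levels), rho_(max_rejection_rate) {}

    void record_attempt(size_t level) { stats_[level].sampling_attempts++; }

    // A tombstone-caused rejection passes the level of the tombstone here.
    void record_rejection(size_t source_level) { stats_[source_level].rejections++; }

    // Returns true and the shallowest offending level if any rate exceeds rho.
    bool needs_compaction(size_t& level_out) const {
        for (size_t i = 0; i < stats_.size(); i++) {
            if (stats_[i].rejection_rate() > rho_) {
                level_out = i;
                return true;
            }
        }
        return false;
    }

private:
    std::vector<LevelStats> stats_;
    double rho_;  // configurable rejection-rate cap
};
\end{verbatim}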
+
+\subsection{Dynamically Extended IRS Structure}
+\label{ssec:irs-struct}
+Another sampling problem to which the framework can be applied is
+independent range sampling (IRS). The SSI in this example is the in-memory
+ISAM tree. The ISAM tree supports efficient point-lookups
+directly, and the total weight of an IRS query can be
+easily obtained by counting the number of records within the query range,
+which is determined as part of the preprocessing of the query.
+
+The static nature of shards in the framework allows for an ISAM tree
+to be constructed with adjacent nodes positioned contiguously in memory.
+By selecting a leaf node size that is a multiple of the record size, and
+avoiding placing any headers within leaf nodes, the set of leaf nodes can
+be treated as a sorted array of records with direct indexing, and the
+internal nodes allow for faster searching of this array.
+Because of this layout, per-sample tree-traversals are avoided. The
+start and end of the range from which to sample can be determined using
+a pair of traversals, and then records can be sampled from this range
+using random number generation and array indexing.
+
+Assuming a sorted set of input records, the ISAM tree can be bulk-loaded
+in linear time. The insertion analysis proceeds like the WSS example
+previously discussed. The initial construction cost is $C_c(N_b) \in
+O(N_b \log N_b)$ and reconstruction cost is $C_r(n) \in O(n)$. The ISAM
+tree supports point-lookups in $L(n) \in O(\log_f n)$ time, where $f$
+is the fanout of the tree.
+
+The process for performing range sampling against the ISAM tree involves
+two stages. First, the tree is traversed twice: once to establish the index of
+the first record greater than or equal to the lower bound of the query,
+and again to find the index of the last record less than or equal to the
+upper bound of the query. This process has the effect of providing the
+number of records within the query range, and can be used to determine
+the weight of the shard in the shard alias structure. Its cost is $P(n)
+\in O(\log_f n)$. Once the bounds are established, samples can be drawn
+by randomly generating uniform integers between the upper and lower bound,
+in $S(n) \in O(1)$ time each.
+
+This results in the extended version of the ISAM tree having the following
+insert, sampling, and delete costs,
+\begin{align*}
+    \text{Insertion:} \quad &O\left(\log_s n\right) \\
+    \text{Sampling:} \quad &O\left(\log_s n \log_f n + \frac{k}{1 - \delta}\cdot R(n)\right) \\
+    \text{Tagged Delete:} \quad &O\left(\log_s n \log_f n\right)
+\end{align*}
+where $R(n) \in O(1)$ for tagging and $R(n) \in O(\log_s n \log_f n)$ for
+tombstones.
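The two-stage IRS query described above can be sketched as follows in C++, under the assumption that a shard's leaf nodes form one contiguous, sorted array of plain key-ordered records; the two std::lower_bound calls stand in for the pair of ISAM tree traversals, and the Record layout is illustrative only.

\begin{verbatim}
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

struct Record {
    uint64_t key;
    uint32_t value;
};

// Illustrative IRS query against a shard whose leaf level is a contiguous,
// sorted array. The two searches stand in for the pair of tree traversals;
// each sample afterwards is a single random array index.
std::vector<Record> sample_range(const std::vector<Record>& leaves,
                                 uint64_t lo, uint64_t hi, size_t k,
                                 std::mt19937_64& rng) {
    auto cmp = [](const Record& r, uint64_t key) { return r.key < key; };
    size_t start = std::lower_bound(leaves.begin(), leaves.end(), lo, cmp) - leaves.begin();
    size_t stop  = std::lower_bound(leaves.begin(), leaves.end(), hi + 1, cmp) - leaves.begin();
    // (Assumes hi is below the maximum key value, so hi + 1 does not overflow.)

    std::vector<Record> out;
    if (start >= stop) return out;  // empty query range

    std::uniform_int_distribution<size_t> idx(start, stop - 1);
    out.reserve(k);
    for (size_t i = 0; i < k; i++) {
        out.push_back(leaves[idx(rng)]);
    }
    return out;
}
\end{verbatim}

Note that the quantity stop - start is exactly the weight this shard would contribute to the shard alias structure of Section~\ref{sec:framework}.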
+
+\subsection{Dynamically Extended WIRS Structure}
+\label{ssec:wirs-struct}
+As a final example of applying this framework, the WIRS problem will be
+considered. Specifically, the alias-augmented B+tree approach, described
+by Tao \cite{tao22}, generalizing work by Afshani and Wei \cite{afshani17},
+and Hu et al. \cite{hu14}, will be extended.
+This structure allows for efficient point-lookups, as
+it is based on the B+tree, and the total weight of a given WIRS query can
+be calculated given the query range using aggregate weight tags within
+the tree.
+
+The alias-augmented B+tree is a static structure of linear space, capable
+of being built initially in $C_c(N_b) \in O(N_b \log N_b)$ time, being
+bulk-loaded from sorted lists of records in $C_r(n) \in O(n)$ time,
+and answering WIRS queries in $O(\log_f n + k)$ time, where the query
+cost consists of preliminary work to identify the sampling range
+and calculate the total weight, with $P(n) \in O(\log_f n)$ cost, and
+constant-time drawing of samples from that range with $S(n) \in O(1)$.
+This results in the following costs,
+\begin{align*}
+    \text{Insertion:} \quad &O\left(\log_s n\right) \\
+    \text{Sampling:} \quad &O\left(\log_s n \log_f n + \frac{k}{1 - \delta} \cdot R(n)\right) \\
+    \text{Tagged Delete:} \quad &O\left(\log_s n \log_f n\right)
+\end{align*}
+where $R(n) \in O(1)$ for tagging and $R(n) \in O(\log_s n \log_f n)$ for
+tombstones. Because this is a weighted sampling structure, the custom
+compaction trigger discussed in Section~\ref{ssec:wss-struct} is applied
+to maintain bounded rejection rates during sampling.
+
diff --git a/chapters/sigmod23/exp-baseline.tex b/chapters/sigmod23/exp-baseline.tex
new file mode 100644
index 0000000..9e7929c
--- /dev/null
+++ b/chapters/sigmod23/exp-baseline.tex
@@ -0,0 +1,98 @@
+\subsection{Comparison to Baselines}
+
+Next, the performance of indexes extended using the framework is compared
+against tree sampling on the aggregate B+tree, as well as problem-specific
+SSIs for WSS, WIRS, and IRS queries. Unless otherwise specified, IRS and WIRS
+queries were executed with a selectivity of $0.1\%$ and 500 million randomly
+selected records from the OSM dataset were used. The uniform and zipfian
+synthetic datasets were 1 billion records in size. All benchmarks warmed up the
+data structure by inserting 10\% of the records, and then measured the
+throughput of inserting the remaining records, while deleting 5\% of them over the
+course of the benchmark. Once all records were inserted, the sampling
+performance was measured. The reported update throughputs were calculated using
+both inserts and deletes, following the warmup period.
+
+\begin{figure*}
+    \centering
+    \subfloat[Insertion Throughput vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-wss-insert} \label{fig:wss-insert}}
+    \subfloat[Sampling Latency vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-wss-sample} \label{fig:wss-sample}} \\
+    \subfloat[Insertion Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-wss-insert} \label{fig:wss-insert-s}}
+    \subfloat[Sampling Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-wss-sample} \label{fig:wss-sample-s}}
+    \caption{Framework Comparisons to Baselines for WSS}
+\end{figure*}
+
+Starting with WSS, Figure~\ref{fig:wss-insert} shows that the DE-WSS structure
+is competitive with the AGG B+tree in terms of insertion performance, achieving
+about 85\% of the AGG B+tree's insertion throughput on the Twitter dataset, and
+beating it by similar margins on the other datasets. In terms of sampling
+performance in Figure~\ref{fig:wss-sample}, it beats the B+tree handily, and
+compares favorably to the static alias structure.
Figures~\ref{fig:wss-insert-s} +and \ref{fig:wss-sample-s} show the performance scaling of the three structures as +the dataset size increases. All of the structures exhibit the same type of +performance degradation with respect to dataset size. + +\begin{figure*} + \centering + \subfloat[Insertion Throughput vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-wirs-insert} \label{fig:wirs-insert}} + \subfloat[Sampling Latency vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-wirs-sample} \label{fig:wirs-sample}} + \caption{Framework Comparison to Baselines for WIRS} +\end{figure*} + +Figures~\ref{fig:wirs-insert} and \ref{fig:wirs-sample} show the performance of +the DE-WIRS index, relative to the AGG B+tree and the alias-augmented B+tree. This +example shows the same pattern of behavior as was seen with DE-WSS, though the +margin between the DE-WIRS and its corresponding SSI is much narrower. +Additionally, the constant factors associated with the construction cost of the +alias-augmented B+tree are much larger than the alias structure. The loss of +insertion performance due to this is seen clearly in Figure~\ref{fig:wirs-insert}, where +the margin of advantage between DE-WIRS and the AGG B+tree in insertion +throughput shrinks compared to the DE-WSS index, and the AGG B+tree's advantage +on the Twitter dataset is expanded. + +\begin{figure*} + \subfloat[Insertion Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-irs-insert} \label{fig:irs-insert-s}} + \subfloat[Sampling Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-irs-sample} \label{fig:irs-sample-s}} \\ + + \subfloat[Insertion Throughput vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-irs-insert} \label{fig:irs-insert1}} + \subfloat[Sampling Latency vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-irs-sample} \label{fig:irs-sample1}} \\ + + \subfloat[Delete Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-irs-delete} \label{fig:irs-delete}} + \subfloat[Sampling Latency vs. Sample Size]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-irs-samplesize} \label{fig:irs-samplesize}} + \caption{Framework Comparison to Baselines for IRS} + +\end{figure*} +Finally, Figures~\ref{fig:irs-insert1} and \ref{fig:irs-sample1} show a +comparison of the in-memory DE-IRS index against the in-memory ISAM tree and the AGG +B+tree for answering IRS queries. The cost of bulk-loading the ISAM tree is less +than the cost of building the alias structure, or the alias-augmented B+tree, and +so here DE-IRS defeats the AGG B+tree by wider margins in insertion throughput, +though the margin narrows significantly in terms of sampling performance +advantage. + +DE-IRS was further tested to evaluate scalability. +Figure~\ref{fig:irs-insert-s} shows average insertion throughput, +Figure~\ref{fig:irs-delete} shows average delete latency (under tagging), and +Figure~\ref{fig:irs-sample-s} shows average sampling latencies for DE-IRS and +AGG B+tree over a range of data sizes. In all cases, DE-IRS and B+tree show +similar patterns of performance degradation as the datasize grows. Note that +the delete latencies of DE-IRS are worse than AGG B+tree, because of the B+tree's +cheaper point-lookups. 
+ +Figure~\ref{fig:irs-sample-s} +also includes one other point of interest: the sampling performance of +DE-IRS \emph{improves} when the data size grows from one million to ten million +records. While at first glance the performance increase may appear paradoxical, +it actually demonstrates an important result concerning the effect of the +unsorted mutable buffer on index performance. At one million records, the +buffer constitutes approximately 1\% of the total data size; this results in +the buffer being sampled from with greater frequency (as it has more total +weight) than would be the case with larger data. The greater the frequency of +buffer sampling, the more rejections will occur, and the worse the sampling +performance will be. This illustrates the importance of keeping the buffer +small, even when a scan is not used for buffer sampling. Finally, +Figure~\ref{fig:irs-samplesize} shows the decreasing per-sample cost as the +number of records requested by a sampling query grows for DE-IRS, compared to +AGG B+tree. Note that DE-IRS benefits significantly more from batching samples +than AGG B+tree, and that the improvement is greatest up to $k=100$ samples per +query. + diff --git a/chapters/sigmod23/exp-extensions.tex b/chapters/sigmod23/exp-extensions.tex new file mode 100644 index 0000000..d929e92 --- /dev/null +++ b/chapters/sigmod23/exp-extensions.tex @@ -0,0 +1,40 @@ +\subsection{External and Concurrent Extensions} + +\begin{figure*}[h]% + \centering + \subfloat[External Insertion Throughput]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-ext-insert.pdf} \label{fig:ext-insert}} + \subfloat[External Sampling Latency]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-ext-sample.pdf} \label{fig:ext-sample}} \\ + + \subfloat[Concurrent Insert Latency vs. Throughput]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-cc-irs-scale} \label{fig:con-latency}} + \subfloat[Concurrent Insert Throughput vs. Thread Count]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-cc-irs-thread} \label{fig:con-tput}} + + \caption{External and Concurrent Extensions of DE-IRS} + \label{fig:irs-extensions} +\end{figure*} + +Proof of concept implementations of external and concurrent extensions were +also tested for IRS queries. Figures \ref{fig:ext-sample} and +\ref{fig:ext-insert} show the performance of the external DE-IRS sampling index +against AB-tree. DE-IRS was configured with 4 in-memory levels, using at most +350 MiB of memory in testing, including bloom filters. { +For DE-IRS, the \texttt{O\_DIRECT} flag was used to disable OS caching, and +CGroups were used to limit process memory to 1 GiB to simulate a memory +constrained environment. The AB-tree implementation tested +had a cache, which was configured with a memory budget of 64 GiB. This extra +memory was provided to be fair to AB-tree. Because it uses per-sample +tree-traversals, it is much more reliant on caching for good performance. DE-IRS was +tested without a caching layer.} The tests were performed with 4 billion (80 GiB) +{and 8 billion (162 GiB) uniform and zipfian +records}, and 2.6 billion (55 GiB) OSM records. DE-IRS outperformed the AB-tree +by over an order of magnitude in both insertion and sampling performance. + +Finally, Figures~\ref{fig:con-latency} and \ref{fig:con-tput} show the +multi-threaded insertion performance of the in-memory DE-IRS index with +concurrency support, compared to AB-tree running entirely in memory, using the +synthetic uniform dataset. 
Note that in Figure~\ref{fig:con-latency}, some of
+the AB-tree results are cut off, because they have significantly lower throughput
+and higher latency than DE-IRS. Even without concurrent
+merging, the framework shows linear scaling up to 4 threads of insertion,
+before leveling off; throughput remains flat even up to 32 concurrent
+insertion threads. An implementation with support for concurrent merging would
+likely scale even better.
diff --git a/chapters/sigmod23/exp-parameter-space.tex b/chapters/sigmod23/exp-parameter-space.tex
new file mode 100644
index 0000000..d2057ac
--- /dev/null
+++ b/chapters/sigmod23/exp-parameter-space.tex
@@ -0,0 +1,105 @@
+\subsection{Framework Design Space Exploration}
+\label{ssec:ds-exp}
+
+The proposed framework brings with it a large design space, described in
+Section~\ref{ssec:design-space}. First, this design space will be examined
+using a standardized benchmark to measure the average insertion throughput and
+sampling latency of DE-WSS at several points within this space. Tests were run
+using a random selection of 500 million records from the OSM dataset, with the
+index warmed up by the insertion of 10\% of the total records prior to
+beginning any measurement. Over the course of the insertion period, 5\% of the
+records were deleted, except for the tests in
+Figures~\ref{fig:insert_delete_prop}, \ref{fig:sample_delete_prop}, and
+\ref{fig:bloom}, in which 25\% of the records were deleted. Reported update
+throughputs were calculated using both inserts and deletes, following the
+warmup period. The standard values
+used for parameters not being varied in a given test were $s = 6$, $N_b =
+12000$, $k=1000$, and $\delta = 0.05$, with buffer rejection sampling.
+
+\begin{figure*}
+    \centering
+    \subfloat[Insertion Throughput vs. Mutable Buffer Capacity]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-mt-insert} \label{fig:insert_mt}}
+    \subfloat[Insertion Throughput vs. Scale Factor]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-sf-insert} \label{fig:insert_sf}} \\
+    \subfloat[Insertion Throughput vs.\\Max Delete Proportion]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-tp-insert} \label{fig:insert_delete_prop}}
+    \subfloat[Per 1000 Sampling Latency vs.\\Mutable Buffer Capacity]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-mt-sample} \label{fig:sample_mt}} \\
+    \caption{DE-WSS Design Space Exploration I}
+    \label{fig:parameter-sweeps1}
+\end{figure*}
+
+The results of this testing are displayed in
+Figures~\ref{fig:parameter-sweeps1},~\ref{fig:parameter-sweeps2},~and~\ref{fig:parameter-sweeps3}.
+The two largest contributors to differences in performance were the selection
+of layout policy and of delete policy. Figures~\ref{fig:insert_mt} and
+\ref{fig:insert_sf} show that the choice of layout policy plays a larger role
+than delete policy in insertion performance, with tiering outperforming
+leveling in both configurations. The situation is reversed in sampling
+performance, seen in Figures~\ref{fig:sample_mt} and \ref{fig:sample_sf}, where
+the performance difference between layout policies is far less than between
+delete policies.
+
+The values used for the scale factor and buffer size have less influence than
+layout and delete policy. Sampling performance is largely independent of them
+over the ranges of values tested, as shown in Figures~\ref{fig:sample_mt} and
+\ref{fig:sample_sf}.
This is not surprising, as these parameters adjust the
+number of shards, which only contributes to shard alias construction time
+during sampling and is amortized over all samples taken in a query. The
+buffer also contributes rejections, but the cost of a rejection is small and
+the buffer constitutes only a small portion of the total weight, so these are
+negligible. However, under tombstones there is an upward trend in latency with
+buffer size, as delete checks occasionally require a full buffer scan. The
+effect of buffer size on insertion is shown in Figure~\ref{fig:insert_mt}.
+{ There is only a small improvement in insertion performance as the mutable
+buffer grows. This is because a larger buffer results in fewer reconstructions,
+but these reconstructions individually take longer, and so the net positive
+effect is less than might be expected.} Finally, Figure~\ref{fig:insert_sf}
+shows the effect of scale factor on insertion performance. As expected, tiering
+performs better with higher scale factors, whereas the insertion performance of
+leveling trails off as the scale factor is increased, due to write
+amplification.
+
+\begin{figure*}
+    \centering
+    \subfloat[Per 1000 Sampling Latency vs. Scale Factor]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-sf-sample} \label{fig:sample_sf}}
+    \subfloat[Per 1000 Sampling Latency vs. Max Delete Proportion]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-tp-sample}\label{fig:sample_delete_prop}} \\
+    \caption{DE-WSS Design Space Exploration II}
+    \label{fig:parameter-sweeps2}
+\end{figure*}
+
+Figures~\ref{fig:insert_delete_prop} and \ref{fig:sample_delete_prop} show the
+cost of maintaining $\delta$ with a base delete rate of 25\%. The low cost of
+an in-memory sampling rejection results in only a slight upward trend in the
+sampling latency as the number of deleted records increases. While compaction
+is necessary to avoid pathological cases, there does not seem to be a
+significant benefit to aggressive compaction thresholds.
+Figure~\ref{fig:insert_delete_prop} shows the effect of compactions on insert
+performance. There is little effect on performance under tagging, but there is
+a clear negative performance trend associated with aggressive compaction when
+using tombstones. Under tagging, a single compaction is guaranteed to remove
+all deleted records on a level, whereas with tombstones a compaction can
+cascade for multiple levels before the delete bound is satisfied, resulting in
+a larger cost per incident.
+
+\begin{figure*}
+    \centering
+    \subfloat[Sampling Latency vs. Sample Size]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-samplesize} \label{fig:sample_k}}
+    \subfloat[Per 1000 Sampling Latency vs. Bloom Filter Memory]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-bloom}\label{fig:bloom}} \\
+    \caption{DE-WSS Design Space Exploration III}
+    \label{fig:parameter-sweeps3}
+\end{figure*}
+
+Figure~\ref{fig:bloom} demonstrates the trade-off between memory usage for
+Bloom filters and sampling performance under tombstones. This test was run
+using 25\% incoming deletes with no compaction, to maximize the number of
+tombstones within the index as a worst-case scenario. As expected, allocating
+more memory to Bloom filters, decreasing their false positive rates,
+accelerates sampling. Finally, Figure~\ref{fig:sample_k} shows the relationship
+between average per sample latency and the sample set size.
It shows the effect +of amortizing the initial shard alias setup work across an increasing number of +samples, with $k=100$ as the point at which latency levels off. + +Based upon these results, a set of parameters was established for the extended +indexes, which is used in the next section for baseline comparisons. This +standard configuration uses tagging as the delete policy and tiering as the +layout policy, with $k=1000$, $N_b = 12000$, $\delta = 0.05$, and $s = 6$. diff --git a/chapters/sigmod23/experiment.tex b/chapters/sigmod23/experiment.tex new file mode 100644 index 0000000..75cf32e --- /dev/null +++ b/chapters/sigmod23/experiment.tex @@ -0,0 +1,48 @@ +\section{Evaluation} +\label{sec:experiment} + +\Paragraph{Experimental Setup.} All experiments were run under Ubuntu 20.04 LTS +on a dual-socket Intel Xeon Gold 6242R server with 384 GiB of physical memory +and 40 physical cores. External tests were run using a 4 TB WD Red SA500 SATA +SSD, rated for 95000 and 82000 IOPS for random reads and writes respectively. + +\Paragraph{Datasets.} Testing utilized a variety of synthetic and real-world +datasets. For all datasets used, the key was represented as a 64-bit integer, +the weight as a 64-bit integer, and the value as a 32-bit integer. Each record +also contained a 32-bit header. The weight was omitted from IRS testing. +Keys and weights were pulled from the dataset directly, and values were +generated separately and were unique for each record. The following datasets +were used, +\begin{itemize} +\item \textbf{Synthetic Uniform.} A non-weighted, synthetically generated list + of keys drawn from a uniform distribution. +\item \textbf{Synthetic Zipfian.} A non-weighted, synthetically generated list + of keys drawn from a Zipfian distribution with + a skew of $0.8$. +\item \textbf{Twitter~\cite{data-twitter,data-twitter1}.} $41$ million Twitter user ids, weighted by follower counts. +\item \textbf{Delicious~\cite{data-delicious}.} $33.7$ million URLs, represented using unique integers, + weighted by the number of associated tags. +\item \textbf{OSM~\cite{data-osm}.} $2.6$ billion geospatial coordinates for points + of interest, collected by OpenStreetMap. The latitude, converted + to a 64-bit integer, was used as the key and the number of + its associated semantic tags as the weight. +\end{itemize} +The synthetic datasets were not used for weighted experiments, as they do not +have weights. For unweighted experiments, the Twitter and Delicious datasets +were not used, as they have uninteresting key distributions. + +\Paragraph{Compared Methods.} In this section, indexes extended using the +framework are compared against existing dynamic baselines. Specifically, DE-WSS +(Section~\ref{ssec:wss-struct}), DE-IRS (Section~\ref{ssec:irs-struct}), and +DE-WIRS (Section~\ref{ssec:irs-struct}) are examined. In-memory extensions are +compared against the B+tree with aggregate weight tags on internal nodes (AGG +B+tree) \cite{olken95} and concurrent and external extensions are compared +against the AB-tree \cite{zhao22}. Sampling performance is also compared against +comparable static sampling indexes: the alias structure \cite{walker74} for WSS, +the in-memory ISAM tree for IRS, and the alias-augmented B+tree \cite{afshani17} +for WIRS. Note that all structures under test, with the exception of the +external DE-IRS and external AB-tree, were contained entirely within system +memory. 
All benchmarking code and data structures were implemented using C++17 +and compiled using gcc 11.3.0 at the \texttt{-O3} optimization level. The +extension framework itself, excluding the shard implementations and utility +headers, consisted of a header-only library of about 1200 SLOC. diff --git a/chapters/sigmod23/extensions.tex b/chapters/sigmod23/extensions.tex new file mode 100644 index 0000000..6c242e9 --- /dev/null +++ b/chapters/sigmod23/extensions.tex @@ -0,0 +1,57 @@ +\captionsetup[subfloat]{justification=centering} +\section{Extensions} +\label{sec:discussion} +In this section, various extensions of the framework are considered. +Specifically, the applicability of the framework to external or distributed +data structures is discussed, as well as the use of the framework to add +automatic support for concurrent updates and sampling to extended SSIs. + +\Paragraph{Larger-than-Memory Data.} This framework can be applied to external +static sampling structures with minimal modification. As a proof-of-concept, +the IRS structure was extended with support for shards containing external ISAM +trees. This structure supports storing a configurable number of shards in +memory, and the rest on disk, making it well suited for operating in +memory-constrained environments. The on-disk shards contain standard ISAM +trees, with $8\text{KiB}$ page-aligned nodes. The external version of the +index only supports tombstone-based deletes, as tagging would require random +writes. In principle a hybrid approach to deletes is possible, where a delete +first searches the in-memory data for the record to be deleted, tagging it if +found. If the record is not found, then a tombstone could be inserted. As the +data size grows, though, and the preponderance of data is found on disk, this +approach would largely revert to the standard tombstone approach in practice. +External settings make the framework even more attractive, in terms of +performance characteristics, due to the different cost model. In external data +structures, performance is typically measured in terms of the number of IO +operations, meaning that much of the overhead introduced by the framework for +tasks like querying the mutable buffer, building auxiliary structures, extra +random number generations due to the shard alias structure, and the like, +become far less significant. + +Because the framework maintains immutability of shards, it is also well suited for +use on top of distributed file-systems or with other distributed data +abstractions like RDDs in Apache Spark~\cite{rdd}. Each shard can be +encapsulated within an immutable file in HDFS or an RDD in Spark. A centralized +control node or driver program can manage the mutable buffer, flushing it into +a new file or RDD when it is full, merging with existing files or RDDs using +the same reconstruction scheme already discussed for the framework. This setup +allows for datasets exceeding the capacity of a single node to be supported. As +an example, XDB~\cite{li19} features an RDD-based distributed sampling +structure that could be supported by this framework. + +\Paragraph{Concurrency.} The immutability of the majority of the structures +within the index makes for a straightforward concurrency implementation. +Concurrency control on the buffer is made trivial by the fact it is a simple, +unsorted array. 
The rest of the structure is never updated (aside from possible
+delete tagging), and so concurrency becomes a simple matter of delaying the
+freeing of memory used by internal structures until all the threads accessing
+them have exited, rather than immediately on merge completion. A very basic
+concurrency implementation can be achieved by using the tombstone delete
+policy, and a reference counting scheme to control the deletion of the shards
+following reconstructions. Multiple insert buffers can be used to improve
+insertion throughput, as this will allow inserts to proceed in parallel with
+merges, ultimately allowing concurrency to scale up to the point of being
+bottlenecked by memory bandwidth and available storage. This proof-of-concept
+implementation is based on a simplified version of an approach proposed by
+Golan-Gueta et al. for concurrent log-structured data stores
+\cite{golan-gueta15}.
+
diff --git a/chapters/sigmod23/framework.tex b/chapters/sigmod23/framework.tex
new file mode 100644
index 0000000..32a32e1
--- /dev/null
+++ b/chapters/sigmod23/framework.tex
@@ -0,0 +1,573 @@
+\section{Dynamic Sampling Index Framework}
+\label{sec:framework}
+
+This work is an attempt to design a solution to independent sampling
+that achieves \emph{both} efficient updates and near-constant cost per
+sample. As the goal is to tackle the problem in a generalized fashion,
+rather than to design problem-specific data structures for use as the basis
+of an index, a framework is created that allows for already
+existing static data structures to be used as the basis for a sampling
+index, by automatically adding support for data updates using a modified
+version of the Bentley-Saxe method.
+
+Unfortunately, Bentley-Saxe as described in Section~\ref{ssec:bsm} cannot be
+directly applied to sampling problems. The concept of decomposability is not
+cleanly applicable to sampling, because the distribution of records in the
+result set, rather than the records themselves, must be matched following the
+result merge. Efficiently controlling the distribution requires each sub-query
+to access information external to the structure against which it is being
+processed, a contingency unaccounted for by Bentley-Saxe. Further, the process
+of reconstruction used in Bentley-Saxe provides poor worst-case complexity
+bounds~\cite{saxe79}, and attempts to modify the procedure to provide better
+worst-case performance are complex and have worse performance in the common
+case~\cite{overmars81}. Despite these limitations, this chapter will argue that
+the core principles of the Bentley-Saxe method can be profitably applied to
+sampling indexes, once a system for controlling result set distributions and a
+more effective reconstruction scheme have been devised. The solution to
+the former will be discussed in Section~\ref{ssec:sample}. For the latter,
+inspiration is drawn from the literature on the LSM tree.
+
+The LSM tree~\cite{oneil96} is a data structure proposed to optimize
+write throughput in disk-based storage engines. It consists of a memory
+table of bounded size, used to buffer recent changes, and a hierarchy
+of external levels containing indexes of exponentially increasing
+size. When the memory table has reached capacity, it is emptied into the
+external levels. Random writes are avoided by treating the data within
+the external levels as immutable; all writes go through the memory
+table.
This introduces write amplification but maximizes sequential +writes, which is important for maintaining high throughput in disk-based +systems. The LSM tree is associated with a broad and well studied design +space~\cite{dayan17,dayan18,dayan22,balmau19,dayan18-1} containing +trade-offs between three key performance metrics: read performance, write +performance, and auxiliary memory usage. The challenges +faced in reconstructing predominately in-memory indexes are quite + different from those which the LSM tree is intended +to address, having little to do with disk-based systems and sequential IO +operations. But, the LSM tree possesses a rich design space for managing +the periodic reconstruction of data structures in a manner that is both +more practical and more flexible than that of Bentley-Saxe. By borrowing +from this design space, this preexisting body of work can be leveraged, +and many of Bentley-Saxe's limitations addressed. + +\captionsetup[subfloat]{justification=centering} + +\begin{figure*} + \centering + \subfloat[Leveling]{\includegraphics[width=.75\textwidth]{img/sigmod23/merge-leveling} \label{fig:leveling}}\\ + \subfloat[Tiering]{\includegraphics[width=.75\textwidth]{img/sigmod23/merge-tiering} \label{fig:tiering}} + + \caption{\textbf{A graphical overview of the sampling framework and its insert procedure.} A + mutable buffer (MB) sits atop two levels (L0, L1) containing shards (pairs + of SSIs and auxiliary structures [A]) using the leveling + (Figure~\ref{fig:leveling}) and tiering (Figure~\ref{fig:tiering}) layout + policies. Records are represented as black/colored squares, and grey + squares represent unused capacity. An insertion requiring a multi-level + reconstruction is illustrated.} \label{fig:framework} + +\end{figure*} + + +\subsection{Framework Overview} +The goal of this chapter is to build a general framework that extends most SSIs +with efficient support for updates by splitting the index into small data structures +to reduce reconstruction costs, and then distributing the sampling process over these +smaller structures. +The framework is designed to work efficiently with any SSI, so +long as it has the following properties, +\begin{enumerate} + \item The underlying full query $Q$ supported by the SSI from whose results + samples are drawn satisfies the following property: + for any dataset $D = \cup_{i = 1}^{n}D_i$ + where $D_i \cap D_j = \emptyset$, $Q(D) = \cup_{i = 1}^{n}Q(D_i)$. + \item \emph{(Optional)} The SSI supports efficient point-lookups. + \item \emph{(Optional)} The SSI is capable of efficiently reporting the total weight of all records + returned by the underlying full query. +\end{enumerate} + +The first property applies to the query being sampled from, and is essential +for the correctness of sample sets reported by extended sampling +indexes.\footnote{ This condition is stricter than the definition of a +decomposable search problem in the Bentley-Saxe method, which allows for +\emph{any} constant-time merge operation, not just union. +However, this condition is satisfied by many common types of database +query, such as predicate-based filtering queries.} The latter two properties +are optional, but reduce deletion and sampling costs respectively. Should the +SSI fail to support point-lookups, an auxiliary hash table can be attached to +the data structures. +Should it fail to support query result weight reporting, rejection +sampling can be used in place of the more efficient scheme discussed in +Section~\ref{ssec:sample}. 
The analysis of this framework will generally +assume that all three conditions are satisfied. + +Given an SSI with these properties, a dynamic extension can be produced as +shown in Figure~\ref{fig:framework}. The extended index consists of disjoint +shards containing an instance of the SSI being extended, and optional auxiliary +data structures. The auxiliary structures allow acceleration of certain +operations that are required by the framework, but which the SSI being extended +does not itself support efficiently. Examples of possible auxiliary structures +include hash tables, Bloom filters~\cite{bloom70}, and range +filters~\cite{zhang18,siqiang20}. The shards are arranged into levels of +increasing record capacity, with either one shard, or up to a fixed maximum +number of shards, per level. The decision to place one or many shards per level +is called the \emph{layout policy}. The policy names are borrowed from the +literature on the LSM tree, with the former called \emph{leveling} and the +latter called \emph{tiering}. + +To avoid a reconstruction on every insert, an unsorted array of fixed capacity +($N_b$), called the \emph{mutable buffer}, is used to buffer updates. Because it is +unsorted, it is kept small to maintain reasonably efficient sampling +and point-lookup performance. All updates are performed by appending new +records to the tail of this buffer. +If a record currently within the index is +to be updated to a new value, it must first be deleted, and then a record with +the new value inserted. This ensures that old versions of records are properly +filtered from query results. + +When the buffer is full, it is flushed to make room for new records. The +flushing procedure is based on the layout policy in use. When using leveling +(Figure~\ref{fig:leveling}) a new SSI is constructed using both the records in +$L_0$ and those in the buffer. This is used to create a new shard, which +replaces the one previously in $L_0$. When using tiering +(Figure~\ref{fig:tiering}) a new shard is built using only the records from the +buffer, and placed into $L_0$ without altering the existing shards. Each level +has a record capacity of $N_b \cdot s^{i+1}$, controlled by a configurable +parameter, $s$, called the scale factor. Records are organized in one large +shard under leveling, or in $s$ shards of $N_b \cdot s^i$ capacity each under +tiering. When a level reaches its capacity, it must be emptied to make room for +the records flushed into it. This is accomplished by moving its records down to +the next level of the index. Under leveling, this requires constructing a new +shard containing all records from both the source and target levels, and +placing this shard into the target, leaving the source empty. Under tiering, +the shards in the source level are combined into a single new shard that is +placed into the target level. Should the target be full, it is first emptied by +applying the same procedure. New empty levels +are dynamically added as necessary to accommodate these reconstructions. +Note that shard reconstructions are not necessarily performed using +merging, though merging can be used as an optimization of the reconstruction +procedure where such an algorithm exists. In general, reconstruction requires +only pooling the records of the shards being combined and then applying the SSI's +standard construction algorithm to this set of records. 
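The flush-and-cascade logic described above can be summarized in the following C++ sketch. The Record, Shard, and Level types, the Shard::build function, and the record-count capacity check are simplified, hypothetical stand-ins for the framework's actual machinery (which also rebuilds auxiliary structures and handles deletes); only the control flow for the two layout policies is intended to be faithful to the text.

\begin{verbatim}
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Record { /* key, value, weight, header ... */ };

struct Shard {
    std::vector<Record> records;  // stand-in for the SSI plus auxiliary structures
    // "Reconstruction" pools records and reruns the SSI construction algorithm.
    static std::unique_ptr<Shard> build(std::vector<Record> pooled) {
        auto s = std::make_unique<Shard>();
        s->records = std::move(pooled);  // sorting / alias construction elided
        return s;
    }
};

struct Level { std::vector<std::unique_ptr<Shard>> shards; };

enum class LayoutPolicy { Leveling, Tiering };

static size_t level_records(const Level& l) {
    size_t n = 0;
    for (const auto& s : l.shards) n += s->records.size();
    return n;
}

// Collect (and remove) all records on a level, appending them to `extra`.
static std::vector<Record> pool(Level& l, std::vector<Record> extra) {
    for (auto& s : l.shards)
        extra.insert(extra.end(), s->records.begin(), s->records.end());
    l.shards.clear();
    return extra;
}

// Push `incoming` records into level i, cascading downward when full.
void flush_into(std::vector<Level>& levels, size_t i, std::vector<Record> incoming,
                LayoutPolicy policy, size_t nb, size_t s) {
    if (i == levels.size()) levels.emplace_back();  // add empty levels as needed

    size_t capacity = nb;
    for (size_t j = 0; j <= i; j++) capacity *= s;  // record capacity N_b * s^(i+1)

    if (level_records(levels[i]) + incoming.size() > capacity) {
        // Target is full: empty it into the next level using the same procedure.
        flush_into(levels, i + 1, pool(levels[i], {}), policy, nb, s);
    }

    if (policy == LayoutPolicy::Leveling) {
        // One shard per level, rebuilt from the existing records plus the new ones.
        levels[i].shards.push_back(Shard::build(pool(levels[i], std::move(incoming))));
    } else {
        // Tiering: the incoming records become a new shard alongside the others.
        levels[i].shards.push_back(Shard::build(std::move(incoming)));
    }
}

// A buffer flush enters the hierarchy at level 0.
void flush_buffer(std::vector<Record> buffer, std::vector<Level>& levels,
                  LayoutPolicy policy, size_t nb, size_t s) {
    flush_into(levels, 0, std::move(buffer), policy, nb, s);
}
\end{verbatim}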
+
+\begin{table}[t]
+\caption{Frequently Used Notation}
+\centering
+
+\begin{tabular}{|p{2.5cm} p{5cm}|}
+    \hline
+    \textbf{Variable} & \textbf{Description} \\ \hline
+    $N_b$ & Capacity of the mutable buffer \\ \hline
+    $s$ & Scale factor \\ \hline
+    $C_c(n)$ & SSI initial construction cost \\ \hline
+    $C_r(n)$ & SSI reconstruction cost \\ \hline
+    $L(n)$ & SSI point-lookup cost \\ \hline
+    $P(n)$ & SSI sampling pre-processing cost \\ \hline
+    $S(n)$ & SSI per-sample sampling cost \\ \hline
+    $W(n)$ & Shard weight determination cost \\ \hline
+    $R(n)$ & Shard rejection check cost \\ \hline
+    $\delta$ & Maximum delete proportion \\ \hline
+    %$\rho$ & Maximum rejection rate \\ \hline
+\end{tabular}
+\label{tab:nomen}
+
+\end{table}
+
+Table~\ref{tab:nomen} lists frequently used notation for the various parameters
+of the framework, which will be used in the coming analysis of the costs and
+trade-offs associated with operations within the framework's design space. The
+remainder of this section will discuss the performance characteristics of
+insertion into this structure (Section~\ref{ssec:insert}), how it can be used
+to correctly answer sampling queries (Section~\ref{ssec:sample}), and efficient
+approaches for supporting deletes (Section~\ref{ssec:delete}). Finally, it will
+close with a detailed discussion of the trade-offs within the framework's
+design space (Section~\ref{ssec:design-space}).
+
+
+\subsection{Insertion}
+\label{ssec:insert}
+The framework supports inserting new records by first appending them to the end
+of the mutable buffer. When the buffer is full, it is flushed into a sequence
+of levels containing shards of increasing capacity, using a procedure
+determined by the layout policy as discussed in Section~\ref{sec:framework}.
+This method allows for the cost of repeated shard reconstruction to be
+effectively amortized.
+
+Let the cost of constructing the SSI from an arbitrary set of $n$ records be
+$C_c(n)$ and the cost of reconstructing the SSI given two or more shards
+containing $n$ records in total be $C_r(n)$. The cost of an insert is composed
+of three parts: appending to the mutable buffer, constructing a new
+shard from the buffered records during a flush, and the total cost of
+reconstructing shards containing the record over the lifetime of the index. The
+cost of appending to the mutable buffer is constant, and the cost of constructing a
+shard from the buffer can be amortized across the records participating in the
+buffer flush, giving $\nicefrac{C_c(N_b)}{N_b}$. These costs are paid exactly once for
+each record. To derive an expression for the cost of repeated reconstruction,
+first note that each record will participate in at most $s$ reconstructions on
+a given level, resulting in a worst-case amortized cost of $O\left(s\cdot
+\nicefrac{C_r(n)}{n}\right)$ paid per level. The index itself will contain at most
+$\log_s n$ levels. Thus, over the lifetime of the index a given record
+will pay $O\left(s\cdot \nicefrac{C_r(n)}{n}\log_s n\right)$ cost in repeated
+reconstruction.
+
+Combining these results, the total amortized insertion cost is
+\begin{equation}
+O\left(\frac{C_c(N_b)}{N_b} + s \cdot \frac{C_r(n)}{n} \log_s n\right)
+\end{equation}
+This can be simplified by noting that $s$ is constant, and that $N_b \ll n$ is also
+a constant.
+By neglecting these terms, the amortized insertion cost of the
+framework is,
+\begin{equation}
+O\left(\frac{C_r(n)}{n}\log_s n\right)
+\end{equation}
+
+
+\subsection{Sampling}
+\label{ssec:sample}
+
+\begin{figure}
+    \centering
+    \includegraphics[width=\textwidth]{img/sigmod23/sampling}
+    \caption{\textbf{Overview of the multiple-shard sampling query process} for
+    Example~\ref{ex:sample} with $k=1000$. First, (1) the normalized weights of
+    the shards are determined, then (2) these weights are used to construct an
+    alias structure. Next, (3) the alias structure is queried $k$ times to
+    determine per-shard sample sizes, and then (4) sampling is performed.
+    Finally, (5) any rejected samples are retried starting from the alias
+    structure, and the process is repeated until the desired number of samples
+    has been retrieved.}
+    \label{fig:sample}
+
+\end{figure}
+
+For many SSIs, sampling queries are completed in two stages. Some preliminary
+processing is done to identify the range of records from which to sample, and then
+samples are drawn from that range. For example, IRS over a sorted list of
+records can be performed by first identifying the upper and lower bounds of the
+query range in the list, and then sampling records by randomly generating
+indexes within those bounds. The general cost of a sampling query can be
+modeled as $P(n) + k S(n)$, where $P(n)$ is the cost of preprocessing, $k$ is
+the number of samples drawn, and $S(n)$ is the cost of sampling a single
+record.
+
+When sampling from multiple shards, the situation grows more complex. For each
+sample, the shard to select the record from must first be decided. Consider an
+arbitrary sampling query $X(D, k)$ asking for a sample set of size $k$ against
+dataset $D$. The framework splits $D$ across $m$ disjoint shards, such that $D
+= \bigcup_{i=1}^m D_i$ and $D_i \cap D_j = \emptyset$ for all $i \neq j$. The
+framework must ensure that $X(D, k)$ and $\bigcup_{i=1}^m X(D_i, k_i)$ follow
+the same distribution, by selecting appropriate values for the $k_i$s. If care
+is not taken to balance the number of samples drawn from a shard with the total
+weight of the shard under $X$, then bias can be introduced into the sample
+set's distribution. The selection of the $k_i$s can be viewed as an instance of WSS,
+and solved using the alias method.
+
+When sampling using the framework, first the weight of each shard under the
+sampling query is determined and a \emph{shard alias structure} built over
+these weights. Then, for each sample, the shard alias is used to
+determine the shard from which to draw the sample. Let $W(n)$ be the cost of
+determining this total weight for a single shard under the query. The initial setup
+cost, prior to drawing any samples, will be $O\left([W(n) + P(n)]\log_s
+n\right)$, as the preliminary work for sampling from each shard must be
+performed, and the weights determined and the alias structure constructed. In
+many cases, however, the preliminary work will also determine the total weight,
+and so the relevant operation need only be applied once to accomplish both
+tasks.
+
+To ensure that all records appear in the sample set with the appropriate
+probability, the mutable buffer itself must also be a valid target for
+sampling. There are two generally applicable techniques for
+this, both of which can be supported by the framework. The query being sampled
+from can be directly executed against the buffer and the result set used to
+build a temporary SSI, which can be sampled from.
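+With the buffer handled in this way (i.e., treated as one more shard for the
+duration of the query), the overall sampling procedure can be outlined as in
+the sketch below. This is an illustrative outline only: \texttt{ShardHandle}
+and its members are hypothetical, each shard is assumed to report its total
+weight under the query and to draw one sample at a time, and
+\texttt{std::discrete\_distribution} is used where the framework itself would
+employ an alias structure for constant-time shard selection. Deleted records
+and the resulting rejections are ignored here and discussed in
+Section~\ref{ssec:delete}.
+\begin{verbatim}
+// Sketch of multi-shard sampling: gather per-shard weights, build a
+// weighted chooser over them, and for each of the k draws select a
+// shard first and then sample from it.
+#include <cstddef>
+#include <functional>
+#include <random>
+#include <vector>
+
+struct ShardHandle {
+    std::function<double()> total_weight;  // W(n): weight under the query
+    std::function<long()>   sample;        // S(n): draw one matching record
+};
+
+std::vector<long> sample_query(const std::vector<ShardHandle> &shards,
+                               std::size_t k, std::mt19937 &rng) {
+    // (1)-(2): per-shard weights and the shard-selection structure.
+    std::vector<double> weights;
+    for (const auto &s : shards) weights.push_back(s.total_weight());
+    std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());
+
+    // (3)-(4): select a shard for each sample, then sample from it.
+    std::vector<long> result;
+    result.reserve(k);
+    while (result.size() < k) {
+        const ShardHandle &s = shards[pick(rng)];
+        result.push_back(s.sample());
+        // (5): a rejected sample would be retried from shard selection;
+        // rejection handling is omitted in this sketch.
+    }
+    return result;
+}
+\end{verbatim}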
Alternatively, rejection +sampling can be used to sample directly from the buffer, without executing the +query. In this case, the total weight of the buffer is used for its entry in +the shard alias structure. This can result in the buffer being +over-represented in the shard selection process, and so any rejections during +buffer sampling must be retried starting from shard selection. These same +considerations apply to rejection sampling used against shards, as well. + + +\begin{example} + \label{ex:sample} + Consider executing a WSS query, with $k=1000$, across three shards + containing integer keys with unit weight. $S_1$ contains only the + key $-2$, $S_2$ contains all integers on $[1,100]$, and $S_3$ + contains all integers on $[101, 200]$. These structures are shown + in Figure~\ref{fig:sample}. Sampling is performed by first + determining the normalized weights for each shard: $w_1 = 0.005$, + $w_2 = 0.4975$, $w_3 = 0.4975$, which are then used to construct a + shard alias structure. The shard alias structure is then queried + $k$ times, resulting in a distribution of $k_i$s that is + commensurate with the relative weights of each shard. Finally, + each shard is queried in turn to draw the appropriate number + of samples. +\end{example} + + +Assuming that rejection sampling is used on the mutable buffer, the worst-case +time complexity for drawing $k$ samples from an index containing $n$ elements +with a sampling cost of $S(n)$ is, +\begin{equation} + \label{eq:sample-cost} + O\left(\left[W(n) + P(n)\right]\log_s n + kS(n)\right) +\end{equation} + +%If instead a temporary SSI is constructed, the cost of sampling +%becomes: $O\left(N_b + C_c(N_b) + (W(n) + P(n))\log_s n + kS(n)\right)$. + +\begin{figure} + \centering + \subfloat[Tombstone Rejection Check]{\includegraphics[width=.75\textwidth]{img/sigmod23/delete-tombstone} \label{fig:delete-tombstone}}\\ + \subfloat[Tagging Rejection Check]{\includegraphics[width=.75\textwidth]{img/sigmod23/delete-tagging} \label{fig:delete-tag}} + + \caption{\textbf{Overview of the rejection check procedure for deleted records.} First, + a record is sampled (1). + When using the tombstone delete policy + (Figure~\ref{fig:delete-tombstone}), the rejection check starts by (2) querying + the bloom filter of the mutable buffer. The filter indicates the record is + not present, so (3) the filter on $L_0$ is queried next. This filter + returns a false positive, so (4) a point-lookup is executed against $L_0$. + The lookup fails to find a tombstone, so the search continues and (5) the + filter on $L_1$ is checked, which reports that the tombstone is present. + This time, it is not a false positive, and so (6) a lookup against $L_1$ + (7) locates the tombstone. The record is thus rejected. When using the + tagging policy (Figure~\ref{fig:delete-tag}), (1) the record is sampled and + (2) checked directly for the delete tag. It is set, so the record is + immediately rejected.} + + \label{fig:delete} + +\end{figure} + + +\subsection{Deletion} +\label{ssec:delete} + +Because the shards are static, records cannot be arbitrarily removed from them. +This requires that deletes be supported in some other way, with the ultimate +goal being the prevention of deleted records' appearance in sampling query +result sets. This can be realized in two ways: locating the record and marking +it, or inserting a new record which indicates that an existing record should be +treated as deleted. 
The framework supports both of these techniques, the +selection of which is called the \emph{delete policy}. The former policy is +called \emph{tagging} and the latter \emph{tombstone}. + +Tagging a record is straightforward. Point-lookups are performed against each +shard in the index, as well as the buffer, for the record to be deleted. When +it is found, a bit in a header attached to the record is set. When sampling, +any records selected with this bit set are automatically rejected. Tombstones +represent a lazy strategy for deleting records. When a record is deleted using +tombstones, a new record with identical key and value, but with a ``tombstone'' +bit set, is inserted into the index. A record's presence can be checked by +performing a point-lookup. If a tombstone with the same key and value exists +above the record in the index, then it should be rejected when sampled. + +Two important aspects of performance are pertinent when discussing deletes: the +cost of the delete operation, and the cost of verifying the presence of a +sampled record. The choice of delete policy represents a trade-off between +these two costs. Beyond this simple trade-off, the delete policy also has other +implications that can affect its applicability to certain types of SSI. Most +notably, tombstones do not require any in-place updating of records, whereas +tagging does. This means that using tombstones is the only way to ensure total +immutability of the data within shards, which avoids random writes and eases +concurrency control. The tombstone delete policy, then, is particularly +appealing in external and concurrent contexts. + +\Paragraph{Deletion Cost.} The cost of a delete under the tombstone policy is +the same as an ordinary insert. Tagging, by contrast, requires a point-lookup +of the record to be deleted, and so is more expensive. Assuming a point-lookup +operation with cost $L(n)$, a tagged delete must search each level in the +index, as well as the buffer, requiring $O\left(N_b + L(n)\log_s n\right)$ +time. + +\Paragraph{Rejection Check Costs.} In addition to the cost of the delete +itself, the delete policy affects the cost of determining if a given record has +been deleted. This is called the \emph{rejection check cost}, $R(n)$. When +using tagging, the information necessary to make the rejection decision is +local to the sampled record, and so $R(n) \in O(1)$. However, when using tombstones +it is not; a point-lookup must be performed to search for a given record's +corresponding tombstone. This look-up must examine the buffer, and each shard +within the index. This results in a rejection check cost of $R(n) \in O\left(N_b + +L(n) \log_s n\right)$. The rejection check process for the two delete policies is +summarized in Figure~\ref{fig:delete}. + +Two factors contribute to the tombstone rejection check cost: the size of the +buffer, and the cost of performing a point-lookup against the shards. The +latter cost can be controlled using the framework's ability to associate +auxiliary structures with shards. For SSIs which do not support efficient +point-lookups, a hash table can be added to map key-value pairs to their +location within the SSI. This allows for constant-time rejection checks, even +in situations where the index would not otherwise support them. However, the +storage cost of this intervention is high, and in situations where the SSI does +support efficient point-lookups, it is not necessary. 
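+The rejection checks themselves can be sketched as follows for the two delete
+policies. The types and helpers are hypothetical, and the tombstone search is
+written as a linear scan purely for brevity; in the framework it would instead
+use each structure's point-lookup (or the auxiliary hash table described
+above), and no filter-based optimizations are included here.
+\begin{verbatim}
+// Sketch of the per-sample rejection check under both delete policies.
+#include <algorithm>
+#include <cstdint>
+#include <utility>
+#include <vector>
+
+struct SampledRecord {
+    int64_t key;
+    int64_t value;
+    bool    tombstone = false;  // set on tombstone records
+    bool    tagged    = false;  // set in place when deleted via tagging
+};
+
+// Stand-in for the buffer or a single shard: only its tombstones matter here.
+struct TombstoneSet {
+    std::vector<std::pair<int64_t, int64_t>> tombstones;
+    bool contains(int64_t key, int64_t value) const {
+        return std::find(tombstones.begin(), tombstones.end(),
+                         std::make_pair(key, value)) != tombstones.end();
+    }
+};
+
+// Tagging: the decision is local to the sampled record, so R(n) is O(1).
+bool rejected_tagging(const SampledRecord &r) {
+    return r.tagged;
+}
+
+// Tombstones: the buffer and the shards above the sampled record must be
+// searched, giving R(n) in O(N_b + L(n) log_s n) without extra structures.
+bool rejected_tombstone(const SampledRecord &r,
+                        const TombstoneSet &buffer,
+                        const std::vector<TombstoneSet> &newer_shards) {
+    if (r.tombstone) return true;  // never report a sampled tombstone itself
+    if (buffer.contains(r.key, r.value)) return true;
+    for (const auto &shard : newer_shards) {
+        if (shard.contains(r.key, r.value)) return true;
+    }
+    return false;
+}
+\end{verbatim}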
Further performance +improvements can be achieved by noting that the probability of a given record +having an associated tombstone in any particular shard is relatively small. +This means that many point-lookups will be executed against shards that do not +contain the tombstone being searched for. In this case, these unnecessary +lookups can be partially avoided using Bloom filters~\cite{bloom70} for +tombstones. By inserting tombstones into these filters during reconstruction, +point-lookups against some shards which do not contain the tombstone being +searched for can be bypassed. Filters can be attached to the buffer as well, +which may be even more significant due to the linear cost of scanning it. As +the goal is a reduction of rejection check costs, these filters need only be +populated with tombstones. In a later section, techniques for bounding the +number of tombstones on a given level are discussed, which will allow for the +memory usage of these filters to be tightly controlled while still ensuring +precise bounds on filter error. + +\Paragraph{Sampling with Deletes.} The addition of deletes to the framework +alters the analysis of sampling costs. A record that has been deleted cannot +be present in the sample set, and therefore the presence of each sampled record +must be verified. If a record has been deleted, it must be rejected. When +retrying samples rejected due to delete, the process must restart from shard +selection, as deleted records may be counted in the weight totals used to +construct that structure. This increases the cost of sampling to, +\begin{equation} +\label{eq:sampling-cost} + O\left([W(n) + P(n)]\log_s n + \frac{kS(n)}{1 - \mathbf{Pr}[\text{rejection}]} \cdot R(n)\right) +\end{equation} +where $R(n)$ is the cost of checking if a sampled record has been deleted, and +$\nicefrac{k}{1 -\mathbf{Pr}[\text{rejection}]}$ is the expected number of sampling +attempts required to obtain $k$ samples, given a fixed rejection probability. +The rejection probability itself is a function of the workload, and is +unbounded. + +\Paragraph{Bounding the Rejection Probability.} Rejections during sampling +constitute wasted memory accesses and random number generations, and so steps +should be taken to minimize their frequency. The probability of a rejection is +directly related to the number of deleted records, which is itself a function +of workload and dataset. This means that, without building counter-measures +into the framework, tight bounds on sampling performance cannot be provided in +the presence of deleted records. It is therefore critical that the framework +support some method for bounding the number of deleted records within the +index. + +While the static nature of shards prevents the direct removal of records at the +moment they are deleted, it doesn't prevent the removal of records during +reconstruction. When using tagging, all tagged records encountered during +reconstruction can be removed. When using tombstones, however, the removal +process is non-trivial. In principle, a rejection check could be performed for +each record encountered during reconstruction, but this would increase +reconstruction costs and introduce a new problem of tracking tombstones +associated with records that have been removed. Instead, a lazier approach can +be used: delaying removal until a tombstone and its associated record +participate in the same shard reconstruction. 
This delay allows both the record +and its tombstone to be removed at the same time, an approach called +\emph{tombstone cancellation}. In general, this can be implemented using an +extra linear scan of the input shards before reconstruction to identify +tombstones and associated records for cancellation, but potential optimizations +exist for many SSIs, allowing it to be performed during the reconstruction +itself at no extra cost. + +The removal of deleted records passively during reconstruction is not enough to +bound the number of deleted records within the index. It is not difficult to +envision pathological scenarios where deletes result in unbounded rejection +rates, even with this mitigation in place. However, the dropping of deleted +records does provide a useful property: any specific deleted record will +eventually be removed from the index after a finite number of reconstructions. +Using this fact, a bound on the number of deleted records can be enforced. A +new parameter, $\delta$, is defined, representing the maximum proportion of +deleted records within the index. Each level, and the buffer, tracks the number +of deleted records it contains by counting its tagged records or tombstones. +Following each buffer flush, the proportion of deleted records is checked +against $\delta$. If any level is found to exceed it, then a proactive +reconstruction is triggered, pushing its shards down into the next level. The +process is repeated until all levels respect the bound, allowing the number of +deleted records to be precisely controlled, which, by extension, bounds the +rejection rate. This process is called \emph{compaction}. + +Assuming every record is equally likely to be sampled, this new bound can be +applied to the analysis of sampling costs. The probability of a record being +rejected is $\mathbf{Pr}[\text{rejection}] = \delta$. Applying this result to +Equation~\ref{eq:sampling-cost} yields, +\begin{equation} +%\label{eq:sampling-cost-del} + O\left([W(n) + P(n)]\log_s n + \frac{kS(n)}{1 - \delta} \cdot R(n)\right) +\end{equation} + +Asymptotically, this proactive compaction does not alter the analysis of +insertion costs. Each record is still written at most $s$ times on each level, +there are at most $\log_s n$ levels, and the buffer insertion and SSI +construction costs are all unchanged, and so on. This results in the amortized +insertion cost remaining the same. + +This compaction strategy is based upon tombstone and record counts, and the +bounds assume that every record is equally likely to be sampled. For certain +sampling problems (such as WSS), there are other conditions that must be +considered to provide a bound on the rejection rate. To account for these +situations in a general fashion, the framework supports problem-specific +compaction triggers that can be tailored to the SSI being used. These allow +compactions to be triggered based on other properties, such as rejection rate +of a level, weight of deleted records, and the like. + + +\subsection{Trade-offs on Framework Design Space} +\label{ssec:design-space} +The framework has several tunable parameters, allowing it to be tailored for +specific applications. This design space contains trade-offs among three major +performance characteristics: update cost, sampling cost, and auxiliary memory +usage. The two most significant decisions when implementing this framework are +the selection of the layout and delete policies. 
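+Concretely, these decisions, together with the parameters already introduced,
+can be viewed as a small configuration for the extension. The sketch below is a
+hypothetical illustration of the knobs involved; the names and the default
+values shown are arbitrary placeholders rather than recommendations.
+\begin{verbatim}
+// Hypothetical configuration capturing the framework's main tuning knobs.
+#include <cstddef>
+
+enum class LayoutPolicy { Leveling, Tiering };
+enum class DeletePolicy { Tagging, Tombstone };
+
+struct ExtensionConfig {
+    LayoutPolicy layout        = LayoutPolicy::Leveling;
+    DeletePolicy delete_policy = DeletePolicy::Tagging;
+    std::size_t  buffer_cap    = 1000;  // N_b: mutable buffer capacity
+    std::size_t  scale_factor  = 8;     // s: growth factor between levels
+    double       delete_bound  = 0.05;  // delta: max proportion of deleted records
+};
+\end{verbatim}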
The asymptotic analysis of the +previous sections obscures some of the differences between these policies, but +they do have significant practical performance implications. + +\Paragraph{Layout Policy.} The choice of layout policy represents a clear +trade-off between update and sampling performance. Leveling +results in fewer shards of larger size, whereas tiering results in a larger +number of smaller shards. As a result, leveling reduces the costs associated +with point-lookups and sampling query preprocessing by a constant factor, +compared to tiering. However, it results in more write amplification: a given +record may be involved in up to $s$ reconstructions on a single level, as +opposed to the single reconstruction per level under tiering. + +\Paragraph{Delete Policy.} There is a trade-off between delete performance and +sampling performance that exists in the choice of delete policy. Tagging +requires a point-lookup when performing a delete, which is more expensive than +the insert required by tombstones. However, it also allows constant-time +rejection checks, unlike tombstones which require a point-lookup of each +sampled record. In situations where deletes are common and write-throughput is +critical, tombstones may be more useful. Tombstones are also ideal in +situations where immutability is required, or random writes must be avoided. +Generally speaking, however, tagging is superior when using SSIs that support +it, because sampling rejection checks will usually be more common than deletes. + +\Paragraph{Mutable Buffer Capacity and Scale Factor.} The mutable buffer +capacity and scale factor both influence the number of levels within the index, +and by extension the number of distinct shards. Sampling and point-lookups have +better performance with fewer shards. Smaller shards are also faster to +reconstruct, although the same adjustments that reduce shard size also result +in a larger number of reconstructions, so the trade-off here is less clear. + +The scale factor has an interesting interaction with the layout policy: when +using leveling, the scale factor directly controls the amount of write +amplification per level. Larger scale factors mean more time is spent +reconstructing shards on a level, reducing update performance. Tiering does not +have this problem and should see its update performance benefit directly from a +larger scale factor, as this reduces the number of reconstructions. + +The buffer capacity also influences the number of levels, but is more +significant in its effects on point-lookup performance: a lookup must perform a +linear scan of the buffer. Likewise, the unstructured nature of the buffer also +will contribute negatively towards sampling performance, irrespective of which +buffer sampling technique is used. As a result, although a large buffer will +reduce the number of shards, it will also hurt sampling and delete (under +tagging) performance. It is important to minimize the cost of these buffer +scans, and so it is preferable to keep the buffer small, ideally small enough +to fit within the CPU's L2 cache. The number of shards within the index is, +then, better controlled by changing the scale factor, rather than the buffer +capacity. Using a smaller buffer will result in more compactions and shard +reconstructions; however, the empirical evaluation in Section~\ref{ssec:ds-exp} +demonstrates that this is not a serious performance problem when a scale factor +is chosen appropriately. 
+When the shards are in memory, frequent small
+reconstructions do not have a significant performance penalty compared to less
+frequent, larger ones.
+
+\Paragraph{Auxiliary Structures.} The framework's support for arbitrary
+auxiliary data structures allows for memory to be traded in exchange for
+insertion or sampling performance. The use of Bloom filters for accelerating
+tombstone rejection checks has already been discussed, but many other options
+exist. Bloom filters could also be used to accelerate point-lookups for delete
+tagging, though such filters would require much more memory than tombstone-only
+ones to be effective. An auxiliary hash table could be used for accelerating
+point-lookups, or range filters such as SuRF \cite{zhang18} or Rosetta
+\cite{siqiang20} added to accelerate pre-processing for range queries, as in
+IRS or WIRS.
diff --git a/chapters/sigmod23/introduction.tex b/chapters/sigmod23/introduction.tex
new file mode 100644
index 0000000..0155c7d
--- /dev/null
+++ b/chapters/sigmod23/introduction.tex
@@ -0,0 +1,20 @@
+\section{Introduction} \label{sec:intro}
+
+As a first attempt at realizing a dynamic extension framework, one of the
+non-decomposable search problems discussed in the previous chapter was
+considered: independent range sampling, along with a number of other
+independent sampling problems. These sorts of queries are important in a
+variety of contexts, including approximate query processing
+(AQP)~\cite{blinkdb,quickr,verdict,cohen23}, interactive data
+exploration~\cite{sps,xie21}, financial audit sampling~\cite{olken-thesis}, and
+feature selection for machine learning~\cite{ml-sampling}. However, they are
+not well served by existing techniques, which tend to sacrifice statistical
+independence for performance, or vice versa. In this chapter, a solution for
+independent sampling is presented that achieves both statistical
+independence and good performance, by designing a Bentley-Saxe-inspired
+framework for introducing update support to efficient static sampling data
+structures. It seeks to demonstrate the viability of Bentley-Saxe as the basis
+for adding update support to data structures, as well as to show that the
+limitations of the decomposable search problem abstraction can be overcome
+through alternative query processing techniques while preserving good
+performance.
diff --git a/chapters/sigmod23/relatedwork.tex b/chapters/sigmod23/relatedwork.tex
new file mode 100644
index 0000000..600cd0d
--- /dev/null
+++ b/chapters/sigmod23/relatedwork.tex
@@ -0,0 +1,33 @@
+\section{Related Work}
+\label{sec:related}
+
+The general IQS problem was first proposed by Hu, Qiao, and Tao~\cite{hu14} and
+has since been the subject of extensive research
+\cite{irsra,afshani17,xie21,aumuller20}. These papers involve the use of
+specialized indexes to assist in drawing samples efficiently from the result
+sets of specific types of query, and are largely focused on in-memory settings.
+A recent survey by Tao~\cite{tao22} acknowledged that dynamization remains a major
+challenge for efficient sampling indexes. There do exist specific examples of
+sampling indexes~\cite{hu14} designed to support dynamic updates, but they are
+specialized and impractical, due to their implementation complexity and the
+high constant factors in their cost functions. A static index for spatial
+independent range sampling~\cite{xie21} has been proposed with a dynamic
+extension similar to the one described in this chapter, but the method was not
+generalized, and its design space was not explored. There are also
+weight-updatable implementations of the alias structure \cite{hagerup93,
+matias03, allendorf23} that function under various assumptions about the weight
+distribution. These are of limited utility in a database context as they do not
+support direct insertion or deletion of entries. Efforts have also been made to
+improve tree-traversal-based sampling approaches. Notably, the AB-tree
+\cite{zhao22} extends tree-sampling with support for concurrent updates, which
+has been a historical pain point.
+
+The Bentley-Saxe method was first proposed by Saxe and Bentley~\cite{saxe79}.
+Overmars and van Leeuwen extended this framework to provide better worst-case
+bounds~\cite{overmars81}, but their approach hurts common-case performance by
+splitting reconstructions into small pieces and executing these pieces each
+time a record is inserted. Though not commonly used in database systems, the
+method has been applied to address specialized problems, such as the creation
+of dynamic metric indexing structures~\cite{naidan14}, analysis of
+trajectories~\cite{custers19}, and genetic sequence search
+indexes~\cite{almodaresi23}.