\section{Related Work}
\label{sec:related}

The general IQS problem was first proposed by Hu, Qiao, and Tao~\cite{hu14} and
has since been the subject of extensive research
\cite{irsra,afshani17,xie21,aumuller20}. These papers use
specialized indexes to draw samples efficiently from the result
sets of specific types of queries, and largely focus on in-memory settings.
A recent survey by Tao~\cite{tao22} acknowledged that dynamization remains a major
challenge for efficient sampling indexes. Some
sampling indexes~\cite{hu14} have been designed to support dynamic updates, but
they are specialized, and are impractical due to their
implementation complexity and the high constant factors in their cost functions. A
static index for spatial independent range sampling~\cite{xie21} has been
proposed with a dynamic extension similar to the one presented in this paper,
but the method was not generalized, and its design space was not explored.
There are also
weight-updatable implementations of the alias structure~\cite{hagerup93,
matias03, allendorf23} that function under various assumptions about the weight
distribution. These are of limited utility in a database context because they do not
support direct insertion or deletion of entries. Efforts have also been made to
improve tree-traversal-based sampling approaches. Notably, the
AB-tree~\cite{zhao22} extends tree sampling with support for concurrent updates,
which have historically been a pain point for such approaches.

The Bentley-Saxe method was first proposed by Saxe and Bentley~\cite{saxe79}.
Overmars and van Leeuwen extended this framework to provide better worst-case
bounds~\cite{overmars81}, but their approach hurts common-case performance by
splitting reconstructions into small pieces and executing these pieces each
time a record is inserted. Though not commonly used in database systems, the
method has been applied to address specialized problems, such as the creation
of dynamic metric indexing structures~\cite{naidan14}, analysis of
trajectories~\cite{custers19}, and genetic sequence search
indexes~\cite{almodaresi23}.