Updates

author: Douglas Rumbaugh <dbr4@psu.edu> 2025-05-05 16:23:25 -0400
committer: Douglas Rumbaugh <dbr4@psu.edu> 2025-05-05 16:23:25 -0400
commit: ac1244fced7e6c6ba93d4292dd9a18ce293236eb (patch)
tree: 671696721d572a9e9ec2b92f94e1ff347ac26760 /chapters/sigmod23/background.tex
parent: eb519d35d7f11427dd5fc877130b02478f0da80d (diff)
download: dissertation-ac1244fced7e6c6ba93d4292dd9a18ce293236eb.tar.gz
1 files changed, 21 insertions, 2 deletions
diff --git a/chapters/sigmod23/background.tex b/chapters/sigmod23/background.tex
index ad89e03..b4ccbf1 100644
--- a/chapters/sigmod23/background.tex
+++ b/chapters/sigmod23/background.tex
@@ -124,7 +124,13 @@ query, and selecting or rejecting it for inclusion within the sample
 with a fixed probability~\cite{db2-doc}. This process requires that each
 record in the result set be considered, and thus provides no performance
 benefit relative to the query being sampled from, as it must be answered
-in full anyway before returning only some of the results.
+in full anyway before returning only some of the results.\footnote{
+    To clarify, this is not to say that Bernoulli sampling isn't
+    useful. It \emph{can} be used to improve the performance of queries
+    by limiting the cardinality of intermediate results, etc. But it is
+    not particularly useful for improving the performance of IQS queries,
+    where the sampling is performed on the final result set of the query.
+}
 
 For performance, the statistical guarantees can be discarded and
 systematic or block sampling used instead. Systematic sampling considers
@@ -230,6 +236,7 @@ structures attached to the nodes.  More examples of alias augmentation
 applied to different IQS problems can be found in a recent survey by
 Tao~\cite{tao22}.
 
+\Paragraph{Miscellanea.}
 There also exist specialized data structures with support for both
 efficient sampling and updates~\cite{hu14}, but these structures have
 poor constant factors and are very complex, rendering them of little
@@ -252,7 +259,19 @@ only once per sample set (if at all), but fail to support updates. Thus,
 there appears to be a general dichotomy of sampling techniques: existing
 sampling data structures support either updates, or efficient sampling,
 but generally not both. It will be the purpose of this chapter to resolve
-this dichotomy.
+this dichotomy. In particular, we seek to develop structures with the
+following desiderata:
+
+\begin{enumerate}
+    \item Support data updates (including deletes) with similar average
+          performance to a standard B+Tree.
+    \item Support IQS queries that do not pay a per-sample cost
+          proportional to some function of the data size. In other words,
+          $k$ should \emph{not} be multiplied by any function of $n$
+          in the query cost function.
+    %FIXME: this guy comes out of nowhere...
+    \item Provide the user with some basic performance tuning capability.
+\end{enumerate}
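
As a rough illustration of the second desideratum in the list above (a sketch, not text from the chapter): write $n$ for the data size and $k$ for the number of samples requested, as the item does, and let $f$ and $g$ be hypothetical placeholder functions, where $g$ grows with $n$. The item then appears to ask for a query cost of the first form rather than the second. The logarithmic instances are assumed examples only, not bounds claimed by the author.

```latex
% Illustrative only: f, g, and the \log n instances are assumptions,
% not results stated in the chapter. Requires amsmath.
\begin{align*}
    \text{desired:}   \quad & O\bigl(f(n) + k\bigr),     && \text{e.g. } O(\log n + k) \\
    \text{undesired:} \quad & O\bigl(k \cdot g(n)\bigr), && \text{e.g. } O(k \log n)
\end{align*}
```

In the first form the dependence on $n$ is paid once per query; in the second it is paid again for every sample drawn.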