From ac1244fced7e6c6ba93d4292dd9a18ce293236eb Mon Sep 17 00:00:00 2001
From: Douglas Rumbaugh <dbr4@psu.edu>
Date: Mon, 5 May 2025 16:23:25 -0400
Subject: Updates

---
 chapters/sigmod23/background.tex | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/chapters/sigmod23/background.tex b/chapters/sigmod23/background.tex
index ad89e03..b4ccbf1 100644
--- a/chapters/sigmod23/background.tex
+++ b/chapters/sigmod23/background.tex
@@ -124,7 +124,13 @@ query, and selecting or rejecting it for inclusion within the sample
 with a fixed probability~\cite{db2-doc}. This process requires that each
 record in the result set be considered, and thus provides no performance
 benefit relative to the query being sampled from, as it must be answered
-in full anyway before returning only some of the results.
+in full anyway before returning only some of the results.\footnote{
+    This is not to say that Bernoulli sampling is without use. It
+    \emph{can} improve query performance, for example by limiting the
+    cardinality of intermediate results. However, it is
+    not particularly useful for improving the performance of IQS queries,
+    where the sampling is performed on the final result set of the query.
+}
 
 For performance, the statistical guarantees can be discarded and
 systematic or block sampling used instead. Systematic sampling considers
@@ -230,6 +236,7 @@ structures attached to the nodes.  More examples of alias augmentation
 applied to different IQS problems can be found in a recent survey by
 Tao~\cite{tao22}.
 
+\Paragraph{Miscellanea.}
 There also exist specialized data structures with support for both
 efficient sampling and updates~\cite{hu14}, but these structures have
 poor constant factors and are very complex, rendering them of little
@@ -252,7 +259,19 @@ only once per sample set (if at all), but fail to support updates. Thus,
 there appears to be a general dichotomy of sampling techniques: existing
 sampling data structures support either updates, or efficient sampling,
 but generally not both. It will be the purpose of this chapter to resolve
-this dichotomy.
+this dichotomy. In particular, we seek to develop structures with the
+following desiderata:
+
+\begin{enumerate}
+    \item Support data updates (including deletes) with average performance
+          similar to that of a standard B+Tree.
+    \item Support IQS queries that do not pay a per-sample cost
+          proportional to some function of the data size. In other words,
+          $k$ should \emph{not} be multiplied by any function of $n$
+          in the query cost function.
+    %FIXME: this guy comes out of nowhere...
+    \item Provide the user with some basic performance tuning capability.
+\end{enumerate}
 
 
 
-- 
cgit v1.2.3