From 5e4ad2777acc4c2420514e39fb98b7cf2e200996 Mon Sep 17 00:00:00 2001
From: Douglas Rumbaugh <dbr4@psu.edu>
Date: Sun, 27 Apr 2025 17:36:57 -0400
Subject: Initial commit

---
 chapters/sigmod23/abstract.tex | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
 create mode 100644 chapters/sigmod23/abstract.tex

diff --git a/chapters/sigmod23/abstract.tex b/chapters/sigmod23/abstract.tex
new file mode 100644
index 0000000..3ff0c08
--- /dev/null
+++ b/chapters/sigmod23/abstract.tex
@@ -0,0 +1,29 @@
+\begin{abstract}
+
+    The execution of analytical queries on massive datasets presents challenges
+    due to long response times and high computational costs. As a result, the
+    analysis of representative samples of data has emerged as an attractive
+    alternative; this avoids the cost of processing queries against the entire
+    dataset, while still producing statistically valid results. Unfortunately,
+    the sampling techniques in common use sacrifice either sample quality or
+    performance, and so are poorly suited for this task. However, it is
+    possible to build high-quality sample sets efficiently with the assistance
+    of indexes. This introduces a new challenge: real-world data is subject to
+    continuous update, and so the indexes must be kept up to date. This is
+    difficult because existing sampling indexes present a dichotomy: efficient
+    sampling indexes are difficult to update, while easily updatable indexes
+    have poor sampling performance. This paper seeks to address this gap by
+    proposing a general and practical framework for extending most sampling
+    indexes with efficient update support, based on splitting indexes into
+    smaller shards, combined with a systematic approach to their periodic
+    reconstruction. The framework's design space is examined, with an eye
+    towards exploring trade-offs between update performance, sampling
+    performance, and memory usage. Three existing static sampling indexes are
+    extended using this framework to support updates, and the generalization of
+    the framework to concurrent operations and larger-than-memory data is
+    discussed. Through a comprehensive suite of benchmarks, the extended
+    indexes are shown to match or exceed the update throughput of
+    state-of-the-art dynamic baselines, while presenting significant
+    improvements in sampling latency.
+
+\end{abstract}
-- 
cgit v1.2.3