Diffstat (limited to 'chapters/sigmod23/abstract.tex')
-rw-r--r-- chapters/sigmod23/abstract.tex | 29
1 files changed, 29 insertions, 0 deletions
diff --git a/chapters/sigmod23/abstract.tex b/chapters/sigmod23/abstract.tex
new file mode 100644
index 0000000..3ff0c08
--- /dev/null
+++ b/chapters/sigmod23/abstract.tex
@@ -0,0 +1,29 @@
+\begin{abstract}
+
+ The execution of analytical queries on massive datasets presents challenges
+ due to long response times and high computational costs. As a result, the
+ analysis of representative samples of data has emerged as an attractive
+ alternative; this avoids the cost of processing queries against the entire
+ dataset, while still producing statistically valid results. Unfortunately,
+ the sampling techniques in common use sacrifice either sample quality or
+ performance, and so are poorly suited for this task. However, it is
+ possible to build high-quality sample sets efficiently with the assistance
+ of indexes. This introduces a new challenge: real-world data is subject to
+ continuous update, and so the indexes must be kept up to date. This is
+ difficult, because existing sampling indexes present a dichotomy: efficient
+ sampling indexes are difficult to update, while easily updatable indexes
+ have poor sampling performance. This paper seeks to address this gap by
+ proposing a general and practical framework for extending most sampling
+ indexes with efficient update support, based on splitting indexes into
+ smaller shards, combined with a systematic approach to their periodic
+ reconstruction. The framework's design space is examined, with an eye
+ towards exploring trade-offs between update performance, sampling
+ performance, and memory usage. Three existing static sampling indexes are
+ extended using this framework to support updates, and the generalization of
+ the framework to concurrent operations and larger-than-memory data is
+ discussed. Through a comprehensive suite of benchmarks, the extended
+ indexes are shown to match or exceed the update throughput of
+ state-of-the-art dynamic baselines, while presenting significant
+ improvements in sampling latency.
+
+\end{abstract}