From 5e4ad2777acc4c2420514e39fb98b7cf2e200996 Mon Sep 17 00:00:00 2001
From: Douglas Rumbaugh
Date: Sun, 27 Apr 2025 17:36:57 -0400
Subject: Initial commit

---
 chapters/sigmod23/abstract.tex | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
 create mode 100644 chapters/sigmod23/abstract.tex

diff --git a/chapters/sigmod23/abstract.tex b/chapters/sigmod23/abstract.tex
new file mode 100644
index 0000000..3ff0c08
--- /dev/null
+++ b/chapters/sigmod23/abstract.tex
@@ -0,0 +1,29 @@
+\begin{abstract}
+
+    The execution of analytical queries on massive datasets presents challenges
+    due to long response times and high computational costs. As a result, the
+    analysis of representative samples of data has emerged as an attractive
+    alternative; this avoids the cost of processing queries against the entire
+    dataset, while still producing statistically valid results. Unfortunately,
+    the sampling techniques in common use sacrifice either sample quality or
+    performance, and so are poorly suited for this task. However, it is
+    possible to build high quality sample sets efficiently with the assistance
+    of indexes. This introduces a new challenge: real-world data is subject to
+    continuous update, and so the indexes must be kept up to date. This is
+    difficult, because existing sampling indexes present a dichotomy; efficient
+    sampling indexes are difficult to update, while easily updatable indexes
+    have poor sampling performance. This paper seeks to address this gap by
+    proposing a general and practical framework for extending most sampling
+    indexes with efficient update support, based on splitting indexes into
+    smaller shards, combined with a systematic approach to the periodic
+    reconstruction. The framework's design space is examined, with an eye
+    towards exploring trade-offs between update performance, sampling
+    performance, and memory usage. Three existing static sampling indexes are
+    extended using this framework to support updates, and the generalization of
+    the framework to concurrent operations and larger-than-memory data is
+    discussed. Through a comprehensive suite of benchmarks, the extended
+    indexes are shown to match or exceed the update throughput of
+    state-of-the-art dynamic baselines, while presenting significant
+    improvements in sampling latency.
+
+\end{abstract}
-- 
cgit v1.2.3