author Douglas B. Rumbaugh <doug@douglasrumbaugh.com> 2025-06-01 13:15:52 -0400
committer Douglas B. Rumbaugh <doug@douglasrumbaugh.com> 2025-06-01 13:15:52 -0400
commit cd3447f1cad16972e8a659ec6e84764c5b8b2745
tree 5a50b6e8a99646e326b2c41714f50e4f7dee64d0 /chapters/sigmod23/background.tex
parent 6354e60f106a89f5bf807082561ed5efd9be0f4f
Julia updates
Diffstat (limited to 'chapters/sigmod23/background.tex')
-rw-r--r-- chapters/sigmod23/background.tex 18
1 files changed, 10 insertions, 8 deletions
diff --git a/chapters/sigmod23/background.tex b/chapters/sigmod23/background.tex
index af3b80a..d600c27 100644
--- a/chapters/sigmod23/background.tex
+++ b/chapters/sigmod23/background.tex
@@ -19,16 +19,16 @@ is used to indicate the selection of either a single sample or a sample
set; the specific usage should be clear from context.
In each of the problems considered, sampling can be performed either
-with replacement or without replacement. Sampling with replacement
+with-replacement or without-replacement. Sampling with-replacement
means that a record that has been included in the sample set for a given
sampling query is "replaced" into the dataset and allowed to be sampled
-again. Sampling without replacement does not "replace" the record,
+again. Sampling without-replacement does not "replace" the record,
and so each individual record can only be included within a sample
set once for a given query. The data structures that will be discussed
-support sampling with replacement, and sampling without replacement can
-be implemented using a constant number of with replacement sampling
+support sampling with-replacement, and sampling without-replacement can
+be implemented using a constant number of with-replacement sampling
operations, followed by a deduplication step~\cite{hu15}, so this chapter
-will focus exclusive on the with replacement case.
+will focus exclusively on the with-replacement case.
\subsection{Independent Sampling Problem}
@@ -115,8 +115,10 @@ of problems that will be directly addressed within this chapter.
Relational database systems often have native support for IQS using
SQL's \texttt{TABLESAMPLE} operator~\cite{postgress-doc}. However, the
-algorithms used to implement this operator have significant limitations:
-users much choose between statistical independence or performance.
+algorithms used to implement this operator have significant limitations
+and do not allow users to maintain statistical independence of the results
+without also running the query to be sampled from in full. Thus, users must
+choose between independence and performance.
To maintain statistical independence, Bernoulli sampling is used. This
technique requires iterating over every record in the result set of the
@@ -240,7 +242,7 @@ Tao~\cite{tao22}.
There also exist specialized data structures with support for both
efficient sampling and updates~\cite{hu14}, but these structures have
poor constant factors and are very complex, rendering them of little
-practical utility. Additionally, efforts have been made to extended
+practical utility. Additionally, efforts have been made to extend
the alias structure with support for weight updates over a fixed set of
elements~\cite{hagerup93,matias03,allendorf23}. These approaches do not
allow the insertion or removal of new records, however, only in-place