Julia updates

author: Douglas B. Rumbaugh <doug@douglasrumbaugh.com> 2025-06-01 13:15:52 -0400
committer: Douglas B. Rumbaugh <doug@douglasrumbaugh.com> 2025-06-01 13:15:52 -0400
commit: cd3447f1cad16972e8a659ec6e84764c5b8b2745 (patch)
tree: 5a50b6e8a99646e326b2c41714f50e4f7dee64d0 /chapters/sigmod23/background.tex
parent: 6354e60f106a89f5bf807082561ed5efd9be0f4f (diff)
download: dissertation-cd3447f1cad16972e8a659ec6e84764c5b8b2745.tar.gz
1 files changed, 10 insertions, 8 deletions
diff --git a/chapters/sigmod23/background.tex b/chapters/sigmod23/background.tex
index af3b80a..d600c27 100644
--- a/chapters/sigmod23/background.tex
+++ b/chapters/sigmod23/background.tex
@@ -19,16 +19,16 @@ is used to indicate the selection of either a single sample or a sample
 set; the specific usage should be clear from context.
 
 In each of the problems considered, sampling can be performed either
-with replacement or without replacement. Sampling with replacement
+with-replacement or without-replacement. Sampling with-replacement
 means that a record that has been included in the sample set for a given
 sampling query is "replaced" into the dataset and allowed to be sampled
-again. Sampling without replacement does not "replace" the record,
+again. Sampling without-replacement does not "replace" the record,
 and so each individual record can only be included within a sample
 set once for a given query. The data structures that will be discussed
-support sampling with replacement, and sampling without replacement can
-be implemented using a constant number of with replacement sampling
+support sampling with-replacement, and sampling without-replacement can
+be implemented using a constant number of with-replacement sampling
 operations, followed by a deduplication step~\cite{hu15}, so this chapter
-will focus exclusive on the with replacement case.
+will focus exclusively on the with-replacement case.
 
 \subsection{Independent Sampling Problem}
 
@@ -115,8 +115,10 @@ of problems that will be directly addressed within this chapter.
 
 Relational database systems often have native support for IQS using
 SQL's \texttt{TABLESAMPLE} operator~\cite{postgress-doc}. However, the
-algorithms used to implement this operator have significant limitations:
-users much choose between statistical independence or performance.
+algorithms used to implement this operator have significant limitations
+and do not allow users to maintain statistical independence of the results
+without also running the query to be sampled from in full. Thus, users must
+choose between independence and performance.
 
 To maintain statistical independence, Bernoulli sampling is used. This
 technique requires iterating over every record in the result set of the
@@ -240,7 +242,7 @@ Tao~\cite{tao22}.
 There also exist specialized data structures with support for both
 efficient sampling and updates~\cite{hu14}, but these structures have
 poor constant factors and are very complex, rendering them of little
-practical utility. Additionally, efforts have been made to extended
+practical utility. Additionally, efforts have been made to extend
 the alias structure with support for weight updates over a fixed set of
 elements~\cite{hagerup93,matias03,allendorf23}. These approaches do not
 allow the insertion or removal of new records, however, only in-place
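
Illustrative note on the first hunk: the text there says that without-replacement sampling can be built from with-replacement sampling followed by a deduplication step. The Python sketch below shows one simple way this could look, assuming a plain in-memory list of records. The function names and the retry loop are hypothetical and chosen only for illustration; this is a basic rejection-style loop, not necessarily the scheme from the work cited in the chapter, and it makes no claim about how many rounds are needed.

```python
import random


def sample_with_replacement(n, k, rng):
    """Draw k indices in [0, n) independently and uniformly.

    The same index may appear more than once, because every draw
    sees the full dataset (the record is "replaced" after each draw).
    """
    return [rng.randrange(n) for _ in range(k)]


def sample_without_replacement(records, k, rng):
    """Build a duplicate-free sample of k records from with-replacement draws.

    Assumption (not from the source): deduplication is done on record
    positions, so equal-valued records at different positions still
    count as distinct records.
    """
    n = len(records)
    if k > n:
        raise ValueError("cannot draw more distinct records than exist")

    seen = set()   # positions already in the sample
    order = []     # positions in the order they were first drawn
    while len(order) < k:
        # Draw only as many as are still missing, then deduplicate.
        for i in sample_with_replacement(n, k - len(order), rng):
            if i not in seen:
                seen.add(i)
                order.append(i)
    return [records[i] for i in order]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    data = ["a", "b", "c", "d", "e"]
    picks = [data[i] for i in sample_with_replacement(len(data), 3, rng)]
    print("with replacement:   ", picks)
    print("without replacement:", sample_without_replacement(data, 3, rng))
```

Rejecting repeated draws and redrawing leaves every unseen record equally likely at each step, so the result is a uniform without-replacement sample. The number of retry rounds grows as k approaches the dataset size, so the sketch is most reasonable when k is small relative to the number of records.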