Diffstat (limited to 'chapters/sigmod23/background.tex')
-rw-r--r-- chapters/sigmod23/background.tex | 12
1 file changed, 6 insertions, 6 deletions
diff --git a/chapters/sigmod23/background.tex b/chapters/sigmod23/background.tex
index b4ccbf1..af3b80a 100644
--- a/chapters/sigmod23/background.tex
+++ b/chapters/sigmod23/background.tex
@@ -37,12 +37,12 @@ have \emph{statistical independence} and for the distribution of records
in the sample set to match the distribution of source data set. This
requires that the sampling of a record does not affect the probability of
any other record being sampled in the future. Such sample sets are said
-to be drawn i.i.d (idendepently and identically distributed). Throughout
+to be drawn i.i.d (independently and identically distributed). Throughout
this chapter, the term "independent" will be used to describe both
statistical independence, and identical distribution.
Independence of sample sets is important because many useful statistical
-results are derived from assumping that the condition holds. For example,
+results are derived from assuming that the condition holds. For example,
it is a requirement for the application of statistical tools such as
the Central Limit Theorem~\cite{bulmer79}, which is the basis for many
concentration bounds. A failure to maintain independence in sampling
@@ -54,7 +54,7 @@ sampling} (IQS)~\cite{hu14}. In IQS, a sample set is constructed from a
specified number of records in the result set of a database query. In
this context, it isn't enough to ensure that individual records are
sampled independently; the sample sets from repeated queries must also be
-indepedent. This precludes, for example, caching and returning the same
+independent. This precludes, for example, caching and returning the same
sample set to multiple repetitions of the same query. This inter-query
independence provides a variety of useful properties, such as fairness
and representativeness of query results~\cite{tao22}.
@@ -194,7 +194,7 @@ call static sampling indices (SSIs) in this chapter,\footnote{
is based, which was published prior to our realization that a strong
distinction between an index and a data structure would be useful. I
am retaining the term SSI in this chapter for consistency with the
- original paper, but understand that in the termonology established in
+ original paper, but understand that in the terminology established in
Chapter~\ref{chap:background}, SSIs are data structures, not indices.
},
that are capable of answering sampling queries more efficiently than
@@ -216,7 +216,7 @@ per sample. Thus, a WSS query can be answered in $\Theta(k)$ time,
assuming the structure has already been built. Unfortunately, the alias
structure cannot be efficiently updated, as inserting new records would
change the relative weights of \emph{all} the records, and require fully
-repartitioning the structure.
+re-partitioning the structure.
While the alias method only applies to WSS, other sampling problems can
be solved by using the alias method within the context of a larger data
@@ -245,7 +245,7 @@ the alias structure with support for weight updates over a fixed set of
elements~\cite{hagerup93,matias03,allendorf23}. These approaches do not
allow the insertion or removal of new records, however, only in-place
weight updates. While in principle they could be constructed over the
-entire domain of possible records, with the weights of non-existant
+entire domain of possible records, with the weights of non-existent
records set to $0$, this is hardly practical. Thus, these structures are
not suited for the database sampling applications that are of interest to
us in this chapter.
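The context lines of the hunk at line 216 describe the alias method: once the structure is built, each weighted sample costs constant time, so a WSS query of size $k$ takes $\Theta(k)$ time, but the structure cannot be cheaply updated. A minimal Python sketch may make this concrete. It assumes Vose's variant of the alias construction (the chapter does not say which variant it uses) and weights given as non-negative numbers with a positive total. The helper names `build_alias` and `sample_alias` are hypothetical and do not come from the chapter's code.

```python
import random
from typing import List, Sequence, Tuple


def build_alias(weights: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Build alias tables in O(n) time (Vose's method; hypothetical helper)."""
    n = len(weights)
    if n == 0:
        raise ValueError("need at least one weight")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("total weight must be positive")
    # Scale weights so that the average bucket height is exactly 1.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = list(range(n))
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        sm = small.pop()
        lg = large.pop()
        # Bucket sm keeps its own mass; the rest of the bucket is filled by lg.
        prob[sm] = scaled[sm]
        alias[sm] = lg
        scaled[lg] = (scaled[lg] + scaled[sm]) - 1.0
        (small if scaled[lg] < 1.0 else large).append(lg)
    # Leftover buckets are full (up to floating-point rounding).
    for i in small + large:
        prob[i] = 1.0
    return prob, alias


def sample_alias(prob: List[float], alias: List[int], k: int,
                 rng: random.Random = random.Random()) -> List[int]:
    """Draw k independent weighted samples: O(1) each, Theta(k) in total."""
    n = len(prob)
    out = []
    for _ in range(k):
        i = rng.randrange(n)  # choose a bucket uniformly at random
        # Keep the bucket's own index with probability prob[i], else its alias.
        out.append(i if rng.random() < prob[i] else alias[i])
    return out
```

For example, `prob, alias = build_alias([1, 2, 3, 4])` followed by `sample_alias(prob, alias, 10)` returns ten indices drawn with probabilities proportional to the weights. Because every entry of `prob` and `alias` depends on the total weight, inserting a single record changes the normalization of all buckets, so this sketch has to call `build_alias` again from scratch; this is the update problem the hunk describes.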