Diffstat (limited to 'chapters/sigmod23/background.tex')
| -rw-r--r-- | chapters/sigmod23/background.tex | 6 |
1 files changed, 3 insertions, 3 deletions
diff --git a/chapters/sigmod23/background.tex b/chapters/sigmod23/background.tex
index 88f2585..42a52de 100644
--- a/chapters/sigmod23/background.tex
+++ b/chapters/sigmod23/background.tex
@@ -104,7 +104,7 @@ sampling} (WIRS),
  positive weights $w: D\to \mathbb{R}^+$. Given a query
  interval $q = [x, y]$ and an integer $k$, an independent range sampling
  query returns $k$ independent samples from $D \cap q$ with each
- point having a probability of $\nicefrac{w(d)}{\sum_{p \in D \cap q}w(p)}$
+ point having a probability of $\frac{w(d)}{\sum_{p \in D \cap q}w(p)}$
  of being sampled.
 \end{definition}
 
@@ -118,7 +118,7 @@ SQL's \texttt{TABLESAMPLE} operator~\cite{postgres-doc}. However, the
 algorithms used to implement this operator have significant limitations
 and do not allow users to maintain statistical independence of the results
 without also running the query to be sampled from in full. Thus, users must
-choose between independece and performance.
+choose between independence and performance.
 
 To maintain statistical independence, Bernoulli sampling is used. This
 technique requires iterating over every record in the result set of the
@@ -198,7 +198,7 @@ call static sampling indices (SSIs) in this chapter,\footnote{
  am retaining the term SSI in this chapter for consistency with the
  original paper, but understand that in the terminology established in
  Chapter~\ref{chap:background}, SSIs are data structures, not indices.
-},
+}
 that are capable of answering sampling queries more efficiently than
 Olken's method relative to the overall data size. An example of such
 a structure is used in Walker's alias method \cite{walker74,vose91}.
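
Note on the definition touched by the first hunk: the sampling probability $w(d) / \sum_{p \in D \cap q} w(p)$ can be illustrated with a minimal Python sketch. This is not the chapter's method and does not reflect the structures the thesis proposes. It assumes the data are held as a sorted list of keys with a parallel list of positive weights, and the names `wirs_query`, `keys`, and `weights` are hypothetical.

```python
import bisect
import random
from itertools import accumulate


def wirs_query(keys, weights, x, y, k, rng=random):
    """Draw k independent weighted samples from the keys inside [x, y].

    Assumptions (illustrative only): `keys` is sorted ascending and
    `weights[i]` is the positive weight w(d) of `keys[i]`.
    """
    # Locate the slice of keys that falls in the closed interval [x, y].
    lo = bisect.bisect_left(keys, x)
    hi = bisect.bisect_right(keys, y)
    if lo == hi:
        return []  # D intersect q is empty, so there is nothing to sample

    # Cumulative weights over the slice; the final entry is the
    # normalizing sum over all p in D intersect q.
    cum = list(accumulate(weights[lo:hi]))

    # Sampling with replacement makes the k draws independent, each
    # picking d with probability w(d) / sum of weights in the range.
    return rng.choices(keys[lo:hi], cum_weights=cum, k=k)


if __name__ == "__main__":
    keys = [1, 3, 5, 7, 9]
    weights = [1.0, 2.0, 1.0, 4.0, 2.0]
    print(wirs_query(keys, weights, 3, 7, 5))
```

Each call pays for the range lookup plus a pass over the matching weights, so the sketch shows the probability in the definition rather than an efficient way to answer the query.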