Diffstat (limited to 'chapters/sigmod23')
-rw-r--r-- chapters/sigmod23/background.tex | 6
-rw-r--r-- chapters/sigmod23/framework.tex  | 8
2 files changed, 7 insertions, 7 deletions
diff --git a/chapters/sigmod23/background.tex b/chapters/sigmod23/background.tex
index 88f2585..42a52de 100644
--- a/chapters/sigmod23/background.tex
+++ b/chapters/sigmod23/background.tex
@@ -104,7 +104,7 @@ sampling} (WIRS),
positive weights $w: D\to \mathbb{R}^+$. Given a query
interval $q = [x, y]$ and an integer $k$, an independent range sampling
query returns $k$ independent samples from $D \cap q$ with each
- point having a probability of $\nicefrac{w(d)}{\sum_{p \in D \cap q}w(p)}$
+ point having a probability of $\frac{w(d)}{\sum_{p \in D \cap q}w(p)}$
of being sampled.
\end{definition}
@@ -118,7 +118,7 @@ SQL's \texttt{TABLESAMPLE} operator~\cite{postgres-doc}. However, the
algorithms used to implement this operator have significant limitations
and do not allow users to maintain statistical independence of the results
without also running the query to be sampled from in full. Thus, users must
-choose between independece and performance.
+choose between independence and performance.
To maintain statistical independence, Bernoulli sampling is used. This
technique requires iterating over every record in the result set of the
@@ -198,7 +198,7 @@ call static sampling indices (SSIs) in this chapter,\footnote{
am retaining the term SSI in this chapter for consistency with the
original paper, but understand that in the terminology established in
Chapter~\ref{chap:background}, SSIs are data structures, not indices.
-},
+}
that are capable of answering sampling queries more efficiently than
Olken's method relative to the overall data size. An example of such
a structure is used in Walker's alias method \cite{walker74,vose91}.
diff --git a/chapters/sigmod23/framework.tex b/chapters/sigmod23/framework.tex
index d51c2cb..b3a8215 100644
--- a/chapters/sigmod23/framework.tex
+++ b/chapters/sigmod23/framework.tex
@@ -532,7 +532,7 @@ rates, buffering, sub-partitioning of structures to allow finer-grained
reconstruction~\cite{dayan22}, and approaches for allocating resources to
auxiliary structures attached to the main ones for accelerating certain
types of query~\cite{dayan18-1, zhu21, monkey}. This work is discussed
-in greater depth in Chapter~\ref{chap:related-work}
+in greater depth in Chapter~\ref{chap:related-work}.
Many of the elements within the LSM Tree design space are based upon the
specifics of the data structure itself, and are not applicable to our
@@ -561,7 +561,7 @@ the case of sampling this isn't a serious problem. The implications of
this will be discussed in Section~\ref{ssec:sampling-cost-funcs}. The
size of this buffer, $N_B$ is a user-specified constant. Block capacities
are defined in terms of multiples of $N_B$, such that each buffer flush
-corresponds to an insert in the traditioanl Bentley-Saxe method. Thus,
+corresponds to an insert in the traditional Bentley-Saxe method. Thus,
rather than the $i$th block containing $2^i$ records, it contains $N_B
\cdot 2^i$ records. We call this unsorted array the \emph{mutable buffer}.
@@ -750,8 +750,8 @@ operations must be used, the the cost becomes $I_a(n) \in
\Paragraph{Delete.} The framework supports both tombstone and tagged
deletes, each with different performance. Using tombstones, the cost
of a delete is identical to that of an insert. When using tagging, the
-cost of a delete is the same as cost of doing a point lookup, as the
-"delete" itself is simply setting a bit in the header of the record,
+cost of a delete is the same as the cost of a point lookup, because the
+"delete" itself only sets a bit in the header of the record,
once it has been located. There will be $\Theta(\log_s n)$ total shards
in the structure, each with a look-up cost of $L(n)$ using either the
SSI's native point-lookup, or an auxiliary hash table, and the lookup