author    Douglas B. Rumbaugh <doug@douglasrumbaugh.com> 2025-07-06 18:21:32 -0400
committer Douglas B. Rumbaugh <doug@douglasrumbaugh.com> 2025-07-06 18:21:32 -0400
commit    0dc1a8ea20820168149cedaa14e223d4d31dc4b6 (patch)
tree      2bc726803cf6de6d669958b1f5a79cde59722e00 /chapters/sigmod23
parent    0fff4753fac809a6ba17f428df3a041cebe692e0 (diff)
updates
Diffstat (limited to 'chapters/sigmod23')
-rw-r--r--	chapters/sigmod23/background.tex	8
-rw-r--r--	chapters/sigmod23/framework.tex	18
2 files changed, 13 insertions, 13 deletions
diff --git a/chapters/sigmod23/background.tex b/chapters/sigmod23/background.tex
index 984e36c..8d3a88f 100644
--- a/chapters/sigmod23/background.tex
+++ b/chapters/sigmod23/background.tex
@@ -21,8 +21,8 @@ set; the specific usage should be clear from context.
In each of the problems considered, sampling can be performed either
with-replacement or without-replacement. Sampling with-replacement
means that a record that has been included in the sample set for a given
-sampling query is "replaced" into the dataset and allowed to be sampled
-again. Sampling without-replacement does not "replace" the record,
+sampling query is ``replaced'' into the dataset and allowed to be sampled
+again. Sampling without-replacement does not ``replace'' the record,
and so each individual record can only be included within a sample
set once for a given query. The data structures that will be discussed
support sampling with-replacement, and sampling without-replacement can
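The distinction drawn in this hunk can be illustrated with a short sketch (not from the dissertation; names here are illustrative). Python's standard library exposes both modes directly, and a without-replacement sample can be built on top of a with-replacement primitive by rejecting records already in the sample set, as the surrounding text suggests:

```python
import random

data = ["a", "b", "c", "d", "e"]

# With-replacement: the same record may appear multiple times.
with_repl = random.choices(data, k=3)

# Without-replacement: each record appears at most once.
without_repl = random.sample(data, k=3)

# Emulating without-replacement sampling on top of a
# with-replacement primitive by rejecting duplicates.
def sample_without_replacement(records, k):
    seen = set()
    while len(seen) < k:
        seen.add(random.choice(records))  # with-replacement draw
    return list(seen)

print(sample_without_replacement(data, 3))
```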
@@ -38,7 +38,7 @@ in the sample set to match the distribution of source data set. This
requires that the sampling of a record does not affect the probability of
any other record being sampled in the future. Such sample sets are said
to be drawn i.i.d. (independently and identically distributed). Throughout
-this chapter, the term "independent" will be used to describe both
+this chapter, the term ``independent'' will be used to describe both
statistical independence, and identical distribution.
Independence of sample sets is important because many useful statistical
@@ -192,7 +192,7 @@ requiring greater than $k$ traversals to obtain a sample set of size $k$.
\Paragraph{Static Solutions.}
There are also a large number of static data structures, which we'll
call static sampling indices (SSIs) in this chapter,\footnote{
- We used the term "SSI" in the original paper on which this chapter
+ We used the term ``SSI'' in the original paper on which this chapter
is based, which was published prior to our realization that a strong
distinction between an index and a data structure would be useful. I
am retaining the term SSI in this chapter for consistency with the
diff --git a/chapters/sigmod23/framework.tex b/chapters/sigmod23/framework.tex
index 218c290..b413802 100644
--- a/chapters/sigmod23/framework.tex
+++ b/chapters/sigmod23/framework.tex
@@ -252,7 +252,7 @@ of the major limitations of the ghost structure approach for handling
deletes is that there is not a principled method for removing deleted
records from the decomposed structure. The standard approach is to set an
arbitrary number of delete records, and rebuild the entire structure when
-this threshold is crossed~\cite{saxe79}. Mixing the "ghost" records into
+this threshold is crossed~\cite{saxe79}. Mixing the ``ghost'' records into
the same structures as the original records allows for deleted records
to naturally be cleaned up over time as they meet their tombstones during
reconstructions using a technique called tombstone cancellation. This
@@ -280,7 +280,7 @@ mechanism.
The cost of using a tombstone delete in a Bentley-Saxe dynamization is
the same as a simple insert,
\begin{equation*}
-\mathscr{D}(n)_A \in \Theta\left(\frac{B(n)}{n} \log_2 (n)\right)
+D_A(n) \in \Theta\left(\frac{B(n)}{n} \log_2 (n)\right)
\end{equation*}
with the worst-case cost being $\Theta(B(n))$. Note that there is also
a minor performance effect resulting from deleted records appearing
@@ -309,7 +309,7 @@ on a Bentley-Saxe decomposition of that SSI will require, at worst,
executing a point-lookup on each block, with a total cost of
\begin{equation*}
-\mathscr{D}(n) \in \Theta\left( L(n) \log_2 (n)\right)
+D(n) \in \Theta\left( L(n) \log_2 (n)\right)
\end{equation*}
If the SSI being considered does \emph{not} support an efficient
@@ -391,7 +391,7 @@ a natural way of controlling the number of deleted records within the
structure, and thereby bounding the rejection rate. During reconstruction,
we have the opportunity to remove deleted records. This will cause the
record counts associated with each block of the structure to gradually
-drift out of alignment with the "perfect" powers of two associated with
+drift out of alignment with the ``perfect'' powers of two associated with
the Bentley-Saxe method, however. In the theoretical literature on this
topic, the solution to this problem is to periodically re-partition all of
the records to re-align the block sizes~\cite{merge-dsp, saxe79}. This
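The periodic re-partitioning described in this hunk can be sketched as follows (an illustrative sketch, not the framework's code, assuming records can simply be recombined and re-split): all records are gathered and redistributed into blocks whose sizes follow the binary decomposition of $n$, restoring the power-of-two layout.

```python
def repartition(blocks):
    """Recombine all records and re-split them into blocks whose
    sizes follow the binary decomposition of n, restoring the
    'perfect' power-of-two layout of the Bentley-Saxe method."""
    records = sorted(r for b in blocks for r in b)
    n = len(records)
    new_blocks, offset, i = [], 0, 0
    while n:
        if n & 1:                 # bit i set: emit a block of 2^i records
            size = 1 << i
            new_blocks.append(records[offset:offset + size])
            offset += size
        n >>= 1
        i += 1
    return new_blocks

# Block sizes that drifted after deletes: 3 + 3 = 6 records total.
drifted = [[5, 9, 1], [7, 2, 8]]
print([len(b) for b in repartition(drifted)])  # 6 = 0b110 -> blocks of 2 and 4
```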
@@ -450,7 +450,7 @@ deleted records involved in the reconstruction will be dropped. Tombstones
may require multiple cascading rounds of compaction to occur, because a
tombstone record will only cancel when it encounters the record that it
deletes. However, because tombstones always follow the record they
-delete in insertion order, and will therefore always be "above" that
+delete in insertion order, and will therefore always be ``above'' that
record in the structure, each reconstruction will move every tombstone
involved closer to the record it deletes, ensuring that eventually the
bound will be satisfied.
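Tombstone cancellation during a reconstruction can be sketched as below (a minimal sketch under stated assumptions, not the framework's API: records are identified by key, and a tombstone cancels exactly one matching record that precedes it in insertion order). Because tombstones always sit in newer blocks than the records they delete, scanning newest-first lets each tombstone cancel on contact:

```python
from collections import Counter

def reconstruct(blocks):
    """Merge blocks (newest first) and cancel tombstones against
    the records they delete. Each record is a (key, is_tombstone)
    tuple; a tombstone cancels one matching record in an older
    block, since tombstones always follow the record they delete."""
    pending = Counter()   # tombstones still seeking their record
    surviving = []
    for block in blocks:              # newest block first
        for key, is_tombstone in block:
            if is_tombstone:
                pending[key] += 1
            elif pending[key] > 0:
                pending[key] -= 1     # record meets its tombstone: cancel
            else:
                surviving.append((key, False))
    # Tombstones whose record lives outside this reconstruction
    # must be retained for a later round of compaction.
    leftover = [(k, True) for k, n in pending.items() for _ in range(n)]
    return surviving + leftover

blocks = [
    [(3, True), (4, False)],          # newest: tombstone for key 3
    [(1, False), (2, False), (3, False)],
]
print(reconstruct(blocks))
```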
@@ -526,7 +526,7 @@ and LevelDB~\cite{leveldb}. This work has produced an incredibly
large and well-explored parametrization of the reconstruction
procedures of LSM trees, a good summary of which can be found in
this recent tutorial paper~\cite{sarkar23}. Examples of this design
-space exploration include: different ways to organize each "level"
+space exploration include: different ways to organize each ``level''
of the tree~\cite{dayan19, dostoevsky, autumn}, different growth
rates, buffering, sub-partitioning of structures to allow finer-grained
reconstruction~\cite{dayan22}, and approaches for allocating resources to
@@ -739,19 +739,19 @@ Assuming that $N_B \ll n$, the first two terms of this expression are
constant. Dropping them and amortizing the result over $n$ records gives
us the amortized insertion cost,
\begin{equation*}
-I_a(n) \in \Theta\left(\frac{B_M(n)}{n}\log_s(n)\right)
+I_A(n) \in \Theta\left(\frac{B_M(n)}{n}\log_s(n)\right)
\end{equation*}
If the SSI being considered does not support a more efficient
construction procedure from other instances of the same SSI, and
the general Bentley-Saxe \texttt{unbuild} and \texttt{build}
-operations must be used, the the cost becomes $I_a(n) \in
+operations must be used, the cost becomes $I_A(n) \in
\Theta\left(\frac{B(n)}{n}\log_s(n)\right)$ instead.
\Paragraph{Delete.} The framework supports both tombstone and tagged
deletes, each with different performance. Using tombstones, the cost
of a delete is identical to that of an insert. When using tagging, the
cost of a delete is the same as the cost of a point lookup, because the
-"delete" itself only sets a bit in the header of the record,
+``delete'' itself only sets a bit in the header of the record,
once it has been located. There will be $\Theta(\log_s n)$ total shards
in the structure, each with a look-up cost of $L(n)$ using either the
SSI's native point-lookup, or an auxiliary hash table, and the lookup
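The tagged-delete path described in this final hunk can be sketched as follows (an illustrative sketch, assuming each shard offers a point lookup, modeled here as a plain dict; the class and function names are hypothetical). A delete probes the $\Theta(\log_s n)$ shards newest-first and sets a delete bit in the matching record's header, for a total cost of $\Theta(L(n) \log_s n)$:

```python
class Record:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.deleted = False  # the "delete" tag bit in the record header

def tagged_delete(shards, key):
    """Point-lookup the key in each shard (newest first) and tag
    the record as deleted. With Theta(log_s n) shards and a
    per-shard lookup cost of L(n), the total cost is
    Theta(L(n) log_s n); the record itself is not removed."""
    for shard in shards:           # newest shard first
        rec = shard.get(key)       # the L(n) point lookup (O(1) for a dict)
        if rec is not None and not rec.deleted:
            rec.deleted = True
            return True
    return False                   # key not present

shards = [
    {2: Record(2, "b")},                     # newest shard
    {1: Record(1, "a"), 3: Record(3, "c")},  # older shard
]
tagged_delete(shards, 3)
print(shards[1][3].deleted)   # the record is tagged, not removed
```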