| author | Douglas Rumbaugh <dbr4@psu.edu> | 2025-06-27 15:21:38 -0400 |
|---|---|---|
| committer | Douglas Rumbaugh <dbr4@psu.edu> | 2025-06-27 15:21:38 -0400 |
| commit | fcdbcbcd45dc567792429bb314df53b42ed9f22e | |
| tree | 3f7c135b7b32022fa0a9f03361e60cc0cc4f86e0 /chapters/sigmod23 | |
| parent | ff528e8595e82802832930fae6c9ccee7afd23cb | |
| download | dissertation-fcdbcbcd45dc567792429bb314df53b42ed9f22e.tar.gz | |
updates
Diffstat (limited to 'chapters/sigmod23')
| -rw-r--r-- | chapters/sigmod23/background.tex | 4 |
| -rw-r--r-- | chapters/sigmod23/examples.tex | 10 |
| -rw-r--r-- | chapters/sigmod23/exp-baseline.tex | 12 |
| -rw-r--r-- | chapters/sigmod23/experiment.tex | 18 |
| -rw-r--r-- | chapters/sigmod23/extensions.tex | 4 |
| -rw-r--r-- | chapters/sigmod23/framework.tex | 16 |
| -rw-r--r-- | chapters/sigmod23/introduction.tex | 2 |
7 files changed, 33 insertions, 33 deletions
diff --git a/chapters/sigmod23/background.tex b/chapters/sigmod23/background.tex
index 42a52de..984e36c 100644
--- a/chapters/sigmod23/background.tex
+++ b/chapters/sigmod23/background.tex
@@ -226,7 +226,7 @@ structure, a technique called \emph{alias augmentation}~\cite{tao22}.
 For example, alias augmentation can be used to construct an SSI capable
 of answering WIRS queries in $\Theta(\log n + k)$~\cite{afshani17,tao22}.
 This structure breaks the data into multiple disjoint partitions of size
-$\nicefrac{n}{\log n}$, each with an associated alias structure. A B+Tree
+$\nicefrac{n}{\log n}$, each with an associated alias structure. A B+tree
 is then built, using the augmented partitions as its leaf nodes. Each
 internal node is also augmented with an alias structure over the aggregate
 weights associated with the children of each pointer. Constructing this
@@ -266,7 +266,7 @@ following desiderata,
 
 \begin{enumerate}
 \item Support data updates (including deletes) with similar average
-performance to a standard B+Tree.
+performance to a standard B+tree.
 \item Support IQS queries that do not pay a per-sample cost
 proportional to some function of the data size. In other words,
 $k$ should \emph{not} be be multiplied by any function of $n$
diff --git a/chapters/sigmod23/examples.tex b/chapters/sigmod23/examples.tex
index 4e7f9ac..32807e1 100644
--- a/chapters/sigmod23/examples.tex
+++ b/chapters/sigmod23/examples.tex
@@ -74,7 +74,7 @@ makes progress towards removing it.
 \subsection{Independent Range Sampling (ISAM Tree)}
 \label{ssec:irs-struct}
 We will next considered independent range sampling. For this decomposable
-sampling problem, we use the ISAM Tree for the SSI. Because our shards are
+sampling problem, we use the ISAM tree for the SSI. Because our shards are
 static, we can build highly compact and efficient ISAM trees by storing
 the records directly in a sorted array. So long as the leaf node size is
 a multiple of the record size, this array can be treated as a sequence of
@@ -106,7 +106,7 @@ operations are,
 \text{Worst-case Tagged Delete:} \quad &O\left(\log_s n \log_f n\right)
 \end{align*}
 where $R(n) \in \Theta(1)$ for tagging and $R(n) \in \Theta(\log_s n \log_f n)$
-for tombstones and $f$ is the fanout of the ISAM Tree.
+for tombstones and $f$ is the fanout of the ISAM tree.
 
 \subsection{Weighted Independent Range Sampling (Alias-augmented B+Tree)}
 
@@ -114,13 +114,13 @@ for tombstones and $f$ is the fanout of the ISAM tree.
 \label{ssec:wirs-struct}
 As a final example of applying this framework, we consider WIRS. This
 is a decomposable sampling problem that can be answered using the
-alias-augmented B+Tree structure~\cite{tao22, afshani17,hu14}. This
+alias-augmented B+tree structure~\cite{tao22, afshani17,hu14}. This
 data structure is built over sorted data, but can be bulk-loaded from
 this data in linear time, resulting in costs of $B(n) \in \Theta(n \log
 n)$ and $B_M(n) \in \Theta(n)$, though the constant factors associated
 with these functions are quite high, as each bulk-loading requires multiple
-linear-time operations for building both the B+Tree and the alias
-structures, among other things. As it is built on a B+Tree, the structure
+linear-time operations for building both the B+tree and the alias
+structures, among other things. As it is built on a B+tree, the structure
 supports $L(n) \in \Theta(\log n)$ point lookups.
 Answering sampling queries requires $P(n) \in \Theta(\log n)$ pre-processing
 time to establish the query interval, during which the weight of the interval
diff --git a/chapters/sigmod23/exp-baseline.tex b/chapters/sigmod23/exp-baseline.tex
index d0e1ce0..4ae744b 100644
--- a/chapters/sigmod23/exp-baseline.tex
+++ b/chapters/sigmod23/exp-baseline.tex
@@ -1,7 +1,7 @@
 \subsection{Comparison to Baselines}
 
 Next, we compared the performance of our dynamized sampling indices with
-Olken's method on an aggregate B+Tree. We also examine the query performance
+Olken's method on an aggregate B+tree. We also examine the query performance
 of a single instance of the SSI in question to establish how much query
 performance is lost in the dynamization. Unless otherwise specified, IRS
 and WIRS queries are run with a selectivity of $0.1\%$. Additionally,
@@ -51,15 +51,15 @@ resulting in better performance.
 
 In Figures~\ref{fig:wirs-insert} and \ref{fig:wirs-sample} we examine
 the performed of \texttt{DE-WIRS} compared to \text{AGG B+tree} and an
-alias-augmented B+Tree. We see the same basic set of patterns in this
+alias-augmented B+tree. We see the same basic set of patterns in this
 case as we did with WSS. \texttt{AGG B+Tree} defeats our dynamized
 index on the \texttt{twitter} dataset, but loses on the others, in
 terms of insertion performance. We can see that the alias-augmented
-B+Tree is much more expensive to build than an alias structure, and
+B+tree is much more expensive to build than an alias structure, and
 so its insertion performance advantage is eroded somewhat compared to
 the dynamic structure. For queries we see that the \texttt{AGG B+Tree}
 performs similarly for WIRS sampling as it did for WSS sampling, but the
-alias-augmented B+Tree structure is quite a bit slower at WIRS than the
+alias-augmented B+tree structure is quite a bit slower at WIRS than the
 alias structure was at WSS. This results in \texttt{DE-WIRS} defeating
 the dynamic baseline by less of a margin in this test, but it still is
 superior in terms of sampling performance, and is still quite close in
@@ -81,7 +81,7 @@ being introduced by the dynamization.
 
 We next considered IRS queries. Figures~\ref{fig:irs-insert1} and
 \ref{fig:irs-sample1} show the results of our testing of single-threaded
-\texttt{DE-IRS} running in-memory against the in-memory ISAM Tree and
+\texttt{DE-IRS} running in-memory against the in-memory ISAM tree and
 \texttt{AGG B+tree}. The ISAM tree structure can be efficiently bulk-loaded,
 which results in a much faster construction time than the alias structure
 or alias-augmented B+tree. This gives it a significant update performance
@@ -112,7 +112,7 @@ to answer queries. However, as the sample set size increases, this cost
 increasingly begins to pay off, with \texttt{DE-IRS} quickly defeating
 the dynamic structure in average per-sample latency. One other interesting
 note is the performance of the static ISAM tree, which begins on-par with
-the B+Tree, but also sees an improvement as the sample set size increases.
+the B+tree, but also sees an improvement as the sample set size increases.
 This is because of cache effects. During the initial tree traversal,
 both the B+tree and ISAM tree have a similar number of cache misses.
 However, the ISAM tree needs to perform its traversal only once, and then samples
diff --git a/chapters/sigmod23/experiment.tex b/chapters/sigmod23/experiment.tex
index 1eb704c..14f59a7 100644
--- a/chapters/sigmod23/experiment.tex
+++ b/chapters/sigmod23/experiment.tex
@@ -60,7 +60,7 @@ following dynamized structures,
 \item \textbf{DE-WSS.} An implementation of the dynamized alias
 structure~\cite{walker74} for weighted set sampling discussed in
 Section~\ref{ssec:wss-struct}. We compare this against a WSS
-implementation of Olken's method on a B+Tree with aggregate weight tags
+implementation of Olken's method on a B+tree with aggregate weight tags
 (\textbf{AGG-BTree})~\cite{olken95}, based on the B+tree implementation
 in the TLX library~\cite{tlx}.
 
@@ -71,24 +71,24 @@ Section~\ref{ssec:ext-concurrency} and an external version from
 Section~\ref{ssec:ext-external}. We compare the external and concurrent
 versions against the AB-tree~\cite{zhao22}, and the single-threaded, in
 memory version was compare with an IRS implementation of Olken's
-method on an AGG-BTree.
+method on an \texttt{AGG-BTree}.
 
 \item \textbf{DE-WIRS.} An implementation of the dynamized alias-augmented
-B+Tree~\cite{afshani17} as discussed in Section~\ref{ssec:wirs-struct} for
+B+tree~\cite{afshani17} as discussed in Section~\ref{ssec:wirs-struct} for
 weighted independent range sampling. We compare this against a WIRS
-implementation of Olken's method on an AGG-BTree.
+implementation of Olken's method on \texttt{AGG-BTree}.
 \end{itemize}
 
 All of the tested structures, with the exception of the external memory
-DE-IRS implementation and AB-Tree, were wholly contained within system
-memory. AB-Tree is a native external structure, so for the in-memory
+DE-IRS implementation and AB-tree, were wholly contained within system
+memory. AB-tree is a native external structure, so for the in-memory
 concurrency evaluation we configured it with enough cache to maintain the
 entire structure in memory to simulate an in-memory implementation.\footnote{
     Because of the nature of sampling queries, traditional
-    efficient locking techniques for B+Trees are not able to be
-    used~\cite{zhao22}. The alternatives were to run AB-Tree in this
-    manner, or to globally lock the B+Tree for every operation. We
+    efficient locking techniques for B+trees are not able to be
+    used~\cite{zhao22}. The alternatives were to run AB-tree in this
+    manner, or to globally lock the B+tree for every operation. We
     elected to use the former approach for this chapter. We used the
     latter approach in the next chapter.
 }
diff --git a/chapters/sigmod23/extensions.tex b/chapters/sigmod23/extensions.tex
index 053c8e2..f77574d 100644
--- a/chapters/sigmod23/extensions.tex
+++ b/chapters/sigmod23/extensions.tex
@@ -19,7 +19,7 @@ to reside in memory, and the rest on disk. This allows for the smallest
 few shards, which sustain the most reconstructions, to reside in memory
 for performance, while storing most of the data on disk, in an attempt
 to get the best of both worlds, so to speak.\footnote{
-    In traditional LSM Trees, which are an external data structure,
+    In traditional LSM trees, which are an external data structure,
     only the memtable resides in memory. We have decided to break
     with this model because, for query performance reasons, the
     mutable buffer must remain small. By placing a few levels in memory, the
@@ -58,7 +58,7 @@ structure using in XDB~\cite{li19}.
 
 Because our dynamization technique is built on top of static data
 structures, a limited form of concurrency support is straightforward
 to implement. To that end, we created a proof-of-concept dynamization of an
-ISAM Tree for IRS based on a simplified version of a general concurrency
+ISAM tree for IRS based on a simplified version of a general concurrency
 controlled scheme for log-structured data stores~\cite{golan-gueta15}.
 
 First, we restrict ourselves to tombstone deletes. This ensures that
diff --git a/chapters/sigmod23/framework.tex b/chapters/sigmod23/framework.tex
index b3a8215..1eb2589 100644
--- a/chapters/sigmod23/framework.tex
+++ b/chapters/sigmod23/framework.tex
@@ -512,19 +512,19 @@ Though it has thus far gone unmentioned, some readers may have noted
 the astonishing similarity between decomposition-based dynamization
 techniques, and a data structure called the Log-structured
 Merge-tree. First proposed by O'Neil in the mid '90s\cite{oneil96},
-the LSM Tree was designed to optimize write throughput for external data
+the LSM tree was designed to optimize write throughput for external data
 structures. It accomplished this task by buffer inserted records in a
-small in-memory AVL Tree, and then flushing this buffer to disk when
+small in-memory AVL tree, and then flushing this buffer to disk when
 it filled up. The flush process itself would fully rebuild the on-disk
-structure (a B+Tree), including all of the currently existing records
+structure (a B+tree), including all of the currently existing records
 on external storage. O'Neil also proposed version which used several,
 layered, external structures, to reduce the cost of reconstruction.
 
-In more recent times, the LSM Tree has seen significant development and
+In more recent times, the LSM tree has seen significant development and
 been used as the basis for key-value stores like RocksDB~\cite{dong21}
 and LevelDB~\cite{leveldb}. This work has produced an incredibly
 large and well explored parametrization of the reconstruction
-procedures of LSM Trees, a good summary of which can be bounded in
+procedures of LSM trees, a good summary of which can be bounded in
 this recent tutorial paper~\cite{sarkar23}. Examples of this design
 space exploration include: different ways to organize each "level"
 of the tree~\cite{dayan19, dostoevsky, autumn}, different growth
@@ -534,7 +534,7 @@ auxiliary structures attached to the main ones for accelerating certain
 types of query~\cite{dayan18-1, zhu21, monkey}. This work is discussed
 in greater depth in Chapter~\ref{chap:related-work}.
 
-Many of the elements within the LSM Tree design space are based upon the
+Many of the elements within the LSM tree design space are based upon the
 specifics of the data structure itself, and are not applicable to our
 use case. However, some of the higher-level concepts can be imported and
 applied in the context of dynamization. Specifically, we have decided to
@@ -590,7 +590,7 @@ that can be used to help improve the performance of these searches,
 without requiring as much storage as adding auxiliary hash tables to
 every block, is to include bloom filters~\cite{bloom70}. A bloom filter
 is an approximate data structure that answers tests of set membership
-with bounded, single-sided error. These are commonly used in LSM Trees
+with bounded, single-sided error. These are commonly used in LSM trees
 to accelerate point lookups by allowing levels that don't contain the
 record being searched for to be skipped. In our case, we only care
 about tombstone records, so rather than building these filters over all records,
@@ -599,7 +599,7 @@ the sampling performance of the structure when tombstone deletes are used.
 
 \Paragraph{Layout Policy.} The Bentley-Saxe method considers
 blocks individually, without any other organization beyond increasing
-size. In contrast, LSM Trees have multiple layers of structural
+size. In contrast, LSM trees have multiple layers of structural
 organization. Record capacity restrictions are enforced on structures
 called \emph{levels}, which are partitioned into individual data
 structures, and then further organized into non-overlapping key ranges.
diff --git a/chapters/sigmod23/introduction.tex b/chapters/sigmod23/introduction.tex
index 8f0635d..7ff82cd 100644
--- a/chapters/sigmod23/introduction.tex
+++ b/chapters/sigmod23/introduction.tex
@@ -22,7 +22,7 @@ them. Existing implementations tend to sacrifice either performance,
 by requiring the entire result set of be materialized prior to applying
 Bernoulli sampling, or statistical independence. There exists techniques
 for obtaining both sampling performance and independence by leveraging
-existing B+Tree indices with slight modification~\cite{olken-thesis},
+existing B+tree indices with slight modification~\cite{olken-thesis},
 but even this technique has worse sampling performance than could be
 achieved using specialized static sampling indices.
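A recurring reference point in the hunks above is Olken's method over a B+tree whose internal nodes carry aggregate weights (the \texttt{AGG B+Tree} baseline): a sample is drawn by walking root to leaf, choosing each child with probability proportional to its aggregate weight. The following is a minimal sketch of that walk, with hypothetical names and a flat node layout rather than code from this repository:

```cpp
// Illustrative sketch of an Olken-style sampling walk over a tree whose
// internal nodes store aggregate subtree weights. Not the dissertation's code.
#include <memory>
#include <random>
#include <vector>

struct Node {
    double weight = 0.0;                          // aggregate weight of this subtree
    std::vector<std::unique_ptr<Node>> children;  // empty => leaf
    int record_id = -1;                           // meaningful only at a leaf
};

// Draw one record id with probability proportional to its weight by
// descending root-to-leaf, picking each child in proportion to its aggregate.
int sample_one(const Node &root, std::mt19937 &rng) {
    const Node *cur = &root;
    while (!cur->children.empty()) {
        double r = std::uniform_real_distribution<double>(0.0, cur->weight)(rng);
        const Node *next = cur->children.back().get();  // fallback for FP rounding
        for (const auto &child : cur->children) {
            if (r < child->weight) { next = child.get(); break; }
            r -= child->weight;
        }
        cur = next;
    }
    return cur->record_id;
}
```

Each sample pays a full descent, which is why the baseline's per-sample cost grows with the data size; the static ISAM-tree approach discussed in exp-baseline.tex, by contrast, performs its traversal once and then draws many samples from the located range.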