author     Douglas Rumbaugh <dbr4@psu.edu>   2025-05-12 19:59:26 -0400
committer  Douglas Rumbaugh <dbr4@psu.edu>   2025-05-12 19:59:26 -0400
commit     5ffc53e69e956054fdefd1fe193e00eee705dcab (patch)
tree       74fd32db95211d0be067d22919e65ac959e4fa46 /chapters/sigmod23/framework.tex
parent     901a04fd8ec9a07b7bd195517a6d9e89da3ecab6 (diff)
Updates
Diffstat (limited to 'chapters/sigmod23/framework.tex')
-rw-r--r--  chapters/sigmod23/framework.tex  23
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/chapters/sigmod23/framework.tex b/chapters/sigmod23/framework.tex
index 89f15c3..0f3fac8 100644
--- a/chapters/sigmod23/framework.tex
+++ b/chapters/sigmod23/framework.tex
@@ -232,12 +232,13 @@ or are naturally determined as part of the pre-processing, and thus the
$W(n)$ term can be merged into $P(n)$.
\subsection{Supporting Deletes}
+\label{ssec:sampling-deletes}
As discussed in Section~\ref{ssec:background-deletes}, the Bentley-Saxe
method can support deleting records through the use of either weak
-deletes, or a secondary ghost structure, assume certain properties are
+deletes, or a secondary ghost structure, assuming certain properties are
satisfied by either the search problem or the data structure. Unfortunately,
-neither approach can work as a "drop-in" solution in the context of
+neither approach can work as a ``drop-in'' solution in the context of
sampling problems, because of the way that deleted records interact with
the sampling process itself. Sampling problems, as formalized here,
are neither invertible nor deletion decomposable. In this section,
@@ -258,9 +259,9 @@ the structure with a tombstone bit set in the header. This mechanism is
used to support \emph{ghost structure} based deletes.
\end{enumerate}
-Broadly speaking, for sampling problems, tombstone deletes cause a number
-of problems because \emph{sampling problems are not invertible}. However,
-this limitation can be worked around during the query process if desired.
+Broadly speaking, tombstone deletes are problematic for sampling
+because \emph{sampling problems are not invertible}. This limitation
+can be worked around during the query process if desired.
Tagging is much more natural for these search problems. However, the
flexibility of selecting either option is desirable because of their
different performance characteristics.
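
To make the two delete mechanisms concrete, the following is a minimal C++ sketch, assuming a simple key-value record with a small header word and a flat vector standing in for a static sampling structure; the names here (Record, Shard, tombstone_delete, tagged_delete) are illustrative, not the framework's actual interface.

#include <cstdint>
#include <vector>

// Sketch only: a record with a small header word. Bit 0 marks a
// tombstone (ghost-structure deletes); bit 1 marks a delete tag.
struct Record {
    uint64_t key;
    uint64_t value;
    uint32_t header = 0;

    bool is_tombstone() const { return header & 1u; }
    void set_tombstone()      { header |= 1u; }
    bool is_tagged() const    { return header & 2u; }
    void set_delete_tag()     { header |= 2u; }
};

// A flat vector stands in for a static sampling structure.
struct Shard {
    std::vector<Record> records;

    // Tombstone delete: append a matching record with the tombstone
    // bit set; the record/tombstone pair cancels out during a later
    // reconstruction, and queries must reject sampled records that
    // have a matching tombstone.
    void tombstone_delete(uint64_t key, uint64_t value) {
        Record t{key, value};
        t.set_tombstone();
        records.push_back(t);
    }

    // Tagging delete: locate the record in place and mark it, so the
    // sampling routine can skip tagged records immediately.
    bool tagged_delete(uint64_t key, uint64_t value) {
        for (auto &r : records) {
            if (r.key == key && r.value == value &&
                !r.is_tombstone() && !r.is_tagged()) {
                r.set_delete_tag();
                return true;
            }
        }
        return false;
    }
};

Under tagging, a sampled record can be checked for its tag bit directly; under tombstones, a sampled record must instead be checked against the structure for a matching tombstone, which is one reason tagging is the more natural fit noted above.
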
@@ -527,8 +528,8 @@ unwieldy and are targeted at tuning the worst case at the expense of the
common case. We will take a different approach to adding configurability
to our dynamization system.
-Though it has thus far gone unmentioned, readers familiar with LSM Trees
-may have noted the astonishing similarity between decomposition-based
+Though it has thus far gone unmentioned, some readers may have
+noted the astonishing similarity between decomposition-based
dynamization techniques and a data structure called the Log-structured
Merge-tree. First proposed by O'Neil in the mid '90s\cite{oneil96},
the LSM Tree was designed to optimize write throughput for external data
@@ -541,7 +542,7 @@ layered, external structures, to reduce the cost of reconstruction.
In more recent times, the LSM Tree has seen significant development and
been used as the basis for key-value stores like RocksDB~\cite{dong21}
-and LevelDB~\cite{leveldb}. This work as produced an incredibly large
+and LevelDB~\cite{leveldb}. This work has produced an incredibly large
and well-explored parameterization of the reconstruction procedures of
LSM Trees, a good summary of which can be found in this recent tutorial
paper~\cite{sarkar23}. Examples of this design space exploration include:
@@ -701,7 +702,7 @@ levels below it, which may require further reconstructions to occur to
make room. The manner in which these reconstructions proceed follows the
selection of layout policy:
\begin{itemize}
-\item[\textbf{Leveling}] When a buffer flush occurs in the leveling
+\item \textbf{Leveling.} When a buffer flush occurs in the leveling
policy, the system scans the existing levels to find the first level
which has sufficient empty space to store the contents of the level above
it. More formally, if the number of records in level $i$ is $N_i$, then
@@ -711,8 +712,8 @@ empty level is added and $i$ is set to the index of this new level. Then,
a reconstruction is executed containing all of the records in levels $i$
and $i - 1$ (where level $-1$ indicates the temporary shard built from the
buffer). Following this reconstruction, all levels $j < i$ are shifted
-by one level.
-\item[\textbf{Tiering}] When using tiering, the system will locate
+by one level to $j + 1$.
+\item \textbf{Tiering.} When using tiering, the system will locate
the first level, $i$, containing fewer than $s$ shards. If no such
level exists, then a new empty level is added and $i$ is set to the
index of that level. Then, for each level $j < i$, a reconstruction
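
The tiering item above is cut off by the hunk boundary, so the cascade in the sketch below follows the standard tiering formulation. This is a minimal C++ sketch of both layout policies, assuming a scale factor s, shards reduced to bare record counts, and a level capacity of buffer_cap * s^(i+1); every name here (Structure, flush_leveling, flush_tiering, level_cap) is illustrative rather than the framework's actual API.

#include <cstddef>
#include <numeric>
#include <vector>

using Shard = std::size_t;            // a shard, reduced to its record count
using Level = std::vector<Shard>;

struct Structure {
    std::vector<Level> levels;
    std::size_t buffer_cap;           // records per buffer flush
    std::size_t s;                    // scale factor

    // Assumed capacity schedule: level i holds buffer_cap * s^(i+1) records.
    std::size_t level_cap(std::size_t i) const {
        std::size_t c = buffer_cap;
        for (std::size_t k = 0; k <= i; ++k) c *= s;
        return c;
    }

    static std::size_t records(const Level &l) {
        return std::accumulate(l.begin(), l.end(), std::size_t{0});
    }

    // Leveling: find the first level with room for the contents of the
    // level above it (the buffer, when i == 0), merge downward, then
    // shift every level j < i to j + 1 and seat the buffer at level 0.
    void flush_leveling(Shard buffer_shard) {
        std::size_t i = 0;
        while (i < levels.size() &&
               records(levels[i]) +
                   (i == 0 ? buffer_shard : records(levels[i - 1])) >
                   level_cap(i)) {
            ++i;
        }
        if (i == levels.size()) levels.emplace_back();  // add empty level
        Shard merged = records(levels[i]) +
                       (i == 0 ? buffer_shard : records(levels[i - 1]));
        levels[i] = {merged};                           // reconstruction
        for (std::size_t j = i; j > 1; --j)             // shift j < i deeper
            levels[j - 1] = levels[j - 2];
        if (i > 0) levels[0] = {buffer_shard};
    }

    // Tiering: find the first level with fewer than s shards; each full
    // level above it is reconstructed into a single shard appended to
    // the level beneath it, and the buffer shard lands on level 0.
    void flush_tiering(Shard buffer_shard) {
        std::size_t i = 0;
        while (i < levels.size() && levels[i].size() >= s) ++i;
        if (i == levels.size()) levels.emplace_back();  // add empty level
        for (std::size_t j = i; j > 0; --j) {
            levels[j].push_back(records(levels[j - 1]));
            levels[j - 1].clear();
        }
        levels[0].push_back(buffer_shard);
    }
};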