From 5ffc53e69e956054fdefd1fe193e00eee705dcab Mon Sep 17 00:00:00 2001
From: Douglas Rumbaugh
Date: Mon, 12 May 2025 19:59:26 -0400
Subject: Updates
---
 chapters/sigmod23/framework.tex | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)
(limited to 'chapters/sigmod23/framework.tex')
diff --git a/chapters/sigmod23/framework.tex b/chapters/sigmod23/framework.tex
index 89f15c3..0f3fac8 100644
--- a/chapters/sigmod23/framework.tex
+++ b/chapters/sigmod23/framework.tex
@@ -232,12 +232,13 @@ or are naturally determined as part of the pre-processing, and thus the
$W(n)$ term can be merged into $P(n)$.
\subsection{Supporting Deletes}
+\label{ssec:sampling-deletes}
As discussed in Section~\ref{ssec:background-deletes}, the Bentley-Saxe
method can support deleting records through the use of either weak
-deletes, or a secondary ghost structure, assume certain properties are
+deletes, or a secondary ghost structure, assuming certain properties are
satisfied by either the search problem or data structure. Unfortunately,
-neither approach can work as a "drop-in" solution in the context of
+neither approach can work as a ``drop-in'' solution in the context of
sampling problems, because of the way that deleted records interact
with the sampling process itself. Sampling problems, as formalized here,
are neither invertible nor deletion decomposable. In this section,
@@ -258,9 +259,9 @@ the structure with a tombstone bit set in the header. This mechanism is
used to support \emph{ghost structure} based deletes.
\end{enumerate}
-Broadly speaking, for sampling problems, tombstone deletes cause a number
-of problems because \emph{sampling problems are not invertible}. However,
-this limitation can be worked around during the query process if desired.
+Broadly speaking, for sampling problems, tombstone deletes cause a
+number of problems because \emph{sampling problems are not invertible}.
+This limitation can be worked around during the query process if desired.
Tagging is much more natural for these search problems. However, the
flexibility of selecting either option is desirable because of their
different performance characteristics.
@@ -527,8 +528,8 @@ unwieldy and are targeted at tuning the worst case at the expense of the
common case. We will take a different approach to adding configurability
to our dynamization system.
-Though it has thus far gone unmentioned, readers familiar with LSM Trees
-may have noted the astonishing similarity between decomposition-based
+Though it has thus far gone unmentioned, some readers may have
+noted the astonishing similarity between decomposition-based
dynamization techniques and a data structure called the Log-structured
Merge-tree. First proposed by O'Neil in the mid '90s~\cite{oneil96},
the LSM Tree was designed to optimize write throughput for external data
@@ -541,7 +542,7 @@ layered, external structures, to reduce the cost of reconstruction.
In more recent times, the LSM Tree has seen significant development and
been used as the basis for key-value stores like RocksDB~\cite{dong21}
-and LevelDB~\cite{leveldb}. This work as produced an incredibly large
+and LevelDB~\cite{leveldb}. This work has produced an incredibly large
and well-explored parameterization of the reconstruction procedures of
LSM Trees, a good summary of which can be found in this recent tutorial
paper~\cite{sarkar23}. Examples of this design space exploration include:
@@ -701,7 +702,7 @@ levels below it, which may require further
reconstructions to occur to make room.
The manner in which these reconstructions proceed depends on the
selected layout policy:
\begin{itemize}
-\item[\textbf{Leveling}] When a buffer flush occurs in the leveling
+\item \textbf{Leveling.} When a buffer flush occurs in the leveling
policy, the system scans the existing levels to find the first level
which has sufficient empty space to store the contents of the level
above it. More formally, if the number of records in level $i$ is $N_i$, then
@@ -711,8 +712,8 @@ empty level is added and $i$ is set to the index of this new level. Then,
a reconstruction is executed containing all of the records in levels
$i$ and $i - 1$ (where $i=-1$ indicates the temporary shard built from
the buffer). Following this reconstruction, all levels $j < i$ are shifted
-by one level.
+by one level to $j + 1$.
-\item[\textbf{Tiering}] When using tiering, the system will locate
+\item \textbf{Tiering.} When using tiering, the system will locate
the first level, $i$, containing fewer than $s$ shards. If no such
level exists, then a new empty level is added and $i$ is set to the
index of that level. Then, for each level $j < i$, a reconstruction
-- cgit v1.2.3
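
To make the two layout policies described in the final hunk concrete, the following is a minimal C++ sketch of the flush logic, written against a simplified in-memory representation. It is illustrative only: the type and function names are invented, the leveling capacity test and the placement of the buffered records after the shift are assumptions (those details fall outside the visible context lines of the patch), and the reconstruction step is reduced to concatenating record vectors.

#include <cstddef>
#include <utility>
#include <vector>

struct Record { long key; long value; };

// One immutable instance of the underlying static structure, produced by a
// reconstruction over a set of records.
struct Shard {
    std::vector<Record> records;
};

// Stand-in for a reconstruction: in the real framework this would build the
// static sampling structure; here it only gathers the records.
static Shard reconstruct(std::vector<Record> recs) {
    Shard s;
    s.records = std::move(recs);
    return s;
}

struct Level {
    std::vector<Shard> shards;   // leveling: at most one shard; tiering: up to s
    std::size_t record_count() const {
        std::size_t n = 0;
        for (const auto &sh : shards) n += sh.records.size();
        return n;
    }
};

class Dynamized {
public:
    Dynamized(std::size_t buffer_cap, std::size_t scale_factor)
        : buffer_cap_(buffer_cap), s_(scale_factor) {}

    // Leveling: find the first level i with room for the records of the level
    // above it (level "-1" being the buffer), rebuild it from levels i and
    // i-1, shift every remaining level j < i down to j + 1, and (assumed)
    // place the buffered records in the vacated level 0.
    void flush_leveling(std::vector<Record> buffer) {
        std::size_t i = 0;
        while (i < levels_.size() && !has_room(i)) i++;
        if (i == levels_.size()) levels_.emplace_back();   // add a new empty level

        std::vector<Record> merged = collect(levels_[i]);
        std::vector<Record> above = (i == 0) ? buffer : collect(levels_[i - 1]);
        merged.insert(merged.end(), above.begin(), above.end());
        levels_[i].shards.clear();
        levels_[i].shards.push_back(reconstruct(std::move(merged)));

        if (i > 0) {
            for (std::size_t j = i - 1; j > 0; j--)        // shift level j-1 into j
                levels_[j] = std::move(levels_[j - 1]);
            levels_[0].shards.clear();
            levels_[0].shards.push_back(reconstruct(std::move(buffer)));
        }
    }

    // Tiering: find the first level i holding fewer than s shards (adding a
    // new level if none exists); then, for each level j < i, rebuild its
    // shards into a single shard placed on level j + 1. The buffered records
    // become a new shard on level 0.
    void flush_tiering(std::vector<Record> buffer) {
        std::size_t i = 0;
        while (i < levels_.size() && levels_[i].shards.size() >= s_) i++;
        if (i == levels_.size()) levels_.emplace_back();

        for (std::size_t j = i; j > 0; j--) {
            levels_[j].shards.push_back(reconstruct(collect(levels_[j - 1])));
            levels_[j - 1].shards.clear();
        }
        levels_[0].shards.push_back(reconstruct(std::move(buffer)));
    }

private:
    // Assumed capacity rule: level i holds at most buffer_cap * s^(i+1)
    // records; the exact bound used by the framework is not shown above.
    bool has_room(std::size_t i) const {
        std::size_t cap = buffer_cap_;
        for (std::size_t k = 0; k <= i; k++) cap *= s_;
        std::size_t incoming = (i == 0) ? buffer_cap_ : levels_[i - 1].record_count();
        return levels_[i].record_count() + incoming <= cap;
    }

    static std::vector<Record> collect(const Level &lvl) {
        std::vector<Record> out;
        for (const auto &sh : lvl.shards)
            out.insert(out.end(), sh.records.begin(), sh.records.end());
        return out;
    }

    std::size_t buffer_cap_;
    std::size_t s_;
    std::vector<Level> levels_;
};

Roughly speaking, leveling keeps one shard per level (fewer shards for queries to visit, more rewriting per flush), while tiering accumulates up to $s$ shards per level before merging them down (less rewriting per flush, more shards to query).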
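
The delete mechanisms discussed in the ``Supporting Deletes'' hunk earlier in the patch (tombstones, i.e. ghost records inserted with a tombstone bit set in the header, and tagging, i.e. marking the targeted record in place) might be represented as in the sketch below. The field and function names are invented for illustration, and the linear scans stand in for whatever lookup the real structures provide; the intent is only to show how records deleted by either mechanism can be rejected when a sample is drawn, which is the query-time workaround the text refers to.

#include <vector>

struct Record {
    long key;
    long value;
    bool tombstone = false;   // header bit: a "ghost" record cancelling a prior insert
    bool deleted   = false;   // header bit: set in place when the record is delete-tagged
};

struct Structure {
    std::vector<Record> records;

    // Tagging: locate the record and set its delete bit in place.
    // Returns false if no matching live record is present.
    bool delete_by_tagging(long key, long value) {
        for (auto &r : records) {
            if (!r.tombstone && !r.deleted && r.key == key && r.value == value) {
                r.deleted = true;
                return true;
            }
        }
        return false;
    }

    // Tombstones: rather than touching the original record, a matching record
    // with the tombstone bit set is inserted (normally into the mutable
    // buffer, from which it migrates down like any other record).
    static Record make_tombstone(long key, long value) {
        return Record{key, value, /*tombstone=*/true, /*deleted=*/false};
    }

    // Rejection check used while sampling: a drawn record is discarded if it
    // is itself a tombstone, has been tagged, or is cancelled by a tombstone
    // elsewhere in the structure.
    bool alive(const Record &r) const {
        if (r.tombstone || r.deleted) return false;
        for (const auto &t : records)
            if (t.tombstone && t.key == r.key && t.value == r.value) return false;
        return true;
    }
};

In practice a sampling query would retry until it has drawn the required number of alive records; the relative cost of the tombstone existence check versus the in-place tag check is one source of the differing performance characteristics mentioned in the text.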