| author | Douglas Rumbaugh <dbr4@psu.edu> | 2025-06-27 15:21:38 -0400 |
|---|---|---|
| committer | Douglas Rumbaugh <dbr4@psu.edu> | 2025-06-27 15:21:38 -0400 |
| commit | fcdbcbcd45dc567792429bb314df53b42ed9f22e (patch) | |
| tree | 3f7c135b7b32022fa0a9f03361e60cc0cc4f86e0 /chapters/sigmod23/framework.tex | |
| parent | ff528e8595e82802832930fae6c9ccee7afd23cb (diff) | |
updates
Diffstat (limited to 'chapters/sigmod23/framework.tex')
| -rw-r--r-- | chapters/sigmod23/framework.tex | 16 |
1 files changed, 8 insertions, 8 deletions
diff --git a/chapters/sigmod23/framework.tex b/chapters/sigmod23/framework.tex
index b3a8215..1eb2589 100644
--- a/chapters/sigmod23/framework.tex
+++ b/chapters/sigmod23/framework.tex
@@ -512,19 +512,19 @@ Though it has thus far gone unmentioned, some readers may have
 noted the astonishing similarity between decomposition-based
 dynamization techniques, and a data structure called the Log-structured
 Merge-tree. First proposed by O'Neil in the mid '90s\cite{oneil96},
-the LSM Tree was designed to optimize write throughput for external data
+the LSM tree was designed to optimize write throughput for external data
 structures. It accomplished this task by buffer inserted records in a
-small in-memory AVL Tree, and then flushing this buffer to disk when
+small in-memory AVL tree, and then flushing this buffer to disk when
 it filled up. The flush process itself would fully rebuild the on-disk
-structure (a B+Tree), including all of the currently existing records
+structure (a B+tree), including all of the currently existing records
 on external storage. O'Neil also proposed version which used several,
 layered, external structures, to reduce the cost of reconstruction.
 
-In more recent times, the LSM Tree has seen significant development and
+In more recent times, the LSM tree has seen significant development and
 been used as the basis for key-value stores like RocksDB~\cite{dong21}
 and LevelDB~\cite{leveldb}. This work has produced an incredibly large
 and well explored parametrization of the reconstruction
-procedures of LSM Trees, a good summary of which can be bounded in
+procedures of LSM trees, a good summary of which can be bounded in
 this recent tutorial paper~\cite{sarkar23}. Examples of this design
 space exploration include: different ways to organize each "level"
 of the tree~\cite{dayan19, dostoevsky, autumn}, different growth
@@ -534,7 +534,7 @@ auxiliary structures attached to the main ones for accelerating certain
 types of query~\cite{dayan18-1, zhu21, monkey}.
 This work is discussed in greater depth in Chapter~\ref{chap:related-work}.
 
-Many of the elements within the LSM Tree design space are based upon the
+Many of the elements within the LSM tree design space are based upon the
 specifics of the data structure itself, and are not applicable to our
 use case. However, some of the higher-level concepts can be imported and
 applied in the context of dynamization. Specifically, we have decided to
@@ -590,7 +590,7 @@ that can be used to help improve the performance of these searches, without
 requiring as much storage as adding auxiliary hash tables to every
 block, is to include bloom filters~\cite{bloom70}. A bloom filter is an
 approximate data structure that answers tests of set membership
-with bounded, single-sided error. These are commonly used in LSM Trees
+with bounded, single-sided error. These are commonly used in LSM trees
 to accelerate point lookups by allowing levels that don't contain the
 record being searched for to be skipped. In our case, we only care about
 tombstone records, so rather than building these filters over all records,
@@ -599,7 +599,7 @@ the sampling performance of the structure when tombstone deletes are used.
 
 \Paragraph{Layout Policy.} The Bentley-Saxe method considers blocks
 individually, without any other organization beyond increasing
-size. In contrast, LSM Trees have multiple layers of structural
+size. In contrast, LSM trees have multiple layers of structural
 organization. Record capacity restrictions are enforced on structures
 called \emph{levels}, which are partitioned into individual data
 structures, and then further organized into non-overlapping key ranges.