| author | Douglas Rumbaugh <dbr4@psu.edu> | 2025-05-09 14:08:31 -0400 |
|---|---|---|
| committer | Douglas Rumbaugh <dbr4@psu.edu> | 2025-05-09 14:08:31 -0400 |
| commit | 901a04fd8ec9a07b7bd195517a6d9e89da3ecab6 | |
| tree | 6ccda914b7112f55d4bdc54852016275f2613ea6 /chapters/sigmod23 | |
| parent | f1fcf8426764b2e8fc8de08a6d74968d2fbc1b27 | |
updates
Diffstat (limited to 'chapters/sigmod23')
| -rw-r--r-- | chapters/sigmod23/extensions.tex | 60 |
1 files changed, 35 insertions, 25 deletions
diff --git a/chapters/sigmod23/extensions.tex b/chapters/sigmod23/extensions.tex
index d8a4247..2752b0f 100644
--- a/chapters/sigmod23/extensions.tex
+++ b/chapters/sigmod23/extensions.tex
@@ -1,31 +1,41 @@
 \captionsetup[subfloat]{justification=centering}
 \section{Extensions to the Framework}
 \label{sec:discussion}
-In this section, various extensions of the framework are considered.
-Specifically, the applicability of the framework to external or distributed
-data structures is discussed, as well as the use of the framework to add
-automatic support for concurrent updates and sampling to extended SSIs.
-
-\Paragraph{Larger-than-Memory Data.} This framework can be applied to external
-static sampling structures with minimal modification. As a proof-of-concept,
-the IRS structure was extended with support for shards containing external ISAM
-trees. This structure supports storing a configurable number of shards in
-memory, and the rest on disk, making it well suited for operating in
-memory-constrained environments. The on-disk shards contain standard ISAM
-trees, with $8\text{KiB}$ page-aligned nodes. The external version of the
-index only supports tombstone-based deletes, as tagging would require random
-writes. In principle a hybrid approach to deletes is possible, where a delete
-first searches the in-memory data for the record to be deleted, tagging it if
-found. If the record is not found, then a tombstone could be inserted. As the
-data size grows, though, and the preponderance of data is found on disk, this
-approach would largely revert to the standard tombstone approach in practice.
-External settings make the framework even more attractive, in terms of
-performance characteristics, due to the different cost model. In external data
-structures, performance is typically measured in terms of the number of IO
-operations, meaning that much of the overhead introduced by the framework for
-tasks like querying the mutable buffer, building auxiliary structures, extra
-random number generations due to the shard alias structure, and the like,
-become far less significant.
+While this chapter has thus far discussed single-threaded, in-memory data
+structures, the framework as proposed can easily be extended to support
+other use cases. In this section, we discuss extending the framework to
+support concurrency, external data structures, and distributed settings.
+
+
+\Paragraph{Larger-than-Memory Data.} Our dynamization techniques,
+as discussed thus far, can easily accommodate external data structures
+as well as in-memory ones. To demonstrate this, we have implemented
+a dynamized version of an external ISAM tree for use in answering IRS
+queries. The mutable buffer remains an unsorted array in memory; however,
+each shard can be \emph{either} an in-memory ISAM tree or an external
+one. Our system keeps a user-configurable number of shards in memory and
+stores the rest on disk, allowing this split to be tuned for performance.
+
+The on-disk shards are built from standard ISAM trees using $8$~KiB
+page-aligned internal and leaf nodes. Because tagging would require
+random writes, the external version supports only tombstone-based deletes.
+In principle, a hybrid approach is possible, in which a delete first searches
+the in-memory shards for the record and tags it if found, inserting a
+tombstone only when the record is not found there. However, because of the
+geometric growth rate of the shards, at any given time the majority of the
+data will be on disk, so this would provide only a marginal improvement.
+
+For simplicity, our implementation does not include a buffer manager; the
+external interface instead requires the caller to pass in page-aligned buffers.
+
+
+
+
+\Paragraph{Applications to Distributed Data Structures.}
+Many distributed systems are built on immutable abstractions, such as
+Apache Spark's resilient distributed dataset (RDD)~\cite{rdd} or Hadoop's
+immutable
+
 Because the framework maintains immutability of shards, it is also well suited for use on top of distributed file-systems or with other distributed data
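
The added text argues that a hybrid tag-or-tombstone delete scheme would help only marginally, because geometric growth places most records in the largest, on-disk shards. The short Python sketch below makes that arithmetic concrete under a simplified, hypothetical model. It assumes every level is full, that each level holds $s$ times as many records as the previous one, and that the shards kept in memory are the smallest levels. The function `in_memory_fraction` and its parameters are illustrative names, not part of the framework.

```python
def in_memory_fraction(scale_factor: int, num_levels: int, levels_in_memory: int) -> float:
    """Fraction of all records stored in the smallest `levels_in_memory` levels.

    Hypothetical model (an assumption, not the framework's exact policy): every
    level is full, and each level holds `scale_factor` times as many records as
    the one before it. The buffer capacity is a common factor of every level's
    size, so it cancels out and is omitted.
    """
    # Relative capacity of level i is scale_factor ** i.
    capacities = [scale_factor ** i for i in range(num_levels)]
    return sum(capacities[:levels_in_memory]) / sum(capacities)


if __name__ == "__main__":
    # Scale factor 2, ten levels, with the three smallest levels kept in memory.
    frac = in_memory_fraction(scale_factor=2, num_levels=10, levels_in_memory=3)
    print(f"{frac:.4f}")  # 7 / 1023, about 0.0068
```

In closed form, the fraction is $(s^k - 1)/(s^L - 1)$ for $k$ in-memory levels out of $L$. It therefore shrinks geometrically as the structure grows. Under these assumptions, in-memory tagging would cover well under 1\% of the records in the example above.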
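
The hybrid delete policy described in the added text can be sketched as follows: tag the record if it lives in an in-memory shard, and otherwise fall back to a tombstone. This is a minimal, hypothetical illustration, not the framework's implementation. `MemShard` stands in for an in-memory shard as a sorted key array with a parallel array of delete tags, and the mutable buffer is modeled as a plain Python list. Following the text, only the in-memory shards are searched. A record that is still in the buffer or on disk simply receives a tombstone.

```python
from bisect import bisect_left


class MemShard:
    """Hypothetical in-memory shard: sorted keys plus a parallel array of delete tags."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.deleted = [False] * len(self.keys)

    def tag(self, key) -> bool:
        """Tag the first untagged copy of `key`; return True if one was found."""
        i = bisect_left(self.keys, key)  # binary search in the sorted shard
        while i < len(self.keys) and self.keys[i] == key:
            if not self.deleted[i]:
                self.deleted[i] = True
                return True
            i += 1
        return False


def hybrid_delete(key, mem_shards, buffer) -> str:
    """Tag `key` in an in-memory shard if possible; otherwise append a tombstone.

    On-disk shards are never searched or modified here, so no random writes
    to external pages are issued.
    """
    for shard in mem_shards:
        if shard.tag(key):
            return "tagged"
    # Not found in memory: record the delete as a tombstone in the mutable buffer.
    buffer.append((key, "tombstone"))
    return "tombstone"


if __name__ == "__main__":
    shards = [MemShard([5, 1, 9]), MemShard([12, 7])]
    buf = []
    print(hybrid_delete(7, shards, buf))   # tagged
    print(hybrid_delete(42, shards, buf))  # tombstone
    print(buf)                             # [(42, 'tombstone')]
```

In this sketch, deletes that miss the in-memory shards are recorded only as tombstones in the buffer. The on-disk shards are never written in place, which matches the constraint that motivates tombstone-only deletes for external shards.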