diff options
| author | Douglas B. Rumbaugh <doug@douglasrumbaugh.com> | 2025-06-01 15:09:25 -0400 |
|---|---|---|
| committer | Douglas B. Rumbaugh <doug@douglasrumbaugh.com> | 2025-06-01 15:09:25 -0400 |
| commit | b8ae600d0d139fa76f8350f13f058b2cb795b692 (patch) | |
| tree | e07f4264946863e7a145881820cf80ae43301e33 | |
| parent | cd3447f1cad16972e8a659ec6e84764c5b8b2745 (diff) | |
| download | dissertation-b8ae600d0d139fa76f8350f13f058b2cb795b692.tar.gz | |
more updates
| -rw-r--r-- | chapters/beyond-dsp.tex | 4 | ||||
| -rw-r--r-- | chapters/dynamization.tex | 8 | ||||
| -rw-r--r-- | chapters/introduction.tex | 271 | ||||
| -rw-r--r-- | chapters/sigmod23/background.tex | 4 | ||||
| -rw-r--r-- | chapters/sigmod23/framework.tex | 2 | ||||
| -rw-r--r-- | chapters/sigmod23/introduction.tex | 2 | ||||
| -rw-r--r-- | references/references.bib | 239 |
7 files changed, 435 insertions, 95 deletions
diff --git a/chapters/beyond-dsp.tex b/chapters/beyond-dsp.tex index 73f8174..222dd14 100644 --- a/chapters/beyond-dsp.tex +++ b/chapters/beyond-dsp.tex @@ -53,7 +53,7 @@ that can be supported by our dynamization technique. \subsection{Extended Decomposability} \label{ssec:edsp} -As discussed in Chapter~\cite{chap:background}, the standard query model +As discussed in Chapter~\ref{chap:background}, the standard query model used by dynamization techniques requires that a given query be broadcast, unaltered, to each block within the dynamized structure, and then that the results from these identical local queries be efficiently mergeable @@ -1533,7 +1533,7 @@ points for each node. This structure can answer $k$-NN queries in $\Theta(k \log n)$ time. Our dynamized query procedure is implemented based on -Algorithm~\cite{alg:idsp-knn}, though using delete tagging instead of +Algorithm~\ref{alg:idsp-knn}, though using delete tagging instead of tombstones. VPTree doesn't support efficient point lookups, and so to work around this we add a hash map to each shard, mapping each record to its location in storage, to ensure that deletes can be done efficiently diff --git a/chapters/dynamization.tex b/chapters/dynamization.tex index a2277c3..db60d2e 100644 --- a/chapters/dynamization.tex +++ b/chapters/dynamization.tex @@ -1216,6 +1216,14 @@ showed how amortized global reconstruction can be used to dynamize data structures associated with search problems having certain properties. We examined several theoretical approaches for dynamization, including the equal block method, the Bentley-Saxe method, and a worst-case insertion +@manual{nxp:tja1043, + organization = "NXP Semiconductors", + title = "High-speed CAN transceiver", + number = "TJA1043", + year = 2013, + month = 4, + note = "Rev. 3" +} optimized approach. 
Additionally, we considered several more classes of search problem, and saw how additional properties could be used to enable more efficient reconstruction, and support for efficiently deleting diff --git a/chapters/introduction.tex b/chapters/introduction.tex index 75fad37..050a58b 100644 --- a/chapters/introduction.tex +++ b/chapters/introduction.tex @@ -1,6 +1,8 @@ \chapter{Introduction} \label{chap:intro} +\section{Motivation} + Modern relational database management systems (RDBMS) are founded upon a set-based representation of data~\cite{codd70}. This model is very flexible and can be used to represent data of a wide variety of @@ -9,45 +11,57 @@ more. However, this flexibility comes at a significant cost in terms of its ability to answer queries: the most basic data access operation is a linear table scan. -To work around this limitation, RDBMS support the creation of special -data structures called indices, which can be used to accelerate -particular types of query, and feature sophisticated query planning and -optimization systems that can identify opportunities to utilize these -indices~\cite{cowbook}. This approach works well for particular types -of queries for which an index has been designed and integrated into -the database. Unfortunately, many RDBMS only support a very limited -set of indices for accelerating single dimensional range queries and -point-lookups~\cite{mysql-btree-hash, cowbook}. +To work around this limitation, RDBMS support the creation of special data +structures called indices, which can be used to accelerate particular +types of query. To take full advantage of these structures, databases +feature sophisticated query planning and optimization systems that can +identify opportunities to utilize these indices~\cite{cowbook}. This +approach works well for particular types of queries for which an index +has been designed and integrated into the database. 
Unfortunately, many +RDBMS only support a very limited set of indices for accelerating single +dimensional range queries and point-lookups~\cite{mysql-btree-hash, +cowbook}. This situation is unfortunate, because one of the major challenges currently facing data systems is the processing of complex analytical queries of varying types over large sets of data. These queries and data types are supported, nominally, by a relational database, but are not well addressed by existing indexing techniques and as a result have -horrible performance. This has led to the development of a variety of +poor performance. This has led to the development of a variety of specialized systems for particular types of query, such as spatial systems~\cite{postgis-doc}, vector databases~\cite{pinecone-db}, -and graph databases~\cite{neptune, neo4j}. The development of these -indexes is difficult because of the requirements placed on them by data -processing systems. Data is frequently subject to updates, yet a large -number of potentially useful data structures are static. Further, -many large-scale data processing systems are highly concurrent, which -increases the barrier to entry even further. The process for developing -data structures that satisfy these requirements is arduous. - -To demonstrate this difficulty, consder the recent example of the -evolution of learned indexes. These are data structures designed to -efficiently solve a simple problem: single dimensional range queries -over sorted data. They seek to reduce the size of the structure, as -well as lookup times, by replacing a traditional data structure with a -learned model capable of predicting the location of a record in storage -that matches a key value to within bounded error. This concept was first -proposed by Kraska et al. in 2017, when they published a paper on the -first learned index, RMI~\cite{RMI}. 
This index succeeding in showing -that a learned model can be both faster and smaller than a conventional -range index, but the proposed solution did not support updates. The -first (non-concurrently) updatable learned index, ALEX, took a year -and a half to appear~\cite{alex}. Over the course of the subsequent +and graph databases~\cite{neptune, neo4j}. At the heart of these +specialized systems are specialized indices, and the accompanying query +processing and optimization architectures necessary to utilize them +effectively. However, the development of a novel data processing system +for a specific type of query is not a trivial process. While specialized +data structures, which often already exist, are at the heart of such +systems, meaningfully using such a data structure in a database requires +adding a large number of additional features. + +To be useful within the context of a database, a data structure must +support inserts and deletes (collectively referred to as updates), +as well as concurrency support that satisfies standardized isolation +semantics~\cite{cowbook}, support for crash recovery of the index in the +case of a system failure~\cite{aries}, and possibly more. As an example, +a recent work on extended Datalog with support for user-defined data +structures showed significant improvements in query processing time and +space requirements, but required that the user-defined structures have +support for concurrent updates~\cite{byods-datalog}. The process of +adding these features to data structures that currently lack them is +not straightforward and can take an extensive amount of time and effort. + +As a current example that demonstrates this problem, consider the recent +development of learned indices. These are a broad class of data structure +that use various techniques to approximate a function mapping a key onto +its location in storage. 
Theoretically, this model allows for better +space efficiency of the index, as well as improved lookup performance. +This concept was first proposed by Kraska et al. in 2017, when they +published a paper on the first learned index, RMI~\cite{RMI}. This index +succeeded in showing that a learned model can be both faster and smaller +than a conventional range index, but the proposed solution did not support +updates. The first (non-concurrently) updatable learned index, ALEX, took +a year and a half to appear~\cite{alex}. Over the course of the subsequent three years, several learned indexes were proposed with concurrency support~\cite{10.1145/3332466.3374547,10.14778/3489496.3489512} but a recent performance study~\cite{10.14778/3551793.3551848} showed that these @@ -57,38 +71,141 @@ design, ALEX+, was able to outperform ART-OLC under certain circumstances, but even with this result learned indexes are not generally considered production ready, because they suffer from significant performance regressions under certain workloads, and are highly sensitive to the -distribution of keys~\cite{10.14778/3551793.3551848}. Despite the +distribution of keys~\cite{10.14778/3551793.3551848,alex-aca}. Despite the demonstrable advantages of the technique and over half a decade of development, learned indexes still have not reached a generally usable -state.\footnote{ - In Chapter~\ref{chap:framework}, we apply our proposed technique to - existing static learned indexes to produce an effective dynamic index. -} - -This work proposes a strategy for addressing this problem by providing a -framework for automatically introducing support for concurrent updates -(including both inserts and deletes) to many static data structures. With -this framework, a wide range of static, or otherwise impractical, data -structures will be made practically useful in data systems. 
Based -on a classical, theoretical framework called the Bentley-Saxe -Method~\cite{saxe79}, the proposed system will provide a library -that can automatically extend many data structures with support for -concurrent updates, as well as a tunable design space to allow for the -user to make trade-offs between read performance, write performance, -and storage usage. The framework will address a number of limitations -present in the original technique, widely increasing its applicability -and practicality. It will also provide a workload-adaptive, online tuning -system that can automatically adjust the tuning parameters of the data -structure in the face of changing workloads. - -This framework is based on the splitting of the data structure into -several smaller pieces, which are periodically reconstructed to support -updates. A systematic partitioning and reconstruction approach is used -to provide specific guarantees on amortized insertion performance, and -worst case query performance. The underlying Bentley-Saxe method is -extended using a novel query abstraction to broaden its applicability, -and the partitioning and reconstruction processes are adjusted to improve -performance and introduce configurability. +state. + +It would not be an exaggeration to say that there are dozens of novel data +structures proposed each year at data structures and systems conferences, +many of which solve useful problems. However, the burden of producing +a useful database index from these structures is great, and many of +them either never see use, or at least require a significant amount +of time and effort before they can be deployed. If there were a way to +bypass much of this additional development time by \emph{automatically} +extending the feature set of an existing data structure to produce a +usable index, many of these structures would become readily accessible +to database practitioners, and the capabilities of database systems could +be greatly enhanced. 
It is our goal with this work to make a significant +step in this direction. + +\section{Existing Attempts} + +At present, there are several lines of work targeted at reducing +the development burden associated with creating specialized indices. We +classify them into three broad categories, + +\begin{itemize} +\item \textbf{Automatic Index Composition.} This line of work seeks to +automatically compose an instance-optimized data structure for indexing +static data by examining the workload and combining a collection of basic +primitive structures to optimize performance. + +\item \textbf{Generalized Index Templates.} This line of work seeks +to introduce generalized data structures with built-in support for +updates, concurrency, crash recovery, etc., that have user configurable +behavior. The user can define various operations and data types according +to the template, and a corresponding customized index is automatically +constructed. + +\item \textbf{Automatic Feature Extension.} This line of work seeks to +take data structures that lack specific features, and automatically add +these without requiring adjustment to the data structure itself. This is +most commonly used to add update support, in which case the process is +called \emph{dynamization}. + +\end{itemize} +We'll briefly discuss each of these three lines, and their limitations, +in this section. A more detailed discussion of the first two of these +lines can be found in Chapter~\ref{chap:related-work}, and the third +will be extensively discussed in Chapter~\ref{chap:background}. + +Automatic index composition has been considered in a variety of +papers~\cite{periodic-table,ds-alchemy,fluid-ds,gene,cosine}, each considering +differing sets of data structure primitives and different techniques for +composing the structure. 
The general principle across all incarnations +of the technique is to consider a (usually static) set of data, and a +workload consisting of single-dimensional range queries and point lookups. +The system then analyzes the workload, either statically or in real time, +selects specific primitive structures optimized for certain operations +(e.g., hash table-like structures for point lookups, sorted runs for range +scans), and applies them to different regions of the data, in an attempt +to maximize the overall performance of the workload. Although some work +in this area suggests generalization to more complex data types, such +as multi-dimensional data~\cite{fluid-ds}, this line is broadly focused +on creating instance-optimal indices for workloads that databases are +already well equipped to handle. While this task is quite important, it +is not precisely the work that we are trying to accomplish here. And, +because the techniques are limited to specified sets of structural +primitives, it isn't clear that the approach can be usefully extended +to support \emph{arbitrary} query and data types. We thus consider this +line to be largely orthogonal to ours. + +The second approach, generalized index templates, \emph{does} attempt +to address the problem of expanding indexing support of databases to +a broader set of queries. The two primary exemplars of this approach +are the generalized search tree (GiST)~\cite{gist, concurrent-gist} +and the generalized inverted index (GIN)~\cite{pg-gin}, both of which +have been integrated into PostgreSQL~\cite{pg-gist, pg-gin}. GiST enables +generalized predicate filtering over user-defined data types, and +GIN generalizes an inverted index for text search. While powerful, +these techniques are limited by the specific data structure that they +are based upon, in a similar way that automatic index composition +techniques are limited by their set of defined primitives. 
As a +result, generalized index templates cannot support queries (e.g., +independent range sampling~\cite{hu14}) or data structures (e.g., +succinct tries~\cite{zhang18}) that do not fit their underlying models. +Expanding these underlying models by introducing a new generalized index +faces the same challenges as any other index development program. Thus, +while generalized index templates are a significant contribution in this +area, they are not a general solution to the fundamental problem of the +difficulties of index development. + +The final approach is automatic feature extension. More specifically, +we will consider dynamization,\footnote{ + This is alternatively called a static-to-dynamic transformation, + or dynamic extension, depending upon the source. These terms + all refer to the same process. +} the automatic extension of an existing static data structure with +support for inserts and deletes. The most general of these techniques +are based on amortized global reconstruction~\cite{overmars83}, +an approach that divides a single data structure up into smaller +structures, called blocks, built over disjoint partitions of the +data. Inserts and deletes can then be supported by selectively +rebuilding these blocks. The most commonly used version of this +approach is the Bentley-Saxe method~\cite{saxe79}, which has been +individually applied to several specific data structures in past +work~\cite{almodaresi23,pgm,naidan14,xie21,bkdtree}. Dynamization +of this sort is not a fully general solution though; it places +a number of restrictions on the data structures and queries that +it can support. 
These limitations will be discussed at length in +Chapter~\ref{chap:background}, but briefly they include: (1) restrictions +on query types that can be supported, as well as even stricter constraints +on when deletes are supported, (2) a lack of useful performance configuration, +and (3) sub-optimal performance characteristics, particularly in terms of +insertion tail latencies. + +Of the three approaches, we believe the latter to be the most promising +from the perspective of easing the development of novel indices +for specialized queries and data types. While dynamization does have +limitations, they are less onerous than the other two approaches. This +is because dynamization is unburdened by specific selections of primitive +data layouts; rather, any existing (or novel) data structure can be used. + +\section{Our Work} + +The work described by this document is focused towards addressing, at +least in part, the limitations of dynamization mentioned in the previous +section. We discuss general strategies for overcoming each limitation in +the context of the most popular dynamization technique: the Bentley-Saxe +method. We then present a generalized dynamization framework based upon +this discussion. This framework is capable of automatically adding +support for concurrent inserts, deletes, and queries to a wide range +of static data structures, including ones not supported by traditional +dynamization techniques. Included in this framework is a tunable design +space, allowing for trade-offs between query and update performance, and +mitigations to control the significant insertion tail latency problems +faced by most classical dynamization techniques. Specifically, the proposed work will address the following points, \begin{enumerate} @@ -99,19 +216,27 @@ Specifically, the proposed work will address the following points, for automatically dynamizing static data structures in a performant and configurable manner. 
\item The extension of this system with support for concurrent operations, - and the use of concurrency to provide more effective worst-case + and the use of parallelism to provide more effective worst-case performance guarantees. \end{enumerate} The rest of this document is structured as follows. First, -Chapter~\ref{chap:background} introduces relevant background information, -including the importance of data structures and indexes in database systems, -the concept of a search problem, and techniques for designing updatable data -structures. Next, in Chapter~\ref{chap:sampling}, the application of the -Bentley-Saxe method to a number of sampling data structures is presented. The -extension of these structures introduces a number of challenges which must be -addressed, resulting in significant modification of the underlying technique. -Then, Chapter~\ref{chap:framework} discusses the generalization of the -modifications from the sampling framework into a more general framework. -Chapter~\ref{chap:proposed} discusses the work that remains to be completed as -part of this project, and Chapter~\ref{chap:conclusion} concludes the work. +Chapter~\ref{chap:background} introduces relevant background information +about classical dynamization techniques and serves as a foundation +for the discussion to follow. In Chapter~\ref{chap:sampling}, +we consider one specific example of a query type not supported +by traditional dynamization systems, and propose a framework that +addresses the underlying problems and enables dynamization for these +problems. Next, in Chapter~\ref{chap:framework}, we use the results +from the previous chapter to propose novel extensions to the search +problem taxonomy and generalized mechanisms for supporting dynamization +of these new types of problem, culminating in a general dynamization +framework. 
Chapter~\ref{chap:design-space} unifies our discussion +of configuration parameters of our dynamizations from the previous +two chapters, and formally considers the design space and trade-offs +within it. In Chapter~\ref{chap:tail-latency}, we consider the problem +of insertion tail latency, and extend our framework with support for +techniques to mitigate this problem. Chapter~\ref{chap:related-work} +contains a more detailed discussion of works related to our own and the +ways in which our approaches differ, and finally Chapter~\ref{chap:conclusion} +concludes the work. diff --git a/chapters/sigmod23/background.tex b/chapters/sigmod23/background.tex index d600c27..88f2585 100644 --- a/chapters/sigmod23/background.tex +++ b/chapters/sigmod23/background.tex @@ -114,7 +114,7 @@ of problems that will be directly addressed within this chapter. \subsection{Algorithmic Solutions} Relational database systems often have native support for IQS using -SQL's \texttt{TABLESAMPLE} operator~\cite{postgress-doc}. However, the +SQL's \texttt{TABLESAMPLE} operator~\cite{postgres-doc}. However, the algorithms used to implement this operator have significant limitations and do not allow users to maintain statistical independence of the results without also running the query to be sampled from in full. Thus, users must @@ -137,7 +137,7 @@ in full anyway before returning only some of the results.\footnote{ For performance, the statistical guarantees can be discarded and systematic or block sampling used instead. Systematic sampling considers only a fraction of the rows in the table being sampled from, following -some particular pattern~\cite{postgress-doc}, and block sampling samples +some particular pattern~\cite{postgres-doc}, and block sampling samples entire database pages~\cite{db2-doc}. 
These allow for query performance to be decoupled from data size, but tie a given record's inclusion in the sample set directly to its physical storage location, which can introduce diff --git a/chapters/sigmod23/framework.tex b/chapters/sigmod23/framework.tex index 804194b..d51c2cb 100644 --- a/chapters/sigmod23/framework.tex +++ b/chapters/sigmod23/framework.tex @@ -29,7 +29,7 @@ more efficient query process if we abandon the DSP model and consider a slightly more complicated procedure. First, we'll define the IQS problem in terms of the notation and concepts -used in Chapter~\cite{chap:background} for search problems, +used in Chapter~\ref{chap:background} for search problems, \begin{definition}[Independent Query Sampling Problem] Given a search problem, $F$, a query sampling problem is function diff --git a/chapters/sigmod23/introduction.tex b/chapters/sigmod23/introduction.tex index 1a33c2e..8f0635d 100644 --- a/chapters/sigmod23/introduction.tex +++ b/chapters/sigmod23/introduction.tex @@ -29,7 +29,7 @@ achieved using specialized static sampling indices. Thus, we decided to attempt to apply a Bentley-Saxe based dynamization technique to these data structures. In this chapter, we discuss our approach, which addresses the decomposability problems discussed in -Section~\cite{ssec:background-irs}, introduces two physical mechanisms +Section~\ref{ssec:decomp-limits}, introduces two physical mechanisms for support deletes, and also introduces an LSM-tree inspired design space to allow for performance tuning. 
The results in this chapter are highly specialized to sampling problems, however they will serve as a diff --git a/references/references.bib b/references/references.bib index 868a65b..f2c88fa 100644 --- a/references/references.bib +++ b/references/references.bib @@ -506,18 +506,38 @@ } @misc {postgres-doc, + author = {The PostgreSQL Global Development Group}, title = {PostgreSQL Documentation}, url = {https://www.postgresql.org/docs/15/sql-select.html}, year = {2025} } +@misc {pg-gist, + author = {The PostgreSQL Global Development Group}, + title = {PostgreSQL Documentation: GiST Indexes}, + url = {https://www.postgresql.org/docs/8.1/gist.html}, + year = {2025} +} + + +@misc{pg-gin, + author = {The PostgreSQL Global Development Group}, + title = {GIN Indexes}, + url = {https://www.postgresql.org/docs/16/gin.html}, + year = {2024}, + lastaccessed = {April, 2024} +} + @misc {db2-doc, + author = {IBM}, title = {IBM DB2 Documentation}, url = {https://www.ibm.com/docs/en/db2/12.1.0?topic=design-data-sampling-in-queries}, year = {2025} } + + @online {pinecone, title = {Pinecone DB}, url = {https://www.pinecone.io/}, @@ -1133,15 +1153,6 @@ booktitle = {Proceedings of the 2018 International Conference on Management of D series = {SIGMOD '18} } -@article{10.14778/3551793.3551848, -author = {Wongkham, Chaichon and Lu, Baotong and Liu, Chris and Zhong, Zhicong and Lo, Eric and Wang, Tianzheng}, -title = {Are Updatable Learned Indexes Ready?}, -year = {2022}, -publisher = {VLDB Endowment}, -volume = {15}, -number = {11}, -journal = {Proc. VLDB Endow.}, -} @article{10.14778/2850583.2850584, author = {Wang, Lu and Christensen, Robert and Li, Feifei and Yi, Ke}, @@ -1227,13 +1238,6 @@ journal = {Proc. 
VLDB Endow.}, bibsource = {dblp computer science bibliography, https://dblp.org} } -@inproceedings{10.1145/3332466.3374547, -author = {Tang, Chuzhe and Wang, Youyun and Dong, Zhiyuan and Hu, Gansen and Wang, Zhaoguo and Wang, Minjie and Chen, Haibo}, -title = {XIndex: A Scalable Learned Index for Multicore Data Storage}, -year = {2020}, -booktitle = {Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming}, -series = {PPoPP '20} -} @article{10.14778/3489496.3489512, author = {Li, Pengfei and Hua, Yu and Jia, Jingnan and Zuo, Pengfei}, @@ -1837,3 +1841,206 @@ keywords = {analytic model, analysis of algorithms, overflow chaining, performan bibsource = {dblp computer science bibliography, https://dblp.org} } + +@article{alex-aca, + author = {Rui Yang and + Evgenios M. Kornaropoulos and + Yue Cheng}, + title = {Algorithmic Complexity Attacks on Dynamic Learned Indexes}, + journal = {PVLDB}, + volume = {17}, + number = {4}, + pages = {780--793}, + year = {2023}, + url = {https://www.vldb.org/pvldb/vol17/p780-yang.pdf}, + timestamp = {Tue, 26 Mar 2024 22:14:29 +0100}, + biburl = {https://dblp.org/rec/journals/pvldb/YangKC23.bib}, + bibsource = {dblp computer science bibliography, https://dblp.org} +} + + +@article{cosine, + author = {Subarna Chatterjee and + Meena Jagadeesan and + Wilson Qin and + Stratos Idreos}, + title = {Cosine: {A} Cloud-Cost Optimized Self-Designing Key-Value Storage + Engine}, + journal = {PVLDB}, + volume = {15}, + number = {1}, + pages = {112--126}, + year = {2021}, + url = {http://www.vldb.org/pvldb/vol15/p112-chatterjee.pdf}, + doi = {10.14778/3485450.3485461}, + timestamp = {Thu, 21 Apr 2022 17:09:21 +0200}, + biburl = {https://dblp.org/rec/journals/pvldb/ChatterjeeJQI21.bib}, + bibsource = {dblp computer science bibliography, https://dblp.org} +} + + + +@inproceedings{gist, + author = {Joseph M. Hellerstein and + Jeffrey F. Naughton and + Avi Pfeffer}, + editor = {Umeshwar Dayal and + Peter M. D. 
Gray and + Shojiro Nishio}, + title = {Generalized Search Trees for Database Systems}, + booktitle = {VLDB}, + pages = {562--573}, + publisher = {Morgan Kaufmann}, + year = {1995}, + url = {http://www.vldb.org/conf/1995/P562.PDF}, + timestamp = {Tue, 20 Feb 2018 15:19:44 +0100}, + biburl = {https://dblp.org/rec/conf/vldb/HellersteinNP95.bib}, + bibsource = {dblp computer science bibliography, https://dblp.org} +} + +@inproceedings{concurrent-gist, + author = {Marcel Kornacker and + C. Mohan and + Joseph M. Hellerstein}, + editor = {Joan Peckham}, + title = {Concurrency and Recovery in Generalized Search Trees}, + booktitle = {SIGMOD}, + pages = {62--72}, + publisher = {{ACM} Press}, + year = {1997}, + url = {https://doi.org/10.1145/253260.253272}, + doi = {10.1145/253260.253272}, + timestamp = {Mon, 14 Jun 2021 15:39:36 +0200}, + biburl = {https://dblp.org/rec/conf/sigmod/KornackerMH97.bib}, + bibsource = {dblp computer science bibliography, https://dblp.org} +} + + +@article{periodic-table, + author = {Stratos Idreos and + Kostas Zoumpatianos and + Manos Athanassoulis and + Niv Dayan and + Brian Hentschel and + Michael S. Kester and + Demi Guo and + Lukas M. Maas and + Wilson Qin and + Abdul Wasay and + Yiyou Sun}, + title = {The Periodic Table of Data Structures}, + journal = {{IEEE} Data Eng. Bull.}, + volume = {41}, + number = {3}, + pages = {64--75}, + year = {2018}, + url = {http://sites.computer.org/debull/A18sept/p64.pdf}, + timestamp = {Tue, 10 Mar 2020 16:23:50 +0100}, + biburl = {https://dblp.org/rec/journals/debu/IdreosZADHKGMQW18.bib}, + bibsource = {dblp computer science bibliography, https://dblp.org} +} + +@article{ds-alchemy, + author = {Stratos Idreos and + Kostas Zoumpatianos and + Subarna Chatterjee and + Wilson Qin and + Abdul Wasay and + Brian Hentschel and + Mike S. Kester and + Niv Dayan and + Demi Guo and + Minseo Kang and + Yiyou Sun}, + title = {Learning Data Structure Alchemy}, + journal = {{IEEE} Data Eng. 
Bull.}, + volume = {42}, + number = {2}, + pages = {47--58}, + year = {2019}, + url = {http://sites.computer.org/debull/A19june/p47.pdf}, + timestamp = {Tue, 10 Mar 2020 16:23:49 +0100}, + biburl = {https://dblp.org/rec/journals/debu/IdreosZCQWHKDGK19.bib}, + bibsource = {dblp computer science bibliography, https://dblp.org} +} + + +@article{gene, + author = {Jens Dittrich and + Joris Nix and + Christian Sch{\"{o}}n}, + title = {The next 50 Years in Database Indexing or: The Case for Automatically + Generated Index Structures}, + journal = {PVLDB}, + volume = {15}, + number = {3}, + pages = {527--540}, + year = {2021}, + url = {http://www.vldb.org/pvldb/vol15/p527-dittrich.pdf}, + doi = {10.14778/3494124.3494136}, + timestamp = {Thu, 21 Apr 2022 17:09:21 +0200}, + biburl = {https://dblp.org/rec/journals/pvldb/DittrichNS21.bib}, + bibsource = {dblp computer science bibliography, https://dblp.org} +} + +@inproceedings{fluid-ds, + author = {Darshana Balakrishnan and + Lukasz Ziarek and + Oliver Kennedy}, + editor = {Alvin Cheung and + Kim Nguyen}, + title = {Fluid data structures}, + booktitle = {Proceedings of the 17th {ACM} {SIGPLAN} International Symposium on + Database Programming Languages}, + pages = {3--17}, + publisher = {{ACM}}, + year = {2019}, + url = {https://doi.org/10.1145/3315507.3330197}, + doi = {10.1145/3315507.3330197}, + timestamp = {Sun, 12 Nov 2023 02:16:34 +0100}, + biburl = {https://dblp.org/rec/conf/dbpl/BalakrishnanZK19.bib}, + bibsource = {dblp computer science bibliography, https://dblp.org} +} + + +@inproceedings{10.1145/3332466.3374547, +author = {Tang, Chuzhe and Wang, Youyun and Dong, Zhiyuan and Hu, Gansen and Wang, Zhaoguo and Wang, Minjie and Chen, Haibo}, +title = {XIndex: A Scalable Learned Index for Multicore Data Storage}, +year = {2020}, +booktitle = {Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming}, +series = {PPoPP '20} +} + +@article{10.14778/3551793.3551848, +author = 
{Wongkham, Chaichon and Lu, Baotong and Liu, Chris and Zhong, Zhicong and Lo, Eric and Wang, Tianzheng}, +title = {Are Updatable Learned Indexes Ready?}, +year = {2022}, +publisher = {VLDB Endowment}, +volume = {15}, +number = {11}, +journal = {PVLDB}, +} + + + + +@article{aries, + author = {C. Mohan and + Don Haderle and + Bruce G. Lindsay and + Hamid Pirahesh and + Peter M. Schwarz}, + title = {{ARIES:} {A} Transaction Recovery Method Supporting Fine-Granularity + Locking and Partial Rollbacks Using Write-Ahead Logging}, + journal = {{ACM} Trans. Database Syst.}, + volume = {17}, + number = {1}, + pages = {94--162}, + year = {1992}, + url = {https://doi.org/10.1145/128765.128770}, + doi = {10.1145/128765.128770}, + timestamp = {Wed, 29 May 2019 10:39:45 +0200}, + biburl = {https://dblp.org/rec/journals/tods/MohanHLPS92.bib}, + bibsource = {dblp computer science bibliography, https://dblp.org} +} + |