Diffstat (limited to 'chapters/introduction.tex')
| -rw-r--r-- | chapters/introduction.tex | 63 |
1 files changed, 32 insertions, 31 deletions
diff --git a/chapters/introduction.tex b/chapters/introduction.tex
index 6b6904a..8a45bd0 100644
--- a/chapters/introduction.tex
+++ b/chapters/introduction.tex
@@ -123,25 +123,26 @@ lines can be found in Chapter~\ref{chap:related-work}, and the third will be
 extensively discussed in Chapter~\ref{chap:background}.
 
 Automatic index composition has been considered in a variety of
-papers~\cite{periodic-table,ds-alchemy,fluid-ds,gene,cosine}, each considering
-differing sets of data structure primitives and different techniques for
-composing the structure. The general principle across all incarnations
-of the technique is to consider a (usually static) set of data, and a
-workload consisting of single-dimensional range queries and point lookups.
-The system then analyzes the workload, either statically or in real time,
-selects specific primitive structures optimized for certain operations
-(e.g., hash table-like structures for point lookups, sorted runs for range
-scans), and applies them to different regions of the data, in an attempt
-to maximize the overall performance of the workload. Although some work
-in this area suggests generalization to more complex data types, such
-as multi-dimensional data~\cite{fluid-ds}, this line is broadly focused
-on creating instance-optimal indices for workloads that databases are
-already well equipped to handle. While this task is quite important, it
-is not precisely the work that we are trying to accomplish here. And,
-because the techniques are limited to specified sets of structural
-primitives, it isn't clear that the approach can be usefully extended
-to support \emph{arbitrary} query and data types. We thus consider this
-line to be largely orthogonal to ours.
+papers~\cite{periodic-table,ds-alchemy,fluid-ds,gene,cosine}, each
+considering differing sets of data structure primitives and different
+techniques for composing the structure. The general principle across all
+incarnations of the technique is to consider a (usually static) set of
+data, and a workload consisting of single-dimensional range queries and
+point lookups. The system then analyzes the workload, either statically
+or in real time, selects specific primitive structures optimized for
+certain operations (e.g., hash table-like structures for point lookups,
+sorted runs for range scans), and applies them to different regions
+of the data, in an attempt to maximize the overall performance of the
+workload. Although some work in this area suggests generalization to
+more complex data types, such as multi-dimensional data~\cite{fluid-ds},
+this line is broadly focused on creating instance-optimal indices for
+workloads that databases are already well equipped to handle. While this
+task is quite important, it is not precisely the work that we are trying
+to accomplish here. And, because the techniques are limited to specified
+sets of structural primitives, it isn't clear that the approach can
+be usefully extended to support \emph{arbitrary} query and data types
+without reintroducing the very problem we are trying to address. We thus
+consider this line to be largely orthogonal to ours.
 
 The second approach, generalized index templates, \emph{does} attempt
 to address the problem of expanding indexing support of databases to
@@ -178,17 +179,17 @@ rebuilding these blocks. The most commonly used version of this approach
 is the Bentley-Saxe method~\cite{saxe79}, which has been individually
 applied to several specific data structures in past
 work~\cite{almodaresi23,pgm,naidan14,xie21,bkdtree}. Dynamization
-of this sort is not a fully general solution though; it places
-a number of restrictions on the data structures and queries that
-it can support. These limitations will be discussed at length in
-Chapter~\ref{chap:background}, but briefly they include: (1) restrictions
-on query types that can be supported, as well as even stricter constraints
-on when deletes are supported, (2) a lack of useful performance configuration,
-and (3) sub-optimal performance characteristics, particularly in terms of
-insertion tail latencies.
+of this sort is not a fully general solution though; it places a
+number of restrictions on the data structures and queries that
+it can support. These limitations will be discussed at length
+in Chapter~\ref{chap:background}, but briefly they include: (1)
+restrictions on query types that can be supported, as well as even
+stricter constraints on when deletes are supported, (2) a lack of
+useful performance configuration, and (3) sub-optimal performance
+characteristics, particularly in terms of insertion tail latencies.
 
 Of the three approaches, we believe the latter to be the most promising
-from the prospective of easing the development of novel indices
+from the perspective of easing the development of novel indices
 for specialized queries and data types. While dynamization does have
 limitations, they are less onerous than the other two approaches. This
 is because dynamization is unburdened by specific selections of primitive
@@ -239,6 +240,6 @@ two chapters, and formally considers the design space and trade-offs
 within it. In Chapter~\ref{chap:tail-latency}, we consider the problem
 of insertion tail latency, and extend our framework with support for
 techniques to mitigate this problem. Chapter~\ref{chap:related-work}
-contains a more detailed discussion of works related to our own and the
-ways in which are approaches differ, and finally Chapter~\ref{chap:conclusion}
-concludes the work.
+contains a more detailed discussion of works related to our
+own and the ways in which are approaches differ, and finally
+Chapter~\ref{chap:conclusion} concludes the work.