From fcdbcbcd45dc567792429bb314df53b42ed9f22e Mon Sep 17 00:00:00 2001
From: Douglas Rumbaugh
Date: Fri, 27 Jun 2025 15:21:38 -0400
Subject: updates

---
 chapters/beyond-dsp.tex | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

(limited to 'chapters/beyond-dsp.tex')

diff --git a/chapters/beyond-dsp.tex b/chapters/beyond-dsp.tex
index 74afdd2..5655b8c 100644
--- a/chapters/beyond-dsp.tex
+++ b/chapters/beyond-dsp.tex
@@ -1664,10 +1664,10 @@ compaction is triggered.
 
 We configured our dynamized structure to use $s=8$, $N_B=12000$,
 $\delta = .05$, $f = 16$, and the tiering layout policy. We compared our method
-(\textbf{DE-IRS}) to Olken's method~\cite{olken89} on a B+Tree with
+(\textbf{DE-IRS}) to Olken's method~\cite{olken89} on a B+tree with
 aggregate weight counts (\textbf{AGG B+Tree}), as well as our bespoke
 sampling solution from the previous chapter (\textbf{Bespoke}) and a
-single static instance of the ISAM Tree (\textbf{ISAM}). Because IRS
+single static instance of the ISAM tree (\textbf{ISAM}). Because IRS
 is neither INV nor DDSP, the standard Bentley-Saxe Method has no way to
 support deletes for it, and was not tested. All of our tested sampling
 queries had a controlled selectivity of $\sigma = 0.01\%$ and $k=1000$.
@@ -1692,7 +1692,7 @@ the dynamic baseline.
 Finally, Figure~\ref{fig:irs-space} shows the space usage of the
 data structures, less the storage required for the raw data. The two
 dynamized solutions require \emph{significantly} less storage than the
-dynamic B+Tree, which must leave empty spaces in its nodes for inserts.
+dynamic B+tree, which must leave empty spaces in its nodes for inserts.
 This is a significant advantage of static data structures--they can pack
 data much more tightly and require less storage. Dynamization, at least
 in this case, doesn't add a significant amount of overhead over a single
@@ -1701,7 +1701,7 @@ instance of the static structure.
 \subsection{$k$-NN Search}
 \label{ssec:dyn-knn-exp}
 Next, we'll consider answering high dimensional exact $k$-NN queries
-using a static Vantage Point Tree (VPTree)~\cite{vptree}. This is a
+using a static vantage point tree (VPTree)~\cite{vptree}. This is a
 binary search tree with internal nodes that partition records based
 on their distance to a selected point, called the vantage point. All
 of the points within a fixed distance of the vantage point are covered
@@ -1746,10 +1746,10 @@ standard DDSP, we compare with the Bentley-Saxe Method (\textbf{BSM})\footnote{
 be deleted in $\Theta(1)$ time, rather than requiring an inefficient
 point-lookup directly on the VPTree.
 } and a dynamic data structure for the same search problem called an
-M-Tree~\cite{mtree,mtree-impl} (\textbf{MTree}), which is an example of a so-called
+M-tree~\cite{mtree,mtree-impl} (\textbf{MTree}), which is an example of a so-called
 "ball tree" structure that partitions high dimensional space using nodes
 representing spheres, which are merged and split to maintain balance in
-a manner not unlike a B+Tree. We also consider a static instance of a
+a manner not unlike a B+tree. We also consider a static instance of a
 VPTree built over the same set of records (\textbf{VPTree}). We used L2
 distance as our metric, which is defined for vectors of $d$ dimensions as
@@ -1784,7 +1784,7 @@ which are biased towards better insertion performance. Both dynamized
 structures also outperform the dynamic baseline.
 
 Finally, as is becoming a trend, Figure~\ref{fig:knn-space} shows that
 the storage requirements of the static data structures, dynamized or
 not, are significantly less
-than M-Tree. M-Tree, like a B+Tree, requires leaving empty slots in its
+than M-tree. M-tree, like a B+tree, requires leaving empty slots in its
 nodes to support insertion, and this results in a large amount of
 wasted space.
@@ -1810,7 +1810,7 @@ We apply our framework to create dynamized versions of two static learned
 indices: Triespline~\cite{plex} (\textbf{DE-TS}) and PGM~\cite{pgm}
 (\textbf{DE-PGM}), and compare with a standard Bentley-Saxe dynamized
 of Triespline (\textbf{BSM-TS}). Our dynamic baselines are ALEX~\cite{alex},
-which is dynamic learned index based on a B+Tree like structure, and
+which is dynamic learned index based on a B+tree like structure, and
 PGM (\textbf{PGM}), which provides support for a dynamic version based
 on Bentley-Saxe dynamization (which is why we have not included a BSM
 version of PGM in our testing).
@@ -1885,7 +1885,7 @@ support does in its own update-optimized configuration.\footnote{
 these data structures. All of the dynamic options require significantly
 more space than the static Triespline, but ALEX requires the most by a
 very large margin. This is in keeping with the previous experiments, which
-all included similarly B+Tree-like structures that required significant
+all included similarly B+tree-like structures that required significant
 additional storage space compared to static structures as part of their
 update support.
@@ -1966,7 +1966,7 @@ this test.
 In this benchmark, we used a single thread to insert records into
 the structure at a constant rate, while we deployed a variable
 number of additional threads that continuously issued sampling queries
-against the structure. We used an AGG B+Tree as our baseline. Note
+against the structure. We used an AGG B+tree as our baseline. Note
 that, to accurately maintain the aggregate weight counts as records
 are inserted, it is necessary that each operation obtain a lock on
 the root node of the tree~\cite{zhao22}. This makes this situation
@@ -1974,7 +1974,7 @@ a good use-case for the automatic concurrency support provided by our
 framework. Figure~\ref{fig:irs-concurrency} shows the results of this
 benchmark for various numbers of concurrency query threads. As can be
 seen, our framework supports a stable update throughput up to 32 query threads,
-whereas the AGG B+Tree suffers from contention for the mutex and sees
+whereas the AGG B+tree suffers from contention for the mutex and sees
 its performance degrade as the number of threads increases.
 
 \begin{figure}
--
cgit v1.2.3