From cd3447f1cad16972e8a659ec6e84764c5b8b2745 Mon Sep 17 00:00:00 2001 From: "Douglas B. Rumbaugh" Date: Sun, 1 Jun 2025 13:15:52 -0400 Subject: Julia updates --- chapters/beyond-dsp.tex | 126 ++++++++++++++++++++++++------------------------ 1 file changed, 63 insertions(+), 63 deletions(-) (limited to 'chapters/beyond-dsp.tex') diff --git a/chapters/beyond-dsp.tex b/chapters/beyond-dsp.tex index 87f44ba..73f8174 100644 --- a/chapters/beyond-dsp.tex +++ b/chapters/beyond-dsp.tex @@ -202,7 +202,7 @@ problem. The core idea underlying our solution in that chapter was to introduce individualized local queries for each block, which were created after a pre-processing step to allow information about each block to be determined first. In that particular example, we established the weight -each block should have during sampling, and then creating custom sampling +each block should have during sampling, and then created custom sampling queries with variable $k$ values, following the weight distributions. We have determined a general interface that allows for this procedure to be expressed, and we define the term \emph{extended decomposability} to refer @@ -379,12 +379,12 @@ A significant limitation of invertible problems is that the result set size is not able to be controlled. We do not know how many records in our local results have been deleted until we reach the combine operation and they begin to cancel out, at which point we lack a mechanism to go back -and retrieve more. This presents difficulties for addressing important -search problems such as top-$k$, $k$-NN, and sampling. In principle, these -queries could be supported by repeating the query with larger-and-larger -$k$ values until the desired number of records is returned, but in the -eDSP model this requires throwing away a lot of useful work, as the state -of the query must be rebuilt each time. +and retrieve more records. 
This presents difficulties for addressing +important search problems such as top-$k$, $k$-NN, and sampling. In +principle, these queries could be supported by repeating the query with +larger-and-larger $k$ values until the desired number of records is +returned, but in the eDSP model this requires throwing away a lot of +useful work, as the state of the query must be rebuilt each time. We can resolve this problem by moving the decision to repeat the query into the query interface itself, allowing retries \emph{before} the @@ -700,7 +700,7 @@ the following main operations, This function will delete a record from the dynamized structure, returning $1$ on success and $0$ on failure. The meaning of a failure to delete is dependent upon the delete mechanism in use, - and will be discussed in Section~\ref{ssec:dyn-deletes}. + and will be discussed in Section~\ref{sssec:dyn-deletes}. \item \texttt{std::future query(QueryParameters); } \\ This function will execute a query with the specified parameters @@ -838,17 +838,18 @@ shards of the same type. The second of these constructors is to allow for efficient merging to be leveraged for merge decomposable search problems. Shards can also expose a point lookup operation for use in supporting -deletes for DDSPs. This function is only used for DDSP deletes, and so can -be left off when this functionality isn't necessary. If a data structure -doesn't natively support an efficient point-lookup, then it can be added -by including a hash table or other data structure in the shard if desired. -This function accepts a record type as input, and should return a pointer -to the record that exactly matches the input in storage, if one exists, -or \texttt{nullptr} if it doesn't. It should also accept an optional -boolean argument that the framework will pass \texttt{true} into if it -is don't a lookup for a tombstone. 
This flag is to allow the shard to -use various tombstone-related optimization, such as using a Bloom filter -for them, or storing them separately from the main records, etc. +deletes for DDSPs. This function is only used for DDSP deletes, and +so can be left off when this functionality isn't necessary. If a data +structure doesn't natively support an efficient point-lookup, then it +can be added by including a hash table or other data structure in the +shard if desired. This function accepts a record type as input, and +should return a pointer to the record that exactly matches the input in +storage, if one exists, or \texttt{nullptr} if it doesn't. It should +also accept an optional boolean argument that the framework will pass +\texttt{true} into if the lookup operation is being used to search for +a tombstone record. This flag is to allow the shard to use various +tombstone-related optimizations, such as using a Bloom filter for them, +or storing them separately from the main records, etc. Shards should also expose some accessors for basic meta-data about their contents. In particular, the framework is reliant upon a function @@ -888,19 +889,19 @@ concept ShardInterface = RecordInterface }; \end{lstlisting} -\label{listing:shard} \caption{The required interface for shard types in our dynamization framework.} +\label{lst:shard} \end{lstfloat} \subsubsection{Query Interface} The most complex interface required by the framework is for queries. The -concept for query types is given in Listing~\ref{listing:query}. In +concept for query types is given in Listing~\ref{lst:query}. 
In effect, it requires implementing the full IDSP interface from the previous section, as well as versions of $\mathbftt{local\_preproc}$ -and $\mathbftt{local\query}$ for pre-processing and querying an unsorted +and $\mathbftt{local\_query}$ for pre-processing and querying an unsorted set of records, which is necessary to allow the mutable buffer to be used as part of the query process.\footnote{ In the worst case, these routines could construct temporary shard @@ -918,7 +919,7 @@ a local result that includes both the number of records and the number of tombstones, while the query result itself remains a single number. Additionally, the framework makes no decision about what, if any, collection type should be used for these results. A range scan, for -example, could specified the result types as a vector of records, map +example, could specify the result types as a vector of records, map of records, etc., depending on the use case. There is one significant difference between the IDSP interface and the @@ -935,7 +936,6 @@ to define an additional combination operation for final result types, or duplicate effort in the combine step on each repetition. \begin{lstfloat} - \begin{lstlisting}[language=C++] template j$. But, if $i < j$, then a cancellation should occur. The case where the record and tombstone coexist covers the situation where a record is deleted, and then inserted again after the delete. In this case, there does exist a record $r_k$ with $k < j$ that the tombstone should cancel with, but that record may exist in a different shard. So -the tombstone will \emph{eventually} cancel, but it would be technically -incorrect to cancel it with the matching record $r_i$ that it coexists -with in the shard being considered. +the tombstone will \emph{eventually} cancel, but it would be incorrect +to cancel it with the matching record $r_i$ that it coexists with in +the shard being considered. 
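The in-order cancellation rule described above can be sketched in a few lines. This is an illustrative sketch, not the framework's actual record types: it assumes records sorted by key, with each tombstone ordered immediately before the record it deletes, so that a single linear pass suffices.

```cpp
#include <cstddef>
#include <vector>

// Illustrative record layout (hypothetical, not the framework's header
// format): a key plus a tombstone flag.
struct Record {
    int  key;
    bool tombstone;
};

// One linear pass implementing the cancellation rule: a tombstone at
// index i cancels with a matching record at index i + 1, and both are
// dropped. A tombstone whose matching record lives in a different shard
// survives the pass and will cancel during some later reconstruction.
std::vector<Record> cancel_tombstones(const std::vector<Record> &sorted) {
    std::vector<Record> out;
    for (std::size_t i = 0; i < sorted.size(); ++i) {
        if (sorted[i].tombstone && i + 1 < sorted.size() &&
            !sorted[i + 1].tombstone && sorted[i + 1].key == sorted[i].key) {
            ++i; // skip the record as well; the pair annihilates
            continue;
        }
        out.push_back(sorted[i]);
    }
    return out;
}
```

Note that the pass deliberately leaves an unmatched tombstone in place rather than dropping it, which is what allows the eventual cancellation described above to happen in a later merge.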
This means that correct tombstone cancellation requires that the order that records have been inserted be known and accounted for during @@ -1186,7 +1186,7 @@ at index $i$ will cancel with a record if and only if that record is in index $i+1$. For structures that are constructed by a sorted-merge of data, this allows tombstone cancellation at no extra cost during the merge operation. Otherwise, it requires an extra linear pass after -sorting to remove cancelled records.\footnote{ +sorting to remove canceled records.\footnote{ For this reason, we use tagging based deletes for structures which don't require sorting by value during construction. } @@ -1200,7 +1200,7 @@ For tombstone deletes, a failure to delete means a failure to insert, and the request should be retried after a brief delay. Note that, for performance reasons, the framework makes no effort to ensure that the record being erased using tombstones is \emph{actually} there, so it -is possible to insert a tombstone that can never be cancelled. This +is possible to insert a tombstone that can never be canceled. This won't affect correctness in any way, so long as queries are correctly implemented, but it will increase the size of the structure slightly. @@ -1271,7 +1271,7 @@ same mechanisms described in Section~\ref{sssec:dyn-deletes}. \Paragraph{Asymptotic Complexity.} The worst-case query cost of the framework follows the same basic cost function as discussed for IDSPs -in Section~\ref{asec:dyn-idsp}, with slight modifications to account for +in Section~\ref{ssec:dyn-idsp}, with slight modifications to account for the different cost function of buffer querying and preprocessing. The cost is, \begin{equation*} @@ -1280,7 +1280,7 @@ cost is, \end{equation*} where $P_B(n)$ is the cost of pre-processing the buffer, and $Q_B(n)$ is the cost of querying it. 
As $N_B$ is a small constant relative to $n$, -in some cases these terms can be ommitted, but they are left here for +in some cases these terms can be omitted, but they are left here for generality. Also note that this is an upper bound, but isn't necessarily tight. As we saw with IRS in Section~\ref{ssec:edsp}, it is sometimes possible to leverage problem-specific details within this interface to @@ -1307,7 +1307,7 @@ All of our testing was performed using Ubuntu 20.04 LTS on a dual socket Intel Xeon Gold 6242 server with 384 GiB of physical memory and 40 physical cores. We ran our benchmarks pinned to a specific core, or specific NUMA node for multi-threaded testing. Our code was compiled -using GCC version 11.3.0 with the \texttt{-O3} flag, and targetted to +using GCC version 11.3.0 with the \texttt{-O3} flag, and targeted to C++20.\footnote{ Aside from the ALEX benchmark. ALEX does not build in this configuration, and we used C++13 instead for that particular test. } @@ -1335,7 +1335,7 @@ structures. Specifically, \texttt{fb}, and \texttt{osm} datasets from SOSD~\cite{sosd-datasets}. Each has 200 million 64-bit keys (to which we added 64-bit values) following a variety of - distributions. We ommitted the \texttt{wiki} dataset because it + distributions. We omitted the \texttt{wiki} dataset because it contains duplicate keys, which were not supported by one of our dynamic baselines. @@ -1371,7 +1371,7 @@ For our first set of experiments, we evaluated a dynamized version of the Triespline learned index~\cite{plex} for answering range count queries.\footnote{ We tested range scans throughout this chapter by measuring the performance of a range count. We decided to go this route to ensure - that the results across our baselines were comprable. Different range + that the results across our baselines were comparable. Different range structures provided different interfaces for accessing the result sets, some of which required making an extra copy and others which didn't. 
Using a range count instead allowed us to measure only index performance. } We ran these tests using the SOSD \texttt{OSM} dataset. First, we'll consider the effect of buffer size on performance in Figures~\ref{fig:ins-buffer-size} and \ref{fig:q-buffer-size}. For all -of these tests, we used a fixe scale factor of $8$ and the tombstone +of these tests, we used a fixed scale factor of $8$ and the tombstone delete policy. Each plot shows the performance of our three supported layout policies (note that BSM uses a fixed $N_B=1$ and $s=2$ for all tests, to accurately reflect the performance of the classical Bentley-Saxe @@ -1419,17 +1419,17 @@ improves performance. This is because a larger scale factor in tiering results in more, smaller structures, and thus reduced reconstruction time. But for leveling it increases the write amplification, hurting performance. Figure~\ref{fig:q-scale-factor} shows that, like with -Figure~\ref{fig:query_sf} in the previous chapter, query latency is not -strong affected by the scale factor, but larger scale factors due tend +Figure~\ref{fig:sample_sf} in the previous chapter, query latency is not +strongly affected by the scale factor, but larger scale factors do tend to have a negative effect under tiering (due to having more structures). As a final note, these results demonstrate that, compared to the normal Bentley-Saxe method, our proposed design space is a strict -improvement. There are points within the space that are equivilant to, +improvement. There are points within the space that are equivalent to, or even strictly superior to, BSM in terms of both query and insertion -performance, as well as clearly available trade-offs between insertion and -query performance, particular when it comes to selecting layout policy. - +performance. Beyond this, there are also clearly available trade-offs +between insertion and query performance, particularly when it comes to +selecting layout policy. 
\begin{figure*} @@ -1446,7 +1446,7 @@ query performance, particular when it comes to selecting layout policy. \subsection{Independent Range Sampling} -Next, we'll consider the indepedent range sampling problem using ISAM +Next, we'll consider the independent range sampling problem using an ISAM tree. The functioning of this structure for answering IRS queries is discussed in more detail in Section~\ref{ssec:irs-struct}, and we use the query algorithm described in Algorithm~\ref{alg:decomp-irs}. We use the @@ -1456,7 +1456,7 @@ obtain the upper and lower bounds of the query range, and the weight of that range, using tree traversals in \texttt{local\_preproc}. We use rejection sampling on the buffer, and so the buffer preprocessing simply uses the number of records in the buffer for its weight. In -\texttt{distribute\_query}, we build and alias structure over all of +\texttt{distribute\_query}, we build an alias structure over all of the weights and query it $k$ times to obtain the individual $k$ values for the local queries. To avoid extra work on repeat, we stash this alias structure in the buffer's local query object so it is available @@ -1485,8 +1485,8 @@ compaction is triggered. We configured our dynamized structure to use $s=8$, $N_B=12000$, $\delta = .05$, $f = 16$, and the tiering layout policy. We compared our method (\textbf{DE-IRS}) to Olken's method~\cite{olken89} on a B+Tree with -aggregate weight counts (\textbf{AGG B+Tree}), as well as our besoke -sampling solution from the previous chapter (\textbf{Besoke}) and a +aggregate weight counts (\textbf{AGG B+Tree}), as well as our bespoke +sampling solution from the previous chapter (\textbf{Bespoke}) and a single static instance of the ISAM Tree (\textbf{ISAM}). Because IRS is neither INV nor DDSP, the standard Bentley-Saxe Method has no way to support deletes for it, and was not tested. All of our tested sampling queries had a controlled selectivity of $\sigma = 0.01\%$ and $k=1000$. 
The results of our performance benchmarking are in Figure~\ref{fig:irs}. Figure~\ref{fig:irs-insert} shows that our general framework has -comperable insertion performance to the specialized one, though loses +comparable insertion performance to the specialized one, though it loses slightly. This is to be expected, as \textbf{Bespoke} was hand-written for specifically this type of query and data structure, and has hard-coded data types, among other things. Despite losing to \textbf{Bespoke} @@ -1525,7 +1525,7 @@ using a static Vantage Point Tree (VPTree)~\cite{vptree}. This is a binary search tree with internal nodes that partition records based on their distance to a selected point, called the vantage point. All of the points within a fixed distance of the vantage point are covered -by one subtree, and the points outside of this distance are covered by +by one sub-tree, and the points outside of this distance are covered by the other. This results in a hard-to-update data structure that can be constructed in $\Theta(n \log n)$ time using repeated application of the \texttt{quickselect} algorithm~\cite{quickselect} to partition the @@ -1537,7 +1537,7 @@ Algorithm~\ref{alg:idsp-knn}, though using delete tagging instead of tombstones. VPTree doesn't support efficient point lookups, and so to work around this we add a hash map to each shard, mapping each record to its location in storage, to ensure that deletes can be done efficiently -in this way. This allows us to avoid cancelling deleted records in +in this way. This allows us to avoid canceling deleted records in the \texttt{combine} operation, as they can be skipped over during \texttt{local\_query} directly. Because $k$-NN doesn't have any of the distributional requirements of IRS, these local queries can return $k$ @@ -1599,7 +1599,7 @@ scheme used, with \textbf{BSM-VPTree} performing slightly \emph{better} than our framework for query performance. 
The reason for this is shown in Figure~\ref{fig:knn-insert}, where our framework outperforms the Bentley-Saxe method in insertion performance. These results are -atributible to our selection of framework configuration parameters, +attributable to our selection of framework configuration parameters, which are biased towards better insertion performance. Both dynamized structures also outperform the dynamic baseline. Finally, as is becoming a trend, Figure~\ref{fig:knn-space} shows that the storage requirements @@ -1667,7 +1667,7 @@ The results of our evaluation are shown in Figure~\ref{fig:eval-learned-index}. Figure~\ref{fig:rq-insert} shows the insertion performance. DE-TS is the best in all cases, and the pure BSM version of Triespline is the worst by a substantial margin. Of -particular interest in this chart is the inconsisent performance of +particular interest in this chart is the inconsistent performance of ALEX, which does quite well on the \texttt{books} dataset, and poorly on the others. It is worth noting that getting ALEX to run \emph{at all} in some cases required a lot of trial and error and tuning, as its @@ -1691,8 +1691,8 @@ performs horrendously compared to all of the other structures. The same caveat from the previous paragraph applies here--PGM can be configured for better performance. But it's notable that our framework-dynamized PGM is able to beat PGM slightly in insertion performance without seeing the -same massive degredation in query performance that PGM's native update -suport does in its own update-optmized configuration.\footnote{ +same massive degradation in query performance that PGM's native update +support does in its own update-optimized configuration.\footnote{ It's also worth noting that PGM implements tombstone deletes by inserting a record with a matching key to the record to be deleted, and a particular ``tombstone'' value, rather than using a header. This @@ -1712,7 +1712,7 @@ update support. 
\subsection{String Search} As a final example of a search problem, we consider exact string matching -using the fast succinct trie~\cite{zhang18}. While updatable +using the fast succinct trie~\cite{zhang18}. While dynamic tries aren't terribly unusual~\cite{m-bonsai,dynamic-trie}, succinct data structures, which attempt to approach an information-theoretic lower-bound on their binary representation of the data, are usually static because @@ -1725,7 +1725,7 @@ we consider the effectiveness of our generalized framework for them. \centering \subfloat[Update Throughput]{\includegraphics[width=.32\textwidth, trim=5mm 2mm 0 0]{img/fig-bs-fst-insert} \label{fig:fst-insert}} \subfloat[Query Latency]{\includegraphics[width=.32\textwidth, trim=5mm 2mm 0 0]{img/fig-bs-fst-query} \label{fig:fst-query}} - \subfloat[Index Overhead]{\includegraphics[width=.32\textwidth, trim=5mm 2mm 0 0]{img/fig-bs-fst-space} \label{fig:fst-size}} + \subfloat[Index Overhead]{\includegraphics[width=.32\textwidth, trim=5mm 2mm 0 0]{img/fig-bs-fst-space} \label{fig:fst-space}} %\vspace{-3mm} \caption{FST Evaluation} \label{fig:fst-eval} @@ -1739,9 +1739,9 @@ storage. Queries use no pre-processing and the local queries directly search for a matching string. We use the framework's early abort feature to stop as soon as the first result is found, and combine simply checks whether this record is a tombstone or not. If it's a tombstone, then -the lookup is considered to have no found the search string. Otherwise, +the lookup is considered to have not found the search string. Otherwise, the record is returned. This results in a dynamized structure with the -following asympotic costs, +following asymptotic costs, \begin{align*} @@ -1759,7 +1759,7 @@ The results are shown in Figure~\ref{fig:fst-eval}. As with range scans, the Bentley-Saxe method shows horrible insertion performance relative to our framework in Figure~\ref{fig:fst-insert}. 
Note that the significant observed difference in update throughput for the two data sets is -largely attributable to the relative sizes. The \texttt{usra} set is +largely attributable to the relative sizes. The \texttt{US} set is far larger than \texttt{english}. Figure~\ref{fig:fst-query} shows that our write-optimized framework configuration is slightly out-performed in query latency by the standard Bentley-Saxe dynamization, and that both @@ -1767,7 +1767,7 @@ dynamized structures are quite a bit slower than the static structure for queries. Finally, the storage costs for the data structures are shown in Figure~\ref{fig:fst-space}. For the \texttt{english} data set, the extra storage cost from decomposing the structure is quite significant, -but the for \texttt{ursarc} set the sizes are quite comperable. It is +but for the \texttt{ursarc} set the sizes are quite comparable. It is not unexpected that dynamization would add storage cost for succinct (or any compressed) data structures, because the splitting of the records across multiple data structures reduces the ability of the structure to @@ -1792,10 +1792,10 @@ are inserted, it is necessary that each operation obtain a lock on the root node of the tree~\cite{zhao22}. This makes this situation a good use-case for the automatic concurrency support provided by our framework. Figure~\ref{fig:irs-concurrency} shows the results of this -benchmark for various numbers of concurreny query threads. As can be seen, +benchmark for various numbers of concurrent query threads. As can be seen, our framework supports a stable update throughput up to 32 query threads, whereas the AGG B+Tree suffers from contention for the mutex and sees -is performance degrade as the number of threads increases. +its performance degrade as the number of threads increases. \begin{figure} \centering -- cgit v1.2.3