Diffstat (limited to 'chapters/beyond-dsp.tex')
-rw-r--r--  chapters/beyond-dsp.tex  126
1 file changed, 63 insertions(+), 63 deletions(-)
diff --git a/chapters/beyond-dsp.tex b/chapters/beyond-dsp.tex
index 87f44ba..73f8174 100644
--- a/chapters/beyond-dsp.tex
+++ b/chapters/beyond-dsp.tex
@@ -202,7 +202,7 @@ problem. The core idea underlying our solution in that chapter was to
introduce individualized local queries for each block, which were created
after a pre-processing step to allow information about each block to be
determined first. In that particular example, we established the weight
-each block should have during sampling, and then creating custom sampling
+each block should have during sampling, and then created custom sampling
queries with variable $k$ values, following the weight distributions. We
have determined a general interface that allows for this procedure to be
expressed, and we define the term \emph{extended decomposability} to refer
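To make this two-phase flow concrete, the following is a minimal sketch of such a weighted sampling procedure over a set of blocks. The names here (\texttt{total\_weight}, \texttt{sample}) are illustrative placeholders, not part of the interface defined later in this chapter.
\begin{lstlisting}[language=C++]
#include <cstddef>
#include <random>
#include <vector>

// A sketch of per-block query individualization for weighted sampling:
// block weights are gathered in a pre-processing pass, k is split
// across the blocks in proportion to those weights, and each block then
// receives a custom local query with its own k value.
template <typename BLOCK, typename R>
std::vector<R> weighted_sample(std::vector<BLOCK*> &blocks, size_t k) {
    // Pre-processing: determine the weight of each block.
    std::vector<double> weights;
    for (auto *b : blocks)
        weights.push_back(b->total_weight());

    // Assign each of the k samples to a block, weight-proportionally.
    std::mt19937 rng(std::random_device{}());
    std::discrete_distribution<size_t> dist(weights.begin(), weights.end());
    std::vector<size_t> local_k(blocks.size(), 0);
    for (size_t i = 0; i < k; i++)
        local_k[dist(rng)]++;

    // Issue the individualized local queries with variable k values.
    std::vector<R> result;
    for (size_t i = 0; i < blocks.size(); i++) {
        auto local = blocks[i]->sample(local_k[i]);
        result.insert(result.end(), local.begin(), local.end());
    }
    return result;
}
\end{lstlisting}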
@@ -379,12 +379,12 @@ A significant limitation of invertible problems is that the result set
size cannot be controlled. We do not know how many records in our
local results have been deleted until we reach the combine operation and
they begin to cancel out, at which point we lack a mechanism to go back
-and retrieve more. This presents difficulties for addressing important
-search problems such as top-$k$, $k$-NN, and sampling. In principle, these
-queries could be supported by repeating the query with larger-and-larger
-$k$ values until the desired number of records is returned, but in the
-eDSP model this requires throwing away a lot of useful work, as the state
-of the query must be rebuilt each time.
+and retrieve more records. This presents difficulties for addressing
+important search problems such as top-$k$, $k$-NN, and sampling. In
+principle, these queries could be supported by repeating the query with
+larger-and-larger $k$ values until the desired number of records is
+returned, but in the eDSP model this requires throwing away a lot of
+useful work, as the state of the query must be rebuilt each time.
We can resolve this problem by moving the decision to repeat the query
into the query interface itself, allowing retries \emph{before} the
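A rough sketch of this control flow, under the assumption of an IDSP-style query class (the names below are illustrative), might look like,
\begin{lstlisting}[language=C++]
#include <vector>

// Sketch of the repeat-before-combine loop: the per-shard local query
// state is built once and reused across repetitions, rather than being
// rebuilt from scratch each time the query is re-run with a larger k.
template <typename QUERY, typename LOCAL, typename PARAMS>
auto run_query(std::vector<LOCAL> &local_states, PARAMS &params) {
    std::vector<typename QUERY::LocalResult> locals;
    bool repeat = true;
    while (repeat) {
        locals.clear();
        for (auto &state : local_states)
            locals.push_back(QUERY::local_query(state, params));
        // repeat_query examines the local results and may adjust the
        // parameters (e.g., enlarge k) and request another pass.
        repeat = QUERY::repeat_query(locals, params);
    }
    // The combine step runs only once, after all retries resolve.
    return QUERY::combine(locals, params);
}
\end{lstlisting}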
@@ -700,7 +700,7 @@ the following main operations,
This function will delete a record from the dynamized structure,
returning $1$ on success and $0$ on failure. The meaning of a
failure to delete is dependent upon the delete mechanism in use,
- and will be discussed in Section~\ref{ssec:dyn-deletes}.
+ and will be discussed in Section~\ref{sssec:dyn-deletes}.
\item \texttt{std::future<QueryResult> query(QueryParameters); } \\
This function will execute a query with the specified parameters
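Taken together, a hypothetical use of these operations might look like the following sketch, in which \texttt{Dynamized} and the surrounding types stand in for the user's instantiations.
\begin{lstlisting}[language=C++]
#include <chrono>
#include <thread>

// Hypothetical usage of the framework's main operations.
template <typename Dynamized, typename R, typename Params>
auto insert_erase_query(Dynamized &ds, R rec, Params params) {
    ds.insert(rec);

    // erase() returns 0 on failure; the meaning of a failure depends
    // on the delete mechanism in use, so we back off and retry.
    while (ds.erase(rec) == 0)
        std::this_thread::sleep_for(std::chrono::microseconds(10));

    // query() returns a std::future; get() blocks until the result
    // is available.
    return ds.query(params).get();
}
\end{lstlisting}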
@@ -838,17 +838,18 @@ shards of the same type. The second of these constructors is to allow for
efficient merging to be leveraged for merge decomposable search problems.
Shards can also expose a point lookup operation for use in supporting
-deletes for DDSPs. This function is only used for DDSP deletes, and so can
-be left off when this functionality isn't necessary. If a data structure
-doesn't natively support an efficient point-lookup, then it can be added
-by including a hash table or other data structure in the shard if desired.
-This function accepts a record type as input, and should return a pointer
-to the record that exactly matches the input in storage, if one exists,
-or \texttt{nullptr} if it doesn't. It should also accept an optional
-boolean argument that the framework will pass \texttt{true} into if it
-is don't a lookup for a tombstone. This flag is to allow the shard to
-use various tombstone-related optimization, such as using a Bloom filter
-for them, or storing them separately from the main records, etc.
+deletes for DDSPs. This function is only used for DDSP deletes, and
+so can be left off when this functionality isn't necessary. If a data
+structure doesn't natively support an efficient point-lookup, then it
+can be added by including a hash table or other data structure in the
+shard if desired. This function accepts a record type as input, and
+should return a pointer to the record that exactly matches the input in
+storage, if one exists, or \texttt{nullptr} if it doesn't. It should
+also accept an optional boolean argument, into which the framework
+will pass \texttt{true} if the lookup operation is searching for a
+tombstone record. This flag allows the shard to apply various
+tombstone-related optimizations, such as using a Bloom filter for them,
+or storing them separately from the main records, etc.
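A sketch of what such a lookup might look like inside a shard is shown below. The member names and the auxiliary hash table are illustrative assumptions, not requirements of the framework.
\begin{lstlisting}[language=C++]
#include <unordered_map>

// Sketch of a shard point-lookup supporting the tombstone flag. The
// auxiliary hash table represents the case where the underlying
// structure lacks a native efficient point lookup.
template <typename R>
class ExampleShard {
public:
    const R *point_lookup(const R &rec, bool is_tombstone = false) const {
        // The flag permits tombstone-specific optimizations; for
        // example, a Bloom filter over the tombstones could be
        // consulted here to skip most negative lookups (omitted).
        (void) is_tombstone;
        auto itr = m_lookup.find(rec);
        return (itr == m_lookup.end()) ? nullptr : itr->second;
    }
private:
    // Maps each record to its location in storage. Requires a
    // std::hash specialization and operator== for R.
    std::unordered_map<R, const R*> m_lookup;
};
\end{lstlisting}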
Shards should also expose some accessors for basic meta-data about
its contents. In particular, the framework is reliant upon a function
@@ -888,19 +889,19 @@ concept ShardInterface = RecordInterface<typename SHARD::RECORD>
};
\end{lstlisting}
-\label{listing:shard}
\caption{The required interface for shard types in our dynamization
framework.}
+\label{lst:shard}
\end{lstfloat}
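As an illustration of this interface, a minimal shard over sorted records might be structured as follows. This is a sketch only; it elides the lookup and metadata functions required by the concept, and the accessor name is a placeholder.
\begin{lstlisting}[language=C++]
#include <algorithm>
#include <cstddef>
#include <vector>

// A minimal sketch of a shard type: a sorted array constructible from
// buffered records or by combining existing shards.
template <typename R>
class SortedArrayShard {
public:
    // Construct from the mutable buffer's (unsorted) records.
    explicit SortedArrayShard(std::vector<R> records)
        : m_data(std::move(records)) {
        std::sort(m_data.begin(), m_data.end());
    }

    // Construct by combining existing shards; because the inputs are
    // already sorted, a k-way merge could replace the sort here for
    // merge decomposable search problems.
    explicit SortedArrayShard(const std::vector<SortedArrayShard*> &shards) {
        for (auto *s : shards)
            m_data.insert(m_data.end(), s->m_data.begin(), s->m_data.end());
        std::sort(m_data.begin(), m_data.end());
    }

    size_t get_record_count() const { return m_data.size(); }

private:
    std::vector<R> m_data;
};
\end{lstlisting}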
\subsubsection{Query Interface}
The most complex interface required by the framework is for queries. The
-concept for query types is given in Listing~\ref{listing:query}. In
+concept for query types is given in Listing~\ref{lst:query}. In
effect, it requires implementing the full IDSP interface from the
previous section, as well as versions of $\mathbftt{local\_preproc}$
-and $\mathbftt{local\query}$ for pre-processing and querying an unsorted
+and $\mathbftt{local\_query}$ for pre-processing and querying an unsorted
set of records, which is necessary to allow the mutable buffer to be
used as part of the query process.\footnote{
In the worst case, these routines could construct temporary shard
@@ -918,7 +919,7 @@ a local result that includes both the number of records and the number
of tombstones, while the query result itself remains a single number.
Additionally, the framework makes no decision about what, if any,
collection type should be used for these results. A range scan, for
-example, could specified the result types as a vector of records, map
+example, could specify the result types as a vector of records, map
of records, etc., depending on the use case.
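For instance, a range count might declare its result types as in the following sketch (the names are arbitrary),
\begin{lstlisting}[language=C++]
#include <cstddef>

// Sketch of result types for a range count. The local result tracks
// tombstones so that the combine step can account for deletes, while
// the final result remains a single number.
struct RangeCountLocal {
    size_t records = 0;     // matching records in this shard
    size_t tombstones = 0;  // matching tombstones, to be subtracted
};
using RangeCountResult = size_t;
\end{lstlisting}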
There is one significant difference between the IDSP interface and the
@@ -935,7 +936,6 @@ to define an additional combination operation for final result types,
or duplicate effort in the combine step on each repetition.
\begin{lstfloat}
-
\begin{lstlisting}[language=C++]
template <typename QUERY, typename SHARD,
@@ -979,9 +979,9 @@ requires(PARAMETERS *parameters, LOCAL *local,
};
\end{lstlisting}
-\label{listing:query}
\caption{The required interface for query types in our dynamization
framework.}
+\label{lst:query}
\end{lstfloat}
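To illustrate, a query type for range counts might be sketched as follows, building on the result types shown earlier. The shard-side range iteration is an assumed API, and the signatures are abbreviated relative to the concept above.
\begin{lstlisting}[language=C++]
#include <cstddef>
#include <vector>

// A sketch of a range-count query type against the interface above.
template <typename SHARD>
struct RangeCountQuery {
    struct Parameters {
        typename SHARD::RECORD lower;
        typename SHARD::RECORD upper;
    };
    struct LocalResult {
        size_t records = 0;
        size_t tombstones = 0;
    };

    static LocalResult local_query(SHARD *shard, const Parameters *p) {
        LocalResult res;
        // Assumes the shard exposes range iteration (hypothetical API).
        for (auto itr = shard->lower_bound(p->lower);
             itr != shard->upper_bound(p->upper); ++itr) {
            if (itr->is_tombstone()) res.tombstones++;
            else                     res.records++;
        }
        return res;
    }

    static size_t combine(const std::vector<LocalResult> &locals) {
        size_t records = 0, tombstones = 0;
        for (const auto &l : locals) {
            records    += l.records;
            tombstones += l.tombstones;
        }
        // Each tombstone cancels exactly one record from the count.
        return records - tombstones;
    }
};
\end{lstlisting}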
@@ -1029,7 +1029,7 @@ all the records from the level above it ($i-1$ or the buffer, if $i
merged with the records in $j+1$ and the resulting shard placed in level
$j+1$. This procedure guarantees that level $0$ will have capacity for
the shard from the buffer, which is then merged into it (if it is not
-empty) or because it (if the level is empty).
+empty) or replaces it (if the level is empty).
\item \textbf{Tiering.}\\
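To make the leveling procedure above concrete, the following is a rough sketch of a leveling flush. The capacity computation, shard management, and memory handling are simplified placeholders rather than the framework's actual implementation.
\begin{lstlisting}[language=C++]
#include <cstddef>
#include <vector>

// A rough sketch of a leveling flush: find the first level with spare
// capacity, cascade merges downward into it, and then place the
// buffer's shard in level 0.
template <typename SHARD, typename R>
void flush_leveling(std::vector<SHARD*> &levels, std::vector<R> buffer,
                    size_t scale_factor, size_t buffer_cap) {
    // Placeholder capacity rule: level i holds buffer_cap * s^(i+1).
    auto capacity = [&](size_t i) {
        size_t cap = buffer_cap;
        for (size_t j = 0; j <= i; j++) cap *= scale_factor;
        return cap;
    };

    // Find the shallowest level that can absorb the one above it.
    size_t j = 0;
    while (j < levels.size() && levels[j] != nullptr &&
           levels[j]->get_record_count() >= capacity(j))
        j++;
    if (j == levels.size())
        levels.push_back(nullptr);

    // Cascade: merge each level into the (empty or non-full) one below.
    for (size_t i = j; i > 0; i--) {
        levels[i] = (levels[i] == nullptr)
            ? levels[i - 1]
            : new SHARD(std::vector<SHARD*>{levels[i - 1], levels[i]});
        levels[i - 1] = nullptr;
    }

    // Level 0 is now guaranteed to have room for the buffer's shard,
    // which is merged into it (if non-empty) or replaces it (if empty).
    SHARD *flushed = new SHARD(std::move(buffer));
    levels[0] = (levels[0] == nullptr)
        ? flushed
        : new SHARD(std::vector<SHARD*>{levels[0], flushed});
}
\end{lstlisting}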
@@ -1152,16 +1152,16 @@ reconstruction. Consider a record $r_i$ and its corresponding tombstone
$t_j$, where the subscript is the insertion time, with $i < j$ meaning
that $r_i$ was inserted \emph{before} $t_j$. Then, if we are to apply
tombstone cancellations, we must obey the following invariant within
-each shard: A record $r_i$ and tombstone $r_j$ can exist in the same
+each shard: A record $r_i$ and tombstone $t_j$ can exist in the same
shard if $i > j$. But, if $i < j$, then a cancellation should occur.
The case where the record and tombstone coexist covers the situation where
a record is deleted, and then inserted again after the delete. In this
case, there does exist a record $r_k$ with $k < j$ that the tombstone
should cancel with, but that record may exist in a different shard. So
-the tombstone will \emph{eventually} cancel, but it would be technically
-incorrect to cancel it with the matching record $r_i$ that it coexists
-with in the shard being considered.
+the tombstone will \emph{eventually} cancel, but it would be incorrect
+to cancel it with the matching record $r_i$ that it coexists with in
+the shard being considered.
This means that correct tombstone cancellation requires that the order
in which records were inserted be known and accounted for during
@@ -1186,7 +1186,7 @@ at index $i$ will cancel with a record if and only if that record is
in index $i+1$. For structures that are constructed by a sorted-merge
of data, this allows tombstone cancellation at no extra cost during
the merge operation. Otherwise, it requires an extra linear pass after
-sorting to remove cancelled records.\footnote{
+sorting to remove canceled records.\footnote{
For this reason, we use tagging-based deletes for structures which
don't require sorting by value during construction.
}
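A sketch of such a pass is shown below, under the assumption that sorting places each tombstone immediately before the record it deletes, and that \texttt{matches} is a helper testing record identity.
\begin{lstlisting}[language=C++]
#include <cstddef>
#include <utility>
#include <vector>

template <typename R>
bool matches(const R &a, const R &b);  // assumed identity test

// Sketch of the post-sort cancellation pass: a tombstone at index i
// cancels a record if and only if that record is at index i+1.
template <typename R>
void cancel_tombstones(std::vector<R> &records) {
    std::vector<R> out;
    out.reserve(records.size());
    for (size_t i = 0; i < records.size(); i++) {
        if (records[i].is_tombstone() && i + 1 < records.size() &&
            matches(records[i], records[i + 1])) {
            i++;  // drop both the tombstone and the record it cancels
            continue;
        }
        out.push_back(records[i]);
    }
    records = std::move(out);
}
\end{lstlisting}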
@@ -1200,7 +1200,7 @@ For tombstone deletes, a failure to delete means a failure to insert,
and the request should be retried after a brief delay. Note that, for
performance reasons, the framework makes no effort to ensure that the
record being erased using tombstones is \emph{actually} there, so it
-is possible to insert a tombstone that can never be cancelled. This
+is possible to insert a tombstone that can never be canceled. This
won't affect correctness in any way, so long as queries are correctly
implemented, but it will increase the size of the structure slightly.
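One common way to represent tombstones is a flag bit in a per-record header, as in the following sketch; the framework's actual record layout may differ.
\begin{lstlisting}[language=C++]
#include <cstdint>

// Sketch of a record wrapper carrying a tombstone flag in its header.
template <typename K, typename V>
struct WrappedRecord {
    uint32_t header = 0;  // bit 0: tombstone flag
    K key;
    V value;

    bool is_tombstone() const { return header & 1; }
    void set_tombstone()      { header |= 1; }
};
\end{lstlisting}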
@@ -1271,7 +1271,7 @@ same mechanisms described in Section~\ref{sssec:dyn-deletes}.
\Paragraph{Asymptotic Complexity.} The worst-case query cost of the
framework follows the same basic cost function as discussed for IDSPs
-in Section~\ref{asec:dyn-idsp}, with slight modifications to account for
+in Section~\ref{ssec:dyn-idsp}, with slight modifications to account for
the different cost function of buffer querying and preprocessing. The
cost is,
\begin{equation*}
@@ -1280,7 +1280,7 @@ cost is,
\end{equation*}
where $P_B(n)$ is the cost of pre-processing the buffer, and $Q_B(n)$ is
the cost of querying it. As $N_B$ is a small constant relative to $n$,
-in some cases these terms can be ommitted, but they are left here for
+in some cases these terms can be omitted, but they are left here for
generality. Also note that this is an upper bound, but isn't necessarily
tight. As we saw with IRS in Section~\ref{ssec:edsp}, it is sometimes
possible to leverage problem-specific details within this interface to
@@ -1307,7 +1307,7 @@ All of our testing was performed using Ubuntu 20.04 LTS on a dual
socket Intel Xeon Gold 6242 server with 384 GiB of physical memory and
40 physical cores. We ran our benchmarks pinned to a specific core,
or specific NUMA node for multi-threaded testing. Our code was compiled
-using GCC version 11.3.0 with the \texttt{-O3} flag, and targetted to
+using GCC version 11.3.0 with the \texttt{-O3} flag, and targeted to
C++20.\footnote{
Aside from the ALEX benchmark. ALEX does not build in this
configuration, and we used C++13 instead for that particular test.
@@ -1335,7 +1335,7 @@ structures. Specifically,
\texttt{fb}, and \texttt{osm} datasets from
SOSD~\cite{sosd-datasets}. Each has 200 million 64-bit keys
(to which we added 64-bit values) following a variety of
- distributions. We ommitted the \texttt{wiki} dataset because it
+ distributions. We omitted the \texttt{wiki} dataset because it
contains duplicate keys, which were not supported by one of our
dynamic baselines.
@@ -1371,7 +1371,7 @@ For our first set of experiments, we evaluated a dynamized version of the
Triespline learned index~\cite{plex} for answering range count queries.\footnote{
We tested range scans throughout this chapter by measuring the
performance of a range count. We decided to go this route to ensure
- that the results across our baselines were comprable. Different range
+ that the results across our baselines were comparable. Different range
structures provided different interfaces for accessing the result
sets, some of which required making an extra copy and others which
didn't. Using a range count instead allowed us to measure only index
performance. We ran these tests using the SOSD \texttt{osm} dataset.
First, we'll consider the effect of buffer size on performance in
Figures~\ref{fig:ins-buffer-size} and \ref{fig:q-buffer-size}. For all
-of these tests, we used a fixe scale factor of $8$ and the tombstone
+of these tests, we used a fixed scale factor of $8$ and the tombstone
delete policy. Each plot shows the performance of our three supported
layout policies (note that BSM uses a fixed $N_B=1$ and $s=2$ for all
tests, to accurately reflect the performance of the classical Bentley-Saxe
@@ -1419,17 +1419,17 @@ improves performance. This is because a larger scale factor in tiering
results in more, smaller structures, and thus reduced reconstruction
time. But for leveling it increases the write amplification, hurting
performance. Figure~\ref{fig:q-scale-factor} shows that, like with
-Figure~\ref{fig:query_sf} in the previous chapter, query latency is not
-strong affected by the scale factor, but larger scale factors due tend
+Figure~\ref{fig:sample_sf} in the previous chapter, query latency is not
+strongly affected by the scale factor, but larger scale factors do tend
to have a negative effect under tiering (due to having more structures).
As a final note, these results demonstrate that, compared to the
normal Bentley-Saxe method, our proposed design space is a strict
-improvement. There are points within the space that are equivilant to,
+improvement. There are points within the space that are equivalent to,
or even strictly superior to, BSM in terms of both query and insertion
-performance, as well as clearly available trade-offs between insertion and
-query performance, particular when it comes to selecting layout policy.
-
+performance. Beyond this, there are also clearly available trade-offs
+between insertion and query performance, particularly when it comes to
+selecting layout policy.
\begin{figure*}
@@ -1446,7 +1446,7 @@ query performance, particular when it comes to selecting layout policy.
\subsection{Independent Range Sampling}
-Next, we'll consider the indepedent range sampling problem using ISAM
+Next, we'll consider the independent range sampling problem using ISAM
tree. The functioning of this structure for answering IRS queries is
discussed in more detail in Section~\ref{ssec:irs-struct}, and we use the
query algorithm described in Algorithm~\ref{alg:decomp-irs}. We use the
@@ -1456,7 +1456,7 @@ obtain the upper and lower bounds of the query range, and the weight
of that range, using tree traversals in \texttt{local\_preproc}. We
use rejection sampling on the buffer, and so the buffer preprocessing
simply uses the number of records in the buffer for its weight. In
-\texttt{distribute\_query}, we build and alias structure over all of
+\texttt{distribute\_query}, we build an alias structure over all of
the weights and query it $k$ times to obtain the individual $k$ values
for the local queries. To avoid extra work on repeat, we stash this
alias structure in the buffer's local query object so it is available
@@ -1485,8 +1485,8 @@ compaction is triggered.
We configured our dynamized structure to use $s=8$, $N_B=12000$, $\delta
= .05$, $f = 16$, and the tiering layout policy. We compared our method
(\textbf{DE-IRS}) to Olken's method~\cite{olken89} on a B+Tree with
-aggregate weight counts (\textbf{AGG B+Tree}), as well as our besoke
-sampling solution from the previous chapter (\textbf{Besoke}) and a
+aggregate weight counts (\textbf{AGG B+Tree}), as well as our bespoke
+sampling solution from the previous chapter (\textbf{Bespoke}) and a
single static instance of the ISAM Tree (\textbf{ISAM}). Because IRS
is neither INV nor DDSP, the standard Bentley-Saxe Method has no way to
support deletes for it, and was not tested. All of our tested sampling
@@ -1494,7 +1494,7 @@ queries had a controlled selectivity of $\sigma = 0.01\%$ and $k=1000$.
The results of our performance benchmarking are in Figure~\ref{fig:irs}.
Figure~\ref{fig:irs-insert} shows that our general framework has
-comperable insertion performance to the specialized one, though loses
+comparable insertion performance to the specialized one, though it loses
slightly. This is to be expected, as \textbf{Bespoke} was hand-written for
specifically this type of query and data structure, and has hard-coded
data types, among other things. Despite losing to \textbf{Bespoke}
@@ -1525,7 +1525,7 @@ using a static Vantage Point Tree (VPTree)~\cite{vptree}. This is a
binary search tree with internal nodes that partition records based
on their distance to a selected point, called the vantage point. All
of the points within a fixed distance of the vantage point are covered
-by one subtree, and the points outside of this distance are covered by
+by one sub-tree, and the points outside of this distance are covered by
the other. This results in a hard-to-update data structure that can
be constructed in $\Theta(n \log n)$ time using repeated application of
the \texttt{quickselect} algorithm~\cite{quickselect} to partition the
@@ -1537,7 +1537,7 @@ Algorithm~\ref{alg:idsp-knn}, though using delete tagging instead of
tombstones. VPTree doesn't support efficient point lookups, and so to
work around this we add a hash map to each shard, mapping each record to
its location in storage, to ensure that deletes can be done efficiently
-in this way. This allows us to avoid cancelling deleted records in
+in this way. This allows us to avoid canceling deleted records in
the \texttt{combine} operation, as they can be skipped over during
\texttt{local\_query} directly. Because $k$-NN doesn't have any of the
distributional requirements of IRS, these local queries can return $k$
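For example, the combine step for $k$-NN might retain the $k$ nearest of all local candidates using a bounded max-heap, roughly as follows; the \texttt{distance} function is an assumed helper for the metric in use.
\begin{lstlisting}[language=C++]
#include <algorithm>
#include <cstddef>
#include <queue>
#include <vector>

template <typename R>
double distance(const R &a, const R &b);  // assumed metric

// Sketch of a k-NN combine: each local result holds up to k candidate
// records; the k nearest overall are retained via a bounded max-heap.
template <typename R>
std::vector<R> knn_combine(const std::vector<std::vector<R>> &locals,
                           const R &point, size_t k) {
    auto cmp = [&](const R &a, const R &b) {
        return distance(a, point) < distance(b, point);
    };
    std::priority_queue<R, std::vector<R>, decltype(cmp)> heap(cmp);
    for (const auto &local : locals) {
        for (const auto &r : local) {
            heap.push(r);
            if (heap.size() > k)
                heap.pop();  // evict the current farthest candidate
        }
    }
    std::vector<R> result;
    while (!heap.empty()) {
        result.push_back(heap.top());
        heap.pop();
    }
    std::reverse(result.begin(), result.end());  // nearest first
    return result;
}
\end{lstlisting}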
@@ -1599,7 +1599,7 @@ scheme used, with \textbf{BSM-VPTree} performing slightly \emph{better}
than our framework for query performance. The reason for this is
shown in Figure~\ref{fig:knn-insert}, where our framework outperforms
the Bentley-Saxe method in insertion performance. These results are
-atributible to our selection of framework configuration parameters,
+attributable to our selection of framework configuration parameters,
which are biased towards better insertion performance. Both dynamized
structures also outperform the dynamic baseline. Finally, as is becoming
a trend, Figure~\ref{fig:knn-space} shows that the storage requirements
@@ -1667,7 +1667,7 @@ The results of our evaluation are shown in
Figure~\ref{fig:eval-learned-index}. Figure~\ref{fig:rq-insert} shows
the insertion performance. DE-TS is the best in all cases, and the pure
BSM version of Triespline is the worst by a substantial margin. Of
-particular interest in this chart is the inconsisent performance of
+particular interest in this chart is the inconsistent performance of
ALEX, which does quite well on the \texttt{books} dataset, and poorly
on the others. It is worth noting that getting ALEX to run \emph{at
all} in some cases required a lot of trial and error and tuning, as its
@@ -1691,8 +1691,8 @@ performs horrendously compared to all of the other structures. The same
caveat from the previous paragraph applies here--PGM can be configured
for better performance. But it's notable that our framework-dynamized PGM
is able to beat PGM slightly in insertion performance without seeing the
-same massive degredation in query performance that PGM's native update
-suport does in its own update-optmized configuration.\footnote{
+same massive degradation in query performance that PGM's native update
+support does in its own update-optimized configuration.\footnote{
It's also worth noting that PGM implements tombstone deletes by
inserting a record whose key matches the record to be deleted, and
a particular ``tombstone'' value, rather than using a header. This
@@ -1712,7 +1712,7 @@ update support.
\subsection{String Search}
As a final example of a search problem, we consider exact string matching
-using the fast succinct trie~\cite{zhang18}. While updatable
+using the fast succinct trie~\cite{zhang18}. While dynamic
tries aren't terribly unusual~\cite{m-bonsai,dynamic-trie}, succinct data
structures, which attempt to approach an information-theoretic lower-bound
on their binary representation of the data, are usually static because
@@ -1725,7 +1725,7 @@ we consider the effectiveness of our generalized framework for them.
\centering
\subfloat[Update Throughput]{\includegraphics[width=.32\textwidth, trim=5mm 2mm 0 0]{img/fig-bs-fst-insert} \label{fig:fst-insert}}
\subfloat[Query Latency]{\includegraphics[width=.32\textwidth, trim=5mm 2mm 0 0]{img/fig-bs-fst-query} \label{fig:fst-query}}
- \subfloat[Index Overhead]{\includegraphics[width=.32\textwidth, trim=5mm 2mm 0 0]{img/fig-bs-fst-space} \label{fig:fst-size}}
+ \subfloat[Index Overhead]{\includegraphics[width=.32\textwidth, trim=5mm 2mm 0 0]{img/fig-bs-fst-space} \label{fig:fst-space}}
%\vspace{-3mm}
\caption{FST Evaluation}
\label{fig:fst-eval}
@@ -1739,9 +1739,9 @@ storage. Queries use no pre-processing and the local queries directly
search for a matching string. We use the framework's early abort feature
to stop as soon as the first result is found, and combine simply checks
whether this record is a tombstone or not. If it's a tombstone, then
-the lookup is considered to have no found the search string. Otherwise,
+the lookup is considered to have not found the search string. Otherwise,
the record is returned. This results in a dynamized structure with the
-following asympotic costs,
+following asymptotic costs,
\begin{align*}
@@ -1759,7 +1759,7 @@ The results are shown in Figure~\ref{fig:fst-eval}. As with range scans,
the Bentley-Saxe method shows horrible insertion performance relative to
our framework in Figure~\ref{fig:fst-insert}. Note that the significant
observed difference in update throughput for the two data sets is
-largely attributable to the relative sizes. The \texttt{usra} set is
+largely attributable to the relative sizes. The \texttt{US} set is
far larger than \texttt{english}. Figure~\ref{fig:fst-query} shows that
our write-optimized framework configuration is slightly outperformed in
query latency by the standard Bentley-Saxe dynamization, and that both
@@ -1767,7 +1767,7 @@ dynamized structures are quite a bit slower than the static structure for
queries. Finally, the storage costs for the data structures are shown
in Figure~\ref{fig:fst-space}. For the \texttt{english} data set, the
extra storage cost from decomposing the structure is quite significant,
-but the for \texttt{ursarc} set the sizes are quite comperable. It is
+but for the \texttt{US} set the sizes are quite comparable. It is
not unexpected that dynamization would add storage cost for succinct
(or any compressed) data structures, because the splitting of the records
across multiple data structures reduces the ability of the structure to
@@ -1792,10 +1792,10 @@ are inserted, it is necessary that each operation obtain a lock on
the root node of the tree~\cite{zhao22}. This makes this situation
a good use-case for the automatic concurrency support provided by our
framework. Figure~\ref{fig:irs-concurrency} shows the results of this
-benchmark for various numbers of concurreny query threads. As can be seen,
+benchmark for various numbers of concurrent query threads. As can be seen,
our framework supports a stable update throughput up to 32 query threads,
whereas the AGG B+Tree suffers from contention for the mutex and sees
-is performance degrade as the number of threads increases.
+its performance degrade as the number of threads increases.
\begin{figure}
\centering