Diffstat (limited to 'chapters/sigmod23/exp-baseline.tex')
| -rw-r--r-- | chapters/sigmod23/exp-baseline.tex | 148 |
1 files changed, 86 insertions, 62 deletions
diff --git a/chapters/sigmod23/exp-baseline.tex b/chapters/sigmod23/exp-baseline.tex
index 9e7929c..da62766 100644
--- a/chapters/sigmod23/exp-baseline.tex
+++ b/chapters/sigmod23/exp-baseline.tex
@@ -1,16 +1,17 @@
 \subsection{Comparison to Baselines}
-Next, the performance of indexes extended using the framework is compared
-against tree sampling on the aggregate B+tree, as well as problem-specific
-SSIs for WSS, WIRS, and IRS queries. Unless otherwise specified, IRS and WIRS
-queries were executed with a selectivity of $0.1\%$ and 500 million randomly
-selected records from the OSM dataset were used. The uniform and zipfian
-synthetic datasets were 1 billion records in size. All benchmarks warmed up the
-data structure by inserting 10\% of the records, and then measured the
-throughput inserting the remaining records, while deleting 5\% of them over the
-course of the benchmark. Once all records were inserted, the sampling
-performance was measured. The reported update throughputs were calculated using
-both inserts and deletes, following the warmup period.
+Next, we compare the performance of our dynamized sampling indices with
+Olken's method on an aggregate B+Tree. We also examine the query
+performance of a single instance of the SSI in question, to establish
+how much query performance is lost to dynamization. Unless otherwise
+specified, IRS and WIRS queries are run with a selectivity of $0.1\%$,
+and the \texttt{OSM} dataset was downsampled to 500 million records,
+except for scalability tests. The synthetic uniform and zipfian datasets
+were generated with 1 billion records. As in the previous section, all
+benchmarks began by warming up the structure with $10\%$ of the total
+records; update performance was then measured over the insertion of the
+remaining records, including a mix of $5\%$ deletes. Query performance
+was measured after all records had been inserted.
 \begin{figure*}
 \centering
@@ -21,15 +22,25 @@ both inserts and deletes, following the warmup period.
 \caption{Framework Comparisons to Baselines for WSS}
 \end{figure*}
-Starting with WSS, Figure~\ref{fig:wss-insert} shows that the DE-WSS structure
-is competitive with the AGG B+tree in terms of insertion performance, achieving
-about 85\% of the AGG B+tree's insertion throughput on the Twitter dataset, and
-beating it by similar margins on the other datasets. In terms of sampling
-performance in Figure~\ref{fig:wss-sample}, it beats the B+tree handily, and
-compares favorably to the static alias structure. Figures~\ref{fig:wss-insert-s}
-and \ref{fig:wss-sample-s} show the performance scaling of the three structures as
-the dataset size increases. All of the structures exhibit the same type of
-performance degradation with respect to dataset size.
+We begin with WSS. Figure~\ref{fig:wss-insert} shows that
+\texttt{DE-WSS} achieves about $85\%$ of \texttt{AGG B+Tree}'s insertion
+throughput on the \texttt{twitter} dataset, and outperforms it outright
+on the others. Its sampling performance, shown in
+Figure~\ref{fig:wss-sample}, is clearly superior to Olken's method and
+quite close to that of a single instance of the alias structure,
+indicating that the overhead due to the dynamization is quite low.
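+
+To make the static baseline concrete, the following is a minimal C++
+sketch of the alias structure (Walker's alias method, built with Vose's
+algorithm). It is illustrative only, not the implementation we
+benchmarked, but it shows the essential trade-off: sampling costs
+$O(1)$, while any update invalidates the tables and forces an $O(n)$
+rebuild, which is exactly the cost our dynamization framework amortizes.
+\begin{verbatim}
+#include <random>
+#include <vector>
+
+// Static weighted set sampling: O(n) construction, O(1) per sample.
+struct Alias {
+    std::vector<double> prob;   // acceptance probability per bucket
+    std::vector<size_t> alias;  // fallback item for each bucket
+
+    explicit Alias(const std::vector<double> &w)
+        : prob(w.size()), alias(w.size()) {
+        size_t n = w.size();
+        double total = 0;
+        for (double x : w) total += x;
+        std::vector<double> p(n);
+        std::vector<size_t> small, large;
+        for (size_t i = 0; i < n; i++) {
+            p[i] = w[i] * n / total;          // normalize to mean 1
+            (p[i] < 1.0 ? small : large).push_back(i);
+        }
+        while (!small.empty() && !large.empty()) {
+            size_t s = small.back(); small.pop_back();
+            size_t l = large.back(); large.pop_back();
+            prob[s] = p[s];     // keep s with probability p[s] ...
+            alias[s] = l;       // ... otherwise report item l
+            p[l] -= 1.0 - p[s]; // l donates its excess mass to s
+            (p[l] < 1.0 ? small : large).push_back(l);
+        }
+        for (size_t i : small) prob[i] = 1.0; // leftovers are exact
+        for (size_t i : large) prob[i] = 1.0;
+    }
+
+    // Draw one index: uniform bucket, then a biased coin flip.
+    size_t sample(std::mt19937_64 &rng) const {
+        std::uniform_int_distribution<size_t> bucket(0, prob.size() - 1);
+        std::uniform_real_distribution<double> coin(0.0, 1.0);
+        size_t b = bucket(rng);
+        return coin(rng) < prob[b] ? b : alias[b];
+    }
+};
+\end{verbatim}
+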
+We also considered the scalability of \texttt{DE-WSS} as the data size
+increases, in Figures~\ref{fig:wss-insert-s} and \ref{fig:wss-sample-s}.
+These tests were run with random samples of the \texttt{OSM} dataset of
+the specified sizes, and show that \texttt{DE-WSS} maintains its
+advantage over \texttt{AGG B+Tree} across a range of data sizes. One
+interesting point in Figure~\ref{fig:wss-sample-s} is the final data
+point for the alias structure, which is \emph{worse} than
+\texttt{DE-WSS}. This point reproduced consistently, and we believe it
+is a NUMA effect: the 2 billion records were large enough that the alias
+structure built from them spanned two NUMA nodes on our server, whereas
+the dynamized structure was broken into pieces, none of which
+individually spanned a NUMA node, resulting in better performance.
 \begin{figure*}
 \centering
@@ -38,16 +49,22 @@
 \caption{Framework Comparison to Baselines for WIRS}
 \end{figure*}
-Figures~\ref{fig:wirs-insert} and \ref{fig:wirs-sample} show the performance of
-the DE-WIRS index, relative to the AGG B+tree and the alias-augmented B+tree. This
-example shows the same pattern of behavior as was seen with DE-WSS, though the
-margin between the DE-WIRS and its corresponding SSI is much narrower.
-Additionally, the constant factors associated with the construction cost of the
-alias-augmented B+tree are much larger than the alias structure. The loss of
-insertion performance due to this is seen clearly in Figure~\ref{fig:wirs-insert}, where
-the margin of advantage between DE-WIRS and the AGG B+tree in insertion
-throughput shrinks compared to the DE-WSS index, and the AGG B+tree's advantage
-on the Twitter dataset is expanded.
+In Figures~\ref{fig:wirs-insert} and \ref{fig:wirs-sample} we examine
+the performance of \texttt{DE-WIRS} compared to \texttt{AGG B+Tree} and
+an alias-augmented B+Tree. We see the same basic pattern here as we did
+with WSS: \texttt{AGG B+Tree} defeats our dynamized index on the
+\texttt{twitter} dataset in terms of insertion performance, but loses to
+it on the others. The alias-augmented B+Tree is much more expensive to
+build than a plain alias structure, however, and so \texttt{DE-WIRS}'s
+insertion advantage over \texttt{AGG B+Tree} is somewhat eroded relative
+to \texttt{DE-WSS}'s. On the query side, \texttt{AGG B+Tree} performs
+about the same for WIRS sampling as it did for WSS, but the
+alias-augmented B+Tree is quite a bit slower at WIRS than the alias
+structure was at WSS. As a result, \texttt{DE-WIRS} defeats
+\texttt{AGG B+Tree} by a smaller margin in this test, but it remains
+superior in sampling performance, and is still quite close to the static
+structure, again indicating that the dynamization introduces relatively
+little overhead.
 \begin{figure*}
 \subfloat[Insertion Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-irs-insert} \label{fig:irs-insert-s}}
@@ -61,38 +78,45 @@ on the Twitter dataset is expanded.
 \caption{Framework Comparison to Baselines for IRS}
 \end{figure*}
-Finally, Figures~\ref{fig:irs-insert1} and \ref{fig:irs-sample1} show a
-comparison of the in-memory DE-IRS index against the in-memory ISAM tree and the AGG
-B+tree for answering IRS queries. The cost of bulk-loading the ISAM tree is less
-than the cost of building the alias structure, or the alias-augmented B+tree, and
-so here DE-IRS defeats the AGG B+tree by wider margins in insertion throughput,
-though the margin narrows significantly in terms of sampling performance
-advantage.
-DE-IRS was further tested to evaluate scalability.
-Figure~\ref{fig:irs-insert-s} shows average insertion throughput,
-Figure~\ref{fig:irs-delete} shows average delete latency (under tagging), and
-Figure~\ref{fig:irs-sample-s} shows average sampling latencies for DE-IRS and
-AGG B+tree over a range of data sizes. In all cases, DE-IRS and B+tree show
-similar patterns of performance degradation as the datasize grows. Note that
-the delete latencies of DE-IRS are worse than AGG B+tree, because of the B+tree's
-cheaper point-lookups.
+We next considered IRS queries. Figures~\ref{fig:irs-insert1} and
+\ref{fig:irs-sample1} show the results of our tests of single-threaded
+\texttt{DE-IRS} running in-memory against the in-memory ISAM tree and
+\texttt{AGG B+Tree}. The ISAM tree can be bulk-loaded efficiently, which
+makes it much cheaper to construct than the alias structure or the
+alias-augmented B+Tree. This gives \texttt{DE-IRS} a significant update
+performance advantage, and we see in Figure~\ref{fig:irs-insert1} that
+it beats \texttt{AGG B+Tree} by a significant margin in insertion
+throughput. However, its query performance is significantly worse than
+the static baseline's, and it defeats the B+tree by only a small margin
+in sampling latency across most datasets. Note that the \texttt{OSM}
+dataset in these tests is half the size of the synthetic ones, which
+accounts for the performance differences.
 
-Figure~\ref{fig:irs-sample-s}
-also includes one other point of interest: the sampling performance of
-DE-IRS \emph{improves} when the data size grows from one million to ten million
-records. While at first glance the performance increase may appear paradoxical,
-it actually demonstrates an important result concerning the effect of the
-unsorted mutable buffer on index performance. At one million records, the
-buffer constitutes approximately 1\% of the total data size; this results in
-the buffer being sampled from with greater frequency (as it has more total
-weight) than would be the case with larger data. The greater the frequency of
-buffer sampling, the more rejections will occur, and the worse the sampling
-performance will be. This illustrates the importance of keeping the buffer
-small, even when a scan is not used for buffer sampling. Finally,
-Figure~\ref{fig:irs-samplesize} shows the decreasing per-sample cost as the
-number of records requested by a sampling query grows for DE-IRS, compared to
-AGG B+tree. Note that DE-IRS benefits significantly more from batching samples
-than AGG B+tree, and that the improvement is greatest up to $k=100$ samples per
-query.
+We also consider the scalability of inserts, queries, and deletes for
+\texttt{DE-IRS} compared to \texttt{AGG B+Tree} across a wide range of
+data sizes. Figure~\ref{fig:irs-insert-s} shows that \texttt{DE-IRS}'s
+insertion performance scales with data size similarly to the baseline's,
+and Figure~\ref{fig:irs-sample-s} tells a similar story for query
+performance. Figure~\ref{fig:irs-delete-s} compares the delete
+performance of the two structures, with \texttt{DE-IRS} configured to
+use tagging. As expected, the B+tree performs better here, as its delete
+cost is asymptotically superior to tagging's. The plot does demonstrate,
+however, that tagged deletes also scale well with data size.
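+
+To clarify what tagging entails, the sketch below gives an illustrative
+C++ rendering; the \texttt{Shard} type here is a hypothetical stand-in
+for one immutable level of the dynamized structure, not our actual
+interface. A tagged delete must probe every level in the worst case,
+which is why it is asymptotically more expensive than the single point
+lookup the B+tree performs.
+\begin{verbatim}
+#include <algorithm>
+#include <cstdint>
+#include <vector>
+
+struct Record { uint64_t key; double weight; bool deleted; };
+
+// Stand-in for one immutable level: records sorted by key, with
+// whatever point lookup the underlying SSI supports.
+struct Shard {
+    std::vector<Record> data;
+    Record *find(uint64_t key) {
+        auto it = std::lower_bound(
+            data.begin(), data.end(), key,
+            [](const Record &r, uint64_t k) { return r.key < k; });
+        return (it != data.end() && it->key == key) ? &*it : nullptr;
+    }
+};
+
+// Tagging: locate the record (newest level first) and set a deleted
+// bit. Tagged records are rejected during sampling and physically
+// dropped at the next reconstruction; each delete costs one lookup
+// per level rather than the B+tree's single lookup.
+bool tagged_delete(std::vector<Shard> &levels, uint64_t key) {
+    for (auto &level : levels) {
+        if (Record *r = level.find(key)) {
+            r->deleted = true;
+            return true;
+        }
+    }
+    return false;  // key not present
+}
+\end{verbatim}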
+
+Finally, Figure~\ref{fig:irs-samplesize} shows the effect of sample set
+size on average per-sample cost. We see that, for a single sample, the
+B+tree is superior to \texttt{DE-IRS} because of the preliminary
+processing that our dynamized structure must perform before it can begin
+answering a query. As the sample set size increases, however, this
+up-front cost is amortized, and \texttt{DE-IRS} quickly overtakes the
+B+tree in average per-sample latency. One other interesting note is the
+performance of the static ISAM tree, which begins on par with the B+Tree
+but also improves as the sample set size increases. This is a cache
+effect. During the initial tree traversal, the B+tree and the ISAM tree
+incur a similar number of cache misses. However, the ISAM tree performs
+its traversal only once, and then samples from data stored in a compact
+sorted array, so it benefits strongly from the cache. Olken's method, in
+contrast, must perform a full tree traversal for each sample, and so
+sees no significant improvement in per-sample performance as the sample
+set size grows.
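+
+The following sketch illustrates this asymmetry as a simple cost model
+over a sorted array (illustrative C++; these are not the interfaces of
+our implementation). The Olken-style loop pays a logarithmic descent per
+sample, standing in for a root-to-leaf traversal of the aggregate
+B+tree, while the ISAM-style loop resolves the query range once and then
+draws every sample with a single cache-friendly array read.
+\begin{verbatim}
+#include <algorithm>
+#include <cassert>
+#include <cstdint>
+#include <random>
+#include <vector>
+
+static size_t lower_idx(const std::vector<uint64_t> &d, uint64_t key) {
+    return std::lower_bound(d.begin(), d.end(), key) - d.begin();
+}
+
+// One logarithmic search per sample, mimicking Olken's per-sample
+// root-to-leaf descent: no work is shared across the k samples.
+std::vector<uint64_t> olken_style(const std::vector<uint64_t> &data,
+                                  uint64_t lo, uint64_t hi, size_t k,
+                                  std::mt19937_64 &rng) {
+    std::vector<uint64_t> out;
+    out.reserve(k);
+    for (size_t i = 0; i < k; i++) {
+        size_t a = lower_idx(data, lo);  // repeated for every sample
+        size_t b = lower_idx(data, hi);
+        assert(a < b);                   // assume a non-empty range
+        std::uniform_int_distribution<size_t> pick(a, b - 1);
+        out.push_back(data[pick(rng)]);
+    }
+    return out;
+}
+
+// The range is resolved once; each of the k samples is then an O(1)
+// read from a compact sorted array, so the traversal cost (and its
+// cache misses) is amortized across the whole sample set.
+std::vector<uint64_t> isam_style(const std::vector<uint64_t> &data,
+                                 uint64_t lo, uint64_t hi, size_t k,
+                                 std::mt19937_64 &rng) {
+    size_t a = lower_idx(data, lo);
+    size_t b = lower_idx(data, hi);
+    assert(a < b);
+    std::uniform_int_distribution<size_t> pick(a, b - 1);
+    std::vector<uint64_t> out;
+    out.reserve(k);
+    for (size_t i = 0; i < k; i++)
+        out.push_back(data[pick(rng)]);
+    return out;
+}
+\end{verbatim}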