\subsection{Comparison to Baselines} Next, we compare the performance of our dynamized sampling indices against Olken's method on an aggregate B+Tree. We also examine the query performance of a single instance of the SSI in question, to establish how much query performance is lost to the dynamization. Unless otherwise specified, IRS and WIRS queries are run with a selectivity of $0.1\%$. Additionally, the \texttt{OSM} dataset was downsampled to 500 million records, except for scalability tests. The synthetic uniform and zipfian datasets were generated with 1 billion records. As in the previous section, all benchmarks began by warming up the structure with $10\%$ of the total records; update performance was then measured over the insertion of the remaining records, including a mix of $5\%$ deletes. Query performance was measured after all records had been inserted. \begin{figure*} \centering \subfloat[Insertion Throughput vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-wss-insert} \label{fig:wss-insert}} \subfloat[Sampling Latency vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-wss-sample} \label{fig:wss-sample}} \\ \subfloat[Insertion Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-wss-insert} \label{fig:wss-insert-s}} \subfloat[Sampling Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-wss-sample} \label{fig:wss-sample-s}} \caption{Framework Comparisons to Baselines for WSS} \end{figure*} We begin with WSS. Figure~\ref{fig:wss-insert} shows that \texttt{DE-WSS} achieves about $85\%$ of \texttt{AGG B+Tree}'s insertion throughput on the \texttt{twitter} dataset, and outperforms it outright on the others.
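For context on why the static alias structure answers WSS queries so quickly: Walker's alias method builds two tables in $O(n)$ time and then draws each weighted sample with $O(1)$ work. The following is a minimal sketch of that technique (hypothetical code for illustration, not the implementation benchmarked here):

```python
import random

def build_alias(weights):
    """Build Walker/Vose alias tables in O(n) from positive weights."""
    n = len(weights)
    total = sum(weights)
    # Scale weights so the average bucket weight is exactly 1.
    scaled = [w * n / total for w in weights]
    prob, alias = [0.0] * n, [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        # The large bucket donates the mass needed to fill bucket s.
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:  # leftovers have weight ~1 up to rounding
        prob[i] = 1.0
    return prob, alias

def sample(prob, alias, rng=random):
    """Draw one weighted index in O(1): pick a bucket uniformly,
    then coin-flip between the bucket and its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

The $O(1)$ per-sample cost is what makes the static structure such a strong query-latency baseline; the trade-off, of course, is that the tables cannot be updated in place, which is what the dynamization addresses.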
Its sampling performance, shown in Figure~\ref{fig:wss-sample}, is also clearly superior to Olken's method, and is quite close to that of a single instance of the alias structure, indicating that the overhead due to the dynamization is quite low. We also considered the scalability of \texttt{DE-WSS} as the data size increases, in Figures~\ref{fig:wss-insert-s} and \ref{fig:wss-sample-s}. These tests were run with random samples of the \texttt{OSM} dataset of the specified sizes, and show that \texttt{DE-WSS} maintains its advantage over \texttt{AGG B+Tree} across a range of data sizes. One interesting point on Figure~\ref{fig:wss-sample-s} is the final data point for the alias structure, which is \emph{worse} than \texttt{DE-WSS}. This result reproduced consistently, and we believe it is due to NUMA effects. The 2 billion records were large enough that the alias structure built from them spanned two NUMA nodes on our server, whereas the dynamized structure was broken into pieces, none of which individually spanned a NUMA node, resulting in better performance. \begin{figure*} \centering \subfloat[Insertion Throughput vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-wirs-insert} \label{fig:wirs-insert}} \subfloat[Sampling Latency vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-wirs-sample} \label{fig:wirs-sample}} \caption{Framework Comparison to Baselines for WIRS} \end{figure*} In Figures~\ref{fig:wirs-insert} and \ref{fig:wirs-sample} we examine the performance of \texttt{DE-WIRS} compared to \texttt{AGG B+Tree} and an alias-augmented B+Tree. We see the same basic set of patterns in this case as we did with WSS: in terms of insertion performance, \texttt{AGG B+Tree} defeats our dynamized index on the \texttt{twitter} dataset, but loses on the others.
We can see that the alias-augmented B+Tree is much more expensive to build than an alias structure, which somewhat erodes the insertion performance advantage of the dynamized structure relative to the WSS case. For queries, we see that \texttt{AGG B+Tree} performs similarly for WIRS sampling as it did for WSS sampling, but the alias-augmented B+Tree structure is quite a bit slower at WIRS than the alias structure was at WSS. As a result, \texttt{DE-WIRS} defeats the dynamic baseline by a smaller margin in this test, but it remains superior in terms of sampling performance, and remains quite close in performance to the static structure, again indicating that the dynamization introduces relatively little overhead. \begin{figure*} \subfloat[Insertion Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-irs-insert} \label{fig:irs-insert-s}} \subfloat[Sampling Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-irs-sample} \label{fig:irs-sample-s}} \\ \subfloat[Insertion Throughput vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-irs-insert} \label{fig:irs-insert1}} \subfloat[Sampling Latency vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-irs-sample} \label{fig:irs-sample1}} \\ \subfloat[Delete Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-irs-delete} \label{fig:irs-delete}} \subfloat[Sampling Latency vs. Sample Size]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-irs-samplesize} \label{fig:irs-samplesize}} \caption{Framework Comparison to Baselines for IRS} \end{figure*} We next considered IRS queries. Figures~\ref{fig:irs-insert1} and \ref{fig:irs-sample1} show the results of our testing of single-threaded \texttt{DE-IRS} running in-memory against the in-memory ISAM tree and \texttt{AGG B+Tree}.
The ISAM tree structure can be efficiently bulk-loaded, which results in a much faster construction time than the alias structure or alias-augmented B+Tree. This gives it a significant update performance advantage, and we see in Figure~\ref{fig:irs-insert1} that \texttt{DE-IRS} beats \texttt{AGG B+Tree} by a significant margin in terms of insertion throughput. However, its query performance is significantly worse than the static baseline, and it defeats the B+Tree by only a small margin in sampling latency across most datasets. Note that the \texttt{OSM} dataset in these tests is half the size of the synthetic ones, which accounts for the performance differences. We also consider the scalability of inserts, queries, and deletes of \texttt{DE-IRS} compared to \texttt{AGG B+Tree} across a wide range of data sizes. Figure~\ref{fig:irs-insert-s} shows that \texttt{DE-IRS}'s insertion performance scales similarly with data size as the baseline's, and Figure~\ref{fig:irs-sample-s} tells a similar story for query performance. Figure~\ref{fig:irs-delete} compares the delete performance of the two structures, where \texttt{DE-IRS} is configured to use tagging. As expected, the B+Tree performs better here, as its delete cost is asymptotically superior to tagging. However, the plot does demonstrate that tagging delete performance also scales well with data size. Finally, Figure~\ref{fig:irs-samplesize} shows the effect of sample set size on average per-sample cost. We see that, for a single sample, the B+Tree is superior to \texttt{DE-IRS} because of the cost of the preliminary processing that our dynamized structure must do before it can begin to answer queries. However, as the sample set size increases, this up-front cost is amortized, with \texttt{DE-IRS} quickly defeating the dynamic structure in average per-sample latency.
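To make the mechanics concrete: IRS over a static sorted run reduces to two binary searches that bound the query range, after which each sample is a single $O(1)$ uniform draw, and tagging handles deletes by marking records and rejecting them at sampling time. A minimal sketch of this combination follows; \texttt{SortedRun} and its methods are invented names for illustration (and, for simplicity, the sampler assumes the queried range is not entirely deleted), not the paper's implementation:

```python
import bisect
import random

class SortedRun:
    """A static sorted run (ISAM-leaf-like) supporting IRS with tagged deletes."""
    def __init__(self, keys):
        self.keys = sorted(keys)
        self.tagged = [False] * len(self.keys)  # delete = set a flag, no restructuring

    def delete(self, key):
        """Tagging delete: locate the record and mark it; O(log n)."""
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.tagged[i] = True
            return True
        return False

    def irs_sample(self, lo, hi, k, rng=random):
        """Two binary searches bound [lo, hi]; each sample is then one
        uniform draw over the contiguous run, rejecting tagged records."""
        start = bisect.bisect_left(self.keys, lo)
        end = bisect.bisect_right(self.keys, hi)
        out = []
        while len(out) < k and start < end:
            i = rng.randrange(start, end)
            if not self.tagged[i]:  # rejection step for deleted records
                out.append(self.keys[i])
        return out
```

Note how the two binary searches are paid once per query regardless of $k$, which is exactly the amortization behavior visible in Figure~\ref{fig:irs-samplesize}.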
One other interesting note is the performance of the static ISAM tree, which begins on par with the B+Tree but also improves as the sample set size increases. This is due to cache effects. During the initial tree traversal, the B+Tree and the ISAM tree incur a similar number of cache misses. However, the ISAM tree performs its traversal only once, and then samples from data stored in a compact sorted array, so it benefits strongly from the cache. Olken's method, in contrast, must perform a full tree traversal for each sample, and so sees no significant improvement in per-sample performance as the sample set size grows.
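The per-sample traversal just described is the essence of Olken's method: each weighted sample is drawn by walking from the root to a leaf, choosing each child with probability proportional to its aggregate weight. The sketch below is a simplified, hypothetical rendering of that idea over an in-memory tree with exact per-child aggregates (the full method on a B+Tree also handles under-full nodes via rejection); \texttt{AggNode} and \texttt{olken\_sample} are illustrative names:

```python
import random

class AggNode:
    """Internal node of an aggregate tree: children plus their total weights."""
    def __init__(self, children, weights):
        self.children = children  # subtrees or leaf records, parallel to weights
        self.weights = weights    # aggregate weight of each child's subtree

def olken_sample(node, rng=random):
    """One weighted sample = one root-to-leaf traversal, descending into each
    child with probability proportional to its aggregate weight."""
    while isinstance(node, AggNode):
        r = rng.random() * sum(node.weights)
        for child, w in zip(node.children, node.weights):
            r -= w
            if r <= 0:
                node = child
                break
    return node  # a leaf record
```

Because every sample repeats the full root-to-leaf walk, the traversal cost is paid $k$ times for a sample set of size $k$, which is why Olken's method cannot amortize the way the ISAM tree does.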