\subsection{Comparison to Baselines} Next, the performance of indexes extended using the framework is compared against tree sampling on the aggregate B+tree, as well as against problem-specific SSIs for WSS, WIRS, and IRS queries. Unless otherwise specified, IRS and WIRS queries were executed with a selectivity of $0.1\%$, and 500 million randomly selected records from the OSM dataset were used. The uniform and Zipfian synthetic datasets contained 1 billion records each. All benchmarks warmed up the data structure by inserting 10\% of the records, then measured the throughput of inserting the remaining records while deleting 5\% of them over the course of the benchmark. Once all records were inserted, sampling performance was measured. The reported update throughputs were calculated over both inserts and deletes, following the warmup period. \begin{figure*} \centering \subfloat[Insertion Throughput vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-wss-insert} \label{fig:wss-insert}} \subfloat[Sampling Latency vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-wss-sample} \label{fig:wss-sample}} \\ \subfloat[Insertion Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-wss-insert} \label{fig:wss-insert-s}} \subfloat[Sampling Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-wss-sample} \label{fig:wss-sample-s}} \caption{Framework Comparisons to Baselines for WSS} \end{figure*} Starting with WSS, Figure~\ref{fig:wss-insert} shows that the DE-WSS structure is competitive with the AGG B+tree in terms of insertion performance, achieving about 85\% of the AGG B+tree's insertion throughput on the Twitter dataset and beating it by similar margins on the other datasets. In terms of sampling performance, Figure~\ref{fig:wss-sample} shows that DE-WSS beats the B+tree handily and compares favorably with the static alias structure.
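For reference, the static alias structure used as the WSS baseline supports $O(1)$ weighted sampling after an $O(n)$ build. The following is a minimal sketch of Walker's alias method; the function names and layout are illustrative and are not taken from the evaluated implementation.

```python
import random

def build_alias(weights):
    """Build alias tables for Walker's alias method in O(n).

    Each bucket i keeps a probability prob[i] and a fallback
    index alias[i]; buckets with scaled weight below 1 borrow
    mass from buckets with scaled weight above 1.
    """
    n = len(weights)
    total = sum(weights)
    prob = [w * n / total for w in weights]  # scaled so the mean is 1
    alias = [0] * n
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l
        prob[l] -= 1.0 - prob[s]  # donate mass from the large bucket
        (small if prob[l] < 1.0 else large).append(l)
    return prob, alias

def sample(prob, alias):
    """Draw one index with probability proportional to its weight, in O(1)."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]
```

Because every draw touches exactly one bucket, sampling cost is independent of the weight distribution; the trade-off, as the insertion results above show, is that the structure must be rebuilt to accommodate updates.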
Figures~\ref{fig:wss-insert-s} and \ref{fig:wss-sample-s} show the performance scaling of the three structures as the dataset size increases. All three structures exhibit the same pattern of performance degradation as the dataset grows. \begin{figure*} \centering \subfloat[Insertion Throughput vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-wirs-insert} \label{fig:wirs-insert}} \subfloat[Sampling Latency vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-wirs-sample} \label{fig:wirs-sample}} \caption{Framework Comparison to Baselines for WIRS} \end{figure*} Figures~\ref{fig:wirs-insert} and \ref{fig:wirs-sample} show the performance of the DE-WIRS index relative to the AGG B+tree and the alias-augmented B+tree. This comparison shows the same pattern of behavior as was seen with DE-WSS, though the margin between DE-WIRS and its corresponding SSI is much narrower. Additionally, the constant factors associated with the construction cost of the alias-augmented B+tree are much larger than those of the alias structure. The resulting loss of insertion performance is clearly visible in Figure~\ref{fig:wirs-insert}: the margin of advantage in insertion throughput between DE-WIRS and the AGG B+tree shrinks relative to the DE-WSS index, and the AGG B+tree's advantage on the Twitter dataset grows. \begin{figure*} \centering \subfloat[Insertion Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-irs-insert} \label{fig:irs-insert-s}} \subfloat[Sampling Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-irs-sample} \label{fig:irs-sample-s}} \\ \subfloat[Insertion Throughput vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-irs-insert} \label{fig:irs-insert1}} \subfloat[Sampling Latency vs. 
Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-irs-sample} \label{fig:irs-sample1}} \\ \subfloat[Delete Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-irs-delete} \label{fig:irs-delete}} \subfloat[Sampling Latency vs. Sample Size]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-irs-samplesize} \label{fig:irs-samplesize}} \caption{Framework Comparison to Baselines for IRS} \end{figure*} Finally, Figures~\ref{fig:irs-insert1} and \ref{fig:irs-sample1} compare the in-memory DE-IRS index against the in-memory ISAM tree and the AGG B+tree for answering IRS queries. The cost of bulk-loading the ISAM tree is less than that of building either the alias structure or the alias-augmented B+tree, and so here DE-IRS beats the AGG B+tree by wider margins in insertion throughput, though its sampling performance advantage narrows significantly. DE-IRS was further tested to evaluate scalability. Figure~\ref{fig:irs-insert-s} shows average insertion throughput, Figure~\ref{fig:irs-delete} shows average delete latency (under tagging), and Figure~\ref{fig:irs-sample-s} shows average sampling latency for DE-IRS and the AGG B+tree over a range of data sizes. In all cases, DE-IRS and the B+tree show similar patterns of performance degradation as the data size grows. Note that the delete latencies of DE-IRS are worse than those of the AGG B+tree, because of the B+tree's cheaper point lookups. Figure~\ref{fig:irs-sample-s} also includes one other point of interest: the sampling performance of DE-IRS \emph{improves} when the data size grows from one million to ten million records. While at first glance this performance increase may appear paradoxical, it actually demonstrates an important result concerning the effect of the unsorted mutable buffer on index performance. 
At one million records, the buffer constitutes approximately 1\% of the total data size; because it therefore holds a correspondingly larger share of the total weight, it is sampled from more frequently than would be the case with larger data. The more frequently the buffer is sampled, the more rejections occur, and the worse sampling performance becomes. This illustrates the importance of keeping the buffer small, even when a scan is not used for buffer sampling. Finally, Figure~\ref{fig:irs-samplesize} shows how the per-sample cost of DE-IRS decreases as the number of records requested by a sampling query grows, compared to the AGG B+tree. Note that DE-IRS benefits significantly more from batching samples than the AGG B+tree does, and that the improvement is greatest up to $k=100$ samples per query.
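The rejection cost attributed to the buffer above can be illustrated with a toy model. The sketch below is a hedged illustration, not the paper's implementation: it assumes weighted sampling from an unsorted buffer is done by picking a record uniformly and accepting it with probability $w / w_{max}$, so the expected number of attempts per accepted sample is $n \cdot w_{max} / \sum_i w_i$. The skewed synthetic weights are an assumption chosen to make the effect visible; every query routed to the buffer pays this rejection overhead, which is why a relatively heavier buffer hurts sampling throughput.

```python
import random

def buffer_rejection_sample(weights, rng):
    """Weighted sampling from an unsorted buffer by rejection:
    pick a record uniformly, accept with probability w / w_max.
    Returns the sampled index and the number of attempts made."""
    w_max = max(weights)
    attempts = 0
    while True:
        attempts += 1
        i = rng.randrange(len(weights))
        if rng.random() < weights[i] / w_max:
            return i, attempts

rng = random.Random(7)
# A skewed buffer: one heavy record inflates w_max, so most
# uniform picks are rejected and attempts per sample grow.
weights = [1.0] * 99 + [100.0]
total_attempts = 0
for _ in range(5000):
    _, a = buffer_rejection_sample(weights, rng)
    total_attempts += a
print(total_attempts / 5000)  # roughly n * w_max / sum(w) = 100*100/199, about 50
```

Under this model, the per-sample cost of hitting the buffer is independent of buffer size but multiplies with how often the buffer is chosen, which is proportional to its share of the total weight, consistent with the behavior observed in Figure~\ref{fig:irs-sample-s}.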