\subsection{Comparison to Baselines} Next, we compare the performance of our dynamized sampling indices against Olken's method on an aggregate B+Tree. We also examine the query performance of a single instance of the SSI in question, to establish how much query performance is lost to the dynamization. Unless otherwise specified, IRS and WIRS queries are run with a selectivity of $0.1\%$. Additionally, the \texttt{OSM} dataset was downsampled to 500 million records, except for scalability tests. The synthetic uniform and zipfian datasets were generated with 1 billion records. As in the previous section, all benchmarks began by warming up the structure with $10\%$ of the total records; update performance was then measured over the insertion of the remaining records, including a mix of $5\%$ deletes. Query performance was measured after all records had been inserted. \begin{figure*} \centering \subfloat[Insertion Throughput vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-wss-insert} \label{fig:wss-insert}} \subfloat[Sampling Latency vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-wss-sample} \label{fig:wss-sample}} \\ \subfloat[Insertion Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-wss-insert} \label{fig:wss-insert-s}} \subfloat[Sampling Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-wss-sample} \label{fig:wss-sample-s}} \caption{Framework Comparisons to Baselines for WSS} \end{figure*} We begin with WSS. Figure~\ref{fig:wss-insert} shows that \texttt{DE-WSS} achieves about $85\%$ of \texttt{AGG B+Tree}'s insertion throughput on the \texttt{twitter} dataset, and outperforms it outright on the others.
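For context on why the static alias structure answers WSS queries so quickly: Walker's alias method builds two tables in $O(n)$ time and then draws each weighted sample with $O(1)$ work. The following is a minimal sketch of that technique (hypothetical code for illustration, not the implementation benchmarked here):

```python
import random

def build_alias(weights):
    """Build Walker/Vose alias tables in O(n) from positive weights."""
    n = len(weights)
    total = sum(weights)
    # Scale weights so the average bucket weight is exactly 1.
    scaled = [w * n / total for w in weights]
    prob, alias = [0.0] * n, [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        # The large bucket donates the mass needed to fill bucket s.
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:  # leftovers have weight ~1 up to rounding
        prob[i] = 1.0
    return prob, alias

def sample(prob, alias, rng=random):
    """Draw one weighted index in O(1): pick a bucket uniformly,
    then coin-flip between the bucket and its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

The $O(1)$ per-sample cost is what makes the static structure such a strong query-latency baseline; the trade-off, of course, is that the tables cannot be updated in place, which is what the dynamization addresses.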
Its sampling performance, shown in Figure~\ref{fig:wss-sample}, is also clearly superior to Olken's method, and is quite close to that of a single instance of the alias structure, indicating that the overhead due to the dynamization is quite low. We also considered the scalability of \texttt{DE-WSS} as the data size increases, in Figures~\ref{fig:wss-insert-s} and \ref{fig:wss-sample-s}. These tests were run with random samples of the \texttt{OSM} dataset of the specified sizes, and show that \texttt{DE-WSS} maintains its advantage over \texttt{AGG B+Tree} across a range of data sizes. One interesting point on Figure~\ref{fig:wss-sample-s} is the final data point for the alias structure, which is \emph{worse} than \texttt{DE-WSS}. This result reproduced consistently, and we believe it is due to NUMA effects. The 2 billion records were large enough that the alias structure built from them spanned two NUMA nodes on our server, whereas the dynamized structure was broken into pieces, none of which individually spanned a NUMA node, resulting in better performance. \begin{figure*} \centering \subfloat[Insertion Throughput vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-wirs-insert} \label{fig:wirs-insert}} \subfloat[Sampling Latency vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-wirs-sample} \label{fig:wirs-sample}} \caption{Framework Comparison to Baselines for WIRS} \end{figure*} In Figures~\ref{fig:wirs-insert} and \ref{fig:wirs-sample} we examine the performance of \texttt{DE-WIRS} compared to \texttt{AGG B+Tree} and an alias-augmented B+Tree. We see the same basic set of patterns in this case as we did with WSS: in terms of insertion performance, \texttt{AGG B+Tree} defeats our dynamized index on the \texttt{twitter} dataset, but loses on the others.
We can see that the alias-augmented B+Tree is much more expensive to build than an alias structure, which somewhat erodes the insertion performance advantage of the dynamized structure relative to the WSS case. For queries, we see that \texttt{AGG B+Tree} performs similarly for WIRS sampling as it did for WSS sampling, but the alias-augmented B+Tree structure is quite a bit slower at WIRS than the alias structure was at WSS. As a result, \texttt{DE-WIRS} defeats the dynamic baseline by a smaller margin in this test, but it remains superior in terms of sampling performance, and remains quite close in performance to the static structure, again indicating that the dynamization introduces relatively little overhead. \begin{figure*} \subfloat[Insertion Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-irs-insert} \label{fig:irs-insert-s}} \subfloat[Sampling Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-irs-sample} \label{fig:irs-sample-s}} \\ \subfloat[Insertion Throughput vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-irs-insert} \label{fig:irs-insert1}} \subfloat[Sampling Latency vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-irs-sample} \label{fig:irs-sample1}} \\ \subfloat[Delete Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-irs-delete} \label{fig:irs-delete}} \subfloat[Sampling Latency vs. Sample Size]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-irs-samplesize} \label{fig:irs-samplesize}} \caption{Framework Comparison to Baselines for IRS} \end{figure*} We next considered IRS queries. Figures~\ref{fig:irs-insert1} and \ref{fig:irs-sample1} show the results of our testing of single-threaded \texttt{DE-IRS} running in-memory against the in-memory ISAM tree and \texttt{AGG B+Tree}.
The ISAM tree structure can be efficiently bulk-loaded, which results in a much faster construction time than the alias structure or alias-augmented B+Tree. This gives it a significant update performance advantage, and we see in Figure~\ref{fig:irs-insert1} that \texttt{DE-IRS} beats \texttt{AGG B+Tree} by a significant margin in terms of insertion throughput. However, its query performance is significantly worse than the static baseline, and it defeats the B+Tree by only a small margin in sampling latency across most datasets. Note that the \texttt{OSM} dataset in these tests is half the size of the synthetic ones, which accounts for the performance differences. We also consider the scalability of inserts, queries, and deletes of \texttt{DE-IRS} compared to \texttt{AGG B+Tree} across a wide range of data sizes. Figure~\ref{fig:irs-insert-s} shows that \texttt{DE-IRS}'s insertion performance scales similarly with data size as the baseline's, and Figure~\ref{fig:irs-sample-s} tells a similar story for query performance. Figure~\ref{fig:irs-delete} compares the delete performance of the two structures, where \texttt{DE-IRS} is configured to use tagging. As expected, the B+Tree performs better here, as its delete cost is asymptotically superior to tagging. However, the plot does demonstrate that tagging delete performance also scales well with data size. Finally, Figure~\ref{fig:irs-samplesize} shows the effect of sample set size on average per-sample cost. We see that, for a single sample, the B+Tree is superior to \texttt{DE-IRS} because of the cost of the preliminary processing that our dynamized structure must do before it can begin to answer queries. However, as the sample set size increases, this up-front cost is amortized, with \texttt{DE-IRS} quickly defeating the dynamic structure in average per-sample latency.
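To make the mechanics concrete: IRS over a static sorted run reduces to two binary searches that bound the query range, after which each sample is a single $O(1)$ uniform draw, and tagging handles deletes by marking records and rejecting them at sampling time. A minimal sketch of this combination follows; \texttt{SortedRun} and its methods are invented names for illustration (and, for simplicity, the sampler assumes the queried range is not entirely deleted), not the paper's implementation:

```python
import bisect
import random

class SortedRun:
    """A static sorted run (ISAM-leaf-like) supporting IRS with tagged deletes."""
    def __init__(self, keys):
        self.keys = sorted(keys)
        self.tagged = [False] * len(self.keys)  # delete = set a flag, no restructuring

    def delete(self, key):
        """Tagging delete: locate the record and mark it; O(log n)."""
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.tagged[i] = True
            return True
        return False

    def irs_sample(self, lo, hi, k, rng=random):
        """Two binary searches bound [lo, hi]; each sample is then one
        uniform draw over the contiguous run, rejecting tagged records."""
        start = bisect.bisect_left(self.keys, lo)
        end = bisect.bisect_right(self.keys, hi)
        out = []
        while len(out) < k and start < end:
            i = rng.randrange(start, end)
            if not self.tagged[i]:  # rejection step for deleted records
                out.append(self.keys[i])
        return out
```

Note how the two binary searches are paid once per query regardless of $k$, which is exactly the amortization behavior visible in Figure~\ref{fig:irs-samplesize}.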
One other interesting note is the performance of the static ISAM tree, which begins on par with the B+Tree but also improves as the sample set size increases. This is due to cache effects. During the initial tree traversal, the B+Tree and the ISAM tree incur a similar number of cache misses. However, the ISAM tree performs its traversal only once, and then samples from data stored in a compact sorted array, so it benefits strongly from the cache. Olken's method, in contrast, must perform a full tree traversal for each sample, and so sees no significant improvement in per-sample performance as the sample set size grows.
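The per-sample traversal just described is the essence of Olken's method: each weighted sample is drawn by walking from the root to a leaf, choosing each child with probability proportional to its aggregate weight. The sketch below is a simplified, hypothetical rendering of that idea over an in-memory tree with exact per-child aggregates (the full method on a B+Tree also handles under-full nodes via rejection); \texttt{AggNode} and \texttt{olken\_sample} are illustrative names:

```python
import random

class AggNode:
    """Internal node of an aggregate tree: children plus their total weights."""
    def __init__(self, children, weights):
        self.children = children  # subtrees or leaf records, parallel to weights
        self.weights = weights    # aggregate weight of each child's subtree

def olken_sample(node, rng=random):
    """One weighted sample = one root-to-leaf traversal, descending into each
    child with probability proportional to its aggregate weight."""
    while isinstance(node, AggNode):
        r = rng.random() * sum(node.weights)
        for child, w in zip(node.children, node.weights):
            r -= w
            if r <= 0:
                node = child
                break
    return node  # a leaf record
```

Because every sample repeats the full root-to-leaf walk, the traversal cost is paid $k$ times for a sample set of size $k$, which is why Olken's method cannot amortize the way the ISAM tree does.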