\subsection{External and Concurrent Extensions}

\begin{figure*}[h]%
\centering
\subfloat[External Insertion Throughput]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-ext-insert.pdf} \label{fig:ext-insert}}
\subfloat[External Sampling Latency]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-ext-sample.pdf} \label{fig:ext-sample}} \\
\subfloat[Concurrent Insert Latency vs. Throughput]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-cc-irs-scale} \label{fig:con-latency}}
\subfloat[Concurrent Insert Throughput vs. Thread Count]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-cc-irs-thread} \label{fig:con-tput}}
\caption{External and Concurrent Extensions of DE-IRS}
\label{fig:irs-extensions}
\end{figure*}

We also tested our proof-of-concept implementations of the external and concurrent extensions to the dynamization framework discussed in Section~\ref{sec:discussion}.

First, we consider the external version of \texttt{DE-IRS}, compared against \texttt{AB-tree}. For this test, we configured \texttt{DE-IRS} to store the first four levels in memory and the remainder on disk. This configuration used at most 350~MiB of memory, including the mutable buffer and Bloom filters. We used tagging for deletes, to avoid random writes, but otherwise used the same standardized configuration parameters as in the previous tests; a sketch of this configuration appears at the end of this subsection. We used \texttt{O\_DIRECT} to disable OS caching, and CGroups to constrain the process to 1~GiB of memory, simulating a memory-constrained environment. \texttt{DE-IRS} did not use any caching layer; however, we did enable a 64~GiB cache for \texttt{AB-tree}. This was done in fairness to \texttt{AB-tree}, which requires per-sample tree traversals and is therefore far more reliant on caching for good performance.

We performed these tests with larger datasets: two synthetic datasets of 4~billion and 8~billion records (80~GiB and 162~GiB in size, respectively) and the full 2.6~billion records of the OSM dataset (55~GiB in size). The results of this testing can be seen in Figures~\ref{fig:ext-sample} and \ref{fig:ext-insert}. Despite using significantly less memory and having no caching layer, \texttt{DE-IRS} handily outperformed the dynamic baseline in both sampling and update performance.

Finally, we tested the multi-threaded insertion performance of our in-memory, concurrent implementation of \texttt{DE-IRS} against \texttt{AB-tree} configured to run entirely in memory. We used the synthetic uniform dataset (1~billion records) for this testing, and introduced a slight delay between inserts to avoid bottlenecking on the fetch-and-add within the mutable buffer; the insertion path exercised by this benchmark is sketched below. Figure~\ref{fig:con-latency} shows the latency vs.\ throughput curves for the two structures. Note that \texttt{AB-tree}'s results are cut off by the y-axis, as it performs significantly worse than \texttt{DE-IRS}. Figure~\ref{fig:con-tput} shows the insertion throughput as additional insertion threads are added. Both plots show linear scaling up to 3 or 4 threads, after which the throughput levels off. Further, even with as many as 32 threads, the system maintains a stable insertion throughput. Note that this implementation of concurrency is quite rudimentary and does not take advantage of concurrent merging opportunities, among other things. An implementation with support for these features will be discussed in Chapter~\ref{chap:tail-latency} and shown to perform significantly better.
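Returning to the external configuration described above, the following is a minimal illustrative sketch of how it might be expressed in code. The \texttt{DEIRSConfig} structure, its fields, and \texttt{open\_run\_file} are hypothetical stand-ins rather than the framework's actual API; only the \texttt{O\_DIRECT} flag, part of the standard Linux \texttt{open(2)} interface, is taken directly from the setup described above.

\begin{verbatim}
// A hypothetical sketch of the external test configuration; the
// DEIRSConfig type and its fields are illustrative stand-ins, not
// the framework's actual API.
#include <fcntl.h>    // open(), O_RDWR, O_CREAT, O_DIRECT
#include <sys/stat.h> // S_IRUSR, S_IWUSR
#include <cstddef>

struct DEIRSConfig {
    size_t in_memory_levels = 4;    // first four levels held in memory
    bool   tagged_deletes   = true; // tagging rather than tombstones
};

// Runs beyond the in-memory levels live in files opened with O_DIRECT,
// which bypasses the OS page cache so that measured I/O costs reflect
// the structure itself rather than hidden caching. Note that O_DIRECT
// requires I/O buffers aligned to the device's logical block size.
inline int open_run_file(const char *path) {
    return open(path, O_RDWR | O_CREAT | O_DIRECT, S_IRUSR | S_IWUSR);
}
\end{verbatim}

The 1~GiB memory limit was imposed externally via CGroups, and therefore does not appear in this sketch.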
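The concurrent insertion benchmark can be sketched in a similar fashion. The code below is a simplified illustration of the insertion path, assuming a hypothetical \texttt{MutableBuffer} in which each thread claims a slot with a single fetch-and-add; the delay duration and flush handling shown are illustrative, not the values or logic used in our implementation.

\begin{verbatim}
// A simplified sketch of the concurrent insertion path; MutableBuffer
// is a stand-in for the framework's buffer, shown only to illustrate
// the fetch-and-add on which inserting threads contend.
#include <atomic>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

struct MutableBuffer {
    std::vector<int64_t> slots;
    std::atomic<size_t>  tail{0};

    explicit MutableBuffer(size_t cap) : slots(cap) {}

    // Each inserting thread claims a slot with one fetch-and-add. With
    // no delay between inserts, contention on this single atomic (not
    // the structure itself) becomes the bottleneck.
    bool append(int64_t rec) {
        size_t idx = tail.fetch_add(1, std::memory_order_relaxed);
        if (idx >= slots.size()) return false; // full: flush elided
        slots[idx] = rec;
        return true;
    }
};

void insert_worker(MutableBuffer &buf, int64_t begin, int64_t end) {
    for (int64_t r = begin; r < end; r++) {
        if (!buf.append(r)) return; // flush-and-retry logic elided
        // The slight inter-insert delay used to avoid bottlenecking on
        // the fetch-and-add; the exact duration here is illustrative.
        std::this_thread::sleep_for(std::chrono::nanoseconds(100));
    }
}

int main() {
    MutableBuffer buf(1 << 20);
    std::vector<std::thread> workers;
    for (int64_t t = 0; t < 4; t++)
        workers.emplace_back(insert_worker, std::ref(buf),
                             t * 1000, (t + 1) * 1000);
    for (auto &w : workers) w.join();
}
\end{verbatim}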
Even with this rudimentary implementation of concurrency, however, \texttt{DE-IRS} is able to outperform \texttt{AB-tree} under all conditions tested.