\subsection{External and Concurrent Extensions}

\begin{figure*}[h]%
\centering
\subfloat[External Insertion Throughput]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-ext-insert.pdf} \label{fig:ext-insert}}
\subfloat[External Sampling Latency]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-ext-sample.pdf} \label{fig:ext-sample}} \\
\subfloat[Concurrent Insert Latency vs. Throughput]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-cc-irs-scale} \label{fig:con-latency}}
\subfloat[Concurrent Insert Throughput vs. Thread Count]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-cc-irs-thread} \label{fig:con-tput}}
\caption{External and Concurrent Extensions of DE-IRS}
\label{fig:irs-extensions}
\end{figure*}

We also tested our proof-of-concept implementations of the external and concurrent extensions to the dynamization framework discussed in Section~\ref{sec:discussion}.

First, we consider the external version of \texttt{DE-IRS}, compared against \texttt{AB-tree}. For this test, we configured \texttt{DE-IRS} to store the first four levels in memory and the remainder on disk. This configuration used at most 350~MiB of memory, including the mutable buffer and Bloom filters. We used tagging for deletes, to avoid random writes, but otherwise used the same standardized configuration parameters as in the previous tests; a sketch of this configuration appears at the end of this subsection. We used \texttt{O\_DIRECT} to disable OS caching, and CGroups to constrain the process to 1~GiB of memory, simulating a memory-constrained environment. \texttt{DE-IRS} did not use any caching layer; however, we did enable a 64~GiB cache for \texttt{AB-tree}. This was done in fairness to \texttt{AB-tree}, which requires per-sample tree traversals and is therefore far more reliant on caching for good performance.

We performed these tests with larger datasets: two synthetic datasets of 4~billion and 8~billion records (80~GiB and 162~GiB in size, respectively) and the full 2.6~billion records of the OSM dataset (55~GiB in size). The results of this testing can be seen in Figures~\ref{fig:ext-sample} and \ref{fig:ext-insert}. Despite using significantly less memory and having no caching layer, \texttt{DE-IRS} handily outperformed the dynamic baseline in both sampling and update performance.

Finally, we tested the multi-threaded insertion performance of our in-memory, concurrent implementation of \texttt{DE-IRS} against \texttt{AB-tree} configured to run entirely in memory. We used the synthetic uniform dataset (1~billion records) for this testing, and introduced a slight delay between inserts to avoid bottlenecking on the fetch-and-add within the mutable buffer; the insertion path exercised by this benchmark is sketched below. Figure~\ref{fig:con-latency} shows the latency vs.\ throughput curves for the two structures. Note that \texttt{AB-tree}'s results are cut off by the y-axis, as it performs significantly worse than \texttt{DE-IRS}. Figure~\ref{fig:con-tput} shows the insertion throughput as additional insertion threads are added. Both plots show linear scaling up to 3 or 4 threads, after which the throughput levels off. Further, even with as many as 32 threads, the system maintains a stable insertion throughput. Note that this implementation of concurrency is quite rudimentary and does not take advantage of concurrent merging opportunities, among other things. An implementation with support for these features will be discussed in Chapter~\ref{chap:tail-latency} and shown to perform significantly better.
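Returning to the external configuration described above, the following is a minimal illustrative sketch of how it might be expressed in code. The \texttt{DEIRSConfig} structure, its fields, and \texttt{open\_run\_file} are hypothetical stand-ins rather than the framework's actual API; only the \texttt{O\_DIRECT} flag, part of the standard Linux \texttt{open(2)} interface, is taken directly from the setup described above.

\begin{verbatim}
// A hypothetical sketch of the external test configuration; the
// DEIRSConfig type and its fields are illustrative stand-ins, not
// the framework's actual API.
#include <fcntl.h>    // open(), O_RDWR, O_CREAT, O_DIRECT
#include <sys/stat.h> // S_IRUSR, S_IWUSR
#include <cstddef>

struct DEIRSConfig {
    size_t in_memory_levels = 4;    // first four levels held in memory
    bool   tagged_deletes   = true; // tagging rather than tombstones
};

// Runs beyond the in-memory levels live in files opened with O_DIRECT,
// which bypasses the OS page cache so that measured I/O costs reflect
// the structure itself rather than hidden caching. Note that O_DIRECT
// requires I/O buffers aligned to the device's logical block size.
inline int open_run_file(const char *path) {
    return open(path, O_RDWR | O_CREAT | O_DIRECT, S_IRUSR | S_IWUSR);
}
\end{verbatim}

The 1~GiB memory limit was imposed externally via CGroups, and therefore does not appear in this sketch.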
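The concurrent insertion benchmark can be sketched in a similar fashion. The code below is a simplified illustration of the insertion path, assuming a hypothetical \texttt{MutableBuffer} in which each thread claims a slot with a single fetch-and-add; the delay duration and flush handling shown are illustrative, not the values or logic used in our implementation.

\begin{verbatim}
// A simplified sketch of the concurrent insertion path; MutableBuffer
// is a stand-in for the framework's buffer, shown only to illustrate
// the fetch-and-add on which inserting threads contend.
#include <atomic>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

struct MutableBuffer {
    std::vector<int64_t> slots;
    std::atomic<size_t>  tail{0};

    explicit MutableBuffer(size_t cap) : slots(cap) {}

    // Each inserting thread claims a slot with one fetch-and-add. With
    // no delay between inserts, contention on this single atomic (not
    // the structure itself) becomes the bottleneck.
    bool append(int64_t rec) {
        size_t idx = tail.fetch_add(1, std::memory_order_relaxed);
        if (idx >= slots.size()) return false; // full: flush elided
        slots[idx] = rec;
        return true;
    }
};

void insert_worker(MutableBuffer &buf, int64_t begin, int64_t end) {
    for (int64_t r = begin; r < end; r++) {
        if (!buf.append(r)) return; // flush-and-retry logic elided
        // The slight inter-insert delay used to avoid bottlenecking on
        // the fetch-and-add; the exact duration here is illustrative.
        std::this_thread::sleep_for(std::chrono::nanoseconds(100));
    }
}

int main() {
    MutableBuffer buf(1 << 20);
    std::vector<std::thread> workers;
    for (int64_t t = 0; t < 4; t++)
        workers.emplace_back(insert_worker, std::ref(buf),
                             t * 1000, (t + 1) * 1000);
    for (auto &w : workers) w.join();
}
\end{verbatim}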
Even with this rudimentary implementation of concurrency, however, \texttt{DE-IRS} is able to outperform \texttt{AB-tree} under all conditions tested.