\subsection{External and Concurrent Extensions}

\begin{figure*}[h]%
\centering
\subfloat[External Insertion Throughput]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-ext-insert.pdf} \label{fig:ext-insert}}
\subfloat[External Sampling Latency]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-ext-sample.pdf} \label{fig:ext-sample}} \\
\subfloat[Concurrent Insert Latency vs. Throughput]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-cc-irs-scale} \label{fig:con-latency}}
\subfloat[Concurrent Insert Throughput vs. Thread Count]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-cc-irs-thread} \label{fig:con-tput}}
\caption{External and Concurrent Extensions of DE-IRS}
\label{fig:irs-extensions}
\end{figure*}

We also tested proof-of-concept implementations of the external and concurrent extensions for IRS queries. Figures~\ref{fig:ext-insert} and~\ref{fig:ext-sample} show the insertion and sampling performance of the external DE-IRS index compared against the AB-tree. DE-IRS was configured with 4 in-memory levels and used at most 350~MiB of memory during testing, including Bloom filters. For DE-IRS, the \texttt{O\_DIRECT} flag was used to disable OS caching, and cgroups were used to limit process memory to 1~GiB, simulating a memory-constrained environment. The AB-tree implementation tested includes a cache, which was configured with a memory budget of 64~GiB. This extra memory was provided to be fair to the AB-tree: because it performs a tree traversal per sample, it relies far more heavily on caching for good performance. DE-IRS was tested without any caching layer. The tests were performed with 4~billion (80~GiB) and 8~billion (162~GiB) uniform and Zipfian records, and 2.6~billion (55~GiB) OSM records. DE-IRS outperformed the AB-tree by over an order of magnitude in both insertion and sampling performance.

Finally, Figures~\ref{fig:con-latency} and~\ref{fig:con-tput} show the multi-threaded insertion performance of the in-memory DE-IRS index with concurrency support, compared against the AB-tree running entirely in memory, on the synthetic uniform dataset. Note that in Figure~\ref{fig:con-latency} some of the AB-tree results are cut off, as its throughput is significantly lower, and its latency significantly higher, than that of DE-IRS. Even without concurrent merging, the framework scales linearly up to 4 insertion threads before leveling off, and throughput remains flat even up to 32 concurrent insertion threads. An implementation with support for concurrent merging would likely scale even further. Minimal sketches of the direct-I/O setup and the thread-scaling measurement are given below.
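For reference, the direct-I/O setup described above amounts roughly to the following minimal, Linux-specific sketch. The file name, block size, and error handling are illustrative only and are not taken from our implementation.

\begin{verbatim}
// Minimal sketch of opening a file with O_DIRECT to bypass the OS
// page cache. The path and block size are illustrative only.
#include <fcntl.h>     // open, O_DIRECT (g++ defines _GNU_SOURCE)
#include <unistd.h>    // pread, close
#include <cstdlib>     // std::aligned_alloc, std::free, size_t

constexpr size_t kBlockSize = 4096;

int main() {
    int fd = open("run0.dat", O_RDWR | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) return 1;

    // O_DIRECT requires the buffer, file offset, and transfer size
    // to be aligned to the device's logical block size.
    void *buf = std::aligned_alloc(kBlockSize, kBlockSize);
    ssize_t n = pread(fd, buf, kBlockSize, 0);
    (void) n;

    std::free(buf);
    close(fd);
}
\end{verbatim}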
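A thread-scaling measurement of the kind reported in Figure~\ref{fig:con-tput} can be sketched as follows: a fixed insertion workload is divided evenly across threads and throughput is computed from the wall-clock time. The \texttt{ToyIndex} stand-in (a mutex-guarded buffer) exists only so the sketch compiles; it is not DE-IRS's actual insert path.

\begin{verbatim}
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for a concurrent index; NOT the DE-IRS insert path.
struct ToyIndex {
    std::mutex mtx;
    std::vector<uint64_t> buffer;
    void insert(uint64_t key) {
        std::lock_guard<std::mutex> lock(mtx);
        buffer.push_back(key);
    }
};

int main() {
    const size_t n_threads = 4;   // swept from 1 to 32 in the experiment
    const size_t per_thread = 1'000'000;
    ToyIndex index;

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (size_t t = 0; t < n_threads; t++) {
        // Each thread inserts a disjoint key range; all threads
        // contend on the shared structure internally.
        workers.emplace_back([&index, t, per_thread] {
            for (size_t i = 0; i < per_thread; i++)
                index.insert(t * per_thread + i);
        });
    }
    for (auto &w : workers) w.join();

    double secs = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - start).count();
    std::printf("%.0f inserts/s across %zu threads\n",
                n_threads * per_thread / secs, n_threads);
}
\end{verbatim}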