\subsection{External and Concurrent Extensions}
\begin{figure*}[t]%
\centering
\subfloat[External Insertion Throughput]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-ext-insert.pdf} \label{fig:ext-insert}}
\subfloat[External Sampling Latency]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-bs-ext-sample.pdf} \label{fig:ext-sample}} \\
\subfloat[Concurrent Insert Latency vs. Throughput]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-cc-irs-scale} \label{fig:con-latency}}
\subfloat[Concurrent Insert Throughput vs. Thread Count]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-cc-irs-thread} \label{fig:con-tput}}
\caption{External and Concurrent Extensions of DE-IRS}
\label{fig:irs-extensions}
\end{figure*}
Proof-of-concept implementations of the external and concurrent extensions were
also tested for IRS queries. Figures \ref{fig:ext-insert} and
\ref{fig:ext-sample} show the performance of the external DE-IRS sampling index
against the AB-tree. DE-IRS was configured with 4 in-memory levels and used at
most 350 MiB of memory during testing, including Bloom filters.
For DE-IRS, the \texttt{O\_DIRECT} flag was used to disable OS caching, and
cgroups were used to limit process memory to 1 GiB, simulating a
memory-constrained environment. The AB-tree implementation tested
included a cache, configured with a memory budget of 64 GiB. This extra
memory was provided to be fair to the AB-tree: because it performs per-sample
tree traversals, it relies far more heavily on caching for good performance.
DE-IRS was tested without a caching layer. The tests were performed with
4 billion (80 GiB) and 8 billion (162 GiB) uniform and Zipfian records, and
with 2.6 billion (55 GiB) OSM records. DE-IRS outperformed the AB-tree
by over an order of magnitude in both insertion and sampling performance.
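As a point of reference for this setup, the sketch below shows how a benchmark
might open an on-disk run with \texttt{O\_DIRECT} so that reads bypass the OS
page cache. It is a minimal illustration, not the actual harness: the file name
and block size are hypothetical, and the 1 GiB memory cap was applied
externally through cgroups rather than in code.
\begin{verbatim}
// Minimal sketch: open an on-disk run with O_DIRECT so reads
// bypass the OS page cache (Linux-specific). The file name and
// block size are hypothetical placeholders.
#include <fcntl.h>     // open, O_DIRECT (g++ defines _GNU_SOURCE)
#include <unistd.h>    // pread, close
#include <cstdio>      // perror
#include <cstdlib>     // posix_memalign, free

int main() {
    constexpr size_t kBlock = 4096;   // storage block size
    int fd = open("level4_run.dat", O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    // O_DIRECT requires block-aligned buffers, offsets, and sizes.
    void *buf = nullptr;
    if (posix_memalign(&buf, kBlock, kBlock) != 0) return 1;

    if (pread(fd, buf, kBlock, 0) < 0)   // read one aligned block
        perror("pread");

    free(buf);
    close(fd);
    return 0;
}
\end{verbatim}
Direct I/O ensures that the reported sampling latencies reflect actual device
reads rather than page-cache hits.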
Finally, Figures~\ref{fig:con-latency} and \ref{fig:con-tput} show the
multi-threaded insertion performance of the in-memory DE-IRS index with
concurrency support, compared against an AB-tree running entirely in memory,
on the synthetic uniform dataset. Note that in Figure~\ref{fig:con-latency}
some of the AB-tree results are cut off, as its throughput is significantly
lower and its latency significantly higher than those of DE-IRS. Even without
concurrent merging, the framework scales linearly up to 4 insertion threads
before leveling off; throughput then remains flat up to 32 concurrent
insertion threads. An implementation with support for concurrent merging
should scale even further.
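The thread-scaling measurements correspond to a driver of roughly the
following shape. This is a minimal sketch assuming a hypothetical thread-safe
\texttt{insert()} interface; the actual benchmark harness and index API may
differ.
\begin{verbatim}
// Sketch of a multi-threaded insertion-throughput driver.
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the concurrent DE-IRS index; only a
// thread-safe insert() interface is assumed here.
struct Index {
    void insert(uint64_t key, uint64_t val) { (void)key; (void)val; }
};

// Spawn n_threads workers, each inserting a disjoint key range,
// and report aggregate throughput in inserts per second.
double insertion_throughput(Index &idx, size_t n_threads,
                            size_t inserts_per_thread) {
    std::vector<std::thread> workers;
    auto start = std::chrono::steady_clock::now();
    for (size_t t = 0; t < n_threads; t++) {
        workers.emplace_back([&idx, t, inserts_per_thread] {
            uint64_t base = t * inserts_per_thread;
            for (size_t i = 0; i < inserts_per_thread; i++)
                idx.insert(base + i, i);
        });
    }
    for (auto &w : workers) w.join();
    std::chrono::duration<double> secs =
        std::chrono::steady_clock::now() - start;
    return (n_threads * inserts_per_thread) / secs.count();
}
\end{verbatim}
Wall-clock time is measured across all workers, so the result reflects
aggregate throughput rather than per-thread speed, which is the quantity
plotted in Figure~\ref{fig:con-tput}.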