Diffstat (limited to 'chapters/sigmod23/exp-extensions.tex')
 chapters/sigmod23/exp-extensions.tex | 67 ++++++++++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 42 insertions(+), 25 deletions(-)
diff --git a/chapters/sigmod23/exp-extensions.tex b/chapters/sigmod23/exp-extensions.tex
index d929e92..62f15f4 100644
--- a/chapters/sigmod23/exp-extensions.tex
+++ b/chapters/sigmod23/exp-extensions.tex
@@ -12,29 +12,46 @@
 \label{fig:irs-extensions}
 \end{figure*}
 
-Proof of concept implementations of external and concurrent extensions were
-also tested for IRS queries. Figures \ref{fig:ext-sample} and
-\ref{fig:ext-insert} show the performance of the external DE-IRS sampling index
-against AB-tree. DE-IRS was configured with 4 in-memory levels, using at most
-350 MiB of memory in testing, including bloom filters. {
-For DE-IRS, the \texttt{O\_DIRECT} flag was used to disable OS caching, and
-CGroups were used to limit process memory to 1 GiB to simulate a memory
-constrained environment. The AB-tree implementation tested
-had a cache, which was configured with a memory budget of 64 GiB. This extra
-memory was provided to be fair to AB-tree. Because it uses per-sample
-tree-traversals, it is much more reliant on caching for good performance. DE-IRS was
-tested without a caching layer.} The tests were performed with 4 billion (80 GiB)
-{and 8 billion (162 GiB) uniform and zipfian
-records}, and 2.6 billion (55 GiB) OSM records. DE-IRS outperformed the AB-tree
-by over an order of magnitude in both insertion and sampling performance.
+We also tested our proof-of-concept implementations for external and
+concurrent extensions to the dynamization framework, as discussed in
+Section~\ref{sec:discussion}. First, we'll consider the external
+version of \texttt{DE-IRS}, compared with \texttt{AB-tree}. For this
+test, we configured \texttt{DE-IRS} to store the first 4 levels in
+memory, and the remainder on disk. This configuration used at most
+350 MiB of memory (including the mutable buffer and Bloom filters).
+We used tagging for deletes to avoid random writes, but otherwise kept
+the same standardized configuration parameters as the previous tests.
+We used \texttt{O\_DIRECT} to disable OS caching and CGroups to
+constrain the process to 1 GiB of memory, simulating a
+memory-constrained environment. \texttt{DE-IRS} did not use a caching
+layer; however, we did enable a 64 GiB cache for \texttt{AB-tree} in
+fairness, as it requires per-sample tree traversals and is thus much
+more reliant on caching for good performance. We performed these
+tests with larger datasets: synthetic datasets of 4 billion and 8
+billion records (80 GiB and 162 GiB, respectively) and the full 2.6
+billion records of the OSM dataset (55 GiB).
 
-Finally, Figures~\ref{fig:con-latency} and \ref{fig:con-tput} show the
-multi-threaded insertion performance of the in-memory DE-IRS index with
-concurrency support, compared to AB-tree running entirely in memory, using the
-synthetic uniform dataset. Note that in Figure~\ref{fig:con-latency}, some of
-the AB-tree results are cut off, due to having significantly lower throughput
-and higher latency compared with the DE-IRS. Even without concurrent
-merging, the framework shows linear scaling up to 4 threads of insertion,
-before leveling off; throughput remains flat even up to 32 concurrent
-insertion threads. An implementation with support for concurrent merging would
-scale even better.
+The results of this testing can be seen in Figures~\ref{fig:ext-sample}
+and \ref{fig:ext-insert}. Despite using significantly less memory and
+having no caching layer, \texttt{DE-IRS} handily outperformed the
+dynamic baseline in both sampling and update performance.
+
+Finally, we tested the multi-threaded insertion performance of our
+in-memory, concurrent implementation of \texttt{DE-IRS} compared to
+\texttt{AB-tree} configured to run entirely in memory. We used the
+synthetic uniform dataset (1B records) for this testing, and
+introduced a slight delay between inserts to avoid bottlenecking on
+the fetch-and-add within the mutable buffer.
+Figure~\ref{fig:con-latency} shows the latency vs. throughput curves
+for the two structures. Note that \texttt{AB-tree}'s results are cut
+off by the y-axis, as it performs significantly worse than
+\texttt{DE-IRS}. Figure~\ref{fig:con-tput} shows the insertion
+throughput as additional insertion threads are added. Both plots show
+linear scaling up to 3 or 4 threads, before the throughput levels
+off. Further, even with as many as 32 threads, the system maintains
+a stable insertion throughput. Note that this implementation of
+concurrency is quite rudimentary and does not yet support concurrent
+merging, among other optimizations. An implementation that does will
+be discussed in Chapter~\ref{chap:tail-latency}, and shown to perform
+significantly better. Even so, \texttt{DE-IRS} outperforms
+\texttt{AB-tree} under all conditions tested.
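
To make the external-memory setup in the new text concrete, the sketch
below shows one way a benchmark process can combine O_DIRECT reads with
an externally applied CGroup memory cap. This is a minimal illustration
under assumptions: the file path, block size, and helper structure are
invented for the example and are not taken from the actual benchmark
code.

    // Illustrative sketch only: open data files with O_DIRECT so reads
    // bypass the OS page cache, making the 1 GiB CGroup memory cap
    // reflect the structure's true footprint. Path and block size are
    // assumptions, not the thesis's actual configuration.
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdlib>
    #include <cstdio>

    constexpr size_t kBlockSize = 4096;  // O_DIRECT requires aligned I/O

    int main() {
        // Bypass the page cache; every read now touches the device.
        int fd = open("/tmp/deirs_run.dat", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        // Buffers used with O_DIRECT must be aligned to the block size.
        void *buf = nullptr;
        if (posix_memalign(&buf, kBlockSize, kBlockSize) != 0) return 1;

        // Read the first block of the file directly from disk.
        if (pread(fd, buf, kBlockSize, 0) < 0) perror("pread");

        free(buf);
        close(fd);
        // The memory cap itself is applied outside the process, e.g.
        // with cgroups v2:  echo 1G > /sys/fs/cgroup/bench/memory.max
        return 0;
    }

Disabling the page cache in this way is what makes the comparison
meaningful: without it, the OS would silently cache hot pages and blur
the distinction between the cached AB-tree and the cache-less DE-IRS.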
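The concurrency results mention that inserts can bottleneck on a
fetch-and-add within the mutable buffer. The sketch below illustrates
that kind of append path: each inserting thread claims a slot with a
single atomic increment, so concurrent inserts contend only on one
counter. The record layout, capacity, and class shape are assumptions
for illustration, not the framework's actual interface.

    // Illustrative sketch only: a fetch-and-add append path for an
    // unsorted mutable buffer. One atomic instruction reserves a slot;
    // the record write itself proceeds without locks.
    #include <atomic>
    #include <cstddef>
    #include <cstdint>

    struct Record {
        uint64_t key;
        uint64_t weight;  // IRS samples records in proportion to weight
    };

    class MutableBuffer {
    public:
        // Returns false when the buffer is full; the caller would then
        // trigger (or wait on) a flush into the leveled structure and
        // retry the insert.
        bool append(const Record &rec) {
            size_t slot = m_tail.fetch_add(1, std::memory_order_relaxed);
            if (slot >= kCapacity) {
                return false;  // over-claimed slots are ignored on flush
            }
            m_data[slot] = rec;
            return true;
        }

    private:
        // A full implementation must also make claimed slots safely
        // visible to readers (e.g., a second "committed" counter);
        // that machinery is omitted from this sketch.
        static constexpr size_t kCapacity = 12000;  // illustrative
        Record m_data[kCapacity];
        std::atomic<size_t> m_tail{0};
    };

Because every insert funnels through the single shared counter, all
threads contend on one cache line when inserts arrive back-to-back,
which is why the benchmark introduces a slight delay between inserts.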