Diffstat (limited to 'chapters/sigmod23/exp-extensions.tex')
 chapters/sigmod23/exp-extensions.tex | 67 ++++++++++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 42 insertions(+), 25 deletions(-)
diff --git a/chapters/sigmod23/exp-extensions.tex b/chapters/sigmod23/exp-extensions.tex
index d929e92..62f15f4 100644
--- a/chapters/sigmod23/exp-extensions.tex
+++ b/chapters/sigmod23/exp-extensions.tex
@@ -12,29 +12,46 @@
 \label{fig:irs-extensions}
 \end{figure*}
 
-Proof of concept implementations of external and concurrent extensions were
-also tested for IRS queries. Figures \ref{fig:ext-sample} and
-\ref{fig:ext-insert} show the performance of the external DE-IRS sampling index
-against AB-tree. DE-IRS was configured with 4 in-memory levels, using at most
-350 MiB of memory in testing, including bloom filters. {
-For DE-IRS, the \texttt{O\_DIRECT} flag was used to disable OS caching, and
-CGroups were used to limit process memory to 1 GiB to simulate a memory
-constrained environment. The AB-tree implementation tested
-had a cache, which was configured with a memory budget of 64 GiB. This extra
-memory was provided to be fair to AB-tree. Because it uses per-sample
-tree-traversals, it is much more reliant on caching for good performance. DE-IRS was
-tested without a caching layer.} The tests were performed with 4 billion (80 GiB)
-{and 8 billion (162 GiB) uniform and zipfian
-records}, and 2.6 billion (55 GiB) OSM records. DE-IRS outperformed the AB-tree
-by over an order of magnitude in both insertion and sampling performance.
+We also tested our proof-of-concept implementations for external and
+concurrent extensions to the dynamization framework, as discussed in
+Section~\ref{sec:discussion}. First, we'll consider the external
+version of \texttt{DE-IRS}, compared with \texttt{AB-tree}. For this
+test, we configured \texttt{DE-IRS} to store the first 4 levels in
+memory, and the remainder on disk. This configuration used at most
+350 MiB of memory (including the mutable buffer and Bloom filters).
+We used tagging for deletes to avoid random writes, but otherwise kept
+the same standardized configuration parameters as the previous tests.
+We used \texttt{O\_DIRECT} to disable OS caching and CGroups to
+constrain the process to 1 GiB of memory, simulating a
+memory-constrained environment. \texttt{DE-IRS} did not use a caching
+layer; however, we did enable a 64 GiB cache for \texttt{AB-tree} in
+fairness, as it requires per-sample tree traversals and is thus much
+more reliant on caching for good performance. We performed these
+tests with larger datasets: synthetic datasets of 4 billion and 8
+billion records (80 GiB and 162 GiB, respectively) and the full 2.6
+billion records of the OSM dataset (55 GiB).
 
-Finally, Figures~\ref{fig:con-latency} and \ref{fig:con-tput} show the
-multi-threaded insertion performance of the in-memory DE-IRS index with
-concurrency support, compared to AB-tree running entirely in memory, using the
-synthetic uniform dataset. Note that in Figure~\ref{fig:con-latency}, some of
-the AB-tree results are cut off, due to having significantly lower throughput
-and higher latency compared with the DE-IRS. Even without concurrent
-merging, the framework shows linear scaling up to 4 threads of insertion,
-before leveling off; throughput remains flat even up to 32 concurrent
-insertion threads. An implementation with support for concurrent merging would
-scale even better.
+The results of this testing can be seen in Figures~\ref{fig:ext-sample}
+and \ref{fig:ext-insert}. Despite using significantly less memory and
+having no caching layer, \texttt{DE-IRS} handily outperformed the
+dynamic baseline in both sampling and update performance.
+
+Finally, we tested the multi-threaded insertion performance of our
+in-memory, concurrent implementation of \texttt{DE-IRS} compared to
+\texttt{AB-tree} configured to run entirely in memory. We used the
+synthetic uniform dataset (1B records) for this testing, and
+introduced a slight delay between inserts to avoid bottlenecking on
+the fetch-and-add within the mutable buffer.
+Figure~\ref{fig:con-latency} shows the latency vs. throughput curves
+for the two structures. Note that \texttt{AB-tree}'s results are cut
+off by the y-axis, as it performs significantly worse than
+\texttt{DE-IRS}. Figure~\ref{fig:con-tput} shows the insertion
+throughput as additional insertion threads are added. Both plots show
+linear scaling up to 3 or 4 threads, before the throughput levels
+off. Further, even with as many as 32 threads, the system maintains
+a stable insertion throughput. Note that this implementation of
+concurrency is quite rudimentary and does not yet support concurrent
+merging, among other optimizations. An implementation that does will
+be discussed in Chapter~\ref{chap:tail-latency}, and shown to perform
+significantly better. Even so, \texttt{DE-IRS} outperforms
+\texttt{AB-tree} under all conditions tested.
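
To make the external-memory setup in the new text concrete, the sketch
below shows one way a benchmark process can combine O_DIRECT reads with
an externally applied CGroup memory cap. This is a minimal illustration
under assumptions: the file path, block size, and helper structure are
invented for the example and are not taken from the actual benchmark
code.

    // Illustrative sketch only: open data files with O_DIRECT so reads
    // bypass the OS page cache, making the 1 GiB CGroup memory cap
    // reflect the structure's true footprint. Path and block size are
    // assumptions, not the thesis's actual configuration.
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdlib>
    #include <cstdio>

    constexpr size_t kBlockSize = 4096;  // O_DIRECT requires aligned I/O

    int main() {
        // Bypass the page cache; every read now touches the device.
        int fd = open("/tmp/deirs_run.dat", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        // Buffers used with O_DIRECT must be aligned to the block size.
        void *buf = nullptr;
        if (posix_memalign(&buf, kBlockSize, kBlockSize) != 0) return 1;

        // Read the first block of the file directly from disk.
        if (pread(fd, buf, kBlockSize, 0) < 0) perror("pread");

        free(buf);
        close(fd);
        // The memory cap itself is applied outside the process, e.g.
        // with cgroups v2:  echo 1G > /sys/fs/cgroup/bench/memory.max
        return 0;
    }

Disabling the page cache in this way is what makes the comparison
meaningful: without it, the OS would silently cache hot pages and blur
the distinction between the cached AB-tree and the cache-less DE-IRS.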
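The concurrency results mention that inserts can bottleneck on a
fetch-and-add within the mutable buffer. The sketch below illustrates
that kind of append path: each inserting thread claims a slot with a
single atomic increment, so concurrent inserts contend only on one
counter. The record layout, capacity, and class shape are assumptions
for illustration, not the framework's actual interface.

    // Illustrative sketch only: a fetch-and-add append path for an
    // unsorted mutable buffer. One atomic instruction reserves a slot;
    // the record write itself proceeds without locks.
    #include <atomic>
    #include <cstddef>
    #include <cstdint>

    struct Record {
        uint64_t key;
        uint64_t weight;  // IRS samples records in proportion to weight
    };

    class MutableBuffer {
    public:
        // Returns false when the buffer is full; the caller would then
        // trigger (or wait on) a flush into the leveled structure and
        // retry the insert.
        bool append(const Record &rec) {
            size_t slot = m_tail.fetch_add(1, std::memory_order_relaxed);
            if (slot >= kCapacity) {
                return false;  // over-claimed slots are ignored on flush
            }
            m_data[slot] = rec;
            return true;
        }

    private:
        // A full implementation must also make claimed slots safely
        // visible to readers (e.g., a second "committed" counter);
        // that machinery is omitted from this sketch.
        static constexpr size_t kCapacity = 12000;  // illustrative
        Record m_data[kCapacity];
        std::atomic<size_t> m_tail{0};
    };

Because every insert funnels through the single shared counter, all
threads contend on one cache line when inserts arrive back-to-back,
which is why the benchmark introduces a slight delay between inserts.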