1 files changed, 82 insertions, 33 deletions
diff --git a/chapters/sigmod23/experiment.tex b/chapters/sigmod23/experiment.tex
index 75cf32e..4dbb4c2 100644
--- a/chapters/sigmod23/experiment.tex
+++ b/chapters/sigmod23/experiment.tex
@@ -1,18 +1,36 @@
 \section{Evaluation}
 \label{sec:experiment}
+In this section, we provide comprehensive performance benchmarks
+of implementations of the dynamized structures discussed in
+Sections~\ref{sec:instance} and \ref{sec:discussion}. All of the code was
+written using C++17. The full implementations, including benchmarking
+code, are available on GitHub under the Modified BSD License, at
+\url{https://github.com/psu-db/sampling-extension-original}.\footnote{
+	We also provide a ``cleaner'' implementation for WSS and WIRS,
+	with a structure and nomenclature better aligned with this
+	chapter, here: \url{https://github.com/psu-db/sampling-extension}.
+}
 
-\Paragraph{Experimental Setup.} All experiments were run under Ubuntu 20.04 LTS
-on a dual-socket Intel Xeon Gold 6242R server with 384 GiB of physical memory
-and 40 physical cores. External tests were run using a 4 TB WD Red SA500 SATA
-SSD, rated for 95000 and 82000 IOPS for random reads and writes respectively. 
-
-\Paragraph{Datasets.} Testing utilized a variety of synthetic and real-world
-datasets. For all datasets used, the key was represented as a 64-bit integer,
-the weight as a 64-bit integer, and the value as a 32-bit integer. Each record
-also contained a 32-bit header. The weight was omitted from IRS testing.
-Keys and weights were pulled from the dataset directly, and values were
-generated separately and were unique for each record. The following datasets 
-were used,
+\Paragraph{Experimental Setup.} We ran all of our experiments on Ubuntu
+20.04 LTS using a server equipped with dual-socket Intel Xeon Gold 6242R
+processors with 40 physical cores and 384 GiB of physical memory. We
+performed testing of external structures with a 4 TB WD Red SA500 SATA
+SSD rated at 95000 IOPS for random reads and 82000 IOPS for random
+writes. All benchmarking code was compiled with GCC version 11.3.0 at
+the \texttt{-O3} optimization level.
+
+\Paragraph{Datasets.} We used a variety of synthetic and real-world
+datasets of various distributions to test sampling performance. For all
+of our datasets, we treated the data as a sequence of key-value pairs
+with a 64-bit integer key and a 32-bit integer value. Our dynamizations
+introduced a 32-bit header to each record as well. This header was not
+added to records when testing dynamic baselines. Additionally, weighted
+testing attached a 64-bit integer weight to each record. This weight was
+not included in the record for non-weighted testing. The weights and
+keys were both taken directly from the datasets, and values were generated
+separately and were unique to each record.
+
+We used the following datasets for testing:
 \begin{itemize}
 \item \textbf{Synthetic Uniform.} A non-weighted, synthetically generated list 
                                   of keys drawn from a uniform distribution.
@@ -23,26 +41,57 @@ were used,
 \item \textbf{Delicious~\cite{data-delicious}.} $33.7$ million URLs, represented using unique integers, 
                           weighted by the number of associated tags.
 \item \textbf{OSM~\cite{data-osm}.} $2.6$ billion geospatial coordinates for points
-                    of interest, collected by OpenStreetMap. The latitude, converted
-                    to a 64-bit integer, was used as the key and the number of
+                    of interest, collected by OpenStreetMap. We used the latitude, converted
+                    to a 64-bit integer, as the key and the number of
                     its associated semantic tags as the weight. 
 \end{itemize}
-The synthetic datasets were not used for weighted experiments, as they do not
-have weights. For unweighted experiments, the Twitter and Delicious datasets
-were not used, as they have uninteresting key distributions.
-
-\Paragraph{Compared Methods.} In this section, indexes extended using the
-framework are compared against existing dynamic baselines. Specifically, DE-WSS
-(Section~\ref{ssec:wss-struct}), DE-IRS (Section~\ref{ssec:irs-struct}), and
-DE-WIRS (Section~\ref{ssec:irs-struct}) are examined. In-memory extensions are
-compared against the B+tree with aggregate weight tags on internal nodes (AGG
-B+tree) \cite{olken95} and concurrent and external extensions are compared
-against the AB-tree \cite{zhao22}. Sampling performance is also compared against
-comparable static sampling indexes: the alias structure \cite{walker74} for WSS,
-the in-memory ISAM tree for IRS, and the alias-augmented B+tree \cite{afshani17}
-for WIRS. Note that all structures under test, with the exception of the
-external DE-IRS and external AB-tree, were contained entirely within system
-memory. All benchmarking code and data structures were implemented using  C++17
-and compiled using gcc 11.3.0 at the \texttt{-O3} optimization level. The
-extension framework itself, excluding the shard implementations and utility
-headers, consisted of a header-only library of about 1200 SLOC.
+
+We did not use the synthetic uniform and Zipfian datasets for testing
+WSS and WIRS, as these datasets lacked weights. We also did not use the
+Twitter and Delicious datasets for unweighted testing, as they have
+uninteresting key distributions.
+
+\Paragraph{Structures Compared.} As a basis of comparison, we tested
+both our dynamized SSI implementations and existing dynamic baselines
+for each sampling problem considered. Specifically, we consider the
+following dynamized structures,
+\begin{itemize}
+
+\item \textbf{DE-WSS.} An implementation of the dynamized alias
+structure~\cite{walker74} for weighted set sampling discussed
+in Section~\ref{ssec:wss-struct}. We compare this against a WSS
+implementation of Olken's method on a B+Tree with aggregate weight tags
+(\textbf{AGG-BTree})~\cite{olken95}, based on the B+tree implementation
+in the TLX library~\cite{tlx}.
+
+\item \textbf{DE-IRS.} An implementation of the dynamized ISAM tree for
+independent range sampling, discussed in Section~\ref{ssec:irs-struct}. We
+also implement a concurrent version based on our discussion in
+Section~\ref{ssec:ext-concurrency} and an external version from
+Section~\ref{ssec:ext-external}. We compare the external and concurrent
+versions against the AB-tree~\cite{zhao22}, and the single-threaded,
+in-memory version against an IRS implementation of Olken's
+method on an AGG-BTree.
+
+\item \textbf{DE-WIRS.} An implementation of the dynamized alias-augmented
+B+Tree~\cite{afshani17} as discussed in Section~\ref{ssec:wirs-struct} for
+weighted independent range sampling. We compare this against a WIRS
+implementation of Olken's method on an AGG-BTree.
+
+\end{itemize}
+
+All of the tested structures, with the exception of the external memory
+DE-IRS implementation and AB-tree, were wholly contained within system
+memory. AB-tree is a native external structure, so for the in-memory
+concurrency evaluation we configured it with enough cache to maintain
+the entire structure in memory to simulate an in-memory implementation.\footnote{
+	Because of the nature of sampling queries, traditional
+	efficient locking techniques for B+Trees cannot be
+	used~\cite{zhao22}. The alternatives were to run AB-tree in this
+	manner, or to globally lock the B+Tree for every operation. We
+	elected to use the former approach for this chapter. We use the
+	latter approach in the next chapter.
+}
+
+
+