Diffstat (limited to 'chapters/sigmod23/exp-baseline.tex')
| -rw-r--r-- | chapters/sigmod23/exp-baseline.tex | 148 |
1 files changed, 86 insertions, 62 deletions
diff --git a/chapters/sigmod23/exp-baseline.tex b/chapters/sigmod23/exp-baseline.tex
index 9e7929c..da62766 100644
--- a/chapters/sigmod23/exp-baseline.tex
+++ b/chapters/sigmod23/exp-baseline.tex
@@ -1,16 +1,17 @@
 \subsection{Comparison to Baselines}
-Next, the performance of indexes extended using the framework is compared
-against tree sampling on the aggregate B+tree, as well as problem-specific
-SSIs for WSS, WIRS, and IRS queries. Unless otherwise specified, IRS and WIRS
-queries were executed with a selectivity of $0.1\%$ and 500 million randomly
-selected records from the OSM dataset were used. The uniform and zipfian
-synthetic datasets were 1 billion records in size. All benchmarks warmed up the
-data structure by inserting 10\% of the records, and then measured the
-throughput inserting the remaining records, while deleting 5\% of them over the
-course of the benchmark. Once all records were inserted, the sampling
-performance was measured. The reported update throughputs were calculated using
-both inserts and deletes, following the warmup period.
+Next, we compare the performance of our dynamized sampling indices with
+Olken's method on an aggregate B+Tree. We also examine the query
+performance of a single instance of the SSI in question, to establish
+how much query performance is lost to dynamization. Unless otherwise
+specified, IRS and WIRS queries are run with a selectivity of $0.1\%$,
+and the \texttt{OSM} dataset was downsampled to 500 million records,
+except for scalability tests. The synthetic uniform and zipfian datasets
+were generated with 1 billion records. As in the previous section, all
+benchmarks began by warming up the structure with $10\%$ of the total
+records; update performance was then measured over the insertion of the
+remaining records, including a mix of $5\%$ deletes. Query performance
+was measured after all records had been inserted.
 \begin{figure*}
 \centering
@@ -21,15 +22,25 @@ both inserts and deletes, following the warmup period.
 \caption{Framework Comparisons to Baselines for WSS}
 \end{figure*}
-Starting with WSS, Figure~\ref{fig:wss-insert} shows that the DE-WSS structure
-is competitive with the AGG B+tree in terms of insertion performance, achieving
-about 85\% of the AGG B+tree's insertion throughput on the Twitter dataset, and
-beating it by similar margins on the other datasets. In terms of sampling
-performance in Figure~\ref{fig:wss-sample}, it beats the B+tree handily, and
-compares favorably to the static alias structure. Figures~\ref{fig:wss-insert-s}
-and \ref{fig:wss-sample-s} show the performance scaling of the three structures as
-the dataset size increases. All of the structures exhibit the same type of
-performance degradation with respect to dataset size.
+We begin with WSS. Figure~\ref{fig:wss-insert} shows that
+\texttt{DE-WSS} achieves about $85\%$ of \texttt{AGG B+Tree}'s insertion
+throughput on the \texttt{twitter} dataset, and outperforms it outright
+on the others. Its sampling performance, shown in
+Figure~\ref{fig:wss-sample}, is clearly superior to Olken's method and
+quite close to that of a single instance of the alias structure,
+indicating that the overhead due to the dynamization is quite low.
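+
+To make the static baseline concrete, the following is a minimal C++
+sketch of the alias structure (Walker's alias method, built with Vose's
+algorithm). It is illustrative only, not the implementation we
+benchmarked, but it shows the essential trade-off: sampling costs
+$O(1)$, while any update invalidates the tables and forces an $O(n)$
+rebuild, which is exactly the cost our dynamization framework amortizes.
+\begin{verbatim}
+#include <random>
+#include <vector>
+
+// Static weighted set sampling: O(n) construction, O(1) per sample.
+struct Alias {
+    std::vector<double> prob;   // acceptance probability per bucket
+    std::vector<size_t> alias;  // fallback item for each bucket
+
+    explicit Alias(const std::vector<double> &w)
+        : prob(w.size()), alias(w.size()) {
+        size_t n = w.size();
+        double total = 0;
+        for (double x : w) total += x;
+        std::vector<double> p(n);
+        std::vector<size_t> small, large;
+        for (size_t i = 0; i < n; i++) {
+            p[i] = w[i] * n / total;          // normalize to mean 1
+            (p[i] < 1.0 ? small : large).push_back(i);
+        }
+        while (!small.empty() && !large.empty()) {
+            size_t s = small.back(); small.pop_back();
+            size_t l = large.back(); large.pop_back();
+            prob[s] = p[s];     // keep s with probability p[s] ...
+            alias[s] = l;       // ... otherwise report item l
+            p[l] -= 1.0 - p[s]; // l donates its excess mass to s
+            (p[l] < 1.0 ? small : large).push_back(l);
+        }
+        for (size_t i : small) prob[i] = 1.0; // leftovers are exact
+        for (size_t i : large) prob[i] = 1.0;
+    }
+
+    // Draw one index: uniform bucket, then a biased coin flip.
+    size_t sample(std::mt19937_64 &rng) const {
+        std::uniform_int_distribution<size_t> bucket(0, prob.size() - 1);
+        std::uniform_real_distribution<double> coin(0.0, 1.0);
+        size_t b = bucket(rng);
+        return coin(rng) < prob[b] ? b : alias[b];
+    }
+};
+\end{verbatim}
+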
+We also considered the scalability of \texttt{DE-WSS} as the data size
+increases, in Figures~\ref{fig:wss-insert-s} and \ref{fig:wss-sample-s}.
+These tests were run with random samples of the \texttt{OSM} dataset of
+the specified sizes, and show that \texttt{DE-WSS} maintains its
+advantage over \texttt{AGG B+Tree} across a range of data sizes. One
+interesting point in Figure~\ref{fig:wss-sample-s} is the final data
+point for the alias structure, which is \emph{worse} than
+\texttt{DE-WSS}. This point reproduced consistently, and we believe it
+is a NUMA effect: the 2 billion records were large enough that the alias
+structure built from them spanned two NUMA nodes on our server, whereas
+the dynamized structure was broken into pieces, none of which
+individually spanned a NUMA node, resulting in better performance.
 \begin{figure*}
 \centering
@@ -38,16 +49,22 @@
 \caption{Framework Comparison to Baselines for WIRS}
 \end{figure*}
-Figures~\ref{fig:wirs-insert} and \ref{fig:wirs-sample} show the performance of
-the DE-WIRS index, relative to the AGG B+tree and the alias-augmented B+tree. This
-example shows the same pattern of behavior as was seen with DE-WSS, though the
-margin between the DE-WIRS and its corresponding SSI is much narrower.
-Additionally, the constant factors associated with the construction cost of the
-alias-augmented B+tree are much larger than the alias structure. The loss of
-insertion performance due to this is seen clearly in Figure~\ref{fig:wirs-insert}, where
-the margin of advantage between DE-WIRS and the AGG B+tree in insertion
-throughput shrinks compared to the DE-WSS index, and the AGG B+tree's advantage
-on the Twitter dataset is expanded.
+In Figures~\ref{fig:wirs-insert} and \ref{fig:wirs-sample} we examine
+the performance of \texttt{DE-WIRS} compared to \texttt{AGG B+Tree} and
+an alias-augmented B+Tree. We see the same basic pattern here as we did
+with WSS: \texttt{AGG B+Tree} defeats our dynamized index on the
+\texttt{twitter} dataset in terms of insertion performance, but loses to
+it on the others. The alias-augmented B+Tree is much more expensive to
+build than a plain alias structure, however, and so \texttt{DE-WIRS}'s
+insertion advantage over \texttt{AGG B+Tree} is somewhat eroded relative
+to \texttt{DE-WSS}'s. On the query side, \texttt{AGG B+Tree} performs
+about the same for WIRS sampling as it did for WSS, but the
+alias-augmented B+Tree is quite a bit slower at WIRS than the alias
+structure was at WSS. As a result, \texttt{DE-WIRS} defeats
+\texttt{AGG B+Tree} by a smaller margin in this test, but it remains
+superior in sampling performance, and is still quite close to the static
+structure, again indicating that the dynamization introduces relatively
+little overhead.
 \begin{figure*}
 \subfloat[Insertion Scalability vs. Baselines]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-sc-irs-insert} \label{fig:irs-insert-s}}
@@ -61,38 +78,45 @@ on the Twitter dataset is expanded.
 \caption{Framework Comparison to Baselines for IRS}
 \end{figure*}
-Finally, Figures~\ref{fig:irs-insert1} and \ref{fig:irs-sample1} show a
-comparison of the in-memory DE-IRS index against the in-memory ISAM tree and the AGG
-B+tree for answering IRS queries. The cost of bulk-loading the ISAM tree is less
-than the cost of building the alias structure, or the alias-augmented B+tree, and
-so here DE-IRS defeats the AGG B+tree by wider margins in insertion throughput,
-though the margin narrows significantly in terms of sampling performance
-advantage.
-DE-IRS was further tested to evaluate scalability.
-Figure~\ref{fig:irs-insert-s} shows average insertion throughput,
-Figure~\ref{fig:irs-delete} shows average delete latency (under tagging), and
-Figure~\ref{fig:irs-sample-s} shows average sampling latencies for DE-IRS and
-AGG B+tree over a range of data sizes. In all cases, DE-IRS and B+tree show
-similar patterns of performance degradation as the datasize grows. Note that
-the delete latencies of DE-IRS are worse than AGG B+tree, because of the B+tree's
-cheaper point-lookups.
+We next considered IRS queries. Figures~\ref{fig:irs-insert1} and
+\ref{fig:irs-sample1} show the results of our tests of single-threaded
+\texttt{DE-IRS} running in-memory against the in-memory ISAM tree and
+\texttt{AGG B+Tree}. The ISAM tree can be bulk-loaded efficiently, which
+makes it much cheaper to construct than the alias structure or the
+alias-augmented B+Tree. This gives \texttt{DE-IRS} a significant update
+performance advantage, and we see in Figure~\ref{fig:irs-insert1} that
+it beats \texttt{AGG B+Tree} by a significant margin in insertion
+throughput. However, its query performance is significantly worse than
+the static baseline's, and it defeats the B+tree by only a small margin
+in sampling latency across most datasets. Note that the \texttt{OSM}
+dataset in these tests is half the size of the synthetic ones, which
+accounts for the performance differences.
 
-Figure~\ref{fig:irs-sample-s}
-also includes one other point of interest: the sampling performance of
-DE-IRS \emph{improves} when the data size grows from one million to ten million
-records. While at first glance the performance increase may appear paradoxical,
-it actually demonstrates an important result concerning the effect of the
-unsorted mutable buffer on index performance. At one million records, the
-buffer constitutes approximately 1\% of the total data size; this results in
-the buffer being sampled from with greater frequency (as it has more total
-weight) than would be the case with larger data. The greater the frequency of
-buffer sampling, the more rejections will occur, and the worse the sampling
-performance will be. This illustrates the importance of keeping the buffer
-small, even when a scan is not used for buffer sampling. Finally,
-Figure~\ref{fig:irs-samplesize} shows the decreasing per-sample cost as the
-number of records requested by a sampling query grows for DE-IRS, compared to
-AGG B+tree. Note that DE-IRS benefits significantly more from batching samples
-than AGG B+tree, and that the improvement is greatest up to $k=100$ samples per
-query.
+We also consider the scalability of inserts, queries, and deletes for
+\texttt{DE-IRS} compared to \texttt{AGG B+Tree} across a wide range of
+data sizes. Figure~\ref{fig:irs-insert-s} shows that \texttt{DE-IRS}'s
+insertion performance scales with data size similarly to the baseline's,
+and Figure~\ref{fig:irs-sample-s} tells a similar story for query
+performance. Figure~\ref{fig:irs-delete-s} compares the delete
+performance of the two structures, with \texttt{DE-IRS} configured to
+use tagging. As expected, the B+tree performs better here, as its delete
+cost is asymptotically superior to tagging's. The plot does demonstrate,
+however, that tagged deletes also scale well with data size.
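+
+To clarify what tagging entails, the sketch below gives an illustrative
+C++ rendering; the \texttt{Shard} type here is a hypothetical stand-in
+for one immutable level of the dynamized structure, not our actual
+interface. A tagged delete must probe every level in the worst case,
+which is why it is asymptotically more expensive than the single point
+lookup the B+tree performs.
+\begin{verbatim}
+#include <algorithm>
+#include <cstdint>
+#include <vector>
+
+struct Record { uint64_t key; double weight; bool deleted; };
+
+// Stand-in for one immutable level: records sorted by key, with
+// whatever point lookup the underlying SSI supports.
+struct Shard {
+    std::vector<Record> data;
+    Record *find(uint64_t key) {
+        auto it = std::lower_bound(
+            data.begin(), data.end(), key,
+            [](const Record &r, uint64_t k) { return r.key < k; });
+        return (it != data.end() && it->key == key) ? &*it : nullptr;
+    }
+};
+
+// Tagging: locate the record (newest level first) and set a deleted
+// bit. Tagged records are rejected during sampling and physically
+// dropped at the next reconstruction; each delete costs one lookup
+// per level rather than the B+tree's single lookup.
+bool tagged_delete(std::vector<Shard> &levels, uint64_t key) {
+    for (auto &level : levels) {
+        if (Record *r = level.find(key)) {
+            r->deleted = true;
+            return true;
+        }
+    }
+    return false;  // key not present
+}
+\end{verbatim}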
+
+Finally, Figure~\ref{fig:irs-samplesize} shows the effect of sample set
+size on average per-sample cost. We see that, for a single sample, the
+B+tree is superior to \texttt{DE-IRS} because of the preliminary
+processing that our dynamized structure must perform before it can begin
+answering a query. As the sample set size increases, however, this
+up-front cost is amortized, and \texttt{DE-IRS} quickly overtakes the
+B+tree in average per-sample latency. One other interesting note is the
+performance of the static ISAM tree, which begins on par with the B+Tree
+but also improves as the sample set size increases. This is a cache
+effect. During the initial tree traversal, the B+tree and the ISAM tree
+incur a similar number of cache misses. However, the ISAM tree performs
+its traversal only once, and then samples from data stored in a compact
+sorted array, so it benefits strongly from the cache. Olken's method, in
+contrast, must perform a full tree traversal for each sample, and so
+sees no significant improvement in per-sample performance as the sample
+set size grows.
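+
+The following sketch illustrates this asymmetry as a simple cost model
+over a sorted array (illustrative C++; these are not the interfaces of
+our implementation). The Olken-style loop pays a logarithmic descent per
+sample, standing in for a root-to-leaf traversal of the aggregate
+B+tree, while the ISAM-style loop resolves the query range once and then
+draws every sample with a single cache-friendly array read.
+\begin{verbatim}
+#include <algorithm>
+#include <cassert>
+#include <cstdint>
+#include <random>
+#include <vector>
+
+static size_t lower_idx(const std::vector<uint64_t> &d, uint64_t key) {
+    return std::lower_bound(d.begin(), d.end(), key) - d.begin();
+}
+
+// One logarithmic search per sample, mimicking Olken's per-sample
+// root-to-leaf descent: no work is shared across the k samples.
+std::vector<uint64_t> olken_style(const std::vector<uint64_t> &data,
+                                  uint64_t lo, uint64_t hi, size_t k,
+                                  std::mt19937_64 &rng) {
+    std::vector<uint64_t> out;
+    out.reserve(k);
+    for (size_t i = 0; i < k; i++) {
+        size_t a = lower_idx(data, lo);  // repeated for every sample
+        size_t b = lower_idx(data, hi);
+        assert(a < b);                   // assume a non-empty range
+        std::uniform_int_distribution<size_t> pick(a, b - 1);
+        out.push_back(data[pick(rng)]);
+    }
+    return out;
+}
+
+// The range is resolved once; each of the k samples is then an O(1)
+// read from a compact sorted array, so the traversal cost (and its
+// cache misses) is amortized across the whole sample set.
+std::vector<uint64_t> isam_style(const std::vector<uint64_t> &data,
+                                 uint64_t lo, uint64_t hi, size_t k,
+                                 std::mt19937_64 &rng) {
+    size_t a = lower_idx(data, lo);
+    size_t b = lower_idx(data, hi);
+    assert(a < b);
+    std::uniform_int_distribution<size_t> pick(a, b - 1);
+    std::vector<uint64_t> out;
+    out.reserve(k);
+    for (size_t i = 0; i < k; i++)
+        out.push_back(data[pick(rng)]);
+    return out;
+}
+\end{verbatim}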