\subsection{Design Space Exploration}
\label{ssec:ds-exp}

Our proposed framework has a large design space, which we briefly
described in Section~\ref{ssec:design-space}. The contents of this
space will be described in much more detail in
Chapter~\ref{chap:design-space}, but as part of this work we performed
an experimental examination of the framework, comparing insertion
throughput and query latency at various points within the space.

We examined this design space by considering \texttt{DE-WSS}
specifically, using a random sample of $500{,}000{,}000$ records from
the \texttt{OSM} dataset. Prior to taking any measurements, we warmed
up the structure by inserting 10\% of the total records in the set. We
then measured the update throughput over the course of inserting the
remaining records, randomly intermixing deletes amounting to 5\% of the
total data. In the tests for Figures~\ref{fig:insert_delete_prop},
\ref{fig:sample_delete_prop}, and \ref{fig:bloom}, we instead deleted
25\% of the data.

The reported update throughputs were calculated based on all of the
inserts and deletes following the warmup, executed on a single thread.
Query latency numbers were measured after all of the inserts and
deletes had been completed. We used standardized values of $s = 6$,
$N_b = 12000$, $k = 1000$, and $\delta = 0.05$ for any parameters not
being varied in a given test, and all buffer queries were answered
using rejection sampling. We show the results of this testing in
Figures~\ref{fig:parameter-sweeps1}, \ref{fig:parameter-sweeps2}, and
\ref{fig:parameter-sweeps3}.
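
To make the measurement procedure concrete, the following is a minimal
sketch of the benchmark loop described above. It is written against a
hypothetical interface: the \texttt{Index} and \texttt{Record} types
and their \texttt{insert} and \texttt{erase} methods are illustrative
stand-ins, not the framework's actual API.

\begin{verbatim}
// Illustrative benchmark driver. The Index and Record types are
// hypothetical stand-ins; templating lets the sketch stand alone.
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

template <typename Index, typename Record>
void run_benchmark(Index &index, const std::vector<Record> &data) {
    size_t warmup = data.size() / 10;    // warm up with 10% of records
    for (size_t i = 0; i < warmup; i++)
        index.insert(data[i]);

    std::mt19937 rng(42);
    std::bernoulli_distribution do_delete(0.05);  // ~5% deletes, intermixed

    auto start = std::chrono::steady_clock::now();
    for (size_t i = warmup; i < data.size(); i++) {
        index.insert(data[i]);           // single-threaded updates
        if (do_delete(rng)) {
            // delete a random, previously inserted record
            std::uniform_int_distribution<size_t> pick(0, i);
            index.erase(data[pick(rng)]);
        }
    }
    auto stop = std::chrono::steady_clock::now();

    double secs = std::chrono::duration<double>(stop - start).count();
    double ops  = (data.size() - warmup) * 1.05;  // inserts plus deletes
    std::printf("update throughput: %.0f ops/sec\n", ops / secs);
}
\end{verbatim}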

\begin{figure*}
    \centering
    \subfloat[Insertion Throughput vs. Mutable Buffer Capacity]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-mt-insert} \label{fig:insert_mt}}
    \subfloat[Insertion Throughput vs. Scale Factor]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-sf-insert} \label{fig:insert_sf}} \\

    \subfloat[Per 1000 Sampling Latency vs.\\Mutable Buffer Capacity]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-mt-sample} \label{fig:sample_mt}} 
    \subfloat[Per 1000 Sampling Latency vs. Scale Factor]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-sf-sample} \label{fig:sample_sf}}

    \caption{DE-WSS Design Space Exploration: Major Parameters} 
    \label{fig:parameter-sweeps1}
\end{figure*}

We first note that the two largest contributors to performance
differences across all of the tests were the selections of layout
policy and delete policy. In particular, Figures~\ref{fig:insert_mt}
and \ref{fig:insert_sf} demonstrate that layout policy plays a very
significant role in insertion performance, with tiering outperforming
leveling under both delete policies. The next largest effect was the
delete policy selection, with tombstone deletes outperforming tagged
deletes in insertion performance. This result aligns with the
asymptotic analysis of the two approaches in
Section~\ref{sampling-deletes}. It is interesting to note, however,
that the effect of layout policy was more significant in these
particular tests,\footnote{
    Although the largest performance gap in absolute terms was between
    tiering with tombstones and tiering with tagging, the selection of
    delete policy was not enough to overcome the relative difference
    between leveling and tiering in these tests, hence our labeling of
    the layout policy as more significant.
} despite both layout policies having the same asymptotic performance.
This was likely due to the small number of deletes (only 5\% of the
total operations) reducing their effect on the overall throughput.

The influence of scale factor on update performance is shown in
Figure~\ref{fig:insert_sf}. The effect differs depending on the layout
policy, with larger scale factors benefiting update performance under
tiering and hurting it under leveling. The effect of the mutable buffer
size on insertion, shown in Figure~\ref{fig:insert_mt}, is a little
less clear, but does show a slight upward trend, with larger buffers
enhancing update performance in all cases. A larger buffer results in
fewer reconstructions but increases the size of each reconstruction, so
the effect is not as large as one might initially expect.
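
This intuition can be made concrete with a back-of-the-envelope cost
model (a rough sketch in our own notation, not the formal analysis). A
buffer of capacity $N_b$ flushes once every $N_b$ inserts, and each
record is then rewritten once per level it passes through. With roughly
$\log_s (n / N_b)$ levels, the amortized reconstruction work per insert
is on the order of
\[
    O\left(\log_s \frac{n}{N_b}\right),
\]
so enlarging the buffer reduces the number of levels, and hence the
number of rewrites per record, only logarithmically. This is consistent
with the mild upward trend in Figure~\ref{fig:insert_mt}.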

Query performance follows broadly opposite trends to updates. We see in
Figures~\ref{fig:sample_sf} and \ref{fig:sample_mt} that query latency
is better under leveling than tiering, and that tagging is better than
tombstones. More interestingly, the relative effect of the two
decisions is also different. Here, the selection of delete policy has a
larger effect than layout policy, in the sense that the better layout
policy (leveling) with the worse delete policy (tombstones) loses to
the worse layout policy (tiering) with the better delete policy
(tagging). In fact, under tagging, the performance difference between
the two layout policies is nearly negligible.

Scale factor, shown in Figure~\ref{fig:sample_sf}, has very little
effect on query performance. Thus, in this context, it would appear
that the scale factor is primarily useful as an insertion performance
tuning tool. The mutable buffer size, in Figure~\ref{fig:sample_mt},
likewise has no clear effect. This is expected, because the buffer
contains only a small number of records relative to the entire dataset,
and so has a fairly low probability of being selected to draw a sample
from. Even when it is selected, rejection sampling is very inexpensive.
The one exception to this trend is when using tombstones, where query
performance degrades as the buffer size grows. This is because the
rejection check process for tombstones requires, in some cases, a full
buffer scan for every sample.
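
The asymmetry between the two delete policies' buffer checks can be
seen in the following sketch. The record layout and field names are
hypothetical, chosen only to illustrate the costs involved.

\begin{verbatim}
#include <cstdint>
#include <vector>

// Hypothetical record layout; the fields are illustrative stand-ins.
struct Record {
    uint64_t key;
    bool     is_tombstone;  // tombstone delete: a matching anti-record
    bool     deleted;       // tagged delete: bit set on the record itself
};

// Tombstones can land anywhere in the unsorted buffer, so in the worst
// case every sampled record forces a full buffer scan.
bool rejected_tombstone(const Record &r, const std::vector<Record> &buf) {
    for (const Record &t : buf)
        if (t.is_tombstone && t.key == r.key) return true;
    return false;
}

// Tagging marks the record in place at delete time, so the rejection
// check is constant time regardless of buffer size.
bool rejected_tagged(const Record &r) {
    return r.deleted;
}
\end{verbatim}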

\begin{figure*}
    \centering
    \subfloat[Insertion Throughput vs.\\Max Delete Proportion]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-tp-insert} \label{fig:insert_delete_prop}} 
    \subfloat[Per 1000 Sampling Latency vs. Max Delete Proportion]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-tp-sample}\label{fig:sample_delete_prop}} \\
    \caption{DE-WSS Design Space Exploration: Delete Bounding}
    \label{fig:parameter-sweeps2}
\end{figure*}

We also considered the effect that bounding the proportion of deleted
records within the structure has on performance. In these tests, 25\%
of all records were eventually deleted over the course of the
benchmark. Figure~\ref{fig:sample_delete_prop} shows the effect that
maintaining these bounds has on query performance. In our testing, we
saw very little query performance benefit from maintaining more
aggressive bounds on deletes, likely because the cost of a rejection is
relatively small in our query model. The bound does have a clear effect
on insertion performance, though, as shown in
Figure~\ref{fig:insert_delete_prop}. Under tagging, the cost of
maintaining increasingly tight bounds on deleted records is small,
likely because all deleted records can be dropped by a single
reconstruction. This means both that a violation of the bound can be
resolved in a single compaction, and that violations of the bound are
much less likely to occur in the first place, as each reconstruction
removes all deleted records. Tombstone-based deletes require far more
work to remove from the structure, and so we would expect tighter
bounds to degrade insertion performance. Interestingly, we see the
opposite: tighter bounds result in improved performance. This is
because the sheer volume of deleted records has a measurable effect on
the size of the dynamized structure, and the more proactive compactions
prune these records, resulting in better performance.
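
To illustrate how such a bound might be enforced, the following sketch
checks each level's deleted-record proportion against $\delta$ after a
flush or reconstruction. The \texttt{Level} structure and
\texttt{compact} routine are hypothetical stand-ins, not the
framework's actual maintenance code.

\begin{verbatim}
#include <cstddef>
#include <vector>

// Hypothetical per-level bookkeeping; a real level also owns its
// static sampling structures.
struct Level {
    size_t records;  // total records on this level
    size_t deleted;  // tagged records or tombstones on this level
};

// compact() is a stand-in that rewrites level i, purging tagged
// records or cancelling tombstones in the process.
void enforce_delete_bound(std::vector<Level> &levels, double delta,
                          void (*compact)(std::vector<Level>&, size_t)) {
    for (size_t i = 0; i < levels.size(); i++) {
        if (levels[i].records == 0) continue;
        double prop = static_cast<double>(levels[i].deleted)
                    / static_cast<double>(levels[i].records);
        if (prop > delta)
            compact(levels, i);  // proactive compaction of level i
    }
}
\end{verbatim}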

\begin{figure*}
    \centering
    \subfloat[Sampling Latency vs. Sample Size]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-samplesize} \label{fig:sample_k}}
    \subfloat[Per 1000 Sampling Latency vs. Bloom Filter Memory]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-bloom}\label{fig:bloom}} \\
    \caption{DE-WSS Design Space Exploration: Misc.}
    \label{fig:parameter-sweeps3}
\end{figure*}

Finally, we consider two more parameters: the memory allotted to Bloom
filters and the effect of sample set size on query latency.
Figure~\ref{fig:bloom} shows the trade-off between memory allocated to
filters and sampling performance when tombstones are used. Recall that
these Bloom filters are built specifically for tombstones, not for
general records, and are used to accelerate rejection checks of sampled
records. In this test, 25\% of all records were deleted and $\delta$
was set to 0 to disable all proactive compaction, presenting a
worst-case scenario in terms of tombstones. Allocating additional
memory to the Bloom filters decreases their false positive rates and
results in better sampling performance. Figure~\ref{fig:sample_k}
compares the sample set size against the average latency of drawing a
single sample, demonstrating the ability of our procedure to amortize
the preliminary work across the samples in a sample set. Beyond a
sample set size of $k = 100$, we stop seeing a benefit from increasing
the size, indicating the limit of how much the preliminary work can be
effectively amortized.
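
This plateau is what a simple amortization model would predict. Writing
the fixed preliminary work as $C_{\text{pre}}$ and the marginal cost of
one sample as $c_{\text{sample}}$ (notation ours, for illustration),
the per-sample latency for a sample set of size $k$ is roughly
\[
    t(k) \approx \frac{C_{\text{pre}}}{k} + c_{\text{sample}},
\]
so once $k \gg C_{\text{pre}} / c_{\text{sample}}$ the first term is
negligible and further increases in $k$ yield no measurable benefit.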

Based on the results of this preliminary study, we established a set of
standardized parameters to use for the baseline comparisons in the
remainder of this section. Unless otherwise stated, we use tagging for
deletes, tiering as the layout policy, $k = 1000$, $N_b = 12000$,
$\delta = 0.05$, and $s = 6$.
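
For reference, this standardized configuration can be summarized as a
small sketch; the type and field names are hypothetical, but the values
are those listed above.

\begin{verbatim}
#include <cstddef>

enum class LayoutPolicy { LEVELING, TIERING };
enum class DeletePolicy { TOMBSTONE, TAGGING };

// Hypothetical configuration mirroring the standardized parameters.
struct DEWSSConfig {
    LayoutPolicy layout       = LayoutPolicy::TIERING;
    DeletePolicy deletes      = DeletePolicy::TAGGING;
    size_t       scale_factor = 6;      // s
    size_t       buffer_cap   = 12000;  // N_b
    size_t       sample_size  = 1000;   // k
    double       max_deleted  = 0.05;   // delta
};
\end{verbatim}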