\subsection{Framework Design Space Exploration} \label{ssec:ds-exp}

The proposed framework brings with it a large design space, described in Section~\ref{ssec:design-space}. This design space is first examined using a standardized benchmark to measure the average insertion throughput and sampling latency of DE-WSS at several points within it. Tests were run using a random selection of 500 million records from the OSM dataset, with the index warmed up by inserting 10\% of the total records before any measurements were taken. Over the course of the insertion period, 5\% of the records were deleted, except for the tests in Figures~\ref{fig:insert_delete_prop}, \ref{fig:sample_delete_prop}, and \ref{fig:bloom}, in which 25\% of the records were deleted. Reported update throughputs were calculated using both inserts and deletes, following the warmup period. The standard values used for parameters not being varied in a given test were $s = 6$, $N_b = 12000$, $k = 1000$, and $\delta = 0.05$, with buffer rejection sampling.

\begin{figure*}
\centering
\subfloat[Insertion Throughput vs. Mutable Buffer Capacity]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-mt-insert} \label{fig:insert_mt}}
\subfloat[Insertion Throughput vs. Scale Factor]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-sf-insert} \label{fig:insert_sf}} \\
\subfloat[Insertion Throughput vs.\\Max Delete Proportion]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-tp-insert} \label{fig:insert_delete_prop}}
\subfloat[Per 1000 Sampling Latency vs.\\Mutable Buffer Capacity]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-mt-sample} \label{fig:sample_mt}} \\
\caption{DE-WSS Design Space Exploration I}
\label{fig:parameter-sweeps1}
\end{figure*}

The results of this testing are displayed in Figures~\ref{fig:parameter-sweeps1}, \ref{fig:parameter-sweeps2}, and~\ref{fig:parameter-sweeps3}. The two largest contributors to performance differences were the choices of layout policy and delete policy. Figures~\ref{fig:insert_mt} and~\ref{fig:insert_sf} show that the choice of layout policy plays a larger role than the choice of delete policy in insertion performance, with tiering outperforming leveling in both configurations. The situation is reversed for sampling performance, seen in Figures~\ref{fig:sample_mt} and~\ref{fig:sample_sf}, where the difference between layout policies is far smaller than that between delete policies.

The values used for the scale factor and buffer size have less influence than the layout and delete policies. Sampling performance is largely independent of them over the ranges of values tested, as shown in Figures~\ref{fig:sample_mt} and~\ref{fig:sample_sf}. This is unsurprising, as these parameters adjust the number of shards, which contributes only to shard alias construction time during sampling and is amortized over all samples taken in a query. The buffer also contributes rejections, but the cost of a rejection is small and the buffer constitutes only a small portion of the total weight, so these are negligible. Under tombstones, however, there is an upward trend in latency with buffer size, as delete checks occasionally require a full scan of the buffer.
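The amortization of shard alias construction noted above is a property of the alias structure itself: building the table touches each entry once, while every subsequent draw costs constant time. The following C++ sketch of Walker's alias method (using Vose's construction) illustrates this shape; it is a minimal illustration of the general technique, not the DE-WSS shard alias implementation.

\begin{verbatim}
// Minimal sketch of Walker's alias method (Vose's construction):
// O(n) table construction, O(1) per sample. Illustrative only; this
// is the generic technique, not the DE-WSS shard alias code.
#include <cstddef>
#include <random>
#include <vector>

class AliasTable {
public:
    // Assumes a non-empty vector of positive weights.
    explicit AliasTable(const std::vector<double>& weights)
        : prob_(weights.size()), alias_(weights.size()) {
        const std::size_t n = weights.size();
        double total = 0.0;
        for (double w : weights) total += w;

        // Scale weights so that an average bucket holds probability 1.
        std::vector<double> scaled(n);
        std::vector<std::size_t> small, large;
        for (std::size_t i = 0; i < n; ++i) {
            scaled[i] = weights[i] * n / total;
            (scaled[i] < 1.0 ? small : large).push_back(i);
        }
        // Pair each underfull bucket with an overfull donor.
        while (!small.empty() && !large.empty()) {
            std::size_t s = small.back(); small.pop_back();
            std::size_t l = large.back(); large.pop_back();
            prob_[s] = scaled[s];
            alias_[s] = l;
            scaled[l] -= 1.0 - scaled[s];
            (scaled[l] < 1.0 ? small : large).push_back(l);
        }
        // Leftovers (floating-point residue) are full buckets.
        for (std::size_t i : large) prob_[i] = 1.0;
        for (std::size_t i : small) prob_[i] = 1.0;
    }

    // O(1): one uniform bucket choice and one biased coin flip.
    template <typename RNG>
    std::size_t sample(RNG& rng) const {
        std::uniform_int_distribution<std::size_t> bucket(0, prob_.size() - 1);
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        std::size_t i = bucket(rng);
        return coin(rng) < prob_[i] ? i : alias_[i];
    }

private:
    std::vector<double> prob_;
    std::vector<std::size_t> alias_;
};
\end{verbatim}

A query therefore pays the $O(n)$ construction cost once and draws each subsequent sample in constant time, which is the amortization behavior examined later in Figure~\ref{fig:sample_k}.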
The effect of buffer size on insertion performance is shown in Figure~\ref{fig:insert_mt}. There is only a small improvement as the mutable buffer grows: a larger buffer results in fewer reconstructions, but each reconstruction individually takes longer, and so the net benefit is smaller than might be expected. Finally, Figure~\ref{fig:insert_sf} shows the effect of the scale factor on insertion performance. As expected, tiering performs better with higher scale factors, whereas the insertion performance of leveling trails off as the scale factor is increased, due to write amplification.

\begin{figure*}
\centering
\subfloat[Per 1000 Sampling Latency vs. Scale Factor]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-sf-sample} \label{fig:sample_sf}}
\subfloat[Per 1000 Sampling Latency vs. Max Delete Proportion]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-tp-sample}\label{fig:sample_delete_prop}} \\
\caption{DE-WSS Design Space Exploration II}
\label{fig:parameter-sweeps2}
\end{figure*}

Figures~\ref{fig:insert_delete_prop} and~\ref{fig:sample_delete_prop} show the cost of maintaining $\delta$ with a base delete rate of 25\%. The low cost of an in-memory sampling rejection results in only a slight upward trend in sampling latency as the number of deleted records increases. While compaction is necessary to avoid pathological cases, there does not appear to be a significant benefit to aggressive compaction thresholds. Figure~\ref{fig:insert_delete_prop} shows the effect of these compactions on insertion performance. There is little effect under tagging, but there is a clear negative performance trend associated with aggressive compaction when using tombstones. Under tagging, a single compaction is guaranteed to remove all deleted records on a level, whereas with tombstones a compaction can cascade across multiple levels before the delete bound is satisfied, resulting in a larger cost each time one occurs.

\begin{figure*}
\centering
\subfloat[Sampling Latency vs. Sample Size]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-samplesize} \label{fig:sample_k}}
\subfloat[Per 1000 Sampling Latency vs. Bloom Filter Memory]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-bloom}\label{fig:bloom}} \\
\caption{DE-WSS Design Space Exploration III}
\label{fig:parameter-sweeps3}
\end{figure*}

Figure~\ref{fig:bloom} demonstrates the trade-off between the memory allocated to Bloom filters and sampling performance under tombstones. This test was run using 25\% incoming deletes with no compaction, in order to maximize the number of tombstones within the index as a worst-case scenario. As expected, allocating more memory to the Bloom filters lowers their false positive rates and accelerates sampling.

Finally, Figure~\ref{fig:sample_k} shows the relationship between the average per-sample latency and the sample set size. It shows the effect of amortizing the initial shard alias setup work over an increasing number of samples, with latency leveling off at around $k = 100$.

Based upon these results, a set of parameters was established for the extended indexes, which is used in the next section for baseline comparisons. This standard configuration uses tagging as the delete policy and tiering as the layout policy, with $k = 1000$, $N_b = 12000$, $\delta = 0.05$, and $s = 6$.
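As a closing point of reference for the memory trade-off in Figure~\ref{fig:bloom}, the standard Bloom filter analysis makes the relationship between filter memory and false positive rate concrete. For a filter of $m$ bits holding $n$ keys with $h$ hash functions, the false positive rate is approximately
\[
    p \approx \left(1 - e^{-hn/m}\right)^{h}, \qquad h^{*} = \frac{m}{n} \ln 2 \;\implies\; p^{*} \approx (0.6185)^{m/n},
\]
so $p$ decays exponentially in the number of bits allocated per key: roughly ten bits per tombstone already yield $p \approx 1\%$, suggesting diminishing returns from filter memory beyond a modest per-tombstone budget.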