\subsection{Framework Design Space Exploration} \label{ssec:ds-exp}

The proposed framework brings with it a large design space, described in Section~\ref{ssec:design-space}. This design space is first examined using a standardized benchmark to measure the average insertion throughput and sampling latency of DE-WSS at several points within it. Tests were run using a random selection of 500 million records from the OSM dataset, with the index warmed up by inserting 10\% of the total records before any measurements were taken. Over the course of the insertion period, 5\% of the records were deleted, except for the tests in Figures~\ref{fig:insert_delete_prop}, \ref{fig:sample_delete_prop}, and \ref{fig:bloom}, in which 25\% of the records were deleted. Reported update throughputs were calculated using both inserts and deletes, following the warmup period. The standard values used for parameters not being varied in a given test were $s = 6$, $N_b = 12000$, $k = 1000$, and $\delta = 0.05$, with buffer rejection sampling.

\begin{figure*}
\centering
\subfloat[Insertion Throughput vs. Mutable Buffer Capacity]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-mt-insert} \label{fig:insert_mt}}
\subfloat[Insertion Throughput vs. Scale Factor]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-sf-insert} \label{fig:insert_sf}} \\
\subfloat[Insertion Throughput vs.\\Max Delete Proportion]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-tp-insert} \label{fig:insert_delete_prop}}
\subfloat[Per 1000 Sampling Latency vs.\\Mutable Buffer Capacity]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-mt-sample} \label{fig:sample_mt}} \\
\caption{DE-WSS Design Space Exploration I}
\label{fig:parameter-sweeps1}
\end{figure*}

The results of this testing are displayed in Figures~\ref{fig:parameter-sweeps1}, \ref{fig:parameter-sweeps2}, and~\ref{fig:parameter-sweeps3}. The two largest contributors to performance differences were the choices of layout policy and delete policy. Figures~\ref{fig:insert_mt} and~\ref{fig:insert_sf} show that the choice of layout policy plays a larger role than the choice of delete policy in insertion performance, with tiering outperforming leveling in both configurations. The situation is reversed for sampling performance, seen in Figures~\ref{fig:sample_mt} and~\ref{fig:sample_sf}, where the difference between layout policies is far smaller than that between delete policies.

The values used for the scale factor and buffer size have less influence than the layout and delete policies. Sampling performance is largely independent of them over the ranges of values tested, as shown in Figures~\ref{fig:sample_mt} and~\ref{fig:sample_sf}. This is unsurprising, as these parameters adjust the number of shards, which contributes only to shard alias construction time during sampling and is amortized over all samples taken in a query. The buffer also contributes rejections, but the cost of a rejection is small and the buffer constitutes only a small portion of the total weight, so these are negligible. Under tombstones, however, there is an upward trend in latency with buffer size, as delete checks occasionally require a full scan of the buffer.
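The amortization of shard alias construction noted above is a property of the alias structure itself: building the table touches each entry once, while every subsequent draw costs constant time. The following C++ sketch of Walker's alias method (using Vose's construction) illustrates this shape; it is a minimal illustration of the general technique, not the DE-WSS shard alias implementation.

\begin{verbatim}
// Minimal sketch of Walker's alias method (Vose's construction):
// O(n) table construction, O(1) per sample. Illustrative only; this
// is the generic technique, not the DE-WSS shard alias code.
#include <cstddef>
#include <random>
#include <vector>

class AliasTable {
public:
    // Assumes a non-empty vector of positive weights.
    explicit AliasTable(const std::vector<double>& weights)
        : prob_(weights.size()), alias_(weights.size()) {
        const std::size_t n = weights.size();
        double total = 0.0;
        for (double w : weights) total += w;

        // Scale weights so that an average bucket holds probability 1.
        std::vector<double> scaled(n);
        std::vector<std::size_t> small, large;
        for (std::size_t i = 0; i < n; ++i) {
            scaled[i] = weights[i] * n / total;
            (scaled[i] < 1.0 ? small : large).push_back(i);
        }
        // Pair each underfull bucket with an overfull donor.
        while (!small.empty() && !large.empty()) {
            std::size_t s = small.back(); small.pop_back();
            std::size_t l = large.back(); large.pop_back();
            prob_[s] = scaled[s];
            alias_[s] = l;
            scaled[l] -= 1.0 - scaled[s];
            (scaled[l] < 1.0 ? small : large).push_back(l);
        }
        // Leftovers (floating-point residue) are full buckets.
        for (std::size_t i : large) prob_[i] = 1.0;
        for (std::size_t i : small) prob_[i] = 1.0;
    }

    // O(1): one uniform bucket choice and one biased coin flip.
    template <typename RNG>
    std::size_t sample(RNG& rng) const {
        std::uniform_int_distribution<std::size_t> bucket(0, prob_.size() - 1);
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        std::size_t i = bucket(rng);
        return coin(rng) < prob_[i] ? i : alias_[i];
    }

private:
    std::vector<double> prob_;
    std::vector<std::size_t> alias_;
};
\end{verbatim}

A query therefore pays the $O(n)$ construction cost once and draws each subsequent sample in constant time, which is the amortization behavior examined later in Figure~\ref{fig:sample_k}.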
The effect of buffer size on insertion performance is shown in Figure~\ref{fig:insert_mt}. There is only a small improvement as the mutable buffer grows: a larger buffer results in fewer reconstructions, but each reconstruction individually takes longer, and so the net benefit is smaller than might be expected. Finally, Figure~\ref{fig:insert_sf} shows the effect of the scale factor on insertion performance. As expected, tiering performs better with higher scale factors, whereas the insertion performance of leveling trails off as the scale factor is increased, due to write amplification.

\begin{figure*}
\centering
\subfloat[Per 1000 Sampling Latency vs. Scale Factor]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-sf-sample} \label{fig:sample_sf}}
\subfloat[Per 1000 Sampling Latency vs. Max Delete Proportion]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-tp-sample}\label{fig:sample_delete_prop}} \\
\caption{DE-WSS Design Space Exploration II}
\label{fig:parameter-sweeps2}
\end{figure*}

Figures~\ref{fig:insert_delete_prop} and~\ref{fig:sample_delete_prop} show the cost of maintaining $\delta$ with a base delete rate of 25\%. The low cost of an in-memory sampling rejection results in only a slight upward trend in sampling latency as the number of deleted records increases. While compaction is necessary to avoid pathological cases, there does not appear to be a significant benefit to aggressive compaction thresholds. Figure~\ref{fig:insert_delete_prop} shows the effect of these compactions on insertion performance. There is little effect under tagging, but there is a clear negative performance trend associated with aggressive compaction when using tombstones. Under tagging, a single compaction is guaranteed to remove all deleted records on a level, whereas with tombstones a compaction can cascade across multiple levels before the delete bound is satisfied, resulting in a larger cost each time one occurs.

\begin{figure*}
\centering
\subfloat[Sampling Latency vs. Sample Size]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-samplesize} \label{fig:sample_k}}
\subfloat[Per 1000 Sampling Latency vs. Bloom Filter Memory]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-bloom}\label{fig:bloom}} \\
\caption{DE-WSS Design Space Exploration III}
\label{fig:parameter-sweeps3}
\end{figure*}

Figure~\ref{fig:bloom} demonstrates the trade-off between the memory allocated to Bloom filters and sampling performance under tombstones. This test was run using 25\% incoming deletes with no compaction, in order to maximize the number of tombstones within the index as a worst-case scenario. As expected, allocating more memory to the Bloom filters lowers their false positive rates and accelerates sampling.

Finally, Figure~\ref{fig:sample_k} shows the relationship between the average per-sample latency and the sample set size. It shows the effect of amortizing the initial shard alias setup work over an increasing number of samples, with latency leveling off at around $k = 100$.

Based upon these results, a set of parameters was established for the extended indexes, which is used in the next section for baseline comparisons. This standard configuration uses tagging as the delete policy and tiering as the layout policy, with $k = 1000$, $N_b = 12000$, $\delta = 0.05$, and $s = 6$.
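As a closing point of reference for the memory trade-off in Figure~\ref{fig:bloom}, the standard Bloom filter analysis makes the relationship between filter memory and false positive rate concrete. For a filter of $m$ bits holding $n$ keys with $h$ hash functions, the false positive rate is approximately
\[
    p \approx \left(1 - e^{-hn/m}\right)^{h}, \qquad h^{*} = \frac{m}{n} \ln 2 \;\implies\; p^{*} \approx (0.6185)^{m/n},
\]
so $p$ decays exponentially in the number of bits allocated per key: roughly ten bits per tombstone already yield $p \approx 1\%$, suggesting diminishing returns from filter memory beyond a modest per-tombstone budget.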