\subsection{Design Space Exploration} \label{ssec:ds-exp}

Our proposed framework has a large design space, which we briefly described in Section~\ref{ssec:design-space}. The contents of this space will be described in much more detail in Chapter~\ref{chap:design-space}, but as part of this work we performed an experimental examination of our framework, comparing insertion throughput and query latency at various points within the space.

We examined this design space by considering \texttt{DE-WSS} specifically, using a random sample of $500,000,000$ records from the \texttt{OSM} dataset. Prior to taking any measurements, we warmed the structure up by inserting 10\% of the total records in the set. We then measured the update throughput over the course of the insertion of the remaining records, randomly intermixing delete operations covering 5\% of the total data. In the tests for Figures~\ref{fig:insert_delete_prop}, \ref{fig:sample_delete_prop}, and \ref{fig:bloom}, we instead deleted 25\% of the data. The reported update throughputs were calculated based on all of the inserts and deletes following the warmup, executed on a single thread. Query latency numbers were measured after all of the inserts and deletes had been completed. We used standardized values of $s = 6$, $N_b = 12000$, $k = 1000$, and $\delta = 0.05$ for parameters not being varied in a given test, and all buffer queries were answered using rejection sampling. We show the results of this testing in Figures~\ref{fig:parameter-sweeps1}, \ref{fig:parameter-sweeps2}, and \ref{fig:parameter-sweeps3}.

\begin{figure*}
\centering
\subfloat[Insertion Throughput vs. Mutable Buffer Capacity]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-mt-insert} \label{fig:insert_mt}}
\subfloat[Insertion Throughput vs.
Scale Factor]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-sf-insert} \label{fig:insert_sf}} \\
\subfloat[Per 1000 Sampling Latency vs.\\Mutable Buffer Capacity]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-mt-sample} \label{fig:sample_mt}}
\subfloat[Per 1000 Sampling Latency vs. Scale Factor]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-sf-sample} \label{fig:sample_sf}}
\caption{DE-WSS Design Space Exploration: Major Parameters}
\label{fig:parameter-sweeps1}
\end{figure*}

We first note that the two largest contributors to performance differences across all of the tests were the choices of layout policy and delete policy. In particular, Figures~\ref{fig:insert_mt} and \ref{fig:insert_sf} demonstrate that layout policy plays a very significant role in insertion performance, with tiering outperforming leveling under both delete policies. The next largest effect was the delete policy, with tombstone deletes outperforming tagged deletes in insertion performance. This result aligns with the asymptotic analysis of the two approaches in Section~\ref{sampling-deletes}. It is interesting to note, however, that the effect of layout policy was more significant in these particular tests,\footnote{Although the largest performance gap in absolute terms was between tiering with tombstones and tiering with tagging, the selection of delete policy was not enough to overcome the relative difference between leveling and tiering in these tests, hence our labeling the layout policy as more significant.} despite both layout policies having the same asymptotic performance. This was likely because the small number of deletes (only 5\% of the total operations) reduced their effect on the overall throughput. The influence of scale factor on update performance is shown in Figure~\ref{fig:insert_sf}.
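The mechanism behind both of these trends, tiering's insertion advantage and the differing influence of the scale factor, is the write amplification of reconstructions. This can be illustrated with a minimal simulation; the code below is a hypothetical sketch, not the \texttt{DE-WSS} implementation, and the \texttt{simulate} helper, its buffer capacity, and its merge rules are simplifying assumptions.

```python
# Hypothetical sketch of write amplification under the two layout
# policies (not the DE-WSS code). Under leveling, each level holds one
# run that is rewritten whenever new data arrives; under tiering, each
# level accumulates up to scale_factor runs before merging them down.

def simulate(policy, n_inserts, buffer_cap=1000, scale_factor=6):
    levels = []   # leveling: levels[i] = size of the single run
                  # tiering:  levels[i] = list of run sizes
    written = 0   # total records written during reconstructions

    def flush(batch):
        nonlocal written
        if policy == "leveling":
            i, carry = 0, batch
            while True:
                if i == len(levels):
                    levels.append(0)
                merged = levels[i] + carry
                written += merged                 # whole run is rewritten
                if merged <= buffer_cap * scale_factor ** (i + 1):
                    levels[i] = merged
                    return
                carry, levels[i] = merged, 0      # push the run down a level
                i += 1
        else:  # tiering: append a run; merge only when the level fills
            i, carry = 0, batch
            while True:
                if i == len(levels):
                    levels.append([])
                levels[i].append(carry)
                written += carry
                if len(levels[i]) < scale_factor:
                    return
                carry = sum(levels[i])            # merge all runs in the level
                levels[i] = []
                i += 1

    for _ in range(n_inserts // buffer_cap):
        flush(buffer_cap)
    return written / n_inserts                    # write amplification

leveling_wa = simulate("leveling", 1_000_000)
tiering_wa = simulate("tiering", 1_000_000)
```

In this toy model, leveling rewrites a level's resident run on every incoming merge, while tiering defers all merging until a level fills, so tiering performs strictly less write work per record; raising the scale factor also shrinks the level count, which benefits tiering directly.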
The effect is different depending on the layout policy, with larger scale factors benefiting update performance under tiering and hurting it under leveling. The effect of the mutable buffer size on insertion, shown in Figure~\ref{fig:insert_mt}, is a little less clear, but does show a slight upward trend, with larger buffers enhancing update performance in all cases. A larger buffer results in fewer reconstructions, but increases the size of those reconstructions, so the effect is not as large as one might initially expect.

Query performance follows broadly opposite trends to updates. We see in Figures~\ref{fig:sample_sf} and \ref{fig:sample_mt} that query latency is better under leveling than tiering, and that tagging is better than tombstones. More interestingly, the relative effect of the two decisions is also different. Here, the selection of delete policy has a larger effect than layout policy, in the sense that the better layout policy (leveling) with the worse delete policy (tombstones) loses to the worse layout policy (tiering) with the better delete policy (tagging). In fact, under tagging, the performance difference between the two layout policies is almost indistinguishable. Scale factor, shown in Figure~\ref{fig:sample_sf}, has very little effect on query performance. Thus, in this context, it would appear that the scale factor is primarily useful as an insertion performance tuning tool. The mutable buffer size, in Figure~\ref{fig:sample_mt}, also generally has no clear effect. This is expected, because the buffer contains only a small number of records relative to the entire dataset, and so has a fairly low probability of being selected when drawing a sample. Even when it is selected, rejection sampling is very inexpensive. The one exception to this trend is when using tombstones, where the query performance degrades as the buffer size grows.
This is because the rejection check process for tombstones requires, in some cases, a full buffer scan for every sample.

\begin{figure*}
\centering
\subfloat[Insertion Throughput vs.\\Max Delete Proportion]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-tp-insert} \label{fig:insert_delete_prop}}
\subfloat[Per 1000 Sampling Latency vs. Max Delete Proportion]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-tp-sample}\label{fig:sample_delete_prop}} \\
\caption{DE-WSS Design Space Exploration: Delete Bounding}
\label{fig:parameter-sweeps2}
\end{figure*}

We also considered the effect that bounding the proportion of deleted records within the structure has on performance. In these tests, 25\% of all records were eventually deleted over the course of the benchmark. Figure~\ref{fig:sample_delete_prop} shows the effect that maintaining these bounds has on query performance. In our testing, we saw very little query-performance benefit from maintaining more aggressive bounds on deletes. This is likely because the cost of rejecting a sample is relatively small in our query model. The bound does have a clear effect on insertion performance, though, as shown in Figure~\ref{fig:insert_delete_prop}. Under tagging, the cost of maintaining increasingly tight bounds on deleted records is small, likely because all deleted records can be dropped by a single reconstruction. This means both that a violation of the bound can be resolved in a single compaction, and that violations of the bound are much less likely to occur, as each reconstruction removes all deleted records. Tombstone-based deletes require far more work to remove from the structure, and so we would expect to see a degradation of insertion performance. Interestingly, we see the opposite: higher bounds result in improved performance. This is because the sheer volume of deleted records has a measurable effect on the size of the dynamized structure.
More proactive compaction prunes these records, resulting in better performance.

\begin{figure*}
\centering
\subfloat[Sampling Latency vs. Sample Size]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-samplesize} \label{fig:sample_k}}
\subfloat[Per 1000 Sampling Latency vs. Bloom Filter Memory]{\includegraphics[width=.5\textwidth]{img/sigmod23/plot/fig-ps-wss-bloom}\label{fig:bloom}} \\
\caption{DE-WSS Design Space Exploration: Misc.}
\label{fig:parameter-sweeps3}
\end{figure*}

Finally, we consider two more parameters: the memory allocated to Bloom filters and the effect of sample set size on query latency. Figure~\ref{fig:bloom} shows the trade-off between memory allocated to filters and sampling performance when tombstones are used. Recall that these Bloom filters are built over tombstones specifically, not general records, and are used to accelerate rejection checks of sampled records. In this test, 25\% of all records were deleted and $\delta$ was set to 0 to disable all proactive compaction, presenting a worst-case scenario in terms of tombstones. Allocating additional memory to the Bloom filters decreases their false positive rates and results in better sampling performance.

Figure~\ref{fig:sample_k} compares the sample set size against the average latency of drawing a single sample, to demonstrate the ability of our procedure to amortize the preliminary work across multiple samples in a sample set. Beyond a sample set size of $k=100$, we stop seeing a benefit from increasing the size, indicating the limit of how much the preliminary work can be effectively amortized.

Based upon the results of this preliminary study, we established a set of standardized parameters to use for the baseline comparisons in the remainder of this section. We will use tagging for deletes, tiering as the layout policy, $k=1000$, $N_b = 12000$, $\delta = 0.5$, and $s = 6$, unless otherwise stated.
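The amortization behavior observed in the sample-size sweep can be illustrated with a minimal sketch. The code below is hypothetical, not the \texttt{DE-WSS} implementation; the \texttt{wss\_query} helper and its prefix-sum index are assumptions standing in for the structure's actual per-query setup work.

```python
# Hypothetical sketch of sample-set amortization (not the DE-WSS code):
# the preliminary work of building a prefix-sum weight index is paid
# once per query, after which each of the k samples costs only one
# binary search, so per-sample latency flattens as k grows.

import bisect
import random

def wss_query(records, k, rng=random.Random(42)):
    """Draw k weighted samples from a list of (key, weight) records."""
    # Preliminary (per-query) work: O(n) prefix sums over the weights.
    prefix = []
    total = 0.0
    for _, weight in records:
        total += weight
        prefix.append(total)
    # Per-sample work: O(log n) binary search. Amortized per-sample cost
    # is (n + k log n) / k, which stops improving once k is large.
    return [records[bisect.bisect_right(prefix, rng.random() * total)][0]
            for _ in range(k)]

records = [(key, float(key % 10 + 1)) for key in range(100_000)]
sample_set = wss_query(records, 1000)
```

Under this model, small sample sets are dominated by the fixed setup cost, while large ones are dominated by the per-sample searches, which is consistent with the latency plateau seen past $k=100$.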