\section{Evaluation} \label{sec:experiment} \Paragraph{Experimental Setup.} All experiments were run under Ubuntu 20.04 LTS on a dual-socket Intel Xeon Gold 6242R server with 384 GiB of physical memory and 40 physical cores. External tests were run using a 4 TB WD Red SA500 SATA SSD, rated for 95000 and 82000 IOPS for random reads and writes respectively. \Paragraph{Datasets.} Testing utilized a variety of synthetic and real-world datasets. For all datasets used, the key was represented as a 64-bit integer, the weight as a 64-bit integer, and the value as a 32-bit integer. Each record also contained a 32-bit header. The weight was omitted from IRS testing. Keys and weights were pulled from the dataset directly, and values were generated separately and were unique for each record. The following datasets were used, \begin{itemize} \item \textbf{Synthetic Uniform.} A non-weighted, synthetically generated list of keys drawn from a uniform distribution. \item \textbf{Synthetic Zipfian.} A non-weighted, synthetically generated list of keys drawn from a Zipfian distribution with a skew of $0.8$. \item \textbf{Twitter~\cite{data-twitter,data-twitter1}.} $41$ million Twitter user ids, weighted by follower counts. \item \textbf{Delicious~\cite{data-delicious}.} $33.7$ million URLs, represented using unique integers, weighted by the number of associated tags. \item \textbf{OSM~\cite{data-osm}.} $2.6$ billion geospatial coordinates for points of interest, collected by OpenStreetMap. The latitude, converted to a 64-bit integer, was used as the key and the number of its associated semantic tags as the weight. \end{itemize} The synthetic datasets were not used for weighted experiments, as they do not have weights. For unweighted experiments, the Twitter and Delicious datasets were not used, as they have uninteresting key distributions. \Paragraph{Compared Methods.} In this section, indexes extended using the framework are compared against existing dynamic baselines. Specifically, DE-WSS (Section~\ref{ssec:wss-struct}), DE-IRS (Section~\ref{ssec:irs-struct}), and DE-WIRS (Section~\ref{ssec:irs-struct}) are examined. In-memory extensions are compared against the B+tree with aggregate weight tags on internal nodes (AGG B+tree) \cite{olken95} and concurrent and external extensions are compared against the AB-tree \cite{zhao22}. Sampling performance is also compared against comparable static sampling indexes: the alias structure \cite{walker74} for WSS, the in-memory ISAM tree for IRS, and the alias-augmented B+tree \cite{afshani17} for WIRS. Note that all structures under test, with the exception of the external DE-IRS and external AB-tree, were contained entirely within system memory. All benchmarking code and data structures were implemented using C++17 and compiled using gcc 11.3.0 at the \texttt{-O3} optimization level. The extension framework itself, excluding the shard implementations and utility headers, consisted of a header-only library of about 1200 SLOC.