\captionsetup[subfloat]{justification=centering}

\section{Extensions to the Framework}
\label{sec:discussion}

In this section, several extensions of the framework are considered. Specifically, the applicability of the framework to external or distributed data structures is discussed, as well as the use of the framework to add automatic support for concurrent updates and sampling to extended SSIs.

\Paragraph{Larger-than-Memory Data.} This framework can be applied to external static sampling structures with minimal modification. As a proof of concept, the IRS structure was extended with support for shards containing external ISAM trees. This structure stores a configurable number of shards in memory and the rest on disk, making it well suited to memory-constrained environments. The on-disk shards contain standard ISAM trees with $8\text{KiB}$ page-aligned nodes. The external version of the index supports only tombstone-based deletes, as tagging would require random writes. In principle, a hybrid approach to deletes is possible: a delete first searches the in-memory data for the record to be deleted, tagging it if found; if the record is not found, a tombstone is inserted instead. As the data size grows, however, and the preponderance of the data resides on disk, this approach would largely revert to the standard tombstone approach in practice. External settings make the framework even more attractive in terms of performance characteristics, owing to the different cost model. In external data structures, performance is typically measured in the number of I/O operations, so much of the overhead introduced by the framework---querying the mutable buffer, building auxiliary structures, the extra random number generations required by the shard alias structure, and the like---becomes far less significant.
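The hybrid delete policy sketched above can be illustrated with a short example. All names here (\texttt{HybridIndex}, \texttt{erase}, and the member containers) are hypothetical and stand in for the framework's actual interfaces; this is a minimal sketch of the policy's control flow, not an implementation.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Minimal sketch of the hybrid delete policy: tag the record when it is
// found among the in-memory shards, otherwise fall back to a tombstone.
// All names are illustrative, not the framework's actual API.
struct HybridIndex {
    std::unordered_set<uint64_t> in_memory;   // keys held by in-memory shards
    std::unordered_set<uint64_t> tagged;      // keys tagged as deleted
    std::vector<uint64_t> tombstones;         // tombstones for on-disk records

    // Returns true if the delete was handled by tagging.
    bool erase(uint64_t key) {
        if (in_memory.count(key)) {           // record resides in memory: tag it
            tagged.insert(key);
            return true;
        }
        tombstones.push_back(key);            // record (if present) is on disk
        return false;
    }
};
```

As the sketch makes plain, once most keys fall outside \texttt{in\_memory}, nearly every delete takes the tombstone path, which is why the hybrid scheme degenerates to the standard tombstone approach at scale.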
Because the framework maintains the immutability of shards, it is also well suited for use on top of distributed file systems or with other distributed data abstractions, such as RDDs in Apache Spark~\cite{rdd}. Each shard can be encapsulated within an immutable file in HDFS or an RDD in Spark. A centralized control node or driver program can manage the mutable buffer, flushing it into a new file or RDD when it is full and merging it with existing files or RDDs using the same reconstruction scheme already discussed for the framework. This setup allows datasets exceeding the capacity of a single node to be supported. As an example, XDB~\cite{li19} features an RDD-based distributed sampling structure that could be supported by this framework.

\Paragraph{Concurrency.} The immutability of the majority of the structures within the index makes for a straightforward concurrency implementation. Concurrency control on the buffer is trivial because it is a simple, unsorted array. The rest of the structure is never updated (aside from possible delete tagging), so concurrency reduces to delaying the freeing of memory used by internal structures until all threads accessing them have exited, rather than freeing it immediately on merge completion. A basic concurrency implementation can be achieved using the tombstone delete policy and a reference-counting scheme to control the deletion of shards following reconstructions. Multiple insert buffers can be used to improve insertion throughput, as they allow inserts to proceed in parallel with merges, ultimately allowing concurrency to scale until it is bottlenecked by memory bandwidth and available storage. This proof-of-concept implementation is based on a simplified version of an approach proposed by Golan-Gueta et al. for concurrent log-structured data stores~\cite{golan-gueta15}.
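The reference-counting scheme for shard reclamation can be sketched as follows: each query pins the current version of the shard list through a shared pointer, so shards replaced by a reconstruction are destroyed only when the last pinning reader exits. The names are hypothetical, and a real implementation would additionally synchronize the pointer swap itself (e.g., with a mutex or \texttt{std::atomic}); that synchronization is omitted here for clarity.

```cpp
#include <memory>
#include <vector>

// Sketch of reference-counted shard reclamation. A query "pins" the active
// version of the shard list; a reconstruction installs a new version, and
// the old shards are freed automatically when the final reader releases
// its handle. Names are illustrative; the swap is left unsynchronized.
struct Shard { std::vector<int> records; };
using Version = std::vector<std::shared_ptr<Shard>>;

struct Index {
    std::shared_ptr<const Version> active = std::make_shared<Version>();

    // Readers hold this handle for the duration of a query.
    std::shared_ptr<const Version> pin() const { return active; }

    // Reconstruction: install a new shard list. Prior versions remain
    // valid for readers that pinned them before the swap.
    void install(Version next) {
        active = std::make_shared<const Version>(std::move(next));
    }
};
```

The key property is that \texttt{install} never mutates an existing version: readers of the old version and readers of the new one each see a consistent, immutable shard list, which is what makes the concurrency scheme straightforward.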