Disaggregated Database Management Systems: Architecture, Case Studies, and Future

Database Systems

This article examines disaggregated database systems, an architectural shift driven by cloud hardware and software trends. It presents Google AlloyDB and Rockset as case studies, covering compute-storage separation, HTAP, and real-time analytics, and closes with future directions in memory and hardware disaggregation.

This article is based on a panel discussion from the TPC Technology Conference 2022, which surveyed how cloud hardware and software trends are reshaping database system architecture around the idea of disaggregation.

A significant focus of the discussion was three software case studies: Google AlloyDB, Rockset, and Nova-LSM, which together illustrate the evolving landscape of disaggregated architectures. Other notable systems include Aurora, Socrates, Taurus, and TaurusMM; Amazon DSQL and the PolarDB series of papers also offer insights into this domain, tracing a progression from active log-replay storage toward simpler, compute-driven designs.

AlloyDB

AlloyDB extends PostgreSQL with compute–storage disaggregation and Hybrid Transactional/Analytical Processing (HTAP) support. Its layered design features a primary node (RW node) for writes, a set of read pool replicas (RO nodes) for scalable reads, and a shared distributed storage engine that persists data in Google's Colossus file system. This architecture enables elastic scaling of read pools without data movement, as data resides in the disaggregated storage.

AlloyDB supports hybrid transactional and analytical processing by maintaining both a row cache and a pluggable columnar engine. The columnar engine vectorizes execution and automatically converts frequently accessed (hot) data into columnar format, accelerating analytical queries.
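The hot-data conversion idea can be sketched in a few lines. This is a minimal illustration, not AlloyDB's implementation: the class, the access-count heuristic, and the threshold are all invented here to show the mechanism of mirroring frequently read rows into per-column arrays.

```python
from collections import Counter

class HybridTable:
    """Toy row store that mirrors hot rows into a columnar copy."""

    def __init__(self, hot_threshold=3):
        self.rows = {}                  # row_id -> {column: value}
        self.access_counts = Counter()  # how often each row is read
        self.hot_rows = set()           # rows already converted
        self.columnar = {}              # column -> list of values (hot rows)
        self.hot_threshold = hot_threshold

    def insert(self, row_id, row):
        self.rows[row_id] = row

    def read(self, row_id):
        self.access_counts[row_id] += 1
        if (self.access_counts[row_id] >= self.hot_threshold
                and row_id not in self.hot_rows):
            # Hot row: mirror it into per-column arrays so analytical
            # scans can touch one column without reading the others.
            self.hot_rows.add(row_id)
            for col, value in self.rows[row_id].items():
                self.columnar.setdefault(col, []).append(value)
        return self.rows[row_id]

    def scan_column(self, col):
        # Column scan over the hot subset only; a real engine would
        # fall back to the row store for cold data.
        return self.columnar.get(col, [])
```

A real system would also evict columnar copies when data cools and keep both representations transactionally consistent, which this sketch ignores.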

Beneath the surface, the database storage engine materializes pages from logs and stores blocks on Colossus. Logs are written to regional log storage, while Log Processing Servers (LPS) continuously replay and materialize pages in the zones where compute nodes operate. This design decouples durability and availability: logs are durable in regional storage, and LPS workers ensure blocks are consistently available near the compute. This exemplifies how disaggregation enhances elasticity and performance, allowing independent compute scaling and benefiting HTAP workloads through a unified, multi-format cache hierarchy.
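The core LPS job of replaying a log into materialized pages can be sketched as follows. The record layout (LSN, page id, offset, bytes) and the 4 KB page size are illustrative assumptions, not AlloyDB's actual formats.

```python
def replay(pages, log, from_lsn=0):
    """Apply log records with LSN >= from_lsn to a page cache.

    `pages` maps page_id -> bytearray; `log` is an iterable of
    (lsn, page_id, offset, data) tuples in LSN order.
    """
    for lsn, page_id, offset, data in log:
        if lsn < from_lsn:
            continue  # already materialized in this zone
        page = pages.setdefault(page_id, bytearray(4096))
        page[offset:offset + len(data)] = data
    return pages

# A worker continuously replaying the durable regional log keeps
# zone-local pages fresh for nearby compute nodes.
pages = {}
log = [
    (1, "p1", 0, b"hello"),
    (2, "p1", 5, b" world"),
    (3, "p2", 0, b"abc"),
]
replay(pages, log)
```

The key property this illustrates is that durability (the log) and availability (materialized pages near compute) are decoupled: pages can be rebuilt in any zone by replaying from the log.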

Rockset

Rockset stands out as a prime example of disaggregation in real-time analytics. Its architecture follows the Aggregator–Leaf–Tailer (ALT) pattern, which separates compute for writes (Tailers), compute for reads (Aggregators and Leaves), and storage. Tailers ingest new data from sources like Kafka or S3. Leaves then index this data into various index types (columnar, inverted, geo, document). Aggregators execute SQL queries over these indexes, scaling horizontally to manage high-concurrency, low-latency workloads.
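The three ALT roles can be made concrete with a small sketch. The class and method names below are illustrative, not Rockset's API, and only one of the several index types (an inverted index) is shown.

```python
class Tailer:
    """Pulls new records from a source (e.g. a Kafka topic or S3 bucket).
    Here the source is just a Python list standing in for a stream."""

    def __init__(self, source):
        self.source = source

    def poll(self):
        batch = list(self.source)
        self.source.clear()
        return batch

class Leaf:
    """Indexes ingested documents; here, a simple inverted index."""

    def __init__(self):
        self.docs = {}
        self.inverted = {}  # term -> set of doc ids

    def index(self, doc_id, doc):
        self.docs[doc_id] = doc
        for term in doc.split():
            self.inverted.setdefault(term, set()).add(doc_id)

class Aggregator:
    """Fans a query out to leaves and merges the partial results."""

    def __init__(self, leaves):
        self.leaves = leaves

    def search(self, term):
        hits = set()
        for leaf in self.leaves:
            hits |= leaf.inverted.get(term, set())
        return hits
```

Because each role is a separate tier, you can run more Tailer instances during ingest spikes or more Aggregator instances during query surges without touching the other tiers.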

A key advantage of this architecture is the strict isolation between writes and reads, ensuring that ingest bursts do not compromise query latencies. Disaggregation enables independent scaling of each tier: more Tailers for ingest spikes, more Aggregators for query surges, and more Leaves as data volume expands.

Rockset also demonstrates why LSM-style storage engines and append-only logs are well-suited for disaggregation. RocksDB-Cloud, for instance, never mutates SST files post-creation. All SSTs are immutable and stored in cloud object stores like S3, allowing safe sharing across servers. Compaction jobs can be delegated to stateless compute nodes, which fetch, merge, write new SSTs to S3, and return control, fully decoupling storage and compaction compute.
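The essence of delegating compaction to stateless workers is that merging immutable sorted runs requires no local state beyond the inputs. A minimal sketch, where an "SST" is simply a sorted list of (key, value) pairs standing in for an object in S3:

```python
def compact(ssts):
    """Merge immutable sorted runs into one new run.

    `ssts` is ordered oldest to newest, so later runs win on
    duplicate keys. Inputs are never mutated; the output is a
    brand-new run, mirroring how RocksDB-Cloud compaction writes
    fresh SST files to the object store rather than editing old ones.
    """
    merged = {}
    for sst in ssts:               # apply oldest -> newest
        for key, value in sst:
            merged[key] = value    # newer value shadows older
    return sorted(merged.items())
```

Because the inputs are immutable and the output is a new object, any stateless node can run this job: fetch the runs, merge, upload the result, and disappear.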

Memory Disaggregation

The panel discussion also highlighted memory disaggregation as an emerging frontier. Modern datacenters often waste over half their DRAM capacity due to static provisioning. RDMA-based systems, such as Redy, have shown that remote memory can be elastically leveraged to extend caches. The article looks to CXL as the next evolutionary step: its coherent memory fabric promises to make remote memory behave much like local memory, offering fine-grained sharing and coherence.
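The cache-extension idea can be sketched as a two-tier cache that evicts cold entries to a remote-memory pool instead of dropping them. The RemoteMemory class below is a stand-in invented for illustration; a real system like Redy would issue RDMA reads and writes over the fabric instead of dict operations.

```python
from collections import OrderedDict

class RemoteMemory:
    """Stand-in for a remote DRAM pool reached over RDMA/CXL."""

    def __init__(self):
        self.store = {}

    def put(self, key, value):
        self.store[key] = value

    def get(self, key):
        return self.store.pop(key, None)

class TieredCache:
    """LRU cache that spills evicted entries to the remote tier."""

    def __init__(self, local_capacity, remote):
        self.local = OrderedDict()   # insertion order tracks recency
        self.capacity = local_capacity
        self.remote = remote

    def put(self, key, value):
        self.local[key] = value
        self.local.move_to_end(key)
        if len(self.local) > self.capacity:
            # Evict the least-recently-used entry to remote memory
            # instead of discarding it.
            cold_key, cold_val = self.local.popitem(last=False)
            self.remote.put(cold_key, cold_val)

    def get(self, key):
        if key in self.local:
            self.local.move_to_end(key)
            return self.local[key]
        value = self.remote.get(key)  # fetch over the fabric
        if value is not None:
            self.put(key, value)      # promote back to local DRAM
        return value
```

The point is elasticity: local DRAM stays small and fully used, while the remote pool absorbs whatever does not fit, at a latency cost the hedge between DRAM and storage.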

Hardware Disaggregation

From a hardware perspective, the article surveys how storage, GPUs, and memory are being separated from traditional servers and accessed via high-speed fabrics. A compelling case study in this area is Fungible's DPU-based approach. The DPU (Data Processing Unit) offloads data-centric tasks, such as networking, storage, and security, from the CPU, freeing server cores to focus exclusively on application logic. In essence, the DPU embodies the principle of hardware disaggregation.

Future Directions

While disaggregated databases are already prevalent, numerous open questions remain:

  • How can we automatically assemble microservice DBMSs on demand, selecting the optimal compute, memory, and storage tiers for diverse workloads?
  • How do we co-design software and hardware across fabrics like CXL to minimize data movement while preserving performance isolation?
  • What methods can verify the correctness of such dynamic compositions?
  • Can a DBMS intelligently reconfigure itself (e.g., rebalancing compute and storage) to maintain optimal performance under shifting workload patterns?
  • How should we address fault-tolerance and availability issues, and develop new distributed systems protocols that leverage the opportunities presented by the disaggregated model?

As Swami stated at the SIGMOD 2023 panel: "The customer value is here, and the technical problems will be solved in time. Thanks to the complexities of disaggregation problems, every database/systems assistant professor is going to get tenure figuring out how to solve them."