Branch-Aware MVCC Garbage Collection with Hierarchical Visibility and LSM Compaction Integration
Authors/Creators
Description
We present a garbage collection (GC) algorithm for multi-version concurrency control (MVCC) in database systems that support hierarchical copy-on-write branching. Traditional MVCC GC systems use single-dimensional retention policies (time-based or snapshot-based) and are unaware of branch hierarchies. When a parent branch garbage-collects old versions, child branches that depend on those versions through inheritance lose data access, causing query failures. Our algorithm computes a multidimensional GC horizon for each branch by taking the minimum log sequence number (LSN) across three dimensions: (1) the branch’s own retention period, (2) the creation LSNs of all descendant branches (protecting inherited data), and (3) active transaction snapshots across the branch lineage. We further integrate this branch-aware GC policy with LSM-tree compaction by filtering version records during SSTable merge operations. The algorithm achieves 100% correctness (zero data loss in multi-branch
scenarios) with 10–30% storage savings versus a conservative “never delete” approach. We describe the architecture, algorithms, and implementation in HeliosDB, with per-branch observability via 10 Prometheus metrics. No prior database system combines hierarchical branch visibility with MVCC garbage collection and LSM compaction integration.
Files
DP_cs_DB_MVCC001.pdf
Files
(415.6 kB)
| Name | Size | Download all |
|---|---|---|
|
md5:c173de5192696a6ca86cbc8b72b1b78c
|
415.6 kB | Preview Download |
Additional details
Software
- Repository URL
- https://github.com/dimensigon/HDB-HeliosDB-Nano/
- Programming language
- Rust
- Development Status
- Active