Published August 6, 2024 | Version v1
Presentation Open

Enlarging Effective DRAM Capacity through Hermes

  • 1. Gnosis Research Center

Description

Traditionally, memory and I/O substrates have been treated as separate entities because they differ in performance and persistence. However, the data-intensive, memory-centric workloads now widespread in HPC are challenging this distinction. Data analytics, machine learning, and deep learning codes perform large-scale computations on data that greatly exceeds memory capacity, relying on explicit data movement to I/O systems to meet basic capacity requirements. This often leads to significantly increased development complexity and to suboptimal, one-off solutions in which I/O and compute happen in distinct, synchronous phases, incurring the memory wall problem during the compute phase and the notorious I/O bottleneck during the I/O phase. Scientific simulation codes, for their part, are becoming increasingly memory-intensive and are developed on the assumption that large memory capacities will be available, which avoids the complexity of out-of-core development. To reduce complexity and I/O costs, HPC and cloud sites have been increasing DRAM capacities. However, while many applications would like an effectively infinite memory in which to generate and analyze massive datasets, ever-growing data sizes and the extreme financial and energy costs of DRAM make scaling DRAM capacity unsustainable.

In this work, we expose the Hermes I/O buffering system as a software distributed shared memory (DSM) that enlarges effective memory capacity through intelligent tiered management of DRAM and storage. This DSM provides workload-aware data organization, eviction, and prefetching policies that reduce DRAM consumption while keeping access to critical data fast. Evaluations show that various workloads can be executed with a fraction of the DRAM while maintaining competitive performance.
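
As an illustration only, the general idea of backing a small DRAM tier with a larger storage tier can be sketched in a few lines of Python. Everything named here (TieredStore, read, write, dram_pages) is hypothetical and is not the Hermes API. Plain LRU eviction and next-page prefetch are simple stand-ins for the workload-aware policies mentioned above, whose actual design is not described in this record.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier page store (hypothetical, not the Hermes API).

    A small 'DRAM' tier holds the most recently used pages; everything
    else lives in a 'storage' tier, modeled here as a plain dict.
    """

    def __init__(self, dram_pages, prefetch=True):
        self.dram_pages = dram_pages   # capacity of the fast tier, in pages
        self.prefetch = prefetch       # enable next-page prefetch on reads
        self.dram = OrderedDict()      # page_id -> data, oldest first
        self.storage = {}              # stand-in for an SSD or file system tier
        self.hits = 0
        self.misses = 0

    def _admit(self, page_id, data):
        # Place a page in DRAM, spilling least recently used pages to storage.
        self.dram[page_id] = data
        self.dram.move_to_end(page_id)
        while len(self.dram) > self.dram_pages:
            victim, victim_data = self.dram.popitem(last=False)
            self.storage[victim] = victim_data

    def write(self, page_id, data):
        self.storage.pop(page_id, None)  # drop any stale copy in the slow tier
        self._admit(page_id, data)

    def read(self, page_id):
        if page_id in self.dram:
            self.hits += 1
            self.dram.move_to_end(page_id)
            data = self.dram[page_id]
        else:
            self.misses += 1
            data = self.storage.pop(page_id)  # KeyError if never written
            self._admit(page_id, data)
        # Assumed policy: sequential access, so stage the next page early.
        nxt = page_id + 1
        if self.prefetch and nxt in self.storage:
            self._admit(nxt, self.storage.pop(nxt))
        return data
```

With a DRAM tier of 4 pages and a 16-page dataset, a sequential scan still returns every page correctly while at most 4 pages are resident at a time. The hits and misses counters make it easy to compare the scan with and without prefetching.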

Files

MegaMmap.pdf (934.8 kB)
md5:dad1586e98cd90562833128ed95894a1

Additional details

Related works

Is supplemented by
Video/Audio: https://youtu.be/e_kA8sM-YQI (URL)