TeraHeap: Reducing Memory Pressure in Managed Big Data Frameworks
Creators
- 1. FORTH-ICS & University of Crete
- 2. Australian National University, Australia
- 3. FORTH-ICS
- 4. Isovalent Inc.
- 5. Red Hat, UK
Description
Big data frameworks, such as Spark and Giraph, suffer from high memory pressure because they allocate massive volumes of long-lived objects on the managed heap. Thus, frameworks temporarily move long-lived objects outside the managed heap (off-heap) on a fast storage device. Unfortunately, this practice results in: (1) high serialization/deserialization (S/D) cost, and (2) high garbage collection (GC) cost when many off-heap objects are moved back to the managed heap for processing.
In this paper, we propose TeraHeap, which extends the managed runtime (JVM) to use a second, high-capacity heap over a fast storage device that coexists with the regular heap. TeraHeap provides direct access to objects on the second heap (no S/D). It also reduces GC cost by fencing the garbage collector from scanning the second heap. TeraHeap leverages the fact that frameworks already choose specific objects for off-heap placement, and it offers them a hint-based interface for moving such objects to the second heap. We implement TeraHeap in OpenJDK and evaluate it with 15 widely used applications in two real-world big data frameworks, Spark and Giraph. Our evaluation shows that TeraHeap improves performance by up to 83% compared to native Spark and Giraph, and it also consumes up to 87% less DRAM capacity. Finally, it outperforms Panthera, a garbage collector specialized for hybrid memories, by up to 69%.
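To make the hint-based idea concrete, the following is a toy Java model, not the interface implemented in the artifact. The class `TwoHeapModel` and its methods `allocate`, `tag`, `moveTagged`, and `objectsScannedByGc` are hypothetical names invented for illustration, and the two "heaps" are ordinary in-memory lists. It only sketches the shape of the mechanism described above: the framework tags the objects it would otherwise place off-heap, a later hint moves them to a second region without serializing them, and the simulated collector scans only the first region.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Toy model of a runtime with two heaps. All names are hypothetical. */
final class TwoHeapModel {
    // Stand-in for the regular managed heap, which the collector scans.
    private final List<Object> firstHeap = new ArrayList<>();
    // Stand-in for the second, storage-backed heap, which the collector skips.
    private final List<Object> secondHeap = new ArrayList<>();
    // Hints recorded by the framework, keyed by object identity.
    private final Map<Object, String> tags = new IdentityHashMap<>();

    /** Every object starts its life on the first heap. */
    <T> T allocate(T obj) {
        firstHeap.add(obj);
        return obj;
    }

    /** Hint: this object belongs to a group that may be moved later. */
    void tag(Object obj, String label) {
        tags.put(obj, label);
    }

    /** Hint: move every object tagged with this label to the second heap. */
    int moveTagged(String label) {
        int moved = 0;
        Iterator<Object> it = firstHeap.iterator();
        while (it.hasNext()) {
            Object obj = it.next();
            if (label.equals(tags.get(obj))) {
                it.remove();
                secondHeap.add(obj); // same reference, no serialization step
                moved++;
            }
        }
        return moved;
    }

    /** Number of objects a collection of the first heap would have to visit. */
    int objectsScannedByGc() {
        return firstHeap.size();
    }

    int secondHeapSize() {
        return secondHeap.size();
    }

    public static void main(String[] args) {
        TwoHeapModel runtime = new TwoHeapModel();
        long[] cachedA = runtime.allocate(new long[1024]);
        long[] cachedB = runtime.allocate(new long[1024]);
        runtime.allocate(new StringBuilder("short-lived"));

        runtime.tag(cachedA, "cached-partition");
        runtime.tag(cachedB, "cached-partition");

        System.out.println("moved: " + runtime.moveTagged("cached-partition"));
        System.out.println("scanned by GC: " + runtime.objectsScannedByGc());
        cachedA[0] = 42L; // still directly accessible after the move
    }
}
```

In this sketch the moved arrays remain reachable through their original references, which mirrors the "direct access, no S/D" property only loosely; a real runtime would have to relocate objects and update references itself.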
Files
Name | Size |
---|---|
asplos23_ae.zip (md5:9584e4692d881ba7cc759fe58af7c8c9) | 67.9 kB |