Published January 31, 2023 | Version v1
Software Open

TeraHeap: Reducing Memory Pressure in Managed Big Data Frameworks

  • 1. FORTH-ICS & University of Crete
  • 2. Australian National University, Australia
  • 3. FORTH-ICS
  • 4. Isovalent Inc.
  • 5. Red Hat, UK


Big data frameworks, such as Spark and Giraph, suffer from high memory pressure because they allocate massive volumes of long-lived objects on the managed heap. Thus, frameworks temporarily move long-lived objects outside the managed heap (off-heap) on a fast storage device. Unfortunately, this practice results in: (1) high serialization/deserialization (S/D) cost, and (2) high garbage collection (GC) cost when many off-heap objects are moved back to the managed heap for processing.

In this paper, we propose TeraHeap, which extends the managed runtime (JVM) to use a second, high-capacity heap over a fast storage device that coexists with the regular heap. TeraHeap provides direct access to objects on the second heap (no S/D). It also reduces GC cost by fencing the garbage collector from scanning the second heap. TeraHeap leverages the fact that frameworks already choose specific objects for off-heap placement, and it offers them a hint-based interface for moving such objects to the second heap. We implement TeraHeap in OpenJDK and evaluate it with 15 widely used applications in two real-world big data frameworks, Spark and Giraph. Our evaluation shows that TeraHeap improves performance by up to 83% compared to native Spark and Giraph, and it also consumes up to 87% less DRAM capacity. Finally, it outperforms Panthera, a garbage collector specialized for hybrid memories, by up to 69%.
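The two ideas in the abstract, a hint that selects objects for the second heap and a collector that is fenced from scanning that heap, can be illustrated with a small toy model. The sketch below is not the OpenJDK implementation and uses no real JVM interface: the class `TwoHeapSketch` and the methods `allocate`, `hint`, and `collect` are hypothetical names introduced only for illustration. It assumes that a hinted object is moved together with everything reachable from it, so the second heap never points back into the regular heap; a real runtime has to handle such cross-heap references explicitly.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these names come from TeraHeap or OpenJDK.
final class TwoHeapSketch {
    // A simulated object with outgoing references and placement flags.
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean hinted;        // set by the framework through hint()
        boolean inSecondHeap;  // set when the collector moves the object
        Obj(String name) { this.name = name; }
    }

    final Set<Obj> regularHeap = new HashSet<>();
    final Set<Obj> secondHeap = new HashSet<>();
    int scanned; // objects visited by the most recent collection

    Obj allocate(String name) {
        Obj o = new Obj(name);
        regularHeap.add(o);
        return o;
    }

    // Hypothetical hint: mark the root of a long-lived object group.
    void hint(Obj root) { root.hinted = true; }

    // Simulated collection: move hinted groups, then mark and sweep
    // the regular heap only.
    void collect(List<Obj> roots) {
        scanned = 0;
        for (Obj o : new ArrayList<>(regularHeap)) {
            if (o.hinted) moveClosure(o);
        }
        Set<Obj> live = new HashSet<>();
        for (Obj r : roots) mark(r, live);
        regularHeap.retainAll(live); // sweep unreachable regular-heap objects
    }

    // Move an object and everything reachable from it to the second heap.
    private void moveClosure(Obj o) {
        if (o.inSecondHeap) return;
        o.inSecondHeap = true;
        regularHeap.remove(o);
        secondHeap.add(o);
        for (Obj r : o.refs) moveClosure(r);
    }

    // Marking stops at the second heap: this is the "fence".
    private void mark(Obj o, Set<Obj> live) {
        if (o.inSecondHeap || !live.add(o)) return;
        scanned++;
        for (Obj r : o.refs) mark(r, live);
    }

    public static void main(String[] args) {
        TwoHeapSketch heap = new TwoHeapSketch();
        Obj cache = heap.allocate("cachedPartition");
        Obj block = heap.allocate("block");
        cache.refs.add(block);
        Obj temp = heap.allocate("temp");
        heap.allocate("garbage"); // unreachable, will be swept

        heap.hint(cache);
        heap.collect(List.of(cache, temp));

        System.out.println("regular=" + heap.regularHeap.size()
                + " second=" + heap.secondHeap.size()
                + " scanned=" + heap.scanned);
    }
}
```

In this toy run the hinted group (two objects) ends up in the second heap, the collector visits only the one live object left in the regular heap, and the unreachable object is swept. The objects in the second heap stay directly reachable through ordinary references, which mirrors the abstract's point that no serialization step is needed.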


Files (67.9 kB)


Additional details


European Commission