TeraHeap: Reducing Memory Pressure in Managed Big Data Frameworks

Kolokasis Iacovos; Evdorou Giannos; Akram Shoaib; Kozanitis Christos; Papagiannis Anastasios; Zakkak S. Foivos; Pratikakis Polyvios; Bilas Angelos

doi:10.5281/zenodo.7590151

Published January 31, 2023 | Version v1

Software | Open access


1. FORTH-ICS & University of Crete
2. Australian National University, Australia
3. FORTH-ICS
4. Isovalent Inc.
5. Red Hat, UK

Big data frameworks, such as Spark and Giraph, suffer from high memory pressure because they allocate massive volumes of long-lived objects on the managed heap. To reduce this pressure, frameworks temporarily move long-lived objects outside the managed heap (off-heap) to a fast storage device. Unfortunately, this practice results in: (1) high serialization/deserialization (S/D) cost, and (2) high garbage collection (GC) cost when many off-heap objects are moved back to the managed heap for processing.
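To make the S/D cost in point (1) concrete, here is a minimal sketch of what moving objects off-heap and back involves, assuming Java's built-in serialization as a stand-in for whatever serializer a framework actually uses (Spark, for instance, can also use Kryo). The names `OffHeapRoundTrip` and `Row` are hypothetical, and the byte array stands in for data that would be written to a storage device. Deserialization allocates fresh objects, which is where point (2) comes from: the restored copies land back on the managed heap, where the collector must trace them.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class OffHeapRoundTrip {
    // Hypothetical element of a cached data partition.
    static final class Row implements Serializable {
        private static final long serialVersionUID = 1L;
        final long key;
        final double value;
        Row(long key, double value) { this.key = key; this.value = value; }
    }

    // "Move off-heap": encode the objects as bytes, which a framework would
    // then write to a fast storage device (serialization cost).
    static byte[] serialize(List<Row> rows) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(rows));
        }
        return bytes.toByteArray();
    }

    // "Move back on-heap": decoding allocates brand-new objects on the managed
    // heap (deserialization cost plus new work for the garbage collector).
    @SuppressWarnings("unchecked")
    static List<Row> deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (List<Row>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Row> rows = new ArrayList<>();
        for (long i = 0; i < 1_000; i++) rows.add(new Row(i, i * 0.5));

        byte[] offHeap = serialize(rows);
        List<Row> restored = deserialize(offHeap);

        System.out.println("serialized bytes: " + offHeap.length);
        // Same contents, but a different object: the copy is a new allocation.
        System.out.println("same values: " + (restored.get(42).value == rows.get(42).value)); // true
        System.out.println("same object: " + (restored.get(42) == rows.get(42)));             // false
    }
}
```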

In this paper, we propose TeraHeap, which extends the managed runtime (JVM) to use a second, high-capacity heap over a fast storage device that coexists with the regular heap. TeraHeap provides direct access to objects on the second heap (no S/D). It also reduces GC cost by fencing the garbage collector from scanning the second heap. TeraHeap leverages frameworks’ property of choosing specific objects for off-heap placement and offers frameworks a hint-based interface for moving such objects to the second heap. We implement TeraHeap in OpenJDK and evaluate it with 15 widely used applications in two real-world big data frameworks, Spark and Giraph. Our evaluation shows that TeraHeap improves performance by up to 83% compared to native Spark and Giraph, while consuming up to 87% less DRAM capacity. Finally, it outperforms Panthera, a garbage collector specialized for hybrid memories, by up to 69%.
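The following is a toy, in-memory model of the two ideas in the paragraph above: a hint through which the framework names an object it wants placed off-heap, and a collector whose mark phase does not scan the second heap. It is not the system's actual interface or implementation. `TwoHeapModel`, `hintMoveToH2`, and `markH1` are hypothetical names, the "second heap" is just a list rather than storage-backed memory, and the model ignores references from the second heap back into the regular heap, which a real runtime would have to track.

```java
import java.util.*;

public class TwoHeapModel {
    // Hypothetical heap object: an id plus outgoing references.
    static final class Obj {
        final int id;
        final List<Obj> refs = new ArrayList<>();
        boolean inH2 = false; // set once the object is moved to the second heap
        Obj(int id) { this.id = id; }
    }

    final List<Obj> h1 = new ArrayList<>(); // regular managed heap
    final List<Obj> h2 = new ArrayList<>(); // second heap (modeled in memory here)

    Obj allocate(int id) { Obj o = new Obj(id); h1.add(o); return o; }

    // Hypothetical hint: the framework names a root it would otherwise place
    // off-heap; the model moves that root and everything reachable from it to H2.
    void hintMoveToH2(Obj root) {
        Deque<Obj> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (o.inH2) continue;
            o.inH2 = true;
            h1.remove(o);
            h2.add(o);
            for (Obj r : o.refs) stack.push(r);
        }
    }

    // Mark phase over H1 only: traversal stops at references into H2, so
    // objects on the second heap are never scanned ("fenced").
    int markH1(List<Obj> roots) {
        Set<Obj> marked = new HashSet<>();
        Deque<Obj> stack = new ArrayDeque<>(roots);
        int scanned = 0;
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (o.inH2 || !marked.add(o)) continue;
            scanned++;
            for (Obj r : o.refs) stack.push(r);
        }
        return scanned;
    }

    public static void main(String[] args) {
        TwoHeapModel heap = new TwoHeapModel();
        Obj driver = heap.allocate(0); // short-lived computation state
        Obj cached = heap.allocate(1); // long-lived cached partition
        driver.refs.add(cached);
        for (int i = 2; i < 1_002; i++) cached.refs.add(heap.allocate(i));

        System.out.println("scanned before hint: " + heap.markH1(List.of(driver))); // 1002
        heap.hintMoveToH2(cached);
        System.out.println("scanned after hint: " + heap.markH1(List.of(driver)));  // 1
        System.out.println("objects in H2: " + heap.h2.size());                     // 1001
    }
}
```

In this toy run, moving the cached partition shrinks the mark phase from 1,002 objects to 1. That is the intuition behind fencing: long-lived data that the framework has already singled out for off-heap placement stops adding to GC work, yet remains reachable through ordinary references, with no S/D step.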

Files

asplos23_ae.zip (67.9 kB, md5:9584e4692d881ba7cc759fe58af7c8c9)

Additional details

Funding: European Commission, EUPEX – European Pilot for Exascale (101033975)
