Published November 5, 2023 | Version v1
Report Open

Baler: Deep Autoencoders for Scientific Data Compression

  • 1. University of Massachusetts Amherst
  • 1. Lund University
  • 2. University of Manchester

Description

Today the experiments at CERN output roughly one petabyte of data per day. After the planned upgrades to its main experiments, these are expected to produce, by 2032, 2-5 times more data than the available storage resources can hold. Tackling this challenge requires an exploration of lossy compression methods. To this end we present Baler, an Autoencoder-based lossy compression tool currently under development at the universities of Lund, Manchester, and Warwick. This report investigates improvements to Baler, with a focus on 2D Convolutional SZ Autoencoders, and compares their performance with that of Baler's existing Dense network and of SZ3 [3], a leading off-the-shelf lossy compression tool. Finally, we conclude by providing substantial evidence that Dense networks outperform Convolutional models in both offline and online compression scenarios, while preserving a compression ratio similar to that of SZ3.
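
To make the terms used in the description concrete, the sketch below shows one common way to quantify a lossy compressor: the compression ratio (size before divided by size after) and the per-value relative error of the reconstruction. It is not taken from the Baler code base. The function names and all numbers are hypothetical, and the ratio ignores the cost of storing the decoder itself.

```python
from typing import List, Sequence


def compression_ratio(original_size: float, compressed_size: float) -> float:
    """Size before compression divided by size after compression."""
    if original_size <= 0 or compressed_size <= 0:
        raise ValueError("sizes must be positive")
    return original_size / compressed_size


def relative_errors(original: Sequence[float],
                    reconstructed: Sequence[float]) -> List[float]:
    """Per-value relative error (reconstructed - original) / original.

    Entries whose original value is zero are skipped, because the
    relative error is undefined there.
    """
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same length")
    return [(r - o) / o for o, r in zip(original, reconstructed) if o != 0]


# Hypothetical example: an encoder maps 24 values per record
# to a 15-value latent representation.
ratio = compression_ratio(24, 15)

# Hypothetical original values and their lossy reconstructions.
original = [10.0, 20.0, 40.0]
reconstructed = [10.1, 19.8, 40.0]
errors = relative_errors(original, reconstructed)
max_abs_error = max(abs(e) for e in errors)

print(f"compression ratio: {ratio:.2f}")
print(f"largest relative error: {max_abs_error:.3%}")
```

For an autoencoder, a fair comparison against a tool such as SZ3 would normally look at both numbers together, since a higher ratio is only useful if the reconstruction error stays within what the downstream analysis can tolerate.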

Files

Deep_AutoEncoders_for_Scientific_Data_Compression.pdf (1.3 MB)

Additional details

Related works

Is previous version of
Journal: 10.13140/RG.2.2.33707.66086 (DOI)
References
Journal: 10.48550/arXiv.2305.02283 (DOI)

Funding

Google (United States)
Google Summer of Code

Software

Repository URL
https://github.com/baler-collaboration/baler
Programming language
Python