Deep Autoencoders for ATLAS Data Compression - George Dialektakis - Google Summer of Code 2021 Project
Contributors
Supervisors:
- 1. Lund University
- 2. Ohio State University
- 3. University of Glasgow
Description
Storage is one of the main limiting factors to the recording of information from proton-proton collision events at the Large Hadron Collider (LHC), at CERN in Geneva. Hence, the ATLAS experiment at the LHC uses a so-called trigger system, which selects and transfers interesting events to the data storage system while filtering out the rest. However, if interesting events are buried in very large backgrounds and difficult to identify as a signal by the trigger system, they will also be discarded together with the background. To alleviate this problem, different compression algorithms are already in use to reduce the size of the data that is recorded. One of those state-of-the-art algorithms is an autoencoder network that tries to implement an approximation to the identity, f(x) = x, and given some input data, its goal is to create a lower-dimensional representation of those data in a latent space using an encoder network. This way when collisions happen on the ATLAS Collider, we run the encoder on the produced data and we save only the latent space representation. Then using this latent representation online the decoder network can reconstruct the original data.
The goal of this project is to experiment with different types of Autoencoders for data compression in-depth and optimize their performance in reconstructing the ATLAS event data. For this reason, three kinds of Autoencoders are proposed, and in specific, the Standard Autoencoder, the Variational Autoencoder, and the Sparse Autoencoder. The above Autoencoders and thoroughly tested using different parameters and data normalization techniques, as our ultimate goal is to obtain the best possible reconstructions of the original event data. The proposed implementations will be a decisive contribution towards future testing and analysis for the ATLAS experiment at CERN and will assist overcome the obstacle of needing much more storage space than in the past due to the increase in the size of the data generated by the continuous proton-proton collision events in CERN's Large Hadron Collider.