ALPHA-g Preprocessed Simulation Dataset for Machine Learning
Creators
- Ferreira, Ashley (Researcher)1, 2
- Singh, Mahip (Researcher)1, 2
- Capra, Andrea (Researcher)1
- Carli, Ina (Researcher)1
- Duque Quiceno, Daniel (Researcher)1, 3
- Fedorko, Wojciech (Researcher)1
- Fujiwara, Makoto (Researcher)1
- Li, Muyan (Researcher)1, 2
- Martin, Lars (Researcher)1
- Saito, Yukiya (Researcher)1, 4
- Smith, Gareth (Researcher)1, 3
- Xu, Anqi (Researcher)1, 3
Description
This dataset was generated for the 2024 preprint "Vertex Reconstruction with Deep Learning for ALPHA-g Radial Time Projection Chamber" and the paper "AI Meets Antimatter: Unveiling Antihydrogen Annihilations", presented at the Machine Learning and the Physical Sciences workshop at NeurIPS'24, both by Ferreira et al.
The goal of this project was to create a PointNet-like deep learning model that learns the relationship between simulated ALPHA-g detector data and the position of an antimatter annihilation on the walls of the ALPHA-g vacuum chamber. If this approach to event reconstruction also works on real-world data, it could be used as part of a larger analysis pipeline to better constrain the effect of gravity on antimatter.
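To make "PointNet-like" concrete, the sketch below shows the core idea of a PointNet-style regressor in NumPy: the same small MLP is applied to every spacepoint, and a max pool over points produces a global feature that does not depend on point order. This is a hypothetical, untrained toy (random weights, one hidden layer, no input transform networks), not the model from the papers; the actual implementation lives in the rTPC-AI repository linked below. The function names `init_params` and `pointnet_like_z` are illustrative only.

```python
import numpy as np

def init_params(rng, hidden=64):
    """Random weights for a toy model (hypothetical sizes, not the papers' architecture)."""
    return {
        "w1": rng.normal(0.0, 0.1, size=(3, hidden)),  # per-point layer: (x, y, z) -> hidden
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.1, size=hidden),       # regression head: hidden -> scalar z
        "b2": 0.0,
    }

def pointnet_like_z(points, params):
    """Predict a single vertex z from an (N, 3) point cloud."""
    # Shared per-point MLP: identical weights are applied to every spacepoint.
    h = np.maximum(points @ params["w1"] + params["b1"], 0.0)  # ReLU, shape (N, hidden)
    # Max pooling over the point axis is symmetric, so point order does not matter.
    g = h.max(axis=0)  # global feature, shape (hidden,)
    # Linear head maps the global feature to one z value.
    return float(g @ params["w2"] + params["b2"])

# Usage: shuffling the spacepoints leaves the prediction unchanged.
# rng = np.random.default_rng(0)
# params = init_params(rng)
# cloud = rng.normal(size=(800, 3))
# pointnet_like_z(cloud, params) == pointnet_like_z(cloud[rng.permutation(800)], params)
```

The max pool is the design choice that lets a network consume an unordered set of spacepoints directly, which suits detector hits that have no natural ordering.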
See the papers above for a much more detailed description of the experiment and data. In brief, the four terms most important for working with this dataset are:
Term | Meaning |
---|---|
Event | An antimatter annihilation. Each event contains a number of spacepoints and one vertex. |
Spacepoints | The 3D points (x, y, z) that represent the ALPHA-g detector data. Events with more than 800 spacepoints are truncated to 800, keeping those with the highest amplitudes; events with fewer than 800 are padded with zeros, so every event has exactly 800 entries. |
Vertex | The 3D position of the antimatter annihilation. This initial study uses only the z coordinate, since it is the most important one for measuring the effect of gravity. |
Helix Fit | The state-of-the-art method currently used for antimatter annihilation event reconstruction in the ALPHA-g detector. It does not use machine learning; instead, it fits 3D helices to the spacepoints. |
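The fixed-size rule for spacepoints can be sketched as follows. This is a minimal illustration under stated assumptions, not the preprocessing code from the rTPC-AI repository: `pad_or_truncate` and `real_point_mask` are hypothetical names, per-spacepoint amplitudes are an assumed input (they are not stored in this dataset), and retained points are returned in descending-amplitude order, which may differ from the ordering in the released files. The mask helper assumes no genuine spacepoint sits exactly at (0, 0, 0), so all-zero rows can be treated as padding.

```python
import numpy as np

N_POINTS = 800  # fixed number of spacepoint entries per event in this dataset

def pad_or_truncate(points, amplitudes, n_points=N_POINTS):
    """Return an (n_points, 3) array of spacepoints.

    points: (M, 3) coordinates in mm.
    amplitudes: (M,) per-spacepoint amplitudes (assumed input, not in the dataset).
    """
    points = np.asarray(points, dtype=np.float64).reshape(-1, 3)
    amplitudes = np.asarray(amplitudes, dtype=np.float64)
    if len(points) > n_points:
        # Keep the n_points spacepoints with the highest amplitudes.
        keep = np.argsort(amplitudes)[::-1][:n_points]
        points = points[keep]
    out = np.zeros((n_points, 3))
    out[: len(points)] = points  # rows beyond the real spacepoints stay zero (padding)
    return out

def real_point_mask(event):
    """Boolean mask of rows that are not all-zero padding (assumes no real point at the origin)."""
    return np.any(event != 0.0, axis=1)
```

A mask like this can be used to count the real spacepoints in an event or to exclude padding from per-point statistics.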
The data are split into three separate HDF5 datasets: training (2,160,020 events), validation (261,654 events), and testing (288,919 events). Each dataset contains the following columns, with each row representing one event:
Column Index | Value [mm] |
---|---|
0 | x coordinates of spacepoints |
1 | y coordinates of spacepoints |
2 | z coordinates of spacepoints |
3 | z coordinate of simulated vertex |
4 | z coordinate of Helix Fit vertex prediction |
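The internal key or dataset names inside the HDF5 files are not listed here, so the sketch below starts after loading (for example with h5py or pandas, depending on how the files were written). It assumes each event has already been read into a 5-element row laid out as in the column table: three length-800 coordinate arrays followed by two scalar vertex z values. `unpack_event` and `helix_residuals` are hypothetical helpers that build the (800, 3) model input and compute the Helix Fit error against the simulated vertex.

```python
import numpy as np

def unpack_event(row):
    """Split one event row into a point cloud and the two vertex z values.

    Assumes row = [x_array, y_array, z_array, true_z, helix_z], all in mm.
    """
    x, y, z = (np.asarray(row[i], dtype=np.float64) for i in range(3))
    points = np.stack([x, y, z], axis=1)  # (800, 3) point cloud
    true_z = float(row[3])   # column 3: simulated vertex z
    helix_z = float(row[4])  # column 4: Helix Fit vertex z prediction
    return points, true_z, helix_z

def helix_residuals(rows):
    """Helix Fit z minus simulated z for each event, in mm."""
    return np.array([float(r[4]) - float(r[3]) for r in rows])
```

Residuals computed this way give a per-event baseline against which a learned model's predictions of column 3 can be compared.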
The code used to generate the raw dataset is available at bitbucket.org/expalpha/alphasoft, and the code used to process it into this preprocessed dataset, as well as the code for downstream training and evaluation, is available at gitlab.triumf.ca/alpha-ai/rTPC-AI.
Files
Total size: 43.1 GB

MD5 checksum | Size |
---|---|
35751147dfded52757645059af20978a | 34.6 GB |
7df58aa4aec8de589fa12747047d1f3d | 4.2 GB |
44a8a1857057596795eae07d03cbe6a0 | 4.3 GB |
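Given the size of the files, it is worth checking each download against its published MD5 checksum. The sketch below uses Python's standard `hashlib` and reads the file in 1 MiB chunks so that the 34.6 GB file never has to fit in memory. The helper names and the path in the usage comment are placeholders, since the original file names are not shown on this page.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected_md5):
    """Return True if the file's MD5 matches the published checksum (with or without "md5:")."""
    return md5_of_file(path) == expected_md5.lower().removeprefix("md5:")

# Usage (placeholder path; this checksum belongs to the 34.6 GB file):
# verify("path/to/downloaded_file", "35751147dfded52757645059af20978a")
```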