Published August 11, 2022 | Version 1.0.0
Dataset Open

Lowest Common Ancestor Generations (LCAG) Phasespace Particle Decay Reconstruction Dataset

  • 1. Helmholtz AI, Germany; Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology (KIT), Germany
  • 2. Physikalisches Institut der Rheinischen Friedrich-Wilhelms-Universität Bonn, Germany
  • 3. Institute for Experimental Particle Physics (ETP), Karlsruhe Institute of Technology (KIT), Germany
  • 4. Université de Strasbourg, CNRS, IPHC, UMR 7178, 67037 Strasbourg, France
  • 5. Aix Marseille Université, CNRS, IN2P3, CPPM, 13288 Marseille, France
  • 6. Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology (KIT), Germany

Description

This record contains the dataset for the paper "Learning Tree Structures from Leaves For Particle Decay Reconstruction". It consists of simulated particle physics decays, with information about the detected particles (leaves) to be used as input, and Lowest Common Ancestor Generations (LCAGs) to be used as training targets. The code used for the paper's experiments, including the PyTorch dataset and dataloader, is available at github.com/Helmholtz-AI-Energy/BaumBauen.
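To illustrate what an LCAG target encodes, here is a minimal sketch that builds an LCAG matrix for a small hand-written decay tree. It assumes that leaves have generation 0, that an internal node's generation is one more than the largest generation among its children, and that the diagonal is 0. The tree representation (a `parent` dict) and the function name `lcag_matrix` are hypothetical and not taken from BaumBauen; check the paper for the exact convention used in the released targets.

```python
import numpy as np


def lcag_matrix(parent: dict[int, int], leaves: list[int]) -> np.ndarray:
    """Hypothetical sketch of an LCAG matrix for the given leaves of a rooted tree.

    `parent` maps every non-root node ID to its parent's ID. Assumes leaves have
    generation 0 and an internal node's generation is 1 + the largest
    generation among its children; the diagonal is left at 0.
    """
    # Invert the parent map into child lists.
    children: dict[int, list[int]] = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)

    generation: dict[int, int] = {}

    def gen(node: int) -> int:
        # Memoised bottom-up generation (leaves are generation 0).
        if node not in generation:
            kids = children.get(node, [])
            generation[node] = 1 + max(gen(k) for k in kids) if kids else 0
        return generation[node]

    def ancestors(node: int) -> list[int]:
        # Path from a node up to the root, starting with the node itself.
        path = [node]
        while path[-1] in parent:
            path.append(parent[path[-1]])
        return path

    n = len(leaves)
    lcag = np.zeros((n, n), dtype=np.int64)
    for i in range(n):
        path_i = ancestors(leaves[i])
        for j in range(i + 1, n):
            anc_j = set(ancestors(leaves[j]))
            # Lowest common ancestor: first node on leaf i's path that leaf j also passes through.
            lca = next(a for a in path_i if a in anc_j)
            lcag[i, j] = lcag[j, i] = gen(lca)
    return lcag


# Root 0 decays to intermediate 1 and leaf 2; node 1 decays to leaves 3 and 4.
print(lcag_matrix({1: 0, 2: 0, 3: 1, 4: 1}, [2, 3, 4]))
```

For this toy tree the sketch prints `[[0 2 2] [2 0 1] [2 1 0]]`: leaves 3 and 4 meet at node 1 (generation 1), while either of them meets leaf 2 only at the root (generation 2).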

The decays are synthetic and were simulated using the PhaseSpace library.
All simulated decay topologies share a common root particle of mass 100 (arbitrary units). Intermediate particles are drawn at random, with replacement, from the following masses: [90, 80, 70, 50, 25, 20, 10].
Final state particles, which make up the leaf nodes of the generated topologies, are drawn with replacement from the following masses: [1, 2, 3, 5, 12]. Each intermediate particle (including the root) has at least two and at most five children.

The tree topologies in the dataset were created as follows:
starting from the root particle, a set of children is selected from the available intermediate and final state particles such that the sum of their masses is less than the mass of the parent. This process is then repeated for each child that is not a final state particle, and so on, until only final state particles remain.
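The recursive procedure above can be sketched as follows. This is a hypothetical illustration of the topology step only, assuming children are drawn uniformly (with replacement) from all intermediate and final state masses lighter than the parent and that invalid draws are simply rejected and redrawn. It does not attempt to reproduce the exact sampling probabilities, de-duplicate topologies, or call the PhaseSpace library to generate kinematics; the function names are illustrative.

```python
import random

# Masses from the dataset description (arbitrary units).
ROOT_MASS = 100
INTERMEDIATE_MASSES = [90, 80, 70, 50, 25, 20, 10]
FINAL_STATE_MASSES = [1, 2, 3, 5, 12]
MIN_CHILDREN, MAX_CHILDREN = 2, 5


def sample_children(parent_mass, rng):
    """Draw 2-5 (mass, is_final) children whose masses sum to less than parent_mass.

    Assumption: children are drawn uniformly, with replacement, from all
    intermediate and final-state masses lighter than the parent.
    """
    pool = [(m, False) for m in INTERMEDIATE_MASSES if m < parent_mass]
    pool += [(m, True) for m in FINAL_STATE_MASSES if m < parent_mass]
    while True:  # rejection sampling until the mass constraint is met
        k = rng.randint(MIN_CHILDREN, MAX_CHILDREN)  # inclusive bounds
        children = [rng.choice(pool) for _ in range(k)]
        if sum(m for m, _ in children) < parent_mass:
            return children


def build_topology(mass=ROOT_MASS, rng=None):
    """Expand a particle recursively until only final-state particles remain.

    Returns a nested (mass, children) tuple; final-state leaves have no children.
    Terminates because every child is strictly lighter than its parent.
    """
    if rng is None:
        rng = random.Random()
    children = []
    for child_mass, is_final in sample_children(mass, rng):
        children.append((child_mass, []) if is_final else build_topology(child_mass, rng))
    return (mass, children)


def leaf_masses(node):
    """Collect the final-state (leaf) masses of a topology, left to right."""
    mass, children = node
    if not children:
        return [mass]
    return [m for child in children for m in leaf_masses(child)]


tree = build_topology(rng=random.Random(0))
print(leaf_masses(tree))
```

Rejection sampling keeps the sketch short; because the mass constraint is checked per draw, the strict decrease in mass from parent to child guarantees that the recursion ends at final state particles.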

This dataset consists of 200 topologies (unique decay processes) in total, with 16,000 samples per topology. In the paper's experiments, all 200 topologies were used in each of the training, validation, and test subsets. Leaf node features are not normalized. No ordering is enforced on the nodes; they are left unsorted, in the order in which they were created.
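Because the leaves are unsorted, any reordering of a sample's leaves must be applied consistently to both the rows and the columns of its LCAG matrix. A minimal NumPy sketch, assuming a single sample is stored as a `(n_leaves, n_features)` leaf array and a `(n_leaves, n_leaves)` LCAG array (shapes assumed here, not documented on this page); the helper names and the toy matrix are hypothetical:

```python
import numpy as np


def permute_sample(leaves, lcag, perm):
    """Reorder one sample's leaves and apply the same order to the LCAG rows and columns.

    Assumed shapes: leaves (n_leaves, n_features), lcag (n_leaves, n_leaves).
    """
    perm = np.asarray(perm)
    # Row i of the result is original leaf perm[i]; LCAG entry (i, j) becomes (perm[i], perm[j]).
    return leaves[perm], lcag[np.ix_(perm, perm)]


def shuffle_sample(leaves, lcag, rng):
    """Apply a random permutation, e.g. to check that a model ignores leaf order."""
    return permute_sample(leaves, lcag, rng.permutation(leaves.shape[0]))


leaves = np.array([[1.0], [2.0], [3.0]])
lcag = np.array([[0, 2, 2], [2, 0, 1], [2, 1, 0]])  # toy LCAG matrix
new_leaves, new_lcag = permute_sample(leaves, lcag, [2, 0, 1])
print(new_leaves.ravel(), new_lcag, sep="\n")
```

Using `np.ix_` indexes rows and columns with the same permutation in one step, so the permuted matrix stays symmetric with a zero diagonal.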

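When unpacked, the dataset archive has the following structure, with files named according to the pattern [data]_[subset].[topology].npy, where [data] is lcas (LCAG targets) or leaves (leaf features), [subset] is train, val, or test, and [topology] is a zero-padded index from 000 to 199: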

└── phasespace_dataset/
    ├── lcas_train.000.npy
    ├── leaves_train.000.npy
    ├── ...
    ├── lcas_train.199.npy
    ├── leaves_train.199.npy
    ├── lcas_val.000.npy
    ├── leaves_val.000.npy
    ├── ...
    ├── lcas_val.199.npy
    ├── leaves_val.199.npy
    ├── lcas_test.000.npy
    ├── leaves_test.000.npy
    ├── ...
    ├── lcas_test.199.npy
    └── leaves_test.199.npy
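A minimal sketch for locating and loading the pair of files for one topology with NumPy, following the naming pattern above. It assumes the files hold plain numeric arrays (so `np.load` works without `allow_pickle` and can memory-map them); the helper names are hypothetical, and the BaumBauen repository provides the PyTorch dataset actually used in the paper.

```python
from pathlib import Path

import numpy as np

SUBSETS = ("train", "val", "test")
N_TOPOLOGIES = 200


def topology_paths(root, subset, topology):
    """Return (leaves_path, lcas_path) following [data]_[subset].[topology].npy."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset {subset!r}, expected one of {SUBSETS}")
    if not 0 <= topology < N_TOPOLOGIES:
        raise ValueError(f"topology index must be in [0, {N_TOPOLOGIES}), got {topology}")
    root = Path(root)
    tag = f"{subset}.{topology:03d}.npy"  # zero-padded three-digit index
    return root / f"leaves_{tag}", root / f"lcas_{tag}"


def load_topology(root, subset, topology, mmap=True):
    """Load leaf features and LCAG targets; memory-map by default to save RAM."""
    leaves_path, lcas_path = topology_paths(root, subset, topology)
    mode = "r" if mmap else None
    return np.load(leaves_path, mmap_mode=mode), np.load(lcas_path, mmap_mode=mode)


# Hypothetical usage, with root pointing at the unpacked phasespace_dataset/ directory:
# leaves, lcas = load_topology("phasespace_dataset", "train", 0)
```

Memory-mapping avoids reading a whole file into memory when only a few samples are needed; pass `mmap=False` to load the arrays eagerly.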

 

Files

Files (1.9 GB)

Archive: 1.9 GB, md5:f730e7ced1bd7015dc6adc7fc742dab1
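To check a download against the listed MD5 checksum, a short sketch using Python's standard `hashlib` module. The archive file name is not given on this page, so the path in the usage comment is a placeholder; the helper names are hypothetical.

```python
import hashlib

EXPECTED_MD5 = "md5:f730e7ced1bd7015dc6adc7fc742dab1"  # as listed for this record


def md5sum(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so the 1.9 GB archive is never fully in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected value (with or without 'md5:')."""
    return md5sum(path) == expected.removeprefix("md5:")


# Hypothetical usage; replace with the actual path of the downloaded archive:
# print(verify_download("path/to/downloaded_archive"))
```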

Additional details

Related works

Is supplement to
Journal article: 10.1088/2632-2153/ac8de0 (DOI)