Published February 23, 2026 | Version v1
Dataset Open

End-to-end event reconstruction for precision physics at future colliders dataset ML format

  • 1. European Organization for Nuclear Research

Description

Dataset used for "End-to-end event reconstruction for precision physics at future colliders"

Derived from HitPF_datageneration and prepared in a machine-learning-friendly Parquet format, ready to be used with HitPF.

Contents
* Z → qq (q = u, d, s): dataset used for training candidate clustering. We provide 200k events as a sample; the rest are hosted on CERN's EOS (paths in dataset.txt).
* Particle gun: dataset used for training property regression. We provide 10k events as a sample; the rest are hosted on CERN's EOS (paths in dataset.txt).
* Evaluation: dataset used for the results section. We provide the full 100k-event sample used in the paper.

Each .tar file contains the dataset as Parquet files, with 100 events per file.
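As a minimal sketch of how an archive could be unpacked before reading, the helper below extracts a .tar file and lists the Parquet files it finds. The function name `extract_parquet_files` and the assumption that the files carry a `.parquet` extension are illustrative and not taken from the dataset documentation.

```python
import tarfile
from pathlib import Path


def extract_parquet_files(tar_path, out_dir):
    """Extract a .tar archive and return the sorted .parquet files found inside.

    Hypothetical helper: the internal layout of the archives is not documented
    here, so the files are located with a recursive search.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter when this Python version offers it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(tar_path) as archive:
        archive.extractall(out_dir, **kwargs)
    return sorted(out_dir.rglob("*.parquet"))
```

Each returned path could then be opened with a Parquet reader of your choice, for example `pyarrow.parquet.read_table`, or passed to the dataloader mentioned below.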

Each dataset consists of events that can be iterated over with the PyTorch dataloader provided in XX. Each event provides the following information:

  • X_track: the input features of tracks in the event
  • X_hit: the input features of hits in the event
  • X_gen: the target set of particles
  • y_gen_track: target label for track hits
  • y_gen_hit: target label for calo hits

The validation dataset additionally provides the following information for comparison with the baseline approach:

  • X_pandora: pandora set of reconstructed particles
  • pfo_calohit: label of each calo hit in the pandora set
  • pfo_track: label of each track in the pandora set 

The features available in each are described in HitPF_datageneration.
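The field lists above lend themselves to a quick sanity check when loading events. The sketch below assumes an event behaves like a mapping keyed by the field names listed above; that representation, and the helper name `missing_fields`, are assumptions rather than the documented output of the dataloader.

```python
# Field names copied from the lists above.
COMMON_FIELDS = ("X_track", "X_hit", "X_gen", "y_gen_track", "y_gen_hit")
PANDORA_FIELDS = ("X_pandora", "pfo_calohit", "pfo_track")


def missing_fields(event, with_pandora=False):
    """Return the expected field names that are absent from a mapping-like event.

    Set with_pandora=True for events that should also carry the Pandora
    baseline information.
    """
    expected = COMMON_FIELDS + (PANDORA_FIELDS if with_pandora else ())
    return [name for name in expected if name not in event]
```

An empty list means every expected field is present; any names returned point to fields the event does not contain.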

The dataset is split into chunks of 2000 files, with 100 events per file.
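With 2000 files per chunk and 100 events per file, one chunk holds 200,000 events. The sketch below shows the resulting index arithmetic. It assumes events are numbered contiguously from zero across chunks and files, which is an illustrative convention and not something the dataset guarantees.

```python
EVENTS_PER_FILE = 100    # from the description above
FILES_PER_CHUNK = 2000   # from the description above


def locate_event(global_index):
    """Map a zero-based global event index to (chunk, file, event-in-file).

    Hypothetical numbering: events are assumed to be ordered chunk by chunk,
    then file by file within each chunk.
    """
    events_per_chunk = EVENTS_PER_FILE * FILES_PER_CHUNK  # 200,000
    chunk, within_chunk = divmod(global_index, events_per_chunk)
    file_index, event_index = divmod(within_chunk, EVENTS_PER_FILE)
    return chunk, file_index, event_index
```

For example, `locate_event(250_123)` returns `(1, 501, 23)`: the second chunk, its 502nd file, and the 24th event in that file.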

Files (31.5 GB)

  • md5:60f0f937a7b42235bb06aa11011b5f9b (895.1 MB)
  • md5:53af42c8612d16f8d007eae7bf359043 (10.5 GB)
  • md5:b721c8628ba2c5ef345d3f33d90dff96 (20.1 GB)
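The MD5 checksums listed above can be used to confirm that a download completed intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_checksum` are illustrative, and the local file path is whatever name you saved the download under.

```python
import hashlib


def md5_of_file(path, block_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed checksum such as 'md5:60f0...'."""
    expected = listed.removeprefix("md5:").lower()
    return md5_of_file(path) == expected
```

Reading in blocks keeps memory use low, which matters for the multi-gigabyte archives listed here.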