End-to-end event reconstruction for precision physics at future colliders dataset ML format
Authors/Creators
Description
Dataset used for "End-to-end event reconstruction for precision physics at future colliders".
Derived from HitPF_datageneration and prepared in a machine-learning-friendly Parquet format, ready for use with HitPF.
Contents
* Z → qq (q = u, d, s): dataset used for training candidate clustering. We provide 200k events as a sample; the rest are hosted on CERN's EOS (paths in dataset.txt).
* Particle gun: dataset used for training property regression. We provide 10k events as a sample; the rest are hosted on CERN's EOS (paths in dataset.txt).
* Evaluation: dataset used for the results section. We provide the full 100k-event sample used in the paper.
Each .tar file contains the dataset in Parquet format, with 100 events per Parquet file.
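As a rough illustration of unpacking one archive, the sketch below uses only the Python standard library. It assumes the archive holds regular files whose names end in `.parquet`; the function name `extract_parquet_files` and the flat output layout are hypothetical choices, not part of the dataset tooling.

```python
import shutil
import tarfile
from pathlib import Path


def extract_parquet_files(tar_path, dest_dir):
    """Copy every .parquet member of a tar archive into dest_dir.

    Assumption: members are regular files ending in ".parquet".
    Files are written under their base name only, so archive paths
    cannot escape dest_dir. Members that share a base name would
    overwrite each other.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            if not (member.isfile() and member.name.endswith(".parquet")):
                continue
            target = dest / Path(member.name).name
            source = tar.extractfile(member)
            with source, open(target, "wb") as out:
                # Stream the member to disk without loading it fully in memory.
                shutil.copyfileobj(source, out)
            extracted.append(target)
    return sorted(extracted)
```

The returned paths can then be opened with any Parquet reader, for example `pyarrow.parquet.read_table(path)` if pyarrow is installed.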
Each dataset consists of events that can be iterated over using the PyTorch dataloader provided in XX. Each event provides the following information:
- X_track: the input features of tracks in the event
- X_hit: the input features of hits in the event
- X_gen: the target set of particles
- y_gen_track: target label for track hits
- y_gen_hit: target label for calo hits
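To make the role of the target labels concrete, here is a minimal sketch of grouping hits by label. It assumes, without confirmation from the dataset description, that `y_gen_hit` and `y_gen_track` hold one integer per hit or track identifying the target particle in `X_gen`. The helper `group_hits_by_particle` and the toy values are hypothetical.

```python
from collections import defaultdict


def group_hits_by_particle(labels):
    """Map each target label to the indices of the hits that carry it.

    Assumption: `labels` is a sequence with one integer label per hit.
    """
    clusters = defaultdict(list)
    for hit_index, label in enumerate(labels):
        clusters[int(label)].append(hit_index)
    return dict(clusters)


# Toy labels for five calorimeter hits (illustrative values only).
toy_y_gen_hit = [0, 0, 1, 2, 1]
toy_clusters = group_hits_by_particle(toy_y_gen_hit)
```

Under this assumption, each entry of the resulting dictionary is the set of hits that a clustering model should assign to one target particle.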
The validation dataset additionally provides the following information for comparison with the baseline approach:
- X_pandora: the set of particles reconstructed by Pandora
- pfo_calohit: label of each calo hit in the Pandora set
- pfo_track: label of each track in the Pandora set
The features available in each are described in HitPF_datageneration.
The dataset is split into chunks of 2000 files, with 100 events per file.
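The chunk layout implies 2000 × 100 = 200,000 events per chunk. The sketch below turns a global event index into a chunk, file and in-file position. It assumes zero-based, contiguous numbering, which the description does not specify; `locate_event` is a hypothetical helper.

```python
EVENTS_PER_FILE = 100
FILES_PER_CHUNK = 2000
EVENTS_PER_CHUNK = EVENTS_PER_FILE * FILES_PER_CHUNK  # 200,000 events


def locate_event(event_index):
    """Return (chunk, file_in_chunk, offset_in_file) for a global index.

    Assumption: events, files and chunks are numbered from zero and
    stored contiguously in order.
    """
    if event_index < 0:
        raise ValueError("event_index must be non-negative")
    chunk, within_chunk = divmod(event_index, EVENTS_PER_CHUNK)
    file_in_chunk, offset = divmod(within_chunk, EVENTS_PER_FILE)
    return chunk, file_in_chunk, offset
```

For instance, `locate_event(250_123)` gives chunk 1, file 501, offset 23 under these assumptions.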
Files
(31.5 GB)
| File checksum | Size |
|---|---|
| md5:60f0f937a7b42235bb06aa11011b5f9b | 895.1 MB |
| md5:53af42c8612d16f8d007eae7bf359043 | 10.5 GB |
| md5:b721c8628ba2c5ef345d3f33d90dff96 | 20.1 GB |
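Since each file is listed with an md5 checksum, a download can be verified before unpacking. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the local file path is whatever name the download was saved under.

```python
import hashlib


def md5_of_file(path, block_size=1 << 20):
    """Return the hex md5 digest of a file, read in 1 MiB blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

Reading in blocks keeps memory use flat even for the multi-gigabyte archives.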