FAIR Universe - HiggsML Uncertainty Challenge Public Dataset
Creators
- Bhimji, Wahid (1)
- Calafiura, Paolo (1)
- Chakkappai, Ragansu (2, 3)
- Chang, Po-Wen (4)
- Chou, Yuan-Tang (5)
- Diefenbacher, Sascha (1)
- Dudley, Jordan (1, 6)
- Farrell, Steven (1)
- Ghosh, Aishik (7, 1)
- Guyon, Isabelle (8)
- Harris, Christopher (1)
- Hsu, Shih-Chieh (5, 9)
- Khoda, Elham E (10)
- Nachman, Benjamin (1)
- Nugent, Peter (1)
- Rousseau, David (11)
- Thorne, Ben (12)
- Ullah, Ihsan (8)
- Zhang, Yulei (5)

1. Lawrence Berkeley National Laboratory
2. Centre National de la Recherche Scientifique
3. Université Paris-Saclay
4. Ohio State University
5. University of Washington
6. University of California, Berkeley
7. University of California System
8. ChaLearn
9. National Tsing Hua University
10. University of California, San Diego
11. CNRS Délégation Ile-de-France Sud
12. University of California, Davis
Description
HiggsML Uncertainty Challenge Public Dataset
This dataset was created for the HiggsML Uncertainty Challenge, a NeurIPS 2024 competition. Detailed documentation is available in the challenge white paper.
The tabular dataset is created using the particle physics simulation tools Pythia 8.2 and Delphes 3.5.0. Proton-proton collision events are generated with Pythia 8 at a center-of-mass energy of 13 TeV. These events are then passed through Delphes to produce simulated detector measurements. We used an ATLAS-like detector description to make the dataset closer to experimental data. The events are divided into four groups, one signal and three backgrounds:
- Higgs boson signal (H→ττ)
- Z boson background (Z→ττ)
- Diboson background (VV→ττ)
- ttbar background (tt̄)
| Process | Number Generated | LHC Events | Label |
|---|---|---|---|
| Higgs | 52 040 227 | 1 015 | signal |
| Z Boson | 160 383 358 | 1 002 395 | background |
| Di-Boson | 605 118 | 3 783 | background |
| ttbar | 7 070 398 | 44 192 | background |
⚠️ Note: "LHC Events" is the average number of events in each category in a pseudo-experiment corresponding to 10 fb⁻¹ of Large Hadron Collider running, that is, approximately 800 billion inelastic proton collisions, or 2 weeks in summer 2024 conditions.
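The expected yields in the table give a quick sense of how rare the signal is. The sketch below copies the "LHC Events" column into a dictionary and computes the signal fraction and a naive s/√b figure before any classifier selection. The dictionary keys and variable names are illustrative choices, not part of the dataset.

```python
import math

# Expected events per pseudo-experiment, copied from the "LHC Events" column.
expected_events = {
    "htautau": 1_015,
    "ztautau": 1_002_395,
    "diboson": 3_783,
    "ttbar": 44_192,
}

signal = expected_events["htautau"]
background = sum(n for name, n in expected_events.items() if name != "htautau")

# Fraction of all expected events that are signal.
signal_fraction = signal / (signal + background)
# Naive significance estimate with no selection applied.
naive_significance = signal / math.sqrt(background)

print(f"background events: {background}")
print(f"signal fraction:   {signal_fraction:.5f}")
print(f"s / sqrt(b):       {naive_significance:.2f}")
```

With roughly one signal event per thousand background events, a classifier has to isolate a signal-enriched region before a measurement becomes meaningful.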
Higgs Signal:
The Higgs bosons are produced in all possible production modes and decay into two tau leptons. The tau leptons are allowed to decay into all possible final states, but only final states with one lepton (electron or muon) and one hadronic tau decay are kept.
Z boson Background:
Only background events coming from Z bosons are included in this challenge. While simulating the process, interference effects between Z bosons and photons are included. Similar to signal events, only the tau-tau decay mode of the Z boson is included in the dataset.
⚠️ Note:
The training events have weights.
Event Weights:
Event weights are defined as:
$$w = \frac{\text{Cross-Section} \times \text{Luminosity}}{\text{Total number of generated events}}$$
The challenge considers a scenario of analyzing proton-proton collision data corresponding to an integrated luminosity of 10 fb⁻¹ collected by the ATLAS experiment.
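As a minimal sketch of the weight definition, the function below divides cross-section times luminosity by the number of generated events. The cross-section and sample size are hypothetical numbers chosen only to make the arithmetic easy to follow; only the 10 fb⁻¹ luminosity comes from the text.

```python
def event_weight(cross_section_fb, luminosity_fb_inv, n_generated):
    """Per-event weight w = (cross-section x luminosity) / generated events."""
    if n_generated <= 0:
        raise ValueError("n_generated must be positive")
    return cross_section_fb * luminosity_fb_inv / n_generated

# Hypothetical inputs, chosen only to illustrate the arithmetic.
cross_section_fb = 500.0   # assumed cross-section in fb (not from the dataset)
luminosity_fb_inv = 10.0   # integrated luminosity quoted in the text, in fb^-1
n_generated = 1_000_000    # assumed number of simulated events

w = event_weight(cross_section_fb, luminosity_fb_inv, n_generated)
total_expected = w * n_generated  # recovers cross-section x luminosity

print(w, total_expected)
```

Summing the weights of all events in a category therefore recovers the expected number of events for that category at the chosen luminosity.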
Features in the data
Prefix-less variables
Weight, Label and DetailedLabel have a special role and should NOT be used as regular features for the model:
| Variable | Description |
|---|---|
| Weight | The event weight $w_i$. |
| Label | The event label $y_i \in \{1, 0\}$ (1 for signal, 0 for background). |
| DetailedLabel | The detailed event label, one of htautau, ztautau, diboson, ttbar. |
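One way to keep the special columns out of the model inputs is to split each event into features and bookkeeping fields before training. This is a sketch only: the event values are made up, and the column name `DetailedLabel` follows the text above and may be spelled differently in the actual files.

```python
# Columns with a special role, as named in the text above.
SPECIAL_COLUMNS = ("Weight", "Label", "DetailedLabel")

def split_event(event):
    """Return (features, special) dictionaries for one event."""
    features = {k: v for k, v in event.items() if k not in SPECIAL_COLUMNS}
    special = {k: v for k, v in event.items() if k in SPECIAL_COLUMNS}
    return features, special

# Hypothetical event with made-up values, for illustration only.
event = {
    "PRI_had_pt": 41.2,
    "PRI_lep_pt": 27.5,
    "PRI_met": 18.3,
    "Weight": 0.0021,
    "Label": 1,
    "DetailedLabel": "htautau",
}

features, special = split_event(event)
print(sorted(features))
print(special)
```

The weight is still needed during training and evaluation, for example as a per-sample weight in the loss, but it should not be an input to the model.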
Primary Features
The variables prefixed with PRI (for PRImitives) are “raw” quantities about the bunch collision as measured by the detector, essentially parameters of the momenta of particles.
| Variable | Description |
|---|---|
| PRI_had_pt | The transverse momentum $\sqrt{p_x^2 + p_y^2}$ of the hadronic tau. |
| PRI_had_eta | The pseudorapidity η of the hadronic tau. |
| PRI_had_phi | The azimuth angle ϕ of the hadronic tau. |
| PRI_lep_pt | The transverse momentum $\sqrt{p_x^2 + p_y^2}$ of the lepton (electron or muon). |
| PRI_lep_eta | The pseudorapidity η of the lepton. |
| PRI_lep_phi | The azimuth angle ϕ of the lepton. |
| PRI_met | The missing transverse energy $E_T^{\text{miss}}$. |
| PRI_met_phi | The azimuth angle ϕ of the missing transverse energy. |
| PRI_jet_num | The number of jets (integer with a value of 0, 1, 2 or 3; possible larger values have been capped at 3). |
| PRI_jet_leading_pt | The transverse momentum $\sqrt{p_x^2 + p_y^2}$ of the leading jet, that is, the jet with the largest transverse momentum (undefined if PRI_jet_num = 0). |
| PRI_jet_leading_eta | The pseudorapidity η of the leading jet (undefined if PRI_jet_num = 0). |
| PRI_jet_leading_phi | The azimuth angle ϕ of the leading jet (undefined if PRI_jet_num = 0). |
| PRI_jet_subleading_pt | The transverse momentum $\sqrt{p_x^2 + p_y^2}$ of the subleading jet, that is, the jet with the second largest transverse momentum (undefined if PRI_jet_num ≤ 1). |
| PRI_jet_subleading_eta | The pseudorapidity η of the subleading jet (undefined if PRI_jet_num ≤ 1). |
| PRI_jet_subleading_phi | The azimuth angle ϕ of the subleading jet (undefined if PRI_jet_num ≤ 1). |
| PRI_jet_all_pt | The scalar sum of the transverse momentum of all the jets of the events. |
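The primary features use the standard collider coordinates (transverse momentum, pseudorapidity, azimuth). As a sketch of how these relate to Cartesian momentum components, the helper functions below convert in both directions; the function names are illustrative and the input values are made up.

```python
import math

def to_pt_eta_phi(px, py, pz):
    """Convert Cartesian momentum components to (pt, eta, phi)."""
    pt = math.hypot(px, py)      # sqrt(px^2 + py^2)
    if pt == 0.0:
        raise ValueError("pseudorapidity is undefined for pt = 0")
    eta = math.asinh(pz / pt)    # pseudorapidity
    phi = math.atan2(py, px)     # azimuth in (-pi, pi]
    return pt, eta, phi

def to_cartesian(pt, eta, phi):
    """Inverse conversion back to (px, py, pz)."""
    return pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)

# Made-up momentum components in GeV.
pt, eta, phi = to_pt_eta_phi(3.0, 4.0, 12.0)
px, py, pz = to_cartesian(pt, eta, phi)
print(pt, eta, phi)
```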
Derived Features
These variables are derived from the primary variables with the help of derived_quantities.py.
| Variable | Description |
|---|---|
| DER_mass_transverse_met_lep | The transverse mass between the missing transverse energy and the lepton. |
| DER_mass_vis | The invariant mass of the hadronic tau and the lepton. |
| DER_pt_h | The modulus of the vector sum of the transverse momentum of the hadronic tau, the lepton and the missing transverse energy vector. |
| DER_deltaeta_jet_jet | The absolute value of the pseudorapidity separation between the two jets (undefined if PRI_jet_num ≤ 1). |
| DER_mass_jet_jet | The invariant mass of the two jets (undefined if PRI_jet_num ≤ 1). |
| DER_prodeta_jet_jet | The product of the pseudorapidities of the two jets (undefined if PRI_jet_num ≤ 1). |
| DER_deltar_had_lep | The separation $\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}$ between the hadronic tau and the lepton. |
| DER_pt_tot | The modulus of the vector sum of the missing transverse momenta and the transverse momenta of the hadronic tau, the lepton, the leading jet (if PRI_jet_num ≥ 1) and the subleading jet (if PRI_jet_num = 2) (but not of any additional jets). |
| DER_sum_pt | The sum of the moduli of the transverse momenta of the hadronic tau, the lepton, the leading jet (if PRI_jet_num ≥ 1) and the subleading jet (if PRI_jet_num = 2) and the other jets (if PRI_jet_num = 3). |
| DER_pt_ratio_lep_tau | The ratio of the transverse momenta of the lepton and the hadronic tau. |
| DER_met_phi_centrality | The centrality of the azimuthal angle of the missing transverse energy vector w.r.t. the hadronic tau and the lepton. |
| DER_lep_eta_centrality | The centrality of the pseudorapidity of the lepton w.r.t. the two jets (undefined if PRI_jet_num ≤ 1). |
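A few of the derived features can be reproduced from the primary ones with textbook kinematics. The sketch below is not the code in derived_quantities.py; it uses the usual formulas for ΔR, the invariant mass of two particles treated as massless, and the transverse mass, and the function names are illustrative.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi

def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation sqrt(d_eta^2 + d_phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

def visible_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two particles, both treated as massless."""
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))

def transverse_mass(pt_lep, phi_lep, met, met_phi):
    """Transverse mass of the lepton and the missing transverse energy."""
    mt2 = 2.0 * pt_lep * met * (1.0 - math.cos(phi_lep - met_phi))
    return math.sqrt(max(mt2, 0.0))

# Made-up back-to-back configuration, momenta in GeV.
dr = delta_r(0.5, 0.0, -0.5, math.pi)
m_vis = visible_mass(45.6, 0.0, 0.0, 45.6, 0.0, math.pi)
m_t = transverse_mass(40.0, 0.0, 40.0, math.pi)
pt_ratio = 30.0 / 40.0  # lepton pt divided by hadronic tau pt
print(dr, m_vis, m_t, pt_ratio)
```

For two back-to-back massless particles at equal pseudorapidity, the invariant mass reduces to twice the transverse momentum, which gives a convenient sanity check.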
Preselection Cuts
| Criteria | Pre-selection cut | Post-selection cut |
|---|---|---|
| Number of $\tau_{\text{had}}$ | 1 | |
| Number of $\tau_{\text{lep}}$ | 1 | |
| $p_T^{\tau_{\text{had}}}$ | > 20 GeV | > 26 GeV |
| $p_T^{\tau_{\text{lep}}}$ | > 20 GeV | > 20 GeV |
| $p_T^{\text{leading jet}}$ | > 20 GeV | > 26 GeV |
| $p_T^{\text{subleading jet}}$ | > 20 GeV | > 26 GeV |
| Charge | Opposite charges | |
⚠️ Note: The post-selection cuts are the cuts made after systematics are applied.
⚠️ Note: The dataset might not be properly shuffled.
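As an illustration of the post-selection thresholds, the sketch below filters events on the hadronic tau and lepton transverse momenta. It is a simplified assumption rather than the challenge code: the jet thresholds are left out because the table does not say whether they reject events or only decide which jets are counted, and the events are made up.

```python
# Post-selection thresholds from the table, in GeV.
HAD_PT_MIN_GEV = 26.0
LEP_PT_MIN_GEV = 20.0

def passes_post_selection(event):
    """True if the tau and lepton pass the post-selection pt thresholds."""
    return (event["PRI_had_pt"] > HAD_PT_MIN_GEV
            and event["PRI_lep_pt"] > LEP_PT_MIN_GEV)

# Made-up events for illustration only.
events = [
    {"PRI_had_pt": 30.0, "PRI_lep_pt": 25.0},  # passes both cuts
    {"PRI_had_pt": 24.0, "PRI_lep_pt": 35.0},  # fails the hadronic tau cut
    {"PRI_had_pt": 40.0, "PRI_lep_pt": 19.0},  # fails the lepton cut
]

selected = [e for e in events if passes_post_selection(e)]
print(len(selected))
```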
One could use dataset.py from the dataset repository (see below).
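If the events are handled manually, one way to shuffle them is to draw a single seeded permutation and apply it to features, labels and weights alike, so the three stay aligned. This is a generic sketch with made-up values and is not the logic in dataset.py.

```python
import random

def shuffled_indices(n_events, seed=42):
    """Return a reproducible random permutation of range(n_events)."""
    rng = random.Random(seed)
    indices = list(range(n_events))
    rng.shuffle(indices)
    return indices

# Made-up events: background first, then signal (an unshuffled ordering).
features = [[25.0], [31.0], [47.0], [52.0]]
labels = [0, 0, 1, 1]
weights = [0.5, 0.5, 0.01, 0.01]

# Apply the same permutation to every array to keep rows aligned.
order = shuffled_indices(len(labels))
features = [features[i] for i in order]
labels = [labels[i] for i in order]
weights = [weights[i] for i in order]

print(order, labels, weights)
```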
Utility Software
Alongside the dataset, a GitHub repository with the relevant code for reading and analysing it is made available. This includes a Jupyter notebook starting kit, simple baseline models, and code to run the challenge and generate the score. The repository also has a sample dataset, a subset of the main dataset, to let users experience the challenge software without downloading the much larger dataset.
The code for dataset generation is provided in a dedicated repository: https://github.com/FAIR-Universe/genHEPdata. This repository also contains a Dockerfile, which facilitates the installation of the necessary software dependencies.
Files
| Name | Size | MD5 |
|---|---|---|
| FAIR_Universe_HiggsML_data.zip | 15.1 GB | 7fb2a4f2b73bb8dcdaa6ffc0fe67e96e |
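Given the size of the archive, it may be worth checking the download against the published MD5 checksum. The sketch below reads the file in chunks with Python's standard hashlib module; the function names and the local path in the commented usage line are assumptions.

```python
import hashlib

# MD5 checksum published for FAIR_Universe_HiggsML_data.zip.
EXPECTED_MD5 = "7fb2a4f2b73bb8dcdaa6ffc0fe67e96e"

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file without loading it into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def archive_is_intact(path, expected_md5=EXPECTED_MD5):
    """True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected_md5.lower()

# Usage, assuming the archive sits in the current directory:
# print(archive_is_intact("FAIR_Universe_HiggsML_data.zip"))
```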
Additional details
Software
- Repository URL: https://github.com/FAIR-Universe/FAIR_Universe_dataset
- Programming language: Python
- Development status: Active