Published February 12, 2024 | Version 0.1.0
Dataset Open

ttH(bb) dataset in the semi-leptonic decay channel

  • 1. ETH Zurich (CH)

Description

Higgs boson dataset in the \(t\bar{t}H(b\bar{b}) \) semi-leptonic channel, used for deep learning and quantum machine learning classification studies [1, 2].

The simulation of the \(t\bar{t}H(b\bar{b}) \) semi-leptonic channel produces a data set that consists of the following features:

  1. Jet features: \((p_\mathrm{T}, \eta, \phi, E, \mathrm{b-tag}, p_\mathrm{x}, p_\mathrm{y}, p_\mathrm{z})\)
  2. Leptonic features: \((p_\mathrm{T}, \eta, \phi, E, p_\mathrm{x}, p_\mathrm{y}, p_\mathrm{z})\)
  3. Missing energy features: \((\phi, p_\mathrm{T}, p_\mathrm{x}, p_\mathrm{y})\)
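
The Cartesian momentum components in the feature lists above follow from the collider coordinates \((p_\mathrm{T}, \eta, \phi)\) through the standard relations \(p_\mathrm{x} = p_\mathrm{T}\cos\phi\), \(p_\mathrm{y} = p_\mathrm{T}\sin\phi\), \(p_\mathrm{z} = p_\mathrm{T}\sinh\eta\). The sketch below illustrates this relation only; the function names are hypothetical and are not taken from the dataset or the related papers.

```python
import math


def cartesian_momentum(pt, eta, phi):
    """Convert (pT, eta, phi) to (px, py, pz) using the standard relations."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    return px, py, pz


def collider_coordinates(px, py, pz):
    """Inverse conversion: (px, py, pz) to (pT, eta, phi). Requires pT > 0."""
    pt = math.hypot(px, py)
    eta = math.asinh(pz / pt)
    phi = math.atan2(py, px)  # phi is returned in (-pi, pi]
    return pt, eta, phi
```

For the missing energy features only the transverse components are listed, so the same relations apply with \(p_\mathrm{z}\) omitted.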

Before the data are processed with (quantum) machine learning algorithms, the related studies filter the events using the following physically motivated preprocessing criteria:

  • For electrons: \(p_\mathrm{T} > 30 \) GeV and \(|\eta|<2.1\)
  • For muons: \(p_\mathrm{T} > 26\) GeV and \(|\eta|<2.1\)
  • For jets: \(p_\mathrm{T} > 30\) GeV and \(|\eta|<2.4\)
  • Isolation of the leptons with respect to jets is higher than the benchmark value of 0.1.
  • Require at least 4 jets per event, at least 2 b-tagged jets, and exactly one lepton. 
  • The seven most energetic jets are kept per collision event, allowing for one extra jet beyond the leading-order expectation of 6 jets to account for final-state radiation.
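
The selection criteria listed above can be sketched as a per-event filter. This is a minimal illustration under stated assumptions, not the code used in the related studies: the event layout (dictionaries with `pt`, `eta`, `E`, `btag`, `flavour`, and a precomputed `isolation` value) is hypothetical, the b-tag is assumed to be a boolean flag, jets are ranked by energy, and the order in which the cuts are applied is a guess.

```python
# Thresholds quoted in the selection list (pT in GeV).
LEPTON_CUTS = {
    "electron": {"pt_min": 30.0, "abs_eta_max": 2.1},
    "muon": {"pt_min": 26.0, "abs_eta_max": 2.1},
}
JET_CUTS = {"pt_min": 30.0, "abs_eta_max": 2.4}
MIN_LEPTON_ISOLATION = 0.1
MIN_JETS = 4
MIN_BTAGGED_JETS = 2
MAX_JETS_KEPT = 7


def passes_kinematics(obj, cuts):
    """Check the pT and |eta| requirements for a single object."""
    return obj["pt"] > cuts["pt_min"] and abs(obj["eta"]) < cuts["abs_eta_max"]


def select_event(event):
    """Return the selected lepton and jets, or None if the event is rejected.

    `event` is a hypothetical dict with "leptons" and "jets" lists.
    """
    leptons = [
        lep for lep in event["leptons"]
        if passes_kinematics(lep, LEPTON_CUTS[lep["flavour"]])
        and lep["isolation"] > MIN_LEPTON_ISOLATION
    ]
    jets = [jet for jet in event["jets"] if passes_kinematics(jet, JET_CUTS)]
    n_btagged = sum(1 for jet in jets if jet["btag"])

    if len(leptons) != 1 or len(jets) < MIN_JETS or n_btagged < MIN_BTAGGED_JETS:
        return None

    # Keep at most the seven most energetic jets (ranking by E is an assumption).
    jets.sort(key=lambda jet: jet["E"], reverse=True)
    return {"lepton": leptons[0], "jets": jets[:MAX_JETS_KEPT]}
```

An event with fewer than seven selected jets keeps all of them; how the related studies handle the missing entries is not stated here, so the sketch leaves the list short.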

These criteria constrain the problem in a suitable phase space, taking into account the geometric acceptance of the CMS detector and the goal of background suppression. For more details, please see the corresponding papers.

[1] V. Belis et al., Higgs analysis with quantum classifiers, EPJ Web Conf. 251, 03070 (2021), arXiv:2104.07692.

[2] V. Belis et al., Guided Quantum Compression for Higgs identification, arXiv:2402.09524.

Notes

The data is simulated using Powheg v2 for hard scattering computations, Pythia 8 for parton shower simulations, and Delphes v3.4.1 for the detector response simulation. Delphes is configured with the CMS detector Run II settings throughout the study.

Files

Files (4.3 GB)

md5:bed381f521a73f852eecb7c36247550e (4.3 GB)
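
After downloading, the file can be checked against the MD5 checksum listed above. The following is a minimal sketch using Python's standard `hashlib` module; the local file path is whatever name the download was saved under and is not specified on this page.

```python
import hashlib

# Checksum quoted in the file listing of this record.
EXPECTED_MD5 = "bed381f521a73f852eecb7c36247550e"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use small, which matters for a file of several gigabytes.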