Published August 27, 2024 | Version v1
Dataset Open

Semi-leptonic ttbar full-event unfolding R&D dataset

  • 1. University of California, Irvine

Description

This dataset was generated for the purpose of developing unfolding methods that leverage generative machine learning models. It consists of two pieces: one piece contains events with the Standard Model (SM) production of a top-quark pair in the semi-leptonic decay mode, and the other contains events with top-quark pair production modified by a non-zero EFT operator. The SM dataset contains 15,015,000 events, and the EFT dataset contains 30,000,000. Both datasets store the following event configurations:

  • Parton level: configurations of all partons that result from the matrix element calculation done using MadGraph
  • Particle level: configurations of all “truth” jets and leptons that result from the parton shower and hadronization modelled using Pythia
  • Detector level: configurations of all reconstruction level jets and leptons, measured by a detector simulated with Delphes and the default CMS detector card.

Each of these configurations is stored in a dedicated group as described below. Throughout, the units of energy and transverse momentum are GeV. For more details on the generation of this dataset, see Ref. [1].

Parton level data:

  • No phase space requirements are placed on the events at parton level. 
  • The kinematics of the top, anti-top, W+, W-, and all decay products are contained in groups entitled top, antitop, Wp, and Wm respectively. Each of these contains the kinematics of the parton itself in a group called “particle”, as well as the kinematics of two daughter particles, in groups called “d1” and “d2”. In the case of the tops, these daughters are the W’s and b quarks. In the case of the W’s, these are two light quarks, or a lepton and a neutrino. The “pid” vector contains the PDGID for a given particle, used to identify its type. 
  • Note that the kinematics stored in a W’s “particle” group do not always match those of the same W stored as a daughter of the top. The difference arises when the W radiates a parton before decaying.
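As an illustration of how the “pid” entries might be used, the sketch below labels each W as leptonically or hadronically decaying from the PDG IDs of its two daughters. It is a minimal example, not the procedure used to build the dataset: the nested dictionary is a hypothetical single event that only mimics the group layout described above, and the helper name classify_w_decay is introduced here for illustration.

```python
# PDG Monte Carlo numbering: charged leptons 11/13/15, neutrinos 12/14/16,
# quarks 1-6. Antiparticles carry the opposite sign, so we compare abs(pid).
CHARGED_LEPTONS = {11, 13, 15}
NEUTRINOS = {12, 14, 16}
QUARKS = {1, 2, 3, 4, 5, 6}


def classify_w_decay(d1_pid, d2_pid):
    """Return 'leptonic', 'hadronic', or 'unknown' from two daughter PDG IDs."""
    ids = {abs(d1_pid), abs(d2_pid)}
    if ids & CHARGED_LEPTONS and ids & NEUTRINOS:
        return "leptonic"
    if ids <= QUARKS:
        return "hadronic"
    return "unknown"


# Hypothetical single event (assumption): one scalar "pid" per group, arranged
# like the Wp/Wm groups with their "particle", "d1" and "d2" sub-groups.
event = {
    "Wp": {"particle": {"pid": 24}, "d1": {"pid": -11}, "d2": {"pid": 12}},
    "Wm": {"particle": {"pid": -24}, "d1": {"pid": 1}, "d2": {"pid": -2}},
}

decays = {
    name: classify_w_decay(w["d1"]["pid"], w["d2"]["pid"])
    for name, w in event.items()
}
print(decays)  # {'Wp': 'leptonic', 'Wm': 'hadronic'}
```

In the real files the “pid” entries are vectors over events, so the same comparison would be applied per event.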

Particle level data:

  • At particle level, all leptons and jets are required to have $p_T > 25$ GeV and absolute pseudorapidity $|\eta| < 2.5$. 
  • Events at particle level are required to have at least one electron or muon and at least four jets, of which at least two are b-tagged. Events that pass or fail these criteria are marked by the vector contained in the group “mask”.
  • Electrons and muons are stored in separate groups. Each group contains a vector “mask” which is true only if there is a true particle-level electron or muon in the event, and false if this entry is zero padding. 
  • Jets are clustered from stable particle level objects using the anti-kt algorithm with a radius parameter of 0.5. Jet information is stored in the group “jets”, and true jets in the event are again denoted by a true value in the vector “mask”, and zero-padding is marked by a false value. Jets additionally contain a vector “btag” which is 1 if the jet is b-tagged with the default Delphes prescription, and 0 if not.
  • Information on the missing transverse momentum (MET) is contained in the group “met”. The “met” vector gives the magnitude, and the “phi” vector gives the direction in phi of the missing transverse momentum.
  • In addition to the information on the jets, leptons, and MET, the particle level data also contain the configurations for the hadronic top, leptonic top, and ttbar system. These configurations are determined using the pseudo-top jet-parton assignment algorithm, a method commonly used by LHC experiments when analyzing semileptonic ttbar events.
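The object requirements, zero-padding masks, and event selection described above can be sketched as follows. This is an illustrative re-derivation only, since the dataset already provides the event-level “mask”. The field names “pt” and “eta”, the merging of electrons and muons into one lepton collection, and the example numbers are all assumptions made for this sketch; only “mask”, “btag”, “met”, and “phi” are names taken from the description.

```python
import math

PT_MIN = 25.0   # GeV, object-level requirement quoted above
ETA_MAX = 2.5   # |eta| requirement quoted above


def good_indices(obj):
    """Indices of real (non-padded) objects that pass the pT and |eta| cuts."""
    return [
        i for i, real in enumerate(obj["mask"])
        if real and obj["pt"][i] > PT_MIN and abs(obj["eta"][i]) < ETA_MAX
    ]


def passes_event_selection(leptons, jets):
    """At least 1 lepton and at least 4 jets, of which at least 2 are b-tagged."""
    good_jets = good_indices(jets)
    n_btag = sum(1 for i in good_jets if jets["btag"][i] == 1)
    return len(good_indices(leptons)) >= 1 and len(good_jets) >= 4 and n_btag >= 2


def met_components(met, phi):
    """Convert the MET magnitude and azimuthal angle into (px, py)."""
    return met * math.cos(phi), met * math.sin(phi)


# Hypothetical event: the last entry of each collection is zero padding.
leptons = {"pt": [40.0, 0.0], "eta": [0.3, 0.0], "mask": [True, False]}
jets = {
    "pt": [90.0, 60.0, 45.0, 30.0, 0.0],
    "eta": [0.1, -1.2, 2.0, -0.7, 0.0],
    "mask": [True, True, True, True, False],
    "btag": [1, 1, 0, 0, 0],
}

print(passes_event_selection(leptons, jets))  # True
```

Filtering on the object-level “mask” before counting matters because padded entries carry zeros, which would otherwise be treated as real objects in any per-event sum or average.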

Detector level data:

  • Requirements for leptons and jets are the same as for the particle level data.
  • The event selection is the same as for the particle level data. Events which pass the selection are again denoted by a true value in the vector “mask”.
  • The data for the leptons, jets, and MET are stored analogously to particle level.
  • The configurations of the top quarks and ttbar system are not pre-computed at detector level, since ideally a generative unfolding method would not assume a given jet-parton assignment algorithm when it is being trained. However, if the user wishes to pursue such an application, the relevant configurations can be obtained by running the pseudo-top algorithm [2].
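Because both the particle level and detector level carry an event-level “mask”, the two can be combined to see how the selections overlap. The sketch below assumes the two masks are index-aligned, i.e. entry i refers to the same event at both levels; that alignment, the function name selection_summary, and the example values are assumptions of this illustration rather than documented properties of the files.

```python
def selection_summary(particle_mask, detector_mask):
    """Count events passing each selection and the overlap between them."""
    n_particle = sum(1 for p in particle_mask if p)
    n_detector = sum(1 for d in detector_mask if d)
    n_both = sum(1 for p, d in zip(particle_mask, detector_mask) if p and d)
    nan = float("nan")
    return {
        "n_particle": n_particle,
        "n_detector": n_detector,
        "n_both": n_both,
        # Fraction of particle-level-selected events also selected at detector level
        "frac_particle_also_detector": n_both / n_particle if n_particle else nan,
        # Fraction of detector-level-selected events also selected at particle level
        "frac_detector_also_particle": n_both / n_detector if n_detector else nan,
    }


# Hypothetical masks for five events (assumption: same event order at both levels).
particle_mask = [True, True, False, True, False]
detector_mask = [True, False, False, True, True]

summary = selection_summary(particle_mask, detector_mask)
print(summary["n_particle"], summary["n_detector"], summary["n_both"])  # 3 3 2
```

Events that pass only one of the two selections are the ones an unfolding method has to treat as inefficiencies or out-of-fiducial contributions, so it can be useful to keep these counts alongside any training sample.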

Citations:

[1] - https://arxiv.org/abs/2404.14332

[2] - https://twiki.cern.ch/twiki/bin/view/LHCPhysics/ParticleLevelTopDefinitions

Files (24.0 GB)

  • md5:366381669fff3ea744bee70c988c2631 (15.9 GB)
  • md5:8cb222ef4d86df3d3a0534fffed79727 (8.1 GB)

Additional details

Related works

Documents
Preprint: arXiv:2404.14332 (arXiv)