Published June 20, 2024 | Version v1
Dataset Open

Top quark pair production at the LHC

Description

R&D datasets of simulated top quark pair (ttbar) events at the LHC in the all-hadronic, semi-leptonic and di-leptonic decay channels.

Used in the development of PIPPIN: Particles Into Particles with Permutation Invariant Network

The datasets contain a total of about 40M ttbar events in the all-hadronic, semi-leptonic and di-leptonic decay channels, with jets matched to the truth partons from the top quark decays.

Event generation

  • Centre-of-mass energy: 13 TeV
  • MC Generator: PYTHIA v8.307
  • Parton Shower and Hadronisation: PYTHIA v8.307
  • Detector response: Delphes v3.4.2 using ATLAS-like geometry
  • Jets reconstructed with anti-kt algorithm, R = 0.4, using FastJet

Event selection and truth matching

  • All events are required to have between 2 and 16 reconstructed jets
  • Jets are required to satisfy |η| < 2.5 and pT > 25 GeV
  • Leptons are required to satisfy |η| < 2.5 and pT > 15 GeV
  • Partons are matched to jets using ΔR matching, with ΔR < 0.4
  • Events in which a parton is matched to multiple jets, or a jet to multiple partons, are discarded
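The matching rule above can be illustrated with a minimal sketch. It is not the code used to produce the dataset: the function names and the (eta, phi) tuple layout are assumptions made for illustration, and only the ΔR < 0.4 threshold and the rejection of ambiguous events come from the description.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the phi difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def match_partons_to_jets(partons, jets, max_dr=0.4):
    """Return one jet index per parton (-1 if unmatched), or None if ambiguous.

    partons and jets are sequences of (eta, phi) tuples (hypothetical layout).
    """
    # Collect every (parton, jet) pair closer than the matching radius.
    pairs = [
        (p, j)
        for p, (p_eta, p_phi) in enumerate(partons)
        for j, (j_eta, j_phi) in enumerate(jets)
        if delta_r(p_eta, p_phi, j_eta, j_phi) < max_dr
    ]
    parton_hits = [p for p, _ in pairs]
    jet_hits = [j for _, j in pairs]
    # Ambiguous event: some parton has several jets, or some jet several partons.
    if len(set(parton_hits)) != len(parton_hits):
        return None
    if len(set(jet_hits)) != len(jet_hits):
        return None
    assignment = [-1] * len(partons)
    for p, j in pairs:
        assignment[p] = j
    return assignment
```

A return value of None corresponds to an event that would be discarded. Note that this sketch returns a jet index per parton, whereas the stored jet_indices array records a parton index per jet.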

The training dataset contains 37M events.
The validation dataset contains 0.8M events.
The testing dataset contains 2.4M events.

Dataset format

The dataset is in HDF5 format. The key 'delphes' contains the following NumPy arrays:

Truth information (parton-level):

  • truth_leptons, truth_neutrinos, truth_quarks: Truth-level information on the final-state partons
    • keys: PDGID, pt, eta, phi, mass
  • truth_particles: Truth-level information on the final-state partons and the intermediate particles
    • keys: PDGID, pt, eta, phi, mass
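Because the truth arrays store (pt, eta, phi, mass) rather than Cartesian four-momenta, combining particles (for example the decay products of one top quark) requires a conversion first. The following is a minimal sketch using the standard collider kinematics relations. The helper names are hypothetical, and it assumes pt and mass are stored in the same energy units.

```python
import math


def to_cartesian(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to (E, px, py, pz) in the same units."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def invariant_mass(particles):
    """Invariant mass of a list of (pt, eta, phi, mass) tuples."""
    # Sum the four-momenta component by component.
    e, px, py, pz = (sum(c) for c in zip(*(to_cartesian(*p) for p in particles)))
    m2 = e * e - px * px - py * py - pz * pz
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(m2, 0.0))
```

Applied to the three truth quarks of a hadronically decaying top quark, invariant_mass would be expected to return a value near the generated top mass.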

Reconstructed information (detector-level):

  • leptons: The zero-padded reconstructed leptons (0 to 2), ordered by decay channel
    • keys: pt, eta, phi, energy, charge, type
  • MET: The missing transverse energy
    • keys: MET, phi
  • jets: The zero-padded reconstructed jets (2 to 16), ordered by pT
    • keys: pt, eta, phi, energy, is_tagged, is_tau
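Zero padding means that every event has the same number of jet slots, so real jets have to be separated from padding before any per-event quantity is computed. The sketch below builds a toy stand-in for the jets array rather than reading the real files. It assumes the keys are fields of a NumPy structured array, that padded slots are all zeros (so pt > 0 identifies real jets), and that is_tagged is the b-tag flag. None of these assumptions is confirmed by the description.

```python
import numpy as np

# Toy stand-in for the 'jets' array: 2 events padded to 4 slots
# (the real arrays are padded to 16). Field names follow the listed keys.
jet_dtype = np.dtype([
    ("pt", "f4"), ("eta", "f4"), ("phi", "f4"), ("energy", "f4"),
    ("is_tagged", "?"), ("is_tau", "?"),
])
jets = np.zeros((2, 4), dtype=jet_dtype)
jets[0, :3] = [
    (120.0, 0.5, 1.0, 140.0, True, False),
    (80.0, -1.2, 2.5, 150.0, False, False),
    (30.0, 2.0, -0.4, 115.0, True, False),
]
jets[1, :2] = [
    (60.0, 0.1, 0.2, 61.0, True, False),
    (26.0, -0.3, -2.9, 28.0, False, False),
]

# Assumption: padded slots are all-zero, so pt > 0 marks real jets.
is_real = jets["pt"] > 0

# Per-event jet and tagged-jet multiplicities, ignoring padding.
n_jets = is_real.sum(axis=1)
n_tagged = (is_real & jets["is_tagged"]).sum(axis=1)

# Jets are ordered by pT, so slot 0 holds the leading jet of each event.
leading_pt = jets["pt"][:, 0]
```

The same boolean mask can be passed to a permutation-invariant network as a padding mask.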

Miscellaneous information:

  • decay_channel: The decay channel of the event
    • 0b00 for all-hadronic, 0b01 for semi-leptonic (from Top), 0b10 for semi-leptonic (from Anti-Top), 0b11 for di-leptonic
  • matchability: Which partons are matched to a reconstructed object
    • 6-bit binary representation with one bit per parton, e.g. 0b111111
    • From left to right: b1, q1W1, q2W1, b2, q1W2, q2W2 (b1/W1 = from Top, b2/W2 = from Anti-Top)
    • 0b111000 means Top fully matched, 0b000111 means Anti-Top fully matched, 0b111111 means both Tops fully matched, etc.
  • jet_indices: Integer corresponding to the parton a jet is matched to
    • From 0 to 5: b1, q1W1, q2W1, b2, q1W2, q2W2 (b1/W1 = from Top, b2/W2 = from Anti-Top)
    • -1 indicates not matched to a parton
  • nleptons, njets, nbjets: The number of leptons, jets and b-jets in the event
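The bit encodings above can be decoded with a few integer operations. This is a minimal sketch: the helper names are hypothetical, while the bit order (leftmost bit is b1) and the meaning of the jet_indices values are taken from the description.

```python
# Parton order as documented, from the leftmost (most significant) bit.
PARTONS = ("b1", "q1W1", "q2W1", "b2", "q1W2", "q2W2")

CHANNELS = {
    0b00: "all-hadronic",
    0b01: "semi-leptonic (from Top)",
    0b10: "semi-leptonic (from Anti-Top)",
    0b11: "di-leptonic",
}


def matched_partons(matchability):
    """Names of the partons whose bit is set in the 6-bit matchability word."""
    n = len(PARTONS)
    return [
        name
        for i, name in enumerate(PARTONS)
        if (matchability >> (n - 1 - i)) & 1
    ]


def top_fully_matched(matchability):
    """True if b1, q1W1 and q2W1 are all matched."""
    return (matchability & 0b111000) == 0b111000


def antitop_fully_matched(matchability):
    """True if b2, q1W2 and q2W2 are all matched."""
    return (matchability & 0b000111) == 0b000111


def partons_to_jet_slots(jet_indices):
    """Map parton name to jet slot for one event, skipping unmatched (-1) jets."""
    return {PARTONS[idx]: slot for slot, idx in enumerate(jet_indices) if idx >= 0}
```

For example, an event with matchability 0b111010 has the Top fully matched but only q1W2 matched on the Anti-Top side, so top_fully_matched is true and antitop_fully_matched is false.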

Files

Total size: 34.7 GB. Each file is listed by its MD5 checksum and size:

  • md5:0a7e7ed943cd95df4a6ef64203ead095 (2.1 GB)
  • md5:9a6bcb339ac77657c0dd0743290c8ec3 (31.9 GB)
  • md5:3d59dfcc4fa5a13f6b01e5737f7a1377 (693.8 MB)

Additional details

Additional titles

Subtitle
Inclusive all-hadronic, semi-leptonic and di-leptonic channels

Funding

  • Swiss National Science Foundation: Robust Deep Density Models for High-Energy Particle Physics and Solar Flare Analysis (RODEM), grant CRSII5_193716
  • Swiss National Science Foundation: At the two upgrade frontiers: machine learning and the ITk Pixel detector, grant 200020_212127