Top Quark Tagging Reference Dataset

Kasieczka, Gregor; Plehn, Tilman; Thompson, Jennifer; Russell, Michael

  "DOI": "10.5281/zenodo.2603256" 
  "title": "Top Quark Tagging Reference Dataset" 
  "abstract": "<p>A set of MC simulated training/testing events for the evaluation of top quark tagging architectures.</p>\n\n<p>In total 1.2M training events, 400k validation events and 400k test events. Use &ldquo;train&rdquo; for training, &ldquo;val&rdquo; for validation during the training and &ldquo;test&rdquo; for final testing and reporting results.</p>\n\n<p><strong>Description</strong></p>\n\n<ul>\n\t<li>\n\t<p>14 TeV, hadronic tops for signal, qcd diets background, Delphes ATLAS detector card with Pythia8</p>\n\t</li>\n\t<li>\n\t<p>No MPI/pile-up included</p>\n\t</li>\n\t<li>\n\t<p>Clustering of&nbsp; particle-flow entries (produced by Delphes E-flow) into anti-kT 0.8 jets in the pT range [550,650] GeV</p>\n\t</li>\n\t<li>\n\t<p>All top jets are matched to a parton-level top within \u2206R = 0.8, and to all top decay partons within 0.8</p>\n\t</li>\n\t<li>\n\t<p>Jets are required to have |eta| &lt; 2</p>\n\t</li>\n\t<li>\n\t<p>The leading 200 jet constituent four-momenta are stored, with zero-padding for jets with fewer than 200</p>\n\t</li>\n\t<li>\n\t<p>Constituents are sorted by pT, with the highest pT one first</p>\n\t</li>\n\t<li>\n\t<p>The truth top four-momentum is stored as truth_px etc.</p>\n\t</li>\n\t<li>\n\t<p>A flag (1 for top, 0 for QCD) is kept for each jet. It is called is_signal_new</p>\n\t</li>\n\t<li>\n\t<p>The variable &quot;ttv&quot; (= test/train/validation) is kept for each jet. It indicates to which dataset the jet belongs. It is redundant as the different sets are already distributed as different files.</p>\n\t</li>\n</ul>", 
  "version": "v0 (2018_03_27)" 

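The constituent layout described in the abstract (up to 200 four-momenta per jet, sorted by descending pT and zero-padded) can be illustrated with a minimal NumPy sketch. It does not read the distributed files. The (E, px, py, pz) component order, the in-memory array shape, and the helper names `pad_constituents` and `jet_kinematics` are assumptions made for illustration only, not part of the dataset specification.

```python
import numpy as np

N_CONSTITUENTS = 200  # from the description: the leading 200 constituents are stored


def pad_constituents(constituents, n_max=N_CONSTITUENTS):
    """Sort four-momenta by descending pT, keep the leading n_max, zero-pad the rest.

    Assumed (hypothetical) component order per constituent: (E, px, py, pz).
    """
    arr = np.asarray(constituents, dtype=float).reshape(-1, 4)
    pt = np.hypot(arr[:, 1], arr[:, 2])
    order = np.argsort(-pt, kind="stable")  # highest-pT constituent first
    arr = arr[order][:n_max]
    padded = np.zeros((n_max, 4))
    padded[: len(arr)] = arr
    return padded


def jet_kinematics(padded):
    """Return (number of real constituents, jet pT, jet mass) from a padded array."""
    mask = np.any(padded != 0.0, axis=1)  # rows that are not zero-padding
    e, px, py, pz = padded[mask].sum(axis=0)
    pt = np.hypot(px, py)
    m2 = e**2 - px**2 - py**2 - pz**2
    mass = np.sqrt(max(m2, 0.0))  # guard against small negative rounding errors
    return int(mask.sum()), float(pt), float(mass)


if __name__ == "__main__":
    # Two toy massless constituents (values in GeV, purely illustrative).
    jet = pad_constituents([[5.0, 3.0, 4.0, 0.0], [13.0, 5.0, 12.0, 0.0]])
    print(jet.shape, jet_kinematics(jet))
```

Summing only the non-padded rows gives the jet four-momentum. The same mask can be used to hide padding entries from a tagging network that would otherwise treat them as real particles.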