Dataset Open Access

Top Quark Tagging Reference Dataset

Kasieczka, Gregor; Plehn, Tilman; Thompson, Jennifer; Russel, Michael

Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="" xmlns:oai_dc="" xmlns:xsi="" xsi:schemaLocation="">
  <dc:creator>Kasieczka, Gregor</dc:creator>
  <dc:creator>Plehn, Tilman</dc:creator>
  <dc:creator>Thompson, Jennifer</dc:creator>
  <dc:creator>Russel, Michael</dc:creator>
  <dc:description>A set of Monte Carlo (MC) simulated training/testing events for evaluating top quark tagging architectures.

The dataset contains 1.2M training events, 400k validation events and 400k test events. Use “train” for training, “val” for validation during training and “test” for final testing and reporting results.


	14 TeV, hadronic tops for signal, QCD dijets for background, Delphes ATLAS detector card with Pythia8
	No MPI/pile-up included
	Clustering of particle-flow entries (produced by Delphes E-flow) into anti-kT 0.8 jets in the pT range [550,650] GeV
	All top jets are matched to a parton-level top within ∆R = 0.8, and to all top decay partons within 0.8
	Jets are required to have |eta| &lt; 2
	The leading 200 jet constituent four-momenta are stored, with zero-padding for jets with fewer than 200
	Constituents are sorted by pT, with the highest pT one first
	The truth top four-momentum is stored as truth_px etc.
	A flag named is_signal_new (1 for top, 0 for QCD) is kept for each jet.
	The variable "ttv" (= test/train/validation) is kept for each jet. It indicates to which dataset the jet belongs. It is redundant as the different sets are already distributed as different files.</dc:description>
  <dc:title>Top Quark Tagging Reference Dataset</dc:title>
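
The stored layout described above (up to 200 constituent four-momenta per jet, zero-padded and sorted by pT) can be explored with a few array operations. The following is a minimal sketch on toy data, not the dataset's own loading code: the array shape `(n_jets, 200, 4)`, the component order `(E, px, py, pz)`, and every function name here are assumptions made for illustration, and the record does not specify the file format or column names. It shows how to mask out padding, check the pT ordering, sum constituents into jet kinematics, and compute the ∆R used in the matching criterion.

```python
import numpy as np

N_CONSTITUENTS = 200  # the description says the leading 200 constituents are kept


def toy_jet(pts, etas, phis):
    """Build one zero-padded jet from massless constituents (toy data only)."""
    pts, etas, phis = (np.asarray(x, dtype=float) for x in (pts, etas, phis))
    jet = np.zeros((N_CONSTITUENTS, 4))
    n = len(pts)
    jet[:n, 0] = pts * np.cosh(etas)  # E (assumed component order: E, px, py, pz)
    jet[:n, 1] = pts * np.cos(phis)   # px
    jet[:n, 2] = pts * np.sin(phis)   # py
    jet[:n, 3] = pts * np.sinh(etas)  # pz
    return jet


def constituent_mask(jets):
    """True where a slot holds a real constituent, False where it is zero padding."""
    return np.any(jets != 0.0, axis=-1)


def constituent_pt(jets):
    """Transverse momentum of every constituent slot (0 for padding)."""
    return np.hypot(jets[..., 1], jets[..., 2])


def summed_kinematics(jets):
    """Sum the stored constituents per jet and return (pT, eta, phi, mass)."""
    e, px, py, pz = jets.sum(axis=1).T
    pt = np.hypot(px, py)
    eta = np.arcsinh(pz / pt)
    phi = np.arctan2(py, px)
    mass = np.sqrt(np.maximum(e**2 - px**2 - py**2 - pz**2, 0.0))
    return pt, eta, phi, mass


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(d_eta^2 + d_phi^2), with d_phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + np.pi) % (2 * np.pi) - np.pi
    return np.hypot(eta1 - eta2, dphi)


# Two toy jets, constituents listed with the highest pT first.
jets = np.stack([
    toy_jet([300.0, 200.0, 100.0], [0.1, 0.0, -0.1], [0.0, 0.2, -0.2]),
    toy_jet([400.0, 180.0], [0.5, 0.6], [1.0, 1.1]),
])

mask = constituent_mask(jets)
n_real = mask.sum(axis=1)  # number of non-padding constituents per jet

# pT must never increase along the constituent axis (padding has pT = 0).
sorted_ok = np.all(np.diff(constituent_pt(jets), axis=1) <= 0, axis=1)

jet_pt, jet_eta, jet_phi, jet_mass = summed_kinematics(jets)

# Selection quoted in the description: pT in [550, 650] GeV and |eta| < 2.
in_selection = (jet_pt >= 550.0) & (jet_pt <= 650.0) & (np.abs(jet_eta) < 2.0)

# Distance of each toy jet axis to a hypothetical truth top at (eta, phi) = (0, 0).
dr_to_truth = delta_r(jet_eta, jet_phi, 0.0, 0.0)
```

Note that kinematics summed from the stored constituents can differ slightly from those of the full jet whenever a jet has more than 200 constituents, so a selection recomputed this way need not reproduce the original cuts exactly.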
                   All versions    This version
Views              2,652           2,652
Downloads          3,145           3,145
Data volume        1.8 TB          1.8 TB
Unique views       2,365           2,365
Unique downloads   1,526           1,526


Cite as