Published March 17, 2021 | Version 2.1
Dataset Open

Test Sets for Jet Anomaly Detection at the LHC

Creators

  • 1. Mila, University of Montreal

Description

Several datasets were updated in Version 2.1. The updated files are tagged with 'new' in their file names.

- pT~1.2 TeV top jets: the mass of the mother Z' was slightly adjusted so that the jet pT peaks closer to 1.2 TeV.

- Top jets with mass 80 GeV (`*top_m80_100k*`): the previous version had a bug in which the top jet mass was not set correctly, so please be careful when using those datasets.

Data Description

These datasets were generated as a series of test sets for anomalous jet tagging at the LHC. They include boosted W jets, top jets, and Higgs jets. The jet transverse momentum is concentrated around 600 GeV and 1200 GeV (the latter with the prefix "pt1200_" in the file names). Each file starts from 100k original MadGraph events, but the final h5 files may contain slightly fewer events because of the fat-jet preselection. Production processes include:

  • pp -> W' -> W (jj) Z(νν); mW=59,80,120,174 GeV
  • pp -> Z' -> t t~; mt=80,174 GeV. For mt=80 GeV, the mass of the decay-product W is set to 20 GeV.
  • pp -> HH -> (hh) (hh), (h -> jj); mH=174 GeV; mh=20,80 GeV

Data Generation

Jet samples in this dataset were generated with MadGraph, Pythia8, and Delphes (no pile-up effects were simulated). Jets are clustered from particle-flow objects with FastJet, using the anti-kt algorithm with cone size (radius parameter) R=1.0.

  • Fat-jet preselection: leading jet pT>450 GeV; sub-leading jet pT>200 GeV

Data Structure

  • To get jets: `f['objects/jets']`
  • For jets, there are two datasets: ['constituents', 'obs']. Jet information is stored with the higher-pT jet first.
    • `obs[:, n_j - 1]`: four-vector and N-subjettiness of the n_j-th jet (pt, eta, phi, m, tau1, tau2, tau3, tau4, tau5)
    • `constituents[:, n_j - 1]`: constituents of the n_j-th jet, sorted by pT (highest first) and stored in variable-length arrays: {Ei,Pxi,Pyi,Pzi,PIDi} (PID: PDG code for tracks; [22] for photons; [0] for neutral hadrons)
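The layout above can be sketched in Python as follows. This is a minimal illustration, not the authors' reference code: the helper names (`name_observables`, `constituent_kind`, `read_jet`) and the `path` argument are hypothetical, and `read_jet` assumes that `obs` and `constituents` can be indexed as `[event, n_j - 1]`, which is inferred from the `[:, n_j - 1]` notation above.

```python
# Column order of `obs`, as listed in the description above.
OBS_COLUMNS = ("pt", "eta", "phi", "m", "tau1", "tau2", "tau3", "tau4", "tau5")


def name_observables(obs_row):
    """Map one jet's observable vector to a {name: value} dict."""
    if len(obs_row) != len(OBS_COLUMNS):
        raise ValueError(f"expected {len(OBS_COLUMNS)} values, got {len(obs_row)}")
    return {name: float(value) for name, value in zip(OBS_COLUMNS, obs_row)}


def constituent_kind(pid):
    """Interpret a constituent PID following the convention stated above."""
    pid = int(pid)
    if pid == 22:
        return "photon"
    if pid == 0:
        return "neutral hadron"
    return "track"  # any other value is the PDG code of a charged track


def read_jet(path, event, n_j=1):
    """Read observables and constituents of the n_j-th jet of one event.

    `path` is a hypothetical local file name. h5py is imported lazily so
    that the two helpers above can be used without it. The [event, n_j - 1]
    indexing is an assumption about the on-disk shapes.
    """
    import h5py

    with h5py.File(path, "r") as f:
        jets = f["objects/jets"]
        obs = jets["obs"][event, n_j - 1]
        constituents = jets["constituents"][event, n_j - 1]
    return name_observables(obs), constituents
```

Inspecting `f["objects/jets/obs"].shape` and the `dtype` of `constituents` on a real file is a sensible first step before relying on these assumptions.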

Extra Notes

  • Since the dataset is structured as events, only the leading jet is available for W jet samples, while both the leading and sub-leading jets are valid for top and Higgs jet samples. You may need to restrict the jet pT range for your use case.
  • For example, to get the leading-jet constituents: `f["objects/jets/constituents"][:, 0]`
  • The file names indicate the corresponding generation process. Each file was generated from 100k original MadGraph events; a small fraction of events is discarded by the preselection.
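Restricting the jet pT range, as suggested in the first note, could look like the sketch below. It assumes `obs` has shape (n_events, n_jets, 9) with pT in column 0, following the column order given under Data Structure; the function name `pt_window_mask`, the toy array, and the 550 to 650 GeV window are illustrative choices, not values prescribed by the dataset.

```python
import numpy as np


def pt_window_mask(obs, pt_min, pt_max, n_j=1):
    """Boolean mask of events whose n_j-th jet has pt_min <= pT < pt_max.

    Assumes `obs` has shape (n_events, n_jets, 9) with pT in column 0.
    """
    pt = np.asarray(obs)[:, n_j - 1, 0]
    return (pt >= pt_min) & (pt < pt_max)


# Toy stand-in for f["objects/jets/obs"][:]: 3 events, 2 jets, 9 observables.
toy_obs = np.zeros((3, 2, 9))
toy_obs[:, 0, 0] = [480.0, 610.0, 1250.0]  # leading-jet pT in GeV

mask = pt_window_mask(toy_obs, 550.0, 650.0)
print(mask.tolist())  # [False, True, False]
```

The same mask can then be applied to the leading-jet observables, for example `toy_obs[mask, 0]`, so that only jets inside the chosen window are compared across samples.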

Contact

  • We welcome any feedback, suggestions, or requests for new test samples. Please contact chengtaoli.1990@gmail.com for more information.

Files

Files (10.5 GB)

Checksum Size
md5:878a2e82124505b473f170bc761e0375 415.0 MB
md5:160449b582832b0f7df649fbf9055106 309.1 MB
md5:9d290f8e0b1cf382014ea896110a1192 653.3 MB
md5:bf10738d086a8867720d1de3619d8ef0 653.2 MB
md5:69151006bbd7ac9ef6165fa49c9f865b 406.6 MB
md5:67f4492488021e1a871d61be6455007a 406.7 MB
md5:872693cdf8e93fca3fe0d4ca48674904 654.3 MB
md5:ecc587c614acf970f23939846f701abf 654.7 MB
md5:c2ca330a8f1620044fd0881a1663b426 1.1 GB
md5:2b804a1c0388b4cd4446146ddf26310e 875.6 MB
md5:24a78b4e7b41385b09d1e1f787b6c17e 206.5 MB
md5:3caf57fb762192b5c2d463fb6b647cd0 206.5 MB
md5:6b022b77298bd8c4b226f5b8259edd36 231.1 MB
md5:1cdc3b9d84d7e55b6e477503a4650fd8 230.9 MB
md5:3ebc9991a962eaca93215edbefee8c86 253.6 MB
md5:2955d2882458b48c182d7c6e5840b681 253.5 MB
md5:8ca4178c101948b2f653533c66de5486 189.7 MB
md5:ef11fbd81537a67092c6fefd7d7d2e7c 190.0 MB
md5:99db554eded764956fdd79114e9b2cca 346.0 MB
md5:1bea3066d64a7ed5031771f20af1cfde 347.2 MB
md5:808c88ebf6da4a7c03152a489115f942 237.4 MB
md5:8b744da7d65106b44e7e11bbe4c84488 236.4 MB
md5:927e8848ae3727a87f853e93915b1082 174.6 MB
md5:02bcf73b3ff7c9d1d96b2bb25580d5f6 174.8 MB
md5:bb7669eb2dc7bd73261b93a0130086af 189.2 MB
md5:aa9e8b182d2433c512b360598dd827a6 188.5 MB
md5:bb88288e6d00b8e16fdfddfb3162f877 194.9 MB
md5:d0152da6c7eabe19c7b243c575e14cd3 193.9 MB
md5:10c694766b0601b723c9bec31196ecb4 162.7 MB
md5:ceb2faa355e39cfb638133e92cab9443 162.7 MB
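
The MD5 values listed above can be used to check a download for corruption. The following is a small sketch using only the Python standard library; the function names and the `path` argument are hypothetical, and which checksum belongs to which file has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use flat, which matters for the larger files in this record.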