There is a newer version of the record available.

Published June 19, 2020 | Version 2.0
Dataset Open

Test Sets for Jet Anomaly Detection at the LHC

Creators

  • 1. Mila, University of Montreal

Description

Data Description

These datasets were generated as a series of test sets for anomalous jet tagging at the LHC. They include boosted W jets, top jets, and Higgs jets. Jet transverse momentum is concentrated around 600 GeV and around 1200 GeV (files for the latter carry the prefix "pt1200_" in their names). Each file starts from 100k events generated with MadGraph, but the final h5 files may contain slightly fewer events because of the fat-jet pre-selection. Production processes include:

  • pp -> W' -> W (jj) Z(\(\nu \nu\)); \(m_{W} = 59, 80, 120, 174 ~GeV\)
  • pp -> Z' -> t t~; \(m_t=80, 174 ~GeV\)
  • pp -> HH -> (hh) (hh), (h -> bb); \(m_H = 174~GeV\), \(m_h = 20, 80~GeV\)

Data Generation

Jet samples in this dataset are generated with MadGraph, Pythia8 and Delphes (no pile-up effects are simulated). Jets are clustered from particle-flow objects with FastJet, using the anti-kt algorithm with cone size R=1.0.

  • Leading jet: \(p_T > 450~\textrm{GeV}\); sub-leading jet: \(p_T > 200~\textrm{GeV}\)

 

Data Structure

  • To get jets: `f['objects/jets']` (with `f` the opened h5 file)
  • For jets, there are two datasets: `['constituents', 'obs']`. Jet information is stored with the higher-pt jet first.
    • `obs[:, n_j - 1]`: jet four-vector and n-subjettiness values for the \(n_j\)-th jet (pt, eta, phi, m, tau1, tau2, tau3, tau4, tau5)
    • `constituents[:, n_j - 1]`: constituents of the \(n_j\)-th jet, sorted by pt (highest first) and stored in variable-length arrays: \(\{ E_i, P_{xi}, P_{yi}, P_{zi}, \textrm{PID}_i\}\) (PID: PDG code for tracks; [22] for photons; [0] for neutral hadrons)
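The column order listed above can be turned into named arrays with a few lines of Python. This is a minimal sketch, not the authors' code: it assumes that `obs` reads back as a plain numeric array of shape (n_events, n_jets, 9), and the helper names `unpack_obs` and `load_jet_obs` are hypothetical. Only the dataset path `objects/jets/obs` and the column order come from the description.

```python
import numpy as np

# Column order of `obs[:, n_j - 1]` as given in the data structure description.
OBS_COLUMNS = ("pt", "eta", "phi", "m", "tau1", "tau2", "tau3", "tau4", "tau5")


def unpack_obs(obs, jet_index=0):
    """Map each observable name to a 1-D array over events.

    ASSUMPTION: `obs` has shape (n_events, n_jets, 9); jet_index 0 is the
    leading (highest-pt) jet.
    """
    jets = np.asarray(obs)[:, jet_index]  # shape: (n_events, 9)
    return {name: jets[:, i] for i, name in enumerate(OBS_COLUMNS)}


def load_jet_obs(path, jet_index=0):
    """Read `objects/jets/obs` from one h5 file (hypothetical helper, needs h5py)."""
    import h5py  # imported lazily so unpack_obs works without h5py installed

    with h5py.File(path, "r") as f:
        return unpack_obs(f["objects/jets/obs"][:], jet_index)
```

If the assumption on the array shape holds, `load_jet_obs(path)["m"]` would give the leading-jet mass per event for the file at `path`.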

Extra Notes

  • Since the dataset is structured as events, only the leading jet is available for W jet samples, while for top and Higgs jet samples both the leading and sub-leading jets are valid. One might need to restrict the jet \(p_T\) range when using the data.
  • E.g. to get leading jet constituents: `f["objects/jets/constituents"][:,0]`
  • The file names indicate the corresponding generation process.
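The notes above suggest restricting the jet \(p_T\) range and working with leading-jet constituents. The following is a rough sketch of both steps under stated assumptions: `obs` is taken to have shape (n_events, n_jets, 9) with pt in column 0, and each jet's variable-length constituent array is assumed to hold 5 values per constituent in the order (E, px, py, pz, PID). The function names and the 550 to 650 GeV window in the comment are illustrative, not part of the dataset.

```python
import numpy as np


def pt_window_mask(obs, pt_min, pt_max, jet_index=0):
    """Boolean mask over events whose chosen jet has pt_min <= pt < pt_max (GeV)."""
    pt = np.asarray(obs)[:, jet_index, 0]  # column 0 of `obs` is pt
    return (pt >= pt_min) & (pt < pt_max)


def _as_rows(constituents):
    # ASSUMPTION: 5 values per constituent, ordered (E, px, py, pz, PID).
    return np.asarray(constituents, dtype=float).reshape(-1, 5)


def constituent_kinematics(constituents):
    """Convert one jet's constituents to (pt, eta, phi, pid) arrays."""
    e, px, py, pz, pid = _as_rows(constituents).T
    pt = np.hypot(px, py)
    phi = np.arctan2(py, px)
    eta = np.arcsinh(pz / pt)  # pseudorapidity; not finite when pt == 0
    return pt, eta, phi, pid.astype(int)


def jet_mass(constituents):
    """Invariant mass of the summed constituent four-vectors."""
    e, px, py, pz = _as_rows(constituents)[:, :4].sum(axis=0)
    m2 = e**2 - px**2 - py**2 - pz**2
    return float(np.sqrt(max(m2, 0.0)))  # clip small negative rounding errors


# Illustrative use with an opened file `f` (window values are hypothetical):
#   obs = f["objects/jets/obs"][:]
#   mask = pt_window_mask(obs, 550.0, 650.0)
#   leading = f["objects/jets/constituents"][:, 0][mask]
#   pt, eta, phi, pid = constituent_kinematics(leading[0])
```

Comparing `jet_mass` of the constituents with the `m` column of `obs` for a few jets is a quick way to check whether the assumed constituent layout matches the files.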

Files

Files (10.3 GB)

Checksum                              Size
md5:878a2e82124505b473f170bc761e0375  415.0 MB
md5:160449b582832b0f7df649fbf9055106  309.1 MB
md5:9d290f8e0b1cf382014ea896110a1192  653.3 MB
md5:bf10738d086a8867720d1de3619d8ef0  653.2 MB
md5:69151006bbd7ac9ef6165fa49c9f865b  406.6 MB
md5:67f4492488021e1a871d61be6455007a  406.7 MB
md5:872693cdf8e93fca3fe0d4ca48674904  654.3 MB
md5:ecc587c614acf970f23939846f701abf  654.7 MB
md5:aa1acf4fdc8543f7046cf34f2947f8a6  501.8 MB
md5:a0c48ad66d510fb374bcd7466e955fe3  504.4 MB
md5:5605f60ed5cb10edc923e315801f88b4  423.1 MB
md5:2f9e2ca81b33e6d70fcb532283ddc95a  421.9 MB
md5:24a78b4e7b41385b09d1e1f787b6c17e  206.5 MB
md5:3caf57fb762192b5c2d463fb6b647cd0  206.5 MB
md5:6b022b77298bd8c4b226f5b8259edd36  231.1 MB
md5:1cdc3b9d84d7e55b6e477503a4650fd8  230.9 MB
md5:3ebc9991a962eaca93215edbefee8c86  253.6 MB
md5:2955d2882458b48c182d7c6e5840b681  253.5 MB
md5:8ca4178c101948b2f653533c66de5486  189.7 MB
md5:ef11fbd81537a67092c6fefd7d7d2e7c  190.0 MB
md5:99db554eded764956fdd79114e9b2cca  346.0 MB
md5:1bea3066d64a7ed5031771f20af1cfde  347.2 MB
md5:2fd9107c7f237576f3f875d2389d45d8  223.8 MB
md5:04c395ee3e0f8c758ed1c68a065bfa59  220.4 MB
md5:927e8848ae3727a87f853e93915b1082  174.6 MB
md5:02bcf73b3ff7c9d1d96b2bb25580d5f6  174.8 MB
md5:bb7669eb2dc7bd73261b93a0130086af  189.2 MB
md5:aa9e8b182d2433c512b360598dd827a6  188.5 MB
md5:bb88288e6d00b8e16fdfddfb3162f877  194.9 MB
md5:d0152da6c7eabe19c7b243c575e14cd3  193.9 MB
md5:10c694766b0601b723c9bec31196ecb4  162.7 MB
md5:ceb2faa355e39cfb638133e92cab9443  162.7 MB
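
The MD5 checksums listed above can be used to confirm that a downloaded file is intact. A minimal sketch using only the Python standard library follows; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the local file path is whatever name the file was saved under.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed entry such as 'md5:878a2e82...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For instance, `matches_checksum(local_path, "md5:878a2e82124505b473f170bc761e0375")` should return `True` only for an uncorrupted copy of the first file in the list.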