R&D Dataset for LHC Olympics 2020 Anomaly Detection Challenge
Creators
- 1. University of Hamburg
- 2. Lawrence Berkeley National Lab
- 3. Rutgers University
Description
This is the first R&D dataset for the LHC Olympics 2020 Anomaly Detection Challenge. It consists of 1M QCD dijet events and 100k W'->XY events, with X->qq and Y->qq. The W', X, and Y masses are 3.5 TeV, 500 GeV, and 100 GeV, respectively. The events are produced using Pythia8 and Delphes 3.4.1, with no pileup or multiparton interactions (MPI) included. They are selected using a single fat-jet (R=1) trigger with a pT threshold of 1.2 TeV.
The events are randomly shuffled together, but for the purposes of testing and development, we provide the user with a signal/background truth bit for each event. Obviously, the truth bit will not be included in the actual challenge.
These events are stored as pandas dataframes saved in compressed h5 format. For each event, all Delphes reconstructed particles are assumed to be massless and are recorded in detector coordinates (pT, eta, phi). More detailed information, such as particle charge, is not included. Events are zero-padded to constant-size arrays of 700 particles, with the truth bit appended at the end. Each row therefore has 700 x 3 + 1 = 2101 entries, and the array format is (Nevents=1.1M, 2101).
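The row layout described above can be unpacked along the following lines. This is a minimal sketch rather than the official reader: the helper names `split_events` and `to_cartesian` are hypothetical, and it assumes the 2100 particle entries are interleaved per particle as (pT, eta, phi, pT, eta, phi, ...), which should be checked against the example notebook on the challenge webpage. The Cartesian conversion uses the standard relations for massless particles.

```python
import numpy as np

N_PARTICLES = 700  # zero-padded particle slots per event (from the description)
N_COORDS = 3       # pT, eta, phi

def split_events(flat):
    """Split rows of length 2101 into particle arrays and truth bits.

    Assumption: entries are interleaved per particle as
    (pT, eta, phi, pT, eta, phi, ...), with the truth bit last.
    """
    flat = np.asarray(flat, dtype=float)
    expected = N_PARTICLES * N_COORDS + 1
    if flat.ndim != 2 or flat.shape[1] != expected:
        raise ValueError(f"expected shape (n_events, {expected}), got {flat.shape}")
    particles = flat[:, :-1].reshape(-1, N_PARTICLES, N_COORDS)
    labels = flat[:, -1].astype(int)  # 1 = signal, 0 = background
    return particles, labels

def to_cartesian(particles):
    """Convert massless (pT, eta, phi) to (px, py, pz, E)."""
    pt, eta, phi = particles[..., 0], particles[..., 1], particles[..., 2]
    px = pt * np.cos(phi)
    py = pt * np.sin(phi)
    pz = pt * np.sinh(eta)
    energy = pt * np.cosh(eta)  # E = |p| because the particles are massless
    return np.stack([px, py, pz, energy], axis=-1)
```

Assuming the file opens with `pandas.read_hdf` (which relies on PyTables), a first look at a small slice might be `split_events(pandas.read_hdf("events_anomalydetection.h5", stop=1000).to_numpy())`. Zero-padded slots have pT = 0 and therefore map to zero four-vectors.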
For more information, including an example Jupyter notebook illustrating how to read and process the events, see the official LHC Olympics 2020 webpage.
https://lhco2020.github.io/homepage/
UPDATE May 18 2020
We have uploaded a second signal dataset for R&D, consisting of 100k W'->XY with X,Y->qqq (i.e. 3-prong substructure). Everything else about this signal dataset (particle masses, trigger, Pythia configuration, detector simulation) is the same as the previous one described above.
UPDATE November 23 2020
We now include high-level feature files for the background and 2-prong signal (events_anomalydetection_v2.features.h5) and for the 3-prong signal (events_anomalydetection_Z_XY_qqq.features.h5). To produce the features, we have clustered every event into R=1 jets using the anti-kT algorithm. The features (calculated using fastjet plugins) are the 3-momenta, invariant masses, and N-subjettiness variables tau1, tau2 and tau3 for the highest pT jet (j1) and the second highest pT jet (j2):
'pxj1', 'pyj1', 'pzj1', 'mj1', 'tau1j1', 'tau2j1', 'tau3j1', 'pxj2', 'pyj2', 'pzj2', 'mj2', 'tau1j2', 'tau2j2', 'tau3j2'
The rows (events) in each feature file should be ordered exactly the same as in their corresponding raw event file. For convenience, we have also included the label (1 for signal and 0 for background) as an additional column in the first feature file (events_anomalydetection_v2.features.h5).
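As an illustration of how these columns can be combined, the sketch below derives two quantities that are not stored in the files: the invariant mass of the two leading jets and the ratio tau2/tau1, a commonly used substructure discriminant. The function names are hypothetical, and the choice to return 0 for the ratio when tau1 is 0 is an assumption made here, not something the dataset prescribes. The input can be any mapping from column name to array, such as a pandas DataFrame.

```python
import numpy as np

def _col(feats, name):
    # Accepts a pandas DataFrame or a plain dict of arrays.
    return np.asarray(feats[name], dtype=float)

def jet_energy(px, py, pz, m):
    """Energy from 3-momentum and invariant mass: E = sqrt(|p|^2 + m^2)."""
    return np.sqrt(px**2 + py**2 + pz**2 + m**2)

def dijet_mass(feats):
    """Invariant mass of the j1 + j2 system, computed row by row."""
    e_tot = 0.0
    p_tot = [0.0, 0.0, 0.0]
    for jet in ("j1", "j2"):
        px, py, pz = (_col(feats, f"{axis}{jet}") for axis in ("px", "py", "pz"))
        e_tot = e_tot + jet_energy(px, py, pz, _col(feats, f"m{jet}"))
        p_tot = [p_tot[0] + px, p_tot[1] + py, p_tot[2] + pz]
    m2 = e_tot**2 - p_tot[0]**2 - p_tot[1]**2 - p_tot[2]**2
    return np.sqrt(np.maximum(m2, 0.0))  # clip tiny negative rounding errors

def tau21(feats, jet="j1"):
    """Ratio tau2/tau1 for one jet; 0 where tau1 is 0 (assumed convention)."""
    tau1 = _col(feats, f"tau1{jet}")
    tau2 = _col(feats, f"tau2{jet}")
    ratio = np.zeros_like(tau1)
    np.divide(tau2, tau1, out=ratio, where=tau1 > 0)
    return ratio
```

With the label column of the first feature file, histograms of `dijet_mass` for label 0 and label 1 give a quick sanity check, since the signal events would be expected to cluster near the W' mass.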
UPDATE February 11 2021
We have included the Delphes detector card and the Pythia8 command files used to produce the R&D datasets.
UPDATE April 17 2022
It was brought to our attention that somehow the raw events file events_anomalydetection.h5 was never updated to v2, which had a lower generator-level pT threshold (PhaseSpace:pTHatMin = 500) for QCD events to minimize artificial trigger sculpting. This v2 is the version that the features file (events_anomalydetection_v2.features.h5) corresponds to, as well as the Pythia cmnd file (pythia_RnD_qcd.cmnd). Now the raw events file has been brought up to date as well.
Files
(3.2 GB)
Checksum | Size |
---|---|
md5:cb11b729ec10c04ae5250d057fd088b2 | 22.4 kB |
md5:271cf5e71fc756b2a8d2b32730689bdb | 74.3 MB |
md5:629789d55813be3860781b084ae7f1de | 2.9 GB |
md5:1e729f7dff225451182c28afaa4bb411 | 5.2 MB |
md5:54e123a86143b668f9cb76905152a124 | 235.5 MB |
md5:19555e76f8a787184ec43fd5ff295465 | 1.9 kB |
md5:1e9b731c2bf90f4ba549b85996cd7424 | 2.0 kB |
md5:21472daafd7d54cd10e7548869a41d03 | 2.0 kB |
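A downloaded file can be checked against the md5 values listed above. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and which checksum belongs to which file has to be taken from the record page itself.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:cb11b7...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

Reading in chunks matters mostly for the 2.9 GB raw events file, which would otherwise be loaded into memory in one piece.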