R&D Dataset for LHC Olympics 2020 Anomaly Detection Challenge
Creators
- 1. University of Hamburg
- 2. Lawrence Berkeley National Lab
- 3. Rutgers University
Description
This is the first R&D dataset for the LHC Olympics 2020 Anomaly Detection Challenge. It consists of 1M QCD dijet events and 100k W'->XY events, with X->jj and Y->jj. The W', X, and Y masses are 3.5 TeV, 500 GeV, and 100 GeV, respectively. The events are produced with Pythia8 and Delphes 3.4.1, with no pileup or multiparton interactions (MPI) included. They are selected using a single fat-jet (R = 1) trigger with a pT threshold of 1.3 TeV.
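For reference, the signal fraction in this R&D sample follows directly from the event counts above:

```latex
f_{\mathrm{sig}} = \frac{N_{\mathrm{sig}}}{N_{\mathrm{sig}} + N_{\mathrm{bkg}}}
                 = \frac{10^{5}}{10^{5} + 10^{6}}
                 = \frac{1}{11} \approx 9.1\%
```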
The events are randomly shuffled together, but for testing and development purposes each event comes with a signal/background truth bit. The truth bit will not be included in the actual challenge data.
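During development, the truth bit can be used to separate signal from background, for example to build a background-only reference sample. The sketch below assumes that the truth bit sits in the last column of the event array and that a value of 1 marks signal; `split_by_truth` is a hypothetical helper, and the toy array only stands in for the real data.

```python
import numpy as np


def split_by_truth(events: np.ndarray):
    """Split an (N, 2101) event array into background and signal subsets.

    Assumption: the truth bit is the last column, with 1 = signal and 0 = background.
    The returned arrays contain only the particle columns (truth bit dropped).
    """
    is_signal = events[:, -1] == 1
    features = events[:, :-1]
    return features[~is_signal], features[is_signal]


# Toy stand-in for the real data: 10 all-zero events, 2 of them flagged as signal.
toy = np.zeros((10, 2101))
toy[[3, 7], -1] = 1.0

background, signal = split_by_truth(toy)
print(background.shape, signal.shape)  # (8, 2100) (2, 2100)
print(f"signal fraction: {len(signal) / len(toy):.2f}")  # signal fraction: 0.20
```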
The events are stored as pandas DataFrames saved in compressed HDF5 (.h5) format. For each event, all Delphes-reconstructed particles are treated as massless and recorded in detector coordinates (pT, eta, phi); more detailed information, such as particle charge, is not included. Each event is zero-padded to a constant-size array of 700 particles, giving 700 × 3 = 2100 columns, and the truth bit is appended as a final column. The array shape is therefore (Nevents = 1.1M, 2101).
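A minimal NumPy sketch of unpacking this layout is shown below. It assumes that the 2100 particle columns are ordered particle by particle (pT, eta, phi of particle 0, then of particle 1, and so on) and that the truth bit is the last column; check the ordering against the official example notebook before relying on it. `load_events`, `unpack_events`, and `event_mass` are hypothetical helper names, not part of any released tooling. Because the particles are massless, each four-momentum follows from px = pT cos φ, py = pT sin φ, pz = pT sinh η, E = pT cosh η, which the mass function uses to compute the invariant mass of all particles in an event.

```python
import numpy as np

N_PARTICLES = 700  # fixed padding length stated in the description
N_FEATURES = 3     # (pT, eta, phi) per particle


def load_events(path: str) -> np.ndarray:
    """Read the stored DataFrame into an (N, 2101) array.

    Assumes the file holds a single DataFrame; requires pandas and PyTables.
    """
    import pandas as pd  # imported lazily so the helpers below work without pandas

    return pd.read_hdf(path).to_numpy()


def unpack_events(events: np.ndarray):
    """Split (N, 2101) rows into particles of shape (N, 700, 3) and labels of shape (N,).

    Assumption: particle columns are ordered pT_0, eta_0, phi_0, pT_1, eta_1, phi_1, ...
    """
    particles = events[:, : N_PARTICLES * N_FEATURES].reshape(-1, N_PARTICLES, N_FEATURES)
    labels = events[:, -1].astype(int)
    return particles, labels


def event_mass(particles: np.ndarray) -> np.ndarray:
    """Invariant mass of the summed four-momenta of all massless particles per event.

    Zero-padded slots have pT = 0, so they contribute nothing to the sums.
    """
    pt, eta, phi = particles[..., 0], particles[..., 1], particles[..., 2]
    px = (pt * np.cos(phi)).sum(axis=1)
    py = (pt * np.sin(phi)).sum(axis=1)
    pz = (pt * np.sinh(eta)).sum(axis=1)
    e = (pt * np.cosh(eta)).sum(axis=1)  # E = |p| for a massless particle
    m2 = e**2 - px**2 - py**2 - pz**2
    return np.sqrt(np.clip(m2, 0.0, None))  # clip tiny negative rounding errors


# Toy check: two back-to-back particles with pT = 100 each, at eta = 0.
toy = np.zeros((1, 2101))
toy[0, 0:3] = [100.0, 0.0, 0.0]
toy[0, 3:6] = [100.0, 0.0, np.pi]
toy[0, -1] = 1.0

particles, labels = unpack_events(toy)
print(particles.shape, labels)  # (1, 700, 3) [1]
print(event_mass(particles))    # approximately [200.]
```

Reading the real file goes through `pandas.read_hdf`; note that the full array holds about 2.3 billion values (1.1M × 2101), so working on a subset may be necessary on machines with limited memory.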
For more information, including an example Jupyter notebook illustrating how to read and process the events, see the official LHC Olympics 2020 webpage.
https://indico.cern.ch/event/809820/page/16782-lhcolympics2020
Files
Size | Checksum
---|---
2.8 GB | md5:a06c71ae36cbd4de6699a490c06b94b7