There is a newer version of the record available.

Published November 19, 2019 | Version v4
Dataset Open

Official Datasets for LHC Olympics 2020 Anomaly Detection Challenge

  • 1. Hamburg University
  • 2. Lawrence Berkeley National Lab
  • 3. Rutgers University, LBNL, and UC Berkeley

Description

These are the official datasets for the LHC Olympics 2020 Anomaly Detection Challenge. Each "black box" contains 1M events meant to be representative of actual LHC data. These events may include signal(s) and the challenge consists of finding these signals using the method of your choice. We have uploaded a total of THREE black boxes to be used for the challenge.

In addition, we include a background sample of 1M events meant to aid in the challenge. The background sample consists of QCD dijet events simulated using Pythia8 and Delphes 3.4.1. Be warned that both the physics and the detector modeling for this simulation may not exactly reflect the "data" in the black boxes. For both background and black box data, events are selected using a single fat-jet (R=1) trigger with pT threshold of 1.2 TeV.

The events are stored as pandas DataFrames saved in compressed HDF5 (.h5) format. For each event, all reconstructed particles are assumed to be massless and are recorded in detector coordinates (pT, eta, phi). More detailed information, such as particle charge, is not included. Events are zero-padded to constant-size arrays of 700 particles, so with three coordinates per particle (3 × 700 = 2100) the array format is (Nevents=1M, 2100).
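As a minimal sketch of working with this layout, the snippet below reshapes flat rows of 2100 values into per-particle (pT, eta, phi) triplets and converts them to massless four-momenta. It assumes that the three coordinates of each particle are stored consecutively (pT, eta, phi, pT, eta, phi, ...) and that padded slots have pT equal to zero; neither detail is stated in the description above, so check both against the example notebook. The function names are illustrative and not part of any official tooling.

```python
import numpy as np

N_PARTICLES = 700  # zero-padded particle slots per event (from the description)
N_FEATURES = 3     # pT, eta, phi

# The flat array would typically come from something like
#   flat = pandas.read_hdf("events.h5").to_numpy()
# where "events.h5" is a placeholder name, not an actual file of this record.


def unpack_events(flat):
    """Reshape (n_events, 2100) rows into (n_events, 700, 3) plus a padding mask."""
    flat = np.asarray(flat, dtype=float)
    # Assumption: columns are interleaved as pT, eta, phi for each particle.
    events = flat.reshape(len(flat), N_PARTICLES, N_FEATURES)
    # Assumption: real particles have pT > 0, padded slots are all zeros.
    mask = events[:, :, 0] > 0
    return events, mask


def to_four_momenta(events):
    """Convert massless (pT, eta, phi) particles to (E, px, py, pz)."""
    pt, eta, phi = events[..., 0], events[..., 1], events[..., 2]
    px = pt * np.cos(phi)
    py = pt * np.sin(phi)
    pz = pt * np.sinh(eta)
    energy = pt * np.cosh(eta)  # E = |p| for a massless particle
    return np.stack([energy, px, py, pz], axis=-1)
```

Summing `mask` along the particle axis gives the particle multiplicity per event, and padded slots map to all-zero four-momenta, so they do not contribute when four-momenta are summed.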

For more information, including a complete description of the challenge and an example Jupyter notebook illustrating how to read and process the events, see the official LHC Olympics 2020 webpage here.

Files

Files (11.0 GB)

Checksum and size of each file:

  • md5:f9ebde2e7739465902234ae73ec045e8, 2.7 GB
  • md5:847be306afd923a4ba30543070113627, 2.6 GB
  • md5:2e5dce598238789938cb31e9cd3e7a46, 2.2 GB
  • md5:ffc53e0c0b42e0f2c752ce48db74ce27, 3.5 GB
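
Since each file is several GB, it can be worth checking a download against the md5 values listed above. The following is a minimal sketch using only the Python standard library; the helper names are illustrative, and the path passed in is whatever local name the downloaded file was saved under.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so multi-GB files never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a listed checksum such as 'md5:f9eb...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.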