There is a newer version of the record available.

Published November 19, 2019 | Version v5
Dataset Open

Official Datasets for LHC Olympics 2020 Anomaly Detection Challenge

  • 1. Hamburg University
  • 2. Lawrence Berkeley National Lab
  • 3. Rutgers University, LBNL, and UC Berkeley


These are the official datasets for the LHC Olympics 2020 Anomaly Detection Challenge. Each "black box" contains 1M events meant to be representative of actual LHC data. These events may include one or more signals, and the challenge consists of finding them using the method of your choice. We have uploaded a total of THREE black boxes to be used for the challenge.

In addition, we include a background sample of 1M events meant to aid in the challenge. The background sample consists of QCD dijet events simulated using Pythia8 and Delphes 3.4.1. Be warned that neither the physics nor the detector modeling of this simulation is guaranteed to exactly reflect the "data" in the black boxes. For both the background and the black box data, events are selected using a single fat-jet (R=1) trigger with a pT threshold of 1.2 TeV.

These events are stored as pandas dataframes saved in compressed HDF5 (.h5) format. For each event, all reconstructed particles are assumed to be massless and are recorded in detector coordinates (pT, eta, phi). More detailed information, such as particle charge, is not included. Events are zero-padded to constant-size arrays of 700 particles, so each event occupies 700 × 3 = 2100 columns and the full array has shape (Nevents=1M, 2100).
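As a rough illustration of the layout described above, the following sketch reshapes the flat rows into per-particle arrays, counts the non-padded particles, and converts the massless (pT, eta, phi) entries to four-momenta. It assumes that each row is ordered particle by particle as (pT, eta, phi, pT, eta, phi, ...); this ordering is not stated here and should be checked against the example notebook. The function names, the file name in the commented usage lines, and the chunked read are illustrative assumptions, not part of the official tooling.

```python
import numpy as np

N_PARTICLES = 700  # padding length stated in the dataset description
N_FEATURES = 3     # (pT, eta, phi) per particle


def unflatten_events(flat):
    """Reshape (n_events, 2100) rows into (n_events, 700, 3).

    ASSUMPTION: columns are interleaved per particle as
    pT_1, eta_1, phi_1, pT_2, eta_2, phi_2, ...
    """
    flat = np.asarray(flat, dtype=float)
    if flat.ndim != 2 or flat.shape[1] != N_PARTICLES * N_FEATURES:
        raise ValueError("expected an array of shape (n_events, 2100)")
    return flat.reshape(-1, N_PARTICLES, N_FEATURES)


def particle_counts(events):
    """Number of real (non-padded) particles per event, taken as pT > 0."""
    return (events[..., 0] > 0).sum(axis=1)


def to_four_momenta(events):
    """Convert massless (pT, eta, phi) to (E, px, py, pz).

    Zero-padded entries map to an all-zero four-vector.
    """
    pt, eta, phi = events[..., 0], events[..., 1], events[..., 2]
    px = pt * np.cos(phi)
    py = pt * np.sin(phi)
    pz = pt * np.sinh(eta)
    energy = pt * np.cosh(eta)  # E = |p| for a massless particle
    return np.stack([energy, px, py, pz], axis=-1)


# Hypothetical usage (file name assumed; reading needs pandas and PyTables):
#   import pandas as pd
#   df = pd.read_hdf("events_LHCO2020_BlackBox1.h5", start=0, stop=10000)
#   events = unflatten_events(df.values)
#   p4 = to_four_momenta(events)
```

Reading a slice with start and stop, as in the commented lines, keeps memory use manageable; whether it works depends on how the file was written, so loading the whole file may be necessary instead.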

For more information, including a complete description of the challenge and an example Jupyter notebook illustrating how to read and process the events, see the official LHC Olympics 2020 webpage.

UPDATE: November 23, 2020

Now that the challenge is over, we have uploaded the solutions to Black Boxes 1 and 3. They are simple ASCII files (events_LHCO2020_BlackBox1.masterkey and events_LHCO2020_BlackBox3.masterkey) in which each line is the truth label of one event, in the same order as the events in the corresponding h5 file. The label is 0 for background and 1 for signal; Black Box 3 (BB3) additionally uses 2 as a signal label. For more information about the solutions, please visit the LHCO2020 webpage.
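A minimal sketch of reading such a masterkey file is shown below. It assumes one numeric label per line, possibly written as "0" or "0.0", and skips blank lines; the exact number formatting in the files is an assumption. The function names are hypothetical.

```python
from collections import Counter


def parse_masterkey(lines):
    """Turn an iterable of text lines into a list of integer truth labels.

    ASSUMPTION: each non-empty line holds a single number such as
    "0", "1", or "1.0"; blank lines are ignored.
    """
    labels = []
    for line in lines:
        text = line.strip()
        if text:
            labels.append(int(float(text)))
    return labels


def read_masterkey(path):
    """Read labels from a masterkey file, preserving event order."""
    with open(path) as handle:
        return parse_masterkey(handle)


def label_counts(labels):
    """Count how many events carry each label (0 = background)."""
    return Counter(labels)


# Hypothetical usage:
#   labels = read_masterkey("events_LHCO2020_BlackBox1.masterkey")
#   print(label_counts(labels))
```

Because the labels follow the event order of the h5 file, the i-th label can be matched directly to the i-th row of the corresponding dataframe.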


Files (11.0 GB)

Size
2.7 GB
2.6 GB
4.0 MB
2.2 GB
3.5 GB
4.0 MB