Published April 26, 2018 | Version v2
Dataset Open

LHCb Particle Identification Compression Challenge

  • 1. Massachusetts Institute of Technology

Description

A dataset of LHCb particle identification (PID) information, released for a challenge in which the goal is to compress the features it contains.

An example of how to use this dataset, together with a compression benchmark, is available at https://github.com/weissercn/LHCb_PID_Compression

-999 is the missing value identifier.
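
As a minimal sketch of how the missing value identifier might be handled, the snippet below converts -999 to NaN with pandas so that missing entries do not distort statistics or training. The column names and values in the small in-memory table are made up for illustration and are not taken from the real file; whether 'pid' can also contain -999 is not stated in the description, so that column is left untouched here.

```python
import io

import numpy as np
import pandas as pd

MISSING = -999  # missing value identifier stated in the description

# Tiny synthetic stand-in for the real CSV.
# Column names and values are hypothetical, chosen only to mimic the naming scheme.
csv_text = """S0x0,S0x1,S0aux0,pid
0.5,-999,1.0,11
-999,2.5,-999,211
1.5,3.5,2.0,13
"""

df = pd.read_csv(io.StringIO(csv_text))

# Replace the sentinel with NaN in every column except the truth label.
feature_cols = [c for c in df.columns if c != "pid"]
df[feature_cols] = df[feature_cols].replace(MISSING, np.nan)

# Count missing entries per feature column.
missing_per_column = df[feature_cols].isna().sum()
print(missing_per_column)
```

For the real 2.0 GB file, the same replacement could be applied after loading with `pd.read_csv("LHCb_PID_obscured.csv")`, assuming the file sits in the working directory and fits in memory.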

The goal is to compress the features whose names have the form S[n]x[m], where [n] and [m] are integers. Features whose names have the form S[n]aux[m], where [n] and [m] are integers, contain information that is available at all times during both compression and decompression, so they do not have to be compressed. The feature labelled 'pid' is the truth information identifying which type of particle an example is. This information is not available in real running conditions and is not available during either compression or decompression; it can only be used to evaluate how well the autoencoder performed. The example code shows such an evaluation.
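
The naming convention above can be turned into a column filter. The sketch below sorts column names into compressible features, auxiliary features, and the truth label using regular expressions. The function name `split_columns` and the group keys are hypothetical, and the column names in the usage line are illustrative rather than read from the file.

```python
import re

# Patterns derived from the naming scheme in the description.
COMPRESSIBLE = re.compile(r"S\d+x\d+")    # S[n]x[m]: features to compress
AUXILIARY = re.compile(r"S\d+aux\d+")     # S[n]aux[m]: always available, not compressed


def split_columns(columns):
    """Group column names by their role in the challenge.

    Returns a dict with keys 'compress', 'aux', 'label' and 'other'.
    """
    groups = {"compress": [], "aux": [], "label": [], "other": []}
    for name in columns:
        if COMPRESSIBLE.fullmatch(name):
            groups["compress"].append(name)
        elif AUXILIARY.fullmatch(name):
            groups["aux"].append(name)
        elif name == "pid":
            # Truth label: for evaluation only, never an input to the compressor.
            groups["label"].append(name)
        else:
            groups["other"].append(name)
    return groups


# Illustrative column names only.
print(split_columns(["S0x0", "S0x1", "S0aux0", "pid"]))
```

Using `fullmatch` rather than `match` keeps a name such as S0aux0 from being picked up by a looser pattern, and keeping 'pid' in its own group makes it harder to feed the truth label to the compressor by accident.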
 

Notes

Distinguished between auxiliary and compressible variables

Files

LHCb_PID_obscured.csv (2.0 GB)
md5:4bf4eab072df3919ce0a47a6b316d788
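
Because the file is large, it may be worth checking the download against the md5 checksum listed above. The following is a minimal sketch using Python's standard `hashlib`; the function names `md5_of_file` and `matches_expected` are hypothetical helpers, and the file is read in chunks so the full 2.0 GB never has to be held in memory.

```python
import hashlib

# Checksum listed for LHCb_PID_obscured.csv in the Files section.
EXPECTED_MD5 = "4bf4eab072df3919ce0a47a6b316d788"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

A call such as `matches_expected("LHCb_PID_obscured.csv")` would then report whether the local copy matches, assuming the file was saved under that name in the working directory.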