Published July 21, 2023 | Version v0.2.0
Dataset Open

Cherenkov Telescope Data for Ordinal Quantification

  • 1. TU Dortmund University
  • 2. Consiglio Nazionale delle Ricerche

Description

This labeled data set is targeted at ordinal quantification. The goal of quantification is not to predict the class label of each individual instance, but the distribution of labels in unlabeled sets of data.
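To make the target of quantification concrete, the following minimal sketch computes the quantity a quantifier is asked to estimate: the vector of class prevalences of a sample. It assumes that classes are encoded as integers 1..n_classes; the function name and the toy labels are hypothetical and not part of this data set.

```python
from collections import Counter

def class_prevalences(labels, n_classes):
    """Return the fraction of items per class, for labels in 1..n_classes."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in range(1, n_classes + 1)]

# Hypothetical true labels of one sample. A quantifier has to estimate
# the resulting vector without ever seeing these labels.
sample_labels = [1, 2, 2, 3, 3, 3, 3, 2]
print(class_prevalences(sample_labels, n_classes=4))
```

A quantifier is evaluated by comparing its estimate with this vector, not by comparing individual predictions with individual labels.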

With the scripts provided, you can extract the relevant features and labels from the public data set of the FACT Cherenkov telescope. These features are precisely the ones that domain experts from astro-particle physics employ in their analyses. The labels stem from a binning of a continuous energy label, which is common practice in these analyses.
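The binning of a continuous energy label can be sketched as follows. This is only an illustration: the bin edges, the equal-width spacing in log10(energy), and the example values are assumptions, not the binning that extract-fact.jl applies.

```python
import bisect
import math

def bin_energy(energies, log10_edges):
    """Map each energy to a 1-based ordinal class via sorted inner bin edges.

    The edges are given in log10(energy); an energy below the first edge
    falls into class 1, one above the last edge into the highest class.
    """
    return [bisect.bisect_right(log10_edges, math.log10(e)) + 1 for e in energies]

# Hypothetical inner edges: four edges produce five ordinal classes.
log10_edges = [2.8, 3.2, 3.6, 4.0]
energies = [500.0, 1000.0, 5000.0, 20000.0]  # hypothetical values
print(bin_energy(energies, log10_edges))
```

Because the bins are ordered by energy, the resulting class labels inherit an ordinal scale.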

We complement this data set with the indices of the data items that appear in each sample of our evaluation, so that you can replicate our samples exactly by drawing the specified data items. Each row in the files app_val_indices.csv, app_tst_indices.csv, app-oq_val_indices.csv, app-oq_tst_indices.csv, real_val_indices.csv, and real_tst_indices.csv represents one sample. The indices stem from the three evaluation protocols described below, which are well suited for ordinal quantification.
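Replicating a sample from one row of an index file might look like the sketch below. Several details are assumptions that you should check against the actual files: that the index files are comma-separated without a header row, and that indices are 1-based (as is conventional in Julia). The helper names and the toy data are hypothetical.

```python
import csv
import io

def read_index_rows(text):
    """Parse CSV text in which each row lists the item indices of one sample."""
    return [[int(v) for v in row] for row in csv.reader(io.StringIO(text)) if row]

def draw_sample(items, indices, one_based=True):
    """Select the indexed items; one_based=True assumes Julia-style indices."""
    offset = 1 if one_based else 0
    return [items[i - offset] for i in indices]

# Toy stand-ins for the class labels in fact.csv and for an index file.
# With the real files, read the text from disk, e.g.
# open("app_val_indices.csv").read().
labels = [3, 1, 2, 2, 1]
index_text = "1,3,5\n2,2,4\n"

for indices in read_index_rows(index_text):
    print(draw_sample(labels, indices))
```

The second toy row lists an index twice, which the sketch handles by selecting that item twice.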

Our first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with equal probability. The second protocol, APP-OQ(5%), is a variant thereof, where only the smoothest 5% of all APP samples are considered. This variant is targeted at ordinal quantification tasks, where classes are ordered and a similarity of neighboring classes can be assumed. The labels of the FACT data lie on an ordinal scale and, hence, pose such an ordinal quantification task. The third protocol considers "real" distributions of labels, namely the distributions one would expect when observing the Crab Nebula through the FACT telescope.
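The idea behind APP and APP-OQ(5%) can be sketched as follows. The sketch draws prevalence vectors uniformly from the probability simplex and then keeps the smoothest 5%. The smoothness measure used here, the sum of squared second differences, is an assumption made for illustration; the exact measure and sampling code of our experiments are in the implementation linked under Further Reading. All function names are hypothetical.

```python
import random

def draw_prevalences(n_classes, rng):
    """Draw a prevalence vector uniformly from the probability simplex."""
    # Normalized Exp(1) variates follow a flat Dirichlet distribution.
    x = [rng.expovariate(1.0) for _ in range(n_classes)]
    total = sum(x)
    return [v / total for v in x]

def jaggedness(p):
    """Sum of squared second differences; smaller values mean smoother."""
    return sum((-p[i - 1] + 2 * p[i] - p[i + 1]) ** 2 for i in range(1, len(p) - 1))

def app_oq(n_samples, n_classes, keep=0.05, seed=0):
    """Draw APP prevalence vectors and keep the smoothest fraction of them."""
    rng = random.Random(seed)
    candidates = [draw_prevalences(n_classes, rng) for _ in range(n_samples)]
    candidates.sort(key=jaggedness)
    return candidates[: max(1, round(keep * n_samples))]

smooth = app_oq(n_samples=1000, n_classes=5)
print(len(smooth), [round(v, 3) for v in smooth[0]])
```

In a full protocol, each prevalence vector would then determine how many data items of each class are drawn into the corresponding sample.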

Usage

You can generate the data file fact.csv with the provided script extract-fact.jl, which is wrapped in a Makefile for convenience. The Project.toml and Manifest.toml specify the Julia package dependencies, similar to a requirements file in Python.

Preliminaries: You need a working Julia installation. We used Julia v1.6.5 in our experiments.

Data Extraction: In your terminal, you can call either

make

(recommended) or

curl --fail -o fact.hdf5 https://factdata.app.tu-dortmund.de/dl2/FACT-Tools/v1.1.2/gamma_simulations_facttools_dl2.hdf5
julia --project="." --eval "using Pkg; Pkg.instantiate()"
julia --project="." extract-fact.jl

Outcome: The first row in the resulting fact.csv file is the header. The first column, named "class_label", is the ordinal class.
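Reading a file with this layout could look like the following sketch. It assumes that the file is comma-separated, that the class labels are integers, and that all remaining columns are numeric features; the feature names in the toy text are hypothetical and only stand in for the real header.

```python
import csv
import io

def load_labels_and_features(csv_text):
    """Split CSV text into feature names, class labels, and feature rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first row is the header
    if header[0] != "class_label":
        raise ValueError("expected the first column to be 'class_label'")
    labels, features = [], []
    for row in reader:
        labels.append(int(row[0]))  # assumption: integer class labels
        features.append([float(v) for v in row[1:]])
    return header[1:], labels, features

# Toy stand-in for fact.csv. With the real file, pass open("fact.csv").read().
toy_csv = "class_label,feat_a,feat_b\n2,0.5,1.5\n1,0.1,0.2\n"
names, labels, features = load_labels_and_features(toy_csv)
print(names, labels, features)
```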

Further Reading

Implementation of our experiments: https://github.com/mirkobunse/regularized-oq

Original data repository: https://factdata.app.tu-dortmund.de/

Reference analysis by astro-particle physicists: https://github.com/fact-project/open_crab_sample_analysis

Files

extract-fact-oq.zip (46.7 MB)
md5:1adeb942a035ddf6ed14aac3c445a337

Additional details

Funding

European Commission: SoBigData-PlusPlus – SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics (871042)
European Commission: AI4Media – A European Excellence Centre for Media, Society and Democracy (951911)

References

  • M. Bunse, A. Moreo, F. Sebastiani, M. Senz (2022). Ordinal Quantification through Regularization.
  • H. Anderhub et al. (2013). Design and operation of FACT – the first G-APD Cherenkov telescope.