Cherenkov Telescope Data for Ordinal Quantification
- 1. TU Dortmund University
- 2. Consiglio Nazionale delle Ricerche
Description
This labeled data set is targeted at ordinal quantification. The goal of quantification is not to predict the class label of each individual instance, but to predict the distribution of labels in unlabeled sets of data.
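To make the quantification target concrete: a quantifier outputs a prevalence vector, i.e., one relative frequency per class, for a whole sample. The minimal Python sketch below computes the true prevalence vector of a labeled sample and the naive classify-and-count (CC) estimate from a classifier's per-item predictions. CC is a generic baseline, not the method studied with this data, and encoding the classes as integers 0, ..., C-1 is an assumption of the sketch.

```python
from collections import Counter
from typing import List, Sequence


def prevalences(labels: Sequence[int], n_classes: int) -> List[float]:
    """Relative frequency of each class 0..n_classes-1 in one sample."""
    if not labels:
        raise ValueError("a sample must contain at least one item")
    counts = Counter(labels)
    return [counts.get(c, 0) / len(labels) for c in range(n_classes)]


def classify_and_count(predicted_labels: Sequence[int], n_classes: int) -> List[float]:
    """Classify and count (CC): treat per-item predictions as if they were true labels."""
    return prevalences(predicted_labels, n_classes)


# Individual labels matter only through their distribution over the sample.
print(prevalences([0, 1, 1, 2], n_classes=3))  # [0.25, 0.5, 0.25]
```

CC is known to be biased when the label distribution of the unlabeled set differs from that of the training data, which is why dedicated quantification methods are evaluated on data like this.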
With the scripts provided, you can extract the relevant features and labels from the public data set of the FACT Cherenkov telescope. These features are precisely the ones that domain experts from astro-particle physics employ in their analyses. The labels stem from a binning of a continuous energy label, which is common practice in these analyses.
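The binning step can be pictured with a short Python sketch: a continuous energy is mapped to an ordinal class by locating it between bin edges on a logarithmic scale. The edge values, the GeV unit, and the logarithmic spacing are hypothetical placeholders; the binning that actually produces the labels is the one implemented in the provided scripts.

```python
import bisect
import math
from typing import Sequence

# HYPOTHETICAL edges in log10(energy / GeV); not the edges used by extract-fact.jl.
HYPOTHETICAL_LOG10_EDGES = [2.8, 3.1, 3.4, 3.7]


def energy_to_class(energy_gev: float, log10_edges: Sequence[float]) -> int:
    """Map a continuous energy to an ordinal class 0..len(log10_edges).

    Class 0 collects everything below the first edge and the last class
    everything at or above the last edge, so len(edges) + 1 ordered classes result.
    """
    if energy_gev <= 0:
        raise ValueError("energy must be positive")
    return bisect.bisect_right(log10_edges, math.log10(energy_gev))


for e in (300.0, 1500.0, 20000.0):
    print(e, "->", energy_to_class(e, HYPOTHETICAL_LOG10_EDGES))
```

Because the class index grows with the energy, the classes inherit the order of the continuous label, which is what makes the resulting task ordinal.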
We complement this data set with the indices of the data items that appear in each sample of our evaluation. Hence, you can precisely replicate our samples by drawing the specified data items. The indices stem from three evaluation protocols that are well suited for ordinal quantification, with one file for validation (val) and one for testing (tst) per protocol. Each row in the files app_val_indices.csv, app_tst_indices.csv, app-oq_val_indices.csv, app-oq_tst_indices.csv, real_val_indices.csv, and real_tst_indices.csv represents one sample.
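Replicating a sample then amounts to selecting the listed rows of fact.csv. The Python sketch below assumes a file layout that this description does not specify: comma-separated integer indices, one sample per row, and 1-based indices (as Julia would produce). The parameters one_based and skip_header are hypothetical knobs for adapting the sketch if the files differ.

```python
import csv
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def parse_sample_indices(
    lines: Iterable[str], one_based: bool = True, skip_header: bool = False
) -> List[List[int]]:
    """Parse one list of data-item indices per row; each row is one sample.

    ASSUMED layout: comma-separated integers. With one_based=True, indices are
    shifted to 0-based Python positions.
    """
    offset = 1 if one_based else 0
    reader = csv.reader(lines)
    if skip_header:
        next(reader, None)  # drop a header row, if the files have one
    return [[int(x) - offset for x in row] for row in reader if row]


def read_sample_indices(path: str, **kwargs) -> List[List[int]]:
    """Read an index file such as app_val_indices.csv (same assumptions as above)."""
    with open(path, newline="") as f:
        return parse_sample_indices(f, **kwargs)


def draw_sample(items: Sequence[T], indices: Sequence[int]) -> List[T]:
    """Replicate one evaluation sample by selecting the listed data items."""
    return [items[i] for i in indices]


# Hypothetical usage, with `rows` holding the data items of fact.csv in file order:
# samples = read_sample_indices("app_val_indices.csv")
# first_sample = draw_sample(rows, samples[0])
```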
Our first protocol is the artificial prevalence protocol (APP), in which all possible distributions of labels are drawn with equal probability. The second protocol, APP-OQ(5%), is a variant of APP that considers only the smoothest 5% of all APP samples. This variant targets ordinal quantification tasks, in which the classes are ordered and neighboring classes can be assumed to have similar prevalences. The labels of the FACT data lie on an ordinal scale and, hence, pose such an ordinal quantification task. The third protocol considers "real" distributions of labels, namely those that one would expect when observing the Crab Nebula with the FACT telescope.
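The two artificial protocols can be sketched in Python as follows. APP draws prevalence vectors uniformly from the probability simplex, here via the standard trick of normalizing independent Exp(1) variables, which yields a Dirichlet(1, ..., 1) draw. For APP-OQ, the sketch keeps the 5% of draws with the lowest jaggedness, which it assumes to be half the sum of squared second differences of neighboring prevalences. The exact smoothness measure, sample counts, and seeds behind the published index files may differ, so this is an illustration, not a reimplementation of the published protocol.

```python
import random
from typing import List


def app_prevalences(n_classes: int, rng: random.Random) -> List[float]:
    """APP: draw a prevalence vector uniformly from the probability simplex.

    Normalizing i.i.d. Exp(1) variables gives a Dirichlet(1, ..., 1) draw,
    so every label distribution is equally likely.
    """
    e = [rng.expovariate(1.0) for _ in range(n_classes)]
    total = sum(e)
    return [x / total for x in e]


def jaggedness(p: List[float]) -> float:
    """ASSUMED smoothness measure: half the sum of squared second differences."""
    return 0.5 * sum((-p[i - 1] + 2 * p[i] - p[i + 1]) ** 2 for i in range(1, len(p) - 1))


def app_oq_prevalences(
    n_classes: int, n_samples: int, keep_fraction: float = 0.05, seed: int = 0
) -> List[List[float]]:
    """APP-OQ sketch: draw APP prevalences and keep the smoothest fraction."""
    rng = random.Random(seed)
    draws = [app_prevalences(n_classes, rng) for _ in range(n_samples)]
    draws.sort(key=jaggedness)  # smoothest (lowest jaggedness) first
    n_keep = max(1, round(keep_fraction * n_samples))
    return draws[:n_keep]


smooth = app_oq_prevalences(n_classes=4, n_samples=200)
print(len(smooth), [round(x, 3) for x in smooth[0]])
```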
Usage
You can extract the data set fact.csv with the provided script extract-fact.jl, which is conveniently wrapped in a Makefile. The files Project.toml and Manifest.toml specify the Julia package dependencies, similar to a requirements file in Python.
Preliminaries: You need a working Julia installation. We used Julia v1.6.5 in our experiments.
Data Extraction: In your terminal, either call
make
(recommended) or run the following commands one after another:
curl --fail -o fact.hdf5 https://factdata.app.tu-dortmund.de/dl2/FACT-Tools/v1.1.2/gamma_simulations_facttools_dl2.hdf5
julia --project="." --eval "using Pkg; Pkg.instantiate()"
julia --project="." extract-fact.jl
Outcome: The first row in the resulting fact.csv file is the header. The first column, named "class_label", is the ordinal class.
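A minimal Python sketch for loading the resulting file: it checks that "class_label" is the first column and separates it from the remaining columns. Treating every remaining column as a numeric feature, and the class label as an integer, are assumptions of this sketch.

```python
import csv
from typing import Iterable, List, Tuple


def split_fact_rows(lines: Iterable[str]) -> Tuple[List[str], List[int], List[List[float]]]:
    """Split fact.csv content into feature names, class labels, and feature rows."""
    reader = csv.reader(lines)
    header = next(reader)
    if header[0] != "class_label":
        raise ValueError(f"expected 'class_label' as first column, got {header[0]!r}")
    labels: List[int] = []
    features: List[List[float]] = []
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        labels.append(int(float(row[0])))  # ASSUMED: integer-valued class labels
        features.append([float(x) for x in row[1:]])  # ASSUMED: numeric features
    return header[1:], labels, features


# Usage:
# with open("fact.csv", newline="") as f:
#     feature_names, y, X = split_fact_rows(f)
```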
Further Reading
Implementation of our experiments: https://github.com/mirkobunse/regularized-oq
Original data repository: https://factdata.app.tu-dortmund.de/
Reference analysis by astro-particle physicists: https://github.com/fact-project/open_crab_sample_analysis
Files
extract-fact-oq.zip (46.7 MB), md5:1adeb942a035ddf6ed14aac3c445a337
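The MD5 checksum lets you verify the download of extract-fact-oq.zip. A short Python sketch using the standard hashlib module; it assumes the local file keeps the listed name.

```python
import hashlib
from typing import Iterable

# Checksum listed for extract-fact-oq.zip
EXPECTED_MD5 = "1adeb942a035ddf6ed14aac3c445a337"


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hex MD5 digest of a byte stream given as consecutive chunks."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks to bound memory use."""
    with open(path, "rb") as f:
        return md5_of_chunks(iter(lambda: f.read(chunk_size), b""))


# Usage:
# print(md5_of_file("extract-fact-oq.zip") == EXPECTED_MD5)
```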
Additional details
References
- M. Bunse, A. Moreo, F. Sebastiani, M. Senz (2022). Ordinal Quantification through Regularization.
- H. Anderhub et al. (2013). Design and operation of FACT – the first G-APD Cherenkov telescope.