Published October 31, 2018 | Version v1.0.0
Dataset Open

Machine learning for Gravity Spy: Glitch classification and dataset

  • 1. Electrical Engineering and Computer Science, Northwestern University, Evanston, IL, USA
  • 2. Department of Computer Science, University of Illinois at Chicago, IL, USA
  • 3. Center for Interdisciplinary Exploration and Research in Astrophysics (CIERA), Northwestern University, Evanston, IL, USA
  • 4. Department of Physics, California State University Fullerton, Fullerton, CA, USA

Description

We present the first version of the training set used in the Gravity Spy citizen science project. This training set, discussed in detail here, was used to train the convolutional neural network employed in the Gravity Spy project. Going forward, we anticipate releasing more labelled Gravity Spy data sets, including a refined version of this training set (DOI: 10.5281/zenodo.1476551) and data sets containing the annotations provided by our citizen science volunteers.

Data Set Information

Three files are provided in this data set:

  • trainingset_v1d0_metadata.csv
    • This file has three columns: gravityspy_id, label, and sample_type. gravityspy_id is the unique 10-character hash given to every Gravity Spy sample. label is the string label of the sample. sample_type indicates whether the sample was used in the paper for training, validating, or testing the models; it is provided for those who would like to make direct comparisons with the network described in the paper.
  • trainingsetv1d0.h5
    • This file contains the exact arrays used in the paper for every Gravity Spy sample. Each Gravity Spy sample is defined by four images of different temporal duration: 0.5, 1.0, 2.0, and 4.0 seconds. The durations also determine the naming convention of the PNGs: interferometer_gravityspyid_spectrogram_duration.png (e.g. H1_Fv3p6eROvA_spectrogram_0.5.png, H1_Fv3p6eROvA_spectrogram_1.0.png, H1_Fv3p6eROvA_spectrogram_2.0.png, H1_Fv3p6eROvA_spectrogram_4.0.png).
    • This file contains all the information needed for each sample in the Gravity Spy dataset (i.e. the label, the sample type, the unique ID, and the image data for that sample), organized as in the following example:
      • /1080Lines/validation/xUEyaWr34c Group
        /1080Lines/validation/xUEyaWr34c/0.5.png Dataset {1, 140, 170}
        /1080Lines/validation/xUEyaWr34c/1.0.png Dataset {1, 140, 170}
        /1080Lines/validation/xUEyaWr34c/2.0.png Dataset {1, 140, 170}
        /1080Lines/validation/xUEyaWr34c/4.0.png Dataset {1, 140, 170}
  • trainingsetv1d0.tar.gz
    • Contains the raw PNGs of the Gravity Spy training set.
    • The structure of the folder is /"label"/"sample_type"/"pngs"
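
As an illustration of the layout described above, the following is a minimal sketch of how the metadata table and the label/sample_type/gravityspy_id hierarchy might be traversed. The function names ids_by_sample_type and iter_samples are hypothetical, and the sketch assumes that the CSV has a header row with the three column names listed above and that every sample group holds the four "<duration>.png" entries shown in the example.

```python
import csv
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Mapping, Tuple

# The four spectrogram durations (in seconds) stored for every sample.
DURATIONS = ("0.5", "1.0", "2.0", "4.0")


def ids_by_sample_type(csv_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group gravityspy_id values by the sample_type column.

    Assumes a header row naming the columns gravityspy_id, label, sample_type.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(csv_lines):
        groups[row["sample_type"]].append(row["gravityspy_id"])
    return dict(groups)


def iter_samples(
    root: Mapping[str, Any],
) -> Iterator[Tuple[str, str, str, Dict[str, Any]]]:
    """Yield (label, sample_type, gravityspy_id, images) for every sample.

    `root` is any nested mapping laid out as
    label / sample_type / gravityspy_id / "<duration>.png".
    """
    for label, by_type in root.items():
        for sample_type, by_id in by_type.items():
            for gravityspy_id, group in by_id.items():
                # Collect the four images of this sample, keyed by duration.
                images = {d: group[f"{d}.png"] for d in DURATIONS}
                yield label, sample_type, gravityspy_id, images
```

ids_by_sample_type accepts an open text file, for example open("trainingset_v1d0_metadata.csv", newline=""). If the h5py package is installed, an h5py.File opened in read mode on trainingsetv1d0.h5 could be passed to iter_samples, since HDF5 groups support items() and indexing by name; each image is then a dataset of shape (1, 140, 170), as in the listing above.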

Data Set Parsing Information

To read the provided PNGs and crop out the plot axes and labels, the following short Python snippet using scikit-image should work.

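from skimage import io

# Load the PNG as an array indexed (row, column, channel).
image_data = io.imread("filename_of_image")

# Pixel bounds of the spectrogram inside the figure.
rows = [66, 532]; cols = [105, 671]

# Crop to the plot area and keep only the first three (RGB) channels.
image_data = image_data[rows[0]:rows[1], cols[0]:cols[1], :3]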
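
To apply the same crop to all four durations of a sample, the file names can be derived from the naming convention and folder structure described earlier. The sketch below is only illustrative: the helper names sample_png_paths and cropped_shape are hypothetical, and the extraction folder name, as well as the pairing of this label and sample type with this ID, are assumptions made for the example.

```python
from pathlib import PurePosixPath
from typing import List, Tuple

# Durations (in seconds) that appear in the PNG file names.
DURATIONS = ("0.5", "1.0", "2.0", "4.0")

# Crop bounds from the snippet above: first axis (rows), second axis (columns).
ROWS = (66, 532)
COLS = (105, 671)


def sample_png_paths(
    root: str,
    label: str,
    sample_type: str,
    interferometer: str,
    gravityspy_id: str,
) -> List[PurePosixPath]:
    """Build the four PNG paths of one sample under root/label/sample_type/."""
    folder = PurePosixPath(root) / label / sample_type
    return [
        folder / f"{interferometer}_{gravityspy_id}_spectrogram_{d}.png"
        for d in DURATIONS
    ]


def cropped_shape() -> Tuple[int, int, int]:
    """Shape of a cropped image: (rows, columns, RGB channels)."""
    return (ROWS[1] - ROWS[0], COLS[1] - COLS[0], 3)


# "trainingsetv1d0" is an assumed name for the extracted folder.
paths = sample_png_paths(
    "trainingsetv1d0", "1080Lines", "validation", "H1", "Fv3p6eROvA"
)
```

With these bounds the cropped array has 466 rows and 566 columns. Each path in paths could then be passed to io.imread and sliced as in the snippet above.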

Files

Files (12.9 GB)

md5:6b99210289d508696fb3feccfa7c79fc (248.9 kB)
md5:7055ef3e0b700c117f8fad7489d6da6b (3.3 GB)
md5:efb0d0d3f3750375d00d0d2e99c9e352 (9.6 GB)
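
Since two of the files are several gigabytes in size, it may be worth checking a download against the MD5 checksums in the file listing. The following is a minimal sketch using only the Python standard library; the function names md5_of_file and matches_listing are hypothetical, and the file is read in chunks so that it never has to fit in memory.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, matches_listing("trainingset_v1d0_metadata.csv", "md5:...") with one of the checksums from the listing returns True only if the local file is byte-for-byte identical to the file that checksum belongs to.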

Additional details

Funding

National Science Foundation
INSPIRE: Teaming Citizen Science with Machine Learning to Deepen LIGO's View of the Cosmos (award 1547880)