Published May 31, 2023 | Version 1.0.1
Dataset | Open access

Radio Galaxy Zoo: Tagging Radio Subjects using Text

  • 1. Australian National University
  • 2. University of Western Australia
  • 3. Google Australia
  • 4. CSIRO Space & Astronomy
  • 5. Data61, CSIRO

Description

RadioTalk is a communication platform that enabled members of the Radio Galaxy Zoo (RGZ) citizen science project to take part in discussion threads and to further describe the radio subjects they were observing through tags and comments. It contains a wealth of auxiliary information that is useful for identifying the morphology of complex and extended radio sources. In this paper, we present this new dataset and, for the first time in radio astronomy, combine text and images to automatically classify radio galaxies using a multi-modal learning approach. We found that incorporating text features improved classification performance. This demonstrates that text annotations, though rare, are valuable sources of information for classifying astronomical sources, and it suggests the importance of exploiting multi-modal information in future citizen science projects. We also discovered over 10,000 new radio sources in this dataset beyond those in the RGZ-DR1 catalogue.
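The description does not spell out the model, so the following is only a hedged sketch of the general idea of combining modalities: early fusion, where each subject's per-modality feature vectors (radio, infrared and text, 768 values each according to the Notes) are concatenated and then fed to one logistic-regression classifier per tag. Plain logistic regression fitted by full-batch gradient descent is a stand-in chosen for brevity, not the method used in the paper, and the function names and hyperparameters are hypothetical.

```python
import numpy as np


def fuse(*blocks: np.ndarray) -> np.ndarray:
    """Early fusion: concatenate per-modality feature matrices column-wise."""
    return np.concatenate(blocks, axis=1)


def sigmoid(z: np.ndarray) -> np.ndarray:
    # Clip logits so np.exp cannot overflow for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def train_multilabel_logreg(X, Y, lr=0.1, epochs=500, l2=1e-3):
    """Fit one independent logistic regression per tag (per column of Y).

    All tags are trained together with full-batch gradient descent on the
    mean binary cross-entropy plus a small L2 penalty on the weights.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(epochs):
        P = sigmoid(X @ W + b)      # predicted tag probabilities, shape (n, tags)
        G = (P - Y) / n             # gradient of mean BCE w.r.t. the logits
        W -= lr * (X.T @ G + l2 * W)
        b -= lr * G.sum(axis=0)
    return W, b


def predict_tags(X, W, b, threshold=0.5):
    """Boolean tag predictions: probability at or above the threshold."""
    return sigmoid(np.asarray(X, dtype=float) @ W + b) >= threshold
```

Training once on `fuse(radio, ir)` and once on `fuse(radio, ir, text)` and comparing the two on the validation split is one simple way to probe whether the text block carries extra information about the tags.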

Notes

  • data_train.zip: compressed CSV file containing the training data.
  • data_val.zip: compressed CSV file containing the validation data.
  • data_test.zip: compressed CSV file containing the test data.

Each row in a CSV file corresponds to one radio subject, and the columns correspond to features and tags:

  • radio image features (columns radio001 to radio768);
  • infrared image features (columns ir001 to ir768);
  • RadioTalk discussion text features (columns text001 to text768);
  • boolean tag indicators (the last 11 columns).
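A minimal sketch of loading one split with pandas and separating the column groups described above. It assumes each archive holds a single CSV with a header row using these column names, so `pandas.read_csv` can decompress the .zip directly. The helper names `load_split` and `split_columns` are hypothetical, and the tag columns are selected by position because their names are not listed here.

```python
import re

import pandas as pd

N_FEATURES = 768  # per modality: radio001..radio768, ir001..ir768, text001..text768
N_TAGS = 11       # the last 11 columns are boolean tag indicators


def load_split(path: str) -> pd.DataFrame:
    """Read data_train.zip, data_val.zip or data_test.zip.

    Assumption: the archive contains exactly one CSV file, so pandas can
    infer zip compression from the file suffix and read it directly.
    """
    return pd.read_csv(path)


def split_columns(df: pd.DataFrame) -> dict:
    """Return the three feature blocks as float32 arrays plus the tag matrix."""
    blocks = {}
    for prefix in ("radio", "ir", "text"):
        # Match e.g. "radio001" exactly; columns matching no prefix are ignored.
        pattern = rf"{prefix}\d{{3}}"
        cols = [c for c in df.columns if re.fullmatch(pattern, str(c))]
        if len(cols) != N_FEATURES:
            raise ValueError(f"expected {N_FEATURES} {prefix} columns, found {len(cols)}")
        blocks[prefix] = df[cols].to_numpy(dtype="float32")
    tag_cols = list(df.columns[-N_TAGS:])
    blocks["tags"] = df[tag_cols].to_numpy(dtype=bool)
    blocks["tag_names"] = tag_cols
    return blocks
```

For example, `split_columns(load_split("data_train.zip"))["text"]` would give an array of shape (number of subjects, 768), assuming the archive layout above. Selecting by name pattern rather than position means the feature blocks are still found if the files carry extra columns such as an identifier.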

Files


Files (114.0 MB)

Size      Checksum
17.2 MB   md5:ac6c632d6357ef1a86509331d4bfdad9
82.3 MB   md5:0c588e8c561fdcdda4ce99f617870cb6
14.6 MB   md5:486810619cd93830fb18873433885e9d
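After downloading, a file's integrity can be checked against the MD5 values listed above. The listing here does not record which checksum belongs to which archive, so this minimal sketch, using only Python's standard `hashlib`, also offers a hypothetical `is_listed` helper that accepts a file if its digest matches any of the three listed values.

```python
import hashlib

# MD5 digests copied from the file listing above (file names not recorded there).
LISTED_MD5 = {
    "ac6c632d6357ef1a86509331d4bfdad9",
    "0c588e8c561fdcdda4ce99f617870cb6",
    "486810619cd93830fb18873433885e9d",
}


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare with a checksum given as 'md5:<hex>' or as a bare hex string."""
    return md5sum(path) == expected.split(":", 1)[-1].lower()


def is_listed(path: str) -> bool:
    """True if the file's digest matches any of the listed checksums."""
    return md5sum(path) in LISTED_MD5
```

Reading in fixed-size chunks keeps memory use constant, which matters little for files of this size but costs nothing.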