Walmsley, Mike; Lintott, Chris; Tobias, Geron; Kruk, Sandor J; Krawczyk, Coleman; Willett, Kyle; Bamford, Steven; Kelvin, Lee S; Fortson, Lucy; Gal, Yarin; Keel, William; Masters, Karen; Mehta, Vihang; Simmons, Brooke; Smethurst, Rebecca J; Smith, Lewis; Baeten, Elisabeth M L; Macmillan, Christine
This repository contains the data released in the paper "Galaxy Zoo DECaLS: Detailed Visual Morphology Measurements from Volunteers and Deep Learning for 314,000 Galaxies" (DOI to follow on publication).
We release detailed morphology catalogues, both volunteer and automated, for Galaxy Zoo DECaLS.
- gz_decals_volunteers_1_and_2 contains volunteer classifications for galaxies classified during the GZD-1 and GZD-2 campaigns.
- gz_decals_volunteers_5 similarly contains classifications from the GZD-5 campaign. Note that GZD-5 used a modified schema designed to better detect mergers and weak bars, and includes many galaxies with only approximately five volunteer responses.
- gz_decals_auto_posteriors contains the predicted posteriors for volunteer responses to all galaxies used in any campaign. The full posteriors are recorded as Dirichlet distribution concentrations. gz_decals_auto_posteriors also summarises these posteriors as the automated equivalent of previous Galaxy Zoo data releases: the expected vote fractions (mean posteriors). Note that not all posteriors/vote fractions are relevant for every galaxy; we suggest assessing relevance using the estimated fraction of volunteers that would have been asked each question.
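As a rough illustration of how Dirichlet concentrations relate to expected vote fractions, the sketch below computes the mean of a Dirichlet distribution (each concentration divided by the sum of concentrations for that question). The concentration values, the function names, and the 0.5 relevance threshold are all assumptions made for illustration; they are not taken from the catalogue or the paper. Refer to schema.md for the real column names.

```python
def expected_vote_fractions(concentrations):
    """Return the mean of a Dirichlet distribution: alpha_i / sum(alpha)."""
    total = sum(concentrations)
    if total <= 0:
        raise ValueError("concentrations must sum to a positive value")
    return [alpha / total for alpha in concentrations]


def is_relevant(fraction_asked, threshold=0.5):
    """Decide whether a question is relevant for a galaxy.

    `threshold` is a hypothetical cut chosen for this sketch, not a value
    recommended by the data release.
    """
    return fraction_asked >= threshold


# Hypothetical concentrations for one question with three possible answers.
concentrations = [6.0, 3.0, 1.0]
fractions = expected_vote_fractions(concentrations)

# Hypothetical estimated fraction of volunteers asked this question.
use_question = is_relevant(fraction_asked=0.8)
```

A real analysis would apply the same arithmetic per galaxy and per question, and choose a relevance cut suited to the science case.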
We include a schema document, schema.md, to define the column names in each catalogue.
We also release the galaxy images shown to volunteers on www.galaxyzoo.org during GZD-5. The images on which the automated classifier was trained may be derived from these volunteer-facing images. These images are split into four zip files, each of which contains images named by iauname inside a subfolder named by the first four characters in their iauname. Not all images were labelled during GZD-5 - refer to the catalog for training labels. We are working with the Zenodo team to add these large files to this repository - meanwhile, you can download them from The University of Manchester here.
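The folder layout described above (a subfolder named by the first four characters of the iauname, containing an image named by iauname) can be sketched as a small path helper. The file extension and the example iauname are assumptions for illustration; check the extracted zip files for the actual extension.

```python
from pathlib import PurePosixPath


def image_relative_path(iauname, extension=".png"):
    """Build the path of an image relative to the extracted zip root.

    `extension` is an assumption for this sketch; the release text does not
    state the image file format.
    """
    if len(iauname) < 4:
        raise ValueError("iauname must have at least four characters")
    subfolder = iauname[:4]  # first four characters name the subfolder
    return PurePosixPath(subfolder) / f"{iauname}{extension}"


# Hypothetical iauname, used only to show the layout.
example_path = image_relative_path("J123456.78+012345.6")
```

Joining this relative path onto the directory where a zip file was extracted gives the location of the image for a catalogue row.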
The .csv and .parquet files contain identical data. Parquet is a fast column-oriented binary format which can be read with pd.read_parquet(loc, columns=[some columns]).
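A minimal sketch of loading only selected columns from either file format with pandas follows. The helper name `load_catalogue` and the example file and column names in the comment are hypothetical. `pd.read_parquet` needs a Parquet engine such as pyarrow to be installed.

```python
import pandas as pd


def load_catalogue(path, columns=None):
    """Load a catalogue from .parquet or .csv, optionally keeping only `columns`."""
    path = str(path)
    if path.endswith(".parquet"):
        # Column-oriented format: only the requested columns are read from disk.
        return pd.read_parquet(path, columns=columns)
    if path.endswith(".csv"):
        # CSV is still parsed row by row, but unrequested columns are dropped.
        return pd.read_csv(path, usecols=columns)
    raise ValueError(f"unsupported file type: {path}")


# Example call (file name and column names are illustrative only):
# df = load_catalogue("gz_decals_volunteers_5.parquet", columns=["iauname", "ra", "dec"])
```

Selecting columns at read time keeps memory use down for the wide catalogues, which is where the Parquet files should be noticeably faster than the CSV equivalents.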
You may also be interested in the GitHub repository, which contains code to reproduce the model and to fine-tune it for new tasks (including pretrained weights).
We will release updates if needed via Zenodo versioning. We recommend using the latest version of this repository. You can check the version you are currently viewing on the right-hand sidebar.
Please cite the paper (DOI to follow on publication) when using the data in this repository.
v0.0.1 (submission) provides the catalog files.
v0.0.2 (first revision) renames the catalog files, adds flags for poorly sized galaxies, and includes the galaxy images via the University of Manchester.