Published May 6, 2024 | Version v6
Dataset Open

GalaxiesML: an imaging and photometric dataset of galaxies for machine learning

  • 1. UCLA Division of Physical Sciences
  • 2. University of California, Los Angeles
  • 3. Southern Oregon University

Description

We present a dataset built for machine learning applications, consisting of galaxy photometry, images, and spectroscopic redshifts. It is a curated set of 286,401 galaxies, each with images and photometry from the Hyper Suprime-Cam survey in five filters ($g,r,i,z,y$) and a spectroscopically confirmed redshift. Such a dataset is important for machine learning applications because it is uniform, consistent, and has minimal outliers. We describe the challenges associated with assembling a dataset from publicly available archives, including outlier rejection, duplication, establishing ground truths, and sample selection. This is one of the largest public machine learning-ready training sets of its kind, with redshifts ranging from 0.01 to 4. The redshift distribution of this sample peaks at a redshift of 1.5 and falls off rapidly beyond redshift 2.5.

Files

Files (116.3 GB)

Checksum | Size
md5:a67ffc01def95ec07f9996105848f19d | 13.2 GB
md5:334e8d698dcc7169770ddc7d718f4a57 | 66.2 GB
md5:ba1ce213162fc10e54f666c3a83f9eba | 13.2 GB
md5:a63d80990a9cfacde1fbb9562c6830c9 | 3.4 GB
md5:866a5494adefc700b637785b69917bd7 | 16.9 GB
md5:95154a37d2b1ea4a51182ec8ef4e5a19 | 3.4 GB
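
Because the files are large, it can be worth checking a download against its listed MD5 checksum before use. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_checksum` are hypothetical and not part of the dataset, and the file path in the usage note is a placeholder, since the file names are not listed here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte files never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or bare hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("path/to/downloaded_file", "md5:a63d80990a9cfacde1fbb9562c6830c9")` would return `True` only if the local copy is byte-for-byte identical to the 3.4 GB file carrying that checksum.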

Additional details

Related works

Is cited by
Publication: 10.3847/1538-4357/ad2070 (DOI)