GalaxiesML: an imaging and photometric dataset of galaxies for machine learning
Description
We present a dataset built for machine learning applications, consisting of galaxy photometry, images, and spectroscopic redshifts. This curated dataset contains 286,401 galaxy images, with photometry, from the Hyper Suprime-Cam survey in five filters ($g,r,i,z,y$), all with spectroscopically confirmed redshifts. Such a dataset is important for machine learning applications because it is uniform, consistent, and has minimal outliers. We describe the challenges of assembling a dataset from publicly available archives, including outlier rejection, duplication, establishing ground truths, and sample selection. This is one of the largest public, machine-learning-ready training sets of its kind, with redshifts ranging from 0.01 to 4. The redshift distribution of the sample peaks at a redshift of 1.5 and falls off rapidly beyond redshift 2.5.
Files (116.3 GB total)
Size | MD5 checksum |
---|---|
13.2 GB | a67ffc01def95ec07f9996105848f19d |
66.2 GB | 334e8d698dcc7169770ddc7d718f4a57 |
13.2 GB | ba1ce213162fc10e54f666c3a83f9eba |
3.4 GB | a63d80990a9cfacde1fbb9562c6830c9 |
16.9 GB | 866a5494adefc700b637785b69917bd7 |
3.4 GB | 95154a37d2b1ea4a51182ec8ef4e5a19 |
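Because the files are large (up to 66.2 GB), it is worth confirming each download against its published MD5 checksum before use. The following is a minimal sketch using only Python's standard `hashlib`; the function names `md5sum` and `verify` are hypothetical helpers, not part of the dataset release. It reads the file in chunks so that multi-gigabyte files never have to fit in memory.

```python
import hashlib


def md5sum(path, chunk_size=8 * 1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Stream fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file matches the published checksum.

    Accepts the checksum with or without a leading "md5:" prefix.
    """
    expected = expected_md5.strip().lower().removeprefix("md5:")
    return md5sum(path) == expected
```

For example, `verify("downloaded_file", "a67ffc01def95ec07f9996105848f19d")` checks a local copy of the first 13.2 GB file; the path `downloaded_file` is a placeholder for wherever the file was saved. A `False` result usually indicates an incomplete or corrupted download that should be fetched again.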
Additional details
Related works
- Is cited by
  - Publication: 10.3847/1538-4357/ad2070 (DOI)