Published September 5, 2022 | Version 0.0.4
Dataset Open

SDSS Galaxy Subset


The Sloan Digital Sky Survey (SDSS) is a comprehensive survey of the northern sky. This dataset contains a subset of the survey: 100,077 objects classified as galaxies. It includes a CSV file with information about each object and, for each object, a set of files: JPEG images, FITS data and spectra. The dataset is used to train and explore the astromlp-models collection of deep learning models for galaxy characterisation.

The dataset includes a CSV data file in which each row is an object from the SDSS database, with the following columns (note that some values may not be available for all objects):

  • objid: unique SDSS object identifier 
  • mjd: Modified Julian Date (MJD) of observation
  • plate: plate identifier
  • tile: tile identifier
  • fiberid: fiber identifier
  • run: run number
  • rerun: rerun number
  • camcol: camera column
  • field: field number
  • ra: right ascension
  • dec: declination
  • class: spectroscopic class (only objects with class GALAXY are included)
  • subclass: spectroscopic subclass
  • modelMag_u: better of DeV/Exp magnitude fit for band u
  • modelMag_g: better of DeV/Exp magnitude fit for band g
  • modelMag_r: better of DeV/Exp magnitude fit for band r
  • modelMag_i: better of DeV/Exp magnitude fit for band i
  • modelMag_z: better of DeV/Exp magnitude fit for band z
  • redshift: final redshift (z) from the SDSS data
  • stellarmass: stellar mass extracted from the eBOSS Firefly catalog
  • w1mag: WISE W1 "standard" aperture magnitude
  • w2mag: WISE W2 "standard" aperture magnitude
  • w3mag: WISE W3 "standard" aperture magnitude
  • w4mag: WISE W4 "standard" aperture magnitude
  • gz2c_f: Galaxy Zoo 2 classification from Willett et al. 2013
  • gz2c_s: simplified version of the Galaxy Zoo 2 classification (reduced label set)
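A minimal sketch of reading data.csv with Python's standard csv module. It assumes that unavailable values appear as empty cells (the description above only says some data may be missing), and it keeps objid as a string: SDSS object identifiers are 64-bit integers too large to store exactly as floats, and they are also the key used for the per-object file names. The helper names FLOAT_COLUMNS, to_float and read_galaxies are hypothetical, and the demo rows are made up for illustration.

```python
import csv
import io
from typing import Iterator, Optional

# Hypothetical choice of numeric columns, taken from the column list above.
FLOAT_COLUMNS = ("ra", "dec", "redshift", "stellarmass",
                 "modelMag_u", "modelMag_g", "modelMag_r", "modelMag_i", "modelMag_z")


def to_float(value: Optional[str]) -> Optional[float]:
    """Convert a CSV cell to float; empty, absent or non-numeric cells become None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def read_galaxies(stream) -> Iterator[dict]:
    """Yield one dict per CSV row, converting known numeric columns.

    objid and the other non-numeric columns are left as strings.
    """
    for row in csv.DictReader(stream):
        for name in FLOAT_COLUMNS:
            if name in row:
                row[name] = to_float(row[name])
        yield row


if __name__ == "__main__":
    # Made-up two-row sample; for the real file use open("data.csv", newline="").
    sample = io.StringIO(
        "objid,redshift,stellarmass,gz2c_s\n"
        "1000000000000000001,0.08,,\n"
        "1000000000000000002,0.12,10.9,\n"
    )
    for galaxy in read_galaxies(sample):
        print(galaxy["objid"], galaxy["redshift"], galaxy["stellarmass"])
```

Rows with a missing value can then be filtered explicitly, for example keeping only galaxies whose redshift is not None before training.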

Besides the CSV file, the dataset includes a set of directories. Each directory contains one file per object, named after the objid column of the CSV file and holding the corresponding data. The directory tree is:

├── data.csv
├── fits
├── img
├── spectra
└── ssel

Each directory contains:

  • img: RGB images of the object in JPEG format, 150x150 pixels, generated using the SkyServer DR16 API
  • fits: FITS cutouts around the object in the u, g, r, i and z bands, produced with the ImageCutter library
  • spectra: full best-fit spectra from SDSS between 4000 and 9000 Å
  • ssel: best-fit spectra from SDSS restricted to the specific wavelength intervals discussed by Sánchez Almeida 2010
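Because the files in each directory are named after objid, one way to pair CSV rows with their data is to scan each directory once and group files by identifier. This is a sketch under assumptions: the identifier is taken as the part of the file name before the first dot, since the actual file extensions are not stated above, and the names SUBDIRS and build_index are hypothetical.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# Subdirectory names as listed in the directory tree above.
SUBDIRS = ("img", "fits", "spectra", "ssel")


def build_index(root) -> Dict[str, Dict[str, List[Path]]]:
    """Map each objid to {subdirectory: [files]} by scanning every directory once.

    The objid is assumed to be the file name up to the first dot, so no
    particular extension is hard-coded. Missing subdirectories are skipped,
    which tolerates a partial download.
    """
    index: Dict[str, Dict[str, List[Path]]] = defaultdict(dict)
    for sub in SUBDIRS:
        folder = Path(root) / sub
        if not folder.is_dir():
            continue
        for path in sorted(folder.iterdir()):
            if path.is_file():
                objid = path.name.split(".", 1)[0]
                index[objid].setdefault(sub, []).append(path)
    return dict(index)


if __name__ == "__main__":
    # "path/to/dataset" is a placeholder for wherever the archive was extracted.
    index = build_index("path/to/dataset")
    print(len(index), "objects with at least one data file")
```

Scanning each directory once and then looking objects up in a dictionary avoids re-listing directories of roughly 100k files for every row of the CSV.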

Changelog

  • v0.0.4 - Increase number of objects to ~100k.
  • v0.0.3 - Increase number of objects to ~80k.
  • v0.0.2 - Increase number of objects to ~60k.
  • v0.0.1 - Initial import.


Files (9.6 GB)
