Dataset Open Access

SDSS Galaxy Subset

Carvalho, Nuno Ramos

The Sloan Digital Sky Survey (SDSS) is a comprehensive survey of the northern sky. This dataset contains a subset of that survey: 60247 objects classified as galaxies. It includes a CSV file with a collection of information about each object and a set of per-object files, namely JPG images, FITS data and spectra data. The dataset is used to train and explore the astromlp-models collection of deep learning models for galaxy characterisation.

The CSV data file has one row per object from the SDSS database, with the following columns (some data may not be available for all objects):

  • objid: unique SDSS object identifier 
  • mjd: MJD of observation
  • plate: plate identifier
  • tile: tile identifier
  • fiberid: fiber identifier
  • run: run number
  • rerun: rerun number
  • camcol: camera column
  • field: field number
  • ra: right ascension
  • dec: declination
  • class: spectroscopic class (only objects classified as GALAXY are included)
  • subclass: spectroscopic subclass
  • modelMag_u: better of DeV/Exp magnitude fit for band u
  • modelMag_g: better of DeV/Exp magnitude fit for band g
  • modelMag_r: better of DeV/Exp magnitude fit for band r
  • modelMag_i: better of DeV/Exp magnitude fit for band i
  • modelMag_z: better of DeV/Exp magnitude fit for band z
  • redshift: final redshift (z) from SDSS data
  • stellarmass: stellar mass extracted from the eBOSS Firefly catalog
  • w1mag: WISE W1 "standard" aperture magnitude
  • w2mag: WISE W2 "standard" aperture magnitude
  • w3mag: WISE W3 "standard" aperture magnitude
  • w4mag: WISE W4 "standard" aperture magnitude
  • gz2c_f: Galaxy Zoo 2 classification from Willett et al 2013
  • gz2c_s: simplified version of Galaxy Zoo 2 classification (labels set)
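
Since some columns may be empty for some objects, a reader has to handle missing values when loading the CSV file. The following is a minimal sketch using only the Python standard library. It assumes that missing values appear as empty fields, which this description does not confirm; the helper names load_galaxies and to_float and the sample rows are hypothetical and made up for illustration.

```python
import csv
import io


def load_galaxies(source):
    """Read rows from an open CSV file, mapping empty fields to None.

    Assumption: missing data is stored as an empty field.
    """
    reader = csv.DictReader(source)
    rows = []
    for row in reader:
        rows.append({key: (value if value != "" else None)
                     for key, value in row.items()})
    return rows


def to_float(value):
    """Convert a CSV field to float, or return None if missing or not numeric."""
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


# Made-up sample with a few of the columns listed above (not real SDSS data).
sample = io.StringIO(
    "objid,ra,dec,class,redshift,stellarmass\n"
    "1001,150.1,2.2,GALAXY,0.05,\n"
    "1002,151.3,2.5,GALAXY,0.12,10.4\n"
)
rows = load_galaxies(sample)
redshifts = [to_float(row["redshift"]) for row in rows]
```

With the real file, the same function could be called as load_galaxies(open("data.csv", newline="")). The sketch keeps objid as a string on purpose: SDSS object identifiers are large integers, and the string form is also what the per-object file names are based on.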

Besides the CSV file, the dataset includes a set of directories. Each directory contains files named after the objid column of the CSV file, holding the corresponding data. The directory tree is:

├── data.csv
├── fits
├── img
├── spectra
└── ssel

Each directory contains:

  • img: RGB images of the object in JPEG format, 150x150 pixels, generated using the SkyServer DR16 API
  • fits: FITS data subsets around the object across the u, g, r, i, z bands; the cutouts are made using the ImageCutter library
  • spectra: full best fit spectra data from SDSS for wavelengths between 4000 and 9000 Å
  • ssel: best fit spectra data from SDSS for specific selected intervals of wavelengths discussed by Sánchez Almeida 2010
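
Because the files in each directory are named after objid, the data for one object can be collected by looking the identifier up in every directory. The sketch below is one way to do that under stated assumptions: the exact file names and extensions are not given in this description, so it matches any file whose name is the objid itself or the objid followed by a dot and an extension. The function name files_for_object and the demo files are hypothetical.

```python
from pathlib import Path
import tempfile

# Data directories listed in the tree above.
SUBDIRS = ("img", "fits", "spectra", "ssel")


def files_for_object(root, objid):
    """Return {directory name: sorted list of files belonging to objid}.

    Assumption: a file belongs to an object when its name is the objid,
    optionally followed by "." and an extension.
    """
    root = Path(root)
    objid = str(objid)
    found = {}
    for sub in SUBDIRS:
        directory = root / sub
        if not directory.is_dir():
            found[sub] = []  # directory missing in this copy of the dataset
            continue
        found[sub] = sorted(
            path for path in directory.iterdir()
            if path.name == objid or path.name.startswith(objid + ".")
        )
    return found


# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    for sub in ("img", "fits", "spectra"):
        (demo_root / sub).mkdir()
    (demo_root / "img" / "1001.jpg").touch()
    (demo_root / "img" / "10010.jpg").touch()  # different object, same prefix
    (demo_root / "spectra" / "1001.csv").touch()
    demo = {sub: [path.name for path in paths]
            for sub, paths in files_for_object(demo_root, 1001).items()}
```

Matching on the full identifier rather than a bare prefix avoids picking up files of another object whose objid merely starts with the same digits. Objects with missing data simply yield an empty list for the affected directory.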


  • v0.0.3 - Increase number of objects to ~80k.
  • v0.0.2 - Increase number of objects to ~60k.
  • v0.0.1 - Initial import.
Files (7.7 GB)