Published July 24, 2025 | Version v1
Dataset Open

Data for QZO: A Catalog of 5 Million Quasars from the Zwicky Transient Facility

  • 1. Division of Physics, Mathematics and Astronomy, California Institute of Technology, Pasadena, CA 91125, USA
  • 2. Jet Propulsion Laboratory, California Institute of Technology, 4800 Oak Grove Drive, Pasadena, CA 91109
  • 3. DIRAC Institute, Department of Astronomy, University of Washington, Seattle, WA 98195, USA
  • 4. IPAC, California Institute of Technology, Pasadena, CA 91125, USA
  • 5. Caltech Optical Observatories, California Institute of Technology, Pasadena, CA 91125, USA
  • 6. Center for Data Driven Discovery, California Institute of Technology, Pasadena, CA 91125, USA
  • 7. Department of Physics, Drexel University, Philadelphia, PA 19104, USA

Description

QZO.csv

The QZO catalog contains 4,849,574 objects with the columns described below, except for the duplicate-object flag. The classifications come from XGB models trained on the ZTF g-band median magnitude, the transformer-model classification of the ZTF light curves, and WISE W[1-4] magnitudes and colors. The photo-zs are based on the ZTF g-band magnitude and the WISE magnitudes and colors. Duplicated ZTF light curves are removed by dropping every object that has, within the full ZTF catalog, at least one neighbour within 1 arcsec with more ZTF observation epochs. The final quasar sample is defined by cuts on magnitude, number of observation epochs, and minimum quasar classification probability: g < n_obs / 80 + 20.375, where n_obs is the number of ZTF observation epochs per light curve, and p_QSO > 0.9, where p_QSO is the XGB classification probability for the QSO class. Photo-zs are available for 35% of these objects, depending on the availability of WISE observations.
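The magnitude and probability cuts quoted above can be written as a small predicate. This is a minimal sketch rather than the authors' code: the function and argument names are hypothetical, and it assumes that g in the cut is the ZTF g-band median magnitude.

```python
def magnitude_limit(n_obs):
    """Faint g-band limit implied by the cut g < n_obs / 80 + 20.375."""
    return n_obs / 80 + 20.375


def passes_qzo_cuts(g_mag, n_obs, p_qso, p_min=0.9):
    """Return True if an object satisfies both cuts quoted in the description.

    g_mag : g-band magnitude (assumed to be the ZTF median magnitude)
    n_obs : number of ZTF observation epochs in the light curve
    p_qso : XGB classification probability for the QSO class
    """
    return g_mag < magnitude_limit(n_obs) and p_qso > p_min


# The limit becomes one magnitude fainter for every 80 additional epochs.
limit_20 = magnitude_limit(20)    # 20.625
limit_100 = magnitude_limit(100)  # 21.625
```

Both inequalities are strict, as in the description, so an object with p_QSO exactly equal to 0.9 does not pass.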

ZTF_all_QSO.csv

This file provides all columns for 78,078,450 objects classified as QSOs by at least one of the two XGB models (with and without the WISE features). No cuts are applied and no duplicates are removed; 26% of the objects are marked with the duplicates flag.
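The duplicates flag follows the rule given for QZO.csv: an object counts as a duplicate when a neighbour within 1 arcsec has more ZTF observation epochs. The sketch below illustrates that rule with a brute-force pairwise comparison. It is not the authors' implementation; the function names are hypothetical, the radius is treated as inclusive by assumption, and a real catalog of this size would need a spatial index instead of an all-pairs loop.

```python
import math

ARCSEC_PER_RADIAN = math.degrees(1.0) * 3600.0


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcsec of two positions in degrees (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return 2 * math.asin(math.sqrt(min(1.0, h))) * ARCSEC_PER_RADIAN


def flag_duplicates(objects, radius_arcsec=1.0):
    """Flag objects that have a close neighbour with more observation epochs.

    objects : list of (ra_deg, dec_deg, n_obs) tuples
    Returns a list of booleans in the same order as the input.
    """
    flags = []
    for i, (ra_i, dec_i, n_i) in enumerate(objects):
        flags.append(any(
            j != i
            and n_j > n_i  # strictly more epochs, so ties keep both objects
            and separation_arcsec(ra_i, dec_i, ra_j, dec_j) <= radius_arcsec
            for j, (ra_j, dec_j, n_j) in enumerate(objects)
        ))
    return flags
```

With this rule the light curve with the most epochs in each close group is never flagged, so filtering on the flag keeps exactly one object per group unless epoch counts are tied.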

train.csv

Predictions for the training data. This file contains 2,588,221 records; the ZTF ID and the duplicates flag are not included. ZTF duplicates were removed by selecting the longest ZTF light curve for each non-duplicated SDSS object.
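Selecting the longest light curve per SDSS object amounts to a group-wise maximum over n_obs. The sketch below shows the idea on plain dictionaries. The key name sdss_id is hypothetical, since the page does not list an SDSS identifier column, and the tie-breaking rule (keep the first row seen) is an assumption.

```python
def keep_longest_light_curve(rows, key="sdss_id"):
    """Keep, for each value of `key`, the row with the largest n_obs.

    rows : iterable of dicts with at least `key` and "n_obs" entries
    Ties keep the first row encountered (an assumption of this sketch).
    """
    best = {}
    for row in rows:
        current = best.get(row[key])
        if current is None or row["n_obs"] > current["n_obs"]:
            best[row[key]] = row
    return list(best.values())
```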

Catalog columns

ID                            ZTF identifier
ra                            right ascension
dec                           declination
n_obs                         number of ZTF observation epochs
is_duplicate                  flag indicating duplicated light curves
mag_median                    ZTF g-band median magnitude
p_[galaxy, QSO, star]         classification probabilities
p_WISE_[galaxy, QSO, star]    classifications with added WISE data
redshift                      redshift estimate
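Because the largest file is over 10 GB, it can be convenient to stream a catalog row by row instead of loading it whole. The sketch below uses only the standard library. It assumes the CSV files have a header row, that the bracket notation above expands to names such as p_QSO and p_WISE_QSO, and that a missing redshift appears as an empty or NaN field; none of this is confirmed on this page. The sample rows are invented for illustration.

```python
import csv
import io

FLOAT_COLUMNS = (
    "ra", "dec", "mag_median", "redshift",
    "p_galaxy", "p_QSO", "p_star",
    "p_WISE_galaxy", "p_WISE_QSO", "p_WISE_star",
)


def parse_float(text):
    """Return a float, or None when the field is empty or NaN."""
    text = (text or "").strip()
    if not text:
        return None
    value = float(text)
    return None if value != value else value  # NaN is not equal to itself


def iter_catalog(fileobj):
    """Yield one dict per catalog row with numeric columns converted."""
    for row in csv.DictReader(fileobj):
        record = dict(row)
        for name in FLOAT_COLUMNS:
            if name in record:
                record[name] = parse_float(record[name])
        if record.get("n_obs"):
            record["n_obs"] = int(float(record["n_obs"]))
        yield record


# Invented sample with a subset of the columns; the second row has no photo-z.
sample = io.StringIO(
    "ID,ra,dec,n_obs,mag_median,p_QSO,redshift\n"
    "1,150.1,2.2,120,19.8,0.97,1.35\n"
    "2,150.3,2.4,45,20.6,0.93,\n"
)
records = list(iter_catalog(sample))
```

For a real file, the same generator can be fed with `open("QZO.csv", newline="")` in place of the in-memory sample.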

ANN_clf.[data-00000-of-00001, index]

ANN model for classification of ZTF g-band light curves. The model is trained on ZTF g-band data with at least 20 observation epochs per light curve. Input light curves do not need to be scaled beforehand; scaling is done separately for each light curve as part of the transformer model. An example of how to load and use the ANN can be found in the script “run_inference.py” in the GitHub repository.

XGB_clf__ZTF_[PS, WISE, GAIA, PS_WISE, PS_GAIA, WISE_GAIA, PS_WISE_GAIA].pickle

XGB_z__ZTF_WISE.pickle

XGB classification and redshift models for different combinations of input surveys. The XGB classification models are trained on all ZTF data with an available ANN classification, so they learn to classify objects with missing features. The XGB redshift model does not use the ANN classification as a feature. An example of how to load and use the XGB models can be found in the script “run_inference_XGB.py” in the GitHub repository.
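The classifier file names encode which surveys are combined with ZTF, always in the order PS, WISE, GAIA. The helper below is a hypothetical convenience for picking the matching file name from a set of available surveys; it only reproduces the naming pattern listed above and does not load the model, for which the repository script remains the reference.

```python
SURVEY_ORDER = ("PS", "WISE", "GAIA")


def xgb_clf_filename(available_surveys):
    """Build the classifier file name for the given extra surveys.

    available_surveys : iterable containing any of "PS", "WISE", "GAIA"
    """
    surveys = set(available_surveys)
    unknown = surveys - set(SURVEY_ORDER)
    if unknown:
        raise ValueError(f"unknown surveys: {sorted(unknown)}")
    chosen = [name for name in SURVEY_ORDER if name in surveys]
    if not chosen:
        # The file list contains no ZTF-only classifier.
        raise ValueError("at least one of PS, WISE, GAIA is required")
    return "XGB_clf__ZTF_" + "_".join(chosen) + ".pickle"
```

For example, `xgb_clf_filename(["WISE", "PS"])` gives `XGB_clf__ZTF_PS_WISE.pickle` regardless of the order in which the surveys are passed.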

Features order

ZTF     g_mag_median, p_ANN_galaxy, p_ANN_QSO, p_ANN_star
PS      g, r, i, z, g - r, g - i, g - z, r - i, r - z, i - z
WISE    W1, W2, W3, W4, W1 - W2, W1 - W3, W1 - W4, W2 - W3, W2 - W4, W3 - W4
GAIA    g_mean_mag, parallax, pmra, pmdec, bp_mean_mag, rp_mean_mag, bp_rp_excess_factor

The exact column names can be found in the script “features.py” in the GitHub repository.
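The PS and WISE feature lists follow one pattern: the magnitudes in band order, followed by every pairwise color in the same order. A short sketch that reproduces this ordering is given below. The function names are hypothetical, and the column names the models actually expect are those in “features.py”, not the labels generated here.

```python
from itertools import combinations


def with_colors(bands):
    """Band labels followed by all pairwise color labels, e.g. 'g - r'."""
    return list(bands) + [f"{a} - {b}" for a, b in combinations(bands, 2)]


def mags_and_colors(mags):
    """Magnitudes followed by all pairwise differences, in the same order."""
    return list(mags) + [a - b for a, b in combinations(mags, 2)]


ps_names = with_colors(["g", "r", "i", "z"])
wise_names = with_colors(["W1", "W2", "W3", "W4"])
```

`itertools.combinations` emits pairs in the order of the input sequence, which is why the generated labels match the listed order (g - r, g - i, g - z, r - i, r - z, i - z).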

Files (11.9 GB)

Checksum                                Size
md5:0ebd913517a990e5603f4ba4edf88772    19.0 MB
md5:01b35b8d41c89461c354ac8cdae34bda    9.6 kB
md5:18c1faaedf0e17e6101967e7a66952bf    19.0 MB
md5:1a92f82ec1b6073a8121cc452a738fa6    9.6 kB
md5:7f39e62d4beda90ecead2f1c008b118c    672.0 MB
md5:b87d9e64e9a2bab022ab3ddddc1ba4d8    301.1 MB
md5:c2709406957f055ffdc540b456d6855e    7.7 MB
md5:987df660e7bddbed9fecade7f0ae0754    13.1 MB
md5:82d98745b0e86fd066fc4c7fa8406b02    29.2 MB
md5:f5ae56bd03e1e8e17a07e8e653de5d5b    19.9 MB
md5:397875271874e78736377d634b8b6d0d    22.7 MB
md5:27adaf52b13210d57309f72185a46197    17.9 MB
md5:0a01540d84f7df67dd05b66e0a8c943d    12.5 MB
md5:c1a58b2abf365ef4d91e25def0af0d88    12.0 MB
md5:1ec4394f62b4c648708bb41c5dc57175    16.5 MB
md5:a397505b4b4d38cc6b4847fb052989f4    10.8 GB
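A downloaded file can be checked against its listed md5 checksum before use, which is worthwhile for the multi-gigabyte files. The sketch below reads the file in chunks so that it does not have to fit in memory. The function names are hypothetical, and which checksum belongs to which file has to be taken from the record itself.

```python
import hashlib
import io


def md5_of(fileobj, chunk_size=1 << 20):
    """Return the hexadecimal md5 digest of a binary file object."""
    digest = hashlib.md5()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(fileobj, listed):
    """Compare a file with a checksum written as 'md5:<hex digest>'."""
    expected = listed.strip().split(":", 1)[-1].lower()
    return md5_of(fileobj) == expected


# Self-check on empty input, whose md5 digest is well known.
empty_ok = matches_checksum(io.BytesIO(b""), "md5:d41d8cd98f00b204e9800998ecf8427e")
```

For a file on disk, open it in binary mode, for example `with open(path, "rb") as f: matches_checksum(f, listed_checksum)`.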

Additional details

Funding

U.S. National Science Foundation
AST-2108402

Software

Repository URL
https://github.com/snakoneczny/ztf-agn
Programming language
Python