GAMMA: Galactic Attributes of Mass, Metallicity, and Age Dataset
Description
We introduce the GAMMA (Galactic Attributes of Mass, Metallicity, and Age) dataset, a comprehensive collection of galaxy data tailored for machine learning applications. The dataset provides detailed 2D maps and 3D cubes of 11 727 galaxies, capturing three essential attributes: stellar age, metallicity, and mass.
Together with the dataset, we publish our code for extracting any other stellar or gaseous property from the raw simulation suite, so that the dataset can be extended beyond these initial properties and adapted to a variety of computational tasks. Well suited to feature extraction, clustering, and regression, GAMMA offers a unique lens for exploring galactic structures through computational methods and serves as a bridge between astrophysical simulations and the field of scientific machine learning (ML).
As a first benchmark, we apply Principal Component Analysis (PCA) to this dataset. We find that PCA captures the key morphological features of galaxies with a small number of components. We achieve a dimensionality reduction by a factor of ∼200 (∼3650) for 2D images (3D cubes) with a reconstruction error below 5%.
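To make the benchmark quantities concrete, the following is a minimal sketch of PCA compression and reconstruction using NumPy only. It does not use the GAMMA data or the authors' code: the images are synthetic low-rank stand-ins, and the helper names (`pca_fit`, `pca_reconstruct`, `relative_error`), the image size, and the number of components are illustrative assumptions. The error metric shown (relative Frobenius norm) is also one possible choice and may differ from the one used for the figures quoted above.

```python
import numpy as np


def pca_fit(X, n_components):
    """Fit PCA via SVD. X has shape (n_samples, n_features)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def pca_reconstruct(X, mean, components):
    """Project X onto the principal axes and map the scores back."""
    scores = (X - mean) @ components.T
    return scores, scores @ components + mean


def relative_error(X, X_rec):
    """Relative reconstruction error in the Frobenius norm (assumed metric)."""
    return np.linalg.norm(X - X_rec) / np.linalg.norm(X)


# Synthetic stand-in data: 200 flattened 16x16 "images" that are nearly rank 5.
rng = np.random.default_rng(0)
n_samples, side, rank = 200, 16, 5
latent = rng.normal(size=(n_samples, rank))
basis = rng.normal(size=(rank, side * side))
images = latent @ basis + 0.01 * rng.normal(size=(n_samples, side * side))

mean, components = pca_fit(images, n_components=rank)
scores, images_rec = pca_reconstruct(images, mean, components)

# Compression factor = original features per sample / retained PCA scores.
compression = images.shape[1] / scores.shape[1]
err = relative_error(images, images_rec)
print(f"compression factor: {compression:.1f}, relative error: {err:.4f}")
```

The compression factor here is simply the ratio of pixels per image to retained components, which is the same kind of ratio as the ∼200 and ∼3650 figures quoted for the 2D images and 3D cubes.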
We compute a UMAP (Uniform Manifold Approximation and Projection for Dimension Reduction) embedding of the lower-dimensional PCA scores of the 2D images to visualize the image space. An interactive version of this plot can be accessed through an online dashboard (hover over a point to see the galaxy image and the IllustrisTNG Subhalo ID).
All the code to generate this dataset and load the data structure is publicly available on GitHub, with an additional documentation page hosted on ReadTheDocs.
Files
(37.5 GB)
| Name | Size |
|---|---|
| md5:464e7d36eaecf72b599253229054c9f9 | 37.5 GB |
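Because the archive is large, it can be worth checking the download against the MD5 checksum listed above. The following is a minimal sketch using Python's standard `hashlib`; the function names and the file path passed to them are hypothetical, since the file name is not given in this listing.

```python
import hashlib

# Checksum as listed in the files table above.
EXPECTED_MD5 = "464e7d36eaecf72b599253229054c9f9"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use constant, which matters for a 37.5 GB file. A call such as `verify_download("path/to/downloaded_file")` would use a path of your own choosing.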