Data Acquisition

Note that the code in Figures.ipynb will not run unless the relevant data has been downloaded, processed and stored in compatible formats. So this notebook must be run before the figures from the paper (https://arxiv.org/abs/2105.02234) can be generated.

[Note: Some relevant post-processed data is also available in the repository.]

If using this data, you can skip steps 3 and 5 in this notebook. Scroll to the end of the notebook (look under Some Shortcuts) for more information on how to use the included data.

0. Get API key

Downloading TNG data requires an API key from the IllustrisTNG server. If this is your first time working with TNG data, visit https://www.tng-project.org/data/docs/api/ and click on New User Registration to create a user account and request an API key.

Once in possession of your API key, navigate to simulation_data.__init__ and set the variable API to your API key as a string, e.g. API = "ThisIsMyAPIKeyForIllustrisTNG".

You do not need to repeat this step on later runs unless your API key changes or the variable API is not set permanently.
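
For reference, the get helper imported in the next step wraps HTTP requests to the TNG API and attaches your key to each request. The exact contents of simulation_data.__init__ may differ, but a minimal sketch in the spirit of the TNG API tutorial looks like this:

import requests

API = "ThisIsMyAPIKeyForIllustrisTNG"  # replace with your own API key

def get(path, params=None):
    # send an HTTP GET request to the TNG server, passing the API key in the header
    r = requests.get(path, params=params, headers={"api-key": API})
    r.raise_for_status()
    if r.headers['content-type'] == 'application/json':
        return r.json()  # parse JSON responses (subhalo details, etc.) into dicts
    return r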

1. Import custom functions

First, import the necessary functions. Make sure the custom module simulation_data can be found on your local path; it is required for generating the figures from the paper.

In [ ]:
from simulation_data import get
In [ ]:
import numpy as np
import h5py

from simulation_data.galaxies import GalaxyPopulation
my_galaxy_population = GalaxyPopulation()
from simulation_data.galaxies.galaxy import get_galaxy_particle_data, get_stellar_assembly_data

2. Download particle data

Now, download targeted particle data only for galaxies within the specified mass cut at the specified redshift. This cell downloads particle data for all $z=2$ galaxies with $10^{10.5} \leq M_{*}/M_{\odot} \leq 10^{12}$.

Remember to check the path specified in the function get_galaxy_particle_data in simulation_data.galaxies.galaxy and make sure it points to a valid local drive.

Within the target drive, create a folder named 'redshift_'+str(redshift)+'_data' before running this cell (see the sketch below).

Note that get_stellar_assembly_data needs a pre-existing stellar assembly file to run. Generating Figures 4 and 5 in the Figures notebook partly depends on the stellar assembly files for $z=2$.
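
For example, assuming the data is written to the current working directory (adjust the base path to whatever get_galaxy_particle_data actually points to), the folder can be created as follows:

In [ ]:
import os

redshift = 2
# create the target folder for this redshift, e.g. 'redshift_2_data'
os.makedirs('redshift_' + str(redshift) + '_data', exist_ok=True)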

In [ ]:
redshift = 2
# this initializes the values in simulation_data.galaxies.galaxy_population
ids = my_galaxy_population.select_galaxies(redshift=redshift, mass_min=10.5, mass_max=12)

#this gets and saves the particle data for each galaxy in our selection
for idx in ids:
    get_galaxy_particle_data(id=idx, redshift=redshift, populate_dict=False)
    # Download Stellar Assembly Files for the chosen redshift before attempting to get particle assembly data
    get_stellar_assembly_data(id=idx, redshift=redshift, populate_dict=False)

3. Find Main Progenitor and Descendant IDs

We now get the ids of the progenitors and descendants of the population identified at $z=2$.

This example finds the main progenitor for each galaxy at $z=3$ and the descendant at $z=1.5$. It saves the arrays of ids at each redshift in the file redshift_ids.hdf5.

Note that this step may be time-consuming. The intermediate print statements are not required.

In [ ]:
z2_ids = ids
z3_ids = np.array([-1]*(len(ids)))
z1_5_ids = np.array([-1]*(len(ids)))


#Finding the progenitors at z=3   
count = 0
print('z=3', z3_ids)
for i, id in enumerate(z2_ids):
    if z3_ids[i] == -1:
        start_url = "http://www.tng-project.org/api/TNG100-1/snapshots/33/subhalos/" + str(id)
        sub = get(start_url)  
        while sub['prog_sfid'] != -1:
            # request the full subhalo details of the progenitor by following the sublink URL
            sub = get(sub['related']['sublink_progenitor'])
            if sub['snap'] == 25:
                z3_ids[i] = sub['id']
                break  # stop walking the tree once the z=3 progenitor is found
    count += 1
    print(count)
with h5py.File('redshift_ids.hdf5', 'a') as f:
    d1 = f.create_dataset('z3_ids', data = z3_ids)
    d2 = f.create_dataset('z2_ids', data = z2_ids)
    
#Finding the descendants at z=1.5
count = 0
print('z=1.5', z1_5_ids)
for i, id in enumerate(z2_ids):
    if z1_5_ids[i] == -1:
        start_url = "http://www.tng-project.org/api/TNG100-1/snapshots/33/subhalos/" + str(id)
        sub = get(start_url)   
        while sub['desc_sfid'] != -1:
            # request the full subhalo details of the descendant by following the sublink URL
            sub = get(sub['related']['sublink_descendant'])
            if sub['snap'] == 40:
                z1_5_ids[i] = sub['id']
                break  # stop walking the tree once the z=1.5 descendant is found
    count += 1
    print(count)
with h5py.File('redshift_ids.hdf5', 'a') as f:
    d3 = f.create_dataset('z1.5_ids', data = z1_5_ids)

4. Download particle data for linked ids at different redshifts

For each new set of progenitor and descendant ids, repeat the steps from step 2 to get and save the particle data for each galaxy at the new redshift.

Remember to create a new folder for each redshift you look at (e.g. redshift_1.5_data and redshift_3_data, following the naming convention above). It is not necessary to add the stellar assembly data for these redshifts to generate the figures in the Letter.

In [ ]:
with h5py.File('redshift_ids.hdf5', 'r') as f:
    z1_5_ids = f['z1.5_ids'][:]
    z3_ids = f['z3_ids'][:]

redshift = 1.5
for idx in z1_5_ids:
    get_galaxy_particle_data(id=idx, redshift=redshift, populate_dict=False)
    
redshift = 3
for idx in z3_ids:
    get_galaxy_particle_data(id=idx, redshift=redshift, populate_dict=False)

This concludes our section on downloading data on individual halos.

The following section uses the downloaded data to calculate halo properties necessary for reproducing the figures in the Letter.

5. Calculate halo properties from particle data

To speed up analysis, we now calculate and store some halo properties in a separate hdf5 file named 'galaxy_population_data_'+str(self.redshift)+'.hdf5'.

Remember to finish downloading the individual halo data (running steps 1 and 2 and obtaining the relevant halo ids for each chosen redshift) before moving on to the step below.

In [ ]:
redshift = 2
#this initializes the values in simulation_data.galaxies.galaxy_population
ids = my_galaxy_population.select_galaxies(redshift=redshift, mass_min=10.5, mass_max=12)

#calculate halo properties and store calculated data
my_galaxy_population.get_galaxy_population_data()

Following the directions above will help you download and process IllustrisTNG data on your local machine.

You can move on to the Figures.ipynb notebook after the data has been stored in compatible formats.

Some shortcuts:

Walking down merger trees can be time-consuming, and stellar assembly files may be hard to come by or to compile from scratch, so here are some shortcuts to make life a little easier.

Relevant post-processed data included

Post-processed data generated from TNG-100 and used to create the figures in this Letter is included in galaxy_population_data_2.hdf5 and redshift_ids.hdf5.

Each array stores a halo property for the individual halos within our chosen mass cut at $z=2$, arranged in the order of the ids in 'ids'. These properties can also be easily recalculated from raw TNG-100 data and stored in the same format by following the steps above.

This data is stored in a format compatible with the code in Figures.ipynb. However, individual halo files are not included; following the steps above will help you download particle data for galaxies directly from the IllustrisTNG public release.
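
As a quick sanity check before moving on to Figures.ipynb, the included files can be opened with h5py to list the stored arrays and load the linked ids. The cell below is a minimal sketch that assumes both hdf5 files sit in the working directory.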

In [ ]:
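import h5py

# list the halo-property arrays stored in the post-processed data file
with h5py.File('galaxy_population_data_2.hdf5', 'r') as f:
    print(list(f.keys()))

# load the z=2 selection and its linked progenitor/descendant ids
with h5py.File('redshift_ids.hdf5', 'r') as f:
    z2_ids = f['z2_ids'][:]
    z3_ids = f['z3_ids'][:]
    z1_5_ids = f['z1.5_ids'][:]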