# How To Use

Thank you for your interest in the ABC Database. This notebook is a tutorial on how to handle the level 1 data. For a description of the level 2 data, please refer to the [Ariel Data Challenge website](https://www.ariel-datachallenge.space/2022/).

Author: Kai Hou (Gordon) Yip

email: kai.hou.yip@ucl.ac.uk

last edit: 28/06/2022

In [None]:
import pandas as pd
import numpy as np
import h5py
from helper import extract_instrument_data

## `all_data.csv`

In [None]:
## you can open the file via pandas
all_data = pd.read_csv("Level1Data/all_data.csv")

In [None]:
all_data.head(1)

There are 27 columns in total, which fall into two categories of information:

- Model parameters: input values for temperature, H2O, CO2, CH4, CO and NH3. These columns are suffixed with "_model".
- Stellar and planetary parameters: supplementary information about the planetary system.

To access the information about a given planet, look up its row in the DataFrame.
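
As a minimal sketch of such a lookup, the cell below builds a toy table instead of reading the real file. Only the `planet_ID` column and the `_model` suffix are taken from this notebook; the other column names and all values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for all_data.csv. Only "planet_ID" and the "_model" suffix come
# from the tutorial; the other column names and all values are hypothetical.
toy_data = pd.DataFrame(
    {
        "planet_ID": [0, 1, 2],
        "star_temperature": [5200.0, 4800.0, 6100.0],  # hypothetical column
        "planet_temp_model": [1200.0, 950.0, 1800.0],  # hypothetical column
        "log_H2O_model": [-4.0, -6.5, -3.2],  # hypothetical column
    }
)

# Select the row for one planet by its ID.
planet_1 = toy_data.loc[toy_data["planet_ID"] == 1].iloc[0]

# Collect every column that carries the "_model" suffix.
model_columns = [c for c in toy_data.columns if c.endswith("_model")]
model_inputs = toy_data.set_index("planet_ID")[model_columns]

print(planet_1["star_temperature"])
print(model_inputs.loc[1])
```

The same pattern should carry over to `all_data` once the actual column names have been checked with `all_data.columns`.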

## `observations.hdf5`
Contains simulated observations of each planet instance. The file is organised in a nested structure: under each planet ID you will find spectroscopic information about the observation.


In [None]:
observation = h5py.File("Level1Data/observations.hdf5", 'r')

In [None]:
print("total number of data instances:",len(observation.keys()))

In [None]:
## to access the observation of a particular data instance
observation['Planet_0']['instrument_spectrum'][:]
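
To see what is stored under a planet before reading it, one option is to list the datasets in its group. The sketch below uses a small in-memory HDF5 file filled with random placeholder values, so it runs without the real data. The dataset names and the length of 52 follow the ones used elsewhere in this notebook, and the file name `toy_observations.hdf5` is arbitrary.

```python
import h5py
import numpy as np

# Build a tiny in-memory HDF5 file that mimics the nested layout.
# The values are random placeholders, not real observations.
rng = np.random.default_rng(0)
toy_file = h5py.File("toy_observations.hdf5", "w", driver="core", backing_store=False)
for planet_id in range(2):
    group = toy_file.create_group(f"Planet_{planet_id}")
    for name in ("instrument_wlgrid", "instrument_spectrum",
                 "instrument_noise", "instrument_width"):
        group.create_dataset(name, data=rng.random(52))

# List the datasets stored under one planet, with their shapes.
planet = toy_file["Planet_0"]
layout = {name: planet[name].shape for name in planet.keys()}
print(layout)

# Indexing with [:] reads the dataset into a NumPy array.
spectrum = planet["instrument_spectrum"][:]
print(type(spectrum), spectrum.shape)

toy_file.close()
```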

You can also convert them to a matrix if you need to manipulate the observations in bulk.

In [None]:
def to_observed_matrix(data_file,aux_file):
    # Careful: the order of entries in the data file is scrambled, so we align them with the planet_ID order from the aux file.
    num = len(data_file.keys())
    id_order = aux_file['planet_ID'].to_numpy()
    observed_spectrum = np.zeros((num,52,4))

    for idx, x in enumerate(id_order):
        current_planet_id = f'Planet_{x}'
        instrument_wlgrid = data_file[current_planet_id]['instrument_wlgrid'][:]
        instrument_spectrum = data_file[current_planet_id]['instrument_spectrum'][:]
        instrument_noise = data_file[current_planet_id]['instrument_noise'][:]
        instrument_wlwidth = data_file[current_planet_id]['instrument_width'][:]
        observed_spectrum[idx,:,:] = np.concatenate([instrument_wlgrid[...,np.newaxis],
                                            instrument_spectrum[...,np.newaxis],
                                            instrument_noise[...,np.newaxis],
                                            instrument_wlwidth[...,np.newaxis]],axis=-1)
    return observed_spectrum

In [None]:
observed_matrix = to_observed_matrix(observation,all_data)
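
The resulting array has shape `(number of planets, 52, 4)`, and the last axis follows the concatenation order in `to_observed_matrix`: wavelength grid, spectrum, noise, bin width. A minimal sketch of slicing it, using a random placeholder array in place of `observed_matrix`; the signal-to-noise summary is only an illustrative bulk operation and not part of the original tutorial.

```python
import numpy as np

# Placeholder with the same layout as observed_matrix:
# (planets, 52 wavelength bins, 4 channels). Values are random stand-ins.
rng = np.random.default_rng(0)
toy_matrix = rng.uniform(0.1, 1.0, size=(3, 52, 4))

# The last axis follows the concatenation order used in to_observed_matrix.
wlgrid = toy_matrix[:, :, 0]
spectrum = toy_matrix[:, :, 1]
noise = toy_matrix[:, :, 2]
wlwidth = toy_matrix[:, :, 3]

# Illustrative bulk operation: mean signal-to-noise ratio per planet.
snr_per_planet = (spectrum / noise).mean(axis=1)
print(snr_per_planet.shape)
```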

## `all_target.hdf5`
Contains outputs from atmospheric retrievals. All retrievals are performed under the same atmospheric assumption, e.g. free chemistry. Six atmospheric targets are retrieved in the process. We have also included additional information for each planet, such as the weighted quartiles for each of the six targets. The planet_ID in this file corresponds to the planet_ID in the `all_data.csv` file.

In [None]:
all_target = h5py.File("Level1Data/all_target.hdf5", 'r')

In [None]:
print("total number of data instances:",len(all_target.keys()))

To access the information of a particular instance:

In [None]:
trace = all_target['Planet_0']['tracedata'][:]

To view attributes of each planet instance:

In [None]:
print(all_target['Planet_0'].attrs.keys())
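
If the attributes are needed after the file is closed, one possible approach is to copy them into a plain dictionary. The sketch below uses an in-memory stand-in for the target file. The attribute names `ID` and `T_q2`, their values, and the number of trace samples are hypothetical placeholders; only `tracedata` and the six targets come from this notebook.

```python
import h5py
import numpy as np

# In-memory stand-in for the target file. The attribute names ("ID", "T_q2")
# and the number of trace samples (100) are hypothetical placeholders.
toy_target = h5py.File("toy_target.hdf5", "w", driver="core", backing_store=False)
group = toy_target.create_group("Planet_0")
group.create_dataset("tracedata", data=np.zeros((100, 6)))
group.attrs["ID"] = 0
group.attrs["T_q2"] = 1234.5

# Copy the attributes into a plain dict so they outlive the open file handle.
attributes = dict(toy_target["Planet_0"].attrs.items())
trace_shape = toy_target["Planet_0"]["tracedata"].shape
toy_target.close()

print(attributes)
print(trace_shape)
```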