# Enhance ICCAT data with HYCOM

This workflow enhances ICCAT data with environmental data from HYCOM and other sources. The enhancement can be done separately for each day, which allows incremental and distributed operation.


![`iccat-hycom`](./images/iccat-hycom.drawio.svg)

## Pipeline Overview

This pipeline has three primary steps.

- First, it splits the ICCAT pseudoabsence data by date.
- Then, for each of those dates, it retrieves the environmental data and enhances the ICCAT records with both direct values (sea surface temperature, salinity...) and calculated values (buoyancy frequency...).
- Finally, the enhanced subsets of data are recombined into a single output CSV.
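The split step can be sketched as follows. This is a minimal illustration rather than the project's actual script: the function name `split_by_date` and the `date` column name are assumptions, and the output layout (`<date>/<date>.csv`) follows the `iccat_by_date/` paths shown later in this document.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_by_date(source_csv, out_dir, date_column="date"):
    """Write one CSV per distinct date as <out_dir>/<date>/<date>.csv.

    `date_column` is an assumed column name; adjust it to the real data.
    """
    rows_by_date = defaultdict(list)
    with open(source_csv, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames
        for row in reader:
            rows_by_date[row[date_column]].append(row)

    written = []
    for day, rows in sorted(rows_by_date.items()):
        day_dir = Path(out_dir) / day
        day_dir.mkdir(parents=True, exist_ok=True)
        target = day_dir / f"{day}.csv"
        with open(target, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(target)
    return written
```

Because each date ends up in its own directory, a later step can process any single day without reading the others.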

The pipeline (with a subset of data) is defined in [`dvc.yaml`](../../dvc.yaml) as `iccat_hycom_split`, `iccat_hycom_enhance`, and `iccat_hycom_combine`.

For local or remote running, the pipeline is described in [`iccat_hycom.py`](./iccat_hycom.py).

### Running locally with DVC

When running locally, the `data/` directory is mapped into the containers as `/data/`,
and DVC is configured to run each step with commands that point the scripts at the appropriate paths.

To run the full ICCAT-HYCOM pipeline, and any required preceding steps (including having Docker make sure dependencies are up to date),
run `dvc repro iccat_hycom_combine`.

### Split <img src="./images/iccat_hycom_split.drawio.svg" align="right">



### Enhance

### Combine

## Script

See `/python-tools/` for instructions on usage of `extract.py`.

## Pipeline

This pipeline consists of three steps.

### Input

For a single datum we expect our input paths to look like:

```sh
/pfs/
    iccat_by_date/
        1993-01-01/
            1993-01-01.csv
    hycom_salinity/
        1993-01-01/
            1993-01-01_salinity.nc
    hycom_sst/
        1993-01-01/
            1993-01-01_water_temp_d0.nc
    ...
```
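As a rough sketch, the layout above can be expressed as a small helper that builds the expected paths for one date and reports which are missing. The helper names `expected_inputs` and `missing_inputs` are hypothetical, and only the three inputs shown explicitly in the listing are covered.

```python
from pathlib import Path


def expected_inputs(day, root="/pfs"):
    """Return the input paths expected for one date, e.g. "1993-01-01".

    Covers only the inputs listed explicitly above; any further sources
    would need to be added by hand.
    """
    root = Path(root)
    return {
        "iccat": root / "iccat_by_date" / day / f"{day}.csv",
        "salinity": root / "hycom_salinity" / day / f"{day}_salinity.nc",
        "sst": root / "hycom_sst" / day / f"{day}_water_temp_d0.nc",
    }


def missing_inputs(day, root="/pfs"):
    """List the names of expected inputs that do not exist on disk."""
    return [
        name
        for name, path in expected_inputs(day, root).items()
        if not path.exists()
    ]
```

A check like `missing_inputs("1993-01-01")` before enhancement would make an incomplete datum fail early with a readable message instead of partway through processing.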


# Consolidating Enhanced Biological Data

Once the biological data has been combined with environmental data
for each day, it then needs to be consolidated back into a single file
that can be fed to modeling processes.

## Script

See `/python-tools/` for instructions on usage of `combine.py`.

## Pipeline

This pipeline takes in all the results from `extract_iccat_hycom` and combines them into a single file.

```json
{
    "pipeline": {
        "name": "combine_iccat_hycom"
    },
    "input": {
        "pfs": {
            "repo": "extract_iccat_hycom",
            "glob": "/"
        }
    },
    "transform": {
        "image": "gmri/nasa-facet-py",
        "cmd": [
            "./combine.py",
            "/pfs/extract_iccat_hycom",
            "/pfs/out/iccat_with_hycom.csv"
        ]
    },
    "standby": true
}
```
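For orientation, the combine step could look roughly like the sketch below. It is not the actual `combine.py`: the function name `combine_csvs` is hypothetical, and it assumes every per-day CSV shares the same header and that the output file lives outside the input directory.

```python
import csv
from pathlib import Path


def combine_csvs(input_dir, output_csv):
    """Concatenate every CSV under input_dir into output_csv.

    The header is written once, taken from the first file in sorted order.
    Returns the number of data rows written.
    """
    sources = sorted(Path(input_dir).rglob("*.csv"))
    header = None
    rows_written = 0
    with open(output_csv, "w", newline="") as out_handle:
        writer = csv.writer(out_handle)
        for source in sources:
            with open(source, newline="") as in_handle:
                reader = csv.reader(in_handle)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"Header mismatch in {source}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Sorting the sources keeps the combined file in date order when the directories are named by ISO date, and the header check prevents silently misaligned columns.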
