# Fit BRT Model

Here, we model the probability of species presence as a function of environmental variables using boosted regression trees (BRTs), a species distribution modelling framework. Our goal is to fit BRTs to presence/absence data from a particular type of observation (such as fishery or tagging data) to capture the environmental conditions associated with species presence or absence. Ultimately, we aim to use these relationships to predict the probability of a species' presence, or the likely suitability of a given habitat, from its environmental characteristics.

Our input data consists of `.csv` files containing presence and (pseudo)absence data for the species of interest, together with the response variable and associated predictor variables such as bathymetry and sea surface temperature (see config).

## Workflow overview

![`model-brt`](./model-brt.drawio.svg)

## Elements of workflow

To prepare the data and fit the models, this BRT workflow contains two main processing pipelines:

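- [`model_split_by_species`](./model_split_by_species.pipeline.json) - splits the combined input data into separate files and directories by species.
- [`model_fit_brt`](./model_fit_brt.pipeline.json) - fits a BRT model from the per-species data and a model-specific configuration file, and outputs the fitted model as an R-native `.RDS` file.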


### Splitting data for modeling by species

Models are fit on a per-species basis, so the combined input data needs to be split back up by species.

![`model_split_by_species`](./model_split_by_species.drawio.svg)

#### The script

See `/python-tools` for instructions on usage of `split_generic.py`.

#### Pipeline

This pipeline takes the single output CSV from `combine_iccat_hycom`
and splits it into separate files and directories by the `SpeciesCode` column.

```json
{
    "pipeline": {
        "name": "model_split_by_species"
    },
    "input": {
        "pfs": {
            "repo": "combine_iccat_hycom",
            "glob": "/"
        }
    },
    "transform": {
        "image": "gmri/nasa-facet-py",
        "cmd": [
            "./split_generic.py",
            "/pfs/combine_iccat_hycom/iccat_with_hycom.csv",
            "/pfs/out/",
            "SpeciesCode"
        ]
    },
    "standby": true
}
```
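
To illustrate the kind of split this pipeline performs, here is a minimal Python sketch that uses only the standard library. It is not the actual `split_generic.py` implementation (see `/python-tools` for that). The function name `split_csv_by_column` and the output layout `<out_dir>/<value>/<value>.csv` are assumptions made for illustration.

```python
import csv
from pathlib import Path


def split_csv_by_column(csv_path, out_dir, column):
    """Write one CSV per distinct value of `column`, each in its own subdirectory.

    Returns a dict mapping each value to the number of rows written for it.
    """
    out_dir = Path(out_dir)
    writers = {}  # column value -> csv.DictWriter
    handles = {}  # column value -> open file handle
    counts = {}   # column value -> rows written
    with open(csv_path, newline="") as src:
        reader = csv.DictReader(src)
        try:
            for row in reader:
                key = row[column]
                if key not in writers:
                    # Assumed layout (hypothetical): <out_dir>/<value>/<value>.csv
                    target_dir = out_dir / key
                    target_dir.mkdir(parents=True, exist_ok=True)
                    handle = open(target_dir / f"{key}.csv", "w", newline="")
                    writer = csv.DictWriter(handle, fieldnames=reader.fieldnames)
                    writer.writeheader()
                    handles[key] = handle
                    writers[key] = writer
                    counts[key] = 0
                writers[key].writerow(row)
                counts[key] += 1
        finally:
            # Close every output file, even if a row fails part-way through.
            for handle in handles.values():
                handle.close()
    return counts
```

Under these assumptions, calling `split_csv_by_column("iccat_with_hycom.csv", "out/", "SpeciesCode")` would create one subdirectory per species code, each holding the rows for that species with the original header.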

### Fitting BRT models

Models are fit using a model-specific configuration file that provides all of the model settings. The resulting fitted model is output as an R-native `.RDS` file.

![`model_fit_brt`](./model_fit_brt.drawio.svg)

#### The script

The implementation of this pipeline is given in `./model_fit_brt.pipeline.json`. See the `./model_fit_brt/R/` directory for instructions on `model_fit_brt.r`, the script this pipeline calls to do the work.

#### Pipeline

Arguments:
  
- `in_dir` - path to a directory containing CSV files of presence-absence data. The minimum required columns are the response variable (e.g. observation type) and the predictor variable(s) named in the model config
- `config_dir` - path to the directory containing the model config as a CSV
- `out_dir` - directory to write the model results to, as R-native `.RDS`
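
Because the input CSVs must contain at least the response and predictor columns, it can help to check a file's header before fitting. The following is a minimal Python sketch of such a check. It is not part of `model_fit_brt.r`; the helper name `missing_columns` and the column names in the usage note are hypothetical.

```python
import csv


def missing_columns(csv_path, response, predictors):
    """Return the required column names that are absent from the CSV header."""
    with open(csv_path, newline="") as src:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(src), [])
    required = [response, *predictors]
    return [name for name in required if name not in header]
```

For example, `missing_columns("BET.csv", "presence", ["sst", "bathymetry"])` returns an empty list when all three (hypothetical) columns are present, and otherwise lists the ones that are missing.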


Run the script like this:
`$ ./model_fit_brt.r ./model_split_by_species/BET/ /model_config_brt/BET/ ./out/`

or in `.yaml` as:
```yaml
command:
      - ./model_fit_brt.r
      - /pfs/model_split_by_species/BET/
      - /pfs/model_config_brt/BET/
      - /pfs/out/
```
