# Electronic Tag (etag) Data

This workflow handles electronic tag data that have associated spatial information (e.g. a track). The data consist of the locations a tagged animal traveled between where it was tagged and where the tag popped off (pop-up tags) or stopped transmitting (satellite-linked tags). Here, we treat these data as "presence" data that can subsequently be used to build species distribution models (SDMs) that quantify habitat suitability for a particular species.

The data generally contain latitude and longitude (with error/uncertainty metrics for each) and date. Tracks for an individual tagged fish typically span weeks to months (and, in rare cases, a few years). These data are independent of fishing effort but are influenced by tagging effort, and positions along a track are heavily autocorrelated. The data come from several projects and PIs who have agreed to contribute to this effort.

Electronic tags are an excellent source of fishery-independent data on where these species are and can thus be used, with caution, for quantifying habitat suitability and ultimately predicting habitats.

## Pipeline overview

![etag](./images/etag.drawio.svg)

The pipeline is defined in [`dvc.yaml`](../../dvc.yaml) and runs locally with DVC inside a [Docker container](./Dockerfile) built by the `etag` service defined in [`docker-compose.yaml`](../../docker-compose.yaml), with `data/` mapped to `/data/`.

## Elements of Pipeline

The input data for this pipeline have already been standardized to the eTUFF format, which defines the file layout along with the associated metadata and observation types. Conversion of raw tag data to eTUFF may eventually be added to this workflow, but for now the pipeline uses eTUFF files as its starting point. For more information on converting raw tag data to eTUFF, see the [`tags2etuff`](https://github.com/camrinbraun/tags2etuff) repo on GitHub.

To get and prepare this data for use in our modeling framework, this etag-specific workflow contains 3 main processing steps:

- Per tagged individual
    - `etag_track` - This step reads eTUFF files and standardizes the associated trajectory data they contain to a user-defined temporal resolution.
- `etag_combine` - Next, the per-individual electronic tag data is combined into a master etag dataset for subsequent use.
- `etag_pseudoabs_bg` - Finally, pseudoabsences are generated for the cleaned and combined etag data as this is a necessary step prior to building species distribution models (SDMs) for this data. For now, background sampling is used to generate pseudoabsences although other methods are in development.

> ### Instrument names (i.e. individual identifiers)
>
> Each tagged individual is assigned a unique instrument name. All the associated metadata for that tag deployment are contained in its eTUFF file.
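
Because the instrument name is the key that links each eTUFF file to its downstream outputs, a small Python sketch of the naming convention may help (the pipeline itself uses R scripts; Python is used here only for illustration). It assumes, hypothetically, that file names follow the `<instrument name>_eTUFF.txt` and `<instrument name>_eTUFF_track.csv` patterns shown in the directory trees of the ETAG TRACK step. The helpers `instrument_name` and `track_path` are illustrative names, not functions in this repo.

```python
from pathlib import PurePosixPath

# Hypothetical naming convention, inferred from the directory trees in this README.
ETUFF_SUFFIX = "_eTUFF.txt"
TRACK_SUFFIX = "_eTUFF_track.csv"


def instrument_name(etuff_path: str) -> str:
    """Return the instrument name encoded in an eTUFF file name."""
    name = PurePosixPath(etuff_path).name
    if not name.endswith(ETUFF_SUFFIX):
        raise ValueError(f"not an eTUFF file name: {name}")
    return name[: -len(ETUFF_SUFFIX)]


def track_path(etuff_path: str, out_dir: str = "/data/etuff/track") -> str:
    """Map an input eTUFF file to the per-individual track CSV it would produce."""
    return str(PurePosixPath(out_dir) / (instrument_name(etuff_path) + TRACK_SUFFIX))


src = "/data/etuff/etag/instrument-name-1_eTUFF.txt"
print(instrument_name(src))  # instrument-name-1
print(track_path(src))       # /data/etuff/track/instrument-name-1_eTUFF_track.csv
```

`PurePosixPath` is used so the paths stay in the container's POSIX form regardless of the host OS.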

### ETAG TRACK

This step generates standardized trajectories for the individual etag data.

#### Step

This step (`etag_track.r`) runs once for each tagged individual.

It takes individual eTUFF files as input.

We expect our input path for the data to look like:

```sh
/data/
    etuff/
        etag/
            instrument-name-1_eTUFF.txt
            instrument-name-2_eTUFF.txt
        ...
```

The output is one trajectory per tagged individual, standardized in space and time for subsequent use in SDM building. The output path looks like:

```sh
/data/
    etuff/
        track/
            instrument-name-1_eTUFF_track.csv
            instrument-name-2_eTUFF_track.csv
        ...
```
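
The exact standardization performed by `etag_track.r` is not described here. As a rough illustration only, the Python sketch below shows one simple way to put irregular position fixes onto a user-defined temporal resolution: bin the fixes into fixed-width time steps and average the positions in each bin. The function `standardize_track` and its `(time, lat, lon)` input are hypothetical. The sketch ignores the uncertainty metrics, and the real step may interpolate, weight by error, or treat gaps differently.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def standardize_track(points, resolution=timedelta(days=1)):
    """Bin (time, lat, lon) fixes into fixed-width time steps and average each bin.

    Returns (bin_start, mean_lat, mean_lon) tuples sorted by time. Bins with
    no fixes are skipped. A plain mean of longitudes is wrong for tracks that
    cross the antimeridian (+/-180 degrees); this sketch does not handle that.
    """
    if not points:
        return []
    # Anchor bins at midnight of the first fix so daily bins align with dates.
    origin = min(t for t, _, _ in points).replace(hour=0, minute=0, second=0, microsecond=0)
    bins = defaultdict(list)
    for t, lat, lon in points:
        k = (t - origin) // resolution  # integer index of the time bin
        bins[k].append((lat, lon))
    out = []
    for k in sorted(bins):
        lats, lons = zip(*bins[k])
        out.append((origin + k * resolution, mean(lats), mean(lons)))
    return out


fixes = [
    (datetime(2020, 5, 1, 3), 40.0, -70.0),
    (datetime(2020, 5, 1, 15), 41.0, -71.0),
    (datetime(2020, 5, 3, 12), 42.0, -72.0),
]
daily = standardize_track(fixes)
# -> [(datetime(2020, 5, 1, 0, 0), 40.5, -70.5), (datetime(2020, 5, 3, 0, 0), 42.0, -72.0)]
```

Passing a different `resolution` (e.g. `timedelta(hours=6)`) changes the time step. A production version would also need an explicit policy for days without fixes.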

### ETAG COMBINE

This stage combines the standardized per-individual trajectories into one aggregated `.csv`.

#### Pipeline

This stage takes as input the per-individual `.csv` files produced by the `etag_track` stage, each of which is ready for downstream analyses.

The pipeline output is an aggregated `.csv` containing all etag data. Output path looks like:

```sh
/data/
    etag/
        combined_tags.csv
```
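
In essence, the combine step concatenates the per-individual track files into one table. A minimal Python sketch of that idea follows, assuming, hypothetically, that all track files share the same columns and that the instrument name can be recovered from the `_eTUFF_track.csv` file name when the file lacks an `instrument_name` column. `combine_tracks` is an illustrative name, not a function in this repo.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical: per-individual files follow the etag_track output naming above.
SUFFIX = "_eTUFF_track.csv"


def combine_tracks(track_dir: str, out_path: str) -> int:
    """Concatenate per-individual track CSVs into one CSV; return the row count."""
    files = sorted(Path(track_dir).glob(f"*{SUFFIX}"))
    header, writer, n_rows = None, None, 0
    with open(out_path, "w", newline="") as out:
        for f in files:
            name = f.name[: -len(SUFFIX)]  # instrument name taken from the file name
            with open(f, newline="") as fh:
                reader = csv.DictReader(fh)
                fields = list(reader.fieldnames or [])
                if "instrument_name" not in fields:
                    fields = ["instrument_name"] + fields
                if writer is None:
                    header = fields
                    writer = csv.DictWriter(out, fieldnames=header)
                    writer.writeheader()
                elif fields != header:
                    raise ValueError(f"{f.name}: columns differ from {files[0].name}")
                for row in reader:
                    row.setdefault("instrument_name", name)  # keep an existing value
                    writer.writerow(row)
                    n_rows += 1
    return n_rows


# Usage on two made-up track files in a temporary directory.
with tempfile.TemporaryDirectory() as d:
    demo = {"tag-A": [["2020-05-01", "40.5", "-70.5"]],
            "tag-B": [["2020-06-01", "30.0", "-60.0"], ["2020-06-02", "30.5", "-60.5"]]}
    for tag, rows in demo.items():
        with open(Path(d) / f"{tag}{SUFFIX}", "w", newline="") as fh:
            w = csv.writer(fh)
            w.writerow(["date", "lat", "lon"])
            w.writerows(rows)
    n = combine_tracks(d, str(Path(d) / "combined_tags.csv"))
    with open(Path(d) / "combined_tags.csv", newline="") as fh:
        combined = list(csv.DictReader(fh))

print(n, [r["instrument_name"] for r in combined])  # 3 ['tag-A', 'tag-B', 'tag-B']
```

Failing loudly on mismatched columns is a deliberate choice here: silently aligning different schemas would hide upstream problems in individual tracks.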

### ETAG PSEUDOABS (BG)

This stage generates pseudoabsences for the etag data using a generic background pseudoabsence sampling tool for `R`. See `./tools/generate_pseudoabs/` for details.

#### Pipeline

This stage takes in the combined, standardized etag data produced by the `etag_combine` stage (`/data/etag/combined_tags.csv`). It splits the combined data by an index (provided as an optional argument), in this case `instrument_name` (i.e. the tag ID), and then generates random "draws" from within the spatiotemporal limits of the input data to reach the desired ratio of pseudoabsence to presence data (optional `--abs_ratio` argument). For further details, see `./tools/generate_pseudoabs/` and `./R/sp_random.r`, the core function that does the work.

The output is an aggregated `.csv` containing the same data as the `etag_combine` output, with the generated pseudoabsences added. Output path looks like:

```sh
/data/
    etag/
        with_pseudoabs_bg.csv
```
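
To make the background-sampling idea concrete, here is a hedged Python sketch. It is not a port of `sp_random.r`. It assumes the spatiotemporal limits are the per-individual bounding box (min/max latitude, longitude, and date) after splitting by `instrument_name`, and that `abs_ratio` pseudoabsences are drawn per presence. The function name `background_pseudoabs`, the `presence` column, and the uniform sampling are assumptions for illustration. The real tool may mask land, sample differently, or use other columns.

```python
import random
from collections import defaultdict
from datetime import date, timedelta


def background_pseudoabs(rows, abs_ratio=1, index="instrument_name", seed=None):
    """Add background pseudoabsences to presence rows.

    rows: dicts with keys `index`, 'date' (datetime.date), 'lat', 'lon'.
    Returns presences (presence=1) plus round(abs_ratio * n) pseudoabsences
    (presence=0) per group, drawn uniformly within that group's lat/lon box
    and date range. Uniform in lat/lon is not uniform in area on a sphere.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for r in rows:
        groups[r[index]].append(r)  # split by the index column
    out = []
    for key, grp in groups.items():
        lats = [r["lat"] for r in grp]
        lons = [r["lon"] for r in grp]
        dates = [r["date"] for r in grp]
        d0, span = min(dates), (max(dates) - min(dates)).days
        out.extend({**r, "presence": 1} for r in grp)
        for _ in range(round(abs_ratio * len(grp))):
            out.append({
                index: key,
                "date": d0 + timedelta(days=rng.randint(0, span)),
                "lat": rng.uniform(min(lats), max(lats)),
                "lon": rng.uniform(min(lons), max(lons)),
                "presence": 0,
            })
    return out


presences = [
    {"instrument_name": "tag-A", "date": date(2020, 5, 1), "lat": 40.0, "lon": -70.0},
    {"instrument_name": "tag-A", "date": date(2020, 5, 4), "lat": 42.0, "lon": -68.0},
    {"instrument_name": "tag-B", "date": date(2021, 1, 10), "lat": 30.0, "lon": -60.0},
]
result = background_pseudoabs(presences, abs_ratio=3, seed=42)
abs_rows = [r for r in result if r["presence"] == 0]
print(len(result), len(abs_rows))  # 12 9
```

Sampling within each individual's own limits keeps pseudoabsences close to where that animal could plausibly have been. The trade-off is that an individual with a single fix, like `tag-B` above, gets pseudoabsences at exactly its presence location and date.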

This is the end of the etag-specific workflow. The output from `etag_pseudoabs_bg` is used as input to the `../etag-hycom/` part of the pipeline.