Introduction

This R Markdown document is designed to transform data that is not in CWP format into CWP format. Initially, it changes the format of the data; subsequently, it maps the data to adhere to CWP standards. This markdown is automatically created from the function: https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/refs/heads/master/R/tunaatlas_scripts/pre-harmonization/atlantic_ocean_effort_tunaatlasiccat_level0__noschool.R, the documentation keeps the format of roxygen2 skeleton.

A summary of the mapping process is provided. The path to the dataset is specified. You will find on this same repository on GitHub the first line of each dataset. The datasets are named after the historical name provided by tRFMOs while exporting and may change. The information provided in the Rmd allows understanding correctly which dataset should be used in this markdown.

Additional operations are performed next to verify other aspects of the data, such as the consistency of the geolocation, the values, and the reported catches in numbers and tons.

If you are interested in further details, the results and codes are available for review.

Each .Rmd script requires the user to knit the dataset at the beginning of the script in order to execute the harmonization process correctly. It is also possible to run the code chunk by chunk but be sure to be in the correct working directory (i.e., the one of the .Rmd).

path_to_raw_dataset <- here::here('R/tunaatlas_scripts/pre-harmonization', 'iccat', 'effort', 'data', 't2ce_noSchool.csv')

Harmonize Data Structure of ICCAT Effort Dataset

This function harmonizes the data structure of ICCAT effort datasets from voluminous Microsoft Access databases. It handles datasets by potentially including metadata and code lists for integration within the Tuna Atlas database. Depending on the parameter settings, either ‘fleet’ or ‘flag’ column will be kept in the output.

@param keep_fleet_instead_of_flag Boolean parameter to decide whether to keep ‘fleet’ instead of ‘flag’. @return Writes a harmonized dataset, metadata file, and code lists to CSV files. @export

@author Paul Taconet, IRD @author Bastien Grasset, IRD

@keywords International Commission for the Conservation of Atlantic Tunas (ICCAT), RFMO, Sardara Global Database

@seealso , for other ICCAT data structure conversions.

@examples

keep_fleet_instead_of_flag=FALSE

packages

if(!require(readr)){
  install.packages("readr")
  require(readr)
}
if(!require(dplyr)){
  install.packages("dplyr")
  require(dplyr)
}

Input data sample: No sample. Miscrosoft Acces DB. However after the commands that read the input DB the sample is the following: StrataID DSetID FleetID GearGrpCode GearCode FileTypeCode YearC TimePeriodID SquareTypeCode QuadID Lat Lon Eff1 Eff1Type Eff2 Eff2Type DSetTypeID CatchUnit ALB BET BFT BUM 1 1 021ES00 TP TRAP OF-REP 1950 17 1x1 4 36 5 4 NO.TRAPS NA nw kg 0 0 6725000 0 2 1 021ES00 TP TRAP OF-REP 1950 17 1x1 4 36 5 4 NO.TRAPS NA nw nr 0 0 52928 0 3 2 026YU00 PS PS OF-REP 1950 17 5x5 1 40 15 14 NO.BOATS NA -w kg 0 0 657000 0 4 3 021ES00 TP TRAP OF-REP 1951 17 1x1 4 36 5 4 NO.TRAPS NA nw kg 0 0 3072000 0 5 3 021ES00 TP TRAP OF-REP 1951 17 1x1 4 36 5 4 NO.TRAPS NA nw nr 0 0 28654 0 6 4 026YU00 PS PS OF-REP 1951 17 5x5 1 40 15 14 NO.BOATS NA -w kg 0 0 531000 0 SAI SKJ SWO WHM YFT BLF BLT BON BOP BRS CER FRI KGM KGX LTA MAW SLT SSM WAH oSmt BIL BLM MLS SBF SPF oTun BSH POR SMA MAK oSks FleetCode FleetName FlagID FlagCode 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 EU.ESP EU.España 21 EU.ESP 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 EU.ESP EU.España 21 EU.ESP 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 YUG Yugoslavia Fed. 26 YUG 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 EU.ESP EU.España 21 EU.ESP 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 EU.ESP EU.España 21 EU.ESP 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 YUG Yugoslavia Fed. 26 YUG FlagName StatusCode EU.España CP EU.España CP Yugoslavia Fed. NCO EU.España CP EU.España CP Yugoslavia Fed. NCO Effort: final data sample: Flag Gear time_start time_end AreaName School EffortUnits Effort ARG LL 1967-02-01 1967-03-01 7330040 OTH NO.HOOKS 13000 ARG LL 1967-03-01 1967-04-01 7330040 OTH NO.HOOKS 67000 ARG LL 1967-04-01 1967-05-01 7330040 OTH NO.HOOKS 107000 ARG LL 1967-05-01 1967-06-01 7330040 OTH NO.HOOKS 88000 ARG LL 1967-06-01 1967-07-01 7330040 OTH NO.HOOKS 66000 ARG LL 1967-07-01 1967-08-01 7330040 OTH NO.HOOKS 50000 # download database to get .mdb ./R/tunaatlas_scripts/pre-harmonization/data_iccat_from_mdb.R Historical name for the dataset at source t2ce_noSchool.csv

keep_fleet_instead_of_flag=FALSE  

t2ce <- as.data.frame(readr::read_csv(path_to_raw_dataset))
ICCAT_CE_species_colnames<-setdiff(colnames(t2ce),c("StrataID","DSetID",
                                                    "FleetID","GearGrpCode","GearCode","FileTypeCode","YearC","TimePeriodID","SquareTypeCode",
                                                    "QuadID","Lat","Lon","Eff1","Eff1Type","Eff2","Eff2Type","DSetTypeID","CatchUnit", "FleetCode", "FleetName", "FlagID", "FlagCode", 
                                                    "SchoolTypeCode", "FlagName", "StatusCode"))

remove duplicated lines, as displayed in kilos and numbers

eff_cols <- c("Eff1","Eff1Type","Eff2","Eff2Type")
species_cols <- ICCAT_CE_species_colnames

key_cols <- setdiff(
  colnames(t2ce),
  c("StrataID", "CatchUnit", eff_cols, species_cols)
)

unit_priority <- function(x) {
  dplyr::case_when(
    x == "kg" ~ 1L,
    x == "nr" ~ 2L,
    TRUE      ~ 3L
  )
}

t2ce <- t2ce %>%
  dplyr::mutate(.unit_rank = unit_priority(CatchUnit)) %>%
  dplyr::group_by(across(all_of(key_cols))) %>%
  dplyr::arrange(.unit_rank, .by_group = TRUE) %>%
  dplyr::slice(1) %>%
  dplyr::ungroup() %>%
  dplyr::select(-.unit_rank)

remove data displayed annualy data_pivot_ICCAT<-left_join(t2ce,Flags,by=“FleetID”) # equivalent to “select FlagCode,FlagID,t2ce.* from t2ce, Flags where t2ce.FleetID=Flags.FleetID” # Efforts # Reach the efforts pivot DSD using a function in ICCAT_functions.R # If we want in the output dataset the column ‘FleetCode’ instead of ‘flag’ source(“https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/FUN_efforts_ICCAT_CE_without_schooltype.R”) efforts_pivot_ICCAT<-FUN_efforts_ICCAT_CE_without_schooltype(RFMO_CE = t2ce,ICCAT_CE_species_colnames)

efforts_pivot_ICCAT <- t2ce %>%
  dplyr::select(
    -dplyr::all_of(ICCAT_CE_species_colnames),
    -CatchUnit
  )

source(“https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/FUN_efforts_ICCAT_CE_keep_all_efforts.R”) efforts_pivot_ICCAT<-FUN_efforts_ICCAT_CE_keep_all_efforts(efforts_pivot_ICCAT,c(“Eff1”,“Eff2”),c(“Eff1Type”,“Eff2Type”)) Keep all columns except the raw effort/unit columns, because these will be rebuilt in a long format

cols_to_keep <- setdiff(
  names(efforts_pivot_ICCAT),
  c("Eff1", "Eff2", "Eff1Type", "Eff2Type")
)

Add an identifier for each original row, so we can track which Eff1/Eff2 pairs come from the same source row

tmp <- efforts_pivot_ICCAT %>%
  dplyr::mutate(original_row_id = dplyr::row_number())

Convert Eff1 and Eff2 to numeric safely (important because some effort values may use commas instead of dots)

tmp <- tmp %>%
  dplyr::mutate(
    Eff1_num = suppressWarnings(as.numeric(gsub(",", ".", Eff1))),
    Eff2_num = suppressWarnings(as.numeric(gsub(",", ".", Eff2)))
  )

Build the Eff1 rows: these are always considered the primary effort rows

eff1_rows <- tmp %>%
  dplyr::transmute(
    dplyr::across(dplyr::all_of(cols_to_keep)),
    original_row_id,
    effort_source = "Eff1",
    Effort = Eff1_num,
    EffortUnits = Eff1Type,
    is_duplicate_strata = FALSE
  ) %>%
  # Remove empty or zero Eff1 values
  dplyr::filter(!is.na(Effort) & Effort != 0)

Build the Eff2 rows: these may either replace missing Eff1, or become duplicated rows

eff2_rows <- tmp %>%
  dplyr::transmute(
    dplyr::across(dplyr::all_of(cols_to_keep)),
    original_row_id,
    Eff1_num,
    effort_source = "Eff2",
    Effort = Eff2_num,
    EffortUnits = Eff2Type
  ) %>%
  # Remove empty or zero Eff2 values
  dplyr::filter(!is.na(Effort) & Effort != 0) %>%
  # If Eff1 existed and was non-zero, then Eff2 is an additional duplicated row
  # If Eff1 was missing or zero, then Eff2 replaces Eff1 and is not a duplicate
  dplyr::mutate(
    is_duplicate_strata = !is.na(Eff1_num) & Eff1_num != 0
  ) %>%
  dplyr::select(-Eff1_num)%>% dplyr::mutate(EffortUnits = as.character(EffortUnits))

Combine both sets of rows into one long dataset

efforts_pivot_ICCAT <- dplyr::bind_rows(eff1_rows, eff2_rows) %>%
  dplyr::select(
    original_row_id,
    effort_source,
    is_duplicate_strata,
    dplyr::everything()
  )

School

efforts_pivot_ICCAT$School<-"OTH"

Flag

efforts_pivot_ICCAT$FleetCode_short <- sub("-.*", "", efforts_pivot_ICCAT$FleetCode) # fleet code only what is after the '-'

names(efforts_pivot_ICCAT)[names(efforts_pivot_ICCAT) == 'FleetCode_short'] <- 'FishingFleet'
efforts_pivot_ICCAT <- efforts_pivot_ICCAT[, c("FishingFleet", setdiff(names(efforts_pivot_ICCAT), "FishingFleet"))] # put flag in first position

Reach the efforts harmonized DSD using a function in ICCAT_functions.R

colToKeep_efforts <- c("FishingFleet","Gear","time_start","time_end","AreaName","School","EffortUnits","Effort")
source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/ICCAT_CE_effort_pivotDSD_to_harmonizedDSD.R")
efforts_pivot_ICCAT$Lat <- floor(abs(efforts_pivot_ICCAT$Lat)) # we put floor as independently of the quadrant the floor always correspond to the cwp
efforts_pivot_ICCAT$Lon <- floor(abs(efforts_pivot_ICCAT$Lon))

handling duplicated stratas

efforts<-ICCAT_CE_effort_pivotDSD_to_harmonizedDSD(efforts_pivot_ICCAT,colToKeep_efforts)
efforts$CatchType <- "C" #bastien adding as it is not in effort function but it is in chatch function
colnames(efforts)<-c("fishing_fleet","gear_type","time_start","time_end","geographic_identifier","fishing_mode","measurement_unit","measurement_value","measurement_type")
efforts$source_authority<-"ICCAT"
efforts <- efforts %>% dplyr::mutate(fishing_mode = ifelse(fishing_mode == "UNK", "OTH", fishing_mode))

efforts$time_start <- as.Date(efforts$time_start)
efforts$time_end <- as.Date(efforts$time_end)
dataset_temporal_extent <- paste(
  paste0(format(min(efforts$time_start), "%Y"), "-01-01"),
  paste0(format(max(efforts$time_end), "%Y"), "-12-31"),
  sep = "/"
)

output in same folder as path_to_raw_dataset

output_name_dataset <- here::here('R/tunaatlas_scripts/pre-harmonization', 'iccat', 'effort', 'data', 't2ce_noSchool_harmonized.csv')

write.csv(efforts, output_name_dataset, row.names = FALSE)
georef_dataset <- efforts

@ Load pre-harmonization scripts and apply mappings

download.file('https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/tunaatlas_scripts/pre-harmonization/map_codelists_no_DB.R', destfile = 'local_map_codelists_no_DB.R')
source('local_map_codelists_no_DB.R')
fact <- "effort"
mapping_codelist <- map_codelists_no_DB(fact, mapping_dataset = "https://raw.githubusercontent.com/fdiwg/fdi-mappings/main/global/firms/gta/codelist_mapping_rfmos_to_global.csv", dataset_to_map = georef_dataset, mapping_keep_src_code = FALSE, summary_mapping = TRUE, source_authority_to_map = c("IATTC", "CCSBT", "WCPFC", "ICCAT", "IOTC"))
## 
##  mapping dimension gear_type with code list mapping
## 
##  mapping dimension fishing_fleet with code list mapping
## 
##  mapping dimension fishing_mode with code list mapping
## 
##  mapping dimension measurement_unit with code list mapping

@ Handle unmapped values and save the results

georef_dataset <- mapping_codelist$dataset_mapped %>% dplyr::mutate(fishing_fleet = ifelse(fishing_fleet == 'UNK', 'NEI', fishing_fleet), gear_type = ifelse(gear_type == 'UNK', '99.9', gear_type))
data.table::fwrite(mapping_codelist$recap_mapping, here::here('R/tunaatlas_scripts/pre-harmonization', 'iccat', 'effort', 'data', 't2ce_noSchool_recap_mapping.csv'))
data.table::fwrite(mapping_codelist$not_mapped_total, here::here('R/tunaatlas_scripts/pre-harmonization', 'iccat', 'effort', 'data', 't2ce_noSchool_not_mapped_total.csv'))
data.table::fwrite(georef_dataset, here::here('R/tunaatlas_scripts/pre-harmonization', 'iccat', 'effort', 'data', 't2ce_noSchool_CWP_dataset.csv'))

Display the first few rows of the mapping summaries

print(head(mapping_codelist$recap_mapping))
## # A tibble: 6 × 5
##   src_code trg_code src_codingsystem trg_codingsystem source_authority
##   <chr>    <chr>    <chr>            <chr>            <chr>           
## 1 D.AT SEA DAYS     effortunit_iccat effortunit_rfmos ICCAT           
## 2 NO.BOATS BOATS    effortunit_iccat effortunit_rfmos ICCAT           
## 3 NO.HOOKS HOOKS    effortunit_iccat effortunit_rfmos ICCAT           
## 4 NO.TRAPS NO.TRAPS effortunit_iccat effortunit_rfmos ICCAT           
## 5 TRAP D   TRAP D   effortunit_iccat effortunit_rfmos ICCAT           
## 6 OTH      OTH      schooltype_iccat schooltype_rfmos ICCAT