Introduction

This R Markdown document transforms data that is not in CWP format into CWP format. It first restructures the data, then maps the codes so that they adhere to CWP standards. The document is generated automatically from the function at https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/refs/heads/master/R/tunaatlas_scripts/pre-harmonization/atlantic_ocean_effort_1deg_1m_ps_tunaatlasiccat_level0__byschool.R; the documentation keeps the format of the roxygen2 skeleton.

A summary of the mapping process is provided, and the path to the dataset is specified. The first lines of each dataset are available in this same GitHub repository. The datasets are named after the historical names given by the tRFMOs at export time, and these names may change. The information provided in this Rmd makes it possible to identify which dataset should be used with this markdown.

Additional operations are performed next to verify other aspects of the data, such as the consistency of the geolocation, the values, and the reported catches in numbers and tons.

If you are interested in further details, the results and codes are available for review.

Each .Rmd script must be knitted from the beginning for the harmonization process to run correctly. The code can also be run chunk by chunk, provided the working directory is the one containing the .Rmd.

path_to_raw_dataset <- here::here('R/tunaatlas_scripts/pre-harmonization', 'iccat', 'effort', 'data', 't2ce_ETRO-PS1991-2024_bySchool_v1.xlsx')

Harmonize data structure of ICCAT by operation mode effort datasets

This function harmonizes the structure of ICCAT catch-and-effort datasets based on operation modes. It adapts the dataset structure for compatibility with the Tuna Atlas database, considering only specific fields as mandatory. The function also handles optional metadata integration if provided.

@param keep_fleet_instead_of_flag Logical, defaults to FALSE. Determines whether to replace the ‘flag’ column with the ‘fleet’ column in the output dataset.

@return None; this function performs data manipulation and outputs files directly.

@import dplyr
@importFrom stringr str_detect str_replace
@importFrom readr read_csv write_csv
@seealso for converting ICCAT task 2,
@export
@author Paul Taconet, IRD
@author Bastien Grasset, IRD
@keywords ICCAT, tuna, fisheries, data harmonization

keep_fleet_instead_of_flag=FALSE

packages

if(!require(dplyr)){
  install.packages("dplyr")
  require(dplyr)
}

Historical name for the dataset at source: t2ce_bySchool.csv

opts <- options()
options(encoding = "UTF-8")

Input data sample (after importing as data.frame in R):

A tibble: 6 × 33 (columns shown in three groups)

  DSetID StrataID FlagName  FleetCode      GearCode YearC Decade TimePeriodID GeoStrataCode QuadID Lat Lon xLon yLat FishMode
1   2797   509067 EU-España EU.ESP-ES-ETRO PS        1994   1990            1 1x1                1   0   1  1.5  0.5 FAD
2   2797   509068 EU-España EU.ESP-ES-ETRO PS        1994   1990            1 1x1                1   1   0  0.5  1.5 FAD
3   2797   509069 EU-España EU.ESP-ES-ETRO PS        1994   1990            1 1x1                1   1   2  2.5  1.5 FAD
4   2797   509070 EU-España EU.ESP-ES-ETRO PS        1994   1990            1 1x1                2   0   0  0.5 -0.5 FAD
5   2797   509071 EU-España EU.ESP-ES-ETRO PS        1994   1990            1 1x1                2   0   1  1.5 -0.5 FAD
6   2797   509072 EU-España EU.ESP-ES-ETRO PS        1994   1990            1 1x1                2   0   2  2.5 -0.5 FAD

  Eff1 Eff1Type  Eff2 Eff2Type  Eff3 Eff3Type  Eff4 Eff4Type  Eff5 Eff5Type
1 13   FISH.HOUR 25.9 HOURS.SEA 13   Hours.STD 0    Hours.FAD 0    Hours.FSC
2 13   FISH.HOUR 25.9 HOURS.SEA 13   Hours.STD 0    Hours.FAD 0    Hours.FSC
3 13   FISH.HOUR 25.9 HOURS.SEA 13   Hours.STD 0    Hours.FAD 0    Hours.FSC
4 13.1 FISH.HOUR 25.9 HOURS.SEA 13.1 Hours.STD 0    Hours.FAD 0    Hours.FSC
5 26.1 FISH.HOUR 51.8 HOURS.SEA 26.1 Hours.STD 0    Hours.FAD 2.21 Hours.FSC
6 12.1 FISH.HOUR 24   HOURS.SEA 12.1 Hours.STD 0    Hours.FAD 0    Hours.FSC

  YFT ALB BET BLF LTA SKJ FRI TOTAL
1   0   0   0   0   0   0   0     0
2   0   0   0   0   0   0   0     0
3   0   0   0   0   0   0   0     0
4   0   0   0   0   0   0   0     0
5   0   0   0   0   0   0   0     0
6   0   0   0   0   0   0   0     0

Effort: final data sample:

Flag   Gear time_start time_end   AreaName School EffortUnits Effort
Belize PS   2009-08-01 2009-09-01 5402000  FS     FISH.HOUR    36.50
Belize PS   2009-08-01 2009-09-01 5402000  FS     Hours.FSC     3.12
Belize PS   2009-08-01 2009-09-01 5402000  FS     HOURS.SEA    72.00
Belize PS   2009-08-01 2009-09-01 5402000  FS     Hours.STD    37.10
Belize PS   2009-09-01 2009-10-01 5202006  LS     FISH.HOUR    12.10

# Catches
# RFMO_CE<-read.csv(path_to_raw_dataset,stringsAsFactors = F)

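# read_excel comes from the readxl package, which is not attached above, so it is called with its namespace
RFMO_CE <- readxl::read_excel(path_to_raw_dataset,
                sheet = "Data")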

RFMO_CE$FleetCode_short <- sub("-.*", "", RFMO_CE$FleetCode) # keep only the part of the fleet code before the first '-'

names(RFMO_CE)[names(RFMO_CE) == 'FleetCode_short'] <- 'FishingFleet'
RFMO_CE <- RFMO_CE[, c("FishingFleet", setdiff(names(RFMO_CE), "FishingFleet"))] # put the fishing fleet in first position
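
As a small illustration of the fleet-code shortening and column reordering above, the following sketch applies the same `sub()` pattern to a toy data frame. The first fleet code comes from the input data sample; the second one (`"BLZ"`) and the `toy` data frame are hypothetical and only show that codes without a hyphen are left unchanged.

```r
# Toy data frame; "EU.ESP-ES-ETRO" is from the input sample, "BLZ" is a made-up code
toy <- data.frame(
  DSetID = c(2797, 2797),
  FleetCode = c("EU.ESP-ES-ETRO", "BLZ"),
  stringsAsFactors = FALSE
)

# "-.*" matches the first hyphen and everything after it, so only the prefix is kept
toy$FishingFleet <- sub("-.*", "", toy$FleetCode)

# Move FishingFleet to the first position, keeping the other columns in order
toy <- toy[, c("FishingFleet", setdiff(names(toy), "FishingFleet"))]

print(toy)
```

The same pattern keeps everything before the first hyphen even when the code contains several hyphens, because the `.*` part is greedy.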

# If we want the column 'FleetCode' instead of 'flag' in the output dataset

if(keep_fleet_instead_of_flag==TRUE){
  RFMO_CE$Flag<-NULL
  names(RFMO_CE)[names(RFMO_CE) == 'Fishingfleet'] <- 'Flag'
}

# Efforts

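# With the 33-column input and FishingFleet prepended, column 27 is the first catch column (YFT),
# so this drops the species catch columns and keeps the strata and effort columns only.
last_column_not_catch_value=27
RFMO_CE<-RFMO_CE[,-(last_column_not_catch_value:ncol(RFMO_CE))]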

source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/FUN_efforts_ICCAT_CE_keep_all_efforts.R")
efforts_pivot_ICCAT<-FUN_efforts_ICCAT_CE_keep_all_efforts(RFMO_CE,c("Eff1","Eff2","Eff3","Eff4","Eff5"),c("Eff1Type","Eff2Type","Eff3Type","Eff4Type","Eff5Type"))
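
The call above reshapes the paired `Eff*` / `Eff*Type` columns into one row per effort value. The sourced function is not reproduced in this document, so the following is only a minimal base R sketch of that kind of wide-to-long pivot. The helper name `pivot_efforts_sketch` is hypothetical, and dropping zero or missing efforts is an assumption of the sketch, not a documented behaviour of `FUN_efforts_ICCAT_CE_keep_all_efforts`.

```r
# Two strata and two effort pairs, with values taken from the input data sample
wide <- data.frame(
  StrataID = c(509067, 509071),
  Eff1 = c(13, 26.1), Eff1Type = c("FISH.HOUR", "FISH.HOUR"),
  Eff5 = c(0, 2.21),  Eff5Type = c("Hours.FSC", "Hours.FSC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stack each (value, unit) pair below the others
pivot_efforts_sketch <- function(df, value_cols, type_cols) {
  id_cols <- setdiff(names(df), c(value_cols, type_cols))
  pieces <- Map(function(v, t) {
    out <- df[, id_cols, drop = FALSE]
    out$EffortUnits <- df[[t]]
    out$Effort <- df[[v]]
    out
  }, value_cols, type_cols)
  long <- do.call(rbind, unname(pieces))
  # Assumption of this sketch: rows with zero or missing effort are dropped
  long <- long[!is.na(long$Effort) & long$Effort != 0, , drop = FALSE]
  rownames(long) <- NULL
  long
}

long <- pivot_efforts_sketch(wide, c("Eff1", "Eff5"), c("Eff1Type", "Eff5Type"))
print(long)
```

Each output row carries the identifying columns plus one `EffortUnits` / `Effort` pair, which is the shape of the final data sample shown earlier.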

School: the format changed; the school type is now in the FishMode column.

efforts_pivot_ICCAT <- efforts_pivot_ICCAT %>% dplyr::mutate(FishMode = ifelse(FishMode == "n/a", "OTH", FishMode)) 
efforts_pivot_ICCAT<- efforts_pivot_ICCAT %>% dplyr::rename("School" = "FishMode")
colToKeep_efforts <- c("FishingFleet","Gear","time_start","time_end","AreaName","School","EffortUnits","Effort")
source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/ICCAT_CE_effort_pivotDSD_to_harmonizedDSD.R")
efforts_pivot_ICCAT <- efforts_pivot_ICCAT %>% dplyr::rename(SquareTypeCode = GeoStrataCode) # to match definition in ICCAT_CE_effort_pivotDSD_to_harmonizedDSD
efforts_pivot_ICCAT$Lat <- floor(abs(efforts_pivot_ICCAT$Lat)) # floor of the absolute value: independently of the quadrant, it always corresponds to the CWP coordinate
efforts_pivot_ICCAT$Lon <- floor(abs(efforts_pivot_ICCAT$Lon))
efforts<-ICCAT_CE_effort_pivotDSD_to_harmonizedDSD(efforts_pivot_ICCAT,colToKeep_efforts)
efforts$AreaName <- as.character(as.integer(efforts$AreaName))
colnames(efforts)<-c("fishing_fleet","gear_type","time_start","time_end","geographic_identifier","fishing_mode","measurement_unit","measurement_value")
efforts$source_authority<-"ICCAT"
efforts$time_start <- as.Date(efforts$time_start)
efforts$time_end <- as.Date(efforts$time_end)
dataset_temporal_extent <- paste(
  paste0(format(min(efforts$time_start), "%Y"), "-01-01"),
  paste0(format(max(efforts$time_end), "%Y"), "-12-31"),
  sep = "/"
)
efforts$geographic_identifier <- format(as.integer(efforts$geographic_identifier), scientific = FALSE, trim = TRUE)
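
The geographic identifiers produced above, such as `5402000` in the final data sample, follow the CWP grid coding: one digit for the cell size (5 for 1°x1°), one digit for the quadrant (1 NE, 2 SE, 3 SW, 4 NW), two digits for latitude and three for longitude. The sketch below builds such a code from a quadrant and a pair of coordinates. The helper `cwp_1deg_code` is hypothetical and is not the code of `ICCAT_CE_effort_pivotDSD_to_harmonizedDSD`; it also assumes that the ICCAT `QuadID` uses the same numbering as the CWP quadrant. The coordinates in the example call are made up so that the result matches the two codes of the final data sample.

```r
# Hypothetical helper: build a 1x1 degree CWP grid code from quadrant and coordinates
cwp_1deg_code <- function(quadrant, lat, lon) {
  sprintf(
    "5%d%02d%03d",
    as.integer(quadrant),
    as.integer(floor(abs(lat))),  # whole degrees, sign carried by the quadrant
    as.integer(floor(abs(lon)))
  )
}

# Illustrative coordinates (assumed, not taken from the dataset)
codes <- cwp_1deg_code(quadrant = c(4, 2), lat = c(2, -2.5), lon = c(0, 6.5))
print(codes)
```

Taking `floor(abs(...))` before formatting is what lets the same expression serve all four quadrants, since the hemisphere information is already carried by the quadrant digit.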

output in same folder as path_to_raw_dataset

output_name_dataset <- here::here('R/tunaatlas_scripts/pre-harmonization', 'iccat', 'effort', 'data', 't2ce_ETRO-PS1991-2024_bySchool_v1_harmonized.csv')

write.csv(efforts, output_name_dataset, row.names = FALSE)
georef_dataset <- efforts

Load pre-harmonization scripts and apply mappings

download.file('https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/tunaatlas_scripts/pre-harmonization/map_codelists_no_DB.R', destfile = 'local_map_codelists_no_DB.R')
source('local_map_codelists_no_DB.R')
fact <- "effort"
mapping_codelist <- map_codelists_no_DB(fact, mapping_dataset = "https://raw.githubusercontent.com/fdiwg/fdi-mappings/main/global/firms/gta/codelist_mapping_rfmos_to_global.csv", dataset_to_map = georef_dataset, mapping_keep_src_code = FALSE, summary_mapping = TRUE, source_authority_to_map = c("IATTC", "CCSBT", "WCPFC", "ICCAT", "IOTC"))
## 
##  mapping dimension gear_type with code list mapping
## 
##  mapping dimension fishing_fleet with code list mapping
## 
##  mapping dimension fishing_mode with code list mapping
## 
##  mapping dimension measurement_unit with code list mapping
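
To make the mapping step more concrete, here is a minimal sketch of a code list lookup for a single dimension. It is not the implementation of `map_codelists_no_DB`. The helper `map_codes_sketch` is hypothetical, the two mapping pairs are taken from the mapping summary printed at the end of this document, the source code `"DAYS"` is made up, and the use of `"UNK"` as the placeholder for unmapped codes is an assumption.

```r
# Small code list for the measurement_unit dimension (pairs from the mapping summary)
unit_mapping <- data.frame(
  src_code = c("FISH.HOUR", "HOURS.SEA"),
  trg_code = c("FHOURS", "HOURS"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate source codes, flagging codes absent from the code list
map_codes_sketch <- function(codes, mapping, unmapped = "UNK") {
  mapped <- mapping$trg_code[match(codes, mapping$src_code)]
  ifelse(is.na(mapped), unmapped, mapped)
}

# "DAYS" is a made-up code with no entry in the code list
mapped_units <- map_codes_sketch(c("FISH.HOUR", "HOURS.SEA", "DAYS"), unit_mapping)
print(mapped_units)
```

Using a placeholder rather than `NA` keeps unmapped rows visible in the output, so they can be recoded or reported in a later step.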

Handle unmapped values and save the results

georef_dataset <- mapping_codelist$dataset_mapped %>% dplyr::mutate(fishing_fleet = ifelse(fishing_fleet == 'UNK', 'NEI', fishing_fleet), gear_type = ifelse(gear_type == 'UNK', '99.9', gear_type))
data.table::fwrite(mapping_codelist$recap_mapping, here::here('R/tunaatlas_scripts/pre-harmonization', 'iccat', 'effort', 'data', 't2ce_ETRO-PS1991-2024_bySchool_v1_recap_mapping.csv'))
data.table::fwrite(mapping_codelist$not_mapped_total, here::here('R/tunaatlas_scripts/pre-harmonization', 'iccat', 'effort', 'data', 't2ce_ETRO-PS1991-2024_bySchool_v1_not_mapped_total.csv'))
data.table::fwrite(georef_dataset, here::here('R/tunaatlas_scripts/pre-harmonization', 'iccat', 'effort', 'data', 't2ce_ETRO-PS1991-2024_bySchool_v1_CWP_dataset.csv'))

Display the first few rows of the mapping summaries

print(head(mapping_codelist$recap_mapping))
## # A tibble: 6 × 5
##   src_code  trg_code  src_codingsystem trg_codingsystem source_authority
##   <chr>     <chr>     <chr>            <chr>            <chr>           
## 1 FISH.HOUR FHOURS    effortunit_iccat effortunit_rfmos ICCAT           
## 2 HOURS.SEA HOURS     effortunit_iccat effortunit_rfmos ICCAT           
## 3 Hours.FAD Hours.FAD effortunit_iccat effortunit_rfmos ICCAT           
## 4 Hours.FSC Hours.FSC effortunit_iccat effortunit_rfmos ICCAT           
## 5 Hours.STD Hours.STD effortunit_iccat effortunit_rfmos ICCAT           
## 6 FAD       LS        schooltype_iccat schooltype_rfmos ICCAT