Introduction

This R Markdown document is designed to transform data that is not in CWP format into CWP format. Initially, it changes the format of the data; subsequently, it maps the data to adhere to CWP standards. This markdown is created from a function so the documentation keep the format of roxygen2 skeleton A summary of the mapping process is provided. The path to the dataset is specified, you will find on this same repository on github the first line of each dataset. The datasets are named after the historical name provided by tRFMOs while exporting and may change. The information provided in the Rmd allows to understand correctly which dataset should be used in this markdown. Additional operations are performed next to verify other aspects of the data, such as the consistency of the geolocation, the values, and the reported catches in numbers and tons. If you are interested in further details, the results and codes are available for review.

path_to_raw_dataset <- here::here('tunaatlas_scripts/pre-harmonization', 'iotc', 'effort', 'data', 'IOTC-DATASETS-2023-04-24-CE-Surface_1950-2021.csv')

Harmonize IOTC Surface Effort Datasets

This function processes and harmonizes Indian Ocean Tuna Commission (IOTC) surface effort datasets. It prepares the data for integration into the Tuna Atlas database by ensuring compliance with data standardization requirements and optionally includes metadata if the dataset is intended for database loading.

@return None; the function outputs files directly, including harmonized datasets, optional metadata, and code lists for integration within the Tuna Atlas database.

@details This function modifies the dataset to include only essential fields, performs any necessary calculations for effort units, and standardizes the format for date fields and geographical identifiers. Metadata integration is contingent on the final use of the dataset within the Tuna Atlas database.

@importFrom dplyr filter mutate @importFrom readr read_csv write_csv @seealso for initial effort data processing, for converting effort data to a standardized structure. @export @keywords IOTC, tuna, fisheries, data harmonization, effort data @author Paul Taconet, IRD @author Bastien Grasset, IRD

if(!require(readr)){
  install.packages("readr")
}

require(rtunaatlas)
require(readr)
  if(!require(dplyr)){
    install.packages("dplyr")
    require(dplyr)
  }

Input data sample: Fleet Gear Year MonthStart MonthEnd iGrid Grid Effort EffortUnits QualityCode Source CatchUnits YFT.FS YFT.LS YFT.UNCL BET.FS BET.LS BET.UNCL SKJ.FS SKJ.LS SKJ.UNCL ALB.FS ALB.LS ALB.UNCL SBF.FS SBF.LS EUESP PS 1996 9 9 5100043 5100043 36.2 FHOURS 3 LO MT NA 13.36 NA NA 5.58 NA NA 20.26 NA NA NA NA NA NA EUESP PS 2004 8 8 5100043 5100043 12.1 FHOURS 3 LO MT NA 8.74 NA NA 2.66 NA NA 32.97 NA NA NA NA NA NA NEIPS PS 1992 10 10 5100043 5100043 12.1 FHOURS 3 RFRI MT NA 4.53 NA NA 3.00 NA NA 37.47 NA NA NA NA NA NA EUESP PS 1991 8 8 5100044 5100044 12.1 FHOURS 3 LO MT 76.26 NA NA 9.95 NA NA 68.14 NA NA NA NA NA NA NA EUESP PS 1995 6 6 5100044 5100044 12.1 FHOURS 3 LO MT NA 0.79 NA NA 0.68 NA NA 4.93 NA NA NA NA NA NA EUESP PS 1995 9 9 5100044 5100044 12.1 FHOURS 3 LO NA NA NA NA NA NA NA NA NA NA NA NA NA NA SBF.UNCL LOT.FS LOT.LS LOT.UNCL FRZ.FS FRZ.LS FRZ.UNCL KAW.FS KAW.LS KAW.UNCL COM.FS COM.LS COM.UNCL TUX.FS TUX.LS TUX.UNCL FAL.FS FAL.LS FAL.UNCL OCS.FS OCS.LS OCS.UNCL SKH.FS SKH.LS SKH.UNCL NTAD.FS NTAD.LS NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NTAD.UNCL NA NA NA NA NA NA Effort: final data sample: Flag Gear time_start time_end AreaName School EffortUnits Effort AUS BB 1992-01-01 1992-02-01 6230130 IND HRSRH 403 AUS BB 1992-02-01 1992-03-01 6230130 IND HRSRH 366 AUS BB 1992-02-01 1992-03-01 6230135 IND HRSRH 30 AUS BB 1992-02-01 1992-03-01 6235115 IND HRSRH 23 AUS BB 1992-03-01 1992-04-01 6230130 IND HRSRH 221 AUS BB 1992-03-01 1992-04-01 6235130 IND HRSRH 68 —————————————————————————————————————————- Historical name for the dataset at source IOTC-DATASETS-2023-04-24-CE-Surface_1950-2021.csv

  opts <- options()
  options(encoding = "UTF-8")  

#Efforts

source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/FUN_efforts_IOTC_CE.R")
efforts_pivot_IOTC<-FUN_efforts_IOTC_CE(path_to_raw_dataset,12)
efforts_pivot_IOTC$CatchUnits<-NULL
efforts_pivot_IOTC$Source<-NULL

colToKeep_efforts <- c("FishingFleet","Gear","time_start","time_end","AreaName","School","EffortUnits","Effort")
source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/IOTC_CE_effort_pivotDSD_to_harmonizedDSD.R")
efforts<-IOTC_CE_effort_pivotDSD_to_harmonizedDSD(efforts_pivot_IOTC,colToKeep_efforts)


colnames(efforts)<-c("fishing_fleet","gear_type","time_start","time_end","geographic_identifier","fishing_mode","measurement_unit","measurement_value")
efforts$source_authority<-"IOTC"
efforts$measurement <- "efforts"

efforts$time_start <- as.Date(efforts$time_start)
efforts$time_end <- as.Date(efforts$time_end)
dataset_temporal_extent <- paste(
  paste0(format(min(efforts$time_start), "%Y"), "-01-01"),
  paste0(format(max(efforts$time_end), "%Y"), "-12-31"),
  sep = "/"
)

output_name_dataset <- "Dataset_harmonized.csv"
write.csv(efforts, output_name_dataset, row.names = FALSE)
georef_dataset <- efforts

@ Load pre-harmonization scripts and apply mappings

download.file('https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/tunaatlas_scripts/pre-harmonization/map_codelists_no_DB.R', destfile = 'local_map_codelists_no_DB.R')
source('local_map_codelists_no_DB.R')
fact <- "effort"
mapping_codelist <- map_codelists_no_DB(fact, mapping_dataset = "https://raw.githubusercontent.com/fdiwg/fdi-mappings/main/global/firms/gta/codelist_mapping_rfmos_to_global.csv", dataset_to_map = georef_dataset, mapping_keep_src_code = FALSE, summary_mapping = TRUE, source_authority_to_map = c("IATTC", "CCSBT", "WCPFC", "ICCAT", "IOTC"))
## 
##  mapping dimension gear_type with code list mapping
##  mapping dimension fishing_fleet with code list mapping
##  mapping dimension fishing_mode with code list mapping
##  mapping dimension measurement_unit with code list mapping

@ Handle unmapped values and save the results

georef_dataset <- mapping_codelist$dataset_mapped %>% dplyr::mutate(fishing_fleet = ifelse(fishing_fleet == 'UNK', 'NEI', fishing_fleet), gear_type = ifelse(gear_type == 'UNK', '99.9', gear_type))
fwrite(mapping_codelist$recap_mapping, 'recap_mapping.csv')
fwrite(mapping_codelist$not_mapped_total, 'not_mapped_total.csv')
fwrite(georef_dataset, 'CWP_dataset.csv')

Display the first few rows of the mapping summaries

print(head(mapping_codelist$recap_mapping))
## # A tibble: 6 × 5
##   src_code trg_code src_codingsystem trg_codingsystem source_authority
##   <chr>    <chr>    <chr>            <chr>            <chr>           
## 1 DAYS     DAYS     effortunit_iotc  effortunit_rfmos IOTC            
## 2 FDAYS    FDAYS    effortunit_iotc  effortunit_rfmos IOTC            
## 3 FHOURS   FHOURS   effortunit_iotc  effortunit_rfmos IOTC            
## 4 HOURS    HOURS    effortunit_iotc  effortunit_rfmos IOTC            
## 5 HRSRH    HRSRH    effortunit_iotc  effortunit_rfmos IOTC            
## 6 SETS     SETS     effortunit_iotc  effortunit_rfmos IOTC
print(head(mapping_codelist$not_mapped_total))
##        Value source_authority     Dimension
## 1        AUS             IOTC fishing_fleet
## 989    EUESP             IOTC fishing_fleet
## 58134  EUFRA             IOTC fishing_fleet
## 105431 EUITA             IOTC fishing_fleet
## 105658 EUMYT             IOTC fishing_fleet
## 110305   IDN             IOTC fishing_fleet