Introduction

This R Markdown document is designed to transform data that is not in CWP format into CWP format. Initially, it changes the format of the data; subsequently, it maps the data to adhere to CWP standards. This markdown is automatically created from the function: https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/tunaatlas_scripts/pre-harmonization/west_pacific_ocean_catch_5deg_1m_bb_tunaatlaswcpfc_level0.R, the documentation keeps the format of roxygen2 skeleton.

A summary of the mapping process is provided. The path to the dataset is specified. You will find on this same repository on GitHub the first line of each dataset. The datasets are named after the historical name provided by tRFMOs while exporting and may change. The information provided in the Rmd allows understanding correctly which dataset should be used in this markdown.

Additional operations are performed next to verify other aspects of the data, such as the consistency of the geolocation, the values, and the reported catches in numbers and tons.

If you are interested in further details, the results and codes are available for review.

Each .Rmd script requires the user to knit the dataset at the beginning of the script in order to execute the harmonization process correctly. It is also possible to run the code chunk by chunk but be sure to be in the correct working directory (i.e., the one of the .Rmd).

path_to_raw_dataset <- here::here('R/tunaatlas_scripts/pre-harmonization', 'wcpfc', 'catch', 'data', 'WCPFC_P_PUBLIC_BY_YY_MM.csv')

Harmonize WCPFC Pole-and-Line Catch Datasets

This function processes and harmonizes the structure of Western and Central Pacific Fisheries Commission (WCPFC) pole-and-line catch datasets. The output is tailored for integration into the Tuna Atlas database, aligning with the requirements for data standardization and optionally including metadata if the dataset is intended for database loading.

@return None; the function outputs files directly, including harmonized datasets, optional metadata, and code lists for database integration.

@details The function restructures datasets to include only essential fields, performs any necessary calculations for catch units, and standardizes the format for date fields and geographical identifiers. Metadata incorporation is contingent on the final data usage within the Tuna Atlas database.

@importFrom dplyr mutate filter select @importFrom readr read_csv write_csv @importFrom tidyr gather @importFrom reshape2 melt @seealso for detailed data structuring operations. @export @author Paul Taconet, IRD @author Bastien Grasset, IRD @keywords WCPFC, tuna, fisheries, data harmonization, pole-and-line catch Input data sample: YY MM LAT5 LON5 DAYS SKJ_C YFT_C OTH_C 1950 1 30N 135E 0 0 0 0 1950 1 30N 140E 0 0 0 0 1950 1 35N 140E 0 0 0 0 1950 1 40N 140E 0 0 0 0 1950 1 40N 145E 0 0 0 0 1950 2 30N 135E 0 0 0 0 Catch: pivot data sample: YY MM LAT5 LON5 Effort Species value CatchUnits School EffortUnits Gear 1970 3 05S 150E 82 SKJ 279.16 MT ALL DAYS P 1970 4 05S 150E 74 SKJ 336.61 MT ALL DAYS P 1970 5 05S 150E 82 SKJ 361.80 MT ALL DAYS P 1970 6 05S 150E 81 SKJ 438.48 MT ALL DAYS P 1970 7 05S 150E 75 SKJ 472.75 MT ALL DAYS P 1970 12 05S 150E 56 SKJ 215.05 MT ALL DAYS P Catch: final data sample: FishingFleet Gear time_start time_end AreaName School Species CatchType CatchUnits Catch ALL P 1970-03-01 1970-04-01 6200150 ALL OTH ALL MT 0.01 ALL P 1970-03-01 1970-04-01 6200150 ALL SKJ ALL MT 279.16 ALL P 1970-03-01 1970-04-01 6200150 ALL YFT ALL MT 27.81 ALL P 1970-04-01 1970-05-01 6200150 ALL OTH ALL MT 0.14 ALL P 1970-04-01 1970-05-01 6200150 ALL SKJ ALL MT 336.61 ALL P 1970-04-01 1970-05-01 6200150 ALL YFT ALL MT 11.34

  source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/WCPFC_CE_catches_pivotDSD_to_harmonizedDSD.R")

packages

if(!require(reshape)){
  install.packages("reshape")
  require(reshape)
}

if(!require(tidyr)){
  install.packages("tidyr")
  require(tidyr)
}

if(!require(dplyr)){
  install.packages("dplyr")
  require(dplyr)
}


  
  # Historical name for the dataset at source  WCPFC_P_PUBLIC_BY_YR_MON.csv
  opts <- options()
  options(encoding = "UTF-8")
  #----------------------------------------------------------------------------------------------------------------------------
  
  
  ## Catches
  
  ### Reach the catches pivot DSD using a function stored in WCPFC_functions.R
  #-------------
  #Changes:
  # - switch to csv
  # - switch to upper colnames
  # - CWP grid (removed for the timebeing to apply rtunaatlas codes)
  # - toupper applied to Species/CatchUnits
  
  DF <- read.csv(path_to_raw_dataset)
  colnames(DF) <- toupper(colnames(DF))
  DF$CWP_GRID <- NULL 
  
  DF <- DF %>% tidyr::gather(variable, value, -c(colnames(DF[1:5])))
  
  DF <- DF %>% dplyr::filter(!value %in% 0) %>% dplyr::filter(!is.na(value))
  DF$variable <- as.character(DF$variable)
  colnames(DF)[which(colnames(DF) == "variable")] <- "Species"
  DF$CatchUnits <- substr(DF$Species, nchar(DF$Species), nchar(DF$Species))
  DF$CatchUnits <- toupper(DF$CatchUnits) 
  DF$Species <- toupper(DF$Species) 
  DF$Species <- sub("_C", "", DF$Species)
  DF$Species <- sub("_N", "", DF$Species)
  DF$School <- "OTH"
  DF$EffortUnits <- colnames(DF[5])
  colnames(DF)[5] <- "Effort"
  catches_pivot_WCPFC <- DF; rm(DF)
  #-------------
  
  #Gear
  catches_pivot_WCPFC$Gear<-"P"
  
  # Catchunits
  index.kg <- which( catches_pivot_WCPFC[,"CatchUnits"] == "C" )
  catches_pivot_WCPFC[index.kg,"CatchUnits"]<- "t"
  
  index.nr <- which( catches_pivot_WCPFC[,"CatchUnits"] == "N" )
  catches_pivot_WCPFC[index.nr,"CatchUnits"]<- "no" 
  
  ### Reach the catches harmonized DSD using a function in WCPFC_functions.R
  
  colToKeep_captures <- c("FishingFleet","Gear","time_start","time_end","AreaName","School","Species","CatchType","CatchUnits","Catch")
  catches<-WCPFC_CE_catches_pivotDSD_to_harmonizedDSD(catches_pivot_WCPFC,colToKeep_captures)
  
  colnames(catches)<-c("fishing_fleet","gear_type","time_start","time_end","geographic_identifier","fishing_mode","species","measurement_type","measurement_unit","measurement_value")
  catches$source_authority<-"WCPFC"
  catches$measurement_type <- "RC" # Retained catches
  catches$measurement <- "catch" 
  catches$measurement_processing_level <- "raised"
  #----------------------------------------------------------------------------------------------------------------------------
  catches$time_start <- as.Date(catches$time_start)
  catches$time_end <- as.Date(catches$time_end)
  dataset_temporal_extent <- paste(
    paste0(format(min(catches$time_start), "%Y"), "-01-01"),
    paste0(format(max(catches$time_end), "%Y"), "-12-31"),
    sep = "/"
  )
  catches$measurement_processing_level <- "unknown" 
  # output in same folder as path_to_raw_dataset 
  output_name_dataset <- here::here('R/tunaatlas_scripts/pre-harmonization', 'wcpfc', 'catch', 'data', 'WCPFC_P_PUBLIC_BY_YY_MM_harmonized.csv')
  
  write.csv(catches, output_name_dataset, row.names = FALSE)
georef_dataset <- catches
  
  #----------------------------------------------------------------------------------------------------------------------------

@ Load pre-harmonization scripts and apply mappings

download.file('https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/tunaatlas_scripts/pre-harmonization/map_codelists_no_DB.R', destfile = 'local_map_codelists_no_DB.R')
source('local_map_codelists_no_DB.R')
fact <- "catch"
mapping_codelist <- map_codelists_no_DB(fact, mapping_dataset = "https://raw.githubusercontent.com/fdiwg/fdi-mappings/main/global/firms/gta/codelist_mapping_rfmos_to_global.csv", dataset_to_map = georef_dataset, mapping_keep_src_code = FALSE, summary_mapping = TRUE, source_authority_to_map = c("IATTC", "CCSBT", "WCPFC"))
## 
##  mapping dimension gear_type with code list mapping
## 
##  mapping dimension species with code list mapping
## 
##  mapping dimension fishing_fleet with code list mapping
## 
##  mapping dimension fishing_mode with code list mapping

@ Handle unmapped values and save the results

georef_dataset <- mapping_codelist$dataset_mapped %>% dplyr::mutate(fishing_fleet = ifelse(fishing_fleet == 'UNK', 'NEI', fishing_fleet), gear_type = ifelse(gear_type == 'UNK', '99.9', gear_type))
data.table::fwrite(mapping_codelist$recap_mapping, here::here('R/tunaatlas_scripts/pre-harmonization', 'wcpfc', 'catch', 'data', 'WCPFC_P_PUBLIC_BY_YY_MM_recap_mapping.csv'))
data.table::fwrite(mapping_codelist$not_mapped_total, here::here('R/tunaatlas_scripts/pre-harmonization', 'wcpfc', 'catch', 'data', 'WCPFC_P_PUBLIC_BY_YY_MM_not_mapped_total.csv'))
data.table::fwrite(georef_dataset, here::here('R/tunaatlas_scripts/pre-harmonization', 'wcpfc', 'catch', 'data', 'WCPFC_P_PUBLIC_BY_YY_MM_CWP_dataset.csv'))

Display the first few rows of the mapping summaries

print(head(mapping_codelist$recap_mapping))
## # A tibble: 5 × 5
##   src_code trg_code src_codingsystem trg_codingsystem   source_authority
##   <chr>    <chr>    <chr>            <chr>              <chr>           
## 1 OTH      OTH      schooltype_wcpfc schooltype_rfmos   WCPFC           
## 2 ALL      NEI      flag_wcpfc       fishingfleet_firms WCPFC           
## 3 SKJ      SKJ      species_wcpfc    species_asfis      WCPFC           
## 4 YFT      YFT      species_wcpfc    species_asfis      WCPFC           
## 5 P        09.1     gear_wcpfc       isscfg_revision_1  WCPFC