Introduction

This R Markdown document transforms data that is not in CWP format into CWP format. It first changes the structure of the data, then maps its codes to CWP standards. The document was generated automatically from the function https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/refs/heads/master/R/tunaatlas_scripts/pre-harmonization/west_pacific_ocean_effort_5deg_1m_ll_tunaatlaswcpfc_level0_from_csv.R, and the documentation keeps the roxygen2 skeleton format.

A summary of the mapping process is provided, and the path to the dataset is specified. The first line of each dataset is available in the same GitHub repository. Datasets are named after the historical names given by the tRFMOs at export time, and these names may change. The information in this Rmd makes it possible to identify which dataset should be used here.

Additional operations are then performed to verify other aspects of the data, such as the consistency of the geolocation, the values, and the catches reported in numbers and in tonnes.
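For the effort table produced by this script, such checks could be sketched as follows. This is a minimal illustration, not the checks the pipeline actually runs. The function name `check_effort_table` is hypothetical. The sketch assumes the output column names set later in the script (`geographic_identifier`, `time_start`, `time_end`, `measurement_value`) and 5-degree CWP grid codes like those in the sample output.

```r
# Sketch only: basic consistency checks on a harmonized effort table.
# Assumes the column names set by this script (geographic_identifier,
# time_start, time_end, measurement_value) and 5-degree CWP grid codes
# of the form "6" + quadrant digit + 2-digit latitude + 3-digit longitude.
check_effort_table <- function(df) {
  problems <- character(0)

  # Geolocation: every cell should look like a 5-degree CWP code
  bad_grid <- !grepl("^6[1-4][0-9]{5}$", as.character(df$geographic_identifier))
  if (any(bad_grid)) {
    problems <- c(problems, sprintf("%d row(s) with an unexpected grid code", sum(bad_grid)))
  }

  # Values: effort should be present and strictly positive
  bad_value <- is.na(df$measurement_value) | df$measurement_value <= 0
  if (any(bad_value)) {
    problems <- c(problems, sprintf("%d row(s) with a missing or non-positive value", sum(bad_value)))
  }

  # Time: each period must end after it starts
  start <- as.Date(df$time_start)
  end <- as.Date(df$time_end)
  bad_time <- is.na(start) | is.na(end) | end <= start
  if (any(bad_time)) {
    problems <- c(problems, sprintf("%d row(s) with an invalid time period", sum(bad_time)))
  }

  problems
}

# Two rows built from the input sample (HHOOKS values of 00N 120E and 00N 125E)
effort <- data.frame(
  geographic_identifier = c("6100120", "6100125"),
  time_start = c("2000-01-01", "2000-01-01"),
  time_end = c("2000-02-01", "2000-02-01"),
  measurement_value = c(12391.11, 16349.59),
  stringsAsFactors = FALSE
)
check_effort_table(effort)  # character(0): no problem found
```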

For further details, the results and code are available for review.

Each .Rmd script expects the path to the dataset to be set at the beginning of the script before it is knitted. Otherwise, the harmonization will not run correctly. The code can also be run chunk by chunk, but make sure the working directory is the one containing the .Rmd.

path_to_raw_dataset <- here::here('R/tunaatlas_scripts/pre-harmonization', 'wcpfc', 'effort', 'data', 'WCPFC_L_PUBLIC_BY_YY_MM_FLAG.csv')

Harmonize Data Structure of WCPFC Longline Effort Datasets

This function harmonizes the data structure of WCPFC longline effort datasets from the CSV files provided by WCPFC. It handles datasets such as ‘longline_60’, ‘longline_70’, ‘longline_80’, ‘longline_90’ and ‘longline_00’. The function also manages optional metadata and code lists for integration into the Tuna Atlas database.

@return Creates a CSV file with the harmonized data structure, along with metadata and code list files.

@export

@author Paul Taconet, IRD; Bastien Grasset, IRD

@seealso , , , , for other specific conversions within WCPFC datasets.

@examples

  source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/harmo_time_2.R")
  source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/harmo_spatial_3.R")
  
  

  # Install (if needed) and attach the packages used below
  for (pkg in c("readr", "tidyr", "dplyr", "reshape")) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    library(pkg, character.only = TRUE)
  }
  
  
  
  # Input data sample:
  # YY MM LAT5 LON5   HHOOKS ALB_C ALB_N   YFT_C YFT_N   BET_C BET_N MLS_C MLS_N  BLM_C BLM_N  BUM_C BUM_N  SWO_C SWO_N OTH_C OTH_N
  # 2000  1  00N 120E 12391.11 0.000     0 267.338 10056  58.850  1537 0.627    15 11.391   249 18.203   314  9.998   189 0.120     4
  # 2000  1  00N 125E 16349.59 0.000     0 352.417 13256  77.975  2036 0.827    19 15.030   329 24.018   414 13.192   249 0.158     5
  # 2000  1  00N 130E  7091.08 0.000     0 130.454  4630  37.695   903 0.200     5  3.870    83  6.418   109  4.714    93 0.038     1
  # 2000  1  00N 135E  6113.85 1.276    73  75.469  2431 115.868  2575 0.037     1  0.058     1  6.948    90  2.719    38 0.245     4
  # 2000  1  00N 140E  9904.92 1.350    77 176.963  6266 251.303  6084 0.462    11  1.527    38 12.150   187  4.200    52 0.296     9
  # 2000  1  00N 145E  8679.03 0.428    24 122.945  4613 144.910  3579 0.537    12 11.062   237  8.748   137  6.326   110 0.000     0
  
  # Effort: final data sample:
  # Flag Gear time_start   time_end AreaName School EffortUnits  Effort
  #  ALL    L 2000-01-01 2000-02-01  6100120    ALL      HHOOKS 1239111
  #  ALL    L 2000-01-01 2000-02-01  6100125    ALL      HHOOKS 1634959
  #  ALL    L 2000-01-01 2000-02-01  6100130    ALL      HHOOKS  709108
  #  ALL    L 2000-01-01 2000-02-01  6100135    ALL      HHOOKS  611385
  #  ALL    L 2000-01-01 2000-02-01  6100140    ALL      HHOOKS  990492
  #  ALL    L 2000-01-01 2000-02-01  6100145    ALL      HHOOKS  867903
  
  #----------------------------------------------------------------------------------------------------------------------------
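The two samples above show how one input row becomes a harmonized row. `YY = 2000, MM = 1` gives the period 2000-01-01 to 2000-02-01, and `00N 120E` gives the area code `6100120`. The sketch below reproduces only those visible conversions. It is not the code of `harmo_time_2()` or `harmo_spatial_3()`, and the helper names are hypothetical. It also assumes that the `LAT5`/`LON5` labels already give the cell corner closest to (0, 0).

```r
# Sketch only: this is NOT harmo_time_2() or harmo_spatial_3(); it merely
# reproduces the values visible in the input and final samples.

# Month -> first day of the month and first day of the following month
month_bounds <- function(yy, mm) {
  yy <- as.integer(yy)
  mm <- as.integer(mm)
  next_yy <- ifelse(mm == 12L, yy + 1L, yy)
  next_mm <- ifelse(mm == 12L, 1L, mm + 1L)
  data.frame(
    time_start = as.Date(sprintf("%04d-%02d-01", yy, mm)),
    time_end = as.Date(sprintf("%04d-%02d-01", next_yy, next_mm))
  )
}

# 5-degree CWP grid code: "6" (size digit seen in the sample output),
# quadrant (1 = NE, 2 = SE, 3 = SW, 4 = NW), 2-digit latitude, 3-digit longitude.
# Assumption: LAT5/LON5 labels already give the cell corner closest to (0, 0).
cwp_5deg_code <- function(lat5, lon5) {
  lat <- as.integer(sub("[NS]$", "", lat5))
  lon <- as.integer(sub("[EW]$", "", lon5))
  north <- grepl("N$", lat5)
  east <- grepl("E$", lon5)
  quadrant <- ifelse(north, ifelse(east, 1L, 4L), ifelse(east, 2L, 3L))
  sprintf("6%d%02d%03d", quadrant, lat, lon)
}

month_bounds(2000, 1)         # time_start 2000-01-01, time_end 2000-02-01
cwp_5deg_code("00N", "120E")  # "6100120"
```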

Historical name of the dataset at source: WCPFC_L_PUBLIC_BY_FLAG_MON.csv

  opts <- options()
  options(encoding = "UTF-8")
  #----------------------------------------------------------------------------------------------------------------------------
  
  
  ##Efforts
  DF <- read.table(path_to_raw_dataset, sep=",", header=TRUE, stringsAsFactors=FALSE,strip.white=TRUE)
  
  # Reach the efforts pivot DSD using a function in WCPFC_functions.R
  #2020-11-13 
  #Changes
  # - FLAG_CODE column added; UNK is used where the flag is missing
  # - Upper index of the id columns changed for melting
  #---------------------------------------
  DF$cwp_grid=NULL # remove column cwp_grid
  colnames(DF)<-toupper(colnames(DF))
  DF$FLAG_CODE[is.na(DF$FLAG_CODE) | DF$FLAG_CODE == ""] <- "UNK"
  # DF<-melt(DF, id=c(colnames(DF[1:6]))) 
  # DF <- melt(as.data.table(DF), id=c(colnames(DF[1:6]))) 
  DF <- DF %>% tidyr::gather(variable, value, -c(colnames(DF[1:6])))
  
  DF<- DF %>% 
    dplyr::filter( ! value %in% 0 ) %>%
    dplyr::filter( ! is.na(value)) 
  DF$variable<-as.character(DF$variable)
  colnames(DF)[which(colnames(DF) == "variable")] <- "Species"
  
  DF$CatchUnits<-substr(DF$Species, nchar(DF$Species), nchar(DF$Species))
  
  DF$Species<-sub('_C', '', DF$Species)
  DF$Species<-sub('_N', '', DF$Species)
  
  DF$School<-"OTH"
  
  DF$EffortUnits<-colnames(DF[6])    
  colnames(DF)[6]<-"Effort"
  
  
  efforts_pivot_WCPFC=DF
  efforts_pivot_WCPFC$Gear<-"L"
  
  # Catchunits
  # Check data that exist both in number and weight
  
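  # group_by_(.dots = ...) is deprecated in dplyr; group by the same columns with across()
  strata_cols <- setdiff(colnames(efforts_pivot_WCPFC), c("value", "CatchUnits"))
  number_of_units_by_strata <- efforts_pivot_WCPFC %>%
    dplyr::group_by(dplyr::across(dplyr::all_of(strata_cols))) %>%
    dplyr::summarise(count = dplyr::n(), .groups = "drop")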
  
  strata_in_number_and_weight<-number_of_units_by_strata[number_of_units_by_strata$count>1,]
  
  efforts_pivot_WCPFC<-left_join (efforts_pivot_WCPFC,strata_in_number_and_weight,by=setdiff(colnames(strata_in_number_and_weight),"count"))
  
  index.catchinweightandnumber <- which(efforts_pivot_WCPFC[,"count"]==2 & efforts_pivot_WCPFC[,"CatchUnits"]=="N")
  efforts_pivot_WCPFC[index.catchinweightandnumber,"CatchUnits"]="NOMT"
  
  index.catchinweightandnumber <- which(efforts_pivot_WCPFC[,"count"]==2 & efforts_pivot_WCPFC[,"CatchUnits"]=="C")
  efforts_pivot_WCPFC[index.catchinweightandnumber,"CatchUnits"]="MTNO"
  
  index.catchinweightonly <- which(efforts_pivot_WCPFC[,"CatchUnits"]=="C")
  efforts_pivot_WCPFC[index.catchinweightonly,"CatchUnits"]="t"
  
  index.catchinnumberonly <- which(efforts_pivot_WCPFC[,"CatchUnits"]=="N")
  efforts_pivot_WCPFC[index.catchinnumberonly,"CatchUnits"]="no"
  
  # School
  efforts_pivot_WCPFC$School<-"OTH"
  
  ### Reach the efforts harmonized DSD using a function in WCPFC_functions.R
  colToKeep_efforts <- c("FishingFleet","Gear","time_start","time_end","AreaName","School","EffortUnits","Effort")
  #efforts<-WCPFC_CE_efforts_pivotDSD_to_harmonizedDSD(efforts_pivot_WCPFC,colToKeep_captures)
  #2020-11-13 
  efforts_pivot_WCPFC$RFMO <- "WCPFC"
  efforts_pivot_WCPFC$Ocean <- "PAC_W"
  efforts_pivot_WCPFC$FishingFleet <- efforts_pivot_WCPFC$FLAG_CODE 
  efforts_pivot_WCPFC <- harmo_time_2(efforts_pivot_WCPFC, "YY", "MM")
  efforts_pivot_WCPFC <- harmo_spatial_3(efforts_pivot_WCPFC, "LAT5", "LON5", 5, 6) 
  efforts_pivot_WCPFC$CatchType <- "ALL"
  
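  # Effort already holds the number of hooks (the HHOOKS column renamed above).
  # gather() repeated it on every species/unit row, so keep one row per stratum
  # rather than overwriting it with the catch values held in "value".
  efforts <- unique(efforts_pivot_WCPFC[colToKeep_efforts])
  rm(efforts_pivot_WCPFC)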
  # Strip trailing spaces from the code columns
  for (col in c("AreaName", "FishingFleet")) {
    efforts[[col]] <- gsub(" *$", "", efforts[[col]])
  }
  efforts <- efforts %>% filter(!Effort %in% 0) %>% filter(!is.na(Effort))
  efforts <- as.data.frame(efforts)
  efforts <- aggregate(efforts$Effort,
                       by = list(
                         FishingFleet = efforts$FishingFleet,
                         Gear = efforts$Gear,
                         time_start = efforts$time_start,
                         time_end = efforts$time_end,
                         AreaName = efforts$AreaName,
                         School = efforts$School,
                         EffortUnits = efforts$EffortUnits
                       ),
                       FUN = sum)
  colnames(efforts)[colnames(efforts)=="x"] <- "Effort"
  
  colnames(efforts)<-c("fishing_fleet","gear_type","time_start","time_end","geographic_identifier","fishing_mode","measurement_unit","measurement_value")
  efforts$source_authority<-"WCPFC"
  efforts$measurement <- "effort" 
  #----------------------------------------------------------------------------------------------------------------------------
  efforts$time_start <- as.Date(efforts$time_start)
  efforts$time_end <- as.Date(efforts$time_end)
  dataset_temporal_extent <- paste(
    paste0(format(min(efforts$time_start), "%Y"), "-01-01"),
    paste0(format(max(efforts$time_end), "%Y"), "-12-31"),
    sep = "/"
  )
  efforts$measurement_processing_level <- "unknown" 
  # output in same folder as path_to_raw_dataset 
  output_name_dataset <- here::here('R/tunaatlas_scripts/pre-harmonization', 'wcpfc', 'effort', 'data', 'WCPFC_L_PUBLIC_BY_YY_MM_FLAG_harmonized.csv')
  
  write.csv(efforts, output_name_dataset, row.names = FALSE)
  georef_dataset <- efforts
  
  #----------------------------------------------------------------------------------------------------------------------------
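After `tidyr::gather()`, the long table holds one row per stratum and per catch column. The hook effort (`HHOOKS`, renamed `Effort`) is copied onto each of these rows, so summing it directly would count each stratum once per species and unit. Below is a minimal sketch of reducing the table to one row per stratum before aggregating. It uses two rows of the input sample; the `FLAG_CODE` value and the helper `effort_by_stratum` are hypothetical.

```r
# Sketch only: the hook effort (HHOOKS, renamed Effort in the script) is
# repeated on every species/unit row once the catch columns are gathered,
# so it is reduced to one row per stratum before being summed.
# Rows come from the input sample; FLAG_CODE = "JP" is hypothetical.
wide <- data.frame(
  YY = c(2000, 2000), MM = c(1, 1), FLAG_CODE = c("JP", "JP"),
  LAT5 = c("00N", "00N"), LON5 = c("120E", "125E"),
  HHOOKS = c(12391.11, 16349.59),
  YFT_C = c(267.338, 352.417), YFT_N = c(10056, 13256), BET_C = c(58.850, 77.975),
  stringsAsFactors = FALSE
)
strata <- c("YY", "MM", "FLAG_CODE", "LAT5", "LON5")
catch_cols <- c("YFT_C", "YFT_N", "BET_C")

# Long format, one row per stratum and catch column (as gather() yields)
long <- do.call(rbind, lapply(catch_cols, function(col) {
  data.frame(wide[c(strata, "HHOOKS")], variable = col, value = wide[[col]],
             stringsAsFactors = FALSE)
}))

# Keep one row per stratum, then sum the effort within each stratum
effort_by_stratum <- function(df, strata_cols, effort_col) {
  one_per_stratum <- unique(df[c(strata_cols, effort_col)])
  aggregate(one_per_stratum[effort_col], by = one_per_stratum[strata_cols], FUN = sum)
}

sum(long$HHOOKS)                                       # three times the real effort
sum(effort_by_stratum(long, strata, "HHOOKS")$HHOOKS)  # 28740.7
```

The script drops rows whose catch value is 0 or `NA` before aggregating, so a stratum with no recorded catch has no row left in the long table. Calling the same helper on the wide table, `effort_by_stratum(wide, strata, "HHOOKS")`, keeps such strata.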

@ Load pre-harmonization scripts and apply mappings

download.file('https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/tunaatlas_scripts/pre-harmonization/map_codelists_no_DB.R', destfile = 'local_map_codelists_no_DB.R')
source('local_map_codelists_no_DB.R')
fact <- "effort"
mapping_codelist <- map_codelists_no_DB(
  fact,
  mapping_dataset = "https://raw.githubusercontent.com/fdiwg/fdi-mappings/main/global/firms/gta/codelist_mapping_rfmos_to_global.csv",
  dataset_to_map = georef_dataset,
  mapping_keep_src_code = FALSE,
  summary_mapping = TRUE,
  source_authority_to_map = c("IATTC", "CCSBT", "WCPFC", "ICCAT", "IOTC")
)
## 
##  mapping dimension gear_type with code list mapping
## 
##  mapping dimension fishing_fleet with code list mapping
## 
##  mapping dimension fishing_mode with code list mapping
## 
##  mapping dimension measurement_unit with code list mapping
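The messages above list the dimensions translated with the code-list mapping. Conceptually, each dimension is a lookup from a source code to a target code, and the script also writes a `not_mapped_total` table for codes that could not be translated. The sketch below illustrates only the lookup idea. It is not the implementation of `map_codelists_no_DB()`, the name `map_dimension` is hypothetical, and keeping unmapped codes unchanged is an assumption.

```r
# Sketch only, NOT map_codelists_no_DB(): translate the codes of one dimension
# with a src_code -> trg_code lookup and report the codes left without a mapping.
# Assumption: unmapped codes are kept unchanged so they can be handled later.
map_dimension <- function(values, mapping) {
  lookup <- setNames(mapping$trg_code, mapping$src_code)
  mapped <- unname(lookup[values])
  list(
    values = ifelse(is.na(mapped), values, mapped),
    not_mapped = unique(values[is.na(mapped)])
  )
}

# Two rows of the recap table printed in this document (JP -> JPN, L -> 09.31)
fleet_mapping <- data.frame(src_code = "JP", trg_code = "JPN", stringsAsFactors = FALSE)
gear_mapping <- data.frame(src_code = "L", trg_code = "09.31", stringsAsFactors = FALSE)

map_dimension(c("JP", "UNK", "JP"), fleet_mapping)$values  # "JPN" "UNK" "JPN"
map_dimension(c("L", "L"), gear_mapping)$not_mapped        # character(0)
```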

@ Handle unmapped values and save the results

# Fallback codes for values left as UNK: fleets become NEI, gears become 99.9
georef_dataset <- mapping_codelist$dataset_mapped %>%
  dplyr::mutate(
    fishing_fleet = ifelse(fishing_fleet == 'UNK', 'NEI', fishing_fleet),
    gear_type = ifelse(gear_type == 'UNK', '99.9', gear_type)
  )
data_dir <- here::here('R/tunaatlas_scripts/pre-harmonization', 'wcpfc', 'effort', 'data')
data.table::fwrite(mapping_codelist$recap_mapping, file.path(data_dir, 'WCPFC_L_PUBLIC_BY_YY_MM_FLAG_recap_mapping.csv'))
data.table::fwrite(mapping_codelist$not_mapped_total, file.path(data_dir, 'WCPFC_L_PUBLIC_BY_YY_MM_FLAG_not_mapped_total.csv'))
data.table::fwrite(georef_dataset, file.path(data_dir, 'WCPFC_L_PUBLIC_BY_YY_MM_FLAG_CWP_dataset.csv'))

Display the first few rows of the mapping summaries

print(head(mapping_codelist$recap_mapping))
## # A tibble: 4 × 5
##   src_code trg_code src_codingsystem trg_codingsystem   source_authority
##   <chr>    <chr>    <chr>            <chr>              <chr>           
## 1 HHOOKS   HOOKS    effortunit_wcpfc effortunit_rfmos   WCPFC           
## 2 OTH      OTH      schooltype_wcpfc schooltype_rfmos   WCPFC           
## 3 JP       JPN      flag_wcpfc       fishingfleet_firms WCPFC           
## 4 L        09.31    gear_wcpfc       isscfg_revision_1  WCPFC