Introduction

This R Markdown document transforms data that are not in CWP format into CWP format. It first restructures the data, then maps the codes so that they adhere to CWP standards. The document is generated automatically from the function at https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/tunaatlas_scripts/pre-harmonization/west_pacific_ocean_catch_5deg_1m_ps_tunaatlaswcpfc_level0.R, and the documentation keeps the format of the roxygen2 skeleton.

A summary of the mapping process is provided, and the path to the dataset is specified. The first lines of each dataset are available in this same GitHub repository. The datasets are named after the historical names given by the tRFMOs at export time, and these names may change. The information provided in this Rmd identifies which dataset should be used with this markdown.

Additional operations are performed next to verify other aspects of the data, such as the consistency of the geolocation, the values, and the reported catches in numbers and tons.

If you are interested in further details, the results and code are available for review.

Each .Rmd script must be knitted with the dataset path defined at the beginning of the script so that the harmonization process runs correctly. It is also possible to run the code chunk by chunk, provided the working directory is the one containing the .Rmd.

path_to_raw_dataset <- here::here('R/tunaatlas_scripts/pre-harmonization', 'wcpfc', 'catch', 'data', 'WCPFC_S_PUBLIC_BY_YY_MM.csv')
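Since the remaining chunks depend on this path resolving correctly, a quick existence check before reading the file can catch a wrong working directory early. The following is a minimal sketch. The helper name `check_raw_dataset` is hypothetical and not part of the original script, and a temporary file stands in for the WCPFC export.

```r
# Hypothetical helper (not in the original script): stop early with a clear
# message when the raw dataset cannot be found at the expected location.
check_raw_dataset <- function(path) {
  if (!file.exists(path)) {
    stop("Raw dataset not found: ", path,
         "\nCurrent working directory: ", getwd())
  }
  invisible(normalizePath(path))
}

# Example with a temporary file standing in for the WCPFC export
demo_path <- tempfile(fileext = ".csv")
writeLines("YY,MM,LAT5,LON5", demo_path)
checked_path <- check_raw_dataset(demo_path)
```

In the actual document, the same call could be made as `check_raw_dataset(path_to_raw_dataset)` right after the path is defined.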

Harmonize WCPFC Purse Seine Catch Datasets

This function harmonizes WCPFC Purse Seine catch datasets for integration into the Tuna Atlas database, ensuring data compliance with specified format requirements.

@return None; the function outputs files directly, including harmonized datasets, optional metadata, and code lists for integration within the Tuna Atlas database.

@details This function modifies the Purse Seine catch dataset to ensure compliance with the standardized format, including renaming, reordering, and recalculating specific fields as necessary. Metadata integration is contingent on the intended use within the Tuna Atlas database.

@importFrom dplyr %>% filter select mutate group_by summarise

@importFrom tidyr gather

@importFrom reshape melt

@seealso to convert WCPFC task 2 Purse Seine data structure.

@export

@keywords data harmonization, fisheries, WCPFC, tuna

@author Paul Taconet, IRD

@author Bastien Grasset, IRD

  # Input data sample:
  # YY MM LAT5 LON5 DAYS SETS_UNA SETS_LOG SETS_DFAD SETS_AFAD SETS_OTH SKJ_C_UNA YFT_C_UNA BET_C_UNA OTH_C_UNA SKJ_C_LOG YFT_C_LOG BET_C_LOG OTH_C_LOG SKJ_C_DFAD
  # 1967  2  30N 135E    0        0        0         0         0        0         0         0         0         0         0         0         0         0          0
  # 1967  2  30N 140E    0        0        0         0         0        0         0         0         0         0         0         0         0         0          0
  # 1967  2  35N 140E    0        0        0         0         0        0         0         0         0         0         0         0         0         0          0
  # 1967  2  40N 140E    0        0        0         0         0        0         0         0         0         0         0         0         0         0          0
  # 1967  2  40N 145E    0        0        0         0         0        0         0         0         0         0         0         0         0         0          0
  # 1967  3  30N 135E    0        0        0         0         0        0         0         0         0         0         0         0         0         0          0
  # YFT_C_DFAD BET_C_DFAD OTH_C_DFAD SKJ_C_AFAD YFT_C_AFAD BET_C_AFAD OTH_C_AFAD SKJ_C_OTH YFT_C_OTH BET_C_OTH OTH_C_OTH
  #          0          0          0          0          0          0          0         0         0         0         0
  #          0          0          0          0          0          0          0         0         0         0         0
  #          0          0          0          0          0          0          0         0         0         0         0
  #          0          0          0          0          0          0          0         0         0         0         0
  #          0          0          0          0          0          0          0         0         0         0         0
  #          0          0          0          0          0          0          0         0         0         0         0
  
  
  # Catch: final data sample:
  # FishingFleet Gear time_start   time_end AreaName School Species CatchType CatchUnits   Catch
  #  ALL    S 1970-01-01 1970-02-01  6100135    LOG     BET       ALL         MT  12.181
  #  ALL    S 1970-01-01 1970-02-01  6100135    LOG     SKJ       ALL         MT  84.587
  #  ALL    S 1970-01-01 1970-02-01  6100135    LOG     YFT       ALL         MT 110.307
  #  ALL    S 1970-02-01 1970-03-01  6100125    LOG     BET       ALL         MT   5.943
  #  ALL    S 1970-02-01 1970-03-01  6100125    LOG     SKJ       ALL         MT  35.133
  #  ALL    S 1970-02-01 1970-03-01  6100125    LOG     YFT       ALL         MT  53.466

Packages

if(!require(reshape)){
  install.packages("reshape")
  require(reshape)
}

if(!require(tidyr)){
  install.packages("tidyr")
  require(tidyr)
}

if(!require(dplyr)){
  install.packages("dplyr")
  require(dplyr)
}

Historical name for the dataset at source: WCPFC_S_PUBLIC_BY_YR_MON.csv

opts <- options()
options(encoding = "UTF-8")

Catches

# Reach the catches pivot DSD

Changes:
- change from dbf to csv
- remove cwp_grid code
- convert column names to upper case

DF <- read.csv(path_to_raw_dataset)
colnames(DF) <- toupper(colnames(DF))
DF$CWP_GRID <- NULL

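# Reshape from wide to long: keep the first 10 columns (time, position, effort)
# as identifiers and stack every catch column into variable/value pairs
DF <- DF %>% tidyr::gather(variable, value, -c(colnames(DF[1:10])))

# Drop zero and missing catches
DF <- DF %>% dplyr::filter(!value %in% 0) %>% dplyr::filter(!is.na(value))

# Catch column names follow the pattern <species>_C_<school type>, e.g. SKJ_C_UNA
DF$variable <- as.character(DF$variable)
colnames(DF)[which(colnames(DF) == "variable")] <- "Species"
# The school type starts at character 7 (three-letter species code + "_C_")
DF$School <- substr(DF$Species, 7, nchar(DF$Species))
# Strip the "_C_<school type>" suffix to keep the species code only
DF$Species <- sub("_C_(UNA|LOG|DFAD|AFAD|OTH)$", "", DF$Species)
DF$CatchUnits <- "t"

# The fifth column (DAYS) holds the effort; record its name as the effort unit
DF$EffortUnits <- colnames(DF[5])
colnames(DF)[5] <- "Effort"
catches_pivot_WCPFC <- DF
rm(DF)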
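
The reshaping step can be illustrated on a toy table. This is only a sketch: the values are invented, only a few of the raw columns are reproduced, and base R is used as a stand-in for the `tidyr::gather` call so that the example runs without extra packages.

```r
# Toy wide table mimicking the column layout of the raw file (values invented)
toy <- data.frame(
  YY = c(1970, 1970), MM = c(1, 2),
  LAT5 = c("00N", "05S"), LON5 = c("135E", "125E"),
  SKJ_C_UNA = c(0, 3.5), YFT_C_LOG = c(10, NA), BET_C_DFAD = c(2, 0),
  stringsAsFactors = FALSE
)
id_cols <- c("YY", "MM", "LAT5", "LON5")
catch_cols <- setdiff(names(toy), id_cols)

# Stack one block of rows per catch column (base-R stand-in for tidyr::gather)
toy_long <- do.call(rbind, lapply(catch_cols, function(col) {
  data.frame(toy[id_cols], variable = col, value = toy[[col]],
             stringsAsFactors = FALSE)
}))

# Drop zero and missing catches, as the script does
toy_long <- toy_long[!is.na(toy_long$value) & toy_long$value != 0, ]

# Split "<species>_C_<school type>" into its two parts
toy_long$Species <- sub("_C_.*$", "", toy_long$variable)
toy_long$School <- sub("^.*_C_", "", toy_long$variable)
rownames(toy_long) <- NULL
```

Of the six wide cells, three are zero or missing, so `toy_long` keeps three rows, one per non-empty species and school type combination.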

Gear

catches_pivot_WCPFC$Gear<-"S"

Catch units

# Reach the catches harmonized DSD using the function sourced from WCPFC_CE_catches_pivotDSD_to_harmonizedDSD.R

colToKeep_captures <- c("FishingFleet","Gear","time_start","time_end","AreaName","School","Species","CatchType","CatchUnits","Catch")
source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/WCPFC_CE_catches_pivotDSD_to_harmonizedDSD.R")
catches<-WCPFC_CE_catches_pivotDSD_to_harmonizedDSD(catches_pivot_WCPFC,colToKeep_captures)

colnames(catches)<-c("fishing_fleet","gear_type","time_start","time_end","geographic_identifier","fishing_mode","species","measurement_type","measurement_unit","measurement_value")
catches$source_authority<-"WCPFC"
catches$measurement_type <- "RC" # Retained catches
catches$measurement <- "catch" 
catches$measurement_processing_level <- "raised"
catches$time_start <- as.Date(catches$time_start)
catches$time_end <- as.Date(catches$time_end)
dataset_temporal_extent <- paste(
  paste0(format(min(catches$time_start), "%Y"), "-01-01"),
  paste0(format(max(catches$time_end), "%Y"), "-12-31"),
  sep = "/"
)
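
To make the time handling concrete, the sketch below derives monthly bounds from toy year and month values and then applies the same temporal-extent expression as above. It assumes, based on the final data sample, that `time_end` is the first day of the following month. The vectors `yy` and `mm` are invented for illustration, and this is not the code of the sourced harmonization function.

```r
# Toy year/month pairs (invented for illustration)
yy <- c(1970, 1970, 1971)
mm <- c(1, 12, 6)

# First day of the reported month
time_start <- as.Date(sprintf("%04d-%02d-01", as.integer(yy), as.integer(mm)))

# First day of the following month; December rolls over to January of the
# next year (assumption taken from the final data sample)
end_yy <- yy + (mm == 12)
end_mm <- ifelse(mm == 12, 1, mm + 1)
time_end <- as.Date(sprintf("%04d-%02d-01", as.integer(end_yy), as.integer(end_mm)))

# Same expression as in the script: whole calendar years covering the data
toy_temporal_extent <- paste(
  paste0(format(min(time_start), "%Y"), "-01-01"),
  paste0(format(max(time_end), "%Y"), "-12-31"),
  sep = "/"
)
```

Under this convention, a dataset whose last record is a December month has a `time_end` in the following January, so the computed extent would run to the end of the following year.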

The output is written to the same folder as path_to_raw_dataset

output_name_dataset <- here::here('R/tunaatlas_scripts/pre-harmonization', 'wcpfc', 'catch', 'data', 'WCPFC_S_PUBLIC_BY_YY_MM_harmonized.csv')

write.csv(catches, output_name_dataset, row.names = FALSE)
georef_dataset <- catches

Load pre-harmonization scripts and apply mappings

download.file('https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/tunaatlas_scripts/pre-harmonization/map_codelists_no_DB.R', destfile = 'local_map_codelists_no_DB.R')
source('local_map_codelists_no_DB.R')
fact <- "catch"
mapping_codelist <- map_codelists_no_DB(fact, mapping_dataset = "https://raw.githubusercontent.com/fdiwg/fdi-mappings/main/global/firms/gta/codelist_mapping_rfmos_to_global.csv", dataset_to_map = georef_dataset, mapping_keep_src_code = FALSE, summary_mapping = TRUE, source_authority_to_map = c("IATTC", "CCSBT", "WCPFC"))
## 
##  mapping dimension gear_type with code list mapping
## 
##  mapping dimension species with code list mapping
## 
##  mapping dimension fishing_fleet with code list mapping
## 
##  mapping dimension fishing_mode with code list mapping

Handle unmapped values and save the results

georef_dataset <- mapping_codelist$dataset_mapped %>% dplyr::mutate(fishing_fleet = ifelse(fishing_fleet == 'UNK', 'NEI', fishing_fleet), gear_type = ifelse(gear_type == 'UNK', '99.9', gear_type))
data.table::fwrite(mapping_codelist$recap_mapping, here::here('R/tunaatlas_scripts/pre-harmonization', 'wcpfc', 'catch', 'data', 'WCPFC_S_PUBLIC_BY_YY_MM_recap_mapping.csv'))
data.table::fwrite(mapping_codelist$not_mapped_total, here::here('R/tunaatlas_scripts/pre-harmonization', 'wcpfc', 'catch', 'data', 'WCPFC_S_PUBLIC_BY_YY_MM_not_mapped_total.csv'))
data.table::fwrite(georef_dataset, here::here('R/tunaatlas_scripts/pre-harmonization', 'wcpfc', 'catch', 'data', 'WCPFC_S_PUBLIC_BY_YY_MM_CWP_dataset.csv'))

Display the first few rows of the mapping summaries

print(head(mapping_codelist$recap_mapping))
## # A tibble: 6 × 5
##   src_code trg_code src_codingsystem trg_codingsystem   source_authority
##   <chr>    <chr>    <chr>            <chr>              <chr>           
## 1 DFAD     LS       schooltype_wcpfc schooltype_rfmos   WCPFC           
## 2 LOG      LS       schooltype_wcpfc schooltype_rfmos   WCPFC           
## 3 OTH      OTH      schooltype_wcpfc schooltype_rfmos   WCPFC           
## 4 UNA      FS       schooltype_wcpfc schooltype_rfmos   WCPFC           
## 5 ALL      NEI      flag_wcpfc       fishingfleet_firms WCPFC           
## 6 BET      BET      species_wcpfc    species_asfis      WCPFC
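
The idea behind the code list mapping can be sketched with a simple lookup, using the school type pairs shown in the summary above. This is an illustration only and not the implementation of `map_codelists_no_DB`. The fallback to `"UNK"` for codes without a mapping is an assumption inferred from the replacement of `'UNK'` values earlier in this document, and the source code `"XXX"` is invented.

```r
# Source-to-target pairs copied from the mapping summary (school type only)
school_map <- c(DFAD = "LS", LOG = "LS", OTH = "OTH", UNA = "FS")

# Toy source codes; "XXX" is an invented code with no mapping
src_codes <- c("LOG", "UNA", "DFAD", "XXX")

# Look up each source code; codes absent from the map come back as NA
mapped_codes <- unname(school_map[src_codes])

# Assumption: unmapped codes are flagged as "UNK" before any further handling
mapped_codes[is.na(mapped_codes)] <- "UNK"
```

Both `LOG` and `DFAD` collapse to `LS`, so the mapping is many-to-one and cannot be reversed from the mapped dataset alone.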