Introduction

This R Markdown document transforms data that is not in CWP format into CWP format. It first reshapes the data, then maps the codes so that they adhere to CWP standards. The document is generated automatically from the function at https://raw.githubusercontent.com/eblondel/geoflow-tunaatlas/master/R/tunaatlas_scripts/pre-harmonization/southern_hemisphere_oceans_nominal_catch_tunaatlasccsbt_level0__bygear.R; the documentation keeps the format of the roxygen2 skeleton.

A summary of the mapping process is provided, and the path to the dataset is specified. The first line of each dataset is available in this same GitHub repository. The datasets are named after the historical names used by the tRFMOs at export time, and these names may change. The information in this Rmd identifies which dataset should be used with this markdown.

Additional checks are then performed to verify other aspects of the data, such as the consistency of the geolocation, the values, and the reported catches in numbers and tons.

If you are interested in further details, the results and codes are available for review.

Each .Rmd script must be knitted from the beginning so that the harmonization process executes correctly. The code can also be run chunk by chunk, provided the working directory is the one containing the .Rmd.

path_to_raw_dataset <- here::here('R/tunaatlas_scripts/pre-harmonization', 'ccsbt', 'nominal', 'data', 'CCSBT_Global_Catch.xlsx')
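Because the path above is resolved relative to the project, running the chunks from the wrong working directory typically shows up as a missing-file error at import time. The following is a minimal sketch of an early check. The helper `check_raw_dataset` is hypothetical and not part of geoflow-tunaatlas, and a temporary file stands in for the CCSBT workbook.

```r
# Hypothetical helper (not part of geoflow-tunaatlas): stop early with a clear
# message when the raw dataset cannot be found.
check_raw_dataset <- function(path) {
  if (!file.exists(path)) {
    stop(
      "Raw dataset not found: ", path,
      "\nCurrent working directory: ", getwd(),
      call. = FALSE
    )
  }
  # Return the normalized path invisibly so the call can be chained
  invisible(normalizePath(path))
}

# Demonstration with a temporary file standing in for the CCSBT workbook
demo_path <- tempfile(fileext = ".xlsx")
file.create(demo_path)
check_raw_dataset(demo_path)
```

In this document, the same check could be applied as `check_raw_dataset(path_to_raw_dataset)` before the call to `readxl::read_excel()`.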

Harmonize CCSBT Nominal Catch Dataset

This function harmonizes the nominal catch dataset provided by the Commission for the Conservation of Southern Bluefin Tuna (CCSBT), preparing it for integration into the Tuna Atlas database.

@return None; the function outputs files directly, including a harmonized dataset, optional metadata, and code lists for integration within the Tuna Atlas database.

@details The function processes input datasets to match the standardized format required for integration into the Tuna Atlas, including adjustments to column names, units conversion, and data aggregation. Metadata integration is conditional, based on whether it will be loaded into the Tuna Atlas database.

@importFrom dplyr %>% filter select mutate group_by summarise
@importFrom readxl read_excel
@importFrom reshape melt
@seealso for converting CCSBT Longline data structure.
@export
@keywords data harmonization, fisheries, CCSBT, tuna
@author Bastien Grasset, IRD

Catches

Input data sample (after importing as data.frame in R):

A tibble: 6 × 6
  Calendar_Year Flag_Code Flag         Ocean    Gear      Catch_mt
1          1965 AU        Australia    Indian   Unspecif…   4675.
2          1965 AU        Australia    Pacific  Unspecif…   2201.
3          1965 JP        Japan        Atlantic Longline      15.3
4          1965 JP        Japan        Indian   Longline   28095.
5          1965 JP        Japan        Pacific  Longline   12579.
6          1965 ZA        South Africa Indian   Longline       2

Final data sample:

  fishing_fleet gear_type time_start time_end   geographic_identifier fishing_mode species measurement_type
1 JP            Longline  1965-01-01 1965-01-12 Atlantic              ALL          SBF     ALL
2 JP            Longline  1968-01-01 1968-01-12 Atlantic              ALL          SBF     ALL
3 JP            Longline  1969-01-01 1969-01-12 Atlantic              ALL          SBF     ALL
4 JP            Longline  1970-01-01 1970-01-12 Atlantic              ALL          SBF     ALL
5 JP            Longline  1971-01-01 1971-01-12 Atlantic              ALL          SBF     ALL
6 JP            Longline  1972-01-01 1972-01-12 Atlantic              ALL          SBF     ALL
  measurement_unit measurement_value source_authority
1 t                         15.33201 CCSBT
2 t                        411.48727 CCSBT
3 t                       1869.37842 CCSBT
4 t                       7574.64216 CCSBT
5 t                       2125.58909 CCSBT
6 t                       3928.10401 CCSBT

  source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/format_time_db_format.R")
  #packages
  
  
  if(!require(reshape)){
    install.packages("reshape")
    require(reshape)
  }
  if(!require(readxl)){
    install.packages("readxl")
    require(readxl)
  }
  if(!require(dplyr)){
    install.packages("dplyr")
    require(dplyr)
  }
  
  #----------------------------------------------------------------------------------------------------------------------------
  opts <- options()
  options(encoding = "UTF-8")
  #----------------------------------------------------------------------------------------------------------------------------
  
  CCSBT_NC <- readxl::read_excel(path_to_raw_dataset, sheet = "Sheet1")
  
  CCSBT_NC <- CCSBT_NC %>% dplyr::select(Year = Calendar_Year, fishing_fleet = Flag_Code, 
                                         geographic_identifier = Ocean, gear_type = Gear, 
                                         measurement_value = Catch_mt)
  #Year and period
  CCSBT_NC$MonthStart<-1
  CCSBT_NC$Period<-12
  #Format inputDataset time to have the time format of the DB, which is one column time_start and one time_end
  CCSBT_NC<-format_time_db_format(CCSBT_NC)
  
  #School
  CCSBT_NC$fishing_mode<-"UNK"
  
  #Species
  CCSBT_NC$species<-"SBF"
  
  #CatchType
  CCSBT_NC$measurement_type<-"NC"
  
  #Geographic identifier
  CCSBT_NC <- CCSBT_NC  %>% dplyr::mutate(geographic_identifier = case_when(geographic_identifier == "Indian"~"IOTC", 
                                                                            geographic_identifier == "Pacific" ~ "WCPFC",
                                                                            geographic_identifier == "Atlantic" ~ "AT", 
                                                                            TRUE ~ geographic_identifier))
  
  #measurement_unit
  CCSBT_NC$measurement_unit<-"t"
  
  
  # Remove NA and zero values (NA is tested first so the comparison never yields NA row indices)
  CCSBT_NC <- CCSBT_NC[!is.na(CCSBT_NC$measurement_value) & CCSBT_NC$measurement_value != 0, ]
  
  NC <- aggregate(CCSBT_NC$measurement_value,
                  FUN = sum,
                  by = list(
                    fishing_fleet = CCSBT_NC$fishing_fleet,
                    gear_type = CCSBT_NC$gear_type,
                    time_start = CCSBT_NC$time_start,
                    time_end = CCSBT_NC$time_end,
                    geographic_identifier = CCSBT_NC$geographic_identifier,
                    fishing_mode = CCSBT_NC$fishing_mode,
                    species = CCSBT_NC$species,
                    measurement_type = CCSBT_NC$measurement_type,
                    measurement_unit = CCSBT_NC$measurement_unit
                  )
  )
  
  colnames(NC)<-c("fishing_fleet","gear_type","time_start","time_end","geographic_identifier","fishing_mode","species","measurement_type","measurement_unit","measurement_value")
  
  NC$source_authority<-"CCSBT"
  NC$measurement <- "catch"
  NC$measurement_processing_level<-"raised"
  #----------------------------------------------------------------------------------------------------------------------------
  NC$time_start <- as.Date(NC$time_start)
  NC$time_end <- as.Date(NC$time_end)
  dataset_temporal_extent <- paste(
    paste0(format(min(NC$time_start), "%Y"), "-01-01"),
    paste0(format(max(NC$time_end), "%Y"), "-12-31"),
    sep = "/"
  )
  # Final value: this overrides the "raised" level assigned earlier in the script
  NC$measurement_processing_level <- "unknown"
  # output in same folder as path_to_raw_dataset 
  output_name_dataset <- here::here('R/tunaatlas_scripts/pre-harmonization', 'ccsbt', 'nominal', 'data', 'CCSBT_Global_Catch_harmonized.csv')
  
  write.csv(NC, output_name_dataset, row.names = FALSE)
georef_dataset <- NC
  
  #----------------------------------------------------------------------------------------------------------------------------  
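The filtering and aggregation steps in the script above can be illustrated on a small example. This is a sketch in base R on toy values invented for illustration (they are not CCSBT figures), and it uses only two of the grouping columns to keep the example short.

```r
# Toy data, invented for illustration only (not CCSBT figures)
toy <- data.frame(
  fishing_fleet = c("JP", "JP", "AU", "AU", "ZA"),
  gear_type = c("Longline", "Longline", "Unspecified", "Unspecified", "Longline"),
  measurement_value = c(10, 5, 0, 7, NA),
  stringsAsFactors = FALSE
)

# Keep only rows with a non-missing, non-zero value
toy <- toy[!is.na(toy$measurement_value) & toy$measurement_value != 0, ]

# Sum the values within each combination of the grouping columns
toy_agg <- aggregate(
  toy$measurement_value,
  by = list(fishing_fleet = toy$fishing_fleet, gear_type = toy$gear_type),
  FUN = sum
)

# aggregate() names the summed column "x" when given a bare vector; rename it
colnames(toy_agg)[colnames(toy_agg) == "x"] <- "measurement_value"
print(toy_agg)
```

The zero and NA rows disappear before aggregation, so the ZA row (NA only) does not appear in the result, and the two JP longline rows collapse into a single row.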

Load pre-harmonization scripts and apply mappings

download.file('https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/tunaatlas_scripts/pre-harmonization/map_codelists_no_DB.R', destfile = 'local_map_codelists_no_DB.R')
source('local_map_codelists_no_DB.R')
fact <- "catch"
mapping_codelist <- map_codelists_no_DB(
  fact,
  mapping_dataset = "https://raw.githubusercontent.com/fdiwg/fdi-mappings/main/global/firms/gta/codelist_mapping_rfmos_to_global.csv",
  dataset_to_map = georef_dataset,
  mapping_keep_src_code = FALSE,
  summary_mapping = TRUE,
  source_authority_to_map = c("IATTC", "CCSBT", "WCPFC")
)
## 
##  mapping dimension gear_type with code list mapping
## 
##  mapping dimension species with code list mapping
## 
##  mapping dimension fishing_fleet with code list mapping
## 
##  mapping dimension fishing_mode with code list mapping
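The general idea behind a code list mapping can be sketched as a lookup from source codes to target codes. The example below is not the implementation of `map_codelists_no_DB`; it is a simplified illustration in base R. The four code pairs are taken from the recap output shown in this document, the code "XX" is invented, and flagging unmatched codes as "UNK" is an assumption.

```r
# Toy mapping table; the code pairs come from the recap output in this document
mapping <- data.frame(
  src_code = c("AU", "ID", "JP", "KR"),
  trg_code = c("AUS", "IDN", "JPN", "KOR"),
  stringsAsFactors = FALSE
)

# Toy dataset; "XX" is an invented code with no entry in the mapping table
toy <- data.frame(
  fishing_fleet = c("JP", "AU", "XX"),
  measurement_value = c(15.3, 4675, 1),
  stringsAsFactors = FALSE
)

# Look up each source code; codes without a match yield NA
idx <- match(toy$fishing_fleet, mapping$src_code)
mapped <- mapping$trg_code[idx]

# Assumption: codes without a mapping are flagged as "UNK"
toy$fishing_fleet <- ifelse(is.na(mapped), "UNK", mapped)
print(toy)
```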

Handle unmapped values and save the results

georef_dataset <- mapping_codelist$dataset_mapped %>%
  dplyr::mutate(
    fishing_fleet = ifelse(fishing_fleet == 'UNK', 'NEI', fishing_fleet),
    gear_type = ifelse(gear_type == 'UNK', '99.9', gear_type)
  )
data.table::fwrite(mapping_codelist$recap_mapping, here::here('R/tunaatlas_scripts/pre-harmonization', 'ccsbt', 'nominal', 'data', 'CCSBT_Global_Catch_recap_mapping.csv'))
data.table::fwrite(mapping_codelist$not_mapped_total, here::here('R/tunaatlas_scripts/pre-harmonization', 'ccsbt', 'nominal', 'data', 'CCSBT_Global_Catch_not_mapped_total.csv'))
data.table::fwrite(georef_dataset, here::here('R/tunaatlas_scripts/pre-harmonization', 'ccsbt', 'nominal', 'data', 'CCSBT_Global_Catch_CWP_dataset.csv'))
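The replacement of the 'UNK' placeholder by dimension-specific fallback codes can be checked on a small example. The sketch below uses a hypothetical helper, `replace_placeholder`, and toy values; "G1" is an invented gear code, while 'NEI' and '99.9' are the fallback codes used in the script above.

```r
# Hypothetical helper: swap a placeholder code for a fallback, leaving NA untouched
replace_placeholder <- function(x, placeholder, fallback) {
  x[!is.na(x) & x == placeholder] <- fallback
  x
}

# Toy data; "G1" is an invented gear code used only for illustration
toy <- data.frame(
  fishing_fleet = c("JPN", "UNK", "AUS"),
  gear_type = c("UNK", "G1", "UNK"),
  stringsAsFactors = FALSE
)

toy$fishing_fleet <- replace_placeholder(toy$fishing_fleet, "UNK", "NEI")
toy$gear_type <- replace_placeholder(toy$gear_type, "UNK", "99.9")

# Count the placeholder codes left in the table after replacement
remaining_unk <- sum(toy == "UNK")
print(remaining_unk)
```

A count greater than zero after the replacement would point to a dimension that still carries the placeholder and has no fallback assigned.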

Display the first few rows of the mapping summaries

print(head(mapping_codelist$recap_mapping))
## # A tibble: 6 × 5
##   src_code trg_code src_codingsystem trg_codingsystem   source_authority
##   <chr>    <chr>    <chr>            <chr>              <chr>           
## 1 UNK      UNK      schooltype_ccsbt schooltype_rfmos   CCSBT           
## 2 AU       AUS      flag_ccsbt       fishingfleet_firms CCSBT           
## 3 ID       IDN      flag_ccsbt       fishingfleet_firms CCSBT           
## 4 JP       JPN      flag_ccsbt       fishingfleet_firms CCSBT           
## 5 KR       KOR      flag_ccsbt       fishingfleet_firms CCSBT           
## 6 NZ       NZL      flag_ccsbt       fishingfleet_firms CCSBT