This R Markdown document transforms data that is not in CWP format into CWP format. It first changes the structure of the data, then maps the codes so that they adhere to CWP standards. The markdown is generated automatically from the function at https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/refs/heads/master/R/tunaatlas_scripts/pre-harmonization/atlantic_ocean_effort_1deg_1m_ps_tunaatlasiccat_level0__byschool.R; the documentation keeps the format of the roxygen2 skeleton.
A summary of the mapping process is provided, and the path to the dataset is specified. The first line of each dataset is available in this same GitHub repository. The datasets are named after the historical names given by the tRFMOs at export time, and these names may change. The information provided in this Rmd makes it possible to identify which dataset should be used with this markdown.
Additional operations are performed next to verify other aspects of the data, such as the consistency of the geolocation, the values, and the reported catches in numbers and tons.
If you are interested in further details, the results and code are available for review.
Each .Rmd script requires the user to knit the dataset at the beginning of the script in order to execute the harmonization process correctly. It is also possible to run the code chunk by chunk, but make sure the working directory is the one containing the .Rmd.
path_to_raw_dataset <- here::here('R/tunaatlas_scripts/pre-harmonization', 'iccat', 'effort', 'data', 't2ce_ETRO-PS1991-2024_bySchool_v1.xlsx')
Harmonize data structure of ICCAT by operation mode effort datasets
This function harmonizes the structure of ICCAT catch-and-effort datasets based on operation modes. It adapts the dataset structure for compatibility with the Tuna Atlas database, considering only specific fields as mandatory. The function also handles optional metadata integration if provided.
@param keep_fleet_instead_of_flag Logical, defaults to FALSE. Determines whether to replace the ‘flag’ column with the ‘fleet’ column in the output dataset.
@return None; this function performs data manipulation and outputs files directly.
@import dplyr
@importFrom stringr str_detect str_replace
@importFrom readr read_csv write_csv
@seealso for converting ICCAT task 2,
@export
@author Paul Taconet, IRD
@author Bastien Grasset, IRD
@keywords ICCAT, tuna, fisheries, data harmonization
keep_fleet_instead_of_flag=FALSE
packages
if(!require(dplyr)){
install.packages("dplyr")
require(dplyr)
}
Historical name for the dataset at source: t2ce_bySchool.csv
opts <- options()
options(encoding = "UTF-8")
Input data sample (after importing as a data.frame in R): a tibble of 6 rows × 33 columns, with the following columns:
DSetID, StrataID, FlagName, FleetCode, GearCode, YearC, Decade, TimePeriodID, GeoStrataCode, QuadID, Lat, Lon, xLon, yLat, FishMode, Eff1, Eff1Type, Eff2, Eff2Type, Eff3, Eff3Type, Eff4, Eff4Type, Eff5, Eff5Type, YFT, ALB, BET, BLF, LTA, SKJ, FRI, TOTAL
# readxl is not attached above, so call read_excel() through its namespace
RFMO_CE <- readxl::read_excel(path_to_raw_dataset,
sheet = "Data")
RFMO_CE$FleetCode_short <- sub("-.*", "", RFMO_CE$FleetCode) # keep only the part of the fleet code before the first '-'
names(RFMO_CE)[names(RFMO_CE) == 'FleetCode_short'] <- 'FishingFleet'
RFMO_CE <- RFMO_CE[, c("FishingFleet", setdiff(names(RFMO_CE), "FishingFleet"))] # put FishingFleet in first position
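As a small illustration of the fleet-code step above, the sketch below applies the same `sub()` pattern to a few made-up codes (the values are hypothetical and not taken from the ICCAT file). The pattern `-.*` matches from the first hyphen to the end of the string, so only the part before the first hyphen is kept, and a code without a hyphen is left unchanged.

```r
# Hypothetical fleet codes, for illustration only
fleet_codes <- c("EU.ESP-ES-ETRO", "GHA-GH-ETRO", "JPN")

# Remove the first '-' and everything after it
fleet_short <- sub("-.*", "", fleet_codes)
print(fleet_short)
```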
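if (keep_fleet_instead_of_flag == TRUE) {
  RFMO_CE$Flag <- NULL
  # Note: name matching is case-sensitive; the column created above is 'FishingFleet',
  # so 'Fishingfleet' only matches a column spelled exactly that way
  names(RFMO_CE)[names(RFMO_CE) == 'Fishingfleet'] <- 'Flag'
}
Efforts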
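# Drop the catch columns: with FishingFleet prepended to the 33 input columns,
# column 27 (YFT) is the first one removed and column 26 (Eff5Type) the last one kept
last_column_not_catch_value=27
RFMO_CE<-RFMO_CE[,-(last_column_not_catch_value:ncol(RFMO_CE))]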
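The positional drop above depends on the column order of the source file. As a hedged sketch only, the toy example below contrasts it with a name-based drop, which may be more robust if ICCAT adds or reorders columns. The toy table, `first_dropped`, and `catch_cols` are illustrative names and values, not part of the original script.

```r
# Toy table using a few column names from the input sample (values made up)
toy <- data.frame(
  FishingFleet = "EU.ESP", Eff1 = 10, Eff1Type = "FISH.HOUR",
  YFT = 1.2, SKJ = 3.4, TOTAL = 4.6,
  stringsAsFactors = FALSE
)

# Positional approach, as in the script: drop from a column index to the end
first_dropped <- 4
by_position <- toy[, -(first_dropped:ncol(toy)), drop = FALSE]

# Name-based alternative (an assumption, not the script's method)
catch_cols <- c("YFT", "SKJ", "TOTAL")
by_name <- toy[, setdiff(names(toy), catch_cols), drop = FALSE]

print(names(by_position))
print(names(by_name))
```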
source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/FUN_efforts_ICCAT_CE_keep_all_efforts.R")
efforts_pivot_ICCAT<-FUN_efforts_ICCAT_CE_keep_all_efforts(RFMO_CE,c("Eff1","Eff2","Eff3","Eff4","Eff5"),c("Eff1Type","Eff2Type","Eff3Type","Eff4Type","Eff5Type"))
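The internals of `FUN_efforts_ICCAT_CE_keep_all_efforts` are not shown in this document. As a rough illustration of the kind of wide-to-long reshaping that such a call suggests, the base R sketch below stacks each effort value with its unit into common `Effort` and `EffortUnits` columns. The toy data and the removal of empty efforts are assumptions made for the example and may differ from the real helper.

```r
# Toy input with two effort/unit pairs (values are made up)
toy <- data.frame(
  FishMode = c("FAD", "FSC"),
  Eff1 = c(10, 20), Eff1Type = c("FISH.HOUR", "FISH.HOUR"),
  Eff2 = c(NA, 5),  Eff2Type = c(NA, "HOURS.SEA"),
  stringsAsFactors = FALSE
)

value_cols <- c("Eff1", "Eff2")
unit_cols  <- c("Eff1Type", "Eff2Type")
id_cols    <- setdiff(names(toy), c(value_cols, unit_cols))

# Stack each (value, unit) pair into common Effort / EffortUnits columns
long <- do.call(rbind, lapply(seq_along(value_cols), function(i) {
  data.frame(
    toy[, id_cols, drop = FALSE],
    EffortUnits = toy[[unit_cols[i]]],
    Effort = toy[[value_cols[i]]],
    stringsAsFactors = FALSE
  )
}))

# Assumption: rows without a reported effort are dropped
long <- long[!is.na(long$Effort), ]
print(long)
```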
School: the format changed; the school type is now in the FishMode column.
efforts_pivot_ICCAT <- efforts_pivot_ICCAT %>% dplyr::mutate(FishMode = ifelse(FishMode == "n/a", "OTH", FishMode))
efforts_pivot_ICCAT<- efforts_pivot_ICCAT %>% dplyr::rename("School" = "FishMode")
colToKeep_efforts <- c("FishingFleet","Gear","time_start","time_end","AreaName","School","EffortUnits","Effort")
source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/ICCAT_CE_effort_pivotDSD_to_harmonizedDSD.R")
efforts_pivot_ICCAT <- efforts_pivot_ICCAT %>% dplyr::rename(SquareTypeCode = GeoStrataCode) # to match definition in ICCAT_CE_effort_pivotDSD_to_harmonizedDSD
efforts_pivot_ICCAT$Lat <- floor(abs(efforts_pivot_ICCAT$Lat)) # floor of the absolute value: whatever the quadrant, it corresponds to the CWP value
efforts_pivot_ICCAT$Lon <- floor(abs(efforts_pivot_ICCAT$Lon))
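To make the effect of `floor(abs(...))` concrete, the sketch below applies it to made-up coordinates, one per quadrant. The sign is discarded, so the quadrant is assumed to be carried separately (for example by the QuadID column listed in the input sample); that reading is an assumption, since the downstream function is not shown here.

```r
# Made-up coordinates, one per quadrant
lat <- c(10.5, -10.5, 0.5, -0.5)
lon <- c(-20.5, 20.5, -0.5, 0.5)

# Same transformation as in the script: absolute value, then floor
lat_floor <- floor(abs(lat))
lon_floor <- floor(abs(lon))
print(data.frame(lat, lon, lat_floor, lon_floor))
```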
efforts<-ICCAT_CE_effort_pivotDSD_to_harmonizedDSD(efforts_pivot_ICCAT,colToKeep_efforts)
efforts$AreaName <- as.character(as.integer(efforts$AreaName))
colnames(efforts)<-c("fishing_fleet","gear_type","time_start","time_end","geographic_identifier","fishing_mode","measurement_unit","measurement_value")
efforts$source_authority<-"ICCAT"
efforts$time_start <- as.Date(efforts$time_start)
efforts$time_end <- as.Date(efforts$time_end)
dataset_temporal_extent <- paste(
paste0(format(min(efforts$time_start), "%Y"), "-01-01"),
paste0(format(max(efforts$time_end), "%Y"), "-12-31"),
sep = "/"
)
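The temporal extent built above spans whole calendar years, from 1 January of the earliest start year to 31 December of the latest end year. The sketch below reproduces the same expression on made-up dates; `time_start` and `time_end` stand in for the corresponding columns of `efforts`.

```r
# Made-up dates standing in for efforts$time_start / efforts$time_end
time_start <- as.Date(c("1991-03-01", "2005-07-01"))
time_end   <- as.Date(c("1991-03-31", "2024-06-30"))

# First day of the earliest year / last day of the latest year
temporal_extent <- paste(
  paste0(format(min(time_start), "%Y"), "-01-01"),
  paste0(format(max(time_end), "%Y"), "-12-31"),
  sep = "/"
)
print(temporal_extent)
```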
efforts$geographic_identifier <- format(as.integer(efforts$geographic_identifier), scientific = FALSE, trim = TRUE)
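One likely reason for formatting the identifier this way is that large round numbers stored as doubles can be rendered in scientific notation when converted to text. The sketch below, on made-up 7-digit codes, shows that converting to integer and formatting with `scientific = FALSE, trim = TRUE` yields plain digit strings without padding.

```r
# Made-up 7-digit codes stored as doubles
codes <- c(6000000, 5100010)

# Same expression as in the script
codes_chr <- format(as.integer(codes), scientific = FALSE, trim = TRUE)
print(codes_chr)
```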
output in same folder as path_to_raw_dataset
output_name_dataset <- here::here('R/tunaatlas_scripts/pre-harmonization', 'iccat', 'effort', 'data', 't2ce_ETRO-PS1991-2024_bySchool_v1_harmonized.csv')
write.csv(efforts, output_name_dataset, row.names = FALSE)
georef_dataset <- efforts
Load pre-harmonization scripts and apply mappings
download.file('https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/tunaatlas_scripts/pre-harmonization/map_codelists_no_DB.R', destfile = 'local_map_codelists_no_DB.R')
source('local_map_codelists_no_DB.R')
fact <- "effort"
mapping_codelist <- map_codelists_no_DB(fact, mapping_dataset = "https://raw.githubusercontent.com/fdiwg/fdi-mappings/main/global/firms/gta/codelist_mapping_rfmos_to_global.csv", dataset_to_map = georef_dataset, mapping_keep_src_code = FALSE, summary_mapping = TRUE, source_authority_to_map = c("IATTC", "CCSBT", "WCPFC", "ICCAT", "IOTC"))
##
## mapping dimension gear_type with code list mapping
##
## mapping dimension fishing_fleet with code list mapping
##
## mapping dimension fishing_mode with code list mapping
##
## mapping dimension measurement_unit with code list mapping
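The internals of `map_codelists_no_DB` are not shown in this document. As a minimal sketch of what a code list mapping does, the base R example below looks up each source code in a mapping table and replaces it with its target code. The two mapping rows are copied from the recap printed in this document; the toy dataset, the `MYSTERY.UNIT` code, and the use of `"UNK"` for unmapped codes are assumptions made for the example.

```r
# Mapping table: two rows copied from the recap shown in this document
mapping <- data.frame(
  src_code = c("FISH.HOUR", "HOURS.SEA"),
  trg_code = c("FHOURS", "HOURS"),
  stringsAsFactors = FALSE
)

# Toy dataset; "MYSTERY.UNIT" is a made-up code with no mapping
toy <- data.frame(
  measurement_unit = c("FISH.HOUR", "HOURS.SEA", "MYSTERY.UNIT"),
  measurement_value = c(12, 30, 4),
  stringsAsFactors = FALSE
)

# Look up each source code; codes without a match come back as NA
idx <- match(toy$measurement_unit, mapping$src_code)
mapped <- mapping$trg_code[idx]

# Assumption: unmapped codes are recorded, then flagged as "UNK"
not_mapped <- unique(toy$measurement_unit[is.na(mapped)])
toy$measurement_unit <- ifelse(is.na(mapped), "UNK", mapped)
print(toy)
print(not_mapped)
```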
Handle unmapped values and save the results
georef_dataset <- mapping_codelist$dataset_mapped %>% dplyr::mutate(fishing_fleet = ifelse(fishing_fleet == 'UNK', 'NEI', fishing_fleet), gear_type = ifelse(gear_type == 'UNK', '99.9', gear_type))
data.table::fwrite(mapping_codelist$recap_mapping, here::here('R/tunaatlas_scripts/pre-harmonization', 'iccat', 'effort', 'data', 't2ce_ETRO-PS1991-2024_bySchool_v1_recap_mapping.csv'))
data.table::fwrite(mapping_codelist$not_mapped_total, here::here('R/tunaatlas_scripts/pre-harmonization', 'iccat', 'effort', 'data', 't2ce_ETRO-PS1991-2024_bySchool_v1_not_mapped_total.csv'))
data.table::fwrite(georef_dataset, here::here('R/tunaatlas_scripts/pre-harmonization', 'iccat', 'effort', 'data', 't2ce_ETRO-PS1991-2024_bySchool_v1_CWP_dataset.csv'))
Display the first few rows of the mapping summaries
print(head(mapping_codelist$recap_mapping))
## # A tibble: 6 × 5
## src_code trg_code src_codingsystem trg_codingsystem source_authority
## <chr> <chr> <chr> <chr> <chr>
## 1 FISH.HOUR FHOURS effortunit_iccat effortunit_rfmos ICCAT
## 2 HOURS.SEA HOURS effortunit_iccat effortunit_rfmos ICCAT
## 3 Hours.FAD Hours.FAD effortunit_iccat effortunit_rfmos ICCAT
## 4 Hours.FSC Hours.FSC effortunit_iccat effortunit_rfmos ICCAT
## 5 Hours.STD Hours.STD effortunit_iccat effortunit_rfmos ICCAT
## 6 FAD LS schooltype_iccat schooltype_rfmos ICCAT