This R Markdown document transforms data that is not in CWP format into CWP format. It first restructures the data, then maps its codes to CWP standards. The document is generated from a function, so the documentation keeps the format of the roxygen2 skeleton. A summary of the mapping process is provided. The path to the dataset is specified below; the first line of each dataset is available in this same GitHub repository. The datasets are named after the historical names used by the tRFMOs at export time, and these names may change. The information in this document identifies which dataset should be used with it. Additional checks are then performed on other aspects of the data, such as the consistency of the geolocation, the values, and the reported catches in numbers and tons. The results and code are available for review.
path_to_raw_dataset <- here::here('tunaatlas_scripts/pre-harmonization', 'iattc', 'catch', 'data', 'PublicPSTunaSetType.csv')
Harmonize IATTC PSSetType Catch Datasets by School
This function harmonizes the structure of IATTC PS (Purse Seine) catch datasets by school type, specifically for Billfish, Tuna, and Shark, according to the operation modes ‘PublicPSBillfishSetType’, ‘PublicPSTunaSetType’, and ‘PublicPSSharkSetType’. It prepares the data for integration into the Tuna Atlas database, ensuring that only the essential fields are retained and that metadata is included if the dataset will be loaded into the database. This script works with any data whose first six columns are named and ordered as follows: {Year|Month|Flag|LatC1|LonC1|NumSets}
@return None; this function outputs files directly, including harmonized datasets, optional metadata, and code lists for integration within the Tuna Atlas database.
@details The function requires the path to a raw dataset and, optionally, a metadata file. It processes the data to harmonize it based on the specified school type stratification. The process may include renaming columns, recalculating fields, and reformatting the data for consistency with database requirements.
@importFrom dplyr select mutate
@importFrom readr read_csv write_csv
@seealso to convert IATTC task 2, to convert IATTC nominal catch data structure.
@export
@author Paul Taconet, IRD
@author Bastien Grasset, IRD
@keywords IATTC, tuna, fisheries, data harmonization, catch data
Historical name for the dataset at source: PublicPSTunaSetType.csv or PublicPSBillfishSetType.csv
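The column layout stated in the description can be checked before running the harmonization. The sketch below is only an illustration: `check_leading_columns()` and `toy_raw` are hypothetical names that do not exist in the Tuna Atlas scripts, and the toy values are made up.

```r
# Hypothetical helper (not part of the original scripts): checks that the
# leading columns of a raw data frame match the layout given in the description.
expected_cols <- c("Year", "Month", "Flag", "LatC1", "LonC1", "NumSets")

check_leading_columns <- function(df, expected = expected_cols) {
  n <- length(expected)
  if (ncol(df) < n) {
    return(FALSE)
  }
  # Both the names and their order must match
  identical(names(df)[seq_len(n)], expected)
}

# Toy data frame standing in for a raw file; all values are made up.
toy_raw <- data.frame(
  Year = 2020L, Month = 1L, Flag = "XXX",
  LatC1 = 2.5, LonC1 = -92.5, NumSets = 3L,
  ALB = 0, BET = 1.2
)

ok_layout <- check_leading_columns(toy_raw)

# Same columns in a different order: the check fails
reordered <- toy_raw[, c("Month", "Year", "Flag", "LatC1", "LonC1", "NumSets")]
bad_layout <- check_leading_columns(reordered)
```

A check of this kind could be run on the data frame read from `path_to_raw_dataset` so that a changed export layout is detected early rather than producing misaligned columns.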
opts <- options()
options(encoding = "UTF-8")
source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/FUN_catches_IATTC_CE_Flag_or_SetType.R")
Build the catches pivot DSD with FUN_catches_IATTC_CE_Flag_or_SetType(), sourced above
catches_pivot_IATTC <-FUN_catches_IATTC_CE_Flag_or_SetType(path_to_raw_dataset,"SetType","PS")
catches_pivot_IATTC$NumSets<-NULL
Convert the pivot DSD to the harmonized DSD with IATTC_CE_catches_pivotDSD_to_harmonizedDSD()
colToKeep_captures <- c("FishingFleet","Gear","time_start","time_end","AreaName","School","Species","CatchType","CatchUnits","Catch")
source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/IATTC_CE_catches_pivotDSD_to_harmonizedDSD.R")
catches<-IATTC_CE_catches_pivotDSD_to_harmonizedDSD(catches_pivot_IATTC,colToKeep_captures)
colnames(catches)<-c("fishing_fleet","gear_type","time_start","time_end","geographic_identifier","fishing_mode","species","measurement_type","measurement_unit","measurement_value")
catches$source_authority<-"IATTC"
catches$measurement_type <- "RC" # Retained catches
catches$measurement <- "catch"
catches$time_start <- as.Date(catches$time_start)
catches$time_end <- as.Date(catches$time_end)
dataset_temporal_extent <- paste(
  paste0(format(min(catches$time_start), "%Y"), "-01-01"),
  paste0(format(max(catches$time_end), "%Y"), "-12-31"),
  sep = "/"
)
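To make the temporal extent computation concrete, here is the same expression applied to a toy data frame. The dates are made up for illustration and are not taken from the IATTC data; `toy` and `toy_extent` are hypothetical names.

```r
# Toy dates; not taken from the IATTC data.
toy <- data.frame(
  time_start = as.Date(c("1995-03-01", "2001-07-01")),
  time_end   = as.Date(c("1995-03-31", "2001-07-31"))
)

# Extent runs from 1 January of the earliest year to 31 December of the latest
toy_extent <- paste(
  paste0(format(min(toy$time_start), "%Y"), "-01-01"),
  paste0(format(max(toy$time_end), "%Y"), "-12-31"),
  sep = "/"
)
toy_extent
# "1995-01-01/2001-12-31"
```

The extent is therefore always rounded outward to whole calendar years, whatever the first and last months present in the data.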
output_name_dataset <- "Dataset_harmonized.csv"
write.csv(catches, output_name_dataset, row.names = FALSE)
georef_dataset <- catches
Load pre-harmonization scripts and apply mappings
download.file('https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/tunaatlas_scripts/pre-harmonization/map_codelists_no_DB.R', destfile = 'local_map_codelists_no_DB.R')
source('local_map_codelists_no_DB.R')
fact <- "catch"
mapping_codelist <- map_codelists_no_DB(
  fact,
  mapping_dataset = "https://raw.githubusercontent.com/fdiwg/fdi-mappings/main/global/firms/gta/codelist_mapping_rfmos_to_global.csv",
  dataset_to_map = georef_dataset,
  mapping_keep_src_code = FALSE,
  summary_mapping = TRUE,
  source_authority_to_map = c("IATTC", "CCSBT", "WCPFC")
)
##
## mapping dimension gear_type with code list mapping
## mapping dimension species with code list mapping
## mapping dimension fishing_fleet with code list mapping
## mapping dimension fishing_mode with code list mapping
## mapping dimension measurement_type with code list mapping
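The general idea of a code list mapping can be sketched on toy data. This is a simplified illustration and not the implementation of `map_codelists_no_DB()`: `map_dimension()` is a hypothetical helper, the three code pairs are taken from the mapping summary printed in this document, `"XYZ"` is a made-up code, and the use of `"UNK"` as the placeholder for unmapped codes is an assumption inferred from the recoding step in this document.

```r
# Simplified illustration of a code list mapping; this is NOT the
# implementation of map_codelists_no_DB().
toy_mapping <- data.frame(
  src_code = c("NOA", "OBJ", "DEL"),
  trg_code = c("FS", "LS", "DEL"),
  stringsAsFactors = FALSE
)

toy_data <- data.frame(
  fishing_mode = c("NOA", "OBJ", "DEL", "XYZ"),  # "XYZ" is a made-up code
  measurement_value = c(10, 5, 2, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up each source code in the mapping table and
# fall back to a placeholder (assumed here to be "UNK") when there is no match.
map_dimension <- function(values, mapping, unmapped = "UNK") {
  idx <- match(values, mapping$src_code)
  mapped <- mapping$trg_code[idx]
  mapped[is.na(idx)] <- unmapped
  mapped
}

toy_data$fishing_mode <- map_dimension(toy_data$fishing_mode, toy_mapping)
toy_data$fishing_mode
# "FS" "LS" "DEL" "UNK"
```

In the real workflow the mapping table also carries the source authority and the coding systems, so the lookup is done per dimension and per tRFMO rather than on the code alone.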
Handle unmapped values and save the results
# Recode 'UNK' to 'NEI' for fishing_fleet and to '99.9' for gear_type
georef_dataset <- dplyr::mutate(
  mapping_codelist$dataset_mapped,
  fishing_fleet = ifelse(fishing_fleet == 'UNK', 'NEI', fishing_fleet),
  gear_type = ifelse(gear_type == 'UNK', '99.9', gear_type)
)
data.table::fwrite(mapping_codelist$recap_mapping, 'recap_mapping.csv')
data.table::fwrite(mapping_codelist$not_mapped_total, 'not_mapped_total.csv')
data.table::fwrite(georef_dataset, 'CWP_dataset.csv')
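The recoding of unmapped values can be illustrated on a small example using base R only. The data frame `toy_mapped` is hypothetical, and the codes `"MEX"`, `"ECU"`, and `"01.1"` are placeholder values chosen for the illustration.

```r
# Toy mapped dataset; "MEX", "ECU" and "01.1" are placeholder codes.
toy_mapped <- data.frame(
  fishing_fleet = c("MEX", "UNK", "ECU"),
  gear_type = c("01.1", "UNK", "UNK"),
  stringsAsFactors = FALSE
)

# Same recoding rule as in the document: 'UNK' becomes 'NEI' for the fleet
# and '99.9' for the gear; all other codes are left unchanged.
toy_mapped$fishing_fleet <- ifelse(
  toy_mapped$fishing_fleet == "UNK", "NEI", toy_mapped$fishing_fleet
)
toy_mapped$gear_type <- ifelse(
  toy_mapped$gear_type == "UNK", "99.9", toy_mapped$gear_type
)

toy_mapped$fishing_fleet
# "MEX" "NEI" "ECU"
toy_mapped$gear_type
# "01.1" "99.9" "99.9"
```

Note that `ifelse()` returns `NA` where the tested column is `NA`, so missing codes stay missing rather than being recoded.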
Display the first few rows of the mapping summaries
print(head(mapping_codelist$recap_mapping))
## # A tibble: 6 × 5
##   src_code trg_code src_codingsystem trg_codingsystem   source_authority
##   <chr>    <chr>    <chr>            <chr>              <chr>
## 1 DEL      DEL      schooltype_iattc schooltype_rfmos   IATTC
## 2 NOA      FS       schooltype_iattc schooltype_rfmos   IATTC
## 3 OBJ      LS       schooltype_iattc schooltype_rfmos   IATTC
## 4 ALL      NEI      flag_wcpfc       fishingfleet_firms WCPFC
## 5 ALB      ALB      species_iattc    species_asfis      IATTC
## 6 BET      BET      species_iattc    species_asfis      IATTC
print(head(mapping_codelist$not_mapped_total))
##   Value source_authority        Dimension
## 1   ALL            IATTC    fishing_fleet
## 2    RC            IATTC measurement_type
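When the list of unmapped values is longer than a few rows, grouping it by dimension can make it easier to review. The sketch below rebuilds the two rows printed above by hand as a toy data frame (`toy_not_mapped` is a hypothetical name) and groups the values with base R.

```r
# Toy copy of the two rows printed above, rebuilt by hand for illustration.
toy_not_mapped <- data.frame(
  Value = c("ALL", "RC"),
  source_authority = c("IATTC", "IATTC"),
  Dimension = c("fishing_fleet", "measurement_type"),
  stringsAsFactors = FALSE
)

# Group the unmapped values by the dimension they belong to
unmapped_by_dimension <- split(toy_not_mapped$Value, toy_not_mapped$Dimension)
unmapped_by_dimension$fishing_fleet
# "ALL"
unmapped_by_dimension$measurement_type
# "RC"
```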