This R Markdown document transforms data that is not in CWP format into CWP format. It first restructures the data, then maps its codes to CWP standards. The document is generated automatically from the function at https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/refs/heads/master/R/tunaatlas_scripts/pre-harmonization/southern_hemisphere_oceans_effort_5deg_1m_ll_tunaatlasccsbt_level0.R; the documentation keeps the format of the roxygen2 skeleton.
A summary of the mapping process is provided, and the path to the dataset is specified. The first lines of each dataset are available in this same GitHub repository. Datasets are named after the historical name given by the tRFMOs at export time, and these names may change. The information in this Rmd identifies which dataset should be used with this markdown.
Additional operations are performed next to verify other aspects of the data, such as the consistency of the geolocation, the values, and the reported catches in numbers and tons.
If you are interested in further details, the results and codes are available for review.
To run the harmonization correctly, each .Rmd script must be knitted with the dataset specified at the beginning of the script. The code can also be run chunk by chunk, provided the working directory is the one containing the .Rmd.
path_to_raw_dataset <- here::here('R/tunaatlas_scripts/pre-harmonization', 'ccsbt', 'effort', 'data', 'CEData_Longline.xlsx')
Harmonize CCSBT Longline Effort Datasets
This function harmonizes CCSBT Longline effort datasets for integration into the Tuna Atlas database, ensuring data compliance with specified format requirements.
@return None; the function outputs files directly, including harmonized datasets, optional metadata, and code lists for integration within the Tuna Atlas database.
@details This function modifies the effort dataset to ensure compliance with the standardized format, including renaming, reordering, and recalculating specific fields as necessary. Metadata integration is contingent on the intended use within the Tuna Atlas database.
@importFrom readxl read_excel
@importFrom dplyr %>% filter select mutate group_by summarise
@seealso
@export
@keywords data harmonization, fisheries, CCSBT, tuna
@author Paul Taconet, IRD
@author Bastien Grasset, IRD
source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/harmo_time_2.R")
source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/harmo_spatial_5.R")
source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/format_time_db_format.R")
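# Install readxl and dplyr on first use, then attach them
if (!require(readxl)) {
  install.packages("readxl")
  require(readxl)
}
if (!require(dplyr)) {
  install.packages("dplyr")  # the package name must be a quoted string
  require(dplyr)
}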
Input data sample (after importing as data.frame in R):

YEAR  MONTH  COUNTRY_CODE  TARGET_SPECIES  CCSBT_STATISTICAL_AREA  LATITUDE  LONGITUDE  NUMBER_OF_HOOKS  NUMBER_OF_SBT_RETAINED
1965  1      JP            NA              1                       -15       100        2083             4
1965  1      JP            NA              1                       -15       110        9647             0
1965  1      JP            NA              1                       -15       115        91431            525
1965  1      JP            NA              1                       -10       100        23560            56
1965  1      JP            NA              1                       -10       105        31232            35
1965  1      JP            NA              1                       -10       110        4960             10

Effort: final data sample:

Flag  Gear  time_start  time_end    AreaName  School  EffortUnits  Effort
AU    LL    1986-11-01  1986-12-01  6330150   ALL     HOOKS        3520
AU    LL    1986-11-01  1986-12-01  6335150   ALL     HOOKS        5970
AU    LL    1986-12-01  1987-01-01  6335150   ALL     HOOKS        5150
AU    LL    1987-01-01  1987-02-01  6330150   ALL     HOOKS        1840
AU    LL    1987-01-01  1987-02-01  6335150   ALL     HOOKS        14740
AU    LL    1987-02-01  1987-03-01  6335150   ALL     HOOKS        17300
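The AreaName values in the final sample (for example 6330150) look like 7-digit CWP grid codes. The sketch below decodes such a code into its parts, assuming the usual CWP layout: one digit for the grid size, one for the quadrant, two for latitude and three for longitude. `decode_cwp` is a hypothetical helper written for this illustration; it is not part of the sourced sardara functions and only labels the two size codes relevant here.

```r
# Hypothetical helper for illustration only; assumes 7-digit CWP grid codes.
decode_cwp <- function(code) {
  code <- as.character(code)
  stopifnot(all(nchar(code) == 7))

  size_code <- substr(code, 1, 1)   # grid resolution
  quadrant  <- substr(code, 2, 2)   # quadrant of the globe

  # Only the two size codes relevant to this document are labelled
  size_labels     <- c("5" = "1x1 degree", "6" = "5x5 degree")
  quadrant_labels <- c("1" = "NE", "2" = "SE", "3" = "SW", "4" = "NW")

  data.frame(
    code       = code,
    resolution = unname(size_labels[size_code]),
    quadrant   = unname(quadrant_labels[quadrant]),
    latitude   = as.integer(substr(code, 3, 4)),
    longitude  = as.integer(substr(code, 5, 7)),
    stringsAsFactors = FALSE
  )
}

decoded <- decode_cwp(c("6330150", "6335150"))
print(decoded)
```

Decoding a few codes this way is a quick sanity check on the geolocation before the dataset is loaded anywhere else.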
#----------------------------------------------------------------------------------------------------------------------------
Historical name for the dataset at source: CEData_Longline.xlsx
opts <- options()
options(encoding = "UTF-8")
RFMO_CE<-readxl::read_excel(path_to_raw_dataset, sheet = "CEData_Longline", col_names = TRUE, col_types = NULL,na = "")
colnames(RFMO_CE)<-gsub("\r\n", "_", colnames(RFMO_CE))
colnames(RFMO_CE)<-gsub(" ", "_", colnames(RFMO_CE))
RFMO_CE<-as.data.frame(RFMO_CE)
# Remove rows read from the Excel sheet that hold no real data (YEAR is empty)
RFMO_CE<- RFMO_CE[!is.na(RFMO_CE$YEAR),]
RFMO_CE$NUMBER_OF_SBT_RETAINED<-as.numeric(RFMO_CE$NUMBER_OF_SBT_RETAINED)
#FishingFleet
RFMO_CE$FishingFleet<-RFMO_CE$COUNTRY_CODE
#Gear
RFMO_CE$Gear<-"Longline"
#Year and period
RFMO_CE<-harmo_time_2(RFMO_CE, "YEAR", "MONTH")
#Format inputDataset time to have the time format of the DB, which is one column time_start and one time_end
RFMO_CE<-format_time_db_format(RFMO_CE)
# Area
RFMO_CE<-harmo_spatial_5(RFMO_CE,"LATITUDE","LONGITUDE",5,6)
#School
RFMO_CE$School<-"UNK"
#Species
RFMO_CE$Species<-"SBF"
#CatchType
RFMO_CE$CatchType<-"UNK" #not used later as it is no catch
efforts<-RFMO_CE
efforts$EffortUnits<-"NUMBER_OF_HOOKS"
efforts$Effort<-efforts$NUMBER_OF_HOOKS
colToKeep_efforts <- c("FishingFleet","Gear","time_start","time_end","AreaName","School","EffortUnits","Effort")
efforts <-efforts[colToKeep_efforts]
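The time step in the chunk above turns YEAR and MONTH into a `time_start`/`time_end` pair. Judging from the final data sample, `time_end` is the first day of the following month. The sketch below reproduces that convention in base R; `month_bounds` is a hypothetical helper, and it is only an assumption that `harmo_time_2` and `format_time_db_format` behave this way internally.

```r
# Hypothetical helper for illustration; not the sourced sardara implementation.
month_bounds <- function(year, month) {
  year  <- as.integer(year)
  month <- as.integer(month)

  # December rolls over to January of the next year
  next_month <- ifelse(month == 12L, 1L, month + 1L)
  next_year  <- ifelse(month == 12L, year + 1L, year)

  data.frame(
    time_start = as.Date(sprintf("%04d-%02d-01", year, month)),
    time_end   = as.Date(sprintf("%04d-%02d-01", as.integer(next_year), as.integer(next_month)))
  )
}

bounds <- month_bounds(c(1986, 1986), c(11, 12))
print(bounds)
```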
Remove trailing whitespace from columns that should not contain any.
efforts[,c("AreaName","FishingFleet")]<-as.data.frame(apply(efforts[,c("AreaName","FishingFleet")],2,function(x){gsub(" *$","",x)}),stringsAsFactors=FALSE)
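Note that the pattern `" *$"` strips trailing spaces only. A small sketch on made-up values shows the difference from base R's `trimws()`, which strips both ends by default; whether leading spaces ever occur in the source file is not known here.

```r
# Made-up values for illustration
codes <- c("6330150 ", " JP  ", "AU")

trailing_only <- gsub(" *$", "", codes)  # same pattern as in the script
both_sides    <- trimws(codes)           # strips leading and trailing whitespace

print(trailing_only)
print(both_sides)
```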
Remove rows where Effort is 0 or NA, then sum Effort within each stratum.
efforts <- efforts %>%
dplyr::filter( ! Effort %in% 0 ) %>%
dplyr::filter( ! is.na(Effort))
efforts <- efforts %>%
dplyr::group_by(FishingFleet,Gear,time_start,time_end,AreaName,School,EffortUnits) %>%
dplyr::summarise(Effort = sum(Effort))
efforts<-as.data.frame(efforts)
colnames(efforts)<-c("fishing_fleet","gear_type","time_start","time_end","geographic_identifier","fishing_mode","measurement_unit","measurement_value")
efforts$source_authority<-"CCSBT"
efforts$measurement <- "effort"
efforts$measurement_processing_level <- "unknown"
efforts$time_start <- as.Date(efforts$time_start)
efforts$time_end <- as.Date(efforts$time_end)
dataset_temporal_extent <- paste(
paste0(format(min(efforts$time_start), "%Y"), "-01-01"),
paste0(format(max(efforts$time_end), "%Y"), "-12-31"),
sep = "/"
)
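The filtering, aggregation and temporal-extent steps can be checked on a toy table. The sketch below uses base R (`aggregate`) rather than dplyr so that it runs without extra packages; the rows are made up for illustration and the column set is reduced.

```r
# Made-up rows for illustration, with a reduced set of columns
toy <- data.frame(
  fishing_fleet         = c("AU", "AU", "AU", "JP", "JP"),
  time_start            = as.Date(c("1986-11-01", "1986-11-01", "1986-12-01", "1987-01-01", "1987-01-01")),
  time_end              = as.Date(c("1986-12-01", "1986-12-01", "1987-01-01", "1987-02-01", "1987-02-01")),
  geographic_identifier = c("6330150", "6330150", "6335150", "6330150", "6335150"),
  measurement_value     = c(3520, 100, 5150, 0, NA),
  stringsAsFactors = FALSE
)

# Drop zero and missing efforts
kept <- toy[!is.na(toy$measurement_value) & toy$measurement_value != 0, ]

# Sum effort within each stratum (duplicate strata collapse to one row)
agg <- aggregate(
  measurement_value ~ fishing_fleet + time_start + time_end + geographic_identifier,
  data = kept, FUN = sum
)

# Temporal extent: 1 January of the first year to 31 December of the last year
extent <- paste(
  paste0(format(min(agg$time_start), "%Y"), "-01-01"),
  paste0(format(max(agg$time_end), "%Y"), "-12-31"),
  sep = "/"
)
print(agg)
print(extent)
```

Because `time_end` is the first day of the following month, a record for December pushes the end of the extent into the next calendar year, as the toy December 1986 row does here.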
Write the output to the same folder as path_to_raw_dataset.
output_name_dataset <- here::here('R/tunaatlas_scripts/pre-harmonization', 'ccsbt', 'effort', 'data', 'CEData_Longline_harmonized.csv')
write.csv(efforts, output_name_dataset, row.names = FALSE)
georef_dataset <- efforts
Load pre-harmonization scripts and apply mappings
download.file('https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/tunaatlas_scripts/pre-harmonization/map_codelists_no_DB.R', destfile = 'local_map_codelists_no_DB.R')
source('local_map_codelists_no_DB.R')
fact <- "effort"
mapping_codelist <- map_codelists_no_DB(fact, mapping_dataset = "https://raw.githubusercontent.com/fdiwg/fdi-mappings/main/global/firms/gta/codelist_mapping_rfmos_to_global.csv", dataset_to_map = georef_dataset, mapping_keep_src_code = FALSE, summary_mapping = TRUE, source_authority_to_map = c("IATTC", "CCSBT", "WCPFC", "ICCAT", "IOTC"))
##
## mapping dimension gear_type with code list mapping
##
## mapping dimension fishing_fleet with code list mapping
##
## mapping dimension fishing_mode with code list mapping
##
## mapping dimension measurement_unit with code list mapping
Handle unmapped values and save the results
georef_dataset <- mapping_codelist$dataset_mapped %>% dplyr::mutate(fishing_fleet = ifelse(fishing_fleet == 'UNK', 'NEI', fishing_fleet), gear_type = ifelse(gear_type == 'UNK', '99.9', gear_type))
data.table::fwrite(mapping_codelist$recap_mapping, here::here('R/tunaatlas_scripts/pre-harmonization', 'ccsbt', 'effort', 'data', 'CEData_Longline_recap_mapping.csv'))
data.table::fwrite(mapping_codelist$not_mapped_total, here::here('R/tunaatlas_scripts/pre-harmonization', 'ccsbt', 'effort', 'data', 'CEData_Longline_not_mapped_total.csv'))
data.table::fwrite(georef_dataset, here::here('R/tunaatlas_scripts/pre-harmonization', 'ccsbt', 'effort', 'data', 'CEData_Longline_CWP_dataset.csv'))
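The mapping and the handling of unmapped codes can be pictured as a lookup followed by a fallback. The sketch below is not the implementation of `map_codelists_no_DB`; it is a minimal base R illustration that assumes unmapped source codes come back as `UNK`, which the recoding step above then turns into `NEI` for fishing fleets. `recode_with_mapping` and the code `XX` are hypothetical.

```r
# Hypothetical helper for illustration; not the real map_codelists_no_DB.
recode_with_mapping <- function(values, mapping, unmapped = "UNK") {
  idx <- match(values, mapping$src_code)  # position of each value in the code list
  out <- mapping$trg_code[idx]
  out[is.na(idx)] <- unmapped             # assumed fallback for codes with no match
  out
}

# One mapping pair taken from the mapping summary (JP -> JPN)
fleet_map <- data.frame(src_code = "JP", trg_code = "JPN", stringsAsFactors = FALSE)

fleets <- c("JP", "XX")                   # "XX" is a made-up code with no mapping
mapped <- recode_with_mapping(fleets, fleet_map)

# Same fallback as applied to fishing_fleet in the script
final <- ifelse(mapped == "UNK", "NEI", mapped)
print(final)
```

Comparing the counts of fallback values against the not_mapped_total file is one way to confirm that no code was silently lost.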
Display the first few rows of the mapping summaries
print(head(mapping_codelist$recap_mapping))
## # A tibble: 4 × 5
## src_code trg_code src_codingsystem trg_codingsystem source_authority
## <chr> <chr> <chr> <chr> <chr>
## 1 NUMBER_OF_HOOKS HOOKS effortunit_ccsbt effortunit_rfmos CCSBT
## 2 UNK UNK schooltype_ccsbt schooltype_rfmos CCSBT
## 3 JP JPN flag_ccsbt fishingfleet_firms CCSBT
## 4 Longline 09.39 gear_ccsbt isscfg_revision_1 CCSBT