This R Markdown document is designed to transform data that is not in CWP format into CWP format. Initially, it changes the format of the data; subsequently, it maps the data to adhere to CWP standards. This markdown is automatically created from the function: https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/refs/heads/master/R/tunaatlas_scripts/pre-harmonization/atlantic_ocean_effort_tunaatlasiccat_level0__noschool.R, the documentation keeps the format of roxygen2 skeleton.
A summary of the mapping process is provided. The path to the dataset is specified. You will find on this same repository on GitHub the first line of each dataset. The datasets are named after the historical name provided by tRFMOs while exporting and may change. The information provided in the Rmd allows understanding correctly which dataset should be used in this markdown.
Additional operations are performed next to verify other aspects of the data, such as the consistency of the geolocation, the values, and the reported catches in numbers and tons.
If you are interested in further details, the results and codes are available for review.
Each .Rmd script requires the user to knit the
dataset at the beginning of the script in order to execute the
harmonization process correctly. It is also possible to run the code
chunk by chunk but be sure to be in the correct working directory (i.e.,
the one of the .Rmd).
path_to_raw_dataset <- here::here('R/tunaatlas_scripts/pre-harmonization', 'iccat', 'effort', 'data', 't2ce_noSchool.csv')
Harmonize Data Structure of ICCAT Effort Dataset
This function harmonizes the data structure of ICCAT effort datasets from voluminous Microsoft Access databases. It handles datasets by potentially including metadata and code lists for integration within the Tuna Atlas database. Depending on the parameter settings, either ‘fleet’ or ‘flag’ column will be kept in the output.
@param keep_fleet_instead_of_flag Boolean parameter to decide whether to keep ‘fleet’ instead of ‘flag’. @return Writes a harmonized dataset, metadata file, and code lists to CSV files. @export
@author Paul Taconet, IRD @author Bastien Grasset, IRD
@keywords International Commission for the Conservation of Atlantic Tunas (ICCAT), RFMO, Sardara Global Database
@seealso , for other ICCAT data structure conversions.
@examples
keep_fleet_instead_of_flag=FALSE
packages
if(!require(readr)){
install.packages("readr")
require(readr)
}
if(!require(dplyr)){
install.packages("dplyr")
require(dplyr)
}
Input data sample: No sample. Miscrosoft Acces DB. However after the commands that read the input DB the sample is the following: StrataID DSetID FleetID GearGrpCode GearCode FileTypeCode YearC TimePeriodID SquareTypeCode QuadID Lat Lon Eff1 Eff1Type Eff2 Eff2Type DSetTypeID CatchUnit ALB BET BFT BUM 1 1 021ES00 TP TRAP OF-REP 1950 17 1x1 4 36 5 4 NO.TRAPS NA nw kg 0 0 6725000 0 2 1 021ES00 TP TRAP OF-REP 1950 17 1x1 4 36 5 4 NO.TRAPS NA nw nr 0 0 52928 0 3 2 026YU00 PS PS OF-REP 1950 17 5x5 1 40 15 14 NO.BOATS NA -w kg 0 0 657000 0 4 3 021ES00 TP TRAP OF-REP 1951 17 1x1 4 36 5 4 NO.TRAPS NA nw kg 0 0 3072000 0 5 3 021ES00 TP TRAP OF-REP 1951 17 1x1 4 36 5 4 NO.TRAPS NA nw nr 0 0 28654 0 6 4 026YU00 PS PS OF-REP 1951 17 5x5 1 40 15 14 NO.BOATS NA -w kg 0 0 531000 0 SAI SKJ SWO WHM YFT BLF BLT BON BOP BRS CER FRI KGM KGX LTA MAW SLT SSM WAH oSmt BIL BLM MLS SBF SPF oTun BSH POR SMA MAK oSks FleetCode FleetName FlagID FlagCode 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 EU.ESP EU.España 21 EU.ESP 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 EU.ESP EU.España 21 EU.ESP 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 YUG Yugoslavia Fed. 26 YUG 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 EU.ESP EU.España 21 EU.ESP 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 EU.ESP EU.España 21 EU.ESP 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 YUG Yugoslavia Fed. 26 YUG FlagName StatusCode EU.España CP EU.España CP Yugoslavia Fed. NCO EU.España CP EU.España CP Yugoslavia Fed. NCO Effort: final data sample: Flag Gear time_start time_end AreaName School EffortUnits Effort ARG LL 1967-02-01 1967-03-01 7330040 OTH NO.HOOKS 13000 ARG LL 1967-03-01 1967-04-01 7330040 OTH NO.HOOKS 67000 ARG LL 1967-04-01 1967-05-01 7330040 OTH NO.HOOKS 107000 ARG LL 1967-05-01 1967-06-01 7330040 OTH NO.HOOKS 88000 ARG LL 1967-06-01 1967-07-01 7330040 OTH NO.HOOKS 66000 ARG LL 1967-07-01 1967-08-01 7330040 OTH NO.HOOKS 50000 # download database to get .mdb ./R/tunaatlas_scripts/pre-harmonization/data_iccat_from_mdb.R Historical name for the dataset at source t2ce_noSchool.csv
keep_fleet_instead_of_flag=FALSE
t2ce <- as.data.frame(readr::read_csv(path_to_raw_dataset))
ICCAT_CE_species_colnames<-setdiff(colnames(t2ce),c("StrataID","DSetID",
"FleetID","GearGrpCode","GearCode","FileTypeCode","YearC","TimePeriodID","SquareTypeCode",
"QuadID","Lat","Lon","Eff1","Eff1Type","Eff2","Eff2Type","DSetTypeID","CatchUnit", "FleetCode", "FleetName", "FlagID", "FlagCode",
"SchoolTypeCode", "FlagName", "StatusCode"))
remove duplicated lines, as displayed in kilos and numbers
eff_cols <- c("Eff1","Eff1Type","Eff2","Eff2Type")
species_cols <- ICCAT_CE_species_colnames
key_cols <- setdiff(
colnames(t2ce),
c("StrataID", "CatchUnit", eff_cols, species_cols)
)
unit_priority <- function(x) {
dplyr::case_when(
x == "kg" ~ 1L,
x == "nr" ~ 2L,
TRUE ~ 3L
)
}
t2ce <- t2ce %>%
dplyr::mutate(.unit_rank = unit_priority(CatchUnit)) %>%
dplyr::group_by(across(all_of(key_cols))) %>%
dplyr::arrange(.unit_rank, .by_group = TRUE) %>%
dplyr::slice(1) %>%
dplyr::ungroup() %>%
dplyr::select(-.unit_rank)
remove data displayed annualy data_pivot_ICCAT<-left_join(t2ce,Flags,by=“FleetID”) # equivalent to “select FlagCode,FlagID,t2ce.* from t2ce, Flags where t2ce.FleetID=Flags.FleetID” # Efforts # Reach the efforts pivot DSD using a function in ICCAT_functions.R # If we want in the output dataset the column ‘FleetCode’ instead of ‘flag’ source(“https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/FUN_efforts_ICCAT_CE_without_schooltype.R”) efforts_pivot_ICCAT<-FUN_efforts_ICCAT_CE_without_schooltype(RFMO_CE = t2ce,ICCAT_CE_species_colnames)
efforts_pivot_ICCAT <- t2ce %>%
dplyr::select(
-dplyr::all_of(ICCAT_CE_species_colnames),
-CatchUnit
)
source(“https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/FUN_efforts_ICCAT_CE_keep_all_efforts.R”) efforts_pivot_ICCAT<-FUN_efforts_ICCAT_CE_keep_all_efforts(efforts_pivot_ICCAT,c(“Eff1”,“Eff2”),c(“Eff1Type”,“Eff2Type”)) Keep all columns except the raw effort/unit columns, because these will be rebuilt in a long format
cols_to_keep <- setdiff(
names(efforts_pivot_ICCAT),
c("Eff1", "Eff2", "Eff1Type", "Eff2Type")
)
Add an identifier for each original row, so we can track which Eff1/Eff2 pairs come from the same source row
tmp <- efforts_pivot_ICCAT %>%
dplyr::mutate(original_row_id = dplyr::row_number())
Convert Eff1 and Eff2 to numeric safely (important because some effort values may use commas instead of dots)
tmp <- tmp %>%
dplyr::mutate(
Eff1_num = suppressWarnings(as.numeric(gsub(",", ".", Eff1))),
Eff2_num = suppressWarnings(as.numeric(gsub(",", ".", Eff2)))
)
Build the Eff1 rows: these are always considered the primary effort rows
eff1_rows <- tmp %>%
dplyr::transmute(
dplyr::across(dplyr::all_of(cols_to_keep)),
original_row_id,
effort_source = "Eff1",
Effort = Eff1_num,
EffortUnits = Eff1Type,
is_duplicate_strata = FALSE
) %>%
# Remove empty or zero Eff1 values
dplyr::filter(!is.na(Effort) & Effort != 0)
Build the Eff2 rows: these may either replace missing Eff1, or become duplicated rows
eff2_rows <- tmp %>%
dplyr::transmute(
dplyr::across(dplyr::all_of(cols_to_keep)),
original_row_id,
Eff1_num,
effort_source = "Eff2",
Effort = Eff2_num,
EffortUnits = Eff2Type
) %>%
# Remove empty or zero Eff2 values
dplyr::filter(!is.na(Effort) & Effort != 0) %>%
# If Eff1 existed and was non-zero, then Eff2 is an additional duplicated row
# If Eff1 was missing or zero, then Eff2 replaces Eff1 and is not a duplicate
dplyr::mutate(
is_duplicate_strata = !is.na(Eff1_num) & Eff1_num != 0
) %>%
dplyr::select(-Eff1_num)%>% dplyr::mutate(EffortUnits = as.character(EffortUnits))
Combine both sets of rows into one long dataset
efforts_pivot_ICCAT <- dplyr::bind_rows(eff1_rows, eff2_rows) %>%
dplyr::select(
original_row_id,
effort_source,
is_duplicate_strata,
dplyr::everything()
)
School
efforts_pivot_ICCAT$School<-"OTH"
Flag
efforts_pivot_ICCAT$FleetCode_short <- sub("-.*", "", efforts_pivot_ICCAT$FleetCode) # fleet code only what is after the '-'
names(efforts_pivot_ICCAT)[names(efforts_pivot_ICCAT) == 'FleetCode_short'] <- 'FishingFleet'
efforts_pivot_ICCAT <- efforts_pivot_ICCAT[, c("FishingFleet", setdiff(names(efforts_pivot_ICCAT), "FishingFleet"))] # put flag in first position
Reach the efforts harmonized DSD using a function in ICCAT_functions.R
colToKeep_efforts <- c("FishingFleet","Gear","time_start","time_end","AreaName","School","EffortUnits","Effort")
source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/sardara_functions/ICCAT_CE_effort_pivotDSD_to_harmonizedDSD.R")
efforts_pivot_ICCAT$Lat <- floor(abs(efforts_pivot_ICCAT$Lat)) # we put floor as independently of the quadrant the floor always correspond to the cwp
efforts_pivot_ICCAT$Lon <- floor(abs(efforts_pivot_ICCAT$Lon))
handling duplicated stratas
efforts<-ICCAT_CE_effort_pivotDSD_to_harmonizedDSD(efforts_pivot_ICCAT,colToKeep_efforts)
efforts$CatchType <- "C" #bastien adding as it is not in effort function but it is in chatch function
colnames(efforts)<-c("fishing_fleet","gear_type","time_start","time_end","geographic_identifier","fishing_mode","measurement_unit","measurement_value","measurement_type")
efforts$source_authority<-"ICCAT"
efforts <- efforts %>% dplyr::mutate(fishing_mode = ifelse(fishing_mode == "UNK", "OTH", fishing_mode))
efforts$time_start <- as.Date(efforts$time_start)
efforts$time_end <- as.Date(efforts$time_end)
dataset_temporal_extent <- paste(
paste0(format(min(efforts$time_start), "%Y"), "-01-01"),
paste0(format(max(efforts$time_end), "%Y"), "-12-31"),
sep = "/"
)
output in same folder as path_to_raw_dataset
output_name_dataset <- here::here('R/tunaatlas_scripts/pre-harmonization', 'iccat', 'effort', 'data', 't2ce_noSchool_harmonized.csv')
write.csv(efforts, output_name_dataset, row.names = FALSE)
georef_dataset <- efforts
@ Load pre-harmonization scripts and apply mappings
download.file('https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/R/tunaatlas_scripts/pre-harmonization/map_codelists_no_DB.R', destfile = 'local_map_codelists_no_DB.R')
source('local_map_codelists_no_DB.R')
fact <- "effort"
mapping_codelist <- map_codelists_no_DB(fact, mapping_dataset = "https://raw.githubusercontent.com/fdiwg/fdi-mappings/main/global/firms/gta/codelist_mapping_rfmos_to_global.csv", dataset_to_map = georef_dataset, mapping_keep_src_code = FALSE, summary_mapping = TRUE, source_authority_to_map = c("IATTC", "CCSBT", "WCPFC", "ICCAT", "IOTC"))
##
## mapping dimension gear_type with code list mapping
##
## mapping dimension fishing_fleet with code list mapping
##
## mapping dimension fishing_mode with code list mapping
##
## mapping dimension measurement_unit with code list mapping
@ Handle unmapped values and save the results
georef_dataset <- mapping_codelist$dataset_mapped %>% dplyr::mutate(fishing_fleet = ifelse(fishing_fleet == 'UNK', 'NEI', fishing_fleet), gear_type = ifelse(gear_type == 'UNK', '99.9', gear_type))
data.table::fwrite(mapping_codelist$recap_mapping, here::here('R/tunaatlas_scripts/pre-harmonization', 'iccat', 'effort', 'data', 't2ce_noSchool_recap_mapping.csv'))
data.table::fwrite(mapping_codelist$not_mapped_total, here::here('R/tunaatlas_scripts/pre-harmonization', 'iccat', 'effort', 'data', 't2ce_noSchool_not_mapped_total.csv'))
data.table::fwrite(georef_dataset, here::here('R/tunaatlas_scripts/pre-harmonization', 'iccat', 'effort', 'data', 't2ce_noSchool_CWP_dataset.csv'))
Display the first few rows of the mapping summaries
print(head(mapping_codelist$recap_mapping))
## # A tibble: 6 × 5
## src_code trg_code src_codingsystem trg_codingsystem source_authority
## <chr> <chr> <chr> <chr> <chr>
## 1 D.AT SEA DAYS effortunit_iccat effortunit_rfmos ICCAT
## 2 NO.BOATS BOATS effortunit_iccat effortunit_rfmos ICCAT
## 3 NO.HOOKS HOOKS effortunit_iccat effortunit_rfmos ICCAT
## 4 NO.TRAPS NO.TRAPS effortunit_iccat effortunit_rfmos ICCAT
## 5 TRAP D TRAP D effortunit_iccat effortunit_rfmos ICCAT
## 6 OTH OTH schooltype_iccat schooltype_rfmos ICCAT