Evidence-based guidelines for developing automated conservation assessment methods (script outputs)

Walker, Barnaby E.; Leão, Tarciso C.C.; Bachman, Steven P.; Lucas, Eve; Nic Lughadha, Eimear

doi:10.5281/zenodo.4899925

Published June 4, 2021 | Version 1.0.0

Dataset Open

1. Royal Botanic Gardens, Kew, London, UK

Script outputs for the paper "Evidence-based guidelines for developing automated conservation assessment methods".

The code used to generate these outputs can be found on GitHub.

To use these outputs, download the code, download this dataset, and extract the dataset into the project folder. Some outputs, including all of the trained models, are in the RData format. To view these files in R, you may need to install the packages listed in the README of the GitHub project.
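As a minimal sketch of the extraction step, the helper below unpacks the archive into a project folder using only the Python standard library. The function name `extract_outputs` and the paths in the usage note are hypothetical, and the sketch assumes the archive unpacks to a top-level `output` folder, as the file structure below suggests.

```python
import zipfile
from pathlib import Path


def extract_outputs(archive: Path, project_dir: Path) -> list[str]:
    """Extract the dataset archive into the project folder.

    Returns the sorted top-level names found in the archive, so the caller
    can check that the expected folder (assumed here to be "output") exists.
    """
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(project_dir)
        names = zf.namelist()
    # Keep only the first path component of each archive member.
    return sorted({name.split("/", 1)[0] for name in names})
```

A call such as `extract_outputs(Path("output.zip"), Path("path/to/project"))` would then place the outputs alongside the downloaded code. The archive is 34.4 GB, so the extraction may take a while and needs at least that much free disk space.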

The outputs are arranged in this file structure:

output
- cleaned_occurrences: CSV files containing the GBIF ID of all occurrence records retained after each cleaning step, and the IPNI ID of the species they relate to. Generated by the script 05_clean_occurrences.R.
- explanations: SHapley Additive exPlanations (SHAP) for an example set of predictions. Generated by the script 08_calculate_explanations.R.
- model_results: CSV files with the evaluation results for each model, on each study group, after each cleaning step. There are results for the method performance, learning curves, and permutation importance (random forest models only), as well as predictions for test sets and unassessed species. Generated by the script 07_evaluate_methods.R.
- models: RData files containing the trained models, generated by the script 07_evaluate_methods.R.
- name_matching: CSV files with the results of matching IUCN Red List assessment and GBIF names to WCVP taxonomy, as well as JSON files used to manually resolve ambiguous and missing matches. Generated by the scripts 02_collate_species.R and 03_process_occurrences.R.
- predictors: CSV files with species-level predictors calculated from the cleaned occurrence files, ready for input into automated assessment methods. Generated by the script 06_prepare_predictors.R.
- rasters: Processed raster files used to calculate species-level predictors. Generated by the script 01_process_rasters.R.
- results: CSV files of summarised results, generated by the script 09_summarise_results.R.
- {group}_distributions.csv: CSV files with the distribution for species in each study group, downloaded from POWO by the script 02_collate_species.R.
- {group}-{source}_species-list.csv: The list of species for each study group, along with their IUCN Red List category if they have been assessed. Generated by the script 02_collate_species.R. The 'source' indicates whether the assessments came from the IUCN Red List or the Sampled Red List Index.
- {group}-GBIF_occurrences.csv: The occurrence records for each species group, downloaded from GBIF. Generated by the script 03_process_occurrences.R.
- {group}-GBIF_labelled-occurrences.csv: The occurrence records for each species group labelled with values extracted at their coordinates from the rasters in the rasters folder. Generated by the script 04_annotate_points.R.
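The naming patterns for the top-level CSV files can be matched programmatically. The sketch below is one possible way to do this with regular expressions. The function name `classify` and the example values `orchids` and `srli` are hypothetical placeholders, since the actual group and source labels are not listed here, and the sketch assumes the source label contains no hyphen or underscore.

```python
import re
from typing import Optional

# One pattern per top-level file type in the list above.
# Assumption: {source} contains no "-" or "_" characters.
PATTERNS = {
    "distributions": re.compile(r"^(?P<group>.+)_distributions\.csv$"),
    "species_list": re.compile(
        r"^(?P<group>.+)-(?P<source>[^-_]+)_species-list\.csv$"
    ),
    "labelled_occurrences": re.compile(
        r"^(?P<group>.+)-GBIF_labelled-occurrences\.csv$"
    ),
    "occurrences": re.compile(r"^(?P<group>.+)-GBIF_occurrences\.csv$"),
}


def classify(filename: str) -> Optional[dict]:
    """Return the file kind and captured fields, or None if nothing matches."""
    for kind, pattern in PATTERNS.items():
        match = pattern.match(filename)
        if match:
            return {"kind": kind, **match.groupdict()}
    return None


# Hypothetical file names, for illustration only.
print(classify("orchids-srli_species-list.csv"))
print(classify("orchids-GBIF_labelled-occurrences.csv"))
```

Grouping the results of `classify` by the `group` field gives a quick way to check that every study group has all four file types after extraction.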

Files

output.zip (34.4 GB)
md5:f4e290e7ee9042c8f155910a2d100519
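Given the size of the archive, it may be worth checking the download against the MD5 checksum listed above before extracting it. The following is a minimal sketch using the Python standard library. The function name `md5_of` and the local path in the usage note are assumptions. The file is read in chunks so that it never has to fit in memory.

```python
import hashlib
from pathlib import Path

# Checksum listed for output.zip in this record.
EXPECTED_MD5 = "f4e290e7ee9042c8f155910a2d100519"


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A check such as `md5_of(Path("output.zip")) == EXPECTED_MD5` should evaluate to `True` for an intact download. A mismatch usually indicates an incomplete or corrupted transfer.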
