Published January 3, 2025 | Version v1
Dataset | Open

Data from: Global conservation prioritisation approach provides credible results at a regional scale

  • 1. University of Maryland

Contributors

  • 1. University of Maryland

Description

Overview

This repository contains code and data for Roswell and Espíndola, "Global conservation prioritization approach provides credible results at a regional scale" (doi:10.1111/ddi.13969). The manuscript predicts which unassessed regional taxa are likely to face conservation threats, using occurrence data, covariates, and random forest classifiers. The development version is on GitHub: https://github.com/mikeroswell/threatRF.git

General organization

  • `code/` contains R scripts that download occurrence data and GIS layers, process them, and fit the random forests.
  • `data/` contains all the downloaded and generated datasets for this project (these are often large).
  • `data/fromR/` mainly contains tables generated by the scripts in `code/`.
  • `data/GIS_downloads/` contains raster layers downloaded from various sources.

Workflow within `code/`: 

Utilities

data cleanup

1. tidy_flora.R uses regular-expression matching to convert a .pdf into a flat file.
2. robust_gbif_namesearch.R wraps an `rgbif` function to find good matches for taxon names, avoiding synonyms whenever a valid match exists.
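The matching rule in robust_gbif_namesearch.R can be illustrated with a small selection function. This is a hypothetical sketch in Python (the repository itself is written in R), not the script's actual logic. It assumes that each candidate match is a record with `status` and `matchType` fields, loosely modeled on GBIF backbone name-match results. It ranks accepted names ahead of synonyms, and exact matches ahead of fuzzy ones.

```python
from typing import Optional


def pick_match(candidates: list[dict]) -> Optional[dict]:
    """Choose one name match, preferring accepted names over synonyms.

    Hypothetical helper: the 'status' and 'matchType' fields are assumed,
    loosely modeled on GBIF backbone match records.
    """
    # Drop candidates that did not match at all.
    usable = [c for c in candidates if c.get("matchType") != "NONE"]
    if not usable:
        return None

    def rank(c: dict) -> tuple[int, int]:
        # Lower is better: accepted before synonym/doubtful, exact before fuzzy.
        not_accepted = 0 if c.get("status") == "ACCEPTED" else 1
        not_exact = 0 if c.get("matchType") == "EXACT" else 1
        return (not_accepted, not_exact)

    # min() keeps the first of several equally ranked candidates.
    return min(usable, key=rank)


candidates = [
    {"scientificName": "Genus species", "status": "SYNONYM", "matchType": "EXACT"},
    {"scientificName": "Genus species", "status": "ACCEPTED", "matchType": "EXACT"},
]
print(pick_match(candidates)["status"])  # ACCEPTED
```

A synonym is still returned when it is the only usable match. This follows the stated goal of avoiding synonyms only when a valid alternative exists.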


model fitting, etc. 
1. fix_mod.R handles novel factor levels (levels in new data that were absent from the training data) when calling various `predict` functions.
2. RF_tuner.R specifies how to tune and fit the random forests.
3. RF_setup.R creates folds for model fitting and cleans up model formulae.
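One common way to handle novel factor levels, so that prediction does not fail on categories the model never saw, is to map each unseen level to a level that was present in training. The Python sketch below is hypothetical: fix_mod.R may use a different rule, such as setting unseen levels to NA. Here, each unseen level is replaced with the most frequent training level of its column. The column name `soil` is purely illustrative.

```python
from collections import Counter


def fit_levels(train_rows, factor_cols):
    """For each factor column, record the levels seen in training and the
    most frequent one, which serves as the fallback for unseen levels."""
    info = {}
    for col in factor_cols:
        counts = Counter(row[col] for row in train_rows)
        info[col] = (set(counts), counts.most_common(1)[0][0])
    return info


def fix_new_levels(new_rows, level_info):
    """Return copies of new_rows in which any factor level absent from the
    training data is replaced by that column's most frequent training level."""
    fixed = []
    for row in new_rows:
        row = dict(row)  # copy so the caller's data is not modified
        for col, (seen, fallback) in level_info.items():
            if row[col] not in seen:
                row[col] = fallback
        fixed.append(row)
    return fixed


train = [{"soil": "clay"}, {"soil": "loam"}, {"soil": "clay"}]
new = [{"soil": "peat"}, {"soil": "loam"}]
print(fix_new_levels(new, fit_levels(train, ["soil"])))
# [{'soil': 'clay'}, {'soil': 'loam'}]
```

Mapping to a frequent level keeps every row predictable, at the cost of treating an unknown category as a common one. Setting such values to NA instead is more conservative but requires a model or `predict` method that tolerates missing values.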

Data download and analysis scripts (these may call one or more of the utilities above)

  1. download_gis.R documents the sources of many of the GIS layers used downstream. It was created a long time ago and is unstable; do not run it.
  2. download_occurrences_and_statuses.R documents the GBIF and NatureServe queries. It is largely stable but has not been rerun; the resulting dataset is liable to change if it is rerun.
  3. crunch_GIS.R performs all GIS processing in R and should be relatively stable.
  4. fit_RF.R fits the random forests.
  5. graphing_model_outputs.R generates figures and tabular results.
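Model fitting uses the folds that RF_setup.R creates. As an illustration only, the Python sketch below shows stratified k-fold assignment, which spreads each class (for example, threatened versus not threatened) evenly across folds. Whether RF_setup.R stratifies at all, and on which variable, is an assumption here. The labels in the example are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=1):
    """Assign each observation a fold id in 0..k-1 so that every class is
    spread as evenly as possible across the k folds."""
    rng = random.Random(seed)  # local generator for reproducible folds
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds = [0] * len(labels)
    offset = 0
    for y in sorted(by_class):  # fixed class order for reproducibility
        idx = by_class[y]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            # Deal shuffled members round-robin into folds; carrying the
            # offset across classes also keeps overall fold sizes balanced.
            folds[i] = (offset + j) % k
        offset += len(idx)
    return folds


labels = ["threatened"] * 4 + ["secure"] * 6  # hypothetical class labels
folds = stratified_folds(labels, k=5)
print([folds.count(f) for f in range(5)])  # [2, 2, 2, 2, 2]
```

Stratification matters when one class is rare. Without it, some folds could contain few or no threatened taxa, which makes per-fold performance estimates unstable.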

Data

The input data for the analyses are saved as an .RDA file, `data/fromR/lfs/to_predict.RDA`.

This dataset was generated by cleaning occurrence data (GBIF.org (08 May 2024) GBIF Occurrence Download, https://doi.org/10.15468/dl.9jrwwd) and harmonizing it with conservation status data from NatureServe and with geographic covariates from a variety of sources. Further details are in the scripts described above.

Files

Roswell_Espíndola_Zenodo_code.zip

Files (18.7 GB)

  • 34.6 kB, md5:dae736de76d5aeeda2240c8a5807d53f
  • 18.7 GB, md5:6c6d29a7fe839dbf741461001afdc037
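Because the larger file is 18.7 GB, it is worth checking a download against its listed MD5 checksum before unpacking it. Below is a minimal Python sketch using the standard-library `hashlib` module. It reads the file in chunks so the file never has to fit in memory. The function names are hypothetical.

```python
import hashlib


def md5_of_stream(fh, chunk_size=1 << 20):
    """MD5 hex digest of an open binary file object, read in 1 MiB chunks."""
    h = hashlib.md5()
    for chunk in iter(lambda: fh.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def verify_download(path, listed_checksum):
    """True if the file at `path` matches a checksum written as 'md5:<hex>'."""
    expected = listed_checksum.strip().lower().removeprefix("md5:")
    with open(path, "rb") as fh:
        return md5_of_stream(fh) == expected
```

For example, `verify_download(path, "md5:6c6d29a7fe839dbf741461001afdc037")` should return `True` if `path` is an intact copy of the 18.7 GB file. The sketch requires Python 3.9 or later for `str.removeprefix`.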

Additional details

Related works

Is described by
Journal article: 10.1111/ddi.13969 (DOI)

Funding

University of Maryland, College Park

Dates

Available
2025-01-03

Software

Programming language
R
Development Status
Inactive