Published January 27, 2026 | Version v1
Software Open

Data from: Model-based data integration improves species distribution models for data deficient and narrow-ranged hummingbird species

  • 1. Finnish Environment Institute
  • 2. Yale University

Description

For species with narrow ranges or low population sizes, a deficiency of species occurrence records can limit the capacity to build accurate species distribution models (SDMs). Model-based integration of data from multiple sources has been offered as a solution to improve predictions of species' distributions at large scales, especially for data-deficient species, but clear empirical demonstrations of this are lacking. The study location was South and Central America. We applied a state-of-the-art data integration technique to model the distributions of 98104 hummingbird species. We fitted SDMs using either presence-absence (PA) data from eBird or presence-only (PO) data from eBird and the Global Biodiversity Information Facility (GBIF) and compared them to integrated SDMs, which utilize both PA and PO data. We fitted generalized linear mixed-effects models and validated them with spatial block cross-validation and expert range map adjusted validation. We also conducted an experiment using artificially thinned datasets of 47 sufficiently abundant species to assess model performance under different levels of data deficiency. Data integration improved model performance compared to PA models for species whose PA data poorly covered the environmental conditions in the study area. The thinning experiment showed that even a small amount of PO data in data integration improved predictive accuracy in comparison to PA models, an improvement that was not clear in the cross-validation results with the full data. In comparison to PO models, data integration improved models across all species, but especially for data-rich species with large geographical ranges. Overall, data integration enables a more comprehensive capture of available species information and can improve range predictions in comparison to conventional modeling methods.
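The description names spatial block cross-validation but does not state the block size, the number of folds, or how blocks were assigned to folds. The following is only a minimal sketch of the general idea, not the authors' procedure (their analysis was run in R). The function names, the 1-degree block size, and the diagonal fold pattern are all assumptions made for illustration.

```python
import math

def assign_spatial_folds(points, block_size_deg, n_folds):
    """Assign each (lon, lat) point to a fold based on the grid block it falls in.

    Points in the same block always share a fold, so held-out data are
    spatially separated from training data. Block size and the diagonal
    fold pattern are illustrative assumptions.
    """
    folds = []
    for lon, lat in points:
        col = math.floor(lon / block_size_deg)  # block column index
        row = math.floor(lat / block_size_deg)  # block row index
        folds.append((col + row) % n_folds)     # systematic diagonal pattern
    return folds

def split_by_fold(points, folds, held_out):
    """Return (train, test) point lists for one held-out fold."""
    train = [p for p, f in zip(points, folds) if f != held_out]
    test = [p for p, f in zip(points, folds) if f == held_out]
    return train, test

# Hypothetical occurrence coordinates (lon, lat), not taken from the data set.
example_points = [(-75.2, 4.6), (-75.4, 4.9), (-60.1, -3.2)]
example_folds = assign_spatial_folds(example_points, block_size_deg=1.0, n_folds=2)
train_points, test_points = split_by_fold(example_points, example_folds, held_out=1)
```

In this sketch the first two points fall in the same 1-degree block and therefore always end up in the same fold, which is the property that distinguishes spatial block cross-validation from a random split.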

Notes

Funding provided by: E.O. Wilson Biodiversity Foundation
ROR ID: https://ror.org/03a01dc42
Award Number:

Funding provided by: National Aeronautics and Space Administration
ROR ID: https://ror.org/027ka1x80
Award Number: 80NSSC17K0282

Funding provided by: National Aeronautics and Space Administration
ROR ID: https://ror.org/027ka1x80
Award Number: 80NSSC18K0435

Methods

This data set is a collection of publicly available species and environmental data. These data have been used to study the populations and distributions of species and their associations with the environment. All data processing, analysis, and presentation were conducted with R and RStudio, which are openly available.

Files

Files (766.5 kB)

md5:716a4e9a118b05ecc8c57f42bb4caa57
766.5 kB
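The MD5 checksum listed above can be used to confirm that a downloaded copy of the file is intact. The following is a minimal sketch using Python's standard hashlib module. The file name is not given in this record, so the path in the usage comment is a hypothetical placeholder, and the helper names are illustrative.

```python
import hashlib

# Checksum as listed in this record.
EXPECTED_MD5 = "716a4e9a118b05ecc8c57f42bb4caa57"

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()

# Usage (hypothetical path; replace with the actual downloaded file):
# print(matches_expected("downloaded_file"))
```

A mismatch would suggest an incomplete or corrupted download, in which case the file should be fetched again.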

Additional details

Related works

Is source of
10.5061/dryad.q83bk3jrs (DOI)