Published May 31, 2018 | Version 1
Journal article Open

Comparison of large-scale citizen science data and long-term study data for phenology modeling

  • 1. University of Florida
  • 2. Chinese Academy of Sciences


Codebase and data files for this study.

GitHub repo:

Manuscript preprint:


Large-scale observational data from citizen science efforts are becoming increasingly common in ecology, and researchers often choose between these and data from intensive local-scale studies for their analyses. This choice has potential trade-offs related to spatial scale, observer variance, and inter-annual variability. Here we explored this issue with phenology by comparing models built using data from the large-scale, citizen science National Phenology Network (NPN) effort with models built using data from more intensive studies at Long Term Ecological Research (LTER) sites. We built process-based phenology models for species common to each dataset. From these models we compared parameter estimates, estimates of phenological events, and out-of-sample errors between models derived from NPN and LTER data. We found that model parameter estimates for the same species were most similar between the two datasets when using simple models, but parameter estimates varied widely as model complexity increased. Despite this, estimates for the date of phenological events and out-of-sample errors were similar, regardless of the model chosen. Predictions for NPN data had the lowest error when using models built from the NPN data, while LTER predictions were best made using LTER-derived models, confirming that models perform best when applied at the same scale at which they were built. Accordingly, the choice of dataset depends on the research question. Inferences about species-specific phenological requirements are best made with LTER data, and if NPN or similar data are all that is available, then analyses should be limited to simple models. Large-scale predictive modeling is best done with the larger-scale NPN data, which have high spatial representation and a large regional species pool. LTER datasets, on the other hand, have high site fidelity and thus characterize inter-annual variability extremely well. Future research aimed at forecasting phenology events for particular species over larger scales should develop models that integrate the strengths of both datasets.
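
For readers unfamiliar with process-based phenology models, the following is a minimal sketch of one simple form, a thermal time (growing degree day) model, together with an out-of-sample error calculation. This record does not state which model forms or parameter values were used, so the model choice, the function names `thermal_time_doy` and `rmse`, the parameter values, and the synthetic temperatures and observations are all assumptions made for illustration and are not taken from the study's codebase.

```python
import math


def thermal_time_doy(daily_temps, t1=1, t_base=0.0, f_star=100.0):
    """Predict the day of year (1-indexed) of a phenological event.

    Assumed model: forcing accumulates from day `t1` as the daily mean
    temperature above `t_base`; the event is predicted on the first day
    the accumulated forcing reaches `f_star`. Returns None if the
    threshold is never reached.
    """
    forcing = 0.0
    for doy, temp in enumerate(daily_temps, start=1):
        if doy < t1:
            continue  # no accumulation before the start day
        forcing += max(0.0, temp - t_base)
        if forcing >= f_star:
            return doy
    return None


def rmse(observed, predicted):
    """Root mean square error between observed and predicted days of year."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Synthetic held-out data (hypothetical): two years of daily temperatures
    # and made-up observed event dates for those years.
    base_year = [-5.0 + 0.2 * day for day in range(365)]
    held_out_temps = [base_year, [t + 1.0 for t in base_year]]
    held_out_observed = [57, 54]

    # Two hypothetical parameter sets, standing in for fits to two datasets.
    candidate_fits = {
        "fit A": {"t1": 1, "t_base": 0.0, "f_star": 100.0},
        "fit B": {"t1": 1, "t_base": 2.0, "f_star": 80.0},
    }

    # Score each parameter set on the same held-out observations.
    for name, params in candidate_fits.items():
        predicted = [thermal_time_doy(temps, **params) for temps in held_out_temps]
        print(name, predicted, round(rmse(held_out_observed, predicted), 2))
```

Scoring each parameter set against the same held-out observations mirrors the kind of out-of-sample comparison described above: two fits can differ in their parameters yet predict similar event dates.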



Files (538.6 MB)


Additional details

Related works

Is cited by
10.1101/335802 (DOI)