Published July 27, 2021 | Version v1
Software Open

Source code for R tutorials and dataset for empirical case study on Malurus elegans (red-winged fairy wren)

  • 1. Nederlands Instituut voor Ecologie
  • 2. Radboud University Nijmegen

Description

Biological processes exhibit complex temporal dependencies due to the sequential nature of allocation decisions in organisms' life-cycles, feedback loops, and two-way causality. Consequently, longitudinal data often contain cross-lags: the predictor variable depends on the response variable of the previous time-step. Although statisticians have warned that regression models that ignore such covariate endogeneity in time series are likely to be inappropriate, this has received relatively little attention in biology. Furthermore, the resulting degree of estimation bias remains largely unexplored.

We use a graphical model and numerical simulations to understand why and how regression models that ignore cross-lags can be biased, and how this bias depends on the length and number of time series. Ecological and evolutionary examples are provided to illustrate that cross-lags may be more common than is typically appreciated and that they occur in functionally different ways.

We show that routinely used regression models that ignore cross-lags are asymptotically unbiased. However, this offers little relief, as conventional methods are biased for most realistically feasible lengths of time series. Furthermore, collecting time series on multiple subjects (such as populations, groups or individuals) does not help to overcome this bias when the analysis focusses on within-subject patterns (often the pattern of interest). Simulations (R tutorials 1 and 2), a literature search and a real-world empirical example on fairy wrens (data archived here, with analyses presented in R tutorial 3) together suggest that approaches that ignore cross-lags are likely biased in the direction opposite to the sign of the cross-lag (e.g. towards detecting density-dependence of vital rates and against detecting life history trade-offs and benefits of group living). Next, we show that multivariate (e.g. structural equation) models can dynamically account for cross-lags, and simultaneously address additional bias induced by measurement error, but only if the analysis considers multiple time series.
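The direction of this bias and its dependence on series length can be illustrated with a minimal sketch. The Python code below is not taken from the archived R tutorials: the data-generating model, the function name within_subject_slope and all parameter values are assumptions chosen for illustration. It simulates a predictor x whose next value depends on the current response y (a positive cross-lag), sets the true effect of x on y to zero, and estimates the pooled within-subject slope by subject-centred least squares.

```python
import random


def within_subject_slope(n_subjects, n_steps, cross_lag, beta=0.0, seed=1):
    """Pooled within-subject (subject-centred) least-squares slope of y on x.

    Assumed data-generating model (illustrative, not the authors' code):
        y[t]   = beta * x[t] + e[t]
        x[t+1] = cross_lag * y[t] + u[t+1]
    where e and u are independent standard normal noise terms.
    """
    rng = random.Random(seed)
    sxy = 0.0  # sum of centred cross-products over all subjects
    sxx = 0.0  # sum of centred squares of x over all subjects
    for _ in range(n_subjects):
        xs, ys = [], []
        x = rng.gauss(0.0, 1.0)  # arbitrary starting value
        for _ in range(n_steps):
            y = beta * x + rng.gauss(0.0, 1.0)
            xs.append(x)
            ys.append(y)
            # Cross-lag: the next predictor value depends on the current response.
            x = cross_lag * y + rng.gauss(0.0, 1.0)
        x_mean = sum(xs) / n_steps
        y_mean = sum(ys) / n_steps
        sxy += sum((a - x_mean) * (b - y_mean) for a, b in zip(xs, ys))
        sxx += sum((a - x_mean) ** 2 for a in xs)
    return sxy / sxx


# True slope is 0 in both runs; only the length of the series differs.
short = within_subject_slope(n_subjects=2000, n_steps=5, cross_lag=0.5)
long_ = within_subject_slope(n_subjects=2000, n_steps=50, cross_lag=0.5)
print(f"estimated slope, 5 time steps:  {short:+.3f}")
print(f"estimated slope, 50 time steps: {long_:+.3f}")
```

Under these assumptions the estimate for the short series is expected to be negative, that is, opposite in sign to the positive cross-lag, even with 2000 subjects, and to move towards zero as the series lengthen. Flipping the sign of cross_lag should flip the sign of the bias.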

We provide guidance on how to identify a cross-lag and subsequently specify it in a multivariate model, which can be far from trivial. Our tutorials with data and R code of the worked examples provide step‐by‐step instructions on how to perform such analyses.

Our study offers insights into situations in which cross-lags can bias analysis of ecological and evolutionary time series and suggests that adopting dynamical models can be important, as this directly affects our understanding of population regulation, the evolution of life histories and cooperation, and possibly many other topics. Determining how strong the estimation bias due to ignoring covariate endogeneity has been in the ecological literature requires further study, not least because this bias may interact with other sources of bias.

Notes

Tutorials (R Markdown files), R functions (R file) and empirical data (ASCII text file) associated with the paper (5 files in total).

Tutorial1.rmd shows estimation bias in a simulated dataset (see Box 1 & 2 in paper for details).

Tutorial2.rmd illustrates bias due to measurement error and how to account for it (see Box 3 in paper for details).

Tutorial3.rmd explains how to analyze the real-world case study of group living benefits in red-winged fairy wrens (see Box 4 in paper for details).

Melegans.txt contains the empirical data for red-winged fairy wrens (Malurus elegans) for each of the 108 groups (SubjectID) across 9 years (time). It presents 698 values for adult group size (GroupSize), the number of adults surviving until the next year (Survivors), and group productivity, measured as the number of offspring produced in a year that survive until the next year (Offspring), together with their 1-time-step-lagged variables (OffspringLagged and GroupSizeLagged; LaggedUnavailable=1 marks a missing lagged value). There are no further missing values; see the description in Box 4 of the paper and the references therein for details.

simulation_functions.R contains the R functions used in Tutorials 1-3.
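To make the lagged columns described for Melegans.txt concrete, the following sketch shows one way such 1-time-step-lagged variables could be derived from long-format records. It is written in Python rather than R and is not part of the archive. The helper add_lagged, the toy records and the use of None as the placeholder for an unavailable lag are assumptions; only the column names SubjectID, time, GroupSize, GroupSizeLagged and LaggedUnavailable come from the description above, and the archived file may encode missing lags differently.

```python
def add_lagged(rows, value_key="GroupSize", lagged_key="GroupSizeLagged"):
    """Attach each subject's value from the previous time step to every row.

    `rows` is a list of dicts with keys "SubjectID", "time" and `value_key`.
    A row with no record for the same subject at time - 1 gets
    LaggedUnavailable = 1 and a lagged value of None (placeholder is assumed).
    """
    observed = {(r["SubjectID"], r["time"]): r[value_key] for r in rows}
    out = []
    for r in sorted(rows, key=lambda r: (r["SubjectID"], r["time"])):
        key = (r["SubjectID"], r["time"] - 1)
        new_row = dict(r)  # copy so the input records stay untouched
        new_row[lagged_key] = observed.get(key)
        new_row["LaggedUnavailable"] = 0 if key in observed else 1
        out.append(new_row)
    return out


# Hypothetical toy records: group A has a gap at time 3.
toy = [
    {"SubjectID": "A", "time": 1, "GroupSize": 3},
    {"SubjectID": "A", "time": 2, "GroupSize": 4},
    {"SubjectID": "A", "time": 4, "GroupSize": 2},
    {"SubjectID": "B", "time": 1, "GroupSize": 5},
]
lagged = add_lagged(toy)
for row in lagged:
    print(row)
```

In this sketch the first record of each group and any record following a gap are flagged as lacking a lagged value, which is one plausible reading of how LaggedUnavailable is used.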

Funding provided by: Australian Research Council
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100000923
Award Number: DE130100174

Files

Files (91.8 kB)

Checksum Size
md5:485232d57dccc81dc8e99171de7eef3c 33.8 kB
md5:286444b70459a71b5f1b3286d0ecf1d1 21.8 kB
md5:659b09fca3123ea1d108d55d0dfd83ec 11.5 kB
md5:11dc6c7bf05c9b2fbe7953594f671b04 24.6 kB
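A downloaded file can be checked against the md5 values listed above. The sketch below uses Python's standard hashlib module; the helper name checksum_matches is hypothetical, and because the listing does not pair checksums with file names, which checksum belongs to which file has to be established by the reader.

```python
import hashlib


def checksum_matches(data: bytes, listed: str) -> bool:
    """Compare the MD5 digest of `data` with a listed value such as 'md5:<hex>'."""
    expected = listed.split(":")[-1].strip().lower()
    return hashlib.md5(data).hexdigest() == expected


# Self-check with the well-known MD5 digest of empty input.
print(checksum_matches(b"", "md5:d41d8cd98f00b204e9800998ecf8427e"))
```

For a real file, read it in binary mode (open(path, "rb").read()) and pass the bytes together with the corresponding listed value.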

Additional details

Related works

Is source of
10.5061/dryad.7h44j0ztw (DOI)