Published March 17, 2026 | Version v1
Software Open

Data from: Improving the robustness of phylogenetic independent contrasts: Addressing abrupt evolutionary shifts with outlier- and distribution-guided correlation

  • 1. Beijing Normal University

Description

This dataset provides a resource for evaluating phylogenetic comparative methods under diverse evolutionary scenarios. It includes: (1) simulated phylogenetic trees (fixed, fully balanced trees and randomly generated trees); (2) trait data for 16, 128, and 256 species, incorporating both gradual evolution and abrupt evolutionary shifts; (3) statistical outputs from multiple phylogenetic comparative methods, including PIC-OGC, PIC-MM, and other robust regression models; and (4) benchmark results for detecting trait correlations under varying degrees of phylogenetic autocorrelation and noise. The dataset enables researchers to explore the impact of evolutionary shifts on trait correlation analysis, compare the performance of phylogenetic methods, and validate novel approaches for handling outliers and non-normal data distributions.

Notes

Funding provided by: National Natural Science Foundation of China
ROR ID: https://ror.org/01h0zpd94
Award Number: 31671321

Methods

Phylogenetic tree simulation: Two types of phylogenetic trees were simulated: balanced trees with fixed topologies, and random trees generated under a coalescent model. The random trees introduced variability in branching rates to reflect diverse phylogenetic scenarios. Trees contained 16, 128, or 256 species.
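
As an illustration only, the two tree types can be sketched in plain Python (the archived code itself is in R and is not reproduced here). The sketch assumes a Kingman coalescent for the random trees and unit branch lengths for the balanced trees. The function names and the nested-dict tree representation are hypothetical and are not taken from code.zip.

```python
import random


def balanced_tree(n_tips, branch_length=1.0):
    """Fully balanced binary tree as nested dicts; n_tips must be a power of two."""
    if n_tips < 1 or n_tips & (n_tips - 1):
        raise ValueError("n_tips must be a power of two")

    def build(n):
        node = {"length": branch_length, "children": []}
        if n > 1:
            node["children"] = [build(n // 2), build(n // 2)]
        return node

    root = build(n_tips)
    root["length"] = 0.0  # the root has no branch above it
    return root


def coalescent_tree(n_tips, rng):
    """Random tree under a Kingman coalescent (assumed model, see lead-in)."""
    # Each active lineage is stored as (node, height of that node).
    active = [({"length": 0.0, "children": []}, 0.0) for _ in range(n_tips)]
    t = 0.0
    while len(active) > 1:
        k = len(active)
        # Waiting time to the next merger with k lineages: rate k(k-1)/2.
        t += rng.expovariate(k * (k - 1) / 2)
        i, j = rng.sample(range(k), 2)
        (a, height_a), (b, height_b) = active[i], active[j]
        a["length"] = t - height_a
        b["length"] = t - height_b
        parent = {"length": 0.0, "children": [a, b]}
        active = [x for idx, x in enumerate(active) if idx not in (i, j)]
        active.append((parent, t))
    return active[0][0]


def tip_depths(node, depth=0.0):
    """Root-to-tip distances, one entry per tip."""
    if not node["children"]:
        return [depth]
    return [d for c in node["children"] for d in tip_depths(c, depth + c["length"])]


# One balanced and one random tree for each tree size named in the record.
rng = random.Random(42)
trees = {n: (balanced_tree(n), coalescent_tree(n, rng)) for n in (16, 128, 256)}
```

Both constructions yield ultrametric trees, so every value returned by `tip_depths` for a given tree should agree up to floating-point error.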

Trait data simulation: Trait data were generated under both Brownian motion (BM) and abrupt evolutionary shift scenarios. For abrupt shifts, two traits (X1 and X2) were simulated with independent evolution except for a significant shift at the root branch. Gradual evolution data were simulated under BM with varying levels of noise.
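
A minimal Python sketch of this simulation design follows, for illustration rather than as the authors' implementation. It assumes a balanced tree with equal branch lengths and interprets "a shift at the root branch" as a constant added along one of the two branches descending from the root. That interpretation, the function name, and the shift size of 10.0 are assumptions made for the example.

```python
import random


def simulate_bm_balanced(depth, rng, sigma=1.0, branch_length=1.0, root_shift=0.0):
    """Simulate one trait under Brownian motion on a balanced tree with 2**depth tips.

    root_shift is added on the first branch below the root (assumed reading of
    the abrupt-shift scenario); root_shift=0.0 gives plain Brownian motion.
    Tip values are returned in left-to-right order.
    """
    step_sd = sigma * branch_length ** 0.5  # BM variance grows linearly with time

    def evolve(value, level):
        if level == depth:
            return [value]
        tips = []
        for side in (0, 1):
            step = rng.gauss(0.0, step_sd)
            shift = root_shift if (level == 0 and side == 0) else 0.0
            tips.extend(evolve(value + step + shift, level + 1))
        return tips

    return evolve(0.0, 0)


# Two traits that evolve independently but share a shift on the same root branch.
rng = random.Random(1)
x1 = simulate_bm_balanced(4, rng, root_shift=10.0)  # 16 tips; 10.0 is illustrative
x2 = simulate_bm_balanced(4, rng, root_shift=10.0)
```

Because both traits jump on the same branch, their tip values can appear correlated even though the Brownian increments are independent, which is the situation the abrupt-shift scenario is meant to probe.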

Statistical analysis: Multiple phylogenetic comparative methods were applied to the datasets, including:

  • PIC-OGC: A hybrid framework integrating Pearson and Spearman correlations to handle outliers and non-normal data.
  • Robust regression methods (PIC-MM, PIC-L1, etc.).
  • PGLS models optimized across evolutionary scenarios (BM, λ, OU fixed/random, EB).
  • PGLMM: Phylogenetic generalized linear mixed model.
  • MR-PMM: Multi-response phylogenetic mixed model.

Benchmarks for evaluating true and spurious correlations were constructed from the simulation parameters.
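
To make the contrast-based methods concrete, the following Python sketch computes standardized phylogenetic independent contrasts on a balanced tree with equal branch lengths, then a Pearson correlation through the origin and a Spearman rank correlation on those contrasts. It is a simplified illustration: the rule PIC-OGC uses to choose between or combine the two coefficients is not described in this record and is not reproduced, and all function names are hypothetical. The sign of each contrast is arbitrary, so the rank-based value here depends on the left-to-right tip order.

```python
import math


def pic_balanced(tip_values, branch_length=1.0):
    """Standardized contrasts for a balanced tree; tips listed left to right."""
    n = len(tip_values)
    if n < 2 or n & (n - 1):
        raise ValueError("need a power-of-two number of tips (at least 2)")
    values = list(tip_values)
    v = branch_length  # effective branch length at the current level
    contrasts = []
    while len(values) > 1:
        parents = []
        for a, b in zip(values[0::2], values[1::2]):
            contrasts.append((a - b) / math.sqrt(2 * v))
            parents.append((a + b) / 2)  # equal lengths: weighted mean is the mean
        values = parents
        v = branch_length + v / 2  # parent branch is lengthened by v*v/(v+v)
    return contrasts


def pearson_origin(x, y):
    """Pearson correlation through the origin, the usual choice for contrasts."""
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: centered Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)
```

A tree with n tips yields n - 1 contrasts per trait; passing the contrasts of two traits to `pearson_origin` and `spearman` gives the two coefficients that a hybrid scheme would then have to arbitrate between.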

Data processing: All simulations and analyses were conducted in R (version 4.1.3). The packages phytools, ape, phylolm, ROBRT, and MCMCglmm were used for tree generation, PIC calculation, and statistical modeling. The dataset includes both raw and processed outputs for reproducibility and ease of use.

Files

code.zip

Files (28.1 kB)

Name: code.zip
md5:a8c00543e3e49b357c9d910a4251c7a8
Size: 28.1 kB

Additional details

Related works

Is source of
10.5061/dryad.8w9ghx3xp (DOI)