Data from: Improving the robustness of phylogenetic independent contrasts: Addressing abrupt evolutionary shifts with outlier- and distribution-guided correlation
Authors/Creators
- 1. Beijing Normal University
Description
This dataset provides a resource for evaluating phylogenetic comparative methods under diverse evolutionary scenarios. It includes: simulated phylogenetic trees (fixed, fully balanced trees and randomly generated trees); trait data for 16, 128, and 256 species, incorporating both gradual and abrupt evolutionary shifts; statistical outputs from multiple phylogenetic comparative methods, including PIC-OGC, PIC-MM, and other robust regression models; and benchmark results for detecting trait correlations under varying degrees of phylogenetic autocorrelation and noise. The dataset enables researchers to explore the impact of evolutionary shifts on trait correlation analysis, compare the performance of phylogenetic methods, and validate novel approaches for handling outliers and non-normal data distributions.
Notes
Methods
Phylogenetic tree simulation: Two types of phylogenetic trees were simulated: balanced trees with fixed topologies and randomly generated trees using a coalescent model. The random trees introduced variability in branching rates to reflect diverse phylogenetic scenarios. Tree sizes included 16, 128, and 256 species.
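The two tree types can be illustrated with a minimal, standard-library Python sketch. It is not the R code in code.zip: the function names, tip labels (`t1`, `t2`, ...), unit branch lengths, and the Kingman-style waiting times are all assumptions made for illustration.

```python
import random

def balanced_newick(n_tips, branch_length=1.0):
    """Return a fully balanced tree with n_tips tips (a power of two) as Newick."""
    if n_tips < 2 or n_tips & (n_tips - 1):
        raise ValueError("n_tips must be a power of two >= 2")
    clades = [f"t{i + 1}" for i in range(n_tips)]
    # Repeatedly join neighbouring clades until only the root remains.
    while len(clades) > 1:
        clades = [
            f"({a}:{branch_length},{b}:{branch_length})"
            for a, b in zip(clades[0::2], clades[1::2])
        ]
    return clades[0] + ";"

def coalescent_newick(n_tips, rng):
    """Return a random ultrametric tree from a coalescent process as Newick."""
    # Each active lineage is stored as (newick_text, node_height).
    lineages = [(f"t{i + 1}", 0.0) for i in range(n_tips)]
    now = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next merger: exponential with rate k choose 2.
        now += rng.expovariate(k * (k - 1) / 2)
        i, j = sorted(rng.sample(range(k), 2))
        (a, ha), (b, hb) = lineages[i], lineages[j]
        merged = (f"({a}:{now - ha:.6f},{b}:{now - hb:.6f})", now)
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return lineages[0][0] + ";"

fixed_tree = balanced_newick(16)
random_tree = coalescent_newick(16, random.Random(1))
```

Both functions return Newick strings, so the output can be read by any tree library. The tree sizes used in the dataset (16, 128, 256) are all powers of two, which is what the balanced construction requires.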
Trait data simulation: Trait data were generated under both Brownian motion (BM) and abrupt evolutionary shift scenarios. For abrupt shifts, two traits (X1 and X2) were simulated with independent evolution except for a significant shift at the root branch. Gradual evolution data were simulated under BM with varying levels of noise.
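As a rough illustration of the abrupt-shift scenario, the following Python sketch simulates a trait by Brownian motion down a fully balanced tree and adds a fixed displacement on one of the two branches leaving the root. The function name, the root value of 0, the unit branch lengths, and the shift magnitude are assumptions; the parameters actually used are defined in the authors' R code.

```python
import random

def simulate_bm_balanced(depth, rng, sigma=1.0, branch_length=1.0, root_shift=0.0):
    """Simulate one trait at the 2**depth tips of a fully balanced tree under BM.

    root_shift is added on one of the two branches leaving the root, so half
    of the tips are displaced together (an assumed form of "abrupt shift").
    """
    step_sd = sigma * branch_length ** 0.5  # BM increment sd for one branch

    def evolve(value, levels_left):
        if levels_left == 0:
            return [value]
        tips = []
        for _ in range(2):
            child = value + rng.gauss(0.0, step_sd)
            tips.extend(evolve(child, levels_left - 1))
        return tips

    # The root value is 0; only the second root branch carries the shift.
    left = evolve(rng.gauss(0.0, step_sd), depth - 1)
    right = evolve(root_shift + rng.gauss(0.0, step_sd), depth - 1)
    return left + right

rng = random.Random(42)
x1 = simulate_bm_balanced(4, rng, root_shift=10.0)  # 16 tips, trait X1
x2 = simulate_bm_balanced(4, rng, root_shift=10.0)  # 16 tips, trait X2, independent draws
```

Because X1 and X2 share only the root shift, any strong tip-level correlation between them reflects that single event rather than correlated evolution along the rest of the tree. Setting `root_shift=0.0` gives the plain BM case.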
Statistical analysis: Multiple phylogenetic comparative methods were applied to the datasets, including:
- PIC-OGC: A hybrid framework integrating Pearson and Spearman correlations to handle outliers and non-normal data.
- Robust regression methods (PIC-MM, PIC-L1, etc.).
- PGLS models optimized across evolutionary scenarios (BM, λ, OU fixed/random, EB).
- PGLMM: Phylogenetic generalized linear mixed model.
- MR-PMM: Multi-response phylogenetic mixed model.
- Benchmarks for evaluating true and spurious correlations were constructed using simulation parameters.
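To show why combining Pearson and Spearman correlations can help with outliers, the sketch below compares the two coefficients on a small set of made-up values standing in for contrasts, one of which is extreme. It is not the PIC-OGC procedure itself: the data are invented, the helper names are hypothetical, and the ordinary (not through-the-origin) Pearson formula is used for simplicity.

```python
def pearson(x, y):
    """Ordinary Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """Average ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Invented values: six unremarkable pairs plus one extreme pair (last entry).
cx = [-1.0, 0.5, -0.3, 0.8, -0.6, 0.2, 9.0]
cy = [0.4, -0.7, 0.9, 0.1, -0.5, -0.2, 9.0]

r_all = pearson(cx, cy)             # dominated by the extreme pair
rho_all = spearman(cx, cy)          # rank-based, much less affected
r_trim = pearson(cx[:-1], cy[:-1])  # Pearson without the extreme pair
```

Here the single extreme pair pushes the Pearson coefficient close to 1 even though the remaining pairs are weakly negatively related, while the Spearman coefficient stays low. An abrupt shift on one branch produces this kind of outlying contrast.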
Data processing: All simulations and analyses were conducted using R (version 4.1.3). Packages including phytools, ape, phylolm, ROBRT, and MCMCglmm were employed for tree generation, PIC calculation, and statistical modeling. The dataset was pre-processed to include raw and processed outputs for reproducibility and ease of use.
Files
| Name | Size | Checksum |
|---|---|---|
| code.zip | 28.1 kB | md5:a8c00543e3e49b357c9d910a4251c7a8 |
Additional details
Related works
- Is source of
- 10.5061/dryad.8w9ghx3xp (DOI)