Published June 6, 2019 | Version v1
Dataset | Open

Data from: Probabilistic species tree distances: implementing the multispecies coalescent to compare species trees within the same model-based framework used to estimate them

  • The University of Texas at Arlington

Description

Despite the ubiquitous use of statistical models for phylogenomic and population genomic inference, this model-based rigor is rarely applied to the post hoc comparison of trees. In a recent study, Garba and colleagues derived new methods for measuring the distance between two gene trees, computed as the difference between their site pattern probability distributions. Unlike traditional metrics that compare trees solely in terms of geometry, these measures treat gene trees and their associated parameters as probabilistic models that can be compared using standard information theoretic approaches. Consequently, probabilistic measures of phylogenetic tree distance can be far more informative than simple comparisons of topology and/or branch lengths alone. However, in their current form, these distance measures are not suitable for comparing species tree models in the presence of gene tree heterogeneity. Here we demonstrate how the theory of Garba et al. (2018), which is based on gene tree distances, can be extended naturally to the comparison of species tree models. Multispecies coalescent (MSC) models parameterize the discrete probability distribution of gene trees conditioned upon a species tree with a particular topology and set of divergence times (in coalescent units), and thus provide a framework for measuring distances between species tree models in terms of their corresponding gene tree topology probabilities. We describe the computation of probabilistic species tree distances in the context of standard MSC models, which assume complete genetic isolation after speciation, as well as recent theoretical extensions to the MSC in the form of network-based MSC models that relax this assumption and permit hybridization among taxa. We demonstrate these metrics using simulations and empirical species tree estimates, and we discuss both the benefits and limitations of these approaches.
We make our species-tree distance approach available as an R package called pSTDistanceR, for open use by the community.
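To make the idea concrete, the sketch below compares two rooted three-taxon species trees through the gene tree topology distributions they induce under the MSC. It relies on the standard three-taxon result: for a species tree whose internal branch has length t (in coalescent units), the matching gene tree has probability 1 - (2/3)e^(-t) and each of the two discordant gene trees has probability (1/3)e^(-t). This is a minimal, independent Python illustration under stated assumptions, not the pSTDistanceR implementation. The function names (`gene_tree_probs`, `hellinger`, `kl_divergence`) are hypothetical, and the choice of the Hellinger distance and Kullback-Leibler divergence as the information theoretic measures is an assumption made here for illustration, since this record does not say which measures the package provides. Network-based MSC models are not covered by this sketch.

```python
import math

# Rooted gene tree topologies for three taxa A, B, C (Newick-style labels).
TOPOLOGIES = ("((A,B),C)", "((A,C),B)", "((B,C),A)")


def gene_tree_probs(species_topology: str, t: float) -> dict[str, float]:
    """Gene tree topology probabilities under the multispecies coalescent for a
    rooted three-taxon species tree whose internal branch has length t
    (coalescent units). The matching gene tree has probability
    1 - (2/3)exp(-t); each of the two discordant ones has (1/3)exp(-t)."""
    if species_topology not in TOPOLOGIES:
        raise ValueError(f"unknown topology: {species_topology}")
    if t < 0:
        raise ValueError("internal branch length must be non-negative")
    discordant = math.exp(-t) / 3.0
    return {
        g: (1.0 - 2.0 * discordant) if g == species_topology else discordant
        for g in TOPOLOGIES
    }


def hellinger(p: dict[str, float], q: dict[str, float]) -> float:
    """Hellinger distance sqrt(1 - sum_g sqrt(p_g * q_g)), bounded in [0, 1]."""
    bhattacharyya = sum(math.sqrt(p[g] * q[g]) for g in TOPOLOGIES)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 1.0 - bhattacharyya))


def kl_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Kullback-Leibler divergence KL(p || q); note it is asymmetric."""
    # Terms with p_g = 0 contribute nothing; q_g > 0 for any finite t.
    return sum(p[g] * math.log(p[g] / q[g]) for g in TOPOLOGIES if p[g] > 0.0)


if __name__ == "__main__":
    # Two conflicting species trees with the same internal branch length.
    s1 = gene_tree_probs("((A,B),C)", 1.0)
    s2 = gene_tree_probs("((A,C),B)", 1.0)
    print(f"Hellinger: {hellinger(s1, s2):.4f}")
    print(f"KL(s1 || s2): {kl_divergence(s1, s2):.4f}")
```

One property worth noting: when t = 0, every species tree topology induces the uniform distribution (1/3 for each gene tree), so two conflicting topologies are at distance zero. As t grows, the distributions concentrate on their matching gene trees and the Hellinger distance between conflicting topologies approaches 1. A purely topological metric would report the same distance in both cases.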

Notes

Funding provided by: National Science Foundation
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000001
Award Number: DEB-1655571

Files

SupplementaryFigure01_11Dec2018.pdf

Files (250.8 kB)

  • md5:9b2502f8e22efff677be95d2fc19ec33 (156.7 kB)
  • md5:7ec163529380ddf17ddff7d92499b5bb (94.1 kB)

Additional details

Related works

Is cited by
10.1093/sysbio/syz031 (DOI)