Published July 3, 2017 | Version v1
Dataset Open

Data from: Why concatenation fails near the anomaly zone

  • 1. Indiana University Bloomington

Description

Genome-scale sequencing has been of great benefit in recovering species trees, but has not provided final answers. Despite the rapid accumulation of molecular sequences, resolving short and deep branches of the tree of life has remained a challenge, and has prompted the development of new strategies that can make the best use of available data. One such strategy – the concatenation of gene alignments – can be successful when coupled with many tree estimation methods, but has also been shown to fail when there are high levels of incomplete lineage sorting. Here, we focus on the failure of likelihood-based methods in retrieving a rooted, asymmetric four-taxon species tree from concatenated data when the species tree is in or near the anomaly zone – a region of parameter space where the most common gene tree does not match the species tree because of incomplete lineage sorting. First, we use coalescent theory to prove that most informative sites will support the species tree in the anomaly zone, and that as a consequence maximum-parsimony succeeds in recovering the species tree from concatenated data. We further show that maximum-likelihood tree estimation from concatenated data fails both inside and outside the anomaly zone, and that this failure cannot be easily predicted from the topology of the most common gene tree. We show that likelihood-based methods often fail in a region partially overlapping the anomaly zone, likely because of the lower relative cost of substitutions on discordant gene tree branches that are absent from the species tree. Our results confirm and extend previous reports on the performance of these methods applied to concatenated data from a rooted, asymmetric four-taxon species tree, and highlight avenues for future work improving the performance of methods aimed at recovering species trees.
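
As an illustration of the anomaly zone described above, the following minimal sketch tests whether a rooted, asymmetric four-taxon species tree (((A,B),C),D) falls inside it. It assumes the boundary function a(x) as commonly attributed to Degnan and Rosenberg (2006), with x the length of the deeper internal branch and y the length of the shallower internal branch, both in coalescent units. The function names are hypothetical, and this is not code from the dataset itself.

```python
import math


def anomaly_zone_boundary(x: float) -> float:
    """Boundary a(x) for the four-taxon asymmetric species tree.

    x is the deeper internal branch length in coalescent units
    (assumed convention). The tree is in the anomaly zone when the
    shallower internal branch y satisfies y < a(x).
    """
    if x <= 0:
        raise ValueError("x must be positive")
    e2 = math.exp(2 * x)
    e3 = math.exp(3 * x)
    return math.log(2 / 3 + (3 * e2 - 2) / (18 * (e3 - e2)))


def in_anomaly_zone(x: float, y: float) -> bool:
    """True if the symmetric gene tree ((A,B),(C,D)) is more probable
    than the gene tree matching the species tree (((A,B),C),D)."""
    return y < anomaly_zone_boundary(x)


if __name__ == "__main__":
    # Two short internal branches versus a longer deep branch.
    for x, y in [(0.1, 0.05), (0.5, 0.05)]:
        print(x, y, in_anomaly_zone(x, y))
```

Under this formula a(x) approaches zero near x = 0.27, so for longer deep branches no non-negative y lies in the anomaly zone.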

Notes

Funding provided by: National Science Foundation
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000001
Award Number: DEB-1136707

Files

Appendix.pdf

Files (1.1 MB)

Checksum (MD5)                          Size
md5:b6f7ae7105eb481b18e4ee53f874fb10    677.6 kB
md5:fa25fd49c0f2b31fa0b95974a6022d46    479 Bytes
md5:ebb895bd780cd7c3e22c95587884c309    1.2 kB
md5:17a4ed729b83ab97c26d8b36a1bf20a5    170.8 kB
md5:4ec06c94563345a0205ef68d3965924f    200.0 kB
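
A downloaded file can be checked against the MD5 values listed above. The sketch below is one way to do this with Python's standard library. The function names and the local file path are hypothetical, and because the listing does not show which checksum belongs to which file name, the expected value has to be supplied by the user.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: str, listed: str) -> bool:
    """Compare a file against a listed value such as 'md5:b6f7...'.

    The optional 'md5:' prefix is stripped before comparison.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum("Appendix.pdf", "md5:...")` would return True only if the local copy is byte-for-byte identical to the file that produced the listed checksum (the value is left elided here because the mapping of checksums to file names is not shown).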

Additional details

Related works

Is cited by
10.1093/sysbio/syx063 (DOI)