Published March 23, 2017 | Version v1
Dataset Open

Data from: Why do phylogenomic data sets yield conflicting trees? Data type influences the avian tree of life more than taxon sampling

Description

Phylogenomics, the use of large-scale data matrices in phylogenetic analyses, has been viewed as the ultimate solution to the problem of resolving difficult nodes in the tree of life. However, it has become clear that analyses of these large genomic data sets can also result in conflicting estimates of phylogeny. Here, we use the early divergences in Neoaves, the largest clade of extant birds, as a "model system" to understand the basis for incongruence among phylogenomic trees. We were motivated by the observation that trees from two recent avian phylogenomic studies exhibit conflicts. Those studies used different strategies: 1) collecting many characters [42 megabase pairs (Mbp) of sequence data] from 48 birds, sometimes including only one taxon for each major clade; and 2) collecting fewer characters (0.4 Mbp) from 198 birds, selected to subdivide long branches. However, the studies also used different data types: the taxon-poor data matrix comprised 68% non-coding sequences whereas coding exons dominated the taxon-rich data matrix. This difference raises the question of whether the primary reason for incongruence is the number of sites, the number of taxa, or the data type. To test among these alternative hypotheses, we assembled a novel, large-scale data matrix comprising 90% non-coding sequences from 235 bird species. Although increased taxon sampling appeared to have a positive impact on phylogenetic analyses, the most important variable was data type. Indeed, by analyzing different subsets of the taxa in our data matrix, we found that increased taxon sampling actually resulted in increased congruence with the tree from the previous taxon-poor study (which had a majority of non-coding data) instead of the taxon-rich study (which largely used coding data).
We suggest that the observed differences in the estimates of topology for these studies reflect data-type effects due to violations of the models used in phylogenetic analyses, some of which may be difficult to detect. If incongruence among trees estimated using phylogenomic methods largely reflects problems with model fit, developing more "biologically-realistic" models is likely to be critical for efforts to reconstruct the tree of life.

Notes

Funding provided by: National Science Foundation
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000001
Award Numbers: DEB-0228682, DEB-1118823, DEB-0228675, DEB-0228688, DEB-0733029, and DEB-0228617

Files

Reddy_sup_figS1_fileS1_indicator_clades.pdf

Files (22.2 MB)

Checksum Size
md5:028b6f2a6be4a6abe630015fdcb0faaa 142.3 kB
md5:e02d48ae4ff50cf98d4a64449eb89f99 173.5 kB
md5:d9f694f0d0f822dc76e905e51f504245 228.7 kB
md5:628f4c245973c01d6250e07951607eef 154.1 kB
md5:785033eec3bbac5cfb50f80b0146174f 143.0 kB
md5:43725b0549b96c67ec040a1cf884cc5c 9.0 MB
md5:9782165c0cc015b77cb460041f35c949 20.5 kB
md5:529a59dddf16c110413a22eca29948cb 7.0 kB
md5:3d762e7ee751c91c53b8b76c5e0f240c 73.6 kB
md5:691aba215d0744502dd8713857760e28 80.2 kB
md5:852293f32d83d843a5315df50b4b1b89 62.4 kB
md5:86b6d67bd4aa14531e6687a27f5dc522 11.4 MB
md5:a6b7bf085038031c6fc2aa0427cf980e 259.6 kB
md5:4ef8111a5bb3c5e5239a2f904c0d0b6f 172.3 kB
md5:47637fe0840efe60fd20957ad5f9474e 78.4 kB
md5:648d255374e053361b49e32f5a253953 113.0 kB
md5:c12671b80b2a9d846e35126e6178d79d 55.1 kB
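The md5 values listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch in Python, not part of the dataset itself: the function names `md5_of_file` and `matches_listing` and the local path in the usage comment are hypothetical, and the listing does not state which file name belongs to which checksum, so the pairing has to be made by the reader.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    # Drop an optional 'md5:' prefix, as used in the file listing.
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (hypothetical local path; the checksum is the first one in the listing):
# matches_listing("downloads/some_file.pdf", "md5:028b6f2a6be4a6abe630015fdcb0faaa")
```

A mismatch usually means an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.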

Additional details

Related works

Is cited by
10.1093/sysbio/syx041 (DOI)