Published December 23, 2022 | Version v1
Dataset Open

Data from: Improving quartet graph construction for scalable and accurate species tree estimation from gene trees

  • 1. University of Maryland, College Park

Description

Summary methods are one of the dominant approaches for estimating species trees from genome-scale data. However, they can fail to produce accurate species trees when the input gene trees are highly discordant due to gene tree estimation error as well as biological processes such as incomplete lineage sorting. Here, we introduce a new summary method, TREE-QMC, that offers improved accuracy and scalability under these challenging scenarios. TREE-QMC builds upon the algorithmic framework of QMC (Snir and Rao 2010) and its weighted version wQMC (Avni et al. 2014). Their approach takes weighted quartets (four-leaf trees) as input and builds a species tree in a divide-and-conquer fashion, at each step constructing a graph and seeking its max cut. We improve upon this methodology in two ways. First, we address scalability by providing an algorithm to construct the graph directly from the input gene trees. By skipping the quartet weighting step, TREE-QMC has a time complexity of O(n^3 k) under some assumptions on subproblem sizes, where n is the number of species and k is the number of gene trees. Second, we address accuracy by normalizing the quartet weights to account for "artificial taxa," which are introduced during the divide phase so that solutions on subproblems can be combined during the conquer phase. Together, these contributions enable TREE-QMC to outperform the leading methods (ASTRAL-III, FASTRAL, wQFM) in an extensive simulation study. We also present the application of these methods to an avian phylogenomics data set.
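To make the "construct a graph and seek its max cut" step concrete, here is a toy Python sketch of a single divide step on explicitly listed weighted quartets. It is an illustration under stated assumptions, not the TREE-QMC implementation: the function names (`build_quartet_graph`, `best_cut`) are hypothetical, the cut score (good weight minus `alpha` times bad weight) is a simplification, and the cut is found by brute force rather than by a max-cut heuristic. It also enumerates quartets directly, which is exactly the step TREE-QMC is described as skipping, and it does no normalization for artificial taxa.

```python
from collections import defaultdict
from itertools import combinations


def build_quartet_graph(weighted_quartets):
    """Build 'good' and 'bad' edge weights from quartets of the form ab|cd.

    For a quartet ab|cd with weight w (assumed encoding: ((a, b), (c, d)) -> w):
      - sibling pairs (a-b, c-d) get 'bad' weight: a cut should not separate them;
      - cross pairs (a-c, a-d, b-c, b-d) get 'good' weight: a cut should separate them.
    """
    good = defaultdict(float)
    bad = defaultdict(float)
    for ((a, b), (c, d)), w in weighted_quartets.items():
        bad[frozenset((a, b))] += w
        bad[frozenset((c, d))] += w
        for u in (a, b):
            for v in (c, d):
                good[frozenset((u, v))] += w
    return good, bad


def best_cut(taxa, good, bad, alpha=1.0):
    """Brute-force search over bipartitions (toy sizes only).

    Score (an assumption of this sketch): cut good weight - alpha * cut bad weight.
    Returns the side containing the alphabetically first taxon, and its score.
    """
    taxa = sorted(taxa)
    first, rest = taxa[0], taxa[1:]
    best_side, best_score = None, None
    # Fix `first` on one side so each bipartition is visited once;
    # r < len(rest) keeps the other side non-empty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            side = frozenset((first,) + extra)
            cut_good = sum(w for e, w in good.items() if len(e & side) == 1)
            cut_bad = sum(w for e, w in bad.items() if len(e & side) == 1)
            score = cut_good - alpha * cut_bad
            if best_score is None or score > best_score:
                best_side, best_score = side, score
    return best_side, best_score


# Two conflicting quartets on the same four taxa; ab|cd has more support.
quartets = {
    (("a", "b"), ("c", "d")): 3.0,
    (("a", "c"), ("b", "d")): 1.0,
}
good, bad = build_quartet_graph(quartets)
side, score = best_cut({"a", "b", "c", "d"}, good, bad)
print(sorted(side), score)
```

In a full divide-and-conquer method, each side of the chosen cut would then be solved recursively, with an artificial taxon standing in for the other side so that the subtrees can be joined afterwards.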

Notes

Funding provided by: State of Maryland
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100017027
Award Number:

Files

README.md

Files (3.2 GB)

Checksum  Size
md5:98c279d340cb8022d8bae6c56ec9b49a  35.5 MB
md5:9af4e207a3b63bbdabe907182fe705e1  715.8 kB
md5:53fd9b5673572f7a9300b0a96aac179d  640.7 kB
md5:82e056abd6b3d6bdd304d9f06c17b9fb  81.7 MB
md5:56bd9ae92c8e2f7899d2090cac5d37a9  559.5 kB
md5:094628387433b0ced9d4f0c108cde00c  936 Bytes
md5:f7bc38e08d61b6f89ddadcca8864c8f8  229.9 kB
md5:11a6c2c0c23140c1c1a87a44dac84bb5  310.4 kB
md5:ed08c46dda5ad005011aaa34b5db6c3b  5.4 MB
md5:1607eeae407bd8bb6d2d44658e1d7ec3  171.9 kB
md5:54c7652b20e53479c7973a80b48d295e  676 Bytes
md5:00dce94cc1f3877946b46023dc2e6560  9.0 MB
md5:a63ba97fc8d324e2ec9e59bcab42ad1d  1.2 GB
md5:404dfa984af1a83af550f4d629325d20  24.3 MB
md5:dd4648281b8df4f02e7f06d98f3a7df3  225.6 MB
md5:dae46b8a5959e551ee1bd2684bcf1732  1.6 GB
md5:59d9b21142608a41d935e5f715fb7771  2.5 MB
md5:69dff52d610cb8c20411e1e37e351a13  4.9 kB
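The md5 values in the file listing can be used to confirm that a download arrived intact. The following is a minimal Python sketch using only the standard library; the helper names (`md5_of_file`, `matches_listing`) and the example path in the usage note are hypothetical, since the listing above does not show file names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listing entry such as 'md5:98c2...'.

    The 'md5:' prefix is optional; comparison is case-insensitive.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing("some_download.tar.gz", "md5:98c279d340cb8022d8bae6c56ec9b49a")` would return `True` only if the local file (the path here is a placeholder) has that checksum.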

Additional details