Published May 2, 2022 | Version v1
Dataset Open

Conflict over the eukaryote root resides in strong outliers, mosaics and missing data sensitivity of site-specific (CAT) mixture models

  • 1. Uppsala University

Description

Abstract: Phylogenetic reconstruction using concatenated loci ("phylogenomics" or "supermatrix phylogeny") is a powerful tool for resolving evolutionary splits that are poorly resolved in single gene/protein trees (SGTs). However, recent phylogenomic attempts to resolve the eukaryote root have yielded conflicting results, along with claims of various artefacts hidden in the data. We have investigated these conflicts using two new methods for assessing phylogenetic conflict. ConJak uses whole-marker (gene or protein) jackknifing to assess deviation from a central mean for each individual sequence, while ConWin uses a sliding window to screen for incongruent protein fragments (mosaics). Both methods allow selective masking of individual sequences or sequence fragments in order to minimize missing data, an important consideration for resolving deep splits with limited data. Analyses focused on a set of 76 eukaryotic proteins of bacterial ancestry previously used in various combinations to assess the branching order among the three major divisions of eukaryotes: Amorphea (mainly animals, fungi and Amoebozoa), Diaphoretickes (most other well-known eukaryotes and nearly all algae) and Excavata, represented here by Discoba (Jakobida, Heterolobosea, and Euglenozoa). ConJak analyses found strong outliers to be concentrated in under-sampled lineages, while ConWin analyses of Discoba, the most under-sampled of the major lineages, detected potentially incongruent fragments scattered throughout. Phylogenetic analyses of the full data using an LG-gamma model support a Discoba-sister scenario (neozoan-excavate root), and this support rises to 99-100% bootstrap support with data masked according to either protocol. However, analyses with two site-specific (CAT) mixture models yielded widely inconsistent results and a striking sensitivity to missing data.
The neozoan-excavate root places Amorphea and Diaphoretickes as more closely related to each other than either is to Discoba, a fundamental relationship that should remain unaffected by additional taxa.
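The whole-marker jackknifing idea mentioned in the abstract can be illustrated with a generic leave-one-out outlier screen. This is only a minimal sketch and not the ConJak implementation: the function name `jackknife_outliers`, the per-marker score dictionary, and the threshold of 3 standard deviations are all assumptions made for illustration, and the record does not state which statistic or cutoff ConJak actually uses.

```python
from statistics import mean, stdev


def jackknife_outliers(scores, threshold=3.0):
    """Flag markers whose score is far from the leave-one-out mean.

    scores: dict mapping a marker name to a numeric score for ONE taxon
            (hypothetical input; any per-marker statistic could be used).
    threshold: assumed cutoff, in leave-one-out standard deviations.
    """
    flagged = []
    for marker, value in scores.items():
        # Leave this marker out and summarise the remaining ones.
        rest = [v for m, v in scores.items() if m != marker]
        if len(rest) < 2:
            continue  # stdev needs at least two values
        centre = mean(rest)
        spread = stdev(rest)
        if spread == 0:
            # All other markers agree exactly; any difference is an outlier.
            if value != centre:
                flagged.append(marker)
            continue
        if abs(value - centre) / spread > threshold:
            flagged.append(marker)
    return flagged


# Hypothetical scores for a single taxon across five markers.
example_scores = {"m1": 1.0, "m2": 1.1, "m3": 0.9, "m4": 1.0, "m5": 5.0}
outliers = jackknife_outliers(example_scores)
```

In this toy input only `m5` lies far from the mean of the other markers, so it is the only one a masking step would consider removing.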

Notes

Funding provided by: Vetenskapsrådet
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100004359
Award Number: VR 2017-04351

Files

intermediate_data.zip

Files (25.8 MB)

Checksum  Size
md5:29084b1052420b1e39234e7c26bece5c  17.5 MB
md5:dac6ceb7be3ee018f59353e9294bffce  8.2 MB
md5:c782ba3b8dee36fa3441503ed5b862b3  87.8 kB
md5:c14538a52a336e57e2af461daaecc0a8  3.8 kB
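A downloaded file can be checked against one of the MD5 values listed above. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_listed_md5` are hypothetical, and since this listing does not pair each checksum with a file name, the caller has to supply the matching checksum.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or plain '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_md5("downloads/some_file", "md5:29084b1052420b1e39234e7c26bece5c")` returns `True` only if the local file (the path here is a placeholder) has that digest.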

Additional details

Related works

Is cited by
10.1093/sysbio/syac029 (DOI)
Is derived from
10.5281/zenodo.6502163 (DOI)
Is source of
10.5281/zenodo.6502239 (DOI)