Published January 8, 2024 | Version v1

Artifactual orthologs and the need for diligent data exploration in complex phylogenomic datasets: A museomic case study from the Andean flora

  • 1. University of South Alabama
  • 2. Louisiana State University

Description

The Andes mountains of western South America are a globally important biodiversity hotspot, yet resolved phylogenies for plant clades from this region remain scarce. Filling an important gap in our understanding of the world's richest flora, we present the first phylogeny of Freziera (Pentaphylacaceae), an Andean-centered cloud forest radiation. Our dataset was generated via hybridization-based target sequence capture of the Angiosperms353 universal loci for 50 of the ca. 75 species, sampled almost entirely from herbarium specimens. We found high phylogenomic complexity in Freziera, including a significant proportion of paralogous loci and a high degree of gene tree discordance. Through gene tree filtering, by-eye inspection of gene trees, and detailed examination of warnings from recently improved assembly pipelines, we determined that cryptic paralogs (i.e., the recovery of only one copy of a multi-copy gene due to assembly errors) were a major source of gene tree heterogeneity that negatively affected phylogenetic inference and support. These cryptic paralogs likely result from limitations in data collection that are common in museomics, combined with a history of genome duplication, and they may be widespread in plant phylogenomic datasets. After accounting for cryptic paralogs as a source of gene tree error, we identified a significant but non-specific signal of introgression using Patterson's D and f4 statistics. Despite this complexity, we were able to resolve Freziera into nine well-supported subclades whose histories have been shaped by multiple evolutionary processes, including incomplete lineage sorting, historical gene flow, and gene duplication. Our results highlight the complexities of plant phylogenomics and point to the need to test for multiple sources of gene tree discordance through careful examination of empirical datasets.
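For readers unfamiliar with the introgression tests named above, the sketch below shows the standard site-pattern form of Patterson's D for a rooted four-taxon topology (((P1,P2),P3),O), where D = (ABBA - BABA) / (ABBA + BABA), together with the allele-frequency form of f4(P1,P2;P3,P4), the mean over sites of (p1 - p2)(p3 - p4). This is an illustrative sketch only, not the authors' pipeline: the function names are hypothetical, only unambiguous A/C/G/T sites are counted, and a real analysis would also need block-jackknife resampling to assess significance.

```python
from typing import Sequence, Tuple

VALID = set("ACGT")


def count_abba_baba(p1: str, p2: str, p3: str, outgroup: str) -> Tuple[int, int]:
    """Count ABBA and BABA patterns for the rooted topology (((P1,P2),P3),O).

    Hypothetical helper: the outgroup allele is treated as ancestral ("A").
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    abba = baba = 0
    for a, b, c, o in zip(p1.upper(), p2.upper(), p3.upper(), outgroup.upper()):
        # Skip gaps, ambiguity codes, and missing data.
        if not {a, b, c, o} <= VALID:
            continue
        if a == o and b == c and b != o:
            abba += 1  # P2 and P3 share the derived allele
        elif b == o and a == c and a != o:
            baba += 1  # P1 and P3 share the derived allele
    return abba, baba


def patterson_d(abba: int, baba: int) -> float:
    """Patterson's D; values away from 0 suggest gene flow between P3 and P1 or P2."""
    total = abba + baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites: D is undefined")
    return (abba - baba) / total


def f4(p1: Sequence[float], p2: Sequence[float],
       p3: Sequence[float], p4: Sequence[float]) -> float:
    """f4(P1,P2;P3,P4) from per-site derived-allele frequencies.

    Expected to be ~0 if ((P1,P2),(P3,P4)) holds without gene flow.
    """
    n = len(p1)
    if n == 0 or not (len(p2) == len(p3) == len(p4) == n):
        raise ValueError("frequency vectors must be non-empty and equal length")
    return sum((a - b) * (c - d) for a, b, c, d in zip(p1, p2, p3, p4)) / n


if __name__ == "__main__":
    counts = count_abba_baba("AAAGTT", "ACCATG", "ACAGTG", "AAAATT")
    print(counts, patterson_d(*counts))
```

With the toy alignment in the usage example, two sites group P2 with P3 (ABBA) and one groups P1 with P3 (BABA), so D is positive. Counting site patterns directly keeps the logic transparent; dedicated tools add weighting for polymorphic samples and jackknife standard errors.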

Notes

Funding provided by: National Science Foundation
ROR ID: https://ror.org/021nxhr62
Award Number: DEB-2055525

Funding provided by: Louisiana Board of Regents
ROR ID: https://ror.org/00jv89z46
Award Number:

Funding provided by: Louisiana State University
ROR ID: https://ror.org/05ect4e57
Award Number:

Files (11.2 MB), including alignments.zip

  • 7.1 MB, md5:3afe1c435262448f2170ecea0b829507
  • 4.1 MB, md5:ab8d4825be0ff1e416ffe8e631fe9802
  • 2.4 kB, md5:dc4aaad091ef9c2f849acf38d405e984
  • 38.3 kB, md5:807c1beb78c85648bf2a356d6c5acfeb
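Each file is listed with an MD5 checksum, so a download can be checked for integrity before the alignments are used. The following is a minimal sketch using Python's standard hashlib module; the helper names md5_of_file and verify_md5 are hypothetical, and the expected value should be the checksum listed for the specific file you downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until handle.read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected: str) -> bool:
    """Compare a file's digest with a listed value such as 'md5:3afe...'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, verify_md5("alignments.zip", "md5:<listed checksum>") returns True only if the local copy matches the listed digest byte for byte. Reading in chunks keeps memory use constant regardless of file size.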

Additional details

Related works

Is source of
10.5281/zenodo.10018234 (DOI)