Published August 28, 2020 | Version v1
Dataset Open

Disentangling sources of gene tree discordance in phylogenomic datasets: testing ancient hybridizations in Amaranthaceae s.l.

  • 1. University of Minnesota
  • 2. Johannes Gutenberg University of Mainz
  • 3. Oberlin College
  • 4. University of Michigan-Ann Arbor
  • 5. University of Cambridge
  • 6. University of Nevada Reno

Description

Gene tree discordance in large genomic datasets can be caused by evolutionary processes such as incomplete lineage sorting and hybridization, as well as model violation, and errors in data processing, orthology inference, and gene tree estimation. Species tree methods that identify and accommodate all sources of conflict are not available, but a combination of multiple approaches can help tease apart alternative sources of conflict. Here, using a phylotranscriptomic analysis in combination with reference genomes, we test a hypothesis of ancient hybridization events within the plant family Amaranthaceae s.l. that was previously supported by morphological, ecological, and Sanger-based molecular data. The dataset included seven genomes and 88 transcriptomes, 17 generated for this study. We examined gene-tree discordance using coalescent-based species trees and network inference, gene tree discordance analyses, site pattern tests of introgression, topology tests, synteny analyses, and simulations. We found that a combination of processes might have generated the high levels of gene tree discordance in the backbone of Amaranthaceae s.l. Furthermore, we found evidence that three consecutive short internal branches produce anomalous trees contributing to the discordance. Overall, our results suggest that Amaranthaceae s.l. might be a product of an ancient and rapid lineage diversification, and remains, and probably will remain, unresolved. This work highlights the potential problems of identifiability associated with the sources of gene tree discordance including, in particular, phylogenetic network methods. Our results also demonstrate the importance of thoroughly testing for multiple sources of conflict in phylogenomic analyses, especially in the context of ancient, rapid radiations. We provide several recommendations for exploring conflicting signals in such situations.

Notes

- The file 'Supplementa_Methods_and_Materials.tar.gz' contains the supplemental methods, figures and tables referenced in the main text

- The file 'Homologs.tar.gz' contains the 14584 homolog trees:

    raw_homologs.tar.gz - trees without any filtering or pruning

    final_homologs.tar.gz - trees after masking monophyletic and paraphyletic grades of the same species, pruning deep paralogs, and removing spurious tips


- The file 'Analyses_data.tar.gz' contains the data (alignments and individual gene trees) used for each of the datasets:

    filtered_transcriptomes.tar.gz - 88 filtered transcriptomes
    all_13025_orthologs_cln_aln.tar.gz - all the 13025 'monophyletic outgroup' orthologs
    105-taxon.tar.gz - 936 alignments and trees of the full 105-taxon analyses
    41-taxon.tar.gz - 1242 alignments and trees of the 41-taxon cloudogram
    11-taxon-net.tar.gz - 4138 alignments and trees of the 11-taxon(net) used for network analyses
    4-taxon.tar.gz - alignments and trees (between 7,756 and 8,793) for each of the 10 4-taxon quartets
    11-taxon-tree.tar.gz - 5936 alignments and trees of the 11-taxon(tree) analyses
    chloroplast.tar.gz - 11-taxon alignment and tree, and 76 individual CDS alignments and trees, of the plastid analyses
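
The notes above describe archives that themselves contain further .tar.gz files. As a minimal sketch of how the contents could be inspected without unpacking everything to disk, the Python function below lists the members of each inner archive. It assumes the inner archives sit as regular files inside a gzip-compressed outer tar file; the actual layout of the deposited files has not been checked here, and the function name list_nested_members is only illustrative.

```python
import io
import tarfile


def list_nested_members(archive_path):
    """Return {inner_archive_name: [member names]} for a tar.gz holding tar.gz files.

    Assumption: inner archives are regular files whose names end in '.tar.gz'.
    """
    contents = {}
    with tarfile.open(archive_path, mode="r:gz") as outer:
        for member in outer.getmembers():
            if not (member.isfile() and member.name.endswith(".tar.gz")):
                continue
            # Each inner archive is read fully into memory; for very large
            # inner archives, extracting to disk first may be preferable.
            inner_bytes = outer.extractfile(member).read()
            with tarfile.open(fileobj=io.BytesIO(inner_bytes), mode="r:gz") as inner:
                contents[member.name] = inner.getnames()
    return contents
```

For example, list_nested_members("Analyses_data.tar.gz") would be expected to return one entry per inner archive (such as 105-taxon.tar.gz), provided the download matches the layout described in the notes.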

Funding provided by: University of Minnesota
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100007249

Funding provided by: University of Michigan
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100007270

Funding provided by: US National Science Foundation
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000001
Award Number: DEB 1354048

Funding provided by: Department of Energy, Office of Science, Genomic Science Program
Crossref Funder Registry ID:
Award Number: DE-SC0008834

Files

Homologs.zip

Files (1.4 GB)

Checksum Size
md5:acd34b9dba2728f2a9d3a821acc79e14 1.3 GB
md5:a382db48510a2f596308c088fe1b4d56 75.5 MB
md5:d2f21fd410c0297f17ac181ce8173f64 1.6 kB
md5:2eacc42dd9b1fe1a1011179d0a0dae23 11.1 MB
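
The md5 values listed above can be used to check that a download completed without corruption. The following is a minimal Python sketch using only the standard library; the helper names md5_of_file and matches_listed_md5 are illustrative. The listing does not show which file each checksum belongs to, so any file path passed in is a placeholder to be matched by the user.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use small even for the 1.3 GB file. A call such as matches_listed_md5(path_to_download, "md5:acd34b9dba2728f2a9d3a821acc79e14") returns True only if the local file's digest matches the listed value.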

Additional details

Related works

Is cited by
10.1093/sysbio/syaa066 (DOI)