Published May 8, 2024 | Version v1
Other Open

Benefits and limits of phasing alleles for network inference of allopolyploid complexes

  • 1. Royal Botanic Gardens
  • 2. Duke University
  • 3. University of Florida
  • 4. University of Wisconsin-Madison

Description

Accurately reconstructing the reticulate histories of polyploids remains a central challenge for understanding plant evolution. Although phylogenetic networks can provide insights into relationships among polyploid lineages, inferring networks may be hindered by the complexities of homology determination in polyploid taxa. We use simulations to show that phasing alleles from allopolyploid individuals can improve phylogenetic network inference under the multispecies coalescent, recovering the true network with fewer loci than haplotype consensus sequences or sequences with heterozygous bases represented as ambiguity codes. Phased allelic data can also improve divergence time estimates for networks, which is helpful for evaluating allopolyploid speciation hypotheses and proposing mechanisms of speciation. To achieve these outcomes in empirical data, we present a novel pipeline that leverages a recently developed phasing algorithm to reliably phase alleles from polyploids. This pipeline is especially appropriate for target enrichment data, where depth of coverage is typically high enough to phase entire loci. We provide an empirical example in the North American Dryopteris fern complex that demonstrates insights from phased data as well as the challenges of network inference. We establish that our pipeline (PATÉ: Phased Alleles from Target Enrichment data) is capable of recovering a high proportion of phased loci from both diploids and polyploids. Compared with haplotype consensus assemblies, these data may improve network estimates by accurately inferring the direction of gene flow, but statistical non-identifiability of phylogenetic networks poses a barrier to inferring the evolutionary history of reticulate complexes.

Notes

This record includes a single PDF of the supplementary material and a tarball with all of the data used in the manuscript.

The tarball contains a readme that details its contents, including:

1) Control files and simulated sequence data for the single allotetraploid with BPP

2) A static release of the PATÉ pipeline used for phasing Dryopteris sequences in the manuscript

3) Dryopteris data used for analyses

4) BPP control files, Julia scripts, and notes for repeating some of the empirical analyses

Funding provided by: National Science Foundation
ROR ID: https://ror.org/021nxhr62
Award Number: DEB-2038213

Funding provided by: National Science Foundation
ROR ID: https://ror.org/021nxhr62
Award Number: DEB-1541506

Funding provided by: Department of Energy and Environment
ROR ID: https://ror.org/05d5hbz44
Award Number: DE-SC0021016

Funding provided by: Duke University
ROR ID: https://ror.org/00py81415
Award Number:

Funding provided by: European Commission
ROR ID: https://ror.org/00k4n6c32
Award Number: 101026923

Files

phasingNetworks_20240424_SI.pdf

Size: 2.7 MB
md5:332dc58852b33f8bc06fd8d92bb5786d
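The md5 checksum listed above can be used to confirm that a download completed intact. The following is a minimal Python sketch, not part of the record itself. It assumes the checksum belongs to phasingNetworks_20240424_SI.pdf (the only file named in this listing) and that the file was saved under that name in the current directory. The helper names `md5_of_file` and `matches_record` are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksum copied from the file listing; assumed to belong to the PDF.
EXPECTED_MD5 = "332dc58852b33f8bc06fd8d92bb5786d"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Compare a file's md5 digest with the expected value, ignoring case."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local file name; adjust to wherever the download was saved.
    pdf = Path("phasingNetworks_20240424_SI.pdf")
    if pdf.exists():
        print("checksum OK" if matches_record(pdf) else "checksum MISMATCH")
    else:
        print(f"{pdf} not found; download it first")
```

A mismatch usually points to an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.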

Additional details