Published June 6, 2022 | Version v1
Other Open

DEPP: Deep learning enables extending species trees using single genes

  • 1. University of California, San Diego
  • 2. Arizona State University

Description

Placing new sequences onto reference phylogenies is increasingly used for analyzing environmental samples, especially microbiomes. However, existing placement methods have a fundamental limitation: they assume that query sequences have evolved under specific models directly on the reference phylogeny. Thus, they can only place single-gene data (e.g., 16S rRNA amplicons) onto their own gene tree. This practice is a proxy for a more ambitious goal: extending a (genome-wide) species tree given data from individual genes. No algorithm currently addresses this challenging problem. Here, we introduce Deep-learning Enabled Phylogenetic Placement (DEPP), an algorithm that learns to extend species trees using single genes without pre-specified models. We show that DEPP uses single genes to update the multi-locus microbial tree of life with high accuracy. We further demonstrate that DEPP can achieve the long-standing goal of combining 16S and metagenomic data onto a single tree, enabling community structure analyses that were previously impossible and producing robust patterns.

Notes

Please note, this dataset is the most recent version of a duplicate dataset available via this link: https://doi.org/10.6076/D1JS3Z (published February 4, 2022).

Files

Files (2.3 MB)

Name            Size
supplement.pdf  2.3 MB
md5:312dfd387a3a9c727a56406b2d95b829

Additional details

Related works

Is derived from
10.6076/D14G68 (DOI)
Is supplemented by
10.6076/D1JS3Z (DOI)