Published June 26, 2023 | Version v1
Dataset Open

TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees

  • 1. University of California, San Diego

Description

Phylogenetic trees include errors for a variety of reasons. We argue that one way to detect errors is to build a phylogeny with all the data and then detect taxa that artificially inflate the tree diameter. We formulate an optimization problem that seeks the k leaves whose removal reduces the tree diameter the most. We present a polynomial-time solution to this "k-shrink" problem. Given this solution, we then use non-parametric statistics to find an outlier set of taxa that have an unexpectedly high impact on the tree diameter. We test our method, TreeShrink, on five biological datasets and show that it is more conservative than rogue taxon removal using RogueNaRok. When the amount of filtering is controlled, TreeShrink outperforms RogueNaRok on three of the five datasets, and the two methods tie on one other.
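To make the "k-shrink" objective concrete, the sketch below solves it by brute force on a toy tree. It is not the polynomial-time algorithm from the description and is not taken from the TreeShrink code. The edge-list representation, the function names, and the example tree are all assumptions chosen for illustration. It assumes the diameter is the largest path length between any two remaining leaves.

```python
from itertools import combinations


def leaf_distances(edges):
    """Return (leaves, dist), where dist[(a, b)] is the path length between leaves a and b.

    `edges` is a list of (node, node, branch_length) tuples describing an
    unrooted tree (hypothetical input format for this sketch).
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    leaves = sorted(n for n, nbrs in adj.items() if len(nbrs) == 1)
    dist = {}
    for src in leaves:
        # Iterative DFS: in a tree, every node is reached by exactly one path.
        stack = [(src, None, 0.0)]
        while stack:
            node, parent, d = stack.pop()
            if node != src and len(adj[node]) == 1:
                dist[(src, node)] = d
            for nxt, w in adj[node]:
                if nxt != parent:
                    stack.append((nxt, node, d + w))
    return leaves, dist


def diameter(leaves, dist):
    """Largest pairwise distance among the given leaves (0.0 if fewer than two)."""
    return max((dist[(a, b)] for a, b in combinations(leaves, 2)), default=0.0)


def k_shrink_brute_force(edges, k):
    """Try every set of k leaves and return (removed_set, resulting_diameter).

    Exponential in k; only meant to illustrate the objective on small trees.
    """
    leaves, dist = leaf_distances(edges)
    best = None
    for removed in combinations(leaves, k):
        kept = [x for x in leaves if x not in removed]
        d = diameter(kept, dist)
        if best is None or d < best[1]:
            best = (set(removed), d)
    return best


# Toy tree: leaf D sits on a branch ten times longer than the others.
EXAMPLE_EDGES = [
    ("r", "A", 1), ("r", "B", 1), ("r", "x", 1),
    ("x", "C", 1), ("x", "D", 10),
]

if __name__ == "__main__":
    print(k_shrink_brute_force(EXAMPLE_EDGES, 1))  # ({'D'}, 3.0)
```

On this toy tree the full diameter is 12.0 (between A or B and D). Removing D alone brings it down to 3.0, whereas removing any other single leaf leaves D in place and the diameter at 11.0 or more. A sharp drop of this kind is what flags a taxon as a candidate outlier.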

Notes

Funding provided by: National Science Foundation
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000001
Award Number: IIS-1565862

Files

README.md

Files (2.0 GB)

Checksum    Size
md5:e20baaeeb0b6433ca62fe8a058fe2a92    552.5 kB
md5:675f7ca040fc363f664e2297f079abd1    2.8 kB
md5:c2ac7d5c8dc401ce840ff1fb9b0b5de0    2.0 GB

Additional details

Related works

Is cited by
10.1186/s12864-018-4620-2 (DOI)