Published August 29, 2022 | Version v1
Dataset Open

Robust analysis of phylogenetic tree space

  • 1. Durham University

Description

Phylogenetic analyses often produce large numbers of trees. Mapping trees' distribution in "tree space" can illuminate the behavior and performance of search strategies, reveal distinct clusters of optimal trees, and expose differences between data sources or phylogenetic methods. However, the high-dimensional spaces defined by metric distances are necessarily distorted when represented in fewer dimensions. Here, I explore the consequences of this transformation in phylogenetic search results from 128 morphological data sets, using stratigraphic congruence (a complementary aspect of tree similarity) to evaluate the utility of low-dimensional mappings. I find that phylogenetic similarities between cladograms are most accurately depicted in tree spaces derived from information-theoretic tree distances or the quartet distance. Robinson–Foulds tree spaces exhibit prominent distortions and often fail to group trees according to phylogenetic similarity, whereas the strong influence of tree shape on the Kendall–Colijn distance makes its tree space unsuitable for many purposes. Distances mapped into two or even three dimensions often display little correspondence with true distances, which can lead to profound misrepresentation of clustering structure. Without explicit testing, one cannot be confident that a tree space mapping faithfully represents the true distribution of trees, nor that visually evident structure is valid. My recommendations for tree space validation and visualization are implemented in a new graphical user interface in the "TreeDist" R package.
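The distortion described above can be illustrated with a small, self-contained sketch. It is not taken from the study or from the TreeDist package: the function name `mapping_stress`, the toy distances, and the choice of a Kruskal-style stress measure are all assumptions made for illustration, and the study may rely on different quality measures. The example takes four hypothetical trees that are all equally distant from one another and compares a three-dimensional layout, which can reproduce those distances, with a two-dimensional one, which cannot.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mapping_stress(original, coords):
    """Kruskal-style stress between original and mapped distances.

    original: dict mapping index pairs (i, j), i < j, to the original distance.
    coords:   list of low-dimensional points, one per item.
    Returns 0.0 for a distortion-free mapping; larger values mean more distortion.
    """
    squared_error = 0.0
    squared_total = 0.0
    for i, j in combinations(range(len(coords)), 2):
        d = original[(i, j)]
        mapped = euclidean(coords[i], coords[j])
        squared_error += (d - mapped) ** 2
        squared_total += d ** 2
    return math.sqrt(squared_error / squared_total)


# Hypothetical input: four trees, every pair separated by the same distance.
equidistant = {pair: 1.0 for pair in combinations(range(4), 2)}

# Three dimensions suffice: a regular tetrahedron with unit edges.
tetrahedron = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.5, math.sqrt(3) / 2, 0.0),
    (0.5, math.sqrt(3) / 6, math.sqrt(2.0 / 3.0)),
]

# Two dimensions do not: on a unit square the two diagonals are stretched.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

stress_3d = mapping_stress(equidistant, tetrahedron)
stress_2d = mapping_stress(equidistant, square)
print(f"stress in 3D: {stress_3d:.3f}")
print(f"stress in 2D: {stress_2d:.3f}")
```

In this toy case the two-dimensional layout makes two pairs of trees look farther apart than the rest, even though no such structure exists in the original distances. A numerical check of this kind is one simple way to test a mapping before interpreting visually evident structure.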

Notes

Underlying data and scripts necessary for reproduction are included as described in the README.md file.

Files

ClusterMapping.zip

Files (16.7 GB)

Checksum  Size
md5:268c951b0bd7ba9c530ffb2ecd3b92d2  9.9 MB
md5:a7900c34e1cea6c6050bfef744d05e94  33.3 MB
md5:9837324babe88e44a1f3fb556ab131d3  55.8 kB
md5:34229d498f051636722e178a528fb15b  338.7 kB
md5:fef57d56ee7b9c5a34ee00a44a28ea5a  152.4 kB
md5:c26842db572e282be37a5c6353b92d54  12.8 kB
md5:3af4a4ff12c9ce8ecf2e851b98910688  59.5 MB
md5:4b889144aa9695bd2b29167ba7a10fd0  255.0 MB
md5:70d0cb23de9c5902b45056adabbd1a83  2.2 GB
md5:3a3564bbcf1df6508c442ff2aea18419  967.6 MB
md5:aeffe2a672066b4bbaa7fa01a8354232  997.1 MB
md5:896bccced3df2403b4252005905b971a  243.4 MB
md5:8c0929ab317537d54a7210fe0858dece  1.4 GB
md5:770bd2012d9f3bae2ef3491a221231dc  935.7 MB
md5:1c349be96697aaef6780762d0cc2c5dc  673.8 MB
md5:7f16da540d349ba2b82ee8943322958b  339.2 MB
md5:b724bed7cd4dd6d1ebef8370984892f8  685.7 MB
md5:756c3cfda3256bbfde81c03222eabcbf  1.7 GB
md5:0a520628435499e6c2621d742da8cebb  300.2 MB
md5:52af6a1dfff3428caaa82c5ea074daa1  1.2 GB
md5:4e950a039b4c87d733175b3142102598  826.0 MB
md5:0f9f4edceaa89af186994d9a25e080e0  1.6 GB
md5:660b6959cd4072b9c7acc02fb008e468  638.6 MB
md5:ca503bbb66ea7565eec45b44661d2521  375.9 MB
md5:af5212c57531a884de299b0f6ed551d7  245.6 MB
md5:e647a7ad975ec15cd0a076598ba42daf  301.6 MB
md5:8a998973f9de2b1e64f8d1be4864aa86  255.2 MB
md5:41d204603da146fd590487da286d2ac8  133.2 MB
md5:21c21ecd7684faf50da61841021731b5  202.8 MB
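
The MD5 values in the listing can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listing` are hypothetical, and because file names do not appear in the listing, which checksum belongs to which file is left to the reader.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use bounded for multi-gigabyte archives.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file with a listed checksum, with or without the 'md5:' prefix."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Hypothetical usage: both arguments are placeholders to be filled in
# with a downloaded file and the checksum listed for that file.
# ok = matches_listing("path/to/downloaded_file", "md5:<checksum from the listing>")
```

A mismatch usually indicates a truncated or corrupted transfer, in which case downloading the file again is the simplest remedy.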

Additional details

Related works

Is cited by
10.1093/sysbio/syab100 (DOI)
Is derived from
10.5281/zenodo.6414847 (DOI)
Is source of
10.5281/zenodo.4898778 (DOI)