Published February 7, 2021 | Version v1
Dataset Open

Incorporating hierarchical characters into phylogenetic analysis

  • 1. City University of New York
  • 2. American Museum of Natural History

Description

Popular optimality criteria for phylogenetic trees focus on sequences of characters that are applicable to all the taxa. As studies grow in breadth, some characters may be applicable to only a portion of the taxa and inapplicable to the others. Past work has explored the limitations of treating inapplicable characters as missing data, noting that this strategy may favor trees where internal nodes are assigned impossible states, where the arrangement of taxa within subclades is unduly influenced by variation in distant parts of the tree, and/or where taxa that otherwise share most primary characters are grouped distantly. Approaches that avoid the first two problems have recently been proposed. Here, we propose an alternative approach that avoids all three problems. In the spirit of maximum parsimony, the proposed criterion seeks the phylogenetic tree that minimizes the changes across the branches of the tree, but where changes are defined in terms of dissimilarity metrics that weigh the effects of inapplicable characters. The approach can accommodate binary, multistate, ordered, unordered, and polymorphic characters. We give a polynomial-time algorithm, inspired by Fitch's algorithm, to score trees under a family of dissimilarity metrics, and prove its correctness. We show that the resulting optimality criterion is computationally hard, by a reduction that relies on the NP-hardness of the maximum parsimony optimality criterion. We demonstrate our approach using synthetic and empirical data sets and compare the results with other recently proposed methods for choosing optimal phylogenetic trees when the data include inapplicable characters.
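
For readers unfamiliar with the algorithm that the description names as its inspiration, the following is a minimal sketch of the classical Fitch parsimony count for a single unordered character on a rooted binary tree. It is not the scoring algorithm of this work and does not handle inapplicable characters or dissimilarity metrics. The function name `fitch_score`, the nested-tuple tree encoding, and the example taxa are assumptions made for illustration only.

```python
def fitch_score(tree, leaf_states):
    """Classical Fitch pass for one unordered character (illustrative sketch).

    tree        -- a leaf name (str) or a pair (left_subtree, right_subtree);
                   this nested-tuple encoding is an assumption of the sketch
    leaf_states -- dict mapping each leaf name to a set of observed states
                   (more than one state encodes a polymorphic taxon)

    Returns (candidate_states_at_this_node, minimum_number_of_changes).
    """
    if isinstance(tree, str):
        # A leaf contributes its observed states and no changes.
        return frozenset(leaf_states[tree]), 0

    left, right = tree
    left_states, left_changes = fitch_score(left, leaf_states)
    right_states, right_changes = fitch_score(right, leaf_states)

    shared = left_states & right_states
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes
    # The children disagree: keep the union and count one change.
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical four-taxon example with a binary character.
states = {"A": {0}, "B": {0}, "C": {1}, "D": {1}}
grouped = (("A", "B"), ("C", "D"))    # taxa sharing a state are sisters
scattered = (("A", "C"), ("B", "D"))  # taxa sharing a state are split apart

print(fitch_score(grouped, states)[1])
print(fitch_score(scattered, states)[1])
```

On this toy input the tree that groups taxa with the same state needs one change and the alternative needs two, so plain parsimony prefers the first. Treating an inapplicable character as missing data in this scheme would amount to giving the taxon every possible state, which is the strategy whose limitations the description discusses.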

Notes

Funding provided by: National Science Foundation
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000001
Award Number:

Funding provided by: Simons Foundation
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000893
Award Number:

Files

cusack_et_al_1999_type.txt

Files (16.7 MB)

Checksum Size
md5:7b53cbbc5fb43ce431cf5a25451f4ae0 384 Bytes
md5:281f88dea9c1a09c01088b3cff9cc2b7 440 Bytes
md5:4ac9eefe3c305ea91d91e2ed6d6f81c0 1.8 kB
md5:0be074f5871642cc0ba00ee3f98d8684 1.8 kB
md5:4ff8f9dc1a645375e5bf6ea9978419f1 1.9 kB
md5:d9480f584be0467cb9b69d33dc54ee61 1.2 kB
md5:53dac1e7bc7be6098c09e77da578c921 436 Bytes
md5:87187143593ffacaaa23ea035f721c42 7.0 MB
md5:8a8353c2635bb1ef983a39085e7c0c39 35.8 kB
md5:10632410fed3474167f62708f44fe15d 35.8 kB
md5:dadbcf02c2be422a269c9292dc566527 35.8 kB
md5:c6405ea7042ead3badd33b3a1e9b8e93 23.3 kB
md5:a06438ad2daced3c00ed7e4df7d24941 1.9 kB
md5:ee7c64205e31b3623f38dd6f1917b50d 8.6 MB
md5:b9693c2c323033269a8e15710dc61066 5.8 kB
md5:4377b531be3e0bebf3a14bef637abc92 883.6 kB
md5:157a7c9f334696bf420a8d98525b67d6 110 Bytes
md5:f7c68e95abb07b5cb7a85dbc9450aff8 170 Bytes

Additional details