Presentation Open Access
Dinh, Vu; Bilge, Arman; Matsen, Erick
Talk given at Evolution 2016 in the SSB Spotlight: Next generation phylogenetic inference 1.
Evolutionary tree inference, or phylogenetics, is an essential tool for understanding biological systems from deep-time divergences to recent viral transmission. The Bayesian paradigm is now commonly used in phylogenetics to describe support for estimated phylogenies or to test hypotheses that can be expressed in phylogenetic terms. However, current Bayesian phylogenetic inference algorithms are limited to about 1,000 sequences, far fewer than modern sequencing technology makes available.
Here we develop phylogenetic Hamiltonian Monte Carlo (HMC) as a new approach to enable phylogenetic inference on larger data sets. HMC is an existing computational statistical method that scales to large data sets by using Newton's laws of motion to explore parameter space efficiently. However, because a phylogenetic tree parameter includes both its branch lengths and its topology, we must go beyond current implementations of HMC, which cannot account for this special structure of trees. To do so, we develop a probabilistic version of the physics simulator within HMC, which can explore tree space. This algorithm generalizes previous algorithms: it performs classical HMC on the branch lengths while within a single topology, but makes random choices between tree topologies at the "intersection" between trees. We show that our algorithm correctly explores the entire tree space and provide a proof-of-concept implementation in open-source software.
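The idea of a simulator that runs classical dynamics within one topology and makes a random choice where topologies meet can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a toy quadratic potential per topology (CENTERS), a single internal branch (INTERNAL), and a uniform choice among three neighbouring topologies when that branch shrinks to zero; all of these names and choices are hypothetical. The Metropolis accept/reject step of full HMC is omitted.

```python
import numpy as np

# Hypothetical toy target: each "topology" gets its own quadratic potential
# over two branch lengths. A real sampler would use the negative
# log-posterior of a phylogenetic model here.
CENTERS = {
    0: np.array([0.1, 0.5]),
    1: np.array([0.3, 0.2]),
    2: np.array([0.2, 0.4]),
}
INTERNAL = {0}  # assumption: only branch 0 is an internal edge


def grad_potential(q, topology):
    """Gradient of the toy potential 0.5 * ||q - center||^2."""
    return q - CENTERS[topology]


def neighbors(topology):
    """Assumed set of topologies meeting where the internal branch is zero."""
    return [0, 1, 2]


def leapfrog_step(q, p, topology, step_size, rng):
    """One leapfrog step with a random topology choice at the boundary."""
    # Half step for momentum, full step for position (classical leapfrog).
    p = p - 0.5 * step_size * grad_potential(q, topology)
    q = q + step_size * p
    # A negative branch length means the path crossed the boundary:
    # reflect the coordinate and its momentum back into the valid region.
    for i in np.flatnonzero(q < 0):
        q[i] = -q[i]
        p[i] = -p[i]
        if int(i) in INTERNAL:
            # Random choice between the topologies that share this boundary.
            topology = int(rng.choice(neighbors(topology)))
    # Final half step for momentum, using the (possibly new) topology.
    p = p - 0.5 * step_size * grad_potential(q, topology)
    return q, p, topology


rng = np.random.default_rng(0)
q, p, topo = np.array([0.05, 0.3]), np.array([-1.0, 0.2]), 0
for _ in range(100):
    q, p, topo = leapfrog_step(q, p, topo, 0.05, rng)
```

In this sketch a pendant branch that reaches zero simply reflects, while the internal branch also triggers the random topology choice, which loosely mirrors the "intersection" between trees described above.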