Published January 5, 2016 | Version v1
Dataset Open

Data from: The relationship between dN/dS and scaled selection coefficients

  • 1. The University of Texas at Austin

Description

Numerous computational methods exist to assess the mode and strength of natural selection in protein-coding sequences, yet how distinct methods relate to one another remains largely unknown. Here, we elucidate the relationship between two widely used phylogenetic modeling frameworks: dN/dS models and mutation-selection (MutSel) models. We derive a mathematical relationship between dN/dS and scaled selection coefficients, the focal parameters of MutSel models, and use this relationship to gain deeper insight into the behaviors, limitations, and applicability of these two modeling frameworks. We prove that, if all synonymous changes are neutral, standard MutSel models correspond to dN/dS < 1. However, if synonymous codons differ in fitness, dN/dS can take on arbitrarily high values even if all selection is purifying. Thus, the MutSel modeling framework cannot necessarily accommodate positive, diversifying selection, while dN/dS cannot distinguish between purifying selection on synonymous codons and positive selection on amino acids. We further propose a new benchmarking strategy of dN/dS inferences against MutSel simulations and demonstrate that the widely used Goldman-Yang-style dN/dS models yield substantially biased dN/dS estimates on realistic sequence data. By contrast, the less frequently used Muse-Gaut-style models display much less bias. Strikingly, the least-biased and most-precise dN/dS estimates are never found in the models with the best fit to the data, measured through both AIC and BIC scores. Thus, selecting models based on goodness-of-fit criteria can yield poor parameter estimates if the models considered do not precisely correspond to the underlying mechanism that generated the data. In conclusion, establishing mathematical links among modeling frameworks represents a novel, powerful strategy to pinpoint previously unrecognized model limitations and strengths.
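
The claim that neutral synonymous changes imply dN/dS < 1 can be illustrated numerically. The following is a minimal sketch, not the authors' code and not part of the deposited files. It assumes the standard genetic code, equal (unit) mutation rates for every single-nucleotide change, and one scaled fitness value per amino acid, so that codon stationary frequencies are proportional to exp(fitness) and dS equals 1. The names `fixation_factor` and `dnds_from_fitness` are hypothetical.

```python
import math
import random

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
SENSE = [codon for codon, aa in CODE.items() if aa != "*"]


def fixation_factor(s):
    """Relative fixation rate S / (1 - exp(-S)); tends to 1 as S -> 0."""
    if abs(s) < 1e-9:
        return 1.0
    return s / -math.expm1(-s)


def dnds_from_fitness(fitness):
    """dN/dS implied by per-amino-acid scaled fitnesses.

    Assumes unit mutation rates and neutral synonymous changes, so
    dS = 1 and the returned value is the frequency-weighted mean
    fixation factor over nonsynonymous single-nucleotide changes.
    """
    weights = {c: math.exp(fitness[CODE[c]]) for c in SENSE}
    total = sum(weights.values())
    numerator = denominator = 0.0
    for codon in SENSE:
        freq = weights[codon] / total
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                target = codon[:pos] + base + codon[pos + 1:]
                # Skip stop codons and synonymous targets.
                if CODE[target] == "*" or CODE[target] == CODE[codon]:
                    continue
                s = fitness[CODE[target]] - fitness[CODE[codon]]
                numerator += freq * fixation_factor(s)
                denominator += freq
    return numerator / denominator


amino_acids = sorted(set(AMINO) - {"*"})
random.seed(1)
random_fitness = {aa: random.gauss(0.0, 2.0) for aa in amino_acids}
flat_fitness = {aa: 0.0 for aa in amino_acids}
print(dnds_from_fitness(random_fitness))  # below 1 under these assumptions
print(dnds_from_fitness(flat_fitness))    # neutral boundary case
```

Under these assumptions the flat-fitness case sits at the neutral boundary dN/dS = 1, and any variation in amino-acid fitness pushes the value below 1. Letting synonymous codons differ in fitness would require tracking dS separately, which this sketch deliberately omits.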

Files (2.8 GB)

MD5 checksum                        Size
c4452f0eb5c38617bab30391d228b5e8    101.5 MB
a71142e9c4178061459af95851e9bc79    153.3 MB
8057c4d133b531c6248078b186592243    836.7 MB
b60da5ece6280c5434efda8eee4eece4    758.8 MB
0a5185c2d20fb1d5cbda0c11c2ca1dd5    133.4 MB
e537500e16512a484740c8972a20bae0    819.1 MB
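
Because several of these files approach 1 GB, it may be worth checking each download against its listed MD5 checksum. The sketch below uses only Python's standard `hashlib` module. The function names `md5_hex` and `matches_listing` are hypothetical, and the local path passed in is whatever name the file was saved under, since the file names are not shown in this listing.

```python
import hashlib


def md5_hex(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file's MD5 with a listed value, with or without 'md5:'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[4:]
    return md5_hex(path) == expected
```

For example, `matches_listing(local_path, "md5:c4452f0eb5c38617bab30391d228b5e8")` would return `True` only if the file at `local_path` is an intact copy of the first listed file.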

Additional details

Related works

Is cited by
10.1093/molbev/msv003 (DOI)