Published August 6, 2021 | Version v1
Dataset Open

Data for: PickMe: sample selection for species tree reconstruction using coalescent weighted quartets

  • 1. Hobart and William Smith Colleges
  • 2. University of Kentucky
  • 3. A2Bio2*
  • 4. Oklahoma State University

Description

After collecting large data sets of many genes for many species for phylogenomics studies, researchers may make ad hoc decisions about which genes or samples to include in a species tree reconstruction analysis based on various parameters, including the amount of missing data. Ideally, sampling would be maximized, but it can be difficult for empiricists to determine where to draw the line for sample inclusion when data sets are incomplete. Under the multispecies coalescent model, in which the dominant quartet topology displayed across gene trees matches the topology of that quartet on the species tree, we propose a Bayesian framework to select samples for which there is support for inclusion in a species tree analysis. Given a collection of gene trees, a posterior probability is assigned to each quartet topology, describing how likely the species tree is to display that topology. From this, individual samples are assigned reliability scores computed as the average of a rescaling of the posterior probabilities. These weights are used in a Bayesian framework in an algorithm called PickMe, which determines which individuals should be included in a species tree analysis. To illustrate the efficacy of this tool, PickMe is applied to gene trees generated from target capture data from milkweeds. PickMe indicates that more samples could have reliably been included in a previous milkweed phylogenomic analysis than its authors, who did not have access to a formal decision-making procedure, chose to analyze. Thus, PickMe will be a valuable addition to data analysis pipelines for phylogenomics studies.
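The scoring idea in the description can be illustrated with a small sketch. This is not the PickMe implementation: it assumes a uniform prior over the three topologies of each quartet, a fixed, hypothetical probability `p` that a gene tree displays the species-tree topology (with the other two topologies equally likely), and a linear rescaling of the largest posterior from [1/3, 1] to [0, 1]. The function names and the input format (topology counts per four-taxon set) are likewise illustrative assumptions.

```python
from math import exp, log


def quartet_posteriors(counts, p=0.5):
    """Posterior for each of the three topologies of one quartet.

    counts: (n1, n2, n3), how many gene trees display each topology.
    p: ASSUMED probability that a gene tree matches the species-tree
       topology (must satisfy 1/3 < p < 1); a hypothetical fixed value.
    Uniform prior over the three topologies (an assumption).
    """
    # Log-likelihood ratio contributed by each gene tree that matches.
    ratio = log(p) - log((1.0 - p) / 2.0)
    logs = [n * ratio for n in counts]
    top = max(logs)  # subtract the maximum for numerical stability
    weights = [exp(x - top) for x in logs]
    total = sum(weights)
    return [w / total for w in weights]


def rescale(posteriors):
    """ASSUMED rescaling: map the largest posterior from [1/3, 1] to [0, 1]."""
    return (max(posteriors) - 1.0 / 3.0) / (2.0 / 3.0)


def reliability_scores(quartet_counts, p=0.5):
    """Average the rescaled quartet support over the quartets containing each sample.

    quartet_counts: dict mapping a 4-tuple of sample names to (n1, n2, n3).
    """
    sums, nums = {}, {}
    for taxa, counts in quartet_counts.items():
        score = rescale(quartet_posteriors(counts, p))
        for taxon in taxa:
            sums[taxon] = sums.get(taxon, 0.0) + score
            nums[taxon] = nums.get(taxon, 0) + 1
    return {taxon: sums[taxon] / nums[taxon] for taxon in sums}


# Toy input: sample "E" only appears in a poorly resolved quartet.
example = {
    ("A", "B", "C", "D"): (10, 0, 0),
    ("A", "B", "C", "E"): (3, 3, 4),
}
scores = reliability_scores(example)
```

In this toy input, sample E appears only in a quartet whose gene trees disagree, so it receives a lower score than D, which appears only in a well-supported quartet. How such scores are turned into an inclusion decision is specific to PickMe and is not reproduced here.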

Notes

The uploaded README describes the uploaded data files.

Funding provided by: National Science Foundation
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000001
Award Number: DMS 1616186

Funding provided by: National Science Foundation
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000001
Award Number: DEB 1457510

Funding provided by: National Science Foundation
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000001
Award Number: DEB 1457473

Files

Milkweed-Sequence-Files.zip

Files (13.6 MB)

Checksum Size
md5:2c9b157b3a12d298b92bb865e23eb7c9 1.1 MB
md5:f47d6325a7d0a3ad2709f15f73ba9dc3 347.1 kB
md5:a9f3a7ae707f16a93580b6b9d67843be 1.6 MB
md5:b6ce22cf8d2ee809807191313bb1dccf 10.5 MB
md5:1c4c6d9f0d9528e7fb185265c6bb2f20 1.8 kB
md5:857ba22b96aa61aeeb2e60fb4f3a8c1b 1.4 kB
md5:e5c696fb8b15666da6b83f3a838d4f97 1.5 kB
md5:bbc3e9d45919bf7f8f5d6174f6fbd6be 63.4 kB
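
A downloaded file can be checked against the MD5 checksums listed above. The sketch below uses Python's standard `hashlib` module. The helper names are illustrative, and because the file names are not shown next to each checksum in this listing, the path and checksum passed in are left to the reader.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or bare hex."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum(local_path, "md5:2c9b157b3a12d298b92bb865e23eb7c9")` returns `True` only if the file at `local_path` (a placeholder for wherever the download was saved) has that digest.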

Additional details

Related works