Published May 18, 2023 | Version v1
Dataset Open

Supplementary material for: PhyloCoalSimulations: A simulator for network multispecies coalescent models, including a new extension for the inheritance of gene flow

  • 1. University of Wisconsin-Madison
  • 2. University of Alaska Fairbanks

Description

We consider the evolution of phylogenetic gene trees along phylogenetic species networks, according to the network multispecies coalescent process, and introduce a new network coalescent model with correlated inheritance of gene flow. This model generalizes two traditional versions of the network coalescent: with independent or common inheritance. At each reticulation, multiple lineages of a given locus are inherited from parental populations chosen at random, either independently across lineages, or with positive correlation according to a Dirichlet process. This process may account for locus-specific probabilities of inheritance, for example.
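The inheritance step at a single reticulation can be illustrated with a short sketch. It is written in Python purely for illustration and is not the package's implementation. It assumes that the Dirichlet process has the inheritance probabilities as its base distribution and a concentration parameter alpha = (1 - rho) / rho, where rho is the inheritance correlation, and it simulates the process as a Pólya urn. The function name `assign_parents` and its arguments are hypothetical.

```python
import random


def assign_parents(n_lineages, gammas, rho, rng=None):
    """Pick a parental population (by index) for each lineage at one reticulation.

    gammas: inheritance probabilities of the parental populations (sum to 1).
    rho:    inheritance correlation in [0, 1].
    Assumption: Dirichlet process with base distribution `gammas` and
    concentration alpha = (1 - rho) / rho, simulated as a Polya urn.
    """
    rng = rng or random.Random()
    parent_ids = list(range(len(gammas)))
    chosen = []
    for k in range(n_lineages):
        if k == 0:
            fresh = True  # the first lineage always draws from gammas
        else:
            # alpha / (alpha + k), rewritten to avoid dividing by rho = 0
            p_fresh = (1.0 - rho) / (1.0 - rho + k * rho)
            fresh = rng.random() < p_fresh
        if fresh:
            # new draw from the base distribution
            chosen.append(rng.choices(parent_ids, weights=gammas)[0])
        else:
            # copy the parent of a uniformly chosen earlier lineage
            chosen.append(rng.choice(chosen))
    return chosen


if __name__ == "__main__":
    print(assign_parents(5, [0.3, 0.7], rho=0.0))  # independent inheritance
    print(assign_parents(5, [0.3, 0.7], rho=1.0))  # common inheritance
```

Under these assumptions, rho = 0 makes every lineage draw its parent independently, and rho = 1 makes all lineages follow the first one, which matches the two traditional versions of the model.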

We implemented the simulation of gene trees under these network coalescent models in the Julia package PhyloCoalSimulations, which depends on PhyloNetworks and its powerful network manipulation tools. Input species phylogenies can be read in extended Newick format, with branch lengths either in number of generations or in coalescent units. Simulated gene trees can be written in Newick format, and in a way that preserves information about their embedding within the species network. This embedding can be used for downstream purposes, such as to simulate species-specific processes like rate variation across species, or for other scenarios as illustrated in this note. This package should be useful for simulation studies and simulation-based inference methods. The software is available open source with documentation and a tutorial at https://github.com/cecileane/PhyloCoalSimulations.jl.

Notes

Supplementary Material

`supplementarymaterial.pdf` contains an appendix and supplementary figures S1-S6.

Code to reproduce analyses

The code uses Julia and R. The files `Project.toml` and `Manifest.toml` record the Julia packages used and their specific versions. To reproduce the environment, start Julia in this folder, enter package mode (press `]`), then run `activate .` followed by `instantiate`.

Fig. 1: node mapping

`figures.jl`: Julia code to simulate a gene tree with degree-2 nodes, which map the gene tree into the species network, and to create the first 2 panels of Fig. 1. Output: files `fig_nodemapping*.pdf`.

Fig. 2: validation of quartet concordance factors
  • `validation_qCF.jl`: Julia code to reproduce the simulations in Fig. 2. Running this code creates 3 output files: `qCF_4taxa.csv` for the left network a), and `qCF_case_{1,2}.csv` for the right network b) on 6 taxa. It also creates `net4.pdf` and `net3.pdf`, showing the 4-taxon and 6-taxon networks respectively.
  • `validation_qCF.Rmd`: R code to create Fig. 2, taking as input the CSV files above.
Fig. 3: level-2 network
  • `fig_level2_network.jl`: Julia code to create Fig. 3, showing the level-2 network that was used to validate the distribution of pairwise distances, using either rho=0 or rho=1 (independent or common inheritance). Output: file `fig_level2net.pdf`.
  • `ntwk_level_2.tre`: file containing the Newick description of that network, which can be visualized with the Julia package PhyloPlots.
Figure 4 and supplementary figures: validation of pairwise distances

Figure S2, on a 4-taxon species tree:

  • `gtrees_4tax-changing_PhyloNetworks.tre` contains the 10,000 simulated gene trees.
  • `validation_distances.Rmd`: R code to create Fig. S2, taking as input the gene trees in `gtrees_4tax-changing_PhyloNetworks.tre`.

Figure 4 and supplementary figures S3-S5, on a 6-taxon network with 2 reticulations:

  • folders `validation_distances_level2net_rho0` and `validation_distances_level2net_rho1`: input files for the figures, stored as compressed `.RData` files:
    • the `samp_big*_d_xy.RData` files contain the pairwise distances between taxa x and y from the 100,000 simulated gene trees (used for histograms).
    • the `d_*.RData` files contain the pairwise distances drawn from their theoretical distributions, summarized by their frequency in 100,000 small bins (used for the theoretical density curve).
    • the `sampleMeans*_dxy.Rdata` files contain, for each rank i, the mean over 100 replicates of the i-th of the 1000 ordered distances between taxa x and y (dbar_i in the paper), where each replicate consists of 1000 simulated gene trees (used for QQ plots).
    • `Dmatrix.Rdata` contains the 6×6 matrix of *minimum* pairwise distances between all pairs of taxa on the network.
  • `validation_distances_level2net_figure.R`: R code to create Fig. 4 and supplementary figures, for the distances from the 6-taxon level-2 network in Fig. 3. Takes as input the files in the folders above. Output: files `fig_pairwisedist_level2net_rho*.pdf`.
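The averaging of ordered distances used for the QQ plots (dbar_i) can be sketched as follows. This is an illustration in Python, not the R code in this archive: the function name `mean_ordered_distances` and the toy input are hypothetical, and the sketch only assumes that every replicate holds the same number of distances for a given pair of taxa.

```python
def mean_ordered_distances(replicates):
    """Return dbar_i: the mean, across replicates, of the i-th smallest distance.

    replicates: list of equal-length lists of pairwise distances,
                one inner list per replicate (hypothetical input layout).
    """
    n = len(replicates[0])
    if any(len(r) != n for r in replicates):
        raise ValueError("all replicates must contain the same number of distances")
    # Sort within each replicate, then average rank by rank across replicates.
    ordered = [sorted(r) for r in replicates]
    return [sum(column) / len(ordered) for column in zip(*ordered)]


if __name__ == "__main__":
    # Two toy replicates of three distances each.
    print(mean_ordered_distances([[3.0, 1.0, 2.0], [6.0, 4.0, 5.0]]))
```

With 100 replicates of 1000 gene trees each, the result would be a vector of 1000 averaged order statistics, which can then be plotted against the theoretical quantiles.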
Software archive
  • `PhyloCoalSimulations-code-1d266fd.zip`: archive of the PhyloCoalSimulations package from GitHub, main branch (for the code), at commit 1d266fd, which is 1 commit ahead of version v0.1.2.
  • `PhyloCoalSimulations-documentation-1d266fd.zip`: archive of the PhyloCoalSimulations package's documentation, from the gh-pages branch, at commit 1d266fd.

Funding provided by: National Science Foundation
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000001
Award Numbers: 1902892, 2023239, 2051760

Files

README.md

Files (163.5 MB)

  • md5:e56a3f015351c2245e65b648af91dfc0 (3.6 kB)
  • md5:1758e0f452f7aaddf7267afde02fde39 (1.8 kB)
  • md5:f0268e5662455ab3029621892fdc480d (1.5 MB)
  • md5:2f927ee52c4d605f52e186cf4d665141 (22.8 kB)
  • md5:e95018fb41cf791a2d39d1a013db3ad6 (191 Bytes)
  • md5:224c7b2c3d53f79193087c907d87e01c (442 Bytes)
  • md5:5164e7301046332f3df6bbac67043550 (3.3 kB)
  • md5:2c79735d95773dc6901581b53392442e (7.1 kB)
  • md5:00c1f6952cbc85b470ddf0e89fd49258 (5.4 kB)
  • md5:325b56d83fa9f82df7150a1cad9c04fb (130.9 MB)
  • md5:d07cc935c01f1512fe4b97079b1d8192 (31.1 MB)
  • md5:043522d1faeb70440fe070e852381cbd (8.8 kB)
  • md5:49373e6bb546224b5cbc33e8778d3749 (4.0 kB)
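
A downloaded file can be checked against the md5 checksums listed above with a short script. The sketch below is in Python and uses only the standard library. The function names `md5_of_file` and `matches_listing` are hypothetical, and which checksum belongs to which file has to be read from the record itself, since the listing here does not pair them with file names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file with a listed checksum such as 'md5:<32 hex digits>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use low for the larger files in this record, such as the one of 130.9 MB.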