Model trees and associated simulated nucleotide sequences for testing phylogenetic inference methods

Alexis Criscuolo

  This repository contains 142 tar.gz archive files, each containing nucleotide sequence data simulated with INDELible for testing alignment-free phylogenetic inference methods. Each dataset was generated using, as a model, the results (tree and model parameters) of one of 142 phylogenomic analyses of real-case data (available here). The initial sequence length was 5 Mb, and the indel rate was set to 0.01, with indel lengths drawn from [1, 50000] according to a Zipf distribution with parameter 1.5 (see the INDELible manual).
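To give a feel for the indel-length model, the sketch below computes a truncated Zipf (power-law) distribution over [1, 50000] with exponent 1.5, assuming the form P(L = k) ∝ k^(-1.5) for k = 1..50000. It is an illustration written for this description, not INDELible's own code, and the function names are hypothetical.

```python
import random

A = 1.5      # Zipf exponent (parameter stated above)
M = 50_000   # maximum indel length (upper bound of [1, 50000])


def zipf_weights(a: float, m: int) -> list[float]:
    """Unnormalized truncated-Zipf weights k^-a for k = 1..m."""
    return [k ** -a for k in range(1, m + 1)]


def zipf_pmf(a: float, m: int) -> list[float]:
    """Normalized probabilities P(L = k), k = 1..m (index 0 is k = 1)."""
    w = zipf_weights(a, m)
    z = sum(w)
    return [x / z for x in w]


def sample_indel_lengths(n: int, a: float = A, m: int = M, seed: int = 0) -> list[int]:
    """Draw n indel lengths from the truncated Zipf distribution (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.choices(range(1, m + 1), weights=zipf_weights(a, m), k=n)


pmf = zipf_pmf(A, M)
mean = sum(k * p for k, p in enumerate(pmf, start=1))
print(f"P(L=1) = {pmf[0]:.3f}")
print(f"mean indel length = {mean:.1f}")
print(sample_indel_lengths(10))
```

Because the tail is heavy, most indels are short (length 1 alone carries over a third of the probability mass), while rare long indels pull the mean length up to about 170 positions.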

Each archive contains the following files/directories:

	GTR.params.trees.tsv   a tab-delimited file summarizing the real-case GTR+Γ model parameters and the phylogenetic tree used to simulate the sequence dataset (gathered from
	tax.tsv                a tab-delimited file containing the initial (col 1) and simplified (col 2) taxon names
	model.nwk              a Newick-formatted file containing the initial model tree (gathered from GTR.params.trees.tsv) with simplified leaf names (following tax.tsv)
	control.txt            the INDELible input file used to simulate the evolution of a sequence along the tree in model.nwk
	seq/                   a directory containing the simulated sequences (one FASTA file per leaf in the tree in model.nwk)
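Since model.nwk uses the simplified names from tax.tsv and seq/ holds one FASTA file per leaf, a few consistency checks can be run on an extracted archive. The following minimal sketch assumes that tax.tsv has no header line, that leaf labels contain no spaces, quotes or parentheses, and that the extracted directory matches the layout listed above; the function names and the example path are hypothetical.

```python
import re
from pathlib import Path


def read_taxa(tax_tsv: Path) -> dict[str, str]:
    """Map initial (col 1) to simplified (col 2) taxon names from tax.tsv."""
    mapping = {}
    with open(tax_tsv) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                mapping[fields[0]] = fields[1]
    return mapping


def newick_leaves(newick: str) -> set[str]:
    """Leaf labels: tokens right after '(' or ','.

    Internal node labels (e.g. support values) follow ')' and are skipped.
    Quoted labels or labels with spaces are not handled by this simple regex.
    """
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick))


def check_dataset(dataset_dir: str) -> dict:
    """Compare model.nwk leaves with tax.tsv names and the number of files in seq/."""
    d = Path(dataset_dir)
    simplified = set(read_taxa(d / "tax.tsv").values())
    leaves = newick_leaves((d / "model.nwk").read_text())
    fasta_files = [f for f in (d / "seq").iterdir() if f.is_file()]
    return {
        "n_leaves": len(leaves),
        "n_fasta_files": len(fasta_files),
        "leaves_not_in_tax": leaves - simplified,
    }
```

For example, after extracting one archive with `tar -xzf`, calling `check_dataset("path/to/extracted/dataset")` should report equal leaf and file counts and an empty `leaves_not_in_tax` set if the files are consistent.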

___

Criscuolo A (2020) On the transformation of MinHash-based uncorrected distances into proper evolutionary distances for phylogenetic inference. F1000Research, 9:1309. doi:10.12688/f1000research.26930.1 
