Dataset Open Access

# Model trees and associated simulated nucleotide sequences for testing phylogenetic inference methods

Alexis Criscuolo

### Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:creator>Alexis Criscuolo</dc:creator>
<dc:date>2020-09-17</dc:date>
<dc:description>This repository contains 142 tar.gz archives, each containing nucleotide sequence data simulated with INDELible for testing alignment-free phylogenetic inference methods. The datasets were generated using the results (trees and model parameters) of 142 phylogenomic analyses of real-case data as models (available here). The initial sequence length was 5 Mb, and the indel rate was set to 0.01, with indel lengths drawn from [1, 50000] according to a Zipf distribution with parameter 1.5 (see the INDELible manual).

Each archive contains the following files/directories:

GTR.params.trees.tsv      a tab-delimited file summarizing the real-case GTR+Γ model parameters and the phylogenetic tree used to simulate the sequence dataset (gathered from https://zenodo.org/record/4034261)
tax.tsv                   a tab-delimited file containing the initial (col 1) and simplified (col 2) taxon names
model.nwk                 a Newick-formatted file containing the initial model tree (gathered from GTR.params.trees.tsv) with simplified leaf names (following tax.tsv)
control.txt               the INDELible input file used to simulate the evolution of a sequence along the tree in model.nwk
seq/                      a directory containing the simulated sequences (one FASTA file per leaf in the tree in model.nwk)

___

Criscuolo A (2020) On the transformation of MinHash-based uncorrected distances into proper evolutionary distances for phylogenetic inference. F1000Research, 9:1309. doi:10.12688/f1000research.26930.1</dc:description>
<dc:identifier>https://zenodo.org/record/4034644</dc:identifier>
<dc:identifier>10.5281/zenodo.4034644</dc:identifier>
<dc:identifier>oai:zenodo.org:4034644</dc:identifier>
<dc:relation>doi:10.5281/zenodo.4034643</dc:relation>
<dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
<dc:subject>phylogenetics</dc:subject>
<dc:subject>simulation</dc:subject>
<dc:title>Model trees and associated simulated nucleotide sequences for testing phylogenetic inference methods</dc:title>
<dc:type>info:eu-repo/semantics/other</dc:type>
<dc:type>dataset</dc:type>
</oai_dc:dc>
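
The archive layout listed in the description can be checked after extraction with a short script. The following is a minimal Python sketch, not part of the dataset: the function names `missing_entries` and `read_taxon_map` are hypothetical, and it assumes that `tax.tsv` has no header row and exactly the two columns described (initial name, simplified name).

```python
from pathlib import Path

# Entry names taken from the archive description in the record.
EXPECTED_ENTRIES = (
    "GTR.params.trees.tsv",
    "tax.tsv",
    "model.nwk",
    "control.txt",
    "seq",
)


def missing_entries(archive_dir):
    """Return the expected files/directories absent from an extracted archive."""
    root = Path(archive_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


def read_taxon_map(tax_tsv):
    """Map initial taxon names (col 1) to simplified names (col 2).

    Assumption: no header row; lines with fewer than two columns are skipped.
    """
    mapping = {}
    with open(tax_tsv, encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue
            mapping[fields[0]] = fields[1]
    return mapping
```

Calling `missing_entries` on an extracted archive directory should return an empty list when the layout matches the description, and `read_taxon_map` gives the lookup needed to relate leaf names in `model.nwk` back to the initial taxon names.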

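To illustrate the indel length setting in the description, the sketch below builds a Zipf (power-law) distribution with parameter 1.5 truncated to [1, 50000], that is, P(k) proportional to k^(-1.5). This is only an illustration of the stated distribution under that assumption; it is not INDELible's code, and the function names are hypothetical.

```python
import itertools
import random


def truncated_zipf_pmf(a, max_len):
    """Probabilities P(k) proportional to k**(-a) for k = 1..max_len."""
    weights = [k ** -a for k in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_indel_lengths(n, a=1.5, max_len=50000, seed=None):
    """Draw n indel lengths from the truncated Zipf distribution."""
    rng = random.Random(seed)
    pmf = truncated_zipf_pmf(a, max_len)
    cumulative = list(itertools.accumulate(pmf))
    # random.choices accepts cumulative weights and any sequence as population.
    return rng.choices(range(1, max_len + 1), cum_weights=cumulative, k=n)
```

With parameter 1.5 the distribution is heavily skewed toward short indels, while the long tail still allows occasional events of thousands of bases within the 50000 cap.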