Dataset Open Access

Model trees and associated simulated nucleotide sequences for testing phylogenetic inference methods

Alexis Criscuolo


Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>Alexis Criscuolo</dc:creator>
  <dc:date>2020-09-17</dc:date>
  <dc:description>This repository contains 142 tar.gz archives, each containing nucleotide sequence data simulated with INDELible for testing alignment-free phylogenetic inference methods. These datasets were generated using the results (trees and model parameters) of 142 phylogenomic analyses of real-case data as models (available here). The initial sequence length was 5 Mb, and the indel rate was set to 0.01, with indel lengths drawn from [1, 50000] according to a Zipf distribution with parameter 1.5 (see the INDELible manual).

Each archive contains the following files/directories:


	GTR.params.trees.tsv      a tab-delimited file summarizing the real-case GTR+Γ model parameters and the phylogenetic tree used to simulate the sequence dataset (gathered from https://zenodo.org/record/4034261)
	tax.tsv                   a tab-delimited file containing the initial (col 1) and simplified (col 2) taxon names
	model.nwk                 a Newick-formatted file containing the initial model tree (gathered from GTR.params.trees.tsv) with simplified leaf names (following tax.tsv)
	control.txt               the INDELible input file used to simulate the evolution of a sequence along the tree in model.nwk
	seq/                      a directory containing the simulated sequences (one FASTA file per leaf in the tree in model.nwk)


___

Criscuolo A (2020) On the transformation of MinHash-based uncorrected distances into proper evolutionary distances for phylogenetic inference. F1000Research, 9:1309. doi:10.12688/f1000research.26930.1</dc:description>
  <dc:identifier>https://zenodo.org/record/4034644</dc:identifier>
  <dc:identifier>10.5281/zenodo.4034644</dc:identifier>
  <dc:identifier>oai:zenodo.org:4034644</dc:identifier>
  <dc:relation>doi:10.5281/zenodo.4034643</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>https://creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
  <dc:subject>phylogenetics</dc:subject>
  <dc:subject>simulation</dc:subject>
  <dc:title>Model trees and associated simulated nucleotide sequences for testing phylogenetic inference methods</dc:title>
  <dc:type>info:eu-repo/semantics/other</dc:type>
  <dc:type>dataset</dc:type>
</oai_dc:dc>
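
The description in the record above states that indel lengths were drawn from [1, 50000] according to a Zipf distribution with parameter 1.5. As a rough illustration of what such a distribution looks like, the following minimal Python sketch samples from a truncated Zipf law, assuming the standard form P(k) proportional to k^-a for k = 1..50000. The function name `truncated_zipf_sampler` and its arguments are hypothetical; this is not INDELible's code, and the exact normalization and sampling procedure INDELible uses may differ (see the INDELible manual).

```python
import bisect
import itertools
import random


def truncated_zipf_sampler(a=1.5, max_len=50_000, seed=None):
    """Return a function drawing integers k in [1, max_len] with P(k) ~ k**-a.

    Assumption: the standard truncated Zipf (power-law) form; this is an
    illustration only, not INDELible's implementation.
    """
    # Unnormalized weights k^-a for every admissible indel length.
    weights = [k ** -a for k in range(1, max_len + 1)]
    # Cumulative sums allow inverse-CDF sampling via binary search.
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]
    rng = random.Random(seed)

    def draw():
        u = rng.random() * total
        # Number of cumulative weights not exceeding u; lengths are 1-based.
        idx = bisect.bisect_right(cumulative, u)
        # Guard against floating-point rounding pushing u up to the total.
        return min(idx, max_len - 1) + 1

    return draw


if __name__ == "__main__":
    draw = truncated_zipf_sampler(a=1.5, max_len=50_000, seed=0)
    print([draw() for _ in range(10)])
```

Under this assumed form, short indels dominate the draws while the heavy tail still allows occasional very long events, up to the stated maximum of 50000.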