Dataset Open Access

Model trees and associated simulated nucleotide sequences for testing phylogenetic inference methods

Alexis Criscuolo


Citation Style Language JSON Export

{
  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.4034644", 
  "author": [
    {
      "family": "Alexis Criscuolo"
    }
  ], 
  "issued": {
    "date-parts": [
      [
        2020, 
        9, 
        17
      ]
    ]
  }, 
  "abstract": "<p>This repository contains 142 tar.gz archive files, each containing nucleotide sequence data that have been simulated using <a href=\"http://abacus.gene.ucl.ac.uk/software/indelible/\"><em>INDELible</em></a> for testing alignment-free phylogenetic inference methods. These datasets were generated by using the results (trees and model parameters) of 142 phylogenomic analyses of real-case data as model (available <a href=\"https://zenodo.org/record/4034261\">here</a>). Initial sequence length was 5 Mbs, and an indel rate of 0.01 was set with indel length drawn from [1, 50000] according to a Zipf distribution with parameter 1.5 (see <em>INDELible</em> <a href=\"http://abacus.gene.ucl.ac.uk/software/indelible/manual/model.shtml\">manual</a>).</p>\n\n<p>Each archive contains the following files/directories:</p>\n\n<ul>\n\t<li><code>GTR.params.trees.tsv &nbsp; </code> &nbsp; a tab-delimited file summarizing the real-case GTR+&Gamma; model parameters and the phylogenetic tree used to simulate the sequence dataset (gathered from <a href=\"https://zenodo.org/record/4034261\">https://zenodo.org/record/4034261</a>)</li>\n\t<li><code>tax.tsv &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&nbsp; </code> &nbsp; a tab-delimited file containing the initial (col 1) and simplified (col 2) taxon names</li>\n\t<li><code>model.nwk &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&nbsp; </code> &nbsp; a <a href=\"https://evolution.genetics.washington.edu/phylip/newicktree.html\">Newick</a>-formatted file containing the initial model tree (gathered from <code>GTR.params.trees.tsv</code>) with simplified leaf names (following <code>tax.tsv</code>)</li>\n\t<li><code>control.txt &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&nbsp; </code> &nbsp; the <em>INDELible</em> input file used to simulate the evolution of a sequence along the tree in <code>model.nwk</code></li>\n\t<li><code>seq/ &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </code> &nbsp; a directory containing the simulated sequences (one FASTA file per leaf in the tree in <code>model.nwk</code>)</li>\n</ul>\n\n<p>___</p>\n\n<p>Criscuolo A (2020) <em>On the transformation of MinHash-based uncorrected distances into proper evolutionary distances for phylogenetic inference</em>. F1000Research, 9:1309. <a href=\"https://doi.org/10.12688/f1000research.26930.1\">doi:10.12688/f1000research.26930.1</a></p>",
  "title": "Model trees and associated simulated nucleotide sequences for testing phylogenetic inference methods", 
  "type": "dataset", 
  "id": "4034644"
}
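
The archive layout described in the abstract (tax.tsv, model.nwk, seq/) can be inspected with a few lines of Python once an archive has been extracted. The following is only a minimal sketch under stated assumptions: the function names read_tax_map and summarize_dataset are hypothetical, tax.tsv is assumed to have no header row, and no assumption is made about the FASTA file names or extensions inside seq/.

```python
import csv
from pathlib import Path


def read_tax_map(tax_path):
    """Map initial taxon names (col 1) to simplified names (col 2).

    Assumption: tax.tsv is tab-delimited with no header row.
    """
    mapping = {}
    with open(tax_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 2:  # skip blank or malformed lines
                mapping[row[0]] = row[1]
    return mapping


def summarize_dataset(root):
    """Summarize one extracted archive directory (hypothetical helper).

    Expects the files named in the record description:
    tax.tsv, model.nwk and a seq/ directory of per-leaf FASTA files.
    """
    root = Path(root)
    tax = read_tax_map(root / "tax.tsv")
    newick = (root / "model.nwk").read_text().strip()
    # List whatever files are present; extensions are not assumed.
    seq_files = sorted(p.name for p in (root / "seq").iterdir() if p.is_file())
    return {"n_taxa": len(tax), "newick": newick, "seq_files": seq_files}
```

Because the description states that seq/ holds one FASTA file per leaf of the tree in model.nwk, comparing the number of taxa with the number of sequence files is a cheap sanity check after extraction.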
Statistics (all versions / this version)
Views: 53 / 53
Downloads: 338 / 338
Data volume: 14.8 GB / 14.8 GB
Unique views: 43 / 43
Unique downloads: 6 / 6


Cite as