4022500
doi
10.5281/zenodo.4022500
oai:zenodo.org:4022500
Simulated nucleotide sequences for testing alignment-free genome distance estimates
Criscuolo Alexis
Institut Pasteur
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
<p>This repository contains 6,000 (12 × 500) pairs of nucleotide sequences simulated for testing alignment-free genome distance estimates, as described in <a href="https://riojournal.com/article/36178/">Criscuolo (2019)</a>. For each evolutionary distance <em>d</em> from 0.05 to 0.60 (step 0.05), the program <a href="http://tree.bio.ed.ac.uk/software/seqgen/">Seq-Gen</a> was used to simulate the evolution of 500 pairs of nucleotide sequences separated by <em>d</em> substitution events per character under the GTR+Γ evolutionary model.</p>
<p>For each of the 12 evolutionary distances <em>d</em> = 0.05, 0.10, ..., 0.60, an xz-compressed text file (e.g. <code>d.0.05.txt.xz</code>) containing 500 lines is available. Each line contains 18 space-separated fields:<br>
[1] seed value used during simulation,<br>
[2] true evolutionary distance <em>d</em> between the two simulated sequences,<br>
[3] total number of simulated characters,<br>
[4] number of non-indel characters with a nucleotide mismatch,<br>
[5] number of non-indel characters,<br>
[6-9] A, C, G, T frequencies used during simulation,<br>
[10-15] six GTR substitution rate parameters used during simulation,<br>
[16] shape parameter of the Γ distribution used during simulation,<br>
[17-18] the two simulated sequences, with indel events shown as gaps.</p>
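<p>A minimal Python sketch for reading one of these files, assuming that fields [1], [3], [4] and [5] are written as plain integers and the other numeric fields as decimal numbers (the exact number formatting is not stated above). The names <code>SimulatedPair</code>, <code>parse_line</code> and <code>read_pairs</code> are hypothetical; the <code>p_distance</code> property is simply field [4] divided by field [5], i.e. the observed proportion of mismatching non-indel characters.</p>

```python
import lzma
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class SimulatedPair:
    seed: int               # field [1]
    d: float                # field [2]: true evolutionary distance
    length: int             # field [3]: total number of simulated characters
    mismatches: int         # field [4]: non-indel characters with a mismatch
    non_indel: int          # field [5]: non-indel characters
    freqs: List[float]      # fields [6-9]: A, C, G, T frequencies
    gtr_rates: List[float]  # fields [10-15]: GTR rate parameters (file order)
    alpha: float            # field [16]: Gamma shape parameter
    seq1: str               # field [17], gaps mark indels
    seq2: str               # field [18], gaps mark indels

    @property
    def p_distance(self) -> float:
        # Observed proportion of mismatching non-indel characters ([4] / [5]).
        return self.mismatches / self.non_indel


def parse_line(line: str) -> SimulatedPair:
    """Split one record into its 18 space-separated fields (hypothetical helper)."""
    f = line.split()
    if len(f) != 18:
        raise ValueError(f"expected 18 fields, got {len(f)}")
    return SimulatedPair(
        seed=int(f[0]),
        d=float(f[1]),
        length=int(f[2]),
        mismatches=int(f[3]),
        non_indel=int(f[4]),
        freqs=[float(x) for x in f[5:9]],
        gtr_rates=[float(x) for x in f[9:15]],
        alpha=float(f[15]),
        seq1=f[16],
        seq2=f[17],
    )


def read_pairs(path: str) -> Iterator[SimulatedPair]:
    """Stream records from an xz-compressed file, one line at a time."""
    with lzma.open(path, "rt") as fh:
        for line in fh:
            if line.strip():
                yield parse_line(line)
```

<p>For example, <code>for pair in read_pairs("d.0.05.txt.xz"): print(pair.d, pair.p_distance)</code> decompresses the archive on the fly, so only one record (two sequences) is held in memory at a time.</p>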
<p>Note that each pair of aligned sequences, without gaps, can be regenerated with <a href="http://tree.bio.ed.ac.uk/software/seqgen/">Seq-Gen</a> v1.3.4 using the parameters in fields [1], [3] and [6-16] together with the following two-leaf model tree:</p>
<pre>(t1:d,t2:0.000);</pre>
<p>where <em>d</em> is given in field [2].</p>
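<p>As an illustration, the sketch below builds a Seq-Gen argument list and the two-leaf model tree from the fields of one record. It assumes Seq-Gen's attached-value option syntax with comma-separated lists (e.g. <code>-mGTR -l1000 -f0.3,0.2,0.2,0.3</code>) and that fields [10-15] are stored in the order Seq-Gen expects for <code>-r</code>; both are assumptions, and the helper <code>seqgen_command</code> is hypothetical, so check the options against the Seq-Gen v1.3.4 manual before relying on it.</p>

```python
from typing import List, Sequence, Tuple


def seqgen_command(fields: Sequence[str],
                   executable: str = "seq-gen") -> Tuple[List[str], str]:
    """Hypothetical helper: Seq-Gen arguments and model tree for one record.

    Assumes attached-value options and comma-separated lists for -f and -r.
    """
    if len(fields) < 16:
        raise ValueError("fields [1]-[16] are required")
    seed = fields[0]                # field [1]: random number seed
    d = fields[1]                   # field [2]: branch length of t1
    length = fields[2]              # field [3]: number of characters
    freqs = ",".join(fields[5:9])   # fields [6-9]: A, C, G, T frequencies
    rates = ",".join(fields[9:15])  # fields [10-15]: GTR rates, file order kept
    alpha = fields[15]              # field [16]: Gamma shape parameter
    args = [executable, "-mGTR", f"-l{length}", "-n1",
            f"-f{freqs}", f"-r{rates}", f"-a{alpha}", f"-z{seed}"]
    # Two-leaf tree: all divergence is placed on the branch leading to t1.
    tree = f"(t1:{d},t2:0.000);\n"
    return args, tree
```

<p>The tree string could then be passed on standard input, e.g. <code>subprocess.run(args, input=tree, capture_output=True, text=True)</code>, which should print the two gap-free sequences in Seq-Gen's default PHYLIP output.</p>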
<p>___</p>
<p>Criscuolo A (2019) <em>A fast alignment-free bioinformatics procedure to infer accurate distance-based phylogenetic trees from genome assemblies</em>. Research Ideas and Outcomes, 5:e36178. doi:<a href="https://doi.org/10.3897/rio.5.e36178">10.3897/rio.5.e36178</a></p>
Zenodo
2020-09-10
info:eu-repo/semantics/other
4022499
1599785967.203701
843724224
md5:0ddb2bfc2653791063e9d9d0b8cc3d30
https://zenodo.org/records/4022500/files/d.0.05.txt.xz
907924376
md5:3f755ef8d6150b360f6b4d1cccb94f26
https://zenodo.org/records/4022500/files/d.0.10.txt.xz
984862572
md5:a03606bff176c462a2b971db9415bcbf
https://zenodo.org/records/4022500/files/d.0.15.txt.xz
1214991656
md5:70e9ff207c46f7418327db3eb9411a24
https://zenodo.org/records/4022500/files/d.0.60.txt.xz
1249838328
md5:ce2a236c41a138336aff771ed5ae7e50
https://zenodo.org/records/4022500/files/d.0.55.txt.xz
1294980656
md5:9872b16b7dfb49b6c276d8fcc9d24c21
https://zenodo.org/records/4022500/files/d.0.50.txt.xz
1255217544
md5:2cbef5bbf476710a17b92223d5084264
https://zenodo.org/records/4022500/files/d.0.45.txt.xz
1207853692
md5:4d5725c17bcee62621d3dcbc689e6934
https://zenodo.org/records/4022500/files/d.0.40.txt.xz
1211170344
md5:b2634cee2bc4b6cdcf9f8045ec980d00
https://zenodo.org/records/4022500/files/d.0.35.txt.xz
1094308684
md5:053ee087d17d772de93c050b9e69990e
https://zenodo.org/records/4022500/files/d.0.30.txt.xz
1081956200
md5:12f2a148ae3cd14a0a07662b6d323634
https://zenodo.org/records/4022500/files/d.0.25.txt.xz
1049111276
md5:74dfaff1a72433bab0da79a95681074c
https://zenodo.org/records/4022500/files/d.0.20.txt.xz
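Each file entry above appears to list the archive size in bytes, its MD5 checksum and its download URL. A minimal Python sketch for checking a downloaded archive against the listed checksum (the function names md5sum and verify are hypothetical; files are hashed in 1 MiB chunks because each archive is close to 1 GB):

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: str) -> bool:
    # Checksums are listed as "md5:<hex>"; drop the prefix before comparing.
    return md5sum(path) == expected.split(":", 1)[-1]
```

For instance, verify("d.0.05.txt.xz", "md5:0ddb2bfc2653791063e9d9d0b8cc3d30") should return True for an intact download of that file.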
public
10.5281/zenodo.4022499
isVersionOf
doi