Published September 10, 2020 | Version v1
Dataset Open

Simulated nucleotide sequences for testing alignment-free genome distance estimates

  • 1. Institut Pasteur

Description

This repository contains 6,000 pairs of nucleotide sequences (12 evolutionary distances × 500 pairs each) that were simulated for testing alignment-free genome distance estimates, as described in Criscuolo (2019). For each evolutionary distance d from 0.05 to 0.60 (step = 0.05), the program SeqGen was used to simulate the evolution of 500 nucleotide sequence pairs with d substitution events per character under the GTR+Γ evolutionary model.

For each of the 12 evolutionary distances d = 0.05, 0.10, ..., 0.60, an XZ-compressed file containing 500 lines (one per sequence pair) is available. Each line contains 18 whitespace-separated fields:
  [1]       seed value used during simulation,
  [2]       true evolutionary distance d between the two simulated sequences,
  [3]       total number of simulated characters,
  [4]       number of non-indel characters with nucleotide mismatch,
  [5]       number of non-indel characters,
  [6-9]     A, C, G, T frequencies used during simulation,
  [10-15]   GTR parameters used during simulation,
  [16]      Γ distribution parameter used during simulation,
  [17-18]   the two simulated sequences, with indel events represented as gaps.
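The 18-field layout above can be read with a short parser. The following is a minimal sketch, not code distributed with the dataset: the names `SimulatedPair`, `parse_line` and `read_pairs` are hypothetical, fields [3-5] are assumed to be written as integers, and the seed is kept as text because its exact format is not specified. The `p_distance` property is the observed proportion of mismatches, field [4] divided by field [5].

```python
import lzma
from typing import Iterator, NamedTuple, Tuple


class SimulatedPair(NamedTuple):
    """One line of a distance file (hypothetical container, for illustration)."""
    seed: str                      # field [1], kept as text
    true_distance: float           # field [2]
    n_simulated: int               # field [3]
    n_mismatch: int                # field [4]
    n_non_indel: int               # field [5]
    base_freqs: Tuple[float, ...]  # fields [6-9]: A, C, G, T
    gtr_params: Tuple[float, ...]  # fields [10-15]
    gamma_shape: float             # field [16]
    seq1: str                      # field [17]
    seq2: str                      # field [18]

    @property
    def p_distance(self) -> float:
        """Observed proportion of mismatches among non-indel characters."""
        return self.n_mismatch / self.n_non_indel


def parse_line(line: str) -> SimulatedPair:
    """Split one line into its 18 whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 18:
        raise ValueError(f"expected 18 fields, got {len(fields)}")
    return SimulatedPair(
        seed=fields[0],
        true_distance=float(fields[1]),
        n_simulated=int(fields[2]),    # assumption: counts are integers
        n_mismatch=int(fields[3]),
        n_non_indel=int(fields[4]),
        base_freqs=tuple(float(x) for x in fields[5:9]),
        gtr_params=tuple(float(x) for x in fields[9:15]),
        gamma_shape=float(fields[15]),
        seq1=fields[16],
        seq2=fields[17],
    )


def read_pairs(path: str) -> Iterator[SimulatedPair]:
    """Stream the pairs of one XZ-compressed file, one line at a time."""
    with lzma.open(path, "rt") as handle:
        for line in handle:
            if line.strip():
                yield parse_line(line)
```

Reading line by line keeps only one sequence pair in memory at a time, which matters given the size of the compressed files.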

Note that the gap-free version of each pair of aligned sequences can be regenerated using SeqGen v1.3.4 with the parameters given in fields [1, 3, 6-16] and the following two-leaf model tree:

(t1:d,t2:0.000);

where d is given in field [2].
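As a rough illustration of this regeneration step, the sketch below builds the model tree and a candidate command line from the fields of one record. It is not the author's command, which is not given here. The helper names are hypothetical, the executable name `seq-gen` is an assumption, and the options (`-m` model, `-l` length, `-z` seed, `-a` Γ shape, `-f` frequencies, `-r` GTR rates) follow the Seq-Gen manual as best recalled and should be checked against the documentation of v1.3.4. It is also assumed that fields [10-15] are already in the order expected by `-r`.

```python
from typing import List


def model_tree(d: str) -> str:
    """Two-leaf model tree, with d taken verbatim from field [2]."""
    return f"(t1:{d},t2:0.000);"


def seqgen_command(fields: List[str], executable: str = "seq-gen") -> List[str]:
    """Candidate argument list built from fields [1, 3, 6-16] of one record.

    `fields` holds the 18 fields of a line as strings (0-based indexing).
    """
    seed, length = fields[0], fields[2]
    freqs = ",".join(fields[5:9])    # fields [6-9]: A, C, G, T frequencies
    rates = ",".join(fields[9:15])   # fields [10-15]: order assumed to match -r
    alpha = fields[15]               # field [16]: gamma shape parameter
    return [
        executable,
        "-mGTR",
        f"-l{length}",
        f"-z{seed}",
        f"-a{alpha}",
        f"-f{freqs}",
        f"-r{rates}",
    ]
```

The tree can then be passed on standard input, for example with `subprocess.run(seqgen_command(fields), input=model_tree(fields[1]), text=True, capture_output=True)`. Taking d as text avoids any change in its formatting between the file and the tree.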

___

Criscuolo A (2019) A fast alignment-free bioinformatics procedure to infer accurate distance-based phylogenetic trees from genome assemblies. Research Ideas and Outcomes, 5:e36178. doi:10.3897/rio.5.e36178

Files (13.4 GB)

MD5 checksum                            Size
md5:0ddb2bfc2653791063e9d9d0b8cc3d30    843.7 MB
md5:3f755ef8d6150b360f6b4d1cccb94f26    907.9 MB
md5:a03606bff176c462a2b971db9415bcbf    984.9 MB
md5:74dfaff1a72433bab0da79a95681074c    1.0 GB
md5:12f2a148ae3cd14a0a07662b6d323634    1.1 GB
md5:053ee087d17d772de93c050b9e69990e    1.1 GB
md5:b2634cee2bc4b6cdcf9f8045ec980d00    1.2 GB
md5:4d5725c17bcee62621d3dcbc689e6934    1.2 GB
md5:2cbef5bbf476710a17b92223d5084264    1.3 GB
md5:9872b16b7dfb49b6c276d8fcc9d24c21    1.3 GB
md5:ce2a236c41a138336aff771ed5ae7e50    1.2 GB
md5:70e9ff207c46f7418327db3eb9411a24    1.2 GB
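The MD5 checksums listed above can be used to confirm that a download is complete. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the path of the downloaded file is whatever name was used when saving it.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: str, listed: str) -> bool:
    """Compare a file against a listed value such as 'md5:0ddb2bfc...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listed_checksum(local_path, "md5:0ddb2bfc2653791063e9d9d0b8cc3d30")` returns `True` only if the local file is byte-identical to the one the checksum was computed from.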