Published September 17, 2020 | Version v1
Dataset Open

Simulated pairs of nucleotide sequences for testing (alignment-free) genome distance estimate methods

  • 1. Institut Pasteur

Description

This repository contains 24,000 pairs of nucleotide sequences (and associated parameters) that have been simulated for testing alignment-free genome distance estimates. For each evolutionary distance d from 0.05 to 1.00 nucleotide substitutions per character (step = 0.05), the program INDELible was used to simulate the evolution of 200 nucleotide sequence pairs with d substitution events per character, for each combination of substitution model (GTR or GTR+Γ) and equilibrium frequency set, i.e. 20 distances × 2 models × 3 frequency sets × 200 pairs = 24,000 pairs. Each model was adjusted with three different equilibrium frequencies:

  • f1: equal frequencies, i.e. freq(A) = freq(C) = freq(G) = freq(T) = 0.25,
  • f2: GC-rich, i.e. freq(A) = 0.1, freq(C) = 0.3, freq(G) = 0.4, freq(T) = 0.2,
  • f3: AT-rich, i.e. freq(A) = freq(T) = 0.4, freq(C) = freq(G) = 0.1.

For each simulated sequence pair, model parameters (i.e. GTR: six relative rates of nucleotide substitution; GTR+Γ: six rates and one Γ shape parameter) were randomly drawn from 142 sets of parameters derived from real-case data (see file GTR.params.trees.tsv at https://zenodo.org/record/4034261). The initial sequence length was 5 Mb, and the indel rate was set to 0.01, with indel lengths drawn from [1, 50000] according to a Zipf distribution with parameter 1.5 (see INDELible manual).
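
To give a feel for the indel length distribution, the following minimal sketch samples lengths from a Zipf (power-law) distribution with parameter 1.5 truncated to [1, 50000], i.e. P(L = k) proportional to k^(-1.5). The function names are hypothetical, and the sketch only assumes this truncated form; it does not reproduce INDELible's own implementation.

```python
import bisect
import itertools
import random


def truncated_zipf_pmf(a=1.5, max_len=50000):
    """Return P(L = k) for k = 1..max_len, with P(L = k) proportional to k**(-a)."""
    weights = [k ** (-a) for k in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_indel_lengths(n, a=1.5, max_len=50000, seed=0):
    """Draw n indel lengths by inverse-CDF sampling (illustrative only)."""
    rng = random.Random(seed)
    cdf = list(itertools.accumulate(truncated_zipf_pmf(a, max_len)))
    # min(...) guards against a floating-point CDF that ends slightly below 1
    return [min(bisect.bisect_left(cdf, rng.random()) + 1, max_len)
            for _ in range(n)]


pmf = truncated_zipf_pmf()
lengths = sample_indel_lengths(1000)
print(f"P(L = 1) = {pmf[0]:.3f}, longest sampled indel = {max(lengths)}")
```

Under this assumption most indels are very short, while the heavy tail occasionally produces indels of several thousand characters.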

 

For each of the 20 evolutionary distances d = 0.05, 0.10, ..., 1.00, six XZ-compressed files are available, each containing the data of 200 simulated sequence pairs (d in the file names below stands for the distance value):

  • data-d-f1-nogam.tsv.xz   data simulated under the model GTR with equilibrium frequencies f1
  • data-d-f1-gamma.tsv.xz   data simulated under the model GTR+Γ with equilibrium frequencies f1
  • data-d-f2-nogam.tsv.xz   data simulated under the model GTR with equilibrium frequencies f2
  • data-d-f2-gamma.tsv.xz   data simulated under the model GTR+Γ with equilibrium frequencies f2
  • data-d-f3-nogam.tsv.xz   data simulated under the model GTR with equilibrium frequencies f3
  • data-d-f3-gamma.tsv.xz   data simulated under the model GTR+Γ with equilibrium frequencies f3
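
The naming scheme above can be enumerated as in the following sketch. The helper `file_names` is hypothetical, and the two-decimal formatting of the distance label (e.g. 0.05) is an assumption: the description does not state how d is written in the actual file names, so the labels should be checked against the downloaded files.

```python
def file_names(d_label):
    """Six file names for one distance; d_label replaces the 'd' placeholder."""
    return [f"data-{d_label}-{freq}-{model}.tsv.xz"
            for freq in ("f1", "f2", "f3")
            for model in ("nogam", "gamma")]


# 20 distances from 0.05 to 1.00 in steps of 0.05
# (the two-decimal label is an assumption, not taken from the repository)
distances = [f"{i * 5 / 100:.2f}" for i in range(1, 21)]
all_files = [name for d in distances for name in file_names(d)]

pairs_per_file = 200
print(len(all_files), len(all_files) * pairs_per_file)  # prints: 120 24000
```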

 

Each file is tab-delimited and contains the following 18 fields:

  • [1]      integer seed value specified to INDELible,
  • [2-5]    frequencies of T, C, A, G, respectively, specified to INDELible,
  • [6-10]  C-T, A-T, G-T, A-C, C-G rate parameters, respectively (normalized such that A-G rate = 1), specified to INDELible,
  • [11]     Γ shape parameter alpha (= 0 in the nogam files, i.e. GTR substitution model without Γ) specified to INDELible,
  • [12]     length lgt1 of the first sequence seq1 (i.e. no. A, C, G, T in seq1),
  • [13]     length lgt2 of the second sequence seq2 (i.e. no. A, C, G, T in seq2),
  • [14]     no. sites in aligned sequences seq1 and seq2 (i.e. no. A, C, G, T and gap character states in seq1 or seq2),
  • [15]     no. non-gapped sites (core sites) in aligned sequences seq1 and seq2,
  • [16]     observed p-distance between aligned sequences seq1 and seq2 (i.e. no. nucleotide mismatches divided by no. core sites),
  • [17]     aligned seq1 (containing indel gaps),
  • [18]     aligned seq2 (containing indel gaps).
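
As an illustration of the field layout above, the following sketch streams records from one XZ-compressed file using only the Python standard library. The class and function names are hypothetical, and the sketch assumes one record per line with no header line, which the description does not state explicitly.

```python
import lzma
from dataclasses import dataclass


@dataclass
class SimulatedPair:
    seed: int          # field 1
    freqs: dict        # fields 2-5, keyed by T, C, A, G
    rates: dict        # fields 6-10, keyed by CT, AT, GT, AC, CG (A-G rate = 1)
    alpha: float       # field 11, 0 in the nogam files
    lgt1: int          # field 12
    lgt2: int          # field 13
    n_sites: int       # field 14
    n_core: int        # field 15
    p_distance: float  # field 16
    seq1: str          # field 17, aligned (with gaps)
    seq2: str          # field 18, aligned (with gaps)


def parse_record(line):
    """Convert one tab-delimited line into a SimulatedPair."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 18:
        raise ValueError(f"expected 18 fields, found {len(f)}")
    return SimulatedPair(
        seed=int(f[0]),
        freqs=dict(zip("TCAG", map(float, f[1:5]))),
        rates=dict(zip(("CT", "AT", "GT", "AC", "CG"), map(float, f[5:10]))),
        alpha=float(f[10]),
        lgt1=int(f[11]), lgt2=int(f[12]),
        n_sites=int(f[13]), n_core=int(f[14]),
        p_distance=float(f[15]),
        seq1=f[16], seq2=f[17],
    )


def read_pairs(path):
    """Yield records one at a time, decompressing the file on the fly."""
    with lzma.open(path, "rt") as handle:
        for line in handle:
            if line.strip():  # skip empty lines (assumed not to carry data)
                yield parse_record(line)


# Toy record with made-up values, only to show the parsing
toy = "\t".join(["42", "0.25", "0.25", "0.25", "0.25",
                 "1.0", "0.5", "0.5", "0.5", "0.5", "0",
                 "8", "8", "9", "7", "0.142857",
                 "ACGT-ACGT", "ACG-TACGA"])
pair = parse_record(toy)
print(pair.seed, pair.freqs["T"], pair.rates["CT"], pair.n_core)
```

Streaming one record at a time is a deliberate choice here: each aligned sequence is several megabases long, so loading a whole file into memory may be impractical.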

Note that seq1 and seq2 (fields [17-18]) are aligned, so the two strings have the same length, equal to the no. sites in field [14]. Gap character states (-) should be removed from seq1 and seq2 to obtain the unaligned sequences, whose lengths are given in fields [12] and [13].
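
The relationship between the aligned strings and fields [12-16] can be sketched as follows. The function names are hypothetical, the toy sequences are made up, and the comparison assumes that both sequences use the same letter case.

```python
def ungap(aligned):
    """Remove gap characters to recover the unaligned sequence."""
    return aligned.replace("-", "")


def alignment_summary(seq1, seq2):
    """Recompute the quantities of fields 12-16 from two aligned strings."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    core = mismatches = 0
    for a, b in zip(seq1, seq2):
        if a != "-" and b != "-":  # core site: no gap in either sequence
            core += 1
            mismatches += a != b
    return {
        "lgt1": len(ungap(seq1)),
        "lgt2": len(ungap(seq2)),
        "n_sites": len(seq1),
        "n_core": core,
        "p_distance": mismatches / core if core else float("nan"),
    }


summary = alignment_summary("ACGT-ACGT", "ACG-TACGA")
print(summary)
```

In this toy alignment, 7 of the 9 sites are core sites and one of them is a mismatch, so the p-distance is 1/7. Comparing such recomputed values with the stored fields is one way to check that a file was read correctly.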

_____

Criscuolo A (2020) On the transformation of MinHash-based uncorrected distances into proper evolutionary distances for phylogenetic inference. F1000Research, 9:1309. doi:10.12688/f1000research.26930.1

Files (48.5 GB)

MD5 checksum   Size
md5:295203785b54b57379dfc38a5e384a5f   332.7 MB
md5:0d5e0b84b9112eb135dd966ee2758bec   336.8 MB
md5:d5cde7a490eb72a4a8c0571a32927772   315.1 MB
md5:86631b99f691d0ded340f0a4afd2b002   318.3 MB
md5:c49245f48af7029f735462c286d48e41   298.5 MB
md5:2022e2caea4ed84543a0045b5f4dab71   302.9 MB
md5:e152ddd2445244d68d2a8fcda1c0b8f3   373.1 MB
md5:4ace2e6331eb4c99eeafa7e5cff78fe2   383.4 MB
md5:a4e820020ac7b3691caa3b7f83f00201   353.7 MB
md5:1bfcd150f16fa9e5501ef33a54b5f8f0   361.7 MB
md5:c392a73e90ea2cf318d0c0a7c725b44c   335.0 MB
md5:1f32223652aefdc1be613ac1f9494058   346.2 MB
md5:641fcc5c1f93de85a44b22267f0dde81   401.4 MB
md5:c1f06d473badbe361c8b63113e664f51   416.7 MB
md5:7ed07f3ecf500530f3d8294b947652be   378.6 MB
md5:0ebcd3a2183fbdb97ce6cc551567b260   392.8 MB
md5:32407df8a7dfa630e35cea08b77f39f0   360.2 MB
md5:243bcce4c60037dd8cfa8c82195fe79f   377.1 MB
md5:af102250836a2ef245eb7354cb0a0534   420.2 MB
md5:de01126fbc88fbd33bb70428917cb442   440.2 MB
md5:3b0300af969375f876948e99cf4dd1e4   396.1 MB
md5:6bd21a2c2ee653e36909d401510ad44a   415.9 MB
md5:3e50983ee40148446f12570e36754d1e   377.2 MB
md5:8c2834cb4ebf707a15c13df70e9f5e65   399.4 MB
md5:a29855cdf1061b7e6a02834de8356d8b   432.9 MB
md5:a09f6133611dad9ebb2c04918a6e1531   458.3 MB
md5:fe066f3b3c93b6c20ed8d9d1199f8a24   409.6 MB
md5:13208d0735f6bf456e40a516d64a74b8   433.3 MB
md5:3134d68c9f407a54723f78d87fab2030   390.7 MB
md5:3006271ba4904588460972dcc061f1e6   414.5 MB
md5:5a93d84b664b0fdb39427cd83db37681   440.6 MB
md5:caa94d19a5b89db5d7cf726f9debd484   471.5 MB
md5:63ceb1a374c9e4b031a249da2508253a   417.7 MB
md5:cf19b6c906894911be796d5ed3349e54   445.0 MB
md5:f1188fb58cfc7c527e3ab6dd6fd34dfb   395.7 MB
md5:8bc831fc0cedc35d042c17ad32c23e9d   427.0 MB
md5:81c3778589b51c3b3adae49fcebcf8b9   445.0 MB
md5:2762c238b31aeb0158c5c1397db47744   480.2 MB
md5:f1578159a1e63ce44942f7a146d708fa   422.6 MB
md5:fd63b4f792efea5761d9cfa88d025219   456.1 MB
md5:31f58d81dd49e73200f8d142e373ede1   398.3 MB
md5:c4c664890994110abfd03d67f995e6b5   438.9 MB
md5:71dfa552bf5d0e886065769c38fb8d6d   449.1 MB
md5:a05543ec845fde4d08caf863579bd32d   489.3 MB
md5:788f8787100786e330f6a5cb58dadc3a   423.3 MB
md5:3e00dc6e0480db67888c1f8a50a78a8c   464.0 MB
md5:af3f36f59df663874a6d5fc64cc4d050   401.0 MB
md5:a2caafffd9364513c2d71125b290deec   446.0 MB
md5:a7ab7d20d30ed8f924a8de3645dec638   449.7 MB
md5:424ccb0d94101deb0383fc49ee29fdbf   495.1 MB
md5:27ee2640a5f8df6d46fba2a3b023b480   422.3 MB
md5:671e80adc61d3118a3909fa585544363   463.4 MB
md5:dbff65cbe03a7f78ad3f6076482e5bc4   401.6 MB
md5:175ed174fdf94a4d1767e176e110a161   438.6 MB
md5:1517758a2e8bac253ba1a53341ab2efe   445.2 MB
md5:05bc823927e03407ab281ded6496224f   487.9 MB
md5:0553a29e4245bd68eae53b7e8b89d424   420.7 MB
md5:6c4afcf3608c30f0d947531a0fb6ae31   455.2 MB
md5:4f0690f5b426e6d6199b56894e95dc63   396.7 MB
md5:4b1c31895c215c31ecd71cc66e73da41   429.6 MB
md5:e783dba7aa56be0b76aa10f2dfe7c2fc   441.1 MB
md5:95f6ff08352888b3e34169e8f083ad6b   477.3 MB
md5:c010c2a405b92c5b5d18b6d854fe8a2a   419.1 MB
md5:cb95e62dd944d157139604e9a65b7ac0   445.7 MB
md5:184f1829b68e0186982a4eab4d14a8ee   395.5 MB
md5:23d8f8312792417a683dafd568b7841f   420.3 MB
md5:d475373791897735e125b4c15da0b9b6   439.3 MB
md5:7b24d25acd96a1860e76f56d4e70b5a9   465.9 MB
md5:c18b929b4d6ca03870ba7b28f559e41e   413.0 MB
md5:7ca3bb77772df6f11154b319037dd169   435.9 MB
md5:a38fecc8f760ac7c55b62acadd65ac6d   392.3 MB
md5:c835756c1a69b339a9b0783b2c82fad0   410.2 MB
md5:9201a8f0d9d28a2cdede382c72e0865a   432.0 MB
md5:5ebc7d9a113a5dbaf64cc79adf2dfe95   456.0 MB
md5:2aa821d88543f3e34430d7a499d7329d   410.5 MB
md5:8fb20ea2c4c58dced2d298744c46cf3e   426.0 MB
md5:b1b46091488ba831568a2dd32fce7778   386.1 MB
md5:350add70ea14f305b5ebdff6de999701   401.5 MB
md5:a92b3f9f0569663d98da7336f3af152f   429.1 MB
md5:7c4c7b611558715c9406a1d36f7650a1   444.9 MB
md5:cb2d2151692e6fc2bccd5a4cc82e1824   402.2 MB
md5:b7e0832fd7f621b0633bbce6a6332d5d   416.2 MB
md5:0cd375097a4a34e69e389025baa6d35b   381.8 MB
md5:3126f301d6f032a32013543f2c046c9d   392.4 MB
md5:051d97e6c89b4bc8e8aee7d44bcae76d   421.4 MB
md5:a87c5e1273041d2307bb48d9e087e5a8   434.3 MB
md5:219cfd1b44fc9028a3ea8219af092db2   397.7 MB
md5:ba09c204a13b674fa58e2d5424ce3733   406.1 MB
md5:a62ecb47423436a2219f4ffed1421d8d   377.4 MB
md5:4408f99bfbc05c9a3cc2fd73060b19f4   383.8 MB
md5:39943e7425ed13a503fc31a116819e8c   417.5 MB
md5:3f84e5ed5ed5d33d83f7eeb3793baa8f   423.8 MB
md5:22c092ca7a0a834471163c20bd0c68e2   391.8 MB
md5:6f60749896a1332f8e488766c8dec239   396.4 MB
md5:2dfe4671b9d396bf555141c0ceb58403   369.9 MB
md5:e90af0dfefd2b7dca6b53cd7db24f4f9   375.1 MB
md5:aff038e32ef4fe6bed824fa680b528c0   408.4 MB
md5:61e4aa9eec1c67dec535138080d5bae0   414.6 MB
md5:003ba2e19cad146f67377b8c70f4a085   383.2 MB
md5:bf8ebc8a0ce10a878a06176222aa5931   387.7 MB
md5:da635506d9cd980863552d5bc2ae4a8d   360.8 MB
md5:64b99f66ac30c86a3fd0a51a343cff4f   366.6 MB
md5:57b89f2959c3a70a95a9b45d977e1dff   399.8 MB
md5:3d3cebec66a70c6b6f6b6b3f11e3536f   404.9 MB
md5:af0503018d12367f3c97fb22f81412db   376.1 MB
md5:d37d279cc2296c33e793d65c8f133818   378.4 MB
md5:d06594dbdabf1628d5365dd394e7fd30   352.8 MB
md5:83e7885f5a6ff1aa8f3e803f2f9b5607   358.6 MB
md5:1ca15547be87c73345efb6f455b7f37d   390.8 MB
md5:4fadbec1c538ac6e1a67a7662dca8e00   394.8 MB
md5:5e8ed699f84942bf5fd942b46b9090a6   365.1 MB
md5:98473636ca11480a326c2dbb5bcbd91d   370.3 MB
md5:32e9897fc6ebc7db75f1ca0c7b3af366   346.0 MB
md5:e6515097ac682d4add558c4cbfc437e5   348.7 MB
md5:942a05b3b7df87ee1c1684fa267343f7   384.3 MB
md5:e0d946112902cbb83c4423405a23b597   385.2 MB
md5:460e7e8ef3a07a6ff5ff23a6daa02d11   359.4 MB
md5:7656ff613e702b17216903ea6cc691af   361.6 MB
md5:5a2d6d574c6a20a5443303d49d9b8d1f   340.1 MB
md5:b72e43dd2ea9eb57f0a35428c5e2993f   340.4 MB