Published March 1, 2024 | Version v1
Dataset
Open
Artificial genome sequences and Illumina reads - Nucleotide divergence (0.01% to 80%) from the SARS-CoV-2 reference (MN908947.3)
- 1. Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal
Description
This dataset comprises artificial genome sequences and simulated Illumina 150 bp paired-end reads derived from them. The sequences differ from the SARS-CoV-2 reference genome (MN908947.3) by varying percentages of homogeneous nucleotide divergence, ranging from 0.01% to 80%. For each sample, a total of 1990 Illumina 150 bp paired-end reads were simulated, targeting a 10-fold depth of coverage.
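The relationship between divergence level, read count, and depth can be illustrated with a short sketch. This is not the authors' script (their code is distributed with the dataset). It assumes that "homogeneous" divergence means substitutions placed at uniformly random, distinct positions, and it uses the 29,903 bp length of MN908947.3. The function names `diverge` and `depth_of_coverage` are hypothetical.

```python
import random


def diverge(sequence: str, fraction: float, seed: int = 0) -> str:
    """Substitute round(fraction * len(sequence)) distinct, uniformly chosen positions.

    Assumption: divergence is modelled as substitutions only (no indels).
    """
    rng = random.Random(seed)
    n_sites = round(fraction * len(sequence))
    positions = rng.sample(range(len(sequence)), n_sites)
    bases = list(sequence)
    for pos in positions:
        # Always pick a base different from the current one, so every
        # selected site is a real mismatch against the input sequence.
        alternatives = [b for b in "ACGT" if b != bases[pos]]
        bases[pos] = rng.choice(alternatives)
    return "".join(bases)


def depth_of_coverage(n_reads: int, read_length: int, genome_length: int) -> float:
    """Mean depth = total sequenced bases / genome length."""
    return n_reads * read_length / genome_length


# Toy reference of 1000 bp; a real run would load MN908947.3 from FASTA.
toy_reference = "ACGT" * 250
mutated = diverge(toy_reference, 0.10, seed=1)
mismatches = sum(a != b for a, b in zip(toy_reference, mutated))
print(mismatches)  # 100 substituted sites, i.e. 10% divergence

# 1990 reads of 150 bp over the 29,903 bp reference genome
print(round(depth_of_coverage(1990, 150, 29903), 2))  # 9.98
```

With 1990 reads of 150 bp, the mean depth over a 29,903 bp genome comes out just under the 10-fold target.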
Reads were simulated using ART (https://doi.org/10.1093/bioinformatics/btr708).
The code and instructions to reproduce the artificial sequences and reads are also available in this repository.
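For orientation, a paired-end ART run of this kind might be assembled as sketched below. The read length (150) and fold coverage (10) come from the description above. The sequencing-system profile (`HS25`), the mean fragment size, and its standard deviation are assumptions, not values stated in this record, and the helper `art_paired_end_command` and the file names are hypothetical. The repository's own instructions remain the authoritative source for the exact parameters used.

```python
import shlex


def art_paired_end_command(
    reference_fasta: str,
    out_prefix: str,
    read_length: int = 150,      # from the dataset description
    fold_coverage: int = 10,     # from the dataset description
    profile: str = "HS25",       # assumption: HiSeq 2500 error profile
    mean_fragment: int = 200,    # assumption: not stated in the record
    sd_fragment: int = 10,       # assumption: not stated in the record
) -> list[str]:
    """Build an art_illumina argument list for paired-end simulation."""
    return [
        "art_illumina",
        "-ss", profile,             # sequencing system profile
        "-i", reference_fasta,      # input FASTA to simulate reads from
        "-p",                       # paired-end mode
        "-l", str(read_length),     # read length
        "-f", str(fold_coverage),   # fold coverage
        "-m", str(mean_fragment),   # mean fragment size
        "-s", str(sd_fragment),     # fragment size standard deviation
        "-o", out_prefix,           # output file prefix
    ]


# Hypothetical input and output names, for illustration only.
command = art_paired_end_command("divergent_genome.fasta", "divergent_genome_R")
print(shlex.join(command))

# To actually run the simulation (requires ART to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Building the argument list separately from running it makes it easy to inspect or log the exact command before launching ART for each divergence level.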
Files
fasta.zip
Files (3.2 MB)

MD5 checksum | Size |
---|---|
md5:8da3c303ed9d43d907a995d3450b358d | 1.4 kB |
md5:bd6afb3658c0b7d674092f6524511565 | 145.4 kB |
md5:dbcd549c25a462de1c987aa9ae40eeff | 3.0 MB |
md5:3434993ade81330198c3289613fdff41 | 71 Bytes |
md5:9950365c0eb19d8d607ebbb47208b485 | 30.4 kB |
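The MD5 checksums listed above can be used to confirm that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib`. The helper names `md5_of_file` and `matches_listing` are hypothetical, and the demonstration uses a temporary file rather than one of the dataset files, since this listing does not show which checksum belongs to which file.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demonstration on a temporary file containing the bytes b"hello".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

demo_ok = matches_listing(demo_path, "md5:5d41402abc4b2a76b9719d911017c592")
print(demo_ok)  # True
os.remove(demo_path)
```

For a real check, pass the path of the downloaded file and the corresponding `md5:` string from the record page.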