Published March 1, 2024 | Version v1
Dataset Open

Artificial genome sequences and Illumina reads - Nucleotide divergence (0.01% to 80%) from the SARS-CoV-2 reference (MN908947.3)

  • 1. Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal

Description

This dataset comprises artificial genome sequences, and simulated Illumina 150 bp paired-end reads derived from them, with homogeneous nucleotide divergence from the SARS-CoV-2 reference genome (MN908947.3) ranging from 0.01% to 80%. For each sample, 1990 Illumina 150 bp paired-end reads were simulated, targeting a 10-fold depth of coverage.
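The relationship between read count and depth can be checked with a short calculation. The sketch below assumes that the figure of 1990 counts individual 150 bp reads (not read pairs) and uses the 29,903-base length of MN908947.3; the function name is illustrative and not taken from the repository code.

```python
# Minimal sketch: expected depth of coverage = reads * read length / genome length.
# Assumption: 1990 is the number of individual reads, not read pairs.

GENOME_LENGTH = 29_903    # length of MN908947.3 in bases
READ_LENGTH = 150         # Illumina read length in bases
READS_PER_SAMPLE = 1990   # reads simulated per sample


def depth_of_coverage(n_reads: int, read_length: int, genome_length: int) -> float:
    """Return the average number of times each base is covered."""
    return n_reads * read_length / genome_length


depth = depth_of_coverage(READS_PER_SAMPLE, READ_LENGTH, GENOME_LENGTH)
print(f"{depth:.2f}")  # prints 9.98
```

Under that assumption the sample size gives just under the 10-fold target; if 1990 instead counted read pairs, the same formula would give roughly twice that depth.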

Reads were simulated using ART (https://doi.org/10.1093/bioinformatics/btr708).

The code and instructions to reproduce the artificial sequences and reads are also available in this repository.
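The repository code remains the authoritative way to regenerate the sequences. Purely as an illustration of the idea, the sketch below introduces substitutions at evenly spaced positions along a sequence. It assumes that "homogeneous" means evenly distributed along the genome and that divergence consists of substitutions only; the function name `mutate_evenly` and its parameters are hypothetical and are not the authors' implementation.

```python
import random


def mutate_evenly(sequence: str, divergence: float, seed: int = 0) -> str:
    """Substitute a fraction `divergence` of sites at evenly spaced positions.

    Hypothetical illustration only: substitutions, no indels, and the
    replacement base is drawn at random from the three other nucleotides.
    """
    rng = random.Random(seed)
    length = len(sequence)
    n_sites = round(length * divergence)
    bases = list(sequence)
    for i in range(n_sites):
        # Integer arithmetic keeps positions distinct and evenly spaced.
        pos = i * length // n_sites
        alternatives = [b for b in "ACGT" if b != bases[pos].upper()]
        bases[pos] = rng.choice(alternatives)
    return "".join(bases)


reference = "ACGT" * 100                      # toy 400-base sequence
mutated = mutate_evenly(reference, 0.10)      # 10% divergence
n_diff = sum(a != b for a, b in zip(reference, mutated))
print(n_diff)  # prints 40
```

Because every replacement base differs from the original, the number of mismatches equals the number of selected sites, so the realised divergence matches the requested fraction up to rounding.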

 

Files

fasta.zip

Files (3.2 MB)

Checksum, Size
md5:8da3c303ed9d43d907a995d3450b358d, 1.4 kB
md5:bd6afb3658c0b7d674092f6524511565, 145.4 kB
md5:dbcd549c25a462de1c987aa9ae40eeff, 3.0 MB
md5:3434993ade81330198c3289613fdff41, 71 Bytes
md5:9950365c0eb19d8d607ebbb47208b485, 30.4 kB
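The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listing` are illustrative, and the caller has to supply the path of the downloaded file and the checksum that belongs to it.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing(local_path, "md5:8da3c303ed9d43d907a995d3450b358d")` returns `True` only if the file at `local_path` (a placeholder for wherever the download was saved) has that digest.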