Dataset Open Access

Simulated NGS read datasets for novel human virus prediction

Anonymous

This repository contains simulated Illumina read datasets for novel human virus prediction and associated metadata extracted from the Virus Host Database (https://www.genome.jp/virushostdb/). The reads are 250 bp long and were simulated with Mason (https://www.seqan.de/apps/mason/) from genomes downloaded from NCBI. The training-validation-test split was done at the level of whole viral sequences to ensure "novelty" of the validation and test viruses. The training sets contain 10 million reads per class, the validation sets 1.25 million reads per class, and the test sets 1.25 million paired reads per class. The negative class sets contain reads simulated from chordate-infecting ("cho"), metazoan-infecting ("met"), eukaryote-infecting ("euk") and all nonhuman ("all") viruses. The positive class contains reads from human-infecting ("hum") viruses. The stratified dataset ("strat") contains an equal number of reads from each of four groups: "cho", "met but not cho", "euk but not met" and "all but not euk".
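As a minimal sketch of how the read files could be loaded, the Python snippet below iterates over a gzip-compressed FASTA file and walks the two mate files of a paired test set in parallel. The function names `read_fasta` and `read_pairs` are hypothetical helpers, not part of the dataset, and the snippet assumes that mates appear in the same order in the `_1` and `_2` files, which the description does not state explicitly.

```python
import gzip
from itertools import zip_longest
from typing import Iterator, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(path: str) -> Iterator[Record]:
    """Yield (header, sequence) records from a gzip-compressed FASTA file.

    Handles both single-line and line-wrapped sequences.
    """
    header = None
    chunks = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_pairs(path_1: str, path_2: str) -> Iterator[Tuple[Record, Record]]:
    """Yield (mate 1, mate 2) records from two mate files.

    Assumption: both files list the mates in the same order.
    """
    for first, second in zip_longest(read_fasta(path_1), read_fasta(path_2)):
        if first is None or second is None:
            raise ValueError("mate files contain different numbers of reads")
        yield first, second
```

For example, `sum(1 for _ in read_fasta("pathogenic_val_hum.fasta.gz"))` would count the reads in the human validation set, which the description gives as 1.25 million, and `read_pairs("pathogenic_test_hum_1.fasta.gz", "pathogenic_test_hum_2.fasta.gz")` would iterate over the paired test reads.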

Files (3.6 GB)
Name  Checksum  Size
nonpathogenic_test_all_1.fasta.gz  md5:861d93fb6a647f2b9a82a43baccaf464  46.4 MB
nonpathogenic_test_all_2.fasta.gz  md5:b0021ff1f483d7ac20adc58cbd4251f4  46.4 MB
nonpathogenic_test_cho_1.fasta.gz  md5:be496655e3369507056527e0460bbb93  42.3 MB
nonpathogenic_test_cho_2.fasta.gz  md5:1b7a7877c28236e4efa52ba2eaa06607  42.3 MB
nonpathogenic_test_euk_1.fasta.gz  md5:b5be592ba56bceb295249719bb552540  43.2 MB
nonpathogenic_test_euk_2.fasta.gz  md5:bd8654bd51722d123991c7d22d4a2aa2  43.2 MB
nonpathogenic_test_met_1.fasta.gz  md5:bac1f86746fac4cfe851d09cf59ab323  43.3 MB
nonpathogenic_test_met_2.fasta.gz  md5:09fef8a7f00f82191579b7fab667dc31  43.3 MB
nonpathogenic_test_strat_1.fasta.gz  md5:83e7288b1d8edd8407ad6e49e68f6626  44.6 MB
nonpathogenic_test_strat_2.fasta.gz  md5:39e9c26840aa4e554a0a645bac7e004f  44.6 MB
nonpathogenic_train_all.fasta.gz  md5:0a472ba2a58677670c4d17ea404ce73e  725.4 MB
nonpathogenic_train_cho.fasta.gz  md5:91b78b8c971e1cf88da0291161b45932  679.7 MB
nonpathogenic_train_strat.fasta.gz  md5:a82c115d3a091d1bc72d9a87db1f4564  707.2 MB
nonpathogenic_val_all.fasta.gz  md5:c74933750380f0d0c8e388c6caa54254  91.6 MB
nonpathogenic_val_cho.fasta.gz  md5:49550fa742c3b8896bd153fa61303450  86.3 MB
nonpathogenic_val_strat.fasta.gz  md5:e85d57352c4adf24b7daeec470341b32  89.1 MB
pathogenic_test_hum_1.fasta.gz  md5:e8e7eaea021d77d2964c8bdc6520a807  39.2 MB
pathogenic_test_hum_2.fasta.gz  md5:597d7c290c2acff787810309b7e60393  39.2 MB
pathogenic_train_hum.fasta.gz  md5:ba8468b101a92b725df1c54a22e38b27  606.0 MB
pathogenic_val_hum.fasta.gz  md5:e8ea78f44db77fc3d55640c8e0a9b8b8  80.4 MB
VHDB_1_folds_all_nhuman.rds  md5:6712dee8dc1a7a846938a0e8d223e474  520.3 kB
VHDB_1_folds_chordata_nhuman.rds  md5:79178e6444ea4ab27c7660e0a98c0a22  372.7 kB
VHDB_1_folds_eukarya_nhuman.rds  md5:54ec3b4e5e1c8861adc52365f11ff52f  454.4 kB
VHDB_1_folds_human.rds  md5:03879fe3f7f0c4daf274ffb4b0643d28  346.4 kB
VHDB_1_folds_metazoa_nhuman.rds  md5:655eeb5c7de9edc819bb90e61366aa88  396.9 kB
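
The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following Python sketch shows one way to do this with the standard library; `md5_of_file` and `verify_download` are hypothetical helper names, and the local file path is assumed to match the file name in the listing.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str) -> bool:
    """Compare a file's MD5 digest with a listed value such as 'md5:...'."""
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected.strip().lower()
```

For example, `verify_download("VHDB_1_folds_human.rds", "md5:03879fe3f7f0c4daf274ffb4b0643d28")` should return `True` for an intact copy of that file.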