Published December 14, 2022 | Version 1
Dataset | Open access
Investigation of machine learning algorithms for taxonomic classification of marine metagenomes
Description
Training, testing, and blind datasets used to train and evaluate machine learning algorithms for taxonomic classification of marine metagenomes:
- K12.kmers.txt - 12-bp k-mer vocabulary (k = 12) constructed with Jellyfish v1.1.11 from 47,894 genomes in GTDB release 202
- MarRef_1.6.tsv - Metadata file downloaded from MarRef v1.6
- MarRef.genustrain.fasta - Training set from MarRef v1.6 (seed=808) used for genus classification
- MarRef.genustest.fasta - Testing set from MarRef v1.6 (seed=747) used for genus classification
- MarRef.speciestrain.fasta - Training set from MarRef v1.6 (seed=808) used for species classification
- MarRef.speciestest.fasta - Testing set from MarRef v1.6 (seed=747) used for species classification
- MarRef.traintest.key.tsv - Table containing MarRef accession, GenBank accession, GenBank taxonomy ID, taxonomic information, and labels used for species and genus testing and training
- anonymous_reads_*.fq - Blind datasets 1-10 in interleaved FASTQ format
- reads_mapping_*.tsv - Key for blind datasets 1-10. Each sequence header is mapped to its corresponding MarRef accession and NCBI taxonomic ID.
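The description says two things about the inputs: the blind reads are interleaved FASTQ, and features come from a 12-mer vocabulary. It does not say how reads were turned into feature vectors. The Python sketch below shows one plausible approach and rests on several assumptions:

- Each line of K12.kmers.txt starts with one k-mer. Lines beginning with '>' are skipped in case the file is a FASTA-style Jellyfish dump.
- Each FASTQ record has exactly four lines, with mate 2 directly following mate 1.
- Canonical (strand-independent) counting is optional.

The helper names are hypothetical and are not taken from the original study.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, Iterator, Optional, Tuple

Record = Tuple[str, str]  # (header without '@', uppercase sequence)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def load_vocabulary(lines: Iterable[str]) -> Dict[str, int]:
    """Map each k-mer to a feature index, in file order.

    Assumption: the k-mer is the first whitespace-separated token on a line;
    '>' header lines (as in a FASTA-style Jellyfish dump) are skipped.
    """
    vocab: Dict[str, int] = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith(">"):
            continue
        kmer = fields[0].upper()
        if kmer not in vocab:
            vocab[kmer] = len(vocab)
    return vocab


def _next_record(it: Iterator[str]) -> Optional[Record]:
    """Read one 4-line FASTQ record; return None at end of input."""
    block = list(islice(it, 4))
    if not block:
        return None
    if len(block) < 4 or not block[0].startswith("@") or not block[2].startswith("+"):
        raise ValueError(f"malformed FASTQ record: {block!r}")
    return block[0][1:].strip(), block[1].strip().upper()


def read_interleaved_fastq(lines: Iterable[str]) -> Iterator[Tuple[Record, Record]]:
    """Yield (mate1, mate2) pairs, assuming mate 2 directly follows mate 1."""
    it = iter(lines)
    while True:
        mate1 = _next_record(it)
        if mate1 is None:
            return
        mate2 = _next_record(it)
        if mate2 is None:
            raise ValueError("odd number of FASTQ records; input is not interleaved")
        yield mate1, mate2


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_counts(seq: str, vocab: Dict[str, int], k: int = 12,
                use_canonical: bool = False) -> Counter:
    """Sparse k-mer count vector {feature index: count} for one sequence.

    k-mers not found in the vocabulary (e.g. ones containing 'N') are skipped.
    """
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if use_canonical:
            kmer = canonical(kmer)
        index = vocab.get(kmer)
        if index is not None:
            counts[index] += 1
    return counts
```

For example, passing an open handle on K12.kmers.txt to `load_vocabulary` builds the feature index. Summing `kmer_counts` over the two mates that `read_interleaved_fastq` yields for each pair then gives one sparse vector per read pair. Holding the full 12-mer vocabulary in a Python dict needs substantial memory. Whether the original work counted per mate or per pair, and whether it used canonical k-mers, is not stated here.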
Files (33.6 GB)
MD5 checksum | Size |
---|---|
b4e8008127bc3db4ecda65d50d98e9cd | 210.9 MB |
55cc3d5f54f52e15a4cbebcc1d1d6ac3 | 210.9 MB |
42ebe6f8fbb696079f84a80930087d44 | 210.8 MB |
5f8c28ee63f9e0789071dbaa3899b1ed | 210.6 MB |
2a0f7aeb9ce2fc7b98fce547ef51b00d | 211.0 MB |
e0ae75c66906d0acd5f7c5bd8098259e | 210.5 MB |
fc9453d8066bdc876429bc8a4b25f070 | 210.8 MB |
6eef7e2069997a141032cde61c044233 | 210.6 MB |
55cc3d5f54f52e15a4cbebcc1d1d6ac3 | 210.9 MB |
b88c329af9cbd22792501b94901455bf | 210.8 MB |
81ecc9f7c0891e15c922abcc20154a2e | 109.1 MB |
eb751f0d277afebc8df1c10451667dff | 7.8 GB |
6a33e954749ff375e8b1db047b7c6161 | 7.8 GB |
61258b5e8e50958d20f19f956d5ad87c | 7.8 GB |
d8a04771a740ca843f9d5cc1779f38f9 | 7.8 GB |
89ac42fc41a1f0aefa5f768f426dd368 | 204.2 kB |
f2e418d9d182ed6f60a92d0c62b3f77d | 2.4 MB |
b4ce0fe68b115a9c7acc52e8a39e8848 | 34.5 MB |
de4639a8243f540a1f38d26f8f63d450 | 35.5 MB |
dbd42204c7d29a7cbf18f8c4e7f1b4f8 | 34.2 MB |
e814f39f1956336eb71465c0bea27e32 | 34.5 MB |
e429bc3a4b70ec36d0d5676416288f03 | 34.4 MB |
3026a183d0885546ea2576aaa451582d | 34.8 MB |
d171cc93341ee56cad0413d497d28465 | 34.6 MB |
49461fbb9fcc043af681d51e2beb87d8 | 34.4 MB |
de4639a8243f540a1f38d26f8f63d450 | 35.5 MB |
95511e957c2ac97b5fd2ce70f21048f1 | 34.4 MB |
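Each file in the listing is published with an MD5 checksum. Several files are about 7.8 GB, so the minimal Python sketch below verifies a download by reading it in 1 MiB chunks through the standard-library `hashlib` module instead of loading it into memory. The function names `md5sum` and `verify` are illustrative only and are not part of the dataset.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so multi-GB files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_md5: str) -> bool:
    """Compare a file with a listed checksum; accepts an optional 'md5:' prefix."""
    return md5sum(path) == expected_md5.split(":")[-1].strip().lower()
```

For example, `verify(path, "md5:81ecc9f7c0891e15c922abcc20154a2e")` returns `True` only if the file at `path` matches that listed checksum exactly.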