Published December 14, 2022 | Version 1
Dataset Open

Investigation of machine learning algorithms for taxonomic classification of marine metagenomes

Description

Training, testing, and blind datasets used to develop and evaluate machine learning algorithms for the taxonomic classification of marine metagenomes:

  1. K12.kmers.txt - 12 bp k-mer vocabulary constructed with Jellyfish v1.1.11 from 47,894 genomes in GTDB release 202
  2. MarRef_1.6.tsv - Metadata file downloaded from MarRef v1.6
  3. MarRef.genustrain.fasta - Training set from MarRef v1.6 (seed=808) used for genus classification
  4. MarRef.genustest.fasta - Testing set from MarRef v1.6 (seed=747) used for genus classification 
  5. MarRef.speciestrain.fasta - Training set from MarRef v1.6 (seed=808) used for species classification
  6. MarRef.speciestest.fasta - Testing set from MarRef v1.6 (seed=747) used for species classification
  7. MarRef.traintest.key.tsv - Table containing MarRef accession, GenBank accession, GenBank taxonomy ID, taxonomic information, and labels used for species and genus testing and training
  8. anonymous_reads_*.fq - Blind datasets (1-10) in interleaved FASTQ format
  9. reads_mapping_*.tsv - Key for blind datasets 1-10. Each sequence header is mapped to its corresponding MarRef accession and NCBI taxonomic ID.
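
As a rough illustration of how the blind datasets and the k-mer vocabulary might be used together, the sketch below reads an interleaved FASTQ stream as mate pairs and counts the 12-mers of a read that occur in a vocabulary. It is a minimal sketch, not the authors' pipeline. The function names are hypothetical, the vocabulary is assumed to have been loaded into a Python set already (the on-disk layout of K12.kmers.txt is not described here), and each record is assumed to span exactly four lines.

```python
import io
from collections import Counter
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, quality) for each 4-line FASTQ record."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record or not record[0]:
            return  # end of input (or a trailing blank line)
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("malformed FASTQ record")
        yield record[0][1:], record[1], record[3]


def read_interleaved_pairs(handle):
    """Yield (mate1, mate2); in an interleaved file mates are consecutive."""
    records = read_fastq(handle)
    for mate1 in records:
        mate2 = next(records, None)
        if mate2 is None:
            raise ValueError("odd number of records in interleaved file")
        yield mate1, mate2


def kmer_counts(sequence, vocabulary, k=12):
    """Count the k-mers of `sequence` that are present in `vocabulary`."""
    kmers = (sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return Counter(kmer for kmer in kmers if kmer in vocabulary)


if __name__ == "__main__":
    # Toy data standing in for an anonymous_reads_*.fq file (hypothetical reads).
    toy = (
        "@read1/1\nACGTACGTACGTA\n+\nIIIIIIIIIIIII\n"
        "@read1/2\nTTTTACGTACGTACGT\n+\nIIIIIIIIIIIIIIII\n"
    )
    vocabulary = {"ACGTACGTACGT"}  # assumption: vocabulary held as a set
    for (name1, seq1, _), (name2, seq2, _) in read_interleaved_pairs(io.StringIO(toy)):
        print(name1, dict(kmer_counts(seq1, vocabulary)))
        print(name2, dict(kmer_counts(seq2, vocabulary)))
```

For real files, the same functions can be given an open file handle instead of the in-memory toy string. The headers they yield are the values that the reads_mapping_*.tsv keys map to MarRef accessions and NCBI taxonomic IDs.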

Files (33.6 GB)

Checksum                              Size
md5:b4e8008127bc3db4ecda65d50d98e9cd  210.9 MB
md5:55cc3d5f54f52e15a4cbebcc1d1d6ac3  210.9 MB
md5:42ebe6f8fbb696079f84a80930087d44  210.8 MB
md5:5f8c28ee63f9e0789071dbaa3899b1ed  210.6 MB
md5:2a0f7aeb9ce2fc7b98fce547ef51b00d  211.0 MB
md5:e0ae75c66906d0acd5f7c5bd8098259e  210.5 MB
md5:fc9453d8066bdc876429bc8a4b25f070  210.8 MB
md5:6eef7e2069997a141032cde61c044233  210.6 MB
md5:55cc3d5f54f52e15a4cbebcc1d1d6ac3  210.9 MB
md5:b88c329af9cbd22792501b94901455bf  210.8 MB
md5:81ecc9f7c0891e15c922abcc20154a2e  109.1 MB
md5:eb751f0d277afebc8df1c10451667dff  7.8 GB
md5:6a33e954749ff375e8b1db047b7c6161  7.8 GB
md5:61258b5e8e50958d20f19f956d5ad87c  7.8 GB
md5:d8a04771a740ca843f9d5cc1779f38f9  7.8 GB
md5:89ac42fc41a1f0aefa5f768f426dd368  204.2 kB
md5:f2e418d9d182ed6f60a92d0c62b3f77d  2.4 MB
md5:b4ce0fe68b115a9c7acc52e8a39e8848  34.5 MB
md5:de4639a8243f540a1f38d26f8f63d450  35.5 MB
md5:dbd42204c7d29a7cbf18f8c4e7f1b4f8  34.2 MB
md5:e814f39f1956336eb71465c0bea27e32  34.5 MB
md5:e429bc3a4b70ec36d0d5676416288f03  34.4 MB
md5:3026a183d0885546ea2576aaa451582d  34.8 MB
md5:d171cc93341ee56cad0413d497d28465  34.6 MB
md5:49461fbb9fcc043af681d51e2beb87d8  34.4 MB
md5:de4639a8243f540a1f38d26f8f63d450  35.5 MB
md5:95511e957c2ac97b5fd2ce70f21048f1  34.4 MB
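
The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the file is read in chunks so that the multi-gigabyte files do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed checksum such as 'md5:b4e8...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_listing("K12.kmers.txt", "md5:...")` returns True only when the local copy hashes to the checksum passed in. Which checksum belongs to which file has to be taken from the record itself.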