Dataset Open Access

EukZoo, an aquatic protistan protein database for meta-omics studies.

Liu, Zhenfeng; Hu, Sarah; Caron, David

This database contains protein sequences of aquatic microbial eukaryotes, or protists. Its purpose is to provide a database of reasonable quality that can serve as a resource for both taxonomic and functional interpretation of metagenomic and metatranscriptomic studies of protists. The sequences came mainly from the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP), supplemented with various genomes and transcriptomes of organisms that were not part of MMETSP.

To use this database, you need to understand what each of the three data files here is for.

(1) The protein sequences are stored in the .faa file ("EukZoo_v_0.2.faa"). You can build an alignment/search database from it and search your meta-omics sequences against it. Each sequence in the FASTA file has an ID that always consists of two parts, like this: "MMETSP0004_1234567". The text before the first underscore is the source ID of that sequence.

(2) Taxonomy information for each source ID is stored in "EukZoo_taxonomy_table_v_0.2.tsv". You can combine this information with your database search results to assign taxonomy to sequences.

(3) The KEGG annotation of each sequence is stored in "EukZoo_KEGG_annotation_v_0.2.tsv". You can combine this information with your database search results to assign KEGG functional annotation (KO ID) to sequences.
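
The way the three files fit together can be sketched in a few lines of Python. This is only an illustration, not the scripts provided with the database. It assumes that the first column of the taxonomy table holds the source ID, that the first column of the KEGG table holds the full sequence ID, and that the second column of each holds the value of interest; the actual column layout should be checked in the files themselves. The function names and the toy rows below are hypothetical placeholders.

```python
import csv
import io


def source_id(sequence_id):
    """Return the text before the first underscore, e.g. 'MMETSP0004'."""
    return sequence_id.split("_", 1)[0]


def load_first_two_columns(handle, has_header=True):
    """Map column 1 to column 2 of a tab-separated table.

    ASSUMPTION: the key is in column 1 and the value in column 2.
    Adjust the indices to match the real EukZoo tables.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row if there is one
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def annotate(hit_ids, taxonomy, kegg):
    """Attach taxonomy (looked up by source ID) and KO (looked up by
    full sequence ID) to each database hit."""
    return [
        (
            seq_id,
            taxonomy.get(source_id(seq_id), "unknown"),
            kegg.get(seq_id, "unassigned"),
        )
        for seq_id in hit_ids
    ]


# Toy in-memory stand-ins for the two .tsv files (made-up placeholder rows).
taxonomy_tsv = io.StringIO("source_id\ttaxonomy\nMMETSP0004\tPLACEHOLDER_LINEAGE\n")
kegg_tsv = io.StringIO("sequence_id\tko\nMMETSP0004_1234567\tPLACEHOLDER_KO\n")

taxonomy = load_first_two_columns(taxonomy_tsv)
kegg = load_first_two_columns(kegg_tsv)

# IDs of the best hits taken from your own search results.
for row in annotate(["MMETSP0004_1234567", "MMETSP9999_1"], taxonomy, kegg):
    print("\t".join(row))
```

With the real files, the `io.StringIO` objects would be replaced by `open("EukZoo_taxonomy_table_v_0.2.tsv", newline="")` and `open("EukZoo_KEGG_annotation_v_0.2.tsv", newline="")`. Splitting on the first underscore only keeps the source ID intact even if the rest of the sequence ID contains further underscores.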

I also provide scripts to assign taxonomy and KEGG annotation from database search results. You can find the scripts, along with explanations of how to use them, on the EukZoo GitHub page. Details on how the database was created and curated are available there as well.

Please contact me at zhenfeng.liu1@gmail.com if you have any questions or requests. Thank you for your interest in EukZoo.

Files (3.9 GB)

Name                                Size        MD5
EukZoo_creation_and_cleanup.docx    529.3 kB    58d2cd88847d378a6f7ed2c190382d72
EukZoo_KEGG_annotation_v_0.2.tsv    100.0 MB    d898b9061517fd3b6c6dbe5a1bf4266e
EukZoo_taxonomy_table_v_0.2.tsv     107.2 kB    91821d18415b9021751cc5ae95820f98
EukZoo_v_0.2.faa                    3.8 GB      4a753abebff09f700927d039a4ba4d1c
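
Because the .faa file is several gigabytes, it may be worth checking each download against the MD5 checksums listed for the files. The following is a minimal sketch that assumes the files were saved under their original names in a single directory; the function names `md5_of_file` and `verify` are hypothetical helpers, not part of the EukZoo scripts.

```python
import hashlib
import os

# Checksums copied from the file listing on this page.
EXPECTED_MD5 = {
    "EukZoo_creation_and_cleanup.docx": "58d2cd88847d378a6f7ed2c190382d72",
    "EukZoo_KEGG_annotation_v_0.2.tsv": "d898b9061517fd3b6c6dbe5a1bf4266e",
    "EukZoo_taxonomy_table_v_0.2.tsv": "91821d18415b9021751cc5ae95820f98",
    "EukZoo_v_0.2.faa": "4a753abebff09f700927d039a4ba4d1c",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that the 3.8 GB FASTA file never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory=".", expected=EXPECTED_MD5):
    """Return {file name: 'ok' | 'MISMATCH' | 'missing'} for each file."""
    report = {}
    for name, checksum in expected.items():
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            report[name] = "missing"
        elif md5_of_file(path) == checksum:
            report[name] = "ok"
        else:
            report[name] = "MISMATCH"
    return report
```

Calling `verify("path/to/downloads")` returns one status per file; a "MISMATCH" usually means the download was interrupted and should be repeated.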
Statistics          All versions    This version
Views               273             273
Downloads           256             256
Data volume         639.3 GB        639.3 GB
Unique views        254             254
Unique downloads    81              81
