EukZoo, an aquatic protistan protein database for meta-omics studies.

Liu, Zhenfeng; Hu, Sarah; Caron, David

doi:10.5281/zenodo.1476236

Published October 31, 2018 | Version 0.2

Dataset Open

1. University of Southern California

This database contains protein sequences of aquatic microbial eukaryotes, or protists. Its purpose is to provide a database of reasonable quality that serves as a resource for both taxonomic and functional interpretation of metagenomic and metatranscriptomic studies of protists. The sequences come mainly from the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP), supplemented with various genomes and transcriptomes of organisms that were not part of MMETSP.

To use this database, you need to understand the role of each of the three main files.

(1) The protein sequences are stored in the .faa file ("EukZoo_v_0.2.faa"). You can build an alignment/search database from it and search your meta-omics sequences against it. Each sequence in the FASTA file has an ID that always consists of two parts, like this: "MMETSP0004_1234567". The text before the first underscore is the source ID of that sequence.

(2) Taxonomy information for each source ID is stored in "EukZoo_taxonomy_table_v_0.2.tsv". You can use this information together with database search results to assign taxonomy to sequences.

(3) The KEGG annotation of each sequence is stored in "EukZoo_KEGG_annotation_v_0.2.tsv". You can use this information together with database search results to assign KEGG functional annotations (KO IDs) to sequences.
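As a rough illustration of how the three files relate, the sketch below extracts the source ID from a sequence ID and looks it up in a table. It is not the script provided with EukZoo. The column layout of the .tsv files is not described here, so the sketch assumes the lookup key is in the first column; the function names and the toy table contents are hypothetical.

```python
import csv
import io


def source_id(sequence_id: str) -> str:
    """Return the text before the first underscore, e.g. 'MMETSP0004'."""
    return sequence_id.split("_", 1)[0]


def index_by_first_column(handle, delimiter="\t"):
    """Map the first field of each row to the whole row.

    Assumption: the lookup key (source ID or sequence ID) is in the
    first column. Check the real .tsv files before relying on this.
    """
    index = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if row:  # skip empty lines
            index[row[0]] = row
    return index


# Toy stand-in for the taxonomy table (made-up contents, not real data).
toy_taxonomy = io.StringIO(
    "MMETSP0004\tHypothetical lineage A\n"
    "MMETSP0005\tHypothetical lineage B\n"
)
taxonomy = index_by_first_column(toy_taxonomy)

# A subject ID as it might appear in a database search hit.
hit = "MMETSP0004_1234567"
print(source_id(hit), taxonomy.get(source_id(hit)))
```

With the real files, the same pattern would apply twice: look up the source ID in the taxonomy table, and look up the full sequence ID in the KEGG annotation table. If a table has a header row, its first field is indexed like any other key, which is harmless for lookups.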

I also provide scripts to assign taxonomy and KEGG annotation from database search results. You can find the scripts, and explanations of how to use them, on the EukZoo GitHub page. You will find details on how the database was created and curated there as well.

Please contact me at zhenfeng.liu1@gmail.com if you have any questions or requests. Thank you for your interest in EukZoo.

Files (3.9 GB)

Name	MD5	Size
EukZoo_creation_and_cleanup.docx	58d2cd88847d378a6f7ed2c190382d72	529.3 kB
EukZoo_KEGG_annotation_v_0.2.tsv	d898b9061517fd3b6c6dbe5a1bf4266e	100.0 MB
EukZoo_taxonomy_table_v_0.2.tsv	91821d18415b9021751cc5ae95820f98	107.2 kB
EukZoo_v_0.2.faa	4a753abebff09f700927d039a4ba4d1c	3.8 GB

	All versions	This version
Views	1,206	1,197
Downloads	768	762
Data volume	1.6 TB	1.6 TB
