VEBA Microeukaryotic Protein Database (MicroEuk100/90/50, Version 3)

Espinoza, Josh

doi:10.5281/zenodo.10139451

Published November 15, 2023 | Version 3

Dataset Open


Espinoza, Josh (Contact person)¹

1. J. Craig Venter Institute

Microeukaryotic protein database of protist and fungal sequences for use with VEBA.

Number of sequences:

* MicroEuk100 = 79,920,431 (19 GB)

* MicroEuk90 = 51,767,730 (13 GB)

* MicroEuk50 = 29,898,853 (6.5 GB)

Number of source organisms per dataset:

* MycoCosm = 2503

* PhycoCosm = 174

* EnsemblProtists = 233

* MMETSP = 759

* TARA_SAGv1 = 8

* EukProt = 366

* EukZoo = 27

* TARA_SMAGv1 = 389

* NR_Protists-Fungi = 48217
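The counts above can be cross-checked with a little arithmetic. The sketch below copies the numbers from the two lists, sums the per-dataset source organisms (the total should agree with the 52,676 source organisms reported for MicroEuk100.faa.gz), and computes what fraction of MicroEuk100 remains in MicroEuk90 and MicroEuk50. The variable names are illustrative only and are not part of VEBA.

```python
# Sequence counts copied from the "Number of sequences" list.
sequences = {
    "MicroEuk100": 79_920_431,
    "MicroEuk90": 51_767_730,
    "MicroEuk50": 29_898_853,
}

# Source organism counts copied from the per-dataset list.
source_organisms = {
    "MycoCosm": 2503,
    "PhycoCosm": 174,
    "EnsemblProtists": 233,
    "MMETSP": 759,
    "TARA_SAGv1": 8,
    "EukProt": 366,
    "EukZoo": 27,
    "TARA_SMAGv1": 389,
    "NR_Protists-Fungi": 48217,
}

# Total number of source organisms across all datasets.
total_organisms = sum(source_organisms.values())

# Fraction of MicroEuk100 sequences retained at each clustering level.
retained = {name: n / sequences["MicroEuk100"] for name, n in sequences.items()}

print(f"Total source organisms: {total_organisms:,}")
for name, fraction in retained.items():
    print(f"{name}: {fraction:.1%} of MicroEuk100")
```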

Files:

MicroEuk_v3.tar.gz = 25 GB

* MicroEuk100.faa.gz (19G) - Main FASTA file with 79,920,431 protein sequences from 52,676 source organisms. Uses md5 hash identifiers.

* identifier_mapping.proteins.tsv.gz (2.0G) - Protein identifier mappings between datasets, original identifiers, source organisms, and md5 hash identifiers.

* MicroEuk90_clusters.tsv.gz (1.7G) - MMseqs2 clustering of MicroEuk100 (MicroEuk90 clusters)

* MicroEuk100.list.gz (1.5G) - List of md5 hash protein identifiers in MicroEuk100

* MicroEuk50_clusters.tsv.gz (1.1G) - MMseqs2 clustering of MicroEuk90 (MicroEuk50 clusters)

* MicroEuk100.eukaryota_odb10.list.gz (13M) - MicroEuk100 protein identifiers with hits to BUSCO's eukaryota_odb10 markers, using the provided score thresholds

* source_taxonomy.tsv.gz (1.5M) - Source taxonomy, lineage, dataset, and notes for each source organism
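Because MicroEuk100.faa.gz is about 19 GB compressed, it is usually more practical to stream it than to load it into memory. The following is a minimal sketch that uses only the Python standard library; `iter_fasta` and `count_sequences` are hypothetical helper names and are not part of VEBA. It assumes a standard FASTA layout in which each header line starts with `>` and the identifier is the first whitespace-delimited token.

```python
import gzip
from typing import Iterator, Tuple


def iter_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from a gzipped FASTA file."""
    identifier = None
    chunks = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Emit the previous record before starting a new one.
                if identifier is not None:
                    yield identifier, "".join(chunks)
                tokens = line[1:].split()
                # Keep only the first token of the header as the identifier.
                identifier = tokens[0] if tokens else ""
                chunks = []
            else:
                chunks.append(line)
    # Emit the final record, if any.
    if identifier is not None:
        yield identifier, "".join(chunks)


def count_sequences(path: str) -> int:
    """Count FASTA records without holding sequences in memory."""
    return sum(1 for _ in iter_fasta(path))
```

Calling `count_sequences("MicroEuk100.faa.gz")` on the extracted file should, per the description above, report 79,920,431 records; a different number would suggest an incomplete download or extraction.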

For more information and citations, please visit the main GitHub repository:

https://github.com/jolespin/veba

Files (26.2 GB)

Name	Size
MicroEuk_v3.tar.gz md5:fae810faf99499dc7dcc27b66974f0b6	26.2 GB
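Given the size of the archive, it is worth checking the download against the md5 checksum listed for MicroEuk_v3.tar.gz before extracting it. The sketch below is one way to do that with Python's `hashlib`; the function names and the local file path are assumptions made for illustration, and the file is read in chunks so the whole archive is never held in memory.

```python
import hashlib

# Checksum copied from the file listing above.
EXPECTED_MD5 = "fae810faf99499dc7dcc27b66974f0b6"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal md5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's md5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("MicroEuk_v3.tar.gz")` returns `True` only when the local copy matches the published checksum.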
