VEBA Microeukaryotic Protein Database (MicroEuk100/90/50, Version 3)
Description
A microeukaryotic protein database for VEBA, consisting of protein sequences from protists and fungi.
Number of sequences:
* MicroEuk100 = 79,920,431 (19 GB)
* MicroEuk90 = 51,767,730 (13 GB)
* MicroEuk50 = 29,898,853 (6.5 GB)
Number of source organisms per dataset:
* MycoCosm = 2503
* PhycoCosm = 174
* EnsemblProtists = 233
* MMETSP = 759
* TARA_SAGv1 = 8
* EukProt = 366
* EukZoo = 27
* TARA_SMAGv1 = 389
* NR_Protists-Fungi = 48217
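As a quick consistency check, the per-dataset counts above can be summed and compared with the 52,676 source organisms quoted for MicroEuk100.faa.gz in the file listing. This is only an arithmetic sketch; the dictionary name `source_organisms` is introduced here for illustration.

```python
# Source-organism counts per dataset, copied from the list above.
# The variable names are illustrative and not part of the database.
source_organisms = {
    "MycoCosm": 2503,
    "PhycoCosm": 174,
    "EnsemblProtists": 233,
    "MMETSP": 759,
    "TARA_SAGv1": 8,
    "EukProt": 366,
    "EukZoo": 27,
    "TARA_SMAGv1": 389,
    "NR_Protists-Fungi": 48217,
}

# Total number of source organisms across all datasets.
total = sum(source_organisms.values())
print(f"{total:,}")  # prints 52,676
```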
Files:
MicroEuk_v3.tar.gz = 25 GB
-rw-rw---- 1 jespinoz jcl110 19G Nov 15 14:57 MicroEuk100.faa.gz - Main FASTA file with 79,920,431 protein sequences from 52,676 source organisms. Uses MD5 hash identifiers.
-rw-rw---- 1 jespinoz jcl110 2.0G Nov 15 14:59 identifier_mapping.proteins.tsv.gz - Protein identifier mappings between datasets, original identifiers, source organisms, and md5 hash identifiers.
-rw-rw---- 1 jespinoz jcl110 1.7G Nov 15 16:10 MicroEuk90_clusters.tsv.gz - MMseqs2 clustering of MicroEuk100 (yields MicroEuk90)
-rw-rw---- 1 jespinoz jcl110 1.5G Nov 15 14:57 MicroEuk100.list.gz - List of md5 hash protein identifiers in MicroEuk100
-rw-rw---- 1 jespinoz jcl110 1.1G Nov 15 16:10 MicroEuk50_clusters.tsv.gz - MMseqs2 clustering of MicroEuk90 (yields MicroEuk50)
-rw-rw---- 1 jespinoz jcl110 13M Nov 15 23:39 MicroEuk100.eukaryota_odb10.list.gz - Identifiers of MicroEuk100 proteins that hit BUSCO's eukaryota_odb10 markers at the provided score thresholds
-rw-rw---- 1 jespinoz jcl110 1.5M Nov 15 14:58 source_taxonomy.tsv.gz - Source taxonomy, lineage, dataset, and notes for each source organism
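The cluster tables can be combined with the main FASTA file to extract a clustered subset. The following is a minimal sketch, not the VEBA implementation. It assumes that each cluster table is a headerless, tab-separated file whose first column is the cluster representative and whose second column is a member (the usual MMseqs2 TSV layout), and that the first token of each FASTA header is the MD5 identifier. The function names are hypothetical.

```python
import gzip


def read_representatives(clusters_path):
    """Collect representative identifiers (first column) from a cluster table.

    Assumes a gzipped, headerless TSV: representative<TAB>member.
    """
    representatives = set()
    with gzip.open(clusters_path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue
            representatives.add(line.split("\t", 1)[0])
    return representatives


def subset_fasta(fasta_path, keep_ids, output_path):
    """Copy records whose identifier is in keep_ids; return how many were written.

    Assumes the identifier is the first whitespace-delimited token of the header.
    """
    written = 0
    keep = False
    with gzip.open(fasta_path, "rt") as src, gzip.open(output_path, "wt") as dst:
        for line in src:
            if line.startswith(">"):
                parts = line[1:].split()
                identifier = parts[0] if parts else ""
                keep = identifier in keep_ids
                written += keep
            if keep:
                dst.write(line)
    return written
```

For example, `subset_fasta("MicroEuk100.faa.gz", read_representatives("MicroEuk90_clusters.tsv.gz"), "MicroEuk90.faa.gz")` would write the representatives to a new file; the output name is hypothetical. Holding tens of millions of identifiers in a Python set is likely to need several gigabytes of memory, so a dedicated tool may be preferable at full scale.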
For more information and citations, please visit the main GitHub repository:
Download size: 26.2 GB
Checksum: md5:fae810faf99499dc7dcc27b66974f0b6
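The MD5 checksum listed for the download can be used to confirm that the archive arrived intact. The sketch below uses Python's standard `hashlib` and reads the file in chunks so that a multi-gigabyte archive does not have to fit in memory. It assumes the checksum applies to the downloaded archive as a whole; the function names are hypothetical.

```python
import hashlib

# Checksum copied from this record.
EXPECTED_MD5 = "fae810faf99499dc7dcc27b66974f0b6"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download("MicroEuk_v3.tar.gz")` should return `True` for an intact copy, assuming the archive was saved locally under that name.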