MarFERReT: an open-source, version-controlled reference library of marine microbial eukaryote functional genes
Description
The emerging field of environmental metatranscriptomics generates large volumes of sequence data about actively transcribed genes in natural environments, and taxonomic annotation of these sequences depends on curated reference sequences. For marine microbial eukaryotes, current reference libraries are limited by gaps in sequenced organism diversity and by barriers to updating libraries with new sequence data, so that only approximately half of eukaryotic environmental transcripts can be annotated. Here, we introduce Marine Functional EukaRyotic Reference Taxa (MarFERReT), an updated marine microbial eukaryotic sequence library with version-controlled contents, designed for taxonomic annotation of eukaryotic metatranscriptomes. MarFERReT contains over 30 million protein sequences from 899 marine eukaryotic genomes and transcriptomes, covering 503 species and 323 genera. Continued expansion of MarFERReT as new reference sequences become available will enable up-to-date taxonomic annotations into the future.
Please see the MarFERReT GitHub repository for full code, documentation and updates:
https://github.com/armbrustlab/marferret
Files
MarFERReT.v1.metadata.csv
Total size of all files: 6.1 GB
MD5 checksum | Size |
---|---|
66e56f4bf64f2e867f12a149815a4c50 | 228.7 kB |
8dfed13849cc2b97c4dafa6bb925c568 | 655.7 MB |
9df5de32ba5a62aed449634acf48d4a5 | 5.1 GB |
141e1071d80dcdbf031f86d149097d9c | 271.0 MB |
9305b80a83087e0e4ed344b24d8c3f31 | 99.6 MB |
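The MD5 checksums listed for each file can be used to confirm that a download completed without corruption. The following is a minimal sketch in Python, assuming the file has already been saved locally; the function names `md5_of_file` and `matches_checksum` are illustrative and are not part of MarFERReT or its repository.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file.

    The file is read in chunks (1 MiB by default) so that multi-gigabyte
    downloads do not need to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with a listed value.

    An optional 'md5:' prefix on the expected value is ignored, and the
    comparison is case-insensitive.
    """
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum(local_path, "md5:66e56f4bf64f2e867f12a149815a4c50")` would return `True` if the file at `local_path` (a hypothetical location chosen by the user) is an intact copy of the 228.7 kB file listed above.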