There is a newer version of the record available.

Published November 26, 2019 | Version v1
Dataset Open

Diamond database for taxonomic annotation of fungal metatranscriptomics

Authors/Creators

  • 1. NBIS

Description

This is a protein FASTA dataset for use with DIAMOND. The fasta.gz file contains protein sequences from the following sources:

  • 1,164 genomes downloaded from JGI (taxonomy_taxids.tsv)
  • 121 genomes that are part of the taxmapper database (taxmapper_taxonomy_taxids.tsv), of which 6 are fungal
  • the Hygrophorus russula MG78 genome downloaded from NCBI.

For the H. russula genome, genes were predicted using Augustus (v. 3.2.3) with the laccaria_bicolor model.

The final protein database consists of a total of 17,694,143 protein sequences (14,976,193 from JGI, 2,708,401 from taxmapper and 9,549 from H. russula).
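As a quick sanity check after downloading, the number of sequences in the compressed FASTA file can be counted and compared with the total stated above. This is a minimal sketch: the function name count_fasta_records is hypothetical, and it assumes every record starts with a single '>' header line.

```shell
# Count FASTA records (header lines starting with '>') in a gzip-compressed file.
# count_fasta_records is an illustrative helper, not part of the dataset.
count_fasta_records() {
    gzip -dc "$1" | grep -c '^>'
}

# Usage (assumes fasta.gz is in the current directory):
#   count_fasta_records fasta.gz
```

If the download is complete, the count should agree with the 17,694,143 sequences given in the description.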

The FASTA file and the associated taxonomic information files (nodes.dmp.gz and taxonmap.gz) can be used to build a DIAMOND database compatible with DIAMOND version 0.9.22. The command below refers to an uncompressed nodes.dmp, so decompress nodes.dmp.gz first:

zcat fasta.gz | diamond makedb -d diamond --taxonmap taxonmap.gz --taxonnodes nodes.dmp
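The build step can also be wrapped in a small function that handles the compressed nodes file. This is a sketch under assumptions: the function name build_fungal_db is hypothetical, the default file names are the ones mentioned in the description, and the nodes file is assumed to need decompressing because the command above refers to nodes.dmp rather than nodes.dmp.gz.

```shell
# Build the DIAMOND database from the downloaded files.
# build_fungal_db is an illustrative helper, not part of the dataset.
# Optional arguments (defaults follow the file names in the description):
#   $1 protein FASTA (gzip)   $2 taxon map (gzip)
#   $3 nodes file (gzip)      $4 database prefix
build_fungal_db() {
    fasta="${1:-fasta.gz}"
    taxonmap="${2:-taxonmap.gz}"
    nodes_gz="${3:-nodes.dmp.gz}"
    db="${4:-diamond}"
    nodes="${nodes_gz%.gz}"

    # Decompress the nodes file next to the archive, keeping the original.
    gzip -dc "$nodes_gz" > "$nodes" || return 1

    # Same command as in the description, with the file names parameterised.
    gzip -dc "$fasta" | diamond makedb -d "$db" --taxonmap "$taxonmap" --taxonnodes "$nodes"
}

# Usage, from the directory containing the downloaded files:
#   build_fungal_db
```

Using gzip -dc instead of zcat avoids differences in how zcat treats the .gz suffix on some systems; the data passed to diamond is the same.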

 

Files (4.5 GB)

Checksum | Size
md5:dd76914ffebaecad62d48aaff4313ebb | 4.3 GB
md5:18565b0d24fd8693084ec12e7cf0300e | 10.7 MB
md5:bf5a0c7f013380268b1b5c7899a26c96 | 13.1 kB
md5:f3ba4bac9aeec6f9f8f88c95b9e65a43 | 124.0 MB
md5:10581d606ded5b94732fbaf3165e9f1c | 130.9 kB
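The md5 values listed above can be used to confirm that a download is intact. The sketch below assumes GNU md5sum is available; the function name verify_md5 is hypothetical, and the file to check has to be matched to its checksum by the user, since the listing does not pair file names with checksums.

```shell
# Compare a file's MD5 digest with an expected value; returns 0 on a match.
# verify_md5 is an illustrative helper, not part of the dataset.
verify_md5() {
    file="$1"
    expected="${2#md5:}"    # accept values with or without the "md5:" prefix
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}

# Usage (replace DOWNLOADED_FILE with the file that belongs to this checksum):
#   verify_md5 DOWNLOADED_FILE md5:dd76914ffebaecad62d48aaff4313ebb
```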