DiazoTIME Database: a metabolically-resolved reference database of nitrogen-fixing microbial genomes
Authors/Creators
1. Hamilton College
2. University of Minnesota, Duluth
3. Georgia Institute of Technology
4. Linnaeus University
5. US Geological Survey
6. Massachusetts Institute of Technology
7. Lawrence Berkeley National Laboratory
8. St. Catherine University
9. Michigan Technological University
10. Baylor University
Description
The Diazotroph Taxonomic Identity and MEtabolism (DiazoTIME) database contains taxonomic annotations and metabolic predictions for the 2798 genomes in the Genome Taxonomy Database (GTDB; r214; Parks et al. 2022) that contain all three nitrogenase genes nifH, nifD, and nifK. This database provides a reference for studies of diazotroph biodiversity, environmental distribution, and functional potential.
Table of contents
| Description | File Name |
|---|---|
| Genome metadata (accession number, taxonomy, metabolic prediction) | DiazoTIME_GTDBr214_taxonomy_and_METABOLIC.xlsx |
| List of genomes from GTDB r214 with all three nif genes (nifH, nifD, nifK) | GTDB_r214_AnnoTree_genome_Nifs_N2fixation_potential.xlsx |
| METABOLIC program output | METABOLIC_raw_outputs.xlsx |
| nifH, nifD, nifK nucleotide sequences | gtdb_r214_nifHDK_with_tax.fna.zip |
| nifH, nifD, nifK amino acid sequences | gtdb_r214_nifHDK_with_tax.faa.zip |
| Full-genome nucleotide sequences | gtdb_diazotroph_genome_full_fnas.tar.gz |
| Dictionary linking NCBI and GTDB accessions | combined_gtdb_r214_genome_contigs_dict.txt |
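The nifH, nifD, and nifK sequence files listed above are distributed as zipped FASTA. The sketch below reads every FASTA member of such an archive into a dictionary keyed by header line. It assumes the archives contain plain-text FASTA and makes no assumption about how the taxonomy is laid out in the headers; the helper names `iter_fasta` and `read_zipped_fasta` are hypothetical and not part of the dataset.

```python
import zipfile
from typing import Dict, Iterable, Iterator, Tuple


def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_zipped_fasta(source) -> Dict[str, str]:
    """Read every FASTA member of a zip archive into {header: sequence}.

    `source` may be a path or a binary file-like object (anything zipfile accepts).
    Later records with a duplicate header overwrite earlier ones.
    """
    records: Dict[str, str] = {}
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            # Skip directory entries and macOS resource-fork metadata.
            if name.endswith("/") or name.startswith("__MACOSX/"):
                continue
            text = archive.read(name).decode("utf-8")
            records.update(iter_fasta(text.splitlines()))
    return records
```

For example, `read_zipped_fasta("gtdb_r214_nifHDK_with_tax.faa.zip")` would return the amino acid sequences keyed by their full header lines, which can then be split according to whatever header convention the files actually use.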
Methods
Database assembly, curation, and taxonomic annotation of genomes
To create a reference database for classifying environmental sequences, we compiled putatively diazotrophic genomes from the Genome Taxonomy Database (r214; Parks et al. 2022). Species-representative genomes containing any of nifH, nifD, or nifK were identified with AnnoTree (Mendler et al. 2019). Because many genomes that carry nifH (or homologous genes) contain no other nif genes (Mise et al. 2021), genomes lacking any of the three nifHDK genes were not considered “true” diazotrophs and were discarded. The remaining 2798 genomes (3.3% of GTDB representative genomes) carry the full suite of nifHDK genes and were assumed to be capable of N2 fixation; together they form the “DiazoTIME” database.
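The retention rule above amounts to a set-containment test: a genome is kept only if its detected nif genes include all of nifH, nifD, and nifK. A minimal sketch, assuming the per-genome gene calls have already been collected into a mapping from (hypothetical) genome IDs to gene names; parsing AnnoTree exports is not shown, and `select_diazotrophs` is an illustrative name rather than part of the authors' pipeline.

```python
from typing import Dict, Iterable, List

# The three nitrogenase structural genes a genome must carry to be retained.
REQUIRED_NIF_GENES = frozenset({"nifH", "nifD", "nifK"})


def select_diazotrophs(gene_hits: Dict[str, Iterable[str]]) -> List[str]:
    """Return, sorted, the genome IDs whose detected genes include nifH, nifD and nifK.

    `gene_hits` maps a genome ID to the nif genes detected in it (a hypothetical
    input format). Genomes with only a subset, e.g. nifH alone, are discarded.
    """
    return sorted(
        genome_id
        for genome_id, genes in gene_hits.items()
        if REQUIRED_NIF_GENES.issubset(genes)
    )
```

With hypothetical input `{"genome_A": {"nifH", "nifD", "nifK"}, "genome_B": {"nifH"}}`, only `genome_A` is retained, mirroring how nifH-only genomes were excluded.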
To assess the metabolic capabilities of these diazotrophs, we annotated the metabolic genes of each genome with METABOLIC v4 (Zhou et al. 2022). METABOLIC identifies key functional pathways by aggregating the results of genome searches against hidden Markov models (HMMs) from KOfam (Kanehisa et al. 2023), TIGRFAM (Li et al. 2021), and a set of custom models. We used these gene annotations to assign each genome to broad metabolic categories defined by energy production and carbon source.
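Assigning genomes to broad categories from gene annotations can be expressed as a small rule table. The sketch below assumes a category is assigned when every gene in at least one of its marker sets is detected. The two rules shown (photosystem II genes psbA and psbD; RuBisCO genes rbcL and rbcS) are hypothetical illustrations only and are NOT the criteria used by METABOLIC or by the DiazoTIME authors; the function names are likewise hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical, illustrative rules: category -> alternative marker sets.
# A category is assigned if ALL genes of ANY one marker set are detected.
# These are NOT the METABOLIC or DiazoTIME classification criteria.
CATEGORY_RULES: Dict[str, List[FrozenSet[str]]] = {
    "oxygenic_photosynthesis": [frozenset({"psbA", "psbD"})],
    "CBB_carbon_fixation": [frozenset({"rbcL", "rbcS"})],
}


def categorize(detected: Set[str]) -> List[str]:
    """Return the sorted categories whose rule is satisfied by the detected genes."""
    return sorted(
        category
        for category, marker_sets in CATEGORY_RULES.items()
        if any(markers <= detected for markers in marker_sets)
    )


def categorize_all(annotations: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Apply categorize() to every genome in a genome-ID -> detected-genes mapping."""
    return {genome: categorize(genes) for genome, genes in annotations.items()}
```

Keeping the rules as data rather than code makes it easy to swap in a real rule set without changing the classification logic.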
References
Kanehisa M, Furumichi M, Sato Y, Kawashima M, Ishiguro-Watanabe M. 2023. KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Research 51:D587–D592.
Li W, O’Neill KR, Haft DH, DiCuccio M, Chetvernin V, Badretdin A, Coulouris G, Chitsaz F, Derbyshire MK, Durkin AS, Gonzales NR, Gwadz M, Lanczycki CJ, Song JS, Thanki N, Wang J, Yamashita RA, Yang M, Zheng C, Marchler-Bauer A, Thibaud-Nissen F. 2021. RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation. Nucleic Acids Research 49:D1020–D1028.
Mendler K, Chen H, Parks DH, Lobb B, Hug LA, Doxey AC. 2019. AnnoTree: visualization and exploration of a functionally annotated microbial tree of life. Nucleic Acids Research 47:4442–4448. doi:10.1093/nar/gkz246.
Mise K, Masuda Y, Senoo K, Itoh H. 2021. Undervalued pseudo-nifH sequences in public databases distort metagenomic insights into biological nitrogen fixers. mSphere 6:e00785-21.
Parks DH, Chuvochina M, Rinke C, Mussig AJ, Chaumeil P-A, Hugenholtz P. 2022. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research 50:D785–D794.
Zhou Z, Tran PQ, Breister AM, Liu Y, Kieft K, Cowley ES, Karaoz U, Anantharaman K. 2022. METABOLIC: high-throughput profiling of microbial genomes for functional traits, metabolism, biogeochemistry, and community-scale functional networks. Microbiome 10:33.
Files
Total size: 4.5 GB

| Checksum | Size |
|---|---|
| md5:8521a00d1d86b3e444e74bc26f107632 | 642.4 MB |
| md5:aee4a4fffd84269d0d28e659b50c336c | 305.6 kB |
| md5:8a01f14e2357d33508bbc475a0f052a8 | 3.8 GB |
| md5:48ecc5b43b1fb757a99194e2fd6334c4 | 427.3 kB |
| md5:fc8ee9fcabc44576c420b2f9035451cc | 13.0 MB |
| md5:d56a7be676090f9d11586c354f88ba95 | 25.8 MB |
| md5:de115d2eb88a9768bae49cc0bf5b2a7c | 40.2 MB |
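Each file is listed with an MD5 checksum, so a download can be checked for corruption before use. A minimal sketch using Python's standard `hashlib`; the file is read in chunks so that multi-gigabyte archives do not need to fit in memory. The helper names are hypothetical.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: str, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' (or bare hex)."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

To use it, pass the path of a downloaded file together with the checksum string exactly as it appears in the table; a `False` result indicates the file should be downloaded again.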
Additional details
Funding
- U.S. National Science Foundation
  - Aquatic N2-Fixation Research Coordination Network (ANF-RCN) 2015825