Published May 14, 2025 | Version 1
Dataset Open

DiazoTIME Database: a metabolically-resolved reference database of nitrogen-fixing microbial genomes

  • 1. Hamilton College
  • 2. University of Minnesota, Duluth
  • 3. Georgia Institute of Technology
  • 4. Linnaeus University
  • 5. US Geological Survey
  • 6. Massachusetts Institute of Technology
  • 7. Lawrence Berkeley National Laboratory
  • 8. St. Catherine University
  • 9. Michigan Technological University
  • 10. Baylor University

Description

The Diazotroph Taxonomic Identity and MEtabolism (DiazoTIME) database contains annotated taxonomy and metabolic predictions for the 2798 genomes in the Genome Taxonomy Database (GTDB r214; Parks et al. 2022) that contain nifH, nifD, and nifK. It is intended as a reference for studies of diazotroph biodiversity, environmental distribution, and functional potential.

 

Table of contents (description: file name)

  • Genome metadata (accession number, taxonomy, metabolic prediction): DiazoTIME_GTDBr214_taxonomy_and_METABOLIC.xlsx
  • List of genomes from GTDB r214 with all three nif genes (nifH, nifD, nifK): GTDB_r214_AnnoTree_genome_Nifs_N2fixation_potential.xlsx
  • METABOLIC program output: METABOLIC_raw_outputs.xlsx
  • nifH, nifD, and nifK nucleotide sequences: gtdb_r214_nifHDK_with_tax.fna.zip
  • nifH, nifD, and nifK amino acid sequences: gtdb_r214_nifHDK_with_tax.faa.zip
  • Full genome nucleotide sequences: gtdb_diazotroph_genome_full_fnas.tar.gz
  • Dictionary linking NCBI and GTDB accessions: combined_gtdb_r214_genome_contigs_dict.txt
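The sequence files are distributed as zipped FASTA. The following is a minimal Python sketch for loading one into memory, not a tool shipped with the dataset. It assumes each archive member is plain-text FASTA, and because the header layout of the "_with_tax" files is not described on this record, it keeps the full header line as the key. The function name read_fasta_from_zip is hypothetical.

```python
import io
import zipfile


def read_fasta_from_zip(zip_source):
    """Parse every FASTA member of a zip archive into {header: sequence}.

    Assumption: members are plain-text FASTA. The header format is not
    documented here, so the whole line after '>' is kept verbatim as the key.
    """
    records = {}
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            # Skip directory entries and macOS resource-fork metadata.
            if member.endswith("/") or member.startswith("__MACOSX/"):
                continue
            with archive.open(member) as handle:
                header = None
                chunks = []
                for raw in io.TextIOWrapper(handle, encoding="utf-8"):
                    line = raw.strip()
                    if not line:
                        continue
                    if line.startswith(">"):
                        # A new header closes the previous record.
                        if header is not None:
                            records[header] = "".join(chunks)
                        header = line[1:]
                        chunks = []
                    else:
                        chunks.append(line)
                # Store the last record of this member.
                if header is not None:
                    records[header] = "".join(chunks)
    return records
```

For example, read_fasta_from_zip("gtdb_r214_nifHDK_with_tax.faa.zip") would return the amino acid sequences keyed by header. Identical headers in different archive members would overwrite one another, so a list of (header, sequence) pairs may be safer if duplicates are possible.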

Methods

Database assembly, curation, and taxonomic annotation of genomes

To create a reference database for classifying environmental sequences, we constructed a database of putatively diazotrophic genomes from the Genome Taxonomy Database (GTDB r214; Parks et al. 2022). Species-representative genomes containing any of nifH, nifD, or nifK were identified with AnnoTree (Mendler et al. 2019). Because many genomes that carry nifH (or homologous genes) lack the other nif genes (Mise et al. 2021), genomes missing any of nifH, nifD, or nifK were assumed not to be “true” diazotrophs and were discarded. The remaining 2798 genomes (3.3% of GTDB representative genomes) carry the full nifHDK suite and were assumed to be capable of N2 fixation; together they form the “DiazoTIME” database.
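The completeness filter described above amounts to a set test per genome. This sketch illustrates that rule and is not the authors' pipeline. It assumes the AnnoTree results have been flattened into (genome accession, gene name) hit pairs. The function name and the accessions used with it are hypothetical.

```python
from collections import defaultdict

# The three structural nitrogenase genes every retained genome must carry.
REQUIRED_NIF = frozenset({"nifH", "nifD", "nifK"})


def select_putative_diazotrophs(hits):
    """Return sorted accessions whose hits include all of nifH, nifD and nifK.

    hits: iterable of (genome_accession, gene_name) pairs, one per gene hit
    (a hypothetical flattened form of an AnnoTree query result).
    """
    genes_by_genome = defaultdict(set)
    for accession, gene in hits:
        genes_by_genome[accession].add(gene)
    # Genomes with only a subset (e.g. a lone nifH homolog) are dropped.
    return sorted(
        accession
        for accession, genes in genes_by_genome.items()
        if REQUIRED_NIF <= genes
    )
```

Genomes that appear in the hit list with one or two of the genes are the ones this step discards; extra nif genes (e.g. nifE) do not affect the result.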

To assess the metabolic capabilities of these diazotrophs, we used METABOLIC v4 (Zhou et al. 2022) to annotate the metabolic genes in each genome. METABOLIC identifies key functional pathways by aggregating the results of genome searches with hidden Markov models (HMMs) from KOfam (Kanehisa et al. 2023), TIGRFAMs (Li et al. 2021), and selected custom models. These gene annotations were used to assign genomes to broad metabolic categories focused on energy production and carbon sources.
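The step from gene annotations to broad categories can be pictured as rule matching over the functions reported as present in each genome. This sketch assumes a simple input of genome accession mapped to a set of present functions and uses placeholder rules (CATEGORY_RULES). Those rules, the function names, and the function labels are hypothetical and are not the categories or marker sets used for DiazoTIME. The actual predictions are in DiazoTIME_GTDBr214_taxonomy_and_METABOLIC.xlsx.

```python
# Hypothetical placeholder rules: category -> functions that must ALL be present.
# These are NOT the rules used to build DiazoTIME.
CATEGORY_RULES = {
    "candidate_phototroph": frozenset({"phototrophy"}),
    "candidate_autotroph": frozenset({"carbon_fixation"}),
    "candidate_lithotroph": frozenset({"hydrogen_oxidation"}),
}


def categorize(present_functions, rules=CATEGORY_RULES):
    """Assign each genome every category whose required functions are all present.

    present_functions: hypothetical mapping {genome_accession: iterable of
    function names flagged as present in a METABOLIC-style summary}.
    Genomes that match no rule are labelled "unassigned".
    """
    assignments = {}
    for genome, functions in present_functions.items():
        present = set(functions)
        matched = sorted(
            category
            for category, required in rules.items()
            if required <= present
        )
        assignments[genome] = matched or ["unassigned"]
    return assignments
```

Allowing several categories per genome (rather than forcing one) reflects that a single genome can encode more than one energy or carbon strategy.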

 

References

Kanehisa M, Furumichi M, Sato Y, Kawashima M, Ishiguro-Watanabe M. 2023. KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Research 51:D587–D592.

Li W, O’Neill KR, Haft DH, DiCuccio M, Chetvernin V, Badretdin A, Coulouris G, Chitsaz F, Derbyshire MK, Durkin AS, Gonzales NR, Gwadz M, Lanczycki CJ, Song JS, Thanki N, Wang J, Yamashita RA, Yang M, Zheng C, Marchler-Bauer A, Thibaud-Nissen F. 2021. RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation. Nucleic Acids Research 49:D1020–D1028.

Mendler K, Chen H, Parks DH, Lobb B, Hug LA, Doxey AC. 2019. AnnoTree: visualization and exploration of a functionally annotated microbial tree of life. Nucleic Acids Research 47:4442–4448. doi:10.1093/nar/gkz246.

Mise K, Masuda Y, Senoo K, Itoh H. 2021. Undervalued pseudo-nifH sequences in public databases distort metagenomic insights into biological nitrogen fixers. mSphere 6:e00785-21.

Parks DH, Chuvochina M, Rinke C, Mussig AJ, Chaumeil P-A, Hugenholtz P. 2022. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research 50:D785–D794.

Zhou Z, Tran PQ, Breister AM, Liu Y, Kieft K, Cowley ES, Karaoz U, Anantharaman K. 2022. METABOLIC: high-throughput profiling of microbial genomes for functional traits, metabolism, biogeochemistry, and community-scale functional networks. Microbiome 10:33.

Files (4.5 GB)
  • md5:8521a00d1d86b3e444e74bc26f107632 (642.4 MB)
  • md5:aee4a4fffd84269d0d28e659b50c336c (305.6 kB)
  • md5:8a01f14e2357d33508bbc475a0f052a8 (3.8 GB)
  • md5:48ecc5b43b1fb757a99194e2fd6334c4 (427.3 kB)
  • md5:fc8ee9fcabc44576c420b2f9035451cc (13.0 MB)
  • md5:d56a7be676090f9d11586c354f88ba95 (25.8 MB)
  • md5:de115d2eb88a9768bae49cc0bf5b2a7c (40.2 MB)
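Each file on this record has an MD5 checksum, so a download can be checked before use. The following is a minimal Python sketch using the standard-library hashlib module. The helper names are hypothetical, and the "md5:<hex>" parsing assumes the format shown in the file listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks
    so multi-gigabyte archives never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, listed):
    """Compare a file against a checksum given as 'md5:<hex>' (or bare hex)."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Call verify_md5 with the path of a downloaded file and the checksum listed for it on the record. A False result suggests an incomplete or corrupted download.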

Additional details

Funding

U.S. National Science Foundation
Aquatic N2-Fixation Research Coordination Network (ANF-RCN) 2015825