Dataset Open Access

Hg-MATE-Db v1.01142021: Hg-cycling Microorganisms in Aquatic and Terrestrial Ecosystems Database

Caitlin M Gionfriddo; Eric Capo; Benjamin D Peterson; Heyu Lin; Daniel S Jones; Andrea Garcia Bravo; Stefan Bertilsson; John Moreau; Katherine McMahon; Dwayne A Elias; Cynthia Gilmour

Microorganisms play a significant role in regulating the form and fate of mercury (Hg) in aquatic and terrestrial ecosystems. Microbes with the hgcAB gene pair can produce a more toxic and bioaccumulative form of Hg, methylmercury (MeHg). Microbes that possess the mer operon can demethylate and/or reduce Hg species as part of a detoxification mechanism. Improved techniques for capturing hgcAB and mer presence and diversity are necessary for identifying the major microbial players in environmental Hg cycling. The primary goal of the Hg-MATE database is to provide an up-to-date collated resource of Hg-cycling genes from pure culture and environmental microbial genomes and meta-omic datasets. The current version contains an hgcAB dataset with resources for identifying key microbial producers of the toxin MeHg. Future versions will include a mer gene dataset, which will contain resources for identifying genes of the mer operon that encode demethylation of organomercurials (merB), reduction of inorganic Hg(II) (merA), operon regulation (merR), and Hg transport across the cell (merTPC).

The Hg-MATE database v1.01142021 contains:

* A catalogue of 1053 HgcAB amino acid sequences. HgcAB amino acid sequences are categorized into four types depending on whether they were encoded in pure culture/environmental microbial isolates (ISO), single-cell genome sequences (CEL), metagenome-assembled genomes (MAG), or environmental meta-omic contigs (CON). Included in the database are amino acid sequences of HgcA, HgcB, and concatenated HgcA and HgcB. If hgcB is not co-localized with hgcA in the genome and/or cannot be identified, then ‘na’ is listed in the ‘HgcB’ sequence column. We collated the HgcAB databases from Gionfriddo et al. 2020 and McDaniel et al. 2020 and added HgcAB amino acid sequences pulled from three public data repositories: NCBI GenBank, JGI-IMG GOLD and GTDB release 89 obtained on 23 October 2020. HgcAB amino acid sequences were identified in these databases by hmmsearch with HgcA and HgcB HMM profiles from Gionfriddo et al. 2020.

Other resources generated from pure culture/environmental isolates, single-cell genome sequences, and metagenome-assembled genomes (‘ISOCELMAG’) include:

* FASTA files containing amino acid sequences of HgcA (‘_HgcA.fas’), HgcB (‘_HgcB.fas’), and concatenated HgcA-HgcB sequences (‘_Hgc.fas’). FASTA files with both unaligned and aligned (msa) amino acid sequences are provided.
* Hidden Markov model (HMM) profiles for HgcA (‘_HgcA.hmm’), HgcB (‘_HgcB.hmm’), and concatenated HgcA-HgcB sequences (‘_Hgc.hmm’). These HMM profiles can be used to detect homologs of hgc genes in meta-omics datasets.
* Reference packages that can be used to identify and classify: 1) the cap-helix encoding region of HgcA (‘_HgcA_CH.refpkg’); for example, in Desulfovibrio desulfuricans ND132 this encompasses the CdhD-like encoding region, sites ~37-156 of HgcA (https://www.uniprot.org/uniprot/F0JBF0); 2) full HgcA (‘_HgcA_Full.refpkg’); and 3) concatenated HgcA and HgcB (‘_HgcA-HgcB.refpkg’). Each reference package contains sequence alignments, an HMM model, a phylogenetic tree, and NCBI taxonomy.
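As an illustration of how the HMM profiles might be used, the sketch below filters the tabular output that hmmsearch writes with its --tblout option, for example from a command such as `hmmsearch --tblout hits.tbl <profile>.hmm proteins.faa`. The input lines, the protein names, and the 1e-5 E-value cutoff are hypothetical and chosen only for the example; they are not thresholds recommended by the Hg-MATE authors.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    target: str    # name of the matched protein sequence
    query: str     # name of the HMM profile
    evalue: float  # full-sequence E-value
    score: float   # full-sequence bit score


def parse_tblout(lines: Iterable[str], max_evalue: float = 1e-5) -> List[Hit]:
    """Parse hmmsearch --tblout lines, keeping hits with E-value <= max_evalue.

    The default cutoff is an illustrative assumption, not an Hg-MATE setting.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        # tblout columns: 0 target name, 1 target accession, 2 query name,
        # 3 query accession, 4 full-sequence E-value, 5 full-sequence score
        hit = Hit(fields[0], fields[2], float(fields[4]), float(fields[5]))
        if hit.evalue <= max_evalue:
            hits.append(hit)
    return hits


# Hypothetical tblout lines, made up for this example.
example = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "contig_1_gene_3 - HgcA - 1.2e-80 265.3 0.1 1.5e-80 265.0 0.1 1.0 1 0 0 1 1 1 1 -",
    "contig_7_gene_9 - HgcA - 0.5 8.1 0.0 0.7 7.6 0.0 1.2 1 0 0 1 1 0 0 -",
]
hits = parse_tblout(example)
for hit in hits:
    print(hit.target, hit.query, hit.evalue, hit.score)
```

In practice the lines would come from an open file handle on the --tblout file rather than from a list, and any cutoff should be validated against known HgcA and HgcB sequences.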

We recommend using the resources provided by Hg-MATE (HMM profiles and reference packages) to detect and taxonomically identify hgcAB genes in metagenomes, metatranscriptomes and MAGs. The detection, counting and taxonomic identification of hgcAB genes can be done from raw fastq files with marky-coco, introduced in Capo et al. 2022. Briefly, the metagenomes are trimmed and cleaned using fastp (Chen et al. 2018) with the following parameters: -q 30 -l 25 --detect_adapter_for_pe --trim_poly_g --trim_poly_x. A de novo single-assembly approach is applied using the assembler megahit 1.1.2 (Li et al. 2016) with default settings. Prokaryotic protein-coding genes are predicted on the contigs with prodigal 2.6.3 (Hyatt et al. 2010). The DNA reads are mapped against the contigs with bowtie2 (Langmead and Salzberg 2012), and the resulting .sam files are converted to .bam files using samtools 1.9 (Li et al. 2009). The .bam files and the prodigal output .gff file are used to estimate read counts with featureCounts (Liao et al. 2014). Then, hgcAB homologs are detected and taxonomically identified using the in-house script genesearch.sh together with the HMM profiles and the reference package ‘hgcA’ from the database Hg-MATE version 1.
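The steps above can be sketched as a list of command lines. This is not the marky-coco implementation: only the fastp parameters are taken from the description, while the file names, the prodigal `-p meta` mode, the featureCounts options (`-p`, `-t CDS`, `-g ID`) and the closing hmmsearch call are assumptions made for illustration. The in-house genesearch.sh script is not reproduced; the hmmsearch step stands in for the detection part only and does no taxonomic placement. The code builds and prints the commands without running them.

```python
import shlex
from pathlib import Path
from typing import List


def build_commands(r1: str, r2: str, outdir: str, hmm: str) -> List[List[str]]:
    """Return the command lines for one paired-end metagenome (nothing is run)."""
    out = Path(outdir)
    t1 = str(out / "trimmed_R1.fastq.gz")   # hypothetical output names
    t2 = str(out / "trimmed_R2.fastq.gz")
    assembly = str(out / "assembly")        # megahit creates this directory
    contigs = str(out / "assembly" / "final.contigs.fa")
    proteins = str(out / "proteins.faa")
    gff = str(out / "genes.gff")
    index = str(out / "contigs_index")
    sam = str(out / "mapped.sam")
    bam = str(out / "mapped.bam")
    return [
        # Trimming and cleaning; the parameters after -O are from the text.
        ["fastp", "-i", r1, "-I", r2, "-o", t1, "-O", t2,
         "-q", "30", "-l", "25", "--detect_adapter_for_pe",
         "--trim_poly_g", "--trim_poly_x"],
        # De novo assembly with default settings.
        ["megahit", "-1", t1, "-2", t2, "-o", assembly],
        # Gene prediction; "-p meta" is an assumption, not from the text.
        ["prodigal", "-i", contigs, "-a", proteins, "-o", gff,
         "-f", "gff", "-p", "meta"],
        # Read mapping against the contigs.
        ["bowtie2-build", contigs, index],
        ["bowtie2", "-x", index, "-1", t1, "-2", t2, "-S", sam],
        # .sam to .bam conversion.
        ["samtools", "view", "-b", "-o", bam, sam],
        # Read counts per predicted gene; -p/-t/-g values are assumptions.
        ["featureCounts", "-p", "-t", "CDS", "-g", "ID",
         "-a", gff, "-o", str(out / "counts.tsv"), bam],
        # Stand-in for the detection step of genesearch.sh (assumption).
        ["hmmsearch", "--tblout", str(out / "hgcA_hits.tbl"), hmm, proteins],
    ]


# Hypothetical input names, for illustration only.
commands = build_commands("sample_R1.fastq.gz", "sample_R2.fastq.gz",
                          "results", "HgcA.hmm")
for command in commands:
    print(shlex.join(command))
```

Each entry could be passed to `subprocess.run(command, check=True)` once the tools are installed. For real analyses, marky-coco itself is the reference for the exact options and versions.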

Files (4.2 MB)

* Hg-MATE-Db.v1.01142021_catalogue.xlsx, 683.3 kB, md5:3a648e315e6a91fce26421bdf83c53de
* Hg-MATE-Db_v1.01142021.zip, 3.6 MB, md5:e22f2fed872021a4221eb72f39b2b9ff
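The md5 checksums listed for the two files can be used to confirm that a download is complete and unmodified. The following is a minimal sketch that assumes the files were saved under their original names in the current directory; the helper names `md5_of` and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict

# File names and checksums as listed in the record.
EXPECTED_MD5 = {
    "Hg-MATE-Db.v1.01142021_catalogue.xlsx": "3a648e315e6a91fce26421bdf83c53de",
    "Hg-MATE-Db_v1.01142021.zip": "e22f2fed872021a4221eb72f39b2b9ff",
}


def md5_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: str = ".") -> Dict[str, bool]:
    """Map each expected file name to True if it exists and its md5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


# Assumes the downloads are in the current directory; missing files fail.
for name, ok in verify().items():
    print(name, "OK" if ok else "MISSING OR MISMATCH")
```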
  • https://www.biorxiv.org/content/10.1101/2022.03.14.484253v1.abstract
