Published July 10, 2025 | Version v1
Dataset
Open
Predicted Biosynthetic Gene Clusters and Peptides from Fermented Food Microbial Genomes
Creators
Description
This repository contains results and raw files from running the MicrocosmFoods/bac-mining workflow on ~11,500 bacterial genomes assembled from diverse fermented foods. Specifically, biosynthetic gene clusters (BGCs) were predicted with antiSMASH, and two peptide types, small ORFs (smORFs) and cleavage peptides, were predicted on this set of genomes. This repository contains the following files:
- all_molecule_counts.tsv - The main summary file, giving for each genome the count of each molecule type, such as specific types of BGCs, smORFs, and cleavage peptides. The corresponding metadata for these genomes can be found in the Fermented Foods Microbial Genomes Database Zenodo repository
- all_smorfinder_results.tsv - Combined SmORFinder results for all genomes
- all_deeppeptide_results.tsv - Combined DeepPeptide results for all genomes, used to predict cleavage peptides
- all-MAG-combined-batch-peptides.fasta - Protein sequences for all predicted peptides: smORFs, cleavage peptides (excluding predicted propeptide sequences), and core RiPP sequences predicted by antiSMASH, where found
- 2025-06-08-mag-antismash-predictions.tar.gz - antiSMASH results for each of the ~11,500 bacterial genomes. The decompressed archive is split by genome, with one subdirectory per genome containing:
  - genome_name.json - The JSON summary file of all identified BGCs
  - genome_name.log - The log file from the antiSMASH run
  - genome_name*.gbk - Each identified biosynthetic gene cluster in GenBank (GBK) format
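
As a starting point for exploring these files, the following is a minimal Python sketch using only the standard library. It assumes the files have been downloaded locally and the antiSMASH archive has already been decompressed. The function names are illustrative rather than part of the bac-mining workflow, and no column names are assumed: the TSV reader simply uses whatever header row each file provides.

```python
import csv
from pathlib import Path


def read_tsv_rows(tsv_path):
    """Yield each row of a tab-separated file as a dict keyed by its header row."""
    with open(tsv_path, newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")


def count_fasta_records(fasta_path):
    """Count sequences in a FASTA file by counting header lines starting with '>'."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def gbk_files_per_genome(antismash_root):
    """Map each genome subdirectory name to the number of .gbk files it holds.

    Assumes the layout described above: one subdirectory per genome,
    with the .gbk files sitting directly inside it.
    """
    root = Path(antismash_root)
    return {
        sub.name: len(list(sub.glob("*.gbk")))
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }
```

For example, `next(read_tsv_rows("all_molecule_counts.tsv"))` returns the first genome's row so the available columns can be inspected, `count_fasta_records("all-MAG-combined-batch-peptides.fasta")` gives the total number of peptide sequences, and `gbk_files_per_genome(...)` can be pointed at whichever directory the archive was decompressed into.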