There is a newer version of the record available.

Published August 6, 2025 | Version v1
Dataset | Open access

Predicted Bioactive Peptides from Fermented Foods

Description

This repository contains raw files, results, and machine learning (ML) models for predicting the bioactivity of peptides from fermented food datasets. It covers two sources of peptides: peptides identified directly in proteomics data from fermented foods, and peptides predicted to be encoded in bacterial genomes assembled from fermented food substrates. The datasets include:

  1. 5 peptidomics studies of fermented foods
  2. ~200 bacterial isolates from the BacDive database, collected from various fermented foods
  3. ~11,500 bacterial metagenome-assembled genomes (MAGs) from diverse fermented food metagenomic surveys

For the bacterial genomes, peptides were predicted using the bac-mining workflow. For the proteomics datasets, raw protein sequences were collected from each accession. For all datasets, physicochemical characteristics and bioactivity predictions were generated using the peptide-bioactivity-prediction workflow. The fermented-food-peptidomics-mining-results GitHub repository contains metadata, scripts, notebooks, and figures for these parsed results.
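As an illustration of the kind of physicochemical descriptor that appears in the results files, the sketch below computes the grand average of hydropathy (GRAVY) of a peptide using the Kyte-Doolittle scale. This is an assumption for illustration only: this page does not list which descriptors the peptide-bioactivity-prediction workflow computes or how, so values here may not match the columns in the TSV files. The function name `gravy` is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide: str) -> float:
    """Return the mean Kyte-Doolittle score of a peptide (GRAVY).

    Illustrative only; not necessarily the descriptor used by the
    peptide-bioactivity-prediction workflow.
    """
    residues = peptide.strip().upper()
    if not residues:
        raise ValueError("empty peptide sequence")
    # Reject non-standard residue codes (e.g. X, B, Z) rather than guessing a value.
    unknown = set(residues) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)
```

For example, `gravy("KLAK")` averages -3.9, 3.8, 1.8, and -3.9 to give about -0.55; negative values indicate a more hydrophilic peptide.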

This repository contains the following files: 

Peptidomics results: 

  • all_ff_proteomics_samples_combined.fasta: Raw FASTA protein sequences from all 5 proteomics studies, obtained either from PRIDE database files or from each study's supplementary files. These sequences are not dereplicated.
  • ff_peptidomics_peptides_predictions.tsv: Results file of physicochemical characteristics and bioactivity predictions for all raw sequences from the 5 proteomics studies.
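Because the combined FASTA file is not dereplicated, the same sequence can appear under several headers. A minimal sketch of exact-match dereplication is shown below, assuming a standard FASTA layout (header lines starting with `>`, sequence possibly wrapped over several lines). The helper names `read_fasta` and `dereplicate` are hypothetical, and the sketch only collapses identical sequences (case-insensitive); it does not cluster similar or overlapping peptides.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining sequences wrapped over several lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dereplicate(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each unique (uppercased) sequence to every header that carries it."""
    unique: Dict[str, List[str]] = {}
    for header, seq in records:
        unique.setdefault(seq.upper(), []).append(header)
    return unique
```

Assuming the file has been downloaded locally, `with open("all_ff_proteomics_samples_combined.fasta") as handle: groups = dereplicate(read_fasta(handle))` gives the number of unique sequences as `len(groups)`, while keeping the source headers for each one.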

Peptides from bacterial genomes results: 

We analyzed peptides from two different sources of bacterial genomes that we curated. Because these genome sources may overlap, we analyzed the resulting peptides separately. Documentation of how each set of genomes was curated can be found on GitHub.

  • 2025-02-24-bacdive-peptides-predictions.tsv: Results file for peptides predicted from ~200 bacterial genomes from the BacDive database. These genomes come from isolates that have associated metadata in BacDive and a corresponding publicly available genome in GenBank/RefSeq.
  • 2025-08-05-mag-bioactivity-info.tsv: Results file for peptides predicted from ~11,500 bacterial metagenome-assembled genomes (MAGs) and genomes collected from diverse fermented foods. 
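Since the BacDive and MAG genome sets may overlap, one quick check is how many peptide sequences the two results files share. The sketch below reads a tab-separated file with Python's `csv.DictReader` and compares the sets of sequences. The column name must be supplied by the user: `"sequence"` in the usage note is a hypothetical placeholder, so check the actual header row of each TSV first. The helper names `peptide_set` and `overlap_summary` are also hypothetical.

```python
import csv
from typing import Dict, Set, TextIO


def peptide_set(handle: TextIO, column: str) -> Set[str]:
    """Collect non-empty, uppercased values of one column from a TSV stream."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found")
    # Rows with an empty (or missing) value in the column are skipped.
    return {row[column].strip().upper() for row in reader if row[column]}


def overlap_summary(a: Set[str], b: Set[str]) -> Dict[str, float]:
    """Count sequences unique to each set and shared by both, plus Jaccard index."""
    shared = a & b
    union = a | b
    return {
        "only_a": len(a - b),
        "only_b": len(b - a),
        "shared": len(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }
```

Usage, assuming both files are downloaded and share a sequence column: open each with `open(path, newline="")`, pass the handle to `peptide_set(handle, "sequence")`, then call `overlap_summary(bacdive, mags)`. Exact string matching is a deliberately simple choice; near-identical peptides would need an alignment or clustering tool instead.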

Machine learning models: 

  • ANIF_1.zip: Raw model files for an anti-inflammatory bioactivity prediction model built using positive and negative activity sequences from the Peptipedia database.
  • ANIF_2_BM.zip: Raw model files for an anti-inflammatory bioactivity prediction model built using a benchmark dataset that contains positive and negative sequences from the Immune Epitope Database.
  • ANIF_benchmark_data.zip: Raw FASTA sequences of the anti-inflammatory benchmark data used to build the ensemble model contained in ANIF_2_BM.zip.
  • IMM_1.zip: Raw model files for an immunomodulatory bioactivity prediction model built using positive and negative activity sequences from the Peptipedia database.
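This page does not describe the internal layout or file format of the model archives, so inspecting an archive before extracting it is a reasonable first step. The sketch below uses Python's standard `zipfile` module to list each member and its uncompressed size; it makes no assumption about what the members contain. The helper name `list_model_archive` is hypothetical.

```python
import zipfile
from typing import BinaryIO, List, Tuple, Union


def list_model_archive(source: Union[str, BinaryIO]) -> List[Tuple[str, int]]:
    """Return (member name, uncompressed size in bytes) for each entry in a zip archive."""
    with zipfile.ZipFile(source) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]
```

For example, `list_model_archive("ANIF_1.zip")` (assuming the archive is in the working directory) shows what the archive holds; `zipfile.ZipFile(path).extractall(target_dir)` can then unpack it.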

Files


Files (777.7 MB total)

  • 10.7 MB, md5:73439075254ec5bb242cb4370eb1c4e8
  • 580.5 MB, md5:fe33b65219a10481c8462e14d6725b29
  • 8.4 MB, md5:d584931314246cd27a888a6b3506b0d0
  • 72.0 MB, md5:2430336dd09138df8012205089073c33
  • 35.0 MB, md5:fb24a65e0b448f1f45843649d95ac2b4
  • 49.6 kB, md5:f30e2a640534366f0ef175189d704b1c
  • 62.2 MB, md5:ed92a81ba2779c4868172f6ec1258794
  • 8.8 MB, md5:cef2107891fdbd1b2f4720905c39a862

Additional details

Dates

Issued
2025-08-14