Published February 23, 2026 | Version v2
Dataset | Open access

Predicted Bioactive Peptides from Fermented Foods

Description

This repository contains raw files, results, and machine learning (ML) models for predicting the bioactivity of peptides from fermented foods. It covers both peptides taken directly from proteomics data of fermented foods and peptides predicted to be encoded in bacterial genomes assembled from fermented food substrates. The datasets include:

  1. 5 peptidomics studies of fermented foods
  2. ~11,500 bacterial metagenome-assembled genomes (MAGs) from diverse fermented food metagenomic surveys

For the bacterial genomes, peptides were predicted with the bac-mining workflow. For the proteomics datasets, raw protein sequences were collected from each study's accession. For all datasets, physicochemical characteristics and bioactivity predictions were generated with the peptide-bioactivity-prediction workflow. The fermented-food-peptidomics-mining-results GitHub repository contains the metadata, scripts, notebooks, and figures for these parsed results.
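The exact descriptors computed by the peptide-bioactivity-prediction workflow are not listed in this record. As an illustration only, the sketch below computes two common per-peptide physicochemical descriptors: the grand average of hydropathy (GRAVY, the mean Kyte-Doolittle score per residue) and a crude net charge at neutral pH. The function names are hypothetical, and neither function is taken from the workflow; they are assumptions about the kind of property such a step reports.

```python
# Illustrative only: NOT the peptide-bioactivity-prediction workflow's implementation.

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    residues = sequence.strip().upper()
    if not residues:
        raise ValueError("empty sequence")
    unknown = set(residues) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def net_charge_approx(sequence: str) -> int:
    """Crude net charge at neutral pH: +1 per K/R, -1 per D/E.

    Ignores the termini and histidine, so it is only a rough indicator.
    """
    s = sequence.upper()
    return sum(s.count(aa) for aa in "KR") - sum(s.count(aa) for aa in "DE")
```

For example, `gravy("KR")` averages the K and R values to about -4.2, flagging a strongly hydrophilic dipeptide. Dedicated libraries provide fuller descriptor sets; this sketch only shows the arithmetic.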

This repository contains the following files: 

Peptidomics results: 

  • all_ff_proteomics_samples_combined.fasta: Raw FASTA protein sequences from all 5 proteomics studies, obtained either from PRIDE database files or from each study's supplementary files. These sequences are not dereplicated.
  • proteomics-peptide-bioactivity-results.tsv: Results file for physicochemical characteristics and bioactivity predictions for all raw sequences from the 5 proteomics studies
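Because all_ff_proteomics_samples_combined.fasta is not dereplicated, identical sequences may need to be collapsed before downstream analysis. The sketch below does exact-match dereplication, assuming standard FASTA formatting (header lines starting with `>`, sequences possibly wrapped over several lines). The function names are hypothetical, and this is not the procedure used to build the files.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining sequences wrapped over several lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Stream records from a FASTA file on disk without loading it all at once."""
    with open(path) as handle:
        yield from parse_fasta(handle)


def dereplicate(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group headers by identical sequence (compared in upper case)."""
    groups: Dict[str, List[str]] = {}
    for header, seq in records:
        groups.setdefault(seq.upper(), []).append(header)
    return groups
```

Usage: `groups = dereplicate(read_fasta("all_ff_proteomics_samples_combined.fasta"))`; `len(groups)` is then the number of distinct sequences, and each value lists the headers that share that sequence.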

Peptides from bacterial genomes results: 

  • all-MAG-combined-batch-peptides.fasta: Input FASTA file of all 674,113 genome-encoded peptides from the MAG dataset
  • genome-peptide-bioactivity-results.tsv: Results file for physicochemical characteristics and bioactivity predictions of peptides predicted from ~11,500 bacterial metagenome-assembled genomes (MAGs) and genomes collected from diverse fermented foods.
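The results tables can be large, so a streaming reader avoids loading a whole file into memory. The sketch below uses Python's standard csv module and assumes the files are plain tab-separated text with a single header row. The function name, the score column, and the threshold are hypothetical placeholders; substitute a column from the file's actual header.

```python
import csv
from typing import Dict, Iterable, Iterator


def filter_rows(lines: Iterable[str], column: str, threshold: float) -> Iterator[Dict[str, str]]:
    """Stream a tab-separated table, yielding rows whose `column` value is >= threshold.

    `column` must come from the file's real header; no column names are assumed.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not in header {reader.fieldnames}")
    for row in reader:
        try:
            value = float(row[column])
        except (TypeError, ValueError):
            continue  # skip blank, missing, or non-numeric cells
        if value >= threshold:
            yield row
```

Usage: `with open("genome-peptide-bioactivity-results.tsv", newline="") as fh: hits = list(filter_rows(fh, SCORE_COLUMN, 0.5))`, where `SCORE_COLUMN` is a placeholder for a real header name. Because `filter_rows` is a generator, rows can also be written out one at a time instead of collected in a list.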

Machine learning models: 

  • ANIF2_20260220.zip: Raw model files for an anti-inflammatory bioactivity prediction model built using positive activity sequences from the Peptipedia database.
  • 2026-02-11-antiinflammatory-training-dataset.fasta: Raw, non-redundant anti-inflammatory sequences obtained from Peptipedia, restricted to entries whose activity was not computationally predicted, and used to build the ANIF2_20260220 model.

Files

Total size: 596.1 MB

Size       MD5 checksum
63.9 kB    726574885491277a380e00a321220505
69.8 MB    9b28f8a74fdb138634d8a93aacf089de
8.4 MB     d584931314246cd27a888a6b3506b0d0
49.7 MB    11ef3bd6c5b3ea0dac04c21e1403e2ae
395.4 MB   e182ed60d7566043651206766a553d42
72.8 MB    04953dd2423a02cdb0614f83578ed0c0
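To confirm that a download is intact, compare its MD5 digest with the checksum listed for it. The sketch below uses Python's standard hashlib module and reads files in chunks, so the largest files need not fit in memory. The function names are hypothetical.

```python
import hashlib
from typing import BinaryIO


def md5_of_stream(handle: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a binary stream, read in 1 MiB chunks by default."""
    digest = hashlib.md5()
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected: str) -> bool:
    """Check a file against a listed checksum; an optional 'md5:' prefix is ignored."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    with open(path, "rb") as handle:
        return md5_of_stream(handle) == expected
```

Usage: `verify_md5(path_to_download, listed_checksum)` returns `True` when the file matches. A `False` result usually means an incomplete or corrupted download.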

Additional details

Dates

Issued: 2025-08-14