Predicted Bioactive Peptides from Fermented Foods
Description
This repository contains raw files, results, and machine learning (ML) models for predicting the bioactivity of peptides from fermented food datasets. It covers both peptides taken directly from proteomics data of fermented foods and peptides predicted from bacterial genomes assembled from fermented food substrates. The datasets include:
- 5 peptidomics studies of fermented foods
- ~11,500 bacterial metagenome-assembled genomes (MAGs) from diverse fermented food metagenomic surveys
For the bacterial genomes, peptides were predicted using the bac-mining workflow. For the proteomics datasets, raw protein sequences were collected from each accession. For all datasets, physicochemical characteristics were calculated and bioactivity predictions were made using the peptide-bioactivity-prediction workflow. The fermented-food-peptidomics-mining-results GitHub repository contains metadata, scripts, notebooks, and figures for the parsed results.
This repository contains the following files:
Peptidomics results:
- all_ff_proteomics_samples_combined.fasta: Raw FASTA protein sequences collected for all 5 proteomics studies, obtained either from PRIDE database files or from each study's supplementary files. These sequences are not dereplicated.
- proteomics-peptide-bioactivity-results.tsv: Results file for physicochemical characteristics and bioactivity predictions for all raw sequences from the 5 proteomics studies
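Because the combined proteomics FASTA is not dereplicated, downstream counts may include the same sequence several times. The following is a minimal sketch of exact-match dereplication using only the Python standard library. The function names `read_fasta` and `dereplicate` are illustrative and are not part of the repository's workflows, and treating sequences as identical regardless of letter case is an assumption.

```python
from typing import Dict, Iterable, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dereplicate(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each distinct sequence (upper-cased) to the first header seen for it."""
    unique: Dict[str, str] = {}
    for header, seq in records:
        # Assumption: exact string identity after upper-casing defines a duplicate.
        unique.setdefault(seq.upper(), header)
    return unique
```

For example, `with open("all_ff_proteomics_samples_combined.fasta") as fh: unique = dereplicate(read_fasta(fh))` would give a dictionary whose length is the number of distinct sequences.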
Peptides from bacterial genomes results:
- all-MAG-combined-batch-peptides.fasta: Input FASTA file of all 674,113 genome-encoded peptides from the MAG dataset
- genome-peptide-bioactivity-results.tsv: Results file for peptides predicted from ~11,500 bacterial metagenome-assembled genomes (MAGs) and genomes collected from diverse fermented foods.
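A quick sanity check after downloading is to confirm that the peptide FASTA holds the stated 674,113 records and to look at the columns of the results table. The sketch below assumes the TSV has a single header row. Its column names are not documented here, so the code only reports them rather than relying on any. `count_fasta_records` and `tsv_shape` are illustrative helper names.

```python
import csv
from typing import List, Tuple


def count_fasta_records(path: str) -> int:
    """Count FASTA records by counting header lines (lines starting with '>')."""
    with open(path, "r", encoding="utf-8") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def tsv_shape(path: str) -> Tuple[List[str], int]:
    """Return (column names, number of data rows) for a tab-separated file.

    Assumption: the first row is a header.
    """
    with open(path, "r", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        columns = next(reader, [])
        n_rows = sum(1 for _ in reader)
    return columns, n_rows


if __name__ == "__main__":
    n = count_fasta_records("all-MAG-combined-batch-peptides.fasta")
    print("FASTA records:", n, "(description states 674,113)")
    cols, rows = tsv_shape("genome-peptide-bioactivity-results.tsv")
    print("TSV columns:", cols)
    print("TSV data rows:", rows)
```

Whether the number of TSV rows should equal the number of FASTA records depends on how the workflow writes its output, so a mismatch is a prompt to check the workflow documentation, not necessarily an error.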
Machine learning models:
- ANIF2_20260220.zip: Raw model files for an anti-inflammatory bioactivity prediction model built using sequences with positive anti-inflammatory activity from the Peptipedia database.
- 2026-02-11-antiinflammatory-training-dataset.fasta: Raw, non-redundant sequences obtained from Peptipedia whose anti-inflammatory annotations are non-predicted, used for building the ANIF2_20260220 model.
Files
Total size: 596.1 MB
| md5 | Size |
|---|---|
| 726574885491277a380e00a321220505 | 63.9 kB |
| 9b28f8a74fdb138634d8a93aacf089de | 69.8 MB |
| d584931314246cd27a888a6b3506b0d0 | 8.4 MB |
| 11ef3bd6c5b3ea0dac04c21e1403e2ae | 49.7 MB |
| e182ed60d7566043651206766a553d42 | 395.4 MB |
| 04953dd2423a02cdb0614f83578ed0c0 | 72.8 MB |
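The md5 checksums listed for the files can be used to confirm that a download completed without corruption. The sketch below computes a file's md5 in chunks, so the larger archives do not need to fit in memory. `md5sum` and `verify` are illustrative helper names, and each file must be paired with the checksum shown for it on the record page.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal md5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_md5: str) -> bool:
    """Compare a file's md5 with an expected value (with or without an 'md5:' prefix)."""
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected
```

A call such as `verify(downloaded_path, "md5:...")` returns `True` only when the local file matches the published checksum.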
Additional details
Dates
- Issued: 2025-08-14