Predicted Bioactive Peptides from Fermented Foods
Description
This repository contains raw files, results, and ML models for predicting the bioactivity of peptides from fermented food datasets. It covers both peptides predicted directly from proteomics data of fermented foods and peptides predicted to be encoded in bacterial genomes assembled from fermented food substrates. The datasets include:
- 5 peptidomics studies of fermented foods
- ~200 Bacterial Isolates from the BacDive database collected from various fermented foods
- ~11,500 bacterial metagenome-assembled genomes (MAGs) from diverse fermented food metagenomic surveys
For bacterial genomes, peptides were predicted using the bac-mining workflow. For the proteomics datasets, raw protein sequences were collected from each accession. For all datasets, physicochemical characteristics and bioactivity predictions were generated using the peptide-bioactivity-prediction workflow. The fermented-food-peptidomics-mining-results GitHub repository contains metadata, scripts, notebooks, and figures for the parsed results.
This repository contains the following files:
Peptidomics results:
- all_ff_proteomics_samples_combined.fasta: Raw FASTA protein sequences collected from all 5 proteomics studies, obtained either from PRIDE database files or from each study's supplementary files. These sequences are not dereplicated.
- ff_peptidomics_peptides_predictions.tsv: Results file with physicochemical characteristics and bioactivity predictions for all raw sequences from the 5 proteomics studies.
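Because the combined FASTA file is not dereplicated, some users may want to collapse identical sequences before further analysis. The following is a minimal sketch of one way to do that, assuming exact, case-insensitive sequence identity is an acceptable criterion. The function names `read_fasta` and `dereplicate` are hypothetical helpers written for illustration and are not part of the workflows described here.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def dereplicate(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each distinct sequence to the first header it appeared under.

    Assumption: two records are duplicates only if their sequences are
    identical after upper-casing. No clustering or substring matching is done.
    """
    unique: Dict[str, str] = {}
    for header, seq in records:
        key = seq.upper()
        if key not in unique:
            unique[key] = header
    return unique


# Small in-memory example; with the real file, pass an open file handle
# instead, e.g. open("all_ff_proteomics_samples_combined.fasta").
example = [">s1", "MKT", "AIL", ">s2", "mktail", ">s3", "GGA"]
unique = dereplicate(read_fasta(example))
```

Keeping the first header for each sequence is an arbitrary choice; if sample provenance matters, collecting all headers per sequence may be more appropriate.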
Peptides from bacterial genomes results:
We analyzed peptides from two curated sources of bacterial genomes. Because these sources may overlap, we analyze the resulting peptides separately. Documentation of how each set of genomes was curated can be found on GitHub.
- 2025-02-24-bacdive-peptides-predictions.tsv: Results file for peptides predicted from ~200 bacterial genomes collected from the BacDive database. These genomes come from isolates that have metadata in BacDive and a corresponding publicly available genome in GenBank/RefSeq.
- 2025-08-05-mag-bioactivity-info.tsv: Results file for peptides predicted from ~11,500 bacterial metagenome-assembled genomes (MAGs) and genomes collected from diverse fermented foods.
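Since the two genome sources may overlap, it can be useful to check how many peptide sequences appear in both results files. The sketch below shows one way to do this with the standard library. It assumes each TSV has a header row and a column holding the peptide sequence; the column name `sequence` is a placeholder and should be replaced with the actual column name in the files. The helper names are hypothetical.

```python
import csv
from typing import Iterable, Set


def read_sequences(lines: Iterable[str], column: str = "sequence") -> Set[str]:
    """Collect the distinct values of one column from tab-separated lines.

    `column` is an assumed name; check the header of the real file first.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found; available: {reader.fieldnames}")
    # Skip rows where the column is empty or missing.
    return {row[column].strip().upper() for row in reader if row[column]}


def shared_sequences(lines_a: Iterable[str], lines_b: Iterable[str],
                     column: str = "sequence") -> Set[str]:
    """Return sequences present in both inputs (exact match after upper-casing)."""
    return read_sequences(lines_a, column) & read_sequences(lines_b, column)


# In-memory example with made-up rows, only to show the call pattern.
table_a = ["peptide_id\tsequence", "p1\tMKTAIL", "p2\tGGA"]
table_b = ["peptide_id\tsequence", "q1\tggA", "q2\tLLKK"]
overlap = shared_sequences(table_a, table_b)
```

With the real files, open file handles can be passed in place of the lists, for example `open("2025-02-24-bacdive-peptides-predictions.tsv", newline="")` and `open("2025-08-05-mag-bioactivity-info.tsv", newline="")`.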
Machine learning models:
- ANIF_1.zip: Raw model files for an anti-inflammatory bioactivity prediction model built using positive and negative activity sequences from the Peptipedia database.
- ANIF_2_BM.zip: Raw model files for an anti-inflammatory bioactivity prediction model built using a benchmark dataset that contains positive and negative sequences from the Immune Epitope Database.
- ANIF_benchmark_data.zip: Raw FASTA sequences of the anti-inflammatory benchmark data used to build the ensemble model contained in the ANIF_2_BM.zip file.
- IMM_1.zip: Raw model files for an immunomodulatory bioactivity prediction model built using both positive and negative activity sequences from the Peptipedia database.
Files (777.7 MB)

| MD5 checksum | Size |
|---|---|
| 73439075254ec5bb242cb4370eb1c4e8 | 10.7 MB |
| fe33b65219a10481c8462e14d6725b29 | 580.5 MB |
| d584931314246cd27a888a6b3506b0d0 | 8.4 MB |
| 2430336dd09138df8012205089073c33 | 72.0 MB |
| fb24a65e0b448f1f45843649d95ac2b4 | 35.0 MB |
| f30e2a640534366f0ef175189d704b1c | 49.6 kB |
| ed92a81ba2779c4868172f6ec1258794 | 62.2 MB |
| cef2107891fdbd1b2f4720905c39a862 | 8.8 MB |
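The MD5 checksums listed above can be used to confirm that a download completed without corruption. A minimal sketch using Python's standard `hashlib` module follows. The helper names and the demo file name are hypothetical; the expected checksum should be taken from the listing for whichever file was downloaded.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file's MD5 digest with an expected value.

    Accepts the value with or without a leading "md5:" prefix.
    """
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_checksum("downloaded_file.zip", "md5:...")` returns `True` only when the local file's digest equals the checksum copied from the listing. MD5 is adequate here for detecting accidental corruption, not for security guarantees.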
Additional details
Dates
- Issued: 2025-08-14