Retrained-ProteinMPNN-versions
Contributors
Producer:
Project leader (2):
Researcher (4):
Description
This repository contains retrained versions of the ProteinMPNN model for inverse protein design within customised protein spaces.
We customise the original training dataset in two ways: (1) by removing all experimental entries annotated as hydrolases, and (2) by removing all experimental entries annotated as any class of enzyme. For filtering, we first take the Enzyme Commission (EC) numbers retrieved from the PDB and then combine them with the predictions from CLEAN, a contrastive learning algorithm. CLEAN predicts enzyme function from the protein sequence: it computes the pairwise distances between the query sequence and all functional cluster centres in an embedding space built on ESM-1b, and assigns EC numbers to each protein.
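The two filtering regimes can be sketched as follows. This is a minimal illustration, not the code used to build the released datasets: the `Entry` record, the helper names and the `"hydrolases"`/`"enzymes"` mode strings are hypothetical, and combining the PDB annotations with the CLEAN predictions by taking their union is an assumption. The only fixed convention used is that hydrolases form EC top-level class 3.

```python
from dataclasses import dataclass, field

HYDROLASE_CLASS = "3"  # EC top-level class 3 = hydrolases


@dataclass
class Entry:
    """Hypothetical record for one experimental PDB entry."""
    pdb_id: str
    pdb_ec: set = field(default_factory=set)    # EC numbers annotated in the PDB
    clean_ec: set = field(default_factory=set)  # EC numbers predicted by CLEAN


def combined_ec(entry: Entry) -> set:
    # Assumption: the two annotation sources are merged by set union.
    return entry.pdb_ec | entry.clean_ec


def is_excluded(entry: Entry, mode: str) -> bool:
    ecs = combined_ec(entry)
    if mode == "hydrolases":
        # Regime (1): drop the entry if any of its EC numbers is in class 3.
        return any(ec.split(".")[0] == HYDROLASE_CLASS for ec in ecs)
    if mode == "enzymes":
        # Regime (2): drop the entry if it carries any EC number at all.
        return bool(ecs)
    raise ValueError(f"unknown mode: {mode!r}")


def filter_entries(entries: list, mode: str) -> list:
    """Keep only the entries that survive the chosen filtering regime."""
    return [e for e in entries if not is_excluded(e, mode)]
```

For example, an entry annotated in the PDB only as a transferase (EC 2.x) but predicted by CLEAN to be a hydrolase (EC 3.x) would be removed under both regimes in this sketch, because the union of its annotations contains a class 3 number.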
In total, four new ProteinMPNN models have been released. All are included and described in the manuscript Adaptive and Spandrel-like Constraints at Functional Sites in Protein Folds (https://doi.org/10.64898/2026.02.09.704872).
The custom datasets used for retraining and the best-performing weights of each resulting model are available for download from this record.
NOTE: Related code and documentation available at https://github.com/miriampol2c/architectural-constraints
Files
data-for-training.zip
Total size: 110.1 MB

| Size | Checksum |
|---|---|
| 85.3 MB | md5:62e2578e009c04e4228bb3310f53465e |
| 24.8 MB | md5:20878b80dfd07e730f71bb7648f77b66 |
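Downloaded files can be checked against the MD5 checksums listed for this record. The sketch below uses only the Python standard library; because the record does not state which checksum belongs to which file name, it simply checks whether a file's digest matches either listed value. The helper names `md5sum` and `verify` are illustrative.

```python
import hashlib
from pathlib import Path

# MD5 checksums listed on this record (file-to-checksum mapping not stated).
EXPECTED_MD5 = {
    "62e2578e009c04e4228bb3310f53465e",  # 85.3 MB file
    "20878b80dfd07e730f71bb7648f77b66",  # 24.8 MB file
}


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """True if the file's MD5 matches one of the listed checksums."""
    return md5sum(path) in EXPECTED_MD5
```

Usage: `verify("data-for-training.zip")` should return `True` for an intact download. Reading in 1 MiB chunks keeps memory use flat regardless of archive size.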
Additional details
Related works
- Is documented by
- Software documentation: https://github.com/miriampol2c/architectural-constraints (URL)
- Is supplement to
- Data paper: 10.5281/zenodo.18922046 (DOI)
Software
- Repository URL
- https://github.com/miriampol2c/architectural-constraints
- Programming language
- Python, R
- Development Status
- Active