Published March 11, 2026 | Version v1
Model Open

Retrained-ProteinMPNN-versions

Authors/Creators

  • 1. Barcelona Supercomputing Center
  • 1. Barcelona Supercomputing Center
  • 2. Institució Catalana de Recerca i Estudis Avançats
  • 3. Technical University of Munich
  • 4. Helmholtz Munich
  • 5. Institute for Advanced Study

Description

This repository contains retrained versions of the ProteinMPNN model for reverse protein design within customized protein spaces.

We customise the original training dataset in two ways: (1) by removing all experimental entries annotated as hydrolases, and (2) by removing all experimental entries annotated as any class of enzyme. For filtering, we first consider the Enzyme Commission (E.C.) number annotation retrieved from the PDB and then combine it with predictions from CLEAN, a contrastive learning algorithm. CLEAN predicts function from protein sequence based on the pairwise distances between the query sequence and all functional cluster centres in the space of ESM-1b embeddings, and assigns E.C. numbers to each protein.
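The two filtering modes described above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the `Entry` record, the function names, and the mode labels are hypothetical, and combining the PDB annotation with the CLEAN prediction as a set union is an assumption, since the description does not specify how the two sources are merged. The only fixed fact used is that E.C. class 3 denotes hydrolases.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical record for one experimental structure."""
    pdb_id: str
    pdb_ec: set = field(default_factory=set)    # E.C. numbers annotated in the PDB
    clean_ec: set = field(default_factory=set)  # E.C. numbers predicted by CLEAN


def ec_numbers(entry):
    # Assumption: the two annotation sources are combined as a union.
    return entry.pdb_ec | entry.clean_ec


def is_hydrolase(entry):
    # The first field of an E.C. number is the enzyme class; class 3 is hydrolases.
    return any(ec.split(".")[0] == "3" for ec in ec_numbers(entry))


def is_enzyme(entry):
    # Any E.C. number from either source marks the entry as an enzyme.
    return bool(ec_numbers(entry))


def filter_entries(entries, mode):
    """Return the entries kept for retraining under the given mode."""
    if mode == "no_hydrolases":
        return [e for e in entries if not is_hydrolase(e)]
    if mode == "no_enzymes":
        return [e for e in entries if not is_enzyme(e)]
    raise ValueError(f"unknown mode: {mode!r}")


# Toy entries with made-up identifiers, for illustration only.
toy = [
    Entry("entry_a", pdb_ec={"3.1.1.1"}),    # hydrolase according to the PDB
    Entry("entry_b", clean_ec={"1.1.1.1"}),  # oxidoreductase according to CLEAN only
    Entry("entry_c"),                        # no enzyme annotation at all
]
kept_no_hydrolases = filter_entries(toy, "no_hydrolases")
kept_no_enzymes = filter_entries(toy, "no_enzymes")
```

With these toy entries, the first mode drops only the hydrolase, while the second drops every entry that carries an E.C. number from either source.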

In total, four new ProteinMPNN models have been released. All are included and described in the manuscript "Adaptive and Spandrel-like Constraints at Functional Sites in Protein Folds" (https://doi.org/10.64898/2026.02.09.704872).

The custom datasets used for retraining and the best-performing weights of each resulting model are available for download here.

NOTE: Related code and documentation are available at https://github.com/miriampol2c/architectural-constraints

Files

data-for-training.zip

Files (110.1 MB)

Name Size
md5:62e2578e009c04e4228bb3310f53465e 85.3 MB
md5:20878b80dfd07e730f71bb7648f77b66 24.8 MB
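The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `verify_download` are illustrative and not part of the repository.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file matches the expected checksum.

    Accepts the checksum with or without the leading "md5:" label.
    """
    expected = expected_md5.strip().lower().split(":")[-1]
    return md5_of_file(path) == expected
```

To check a file, call `verify_download` with the local path of the archive and the checksum shown beside that file on the record page. The listing above does not name the file for each checksum, so the pairing should be taken from the record itself.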

Additional details

Related works

Is documented by
Software documentation: https://github.com/miriampol2c/architectural-constraints (URL)
Is supplement to
Data paper: 10.5281/zenodo.18922046 (DOI)

Funding

Agencia Estatal de Investigación
Ramón y Cajal program RYC2023-043825-I
Agencia Estatal de Investigación
MEGAFrustratEDS grant - Plan Nacional PID2024-159128OA-I00

Software

Repository URL
https://github.com/miriampol2c/architectural-constraints
Programming language
Python, R
Development Status
Active