Published May 12, 2026 | Version v0.1
Preprint Open

End-to-end de novo design of Zn²⁺ metallohydrolase binders: an open-source canonical pipeline anchored by LigandMPNN's metal-coordination recovery

Authors/Creators

  • 1. Genesis_Medicine Lab; HAN PREDICT, Inc.; Recover Korean Medicine Clinic

Description

De novo design of binders against Zn²⁺ metallohydrolases (matrix metalloproteinases, carbonic anhydrases, thermolysins, and related catalytic-metal enzymes) remains one of the most demanding stress tests for modern generative protein modeling. The catalytic geometry of these enzymes depends on a small set of metal-coordinating residues (typically His, Asp, Glu, or Cys) whose identities and side-chain rotamer states must be preserved through every stage of an end-to-end design pipeline. We present a fully open-source canonical pipeline that integrates four stages built on publicly released components (RFdiffusion3 for backbone generation, LigandMPNN for metal-aware inverse folding, FlowPacker for side-chain refinement, and AlphaFold3 / Boltz-2x / Chai-1 for cofold validation) into a reproducible workflow, which we apply to the matrix metalloproteinase-1 (MMP-1) catalytic domain. The pivotal stage is sequence design: on the 1HFC reference scaffold (157 residues, 2× Zn²⁺ + 1× Ca²⁺), LigandMPNN achieves 95.3% native-residue recovery across the six Zn-coordinating positions, versus 46.4% for plain ProteinMPNN. The gap is widest at the structural-Zn triad (His183/Asp185/His196), where ProteinMPNN recovers 0% versus 90.6% for LigandMPNN. An orthogonal zero-shot likelihood oracle, ESM-C 600M, independently supports the conclusion that LigandMPNN sequences are more native-like (mean perplexity 2.85 vs 3.03; lower is more native-like). We also document a silent failure mode: when HETATM lines are stripped during preprocessing, LigandMPNN still reports use_ligand_context=True but quietly degenerates to ProteinMPNN behavior. We provide a preflight check for this condition. The pipeline composes naturally with neural network potential (NNP) ranking (paper_A) and physicality-steered cofold validation (paper_B, --use_potentials). We argue that this open canonical stack now matches or exceeds the design quality of closed alternatives such as AlphaProteo, at zero licensing cost for academic users.
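The stripped-HETATM failure mode described above can be guarded against with a small check on the input structure before sequence design. The sketch below is illustrative only and is not the preflight check shipped with the deposit: the function names `count_metal_hetatms` and `preflight_metal_check` are hypothetical, the input is assumed to be standard fixed-column PDB text, and the expected counts (two Zn, one Ca) are taken from the 1HFC description in the abstract.

```python
from collections import Counter


def count_metal_hetatms(pdb_text, elements=("ZN", "CA")):
    """Count HETATM records per metal element in fixed-column PDB text.

    Hypothetical helper: only HETATM lines are inspected, so ATOM records
    (including C-alpha atoms named "CA") are never counted.
    """
    wanted = {e.upper() for e in elements}
    counts = Counter()
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        # Columns 77-78 hold the element symbol; if they are blank or the
        # line is truncated, fall back to the residue name in columns 18-20.
        element = line[76:78].strip().upper() or line[17:20].strip().upper()
        if element in wanted:
            counts[element] += 1
    return counts


def preflight_metal_check(pdb_text, expected):
    """Raise ValueError if fewer metal HETATM records are found than expected.

    `expected` maps element symbols to minimum counts, e.g. {"ZN": 2, "CA": 1}.
    Returns the observed counts when the check passes.
    """
    counts = count_metal_hetatms(pdb_text, expected.keys())
    missing = {
        el.upper(): n - counts.get(el.upper(), 0)
        for el, n in expected.items()
        if counts.get(el.upper(), 0) < n
    }
    if missing:
        raise ValueError(
            f"Metal HETATM records missing (element: shortfall): {missing}. "
            "Ligand-aware design would run without its metal context."
        )
    return counts
```

A caller would read the preprocessed PDB file into a string and run `preflight_metal_check(text, {"ZN": 2, "CA": 1})` before launching LigandMPNN. Failing loudly here is the point of the design: the failure mode is silent downstream, so the check stops the run rather than logging a warning.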

Keywords: de novo enzyme design, metalloenzyme, matrix metalloproteinase, LigandMPNN, RFdiffusion, FlowPacker, AlphaFold3, Boltz-2x, open source, reproducibility.

Notes

HCW is the founder of HAN PREDICT, Inc. and consults for Recover Korean Medicine Clinic. This work received no external funding. The study is in silico only: no wet-lab data and no patient data are reported. An IRB application was filed on 2026-04-27 and approval is pending. Recover Korean Medicine Clinic opens on 2026-08-15. This is a direct Zenodo deposit with no prior journal submission. bioRxiv and ChemRxiv were confirmed closed to in-silico-only manuscripts, based on the April to May 2026 rejection cohort.

Files

22_paper_C.md

Files (135.7 kB)

md5:a84affa4bfeedeb64ad4fe7597dab7e9 (44.0 kB)
md5:e4669156e17838d0791ed91b42809408 (91.8 kB)