There is a newer version of the record available.

Published January 3, 2026 | Version v1.3
Dataset Restricted

SNAC-DB: Structural NANOBODY® (VHH) and Antibody (VH-VL) Complex Database

  • 1. Large Molecule Research, Sanofi, Cambridge, MA, United States
  • 2. Department of Chemical and Biomolecular Engineering, Johns Hopkins University, MD, United States
  • 3. R&D Data & Computational Science, Sanofi, Cambridge, MA, United States
  • 4. Large Molecule Research, Sanofi, Frankfurt, Germany

Description

***Please use the latest version. Access to earlier versions can be requested directly from the authors.***

Welcome to SNAC-DB, a comprehensive, curated resource of antibody and NANOBODY® VHH structures. It is designed to support computational modeling, machine learning, and structural biology research, and is available in ML-ready formats. This release includes a dataset curated by applying the SNAC-DB pipeline (https://github.com/Sanofi-Public/SNAC-DB) to protein structures sourced from the RCSB PDB (https://www.rcsb.org/), as well as a benchmarking dataset for evaluation.

This release covers all PDB entries released up to 31 December 2025.

Paper: https://www.researchgate.net/publication/393900649_SNAC-DB_The_Hitchhiker's_Guide_to_Building_Better_Predictive_Models_of_Antibody_NANOBODY_R_VHH-Antigen_Complexes

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Additional details

Software

Repository URL: https://github.com/Sanofi-Public/SNAC-DB
Programming languages: Python, Jupyter Notebook
Development status: Active

References

  • Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., & Bourne, P.E. (2000). The Protein Data Bank. Nucleic Acids Research, 28, 235–242. https://doi.org/10.1093/nar/28.1.235
  • (2025). Updated resources for exploring experimentally-determined PDB structures and Computed Structure Models at the RCSB Protein Data Bank. Nucleic Acids Research, 53, D564–D574. https://doi.org/10.1093/nar/gkae1091
  • Berman, H.M., Henrick, K., & Nakamura, H. (2003). Announcing the worldwide Protein Data Bank. Nature Structural Biology, 10, 980. https://doi.org/10.1038/nsb1203-980
  • van Kempen, M., Kim, S.S., Tumescheit, C., Mirdita, M., Lee, J., Gilchrist, C.L.M., Söding, J., & Steinegger, M. (2023). Fast and accurate protein structure search with Foldseek. Nature Biotechnology. https://doi.org/10.1038/s41587-023-01773-0
  • Dunbar, J., & Deane, C. (2015). ANARCI: antigen receptor numbering and receptor classification. Bioinformatics, 32(2), 298–300.
  • Steinegger, M., & Söding, J. (2017). MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, 35(11), 1026–1028.