Published August 21, 2025 | Version v1.2
Dataset | Open access

SNAC-DB: Structural NANOBODY® (VHH) and Antibody (VH-VL) Complex Database

  • 1. Large Molecule Research, Sanofi, Cambridge, MA, United States
  • 2. Department of Chemical and Biomolecular Engineering, Johns Hopkins University, MD, United States
  • 3. Large Molecule Research, Sanofi, Frankfurt, Germany
  • 4. R&D Data & Computational Science, Sanofi, Cambridge, MA, United States

Description

Welcome to SNAC-DB, a comprehensive, curated resource of antibody and NANOBODY® VHH structures. It is designed to support computational modeling, machine learning, and structural biology research, and is available in ML-ready formats. This release includes a dataset curated by applying the SNAC-DB pipeline (https://github.com/Sanofi-Public/SNAC-DB) to protein structures sourced from the RCSB PDB (https://www.rcsb.org/), as well as a benchmarking dataset for evaluation.

At present, we have processed all PDB entries released up to 20 August 2025.

Paper: https://www.researchgate.net/publication/393900649_SNAC-DB_The_Hitchhiker's_Guide_to_Building_Better_Predictive_Models_of_Antibody_NANOBODY_R_VHH-Antigen_Complexes

Files

README.md

Files (12.7 GB)

  • 1.5 kB, md5:6bf7cdffa7353516eee475014a295870
  • 1.2 kB, md5:fa7b47339537d10829a3395feb5cfb6f
  • 14.1 kB, md5:73533dfb094ecdaca7a63cbed6669efe
  • 12.7 GB, md5:f0a7c46b19968315901656de14ba6059
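Each file in this record is listed with an MD5 checksum, so a download (especially the 12.7 GB file) can be checked for corruption before it is unpacked. The following is a minimal sketch in Python using only the standard library's `hashlib`; the helper names `md5_of_file` and `verify_md5` are hypothetical and are not part of the SNAC-DB pipeline, and the path you pass in is whatever name your downloaded file has.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def md5_of_file(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks.

    Chunked reading keeps memory use constant, even for a multi-GB archive.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns b"" (end of file)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str | Path, expected: str) -> bool:
    """Compare a file's MD5 against a listed checksum.

    Accepts either a bare hex digest or the 'md5:<hex>' form used on this page.
    Hypothetical helper for illustration only.
    """
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

For example, `verify_md5("<downloaded file>", "md5:f0a7c46b19968315901656de14ba6059")` should return `True` for an intact copy of the 12.7 GB file. Reading in 1 MiB chunks trades a few extra loop iterations for a flat memory footprint, which matters more here than raw speed.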

Additional details

Software

  • Repository URL: https://github.com/Sanofi-Public/SNAC-DB
  • Programming language: Python, Jupyter Notebook
  • Development status: Active

References

  • Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., & Bourne, P.E. (2000). The Protein Data Bank. Nucleic Acids Research, 28, 235–242. https://doi.org/10.1093/nar/28.1.235
  • Updated resources for exploring experimentally-determined PDB structures and Computed Structure Models at the RCSB Protein Data Bank (2025). Nucleic Acids Research, 53, D564–D574. https://doi.org/10.1093/nar/gkae1091
  • Berman, H.M., Henrick, K., & Nakamura, H. (2003). Announcing the worldwide Protein Data Bank. Nature Structural Biology, 10, 980. https://doi.org/10.1038/nsb1203-980
  • van Kempen, M., Kim, S.S., Tumescheit, C., Mirdita, M., Lee, J., Gilchrist, C.L.M., Söding, J., & Steinegger, M. (2023). Fast and accurate protein structure search with Foldseek. Nature Biotechnology. https://doi.org/10.1038/s41587-023-01773-0
  • Dunbar, J., & Deane, C. (2015). ANARCI: antigen receptor numbering and receptor classification. Bioinformatics, 32(2), 298–300.
  • Steinegger, M., & Söding, J. (2017). MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, 35(11), 1026–1028.