There is a newer version of the record available.

Published July 20, 2025 | Version v1.1
Dataset Restricted

SNAC-DB: Structural NANOBODY® (VHH) and Antibody (VH-VL) Complex Database

  • 1. Large Molecule Research, Sanofi, Cambridge, MA, United States
  • 2. Department of Chemical and Biomolecular Engineering, Johns Hopkins University, MD, United States
  • 3. Large Molecule Research, Sanofi, Frankfurt, Germany
  • 4. R&D Data & Computational Science, Sanofi, Cambridge, MA, United States

Description

***Please use the latest version. Access to earlier versions can be requested directly from the authors.***

Welcome to SNAC-DB, a comprehensive, curated resource of antibody and NANOBODY® VHH structures, designed to support computational modeling, machine learning, and structural biology research and available in ML-ready formats. This release includes a dataset curated with the SNAC-DB pipeline (https://github.com/Sanofi-Public/SNAC-DB) from protein structures sourced from the RCSB PDB (https://www.rcsb.org/), as well as a benchmarking dataset for evaluation.

At the moment, we have processed all PDB entries released up to 30 April 2025; the latest deposit date among them is 31 March 2025.

Files

Restricted

The record is publicly accessible, but files are restricted.

Additional details

Software

Repository URL
https://github.com/Sanofi-Public/SNAC-DB
Programming language
Python, Jupyter Notebook
Development Status
Active

References

  • H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne, The Protein Data Bank (2000) Nucleic Acids Research 28: 235-242 https://doi.org/10.1093/nar/28.1.235.
  • Updated resources for exploring experimentally-determined PDB structures and Computed Structure Models at the RCSB Protein Data Bank (2025) Nucleic Acids Research 53: D564–D574 https://doi.org/10.1093/nar/gkae1091.
  • H.M. Berman, K. Henrick, H. Nakamura Announcing the worldwide Protein Data Bank (2003) Nature Structural Biology 10:980 https://doi.org/10.1038/nsb1203-980.
  • van Kempen, M., Kim, S.S., Tumescheit, C., Mirdita, M., Lee, J., Gilchrist, C.L.M., Söding, J., and Steinegger, M. Fast and accurate protein structure search with Foldseek. Nature Biotechnology, doi:10.1038/s41587-023-01773-0 (2023)
  • Dunbar, J., & Deane, C. (2015). ANARCI: antigen receptor numbering and receptor classification. Bioinformatics, 32(2), 298–300.
  • Steinegger, M. and Söding, J. (2017). MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, 35(11), pp. 1026-1028.