Published January 3, 2026 | Version v1.3
Dataset
Restricted
SNAC-DB: Structural NANOBODY® (VHH) and Antibody (VH-VL) Complex Database
Authors/Creators
- 1. Large Molecule Research, Sanofi, Cambridge, MA, United States
- 2. Department of Chemical and Biomolecular Engineering, Johns Hopkins University, MD, United States
- 3. R&D Data & Computational Science, Sanofi, Cambridge, MA, United States
- 4. Large Molecule Research, Sanofi, Frankfurt, Germany
Description
***Please use the latest version. Access to earlier versions can be requested directly from the authors.***
Welcome to SNAC-DB, a comprehensive, curated resource of antibody and NANOBODY® VHH structures, provided in ML-ready formats and designed to support computational modeling, machine learning, and structural biology research. This release includes a dataset curated with the SNAC-DB pipeline (https://github.com/Sanofi-Public/SNAC-DB) from protein structures sourced from the RCSB PDB (https://www.rcsb.org/), as well as a benchmarking dataset for evaluation.
This release covers all PDB entries released up to and including 31 December 2025.
Files
Additional details
Software
- Repository URL
- https://github.com/Sanofi-Public/SNAC-DB
- Programming language
- Python, Jupyter Notebook
- Development Status
- Active
References
- H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne, The Protein Data Bank (2000) Nucleic Acids Research 28: 235-242 https://doi.org/10.1093/nar/28.1.235.
- Updated resources for exploring experimentally-determined PDB structures and Computed Structure Models at the RCSB Protein Data Bank (2025) Nucleic Acids Research 53: D564–D574 https://doi.org/10.1093/nar/gkae1091.
- H.M. Berman, K. Henrick, H. Nakamura, Announcing the worldwide Protein Data Bank (2003) Nature Structural Biology 10: 980 https://doi.org/10.1038/nsb1203-980.
- van Kempen, M., Kim, S.S., Tumescheit, C., Mirdita, M., Lee, J., Gilchrist, C.L.M., Söding, J., and Steinegger, M. Fast and accurate protein structure search with Foldseek. Nature Biotechnology, doi:10.1038/s41587-023-01773-0 (2023)
- Dunbar, J., & Deane, C. (2015). ANARCI: antigen receptor numbering and receptor classification. Bioinformatics, 32(2), 298–300.
- Steinegger, M. and Söding, J. (2017). MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, 35(11), pp. 1026-1028.