Published August 21, 2025 | Version v1.2
Dataset
Open
SNAC-DB: Structural NANOBODY® (VHH) and Antibody (VH-VL) Complex Database
Creators
- 1. Large Molecule Research, Sanofi, Cambridge, MA, United States
- 2. Department of Chemical and Biomolecular Engineering, Johns Hopkins University, MD, United States
- 3. Large Molecule Research, Sanofi, Frankfurt, Germany
- 4. R&D Data & Computational Science, Sanofi, Cambridge, MA, United States
Description
Welcome to SNAC-DB, a comprehensive, curated resource of antibody and NANOBODY® VHH structures. It is designed to support computational modeling, machine learning, and structural biology research, and is provided in ML-ready formats. This release includes a dataset curated by running the SNAC-DB pipeline (https://github.com/Sanofi-Public/SNAC-DB) on protein structures sourced from the RCSB PDB (https://www.rcsb.org/), as well as a benchmarking dataset for evaluation.
So far, we have processed all PDB entries released through 20 August 2025.
Files
README.md
Additional details
Software
- Repository URL: https://github.com/Sanofi-Public/SNAC-DB
- Programming language: Python, Jupyter Notebook
- Development Status: Active
References
- H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne, The Protein Data Bank (2000) Nucleic Acids Research 28: 235-242 https://doi.org/10.1093/nar/28.1.235.
- Updated resources for exploring experimentally-determined PDB structures and Computed Structure Models at the RCSB Protein Data Bank (2025) Nucleic Acids Research 53: D564–D574 https://doi.org/10.1093/nar/gkae1091.
- H.M. Berman, K. Henrick, H. Nakamura Announcing the worldwide Protein Data Bank (2003) Nature Structural Biology 10:980 https://doi.org/10.1038/nsb1203-980.
- van Kempen, M., Kim, S.S., Tumescheit, C., Mirdita, M., Lee, J., Gilchrist, C.L.M., Söding, J., and Steinegger, M. (2023). Fast and accurate protein structure search with Foldseek. Nature Biotechnology. https://doi.org/10.1038/s41587-023-01773-0.
- Dunbar, J., & Deane, C. (2015). ANARCI: antigen receptor numbering and receptor classification. Bioinformatics, 32(2), 298–300.
- Steinegger, M. and Söding, J. (2017). MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, 35(11), 1026–1028.