SNAC-DB: Structural NANOBODY® (VHH) and Antibody (VH-VL) Complex Database
Creators
- 1. Large Molecule Research, Sanofi, Cambridge, MA, United States
- 2. Department of Chemical and Biomolecular Engineering, Johns Hopkins University, MD, United States
- 3. Large Molecule Research, Sanofi, Frankfurt, Germany
- 4. R&D Data & Computational Science, Sanofi, Cambridge, MA, United States
Description
Welcome to SNAC-DB, a comprehensive, curated resource of antibody and NANOBODY® VHH structures, available in ML-ready formats and designed to support computational modeling, machine learning, and structural biology research. This release includes a dataset curated by running the SNAC-DB pipeline (https://github.com/Sanofi-Public/SNAC-DB) on protein structures sourced from the RCSB PDB (https://www.rcsb.org/), as well as a benchmarking dataset for evaluation.
At present, we have processed all PDB entries released up to 30 April 2025, with a latest deposit date of 31 March 2025.
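The two dates above can be read as a snapshot window. The following is a minimal sketch of how one might check whether a given PDB entry falls inside that window. It assumes both dates act as inclusive cutoffs that must hold together; the description does not state this explicitly. The function name `in_snapshot` and the example dates are hypothetical and are not part of the SNAC-DB pipeline.

```python
from datetime import date

# Dates quoted in the description above.
RELEASE_CUTOFF = date(2025, 4, 30)
DEPOSIT_CUTOFF = date(2025, 3, 31)


def in_snapshot(deposit_date: date, release_date: date) -> bool:
    """Return True if an entry falls inside the processed snapshot.

    Assumption (not confirmed by the source): both cutoffs are
    inclusive and an entry must satisfy both of them.
    """
    return deposit_date <= DEPOSIT_CUTOFF and release_date <= RELEASE_CUTOFF


# Hypothetical entries, given as (deposit date, release date).
print(in_snapshot(date(2025, 3, 15), date(2025, 4, 23)))  # within both cutoffs
print(in_snapshot(date(2025, 4, 2), date(2025, 5, 14)))   # past both cutoffs
```

A check like this may help when deciding whether a recently released structure could already be present in this release or would only appear in a later one.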
Files
README.md
Additional details
Software
- Repository URL: https://github.com/Sanofi-Public/SNAC-DB
- Programming language: Python, Jupyter Notebook
- Development Status: Active
References
- H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne, The Protein Data Bank (2000) Nucleic Acids Research 28: 235-242 https://doi.org/10.1093/nar/28.1.235.
- Updated resources for exploring experimentally-determined PDB structures and Computed Structure Models at the RCSB Protein Data Bank (2025) Nucleic Acids Research 53: D564–D574 https://doi.org/10.1093/nar/gkae1091.
- H.M. Berman, K. Henrick, H. Nakamura, Announcing the worldwide Protein Data Bank (2003) Nature Structural Biology 10: 980 https://doi.org/10.1038/nsb1203-980.
- van Kempen, M., Kim, S.S., Tumescheit, C., Mirdita, M., Lee, J., Gilchrist, C.L.M., Söding, J., and Steinegger, M. Fast and accurate protein structure search with Foldseek. Nature Biotechnology, doi:10.1038/s41587-023-01773-0 (2023)
- Dunbar, J., & Deane, C. (2015). ANARCI: antigen receptor numbering and receptor classification. Bioinformatics, 32(2), 298–300.
- Steinegger, M. and Söding, J. (2017). MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, 35(11), 1026–1028.