SNAC-DB: Structural NANOBODY® (VHH) and Antibody (VH-VL) Complex Database

Gupta, Abhinav; Rivero, Bryan Munoz; Touris, Jorge; Li, Ruijiang; Furtmann, Norbert; Fomekong Nanfack, Yves; Wendt, Maria; Qiu, Yu

doi:10.5281/zenodo.16226208

Published July 20, 2025 | Version v1.1

Dataset Open

1. Large Molecule Research, Sanofi, Cambridge, MA, United States
2. Department of Chemical and Biomolecular Engineering, Johns Hopkins University, MD, United States
3. Large Molecule Research, Sanofi, Frankfurt, Germany
4. R&D Data & Computational Science, Sanofi, Cambridge, MA, United States

Welcome to SNAC-DB, a comprehensive, curated resource of antibody and NANOBODY® VHH structures, available in ML-ready formats and designed to support computational modeling, machine learning, and structural biology research. This release includes a dataset curated by running the SNAC-DB pipeline (https://github.com/Sanofi-Public/SNAC-DB) on protein structures sourced from the RCSB PDB (https://www.rcsb.org/), as well as a benchmarking dataset for evaluation.

At present, we have processed all PDB entries released up to 30 April 2025, with a latest deposition date of 31 March 2025.

Files (12.6 GB)

Name	MD5	Size
LICENSE.txt	fa7b47339537d10829a3395feb5cfb6f	1.2 kB
README.md	732ab71380810d316da1acdeadd8a0bb	13.8 kB
SNAC-DataBase.zip	e2da351721ba9111c6c8b9e848c19000	12.6 GB
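Because the archive is large (12.6 GB), it may be worth checking a download against the MD5 digests in the file listing above before unpacking it. The following is a minimal sketch, not part of the SNAC-DB pipeline: the helper names `md5_of_file` and `verify_download` are hypothetical, and it assumes the three files were saved under their listed names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing of this record.
EXPECTED_MD5 = {
    "LICENSE.txt": "fa7b47339537d10829a3395feb5cfb6f",
    "README.md": "732ab71380810d316da1acdeadd8a0bb",
    "SNAC-DataBase.zip": "e2da351721ba9111c6c8b9e848c19000",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Stream the file so the 12.6 GB archive never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(directory="."):
    """Map each listed file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    # Hypothetical usage: run from the directory holding the downloads.
    for name, ok in verify_download().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

A mismatch most likely indicates an incomplete or corrupted download; MD5 is used here only as an integrity check, not as a security guarantee.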

Additional details

Repository URL: https://github.com/Sanofi-Public/SNAC-DB
Programming language: Python, Jupyter Notebook
Development Status: Active

H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne, The Protein Data Bank (2000) Nucleic Acids Research 28: 235-242 https://doi.org/10.1093/nar/28.1.235.
Updated resources for exploring experimentally-determined PDB structures and Computed Structure Models at the RCSB Protein Data Bank (2025) Nucleic Acids Research 53 D564–D574 https://doi.org/10.1093/nar/gkae1091
H.M. Berman, K. Henrick, H. Nakamura Announcing the worldwide Protein Data Bank (2003) Nature Structural Biology 10:980 https://doi.org/10.1038/nsb1203-980.
van Kempen, M., Kim, S.S., Tumescheit, C., Mirdita, M., Lee, J., Gilchrist, C.L.M., Söding, J., and Steinegger, M. Fast and accurate protein structure search with Foldseek. Nature Biotechnology, doi:10.1038/s41587-023-01773-0 (2023)
Dunbar, J., & Deane, C. (2015). ANARCI: antigen receptor numbering and receptor classification. Bioinformatics, 32(2), 298–300.
Steinegger, M. and Söding, J., (2017). MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature biotechnology, 35(11), pp.1026-1028.

	All versions	This version
Views	783	586
Downloads	193	130
Data volume	1.0 TB	691.1 GB

Resource type: Dataset
Publisher: Zenodo
Conference: International Conference on Machine Learning (ICML), Vancouver, Canada (Session: Workshop on DataWorld: Unifying Data Curation Frameworks Across Domains)
Languages: English

License: Sanofi Opensource License

Non Commercial License Notice: Copyright (c) 2025 Sanofi. Permission is hereby granted, free of charge, for academic research purposes only and for non-commercial uses only, to any person from academic research or non-profit organizations obtaining a copy of this software and associated documentation files (the "Software"), to use, copy, modify, or merge the Software, subject to the following conditions: this permission notice shall be included in all copies of the Software or of substantial portions of the Software. For purposes of this license, “non-commercial use” excludes uses foreseeably resulting in a commercial benefit. To use this software for other purposes (such as the development of a commercial product, including but not limited to software, service, or pharmaceuticals, or in a collaboration with a private company), please contact SANOFI at patent.gos@sanofi.com. All other rights are reserved, including those for text and data mining, AI training and similar technologies. The Software is provided “as is”, without warranty of any kind, express or implied, including the warranties of noninfringement. The Software is registered.

Technical metadata

Created: July 20, 2025
Modified: July 31, 2025
