Scaffold protein database generated for the publication "De novo design of site-specific protein interactions with learned surface fingerprints" (DOI: 10.1101/2022.06.16.496402)

The scaffold database is composed of globular structures ranging from 30 to 100 amino acids in length, originating from:
1) 301 structures from the AlphaFold2 proteome prediction database
2) 2049 structures from the Protein Data Bank (PDB)
3) 1997 structures from two publications introducing "miniprotein" designs
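
The 30 to 100 amino acid size range stated above can be checked with a short script. The following is only a minimal sketch, not the procedure used to build the database: it assumes the structures are stored as PDB-format text, that length is counted as the number of residues with a CA atom, and that both bounds are inclusive. The function names count_residues and in_size_range are hypothetical.

```python
def count_residues(pdb_text: str) -> int:
    """Count residues as unique (chain, residue number + insertion code)
    pairs among CA atoms in ATOM records (assumed length definition)."""
    seen = set()
    for line in pdb_text.splitlines():
        # PDB fixed columns: atom name 13-16, chain ID 22,
        # residue number 23-26, insertion code 27 (1-indexed).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            seen.add((line[21], line[22:27]))
    return len(seen)


def in_size_range(pdb_text: str, min_len: int = 30, max_len: int = 100) -> bool:
    """Return True if the residue count lies within the assumed
    inclusive range [min_len, max_len]."""
    return min_len <= count_residues(pdb_text) <= max_len
```

Calling in_size_range on the text of one structure file returns True when the structure falls inside the stated range; HETATM records are ignored, so ligands and ions do not affect the count.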

Citations:
1) Varadi, M. et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439–D444 (2022).
2) Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
3) Rocklin, G. J. et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 357, 168–175 (2017).
   Bhardwaj, G. et al. Accurate de novo design of hyperstable constrained peptides. Nature 538, 329–335 (2016).
