Published March 20, 2024 | Version v1
Dataset Open

Data from: Genome-scale annotation of protein binding sites via language model and geometric deep learning

  • 1. Sun Yat-sen University
  • 2. Sun Yat-sen University

Description

The dataset contains the training and test sets of protein binding sites for ten ligand types: DNA, RNA, peptide, protein, ATP, HEM, Zn²⁺, Ca²⁺, Mg²⁺ and Mn²⁺. Each protein is described by three lines giving, in order, the protein name (PDB accession code and chain), the sequence, and the residue labels (0 for non-binding, 1 for binding). The ESMFold-predicted structures are also provided.
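The three-line layout described above could be read with something like the following minimal sketch. It is not taken from the GPSite repository. It assumes that the labels are written as one contiguous string of 0/1 characters with the same length as the sequence, and that the name line may or may not carry a FASTA-style ">" prefix. The names `BindingRecord`, `parse_records` and `binding_positions` are hypothetical, as is the sample entry in the usage note.

```python
from typing import Iterable, Iterator, List, NamedTuple


class BindingRecord(NamedTuple):
    name: str      # PDB accession code and chain, as written in the file
    sequence: str  # amino-acid sequence
    labels: str    # one character per residue: '0' non-binding, '1' binding


def parse_records(lines: Iterable[str]) -> Iterator[BindingRecord]:
    """Yield one record per group of three non-empty lines."""
    stripped = [line.strip() for line in lines if line.strip()]
    if len(stripped) % 3 != 0:
        raise ValueError("expected a multiple of three non-empty lines")
    for i in range(0, len(stripped), 3):
        name, sequence, labels = stripped[i:i + 3]
        # Assumption: tolerate a FASTA-style '>' prefix on the name line.
        name = name.lstrip(">")
        if len(sequence) != len(labels):
            raise ValueError(f"{name}: sequence and label lengths differ")
        if set(labels) - {"0", "1"}:
            raise ValueError(f"{name}: labels must contain only 0 and 1")
        yield BindingRecord(name, sequence, labels)


def binding_positions(record: BindingRecord) -> List[int]:
    """Return the 0-based indices of residues labelled as binding."""
    return [i for i, flag in enumerate(record.labels) if flag == "1"]
```

For a made-up entry with the lines `1abcA`, `MKVL` and `0110`, `binding_positions` would report the second and third residues (indices 1 and 2). If the actual files separate labels with spaces or use a different header style, the parsing step would need to be adjusted accordingly.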

Files

GPSite_dataset.zip (482.2 MB)
md5:a9f311ae1ad580d4eed26639a6d43941
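
The MD5 checksum listed for the archive can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library. The function names `md5_of_file` and `verify_download` are hypothetical, and the local path passed to them is whatever location the archive was saved to.

```python
import hashlib
from pathlib import Path

# Checksum as listed on this record for GPSite_dataset.zip.
EXPECTED_MD5 = "a9f311ae1ad580d4eed26639a6d43941"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so the whole archive is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download(Path("GPSite_dataset.zip"))` would return `True` only if the local file matches the listed checksum. A mismatch usually indicates an incomplete or corrupted download.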

Additional details

Software

Repository URL
https://github.com/biomed-AI/GPSite
Programming language
Python