Published March 7, 2024 | Version v1
Dataset Open

Graph Attention Site Prediction (GrASP) HOLO4K Dataset

  • 1. University of Maryland

Description

This record contains the modified version of the HOLO4K dataset that was used to evaluate GrASP. It provides the following files:

  1. unprocessed_pdb.zip: PDB structures from the original HOLO4K dataset.
  2. split_pdb.zip: PDB structures split into individual chains, except in the case of interfacial binders, where the full interface is retained.
  3. ready_to_parse_mol2.zip: Protein and ligand structures after our additional processing was applied.
  4. raw.zip: NumPy arrays of the features used to construct PyTorch Geometric graphs.
  5. processed.zip: Processed protein graphs used as graph neural network inputs.
  6. mol2.zip: Protein(s) with hydrogens removed and atoms renumbered accordingly. Indices match the node feature order in the NumPy and PyTorch files.
  7. holo4k(mlig)_uniprot.pkl: Pickle containing UniProt ID(s) for each receptor, used to define train/test splits.
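Since the UniProt pickle is described as the basis for the train/test splits, the sketch below shows one way such a mapping could be used to group receptors so that no UniProt ID ends up on both sides of a split. It assumes the pickle holds a dict mapping each receptor name to an iterable of UniProt IDs; that layout is an assumption, not something documented on this record, and the helper names `load_uniprot_map` and `group_receptors` are hypothetical. This is not necessarily how GrASP itself builds its splits.

```python
import pickle


def load_uniprot_map(path):
    """Load the receptor -> UniProt ID mapping (assumed to be a dict)."""
    # Only unpickle files from a source you trust: pickle can execute code.
    with open(path, "rb") as handle:
        return pickle.load(handle)


def group_receptors(receptor_to_ids):
    """Group receptors that share, directly or transitively, a UniProt ID.

    Uses a small union-find over receptor and UniProt nodes. Returns a
    sorted list of sorted receptor-name lists, one list per group.
    """
    parent = {}

    def find(item):
        # Walk to the root, halving the path along the way.
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for receptor, ids in receptor_to_ids.items():
        receptor_key = ("receptor", receptor)
        parent.setdefault(receptor_key, receptor_key)
        for uniprot_id in ids:
            id_key = ("uniprot", uniprot_id)
            parent.setdefault(id_key, id_key)
            union(receptor_key, id_key)

    groups = {}
    for receptor in receptor_to_ids:
        root = find(("receptor", receptor))
        groups.setdefault(root, []).append(receptor)
    return sorted(sorted(members) for members in groups.values())


# Illustrative, made-up mapping (not taken from the dataset):
example = {"1abc": ["P1"], "2def": ["P1", "P2"], "3ghi": ["P2"], "4jkl": ["P3"]}
print(group_receptors(example))
```

Assigning whole groups to either the training or the test set keeps receptors with a shared UniProt ID together. With the real file, the mapping would first be read with `load_uniprot_map` and its actual structure inspected before relying on this grouping.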

Files


Files (17.5 GB)

MD5 checksum                            Size
md5:ac4f62ee5ba6dda4fcd97b732e01f09b    68.7 kB
md5:675a150408e035f0667f7535e5cf4b2e    311.8 MB
md5:fa811fa4180fc5918b0330e1998979b4    1.8 GB
md5:ccba0213e4535a12794139ff8ce7011c    14.1 GB
md5:02fed1f238691f745b2c3b175c072056    641.8 MB
md5:3139414cd76abdeaf457038db61f0e86    262.5 MB
md5:23986d00b26902420a50ef2e4fdf4a13    458.7 MB
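Because several of these archives are large, it can be worth checking a download against its listed MD5 checksum before unpacking it. The following is a minimal sketch using only the Python standard library; the function names `md5_of_file` and `verify_download` and the file path in the usage comment are hypothetical. The listing above does not say which checksum belongs to which archive, so take the pairing from the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-gigabyte files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected):
    """Compare a file's MD5 with a checksum such as 'md5:ac4f...' or 'ac4f...'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Usage (hypothetical local path; use the checksum listed for that file):
# ok = verify_download("downloads/some_archive.zip", "md5:<checksum from the listing>")
```

A `False` result usually means the download was truncated or corrupted and should be repeated.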

Additional details

Related works

Is supplement to
Publication: https://pubs.acs.org/doi/10.1021/acs.jcim.3c01698 (URL)

Software

Repository URL
https://github.com/tiwarylab/GrASP
Programming language
Python