Published March 24, 2025 | Version v4
Dataset Open

Datasets for AGIMA-Score modeling - latest

  • 1. Hong Kong Metropolitan University

Description

This data repository includes the following datasets.

(1) 'training.zip' -- It is a secondary dataset that originates from the Refined Set in the PDBbind database (version V2020). When training models such as AGIMA-Score, complexes that also appear in the validation/test sets need to be removed from it. 5007 complexes are included.

(2) 'validation.zip' -- It is a secondary dataset that originates from the Core Set (CASF-2016) in the PDBbind database (version V2020). It was used as the validation set (for parameter tuning) when building the AGIMA-Score models. Complexes that are similar to those in the training set (protein sequence similarity > 0.3 and ligand similarity > 0.7) were removed. 195 complexes are included.

(3) 'test1.zip' -- It is a secondary dataset that originates from CSAR-HiQ1. It was used as the Test1 set for evaluating the AGIMA-Score models. Complexes that are similar to those in the training/validation sets (protein sequence similarity > 0.3 and ligand similarity > 0.7) were removed. 116 complexes are included.

(4) 'test2.zip' -- It is a secondary dataset that originates from CSAR-HiQ2. It was used as the Test2 set for evaluating the AGIMA-Score models. Complexes that are similar to those in the training/validation/Test1 sets (protein sequence similarity > 0.3 and ligand similarity > 0.7) were removed. 102 complexes are included.

(5) 'indexes.zip' -- It includes the labels (binding affinity data) for the complexes in sets (1) to (4) above.

Each file 'xxxx_atm_prop.txt' represents one protein-ligand complex in the above sets, with 'xxxx' denoting the original complex ID in PDBbind. Each row in such a file describes one atom in the binding complex, and the data fields carry the following information.

--------------------------------------------------------------------------------
id - atom id; numbering starts from 1 for protein atoms and starts again from 1 for ligand atoms (integer)

atmnum - atomic number (integer)

x,y,z - the X, Y, Z coordinates for the atom (float)

atmB,atmC,atmN,atmO,atmP,atmS,atmSe - whether the atom is of some specific type, such as B, C, N, O, P, S and Se (binary)

atmHalogen - whether the atom is a halogen atom (binary)

atmMetal,atmMetallic - whether the atom is a metal (binary)

hybridization - hybridization type of the atom (integer)

heavyneighbors - number of heavy-atom neighbors (integer)

heteroneighbors - number of hetero-atom neighbors (integer)

hydrophobic,aromatic,acceptor,donor,ring - pharmacophoric properties of the atom (binary)

partialCH - partial charge of the atom (float)

posionizable,negionizable - whether the atom is positively ionizable or negatively ionizable (binary)

exlvolume - excluded volume of the atom (float)

vdwrad - VDW radius of the atom (float)

moltype - molecule the atom belongs to (0 for protein and 1 for ligand)

"neighbors(nbr:idx--anum--(sbond,dbond,tbond,arombond,ringbond))" - information of the covalent neighboring atoms for the atom
--------------------------------------------------------------------------------
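As a rough illustration of how one of these property files might be read, the Python sketch below splits the atoms of a complex into protein and ligand lists using the 'moltype' field. It assumes a comma-delimited file with a header row that uses the field names listed above; the delimiter and header layout are assumptions and should be checked against the actual files. The function name load_atom_rows and the sample rows are made up for the example.

```python
import csv
import io


def load_atom_rows(handle, delimiter=","):
    """Read one '*_atm_prop.txt' file into protein and ligand atom lists.

    Assumption: a header row with the documented field names and one atom
    per row. The delimiter is a guess, not confirmed by the repository.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    protein, ligand = [], []
    for row in reader:
        atom = {
            "id": int(row["id"]),
            "atmnum": int(row["atmnum"]),
            "xyz": (float(row["x"]), float(row["y"]), float(row["z"])),
        }
        # moltype: 0 for protein, 1 for ligand (as documented in the field list)
        if int(row["moltype"]) == 1:
            ligand.append(atom)
        else:
            protein.append(atom)
    return protein, ligand


# Tiny made-up example with only a few of the documented columns (not real data)
sample = io.StringIO(
    "id,atmnum,x,y,z,moltype\n"
    "1,7,0.0,0.0,0.0,0\n"
    "2,6,1.5,0.0,0.0,0\n"
    "1,6,3.0,4.0,0.0,1\n"
)
protein_atoms, ligand_atoms = load_atom_rows(sample)
print(len(protein_atoms), len(ligand_atoms))
```

Because atom ids start from 1 separately for the protein and the ligand, an id alone does not identify an atom; the pair (moltype, id) does.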

(6) 'docker.zip' -- A Docker container with the trained AGIMA-Score18 model pre-installed.

--------------------------------------------------------------------------------

trained_mdl.keras - the trained AGIMA-Score18 model in keras format

Dockerfile-genpropfile and Dockerfile - Docker files for generating the property files from a protein and a ligand file (Dockerfile-genpropfile) and making predictions for an input property file (Dockerfile)

requirements-genpropfile.txt and requirements.txt - required packages and their versions for generating the models

app-genpropfile.py and app.py - the application files to run the models (indicated in the Docker files)

1a9m_ligand.pdb, 1a9m_protein.pdb, 1ax1_atm_prop.txt - example files for demonstration purposes

NOTE.txt - this file shows how to run the Docker containers and use the API

--------------------------------------------------------------------------------

(7) 'predictions_byAGIMAscore18.zip' -- It includes the predictions generated by the AGIMA-Score18 model for the validation and test sets. Three files ('predictions_validation.csv', 'predictions_test1.csv', and 'predictions_test2.csv') are included.
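The similarity-based removal described for sets (2) to (4) can be sketched in Python as follows. This is only an illustration of the stated thresholds (protein sequence similarity > 0.3 and ligand similarity > 0.7); how the similarities were actually computed is not stated here, so seq_sim and lig_sim are hypothetical callables, and requiring both conditions to hold against the same reference complex is an assumption.

```python
SEQ_THRESHOLD = 0.3  # protein sequence similarity cutoff from the description
LIG_THRESHOLD = 0.7  # ligand similarity cutoff from the description


def remove_similar(candidates, references, seq_sim, lig_sim):
    """Return the candidates that are not similar to any reference complex.

    seq_sim and lig_sim are hypothetical callables (a, b) -> float.
    A candidate is dropped when, for at least one reference complex,
    both similarities exceed their thresholds (an assumed reading).
    """
    kept = []
    for cand in candidates:
        too_similar = any(
            seq_sim(cand, ref) > SEQ_THRESHOLD
            and lig_sim(cand, ref) > LIG_THRESHOLD
            for ref in references
        )
        if not too_similar:
            kept.append(cand)
    return kept


# Made-up similarity tables for illustration only
seq_table = {("c1", "t1"): 0.9, ("c2", "t1"): 0.9, ("c3", "t1"): 0.1}
lig_table = {("c1", "t1"): 0.8, ("c2", "t1"): 0.2, ("c3", "t1"): 0.95}

kept = remove_similar(
    ["c1", "c2", "c3"],
    ["t1"],
    seq_sim=lambda a, b: seq_table[(a, b)],
    lig_sim=lambda a, b: lig_table[(a, b)],
)
print(kept)
```

In this toy input only 'c1' exceeds both thresholds against the reference, so it is the only candidate dropped.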

Files

Files (832.0 MB)

Checksum Size
md5:a62faa6ea5ed66ffec73b7b70d503e93 821.2 kB
md5:06810cc73049e3ad06b74a8cbafe99d7 22.9 kB
md5:dd5e5dbbf104909e8591e15481d6d537 5.9 kB
md5:8f4874bc9a2d056dcdf28b704efc1d81 21.3 MB
md5:66275c594866c90ba6cef06145f90414 17.6 MB
md5:97153f2c65cd5b62e31100a06be678b2 759.7 MB
md5:b6f69636f9e6301756c7fd04dab871c5 32.6 MB
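
The md5 values in the file listing above can be used to check a downloaded archive. The Python sketch below shows one way to do this with the standard hashlib module; the function names md5_of_file and matches_listing are made up for the example, and the demonstration uses a temporary file, not one of the archives.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:a62f...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demonstration on a throwaway file (not one of the dataset archives)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name
demo_listing = "md5:" + hashlib.md5(b"hello").hexdigest()

ok = matches_listing(demo_path, demo_listing)
bad = matches_listing(demo_path, "md5:" + "0" * 32)
os.remove(demo_path)
print(ok, bad)
```

For a real check, pass the path of a downloaded archive together with the md5 string shown next to that file on the record page.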