Published May 27, 2025 | Version v3
Dataset Open

CovDocker

Description

This record contains the preprocessed CovDocker dataset for the paper "CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions". The associated code is available at https://github.com/PoloWitty/CovDocker.

The files used for model training and evaluation (under processed/dataset) are stored as LMDB files for convenience.
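As a rough illustration of how such LMDB files might be inspected, the sketch below iterates over the records of one file. It rests on assumptions not stated in this record: that the third-party `lmdb` package is installed, that each value is a pickled Python object (a common convention in Uni-Mol-style pipelines), and that each LMDB is a single file rather than a directory. The function names `decode_record` and `iter_lmdb` are hypothetical; check the code repository for the loaders actually used.

```python
import pickle
from typing import Any, Iterator, Tuple


def decode_record(raw: bytes) -> Any:
    """Decode one LMDB value, ASSUMING it was written with pickle.dumps.

    Only unpickle data from a source you trust.
    """
    return pickle.loads(raw)


def iter_lmdb(path: str, subdir: bool = False) -> Iterator[Tuple[bytes, Any]]:
    """Yield (key, decoded_value) pairs from an LMDB environment, read-only.

    subdir=False ASSUMES the LMDB is a single file; pass subdir=True if it
    turns out to be a directory containing data.mdb.
    """
    import lmdb  # third-party dependency: pip install lmdb

    env = lmdb.open(path, subdir=subdir, readonly=True, lock=False, readahead=False)
    try:
        with env.begin() as txn:
            # An LMDB cursor iterates over (key, value) byte pairs in key order.
            for key, raw in txn.cursor():
                yield key, decode_record(raw)
    finally:
        env.close()
```

A call such as `next(iter_lmdb("processed/dataset/docking/train.lmdb"))` would return the first record, although the file name `train.lmdb` is a placeholder here and the actual names inside each task directory should be checked after unpacking.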

File structure:

processed
    ├── bonded
    │   ├── 1A0L
    │   │     ├── 1A0L_10Apocket.pdb
    │   │     ├── 1A0L_5Apocket.pdb
    │   │     ├── 1A0L_8Apocket.pdb
    │   │     ├── 1A0L_chain_within_10A.pdb
    │   │     ├── 1A0L_ligand.pdb # ligand part from original complex pdb file
    │   │     ├── 1A0L_ligand.sdf # ligand pdb structure aligned with corresponding SMILES
    │   │     └── 1A0L_protein.pdb # protein part from original complex pdb file
    .....
    │   └── 9XIA
    ├── dataset # LMDB files used for deep learning models
    │   ├── docking
    │   ├── reaction
    │   └── reactive_site
    ├── dataset.csv # used for task2 and task3 (n=2754)
    ├── dataset.filtered.csv # used for task1 (n=2717)
    ├── dataset.filtered.unseen.csv # used for task1 unseen test set (67 unseen test samples)
    ├── dataset.unseen.csv # used for task2 and task3 unseen test set (68 unseen test samples)
    └── pdb2mechanism.csv # (n=2754)
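The per-complex naming scheme in the tree above can be turned into paths programmatically. The following is a minimal sketch, assuming every complex directory follows the same pattern as the 1A0L example; the helper names `complex_files` and `missing_files` are hypothetical and are not part of the CovDocker code base.

```python
from pathlib import Path
from typing import Dict, List, Union

# File suffixes taken from the 1A0L example in the file structure listing.
SUFFIXES = (
    "5Apocket.pdb",
    "8Apocket.pdb",
    "10Apocket.pdb",
    "chain_within_10A.pdb",
    "ligand.pdb",
    "ligand.sdf",
    "protein.pdb",
)


def complex_files(root: Union[str, Path], pdb_id: str) -> Dict[str, Path]:
    """Map each suffix to its expected path under <root>/bonded/<pdb_id>/."""
    base = Path(root) / "bonded" / pdb_id
    return {suffix: base / f"{pdb_id}_{suffix}" for suffix in SUFFIXES}


def missing_files(root: Union[str, Path], pdb_id: str) -> List[str]:
    """Return the suffixes whose expected files are absent on disk."""
    return sorted(
        suffix
        for suffix, path in complex_files(root, pdb_id).items()
        if not path.is_file()
    )
```

For example, `complex_files("processed", "1A0L")["ligand.sdf"]` gives `processed/bonded/1A0L/1A0L_ligand.sdf`, and `missing_files("processed", "1A0L")` offers a quick integrity check after unpacking the archive.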


Paper Abstract:

Molecular docking plays a crucial role in predicting the binding mode of ligands to target proteins, and covalent interactions, which involve the formation of a covalent bond between the ligand and the target, are particularly valuable due to their strong, enduring binding nature. However, most existing docking methods and deep learning approaches hardly account for the formation of covalent bonds and the associated structural changes. To address this gap, we introduce a comprehensive benchmark for covalent docking, CovDocker, which is designed to better capture the complexities of covalent binding. We decompose the covalent docking process into three main tasks: reactive location prediction, covalent reaction prediction, and covalent docking. By adapting state-of-the-art models, such as Uni-Mol and Chemformer, we establish baseline performances and demonstrate the effectiveness of the benchmark in accurately predicting interaction sites and modeling the molecular transformations involved in covalent binding. These results confirm the role of the benchmark as a rigorous framework for advancing research in covalent drug design. It underscores the potential of data-driven approaches to accelerate the discovery of selective covalent inhibitors and addresses critical challenges in therapeutic development. Our code is available at https://github.com/PoloWitty/CovDocker.

Files

covDocker_data.zip (708.3 MB)
md5:c0f15012216fdb5a4a1a90f4bb672f02
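The MD5 checksum listed for the archive can be checked after download. The sketch below uses only the Python standard library; the local path `covDocker_data.zip` in the usage note assumes the archive was saved under its original name in the current directory, and the function name `md5_of_file` is illustrative.

```python
import hashlib

# Checksum as listed for covDocker_data.zip on this page.
EXPECTED_MD5 = "c0f15012216fdb5a4a1a90f4bb672f02"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing `md5_of_file("covDocker_data.zip")` against `EXPECTED_MD5` indicates whether the download is intact. Reading in chunks keeps memory use small even for an archive of this size.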

Additional details