Published May 16, 2024 | Version v1
Dataset Restricted

MDD-Molecular Dynamics Dataset: Collection of protein-ligand complex simulations

  • 1. Institute of Biochemistry and Biophysics, Polish Academy of Sciences

Description

This dataset accompanies the following paper: https://chemrxiv.org/engage/chemrxiv/article-details/664c73f6418a5379b0de8152

This dataset consists of molecular dynamics (MD) simulations of 862 unique protein-ligand complexes, covering a wide range of protein families and diverse chemical classes of ligands. It is derived from publicly available repositories and represents the largest single source of MD simulations to date.

All protein-ligand complexes in the dataset were prepared following a standardized protocol. Missing atoms in the protein structures were added with the PDBFixer tool. Proteins were parameterized with the AMBER99SB-ILDN force field, and ligands with the ANTECHAMBER module via the ACPYPE tool. Ligand partial charges were fitted to the quantum-mechanically computed electrostatic potential using the Restrained Electrostatic Potential (RESP) method, and the remaining ligand parameters were taken from the GAFF2 force field.

The molecular dynamics simulations were performed with GROMACS. Each system was placed in a cubic simulation box with periodic boundary conditions and solvated with the TIP3P water model in an electrostatically neutral environment. The simulation protocol comprised an initial energy minimization, followed by temperature equilibration in the NVT ensemble and pressure equilibration in the NPT ensemble. Production simulations were run for 200 ns, with a timestep of 100 ps.
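The simulation stages described above can be sketched as a sequence of GROMACS command lines. This is only an illustration of the order of operations, not the authors' actual scripts. All file names (`complex.gro`, `topol.top`, `ions.mdp`, `em.mdp`, `nvt.mdp`, `npt.mdp`, `md.mdp`), the 1.0 nm box margin, and the use of `gmx genion -neutral` to reach neutrality are assumptions made for the example. The sketch only builds the commands and does not execute them.

```python
import shlex

# Stage names follow the protocol in the description:
# minimization -> NVT equilibration -> NPT equilibration -> production.
# The matching .mdp file names are hypothetical.
STAGES = ["em", "nvt", "npt", "md"]


def build_protocol_commands(prefix="complex", box_margin_nm=1.0):
    """Return GROMACS command lines (as argument lists) for one complex.

    Assumes a parameterized protein-ligand system already exists as
    `<prefix>.gro` with a matching `topol.top` (both names hypothetical).
    """
    commands = [
        # Centre the complex in a cubic periodic box (margin is an assumption).
        ["gmx", "editconf", "-f", f"{prefix}.gro", "-o", f"{prefix}_box.gro",
         "-bt", "cubic", "-d", str(box_margin_nm), "-c"],
        # Fill the box with water; spc216.gro is the generic three-site
        # water box that GROMACS ships and that is also used for TIP3P.
        ["gmx", "solvate", "-cp", f"{prefix}_box.gro", "-cs", "spc216.gro",
         "-p", "topol.top", "-o", f"{prefix}_solv.gro"],
        # Add counter-ions until the net charge is zero (assumed approach).
        # Note: genion asks interactively which group to replace with ions.
        ["gmx", "grompp", "-f", "ions.mdp", "-c", f"{prefix}_solv.gro",
         "-p", "topol.top", "-o", "ions.tpr"],
        ["gmx", "genion", "-s", "ions.tpr", "-o", f"{prefix}_ions.gro",
         "-p", "topol.top", "-neutral"],
    ]

    previous_structure = f"{prefix}_ions.gro"
    for stage in STAGES:
        # Each stage starts from the final coordinates of the previous one.
        commands.append(["gmx", "grompp", "-f", f"{stage}.mdp",
                         "-c", previous_structure, "-p", "topol.top",
                         "-o", f"{stage}.tpr"])
        commands.append(["gmx", "mdrun", "-deffnm", stage])
        previous_structure = f"{stage}.gro"

    return commands


if __name__ == "__main__":
    for command in build_protocol_commands():
        print(shlex.join(command))
```

Keeping the commands as data rather than running them directly makes it easy to inspect or log the plan for each of the complexes before any compute time is spent.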

Constructing a large, representative set of MD simulations is challenging because of the high computational cost and the complexity of preparing molecular systems. Moreover, given the limited number of suitable training examples (complexes) and the large volume of MD data produced by each simulation, careful filtering and feature selection are crucial. This dataset is valuable for exploring how molecular dynamics simulation data can be integrated into protein-ligand binding affinity prediction, an essential component of in silico drug discovery pipelines. MD simulations offer a dynamic view of a complex by showing how protein-ligand interactions evolve over time, potentially providing additional insight for affinity and specificity estimates.
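To give a rough sense of the imbalance between the number of complexes and the volume of per-simulation data, the following back-of-the-envelope calculation estimates the number of trajectory frames. It assumes that the 100 ps figure quoted in the description is the interval between saved frames; the record does not state this explicitly, so the result is an estimate only.

```python
N_COMPLEXES = 862          # from the dataset description
SIM_LENGTH_NS = 200        # production run length, from the description
FRAME_INTERVAL_PS = 100    # ASSUMPTION: interval between saved frames


def frames_per_trajectory(length_ns, interval_ps):
    """Number of saved intervals in one trajectory (initial frame excluded)."""
    length_ps = length_ns * 1000  # 1 ns = 1000 ps
    if length_ps % interval_ps != 0:
        raise ValueError("simulation length is not a multiple of the interval")
    return length_ps // interval_ps


per_trajectory = frames_per_trajectory(SIM_LENGTH_NS, FRAME_INTERVAL_PS)
total_frames = per_trajectory * N_COMPLEXES

print(f"frames per trajectory: {per_trajectory}")
print(f"frames across all complexes: {total_frames}")
```

Under this assumption each of the 862 training examples carries on the order of two thousand frames, which is why some form of frame filtering or feature aggregation is needed before the data can be used for affinity prediction.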

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Additional details

Funding

National Science Centre
Assessing protein-ligand complexes with ML/DL models trained on molecular dynamics-based descriptors. 2020/39/B/ST4/02747

Software

Repository URL
https://github.com/JPoziemski/md_for_affinity_prediction
Programming language
Python
Development Status
Active