There is a newer version of the record available.

Published May 21, 2025 | Version v1
Dataset Open

Molecular Dynamics Simulation Data for LD_FPG: Generative Modeling of Full-Atom Dopamine D2 Receptor Conformations

Description

This dataset comprises selected molecular dynamics (MD) simulation data for the human dopamine D₂ receptor (D2R) in an apo-like state, specifically focusing on replicas “run1” and “run6” drawn from a larger ensemble of simulations. The data serve to support the findings presented in the paper “Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings.”

The archive is organized into two main directories. The first, MD_simulation_data, contains GROMACS simulation files for each individual run. Within this directory, each replica (for example, run1/ and run6/) has two subdirectories:

  • input_files holds all starting structures (in PDB format, such as system_begin.pdb for the complete system and protein_initial.pdb for the protein alone), the topology file (.top), index file (.ndx), and MD parameter files (.mdp) covering every stage of the simulation (energy minimization, several equilibration steps, and production). If any deviations from the standard CHARMM36m force field were applied, the relevant parameters are located in the toppar/ subfolder.

  • production_run contains the outputs from the production phase of the MD simulation. These include the GROMACS portable run input file (production_run.tpr) and a processed trajectory file (traj_protein_noPBC.xtc), which retains only the heavy atoms of the protein and has had periodic boundary conditions removed.
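After unpacking the archive, the layout described above can be checked with a short script. This is a minimal sketch, not part of the authors' code: the directory and file names come from the description, while the function name `missing_entries` and the extraction folder passed to it are hypothetical.

```python
from pathlib import Path

# Names below are taken from the dataset description; the helper itself is
# illustrative and is not part of the authors' repository.
REPLICAS = ("run1", "run6")
PRODUCTION_FILES = ("production_run.tpr", "traj_protein_noPBC.xtc")


def missing_entries(archive_root):
    """Return the expected paths that are absent under archive_root."""
    md_dir = Path(archive_root) / "MD_simulation_data"
    expected = []
    for replica in REPLICAS:
        # Each replica should have an input_files directory ...
        expected.append(md_dir / replica / "input_files")
        # ... and a production_run directory holding the two output files.
        for name in PRODUCTION_FILES:
            expected.append(md_dir / replica / "production_run" / name)
    return [str(path) for path in expected if not path.exists()]
```

Calling `missing_entries("path/to/extracted/archive")` returns an empty list when every expected entry is present. The check does not look inside input_files, because the description does not give exact names for the .top, .ndx, and .mdp files.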

The second directory, Processed_ML_Input_JSON, houses the principal JSON file used for machine-learning input (for example, final_combined.json or my_protein.json as cited in the publication). This file aggregates aligned heavy-atom coordinates and dihedral angles for 12,241 frames sampled from a representative D2R trajectory and served as the training data for the LD-FPG generative model described in the paper.
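The internal schema of the JSON file is not documented in this description, so a reasonable first step is to inspect its top-level structure before writing any loader. The following is a minimal sketch that assumes nothing about key names; `summarize_json` is a hypothetical helper, not part of the authors' pipeline.

```python
import json


def summarize_json(path):
    """Describe the top-level structure of a JSON file without assuming a schema."""
    with open(path, "r", encoding="utf-8") as fh:
        data = json.load(fh)  # loads the whole file into memory
    if isinstance(data, dict):
        # Map each top-level key to (type name, length if the value is sized).
        return {
            key: (type(value).__name__,
                  len(value) if hasattr(value, "__len__") else None)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return {"<list>": ("list", len(data))}
    return {"<scalar>": (type(data).__name__, None)}
```

Running this on the file in Processed_ML_Input_JSON should reveal which keys hold the coordinates and dihedral angles. Because the file covers 12,241 frames, loading it in full may need a substantial amount of memory.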

An overview of the simulation details is as follows:

  • Protein: Human dopamine D₂ receptor (D2R), modeled on PDB entry 6CM4 with the intracellular loop 3 remodeled.

  • Force Field: CHARMM36m.

  • MD Software: GROMACS version 2024.2.

  • Protocol: Each replica underwent an energy minimization, a multi-step equilibration, and a 2-microsecond production run under NPT conditions. The trajectory file traj_protein_noPBC.xtc represents the processed, protein-only portion of the production phase.

A few notes for users:

  1. Full-system production trajectories, which include membrane, solvent, and other components and are commonly named step7_1.xtc in the original directories, are excluded from this archive because of their large file size (approximately 14–15 GB each). Researchers interested in these full trajectories may request them from the corresponding authors. If you wish to process a full trajectory yourself (using, for example, the authors’ extract_residues.py script), you can use the provided system_begin.pdb as the reference structure.

  2. To convert the supplied protein-only trajectory (traj_protein_noPBC.xtc) into the JSON format required by the ML pipeline, you would typically run a processing script such as extract_residues.py (available on GitHub) together with a PDB file containing only the protein heavy atoms (heavy_chain.pdb).

  3. The two replicas, run1 and run6, are included to illustrate simulation variability. Additional replicas can be obtained from the authors upon reasonable request.

For complete methodological details and further context, please refer to the main publication.

Files

D2R_MD_runs.zip (1.9 GB)
md5:860aa8f92a28a8fb66f049bf0817b3fe
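
Because the archive is about 1.9 GB, it is worth confirming that the download is intact by comparing its MD5 digest with the value listed for D2R_MD_runs.zip (860aa8f92a28a8fb66f049bf0817b3fe). A minimal sketch using only the Python standard library follows; the function name `md5_of_file` and the local file path are assumptions.

```python
import hashlib

# Checksum as published in the record's file listing.
EXPECTED_MD5 = "860aa8f92a28a8fb66f049bf0817b3fe"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Chunked reads keep memory use flat for multi-gigabyte files.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `md5_of_file("D2R_MD_runs.zip") == EXPECTED_MD5` should evaluate to `True` for an uncorrupted download.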

Additional details

Dates

Available: 2025-05-21

Software

Repository URL: https://anonymous.4open.science/r/LD-FPG-040A/
Programming language: Python