Molecular Dynamics Simulation Data for LD_FPG: Generative Modeling of Full-Atom Dopamine D2 Receptor Conformations
Authors/Creators
Description
This dataset comprises selected molecular dynamics (MD) simulation data for the human dopamine D₂ receptor (D2R) in an apo-like state, specifically focusing on replicas “run1” and “run6” drawn from a larger ensemble of simulations. The data serve to support the findings presented in the paper “Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings.”
The archive is organized into two main directories. The first, MD_simulation_data, contains GROMACS simulation files for each individual run. Within this directory, each replica (for example, run1/ and run6/) has two subdirectories:
- input_files holds all starting structures (in PDB format, such as system_begin.pdb for the complete system and protein_initial.pdb for the protein alone), the topology file (.top), index file (.ndx), and MD parameter files (.mdp) covering every stage of the simulation (energy minimization, several equilibration steps, and production). If any deviations from the standard CHARMM36m force field were applied, the relevant parameters are located in the toppar/ subfolder.
- production_run contains the outputs from the production phase of the MD simulation. These include the GROMACS portable run input file (production_run.tpr) and a processed trajectory file (traj_protein_noPBC.xtc), which retains only the heavy atoms of the protein and has had periodic boundary conditions removed.
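After extracting the archive, the layout described above can be checked with a short script. The following is a minimal sketch, not part of the authors' tooling: the function name missing_files and the extraction root passed to it are assumptions, and only the file names spelled out in the description are checked (the .top, .ndx, and .mdp file names are not given, so they are skipped).

```python
from pathlib import Path

# Expected files per replica, taken from the description of the archive.
# Names that the description does not spell out (.top, .ndx, .mdp) are not checked.
EXPECTED = {
    "input_files": ["system_begin.pdb", "protein_initial.pdb"],
    "production_run": ["production_run.tpr", "traj_protein_noPBC.xtc"],
}


def missing_files(archive_root, replicas=("run1", "run6")):
    """Return the expected paths that are absent under MD_simulation_data/.

    archive_root is wherever the archive was extracted (an assumption of
    this sketch; adjust it to your local path).
    """
    base = Path(archive_root) / "MD_simulation_data"
    missing = []
    for replica in replicas:
        for subdir, names in EXPECTED.items():
            for name in names:
                path = base / replica / subdir / name
                if not path.is_file():
                    missing.append(path)
    return missing
```

Calling missing_files on the extraction directory returns an empty list when every expected file is present; otherwise it lists the paths that could not be found.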
The second directory, Processed_ML_Input_JSON, houses the principal JSON file used for machine-learning input (for example, final_combined.json or my_protein.json as cited in the publication). This file aggregates aligned heavy-atom coordinates and dihedral angles for 12,241 frames sampled from a representative D2R trajectory and served as the training data for the LD-FPG generative model described in the paper.
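The internal schema of the JSON file is not documented in this record, so a reasonable first step is to inspect its top-level structure before writing a loader. The sketch below assumes nothing about the key names; summarize_json is a hypothetical helper, and it reads the whole file into memory, which may be slow for a file holding thousands of frames.

```python
import json


def summarize_json(path, max_keys=10):
    """Report the top-level type, size, and first few keys of a JSON file.

    No schema is assumed: the function only distinguishes between a
    top-level object, a top-level array, and anything else.
    """
    with open(path, "r", encoding="utf-8") as handle:
        data = json.load(handle)

    summary = {"type": type(data).__name__, "length": None, "keys": None}
    if isinstance(data, dict):
        summary["length"] = len(data)
        summary["keys"] = list(data)[:max_keys]
    elif isinstance(data, list):
        summary["length"] = len(data)
        # If the array holds objects, show the keys of the first one.
        if data and isinstance(data[0], dict):
            summary["keys"] = list(data[0])[:max_keys]
    return summary
```

For example, summarize_json("Processed_ML_Input_JSON/final_combined.json") would show whether frames are stored as a list of records or as a mapping, which determines how the coordinates and dihedral angles should be read.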
An overview of the simulation details is as follows:
- Protein: Human dopamine D₂ receptor (D2R), modeled on PDB entry 6CM4 with the intracellular loop 3 remodeled.
- Force Field: CHARMM36m.
- MD Software: GROMACS version 2024.2.
- Protocol: Each replica underwent an energy minimization, a multi-step equilibration, and a 2-microsecond production run under NPT conditions. The trajectory file traj_protein_noPBC.xtc represents the processed, protein-only portion of the production phase.
A few notes for users:
- Full-system production trajectories, including membrane, solvent, and other components (commonly named step7_1.xtc in the original directories), are excluded from this archive because of their large file size (approximately 14–15 GB each). Researchers interested in these full trajectories may request them from the corresponding authors. If you wish to process a full trajectory yourself (using, for example, the authors’ extract_residues.py script), you can use the provided system_begin.pdb as the reference structure.
- To convert the supplied protein-only trajectory (traj_protein_noPBC.xtc) into the JSON format required by the ML pipeline, you would typically run a processing script such as extract_residues.py (available on GitHub) together with a PDB file containing only the protein heavy atoms (heavy_chain.pdb).
- The two replicas, run1 and run6, are included to illustrate simulation variability. Additional replicas can be obtained from the authors upon reasonable request.
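For readers who obtain a full-system trajectory, one common way to produce a protein-only, heavy-atom trajectory with periodic boundary effects removed is GROMACS's gmx trjconv. The sketch below is an illustration under stated assumptions, not the authors' documented procedure: the exact processing used to create traj_protein_noPBC.xtc is not described in this record, the helper names are hypothetical, and the group names "Protein" and "Protein-H" rely on the default index groups that GROMACS generates when no index file is passed.

```python
import subprocess


def build_trjconv_command(tpr, xtc_in, xtc_out, gmx="gmx"):
    """Build a gmx trjconv call that centers the system and keeps molecules whole.

    -s: run input file, -f: input trajectory, -o: output trajectory,
    -pbc mol: put molecule centers of mass in the box, -center: center a group.
    """
    return [
        gmx, "trjconv",
        "-s", str(tpr),
        "-f", str(xtc_in),
        "-o", str(xtc_out),
        "-pbc", "mol",
        "-center",
    ]


def run_trjconv(tpr, xtc_in, xtc_out,
                center_group="Protein", output_group="Protein-H"):
    """Run trjconv, answering its two interactive group prompts via stdin.

    With -center, trjconv asks first for the centering group and then for
    the output group. "Protein-H" is the default group for protein atoms
    without hydrogens (an assumption that default groups are in use).
    """
    command = build_trjconv_command(tpr, xtc_in, xtc_out)
    subprocess.run(
        command,
        input=f"{center_group}\n{output_group}\n",
        text=True,
        check=True,
    )
```

A call such as run_trjconv("production_run.tpr", "step7_1.xtc", "protein_noPBC.xtc") requires a working GROMACS installation on the PATH; the output file name here is arbitrary.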
For complete methodological details and further context, please refer to the main publication.
Files
| Name | Size | MD5 |
|---|---|---|
| D2R_MD_runs.zip | 1.9 GB | 860aa8f92a28a8fb66f049bf0817b3fe |
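Because the archive is large, it is worth confirming that the download is intact by comparing its MD5 digest with the checksum listed for D2R_MD_runs.zip. The following is a minimal sketch using only the Python standard library; the function name md5_of_file and the local file path are assumptions.

```python
import hashlib

# Checksum published in this record for D2R_MD_runs.zip.
EXPECTED_MD5 = "860aa8f92a28a8fb66f049bf0817b3fe"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks.

    Chunked reading keeps memory use low for a multi-gigabyte archive.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If md5_of_file("D2R_MD_runs.zip") does not equal EXPECTED_MD5, the download is likely incomplete or corrupted and should be repeated.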
Additional details
Dates
- Available: 2025-05-21
Software
- Repository URL: https://anonymous.4open.science/r/LD-FPG-040A/
- Programming language: Python