Molecular Dynamics Simulation Data for LD_FPG: Generative Modeling of Full-Atom Dopamine D2 Receptor Conformations
MD Simulation Datasets for GPCRs
This repository contains Molecular Dynamics (MD) simulation data for four G-Protein Coupled Receptors (GPCRs):
- Dopamine D2 Receptor (D2R)
- Dopamine D1 Receptor (D1R)
- Adenosine A2A Receptor (A2AR)
- Beta-1 Adrenergic Receptor (B1AR)
All simulations were prepared and processed for use with the machine learning model described in the paper "Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings." Each directory contains the necessary files to reproduce or analyze the simulation trajectories.
Repository Structure
The archive is organized into two main directories:
1. MD_simulation_data
Contains the GROMACS simulation files for each individual run. Within this directory, each replica (e.g., run1/ and run6/) has two subdirectories:
- input_files/: All starting structures (PDB format), topology files (.top), index files (.ndx), and MD parameter files (.mdp) covering every simulation stage, plus a toppar/ folder with custom force field parameters (if applicable)
- production_run/: Outputs from the production phase, including the GROMACS portable run input file (.tpr) and the processed trajectory file (.xtc)
2. Processed_ML_Input_JSON
Houses the principal JSON file used for machine learning input (e.g., final_combined.json or my_protein.json). This file aggregates aligned heavy-atom coordinates and dihedral angles for 12,241 frames sampled from a representative D2R trajectory and served as training data for the LD-FPG generative model.
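This record does not document the internal layout of the JSON file. Before writing code against specific keys, it can help to print a generic outline of its structure. The following is a minimal sketch that uses only the Python standard library and assumes no key names. describe and summarize_file are hypothetical helper names. json.load reads the whole file into memory, which may be demanding for a file that aggregates 12,241 frames.

```python
import json
from pathlib import Path
from typing import Any, List


def describe(node: Any, depth: int = 0, max_depth: int = 4, max_items: int = 5) -> List[str]:
    """Summarize the nested structure of a parsed JSON value as indented lines."""
    pad = "  " * depth
    if isinstance(node, dict):
        lines = [f"{pad}object ({len(node)} keys)"]
        if depth < max_depth:
            # Show only the first few keys; a per-frame mapping could have thousands.
            for key in list(node)[:max_items]:
                lines.append(f"{pad}  key {key!r}:")
                lines.extend(describe(node[key], depth + 2, max_depth, max_items))
        return lines
    if isinstance(node, list):
        lines = [f"{pad}array ({len(node)} items)"]
        if node and depth < max_depth:
            # Describe only the first element, assuming the array is homogeneous.
            lines.extend(describe(node[0], depth + 1, max_depth, max_items))
        return lines
    # Scalars (numbers, strings, booleans, null): show the type and a truncated value.
    return [f"{pad}{type(node).__name__}: {node!r}"[:80]]


def summarize_file(path) -> str:
    """Load a JSON file fully into memory and return its structural summary."""
    with Path(path).open() as fh:
        data = json.load(fh)
    return "\n".join(describe(data))
```

For example, print(summarize_file("Processed_ML_Input_JSON/final_combined.json")) lists the first few keys at each level and describes the first element of each array. Raise max_depth or max_items to see more.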
System Details
Dopamine D2 Receptor (apo_d2_inv_start)
- Protein: Human Dopamine D2 Receptor (D2R)
- System: Apo (ligand-free) receptor in inactive state
- Starting Structure: Based on PDB ID 6CM4, with the third intracellular loop (ICL3) remodeled
- Force Field: CHARMM36m
- MD Software: GROMACS version 2024.2
- Protocol: Energy minimization → multi-step equilibration → 2-microsecond production run under NPT conditions
- Replicas: the run1 and run6 directories contain independent simulation replicas
Dopamine D1 Receptor (apo_d1)
- Protein: Human Dopamine D1 Receptor (D1R)
- System: Apo (ligand-free) receptor
- Starting Structure: TODO: Add PDB ID or reference for the starting model
- Simulation Details: the run1 directory contains the primary simulation data, generated with a protocol similar to that used for D2R
Adenosine A2A Receptor (apo_A2AR)
- Protein: Human Adenosine A2A Receptor (A2AR)
- System: Apo (ligand-free) receptor
- Starting Structure: TODO: Add PDB ID or reference for the starting model
- Simulation Details: the run1 directory contains the primary simulation data, generated with a protocol similar to that used for D2R
Beta-1 Adrenergic Receptor (apo_beta1)
- Protein: Human Beta-1 Adrenergic Receptor (B1AR)
- System: Apo (ligand-free) receptor
- Starting Structure: TODO: Add PDB ID or reference for the starting model
- Simulation Details: the run1 directory contains the primary simulation data, generated with a protocol similar to that used for D2R
File Descriptions
Key Files in Each Run Directory
input_files/
- system_begin.pdb: Complete system starting structure
- protein_initial.pdb: Protein-only starting structure
- *.top: Topology file
- *.ndx: Index file
- *.mdp: MD parameter files for all simulation stages
- toppar/: Custom force field parameters (if deviations from standard CHARMM36m were applied)
production_run/
- production_run.tpr: GROMACS portable run input file
- traj_protein_noPBC.xtc: Processed trajectory file (heavy atoms only, periodic boundary conditions removed)
- step7_noPBC_prot.xtc: Alternative name for the processed trajectory
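The processed trajectory can carry either of two names, so a quick check of each downloaded run directory can catch missing files before analysis. The sketch below uses only the Python standard library and encodes the file list above as glob patterns. check_run and the pattern table are hypothetical. The sketch also assumes every run directory contains each listed file, so relax the table if a particular run differs. The optional toppar/ folder is not checked.

```python
from pathlib import Path
from typing import List

# Expected contents of a run directory, transcribed from the file list above.
# Patterns are globs relative to the run directory (e.g. run1/).
REQUIRED = {
    "input_files": ["system_begin.pdb", "protein_initial.pdb", "*.top", "*.ndx", "*.mdp"],
    "production_run": ["production_run.tpr"],
}
# The processed protein trajectory may carry either name.
TRAJECTORY_NAMES = ["traj_protein_noPBC.xtc", "step7_noPBC_prot.xtc"]


def check_run(run_dir) -> List[str]:
    """Return a list of problems found in one run directory (empty if none)."""
    run = Path(run_dir)
    problems = []
    for sub, patterns in REQUIRED.items():
        folder = run / sub
        if not folder.is_dir():
            problems.append(f"missing directory {sub}/")
            continue
        for pattern in patterns:
            if not any(folder.glob(pattern)):
                problems.append(f"{sub}/: nothing matches {pattern}")
    prod = run / "production_run"
    if prod.is_dir() and not any((prod / name).is_file() for name in TRAJECTORY_NAMES):
        problems.append("production_run/: no processed trajectory found")
    return problems


if __name__ == "__main__":
    # Hypothetical usage: scan every run* directory below MD_simulation_data/.
    root = Path("MD_simulation_data")
    if root.is_dir():
        for run in sorted(p for p in root.glob("**/run*") if p.is_dir()):
            print(run, check_run(run) or "ok")
```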
Important Notes for Users
Full-System Trajectories
Full-system production trajectories (including membrane, solvent, and other components, commonly named step7_1.xtc) are excluded from this archive due to large file sizes (approximately 14-15 GB each). Researchers interested in these full trajectories may request them from the corresponding authors.
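If you obtain a full-system trajectory (step7_1.xtc) from the authors, one common way to keep the receptor whole across periodic boundaries and write out only the protein is GROMACS gmx trjconv with -pbc mol -center. The Python sketch below only wraps that call. It reflects an assumed, reasonable workflow and is not necessarily the exact command sequence the authors used. The group names are also assumptions. Protein and Protein-H (protein without hydrogens) are GROMACS default groups, and trjconv is assumed to accept group names at its prompts (group numbers also work). If you supply a custom index file with -n, use the group names defined in that file instead.

```python
import shutil
import subprocess
from typing import List, Optional


def trjconv_command(tpr: str, xtc_in: str, xtc_out: str, index: Optional[str] = None) -> List[str]:
    """Build a gmx trjconv call that keeps molecules whole (-pbc mol) and centers a group."""
    cmd = ["gmx", "trjconv", "-s", tpr, "-f", xtc_in, "-o", xtc_out, "-pbc", "mol", "-center"]
    if index is not None:
        # A custom index file supplies its own groups, so the group names must match it.
        cmd += ["-n", index]
    return cmd


def run_trjconv(cmd: List[str], center_group: str = "Protein", output_group: str = "Protein-H") -> None:
    """Run trjconv, answering its two interactive group prompts through stdin."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    # With -center, trjconv first asks for the centering group, then for the output group.
    subprocess.run(cmd, input=f"{center_group}\n{output_group}\n", text=True, check=True)
```

For example: run_trjconv(trjconv_command("production_run.tpr", "step7_1.xtc", "traj_protein_noPBC.xtc")). Keeping the command builder separate from the runner makes it easy to log or inspect the exact command before executing it.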
Processing Instructions
- To process the full trajectories yourself, use the provided system_begin.pdb as the reference structure with processing scripts such as extract_residues.py
- To convert the supplied protein-only trajectory (traj_protein_noPBC.xtc) into the JSON format required by the ML pipeline, use processing scripts such as extract_residues.py (available on GitHub) with a PDB file containing only the protein heavy atoms (heavy_chain.pdb)
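The ML input is described as per-frame aligned heavy-atom coordinates plus dihedral angles. This record does not describe the exact alignment procedure, dihedral set, or JSON schema used by extract_residues.py. The following NumPy sketch therefore only illustrates the two operations involved: Kabsch superposition of each frame onto a reference, and the dihedral angle defined by four atom positions. The function names and the output keys (frames, frame, coords, dihedrals) are hypothetical and will not match the real LD-FPG input format. Reading frames from the .xtc file is left to a trajectory library of your choice.

```python
import json

import numpy as np


def kabsch_align(mobile: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Optimally superpose mobile (N x 3) onto reference (N x 3); return the moved copy."""
    mob_centered = mobile - mobile.mean(axis=0)
    ref_centroid = reference.mean(axis=0)
    ref_centered = reference - ref_centroid
    # SVD of the 3 x 3 covariance matrix yields the least-squares rotation.
    u, _, vt = np.linalg.svd(mob_centered.T @ ref_centered)
    # Flip the last singular direction if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return mob_centered @ rotation.T + ref_centroid


def dihedral(p0, p1, p2, p3) -> float:
    """Dihedral angle about the p1-p2 bond, in degrees between -180 and 180."""
    b0 = p0 - p1
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)
    b2 = p3 - p2
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def frames_to_json(frames, reference, quads, path) -> None:
    """Align every frame, compute dihedrals for each atom-index quadruple, and write JSON.

    The output schema is hypothetical and does not reproduce the LD-FPG input format.
    """
    reference = np.asarray(reference, dtype=float)
    records = []
    for i, frame in enumerate(frames):
        aligned = kabsch_align(np.asarray(frame, dtype=float), reference)
        angles = [dihedral(*aligned[list(q)]) for q in quads]
        records.append({"frame": i, "coords": aligned.round(3).tolist(), "dihedrals": angles})
    with open(path, "w") as fh:
        json.dump({"frames": records}, fh)
```

Dihedral angles do not change under rigid-body motion, so computing them before or after alignment gives the same values. The alignment matters only for the stored coordinates.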
Simulation Replicates
The two replicas (run1 and run6) are included to illustrate simulation variability. Additional replicas can be obtained from the authors upon reasonable request.
Data Usage
The processed trajectory files retain only the protein heavy atoms and have been corrected for periodic boundary effects (the "noPBC" in their file names), making them suitable for direct analysis or further processing for machine learning applications.
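A heavy-atom PDB such as heavy_chain.pdb can be derived from protein_initial.pdb by dropping hydrogen records, assuming protein_initial.pdb includes hydrogens. The plain-Python sketch below reads the element column (columns 77-78) of each ATOM/HETATM line and falls back to the atom name when that column is blank. It assumes the trajectory stores the protein heavy atoms in the same order as protein_initial.pdb, so compare the atom count with the trajectory before use. is_hydrogen and strip_hydrogens are hypothetical names.

```python
def is_hydrogen(line: str) -> bool:
    """Return True if a PDB ATOM/HETATM record describes a hydrogen atom."""
    element = line[76:78].strip()
    if element:
        # The element symbol column is authoritative when present ("HG" there would be mercury).
        return element.upper() == "H"
    # Fallback: atom name (columns 13-16), ignoring leading digits as in "1HB".
    name = line[12:16].strip().lstrip("0123456789")
    return name.upper().startswith("H")


def strip_hydrogens(pdb_in: str, pdb_out: str) -> int:
    """Copy a PDB file without hydrogen records; return the number of atoms kept."""
    kept = 0
    with open(pdb_in) as src, open(pdb_out, "w") as dst:
        for line in src:
            if line.startswith(("ATOM", "HETATM")):
                if is_hydrogen(line):
                    continue
                kept += 1
            # Other records (CRYST1, TER, END, ...) are copied unchanged. Atom serial numbers
            # are not renumbered, and CONECT records, if any, may refer to removed atoms.
            dst.write(line)
    return kept
```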
Citation and Contact
For complete methodological details and further context, please refer to the main publication. For questions regarding full-system trajectories or additional replicas, contact the corresponding authors.
Files
A2AR.zip
Additional details
Dates
- Available: 2025-05-21
Software
- Repository URL: https://anonymous.4open.science/r/LD-FPG-040A/
- Programming language: Python