Published July 27, 2025 | Version v3
Dataset Open

Molecular Dynamics Simulation Data for LD_FPG: Generative Modeling of Full-Atom Dopamine D2 Receptor Conformations

Authors/Creators

Description

MD Simulation Datasets for GPCRs

This repository contains molecular dynamics (MD) simulation data for four G protein-coupled receptors (GPCRs):

  • Dopamine D2 Receptor (D2R)
  • Dopamine D1 Receptor (D1R)
  • Adenosine A2A Receptor (A2AR)
  • Beta-1 Adrenergic Receptor (B1AR)

All simulations were prepared and processed for use with the machine learning model described in the paper "Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings." Each directory contains the necessary files to reproduce or analyze the simulation trajectories.

Repository Structure

The archive is organized into two main directories:

1. MD_simulation_data

Contains GROMACS simulation files for each individual run. Within this directory, each replica (e.g., run1/ and run6/) has the following subdirectories:

  • input_files/: All starting structures (PDB format), topology files (.top), index files (.ndx), and MD parameter files (.mdp) covering every simulation stage
  • production_run/: Outputs from the production phase, including the GROMACS portable run input file (.tpr) and processed trajectory file (.xtc)
  • toppar/: Custom force field parameters (if applicable)

2. Processed_ML_Input_JSON

Contains the main JSON file used as machine learning input (e.g., final_combined.json or my_protein.json). This file aggregates aligned heavy-atom coordinates and dihedral angles for 12,241 frames sampled from a representative D2R trajectory; it served as the training data for the LD-FPG generative model.
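The internal layout of the JSON file is not documented in this record, so a cautious first step is to inspect its top-level structure before writing any parsing code. The following is a minimal sketch that makes no assumption about key names. The helper names summarize_json and _describe and the example path are illustrative and are not part of the LD-FPG code base.

```python
import json
from pathlib import Path


def _describe(value):
    """Describe a value as 'type[len]' for lists, dicts and strings, else 'type'."""
    if isinstance(value, (list, dict, str)):
        return f"{type(value).__name__}[{len(value)}]"
    return type(value).__name__


def summarize_json(path):
    """Return a shallow summary of a JSON file without assuming its schema."""
    with Path(path).open("r", encoding="utf-8") as handle:
        data = json.load(handle)

    if isinstance(data, dict):
        # One entry per top-level key, described by type and size.
        return {key: _describe(value) for key, value in data.items()}
    # Non-object roots (for example a list of frames) get a single entry.
    return {"<root>": _describe(data)}


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever the archive was extracted.
    json_path = Path("Processed_ML_Input_JSON") / "final_combined.json"
    if json_path.exists():
        for key, description in summarize_json(json_path).items():
            print(f"{key}: {description}")
```

Note that json.load reads the whole file into memory, which may be slow for a file holding coordinates for thousands of frames.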

System Details

Dopamine D2 Receptor (apo_d2_inv_start)

  • Protein: Human Dopamine D2 Receptor (D2R)
  • System: Apo (ligand-free) receptor in inactive state
  • Starting Structure: Based on PDB ID 6CM4, with third intracellular loop (ICL3) remodeled
  • Force Field: CHARMM36m
  • MD Software: GROMACS version 2024.2
  • Protocol: Energy minimization → multi-step equilibration → 2-microsecond production run under NPT conditions
  • Replicas: run1 and run6 directories contain independent simulation replicas

Dopamine D1 Receptor (apo_d1)

  • Protein: Human Dopamine D1 Receptor (D1R)
  • System: Apo (ligand-free) receptor
  • Starting Structure: TODO: Add PDB ID or reference for the starting model
  • Simulation Details: The run1 directory contains the primary simulation data, generated with a protocol similar to the D2R protocol above

Adenosine A2A Receptor (apo_A2AR)

  • Protein: Human Adenosine A2A Receptor (A2AR)
  • System: Apo (ligand-free) receptor
  • Starting Structure: TODO: Add PDB ID or reference for the starting model
  • Simulation Details: The run1 directory contains the primary simulation data, generated with a protocol similar to the D2R protocol above

Beta-1 Adrenergic Receptor (apo_beta1)

  • Protein: Human Beta-1 Adrenergic Receptor (B1AR)
  • System: Apo (ligand-free) receptor
  • Starting Structure: TODO: Add PDB ID or reference for the starting model
  • Simulation Details: The run1 directory contains the primary simulation data, generated with a protocol similar to the D2R protocol above

File Descriptions

Key Files in Each Run Directory

input_files/

  • system_begin.pdb - Complete system starting structure
  • protein_initial.pdb - Protein-only starting structure
  • *.top - Topology file
  • *.ndx - Index file
  • *.mdp - MD parameter files for all simulation stages
  • toppar/ - Custom force field parameters (if deviations from standard CHARMM36m were applied)

production_run/

  • production_run.tpr - GROMACS portable run input file
  • traj_protein_noPBC.xtc - Processed trajectory file (heavy atoms only, periodic boundary conditions removed)
  • step7_noPBC_prot.xtc - Alternative name for processed trajectory
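After extracting an archive, it can be useful to confirm that a run directory contains the files listed above. The sketch below checks for them using only the names and extensions given in this description. The function name missing_run_files is illustrative, and the exact nesting of system and run directories inside each archive is an assumption that should be checked against the extracted contents.

```python
from pathlib import Path

# File names and glob patterns taken from the file descriptions above.
INPUT_PATTERNS = ["system_begin.pdb", "protein_initial.pdb", "*.top", "*.ndx", "*.mdp"]
# The processed trajectory may appear under either of these two names.
TRAJECTORY_NAMES = ["traj_protein_noPBC.xtc", "step7_noPBC_prot.xtc"]


def missing_run_files(run_dir):
    """Return the expected entries that are absent from one run directory."""
    run_dir = Path(run_dir)
    input_dir = run_dir / "input_files"
    production_dir = run_dir / "production_run"
    missing = []

    for pattern in INPUT_PATTERNS:
        # A pattern counts as present if at least one file matches it.
        if not (input_dir.is_dir() and any(input_dir.glob(pattern))):
            missing.append(f"input_files/{pattern}")

    if not (production_dir / "production_run.tpr").is_file():
        missing.append("production_run/production_run.tpr")

    if not any((production_dir / name).is_file() for name in TRAJECTORY_NAMES):
        missing.append("production_run/<processed .xtc>")

    return missing


if __name__ == "__main__":
    # Hypothetical path; the real location depends on how the archive unpacks.
    run = Path("MD_simulation_data") / "apo_d2_inv_start" / "run1"
    if run.is_dir():
        print(missing_run_files(run) or "all expected files found")
```

The optional toppar/ directory is deliberately not checked, since it is only present when custom parameters were used.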

Important Notes for Users

Full-System Trajectories

Full-system production trajectories, which include the membrane, solvent, and other components and are commonly named step7_1.xtc, are excluded from this archive because of their size (approximately 14-15 GB each). Researchers interested in these full trajectories may request them from the corresponding authors.

Processing Instructions

  • To process a full-system trajectory yourself, pass the provided system_begin.pdb as the reference structure to a processing script such as extract_residues.py
  • To convert the supplied protein-only trajectory (traj_protein_noPBC.xtc) into the JSON format required by the ML pipeline, run a processing script such as extract_residues.py (available on GitHub) with a PDB file containing only protein heavy atoms (heavy_chain.pdb) as the reference
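If a heavy-atom-only reference such as heavy_chain.pdb is not at hand, one way to derive it is to filter the hydrogens out of the protein-only starting structure. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes protein_initial.pdb uses standard fixed-column PDB records, and the function names is_hydrogen and heavy_atom_lines are hypothetical. Whether extract_residues.py accepts the resulting file unchanged has not been verified here.

```python
from pathlib import Path


def is_hydrogen(pdb_line):
    """Guess whether an ATOM record describes a hydrogen atom."""
    # Element symbol occupies columns 77-78 of a PDB record when present.
    element = pdb_line[76:78].strip().upper()
    if element:
        return element == "H"
    # Fall back to the atom name (columns 13-16) when the element is blank,
    # skipping leading digits as in names such as "1HB".
    atom_name = pdb_line[12:16].strip().lstrip("0123456789")
    return atom_name.startswith("H")


def heavy_atom_lines(pdb_lines):
    """Keep ATOM records for non-hydrogen atoms and drop all other records."""
    return [
        line for line in pdb_lines
        if line.startswith("ATOM") and not is_hydrogen(line)
    ]


if __name__ == "__main__":
    # Hypothetical input and output paths, relative to one run directory.
    source = Path("input_files") / "protein_initial.pdb"
    if source.exists():
        kept = heavy_atom_lines(source.read_text().splitlines())
        Path("heavy_chain.pdb").write_text("\n".join(kept) + "\nEND\n")
```

Atom serial numbers are left as they were in the input, so they will have gaps where hydrogens were removed. The atom count of the result should match the number of atoms per frame in the processed trajectory; a mismatch suggests the two files were prepared with different selections.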

Simulation Replicates

The two replicas (run1 and run6) are included to illustrate simulation variability. Additional replicas can be obtained from the authors upon reasonable request.

Data Usage

The processed trajectory files retain only heavy atoms of the protein and have had periodic boundary conditions removed, making them suitable for direct analysis or further processing for machine learning applications.

Citation and Contact

For complete methodological details and further context, please refer to the main publication. For questions regarding full-system trajectories or additional replicas, contact the corresponding authors.

Files

A2AR.zip

Files (3.9 GB)

  • 747.3 MB (md5:5acabf227fa0f840e382b34071b671c4)
  • 414.9 MB (md5:9d231c09f12888cd32368ffbcc2ee609)
  • 810.6 MB (md5:e7268490e467cf72891eae386da30dc9)
  • 1.9 GB (md5:860aa8f92a28a8fb66f049bf0817b3fe)
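Because the archives are large, it is worth verifying each download against the MD5 checksums listed above before extracting it. The following is a minimal sketch using Python's standard hashlib module. The listing does not say which checksum belongs to which archive, so the sketch only tests membership in the published set; the function names and the example file name usage are illustrative.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing above. The mapping from checksum to
# archive name is not given there, so they are kept as an unordered set.
PUBLISHED_MD5 = {
    "5acabf227fa0f840e382b34071b671c4",
    "9d231c09f12888cd32368ffbcc2ee609",
    "e7268490e467cf72891eae386da30dc9",
    "860aa8f92a28a8fb66f049bf0817b3fe",
}


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so multi-gigabyte archives never sit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """Return True if the file's MD5 is one of the published checksums."""
    return md5_of_file(path) in PUBLISHED_MD5


if __name__ == "__main__":
    archive = Path("A2AR.zip")  # adjust to the downloaded file's location
    if archive.exists():
        print(archive.name, "OK" if matches_published_checksum(archive) else "MISMATCH")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.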

Additional details

Dates

Available
2025-05-21

Software

Repository URL
https://anonymous.4open.science/r/LD-FPG-040A/
Programming language
Python