Molecular Dynamics Simulation Data for LD_FPG: Generative Modeling of Full-Atom Dopamine D2 Receptor Conformations

Anonymous

doi:10.5281/zenodo.16582853

Published July 27, 2025 | Version v3

Dataset Open

MD Simulation Datasets for GPCRs

This repository contains Molecular Dynamics (MD) simulation data for four G-Protein Coupled Receptors (GPCRs):

Dopamine D2 Receptor (D2R)
Dopamine D1 Receptor (D1R)
Adenosine A2A Receptor (A2AR)
Beta-1 Adrenergic Receptor (B1AR)

All simulations were prepared and processed for use with the machine learning model described in the paper "Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings." Each directory contains the necessary files to reproduce or analyze the simulation trajectories.

Repository Structure

The archive is organized into two main directories:

1. MD_simulation_data

Contains GROMACS simulation files for each individual run. Within this directory, each replica (e.g., run1/ and run6/) has two subdirectories:

input_files/: All starting structures (PDB format), topology files (.top), index files (.ndx), and MD parameter files (.mdp) covering every simulation stage
production_run/: Outputs from the production phase, including the GROMACS portable run input file (.tpr) and the processed trajectory file (.xtc)
input_files/toppar/: Custom force field parameters (if applicable)
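To check that an extracted archive has the layout described above, a short script can walk the replica folders. This is a minimal sketch, assuming md_root is the folder that directly contains the run*/ directories (the exact path inside each .zip is not stated here) and that every replica should contain input_files/ and production_run/. The function name check_run_layout is hypothetical.

```python
from pathlib import Path

# Subdirectories this record describes for every replica (run1/, run6/, ...).
EXPECTED_SUBDIRS = ("input_files", "production_run")


def check_run_layout(md_root: Path) -> dict[str, list[str]]:
    """For each run* directory under md_root, list the expected subdirectories
    that are missing. An empty list means the replica looks complete."""
    report: dict[str, list[str]] = {}
    for run_dir in sorted(p for p in md_root.glob("run*") if p.is_dir()):
        missing = [name for name in EXPECTED_SUBDIRS if not (run_dir / name).is_dir()]
        report[run_dir.name] = missing
    return report
```

Calling check_run_layout(Path("path/to/replicas")) on an intact D2R extraction should return an empty list for both run1 and run6.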

2. Processed_ML_Input_JSON

Houses the principal JSON file used as machine learning input (e.g., final_combined.json or my_protein.json). This file aggregates aligned heavy-atom coordinates and dihedral angles for 12,241 frames sampled from a representative D2R trajectory and serves as the training data for the LD-FPG generative model.
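The JSON schema (key names, array shapes, units) is not documented in this record, so a sensible first step is to inspect the file's nesting before writing a loader. The sketch below assumes nothing about the schema; describe and describe_file are hypothetical helper names, and the whole file is loaded into memory, which may be demanding for 12,241 frames of heavy-atom coordinates.

```python
import json
from typing import Any


def describe(obj: Any, depth: int = 0, max_depth: int = 3) -> list[str]:
    """Summarize the nesting of a decoded JSON value: dict key counts,
    list lengths and leaf types. No particular schema is assumed."""
    pad = "  " * depth
    if isinstance(obj, dict):
        lines = [f"{pad}dict ({len(obj)} keys)"]
        if depth < max_depth:
            for key, value in list(obj.items())[:5]:  # show only the first few keys
                lines.append(f"{pad}  [{key!r}]")
                lines.extend(describe(value, depth + 2, max_depth))
        return lines
    if isinstance(obj, list):
        lines = [f"{pad}list of length {len(obj)}"]
        if obj and depth < max_depth:
            lines.extend(describe(obj[0], depth + 1, max_depth))  # inspect first element only
        return lines
    return [f"{pad}{type(obj).__name__}"]


def describe_file(path: str, max_depth: int = 3) -> str:
    """Load a JSON file such as final_combined.json and return its structure summary."""
    with open(path) as fh:
        return "\n".join(describe(json.load(fh), max_depth=max_depth))
```

For example, print(describe_file("final_combined.json")) shows the top-level keys and the lengths of any frame lists before committing to a specific parsing strategy.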

System Details

Dopamine D2 Receptor (apo_d2_inv_start)

Protein: Human Dopamine D2 Receptor (D2R)
System: Apo (ligand-free) receptor in inactive state
Starting Structure: Based on PDB ID 6CM4, with third intracellular loop (ICL3) remodeled
Force Field: CHARMM36m
MD Software: GROMACS version 2024.2
Protocol: Energy minimization → multi-step equilibration → 2-microsecond production run under NPT conditions
Replicas: run1 and run6 directories contain independent simulation replicas
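The staged protocol above (energy minimization, then multi-step equilibration, then production) can be chained with the standard gmx grompp and gmx mdrun tools, each stage starting from the coordinates written by the previous one. The sketch below only builds the command lines and does not run them. The stage names, topol.top and index.ndx are hypothetical placeholders for the .mdp, .top and .ndx files actually shipped in input_files/, and using system_begin.pdb as both the initial coordinates and the position-restraint reference (-r) is an assumption.

```python
import shlex

# Hypothetical stage names; substitute the .mdp files found in input_files/.
STAGES = ["minimization", "equilibration_1", "equilibration_2", "production"]


def protocol_commands(stages, top="topol.top", ndx="index.ndx", start="system_begin.pdb"):
    """Build grompp/mdrun command pairs for a chained GROMACS protocol."""
    commands = []
    coords = start
    for stage in stages:
        commands.append(["gmx", "grompp", "-f", f"{stage}.mdp", "-c", coords,
                         "-r", start, "-p", top, "-n", ndx, "-o", f"{stage}.tpr"])
        commands.append(["gmx", "mdrun", "-deffnm", stage])
        coords = f"{stage}.gro"  # mdrun -deffnm <stage> writes final coordinates to <stage>.gro
    return commands


if __name__ == "__main__":
    for cmd in protocol_commands(STAGES):
        print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) to execute it; keeping the commands as lists avoids shell-quoting problems with unusual file names.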

Dopamine D1 Receptor (apo_d1)

Protein: Human Dopamine D1 Receptor (D1R)
System: Apo (ligand-free) receptor
Starting Structure: TODO: Add PDB ID or reference for the starting model
Simulation Details: The run1 directory contains the primary simulation data, generated with a protocol similar to that used for D2R

Adenosine A2A Receptor (apo_A2AR)

Protein: Human Adenosine A2A Receptor (A2AR)
System: Apo (ligand-free) receptor
Starting Structure: TODO: Add PDB ID or reference for the starting model
Simulation Details: The run1 directory contains the primary simulation data, generated with a protocol similar to that used for D2R

Beta-1 Adrenergic Receptor (apo_beta1)

Protein: Human Beta-1 Adrenergic Receptor (B1AR)
System: Apo (ligand-free) receptor
Starting Structure: TODO: Add PDB ID or reference for the starting model
Simulation Details: The run1 directory contains the primary simulation data, generated with a protocol similar to that used for D2R

File Descriptions

Key Files in Each Run Directory

input_files/

system_begin.pdb - Complete system starting structure
protein_initial.pdb - Protein-only starting structure
*.top - Topology file
*.ndx - Index file
*.mdp - MD parameter files for all simulation stages
toppar/ - Custom force field parameters (if deviations from standard CHARMM36m were applied)

production_run/

production_run.tpr - GROMACS portable run input file
traj_protein_noPBC.xtc - Processed trajectory file (heavy atoms only, periodic boundary conditions removed)
step7_noPBC_prot.xtc - Alternative filename for the processed trajectory
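Because the processed trajectory can appear under either of the two filenames listed above, analysis scripts may need to accept both. A minimal sketch; the helper name find_processed_trajectory is hypothetical and the search order (traj_protein_noPBC.xtc first) is an arbitrary choice.

```python
from pathlib import Path

# Both filenames this record uses for the protein-only, PBC-corrected trajectory.
TRAJ_NAMES = ("traj_protein_noPBC.xtc", "step7_noPBC_prot.xtc")


def find_processed_trajectory(production_dir: Path) -> Path:
    """Return the processed trajectory in a production_run/ directory,
    trying each documented filename in turn."""
    for name in TRAJ_NAMES:
        candidate = production_dir / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no processed trajectory found in {production_dir}")
```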

Important Notes for Users

Full-System Trajectories

Full-system production trajectories (including membrane, solvent, and other components, commonly named step7_1.xtc) are excluded from this archive due to large file sizes (approximately 14-15 GB each). Researchers interested in these full trajectories may request them from the corresponding authors.

Processing Instructions

To process the full-system trajectories yourself, use the provided system_begin.pdb as the reference structure together with processing scripts such as extract_residues.py.
To convert the supplied protein-only trajectory (traj_protein_noPBC.xtc) into the JSON format required by the ML pipeline, use processing scripts such as extract_residues.py (available on GitHub) together with a PDB file containing only the protein heavy atoms (heavy_chain.pdb).
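The heavy_chain.pdb reference mentioned above must list exactly the protein heavy atoms, in the same order, as the frames of traj_protein_noPBC.xtc. One hedged way to derive such a file from a full-system PDB such as system_begin.pdb is to keep only ATOM records of protein residues that are not hydrogens. This is a sketch, not the authors' procedure: the residue-name set (including the CHARMM histidine names HSD/HSE/HSP) and the hydrogen test are assumptions, and the atom count of the output should be checked against the trajectory before use.

```python
# Assumed protein residue names: the 20 standard ones plus CHARMM histidine tautomers.
PROTEIN_RESNAMES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "HSD", "HSE",
    "HSP", "ILE", "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def is_hydrogen(line: str) -> bool:
    """Use the PDB element column (77-78) if present, else the atom name (13-16)."""
    element = line[76:78].strip()
    if element:
        return element.upper() == "H"
    name = line[12:16].strip().lstrip("0123456789")  # e.g. "1HB" -> "HB"
    return name.upper().startswith("H")


def protein_heavy_atoms(pdb_lines):
    """Yield ATOM records of protein residues (resName in columns 18-20) that are not hydrogens."""
    for line in pdb_lines:
        if (line.startswith("ATOM") and line[17:20].strip() in PROTEIN_RESNAMES
                and not is_hydrogen(line)):
            yield line


def write_heavy_chain(src: str, dst: str) -> int:
    """Copy protein heavy-atom records from src into dst and return how many were kept."""
    with open(src) as fin:
        kept = list(protein_heavy_atoms(fin))
    with open(dst, "w") as fout:
        fout.writelines(kept)  # lines read from a file still end in "\n"
        fout.write("END\n")
    return len(kept)
```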

Simulation Replicates

The two replicas (run1 and run6) are included to illustrate simulation variability. Additional replicas can be obtained from the authors upon reasonable request.

Data Usage

The processed trajectory files retain only heavy atoms of the protein and have had periodic boundary conditions removed, making them suitable for direct analysis or further processing for machine learning applications.
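Because periodic boundary conditions have been removed, residues are not split across box edges, so geometric quantities such as backbone or side-chain dihedrals can be computed directly from the stored coordinates. The function below is a pure-Python sketch of the standard signed dihedral angle (cis = 0°, trans = ±180°). Whether the dihedral angles stored in the ML JSON use this convention, degrees rather than radians, or a particular atom selection is not stated in this record.

```python
import math

Vec3 = tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in [-180, 180]
    (e.g. C(i-1), N, CA, C for the backbone phi angle)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Using atan2 on the projected vectors gives a well-defined sign and avoids the precision loss of taking acos of a dot product near 0° or 180°.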

Citation and Contact

For complete methodological details and further context, please refer to the main publication. For questions regarding full-system trajectories or additional replicas, contact the corresponding authors.

Files (3.9 GB)

Name	MD5	Size
A2AR.zip	5acabf227fa0f840e382b34071b671c4	747.3 MB
B1AR.zip	9d231c09f12888cd32368ffbcc2ee609	414.9 MB
D1.zip	e7268490e467cf72891eae386da30dc9	810.6 MB
D2R_MD_runs.zip	860aa8f92a28a8fb66f049bf0817b3fe	1.9 GB
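The MD5 checksums listed above can be used to confirm that a download is complete before unpacking it. A minimal sketch using Python's hashlib; the folder argument is wherever the archives were saved, and md5sum and verify_downloads are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# Checksums as listed in the Files table of this record.
EXPECTED_MD5 = {
    "A2AR.zip": "5acabf227fa0f840e382b34071b671c4",
    "B1AR.zip": "9d231c09f12888cd32368ffbcc2ee609",
    "D1.zip": "e7268490e467cf72891eae386da30dc9",
    "D2R_MD_runs.zip": "860aa8f92a28a8fb66f049bf0817b3fe",
}


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in 1 MiB chunks so multi-GB archives are never held in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(folder: Path) -> dict[str, bool]:
    """Map each listed archive present in folder to whether its MD5 matches."""
    return {name: md5sum(folder / name) == expected
            for name, expected in EXPECTED_MD5.items()
            if (folder / name).is_file()}
```

Archives that are absent from the folder are skipped rather than reported as failures, so the check can be run after downloading only the receptors of interest.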

Additional details

Available: 2025-05-21

Repository URL: https://anonymous.4open.science/r/LD-FPG-040A/
Programming language: Python