There is a newer version of the record available.

Published October 2, 2025 | Version v1

CHAMP: A Coupled Hierarchical Atom-Motif Predictor

Description

This repository provides the official PyTorch implementation of CHAMP (Coupled Hierarchical Atom-Motif Predictor), a novel hierarchical graph neural network (GNN) framework for molecular property prediction, designed to achieve state-of-the-art performance.

Introduction

CHAMP is engineered to address two central challenges in GNN-based molecular science: achieving Structural Completeness in motif representations and ensuring Functional Discriminability through context-aware learning. It systematically overcomes the limitations of conventional models by introducing a dynamic "guidance-fusion-regulation" process that enables true Synergistic Multi-scale Integration.

By operating on a dual-view representation of molecules (atom-level and motif-level graphs), CHAMP learns to generate molecular representations that are not only structurally faithful but also highly sensitive to functional context, leading to superior predictive accuracy and enhanced chemical interpretability.

Overall Architecture

The workflow of CHAMP unfolds in a three-stage hierarchical process designed to synergistically integrate fine-grained atomic details with coarse-grained functional semantics.

  1. Dual-view Graph Encoding for Structural Completeness: An input molecule is decomposed into two parallel graphical representations: a standard atom-level graph and a coarse-grained motif-level graph. The motif graph is processed by our innovative (b) Pairwise-Aggregating Bond-Aware Motif Encoder (PABME), which generates structurally complete motif representations by explicitly modeling their internal topology.

  2. Function-aware Refinement for Discriminability: The initial motif embeddings, while structurally sound, are refined to be function-aware. Our (c) Type-Domain-Label Constrained Contrastive Learning (TDL-CCL) module sculpts the embedding space using property-label supervision. It trains the model to distinguish between motifs that are structurally homologous yet functionally divergent, imbuing the representations with high discriminative power.

  3. Hierarchical Fusion and Molecular Prediction: Finally, the (d) Hierarchical Motif-Guided Synergistic Attention Framework (HMSAF) performs dynamic cross-scale fusion. It leverages high-level motif semantics to provide top-down guidance to atomic attention. Through a symbiotic process involving contextual gating and inter-head synergy, it proactively resolves informational conflicts, ensuring features are mutually refined. The resulting unified molecular fingerprint is fed into an MLP for end-to-end property prediction.

Core Innovative Components

1. PABME: Pairwise-Aggregating Bond-Aware Motif Encoder

  • Problem: Conventional models often treat motifs as a "bag-of-atoms," neglecting the critical semantics encoded in internal topology and chemical bonds.
  • Innovation: PABME explicitly models atom-bond-atom interactions. It employs a dual-attention aggregation strategy to capture both the importance of individual atoms and the relationships between atom pairs.
  • Outcome: Yields motif embeddings that faithfully preserve internal structural fidelity, achieving Structural Completeness.
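
The dual-attention idea described above can be illustrated with a small, framework-free sketch. It is not the repository's PABME code: the function name encode_motif, the additive atom-bond-atom message, the fixed query vectors q_atom and q_pair (which would be learned parameters in a real model), and the assumption that bond features share the atom feature dimension are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Bond = Tuple[int, int, Vector]  # (atom index i, atom index j, bond feature vector)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights: Sequence[float], vectors: Sequence[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def encode_motif(atom_feats: List[Vector], bonds: List[Bond],
                 q_atom: Vector, q_pair: Vector) -> Vector:
    """Hypothetical motif encoder: atom attention plus atom-bond-atom pair attention."""
    # View 1: attention over individual atoms.
    alpha = softmax([dot(q_atom, h) for h in atom_feats])
    atom_summary = weighted_sum(alpha, atom_feats)

    # View 2: attention over bonded pairs; each pair message mixes both
    # endpoint atoms with the bond between them (assumed additive form).
    pair_vecs = [
        [hi + e + hj for hi, e, hj in zip(atom_feats[i], bond, atom_feats[j])]
        for i, j, bond in bonds
    ]
    if pair_vecs:
        beta = softmax([dot(q_pair, p) for p in pair_vecs])
        pair_summary = weighted_sum(beta, pair_vecs)
    else:
        # Single-atom motif: no internal bonds to attend over.
        pair_summary = [0.0] * len(atom_feats[0])

    # Concatenate both views into one motif embedding.
    return atom_summary + pair_summary


# Toy three-atom motif with two bonds; zero queries give uniform attention.
atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bonds = [(0, 1, [0.5, 0.5]), (1, 2, [0.0, 1.0])]
embedding = encode_motif(atoms, bonds, q_atom=[0.0, 0.0], q_pair=[0.0, 0.0])
```

Unlike a bag-of-atoms pooling, the pair view changes when the bonds change even if the atom set stays the same, which is the property the PABME description emphasizes.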

2. TDL-CCL: Type-Domain-Label Constrained Contrastive Learning

  • Problem: The label-agnostic nature of conventional contrastive learning fails to differentiate between structurally similar motifs with distinct chemical functions.
  • Innovation: TDL-CCL introduces a novel (Type, Domain, Label) triplet constraint for sampling. It compels the model to discern the subtle structural variations that drive functional outcomes by contrasting structurally homologous but functionally divergent motifs.
  • Outcome: Sculpts an embedding space governed by chemical function rather than superficial topology, achieving Functional Discriminability.
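
One possible reading of the (Type, Domain, Label) constraint is sketched below. The interpretation is an assumption, not the repository's sampling code: here "type" is taken to be a motif identity, "domain" a ring or non-ring category (suggested only by the separate ring-level and non-ring loss weights in Args.py), and "label" the property label associated with the motif. The Motif class and sample_pairs function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Motif:
    motif_id: int
    motif_type: str  # assumed: identifier of the motif's structure
    domain: str      # assumed: "ring" or "non_ring"
    label: int       # assumed: property label tied to the motif


def sample_pairs(anchor: Motif, pool: List[Motif]) -> Tuple[List[Motif], List[Motif]]:
    """Split structurally matching motifs into positives and hard negatives.

    Only motifs sharing the anchor's type and domain are considered, so the
    contrast is always between structurally homologous motifs. Those with the
    same label are positives; those with a different label are hard negatives.
    """
    positives, hard_negatives = [], []
    for motif in pool:
        if motif.motif_id == anchor.motif_id:
            continue  # never pair the anchor with itself
        if (motif.motif_type, motif.domain) != (anchor.motif_type, anchor.domain):
            continue  # structurally different motifs are not used here
        if motif.label == anchor.label:
            positives.append(motif)
        else:
            hard_negatives.append(motif)
    return positives, hard_negatives


pool = [
    Motif(0, "benzene", "ring", 1),
    Motif(1, "benzene", "ring", 1),
    Motif(2, "benzene", "ring", 0),
    Motif(3, "carboxyl", "non_ring", 1),
]
positives, hard_negatives = sample_pairs(pool[0], pool)
```

In this toy pool the anchor's only positive is motif 1 and its only hard negative is motif 2; the carboxyl motif is ignored because it differs in type and domain.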

3. HMSAF: Hierarchical Motif-Guided Synergistic Attention Framework

  • Problem: Naive fusion mechanisms (e.g., concatenation or pooling) lead to informational conflicts and redundancy between atomic and motif-level features.
  • Innovation: HMSAF implements a dynamic "guidance-fusion-regulation" process:
    • Guidance: Top-down semantic signals from motifs steer atomic-level attention.
    • Fusion & Regulation: A symbiotic fusion process featuring Contextual Gating and Inter-Head Synergistic Attention proactively modulates signals and forces cross-scale alignment.
  • Outcome: Achieves true Cross-scale Synergy, where features from different granularities are not merely combined but are mutually refined into a functionally coherent representation.
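
The guidance and gating steps can be pictured with the following minimal sketch. It is an illustration under stated assumptions rather than the HMSAF implementation: motif_guided_pool uses a plain dot product between atom features and the motif vector as the guidance signal, gated_fusion uses an element-wise sigmoid gate with hypothetical weights w_atom, w_motif and bias, and inter-head synergy is omitted entirely.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def motif_guided_pool(atom_feats: List[List[float]], motif_vec: List[float]) -> List[float]:
    """Guidance: atoms that align with the motif vector receive more attention."""
    scores = [sum(a * m for a, m in zip(h, motif_vec)) for h in atom_feats]
    weights = softmax(scores)
    dim = len(motif_vec)
    return [sum(w * h[k] for w, h in zip(weights, atom_feats)) for k in range(dim)]


def gated_fusion(atom_vec: List[float], motif_vec: List[float],
                 w_atom: List[float], w_motif: List[float],
                 bias: List[float]) -> List[float]:
    """Regulation: a per-dimension gate decides how much of each scale to keep."""
    fused = []
    for a, m, wa, wm, b in zip(atom_vec, motif_vec, w_atom, w_motif, bias):
        gate = 1.0 / (1.0 + math.exp(-(wa * a + wm * m + b)))
        fused.append(gate * a + (1.0 - gate) * m)
    return fused


atom_feats = [[1.0, 0.0], [0.0, 1.0]]
motif_vec = [1.0, 0.0]
pooled = motif_guided_pool(atom_feats, motif_vec)
# Zero gate parameters give a gate of 0.5, i.e. an even blend of both scales.
fused = gated_fusion(pooled, motif_vec, [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
```

Compared with concatenation, the gate lets the model suppress one scale where the two disagree, which is one way to read the "regulation" step.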

Project Structure

CHAMP/
├── Model/              # Model definitions (PABME, TDL-CCL, HMSAF, etc.)
├── motif_extract/      # Motif extraction and graph construction modules
├── Experiment/         # Experimental results and visualization outputs
├── dataset/            # Dataset files
├── best_model/         # Directory for saving best-performing models
├── *.py                # Main execution scripts
├── requirements.txt    # List of dependencies
└── Args.py             # Configuration for command-line arguments

Key Advantages

  1. State-of-the-Art Performance: Achieves leading results on 10 out of 11 challenging MoleculeNet benchmarks, demonstrating superior accuracy and generalizability.
  2. Chemical Interpretability: The hierarchical, motif-centric design ensures that the model's decision-making process aligns closely with established chemical principles.
  3. Synergistic Multi-scale Modeling: Moves beyond simple feature fusion to achieve true synergistic enhancement between atom- and motif-level information.
  4. Robustness and Flexibility: The modular architecture supports both regression and classification tasks and can be readily extended to new applications in drug discovery and materials science.

Installation

pip install -r requirements.txt

Main dependencies:

  • PyTorch (1.12.0+cu113)
  • PyTorch Geometric (2.6.1)
  • RDKit (2024.9.3)
  • scikit-learn (1.7.2)
  • UMAP-learn (0.5.7)

Usage

Parameter Configuration

The argument configuration is defined in Args.py.

Important arguments in the current release include:

  • --dataset: dataset name
  • --data_dir: dataset directory
  • --node_feature_dim: atom feature dimension
  • --edge_feature_dim: edge feature dimension
  • --hidden_dim: hidden representation dimension
  • --batch_size: batch size
  • --epochs: number of epochs
  • --lr: learning rate
  • --weight_decay: optimizer weight decay
  • --patience: scheduler patience
  • --factor: scheduler decay factor
  • --loss_fn: loss function option
  • --alpha: ring-level contrastive loss weight
  • --beta: non-ring contrastive loss weight
  • --Pair_MLP: whether to enable the pairwise motif encoder option
  • --is_contrastive: whether to enable contrastive learning
  • --use_Guide: whether to enable motif guidance
  • --use_gating: whether to enable contextual gating
  • --use_head_interaction: whether to enable inter-head interaction
  • --label_thresh_ratio: threshold ratio used in motif comparison
  • --save_dir: checkpoint directory
  • --log_dir: log directory
  • --device: execution device
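
For orientation, a few of these flags could be declared with argparse roughly as follows. This is a sketch, not the contents of Args.py: the defaults, the help strings, and the str2bool helper are assumptions. The helper is included because boolean flags are passed as text on the command line (for example --use_gating True), and argparse's type=bool would treat the string "False" as true.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse textual booleans such as 'True' or 'false' (hypothetical helper)."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # All default values below are placeholders, not the project's defaults.
    parser = argparse.ArgumentParser(description="CHAMP-style argument sketch")
    parser.add_argument("--dataset", type=str, default="BBBP", help="dataset name")
    parser.add_argument("--hidden_dim", type=int, default=128,
                        help="hidden representation dimension")
    parser.add_argument("--lr", type=float, default=1e-3, help="learning rate")
    parser.add_argument("--use_gating", type=str2bool, default=True,
                        help="whether to enable contextual gating")
    parser.add_argument("--use_head_interaction", type=str2bool, default=True,
                        help="whether to enable inter-head interaction")
    return parser


args = build_parser().parse_args(
    ["--dataset", "MUTAG", "--use_head_interaction", "True", "--use_gating", "False"]
)
```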

Running Experiments

Example for a classification task:
python main_classification.py --dataset MUTAG --use_head_interaction True --use_gating True

Supported Datasets

The framework supports a wide range of datasets from MoleculeNet, including:

  • Regression Tasks: ESOL, FreeSolv, Lipophilicity
  • Classification Tasks: HIV, BACE, Tox21, ClinTox, BBBP, SIDER, ToxCast

Datasets are expected in a standard graph format, containing node features, edge connectivity, and molecular labels.
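
As a rough picture of such a record, the sketch below defines a minimal container with the three parts named above and a basic consistency check. The MoleculeGraph class, its field names, and the toy feature values are hypothetical; the repository's actual data objects (for example PyTorch Geometric data structures) may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MoleculeGraph:
    node_features: List[List[float]]   # one feature vector per atom
    edge_index: List[Tuple[int, int]]  # directed (source, target) atom pairs
    label: float                       # molecular property label

    def validate(self) -> None:
        """Raise ValueError if the record is internally inconsistent."""
        num_nodes = len(self.node_features)
        if num_nodes == 0:
            raise ValueError("a molecule needs at least one atom")
        if len({len(f) for f in self.node_features}) != 1:
            raise ValueError("all atoms must share one feature dimension")
        for source, target in self.edge_index:
            if not (0 <= source < num_nodes and 0 <= target < num_nodes):
                raise ValueError(f"edge ({source}, {target}) references a missing atom")


# Toy record for a three-heavy-atom chain (C-C-O); each bond is stored in
# both directions, and the two-dimensional features are made up.
ethanol = MoleculeGraph(
    node_features=[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    edge_index=[(0, 1), (1, 0), (1, 2), (2, 1)],
    label=1.0,
)
ethanol.validate()
```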
