Published March 19, 2026 | Version v2

CHAMP: A Coupled Hierarchical Atom-Motif Predictor

Description

This repository provides the public PyTorch implementation of CHAMP (Coupled Hierarchical Atom-Motif Predictor) for molecular property prediction.

The code released here serves as the main maintained implementation accompanying the manuscript. It contains the core CHAMP model components, motif-construction modules, configuration utilities, and the public training entry point currently documented for the released pipeline.

Overview

CHAMP is a hierarchical graph neural network framework designed to combine:

  • fine-grained atomic structure,
  • coarse-grained motif semantics,
  • and motif-guided cross-scale fusion

within a unified molecular representation learning pipeline.

The framework is organized around three conceptual stages:

  1. Motif construction and structural encoding. CHAMP builds motif-level representations on top of atom-level molecular graphs and models internal motif topology to preserve structural information.

  2. Function-aware motif refinement. CHAMP refines motif embeddings through supervised contrastive constraints so that structurally similar motifs with different functional roles can be distinguished more effectively.

  3. Hierarchical atom-motif fusion. CHAMP uses motif-level semantics to guide atom-level aggregation and performs cross-scale fusion through gating and inter-head interaction mechanisms.
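
As a rough illustration of the gating idea in stage 3, the sketch below blends one atom embedding with the embedding of its parent motif through a per-dimension sigmoid gate. It is plain Python written for readability and is not the code in Model/; the function name gated_fusion, the gate parameterization, and the toy weights are all assumptions made for this example.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(atom: List[float], motif: List[float],
                 w_atom: List[float], w_motif: List[float],
                 bias: List[float]) -> List[float]:
    """Blend atom and motif vectors with a per-dimension gate (hypothetical form).

    gate_k  = sigmoid(w_atom_k * atom_k + w_motif_k * motif_k + bias_k)
    fused_k = gate_k * atom_k + (1 - gate_k) * motif_k
    """
    if not (len(atom) == len(motif) == len(w_atom) == len(w_motif) == len(bias)):
        raise ValueError("all inputs must share the same dimension")
    fused = []
    for a, m, wa, wm, b in zip(atom, motif, w_atom, w_motif, bias):
        gate = sigmoid(wa * a + wm * m + b)
        # gate near 1 keeps the atom-level value, gate near 0 keeps the motif value
        fused.append(gate * a + (1.0 - gate) * m)
    return fused


# With zero weights and bias the gate is 0.5, so the result is the plain average.
print(gated_fusion([1.0, 3.0], [3.0, 1.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))
```

In a trained model the gate weights would be learned, so each dimension can lean toward atom-level detail or motif-level context depending on the input.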

The current public release focuses on the core modules and the main training workflow implemented in this repository.

Repository Scope

The released codebase includes:

  • the core model components in Model/,
  • motif extraction and motif-graph construction in motif_extract/,
  • shared helper utilities in utils/,
  • argument configuration in Args.py,
  • motif-aware dataset preparation in motif_spilit.py,
  • the main public training script in main_classification.py,
  • the dependency specification in requirements.txt.

Local folders such as dataset/, best_model/, .idea/, and __pycache__/ may appear in the working directory. They are local resources or development artifacts, not part of the released source implementation.

Repository Structure

The current directory structure of the released code is:

Code/
├── Args.py
├── main_classification.py
├── motif_spilit.py
├── overview.png
├── README.md
├── requirements.txt
├── Model/
│   ├── HMSAF.py
│   ├── atom_motif_attention.py
│   ├── contrastive_learning.py
│   └── motif_embedding.py
└── motif_extract/
    ├── mol_motif.py
    └── motif_graph.py

For readers who only want to understand or reuse the main implementation, the primary source files are:

  • main_classification.py
  • Args.py
  • motif_spilit.py
  • Model/*.py
  • motif_extract/*.py

Installation

pip install -r requirements.txt

Main dependencies:

  • PyTorch (1.12.0+cu113)
  • PyTorch Geometric (2.6.1)
  • RDKit (2024.9.3)
  • scikit-learn (1.7.2)
  • UMAP-learn (0.5.7)
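
To confirm that an environment matches the versions listed above, a short check along the following lines may help. It is a sketch that relies only on the standard library; the distribution names (torch, torch-geometric, rdkit, scikit-learn, umap-learn) are the usual PyPI names and are assumed here rather than taken from requirements.txt.

```python
from importlib import metadata
from typing import Dict, Optional

# Versions copied from the dependency list above; the keys are assumed
# PyPI distribution names, not names read from requirements.txt.
PINNED = {
    "torch": "1.12.0+cu113",
    "torch-geometric": "2.6.1",
    "rdkit": "2024.9.3",
    "scikit-learn": "1.7.2",
    "umap-learn": "0.5.7",
}


def installed_versions(packages: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, have in installed_versions(PINNED).items():
        want = PINNED[name]
        status = "ok" if have == want else "differs"
        print(f"{name}: pinned {want}, installed {have} ({status})")
```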

Usage

Parameter Configuration

Training parameters are defined in Args.py and can be set on the command line:

  • --dataset: dataset name
  • --data_dir: dataset directory
  • --node_feature_dim: atom feature dimension
  • --edge_feature_dim: edge feature dimension
  • --hidden_dim: hidden representation dimension
  • --batch_size: batch size
  • --epochs: number of epochs
  • --lr: learning rate
  • --weight_decay: optimizer weight decay
  • --patience: scheduler patience
  • --factor: scheduler decay factor
  • --loss_fn: loss function option
  • --alpha: ring-level contrastive loss weight
  • --beta: non-ring contrastive loss weight
  • --Pair_MLP: whether to enable the pairwise motif encoder option
  • --is_contrastive: whether to enable contrastive learning
  • --use_Guide: whether to enable motif guidance
  • --use_gating: whether to enable contextual gating
  • --use_head_interaction: whether to enable inter-head interaction
  • --label_thresh_ratio: threshold ratio used in motif comparison
  • --save_dir: checkpoint directory
  • --log_dir: log directory
  • --device: execution device
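
For readers who want to see how options like these are typically declared, the sketch below defines a few of them with argparse. It is not a copy of Args.py: the default values are placeholders, and the str2bool helper is a hypothetical addition. The helper matters when flags are passed as "--use_gating True", because argparse's type=bool would treat any non-empty string, including "False", as True.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'True'/'False'-style strings into real booleans."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes", "y"}:
        return True
    if lowered in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="CHAMP-style options (illustrative)")
    # Defaults below are placeholders, not the values shipped in Args.py.
    parser.add_argument("--dataset", type=str, default="MUTAG")
    parser.add_argument("--hidden_dim", type=int, default=128)
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--alpha", type=float, default=0.1)
    parser.add_argument("--beta", type=float, default=0.1)
    parser.add_argument("--use_gating", type=str2bool, default=True)
    parser.add_argument("--use_head_interaction", type=str2bool, default=True)
    return parser


# Parse an explicit argument list so the example runs without a shell.
args = build_parser().parse_args(
    ["--dataset", "MUTAG", "--use_gating", "False", "--use_head_interaction", "True"]
)
print(args.dataset, args.use_gating, args.use_head_interaction)
```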

Running Experiments

# Example for a classification task
python main_classification.py --dataset MUTAG --use_head_interaction True --use_gating True

Supported Datasets

The framework supports standard molecular benchmark datasets, most of them from MoleculeNet (MUTAG comes from the TUDataset collection), including:

  • Regression Tasks: ESOL, FreeSolv, Lipophilicity.
  • Classification Tasks: MUTAG, HIV, BACE, Tox21.

Datasets are expected in a standard graph format, containing node features, edge connectivity, and molecular labels.
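
To make the three ingredients concrete, the sketch below models one sample as a plain Python dataclass with a basic consistency check. The class name MoleculeGraph, its field names, and the toy feature values are hypothetical; the released code may store samples differently, for example as PyTorch Geometric data objects.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MoleculeGraph:
    """Plain-Python stand-in for one molecular graph sample (hypothetical)."""
    node_features: List[List[float]]   # one feature vector per atom
    edges: List[Tuple[int, int]]       # directed (source, target) atom indices
    label: float                       # class index or regression target

    def validate(self) -> None:
        num_atoms = len(self.node_features)
        if num_atoms == 0:
            raise ValueError("a molecule needs at least one atom")
        dim = len(self.node_features[0])
        if any(len(row) != dim for row in self.node_features):
            raise ValueError("all atoms must share one feature dimension")
        for src, dst in self.edges:
            if not (0 <= src < num_atoms and 0 <= dst < num_atoms):
                raise ValueError(f"edge ({src}, {dst}) references a missing atom")

    def edge_index(self) -> List[List[int]]:
        """Return edges in the 2 x num_edges layout used by PyTorch Geometric."""
        return [[s for s, _ in self.edges], [d for _, d in self.edges]]


# Toy three-atom chain; each bond is stored in both directions.
sample = MoleculeGraph(
    node_features=[[6.0, 4.0], [6.0, 4.0], [8.0, 2.0]],
    edges=[(0, 1), (1, 0), (1, 2), (2, 1)],
    label=1.0,
)
sample.validate()
print(sample.edge_index())
```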
