Published October 31, 2025 | Version v2
Software | Open Access

Brain Tumor Classification: Multi-Architecture Deep Learning with Knowledge Distillation

  • North South University

Description

Overview

This repository presents a comprehensive and clinically oriented deep learning framework for brain tumor classification from MRI images, integrating state-of-the-art Vision Transformers (ViTs), lightweight CNNs, hybrid CNN–Transformer models, and ensemble strategies. A key contribution is a knowledge distillation pipeline, where a large-capacity ViT teacher model transfers discriminative knowledge to multiple lightweight student models, including MobileNetV2 and EfficientNet-Lite0, enabling efficient and reliable deployment.

Beyond predictive accuracy, the framework emphasizes model trustworthiness and clinical reliability through prediction calibration and explainability-based faithfulness evaluation, ensuring that high accuracy corresponds to well-calibrated and interpretable predictions.

Dataset

Combined Public MRI Dataset

  • Sources: BRISC, GTS AI, Mendeley, Figshare, Zenodo

  • Classes (4):

    • No Tumor

    • Glioma

    • Meningioma

    • Pituitary Tumor

Data Split

  • Training: 20,000 images (balanced, 5,000 per class)

  • Validation: 2,142 images

  • Test: 2,311 images

Preprocessing

  • Image size: 224×224

  • Augmentations: elastic deformation, random rotation and flipping, color jittering

  • Class balancing applied to reduce bias
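The augmentations above can be illustrated with minimal NumPy stand-ins; this is a sketch, not the repository's actual pipeline (which likely uses torchvision or a similar library, and additionally applies elastic deformation and the 224×224 resize, both omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Toy augmentation sketch: random horizontal flip, random
    90-degree rotation, and brightness jitter on a [0, 1] image."""
    if rng.random() < 0.5:
        img = np.flip(img, axis=1)             # horizontal flip
    img = np.rot90(img, k=rng.integers(0, 4))  # random 90-degree rotation
    img = np.clip(img * rng.uniform(0.8, 1.2), 0.0, 1.0)  # brightness jitter
    return img
```

Class balancing is then a matter of sampling (or oversampling with such augmentations) until each of the four classes contributes 5,000 training images.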

Model Architectures

1. Vision Transformer (ViT) – Teacher Model (NeuroTriad-ViT)

  • Parameters: 235M

  • Architecture: 12 transformer encoder layers with multi-head self-attention

  • Patch size: 16×16

  • Embedding dimension: 768

  • Role: High-capacity teacher for knowledge distillation

  • Test Accuracy: ~94.6%

  • Macro F1-score: ~0.93

2. MobileNetV2 – Student Model (Knowledge Distillation)

  • Parameters: ~2.4M

  • Training: Logit-based knowledge distillation from ViT teacher

  • Outcome: Retains teacher-level performance with ~98% parameter reduction

  • Strengths:

    • Strong alignment with teacher logits

    • Superior calibration and reliability

    • Best overall trade-off between accuracy and efficiency

3. EfficientNet-Lite0 – Student Model (Comparative Distillation)

  • Parameters: ~7.9M

  • Role: Lightweight student benchmark for comparison

  • Characteristics:

    • Higher capacity than MobileNetV2

    • Lower distillation alignment and calibration quality

    • Used to justify student model selection

4. Hybrid CNN–ViT Model

  • Backbone: EfficientNet-B0 + Vision Transformer

  • Feature fusion: CNN spatial features + transformer global context

  • Test Accuracy: 94.72%

  • Parameters: ~90.5M

5. Ensemble Models

  • EfficientNet-B2 + DenseNet121 (92.43%)

  • DenseNet121 + VGG16 (92.8%)

  • EfficientNet-B2 + VGG16 (92.0%)
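Pairwise ensembles of this kind are typically built by averaging the member models' softmax probabilities; the sketch below assumes simple unweighted averaging (the notebooks may use weighted averaging or voting instead):

```python
import numpy as np

def softmax(z):
    """Row-wise softmax over class logits."""
    z = np.asarray(z, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(logits_a, logits_b):
    """Average two models' class probabilities, then take the argmax."""
    probs = (softmax(logits_a) + softmax(logits_b)) / 2.0
    return probs.argmax(axis=-1), probs
```

Averaging probabilities (rather than logits) keeps each member's confidence on a comparable scale, which matters when the backbones are as different as DenseNet121 and VGG16.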

Training Details

  • Optimizer: AdamW

  • Loss: Cross-Entropy (with distillation loss for student models)

  • Regularization: Dropout, weight decay

  • Learning Strategy: Warm-up followed by adaptive learning rate reduction

  • Early stopping applied to prevent overfitting
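The student objective described above (cross-entropy plus a distillation term) can be sketched as follows. The temperature `T=4.0` and mixing weight `alpha=0.7` are illustrative assumptions, not values reported for this repository:

```python
import numpy as np

def softmax(z, T=1.0):
    """Row-wise softmax with optional temperature T."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Logit-based KD: KL(teacher || student) at temperature T (scaled by
    T^2, as in Hinton et al.) mixed with hard-label cross-entropy."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    p = softmax(student_logits)
    ce = -np.log(p[np.arange(len(labels)), labels] + 1e-12)
    return np.mean(alpha * (T ** 2) * kl + (1 - alpha) * ce)
```

The T² factor keeps the soft-target gradient magnitude comparable to the hard-label term as the temperature grows.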

Reliability, Calibration, and Faithfulness Evaluation

To ensure clinical trustworthiness, the study goes beyond accuracy-based evaluation:

Prediction Calibration

  • Expected Calibration Error (ECE) used to quantify confidence–accuracy alignment

  • Reliability diagrams visualize calibration behavior
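ECE bins predictions by confidence and averages the gap between each bin's mean confidence and its accuracy, weighted by bin size. A minimal NumPy sketch with equal-width bins (the bin count of 10 is a common convention, assumed here):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted mean |accuracy - confidence| over confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece
```

A reliability diagram plots the same per-bin accuracies against confidence; a perfectly calibrated model lies on the diagonal and has ECE ≈ 0.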

Key Findings

  • Teacher model initially exhibits overconfidence

  • Temperature scaling significantly improves teacher calibration

  • MobileNetV2 achieves very low ECE, indicating near-perfect calibration

  • EfficientNet-Lite0 shows higher ECE, reflecting weaker confidence alignment
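Temperature scaling, used above to fix the teacher's overconfidence, divides all logits by a single scalar T fitted on validation data. A grid-search sketch (the grid range is an assumption; T is often fitted by gradient descent on the NLL instead):

```python
import numpy as np

def nll(logits, labels, T):
    """Negative log-likelihood of the labels under temperature-scaled logits."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    """Pick the T that minimizes validation NLL; T > 1 softens
    overconfident predictions without changing the argmax."""
    return min(grid, key=lambda T: nll(logits, labels, T))
```

Because dividing by T preserves the ranking of the logits, accuracy is unchanged; only the confidence scale moves.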

Explainability and Faithfulness

  • Grad-CAM and LIME used for visual interpretability

  • Mask-based faithfulness metrics:

    • Insertion

    • Deletion

These metrics verify whether highlighted regions truly drive model predictions. MobileNetV2 demonstrates stronger faithfulness scores, confirming that its decisions rely on clinically relevant tumor regions rather than spurious features.
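The deletion metric can be sketched as follows: pixels are zeroed in order of decreasing saliency and the model is re-scored after each step; a faithful saliency map produces a rapid confidence drop (low area under the curve). Here `predict_fn` is a hypothetical stand-in for the model's class-probability output, not a function from the repository:

```python
import numpy as np

def deletion_curve(image, saliency, predict_fn, steps=10):
    """Progressively delete the most salient pixels and record the
    model's score; returns the sequence of scores."""
    order = np.argsort(-saliency.ravel())  # most salient pixels first
    img = image.copy().ravel()
    scores = [predict_fn(img.reshape(image.shape))]
    chunk = max(1, len(order) // steps)
    for i in range(0, len(order), chunk):
        img[order[i:i + chunk]] = 0.0      # delete a batch of salient pixels
        scores.append(predict_fn(img.reshape(image.shape)))
    return np.array(scores)
```

The insertion metric is the mirror image: start from a blank image, reveal the most salient pixels first, and reward a rapid confidence rise.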

Key Contributions

  • Multi-student knowledge distillation framework (MobileNetV2 vs. EfficientNet-Lite0)

  • Joint evaluation of accuracy, calibration, and faithfulness

  • Demonstration that MobileNetV2 achieves superior distillation alignment

  • Clinically deployable models with minimal computational overhead

  • End-to-end reproducible pipeline across PyTorch and TensorFlow

Results Summary

| Model | Test Accuracy | Parameters | Notes |
|---|---|---|---|
| ViT Teacher | 94.6% | 235M | High-capacity teacher |
| Hybrid CNN–ViT | 94.72% | 90.5M | Best hybrid model |
| DenseNet121 + VGG16 | 92.8% | — | Strong ensemble |
| EfficientNet-B2 + DenseNet121 | 92.43% | — | Ensemble baseline |
| EfficientNet-Lite0 (KD) | ~93–94% | 7.9M | Less aligned student |
| MobileNetV2 (KD) | ~94% | 2.4M | Best efficiency, calibration, and faithfulness |

License

Apache 2.0

Contact

For questions or collaboration, please open an issue or contact:
bin.abdullah@northsouth.edu

Files

densene121_vgg16(Ensemble).ipynb

Additional details

Dates

Available: 2025-10-26