Brain Tumor Classification: Multi-Architecture Deep Learning with Knowledge Distillation
Description
Overview
This repository presents a comprehensive and clinically oriented deep learning framework for brain tumor classification from MRI images, integrating state-of-the-art Vision Transformers (ViTs), lightweight CNNs, hybrid CNN–Transformer models, and ensemble strategies. A key contribution is a knowledge distillation pipeline, where a large-capacity ViT teacher model transfers discriminative knowledge to multiple lightweight student models, including MobileNetV2 and EfficientNet-Lite0, enabling efficient and reliable deployment.
Beyond predictive accuracy, the framework emphasizes model trustworthiness and clinical reliability through prediction calibration and explainability-based faithfulness evaluation, ensuring that high accuracy corresponds to well-calibrated and interpretable predictions.
Dataset
Combined Public MRI Dataset
- Sources: BRISC, GTS AI, Mendeley, Figshare, Zenodo
- Classes (4):
  - No Tumor
  - Glioma
  - Meningioma
  - Pituitary Tumor
Data Split
- Training: 20,000 images (balanced, 5,000 per class)
- Validation: 2,142 images
- Test: 2,311 images
Preprocessing
- Image size: 224×224
- Augmentations: elastic deformation, random rotation and flipping, color jittering
- Class balancing applied to reduce bias
Model Architectures
1. Vision Transformer (ViT) – Teacher Model (NeuroTriad-ViT)
- Parameters: 235M
- Architecture: 12 transformer encoder layers with multi-head self-attention
- Patch size: 16×16
- Embedding dimension: 768
- Role: High-capacity teacher for knowledge distillation
- Test Accuracy: ~94.6%
- Macro F1-score: ~0.93
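For the stated configuration, the patch-embedding arithmetic works out as follows (a quick sanity check, not repository code):

```python
# Token arithmetic for a 224x224 input with 16x16 patches and 768-dim embeddings.
image_size, patch_size, embed_dim = 224, 16, 768

num_patches = (image_size // patch_size) ** 2   # 14 * 14 = 196 patches
seq_len = num_patches + 1                       # +1 for the [CLS] token
patch_dim = 3 * patch_size * patch_size         # flattened RGB patch -> 768

print(num_patches, seq_len, patch_dim)  # 196 197 768
```

Each encoder layer thus attends over a 197-token sequence, and a flattened RGB patch maps exactly onto the 768-dimensional embedding.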
2. MobileNetV2 – Student Model (Knowledge Distillation)
- Parameters: ~2.4M
- Training: Logit-based knowledge distillation from the ViT teacher
- Outcome: Retains teacher-level performance with ~98% parameter reduction
- Strengths:
  - Strong alignment with teacher logits
  - Superior calibration and reliability
  - Best overall trade-off between accuracy and efficiency
3. EfficientNet-Lite0 – Student Model (Comparative Distillation)
- Parameters: ~7.9M
- Role: Lightweight student benchmark for comparison
- Characteristics:
  - Higher capacity than MobileNetV2
  - Lower distillation alignment and calibration quality
  - Used to justify student model selection
4. Hybrid CNN–ViT Model
- Backbone: EfficientNet-B0 + Vision Transformer
- Feature fusion: CNN spatial features + transformer global context
- Test Accuracy: 94.72%
- Parameters: ~90.5M
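A minimal sketch of this kind of late fusion, assuming the usual pooled-feature widths of EfficientNet-B0 (1280) and a ViT-Base encoder (768); the head layout here is hypothetical, not the repository's:

```python
import torch
import torch.nn as nn

# Sketch of late feature fusion; 1280/768 are the typical pooled-feature sizes
# for EfficientNet-B0 and ViT-Base, assumed here rather than taken from the repo.
class FusionHead(nn.Module):
    def __init__(self, cnn_dim=1280, vit_dim=768, num_classes=4):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(cnn_dim + vit_dim, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes),
        )

    def forward(self, cnn_feat, vit_cls):
        # Concatenate CNN spatial features with the transformer's global token
        fused = torch.cat([cnn_feat, vit_cls], dim=1)
        return self.classifier(fused)

head = FusionHead()
logits = head(torch.randn(2, 1280), torch.randn(2, 768))
print(logits.shape)  # torch.Size([2, 4])
```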
5. Ensemble Models
- EfficientNet-B2 + DenseNet121 (92.43%)
- DenseNet121 + VGG16 (92.8%)
- EfficientNet-B2 + VGG16 (92.0%)
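A two-model ensemble of this kind is commonly built by soft voting: averaging per-class probabilities and taking the argmax. The sketch below uses illustrative probability vectors, not actual model outputs:

```python
# Minimal soft-voting ensemble: average per-class probabilities from two
# models and predict the class with the highest averaged probability.
def soft_vote(probs_a, probs_b):
    avg = [(a + b) / 2 for a, b in zip(probs_a, probs_b)]
    return max(range(len(avg)), key=avg.__getitem__), avg

# e.g. DenseNet121 and VGG16 outputs for one image over the 4 classes
pred, avg = soft_vote([0.10, 0.60, 0.20, 0.10],
                      [0.05, 0.40, 0.45, 0.10])
print(pred)  # 1
```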
Training Details
- Optimizer: AdamW
- Loss: Cross-entropy (with a distillation loss for student models)
- Regularization: Dropout, weight decay
- Learning strategy: Warm-up followed by adaptive learning-rate reduction
- Early stopping applied to prevent overfitting
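The student objective can be sketched with the standard Hinton-style logit distillation loss: a weighted sum of hard-label cross-entropy and a temperature-softened KL term against the teacher. The `T` and `alpha` values here are illustrative, not the repository's tuned settings:

```python
import torch
import torch.nn.functional as F

# Standard logit-based distillation loss; T and alpha are illustrative.
def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    ce = F.cross_entropy(student_logits, targets)  # hard-label term
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                    # rescale soft-target gradients
    return alpha * ce + (1 - alpha) * kl

s = torch.randn(8, 4)            # student logits: batch of 8, 4 classes
t = torch.randn(8, 4)            # teacher logits
y = torch.randint(0, 4, (8,))    # ground-truth labels
loss = kd_loss(s, t, y)
```

The `T * T` factor keeps the soft-target gradient magnitude comparable to the hard-label term as the temperature changes.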
Reliability, Calibration, and Faithfulness Evaluation
To ensure clinical trustworthiness, the study goes beyond accuracy-based evaluation:
Prediction Calibration
- Expected Calibration Error (ECE) used to quantify confidence–accuracy alignment
- Reliability diagrams visualize calibration behavior
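A minimal ECE computation, assuming equal-width confidence bins (the example inputs are illustrative):

```python
# Expected Calibration Error: bin predictions by confidence and take the
# bin-size-weighted average of |accuracy - confidence| per bin.
def expected_calibration_error(confidences, correct, n_bins=10):
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece

# Ten predictions, all at confidence 0.8, but only half correct -> ECE = 0.3
ece = expected_calibration_error([0.8] * 10, [1] * 5 + [0] * 5)
print(round(ece, 3))  # 0.3
```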
Key Findings
- The teacher model initially exhibits overconfidence
- Temperature scaling significantly improves teacher calibration
- MobileNetV2 achieves very low ECE, indicating near-perfect calibration
- EfficientNet-Lite0 shows higher ECE, reflecting weaker confidence alignment
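Temperature scaling itself can be sketched as a one-parameter search: pick the temperature `T` that minimizes validation negative log-likelihood, then divide logits by `T` at inference. The logits and labels below are illustrative; the repository's fitting procedure may differ:

```python
import math

# Mean negative log-likelihood of temperature-scaled logits.
def nll(logits, labels, T):
    total = 0.0
    for z, y in zip(logits, labels):
        scaled = [v / T for v in z]
        m = max(scaled)  # log-sum-exp with max subtracted for stability
        log_z = m + math.log(sum(math.exp(v - m) for v in scaled))
        total += log_z - scaled[y]
    return total / len(labels)

# Grid search over T; a single scalar, so a coarse grid suffices.
def fit_temperature(logits, labels, grid=None):
    grid = grid or [0.5 + 0.05 * i for i in range(91)]  # T in [0.5, 5.0]
    return min(grid, key=lambda T: nll(logits, labels, T))

val_logits = [[4.0, 0.0, 0.0, 0.0], [0.0, 3.5, 0.5, 0.0], [0.2, 0.1, 2.8, 0.0]]
val_labels = [0, 1, 3]  # last prediction is confidently wrong -> overconfidence
T = fit_temperature(val_logits, val_labels)
```

For an overconfident model the fitted `T` exceeds 1, flattening the softmax without changing the argmax prediction.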
Explainability and Faithfulness
- Grad-CAM and LIME used for visual interpretability
- Mask-based faithfulness metrics:
  - Insertion
  - Deletion
These metrics verify whether highlighted regions truly drive model predictions. MobileNetV2 demonstrates stronger faithfulness scores, confirming that its decisions rely on clinically relevant tumor regions rather than spurious features.
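The deletion metric can be sketched as follows: features are removed in order of decreasing attribution and the model score is re-evaluated at each step; a faithful explanation produces a rapid drop (low area under the deletion curve). The toy model and inputs below are illustrative stand-ins for image pixels and a trained classifier:

```python
# Deletion-curve sketch: zero out the most-attributed features first and
# track how the model's score falls.
def deletion_curve(x, saliency, model, steps=5):
    order = sorted(range(len(x)), key=lambda i: -saliency[i])  # most salient first
    masked = list(x)
    scores = [model(masked)]
    per_step = max(1, len(x) // steps)
    for s in range(steps):
        for i in order[s * per_step:(s + 1) * per_step]:
            masked[i] = 0.0                                    # "delete" the feature
        scores.append(model(masked))
    return scores

model = lambda v: sum(v) / len(v)        # toy score: mean activation
x = [0.9, 0.8, 0.1, 0.7, 0.2, 0.05, 0.6, 0.3, 0.15, 0.4]
curve = deletion_curve(x, saliency=x, model=model)
```

The insertion metric is the mirror image: start from a fully masked input, restore the most-attributed features first, and reward a fast rise in the score.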
Key Contributions
- Multi-student knowledge distillation framework (MobileNetV2 vs. EfficientNet-Lite0)
- Joint evaluation of accuracy, calibration, and faithfulness
- Demonstration that MobileNetV2 achieves superior distillation alignment
- Clinically deployable models with minimal computational overhead
- End-to-end reproducible pipeline across PyTorch and TensorFlow
Results Summary
| Model | Test Accuracy | Parameters | Notes |
|---|---|---|---|
| ViT Teacher | 94.6% | 235M | High-capacity teacher |
| Hybrid CNN–ViT | 94.72% | 90.5M | Best hybrid model |
| DenseNet121 + VGG16 | 92.8% | – | Strong ensemble |
| EfficientNet-B2 + DenseNet121 | 92.43% | – | Ensemble baseline |
| EfficientNet-Lite0 (KD) | ~93–94% | 7.9M | Less aligned student |
| MobileNetV2 (KD) | ~94% | 2.4M | Best efficiency, calibration, and faithfulness |
License
Apache 2.0
Contact
For questions or collaboration, please open an issue or contact:
bin.abdullah@northsouth.edu
Files
densene121_vgg16(Ensemble).ipynb
Additional details
Dates
- Available: 2025-10-26