Published October 31, 2025 | Version v2
Software | Open Access

Brain Tumor Classification: Multi-Architecture Deep Learning with Knowledge Distillation

  • North South University

Description

Overview

This repository presents a comprehensive and clinically oriented deep learning framework for brain tumor classification from MRI images, integrating state-of-the-art Vision Transformers (ViTs), lightweight CNNs, hybrid CNN–Transformer models, and ensemble strategies. A key contribution is a knowledge distillation pipeline, where a large-capacity ViT teacher model transfers discriminative knowledge to multiple lightweight student models, including MobileNetV2 and EfficientNet-Lite0, enabling efficient and reliable deployment.

Beyond predictive accuracy, the framework emphasizes model trustworthiness and clinical reliability through prediction calibration and explainability-based faithfulness evaluation, ensuring that high accuracy corresponds to well-calibrated and interpretable predictions.

Dataset

Combined Public MRI Dataset

  • Sources: BRISC, GTS AI, Mendeley, Figshare, Zenodo

  • Classes (4):

    • No Tumor

    • Glioma

    • Meningioma

    • Pituitary Tumor

Data Split

  • Training: 20,000 images (balanced, 5,000 per class)

  • Validation: 2,142 images

  • Test: 2,311 images

Preprocessing

  • Image size: 224×224

  • Augmentations: elastic deformation, random rotation and flipping, color jittering

  • Class balancing applied to reduce bias
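The augmentations above can be illustrated with minimal NumPy stand-ins; this is a sketch, not the repository's actual pipeline (which likely uses torchvision or a similar library, and additionally applies elastic deformation and the 224×224 resize, both omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Toy augmentation sketch: random horizontal flip, random
    90-degree rotation, and brightness jitter on a [0, 1] image."""
    if rng.random() < 0.5:
        img = np.flip(img, axis=1)             # horizontal flip
    img = np.rot90(img, k=rng.integers(0, 4))  # random 90-degree rotation
    img = np.clip(img * rng.uniform(0.8, 1.2), 0.0, 1.0)  # brightness jitter
    return img
```

Class balancing is then a matter of sampling (or oversampling with such augmentations) until each of the four classes contributes 5,000 training images.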

Model Architectures

1. Vision Transformer (ViT) – Teacher Model (NeuroTriad-ViT)

  • Parameters: 235M

  • Architecture: 12 transformer encoder layers with multi-head self-attention

  • Patch size: 16×16

  • Embedding dimension: 768

  • Role: High-capacity teacher for knowledge distillation

  • Test Accuracy: ~94.6%

  • Macro F1-score: ~0.93

2. MobileNetV2 – Student Model (Knowledge Distillation)

  • Parameters: ~2.4M

  • Training: Logit-based knowledge distillation from ViT teacher

  • Outcome: Retains teacher-level performance with ~98% parameter reduction

  • Strengths:

    • Strong alignment with teacher logits

    • Superior calibration and reliability

    • Best overall trade-off between accuracy and efficiency

3. EfficientNet-Lite0 – Student Model (Comparative Distillation)

  • Parameters: ~7.9M

  • Role: Lightweight student benchmark for comparison

  • Characteristics:

    • Higher capacity than MobileNetV2

    • Lower distillation alignment and calibration quality

    • Used to justify student model selection

4. Hybrid CNN–ViT Model

  • Backbone: EfficientNet-B0 + Vision Transformer

  • Feature fusion: CNN spatial features + transformer global context

  • Test Accuracy: 94.72%

  • Parameters: ~90.5M

5. Ensemble Models

  • EfficientNet-B2 + DenseNet121 (92.43%)

  • DenseNet121 + VGG16 (92.8%)

  • EfficientNet-B2 + VGG16 (92.0%)
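Pairwise ensembles of this kind are typically built by averaging the member models' softmax probabilities; the sketch below assumes simple unweighted averaging (the notebooks may use weighted averaging or voting instead):

```python
import numpy as np

def softmax(z):
    """Row-wise softmax over class logits."""
    z = np.asarray(z, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(logits_a, logits_b):
    """Average two models' class probabilities, then take the argmax."""
    probs = (softmax(logits_a) + softmax(logits_b)) / 2.0
    return probs.argmax(axis=-1), probs
```

Averaging probabilities (rather than logits) keeps each member's confidence on a comparable scale, which matters when the backbones are as different as DenseNet121 and VGG16.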

Training Details

  • Optimizer: AdamW

  • Loss: Cross-Entropy (with distillation loss for student models)

  • Regularization: Dropout, weight decay

  • Learning Strategy: Warm-up followed by adaptive learning rate reduction

  • Early stopping applied to prevent overfitting
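The student objective described above (cross-entropy plus a distillation term) can be sketched as follows. The temperature `T=4.0` and mixing weight `alpha=0.7` are illustrative assumptions, not values reported for this repository:

```python
import numpy as np

def softmax(z, T=1.0):
    """Row-wise softmax with optional temperature T."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Logit-based KD: KL(teacher || student) at temperature T (scaled by
    T^2, as in Hinton et al.) mixed with hard-label cross-entropy."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    p = softmax(student_logits)
    ce = -np.log(p[np.arange(len(labels)), labels] + 1e-12)
    return np.mean(alpha * (T ** 2) * kl + (1 - alpha) * ce)
```

The T² factor keeps the soft-target gradient magnitude comparable to the hard-label term as the temperature grows.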

Reliability, Calibration, and Faithfulness Evaluation

To ensure clinical trustworthiness, the study goes beyond accuracy-based evaluation:

Prediction Calibration

  • Expected Calibration Error (ECE) used to quantify confidence–accuracy alignment

  • Reliability diagrams visualize calibration behavior
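ECE bins predictions by confidence and averages the gap between each bin's mean confidence and its accuracy, weighted by bin size. A minimal NumPy sketch with equal-width bins (the bin count of 10 is a common convention, assumed here):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted mean |accuracy - confidence| over confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece
```

A reliability diagram plots the same per-bin accuracies against confidence; a perfectly calibrated model lies on the diagonal and has ECE ≈ 0.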

Key Findings

  • Teacher model initially exhibits overconfidence

  • Temperature scaling significantly improves teacher calibration

  • MobileNetV2 achieves very low ECE, indicating near-perfect calibration

  • EfficientNet-Lite0 shows higher ECE, reflecting weaker confidence alignment
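Temperature scaling, used above to fix the teacher's overconfidence, divides all logits by a single scalar T fitted on validation data. A grid-search sketch (the grid range is an assumption; T is often fitted by gradient descent on the NLL instead):

```python
import numpy as np

def nll(logits, labels, T):
    """Negative log-likelihood of the labels under temperature-scaled logits."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    """Pick the T that minimizes validation NLL; T > 1 softens
    overconfident predictions without changing the argmax."""
    return min(grid, key=lambda T: nll(logits, labels, T))
```

Because dividing by T preserves the ranking of the logits, accuracy is unchanged; only the confidence scale moves.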

Explainability and Faithfulness

  • Grad-CAM and LIME used for visual interpretability

  • Mask-based faithfulness metrics:

    • Insertion

    • Deletion

These metrics verify whether highlighted regions truly drive model predictions. MobileNetV2 demonstrates stronger faithfulness scores, confirming that its decisions rely on clinically relevant tumor regions rather than spurious features.
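The deletion metric can be sketched as follows: pixels are zeroed in order of decreasing saliency and the model is re-scored after each step; a faithful saliency map produces a rapid confidence drop (low area under the curve). Here `predict_fn` is a hypothetical stand-in for the model's class-probability output, not a function from the repository:

```python
import numpy as np

def deletion_curve(image, saliency, predict_fn, steps=10):
    """Progressively delete the most salient pixels and record the
    model's score; returns the sequence of scores."""
    order = np.argsort(-saliency.ravel())  # most salient pixels first
    img = image.copy().ravel()
    scores = [predict_fn(img.reshape(image.shape))]
    chunk = max(1, len(order) // steps)
    for i in range(0, len(order), chunk):
        img[order[i:i + chunk]] = 0.0      # delete a batch of salient pixels
        scores.append(predict_fn(img.reshape(image.shape)))
    return np.array(scores)
```

The insertion metric is the mirror image: start from a blank image, reveal the most salient pixels first, and reward a rapid confidence rise.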

Key Contributions

  • Multi-student knowledge distillation framework (MobileNetV2 vs. EfficientNet-Lite0)

  • Joint evaluation of accuracy, calibration, and faithfulness

  • Demonstration that MobileNetV2 achieves superior distillation alignment

  • Clinically deployable models with minimal computational overhead

  • End-to-end reproducible pipeline across PyTorch and TensorFlow

Results Summary

| Model | Test Accuracy | Parameters | Notes |
|---|---|---|---|
| ViT Teacher | 94.6% | 235M | High-capacity teacher |
| Hybrid CNN–ViT | 94.72% | 90.5M | Best hybrid model |
| DenseNet121 + VGG16 | 92.8% | — | Strong ensemble |
| EfficientNet-B2 + DenseNet121 | 92.43% | — | Ensemble baseline |
| EfficientNet-Lite0 (KD) | ~93–94% | 7.9M | Less aligned student |
| MobileNetV2 (KD) | ~94% | 2.4M | Best efficiency, calibration, and faithfulness |

License

Apache 2.0

Contact

For questions or collaboration, please open an issue or contact:
bin.abdullah@northsouth.edu

Files

densene121_vgg16(Ensemble).ipynb

Additional details

Dates

Available: 2025-10-26