Sensitivity-Prioritized Comparative Evaluation of VGG16 and ResNet50 Deep Learning Architectures for Early-Stage Alzheimer's Disease Detection from Brain MRI
Authors/Creators
Description
Background: Alzheimer's Disease (AD) is a progressive neurodegenerative disorder affecting approximately 55 million individuals globally, with projections exceeding 139 million cases by 2050. Clinical diagnosis currently depends on manual radiologist interpretation of structural magnetic resonance imaging (MRI), a workflow subject to substantial inter-observer variability, particularly at the Very Mild Demented stage, where delayed detection translates directly into missed therapeutic windows.
Objective: This study addresses a critical methodological gap in the deep learning literature: prior comparative studies of CNN architectures for AD detection have overwhelmingly optimized for aggregate classification accuracy rather than Sensitivity (Recall), the metric of primary clinical concern in medical screening. We present a rigorously controlled Champion vs. Challenger comparative evaluation of two widely adopted CNN architectures, VGG16 (sequential) and ResNet50 (residual), with explicit priority on minimizing false negatives, particularly in the clinically decisive Very Mild Demented stage.
Methods: Both architectures were trained on the publicly available Augmented Alzheimer MRI Dataset (33,984 images, four severity classes) using a stratified 80/20 train–validation split. A two-stage transfer learning protocol, consisting of ImageNet-based feature extraction (5 epochs at learning rate 1.26×10⁻³) followed by full-network fine-tuning (5 epochs at learning rate 1×10⁻⁵), was applied uniformly to both models. The Adam optimizer and categorical cross-entropy loss were used throughout. Performance was evaluated using accuracy, macro-averaged precision, macro-averaged recall (primary criterion), macro-averaged F1-score, and class-level confusion matrices.
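The two-stage protocol described in the Methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's actual training code: the `Stage` dataclass, the `train_two_stage` helper, the 224×224×3 input shape, and the pooling-plus-softmax classification head are all hypothetical, since the abstract does not specify them. It also assumes `train_ds` and `val_ds` yield already-preprocessed images with one-hot labels. Only the epoch counts, learning rates, optimizer, and loss come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    epochs: int
    learning_rate: float
    train_backbone: bool  # False = frozen ImageNet features, True = full fine-tuning


# Schedule as stated in the Methods; applied identically to both architectures.
STAGES = (
    Stage("feature_extraction", epochs=5, learning_rate=1.26e-3, train_backbone=False),
    Stage("fine_tuning", epochs=5, learning_rate=1e-5, train_backbone=True),
)


def train_two_stage(arch, train_ds, val_ds, num_classes=4, input_shape=(224, 224, 3)):
    """Train `arch` ("VGG16" or "ResNet50") through both stages.

    Returns the trained model and one Keras History object per stage.
    """
    # Imported lazily so the schedule above can be inspected without TensorFlow.
    import tensorflow as tf

    backbones = {
        "VGG16": tf.keras.applications.VGG16,
        "ResNet50": tf.keras.applications.ResNet50,
    }
    base = backbones[arch](weights="imagenet", include_top=False, input_shape=input_shape)

    # Hypothetical classification head (assumption: not described in the source).
    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)

    histories = []
    for stage in STAGES:
        base.trainable = stage.train_backbone
        # Recompile after toggling `trainable` so the change takes effect.
        model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=stage.learning_rate),
            loss="categorical_crossentropy",
            metrics=["accuracy"],
        )
        histories.append(model.fit(train_ds, validation_data=val_ds, epochs=stage.epochs))
    return model, histories
```

Calling `train_two_stage("VGG16", train_ds, val_ds)` and `train_two_stage("ResNet50", train_ds, val_ds)` with the same datasets would keep the schedule identical across the two models, which is the controlled-comparison property the Methods emphasize.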
Results: Contrary to theoretical expectations favoring deeper residual architectures, VGG16 outperformed ResNet50 on every evaluated metric. VGG16 achieved a final validation accuracy of 97.76%, macro-average recall of 97.92%, and validation loss of 0.0653, compared to ResNet50's 94.25%, 94.68%, and 0.1694, respectively. Critically, in the Very Mild Demented class, VGG16 attained 94.05% recall versus ResNet50's 88.64%, a 5.41-percentage-point improvement that reduced false negatives from 108 to 74 patients per validation cohort of 1,814 early-stage cases.
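Class-level recall and false-negative count are linked by FN = (1 − recall) × positives. The sketch below applies that relation to the reported Very Mild Demented recalls and the 1,814-case cohort. The helper name `false_negatives` is hypothetical. Counts read directly from the confusion matrices may differ from these implied values (for example, if they refer to a different epoch or cohort size), and the confusion matrices remain the authoritative source.

```python
def false_negatives(recall, positives):
    """Missed cases implied by a class-level recall over `positives` true cases."""
    return round((1.0 - recall) * positives)


cohort = 1814  # Very Mild Demented validation cases, as reported

fn_vgg16 = false_negatives(0.9405, cohort)     # implied by 94.05% recall
fn_resnet50 = false_negatives(0.8864, cohort)  # implied by 88.64% recall

print(f"VGG16: {fn_vgg16} implied false negatives")
print(f"ResNet50: {fn_resnet50} implied false negatives")
```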
Conclusions: These findings challenge the assumption that deeper residual architectures universally outperform shallower sequential networks in constrained medical imaging domains. In applications where training data volume is moderate and target features are spatially coherent (as with cortical atrophy), VGG16's hierarchical sequential learning, combined with careful fine-tuning, produces superior clinical sensitivity. We recommend VGG16 as a foundational architecture for Alzheimer's clinical decision support systems and identify explainability (Grad-CAM), 3D volumetric modelling, and multi-site prospective validation as the highest-priority directions for subsequent research.
Index Terms: Alzheimer's Disease, Convolutional Neural Networks, VGG16, ResNet50, Transfer Learning, MRI Classification, Medical Image Analysis, Dementia Staging, Sensitivity, False Negative Reduction, Computer-Aided Diagnosis, Explainable AI.
Files
| Name | Size | Checksum |
|---|---|---|
| alzheimer_research_paper_FINAL.pdf | 1.4 MB | md5:28c2c1af256b109b507d3e34e1a200a7 |
Additional details
Software
- Programming language: Python