A HYBRID CNN–TRANSFORMER FRAMEWORK FOR AUTOMATED SKIN CANCER DETECTION FROM DERMOSCOPIC IMAGES
- 1. Faculty of Engineering and Technology, Multimedia University, Melaka, Malaysia; Centre for Wireless Technology (CWT), Multimedia University; Department of Data Science and Artificial Intelligence, Faculty of Prince Al-Hussein Bin Abdallah II for IT, Al Al-Bayt University, Mafraq, Jordan; Department of Information Technology, Faculty of Prince Al-Hussein Bin Abdullah II for Information Technology, The Hashemite University, Zarqa, Jordan; Department of Computer Science, Faculty of Information Technology, Applied Science Private University, Amman, Jordan; Faculty of Artificial Intelligence and Engineering, Multimedia University, Cyberjaya, Selangor, Malaysia
Description
Early detection of melanoma and other skin cancers remains one of the most difficult challenges in clinical dermatology, because benign and malignant lesions often differ only subtly in appearance. In this research we introduce a hybrid deep learning framework that combines Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) to overcome the limitations of single-paradigm models. The framework uses a pre-trained EfficientNet-B4 to extract hierarchical local features from each image and a multi-layer Vision Transformer to capture long-range spatial dependencies and global contextual information. The two complementary representations are combined by a fusion stage based on feature concatenation, multi-layer perceptron processing, and residual connections. We evaluated the hybrid architecture on the 33,126 dermoscopic images of the ISIC 2020 dataset using stratified 5-fold cross-validation. It outperformed the previous state-of-the-art model, a pre-trained EfficientNet-B4 with attention, achieving 95.4% classification accuracy, 90.7% sensitivity, 95.1% specificity, and an AUC-ROC of 0.982. The gains in sensitivity and specificity over that baseline are clinically relevant: higher sensitivity means more melanomas detected, and higher specificity means fewer false positives. These results indicate that combining CNN-based local texture analysis with transformer-based global semantic understanding yields a more accurate and robust computer-aided diagnosis system, with the potential to support clinicians' decision-making and improve patient outcomes.
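The evaluation protocol named in the description (stratified 5-fold cross-validation, reported as accuracy, sensitivity, specificity, and AUC-ROC) can be illustrated with a minimal standard-library sketch. It is not the authors' code: the function names `stratified_kfold`, `binary_metrics`, and `auc_roc` are hypothetical, label 1 is assumed to mean malignant, and the toy labels and scores are invented for illustration rather than taken from ISIC 2020.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that keep the class ratio in each fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold receives its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity; label 1 = malignant (assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def auc_roc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented): 10 malignant and 90 benign labels.
toy_labels = [1] * 10 + [0] * 90
toy_folds = list(stratified_kfold(toy_labels, k=5))
```

With these toy labels, each of the five test folds holds 2 malignant and 18 benign cases, matching the 10% malignant rate of the whole set. In a real run, the model would be trained on `train_idx` and scored on `test_idx` for each fold, and the per-fold metrics averaged. The pairwise `auc_roc` is quadratic in the number of samples, so a rank-based implementation would be preferable at the scale of the full dataset.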
Files
| Name | Size | Checksum |
|---|---|---|
| 703.2242-OJS Ready final 1.pdf | 565.0 kB | md5:527bfec599e772f2f079d6e46f4cc416 |