DiRecNetV2: A Transformer-Enhanced Network for Aerial Disaster Recognition
Authors/Creators
- 1. KIOS Research and Innovation Center of Excellence, University of Cyprus
Description
Abstract:
Using Unmanned Aerial Vehicles (UAVs) for disaster assessment from aerial imagery requires models that combine high accuracy, computational efficiency, and real-time processing. Traditional Convolutional Neural Networks (CNNs) extract local features efficiently but are limited in their ability to interpret global context. Vision Transformers (ViTs), by contrast, promise better global context interpretation through attention mechanisms, yet they remain underinvestigated in UAV-based disaster response applications. To bridge this research gap, we introduce DiRecNetV2, a novel hybrid model that merges the strengths of CNNs and ViTs. It combines the robust feature extraction of CNNs with the global context understanding of ViTs while keeping the computational load low enough for UAV applications. Additionally, we introduce a new, compact multi-label disaster dataset that sets an initial benchmark for future research and is used to explore how models trained on single-label data perform on a multi-label test set.
The study assesses lightweight CNNs and ViTs on the AIDERv2 dataset, using frames per second (FPS) to measure efficiency and the weighted F1 score to measure classification performance. DiRecNetV2 achieves a weighted F1 score of 0.964 on a single-label test set and demonstrates adaptability with a score of 0.614 on a complex multi-label test set, while running at 176.13 FPS on the NVIDIA Jetson Orin device. These results point to a promising direction for future research: a hybrid approach that combines the global context capabilities of ViTs with the robust feature maps of CNNs has the potential to achieve high accuracy and could emerge as a mainstream architectural choice in the field.
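For readers unfamiliar with the metric, the weighted F1 score reported above can be illustrated with a minimal sketch. It assumes the standard single-label definition: per-class F1 averaged with weights proportional to each class's share of the true labels. The function name `weighted_f1` and the class labels in the example are hypothetical and are not taken from the paper or its code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (single-label case).

    Illustrative only: labels and function name are assumptions,
    not the authors' evaluation code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0

    for cls, n_cls in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = n_cls - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += (n_cls / total) * f1

    return score


# Hypothetical labels, chosen only to make the arithmetic easy to follow.
y_true = ["fire", "fire", "flood", "normal"]
y_pred = ["fire", "flood", "flood", "normal"]
print(weighted_f1(y_true, y_pred))
```

In practice the same quantity is usually obtained from scikit-learn via `f1_score(y_true, y_pred, average="weighted")`; the hand-written version is shown only to make the weighting explicit.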
Cite:
Shianios, D., Kolios, P.S. & Kyrkou, C. DiRecNetV2: A Transformer-Enhanced Network for Aerial Disaster Recognition. SN Comput. Sci. 5, 770 (2024).
AIDERv2 dataset:
https://zenodo.org/records/10891054
Files
| Name | Size |
|---|---|
| manuscript.pdf (md5:2a9bc77eff51fe9802d13d4fdee3f704) | 101.4 MB |
Additional details
Related works
- Compiles
- Image: 10.5281/zenodo.10891054 (DOI)
Dates
- Submitted: 2024-03-03
- Accepted: 2024-06-03