
Published January 16, 2026 | Version v1
Dataset | Open

DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes

  • 1. Qatar Computing Research Institute
  • 2. Hamad Bin Khalifa University
  • 3. Hamad Bin Khalifa University

Description

Social media imagery provides a low-latency source of situational information during natural and human-induced disasters, supporting rapid damage assessment and response. While Visual Question Answering (VQA) has made significant progress in general-purpose domains, its application to the complex, domain-specific reasoning required for disaster response remains largely unexplored. To bridge this gap, we introduce DisasterVQA, a novel benchmark and evaluation dataset tailored to perception and reasoning tasks in crisis contexts. The dataset comprises 1,395 real-world images and 4,405 expert-curated image-question-answer triplets spanning diverse events, including floods, wildfires, and earthquakes. Grounded in established humanitarian frameworks such as FEMA ESF and OCHA MIRA, DisasterVQA includes binary, multiple-choice, and open-ended questions covering situational awareness and actionable response tasks, built environment damage, population exposure, accessibility, and movement restrictions. We benchmark seven state-of-the-art vision-language models, which achieve accuracies of 0.86–0.91 on yes/no questions, 0.70–0.83 on open-ended questions, and F1 scores of 0.74–0.85 on multiple-choice questions. Despite these promising results, the observed performance gaps, especially in open-ended and multiple-choice reasoning, highlight limitations of current models in disaster settings. DisasterVQA provides a challenging and practical benchmark to guide the development of more robust and operationally meaningful vision-language models for disaster response.
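
The record does not specify the annotation file layout or the exact scoring protocol, so the snippet below is only a minimal sketch of how the image-question-answer triplets might be loaded and scored with exact-match accuracy (yes/no and open-ended questions) and macro-F1 (multiple-choice questions). The file name, field names, and the macro-averaging choice are illustrative assumptions, not part of the dataset documentation.

```python
# Minimal sketch: load DisasterVQA-style triplets and score predictions.
# Assumed (hypothetical) annotation format: a JSON list of records like
# {"image": "...", "question": "...", "question_type": "yes_no" | "multiple_choice" | "open_ended",
#  "answer": "..."}.
import json
from collections import defaultdict


def load_triplets(path="DisasterVQA-Dataset/annotations.json"):  # hypothetical path
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def accuracy(gold, pred):
    """Exact-match accuracy over aligned lists of answer strings."""
    return sum(g.strip().lower() == p.strip().lower() for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Macro-averaged F1 over answer classes (one plausible reading of the reported F1)."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    f1s = []
    for c in set(gold) | set(pred):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)


# Example usage with a hypothetical predict(triplet) function:
# triplets = load_triplets()
# yn = [t for t in triplets if t["question_type"] == "yes_no"]
# print(accuracy([t["answer"] for t in yn], [predict(t) for t in yn]))
```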

Files (85.1 MB)

Name: DisasterVQA-Dataset.zip
Size: 85.1 MB
md5: 8a9ad02ea771deca5ccf9546a1ba771e
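
Once downloaded, the archive can be checked against the md5 listed above; a minimal Python sketch, assuming the local file keeps the listed name:

```python
# Verify the downloaded archive against the published md5 checksum.
import hashlib

EXPECTED_MD5 = "8a9ad02ea771deca5ccf9546a1ba771e"  # from the file listing above


def md5sum(path, chunk_size=1 << 20):
    """Compute the md5 of a file in chunks to keep memory use low."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    digest = md5sum("DisasterVQA-Dataset.zip")
    print("OK" if digest == EXPECTED_MD5 else f"checksum mismatch: {digest}")
```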