Multimodal Data Fusion of Social Media and Satellite Images for Emergency Response and Decision-Making

Artificial Intelligence (AI) is already part of our lives and is rapidly entering the space sector to offer value-added Earth Observation (EO) products and services. The Copernicus programme provides data on a free, full and open basis, while the recently launched Data and Information Access Service (DIAS) providers index, store and exchange tremendous amounts of data and cloud infrastructure computational resources. Copernicus data and other georeferenced data sources are often highly heterogeneous, distributed and semantically fragmented. One example is the massively generated social media data from citizen observations, which include visual, textual and spatiotemporal information. Social media streams offer timely and highly descriptive information about a crisis event. In this work we present multimodal fusion approaches for combining satellite images and social media data for emergency response applications, such as flood monitoring and extreme weather conditions in polar regions.


INTRODUCTION
During the past decades, emergency response relied mainly on phone calls and other channels that lack real-time and multimodal information about the affected area. Nowadays, the monitoring of a crisis event takes into account the ubiquitous visual content that becomes available to civil protection agencies and authorities from social media, static cameras, cameras on board a drone, and satellite images.
Satellite images are often used for rapid mapping and recovery, but the available streams of social media posts are a valuable data source, not only for validation purposes, e.g. validating a flood mapping model [1], but also for fusion in order to enhance existing estimations, such as the estimated snow depth in a near-urban area [2]. Social media data have been exploited to detect disaster events in a timely and accurate manner [3], such as flooding incidents [4] and earthquakes [5], so the mapping services from satellite imagery can be enhanced with geotagged images, relevant text messages and temporal information for multimodal event detection [6, 7]. This work has been supported by the projects H2020-CALLISTO and H2020-EOPEN with grant agreements 101004152 and 776019, respectively.
The revisit time of a satellite determines how soon a first mapping of an area of interest becomes available after a disaster occurs. Since the revisit time can vary between 4 and 5 days (or 2-3 days, if both Sentinel-1 and Sentinel-2 images are acquired for the same area), it can lead to significant delays in decision making. To overcome this issue, complementary sources of geospatial information can be taken into consideration, such as citizen observations from social media. The main challenge is to fuse these multimedia data with satellite images into a more effective mechanism that may assist first responders in their daily operations.
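The delay introduced by revisit cycles can be illustrated with a minimal sketch that, given the last pass time and revisit period of each satellite, computes the earliest post-event acquisition across a constellation. The dates and revisit periods used below are hypothetical, not actual Sentinel schedules.

```python
from datetime import datetime, timedelta

def next_acquisition(last_pass: datetime, revisit_days: int,
                     event: datetime) -> datetime:
    """First pass at or after the event, assuming a fixed revisit cycle."""
    elapsed = (event - last_pass).total_seconds()
    cycles = -(-elapsed // (revisit_days * 86400))  # ceiling division
    return last_pass + timedelta(days=revisit_days * cycles)

def first_mapping(event: datetime, satellites: dict) -> datetime:
    """Earliest post-event acquisition across all available satellites."""
    return min(next_acquisition(lp, rd, event)
               for lp, rd in satellites.values())
```

Combining two satellites with offset cycles shortens the expected wait, which is exactly why joint Sentinel-1/Sentinel-2 acquisitions reduce the mapping delay.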

DATA MINING IN SOCIAL MEDIA
Social media posts are collected and filtered for relevance to a target event (e.g. Twitter posts referring to a specific flood incident). Both the visual and the textual content are encoded as deep vector representations by deep neural networks [8] and fed to a binary classifier that filters out irrelevant tweets.
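A minimal sketch of this filtering step, assuming the text and image embeddings have already been produced by some neural encoder and that the logistic weights `w` and bias `b` come from a pre-trained classifier (all names and shapes here are illustrative, not the architecture of [8]):

```python
import numpy as np

def relevance_score(text_emb: np.ndarray, img_emb: np.ndarray,
                    w: np.ndarray, b: float) -> float:
    """Concatenate the per-modality embeddings and apply a logistic
    classifier to score relevance to the target event in [0, 1]."""
    x = np.concatenate([text_emb, img_emb])
    return float(1.0 / (1.0 + np.exp(-(w @ x + b))))

def filter_relevant(tweets, w, b, threshold=0.5):
    """Keep only tweets whose fused relevance score exceeds the threshold."""
    return [t for t in tweets
            if relevance_score(t["text_emb"], t["img_emb"], w, b) >= threshold]
```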
The temporal dimension of a tweet is always available, allowing for burst analysis that identifies extreme events in the time series of relevant tweets, based on the extracted contextual information. The spatial dimension, however, is rarely available, so the geographical location needs to be estimated in order to position georeferenced tweets (images, text) on a map and offer situational awareness through social media analytics. Tweet localization first applies Named Entity Recognition to extract location mentions from the raw text; these are then linked to external Knowledge Bases (e.g. OpenStreetMap) to obtain the latitude and longitude of each mentioned location. These social media analytics are demonstrated in Figure 1.
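The burst analysis mentioned above can be sketched with a simple trailing-window detector: a time bin is flagged when its tweet count exceeds the recent mean by several standard deviations. This is a generic proxy for burst detection, not the specific method used in the cited systems; the window length and threshold are illustrative.

```python
import numpy as np

def detect_bursts(counts, window=24, k=3.0):
    """Flag time bins whose tweet count exceeds the trailing mean by
    k standard deviations -- a simple burst-detection heuristic."""
    counts = np.asarray(counts, dtype=float)
    bursts = []
    for t in range(window, len(counts)):
        history = counts[t - window:t]
        mu, sigma = history.mean(), history.std()
        # floor sigma at 1 so flat histories do not trigger on tiny changes
        if counts[t] > mu + k * max(sigma, 1.0):
            bursts.append(t)
    return bursts
```

On an hourly series, a sudden spike of flood-related tweets would be flagged as a candidate extreme event and passed on for correlation with satellite observations.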
Temporal and spatial information together are correlated with the outcomes of the analysis of a satellite image, such as Sentinel-1 for snow depth estimation or water body detection. A search over large Earth Observation archives can also be triggered by a social media search with respect to a multimodal text-image query [9]. The multimodal character of satellite images results from their multiple channels and associated metadata (date, time, geographical location, mission, etc.). Each satellite image can be considered as a collection of satellite image patches with semantic information, where one or more concepts are assigned to each patch (e.g. urban area, rock, water, snow). These concepts also appear in social media streams, where a concept extraction algorithm is applied, and allow for semantically correlating social media posts or events with satellite images that contain the same concepts in their patches [10].

MULTIMODAL DATA FUSION WITH SENTINEL IMAGES
Multimodal fusion combines multiple heterogeneous data sources, or the information extracted from them, to provide meaningful decisions or enhanced and more accurate estimations for operational purposes. Fusion can also take place at the semantic level, where web technologies offer the tools to link data and consider the available policy-driven rules for decision-making [11]. At the feature level, the vector representations of each modality are combined in an early fusion manner, while late or hybrid fusion approaches combine the decisions made per modality [12].
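The distinction between the two levels can be sketched in a few lines: early fusion concatenates per-modality feature vectors before any classifier sees them, while late fusion combines per-modality decision scores. The weighting scheme below is a generic weighted average, shown only as an illustration of the idea.

```python
import numpy as np

def early_fusion(features):
    """Feature level: concatenate per-modality vectors into one
    joint representation for a downstream classifier."""
    return np.concatenate(features)

def late_fusion(decisions, weights):
    """Decision level: weighted average of per-modality scores in [0, 1]."""
    d = np.asarray(decisions, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float((w * d).sum() / w.sum())
```

Hybrid fusion simply applies both: a classifier over the concatenated features contributes one more decision to the weighted combination.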
The multimodal fusion framework for snow depth estimation using Earth Observation and social media data, as proposed in [2], is shown in Figure 2. The visual concept snow is extracted from the images, the relevance of the tweet text to the actual meaning of snow is estimated using Natural Language Processing, and the snow depth is estimated from Sentinel-1 images. All this highly heterogeneous data and extracted knowledge are combined towards a novel snow depth estimation with multimodal data fusion, which has been validated in two areas in Finland.
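As a toy illustration of such a combination (not the actual method of [2]), a Sentinel-1 snow depth estimate can be fused with a depth inferred from social media observations by weighting each source with a confidence value; the confidences and depths below are made up for the example.

```python
from typing import Optional

def fuse_snow_depth(sar_depth_cm: float, sar_conf: float,
                    social_depth_cm: Optional[float],
                    social_conf: float) -> float:
    """Confidence-weighted fusion of a Sentinel-1 snow depth estimate
    with a depth inferred from social media, when one is available."""
    if social_depth_cm is None:
        return sar_depth_cm  # fall back to the satellite estimate alone
    total = sar_conf + social_conf
    return (sar_conf * sar_depth_cm + social_conf * social_depth_cm) / total
```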
We observe in Table 1 that social media observations bring a significant added value from the application point of view and serve as a candidate source of data for operational purposes. The fusion of Earth Observation and social media data offers a plethora of visual analytics tools for natural hazard monitoring and for water safety, security and management [7]. Satellite image analysis for water mask extraction in flooded areas already fuses satellite image channels with Digital Elevation Model (DEM) measurements [13]. Traffic management during the first hours of a crisis event is also supported by a road passability service [14] that infers whether a road segment from point A to point B is passable or not due to flooding. Annotated datasets have been provided in the context of Multimedia Evaluation Benchmark tasks, such as the Multimedia Satellite Task and its successors¹.
The problem of fusing social media and satellite images has also been tackled in the EOPEN² project. Floods detected by performing change detection on a time series of satellite images, and flooding incidents detected by applying outlier detection on social media streams, are semantically represented in RDF, supporting in this way spatiotemporal SPARQL queries that combine both types of knowledge. Through a dedicated interface, shown in Figure 3, users from the domains of emergency response and civil protection are able to define the time and area of interest and retrieve flood alerts that are generated by the analysis of Sentinel as well as social media data.
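The spatiotemporal matching behind such queries can be sketched in plain Python as an analogue of the SPARQL filtering over the RDF graph: pair EO change detections with social media incidents that overlap in space (bounding boxes) and are close in time. The bounding boxes, timestamps and 48-hour gap below are illustrative, not EOPEN's actual query parameters.

```python
from datetime import datetime, timedelta

def overlaps(bbox_a, bbox_b):
    """Axis-aligned bounding boxes as (min_lon, min_lat, max_lon, max_lat)."""
    return not (bbox_a[2] < bbox_b[0] or bbox_b[2] < bbox_a[0] or
                bbox_a[3] < bbox_b[1] or bbox_b[3] < bbox_a[1])

def combined_alerts(eo_events, social_events, max_gap=timedelta(hours=48)):
    """Pair EO change detections with social media incidents that are
    close in space (bbox overlap) and in time (within max_gap)."""
    alerts = []
    for e_time, e_bbox in eo_events:
        for s_time, s_bbox in social_events:
            if overlaps(e_bbox, s_bbox) and abs(e_time - s_time) <= max_gap:
                alerts.append((e_time, s_time, e_bbox))
    return alerts
```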

CONCLUSION AND FUTURE RESEARCH
The gap between the DIAS providers and the application end users can be bridged through dedicated AI solutions that add value to the large volumes of satellite data (images and associated metadata) that are continuously downlinked from the Sentinel constellations. AI is a collection of technologies that combine data, algorithms and computing power.
The generation of effective information for the users relies on the fusion of heterogeneous sources of data (Earth Observation, drone navigation and video analysis, social media stream monitoring, in-situ sensor data). The creation of value from EO data goes beyond the space sector and needs to foster the development of geolocation-based services for both the public and the private sector, in applications related to policymaking, water management, security and journalism.
Complementary data sources such as citizen observations through dedicated mobile apps and social media platforms need to be linked to EO data when spatial information is available or can be automatically estimated. Galileo-enabled mobile devices with authenticated and precise geo-referencing need to be involved in the design of fully automated processes in decision support systems. These challenges are expected to be addressed in the context of the CALLISTO project³, recently presented in the EO Big Data online workshop⁴, which started in January 2021 and has a duration of three years.
CALLISTO combines Earth Observation with crowdsourced data, videos from Unmanned Aerial Vehicles and in-situ data, through machine learning and data fusion technologies; the outcomes are semantically-enriched and served to humans in interactive interfaces and mobile apps. The interfaces enable virtual presence and situational awareness, through Virtual, Augmented and Mixed Reality, creating a novel and innovative immersive environment for the Copernicus market and, in particular, for emergency response and security applications.