Deep multi-modal satellite and in-situ observation fusion for Soil Moisture retrieval

This work addresses the problem of surface soil moisture (SM) estimation from multi-modal remote sensing observations. We consider the scenario where both passive radiometer observations from the NASA SMAP satellite and active radar measurements from ESA Sentinel-1 are available. We formulate the problem as multi-source observation fusion and develop a deep learning model for SM estimation. To train and validate the proposed scheme, we use observations from in-situ SM sensor networks over the continental USA. Experimental results demonstrate that the proposed model achieves high-quality SM estimation, surpassing the performance of available products.


I. INTRODUCTION
Residing at the land-atmospheric boundary, surface and near-surface Soil Moisture (SM) have profound implications on Earth's water and energy cycles. SM retrieval techniques rely on two primary sources of observations, namely observations from remote sensing platforms, typically satellites, and in-situ measurements from wireless sensor networks.
NASA's Soil Moisture Active Passive (SMAP) satellite is tasked with providing high-quality estimates of global surface SM and freeze-thaw state by capturing observations from an active L-band radar instrument and a passive L-band radiometer. Due to hardware problems, only the radiometer is still operational. To address the lack of radar observations, the use of European Space Agency (ESA) Sentinel-1 C-band radar observations has recently been proposed [1]. The active radar measurements from Sentinel-1 encode radar backscatter at 1 km spatial resolution, while the passive radiometer on SMAP captures brightness temperature at 36 km spatial resolution (L2_SM_P); a high-quality 9 km product (SPL2SMP-E) is also available. Using these two sources of information, disaggregated 9 km (SMAP L2_SM_P_E) and 1 km (SMAP L2_SM_SP) L2 products of SM can be generated.
In this work, we propose a deep learning model for SM retrieval from coarse-resolution passive microwave (radiometer) brightness temperature maps and fine-resolution active microwave (synthetic aperture radar) backscattering cross-section imagery. The proposed model is able to provide both coarse (9 km) and fine (1 km) resolution SM. To train and validate the proposed model, SM measurements from localized in-situ sensor networks covering the Continental United States of America (CONUS) are employed.

II. STATE-OF-THE-ART
Estimation of SM from in-situ and/or satellite observations is an extensively investigated topic and numerous approaches have been proposed [2]. Machine learning based approaches have gained considerable attention due to their flexibility and ability to process a large number of inputs [3], [4], [5]. More recently, deep learning has been widely adopted for the enhancement of remote sensing observations [6] and has been considered for the problem of SM estimation. To capture the information encoded in time series, Long Short-Term Memory (LSTM) networks were explored in [7] for SM estimation from SMAP brightness temperature measurements, MODIS vegetation water content and soil temperature. A work similar to the one reported here is the method proposed by Mao et al. [8], where the authors employed machine learning, random forests in particular, for estimating the high-resolution SMAP/Sentinel-1 SM product given low-resolution SMAP radiometry data. An earlier version considered CNNs for downscaling SMAP radiometer brightness temperature measurements, focusing only on the period when both the SMAP radar and radiometer were operational [9]. In [10], a deep learning model was also proposed for SM estimation; however, this model did not assume the availability of in-situ observations or observations from multiple satellite platforms.

III. OBSERVATION SOURCES PROFILES
The data used for the generation of the dataset include remote sensing observations from the NASA SMAP mission, provided by the National Snow and Ice Data Center, observations from ESA Sentinel-1A and -1B satellites, provided by Copernicus, and in-situ observations from the International Soil Moisture Network (ISMN) [11].
Brightness temperatures (TBs) in kelvin are derived from the native 36 km SMAP footprint using Backus-Gilbert interpolation onto the 9 km EASE-Grid, for both horizontal and vertical polarizations. The Sentinel-1 C-band Synthetic Aperture Radar (C-SAR) measures dual polarization (VV + VH) in the Interferometric Wide swath mode over land, with a center frequency of 5.405 GHz, while σ⁰ measurements are derived using SAR processing.
To generate the required datasets, a diverse set of in-situ sensor locations from the International Soil Moisture Network is explored. In general, these sensors record soil moisture using different techniques and at different depths. In this work, we are interested in surface soil moisture, so only the soil moisture in the top 5 cm is considered. We select sensors currently in operation and focus on networks using the same class of hardware sensors (Stevens Hydraprobe). Specifically, we incorporate data from the ARM, SCAN, SNOTEL, and USCRN networks. The locations of the in-situ sensors considered in this work are shown in Figure 2.
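The selection criteria above (network, operational status, sensing depth, sensor class) amount to a simple filter over sensor metadata. The following is a minimal sketch under stated assumptions: the `SensorRecord` layout, its field names, the sensor model strings and the sample records are all hypothetical and do not reflect the actual ISMN file format.

```python
from dataclasses import dataclass

# Networks named in the text.
SELECTED_NETWORKS = {"ARM", "SCAN", "SNOTEL", "USCRN"}
# Surface layer considered in this work: top 5 cm.
SURFACE_DEPTH_M = 0.05


@dataclass(frozen=True)
class SensorRecord:
    """Hypothetical per-sensor metadata (not the real ISMN schema)."""
    network: str
    station: str
    sensor_model: str
    depth_m: float   # sensing depth below the surface, in metres
    active: bool     # True if the sensor is currently in operation


def select_surface_sensors(records):
    """Keep active Hydraprobe-class sensors in the selected networks
    that measure within the top 5 cm of soil."""
    return [
        r for r in records
        if r.network in SELECTED_NETWORKS
        and r.active
        and r.depth_m <= SURFACE_DEPTH_M
        and "hydraprobe" in r.sensor_model.lower()
    ]


# Made-up records, for illustration only.
records = [
    SensorRecord("SCAN", "station_a", "Stevens Hydraprobe II", 0.05, True),
    SensorRecord("SCAN", "station_a", "Stevens Hydraprobe II", 0.20, True),    # too deep
    SensorRecord("USCRN", "station_b", "Stevens Hydraprobe II", 0.05, False),  # inactive
    SensorRecord("OTHER_NET", "station_c", "Stevens Hydraprobe II", 0.05, True),  # other network
    SensorRecord("SNOTEL", "station_d", "other probe", 0.05, True),            # other sensor class
    SensorRecord("ARM", "station_e", "Stevens Hydraprobe", 0.025, True),
]
selected = select_surface_sensors(records)
```

Restricting the selection to one sensor class avoids mixing measurement techniques in the ground truth, at the cost of fewer training locations.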

A. Satellite derived SM product
The baseline model is the "High-Resolution Enhanced Product Based on SMAP Active-Passive Approach using Sentinel 1A and 1B SAR Data" [12]. The data correspond to the Level-2 (L2) product, which contains calibrated, geolocated, time-ordered TB during 6:00 a.m. descending (and 6:00 p.m. ascending) half-orbit passes and Sentinel-1 C-band backscatter coefficients, transformed to sigma-naught (σ⁰) values, at a spatial resolution of 1 km and 3 km.

IV. SM ESTIMATION MODEL
Our model assumes the availability of two inputs, namely X_1 for the TB measurements at 36 km spatial resolution from SMAP and X_2 for the σ⁰ measurements at 1 km from Sentinel-1. The primary objective of the model is to estimate the in-situ SM values at 1 km spatial resolution, y_1, while an auxiliary estimation target is the SM values at 9 km encoded in the NASA L2 product, Y_2.
The primary objective of the model is thus to estimate the parameters w by minimizing the loss function L_1 (Eq. 1), where P_Ω is the sampling operator, which preserves only the values at locations where in-situ measurements are available. In addition to minimizing L_1, which focuses on high-quality estimation using point-like ground truth, we introduce a second loss L_2 (Eq. 2), whose objective is to match the predicted SM at image level, but at the coarser resolution of 9 km, to the estimated SM in the L2 product. Here D is a downsampling operator which downscales the SM from 1 km to 9 km spatial resolution. The end goal of the model is to minimize the composite of the two loss functions (Eq. 3), weighted through the value α.

A. Deep multi-modal observation fusion network
In the above described framework, the objective is to estimate the model parameters w which minimize the composite loss function in Eq. 3. Although different approaches could be taken, in this work the non-linear modeling function f in Eq. 1 and Eq. 2 corresponds to a Convolutional Neural Network (CNN) which consists of four convolution blocks, each one comprising the following layers:
• A convolution layer with 32 3D filters applied to the input
• A non-linear ReLU activation function
• A spatial dropout layer
• A batch normalization layer
The output of the last layer is collapsed across channels/filters in order to extract a single value at each spatial location (pixel). A mask is utilized for selecting only the locations with available observations during model training. Furthermore, the output is also passed through a downsampling process, which corresponds to the application of an average pooling layer followed by an upsampling layer. The total number of trainable parameters of the network is in the order of 50k.
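The composite objective described in this section (a masked loss on in-situ pixels plus a coarse-resolution loss against the L2 product) can be illustrated with a small sketch. Several choices here are assumptions and are not confirmed by the text: the squared-error form of both terms, the weighting α·L_1 + (1 − α)·L_2, and the comparison on the coarse grid directly rather than after an upsampling step. The function names are hypothetical, and the toy example uses a pooling factor of 2 on a 4 × 4 image instead of the 1 km to 9 km aggregation.

```python
def average_pool(image, factor):
    """Operator D (assumed form): average non-overlapping factor x factor blocks."""
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image size must be divisible by the pooling factor")
    pooled = []
    for r in range(0, rows, factor):
        pooled_row = []
        for c in range(0, cols, factor):
            block = [image[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            pooled_row.append(sum(block) / len(block))
        pooled.append(pooled_row)
    return pooled


def masked_mse(pred, insitu, mask):
    """L_1 (assumed squared error): error only where in-situ data exist (P_Omega)."""
    errors = [(p - t) ** 2
              for prow, trow, mrow in zip(pred, insitu, mask)
              for p, t, m in zip(prow, trow, mrow) if m]
    if not errors:
        raise ValueError("mask selects no in-situ locations")
    return sum(errors) / len(errors)


def coarse_mse(pred, coarse_target, factor):
    """L_2 (assumed squared error): pooled prediction vs. coarse SM product."""
    pooled = average_pool(pred, factor)
    errors = [(p - t) ** 2
              for prow, trow in zip(pooled, coarse_target)
              for p, t in zip(prow, trow)]
    return sum(errors) / len(errors)


def composite_loss(pred, insitu, mask, coarse_target, factor, alpha):
    """Assumed weighting: alpha * L_1 + (1 - alpha) * L_2."""
    return (alpha * masked_mse(pred, insitu, mask)
            + (1.0 - alpha) * coarse_mse(pred, coarse_target, factor))


# Toy 4 x 4 prediction with two in-situ pixels and a 2 x 2 coarse target.
pred = [[0.1, 0.1, 0.3, 0.3],
        [0.1, 0.1, 0.3, 0.3],
        [0.2, 0.2, 0.4, 0.4],
        [0.2, 0.2, 0.4, 0.4]]
insitu = [[0.2, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.4]]
mask = [[True, False, False, False],
        [False, False, False, False],
        [False, False, False, False],
        [False, False, False, True]]
coarse_target = [[0.1, 0.3],
                 [0.2, 0.5]]
loss = composite_loss(pred, insitu, mask, coarse_target, factor=2, alpha=0.5)
```

Because the mask removes every pixel without a sensor from the first term, the coarse-resolution term is the only source of supervision over most of the image, which is why the weight α matters in practice.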

A. Dataset characteristics observations
We consider observations during September 2020, since data for the new versions of SMAP Level-1, -2, and -3 are currently available from 27 August 2020 onwards. The input corresponds to 3D image patches of size 256 × 256 × 4 at 1 km spatial resolution, where 4 corresponds to the concatenation of the two polarizations from the radiometer and the two polarizations from the radar. For a patch to be eligible as a training example, at least 5 locations (pixels) must be associated with in-situ sensor locations. Figure 3 presents the distribution of SM values for the training and the validation set. We observe that the training set distribution covers the validation set distribution to a very large extent, which is required when training a deep learning model.
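The patch eligibility rule (at least 5 pixels with an associated in-situ sensor) can be sketched as a count over a boolean mask. The helper names and the sensor pixel coordinates are hypothetical. The sketch also assumes that several sensors falling in the same 1 km pixel count as a single location, which the text does not specify.

```python
PATCH_SIZE = 256        # patch side in pixels (1 km per pixel)
MIN_INSITU_PIXELS = 5   # eligibility threshold stated in the text


def make_mask(size, sensor_pixels):
    """Build a size x size boolean mask that is True at sensor pixels.

    Duplicate coordinates collapse into one pixel (an assumption).
    """
    mask = [[False] * size for _ in range(size)]
    for row, col in sensor_pixels:
        mask[row][col] = True
    return mask


def count_insitu_pixels(mask):
    """Number of pixels associated with at least one in-situ sensor."""
    return sum(1 for row in mask for value in row if value)


def is_eligible(mask, min_pixels=MIN_INSITU_PIXELS):
    """True if the patch has enough in-situ pixels to be a training example."""
    return count_insitu_pixels(mask) >= min_pixels


# Made-up pixel coordinates, for illustration only.
dense_patch = make_mask(PATCH_SIZE, [(3, 7), (40, 12), (100, 200), (180, 5), (255, 255)])
sparse_patch = make_mask(PATCH_SIZE, [(3, 7), (3, 7), (40, 12), (100, 200), (180, 5)])

eligible_dense = is_eligible(dense_patch)
eligible_sparse = is_eligible(sparse_patch)
```

The same mask can serve as the sampling operator during training, since it marks exactly the pixels where a ground-truth value is available.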

B. Experimental results
Figures 4 and 5 present scatter plots of in-situ (actual) and satellite retrieved (predicted) SM value pairs. In both cases, we consider the SMAP/S1 L2 product (in blue), the proposed approach (orange) and the ideal behavior (green). Figure 4 corresponds to the performance on the training set, once training is completed, while Figure 5 corresponds to the validation set. To quantify the deviation from the in-situ SM values, Figure 6 presents the unRMSE for all the examples in the validation set. We observe that in most cases, the proposed method achieves lower error compared to the L2 product. The average estimation error for each method is also provided in the figure.
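The text does not define unRMSE. Assuming it denotes the unbiased RMSE commonly used in SM validation (the RMSE computed after removing the mean bias between predictions and in-situ values), a minimal sketch is given below. The sample values are made up and the function names are hypothetical.

```python
import math


def bias(pred, obs):
    """Mean difference between predicted and in-situ values."""
    return sum(p - o for p, o in zip(pred, obs)) / len(pred)


def rmse(pred, obs):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


def unbiased_rmse(pred, obs):
    """RMSE of the errors after subtracting their mean (the bias)."""
    if len(pred) != len(obs) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    b = bias(pred, obs)
    return math.sqrt(sum(((p - o) - b) ** 2 for p, o in zip(pred, obs)) / len(pred))


# Made-up SM values in m^3/m^3, for illustration only.
insitu_sm = [0.10, 0.20, 0.30, 0.40]
offset_pred = [0.15, 0.25, 0.35, 0.45]   # constant offset: pure bias
noisy_pred = [0.12, 0.18, 0.33, 0.41]    # mixed bias and scatter

offset_score = unbiased_rmse(offset_pred, insitu_sm)
noisy_score = unbiased_rmse(noisy_pred, insitu_sm)
```

Under this definition a prediction that differs from the in-situ values by a constant offset scores close to zero, so the metric isolates random error from systematic bias. The three quantities are related by RMSE² = bias² + unbiased RMSE².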
In Figure 7 we focus on a representative example and visually present the inputs (top row), the SM prediction at 1km spatial resolution (middle row) and the SM prediction at 9km (bottom row).

VI. CONCLUSION
Retrieving surface soil moisture from remote sensing observations over large scales is a challenging topic. In this work, we developed an analysis-ready dataset encoding satellite and in-situ sensor measurements and proposed a deep learning model for high-accuracy retrieval. In future work, the potential of introducing additional sources of information, such as land cover, will be explored.