Fractional Abundance Datasets for Salt Patches and Marshes Across the Delmarva Peninsula, v1
Description
Abstract:
Coastal agricultural lands in the eastern USA are increasingly plagued by escalating soil salinity, rendering them unsuitable for profitable farming. Saltwater intrusion into groundwater or soil salinization can lead to alterations in land cover, such as diminished plant growth, or complete land cover transformation. Two notable instances of such transformations include the conversion of farmland to marshland or to barren salt patches devoid of vegetation. However, quantifying these land cover changes across vast geographic areas poses a significant challenge due to their varying spatial granularity. To tackle this issue, a non-linear spectral unmixing approach utilizing a Random Forest (RF) algorithm was employed to quantify the fractional abundance of salt patches and marshes. Using 2022 Sentinel-2 imagery, gridded datasets for salt patches and marshes were generated across the Delmarva Peninsula (14 coastal counties in Delaware, Maryland and Virginia, USA), along with the associated uncertainty. The RF models were constructed using 100 trees and 27,437 reference data points, resulting in two sets of ten models: one for salt patches and another for marshes. Validation metrics for sub-pixel fractional abundances revealed a moderate R-squared value of 0.50 for the salt model ensemble and a high R-squared value of 0.90 for the marsh model ensemble. These models predicted a total area of 16.34 sq. km. for salt patches and 1,256.71 sq. km. for marshes. In these datasets, we only report fractional abundance values ranging from 0.4 to 1 for salt patches and 0.25 to 1 for marshes, along with the standard deviation associated with each value.
--------------------------------------------
This collection of gridded data layers provides the fractional abundance of salt patches and marshes for the year 2022 for 14 counties in the Delmarva Peninsula in the United States of America (USA). The collection comprises four files, each a single-band raster:
- Fractional abundance mean: Salt patch – Mean of per-pixel fractional abundance from an ensemble of 10 RF models. Only pixels with salt patch fraction ≥ 0.40 were retained in this layer.
- Standard deviation of fractional abundance means: Salt patch – Standard deviation of per-pixel fractional abundance means derived from an ensemble of 10 RF models.
- Fractional abundance mean: Marsh – Mean of per-pixel fractional abundance from an ensemble of 10 RF models. Only pixels with marsh fraction ≥ 0.25 were retained in this layer.
- Standard deviation of fractional abundance means: Marsh – Standard deviation of per-pixel fractional abundance means derived from an ensemble of 10 RF models.
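The relationship between the ten model outputs and the four layers can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function name `summarize_ensemble`, the use of NaN as the no-data value, the population standard deviation (`ddof=0`), and masking the standard deviation layer with the same threshold are all assumptions made for illustration.

```python
import numpy as np

def summarize_ensemble(fractions, min_fraction):
    """Collapse per-model fractions, shaped (n_models, rows, cols), into mean and std layers."""
    fractions = np.asarray(fractions, dtype=float)
    mean = fractions.mean(axis=0)
    std = fractions.std(axis=0)  # population std (ddof=0); the record does not say which form was used
    keep = mean >= min_fraction  # retain only pixels at or above the reporting threshold
    # NaN as the no-data value is an assumption of this sketch
    return np.where(keep, mean, np.nan), np.where(keep, std, np.nan)

# Toy ensemble: 10 models over a 1 x 3 pixel grid
base = np.array([[0.10, 0.45, 0.90]])
offsets = np.linspace(-0.05, 0.05, 10)
stack = np.stack([base + d for d in offsets])  # shape (10, 1, 3)

salt_mean, salt_std = summarize_ensemble(stack, min_fraction=0.40)
marsh_mean, marsh_std = summarize_ensemble(stack, min_fraction=0.25)
print(salt_mean, salt_std)
```

In this toy grid the first pixel falls below both thresholds and is dropped, while the other two are kept with their ensemble spread.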
Input Data:
This approach integrated Sentinel-2 Level 2 A surface reflectance imagery (June, July, and August - 2022), a global land use/land cover dataset from ESRI (Karra et al., 2021), a NAIP-derived Delmarva land cover dataset (Mondal et al., 2022), high-resolution PlanetScope true color images (Planet Team, 2017), very high-resolution Unoccupied Aerial Vehicle (UAV) imagery, and ground truth data.
We derived several spectral indices (see table below) from the Sentinel-2 Level 2 A bands and used them as inputs to a Random Forest (RF) classifier in Python.
Method:
The research used Sentinel-2 Level 2 A surface reflectance imagery for spectral unmixing. This multispectral dataset, corrected for atmospheric and radiometric effects, encompasses 13 spectral bands from visible to shortwave-infrared wavelengths (0.443–2.190 micrometers). The imagery offers spatial resolutions ranging from 10 m to 60 m and is captured every 5 days. To aid in selecting reference points for model training and testing, high-resolution (60 cm) UAV images of specific farmlands in Dorchester and Somerset counties, Maryland, were acquired under optimal weather conditions.
The study incorporated multiple datasets to refine the analysis. The Sentinel-2 derived global land use/land cover dataset from ESRI was employed to isolate relevant land cover classes such as 'Crops' and 'Rangeland'. A NAIP-derived Delmarva land cover dataset with eight classes helped exclude non-agricultural land cover types. High-resolution PlanetScope true color images with 3 m spatial resolution were used as reference data for model validation.
A composite image was generated from Sentinel-2 Level 2 A images using a maximum Normalized Difference Vegetation Index (NDVI) filter. This composite was created from Sentinel-2 images captured between June 1 and August 30, 2022, retaining pixels with the highest NDVI values. This approach effectively highlighted areas of reduced crop cover due to high salinity levels, even during peak growing season. Cloud masking was performed using Sentinel-2 cloud probability imagery, applying a 20% threshold for maximum cloud probability. The pre-processing of Sentinel-2 imagery was conducted on Google Earth Engine (GEE), a cloud-based geospatial data processing platform.
NDVI = (Near infrared – Red) / (Near infrared + Red)
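The compositing step can be sketched in plain NumPy. The original pre-processing was done on Google Earth Engine, so this is only an illustration of the logic, not the GEE workflow. The function name `max_ndvi_composite`, the array layout `(n_dates, rows, cols)`, and treating a cloud probability strictly above 20 as cloudy are assumptions.

```python
import numpy as np

def max_ndvi_composite(red, nir, cloud_prob, max_cloud_prob=20.0):
    """Pick, per pixel, the date with the highest cloud-free NDVI.

    red, nir, cloud_prob: arrays shaped (n_dates, rows, cols).
    Returns the index of the chosen date and its NDVI for each pixel.
    """
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    cloud_prob = np.asarray(cloud_prob, dtype=float)

    denom = nir + red
    safe_denom = np.where(denom != 0, denom, 1.0)  # avoid division by zero
    ndvi = np.where(denom != 0, (nir - red) / safe_denom, np.nan)

    # Mask observations whose cloud probability exceeds the threshold (assumed strict >)
    ndvi = np.where(cloud_prob > max_cloud_prob, np.nan, ndvi)

    # Masked observations never win; pixels masked on every date stay NaN
    best_date = np.argmax(np.where(np.isnan(ndvi), -np.inf, ndvi), axis=0)
    rows, cols = np.indices(best_date.shape)
    return best_date, ndvi[best_date, rows, cols]

# Two dates over a 1 x 2 pixel grid; the second date is cloudy over the first pixel
red = [[[0.10, 0.20]], [[0.05, 0.20]]]
nir = [[[0.50, 0.30]], [[0.60, 0.40]]]
cloud = [[[5, 10]], [[50, 10]]]

best_date, composite_ndvi = max_ndvi_composite(red, nir, cloud)
print(best_date, composite_ndvi)
```

The `best_date` index could then be used to pull the other bands from the same acquisition, so that every band in a composite pixel comes from one date.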
The NDVI maximum composite incorporated seven original Sentinel-2 bands (R, G, B, Red-Edge 1 & 2, NIR, SWIR) and five additional indices. These indices included the Enhanced Vegetation Index (EVI), Moisture Stress Index (MSI), and Modified Soil Adjusted Vegetation Index (MSAVI). Furthermore, two new indices were developed for this study: the Normalized Difference Salt Patch Index (NDSPI) and Modified Salt Patch Index (MSPI). These novel indices were designed to enhance the spectral separability between salt patches and bare soil, maximizing the difference in values between these two land cover types.
Spectral Index | Equation
EVI: Enhanced Vegetation Index | 2.5 × (NIR - RED) / (NIR + 6 × RED - 7.5 × BLUE + 1)
MSAVI: Modified Soil-Adjusted Vegetation Index | (2 × NIR + 1 - √((2 × NIR + 1)^2 - 8 × (NIR - RED))) / 2
MSI: Moisture Stress Index | SWIR / NIR
NDSPI: Normalized Difference Salt Patch Index | (SWIR - B) / (SWIR + B)
MSPI: Modified Salt Patch Index | (R + G + B + NIR - SWIR) / (R + G + B + NIR + SWIR)
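The equations in the table translate directly into code. The sketch below evaluates them for a single pixel; the function name `spectral_indices` and the sample reflectance values are hypothetical, and the inputs are assumed to be surface reflectance already scaled to the 0-1 range. Which Sentinel-2 SWIR band was used is not stated here, so it is left as a plain parameter.

```python
import math

def spectral_indices(blue, green, red, nir, swir):
    """Evaluate the five indices from the table for one pixel (reflectance in 0-1)."""
    evi = 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)
    msavi = (2 * nir + 1 - math.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2
    msi = swir / nir
    ndspi = (swir - blue) / (swir + blue)
    visible_nir = red + green + blue + nir
    mspi = (visible_nir - swir) / (visible_nir + swir)
    return {"EVI": evi, "MSAVI": msavi, "MSI": msi, "NDSPI": ndspi, "MSPI": mspi}

# Hypothetical vegetated pixel
indices = spectral_indices(blue=0.05, green=0.08, red=0.06, nir=0.40, swir=0.20)
for name, value in indices.items():
    print(f"{name}: {value:.4f}")
```

Because the function uses only arithmetic and a square root, the same expressions carry over to whole-image arrays by swapping `math.sqrt` for an array-aware equivalent.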
For the training process, we identified five common endmembers: salt patch, bare soil, crop, water, and marsh, which were present in and around the selected farmlands. Reference points for bare soil were defined as pixels of soil in farmlands that did not contain salt patches or crops. For salt, reference points were identified as pixels representing salt patches with little to no vegetation. These reference points were gathered using Sentinel-2 imagery, primarily captured on June 29, 2022, and were supplemented by additional UAV imagery from various dates. Farm locations were chosen based on the visibility of significant salt patches, with the imagery dates being as close as possible to the UAV flight dates. Additional ground truth data for land cover was collected during the summer of 2022 to enhance the remotely gathered points. In total, 27,437 reference points were collected for model training and testing: 239 for salt, 1,096 for bare soil, 5,198 for crops, 20,131 for water, and 773 for marsh. Out of these reference points, 142 (69 for salt, 23 for bare soil, and 50 for crops) were collected during field visits; the remainder was obtained digitally with visual support from PlanetLabs data.
In this study, we applied a Random Forest (RF) classifier for nonlinear spectral unmixing. The RF classifier functions by utilizing an ensemble of decision trees that are independently trained on random subsets of training data through bootstrap aggregation. The final classification is determined by aggregating votes from all trees, with the endmember receiving the highest total votes being selected as the final output. To access soft voting information from the RF classifier, we used its probability prediction function called ‘predict_proba’. This function enables each decision tree to produce a probability distribution for each endmember instead of making a single class decision. The probability distribution from a decision tree indicates how likely it is that an input pixel belongs to each endmember. The final predicted probabilities are calculated by averaging these distributions across all decision trees for each of the five endmembers. As a result, each pixel in the final output is represented by five probability values that indicate the fractional abundance of each corresponding endmember within that pixel. These probabilities sum to one, effectively illustrating the spectral unmixing of a mixed pixel. A pixel value of 0 signifies the absence of a specific endmember, while a value of 1 indicates a pure pixel. Values between 0 and 1 reflect varying levels of mixed endmembers.
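The soft-voting idea can be tried on synthetic data with scikit-learn, whose `RandomForestClassifier.predict_proba` returns the mean of the per-tree class probabilities. The sketch below is not the published workflow: the training spectra are random numbers, and the 12-feature layout (mirroring seven bands plus five indices) and all variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

ENDMEMBERS = ["salt patch", "bare soil", "crop", "water", "marsh"]

# Synthetic "pure pixel" training data: one cluster of spectra per endmember
rng = np.random.default_rng(42)
n_per_class, n_features = 40, 12
centers = rng.uniform(0, 1, size=(len(ENDMEMBERS), n_features))
X = np.vstack([c + rng.normal(0, 0.05, size=(n_per_class, n_features)) for c in centers])
y = np.repeat(np.arange(len(ENDMEMBERS)), n_per_class)

# 100 trees, as reported for the RF models in this record
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# A synthetic mixed pixel halfway between the salt patch and bare soil clusters
mixed_pixel = 0.5 * centers[0] + 0.5 * centers[1]

# Columns of predict_proba follow model.classes_, here 0..4 in ENDMEMBERS order
fractions = model.predict_proba(mixed_pixel.reshape(1, -1))[0]
for name, fraction in zip(ENDMEMBERS, fractions):
    print(f"{name}: {fraction:.2f}")
```

The five values are non-negative and sum to one, which is what allows them to be read as fractional abundances.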
The RF model used for salt patch unmixing included a total of 4,302 reference points: 239 for salt, 1,195 for crops, and 956 points each for bare soil, water, and marshes. The RF model for marsh unmixing utilized a total of 27,437 reference points: 239 for salt patches, 5,198 for crops, 1,096 for bare soil, 20,131 for water, and 773 for marshes. For both models, the input data was divided into 80% for training purposes and 20% for testing.
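The counts and the 80/20 division can be checked with a short standard-library sketch. Whether the original split was stratified by class, and which random seed was used, is not stated here, so the simple shuffled split, the seed, and the helper name `split_indices` are assumptions.

```python
import random

# Reference point counts for the salt patch model, as listed above
SALT_MODEL_COUNTS = {"salt": 239, "crop": 1195, "bare soil": 956, "water": 956, "marsh": 956}

def split_indices(n_points, test_fraction=0.2, seed=0):
    """Shuffle point indices and split them into training and testing sets."""
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)  # seeded for repeatability
    n_test = round(n_points * test_fraction)
    return indices[n_test:], indices[:n_test]

total_points = sum(SALT_MODEL_COUNTS.values())
train_idx, test_idx = split_indices(total_points)
print(total_points, len(train_idx), len(test_idx))
```

With 4,302 points this gives roughly 3,442 training and 860 testing points; a stratified split would instead preserve the class proportions in both sets.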
Accuracy assessment:
Visual validation of the salt patch model's predictions shows low Mean Squared Error (MSE) and Mean Absolute Error (MAE) values of 0.035 and 0.059, respectively (see table below). However, the model does not explain all the variability in the data, as evidenced by the moderate R-squared value of 0.50.
Parameter | Salt (227 points) | Marsh (761 points)
MSE | 0.035 | 0.003
MAE | 0.059 | 0.013
RMSE | 0.186 | 0.054
R-square | 0.50 | 0.90
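For readers who want to reproduce metrics of this kind on their own validation points, the sketch below computes all four from paired observed and predicted fractions. The sample values are made up, the function name `regression_metrics` is hypothetical, and R-squared is computed here as the coefficient of determination, which is an assumption because this record does not give its exact definition.

```python
import math

def regression_metrics(observed, predicted):
    """Return MSE, MAE, RMSE, and R-squared for paired fractional abundances."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)
    mean_observed = sum(observed) / n
    total_ss = sum((o - mean_observed) ** 2 for o in observed)
    r_squared = 1 - (mse * n) / total_ss  # coefficient of determination (assumed definition)
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "R2": r_squared}

# Made-up validation pairs, for illustration only
observed = [0.0, 0.5, 1.0, 0.8]
predicted = [0.1, 0.4, 0.9, 0.8]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Since RMSE is the square root of MSE, the tabulated values can be cross-checked: the square roots of 0.035 and 0.003 are about 0.187 and 0.055, which agrees with the reported 0.186 and 0.054 once rounding of the MSE values is taken into account.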
The relatively low R-squared value of 0.50 for the salt unmixing validation can be attributed to several potential factors, including inherent noise in the input data. Note that only 227 randomly selected points were used for visual validation of the salt unmixed layer. Several factors may contribute to salt patch misclassification: (1) spectral similarities between white built structures (such as crop tunnels or building roofs) and salt patches, (2) the model's possible overreliance on spectral information without considering spatial context, and (3) potentially unrepresentative endmember spectra for salt used in model training. Even a small number of outliers, such as a misclassified salt patch pixel, can have a significant impact on overall model performance. However, the low values of Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Square Error (RMSE) suggest that the models generally performed well. The models were trained using pure pixels and therefore lack information on mixed pixel characteristics. All performance metrics for marsh unmixing validation showed good model performance.
More details on the methods can be found in Sarupria et al., 2025.
Data format:
The spatial resolution of all the derived datasets is 10 m. These georeferenced datasets are distributed in GeoTIFF format and are compatible with GIS and/or image processing software, such as R and ArcGIS. The GIS-ready raster files can be used directly in mapping and geospatial analysis.
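Because each pixel covers 10 m × 10 m, area summaries follow from simple arithmetic. The sketch below shows two possible conventions: weighting each pixel by its fractional abundance, or counting every retained pixel in full. This record does not state which convention produced the reported totals, so both are shown as assumptions; the function name `area_sq_km` and the use of NaN as the no-data value are also hypothetical.

```python
import math

def area_sq_km(fractions, pixel_size_m=10.0, weight_by_fraction=True):
    """Convert per-pixel fractional abundance values into an area in square kilometers."""
    pixel_area_m2 = pixel_size_m ** 2
    valid = [f for f in fractions if not math.isnan(f)]  # skip no-data pixels (NaN assumed)
    if weight_by_fraction:
        covered_m2 = sum(valid) * pixel_area_m2  # each pixel contributes its fraction
    else:
        covered_m2 = len(valid) * pixel_area_m2  # each retained pixel counts in full
    return covered_m2 / 1_000_000

# Four made-up pixel values, one of them no-data
pixels = [0.5, 1.0, float("nan"), 0.4]
weighted = area_sq_km(pixels)
counted = area_sq_km(pixels, weight_by_fraction=False)
print(weighted, counted)
```

The two conventions can differ substantially for mixed pixels, so it is worth stating which one is used when reporting totals derived from these layers.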
Code:
Sample Python code for performing spectral unmixing is available at: https://github.com/Manan-prog/Non-linear-Spectral-Unmixing. To run this code successfully, the user must provide training data for the desired land cover classes and an input raster image for spectral unmixing.
Datasets for download:
All the data layers cover the entire Delmarva Peninsula and have a spatial resolution of 10 m.
- Two layers for salt patch data:
  - Fractional abundance mean: SaltPatch_FractionalAbundance_Mean
  - Standard deviation of fractional abundance means: SaltPatch_FrAb_StandardDev
- Two layers for marsh data:
  - Fractional abundance mean: Marsh_FractionalAbundance_Mean
  - Standard deviation of fractional abundance means: Marsh_FrAb_StandardDev
Files
Marsh_FrAb_StandardDev.tif
Additional details
Related works
- Is supplement to: Dataset, https://doi.org/10.5281/zenodo.6685695 (Other)
Software
- Repository URL: https://github.com/Manan-prog/Non-linear-Spectral-Unmixing
- Programming language: Python
References
- Karra, K., Kontgis, C., Statman-Weil, Z., Mazzariello, J.C., Mathis, M., Brumby, S.P., 2021. Global land use / land cover with Sentinel 2 and deep learning, in: 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS. Presented at the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, pp. 4704–4707. https://doi.org/10.1109/IGARSS47720.2021.9553499
- Mondal, P., Walter, M., Miller, J., Epanchin-Niell, R., Yawatkar, V., Nguyen, E., Gedan, K., Tully, K., 2022. High-resolution remotely sensed datasets for saltwater intrusion across the Delmarva Peninsula. https://doi.org/10.5281/zenodo.6685695
- Planet Team, 2017. Planet Application Program Interface: In Space for Life on Earth. San Francisco, CA. https://api.planet.com.
- Sarupria, M., Vargas, R., Walter, M., Miller, J., Mondal, P., 2025. Non-linear spectral unmixing for monitoring rapidly salinizing coastal landscapes. Remote Sensing of Environment 319, 114642. https://doi.org/10.1016/j.rse.2025.114642