Published September 5, 2025 | Version v2
Dataset Restricted

HDBSCAN Clusters Maize Crop Stress - Adige River-Fed Downstream Irrigated Plain, 2022-2023

  • 1. CMCC Foundation - Euro-Mediterranean Center on Climate Change
  • 2. National Biodiversity Future Center
  • 3. Ca' Foscari University of Venice

Description

Demonstration Case Name

Multi-Hazards in the Downstream Area of the Adige River Basin.

Dataset Name/Title

HDBSCAN Clusters Maize Crop Stress - Adige River-Fed Downstream Irrigated Plain, 2022-2023

Dataset Description

The dataset contains HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) clusters computed from a synthetic stress indicator, obtained by applying PCA (Principal Component Analysis) to two Sentinel-2-based indices: NDVI (Normalized Difference Vegetation Index) and NDMI (Normalized Difference Moisture Index). Vegetation stress clusters are identified on the following dates: 02/07/2022, 17/07/2022, 22/07/2022, 01/08/2022, 06/08/2022, 11/08/2022, 16/08/2022, 07/07/2023, 17/07/2023, 27/07/2023, 06/08/2023, 11/08/2023 and 16/08/2023. These cover part of the 2022 and 2023 maize cropping seasons (July and August) in the selected area. All observations are located in a plain that relies on the Adige River for cropland irrigation. Each row contains the following columns:

  • date: date of the clustering, in dd/mm/YYYY format

  • cluster: cluster number, -1 identifies the noise (unclassified) cluster

  • mean_ndvi: mean NDVI across all maize fields in the cluster

  • mean_ndmi: mean NDMI across all maize fields in the cluster

  • prevalent_hsg: most frequent Hydrologic Soil Group (HSG) in the cluster

  • SPEI90: Standardized Precipitation Evapotranspiration Index (SPEI) calculated over a 90-day time frame (see Key Methodologies for details)

  • SPEI180: Standardized Precipitation Evapotranspiration Index (SPEI) calculated over a 180-day time frame (see Key Methodologies for details)

  • SPEI365: Standardized Precipitation Evapotranspiration Index (SPEI) calculated over a 365-day time frame (see Key Methodologies for details)

  • temp_anom_X: temperature anomaly (°C) X days before the considered date, with X = 1 to 6 (see Key Methodologies for details)

  • SWI005_X: Soil Water Index (%) computed with T-value = 5 (the characteristic water infiltration time of the model), X days before the considered date, with X = 1 to 3

  • irr_channel_distance_m: average distance (m) of maize fields in the cluster from the closest irrigation channel

  • geometry: polygon geometry of the cluster in EPSG:32632
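As a sketch of how the columns above might be read, the following Python snippet parses the CSV with the standard library, converts the dd/mm/YYYY dates and optionally drops HDBSCAN noise rows (cluster = -1). It assumes the released file has a header row using exactly the column names listed in this record; the function name `load_clusters` is hypothetical, and the geometry column is kept as raw text because its encoding is not documented here.

```python
import csv
import io
from datetime import datetime

# Assumption: the CSV header uses the column names listed in this record.
TEMP_LAGS = range(1, 7)  # temp_anom_1 ... temp_anom_6
SWI_LAGS = range(1, 4)   # SWI005_1 ... SWI005_3


def load_clusters(csv_text, drop_noise=True):
    """Parse cluster rows; optionally skip HDBSCAN noise (cluster == -1)."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        cluster = int(row["cluster"])
        if drop_noise and cluster == -1:
            continue
        rows.append({
            "date": datetime.strptime(row["date"], "%d/%m/%Y").date(),  # dd/mm/YYYY
            "cluster": cluster,
            "mean_ndvi": float(row["mean_ndvi"]),
            "mean_ndmi": float(row["mean_ndmi"]),
            "prevalent_hsg": row["prevalent_hsg"],
            "spei": {w: float(row[f"SPEI{w}"]) for w in (90, 180, 365)},
            "temp_anom": [float(row[f"temp_anom_{x}"]) for x in TEMP_LAGS],
            "swi005": [float(row[f"SWI005_{x}"]) for x in SWI_LAGS],
            "irr_channel_distance_m": float(row["irr_channel_distance_m"]),
            "geometry": row["geometry"],  # raw text in EPSG:32632; encoding not documented
        })
    return rows
```

For a local copy, the file text could be read with `open(path, newline="").read()` and passed to `load_clusters`.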

Key Methodologies

The SPEI was computed using 1981-2023 as the reference period. Daily potential evapotranspiration was estimated with the Hargreaves equation (Hargreaves, 1994), with extraterrestrial radiation evaluated from latitude and day of year. The resulting water balance time series was standardized using a gamma distribution.
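The water-balance part of the SPEI computation can be sketched as follows, assuming the widely used Hargreaves-Samani form (PET = 0.0023 · Ra · (Tmean + 17.8) · sqrt(Tmax - Tmin), as given in FAO-56) and the FAO-56 formulas for extraterrestrial radiation Ra from latitude and day of year. Function names are hypothetical. The gamma fitting over the 1981-2023 reference period, which turns the accumulated balance into SPEI values, is not shown.

```python
import math

GSC = 0.0820  # solar constant, MJ m-2 min-1


def extraterrestrial_radiation(lat_deg, doy):
    """Daily extraterrestrial radiation Ra (MJ m-2 day-1) from latitude and day of year."""
    phi = math.radians(lat_deg)
    dr = 1 + 0.033 * math.cos(2 * math.pi * doy / 365)        # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(2 * math.pi * doy / 365 - 1.39)  # solar declination (rad)
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    ws = math.acos(x)                                          # sunset hour angle (rad)
    return (24 * 60 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )


def hargreaves_pet(tmin, tmax, lat_deg, doy):
    """Hargreaves potential evapotranspiration (mm/day); 0.408 converts MJ m-2 to mm of water."""
    tmean = (tmin + tmax) / 2
    ra_mm = 0.408 * extraterrestrial_radiation(lat_deg, doy)
    return 0.0023 * ra_mm * (tmean + 17.8) * math.sqrt(max(tmax - tmin, 0.0))


def accumulated_balance(precip, pet, window):
    """Running sum of daily P - PET over `window` days (e.g. 90, 180, 365); None until the window fills."""
    balance = [p - e for p, e in zip(precip, pet)]
    out, running = [], 0.0
    for i, d in enumerate(balance):
        running += d
        if i >= window:
            running -= balance[i - window]
        out.append(running if i >= window - 1 else None)
    return out
```

As a sanity check, `extraterrestrial_radiation(-20, 246)` comes out close to the 32.2 MJ m-2 day-1 of the FAO-56 worked example for 3 September at 20°S.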

Temperature anomalies were defined for each calendar day as the difference between the mean daily temperature and the 50th percentile (median) of the mean daily temperature for that calendar day over 1991–2020. Percentiles were computed using a centred 15-day moving window to smooth short-term variability.
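A minimal sketch of this definition, assuming daily mean temperatures are held in a dict keyed by date: the baseline for a calendar day is the median of all 1991-2020 values falling within ±7 days of it (a centred 15-day window), and temp_anom_X is the anomaly X days before the considered date. The day-of-year treatment of leap years is our own simplification, and the function names are hypothetical.

```python
from datetime import date, timedelta
from statistics import median


def doy_distance(a, b, period=365):
    """Circular distance between day-of-year values (simplification: day 366 wraps onto day 1)."""
    d = abs(a - b) % period
    return min(d, period - d)


def calendar_day_median(series: dict[date, float], day: date,
                        start_year=1991, end_year=2020, half_window=7) -> float:
    """Median daily mean temperature pooled over the baseline years and a centred 15-day window."""
    target = day.timetuple().tm_yday
    pooled = [
        t for d, t in series.items()
        if start_year <= d.year <= end_year
        and doy_distance(d.timetuple().tm_yday, target) <= half_window
    ]
    return median(pooled)


def temperature_anomaly(series: dict[date, float], considered: date, lag_days: int) -> float:
    """Anomaly (deg C) `lag_days` before `considered`, analogous to the temp_anom_X columns."""
    day = considered - timedelta(days=lag_days)
    return series[day] - calendar_day_median(series, day)
```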

Field-level NDVI and NDMI values for maize were calculated by averaging pixel-level NDVI/NDMI derived from Sentinel-2 L2A observations within the irrigated districts fed by the Adige River (data courtesy of ANBI Veneto). The observations result from a multi-step data cleaning process. First, images were cloud-masked using the Sentinel-2 Scene Classification Layer (SCL). Crop field observations were then excluded if more than 50% of their pixels were unavailable (e.g., due to cloud cover), and the remaining observations were filtered using a Bare Soil Index (BSI) threshold of 0.08 (Mzid et al., 2021) to exclude non-vegetated (bare soil) pixels. Finally, fields hosting alternating crops within the same year were disentangled by analyzing temporal BSI profiles to detect two green-up periods separated by at least one observation identified as bare soil (BSI > 0.08).
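The cleaning chain can be sketched per field and date as below. The index formulas are the commonly used Sentinel-2 definitions (NDVI from B8/B4, NDMI from B8/B11, BSI from B2, B4, B8 and B11). The set of SCL classes treated as unavailable, applying the BSI threshold per pixel, and the run-counting shortcut for alternating crops are assumptions, not the authors' documented implementation. All names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

# Assumed SCL classes treated as unavailable: no data (0), saturated/defective (1),
# cloud shadow (3), cloud medium/high probability (8, 9), thin cirrus (10).
MASKED_SCL = {0, 1, 3, 8, 9, 10}


@dataclass
class Pixel:
    blue: float   # B2 surface reflectance
    red: float    # B4
    nir: float    # B8
    swir1: float  # B11
    scl: int      # Scene Classification Layer class


def ndvi(p):
    return (p.nir - p.red) / (p.nir + p.red)


def ndmi(p):
    return (p.nir - p.swir1) / (p.nir + p.swir1)


def bsi(p):
    return ((p.swir1 + p.red) - (p.nir + p.blue)) / ((p.swir1 + p.red) + (p.nir + p.blue))


def field_indices(pixels, max_missing=0.5, bsi_threshold=0.08):
    """Return (mean NDVI, mean NDMI) for one field on one date, or None if the observation is discarded."""
    valid = [p for p in pixels if p.scl not in MASKED_SCL]
    if not pixels or (len(pixels) - len(valid)) / len(pixels) > max_missing:
        return None  # more than 50% of pixels unavailable (e.g., cloud cover)
    vegetated = [p for p in valid if bsi(p) <= bsi_threshold]  # drop bare-soil pixels
    if not vegetated:
        return None
    return mean(map(ndvi, vegetated)), mean(map(ndmi, vegetated))


def count_vegetated_runs(field_bsi_series, bsi_threshold=0.08):
    """Count vegetated stretches separated by at least one bare-soil date (BSI > threshold)."""
    runs, in_run = 0, False
    for value in field_bsi_series:
        if value > bsi_threshold:
            in_run = False
        elif not in_run:
            runs, in_run = runs + 1, True
    return runs
```

In this simplified reading, a field whose seasonal BSI series yields two or more vegetated runs would be a candidate for same-year alternating crops.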

A synthetic stress indicator was defined as the first principal component (PC1) of a one-component PCA applied to NDVI and NDMI values. The HDBSCAN algorithm was then applied to this indicator to identify clusters, after hyperparameter optimization via grid search.
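The PCA step reduces each (NDVI, NDMI) pair to a single score. For two variables, PC1 has a closed form from the 2x2 covariance matrix, sketched below with the standard library only. Two choices here are assumptions, since the source states neither: both indices are standardized first, and the component is oriented so the NDVI loading is positive. The scores would then be clustered with an HDBSCAN implementation (for example scikit-learn's `sklearn.cluster.HDBSCAN`, which labels noise as -1), with parameters such as `min_cluster_size` and `min_samples` tuned by grid search. That step is not shown.

```python
import math
from statistics import fmean, pstdev


def zscores(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]


def pc1_scores(ndvi, ndmi):
    """Return (PC1 scores, loadings, explained variance ratio) for paired NDVI/NDMI values."""
    x, y = zscores(ndvi), zscores(ndmi)
    n = len(x)
    # Entries of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum(a * a for a in x) / n
    syy = sum(b * b for b in y) / n
    sxy = sum(a * b for a, b in zip(x, y)) / n
    # Largest eigenvalue of a symmetric 2x2 matrix
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    if abs(sxy) > 1e-12:
        vx, vy = sxy, lam - sxx  # eigenvector belonging to lam
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    wx, wy = vx / norm, vy / norm
    if wx < 0:  # PC sign is arbitrary; make the NDVI loading positive
        wx, wy = -wx, -wy
    scores = [wx * a + wy * b for a, b in zip(x, y)]
    return scores, (wx, wy), lam / (sxx + syy)
```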

 

Temporal Domain

2022-2023

Spatial Domain

The dataset is provided over the [10.7, 45.0, 12.3, 45.6] spatial domain (min longitude, min latitude, max longitude, max latitude in WGS84, EPSG:4326).

Key Variables/Indicators

date, cluster, mean_ndvi, mean_ndmi, n_fields, prevalent_hsg, SPEI90, SPEI180, SPEI365, temp_anom_1, temp_anom_2, temp_anom_3, temp_anom_4, temp_anom_5, temp_anom_6, SWI005_1, SWI005_2, SWI005_3, irr_channel_distance_m, geometry

See Dataset Description section for details on the variables.

Data Format

csv

Source Data

  • NDVI, NDMI and BSI were obtained from ESA Copernicus Sentinel-2 L2A

  • Field-level information on farmer-declared crops was obtained from the Veneto Agency for Payments in Agriculture (AVEPA, Agenzia Veneta per i Pagamenti, https://www.avepa.it/web/avepa)

  • Hydrologic Soil Group data at 1:250,000 scale were obtained from the Regional Agency for Environmental Prevention and Protection of Veneto (ARPAV, Agenzia Regionale per la Prevenzione e Protezione Ambientale del Veneto, https://www.arpa.veneto.it/) and follow the classification scheme outlined in the USDA National Engineering Handbook (USDA-NRCS, 2009)

  • Irrigated districts were provided courtesy of ANBI Veneto (Associazione Nazionale Bonifiche Irrigazioni, National Association of Land Reclamation and Irrigation)

  • Temperature and precipitation data for the SPEI and the temperature anomalies were obtained from the SCIA-ISPRA dataset (ISPRA, https://scia.isprambiente.it/dati-e-indicatori/)

Accessibility

https://doi.org/10.5281/zenodo.15301314

Stakeholder Relevance

The dataset provides post-disaster information on maize vegetation dynamics during hot and dry events, reporting impacted maize areas (clusters) on multiple dates during the 2022 and 2023 cropping seasons. Including two contrasting years, one with severe drought (2022) and one without drought (2023), shows how the considered cropland area was affected under different abiotic stress conditions. Meteorological (SPEI, temperature anomaly), soil and territorial layers are also included, enabling a more detailed analysis of the drivers and root causes of the observed impacts. Moreover, the dataset offers a spatially and temporally explicit representation of maize stress clusters, highlighting the areas most affected during the 2022 maize cropping season and thus the most vulnerable areas. This approach is a promising tool to support adaptation and management strategies, particularly for water management (e.g., irrigation) and crop suitability under different meteorological and physical conditions.

Limitations/Assumptions

Where a field was associated with more than one crop in the same year, a disaggregation technique based on assumptions about crop growth phases was applied. In addition, NDVI and NDMI are not influenced exclusively by plant responses to drought, extreme heat and their combination: other factors (e.g., pests, soil or crop management) can also affect plant vigour. However, the spatial extent and intensity of the 2022 drought event, the comparison with 2023 and the large number of fields considered are expected to limit these uncertainties.

Additional Outputs/Information

Access to the dataset is currently restricted pending a related publication.

Contact Information

Furlanetto, Jacopo (CMCC Foundation - Euro-Mediterranean Center on Climate Change, National Biodiversity Future Center) - Data curator

Albergo, Edoardo (CMCC Foundation - Euro-Mediterranean Center on Climate Change, National Biodiversity Future Center) - Data curator

Masina, Marinella (CMCC Foundation - Euro-Mediterranean Center on Climate Change) - Data curator

Maraschini, Margherita (CMCC Foundation - Euro-Mediterranean Center on Climate Change) - Data curator

Ferrario, Davide Mauro (CMCC Foundation - Euro-Mediterranean Center on Climate Change) - Data curator

Torresan, Silvia (CMCC Foundation - Euro-Mediterranean Center on Climate Change, National Biodiversity Future Center) - Data manager

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Additional details

Funding

European Space Research Institute
ESA EO4MULTIHAZARDS (Earth Observation for High-Impact Multi-Hazards Science), funded by the European Space Agency and launched as part of the joint ESA-European Commission Earth System Science Initiative

References

  • Hargreaves, G. H. (1994). Defining and using reference evapotranspiration. Journal of Irrigation and Drainage Engineering, 120, 1132-1139.
  • Mzid, N., Pignatti, S., Huang, W., & Casa, R. (2021). An analysis of bare soil occurrence in arable croplands for remote sensing topsoil applications. Remote Sensing, 13(3), 474.
  • United States Department of Agriculture, Natural Resources Conservation Service (USDA-NRCS). (2009). National Engineering Handbook.