HyBEAR 🐻
Authors/Creators
Description
Task
The primary task is the detection of bare soil areas in Earth observation data. This is an important step in Precision Agriculture (PA) applications related to quantifying soil parameters and quality. Accurately identifying bare soil allows researchers to isolate the spectral response originating directly from the soil surface, which enhances the reliability of subsequent analyses aimed at estimating crucial soil properties, such as moisture content, nutrient levels, organic matter, and texture. Bare soil identification is also essential for monitoring agricultural practices like tillage and assessing soil erosion risks.
While bare soil detection is commonly addressed at the pixel level (classifying each pixel as soil or background), HyBEAR 🐻 aims to support the development of methods that identify entire vegetation-free fields (whole agricultural parcels).
Dataset
HyBEAR 🐻 is introduced as a novel large-scale collection of high-resolution hyperspectral aerial images. It is the largest and most heterogeneous dataset for bare soil detection released to date.
- Size and Scale: The dataset contains 1,954 hyperspectral image patches, totaling 108,064,591 pixels, corresponding to 43,225 hectares. The compressed dataset has a total size of 96 [GB].
- Resolution: The Ground Sampling Distance (GSD) is 2 [m].
- Acquisition: Data was acquired by QZ Solutions in Southern Poland on March 3, 2021. The imaging system used was the HySpex VS-725 (Norsk Elektro Optikk AS), flown on a Piper PA-31 Navajo aircraft.
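- Spectral Information: 430 spectral bands are available for each pixel, covering the range 414.1–2357.4 [nm]. The data combine two sensors: SWIR-384 (288 bands, 930–2500 [nm]) and VNIR-1800 (186 bands, 400–1000 [nm]). Together the sensors natively provide 288 + 186 = 474 bands over 400–2500 [nm], so the 430 bands in the dataset cover a reduced spectral range.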
- Location and Heterogeneity: Data was collected over two areas: P1 (Lower Silesian Voivodeship, near Przeworno) and P2 (Opolskie Voivodeship, south of Głubczyce). The areas are more than 60 km apart and were imaged within an hour of each other; this spatial separation and time offset introduce variability in acquisition conditions, which adds to the dataset's heterogeneity.
- Annotations (Ground Truth - GT): GT was meticulously prepared using a combination of automated and manual interpretation methods and verified by domain experts. Manual labeling leveraged RGB, NDVI, and especially CIR (Color Infrared) compositions to delineate bare soil accurately. The annotations are binary, with an additional no-data value:
  - SOIL class is encoded as 1.
  - NON-SOIL class is encoded as 0.
  - Background/No Data pixels are encoded as -9999.
- Data Structure: The data consists of square patches of fixed dimensions 250x250 pixels.
- Versions: The dataset is available in two versions:
  - FULL: the complete collection of 1,954 patches;
  - MINI: a random, stratified subset of 250 images (50 from each fold).
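The label encoding, patch size, and GSD listed above translate directly into a per-pixel preprocessing step. The sketch below is a minimal illustration, assuming a patch has already been loaded as a NumPy array of shape (250, 250, 430) together with a (250, 250) GT map; the on-disk file format and loader are not described here, so the arrays are synthetic, and `to_pixel_samples` and `pixels_to_hectares` are hypothetical helpers. It drops the -9999 pixels, yields one 430-element feature vector per labelled pixel, and converts pixel counts to hectares (each 2 m × 2 m pixel covers 4 m²).

```python
import numpy as np

# Ground-truth encoding documented for HyBEAR.
SOIL = 1
NON_SOIL = 0
NO_DATA = -9999

GSD_M = 2.0  # ground sampling distance in metres


def pixels_to_hectares(n_pixels: int, gsd_m: float = GSD_M) -> float:
    """Area covered by n_pixels square pixels of side gsd_m, in hectares (1 ha = 10,000 m^2)."""
    return n_pixels * gsd_m * gsd_m / 10_000.0


def to_pixel_samples(cube: np.ndarray, gt: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Flatten an (H, W, B) cube and its (H, W) GT map into per-pixel samples.

    Pixels labelled NO_DATA are dropped, so X has shape (N, B) and y has shape (N,).
    """
    if cube.shape[:2] != gt.shape:
        raise ValueError("cube and gt must have the same spatial size")
    valid = gt != NO_DATA      # boolean (H, W) mask of annotated pixels
    X = cube[valid]            # boolean indexing over the first two axes -> (N, B)
    y = gt[valid].astype(np.int64)
    return X, y


# Synthetic stand-in for one 250 x 250 patch with 430 bands (not real HyBEAR data).
cube = np.zeros((250, 250, 430), dtype=np.float32)
gt = np.full((250, 250), NON_SOIL, dtype=np.int32)
gt[:100, :] = SOIL       # top 100 rows: soil
gt[-50:, :] = NO_DATA    # bottom 50 rows: background / no data

X, y = to_pixel_samples(cube, gt)
print(X.shape, y.shape)               # (50000, 430) (50000,)
print(pixels_to_hectares(250 * 250))  # 25.0
```

The same conversion reproduces the dataset-level figure: 108,064,591 pixels × 4 m² ≈ 43,225.8 ha, consistent with the 43,225 hectares quoted above, while one full 250 × 250 patch covers 25 ha.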
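The NDVI and CIR compositions mentioned for manual labeling can be derived from the hyperspectral cube by picking the bands closest to chosen wavelengths. A minimal sketch, assuming the band centre wavelengths are available as an array (for example from the dataset metadata); the evenly spaced grid used below is only a placeholder, not the real HySpex band centres. The 670 nm (red), 800 nm (near-infrared), and 550 nm (green) targets are common choices and an assumption; the bands actually used by the annotators are not stated here, and the helper names are hypothetical.

```python
import numpy as np


def nearest_band(wavelengths_nm: np.ndarray, target_nm: float) -> int:
    """Index of the band whose centre wavelength is closest to target_nm."""
    return int(np.argmin(np.abs(wavelengths_nm - target_nm)))


def ndvi(cube: np.ndarray, wavelengths_nm: np.ndarray,
         red_nm: float = 670.0, nir_nm: float = 800.0, eps: float = 1e-12) -> np.ndarray:
    """NDVI = (NIR - red) / (NIR + red), using the bands nearest to red_nm and nir_nm."""
    red = cube[..., nearest_band(wavelengths_nm, red_nm)].astype(np.float64)
    nir = cube[..., nearest_band(wavelengths_nm, nir_nm)].astype(np.float64)
    return (nir - red) / (nir + red + eps)   # eps guards against division by zero when NIR + red is 0


def cir_composite(cube: np.ndarray, wavelengths_nm: np.ndarray,
                  nir_nm: float = 800.0, red_nm: float = 670.0,
                  green_nm: float = 550.0) -> np.ndarray:
    """Colour-infrared image: NIR -> R, red -> G, green -> B, each min-max scaled to [0, 1]."""
    idx = [nearest_band(wavelengths_nm, w) for w in (nir_nm, red_nm, green_nm)]
    rgb = cube[..., idx].astype(np.float64)            # (H, W, 3)
    lo = rgb.min(axis=(0, 1), keepdims=True)
    hi = rgb.max(axis=(0, 1), keepdims=True)
    return (rgb - lo) / np.maximum(hi - lo, 1e-12)      # constant channels map to 0


# Placeholder wavelength grid: evenly spaced, NOT the real HySpex band centres.
wavelengths = np.linspace(414.1, 2357.4, 430)
cube = np.full((2, 2, 430), 0.2)
cube[..., nearest_band(wavelengths, 670.0)] = 0.1   # synthetic red reflectance
cube[..., nearest_band(wavelengths, 800.0)] = 0.5   # synthetic NIR reflectance
print(ndvi(cube, wavelengths))  # (0.5 - 0.1) / (0.5 + 0.1) ≈ 0.667 for every pixel
```

Selecting the nearest band rather than interpolating keeps the values tied to actual measurements; averaging a few neighbouring bands would be an equally reasonable variant.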
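The MINI version is described as a random, stratified subset with 50 images per fold. One way to draw such a subset reproducibly is to sample without replacement within each fold using a seeded generator, as sketched below. The authors' actual sampling procedure and seed are not documented here, and the patch identifiers and `stratified_subset` helper are hypothetical.

```python
import random


def stratified_subset(patches_by_fold: dict[str, list[str]], per_fold: int = 50,
                      seed: int = 0) -> dict[str, list[str]]:
    """Draw `per_fold` patch IDs from every fold, without replacement and reproducibly."""
    rng = random.Random(seed)
    subset = {}
    for fold in sorted(patches_by_fold):          # fixed fold order keeps results reproducible
        ids = list(patches_by_fold[fold])
        if len(ids) < per_fold:
            raise ValueError(f"fold {fold} has only {len(ids)} patches")
        subset[fold] = sorted(rng.sample(ids, per_fold))
    return subset


# Hypothetical patch IDs: 300 per fold, F0..F4.
patches = {f"F{k}": [f"F{k}_patch_{i:04d}" for i in range(300)] for k in range(5)}
mini = stratified_subset(patches, per_fold=50, seed=42)
print(sum(len(ids) for ids in mini.values()))  # 250
```

Sampling inside each fold, rather than from the pooled set, guarantees the equal 50-per-fold split regardless of how many patches each fold contains.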
Validation Procedure and Baseline Results
HyBEAR defines a standardized validation procedure, protocols, and quality metrics to ensure reproducibility and an unbiased comparison of emerging algorithms.
- Cross-Validation: A five-fold cross-validation protocol is defined using 5 spatially-disjoint folds (F0 to F4). Fold F0 represents map P1, and F1–F4 represent map P2. This spatial splitting is designed to evaluate the algorithms' ability to generalize to new, unknown areas and verify their robustness to variable acquisition conditions.
- Evaluation Metrics: Performance is assessed using standard classification and segmentation metrics: Accuracy (ACC), Sensitivity (SEN), Specificity (SPE), F-score (F1), Intersection over Union (IoU), Matthews Correlation Coefficient (MCC), and the Area Under the ROC Curve (AUC).
- Baseline Results: Baseline results were established using classic Machine Learning (ML) models operating on 430-element feature vectors (all spectral bands of a pixel). For the FULL dataset, Logistic Regression (LR) and Support Vector Machines (SVM) achieved the highest performance, with an average accuracy (ACC) of 0.927 ± 0.016 for LR and 0.926 ± 0.016 for SVM.
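Assuming the standard k-fold arrangement, in which each of the five spatial folds is held out once for testing while the remaining four are used for training, the splits can be enumerated as below. Because F0 is the only fold from area P1, the F0 split trains exclusively on P2 data and tests on a different region. The `patches_by_fold` mapping and the helper names are hypothetical, and any further carving of a validation set out of the training folds is not covered here.

```python
from typing import Dict, Iterator, List, Tuple

FOLDS = ("F0", "F1", "F2", "F3", "F4")
AREA_OF_FOLD = {"F0": "P1", "F1": "P2", "F2": "P2", "F3": "P2", "F4": "P2"}


def cv_splits(folds: Tuple[str, ...] = FOLDS) -> Iterator[Tuple[List[str], str]]:
    """Yield (train_folds, test_fold); every fold is the test fold exactly once."""
    for test_fold in folds:
        yield [f for f in folds if f != test_fold], test_fold


def split_patches(patches_by_fold: Dict[str, List[str]],
                  test_fold: str) -> Tuple[List[str], List[str]]:
    """Return (train_ids, test_ids) for one split; patches never cross fold boundaries."""
    train = [p for f, ids in patches_by_fold.items() if f != test_fold for p in ids]
    test = list(patches_by_fold[test_fold])
    return train, test


for train_folds, test_fold in cv_splits():
    train_areas = sorted({AREA_OF_FOLD[f] for f in train_folds})
    print(f"test={test_fold} ({AREA_OF_FOLD[test_fold]})  train={train_folds} {train_areas}")
```

Splitting by whole folds, rather than shuffling individual pixels, is what keeps training and test data spatially disjoint.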
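The seven metrics listed above can all be computed from pixel labels and classifier scores. The sketch below assumes pixel-level evaluation with SOIL (1) as the positive class, skips -9999 pixels, thresholds scores at 0.5 for the confusion-matrix metrics, and computes AUC with the rank-based Mann–Whitney formula using average ranks for ties. Whether the benchmark pools pixels per fold or averages over patches is not specified here, and `binary_metrics` and `roc_auc` are hypothetical helpers, not part of the released code.

```python
import numpy as np

NO_DATA = -9999


def roc_auc(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """AUC via the Mann-Whitney U statistic; tied scores receive their average rank."""
    n_pos = int(y_true.sum())
    n_neg = y_true.size - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")                     # AUC is undefined with a single class
    _, inverse, counts = np.unique(y_score, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)                    # last 1-based rank in each tie group
    avg_rank = ends - (counts - 1) / 2.0        # mean of the ranks in each tie group
    ranks = avg_rank[inverse.ravel()]
    return float((ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg))


def binary_metrics(y_true, y_score, threshold: float = 0.5) -> dict[str, float]:
    y_true = np.asarray(y_true).ravel()
    y_score = np.asarray(y_score, dtype=np.float64).ravel()
    keep = y_true != NO_DATA                    # ignore background / no-data pixels
    t = y_true[keep] == 1                       # True for SOIL
    s = y_score[keep]
    p = s >= threshold                          # predicted SOIL

    tp = float(np.sum(p & t))
    tn = float(np.sum(~p & ~t))
    fp = float(np.sum(p & ~t))
    fn = float(np.sum(~p & t))

    def safe_div(a: float, b: float) -> float:
        return a / b if b else 0.0              # convention: 0 when the denominator is 0

    mcc_den = float(np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {
        "ACC": safe_div(tp + tn, tp + tn + fp + fn),
        "SEN": safe_div(tp, tp + fn),
        "SPE": safe_div(tn, tn + fp),
        "F1": safe_div(2 * tp, 2 * tp + fp + fn),
        "IoU": safe_div(tp, tp + fp + fn),
        "MCC": safe_div(tp * tn - fp * fn, mcc_den),
        "AUC": roc_auc(t, s),
    }


m = binary_metrics([1, 1, 0, 0, NO_DATA], [0.9, 0.4, 0.6, 0.1, 0.99])
print(m)  # ACC = 0.5, IoU ≈ 0.333, MCC = 0.0, AUC = 0.75
```

The rank-based AUC avoids building a full ROC curve and scales to millions of pixels, since it only needs one sort of the scores.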
Instructions and Availability
The HyBEAR dataset, along with code and trained baseline models, is released to ensure full reproducibility of bare soil detection research.
- Availability: HyBEAR is published on Zenodo.
- DOI: https://doi.org/10.5281/zenodo.17607897.
- Code: The accompanying package includes Python code (Jupyter Notebooks) for displaying the data and reproducing the benchmark results, together with the configuration files needed to process the dataset.
- Models: Trained Logistic Regression and Support Vector Machine models (10 files in total) are provided, following the suggested 5-fold cross-validation regime.
Citation
@article{2026HyBEAR,
  title   = {{HyBEAR}: A Large-Scale Hyperspectral Benchmark for Bare Soil Detection},
  author  = {Wijata, Agata M. and Ruszczak, Bogdan and Niepala, Adriana and Gumiela, Micha\l{} and Smykala, Krzysztof and Long\'ep\'e, Nicolas and Nalepa, Jakub},
  journal = {Earth System Science Data (ESSD)},
  year    = {2026}, % Inferred from the source file name
  volume  = {TBD},  % To Be Determined
  pages   = {TBD},
  doi     = {TBD}
}
The dataset files
- HyBEAR_MINI.zip: 250 images (50 from each of the 5 folds), plus all the metadata, Python code examples, baseline ML models, and configuration files.
- HyBEAR_F0_FULL.zip: 310 images from fold 0
- HyBEAR_F1_FULL.zip: 339 images from fold 1
- HyBEAR_F2_FULL.zip: 344 images from fold 2
- HyBEAR_F3_FULL.zip: 350 images from fold 3
- HyBEAR_F4_FULL.zip: 361 images from fold 4
Files
Total size: 94.0 GB

| MD5 checksum | Size |
|---|---|
| md5:e671f80921512e3da3a0760bb1252381 | 15.1 GB |
| md5:e8fc2db68a5793cb4fa993a01a504209 | 16.6 GB |
| md5:b0518cc66f89d5a1b0d3be5e1b21b8a3 | 17.3 GB |
| md5:2c3c8c5e82397efe52f32c96ffb305b5 | 16.2 GB |
| md5:3f5b5e9ab42ce5c67542a727906b2a71 | 16.9 GB |
| md5:d561ca0f41fd2fdc1a7d4102a3be0ac8 | 11.9 GB |
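Each archive is listed with an MD5 checksum, so a downloaded file can be verified before it is unpacked. A minimal sketch using Python's standard `hashlib`: it reads the file in chunks so that the multi-gigabyte archives never need to fit in memory, and it accepts checksums with or without Zenodo's `md5:` prefix. `md5sum` and `verify_md5` are illustrative helper names.

```python
import hashlib
import tempfile
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Hex MD5 digest of a file, read in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_md5(path: Path, expected: str) -> bool:
    """Compare a file's MD5 with an expected value such as 'md5:<32 hex digits>'."""
    return md5sum(path) == expected.strip().lower().removeprefix("md5:")


# Self-check on a small temporary file (MD5 of b"abc" is a well-known test vector).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "sample.bin"
    sample.write_bytes(b"abc")
    ok = verify_md5(sample, "md5:900150983cd24fb0d6963f7d28e17f72")
print(ok)  # True
```

To check a real archive, pass its local path together with the checksum listed for that file on the Zenodo record.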
Additional details
Additional titles
- Subtitle (English): The collection of high-resolution hyperspectral images with handcrafted annotations for bare soil detection
Related works
- Continues:
  - Conference paper: 10.1109/IGARSS53475.2024.10640702 (DOI)
  - Conference paper: 10.1109/IGARSS53475.2024.10641442 (DOI)
References
- Detection of Bare Soil in Hyperspectral Images Using Quantum-Kernel Support Vector Machines: Agata M. Wijata, Artur Miroszewski, Bertrand Le Saux, Nicolas Longépé, Bogdan Ruszczak, Jakub Nalepa. In: IGARSS 2024 - 2024 IEEE International Geoscience and Remote Sensing Symposium, 2024, Institute of Electrical and Electronics Engineers, pp. 817–822, ISBN 979-8-3503-6032-5, DOI: 10.1109/IGARSS53475.2024.10641442.
- Intuition-1: Toward In-Orbit Bare Soil Detection Using Spectral Vegetation Indices: Agata M. Wijata, Tomasz Lakota, Marcin Cwiek, Bogdan Ruszczak, Michal Gumiela, Lukasz Tulczyjew, Andrzej Bartoszek, Nicolas Longépé, Krzysztof Smykała, Jakub Nalepa. In: IGARSS 2024 - 2024 IEEE International Geoscience and Remote Sensing Symposium, 2024, Institute of Electrical and Electronics Engineers, pp. 1708–1712, ISBN 979-8-3503-6032-5, DOI: 10.1109/IGARSS53475.2024.10640702.
- Bringing bare soil detection on-board Intuition-1 through exploiting data-level digital twins: Agata Wijata, Tomasz Lakota, Marcin Cwiek, Bogdan Ruszczak, Michal Gumiela, Łukasz Tulczyjew, Nicolas Longépé, Jakub Nalepa. In: 74th International Astronautical Congress, IAC 2023, Baku, Azerbaijan, Proceedings of the International Astronautical Congress, 2023, International Astronautical Federation, pp. 1–5, paper ID: 78645.