Dataset Open Access

Sentinel-2 reference cloud masks generated by an active learning method

Louis Baetens; Olivier Hagolle

 Reference classifications generated with Active Learning for Cloud Detection (ALCD)

This data set provides reference cloud masks for 38 Sentinel-2 scenes. The masks were created with the ALCD tool, developed by Louis Baetens under the direction of Olivier Hagolle at CESBIO/CNES [1], in order to validate the cloud masks generated by the MAJA software [2].

- The `Reference_dataset` directory contains 31 scenes selected in 2017 or 2018.
- The `Hollstein` directory contains 7 scenes that were used to validate the ALCD tool by comparison with manually generated reference images kindly provided by Hollstein et al. [3].

One of these scenes is present in both directories. The "Hollstein" scenes were not used for the validation of MAJA because they were acquired before Sentinel-2 was fully operational, when the revisit frequency of observations was degraded.

# Description of the data structure
The name of each scene directory is the name of the corresponding Sentinel-2 L1C product.
Each scene directory contains three sub-directories:
- `Classification`
- `Samples`
- `Statistics`
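
The layout above can be checked with a few lines of Python. This is a minimal sketch, assuming the archive has already been extracted locally; the helper name `list_scenes` and the path passed to it are hypothetical and not part of the data set.

```python
from pathlib import Path

# Sub-directories that each scene directory is expected to contain.
EXPECTED_SUBDIRS = ("Classification", "Samples", "Statistics")


def list_scenes(collection_dir):
    """Map each scene directory name to the expected sub-directories it lacks.

    `collection_dir` is, for example, the extracted `Reference_dataset` or
    `Hollstein` directory (its location on disk is an assumption).
    """
    scenes = {}
    for scene_dir in sorted(Path(collection_dir).iterdir()):
        if not scene_dir.is_dir():
            continue  # skip stray files next to the scene directories
        missing = [name for name in EXPECTED_SUBDIRS
                   if not (scene_dir / name).is_dir()]
        scenes[scene_dir.name] = missing
    return scenes
```

An empty list for a scene means that all three sub-directories are present.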

# Description of the files
- `Classification/classification_map.tif` --- the main product: the classified scene. 7 classes are available, each represented by a different integer.
  - 0: no_data
  - 1: not used
  - 2: low clouds
  - 3: high clouds
  - 4: cloud shadows
  - 5: land
  - 6: water
  - 7: snow
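
The integer codes listed above can be turned into per-class statistics once the raster values have been read. The sketch below works on any iterable of integer codes; reading the GeoTIFF itself is left out, and the names `CLASS_NAMES`, `class_counts` and `cloud_fraction` are hypothetical. Excluding only code 0 (no_data) from the valid pixels is an assumption of this example.

```python
from collections import Counter

# Integer codes as listed for classification_map.tif.
CLASS_NAMES = {
    0: "no_data",
    1: "not used",
    2: "low clouds",
    3: "high clouds",
    4: "cloud shadows",
    5: "land",
    6: "water",
    7: "snow",
}
CLOUD_CODES = {2, 3}  # low and high clouds


def class_counts(pixels):
    """Count pixels per class name from an iterable of integer codes."""
    counts = Counter(pixels)
    return {CLASS_NAMES[code]: n for code, n in sorted(counts.items())}


def cloud_fraction(pixels):
    """Fraction of valid pixels labelled as low or high clouds.

    Assumption: only code 0 (no_data) is treated as invalid.
    """
    valid = [p for p in pixels if p != 0]
    if not valid:
        return 0.0
    return sum(p in CLOUD_CODES for p in valid) / len(valid)
```

For example, `cloud_fraction([0, 2, 3, 5, 5, 6, 4, 7])` counts 2 cloud pixels among 7 valid ones.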

- `Classification/confidence_enhanced.tif` --- enhanced confidence map of the classification. The values are between 0 and 255 (coded on 1 byte).
The classification map was created with a Random Forest algorithm, so the original confidence map is, for each pixel, the proportion of votes for the majority class.
A median filter was applied to this confidence map. Finally, the values were saved on 1 byte, which gives the range 0 to 255.
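
To interpret a stored confidence value as a vote proportion, it has to be rescaled to the range 0 to 1. The sketch below assumes a linear scaling in which 255 corresponds to a proportion of 1.0; the exact scaling used by ALCD is not stated here, and both function names are hypothetical.

```python
def byte_to_proportion(value):
    """Convert a stored confidence value (0..255) to a 0..1 proportion.

    Assumption: linear scaling, 255 meaning a unanimous vote.
    """
    if not 0 <= value <= 255:
        raise ValueError("confidence value must be in 0..255")
    return value / 255.0


def proportion_to_byte(proportion):
    """Convert a 0..1 vote proportion to a value that fits in 1 byte."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be in 0..1")
    return int(round(proportion * 255))
```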

- `Classification/contours.png` --- the contours of the classes from the classification map, overlaid on the scene. Each class has its own contour color.
Green: low and high clouds. Yellow: cloud shadows. Blue: water. Purple: snow.

- `Classification/used_parameters.json` --- the parameters that were used to classify the scene. It includes the tile code, the cloudy and clear dates, along with their product reference.

- `Samples/` --- this directory contains all the shapefiles, one per class.

- `Statistics/k_fold_summary.json` --- results of the 10-fold cross-validation on the scene.
5 metrics are computed, in the order given in "metrics_names". "all_metrics" is a list of the 10 folds, each holding the 5 metrics in that order.
"means" and "stds" are the means and standard deviations over the 10 folds.
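
The per-fold values can be used to recompute the summary statistics. The sketch below relies only on the "metrics_names" and "all_metrics" keys described above; the function names are hypothetical, and using the population standard deviation is an assumption, so small differences from the stored "stds" are possible if another convention was used.

```python
import json
import statistics


def load_k_fold_summary(path):
    """Load a k_fold_summary.json file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def recompute_stats(summary):
    """Return {metric_name: (mean, std)} computed across the folds.

    Assumption: population standard deviation (statistics.pstdev).
    """
    names = summary["metrics_names"]
    folds = summary["all_metrics"]
    stats = {}
    for index, name in enumerate(names):
        # Collect this metric's value from every fold.
        values = [fold[index] for fold in folds]
        stats[name] = (statistics.fmean(values), statistics.pstdev(values))
    return stats
```

The result can then be compared with the "means" and "stds" entries of the same file.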


# References

[1] Baetens, L.; Desjardins, C.; Hagolle, O. Validation of Copernicus Sentinel-2 Cloud Masks Obtained from MAJA, Sen2Cor, and FMask Processors Using Reference Cloud Masks Generated with a Supervised Active Learning Procedure. Remote Sens. 2019, 11, 433.

[2] Hagolle, O.; Huc, M.; Villa Pascual, D.; Dedieu, G. A Multi-Temporal Method for Cloud Detection, Applied to FORMOSAT-2, VENµS, LANDSAT and SENTINEL-2 Images. Remote Sens. Environ. 2010, 114 (8), 1747-1755.

[3] Hollstein, A.; Segl, K.; Guanter, L.; Brell, M.; Enesco, M. Ready-to-Use Methods for the Detection of Clouds, Cirrus, Snow, Shadow, Water and Clear Sky Pixels in Sentinel-2 MSI Images. Remote Sens. 2016, 8, 666.

# Files

- `SENTINEL_2_reference_cloud_masks_Baetens_Hagolle.tgz` (234.6 MB), md5: ee035e0d22a441086cfaabcface3cf24
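
The MD5 checksum listed for the archive can be used to check a download before extracting it. This is a minimal sketch using Python's standard `hashlib` module; the function names and the local file path are hypothetical.

```python
import hashlib

# Checksum listed for the archive.
EXPECTED_MD5 = "ee035e0d22a441086cfaabcface3cf24"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `archive_is_intact` with the path of the downloaded `.tgz` file returns `False` if the download was truncated or corrupted.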