Dataset Open Access

Sentinel-2 reference cloud masks generated by an active learning method

Louis Baetens; Olivier Hagolle

Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="" xmlns:oai_dc="" xmlns:xsi="" xsi:schemaLocation="">
  <dc:creator>Louis Baetens</dc:creator>
  <dc:creator>Olivier Hagolle</dc:creator>
  <dc:description> Reference classifications generated with Active Learning for Cloud Detection (ALCD)

This data set provides reference cloud masks for 38 Sentinel-2 scenes. These reference masks were created with the ALCD tool, developed by Louis Baetens under the direction of Olivier Hagolle at CESBIO/CNES [1]. They were created to validate the cloud masks generated by the MAJA software [2].

- The `Reference_dataset` directory contains 31 scenes selected in 2017 or 2018.
- The `Hollstein` directory contains 7 scenes that were used to validate the ALCD tool by comparison with manually generated reference images kindly provided by Hollstein et al. [3].
One of these scenes is present in both directories. The "Hollstein" scenes were not used for the validation of MAJA because they were acquired before Sentinel-2 was operational, when the revisit frequency of observations was degraded.

# Description of the data structure
The name of each scene directory is the name of the corresponding Sentinel-2 L1C product.
In the scene directory, three sub-directories can be found.
- `Classification`
- `Samples`
- `Statistics`
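
As a rough illustration of this layout, the sketch below builds the expected file paths for one scene. The function name `scene_files` and the placeholder scene name are hypothetical, and the sketch assumes that scene directories sit directly under `Reference_dataset` or `Hollstein`. The file names are the ones listed in the file descriptions of this record.

```python
from pathlib import PurePosixPath


def scene_files(scene_dir):
    """Return the expected paths inside one scene directory.

    `scene_dir` is the directory named after the Sentinel-2 L1C product.
    """
    root = PurePosixPath(scene_dir)
    return {
        "classification_map": root / "Classification" / "classification_map.tif",
        "confidence": root / "Classification" / "confidence_enhanced.tif",
        "contours": root / "Classification" / "contours.png",
        "parameters": root / "Classification" / "used_parameters.json",
        "samples_dir": root / "Samples",
        "k_fold_summary": root / "Statistics" / "k_fold_summary.json",
    }


# "SCENE_NAME" is a placeholder, not a real product name.
paths = scene_files("Reference_dataset/SCENE_NAME")
print(paths["classification_map"])
```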

# Description of the files
- `Classification/classification_map.tif` --- the main product, which is the classified scene. 7 classes are available. Each one is represented with a different integer.
0: no_data.
1: not used.
2: low clouds.
3: high clouds.
4: cloud shadows.
5: land.
6: water.
7: snow.
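
The integer codes above can be written down as an enumeration. The following is a minimal sketch, not part of the ALCD tool: the names `AlcdClass` and `class_fractions` are hypothetical, the raster is a small synthetic list of lists rather than a real GeoTIFF, and excluding `no_data` pixels from the total is a choice made for this example.

```python
from collections import Counter
from enum import IntEnum


class AlcdClass(IntEnum):
    """Integer codes used in classification_map.tif."""
    NO_DATA = 0
    NOT_USED = 1
    LOW_CLOUDS = 2
    HIGH_CLOUDS = 3
    CLOUD_SHADOWS = 4
    LAND = 5
    WATER = 6
    SNOW = 7


def class_fractions(classification):
    """Return the share of valid pixels per class, ignoring no_data (0)."""
    counts = Counter(value for row in classification for value in row)
    counts.pop(AlcdClass.NO_DATA, None)
    total = sum(counts.values())
    return {AlcdClass(code).name: n / total for code, n in counts.items()}


# Tiny synthetic raster, not taken from the data set.
toy = [
    [0, 2, 2, 5],
    [4, 5, 5, 6],
]
print(class_fractions(toy))
```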

- `Classification/confidence_enhanced.tif` --- enhanced confidence map of the classification. The values are between 0 and 255 (coded on 1 byte).
The classification map was created with a Random Forest algorithm, so the original confidence of each pixel is the proportion of votes for the majority class.
A median filter was applied to this confidence map. Finally, each value was stored on 1 byte, which gives the range 0 to 255.
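
To read the stored values as vote proportions, one possible conversion is sketched below. It assumes a linear scale in which 255 corresponds to a unanimous vote; this record does not state the exact scaling, so treat it as an assumption. The function names and the 0.8 threshold are hypothetical.

```python
def confidence_to_fraction(value):
    """Map a stored value (0 to 255) to a fraction between 0 and 1.

    Assumption: the scale is linear and 255 stands for a unanimous vote.
    """
    if value not in range(256):
        raise ValueError("expected an integer between 0 and 255")
    return value / 255


def confident_pixels(confidence, min_fraction=0.8):
    """Return a grid of booleans, True where the confidence reaches the threshold."""
    return [
        [confidence_to_fraction(v) >= min_fraction for v in row]
        for row in confidence
    ]


# Tiny synthetic confidence grid, not taken from the data set.
toy_confidence = [[255, 128], [230, 0]]
print(confident_pixels(toy_confidence))
```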

- `Classification/contours.png` --- the contours of the classes from the classification map, overlaid on the scene. Each class has its own contour color.
Green: low and high clouds. Yellow: cloud shadows. Blue: water. Purple: snow.

- `Classification/used_parameters.json` --- the parameters that were used to classify the scene. It includes the tile code, the cloudy and clear dates, along with their product reference.

- `Samples/` --- this directory contains all the shapefiles, one per class.

- `Statistics/k_fold_summary.json` --- results of the 10-fold cross-validation on the scene.
Five metrics are computed, in the order given in "metrics_names". "all_metrics" is a list of the 10 folds, each holding the 5 metrics in that same order.
"means" and "stds" are the means and standard deviations of each metric over the 10 folds.
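
The sketch below shows one way to pair each metric name with its values across folds. It uses a synthetic stand-in with 2 folds and 2 metrics; the metric names and numbers are invented, and only the four key names come from the description above. The helper name `per_metric_means` is hypothetical.

```python
import json
from statistics import mean

# Synthetic stand-in for k_fold_summary.json with 2 folds and 2 metrics.
# The metric names and values are invented for illustration.
example = json.loads("""
{
  "metrics_names": ["metric_a", "metric_b"],
  "all_metrics": [[0.90, 0.80], [0.94, 0.84]],
  "means": [0.92, 0.82],
  "stds": [0.02, 0.02]
}
""")


def per_metric_means(summary):
    """Recompute the mean of each metric across folds, keyed by metric name."""
    columns = zip(*summary["all_metrics"])  # one tuple of fold values per metric
    return {name: mean(col) for name, col in zip(summary["metrics_names"], columns)}


print(per_metric_means(example))
```

For a real scene, the same function could be applied to the result of `json.load` on the file. The standard deviations are not recomputed here because the description does not say whether the population or the sample formula was used.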

# References

[1] Baetens, L.; Desjardins, C.; Hagolle, O. Validation of Copernicus Sentinel-2 Cloud Masks Obtained from MAJA, Sen2Cor, and FMask Processors Using Reference Cloud Masks Generated with a Supervised Active Learning Procedure. Remote Sens. 2019, 11, 433.

[2] Hagolle, O.; Huc, M.; Villa Pascual, D.; Dedieu, G. A Multi-Temporal Method for Cloud Detection, Applied to FORMOSAT-2, VENµS, LANDSAT and SENTINEL-2 Images. Remote Sens. Environ. 2010, 114 (8), 1747-1755.

[3] Hollstein, A.; Segl, K.; Guanter, L.; Brell, M.; Enesco, M. Ready-to-Use Methods for the Detection of Clouds, Cirrus, Snow, Shadow, Water and Clear Sky Pixels in Sentinel-2 MSI Images. Remote Sens. 2016, 8, 666</dc:description>
  <dc:subject>Cloud mask</dc:subject>
  <dc:title>Sentinel-2 reference cloud masks generated by an active learning method</dc:title>