Dataset Open Access

Sentinel-2 reference cloud masks generated by an active learning method

Louis Baetens; Olivier Hagolle

DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="" xmlns="" xsi:schemaLocation="">
  <identifier identifierType="DOI">10.5281/zenodo.1460961</identifier>
      <creatorName>Louis Baetens</creatorName>
      <creatorName>Olivier Hagolle</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="">0000-0003-2358-0493</nameIdentifier>
    <title>Sentinel-2 reference cloud masks generated by an active learning method</title>
    <subject>Cloud mask</subject>
    <date dateType="Issued">2018-10-12</date>
  <resourceType resourceTypeGeneral="Dataset"/>
    <alternateIdentifier alternateIdentifierType="url"></alternateIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.1460960</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf"></relatedIdentifier>
    <rights rightsURI="">Creative Commons Attribution 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
    <description descriptionType="Abstract">&lt;p&gt;&lt;strong&gt;Reference classifications generated with Active Learning for Cloud Detection (ALCD)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This data set provides reference cloud masks for 38 Sentinel-2 scenes. The masks were created with the ALCD tool, developed by Louis Baetens under the direction of Olivier Hagolle at CESBIO/CNES [1]. They were created to validate the cloud masks generated by the MAJA software [2].&lt;/p&gt;

&lt;p&gt;- The `Reference_dataset` directory contains 31 scenes selected in 2017 or 2018.&lt;br&gt;
- The `Hollstein` directory contains 7 scenes that were used to validate the ALCD tool by comparison with manually generated reference images kindly provided by Hollstein et al. [3].&lt;br&gt;
One of these scenes is present in both directories. For the validation of MAJA, the &amp;quot;Hollstein&amp;quot; scenes were not used, because they were acquired when Sentinel-2 was not yet operational and the revisit frequency was reduced.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;# Description of the data structure&lt;/strong&gt;&lt;br&gt;
The name of each scene directory is the name of the corresponding Sentinel-2 L1C product.&lt;br&gt;
In the scene directory, three sub-directories can be found.&lt;br&gt;
- `Classification`&lt;br&gt;
- `Samples`&lt;br&gt;
- `Statistics`&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;# Description of the files&lt;/strong&gt;&lt;br&gt;
- `Classification/classification_map.tif` --- the main product, which is the classified scene. 7 classes are available. Each one is represented with a different integer.&lt;br&gt;
0: no_data.&lt;br&gt;
1: not used.&lt;br&gt;
2: low clouds.&lt;br&gt;
3: high clouds.&lt;br&gt;
4: cloud shadows.&lt;br&gt;
5: land.&lt;br&gt;
6: water.&lt;br&gt;
7: snow.&lt;/p&gt;
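
&lt;p&gt;As an illustration of how the integer codes above can be used, the following minimal sketch computes per-class and cloud fractions over the valid pixels of a classified scene. It assumes the raster has already been loaded into nested lists of integers (reading `classification_map.tif` itself requires a GeoTIFF reader and is left out). The names `class_fractions`, `cloud_fraction` and `demo` are hypothetical and not part of the data set.&lt;/p&gt;

```python
from collections import Counter

# Integer codes as listed in the description of classification_map.tif.
CLASS_NAMES = {
    0: "no_data",
    1: "not used",
    2: "low clouds",
    3: "high clouds",
    4: "cloud shadows",
    5: "land",
    6: "water",
    7: "snow",
}
NO_DATA = 0
CLOUD_CODES = (2, 3)  # low and high clouds


def _valid_counts(rows):
    """Count pixels per class code, ignoring no_data pixels."""
    return Counter(v for row in rows for v in row if v != NO_DATA)


def class_fractions(rows):
    """Fraction of valid pixels per class name (hypothetical helper)."""
    counts = _valid_counts(rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {CLASS_NAMES[code]: n / total for code, n in sorted(counts.items())}


def cloud_fraction(rows):
    """Fraction of valid pixels labelled as low or high clouds."""
    counts = _valid_counts(rows)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(counts[code] for code in CLOUD_CODES) / total


# Tiny synthetic array standing in for a real classification map.
demo = [
    [0, 2, 3, 5],
    [5, 5, 4, 6],
]
print(class_fractions(demo))
print(cloud_fraction(demo))
```

&lt;p&gt;Excluding code 0 from the denominator is a choice made for this sketch, so that fractions refer to observed pixels only.&lt;/p&gt;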

&lt;p&gt;- `Classification/confidence_enhanced.tif` --- enhanced confidence map of the classification. The values are between 0 and 255 (coded on 1 byte).&lt;br&gt;
Because the classification map was produced by a Random Forest algorithm, the original confidence map gives, for each pixel, the proportion of votes for the majority class.&lt;br&gt;
A median filter has been applied to this confidence map. Finally, the value was saved on 1 byte, so that it lies between 0 and 255.&lt;/p&gt;
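
&lt;p&gt;The relation between vote proportions and stored byte values can be sketched as follows. This is only an illustration: the linear scaling to 0-255 is an assumption (the exact enhancement and rounding are not documented here), the median filtering step is omitted, and the function names are hypothetical.&lt;/p&gt;

```python
from collections import Counter


def vote_confidence(votes):
    """Proportion of tree votes won by the majority class for one pixel."""
    counts = Counter(votes)
    majority_votes = counts.most_common(1)[0][1]
    return majority_votes / len(votes)


def to_byte(proportion):
    """Map a proportion in [0, 1] to an integer in [0, 255].

    Assumption: plain linear scaling; the data set may use another mapping.
    """
    return int(round(proportion * 255))


def from_byte(value):
    """Inverse of to_byte, up to rounding."""
    return value / 255


# Four hypothetical tree votes for one pixel: three for class 2, one for class 5.
p = vote_confidence([2, 2, 2, 5])
print(p, to_byte(p), from_byte(to_byte(p)))
```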

&lt;p&gt;- `Classification/contours.png` --- the contours of the classes from the classification map, overlaid on the scene. Each class has its own color.&lt;br&gt;
Green: low and high clouds. Yellow: cloud shadows. Blue: water. Purple: snow.&lt;/p&gt;

&lt;p&gt;- `Classification/used_parameters.json` --- the parameters that were used to classify the scene. It includes the tile code, the cloudy and clear dates, along with their product reference.&lt;/p&gt;

&lt;p&gt;- `Samples/` --- this directory contains all the shapefiles, one per class.&lt;/p&gt;

&lt;p&gt;- `Statistics/k_fold_summary.json` --- results of the 10-fold cross-validation on the scene.&lt;br&gt;
5 metrics are computed, in the order given in the &amp;quot;metrics_names&amp;quot;. &amp;quot;all_metrics&amp;quot; is a list of the 10 folds, with the 5 metrics in the correct order for each fold.&lt;br&gt;
&amp;quot;means&amp;quot; and &amp;quot;stds&amp;quot; are the means and standard deviations of the 10 folds.&lt;/p&gt;
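
&lt;p&gt;The layout described above can be checked by recomputing the per-metric statistics from the per-fold values. The sketch below uses a small synthetic JSON string (3 folds and 2 metrics instead of 10 and 5) with placeholder metric names; only the keys `metrics_names` and `all_metrics` come from the description. Whether the stored standard deviations are population or sample values is not stated, so the use of `pstdev` is an assumption.&lt;/p&gt;

```python
import json
from statistics import mean, pstdev

# Synthetic stand-in for k_fold_summary.json; values and metric names are made up.
text = '''
{"metrics_names": ["metric_a", "metric_b"],
 "all_metrics": [[0.90, 0.80], [0.94, 0.84], [0.92, 0.82]]}
'''
summary = json.loads(text)


def per_metric_stats(summary):
    """Return {metric name: (mean, standard deviation)} across folds."""
    names = summary["metrics_names"]
    # all_metrics holds one list per fold; transpose to one tuple per metric.
    columns = list(zip(*summary["all_metrics"]))
    return {name: (mean(col), pstdev(col)) for name, col in zip(names, columns)}


stats = per_metric_stats(summary)
for name, (m, s) in stats.items():
    print(name, round(m, 4), round(s, 4))
```

&lt;p&gt;With a real file, the recomputed values could be compared with the stored `means` and `stds` entries.&lt;/p&gt;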

&lt;p&gt;&lt;strong&gt;# References&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] Baetens, L.; Desjardins, C.; Hagolle, O. Validation of Copernicus Sentinel-2 Cloud Masks Obtained from MAJA, Sen2Cor, and FMask Processors Using Reference Cloud Masks Generated with a Supervised Active Learning Procedure. &lt;em&gt;Remote Sens.&lt;/em&gt; &lt;strong&gt;2019&lt;/strong&gt;, &lt;em&gt;11&lt;/em&gt;, 433.&lt;/p&gt;

&lt;p&gt;[2] Hagolle, O.; Huc, M.; Villa Pascual, D.; Dedieu, G. A multi-temporal method for cloud detection, applied to FORMOSAT-2, VEN&amp;micro;S, LANDSAT and SENTINEL-2 images. Remote Sensing of Environment 2010, 114 (8), 1747-1755.&lt;/p&gt;

&lt;p&gt;[3] Hollstein, A.; Segl, K.; Guanter, L.; Brell, M.; Enesco, M. Ready-to-Use Methods for the Detection of Clouds, Cirrus, Snow, Shadow, Water and Clear Sky Pixels in Sentinel-2 MSI Images. Remote Sens. 2016, 8, 666&lt;/p&gt;</description>