Dataset Open Access

# Sentinel-2 reference cloud masks generated by an active learning method

Louis Baetens; Olivier Hagolle

### DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource>
<identifier identifierType="DOI">10.5281/zenodo.1460961</identifier>
<creators>
<creator>
<creatorName>Louis Baetens</creatorName>
<affiliation>CESBIO/CNES</affiliation>
</creator>
<creator>
<creatorName>Olivier Hagolle</creatorName>
<nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0003-2358-0493</nameIdentifier>
<affiliation>CESBIO/CNES</affiliation>
</creator>
</creators>
<titles>
<title>Sentinel-2 reference cloud masks generated by an active learning method</title>
</titles>
<publisher>Zenodo</publisher>
<publicationYear>2018</publicationYear>
<subjects>
<subject>Sentinel-2</subject>
<subject>Validation</subject>
</subjects>
<dates>
<date dateType="Issued">2018-10-12</date>
</dates>
<language>en</language>
<resourceType resourceTypeGeneral="Dataset"/>
<alternateIdentifiers>
<alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/1460961</alternateIdentifier>
</alternateIdentifiers>
<relatedIdentifiers>
<relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.1460960</relatedIdentifier>
<relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/remote-sensing</relatedIdentifier>
</relatedIdentifiers>
<rightsList>
<rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
</rightsList>
<descriptions>
<description descriptionType="Abstract">&lt;p&gt;&amp;nbsp;&lt;strong&gt;Reference classifications generated with Active Learning for Cloud Detection (ALCD)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This data set provides reference cloud masks for 38 Sentinel-2 scenes. The masks were created with the ALCD tool, developed by Louis Baetens under the direction of Olivier Hagolle at CESBIO/CNES [1], in order to validate the cloud masks generated by the MAJA software [2].&lt;/p&gt;

&lt;p&gt;- The Reference_dataset directory contains 31 scenes selected in 2017 or 2018.&lt;br&gt;
- The Hollstein directory contains 7 scenes that were used to validate the ALCD tool by comparison with manually generated reference images kindly provided by Hollstein et al. [3].&lt;br&gt;
One of these scenes is present in both directories. The &amp;quot;Hollstein&amp;quot; scenes were not used for the validation of MAJA because they were acquired when Sentinel-2 was not yet operational and the revisit frequency was lower.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;# Description of the data structure&lt;/strong&gt;&lt;br&gt;
The name of each scene directory is the name of the corresponding Sentinel-2 L1C product.&lt;br&gt;
In the scene directory, three sub-directories can be found.&lt;br&gt;
- Classification&lt;br&gt;
- Samples&lt;br&gt;
- Statistics&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;# Description of the files&lt;/strong&gt;&lt;br&gt;
- Classification/classification_map.tif --- the main product, which is the classified scene. 7 classes are available. Each one is represented with a different integer.&lt;br&gt;
0: no_data.&lt;br&gt;
1: not used.&lt;br&gt;
2: low clouds.&lt;br&gt;
3: high clouds.&lt;br&gt;
4: cloud shadows.&lt;br&gt;
5: land.&lt;br&gt;
6: water.&lt;br&gt;
7: snow.&lt;/p&gt;

&lt;p&gt;- Classification/confidence_enhanced.tif --- enhanced confidence map of the classification. The values are between 0 and 255 (coded on 1 byte).&lt;br&gt;
Because the classification map was created with a Random Forest algorithm, the original confidence map gives, for each pixel, the proportion of votes for the majority class.&lt;br&gt;
A median filter has been applied to this confidence map. Finally, the value was saved on 1 byte, which gives a range of 0 to 255.&lt;/p&gt;

&lt;p&gt;- Classification/contours.png --- the contours of the classes from the classification map, overlaid on the scene. Each class has its own color.&lt;br&gt;
Green: low and high clouds. Yellow: cloud shadows. Blue: water. Purple: snow.&lt;/p&gt;

&lt;p&gt;- Classification/used_parameters.json --- the parameters that were used to classify the scene. It includes the tile code, the cloudy and clear dates, along with their product reference.&lt;/p&gt;

&lt;p&gt;- Samples/ --- this directory contains all the shapefiles, one per class.&lt;/p&gt;

&lt;p&gt;- Statistics/k_fold_summary.json --- results of the 10-fold cross-validation on the scene.&lt;br&gt;
5 metrics are computed, in the order given in the &amp;quot;metrics_names&amp;quot;. &amp;quot;all_metrics&amp;quot; is a list of the 10 folds, with the 5 metrics in the correct order for each fold.&lt;br&gt;
&amp;quot;means&amp;quot; and &amp;quot;stds&amp;quot; are the means and standard deviations of the 10 folds.&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
&lt;strong&gt;# References&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] Baetens, L.; Desjardins, C.; Hagolle, O. Validation of Copernicus Sentinel-2 Cloud Masks Obtained from MAJA, Sen2Cor, and FMask Processors Using Reference Cloud Masks Generated with a Supervised Active Learning Procedure. &lt;em&gt;Remote Sens.&lt;/em&gt; &lt;strong&gt;2019&lt;/strong&gt;, &lt;em&gt;11&lt;/em&gt;, 433.&lt;/p&gt;

&lt;p&gt;[2] Hagolle, O.; Huc, M.; Villa Pascual, D.; Dedieu, G. A multi-temporal method for cloud detection, applied to FORMOSAT-2, VEN&amp;micro;S, LANDSAT and SENTINEL-2 images. &lt;em&gt;Remote Sensing of Environment&lt;/em&gt; &lt;strong&gt;2010&lt;/strong&gt;, &lt;em&gt;114&lt;/em&gt; (8), 1747-1755.&lt;/p&gt;

&lt;p&gt;[3] Hollstein, A.; Segl, K.; Guanter, L.; Brell, M.; Enesco, M. Ready-to-Use Methods for the Detection of Clouds, Cirrus, Snow, Shadow, Water and Clear Sky Pixels in Sentinel-2 MSI Images. Remote Sens. 2016, 8, 666&lt;/p&gt;</description>
</descriptions>
</resource>
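The file descriptions in the abstract above can be turned into a few small helpers. The following is a minimal sketch using only the Python standard library, not the ALCD implementation. It makes several assumptions: the value 4 denotes cloud shadows (inferred from the contour legend, which lists cloud shadows as a class), the 0 to 255 confidence values scale linearly to a 0 to 1 proportion, and `k_fold_summary.json` holds the keys `metrics_names` and `all_metrics` laid out as described. All function names and the metric names in the example are hypothetical. Reading the GeoTIFF itself requires a raster library and is left out; the helpers take plain Python sequences of pixel values.

```python
import json
from statistics import mean

# Integer codes as listed in the description.
CLASS_NAMES = {
    0: "no_data",
    1: "not used",
    2: "low clouds",
    3: "high clouds",
    4: "cloud shadows",  # assumption: inferred from the contour legend
    5: "land",
    6: "water",
    7: "snow",
}
CLOUD_CODES = {2, 3}  # low and high clouds


def cloud_fraction(pixels):
    """Fraction of pixels other than no_data (0) labelled as low or high cloud."""
    valid = [p for p in pixels if p != 0]
    if not valid:
        return 0.0
    return sum(p in CLOUD_CODES for p in valid) / len(valid)


def confidence_to_fraction(byte_value):
    """Map a 0-255 confidence value to a 0-1 proportion (assumes linear scaling)."""
    return byte_value / 255.0


def load_summary(path):
    """Load a k_fold_summary.json file into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_folds(summary):
    """Recompute the per-metric mean over all folds listed in 'all_metrics'."""
    names = summary["metrics_names"]
    folds = summary["all_metrics"]
    return {name: mean(fold[i] for fold in folds) for i, name in enumerate(names)}


if __name__ == "__main__":
    # Hypothetical two-fold, two-metric summary, for illustration only.
    example = {
        "metrics_names": ["accuracy", "f1"],
        "all_metrics": [[0.5, 0.25], [1.0, 0.75]],
    }
    print(summarize_folds(example))           # {'accuracy': 0.75, 'f1': 0.5}
    print(cloud_fraction([0, 2, 3, 5, 6]))    # 0.5
```

Comparing the output of `summarize_folds` with the stored `means` entry is a quick consistency check on a downloaded scene. Whether pixels of class 1 (not used) should count as valid is not stated in the description; the sketch excludes only no_data.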
