Dataset Open Access

Sentinel-2 reference cloud masks generated by an active learning method

Louis Baetens; Olivier Hagolle


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Sentinel-2</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Cloud mask</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Validation</subfield>
  </datafield>
  <controlfield tag="005">20200124192513.0</controlfield>
  <controlfield tag="001">1460961</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">CESBIO/CNES</subfield>
    <subfield code="0">(orcid)0000-0003-2358-0493</subfield>
    <subfield code="a">Olivier Hagolle</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">234569318</subfield>
    <subfield code="z">md5:ee035e0d22a441086cfaabcface3cf24</subfield>
    <subfield code="u">https://zenodo.org/record/1460961/files/SENTINEL_2_reference_cloud_masks_Baetens_Hagolle.tgz</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2018-10-12</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire_data</subfield>
    <subfield code="p">user-remote-sensing</subfield>
    <subfield code="o">oai:zenodo.org:1460961</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">CESBIO/CNES</subfield>
    <subfield code="a">Louis Baetens</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Sentinel-2 reference cloud masks generated by an active learning method</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-remote-sensing</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;&amp;nbsp;&lt;strong&gt;Reference classifications generated with Active Learning for Cloud Detection (ALCD)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This data set provides reference cloud masks for 38 Sentinel-2 scenes. The masks were created with the ALCD tool, developed by Louis Baetens under the direction of Olivier Hagolle at CESBIO/CNES [1]. They were created to validate the cloud masks generated by the MAJA software [2].&lt;/p&gt;

&lt;p&gt;- The `Reference_dataset` directory contains 31 scenes selected in 2017 or 2018.&lt;br&gt;
- The `Hollstein` directory contains 7 scenes that were used to validate the ALCD tool by comparison with manually generated reference images kindly provided by Hollstein et al. [3].&lt;br&gt;
One of these scenes is present in both directories. The &amp;quot;Hollstein&amp;quot; scenes were not used for the validation of MAJA because they were acquired when Sentinel-2 was not yet operational and the revisit frequency of observations was reduced.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;# Description of the data structure&lt;/strong&gt;&lt;br&gt;
The name of each scene directory is the name of the corresponding Sentinel-2 L1C product.&lt;br&gt;
In the scene directory, three sub-directories can be found.&lt;br&gt;
- `Classification`&lt;br&gt;
- `Samples`&lt;br&gt;
- `Statistics`&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;# Description of the files&lt;/strong&gt;&lt;br&gt;
- `Classification/classification_map.tif` --- the main product, which is the classified scene. 7 classes are available. Each one is represented with a different integer.&lt;br&gt;
0: no_data.&lt;br&gt;
1: not used.&lt;br&gt;
2: low clouds.&lt;br&gt;
3: high clouds.&lt;br&gt;
4: cloud shadows.&lt;br&gt;
5: land.&lt;br&gt;
6: water.&lt;br&gt;
7: snow.&lt;/p&gt;

&lt;p&gt;- `Classification/confidence_enhanced.tif` --- enhanced confidence map of the classification. The values are between 0 and 255 (coded on 1 byte).&lt;br&gt;
Because the classification map was created with a Random Forest algorithm, the original confidence map gives, for each pixel, the proportion of votes for the majority class.&lt;br&gt;
A median filter has been applied to this confidence map. Finally, the value was saved on 1 byte, which gives a range of 0 to 255.&lt;/p&gt;

&lt;p&gt;- `Classification/contours.png` --- the contours of the classes from the classification map, overlaid on the scene. Each class has its own color.&lt;br&gt;
Green: low and high clouds. Yellow: cloud shadows. Blue: water. Purple: snow.&lt;/p&gt;

&lt;p&gt;- `Classification/used_parameters.json` --- the parameters that were used to classify the scene. It includes the tile code, the cloudy and clear dates, along with their product reference.&lt;/p&gt;

&lt;p&gt;- `Samples/` --- this directory contains all the shapefiles, one per class.&lt;/p&gt;

&lt;p&gt;- `Statistics/k_fold_summary.json` --- results of the 10-fold cross-validation on the scene.&lt;br&gt;
Five metrics are computed, in the order given in &amp;quot;metrics_names&amp;quot;. &amp;quot;all_metrics&amp;quot; lists the 10 folds, each with its 5 metric values in that same order.&lt;br&gt;
&amp;quot;means&amp;quot; and &amp;quot;stds&amp;quot; are the means and standard deviations of each metric over the 10 folds.&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
&lt;strong&gt;# References&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;[1] Baetens, L.; Desjardins, C.; Hagolle, O. Validation of Copernicus Sentinel-2 Cloud Masks Obtained from MAJA, Sen2Cor, and FMask Processors Using Reference Cloud Masks Generated with a Supervised Active Learning Procedure. &lt;em&gt;Remote Sens.&lt;/em&gt; &lt;strong&gt;2019&lt;/strong&gt;, &lt;em&gt;11&lt;/em&gt;, 433.&lt;/p&gt;

&lt;p&gt;[2] Hagolle, O.; Huc, M.; Villa Pascual, D.; Dedieu, G. A multi-temporal method for cloud detection, applied to FORMOSAT-2, VEN&amp;micro;S, LANDSAT and SENTINEL-2 images. &lt;em&gt;Remote Sens. Environ.&lt;/em&gt; &lt;strong&gt;2010&lt;/strong&gt;, &lt;em&gt;114&lt;/em&gt;, 1747-1755.&lt;/p&gt;

&lt;p&gt;[3] Hollstein, A.; Segl, K.; Guanter, L.; Brell, M.; Enesco, M. Ready-to-Use Methods for the Detection of Clouds, Cirrus, Snow, Shadow, Water and Clear Sky Pixels in Sentinel-2 MSI Images. &lt;em&gt;Remote Sens.&lt;/em&gt; &lt;strong&gt;2016&lt;/strong&gt;, &lt;em&gt;8&lt;/em&gt;, 666.&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.1460960</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.1460961</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
</record>
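
The description in field 520 lists the integer codes used in `Classification/classification_map.tif`. As an illustration only, the sketch below shows one way those codes might be turned into a binary cloud mask. It is not part of the dataset or of the ALCD tool. The names `CLASS_LABELS`, `CLOUD_CODES`, `cloud_mask` and `cloud_fraction` are hypothetical, the choice to count both cloud classes as "cloud" is an assumption, and a small synthetic array stands in for a raster that would normally be read with a GeoTIFF library.

```python
import numpy as np

# Class codes as listed in the dataset description (MARC field 520).
CLASS_LABELS = {
    0: "no_data",
    1: "not used",
    2: "low clouds",
    3: "high clouds",
    4: "cloud shadows",
    5: "land",
    6: "water",
    7: "snow",
}

# Assumption: low and high clouds are both treated as "cloud";
# cloud shadows (code 4) are deliberately left out of the mask.
CLOUD_CODES = (2, 3)
NO_DATA_CODE = 0


def cloud_mask(classification: np.ndarray) -> np.ndarray:
    """Return a boolean array, True where a pixel is low or high cloud."""
    return np.isin(classification, CLOUD_CODES)


def cloud_fraction(classification: np.ndarray) -> float:
    """Fraction of valid (non-no_data) pixels that are classified as cloud."""
    valid = classification != NO_DATA_CODE
    n_valid = int(valid.sum())
    if n_valid == 0:
        return float("nan")  # no valid pixels, so the fraction is undefined
    return float(cloud_mask(classification).sum()) / n_valid


# Synthetic stand-in for classification_map.tif (values are made up).
demo = np.array([[0, 2, 3, 5],
                 [6, 7, 4, 2]], dtype=np.uint8)

if __name__ == "__main__":
    print(cloud_mask(demo))
    print(cloud_fraction(demo))
```

Whether code 1 ("not used") should also be excluded from the valid pixels depends on the intended analysis; the sketch keeps it in the denominator.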