Sentinel-2 Cloud Mask Catalogue

Francis, Alistair; Mrziglod, John; Sidiropoulos, Panagiotis; Muller, Jan-Peter

doi:10.5281/zenodo.4172871

Published November 1, 2020 | Version v1

Dataset | Open access

1. University College London
2. World Food Programme
3. Hummingbird Technologies Ltd

Overview

This dataset comprises cloud masks for 513 subscenes, each 1022 × 1022 pixels at 20 m resolution, sampled randomly from the 2018 Level-1C Sentinel-2 archive. Its design follows from three observations about cloud masking: (i) performance within a single product is highly correlated, so subscenes provide more value per pixel than full scenes; (ii) current cloud masking datasets often focus on specific regions or hand-select the products used, which introduces a bias that makes them unrepresentative of real-world data; and (iii) cloud mask performance appears to be highly correlated with surface type and cloud structure, so testing should include an analysis of failure modes in relation to these variables.
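To make the sampling unit concrete: a Sentinel-2 L1C tile spans 109.8 km, i.e. 5490 pixels at 20 m, so a 1022-pixel subscene covers about 20.4 km on a side. The sketch below shows one way to place such a window uniformly at random inside a tile. It is an illustration only, not the authors' documented sampling procedure; the function name `random_subscene_window` and the fixed seed are hypothetical.

```python
import random

TILE_PX_20M = 5490  # a 109.8 km Sentinel-2 L1C tile at 20 m resolution
SUBSCENE_PX = 1022  # subscene size stated in the dataset description


def random_subscene_window(rng: random.Random,
                           tile_px: int = TILE_PX_20M,
                           size: int = SUBSCENE_PX) -> tuple:
    """Return (row_off, col_off, height, width) of a square window placed
    uniformly at random so that it lies entirely inside the tile."""
    # randint is inclusive at both ends, so the last valid offset is tile_px - size
    row_off = rng.randint(0, tile_px - size)
    col_off = rng.randint(0, tile_px - size)
    return row_off, col_off, size, size


# Hypothetical usage: a seeded generator makes the draw reproducible.
rng = random.Random(2018)
windows = [random_subscene_window(rng) for _ in range(3)]
print(windows)
```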

The data were annotated semi-automatically using the IRIS toolkit, which lets users dynamically train a Random Forest (implemented using LightGBM). The model's predictions improve iteratively as annotations are added, which speeds up labelling, while the annotator retains the ability to make final manual changes when needed. This hybrid approach allowed us to produce many more masks than would have been possible manually, which we felt was vital for creating a dataset large enough to approximate the statistics of the whole Sentinel-2 archive.
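The principle that the annotator's edits take precedence over the model can be illustrated with a small merge step. This minimal sketch assumes masks are nested lists of integer class ids (0 = CLEAR, 1 = CLOUD, 2 = CLOUD_SHADOW, a hypothetical encoding) and that pixels the annotator has not touched are marked `None`. It does not reproduce IRIS itself or its LightGBM training loop, and `merge_annotations` is a hypothetical name.

```python
from typing import List, Optional

Mask = List[List[int]]
Override = List[List[Optional[int]]]


def merge_annotations(predicted: Mask, manual: Override) -> Mask:
    """Combine a model's per-pixel prediction with annotator edits.

    Wherever the annotator has painted a class (a non-None value), that label
    wins; everywhere else the model prediction is kept."""
    if len(predicted) != len(manual):
        raise ValueError("mask heights differ")
    merged: Mask = []
    for pred_row, man_row in zip(predicted, manual):
        if len(pred_row) != len(man_row):
            raise ValueError("mask widths differ")
        merged.append([m if m is not None else p for p, m in zip(pred_row, man_row)])
    return merged


CLEAR, CLOUD, SHADOW = 0, 1, 2  # hypothetical class encoding
predicted = [[CLEAR, CLOUD], [CLOUD, CLOUD]]
manual = [[None, None], [SHADOW, None]]  # annotator relabels one pixel as shadow
print(merge_annotations(predicted, manual))  # [[0, 1], [2, 1]]
```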

In addition to the pixel-wise, three-class (CLEAR, CLOUD, CLOUD_SHADOW) segmentation masks, we also provide binary classification "tags" for each subscene, which can be used in testing to determine performance in specific circumstances. These include:

SURFACE TYPE: 11 categories
CLOUD TYPE: 7 categories
CLOUD HEIGHT: low, high
CLOUD THICKNESS: thin, thick
CLOUD EXTENT: isolated, extended
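The tags make stratified evaluation straightforward: compute a per-scene score, then average it over the scenes that carry each tag. The sketch below uses per-class intersection-over-union (IoU) on flattened masks. The integer class encoding and the plain-string tag labels are assumptions for illustration; the actual column names in `classification_tags.csv` and the mask format in `masks.zip` should be taken from the README.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

CLASSES = ("CLEAR", "CLOUD", "CLOUD_SHADOW")  # assumed encoding: index = class id


def class_iou(truth: List[int], pred: List[int], cls: int) -> Optional[float]:
    """IoU of one class over two flattened masks of equal length.
    Returns None when the class is absent from both masks."""
    inter = sum(1 for t, p in zip(truth, pred) if t == cls and p == cls)
    union = sum(1 for t, p in zip(truth, pred) if t == cls or p == cls)
    return inter / union if union else None


def mean_iou_by_tag(scenes: Iterable[Tuple[Set[str], List[int], List[int]]],
                    cls: int) -> Dict[str, float]:
    """scenes: (tags, truth, pred) triples, where tags is a set of hypothetical
    tag labels. Returns {tag: mean IoU of `cls` over scenes with that tag}."""
    per_tag = defaultdict(list)
    for tags, truth, pred in scenes:
        iou = class_iou(truth, pred, cls)
        if iou is None:  # class absent from this scene: nothing to score
            continue
        for tag in tags:
            per_tag[tag].append(iou)
    return {tag: sum(v) / len(v) for tag, v in per_tag.items()}


CLOUD = CLASSES.index("CLOUD")
scenes = [
    ({"thin"}, [1, 1, 0, 0], [1, 0, 0, 0]),              # CLOUD IoU = 1/2
    ({"thick"}, [1, 1, 0, 0], [1, 1, 0, 0]),             # CLOUD IoU = 1
    ({"thin", "isolated"}, [1, 0, 0, 0], [1, 1, 0, 0]),  # CLOUD IoU = 1/2
]
print(mean_iou_by_tag(scenes, CLOUD))
```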

Wherever practical, cloud shadows were also annotated; however, this was sometimes not possible because of high-relief terrain or large ambiguities. In total, 424 subscenes were marked with shadows (where present), and 89 have shadows that could not be annotated because of very ambiguous shadow boundaries or terrain that cast significant shadows. If users wish to train an algorithm specifically for cloud shadow masking, we advise them to remove those 89 subscenes; bear in mind, however, that this will systematically reduce the difficulty of the shadow class compared with real-world use, as these subscenes contain the most difficult shadow examples.
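Dropping the 89 subscenes before training a shadow model amounts to a simple filter. This sketch assumes each scene record carries a boolean flag under a hypothetical key `shadows_annotated`; how this information is actually stored is described in the README, not here.

```python
def shadow_training_subset(scenes):
    """Keep only subscenes whose cloud shadows were annotated.

    `scenes` is a list of dicts with a hypothetical boolean key
    'shadows_annotated'. Remember that the excluded scenes are the hardest
    shadow cases, so scores on this subset will be optimistic."""
    return [s for s in scenes if s["shadows_annotated"]]


# Counts from the dataset description: 424 annotated + 89 not annotatable = 513.
scenes = [{"id": i, "shadows_annotated": i < 424} for i in range(513)]
subset = shadow_training_subset(scenes)
print(len(subset), len(scenes) - len(subset))  # 424 89
```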

In addition to the 20 m subscenes and masks, we also provide shapefiles that define the boundary of each mask on the original Sentinel-2 scene. Users who wish to retrieve the L1C bands at their original resolutions can use these to do so.
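Once a subscene's boundary has been read from its shapefile (for example with a GIS library, not shown), its bounding box can be converted into a pixel window for each native band resolution (10 m, 20 m or 60 m). This sketch assumes a north-up raster with square pixels and a known upper-left origin; `pixel_window` and the coordinates in the example are hypothetical. Because a 20 440 m subscene is not a whole number of 60 m pixels, the window is rounded outward.

```python
import math


def pixel_window(bounds, origin, res):
    """Convert map-coordinate bounds (minx, miny, maxx, maxy) into a pixel
    window (row_off, col_off, height, width) for a north-up raster whose
    upper-left corner is origin = (x0, y0) and whose pixels are res metres.

    Offsets are floored and far edges ceiled, so the window always covers
    the full subscene even when res does not divide its extent."""
    minx, miny, maxx, maxy = bounds
    x0, y0 = origin
    col_off = math.floor((minx - x0) / res)
    row_off = math.floor((y0 - maxy) / res)  # rows increase southwards
    col_end = math.ceil((maxx - x0) / res)
    row_end = math.ceil((y0 - miny) / res)
    return row_off, col_off, row_end - row_off, col_end - col_off


# Hypothetical tile origin and a subscene 100 columns / 200 rows into the 20 m grid.
origin = (600000, 5000040)
minx, maxy = 600000 + 100 * 20, 5000040 - 200 * 20
bounds = (minx, maxy - 1022 * 20, minx + 1022 * 20, maxy)
for res in (10, 20, 60):
    print(res, pixel_window(bounds, origin, res))
```

With these values the window is (400, 200, 2044, 2044) at 10 m, (200, 100, 1022, 1022) at 20 m and (66, 33, 342, 341) at 60 m.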

Please see the README for further details, including the dataset structure.

Contributions & Acknowledgements

The data were collected, annotated, checked, formatted and published by Alistair Francis and John Mrziglod.

Support and advice were provided by Prof. Jan-Peter Muller and Dr. Panagiotis Sidiropoulos, for which we are grateful.

We would like to extend our thanks to Dr. Pierre-Philippe Mathieu and the rest of the team at ESA PhiLab, who provided the environment in which this project was conceived, and continued to give technical support throughout.

Finally, we thank the ESA Network of Resources for sponsoring this project by providing ICT resources.

Files (15.4 GB)

Name	MD5	Size
alt_masks.zip	0140a8b500d85cc8553ec8ba0a304bde	1.1 MB
classification_tags.csv	6911e5a8915daf9a98638eb21ba4afd3	77.9 kB
masks.zip	c955efe74c52d07f8e8bb02d5143e182	7.2 MB
README.pdf	48fb6afa0195a3736d4ce122d007be36	1.9 MB
shapefiles.zip	3ed79b74eb84431e68f764457b1f00ac	1.5 MB
subscenes.zip	0ad1de0ebeaff529782f456cad2e966f	15.2 GB
thumbnails.zip	ac054a7940e0680e768bdd824e0ee8af	145.9 MB
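The MD5 checksums in the file listing allow downloads to be checked for corruption, which matters for the 15.2 GB subscene archive. A minimal sketch using Python's standard `hashlib`, reading in 1 MiB chunks so the large archive is never held in memory; the local layout (all files in one folder under their listed names) and the helper names are assumptions.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing.
EXPECTED_MD5 = {
    "alt_masks.zip": "0140a8b500d85cc8553ec8ba0a304bde",
    "classification_tags.csv": "6911e5a8915daf9a98638eb21ba4afd3",
    "masks.zip": "c955efe74c52d07f8e8bb02d5143e182",
    "README.pdf": "48fb6afa0195a3736d4ce122d007be36",
    "shapefiles.zip": "3ed79b74eb84431e68f764457b1f00ac",
    "subscenes.zip": "0ad1de0ebeaff529782f456cad2e966f",
    "thumbnails.zip": "ac054a7940e0680e768bdd824e0ee8af",
}


def md5_of(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, streamed in chunks of chunk_size bytes."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_downloads(directory):
    """Return {filename: True | False | None}; None means the file is missing."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.exists() else None
    return results
```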

Additional details

Funding
UK Research and Innovation: Data mining for surface change discovery on the lunar surface from orbital images (1912521)
