CloudSEN12 - a global dataset for semantic understanding of cloud and cloud shadow in Sentinel-2

Cesar Aybar; Luis Ysuhuaylas; Jhomira Loja; Karen Gonzales; Fernando Herrera; Lesly Bautista; Roy Yali; Angie Flores; Lissette Diaz; Nicole Cuenca; Wendy Espinoza; Fernando Prudencio; Joselyn Inga; Valeria Llactayo; David Montero; Martin Sudmanns; Dirk Tiede; Gonzalo Mateo-García; Luis Gómez-Chova

doi:10.5281/zenodo.7034410

Published August 31, 2022 | Version 1.0

Dataset | Restricted

1. ZGIS Salzburg University
2. National University of San Marcos
3. Research Group on Artificial Intelligence, Pontifical Catholic University of Peru
4. Sub-directorate of Atmospheric and Hydrospheric Sciences, Geophysical Institute of Peru
5. Remote Sensing Centre for Earth Systems Research (RSC4Earth)
6. Image Processing Laboratory, University of Valencia

Description

CloudSEN12 is a large dataset for cloud semantic understanding that consists of 9880 regions of interest (ROIs). Each ROI has five image patches (IPs) of 5090 x 5090 meters collected on different dates; we manually chose the images to guarantee that each IP inside an ROI matches one of the following cloud cover groups:

- clear (0%)
- low-cloudy (1% - 25%)
- almost clear (25% - 45%)
- mid-cloudy (45% - 65%)
- cloudy (> 65%)
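
The grouping above can be read as a threshold rule on the cloud cover percentage. The sketch below is illustrative only: the function name `cloud_cover_group` is hypothetical, and because adjacent ranges share an endpoint (25%, 45%, 65%), it assumes that each shared boundary value belongs to the lower group and that values between 0% and 1% count as low-cloudy. The images in CloudSEN12 were chosen manually, so this is not the selection procedure used to build the dataset.

```python
def cloud_cover_group(percent: float) -> str:
    """Map a cloud cover percentage (0-100) to a cloud cover group name.

    Assumption: shared boundary values (25, 45, 65) fall in the lower group,
    and anything above 0 up to 25 is treated as low-cloudy.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("cloud cover must be between 0 and 100")
    if percent == 0.0:
        return "clear"
    if percent <= 25.0:
        return "low-cloudy"
    if percent <= 45.0:
        return "almost clear"
    if percent <= 65.0:
        return "mid-cloudy"
    return "cloudy"


# Example: one image patch per group, as in a single ROI.
for value in (0, 10, 30, 50, 80):
    print(value, cloud_cover_group(value))
```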

An IP is the core unit in CloudSEN12. Each IP contains data from Sentinel-2 optical levels 1C and 2A, Sentinel-1 Synthetic Aperture Radar (SAR), a digital elevation model, surface water occurrence, land cover classes, and cloud mask results from eight cutting-edge cloud detection algorithms. To support standard, weakly, and self-/semi-supervised learning procedures, CloudSEN12 also includes three distinct forms of hand-crafted labelling data: high-quality, scribble, and no annotation. Consequently, each ROI is randomly assigned to one of three annotation groups:

- 2000 ROIs with pixel-level annotation, where the average annotation time is 150 minutes (high-quality group).
- 2000 ROIs with scribble-level annotation, where the annotation time is 15 minutes (scribble group).
- 5880 ROIs with annotation only in the cloud-free (0%) image (no annotation group).
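
As a quick consistency check, the group sizes above can be added up and combined with the five image patches per ROI. The sketch below uses only the figures stated in this description, except for `PIXEL_SIZE_M`, which is an assumption (the 10 m Sentinel-2 grid) used to convert the 5090 m patch side into pixels; all variable names are illustrative.

```python
# Annotation group sizes as stated in the description.
ROIS_PER_GROUP = {"high-quality": 2000, "scribble": 2000, "no annotation": 5880}
IPS_PER_ROI = 5        # five image patches per ROI
PATCH_SIDE_M = 5090    # patch side length in meters
PIXEL_SIZE_M = 10      # assumption: 10 m Sentinel-2 pixel grid

total_rois = sum(ROIS_PER_GROUP.values())
total_ips = total_rois * IPS_PER_ROI
patch_side_px = PATCH_SIDE_M // PIXEL_SIZE_M

print(total_rois)     # 9880
print(total_ips)      # 49400
print(patch_side_px)  # 509
```

The total of 9880 ROIs matches the figure given at the start of the description.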

For high-quality labels, we used the Intelligence foR Image Segmentation (IRIS) active learning technology, a system that combines human photo-interpretation and machine learning. For scribble labels, ground truth pixels were drawn using IRIS but without machine learning support. Finally, the no annotation dataset is generated automatically, with manual annotation only in the clear image patch. The dataset is available here: https://shorturl.at/cgjtz. See our website https://cloudsen12.github.io/ for examples of how to download the dataset via STAC.

Files

Restricted

The record is publicly accessible, but files are restricted.

CC BY-NC 4.0

	All versions	This version
Views	2,364	1,885
Downloads	266	234
Data volume	271.3 MB	119.5 MB
