CNES ALCD Open water masks

doi:10.5281/zenodo.4657020

Published October 29, 2019 | Version 1.2

Dataset Open

1. CNES
2. UNISTRA - ICube
3. UNISTRA - ICube
4. C-S Group

Producer:

NICOLAS Gael¹

Researcher:

RISSER-MAROIX Olivier

1. C-S

"CNES ALCD Open water masks" is a reference dataset for water masks based on Sentinel-2 (L1C) images.

The generation of this dataset was funded by CNES under the SWOT-Downstream programme.

Generation Method

This dataset was generated with the Active Learning for Cloud Detection (ALCD) software developed by CNES/Cesbio, which can be used to generate any kind of reference mask from satellite images.

Generating each reference image takes between 1 and 2 hours of work: manually create reference points on the image (water, land, cloud, snow...), run the training (based on the OTB Random Forest classifier) and the prediction with ALCD, add new reference points in the most problematic areas, repeat the training and prediction as many times as necessary (usually 3-5 iterations), and finally correct the remaining errors manually.

Dataset format (raw masks)

The dataset contains 26 files (scenes) at 10 m resolution, each covering 110 km x 110 km.

Pixel values in the scene files (GeoTIFF) are encoded as follows:
     0 = Non-water observation (e.g. land, snow)
     1 = Open water observation
     255 = No data (e.g. clouds)
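
As an illustration of how these three codes can be used, the sketch below computes the share of open water among the valid pixels of a scene. It is only a minimal example: the function name `water_fraction` and the sample values are hypothetical, and reading the GeoTIFF itself (with whichever raster library is at hand) is left out.

```python
from collections import Counter

# Pixel codes as documented for the mask files.
NON_WATER = 0
OPEN_WATER = 1
NO_DATA = 255


def water_fraction(pixels):
    """Return the share of valid pixels that are open water.

    `pixels` is any iterable of integer pixel values, for example a
    flattened raster band converted to a list. No-data pixels (255) are
    excluded from the denominator. Returns None if no pixel is valid.
    """
    counts = Counter(pixels)
    valid = counts[NON_WATER] + counts[OPEN_WATER]
    if valid == 0:
        return None
    return counts[OPEN_WATER] / valid


# Hypothetical sample: 3 water pixels, 3 non-water pixels, 2 no-data pixels.
sample = [0, 0, 1, 255, 1, 0, 255, 1]
print(water_fraction(sample))
```

Excluding the value 255 from the denominator keeps cloudy scenes from appearing drier than they are.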

Format of file names:

T{tile}_{YYYYMMDD}_{site}_{season}.tif

where: tile = reference Sentinel-2 tile (Cesbio post), YYYYMMDD = date of the Sentinel-2 acquisition, site = name of the site, season = summer or winter

Examples:

T30TXQ_20180201_Bordeaux_winter.tif

T30UXU_20180708_Bretagne_summer.tif
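
The naming rule above can be parsed mechanically. The following sketch is one possible way to do it. The regular expression and the function name `parse_mask_name` are illustrative, and the tile pattern (two digits followed by three uppercase letters) is an assumption based on the two example names.

```python
import re
from datetime import datetime

# Assumed pattern: T + tile (2 digits, 3 letters), date, site, season.
FILENAME_PATTERN = re.compile(
    r"^T(?P<tile>[0-9]{2}[A-Z]{3})"
    r"_(?P<date>[0-9]{8})"
    r"_(?P<site>.+)"
    r"_(?P<season>summer|winter)\.tif$"
)


def parse_mask_name(filename):
    """Split a mask file name into tile, acquisition date, site and season."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected mask file name: {filename!r}")
    return {
        "tile": match["tile"],
        # strptime also rejects impossible dates such as 20181345.
        "date": datetime.strptime(match["date"], "%Y%m%d").date(),
        "site": match["site"],
        "season": match["season"],
    }


info = parse_mask_name("T30TXQ_20180201_Bordeaux_winter.tif")
print(info["tile"], info["date"].isoformat(), info["site"], info["season"])
```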

Dataset format (inland masks)

A second version of the dataset, called "inland masks", excludes coastal and ocean waters so as to characterize inland waters only.

These masks were produced using the GSHHG coastline layers (https://www.soest.hawaii.edu/pwessel/gshhg/) at the H (high) resolution, eroded by 400 m towards the continent.
Thus, any pixel on the seaward side of the GSHHG coastline, or on land within 400 m of it, is set to "no data" (value = 255).

The pixel values and file naming follow the same rules as in the "raw masks" version.
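
To make the 400 m rule concrete, here is a minimal per-pixel sketch. It is not the processing chain used to build the dataset. It assumes a hypothetical, precomputed signed distance to the GSHHG coastline for each pixel (positive on land, negative at sea), and the function name `to_inland_value` is likewise illustrative.

```python
NO_DATA = 255
EROSION_M = 400  # erosion distance stated for the inland masks


def to_inland_value(raw_value, signed_distance_m):
    """Apply the inland-mask rule to one pixel of a raw mask.

    `signed_distance_m` is an assumed input: the distance from the pixel
    to the coastline in metres, positive on land and negative at sea.
    Pixels at sea, or on land closer than 400 m to the coast, become
    no data; all other pixels keep their raw value (0, 1 or 255).
    """
    if signed_distance_m < EROSION_M:
        return NO_DATA
    return raw_value


print(to_inland_value(1, -50))    # at sea: becomes no data
print(to_inland_value(1, 250))    # on land but within 400 m: becomes no data
print(to_inland_value(1, 1500))   # inland water: kept as 1
```

At 10 m resolution, the 400 m erosion corresponds to a band about 40 pixels wide along the coast.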

Generation process (White book)

This section briefly describes how to use the Active Learning for Cloud Detection (ALCD) software efficiently for surface water detection and extraction.

Input data
Sentinel‐2 bands:
‐ B2 : Blue
‐ B3 : Green
‐ B4 : Red
‐ B8 : NIR
‐ B11 : SWIR1
‐ B12 : SWIR2
‐ MNDWI
‐ Slope: derived from SRTM.
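
The list does not say how MNDWI is computed. The sketch below uses the usual definition of the Modified Normalized Difference Water Index, (green - SWIR) / (green + SWIR). Taking B3 as green and B11 as SWIR is an assumption, since the source does not name the bands. B11 is natively at 20 m, so it would also need resampling to the 10 m grid first; that step is not shown.

```python
def mndwi(green, swir1):
    """Modified Normalized Difference Water Index for one pixel.

    `green` and `swir1` are reflectances, assumed here to come from
    Sentinel-2 bands B3 and B11. Returns None when both values are zero,
    where the index is undefined.
    """
    denominator = green + swir1
    if denominator == 0:
        return None
    return (green - swir1) / denominator


# Hypothetical reflectances: water is usually brighter in green than in SWIR,
# which gives a positive index; dry land tends to give a negative one.
print(mndwi(0.08, 0.02))
print(mndwi(0.10, 0.30))
```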

Image visual analysis
It is recommended to go through the image visually to identify the different types of water bodies and land covers present on the scene.

Choosing samples
So far, only the « water_1pixel.shp » and « land_1pixel.shp » layers have been used to hold the samples. They are more precise because they identify the individual pixels used to train the algorithm. Although these layers require a larger number of sampling points, they make the selection more precise and give greater control over the input.

During the first iteration:
‐ Choose around 15 sampling points representing the diversity of land cover present in the scene (different types of water, turbidity, etc.).
‐ Prefer “pure” pixels.

Further recommendations
‐ Try not to exceed 7 iterations.
  - Multiplying iterations means a higher number of samples, which increases the chances of introducing false samples and so compromising the classification quality.
  - The aim is to find the best compromise between omission and commission.
  - The goal is not a perfect classification but the classification that minimizes the post‐processing time.
‐ Adding new samples may degrade the classification. For example, labelling a dark shadow as “not water” when the pixel is spectrally very close to a “water” pixel will disturb the model and lead to a classification of lower quality than the previous iteration. If an iteration gives a significantly worse result than the previous one, it may be wiser to restart from the previous iteration than to continue with the problematic samples.
‐ As noted above, avoid adding any type of shadow to the « not water » class. Such pixels reduce the model's ability to extract water. During production, it is easier to stick to one type of error, usually commission, and make sure that all water bodies are correctly classified. Falsely detected shadows are easy to correct during the post‐processing step.
‐ More generally, commission errors are easier to deal with than omission errors.
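
The omission/commission trade-off discussed above can be quantified when a classification is compared with a reference mask. The sketch below is a hypothetical helper, not part of ALCD. It uses the standard definitions for the water class and assumes both inputs follow the 0 / 1 / 255 encoding of this dataset.

```python
NO_DATA = 255
WATER = 1


def error_rates(reference, predicted):
    """Return (omission_rate, commission_rate) for the water class.

    Omission: reference water pixels that were not predicted as water,
    divided by the number of reference water pixels.
    Commission: predicted water pixels that are not water in the
    reference, divided by the number of predicted water pixels.
    Pixel pairs where either value is no data are skipped. A rate is
    None when its denominator is zero.
    """
    ref_water = pred_water = omitted = committed = 0
    for ref, pred in zip(reference, predicted):
        if ref == NO_DATA or pred == NO_DATA:
            continue
        if ref == WATER:
            ref_water += 1
            if pred != WATER:
                omitted += 1
        if pred == WATER:
            pred_water += 1
            if ref != WATER:
                committed += 1
    omission = omitted / ref_water if ref_water else None
    commission = committed / pred_water if pred_water else None
    return omission, commission


# Hypothetical flattened masks: one missed water pixel, one false detection.
reference = [1, 1, 1, 1, 0, 0, 0, 255]
predicted = [1, 1, 1, 0, 1, 0, 0, 1]
print(error_rates(reference, predicted))
```

Tracking both rates between iterations makes it easier to notice when newly added samples have degraded the classification.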
