Published October 29, 2019 | Version 1.2
Dataset Open

CNES ALCD Open water masks

  • 1. CNES
  • 2. UNISTRA - ICube
  • 3. UNISTRA - ICube
  • 4. C-S Group

Contributors

Producer:

  • 1. C-S

Description

"CNES ALCD Open water masks" is a reference dataset for water masks based on Sentinel-2 (L1C) images.

This dataset generation has been funded by CNES under the SWOT-Downstream programme.

Generation Method

This dataset was generated with the Active Learning for Cloud Detection (ALCD) software developed by CNES/Cesbio, which can produce any kind of reference mask from satellite images.

Generating each reference image takes between 1 and 2 hours of work: manually create reference points on the image (water, land, cloud, snow...), run the training (based on the OTB Random Forest classifier) and the prediction with ALCD, add new reference points in the most problematic areas, repeat the training and prediction as many times as necessary (usually 3-5 iterations), and finally correct the persistent errors manually.

Dataset format (raw masks)

The dataset contains 26 files (scenes) at 10 m resolution, each covering 110 km x 110 km.

The pixel values of the scene files (GeoTIFF) are coded as follows:
     0 = Non Water observation (e.g. land, snow)
     1 = Open Water observation
     255 = no data (e.g. clouds)
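
As an illustration of this coding, the sketch below computes the share of open water among the valid pixels of a scene. It assumes the pixel values have already been read into a flat sequence of integers (for example with a GeoTIFF reader such as rasterio or GDAL, which is not shown here); the function name water_fraction is hypothetical and not part of the dataset or of ALCD.

```python
from collections import Counter

# Pixel codes as documented for the scene files.
NON_WATER, OPEN_WATER, NO_DATA = 0, 1, 255


def water_fraction(pixels):
    """Return the share of valid (non-255) pixels labelled as open water.

    `pixels` is any iterable of integer pixel values. Returns None when
    the scene contains no valid observation at all.
    """
    counts = Counter(pixels)
    unexpected = set(counts) - {NON_WATER, OPEN_WATER, NO_DATA}
    if unexpected:
        raise ValueError(f"unexpected pixel values: {sorted(unexpected)}")
    valid = counts[NON_WATER] + counts[OPEN_WATER]
    if valid == 0:
        return None  # nothing but "no data" pixels
    return counts[OPEN_WATER] / valid
```

Pixels set to 255 are left out of the denominator, so cloudy areas do not lower the reported water share.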

Format of file names:

       T{tile}_{YYYYMMDD}_{site}_{season}.tif

       where : tile = reference Sentinel 2 tile (Cesbio post), YYYYMMDD = date of Sentinel 2 acquisition, site = name of the site, season = summer, winter

Example :  T30TXQ_20180201_Bordeaux_winter.tif

             T30UXU_20180708_Bretagne_summer.tif
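
The naming rule can be checked and decomposed with a regular expression. The sketch below is only an illustration: the function name parse_mask_name is hypothetical, and the pattern assumes that the tile is a standard Sentinel-2 tile identifier (two digits followed by three letters) and that the season is either "summer" or "winter".

```python
import re
from datetime import datetime

# T{tile}_{YYYYMMDD}_{site}_{season}.tif
# Assumption: tile = 2 digits + 3 capital letters, as in the examples.
MASK_NAME = re.compile(
    r"^T(?P<tile>[0-9]{2}[A-Z]{3})"
    r"_(?P<date>[0-9]{8})"
    r"_(?P<site>.+)"
    r"_(?P<season>summer|winter)\.tif$"
)


def parse_mask_name(name):
    """Split a mask file name into tile, acquisition date, site and season."""
    match = MASK_NAME.match(name)
    if match is None:
        raise ValueError(f"file name does not follow the naming rule: {name!r}")
    return {
        "tile": match["tile"],
        # strptime also rejects impossible dates such as 20181345.
        "date": datetime.strptime(match["date"], "%Y%m%d").date(),
        "site": match["site"],
        "season": match["season"],
    }
```

For instance, parse_mask_name("T30TXQ_20180201_Bordeaux_winter.tif") gives the tile 30TXQ, the date 2018-02-01, the site Bordeaux and the season winter.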


Dataset format (inland masks)

The dataset also comes in a version without coastal and ocean waters, called "inland masks", intended to characterize inland waters only.

The inland masks were produced using the coastlines of the GSHHG layers (https://www.soest.hawaii.edu/pwessel/gshhg/) at H (high resolution) level, eroded by 400 m towards the continent.
Thus, any pixel less than 400 m inland of the GSHHG coastline, or seaward of it, is set to "no data" (value=255).
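
The effect of the 400 m erosion can be pictured on the pixel grid: at 10 m resolution, 400 m corresponds to 40 pixels. The sketch below is not the procedure actually used to build the inland masks, which is not described in detail here. It assumes that the coastline has already been rasterized into a boolean grid (True on the land side), and it measures distance in 4-connected pixel steps, which only approximates a true 400 m buffer. The function name inland_mask and its arguments are hypothetical.

```python
from collections import deque

NO_DATA = 255


def inland_mask(raw, land, erosion_px):
    """Set to NO_DATA every pixel that is sea or within erosion_px steps of sea.

    raw        : 2D list of pixel codes (0, 1 or 255)
    land       : 2D list of bool, True on the land side of the coastline
    erosion_px : erosion width in pixels (40 for 400 m at 10 m resolution)
    """
    rows, cols = len(raw), len(raw[0])
    # Multi-source breadth-first search starting from all sea pixels.
    # dist stays None for pixels farther than erosion_px steps from the sea.
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if not land[r][c]:
                dist[r][c] = 0
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        if dist[r][c] >= erosion_px:
            continue  # do not propagate beyond the erosion width
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return [
        [NO_DATA if dist[r][c] is not None else raw[r][c] for c in range(cols)]
        for r in range(rows)
    ]
```

Because step counts along the grid are never shorter than straight-line distances, this sketch masks slightly fewer pixels along diagonal coastlines than an exact 400 m buffer would.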

The format of the pixels content and file naming follow the same rules as in the "raw masks" version.


Generation process (White book)

A short description of an efficient usage of the Active Learning for Cloud Detection (ALCD) for surface water detection and extraction.

Input data
  Sentinel‐2 bands:
‐ B2 : Blue
‐ B3 : Green
‐ B4 : Red
‐ B8 : NIR
‐ B11 : SWIR1
‐ B12 : SWIR2
‐ MNDWI
‐ Slope: derived from SRTM.
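
For reference, MNDWI (Modified Normalized Difference Water Index) is usually defined as (Green - SWIR) / (Green + SWIR). The sketch below assumes the common Sentinel-2 choice of B3 for Green and B11 for SWIR; the list above does not state which SWIR band ALCD uses, so this pairing is an assumption, and the function name mndwi is hypothetical.

```python
def mndwi(green, swir):
    """Modified Normalized Difference Water Index for one pixel.

    green : reflectance in the green band (assumed here: Sentinel-2 B3)
    swir  : reflectance in a SWIR band (assumed here: Sentinel-2 B11)
    Returns None when both reflectances are zero (index undefined).
    """
    total = green + swir
    if total == 0:
        return None
    return (green - swir) / total
```

With this definition, open water tends to give positive values, since water reflects more in the green band than in the SWIR.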

Image visual analysis
  It is recommended to go through the image visually to identify the different types of water bodies and land covers present on the scene.

Choosing samples
  So far, only the « water_1pixel.shp » and « land_1pixel.shp » layers have been used to hold the samples. They identify the exact pixels that will be used to train the algorithm. Even though using these layers requires a larger number of sampling points, they make the selection more precise and give more control over the input.

During the first iteration:
‐ Choose around 15 sampling points representing the land cover diversity present in the scene (different types of water, turbidity etc.).
‐ Prefer “pure” pixels.

Further recommendations
‐ Try not to exceed 7 iterations
- Multiplying iterations means a higher number of samples, which increases the chances of introducing false samples and then compromising the classification quality.
- The aim is to find the best compromise between omission and commission.
- It’s not about having a perfect classification but rather finding the classification that will minimize the post‐processing time.
‐ Adding new samples may degrade the classification. For example, using a dark shadow as “not water”, even though the pixel is spectrally very close to a “water” pixel, will disturb the model and lead to a classification of lesser quality than the previous iteration. If the result of an iteration is significantly worse than the previous one, it may be wiser to start again from that previous iteration rather than to continue with the problematic added samples.
‐ As said previously, the user should avoid adding any type of shadow to the « not water » class. Using such pixels will make the model less efficient at extracting water. During production, it’s easier to stick to one type of error, usually commissions, and to make sure that all water bodies are correctly classified. Falsely detected shadows can be corrected easily during the post‐processing step.
‐ More generally, it’s easier to deal with commissions than with omissions.

 

Files

CNES_ALCD_Open_water_masks_v1.1.zip

Files (31.5 MB)

md5:ce06bff9318446c61c0cc6224050a28b (13.0 MB)
md5:4bb9860642a6789a7529d103420da7e3 (18.5 MB)
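
The md5 values listed above can be used to check a download. The sketch below is a generic check based on the Python standard library; the function names are hypothetical, and since the listing does not say which file each checksum belongs to, the file path and the expected value have to be supplied by the user.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file with a listed checksum such as 'md5:ce06...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A mismatch usually means that the download is incomplete or corrupted and should be repeated.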