Published February 16, 2024 | Version v1
Dataset Open

MADOS - Marine Debris and Oil Spill

  • 1. Hellenic Centre for Marine Research
  • 2. National Technical University of Athens
  • 3. King Abdullah University of Science and Technology (KAUST)

Description

Marine Debris and Oil Spill (MADOS) is a marine pollution dataset based on Sentinel-2 remote sensing data, focusing on marine litter and oil spills. Other sea surface features that coexist with or have been suggested to be spectrally similar to them have also been considered. MADOS formulates a challenging semantic segmentation task using sparse annotations.

Citation: Kikaki K., Kakogeorgiou I., Hoteit I., Karantzalos K. Detecting Marine Pollutants and Sea Surface Features with Deep Learning in Sentinel-2 Imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 2024.

For the implementation code, pre-trained models and exploratory analysis, visit our project page: https://marine-pollution.github.io/

 

MADOS Overview & Structure

MADOS is structured in 174 scene folders, named Scene_0 through Scene_173, each corresponding to a unique Sentinel-2 (S2) scene. Each folder contains multiple image crops of that scene, indicated by the `_CROP` identifier. The dataset contains 2803 crops (240x240) in total.

The next level of the hierarchy separates imagery by spatial resolution into three folders: 10, 20 and 60, denoting 10 m, 20 m and 60 m resolution data, respectively. This keeps each S2 band at its native resolution, supporting a wide range of applications (e.g., pansharpening).
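Because the bands are stored at different resolutions, stacking them requires bringing them to a common grid first. The sketch below shows one simple option, nearest-neighbour replication, on a plain list-of-rows raster. It assumes that a 20 m crop covers the same footprint as its 10 m counterpart with half as many pixels per side (factor 2, and factor 6 for 60 m); that assumption, the function name `upsample_nearest` and the toy values are illustrative only, and this is not claimed to be the resampling used by the authors.

```python
from typing import List, Sequence


def upsample_nearest(band: Sequence[Sequence[float]], factor: int) -> List[List[float]]:
    """Repeat every pixel `factor` times along both axes (nearest neighbour)."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out: List[List[float]] = []
    for row in band:
        # Widen the row: each value is repeated `factor` times.
        wide = [value for value in row for _ in range(factor)]
        # Repeat the widened row `factor` times, as independent copies.
        out.extend(list(wide) for _ in range(factor))
    return out


# Toy 2x2 "20 m" band brought to a 4x4 "10 m" grid (factor 2 is an assumption).
coarse = [[0.1, 0.2], [0.3, 0.4]]
fine = upsample_nearest(coarse, 2)
```

Nearest-neighbour replication leaves the reflectance values unchanged, which is convenient for a first look; interpolation or pansharpening would be a separate design choice.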

The 10 folder contains crops of S2 10 m resolution bands (`492` nm, `560` nm, `665` nm, `833` nm) with Rayleigh corrected reflectance values along with the corresponding masks of pixel-level annotations (_cl), confidence levels (_conf) and reports (_rep). RGB images (_rgb) and water turbidity outputs extracted by ACOLITE (_TUR_Dogliotti, _TUR_Nechad2016) are also provided.

The 20 and 60 folders follow a similar pattern, containing cropped images of S2 bands relevant to their respective resolutions. Note that for the 20m resolution, we also provide aggregated mask annotations (_cl), confidence levels (_conf) and reports (_rep).

At the root folder of MADOS, there is a `splits` folder containing three text files: train_X.txt, val_X.txt, and test_X.txt. These files describe the division of the dataset into training, validation and testing sets, respectively, crucial for structuring machine learning experiments to evaluate model performance.
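A minimal sketch of loading the three split files is given below. The file names come from the description above; the assumption that each file lists one entry per line is ours, and the format of an entry is not specified here, so entries are kept as opaque strings. The helper names `read_splits` and `check_disjoint` are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def read_splits(root: str) -> Dict[str, List[str]]:
    """Return the non-empty lines of each split file under <root>/splits."""
    splits: Dict[str, List[str]] = {}
    for name in ("train", "val", "test"):
        path = Path(root) / "splits" / f"{name}_X.txt"
        # Assumption: one entry per line; blank lines are ignored.
        lines = path.read_text(encoding="utf-8").splitlines()
        splits[name] = [line.strip() for line in lines if line.strip()]
    return splits


def check_disjoint(splits: Dict[str, List[str]]) -> bool:
    """True when no entry appears in more than one split."""
    seen = set()
    for entries in splits.values():
        unique = set(entries)
        if seen & unique:
            return False
        seen |= unique
    return True
```

Calling `check_disjoint(read_splits("MADOS"))` before training is a cheap sanity check that no entry is shared between the training, validation and testing sets.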

 

Folder structure

└── MADOS
    ├── Scene_0
    │   ├── 10
    │   │   ├── Scene_0_L2R_rhorc_492_CROP.tif
    │   │   ├── Scene_0_L2R_rhorc_560_CROP.tif
    │   │   ├── Scene_0_L2R_rhorc_665_CROP.tif
    │   │   ├── Scene_0_L2R_rhorc_833_CROP.tif
    │   │   ├── Scene_0_L2R_cl_CROP.tif
    │   │   ├── Scene_0_L2R_conf_CROP.tif
    │   │   ├── Scene_0_L2R_rep_CROP.tif
    │   │   ├── Scene_0_L2R_rgb_CROP.png
    │   │   ├── Scene_0_L2W_TUR_Dogliotti_CROP.tif
    │   │   ├── Scene_0_L2W_TUR_Nechad2016_665_CROP.tif
    │   │   └── ...
    │   ├── 20
    │   │   ├── Scene_0_L2R_rhorc_704_CROP.tif
    │   │   ├── Scene_0_L2R_rhorc_783_CROP.tif
    │   │   ├── Scene_0_L2R_rhorc_865_CROP.tif
    │   │   ├── Scene_0_L2R_rhorc_1614_CROP.tif
    │   │   ├── Scene_0_L2R_rhorc_2202_CROP.tif
    │   │   ├── Scene_0_L2R_cl_CROP.tif
    │   │   ├── Scene_0_L2R_conf_CROP.tif
    │   │   ├── Scene_0_L2R_rep_CROP.tif
    │   │   └── ...
    │   └── 60
    │       ├── Scene_0_L2R_rhorc_443_CROP.tif
    │       └── ...
    ├── ...
    ├── Scene_173
    │   ├── 10
    │   │   └── ...
    │   ├── 20
    │   │   └── ...
    │   └── 60
    │       └── ...
    └── splits
        ├── test_X.txt
        ├── train_X.txt
        └── val_X.txt
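The file names in the tree above follow a regular pattern, which can be split into its parts as sketched below. This is an illustration under assumptions: the pattern `Scene_<n>_<level>_<product>_<crop>.<ext>` is inferred from the listing, the crop identifier is assumed to be the last underscore-separated token and is kept as an opaque string, and the names `CropFile`, `parse_name` and `group_by_crop` are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, NamedTuple


class CropFile(NamedTuple):
    scene: str    # e.g. "Scene_0"
    level: str    # e.g. "L2R" or "L2W"
    product: str  # e.g. "rhorc_492", "cl", "TUR_Dogliotti"
    crop: str     # crop identifier, kept as an opaque string


def parse_name(filename: str) -> CropFile:
    """Split a MADOS file name into scene, level, product and crop parts."""
    parts = Path(filename).stem.split("_")
    if len(parts) < 5 or parts[0] != "Scene":
        raise ValueError(f"unexpected file name: {filename}")
    # The product may itself contain underscores, so take everything
    # between the level and the final (crop) token.
    return CropFile("_".join(parts[:2]), parts[2], "_".join(parts[3:-1]), parts[-1])


def group_by_crop(filenames: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Map each crop identifier to a {product: file name} dictionary."""
    groups: Dict[str, Dict[str, str]] = {}
    for name in filenames:
        info = parse_name(name)
        groups.setdefault(info.crop, {})[info.product] = name
    return groups
```

Grouping by crop makes it straightforward to pair each band file with the `cl`, `conf` and `rep` masks of the same crop.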

 

Mapping

The mapping between Digital Numbers and Classes in _cl files is:

0: Non-annotated
1: Marine Debris
2: Dense Sargassum
3: Sparse Floating Algae
4: Natural Organic Material
5: Ship
6: Oil Spill
7: Marine Water
8: Sediment-Laden Water
9: Foam
10: Turbid Water
11: Shallow Water
12: Waves & Wakes
13: Oil Platform
14: Jellyfish
15: Sea snot
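The class mapping above can be carried into code as a lookup table. The sketch below counts annotated pixels per class in a mask given as rows of Digital Numbers; reading the GeoTIFF itself is left out, and the function name `class_pixel_counts` and the toy mask are illustrative assumptions. Since the annotations are sparse, pixels with value 0 are skipped by default.

```python
from collections import Counter
from typing import Dict, Iterable

CLASS_NAMES = {
    0: "Non-annotated", 1: "Marine Debris", 2: "Dense Sargassum",
    3: "Sparse Floating Algae", 4: "Natural Organic Material", 5: "Ship",
    6: "Oil Spill", 7: "Marine Water", 8: "Sediment-Laden Water", 9: "Foam",
    10: "Turbid Water", 11: "Shallow Water", 12: "Waves & Wakes",
    13: "Oil Platform", 14: "Jellyfish", 15: "Sea snot",
}


def class_pixel_counts(mask: Iterable[Iterable[int]],
                       include_unlabeled: bool = False) -> Dict[str, int]:
    """Count pixels per class name in a _cl mask given as rows of integers."""
    counts = Counter(int(value) for row in mask for value in row)
    result: Dict[str, int] = {}
    for dn, n in sorted(counts.items()):
        if dn == 0 and not include_unlabeled:
            continue  # sparse annotations: skip non-annotated pixels
        result[CLASS_NAMES[dn]] = n  # raises KeyError on an unknown value
    return result


# Toy 3x4 mask, not taken from the dataset.
toy_mask = [[0, 0, 7, 7],
            [0, 1, 7, 7],
            [6, 6, 0, 0]]
counts = class_pixel_counts(toy_mask)
# counts == {"Marine Debris": 1, "Oil Spill": 2, "Marine Water": 4}
```

Any 2-D structure that iterates row by row works here, so a mask loaded by a raster library can be passed in directly.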

 

The mapping between Digital Numbers and Confidence level in _conf files is:

1: High
2: Moderate
3: Low

 

The mapping between Digital Numbers and the existence of a marine debris report in _rep files is:

1: Very close
2: Away
3: No
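One possible use of the confidence masks is to restrict training or evaluation to the most reliable annotations. The sketch below keeps a class label only where the confidence Digital Number is at or below a chosen level and resets everything else to 0 (Non-annotated). It assumes that the _cl and _conf masks of a crop share the same shape and that non-annotated pixels carry a confidence value outside 1 to 3 (for example 0); the function name `keep_confident` and the toy masks are hypothetical.

```python
from typing import List, Sequence

CONFIDENCE = {1: "High", 2: "Moderate", 3: "Low"}
REPORT = {1: "Very close", 2: "Away", 3: "No"}


def keep_confident(cl_mask: Sequence[Sequence[int]],
                   conf_mask: Sequence[Sequence[int]],
                   max_level: int = 1) -> List[List[int]]:
    """Zero out class labels whose confidence DN is not in 1..max_level."""
    if len(cl_mask) != len(conf_mask):
        raise ValueError("masks must have the same number of rows")
    filtered: List[List[int]] = []
    for cl_row, conf_row in zip(cl_mask, conf_mask):
        if len(cl_row) != len(conf_row):
            raise ValueError("masks must have the same number of columns")
        # Lower DN means higher confidence (1 = High, 3 = Low).
        filtered.append([int(c) if 1 <= int(k) <= max_level else 0
                         for c, k in zip(cl_row, conf_row)])
    return filtered


# Toy masks, not taken from the dataset.
toy_cl = [[1, 1, 7], [6, 0, 7]]
toy_conf = [[1, 3, 2], [1, 0, 1]]
high_only = keep_confident(toy_cl, toy_conf, max_level=1)
# high_only == [[1, 0, 0], [6, 0, 7]]
```

Raising `max_level` to 2 also keeps the moderate-confidence pixels, trading label reliability for more annotated samples.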

 

The final uncompressed dataset requires 5.35 GB of storage.

Files

MADOS.zip (4.0 GB)
md5:1076b25d2797be3095d82a61105c7380

Additional details

Additional titles

Subtitle
Detecting Marine Pollutants and Sea Surface Features with Deep Learning in Sentinel-2 Imagery