Published February 4, 2024 | Version v2
Dataset Open

Dataset for "Enhancing Cloud Detection in Sentinel-2 Imagery: A Spatial-Temporal Approach and Dataset"

  • 1. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China, and University of Chinese Academy of Sciences, Beijing 100049, China
  • 2. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China

Description

This dataset is built for cloud detection in Sentinel-2 time series and is stored in the TensorFlow TFRecord format (see https://www.tensorflow.org/tutorials/load_data/tfrecord).

Each file is compressed in 7z format and can be decompressed with Bandizip or 7-Zip.

Dataset Structure:

Each filename consists of three parts separated by underscores. The first part indicates whether the file is designated for training or validation ('train' or 'val'), the second part is the Sentinel-2 tile name, and the last part is the number of samples in the file.
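
The naming convention can be unpacked with a small helper. This is a minimal sketch rather than part of the original tooling: the function name, the example path, and the tile name `T50TMK` are hypothetical, and it assumes that any file extension should be stripped before splitting.

```python
import os

def parse_dataset_filename(path):
    """Split '<split>_<tile>_<n_samples>' into its three parts."""
    # Drop the directory and any extension (assumption: the extension,
    # if present, is not part of the sample count).
    stem = os.path.splitext(os.path.basename(path))[0]
    split, tile, n_samples = stem.split("_")
    if split not in ("train", "val"):
        raise ValueError(f"unexpected split name: {split!r}")
    return split, tile, int(n_samples)

# Hypothetical path, for illustration only.
print(parse_dataset_filename("/data/train_T50TMK_1200"))
# ('train', 'T50TMK', 1200)
```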

Each sample includes:

  1. Sample ID;
  2. Array of time-series four-band image patches at 10 m resolution, shaped as (n_timestamps, 4, 42, 42);
  3. Label list indicating the cloud cover status of the center \(6\times6\) pixels at each timestamp;
  4. Ordinal list giving the date of each timestamp (the 'dates' feature in the parsing code);
  5. Sample weight list (reserved).
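
To make the relation between the \(42\times42\) patch and the labelled \(6\times6\) center concrete, the following NumPy sketch slices out that window. It assumes the window is placed symmetrically (rows and columns 18 to 23), which the description implies but does not state explicitly; the helper name and the dummy array are illustrative only.

```python
import numpy as np

PATCH = 42   # patch height/width in pixels (from the dataset description)
CENTER = 6   # side length of the labelled center window

def center_window(imgs):
    """Return the center 6x6 window of an array shaped (n_timestamps, 4, 42, 42)."""
    start = (PATCH - CENTER) // 2   # 18, assuming a symmetric window
    stop = start + CENTER           # 24
    return imgs[..., start:stop, start:stop]

# Dummy series with 5 timestamps (placeholder values, not real reflectances).
imgs = np.zeros((5, 4, PATCH, PATCH), dtype=np.uint16)
print(center_window(imgs).shape)  # (5, 4, 6, 6)
```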

Here is a demonstration function for parsing the TFRecord file:

import tensorflow as tf

# Build a tf.data.Dataset of (tile name, serialized record) pairs from a file name
def parseRecordDirect(fname):
    sep = '/'
    parts = tf.strings.split(fname,sep)
    tn = tf.strings.split(parts[-1],sep='_')[-2]
    nn = tf.strings.to_number(tf.strings.split(parts[-1],sep='_')[-1],tf.dtypes.int64)
    t = tf.data.Dataset.from_tensors(tn).repeat().take(nn)
    t1 = tf.data.TFRecordDataset(fname)
    ds = tf.data.Dataset.zip((t, t1))
    return ds

keys_to_features_direct = {
    'localid': tf.io.FixedLenFeature([], tf.int64, -1),
    'image_raw_ldseries': tf.io.FixedLenFeature((), tf.string, ''),
    'labels': tf.io.FixedLenFeature((), tf.string, ''),
    'dates': tf.io.FixedLenFeature((), tf.string, ''),
    'weights': tf.io.FixedLenFeature((), tf.string, '')
}

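# The Decoder (optional)
# Assumption: `decoder` is the module from the TensorFlow Model Garden
# (package tf-models-official); the original snippet does not import it.
from official.vision.dataloaders import decoder

class SeriesClassificationDirectDecorder(decoder.Decoder):
  """A tf.Example decoder for the time-series cloud detection samples."""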
  def __init__(self) -> None:
    super().__init__()

  def decode(self, tid, ds):
    parsed = tf.io.parse_single_example(ds, keys_to_features_direct)
    encoded = parsed['image_raw_ldseries']
    labels_encoded = parsed['labels']
    decoded = tf.io.decode_raw(encoded, tf.uint16)
    label = tf.io.decode_raw(labels_encoded, tf.int8)
    dates = tf.io.decode_raw(parsed['dates'], tf.int64)
    weight = tf.io.decode_raw(parsed['weights'], tf.float32)
    decoded = tf.reshape(decoded,[-1,4,42,42])
    sample_dict = {
      'tid': tid, # tile ID
      'dates': dates, # Date list
      'localid': parsed['localid'], # sample ID
      'imgs': decoded, # image array
      'labels': label, # label list
      'weights': weight
    }
    return sample_dict

# Simple parsing function (does not require the Decoder class)
def preprocessDirect(tid, record):
    parsed = tf.io.parse_single_example(record, keys_to_features_direct)
    encoded = parsed['image_raw_ldseries']
    labels_encoded = parsed['labels']
    decoded = tf.io.decode_raw(encoded, tf.uint16)
    label = tf.io.decode_raw(labels_encoded, tf.int8)
    dates = tf.io.decode_raw(parsed['dates'], tf.int64)
    weight = tf.io.decode_raw(parsed['weights'], tf.float32)
    decoded = tf.reshape(decoded,[-1,4,42,42])
    return tid, dates, parsed['localid'], decoded, label, weight

# Replace 'filename here' with the path to a decompressed TFRecord file
t1 = parseRecordDirect('filename here')
dataset = t1.map(preprocessDirect, num_parallel_calls=tf.data.AUTOTUNE)
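
The byte layout implied by the `tf.io.decode_raw` calls can also be reproduced with NumPy, which may help when inspecting a sample outside a TensorFlow pipeline. This is a sketch under stated assumptions: little-endian byte order (the default of `tf.io.decode_raw`), the dtypes used in the demonstration code, and one label, date, and weight per timestamp. The function name `decode_sample_bytes` is hypothetical.

```python
import numpy as np

def decode_sample_bytes(image_bytes, label_bytes, date_bytes, weight_bytes):
    """Decode the raw byte strings of one sample into NumPy arrays."""
    imgs = np.frombuffer(image_bytes, dtype="<u2").reshape(-1, 4, 42, 42)  # uint16
    labels = np.frombuffer(label_bytes, dtype=np.int8)
    dates = np.frombuffer(date_bytes, dtype="<i8")       # int64 ordinals
    weights = np.frombuffer(weight_bytes, dtype="<f4")   # float32
    return imgs, labels, dates, weights

# Round trip on synthetic data with 3 timestamps (not real dataset values).
n = 3
src = np.arange(n * 4 * 42 * 42, dtype="<u2").reshape(n, 4, 42, 42)
imgs, labels, dates, weights = decode_sample_bytes(
    src.tobytes(),
    np.array([0, 1, 4], dtype=np.int8).tobytes(),
    np.array([10, 20, 30], dtype="<i8").tobytes(),
    np.ones(n, dtype="<f4").tobytes(),
)
print(imgs.shape, labels.tolist())  # (3, 4, 42, 42) [0, 1, 4]
```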

Class Definition:

  • 0: clear
  • 1: opaque cloud
  • 2: thin cloud
  • 3: haze
  • 4: cloud shadow
  • 5: snow
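
For inspection, the class IDs can be mapped to their names. The lookup table below mirrors the list above; the `label_histogram` helper is a hypothetical convenience, not part of the dataset tooling, and it ignores any value outside 0 to 5.

```python
from collections import Counter

# Class IDs as defined in the list above.
CLASS_NAMES = {
    0: "clear",
    1: "opaque cloud",
    2: "thin cloud",
    3: "haze",
    4: "cloud shadow",
    5: "snow",
}

def label_histogram(labels):
    """Count how often each class occurs in a label list."""
    counts = Counter(int(v) for v in labels)
    return {name: counts.get(cid, 0) for cid, name in CLASS_NAMES.items()}

print(label_histogram([0, 0, 1, 4, 2, 0]))
# {'clear': 3, 'opaque cloud': 1, 'thin cloud': 1, 'haze': 0, 'cloud shadow': 1, 'snow': 0}
```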

Dataset Construction:

First, we randomly generate 500 points for each tile. All points are aligned to the pixel centers of the 60 m resolution subdatasets (e.g., B10) for consistency when comparing with other products.
This is because other cloud detection methods may use the cirrus band, which has 60 m resolution, as a feature.
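
The alignment step can be sketched as snapping a map coordinate to the center of the 60 m pixel that contains it. The exact procedure used to build the dataset is not given, so this is only an illustration: the function name and the tile origin `(ULX, ULY)` are hypothetical, and a north-up grid in projected coordinates is assumed.

```python
import math

def snap_to_pixel_center(x, y, ulx, uly, res=60.0):
    """Snap (x, y) to the center of the containing pixel of a north-up grid.

    ulx, uly: upper-left corner of the grid; res: pixel size in metres.
    """
    col = math.floor((x - ulx) / res)
    row = math.floor((uly - y) / res)   # northing decreases with row index
    cx = ulx + (col + 0.5) * res
    cy = uly - (row + 0.5) * res
    return cx, cy

# Hypothetical upper-left corner of a tile, for illustration only.
ULX, ULY = 600000.0, 5000040.0
print(snap_to_pixel_center(600125.0, 5000000.0, ULX, ULY))
# (600150.0, 5000010.0)
```

If the 10 m and 60 m grids share the same upper-left corner, a 60 m pixel center lies 30 m from the pixel edges and therefore falls on a 10 m pixel boundary, so an even-sized patch can be centered on it exactly.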

Then, time-series image patches of two sizes are cropped around each point.
Patches of shape \(42 \times 42\) are cropped from the 10 m resolution bands (B2, B3, B4, B8) and are used to construct this dataset.
Patches of shape \(348 \times 348\) are cropped from the True Colour Image (TCI; see the Sentinel-2 User Guide for details) and are used for interpreting the class labels.
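
The cropping can be pictured as computing a pixel window around each point. The sketch below is an assumption-laden illustration, not the authors' code: it assumes a north-up 10 m grid (for both the bands and the TCI), an even patch size, and a center point that lies on a 10 m pixel boundary; the function name and coordinates are hypothetical.

```python
def patch_window(cx, cy, ulx, uly, size, res=10.0):
    """Return (row_start, row_stop, col_start, col_stop) of a size x size patch.

    (cx, cy) is the patch center, (ulx, uly) the upper-left corner of the
    raster, res the pixel size in metres. Stop indices are exclusive.
    """
    half = size // 2
    col_c = round((cx - ulx) / res)   # pixel boundary at the center
    row_c = round((uly - cy) / res)
    return row_c - half, row_c + half, col_c - half, col_c + half

# Hypothetical raster origin and center point.
ULX, ULY = 600000.0, 5000040.0
CX, CY = 606030.0, 4988010.0
print(patch_window(CX, CY, ULX, ULY, 42))    # (1182, 1224, 582, 624)
print(patch_window(CX, CY, ULX, ULY, 348))   # (1029, 1377, 429, 777)
```

At 10 m resolution the two windows cover 420 m and 3480 m on a side, so the larger patch gives the interpreter considerably more spatial context than the patch stored in the dataset.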

Samples with a large number of timestamps can be time-consuming in the I/O stage, so the time-series patches are divided into groups of at most 100 timestamps each.
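
One simple way to obtain such groups is to cut the series into consecutive chunks. Whether the dataset was split into consecutive chunks or by some other rule is not stated, so the helper below (a hypothetical name) is only a sketch of the idea.

```python
def split_into_groups(n_timestamps, max_len=100):
    """Return (start, stop) index pairs of consecutive chunks of at most max_len."""
    return [
        (start, min(start + max_len, n_timestamps))
        for start in range(0, n_timestamps, max_len)
    ]

print(split_into_groups(250))
# [(0, 100), (100, 200), (200, 250)]
```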

Notes

This dataset was funded by the National Natural Science Foundation of China under Grant 61860206004 and by the Fund for Pioneering Research in Science and Disruptive Technologies, Aerospace Information Research Institute, Chinese Academy of Sciences (Grant No. E3Z218010F). Corresponding author: Ranyu Yin.

Files (40.2 GB)

MD5 checksum                          Size
md5:b24ef3f5fb85631e9fceb0cb9dfb1108  517.5 MB
md5:27914af7d6eb5165e12cee3eb6317f95  519.3 MB
md5:f18390410a26ac739777cafb776f0dc7  712.6 MB
md5:010b99f4ff1d5b2c1fc80a07f78e508b  1.4 GB
md5:3c9ee16b163aa11074b6b52773894ac5  853.2 MB
md5:ffbb2dac46c09137bd871b6beef5fe07  548.1 MB
md5:569c9b14ea37a64ff08877e807ddab9f  621.3 MB
md5:6b7768af410e4931f967f83b5fb5250b  2.7 GB
md5:791aab1ef940c584b20891dc3452ec9f  523.0 MB
md5:50ea597643af497c4189479fa7e284b2  714.2 MB
md5:7a305befbd37698282b34f33b1ecef15  872.1 MB
md5:54c16206da751daa38ba1e4c02b2ea0e  477.4 MB
md5:0af72bb0837fdca4ade494cf2c03d607  673.2 MB
md5:16d8ad9fe4e43e55f48fba0d4977565a  542.8 MB
md5:0941db3bac159811a50aedb8945ff642  810.3 MB
md5:48ec24dc62a6277798c60d783246388f  747.4 MB
md5:84745c50c7403dedaea0a8e4460268a8  1.3 GB
md5:2ced1be9397bdd6568f8d1760ce14518  1.4 GB
md5:1ca3fe8294b84166f98b92ceecd7e3d0  710.2 MB
md5:768f56f25654a24cb490b788fd3d8e48  487.6 MB
md5:bc22516437d695576217a30d6b88b6d2  1.2 GB
md5:45e96cec9197bfeee752dd6e2e323e4f  1.7 GB
md5:166bc5f5c0c9c8ae9fc7a8fca681940f  1.3 GB
md5:f5948c607b33ec4fbc5ebf4e5a462ffb  798.2 MB
md5:53119a6f1a8d5f73084c46d89c0382a8  794.6 MB
md5:777bb493bcf4f47b454e7653522c6116  501.9 MB
md5:2ce6e4334390d0eabd3e5fe3a25b9539  465.3 MB
md5:5d8256f4766f6f4bf17e8dd2ee0cf6e8  591.6 MB
md5:c1041d20301be892035e2f0a89033b1f  2.0 GB
md5:badd9c4f94149cb8c4e1c0015cd2110e  672.7 MB
md5:cc5167b95be862ffee6f193d3e37db68  485.3 MB
md5:8c61bb12c8501e49335b4ba28dd74708  849.7 MB
md5:adaa35bb78e293912ae2c25f3cf394b1  897.8 MB
md5:a69a40cc1fee39993ce3f79bd3f210bb  973.1 MB
md5:41e0823a0f24b2fbb3cbfe053d81d4f5  809.3 MB
md5:7aa39f11f801b36feb4be8ff66ea5b68  891.0 MB
md5:bc0fffa155fcdd34183ba0e98035c452  128.0 MB
md5:ea086b927d757b19c895341490b91a71  128.7 MB
md5:db64df88576c788309712938015fd22b  169.1 MB
md5:14e055b5f591d752ef4d28739217f9f4  393.4 MB
md5:2d89a6324c685baea901e8d95eb514d6  219.5 MB
md5:8b527ea95f9e7f785001196a1b04c6f0  138.1 MB
md5:07469247b9ca7ff85c54f17a432e8f5d  143.4 MB
md5:3e2905e000eb3af2adf75acf70d339ec  681.2 MB
md5:7b51d34f6e3482333c8f786c5cdfe790  129.2 MB
md5:ca2aa9dd917a4dbfb200b99c9567ea49  173.0 MB
md5:18610024b026e9a0a36a973cc6d2a002  219.6 MB
md5:0e8ab2ac9160ad73c067fcc71ba0c923  118.4 MB
md5:6dbdf2edd6464995fc90d8cbebcf464e  169.7 MB
md5:f61496de6f1fdd9d1d6c29f0d81d32cc  136.1 MB
md5:b221371b0c3b1cba358a5ab6f5fd6939  202.5 MB
md5:f26380620db8f8074282d0f0e8bb4d51  175.4 MB
md5:2a83e18d7c9b4aa519c2ed7d09c3cf09  326.7 MB
md5:fa148419f97bfba280104ff1c3d4a562  335.7 MB
md5:57a736e1ee54cb410da7f2437e55a528  172.5 MB
md5:95ced7a3e7edd6b5b6fe6f0ca8795155  124.1 MB
md5:9a40e6f94a8035f42dc875c8f1ceb14f  329.8 MB
md5:5bc043f8ebc09f9a76d19f1502199e47  472.4 MB
md5:857f6b8236730690d9dc9f05579890cc  332.6 MB
md5:7bc189f510e4dceea9cae0e1adbd30b5  191.6 MB
md5:cbc15f852e4277fa5ddd27ff43fe32ba  202.2 MB
md5:8a0744fb7d3239af79448d89b50b0a82  124.1 MB
md5:68b485fdec4509fddf8d9551299ac50b  109.6 MB
md5:8bc5a3e9eb6af294da1711d61ee8d56e  136.3 MB
md5:244e0949eef67391cfbeb99d3b014fd5  499.7 MB
md5:e3a017dfb1082424c60f17bbdccdcf13  157.8 MB
md5:f9e10b3614ae8c0674259f289429f873  119.9 MB
md5:0f4bccd436d9ba30e0772c935acef5e6  208.0 MB
md5:ec86561734859c99348b9e90a2768084  219.4 MB
md5:4143f608d7e60fd817b9d4434d009617  244.4 MB
md5:5f62175964ac36382b0135e72c6d82ae  205.4 MB
md5:0a25136c9ef88ffd05ee750b85ed0ce3  213.0 MB
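
The MD5 checksums listed for the files can be used to check a download before decompressing it. The following sketch uses only the Python standard library; the helper names are hypothetical, and the file is read in chunks because the archives range from about 100 MB to several GB.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_download(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":")[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `verify_download(path, "md5:b24ef3f5fb85631e9fceb0cb9dfb1108")` returns `True` only if the file at `path` matches the first checksum in the list.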