Dataset for "Enhancing Cloud Detection in Sentinel-2 Imagery: A Spatial-Temporal Approach and Dataset"
- 1. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China, and University of Chinese Academy of Sciences, Beijing 100049, China
- 2. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
Description
This dataset is built for time-series Sentinel-2 cloud detection and is stored in TensorFlow TFRecord format (see https://www.tensorflow.org/tutorials/load_data/tfrecord).
Each file is compressed in 7z format and can be decompressed with Bandizip or 7-Zip.
Dataset Structure:
Each filename consists of three parts separated by underscores. The first part indicates whether the file is designated for training or validation ('train' or 'val'), the second part is the Sentinel-2 tile name, and the last part is the number of samples in the file.
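The naming convention can be unpacked with plain string handling. The sketch below is illustrative only: the helper `parse_filename` and the example name `train_T50TMK_1200` are hypothetical, and it assumes the decompressed file name carries no extension after the sample count.

```python
from typing import NamedTuple


class FileInfo(NamedTuple):
    split: str      # 'train' or 'val'
    tile: str       # Sentinel-2 tile name
    n_samples: int  # number of samples stored in the file


def parse_filename(path: str) -> FileInfo:
    """Split '<split>_<tile>_<n_samples>' (optionally preceded by directories)."""
    name = path.replace("\\", "/").split("/")[-1]
    split, tile, count = name.split("_")
    if split not in ("train", "val"):
        raise ValueError(f"unexpected split prefix: {split!r}")
    return FileInfo(split, tile, int(count))


# Hypothetical file name, used only to illustrate the convention.
info = parse_filename("data/train_T50TMK_1200")
print(info.split, info.tile, info.n_samples)
```

The sample count recovered this way is the same value the demonstration function further down uses to pair each record with its tile name.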
Each sample includes:
- Sample ID;
- Array of time-series 4-band image patches at 10 m resolution, shaped as (n_timestamps, 4, 42, 42);
- Label list indicating the cloud cover status of the central \(6\times6\) pixels at each timestamp;
- List of date ordinals, one per timestamp;
- Sample weight list (reserved);
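To make the layout concrete, the following sketch locates the central \(6\times6\) window inside a \(42\times42\) patch. It uses plain Python lists in place of real image data, and the helper name `center_bounds` is an assumption, not part of the dataset tooling.

```python
PATCH_SIZE = 42   # patch height and width in 10 m pixels
LABEL_SIZE = 6    # labelled central window, in 10 m pixels


def center_bounds(patch_size: int = PATCH_SIZE, window: int = LABEL_SIZE):
    """Return (start, stop) indices of the centred window along one axis."""
    start = (patch_size - window) // 2
    return start, start + window


start, stop = center_bounds()
print(start, stop)  # 18 24

# Dummy single-band patch whose value encodes its (row, col) position.
band = [[(r, c) for c in range(PATCH_SIZE)] for r in range(PATCH_SIZE)]
center = [row[start:stop] for row in band[start:stop]]
print(len(center), len(center[0]))
```

On a decoded array of shape (n_timestamps, 4, 42, 42), the same bounds would correspond to the slice `[:, :, 18:24, 18:24]`.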
Here is a demonstration function for parsing the TFRecord file:
import tensorflow as tf

# Build a tf.data.Dataset from a TFRecord file name.
# The tile name and the sample count are read from the file name itself.
def parseRecordDirect(fname):
    sep = '/'
    parts = tf.strings.split(fname, sep)
    # second-to-last underscore-separated field: Sentinel-2 tile name
    tn = tf.strings.split(parts[-1], sep='_')[-2]
    # last field: number of samples in this file
    nn = tf.strings.to_number(tf.strings.split(parts[-1], sep='_')[-1], tf.dtypes.int64)
    # repeat the tile name once per sample so it can be zipped with the records
    t = tf.data.Dataset.from_tensors(tn).repeat().take(nn)
    t1 = tf.data.TFRecordDataset(fname)
    ds = tf.data.Dataset.zip((t, t1))
    return ds

# Feature specification: every array is stored as a raw byte string.
keys_to_features_direct = {
    'localid': tf.io.FixedLenFeature([], tf.int64, -1),
    'image_raw_ldseries': tf.io.FixedLenFeature((), tf.string, ''),
    'labels': tf.io.FixedLenFeature((), tf.string, ''),
    'dates': tf.io.FixedLenFeature((), tf.string, ''),
    'weights': tf.io.FixedLenFeature((), tf.string, '')
}

# The Decoder (Optional)
# NOTE: `decoder` is not imported in this snippet. It is assumed to be a module
# that provides a Decoder base class (for example the data loader decoder module
# of the TensorFlow Model Garden); import it before using this class.
class SeriesClassificationDirectDecorder(decoder.Decoder):
    """A tf.Example decoder for tfds classification datasets."""

    def __init__(self) -> None:
        super().__init__()

    def decode(self, tid, ds):
        parsed = tf.io.parse_single_example(ds, keys_to_features_direct)
        encoded = parsed['image_raw_ldseries']
        labels_encoded = parsed['labels']
        decoded = tf.io.decode_raw(encoded, tf.uint16)
        label = tf.io.decode_raw(labels_encoded, tf.int8)
        dates = tf.io.decode_raw(parsed['dates'], tf.int64)
        weight = tf.io.decode_raw(parsed['weights'], tf.float32)
        # restore the (n_timestamps, 4, 42, 42) layout
        decoded = tf.reshape(decoded, [-1, 4, 42, 42])
        sample_dict = {
            'tid': tid,                    # tile ID
            'dates': dates,                # date list
            'localid': parsed['localid'],  # sample ID
            'imgs': decoded,               # image array
            'labels': label,               # label list
            'weights': weight              # sample weight list (reserved)
        }
        return sample_dict

# Simple function: the same decoding without the Decoder class
def preprocessDirect(tid, record):
    parsed = tf.io.parse_single_example(record, keys_to_features_direct)
    encoded = parsed['image_raw_ldseries']
    labels_encoded = parsed['labels']
    decoded = tf.io.decode_raw(encoded, tf.uint16)
    label = tf.io.decode_raw(labels_encoded, tf.int8)
    dates = tf.io.decode_raw(parsed['dates'], tf.int64)
    weight = tf.io.decode_raw(parsed['weights'], tf.float32)
    decoded = tf.reshape(decoded, [-1, 4, 42, 42])
    return tid, dates, parsed['localid'], decoded, label, weight

# Usage: replace 'filename here' with the path of a decompressed TFRecord file
t1 = parseRecordDirect('filename here')
dataset = t1.map(preprocessDirect, num_parallel_calls=tf.data.experimental.AUTOTUNE)
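Because every field is stored as a raw byte string, the number of timestamps is implied by the buffer length. The sketch below works through that arithmetic in plain Python. It assumes one label, one date and one weight per timestamp, which the sample description suggests but which is not verified here; `timestamps_from_lengths` is a hypothetical helper.

```python
BANDS, HEIGHT, WIDTH = 4, 42, 42
BYTES_PER_TIMESTAMP = BANDS * HEIGHT * WIDTH * 2  # uint16 -> 2 bytes per pixel

# Item size in bytes for each raw field, matching the dtypes passed to decode_raw.
ITEM_SIZE = {
    "image_raw_ldseries": BYTES_PER_TIMESTAMP,  # one 4 x 42 x 42 uint16 block
    "labels": 1,                                # int8
    "dates": 8,                                 # int64
    "weights": 4,                               # float32
}


def timestamps_from_lengths(byte_lengths: dict) -> int:
    """Infer n_timestamps from raw field lengths and check that they agree."""
    counts = {}
    for key, n_bytes in byte_lengths.items():
        size = ITEM_SIZE[key]
        if n_bytes % size:
            raise ValueError(f"{key}: {n_bytes} bytes is not a multiple of {size}")
        counts[key] = n_bytes // size
    if len(set(counts.values())) != 1:
        raise ValueError(f"fields disagree on n_timestamps: {counts}")
    return next(iter(counts.values()))


# Made-up byte lengths for a sample with 3 timestamps.
n = timestamps_from_lengths(
    {"image_raw_ldseries": 3 * BYTES_PER_TIMESTAMP, "labels": 3, "dates": 24, "weights": 12}
)
print(BYTES_PER_TIMESTAMP, n)  # 14112 3
```

A check of this kind can catch a wrong dtype early: decoding the image bytes with a different item size would change the inferred number of timestamps.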
Class Definition:
- 0: clear
- 1: opaque cloud
- 2: thin cloud
- 3: haze
- 4: cloud shadow
- 5: snow
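For convenience, the class list can be kept as a lookup table. The following is a minimal sketch; the helper names `label_names` and `count_classes` and the example label sequence are made up for illustration.

```python
CLASS_NAMES = {
    0: "clear",
    1: "opaque cloud",
    2: "thin cloud",
    3: "haze",
    4: "cloud shadow",
    5: "snow",
}


def label_names(labels):
    """Map a per-timestamp list of integer labels to class names."""
    return [CLASS_NAMES[int(v)] for v in labels]


def count_classes(labels):
    """Count how often each class occurs in a label sequence."""
    counts = {name: 0 for name in CLASS_NAMES.values()}
    for name in label_names(labels):
        counts[name] += 1
    return counts


# Made-up label sequence for illustration.
example = [0, 0, 1, 2, 4, 5, 0]
print(label_names(example))
print(count_classes(example))
```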
Dataset Construction:
First, we randomly generate 500 points for each tile. All points are aligned to the pixel-grid centers of the 60 m resolution subdatasets (e.g. B10) for consistency when comparing with other products,
because other cloud detection methods may use the cirrus band, which has 60 m resolution, as a feature.
Then, time-series image patches of two shapes are cropped with each point as the center.
Patches of shape \(42 \times 42\) are cropped from the 10 m resolution bands (B2, B3, B4, B8) and are used to construct this dataset.
Patches of shape \(348 \times 348\) are cropped from the True Colour Image (TCI; see the Sentinel-2 User Guide for details) file and are used for interpreting class labels.
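The grid alignment can be illustrated with a little arithmetic. The sketch below snaps a coordinate to the centre of its 60 m pixel and derives the pixel window of a centred \(42 \times 42\) patch on the 10 m grid. It assumes that the 10 m and 60 m grids share the same origin and that patches are centred exactly on the snapped point; the function names and the cropping convention are assumptions, not the authors' actual code.

```python
import math


def snap_to_pixel_center(coord: float, origin: float, res: float = 60.0) -> float:
    """Centre of the res-sized grid cell (anchored at origin) that contains coord."""
    index = math.floor((coord - origin) / res)
    return origin + (index + 0.5) * res


def patch_window_10m(index_60m: int, size: int = 42):
    """(start, stop) on the 10 m grid for a size-pixel patch centred on a 60 m pixel."""
    center_10m = index_60m * 6 + 3   # 60 m pixel centre, in 10 m pixel units
    start = center_10m - size // 2
    return start, start + size


print(snap_to_pixel_center(125.0, origin=0.0))  # 150.0
print(patch_window_10m(100))                    # (582, 624)
```

Under these assumptions, the 60 m pixel with index 100 spans 10 m columns 600 to 605, which are offsets 18 to 23 inside the patch. The central \(6\times6\) block of each patch would therefore coincide with a single 60 m pixel.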
Samples with a large number of timestamps can be time-consuming in the I/O stage, so the time-series patches are divided into groups of no more than 100 timestamps each.
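One simple way to obtain such groups is to cut the series into consecutive chunks. This is only a sketch of the idea: the text does not say how timestamps are assigned to groups, so consecutive chunking and the helper `split_series` are assumptions.

```python
MAX_TIMESTAMPS = 100  # upper bound on timestamps per group, as stated above


def split_series(items, max_len: int = MAX_TIMESTAMPS):
    """Split a sequence into consecutive groups of at most max_len items."""
    return [items[i:i + max_len] for i in range(0, len(items), max_len)]


# A made-up series of 250 timestamps, represented by their indices.
groups = split_series(list(range(250)))
print([len(g) for g in groups])  # [100, 100, 50]
```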
Files
(40.2 GB)
File (MD5 checksum) | Size |
---|---|
md5:b24ef3f5fb85631e9fceb0cb9dfb1108 | 517.5 MB |
md5:27914af7d6eb5165e12cee3eb6317f95 | 519.3 MB |
md5:f18390410a26ac739777cafb776f0dc7 | 712.6 MB |
md5:010b99f4ff1d5b2c1fc80a07f78e508b | 1.4 GB |
md5:3c9ee16b163aa11074b6b52773894ac5 | 853.2 MB |
md5:ffbb2dac46c09137bd871b6beef5fe07 | 548.1 MB |
md5:569c9b14ea37a64ff08877e807ddab9f | 621.3 MB |
md5:6b7768af410e4931f967f83b5fb5250b | 2.7 GB |
md5:791aab1ef940c584b20891dc3452ec9f | 523.0 MB |
md5:50ea597643af497c4189479fa7e284b2 | 714.2 MB |
md5:7a305befbd37698282b34f33b1ecef15 | 872.1 MB |
md5:54c16206da751daa38ba1e4c02b2ea0e | 477.4 MB |
md5:0af72bb0837fdca4ade494cf2c03d607 | 673.2 MB |
md5:16d8ad9fe4e43e55f48fba0d4977565a | 542.8 MB |
md5:0941db3bac159811a50aedb8945ff642 | 810.3 MB |
md5:48ec24dc62a6277798c60d783246388f | 747.4 MB |
md5:84745c50c7403dedaea0a8e4460268a8 | 1.3 GB |
md5:2ced1be9397bdd6568f8d1760ce14518 | 1.4 GB |
md5:1ca3fe8294b84166f98b92ceecd7e3d0 | 710.2 MB |
md5:768f56f25654a24cb490b788fd3d8e48 | 487.6 MB |
md5:bc22516437d695576217a30d6b88b6d2 | 1.2 GB |
md5:45e96cec9197bfeee752dd6e2e323e4f | 1.7 GB |
md5:166bc5f5c0c9c8ae9fc7a8fca681940f | 1.3 GB |
md5:f5948c607b33ec4fbc5ebf4e5a462ffb | 798.2 MB |
md5:53119a6f1a8d5f73084c46d89c0382a8 | 794.6 MB |
md5:777bb493bcf4f47b454e7653522c6116 | 501.9 MB |
md5:2ce6e4334390d0eabd3e5fe3a25b9539 | 465.3 MB |
md5:5d8256f4766f6f4bf17e8dd2ee0cf6e8 | 591.6 MB |
md5:c1041d20301be892035e2f0a89033b1f | 2.0 GB |
md5:badd9c4f94149cb8c4e1c0015cd2110e | 672.7 MB |
md5:cc5167b95be862ffee6f193d3e37db68 | 485.3 MB |
md5:8c61bb12c8501e49335b4ba28dd74708 | 849.7 MB |
md5:adaa35bb78e293912ae2c25f3cf394b1 | 897.8 MB |
md5:a69a40cc1fee39993ce3f79bd3f210bb | 973.1 MB |
md5:41e0823a0f24b2fbb3cbfe053d81d4f5 | 809.3 MB |
md5:7aa39f11f801b36feb4be8ff66ea5b68 | 891.0 MB |
md5:bc0fffa155fcdd34183ba0e98035c452 | 128.0 MB |
md5:ea086b927d757b19c895341490b91a71 | 128.7 MB |
md5:db64df88576c788309712938015fd22b | 169.1 MB |
md5:14e055b5f591d752ef4d28739217f9f4 | 393.4 MB |
md5:2d89a6324c685baea901e8d95eb514d6 | 219.5 MB |
md5:8b527ea95f9e7f785001196a1b04c6f0 | 138.1 MB |
md5:07469247b9ca7ff85c54f17a432e8f5d | 143.4 MB |
md5:3e2905e000eb3af2adf75acf70d339ec | 681.2 MB |
md5:7b51d34f6e3482333c8f786c5cdfe790 | 129.2 MB |
md5:ca2aa9dd917a4dbfb200b99c9567ea49 | 173.0 MB |
md5:18610024b026e9a0a36a973cc6d2a002 | 219.6 MB |
md5:0e8ab2ac9160ad73c067fcc71ba0c923 | 118.4 MB |
md5:6dbdf2edd6464995fc90d8cbebcf464e | 169.7 MB |
md5:f61496de6f1fdd9d1d6c29f0d81d32cc | 136.1 MB |
md5:b221371b0c3b1cba358a5ab6f5fd6939 | 202.5 MB |
md5:f26380620db8f8074282d0f0e8bb4d51 | 175.4 MB |
md5:2a83e18d7c9b4aa519c2ed7d09c3cf09 | 326.7 MB |
md5:fa148419f97bfba280104ff1c3d4a562 | 335.7 MB |
md5:57a736e1ee54cb410da7f2437e55a528 | 172.5 MB |
md5:95ced7a3e7edd6b5b6fe6f0ca8795155 | 124.1 MB |
md5:9a40e6f94a8035f42dc875c8f1ceb14f | 329.8 MB |
md5:5bc043f8ebc09f9a76d19f1502199e47 | 472.4 MB |
md5:857f6b8236730690d9dc9f05579890cc | 332.6 MB |
md5:7bc189f510e4dceea9cae0e1adbd30b5 | 191.6 MB |
md5:cbc15f852e4277fa5ddd27ff43fe32ba | 202.2 MB |
md5:8a0744fb7d3239af79448d89b50b0a82 | 124.1 MB |
md5:68b485fdec4509fddf8d9551299ac50b | 109.6 MB |
md5:8bc5a3e9eb6af294da1711d61ee8d56e | 136.3 MB |
md5:244e0949eef67391cfbeb99d3b014fd5 | 499.7 MB |
md5:e3a017dfb1082424c60f17bbdccdcf13 | 157.8 MB |
md5:f9e10b3614ae8c0674259f289429f873 | 119.9 MB |
md5:0f4bccd436d9ba30e0772c935acef5e6 | 208.0 MB |
md5:ec86561734859c99348b9e90a2768084 | 219.4 MB |
md5:4143f608d7e60fd817b9d4434d009617 | 244.4 MB |
md5:5f62175964ac36382b0135e72c6d82ae | 205.4 MB |
md5:0a25136c9ef88ffd05ee750b85ed0ce3 | 213.0 MB |