Published May 3, 2022 | Version 1.0.0
Dataset Open

SEN2VENµS, a dataset for the training of Sentinel-2 super-resolution algorithms

  • 1. CESBIO, Université de Toulouse, CNES, CNRS, INRAE, IRD, UT3

1 Description

SEN2VENµS is an open dataset for the super-resolution of Sentinel-2 images by leveraging simultaneous acquisitions with the VENµS satellite. The dataset is composed of 10m and 20m cloud-free surface reflectance patches from Sentinel-2, with their reference spatially-registered surface reflectance patches at 5 meters resolution acquired on the same day by the VENµS satellite. This dataset covers 29 locations with a total of 132 955 patches of 256x256 pixels at 5 meters resolution, and can be used for the training of super-resolution algorithms to bring spatial resolution of 8 of the Sentinel-2 bands down to 5 meters.

2 Files organization

The dataset is composed of separate sub-datasets, one for each site, as described in table 1.

Table 1: Number of patches and pairs for each site, along with VENµS viewing zenith angle (in degrees)
Site Number of patches Number of pairs VENµS Zenith Angle
FR-LQ1 4888 18 1.795402
NARYN 3814 25 5.010906
FGMANAUS 129 4 7.232127
MAD-AMBO 1443 19 14.788115
ARM 15859 39 15.160683
BAMBENW2 9018 34 17.766533
ES-IC3XG 8823 35 18.807686
ANJI 2314 16 19.310494
ATTO 2258 9 22.048651
ESGISB-3 6057 19 23.683871
ESGISB-1 2892 13 24.561609
FR-BIL 7105 30 24.802892
K34-AMAZ 1385 21 24.982675
ESGISB-2 3067 13 26.209776
ALSACE 2654 17 26.877071
LERIDA-1 2281 6 28.524780
ESTUAMAR 912 13 28.871947
SUDOUE-5 2176 20 29.170244
KUDALIAR 7269 20 29.180855
SUDOUE-6 2435 14 29.192055
SUDOUE-4 935 7 29.516127
SUDOUE-3 5363 14 29.998115
SO1 12018 36 30.255978
SUDOUE-2 9700 27 31.295256
ES-LTERA 1701 19 31.971764
FR-LAM 7299 22 32.054056
SO2 738 22 32.218481
BENGA 5858 29 32.587334
JAM2018 2564 18 33.718953

For each site, the sub-dataset folder contains a set of files for each date, named after the pair id: {site_name}_{mgrs_tile}_{acquisition_date}. For each pair, 5 files are available, as shown in table 2. Patches are encoded as ready-to-use tensors serialized by the well-known PyTorch library [1], so they can be loaded with a simple call to the torch.load() function. Note that bands are separated into two groups (10m and 20m Sentinel-2 bands), which leads to four separate tensor files (2 groups of bands \(\times\) source and target resolution). Tensor shape is [n,c,w,h] where \(n\) is the number of patches, \(c=4\) is the number of bands, \(w\) is the patch width and \(h\) is the patch height. To save storage space, values are encoded as 16-bit signed integers and should be converted back to floating-point surface reflectance by dividing each value by 10 000 upon reading.
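As a minimal sketch of the loading step described above, the snippet below builds the five file names of a pair and converts a loaded tensor back to floating-point reflectance. The helper names (pair_file_names, load_reflectance), the dictionary keys and the placeholder pair id are illustrative assumptions and not part of the dataset; only the file suffixes, the call to torch.load() and the division by 10 000 come from the description.

```python
# Scale factor stated in the description: stored int16 values / 10 000.
REFLECTANCE_SCALE = 10_000.0

# File suffixes from the naming convention; the dictionary keys are
# hypothetical labels chosen for this sketch.
SUFFIXES = {
    "venus_5m_b2b3b4b8": "05m_b2b3b4b8.pt",
    "s2_10m_b2b3b4b8": "10m_b2b3b4b8.pt",
    "venus_5m_b5b6b7b8a": "05m_b5b6b7b8a.pt",
    "s2_20m_b5b6b7b8a": "20m_b5b6b7b8a.pt",
    "footprints": "patches.gpkg",
}


def pair_file_names(pair_id: str) -> dict[str, str]:
    """Return the five file names associated with one pair id."""
    return {key: f"{pair_id}_{suffix}" for key, suffix in SUFFIXES.items()}


def load_reflectance(path):
    """Load one tensor file and return float reflectance, shape [n, c, w, h]."""
    # Imported here so the naming helper above also works without PyTorch.
    import torch

    raw = torch.load(path)  # 16-bit signed integers, as stored on disk
    return raw.float() / REFLECTANCE_SCALE


# "SITE_TILE_DATE" is a placeholder, not a real pair id from the dataset.
names = pair_file_names("SITE_TILE_DATE")
print(names["s2_10m_b2b3b4b8"])  # SITE_TILE_DATE_10m_b2b3b4b8.pt
```

With a real pair, load_reflectance would be called on the path to each of the four .pt files inside the extracted site folder.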

Table 2: Naming convention for files associated to each pair. {id} is {site_name}_{mgrs_tile}_{acquisition_date}.
File                     Content
{id}_05m_b2b3b4b8.pt     5m patches (\(256\times256\) pix.) for S2 B2, B3, B4 and B8 (from VENµS)
{id}_10m_b2b3b4b8.pt     10m patches (\(128\times128\) pix.) for S2 B2, B3, B4 and B8 (from Sentinel-2)
{id}_05m_b5b6b7b8a.pt    5m patches (\(256\times256\) pix.) for S2 B5, B6, B7 and B8A (from VENµS)
{id}_20m_b5b6b7b8a.pt    20m patches (\(64\times64\) pix.) for S2 B5, B6, B7 and B8A (from Sentinel-2)
{id}_patches.gpkg        GIS file with footprint of each patch
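The patch sizes in table 2 imply that every patch covers the same ground footprint at each resolution, and that the super-resolution factors are 2 (10m bands) and 4 (20m bands). The short check below makes this arithmetic explicit; the variable names are chosen for illustration only.

```python
# Pixel size in metres and patch side in pixels, as listed in table 2.
PATCH_GRIDS = {"05m": (5, 256), "10m": (10, 128), "20m": (20, 64)}

# Ground footprint of one patch side, in metres, for each resolution.
footprints = {name: res * side for name, (res, side) in PATCH_GRIDS.items()}

# Upsampling factor from each source grid to the 5 m target grid.
target_side = PATCH_GRIDS["05m"][1]
factors = {name: target_side // side for name, (_, side) in PATCH_GRIDS.items()}

print(footprints)  # {'05m': 1280, '10m': 1280, '20m': 1280}
print(factors)     # {'05m': 1, '10m': 2, '20m': 4}
```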

Each site sub-dataset comes with a master index.csv file, with one row for each pair sampled in the given site and columns as described in table 3. Despite the CSV (Comma Separated Values) name, columns are separated with tabs.

Table 3: Columns of the index.csv file indexing pairs for each site. For file naming conventions, refer to table 2.
Column                  Description
venus_product_id        ID of the sampled VENµS L2A product
sentinel2_product_id    ID of the sampled Sentinel-2 L2A product
tensor_05m_b2b3b4b8     Name of the 5m tensor file for S2 B2, B3, B4 and B8 (from VENµS)
tensor_10m_b2b3b4b8     Name of the 10m tensor file for S2 B2, B3, B4 and B8 (from Sentinel-2)
tensor_05m_b5b6b7b8a    Name of the 5m tensor file for S2 B5, B6, B7 and B8A (from VENµS)
tensor_20m_b5b6b7b8a    Name of the 20m tensor file for S2 B5, B6, B7 and B8A (from Sentinel-2)
s2_tile                 Sentinel-2 MGRS tile
vns_site                Name of VENµS site
date                    Acquisition date as YYYY-MM-DD
venus_zenith_angle      VENµS zenith viewing angle in degrees
patches_gpkg            Name of the GIS file with footprint for each patch
nb_patches              Number of patches for this pair
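Because the index file is tab-separated, a reader has to be told about the delimiter explicitly. The sketch below uses Python's standard csv module; the helper name read_index is an assumption, and the sample row is made up for the demonstration and contains only a subset of the columns of table 3.

```python
import csv
import io


def read_index(handle):
    """Parse a tab-separated index file into a list of dictionaries."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        # Convert the numeric columns; all other columns stay as strings.
        row["nb_patches"] = int(row["nb_patches"])
        row["venus_zenith_angle"] = float(row["venus_zenith_angle"])
        rows.append(row)
    return rows


# Synthetic content standing in for a real index.csv (values are invented).
sample = (
    "vns_site\tdate\tvenus_zenith_angle\tnb_patches\n"
    "SITE\t2020-01-01\t12.5\t42\n"
)
rows = read_index(io.StringIO(sample))
print(rows[0]["vns_site"], rows[0]["nb_patches"])  # SITE 42
```

For a real site folder, the same function can be given a file opened with open(path, newline="").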

Each site folder is compressed into a separate 7z file.

3 Licencing

3.1 Sentinel-2 patches

3.1.1 Copyright

Value-added data processed by CNES for the Theia data centre www.theia-land.fr using Copernicus products. The processing uses algorithms developed by Theia's Scientific Expertise Centres. Note: Copernicus Sentinel-2 Level 1C data is subject to this license: https://theia.cnes.fr/atdistrib/documents/TC_Sentinel_Data_31072014.pdf

3.1.2 Licence

Files {id}_10m_b2b3b4b8.pt and {id}_20m_b5b6b7b8a.pt are distributed under the original licence of the Sentinel-2 Theia L2A products, which is the Etalab Open Licence Version 2.0 [2].

3.2 VENµS patches

3.2.1 Copyright

Value-added data processed by CNES for the Theia data centre www.theia-land.fr using VENµS satellite imagery from CNES and Israeli Space Agency. The processing uses algorithms developed by Theia's Scientific Expertise Centres.

3.2.2 Licence

Files {id}_05m_b2b3b4b8.pt and {id}_05m_b5b6b7b8a.pt are distributed under the original licence of the VENµS products, which is Creative Commons BY-NC 4.0 [3].

3.3 Remaining files

All remaining files are distributed under the Creative Commons BY 4.0 [4] licence.

4 Note to users

Note that even though the SEN2VENµS dataset is sorted by sites and by pairs, we strongly encourage users to apply the full set of machine learning best practices when using it: randomly keeping separate pairs (or even sites) for testing purposes, and randomizing patches across sites and pairs in the training and validation sets.
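One possible way to follow this advice is sketched below, under stated assumptions: patches are identified by (pair_id, patch_index) tuples, whole pairs are held out for testing, and the remaining patches are shuffled across pairs before the training/validation split. The function name, the fractions and the synthetic pair ids are illustrative and not prescribed by the dataset.

```python
import random


def split_by_pair(patches, test_fraction=0.2, val_fraction=0.1, seed=0):
    """Hold out whole pairs for testing, then shuffle the remaining patches.

    patches: list of (pair_id, patch_index) tuples.
    Returns (train, val, test) lists of the same tuples.
    """
    rng = random.Random(seed)
    pair_ids = sorted({pair_id for pair_id, _ in patches})
    rng.shuffle(pair_ids)

    # Keep at least one full pair aside so no test pair is seen in training.
    n_test = max(1, round(len(pair_ids) * test_fraction))
    test_pairs = set(pair_ids[:n_test])

    test = [p for p in patches if p[0] in test_pairs]
    rest = [p for p in patches if p[0] not in test_pairs]

    # Mix patches from all remaining pairs before cutting out validation.
    rng.shuffle(rest)
    n_val = round(len(rest) * val_fraction)
    return rest[n_val:], rest[:n_val], test


# Synthetic example: 10 pairs of 5 patches each (ids are placeholders).
patches = [(f"pair{i}", j) for i in range(10) for j in range(5)]
train_set, val_set, test_set = split_by_pair(patches)
print(len(train_set), len(val_set), len(test_set))  # 36 4 10
```

Holding out entire sites instead of pairs follows the same pattern, with the site name used as the grouping key.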

5 Citing

Please cite the following data paper (published in MDPI Data) and Zenodo link when publishing work derived from this dataset:

Michel, J.; Vinasco-Salinas, J.; Inglada, J.; Hagolle, O. SEN2VENµS, a Dataset for the Training of Sentinel-2 Super-Resolution Algorithms. Data 2022, 7, 96. https://doi.org/10.3390/data7070096

https://zenodo.org/deposit/6514159

Footnotes:

[1] https://pytorch.org/
[2] https://theia.cnes.fr/atdistrib/documents/Licence-Theia-CNES-Sentinel-ETALAB-v2.0-en.pdf
[3] https://creativecommons.org/licenses/by-nc/4.0/
[4] https://creativecommons.org/licenses/by/4.0/

Files (85.0 GB)

Checksum                                Size
md5:ecbf57fc83a8c8ca47ab421642bbef57    1.7 GB
md5:2b6521e2fd43fc220557d1a171f94c06    1.5 GB
md5:9c264cd01640707f483f78a88c1a40c8    9.6 GB
md5:c6d7905816f8c807e5a87f4a2d09a4ae    1.1 GB
md5:f804161f30c295dab1172e904ecb38be    5.2 GB
md5:a3bdc8fd5ac049b2d07b308fc1f0706a    3.7 GB
md5:e7a19cd51f048a006688f6b2ea795d55    5.6 GB
md5:226cd7c10689f9aad92c760d9c1899fe    1.1 GB
md5:ab1c0e9a70c566d6fe8b94ba421a15d6    2.0 GB
md5:20196e6e963170e641fc805330077434    2.0 GB
md5:ac42ab2ddb89975b55395ace90ecc0a6    3.7 GB
md5:2b540369499c7b9882f7e195699e9438    498.1 MB
md5:06d422d9f4ba0c2ed1087c2a7f0339c5    65.7 MB
md5:c4305e091b61de5583842f71b4122ed3    4.3 GB
md5:1bceb23259d7f101ee0e1df141b5e550    5.1 GB
md5:535489d0d3bc23e8e7646a20b99575e6    3.1 GB
md5:2e2a6de2b5842ce86d074ebd8c68354b    1.6 GB
md5:7abf9ef3f89bd30b905c0029169b88d1    659.9 MB
md5:1427c8a4bc1e238c5c63e434fd6d31c6    5.0 GB
md5:d507dcbc1b92676410df9e4f650ea23b    1.5 GB
md5:373f2ea88a57d51c5f54778c36503027    2.1 kB
md5:49e43cd47ecdc5360c83e448eaf73fbb    889.3 MB
md5:a21a655812d6cfd309d1e76c95463916    1.3 kB
md5:56474220d0014e53aa0c96ea93c03bc9    2.4 GB
md5:62b5ce44dc641639079c15227cdbd794    8.2 GB
md5:59afd969b950f90df0f8ce8b1dbccd62    510.2 MB
md5:5aed36a3d5e9746e5f5c438d10fae413    6.7 GB
md5:0eeb556caaae171b8fbd0696f4757308    3.5 GB
md5:aac762b62ac240720d34d5bb3fc4a906    580.6 MB
md5:69042546af7bd25a0398b04c2ce60057    1.4 GB
md5:ca143d2a2a56db30ab82c33420433e01    1.6 GB
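The md5: values listed above can be used to check a downloaded archive for corruption. The sketch below uses Python's standard hashlib module and reads the file in chunks so that multi-gigabyte archives do not have to fit in memory. The helper names are assumptions, and the self-check runs on a throwaway empty file rather than on a dataset archive.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    return md5_of_file(path) == listed.removeprefix("md5:").lower()


# Self-check on an empty temporary file, whose MD5 is a well-known constant.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp_path = tmp.name
empty_ok = matches_listing(tmp_path, "md5:d41d8cd98f00b204e9800998ecf8427e")
os.remove(tmp_path)
print(empty_ok)  # True
```

For a real download, matches_listing would be given the path to the archive and the md5: value shown next to it on the record page.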

Additional details

Related works

Is documented by
Journal article: 10.3390/data7070096 (DOI)