Published July 29, 2022 | Version 2.1
Dataset Open

Sentinel2GlobalLULC: A dataset of Sentinel-2 georeferenced RGB imagery annotated for global land use/land cover mapping with deep learning (License CC BY 4.0)

  • 1. Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence, DaSCI, University of Granada, 18071, Granada, Spain. Systems Analysis and Modeling for Decision Support Laboratory, National School of Applied Sciences of Berrechid, Hassan 1st University, Berrechid 218, Morocco
  • 2. Department of Botany, Faculty of Science, University of Granada, 18071 Granada, Spain. iEcolab, Inter-University Institute for Earth System Research, University of Granada, 18006 Granada, Spain
  • 3. Andalusian Center for Assessment and Monitoring of Global Change (CAESCG), University of Almería, 04120 Almería, Spain
  • 4. Department of Botany, Faculty of Science, University of Granada, 18071 Granada, Spain. ENSIAS, Mohammed V University, Rabat, 10170, Morocco
  • 5. Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence, DaSCI, University of Granada, 18071, Granada, Spain
  • 1. Systems Analysis and Modeling for Decision Support Laboratory, National School of Applied Sciences of Berrechid, Hassan 1st University, Berrechid 218, Morocco
  • 2. Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence, DaSCI, University of Granada, 18071, Granada, Spain

Description

Sentinel2GlobalLULC is a deep learning-ready dataset of RGB images from the Sentinel-2 satellites designed for global land use and land cover (LULC) mapping. Sentinel2GlobalLULC v2.1 contains 194,877 images in GeoTiff and JPEG format corresponding to 29 broad LULC classes. Each image has 224 x 224 pixels at 10 m spatial resolution and was produced by taking the 25th percentile of all available observations in the Sentinel-2 collection between June 2015 and October 2020 in order to remove atmospheric effects (e.g., clouds, aerosols, shadows, and snow). A spatial purity value was assigned to each image based on the consensus across 15 different global LULC products available in Google Earth Engine (GEE).
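
As a rough illustration of the percentile compositing idea described above, the following Python sketch computes a per-pixel 25th percentile over a small time series of single-band images. It is not the GEE code used to build the dataset: the function name, the toy values, and the interpolation rule (Python's "inclusive" quantile method) are assumptions made for this example, and GEE's percentile reducer may behave differently.

```python
from statistics import quantiles


def percentile25_composite(stack):
    """Per-pixel 25th percentile over a time series of single-band images.

    `stack` is a list of 2-D images (lists of rows) of identical shape,
    one image per acquisition date. At least two dates are required.
    """
    n_rows, n_cols = len(stack[0]), len(stack[0][0])
    composite = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # All observations of this pixel across acquisition dates
            series = [image[r][c] for image in stack]
            # quantiles(..., n=4) returns the three quartile cut points;
            # the first one is the 25th percentile.
            row.append(quantiles(series, n=4, method="inclusive")[0])
        composite.append(row)
    return composite


# Toy example: five acquisitions of a 1 x 2 image. The very bright values
# stand in for cloudy observations (all numbers are invented).
stack = [[[100, 40]], [[110, 42]], [[250, 41]], [[105, 240]], [[255, 43]]]
composite = percentile25_composite(stack)
```

Because a low percentile is taken per pixel, occasional bright outliers such as clouds have little influence on the composite value.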

 

Our dataset is organized into three main zip-compressed folders, an Excel file with a dictionary of class names and descriptive statistics per LULC class, and a Python script that converts RGB GeoTiff images into JPEG format. The first folder, "Sentinel2LULC_GeoTiff.zip", contains 29 zip-compressed subfolders, one per LULC class, each holding hundreds to thousands of Sentinel-2 RGB images in GeoTiff format. The second folder, "Sentinel2LULC_JPEG.zip", contains 29 zip-compressed subfolders with JPEG versions of the same images. The third folder, "Sentinel2LULC_CSV.zip", includes 29 zip-compressed CSV files, each with one row per provided image and 12 columns containing the following metadata (the same metadata is also encoded in the image filenames):

  • Land Cover Class ID: is the identification number of each LULC class
  • Land Cover Class Short Name: is the short name of each LULC class
  • Image ID: is the identification number of each image within its corresponding LULC class 
  • Pixel purity Value: is the spatial purity of each pixel for its corresponding LULC class calculated as the spatial consensus across up to 15 land-cover products 
  • GHM Value: is the spatial average of the Global Human Modification index (gHM) for each image
  • Latitude: is the latitude of the center point of each image
  • Longitude: is the longitude of the center point of each image
  • Country Code: is the Alpha-2 country code of each image as described in the ISO 3166 international standard. The Alpha-2 code of each country is listed at https://www.iban.com/country-codes
  • Administrative Department Level1: is the administrative level 1 name to which each image belongs
  • Administrative Department Level2: is the administrative level 2 name to which each image belongs
  • Locality: is the name of the locality to which each image belongs
  • Number of S2 images: is the number of Sentinel-2 scenes found in the corresponding image collection between June 2015 and October 2020 when compositing and exporting each image tile
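
The metadata columns listed above lend themselves to simple filtering. The sketch below reads a per-class CSV with Python's standard `csv` module and keeps the images from one country whose gHM value stays below a threshold. The header names (`country_code`, `ghm`, and so on) and every row value are hypothetical placeholders chosen for this example; the real files may label their columns differently.

```python
import csv
import io

# Toy stand-in for one per-class CSV file. The header names and all row
# values are hypothetical; check the real files for the exact labels.
SAMPLE_CSV = """class_id,class_short_name,image_id,purity,ghm,latitude,longitude,country_code,admin1,admin2,locality,n_s2_images
1,ToyClass,1,100,0.05,10.0,20.0,AA,Region A,District A,Place A,120
1,ToyClass,2,100,0.60,11.0,21.0,BB,Region B,District B,Place B,95
1,ToyClass,3,100,0.10,12.0,22.0,AA,Region C,District C,Place C,200
"""


def select_images(csv_text, country_code=None, max_ghm=None):
    """Return metadata rows matching an optional country code and gHM ceiling."""
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if country_code is not None and row["country_code"] != country_code:
            continue
        if max_ghm is not None and float(row["ghm"]) > max_ghm:
            continue
        selected.append(row)
    return selected


# Keep images from the placeholder country "AA" with low human modification
low_impact = select_images(SAMPLE_CSV, country_code="AA", max_ghm=0.2)
image_ids = [row["image_id"] for row in low_impact]
```

To apply the same filter to a real file, the `io.StringIO(csv_text)` source can be replaced with an opened file handle once the actual header names are known.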

For seven LULC classes, we could not export from GEE all images that fulfilled a spatial purity of 100% since there were millions of them. In these cases, we exported a stratified random sample of 14,000 images and provided an additional CSV file with the images actually contained in our dataset. That is, for each of these seven LULC classes, we provide two CSV files:

  • A CSV file that contains all exported images for this class 
  • A CSV file that contains all images available for this class at spatial purity of 100%, both the ones exported and the ones not exported, in case the user wants to export them. These CSV filenames end with "including_non_downloaded_images".
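
For these seven classes, the candidate images that were not exported can be recovered by comparing the two CSV files. The following sketch does this with a set difference on an identifier column. It assumes that both files share an `image_id` column with consistent values, which is a hypothetical naming and may not hold for the real files; if identifiers differ, matching on latitude and longitude would be an alternative. All row values are invented.

```python
import csv
import io

# Toy stand-ins for the two CSV files of one class. Header names and
# values are hypothetical.
EXPORTED_CSV = "image_id,latitude,longitude\n1,10.0,20.0\n3,12.0,22.0\n"
ALL_PURE_CSV = (
    "image_id,latitude,longitude\n"
    "1,10.0,20.0\n2,11.0,21.0\n3,12.0,22.0\n4,13.0,23.0\n"
)


def read_rows(csv_text):
    """Parse CSV text into a list of dictionaries keyed by header name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def not_exported(all_rows, exported_rows, key="image_id"):
    """Rows present in the full list but absent from the exported list."""
    exported_keys = {row[key] for row in exported_rows}
    return [row for row in all_rows if row[key] not in exported_keys]


missing = not_exported(read_rows(ALL_PURE_CSV), read_rows(EXPORTED_CSV))
missing_ids = [row["image_id"] for row in missing]
```

The coordinates of the remaining rows could then serve as the starting point for exporting additional tiles.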

To clearly state the geographical coverage of the images available in this dataset, we included in version v2.1 a zip-compressed folder called "Geographic_Representativeness.zip". This folder contains one CSV file for each LULC class that provides the complete list of countries represented in that class. Each CSV file has two columns: the first gives the country code and the second gives the number of images provided in that country for that LULC class. In addition to these 29 CSV files, we provide another CSV file that maps each ISO Alpha-2 country code to its full country name.
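
The per-class country counts and the code-to-name mapping described above can be combined into a readable ranking. The sketch below is one way to do it with the standard library; the column headers (`code`, `n_images`, `name`) are assumed for illustration and the counts are invented.

```python
import csv
import io

# Toy stand-ins for one per-class country file and the code-to-name file.
# Real column headers may differ, and the counts are invented.
COUNTS_CSV = "code,n_images\nES,3\nMA,2\nFR,5\n"
NAMES_CSV = "code,name\nES,Spain\nMA,Morocco\nFR,France\n"


def images_per_country(counts_text, names_text):
    """Return (country name, image count) pairs sorted by descending count."""
    names = {
        row["code"]: row["name"]
        for row in csv.DictReader(io.StringIO(names_text))
    }
    counts = [
        # Fall back to the raw code when no full name is available
        (names.get(row["code"], row["code"]), int(row["n_images"]))
        for row in csv.DictReader(io.StringIO(counts_text))
    ]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)


ranking = images_per_country(COUNTS_CSV, NAMES_CSV)
```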

© Sentinel2GlobalLULC Dataset by Yassir Benhammou, Domingo Alcaraz-Segura, Emilio Guirado, Rohaifa Khaldi, Boujemâa Achchab, Francisco Herrera & Siham Tabik is marked with Attribution 4.0 International (CC-BY 4.0)

Notes

This research has been supported by DETECTOR (A-RNM-256-UGR18 Universidad de Granada/FEDER), LifeWatch SmartEcomountains (LifeWatch-2019-10-UGR-01 Ministerio de Ciencia e Innovación/Universidad de Granada/FEDER), BBVA DeepSCOP (Ayudas Fundación BBVA a Equipos de Investigación Científica 2018), Ramón y Cajal Programme (RYC-2015-18136), DeepL-ISCO (A-TIC-458-UGR18 Ministerio de Ciencia e Innovación/FEDER), SMART-DASCI (TIN2017-89517-P Ministerio de Ciencia e Innovación/Universidad de Granada/FEDER), BigDDL-CET (P18-FR-4961 Ministerio de Ciencia e Innovación/Universidad de Granada/FEDER), RESISTE (P18-RT-1927 Consejería de Economía, Conocimiento, y Universidad from the Junta de Andalucía/FEDER), and Ecopotential (641762 European Commission). This work is part of the project "Thematic Center on Mountain Ecosystem & Remote sensing, Deep learning-AI e-Services University of Granada-Sierra Nevada" (LifeWatch-2019-10-UGR-01), which has been co-funded by the Ministry of Science and Innovation through the FEDER funds from the Spanish Pluriregional Operational Program 2014-2020 (POPE), LifeWatch-ERIC action line, within the Workpackages LifeWatch-2019-10-UGR-01_WP-8, LifeWatch-2019-10-UGR-01_WP-7 and LifeWatch-2019-10-UGR-01_WP-4.

Files

Geographic_Representativeness.zip

Files (16.3 GB)

Checksum                                Size
md5:fad624cb9f1122450bfb9783c9eb4bb9    9.9 kB
md5:b82d2626455c9edc754fef708044114a    14.0 kB
md5:52980b9c7745e5ceecfd55778517adae    1.1 kB
md5:e94db2bbd67eaca888aa21b17680b9e1    64.5 MB
md5:4c2b8c790668610aaced88a2f054131a    13.0 GB
md5:b3f5bb96de63eb31751cc9b9f320a3a4    3.2 GB
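
Since the largest archives are several gigabytes, it can be worth checking a download against the md5 checksums listed above. The following is a minimal sketch using Python's `hashlib`; the function names are made up for this example, and which checksum belongs to which file has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so multi-gigabyte archives fit in memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listing(local_path, "md5:...")`, with the checksum string copied from the list, returns `True` when the downloaded file is intact.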

Additional details

Related works

Is published in
Journal article: 10.1038/s41597-022-01775-8 (DOI)

Funding

ECOPOTENTIAL – ECOPOTENTIAL: IMPROVING FUTURE ECOSYSTEM BENEFITS THROUGH EARTH OBSERVATIONS 641762
European Commission

References

  • Benhammou, Y., Alcaraz-Segura, D., Guirado, E. et al. Sentinel2GlobalLULC: A Sentinel-2 RGB image tile dataset for global land use/cover mapping with deep learning. Sci Data 9, 681 (2022). https://doi.org/10.1038/s41597-022-01775-8