There is a newer version of the record available.

Published October 11, 2023 | Version v1
Dataset Open

Globe230k: A Benchmark Dense-Pixel Annotation Dataset for Global Land Cover Mapping

  • 1. Sun Yat-Sen University

Description

We (Intelligent Mining and Analysis of Remote Sensing big data, IMARS) have created a large-scale annotated dataset, Globe230k, for land use/land cover (LULC) mapping, annotated on Google Earth imagery at 1 m spatial resolution. Globe230k was annotated by experts and by students majoring in surveying and mapping who had received the necessary training. Annotation relied on visual interpretation of very-high-resolution images together with in-situ field surveys, and followed an organized annotation pipeline. Globe230k has three main strengths:

1) Large scale: Globe230k includes 232,819 annotated images of 512x512 pixels at 1 m spatial resolution, with more than 3x10^10 annotated pixels in total, covering 10 first-level categories.

2) Rich diversity: the annotated images are sampled from regions worldwide and cover more than 60,000 km², giving high variability and diversity. To keep the categories balanced, rare categories such as wetland and ice/snow were deliberately given a higher chance of being sampled.

3) Multi-modal: in addition to RGB bands, Globe230k includes other features important for Earth system research, such as the normalized difference vegetation index (NDVI), a digital elevation model (DEM), and vertical-vertical (VV) and vertical-horizontal (VH) polarization bands, which can support multi-modal data fusion research. (This part will be updated soon.)
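As a quick consistency check on the scale figures quoted in points 1) and 2), the following sketch recomputes the total pixel count and the ground area from the patch count, patch size, and resolution. It assumes the patches do not overlap, which the description does not state; the constant names are illustrative only.

```python
# Figures quoted in the description.
NUM_PATCHES = 232_819
PATCH_SIDE_PX = 512
RESOLUTION_M = 1.0  # metres per pixel

# Total number of annotated pixels across all patches.
pixels_per_patch = PATCH_SIDE_PX * PATCH_SIDE_PX
total_pixels = NUM_PATCHES * pixels_per_patch

# Ground area of one patch in km^2, summed over all patches.
# ASSUMPTION: patches do not overlap on the ground.
patch_area_km2 = (PATCH_SIDE_PX * RESOLUTION_M / 1000.0) ** 2
total_area_km2 = NUM_PATCHES * patch_area_km2

print(f"total pixels: {total_pixels:.3e}")
print(f"total area:   {total_area_km2:,.0f} km^2")
```

Under that assumption the totals come to roughly 6.1x10^10 pixels and about 61,000 km², which agrees with the stated lower bounds of 3x10^10 pixels and 60,000 km².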

The image patches and their corresponding annotation patches are stored in "patch_image.rar" and "patch_label.rar", respectively. The RGB images are ".jpg" files of 512x512 pixels with pixel values in the range 0-255. The annotation patches are ".png" files, also of 512x512 pixels, with pixel values in the range 1-10 denoting the classes: 1 = cropland, 2 = forest, 3 = grass, 4 = shrubland, 5 = wetland, 6 = water, 7 = tundra, 8 = impervious, 9 = bareland, 10 = ice/snow. The 232,819 image-label pairs are officially divided into training, validation, and test sets in a 7:1:2 ratio; the splits are listed in "train.txt", "val.txt", and "test.txt". Based on this division, the official baseline accuracy of several state-of-the-art semantic segmentation methods is reported in the related article (https://spj.science.org/doi/10.34133/remotesensing.0078).
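The label encoding and split files described above could be handled along the following lines. This is a minimal sketch using only the standard library, not the authors' tooling. The helper names (`class_histogram`, `read_split`, `nominal_split_sizes`) are hypothetical, and the split files are assumed to hold one sample identifier per line, which the description does not confirm. Decoding the ".png" labels into pixel values would need an image library and is left out here.

```python
from collections import Counter
from pathlib import Path

# Label IDs as listed in the description (pixel values 1-10).
CLASS_NAMES = {
    1: "cropland", 2: "forest", 3: "grass", 4: "shrubland", 5: "wetland",
    6: "water", 7: "tundra", 8: "impervious", 9: "bareland", 10: "ice/snow",
}


def class_histogram(label_pixels):
    """Count pixels per class name from an iterable of integer label values."""
    counts = Counter(label_pixels)
    unknown = set(counts) - set(CLASS_NAMES)
    if unknown:
        raise ValueError(f"unexpected label values: {sorted(unknown)}")
    return {name: counts.get(class_id, 0) for class_id, name in CLASS_NAMES.items()}


def read_split(path):
    """Read sample identifiers from a split file.

    ASSUMPTION: one identifier per line; the actual layout of train.txt,
    val.txt and test.txt is not documented in the description.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def nominal_split_sizes(total=232_819):
    """Return (train, val, test) sizes implied by a 7:1:2 ratio.

    Validation and test are rounded down and the remainder goes to training;
    the official split files may round differently.
    """
    val = total * 1 // 10
    test = total * 2 // 10
    train = total - val - test
    return train, val, test
```

For example, `class_histogram([1, 1, 6, 10])` counts two cropland pixels and one pixel each of water and ice/snow, and raises `ValueError` for any value outside 1-10.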

We hope Globe230k can serve as a benchmark that promotes further development of global land cover mapping and semantic segmentation algorithms.

Files

Files (12.3 GB)

Checksum Size
md5:067cc86b68abf983edefaa7923ad8ce4 11.5 GB
md5:b39b37a96336cc69d4ac575a423ec27b 711.4 MB
md5:e6d2701e9e6b536183dc2fac95ae579c 1.5 MB
md5:440c12b7283b7de5caeadac7aa61e57d 5.2 MB
md5:ca382b7d979b43063c9efcc20e335097 726.0 kB
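
The MD5 checksums in the listing can be used to confirm that a download is intact. The sketch below is one way to do this with Python's standard `hashlib` module. The function names are hypothetical, and because the listing does not show which checksum belongs to which file, the caller has to supply the matching pair.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against an expected checksum.

    `expected` may carry the "md5:" prefix used in the file listing.
    """
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return file_md5(path) == expected_hex
```

A call such as `matches_checksum("patch_label.rar", "md5:<checksum from the listing>")` returns `True` only when the local file is byte-for-byte identical to the published one. Reading in chunks matters here because the largest archive is 11.5 GB.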