Globe230k: A Benchmark Dense-Pixel Annotation Dataset for Global Land Cover Mapping

Shi, Qian; He, Da; Liu, Zhengyu; Liu, Xiaoping; Xue, Jingqian

doi:10.5281/zenodo.8429200

Published October 11, 2023 | Version v1

Dataset Open

1. Sun Yat-Sen University

We, the Intelligent Mining and Analysis of Remote Sensing big data (IMARS) group, have created Globe230k, a large-scale annotated dataset for land use/land cover (LULC) mapping, annotated on Google Earth imagery with 1 m spatial resolution. Globe230k was annotated by numerous experts and by students majoring in surveying and mapping who had received the necessary training. They worked through visual interpretation of very-high-resolution images, supported by in-situ field surveys, following an organized annotation pipeline. Globe230k has three main strengths:

1) Large scale: Globe230k includes 232,819 annotated images of 512x512 pixels at 1 m spatial resolution, with more than 3x10¹⁰ annotated pixels across 10 first-level categories.

2) Rich diversity: the annotated images are sampled from regions worldwide and cover more than 60,000 km², which gives the dataset high variability and diversity. To keep the categories balanced, rare categories such as wetland and ice/snow were deliberately given a higher chance of being sampled.

3) Multi-modal: in addition to RGB bands, Globe230k includes other features that are important for Earth system research, such as the normalized difference vegetation index (NDVI), a digital elevation model (DEM), and vertical-vertical (VV) and vertical-horizontal (VH) polarization bands, which can support research on multi-modal data fusion. (This part will be updated soon.)
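
The scale figures in items 1) and 2) can be cross-checked with a little arithmetic. The sketch below is only an illustration: the constant and function names are ours, and the area estimate assumes that patches do not overlap and that every pixel covers exactly 1 m x 1 m on the ground, neither of which the description states explicitly.

```python
# Cross-check of the dataset-scale figures quoted in the description.
# Assumptions (not stated by the authors): patches do not overlap, and
# each pixel covers exactly 1 m x 1 m on the ground.

NUM_PATCHES = 232_819   # annotated image/label pairs
PATCH_SIDE_PX = 512     # patches are 512x512 pixels
GSD_M = 1.0             # spatial resolution in metres per pixel


def total_pixels(num_patches=NUM_PATCHES, side_px=PATCH_SIDE_PX):
    """Number of annotated pixels across all patches."""
    return num_patches * side_px * side_px


def total_area_km2(num_patches=NUM_PATCHES, side_px=PATCH_SIDE_PX, gsd_m=GSD_M):
    """Ground area covered by all patches, in square kilometres."""
    side_m = side_px * gsd_m
    return num_patches * side_m * side_m / 1e6


if __name__ == "__main__":
    print(f"annotated pixels: {total_pixels():,}")
    print(f"covered area: {total_area_km2():,.0f} km^2")
```

Under these assumptions the pixel count comes out well above the quoted lower bound of 3x10¹⁰, and the area is consistent with "over 60,000 km²".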

The image patches and their corresponding annotation patches are stored in "patch_image.rar" and "patch_label.rar", respectively. The RGB images are ".jpg" files of 512x512 pixels, with pixel values ranging from 0 to 255. The annotation patches are ".png" files, also 512x512 pixels, with pixel values ranging from 1 to 10, which represent 1#cropland, 2#forest, 3#grass, 4#shrubland, 5#wetland, 6#water, 7#tundra, 8#impervious, 9#bareland, 10#ice/snow. The 232,819 image-label pairs are officially divided into a training set, a validation set, and a test set at a ratio of 7:1:2; the split lists can be found in "train.txt", "val.txt", and "test.txt". Based on this split, the official baseline accuracies of several state-of-the-art semantic segmentation methods can be found in the related article (https://spj.science.org/doi/10.34133/remotesensing.0078).
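
As a minimal sketch of how the label encoding and split files might be used, the Python below maps label values to the ten category names and reads a split list. The function names are hypothetical, and the layout of "train.txt", "val.txt", and "test.txt" is an assumption: the description does not document it, so the reader here simply treats each non-empty line as one entry.

```python
from collections import Counter

# Label values and names as listed in the dataset description.
CLASS_NAMES = {
    1: "cropland", 2: "forest", 3: "grass", 4: "shrubland", 5: "wetland",
    6: "water", 7: "tundra", 8: "impervious", 9: "bareland", 10: "ice/snow",
}


def class_fractions(pixels):
    """Return the share of each category in a flat sequence of label values.

    `pixels` is any iterable of integers, e.g. the flattened contents of
    one decoded ".png" label patch.
    """
    counts = Counter(pixels)
    unknown = set(counts) - set(CLASS_NAMES)
    if unknown:
        raise ValueError(f"unexpected label values: {sorted(unknown)}")
    total = sum(counts.values())
    return {CLASS_NAMES[v]: n / total for v, n in sorted(counts.items())}


def read_split(path):
    """Read a split list, ASSUMING one entry per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```

Decoding the ".png" files into label values is left to an image library of your choice. Raising on values outside 1-10 is a design choice that surfaces decoding problems early, for example a label read as RGB instead of single-channel.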

We hope it can serve as a benchmark to promote further progress in global land cover mapping and in semantic segmentation algorithms.

Files (12.3 GB)

Name	MD5	Size
patch_image.rar	067cc86b68abf983edefaa7923ad8ce4	11.5 GB
patch_label.rar	b39b37a96336cc69d4ac575a423ec27b	711.4 MB
test.txt	e6d2701e9e6b536183dc2fac95ae579c	1.5 MB
train.txt	440c12b7283b7de5caeadac7aa61e57d	5.2 MB
val.txt	ca382b7d979b43063c9efcc20e335097	726.0 kB
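
Because the archives are large, it may be worth verifying a download against the MD5 checksums listed above before extracting it. The following is a sketch using Python's standard `hashlib`; the helper names `md5_of` and `verify` are ours, and the files are assumed to have been saved under their original names.

```python
import hashlib
import os

# MD5 checksums as listed in the file table of this record.
EXPECTED_MD5 = {
    "patch_image.rar": "067cc86b68abf983edefaa7923ad8ce4",
    "patch_label.rar": "b39b37a96336cc69d4ac575a423ec27b",
    "test.txt": "e6d2701e9e6b536183dc2fac95ae579c",
    "train.txt": "440c12b7283b7de5caeadac7aa61e57d",
    "val.txt": "ca382b7d979b43063c9efcc20e335097",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's MD5 matches the value listed for its name."""
    name = os.path.basename(path)
    return md5_of(path) == EXPECTED_MD5[name]
```

Reading in chunks keeps memory use flat even for the 11.5 GB image archive; `verify("patch_label.rar")` returns `True` only when the download is intact.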

