Published June 12, 2020 | Version 1.0
Dataset Open

Multispectral and augmented Landsat data with land cover labels

  • 1. INEGI
  • 2. CONACyT

Description

A benchmark overall accuracy (OA) of 77.1% on this dataset is reported at: https://doi.org/10.1117/1.JRS.14.048503

The dataset consists of 60,000 images, each a Landsat patch of 33x33 pixels with 102 bands, taken from randomly selected locations in Mexico. Each patch is labeled with one of 12 Land Use and Vegetation classes according to the classification described at https://doi.org/10.3390/rs6053923.

The zip file contains 12 folders numbered 1-12, and each folder contains 5,000 .npy files (NumPy arrays, which can be loaded in Python with the NumPy library).
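The folder layout described above can be walked with a few lines of Python. This is a minimal sketch, not a reference loader: the file names inside each folder, the extraction path, and the axis order of each array are not stated in this description, so `list_samples`, `example.npy`, and the `(33, 33, 102)` shape are illustrative assumptions. The demo builds a tiny synthetic tree so that it runs without the 31.6 GB archive.

```python
from pathlib import Path
import tempfile

import numpy as np


def list_samples(root):
    """Return (path, class_id) pairs for every .npy file in folders named 1..12."""
    samples = []
    for class_id in range(1, 13):
        folder = Path(root) / str(class_id)
        for path in sorted(folder.glob("*.npy")):
            samples.append((path, class_id))
    return samples


# Demo on a synthetic tree (assumption: one array per file, axis order unknown).
with tempfile.TemporaryDirectory() as tmp:
    for class_id in range(1, 13):
        folder = Path(tmp) / str(class_id)
        folder.mkdir()
        fake_patch = np.zeros((33, 33, 102), dtype=np.float32)  # hypothetical shape
        np.save(folder / "example.npy", fake_patch)

    samples = list_samples(tmp)
    first_path, first_label = samples[0]
    patch = np.load(first_path)
    demo_shape = patch.shape
```

With the real archive, point `list_samples` at the extracted folder and check `patch.shape` on one file to find out which axis holds the 102 channels.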

The class labels correspond to the following identifiers:

1, Temperate Coniferous forest
2, Temperate Deciduous Forest
3, Temperate Mixed Forest
4, Tropical Evergreen Forest
5, Tropical Deciduous Forest
6, Scrubland
7, Wetland Vegetation
8, Agriculture
9, Grassland
10, Water body
11, Barren Land
12, Urban Area
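For convenience, the identifier list above can be written as a lookup table. The sketch below assumes that the folder names 1-12 in the zip file are these class identifiers, which the description implies but does not state outright; `CLASS_NAMES` and `class_name` are illustrative names.

```python
# Class identifiers as listed in this description.
CLASS_NAMES = {
    1: "Temperate Coniferous Forest",
    2: "Temperate Deciduous Forest",
    3: "Temperate Mixed Forest",
    4: "Tropical Evergreen Forest",
    5: "Tropical Deciduous Forest",
    6: "Scrubland",
    7: "Wetland Vegetation",
    8: "Agriculture",
    9: "Grassland",
    10: "Water body",
    11: "Barren Land",
    12: "Urban Area",
}


def class_name(folder_name):
    """Map a folder name such as '8' (assumed to be the class id) to its label."""
    return CLASS_NAMES[int(folder_name)]
```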

To build the dataset, we took the National Continuum of Land Use and Vegetation, Series 5, generated by the National Institute of Statistics and Geography of Mexico (INEGI), from the web page of the National Commission for the Knowledge and Use of Biodiversity (CONABIO) (http://geoportal.conabio.gob.mx/metadatos/doc/html/usv250s5ugw.html).

The file used to construct this dataset is the shapefile in geographic coordinates located at http://www.conabio.gob.mx/informacion/gis/maps/geo/usv250s5ugw.zip.
It was then transformed to the Albers equal-area conic projection with the following parameters:

False easting: 2500000.0
False northing: 0.0
Origin longitude: -102.0°
Origin latitude: 12.0°
First standard parallel: 17.5°
Second standard parallel: 29.5°
Linear unit: Meter (1.0)
Reference ellipsoid: GRS80
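The parameters above map onto the standard PROJ keys for an Albers equal-area conic projection. The sketch below only assembles the corresponding PROJ string; the key names come from PROJ, not from this dataset, and `ALBERS_PARAMS` and `to_proj_string` are illustrative names. No datum is given in the description, so only the ellipsoid is set.

```python
# Values copied from the parameter list; keys are the usual PROJ names.
ALBERS_PARAMS = {
    "proj": "aea",       # Albers equal-area conic
    "lat_1": 17.5,       # first standard parallel
    "lat_2": 29.5,       # second standard parallel
    "lat_0": 12.0,       # origin latitude
    "lon_0": -102.0,     # origin longitude
    "x_0": 2500000.0,    # false easting
    "y_0": 0.0,          # false northing
    "ellps": "GRS80",    # reference ellipsoid
    "units": "m",        # linear unit
}


def to_proj_string(params):
    """Join the parameters into a PROJ-style definition string."""
    return " ".join(f"+{key}={value}" for key, value in params.items()) + " +no_defs"


PROJ_STRING = to_proj_string(ALBERS_PARAMS)
print(PROJ_STRING)
```

A PROJ-aware tool (for example pyproj or GDAL) should accept a string of this form when reprojecting the shapefile, although the exact tool the authors used is not stated.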


Once the data was projected, the classes identified in the National Continuum of Land Use and Vegetation were mapped to the classes identified in https://doi.org/10.3390/rs6053923. These classes are: Agriculture, Barren land, Grassland, Scrubland, Temperate coniferous forest, Temperate deciduous forest, Temperate mixed forest, Tropical deciduous forest, Tropical evergreen forest, Urban area, Water body and Wetland vegetation.

Once the information layer with the 12 classes indicated above was generated, this reference layer was rasterized.
A national grid of 1,975,940 regions (cells) of 1 x 1 kilometer was then generated, and each cell was assigned its dominant class together with the percentage of its pixels that belong to that class.
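The per-cell dominant class and its pixel percentage can be computed with a simple count. This is a sketch under assumptions: the rasterized labels inside one cell are taken to be a small integer array with class ids 1-12, and the fraction is computed over all pixels in the cell. How the authors handled unlabeled pixels is not described, and `dominant_class` is an illustrative name.

```python
import numpy as np


def dominant_class(cell_labels, n_classes=12):
    """Return (class_id, fraction) for the most frequent class in one grid cell."""
    labels = np.asarray(cell_labels).ravel()
    # Slot 0 is dropped because class ids start at 1.
    counts = np.bincount(labels, minlength=n_classes + 1)[1 : n_classes + 1]
    best = int(np.argmax(counts))
    return best + 1, counts[best] / labels.size


# Synthetic cell: mostly Agriculture (8) with a strip of Grassland (9).
cell = np.full((33, 33), 8)
cell[:5, :] = 9
cls, frac = dominant_class(cell)
```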

A total of 1,640,827 cells have 70% or more of their pixels from one dominant class, which represents 83% of the Mexican territory. That is, only 17% of cells have less than 70% of their pixels from one dominant class.
Then, 5,000 regions were randomly selected from each land cover class at the national level. Only regions whose cells have 70% or more of their pixels from one dominant class were eligible for this random selection, in order to obtain consistent and reliable data for the automatic classification task. This random selection yields a total of 60,000 selected regions.
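The selection step can be sketched as stratified random sampling with a purity threshold. The code below is an illustration on synthetic data, not the authors' procedure: the random generator, the seed, and the function name `sample_regions` are assumptions, and the demo draws 50 regions per class instead of 5,000 to stay small.

```python
import numpy as np


def sample_regions(dominant, fraction, per_class=5000, min_fraction=0.70, seed=0):
    """Pick `per_class` region indices per class among regions that are pure enough."""
    rng = np.random.default_rng(seed)
    dominant = np.asarray(dominant)
    fraction = np.asarray(fraction)
    selected = {}
    for cls in np.unique(dominant):
        candidates = np.flatnonzero((dominant == cls) & (fraction >= min_fraction))
        if candidates.size < per_class:
            raise ValueError(f"class {cls}: only {candidates.size} eligible regions")
        # Sampling without replacement, so no region is picked twice.
        selected[int(cls)] = rng.choice(candidates, size=per_class, replace=False)
    return selected


# Synthetic grid: a dominant class and its pixel fraction for 10,000 regions.
demo_rng = np.random.default_rng(42)
dominant = demo_rng.integers(1, 13, size=10_000)
fraction = demo_rng.uniform(0.4, 1.0, size=10_000)
picked = sample_regions(dominant, fraction, per_class=50)
```

With `per_class=5000` and 12 classes this gives 12 x 5,000 = 60,000 regions, matching the dataset size.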

Image patches were extracted from the selected regions in the sample.

The image used is the result of applying multiple time series analysis algorithms to a cube of image data, mainly of Tier 1 (T1) quality with a few of Tier 2 (T2), as described at https://www.usgs.gov/land-resources/nli/landsat/landsat-collection-1. An Open Data Cube (ODC, https://www.opendatacube.org/) was constructed from 3,515 Landsat 5 and 7 images corresponding to the year 2011, which is the same reference year as the National Continuum of Land Use and Vegetation Series 5.

From the analysis of the ODC images, the Geomedian (https://doi.org/10.1109/TGRS.2017.2723896) was calculated, which produced a national cloud-free mosaic for 2011 with pixels at 30 meters resolution and 6 spectral bands (blue, green, red, nir, swir 1, swir 2). Then, 15 spectral indices were calculated for each pixel in this image. In addition, the time series of each pixel available for the year 2011 was analyzed using all 15 normalized difference indices that can be formed from pairs of the 6 bands incorporated into the data cube, which produced one set of national mosaics per index. In total, this resulted in 102 information channels. Since Landsat images have a resolution of 30 meters, each region of 1 km x 1 km corresponds to an image of 33 x 33 pixels.
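The normalized difference time series step can be illustrated as follows. This is a sketch, not the authors' pipeline: it assumes a data cube laid out as `(time, band, y, x)` with the six bands in the order given above, and it assumes missing observations are stored as NaN and skipped, which this description does not state. The output order follows `itertools.combinations` and is not necessarily the channel order in the files.

```python
from itertools import combinations

import numpy as np

BANDS = ["blue", "green", "red", "nir", "swir1", "swir2"]
STATS = [
    ("min", np.nanmin),
    ("mean", np.nanmean),
    ("max", np.nanmax),
    ("std", np.nanstd),
    ("median", np.nanmedian),
]


def nd_time_stats(cube):
    """Per-pixel time statistics of every normalized difference (a - b) / (a + b)."""
    names, layers = [], []
    for i, j in combinations(range(len(BANDS)), 2):  # 15 band pairs
        a, b = cube[:, i], cube[:, j]
        denom = a + b
        with np.errstate(divide="ignore", invalid="ignore"):
            nd = np.where(denom == 0, np.nan, (a - b) / denom)
        for stat_name, stat_fn in STATS:
            names.append(f"nd_{BANDS[i]}_{BANDS[j]}_{stat_name}")
            layers.append(stat_fn(nd, axis=0))  # reduce over the time axis
    return names, np.stack(layers)


# Synthetic cube: 8 dates, 6 bands, one 33 x 33 patch of positive reflectances.
cube = np.random.default_rng(0).uniform(0.01, 1.0, size=(8, 6, 33, 33))
names, stats = nd_time_stats(cube)
print(stats.shape)  # (75, 33, 33): 15 indices x 5 statistics
```

Together with the 6 Geomedian bands, the 15 Geomedian-based indices, and the 6 Tasseled cap components, these 75 layers account for the 102 channels (6 + 15 + 6 + 75).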

The 102 channels in the patches correspond to:

Geomedian Bands (6): blue, green, red, nir, swir 1, swir 2
Geomedian Based Indexes (15): evi, bu, sr, arvi, ui, ndbi, ibi, ndvi, ndwi, mndwi, nbi, brba, nbai, baei, bi
Geomedian Based Tasseled cap transformation (6): brightness, greenness, wetness, fourth, fifth, sixth

2011 Landsat Time Series Analysis by Pixel (15 normalized difference indices x 5 statistics = 75)

(red-swir 1)/(red+swir 1) (5): min, mean, max, std, median
(red-nir)/(red+nir) (5): min, mean, max, std, median
(swir 1-swir 2)/(swir 1+swir 2) (5): min, mean, max, std, median
(nir-swir 2)/(nir+swir 2) (5): min, mean, max, std, median
(nir-swir 1)/(nir+swir 1) (5): min, mean, max, std, median
(red-swir 2)/(red+swir 2) (5): min, mean, max, std, median
(green-swir 2)/(green+swir 2) (5): min, mean, max, std, median
(green-swir 1)/(green+swir 1) (5): min, mean, max, std, median
(green-red)/(green+red) (5): min, mean, max, std, median
(green-nir)/(green+nir) (5): min, mean, max, std, median
(blue-swir 2)/(blue+swir 2) (5): min, mean, max, std, median
(blue-swir 1)/(blue+swir 1) (5): min, mean, max, std, median
(blue-red)/(blue+red) (5): min, mean, max, std, median
(blue-nir)/(blue+nir) (5): min, mean, max, std, median
(blue-green)/(blue+green) (5): min, mean, max, std, median
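The channel list above can be turned into a list of 102 names for indexing. This sketch assumes that the channels are stored in the order in which they are listed here, which the description does not confirm; the `nd_<a>_<b>_<stat>` naming is invented for illustration.

```python
BANDS = ["blue", "green", "red", "nir", "swir1", "swir2"]
INDICES = ["evi", "bu", "sr", "arvi", "ui", "ndbi", "ibi", "ndvi",
           "ndwi", "mndwi", "nbi", "brba", "nbai", "baei", "bi"]
TASSELED_CAP = ["brightness", "greenness", "wetness", "fourth", "fifth", "sixth"]

# Band pairs (a, b) for (a - b) / (a + b), in the order listed above.
ND_PAIRS = [
    ("red", "swir1"), ("red", "nir"), ("swir1", "swir2"), ("nir", "swir2"),
    ("nir", "swir1"), ("red", "swir2"), ("green", "swir2"), ("green", "swir1"),
    ("green", "red"), ("green", "nir"), ("blue", "swir2"), ("blue", "swir1"),
    ("blue", "red"), ("blue", "nir"), ("blue", "green"),
]
STATS = ["min", "mean", "max", "std", "median"]

CHANNELS = (
    BANDS
    + INDICES
    + TASSELED_CAP
    + [f"nd_{a}_{b}_{stat}" for a, b in ND_PAIRS for stat in STATS]
)

# Hypothetical lookup: position of a channel if the listed order is the stored order.
ndvi_position = CHANNELS.index("ndvi")
```

Before relying on these positions, it is worth checking them against a known quantity, for example by recomputing ndvi from the red and nir Geomedian bands of one patch.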


Files

LandCover.zip (31.6 GB)
md5:0b367dbafdebfb0e849824a3d0463d07