Published September 1, 2023 | Version v1
Dataset Open

OEMC Hackathon 2023: EU Land Cover Classification Dataset

  • 1. OpenGeoHub Foundation

Description

This dataset was organized by the Open-Earth-Monitor Cyberinfrastructure (OEMC) project for the OEMC Hackathon 2023.

The dataset (both train and test sets) was produced by stratified sampling of the ground-truth data provided by the LUCAS survey, funded by the European Commission. The target variable is land cover at level 3 of the harmonized legend, resulting in 72 classes distributed over 5 years (2006, 2009, 2012, 2015 and 2018).

All samples were overlaid with 416 raster layers, including satellite images (spectral bands and indices), temperature images (land surface temperature), climate layers (precipitation, air temperature), accessibility and distance maps (highways, water bodies, burned areas), a digital terrain model (slope and elevation) and other existing maps (population count and snow cover). The resulting values are organized in columns, one per spatial layer, which together form the feature space available for ML modeling.

Column names:

Each feature column name consists of six metadata fields separated by underscores (_):

  • Example: red_landsat.glad.ard_p50_30m_jun25_sep12
  • Metadata fields:
    • F1 - Variable name: red
    • F2 - Variable procedure including product name: landsat.glad.ard
    • F3 - Position in the probability distribution: p50
    • F4 - Spatial resolution: 30m
    • F5 - Start date: jun25
    • F6 - End date: sep12
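
The naming convention above can be unpacked programmatically. The following is a minimal sketch, assuming that no field itself contains an underscore; the names ColumnFields and parse_column are illustrative and not part of the dataset. Non-feature columns such as sample_id do not follow the convention and are rejected.

```python
from typing import NamedTuple


class ColumnFields(NamedTuple):
    """The six metadata fields F1..F6 of a feature column name."""
    variable: str     # F1 - variable name
    procedure: str    # F2 - variable procedure including product name
    statistic: str    # F3 - position in the probability distribution
    resolution: str   # F4 - spatial resolution
    start_date: str   # F5 - start date
    end_date: str     # F6 - end date


def parse_column(name: str) -> ColumnFields:
    """Split a feature column name on '_' into its six fields."""
    parts = name.split("_")
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {name!r}")
    return ColumnFields(*parts)


# The example column name from the description.
fields = parse_column("red_landsat.glad.ard_p50_30m_jun25_sep12")
print(fields.variable, fields.procedure, fields.resolution)
```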

Column description:

All the columns can be aggregated in six thematic groups according to F1 and F2:

  • Satellite images (spectral reflectance & vegetation indices):
    • blue_landsat.glad.ard_{..}: Quarterly time-series of Landsat blue band (Witjes et al., 2023)
    • blue_mod13q1_{..}: Monthly time-series of MOD13Q1 blue band (EarthData)
    • evi_mod13q1.stl.trend.ols.alpha_{..}: Alpha coefficient / intercept (derived by OLS) over the deseasonalized monthly time-series of the MOD13Q1 Enhanced Vegetation Index (EVI) (EarthData)
    • evi_mod13q1.stl.trend.ols.beta_{..}: Beta coefficient / trend (derived by OLS) over the deseasonalized monthly time-series of the MOD13Q1 Enhanced Vegetation Index (EVI) (EarthData)
    • evi_mod13q1.stl.trend_{..}: Deseasonalized monthly time-series (trend component of STL) of the MOD13Q1 Enhanced Vegetation Index (EVI) (EarthData)
    • evi_mod13q1_{..}: Monthly time-series of the MOD13Q1 Enhanced Vegetation Index (EVI) (EarthData)
    • green_landsat.glad.ard_{..}: Quarterly time-series of Landsat green band (Witjes et al., 2023)
    • mir_mod13q1_{..}: Monthly time-series of MOD13Q1 mid-infrared band (EarthData)
    • ndvi_mod13q1_{..}: Monthly time-series of MOD13Q1 normalized difference vegetation index (NDVI) (EarthData)
    • nir_landsat.glad.ard_{..}: Quarterly time-series of Landsat near-infrared band (Witjes et al., 2023)
    • nir_mod13q1_{..}: Monthly time-series of MOD13Q1 near-infrared band (EarthData)
    • red_landsat.glad.ard_{..}: Quarterly time-series of Landsat red band (Witjes et al., 2023)
    • red_mod13q1_{..}: Monthly time-series of MOD13Q1 red band (EarthData)
    • swir1_landsat.glad.ard_{..}: Quarterly time-series of Landsat short-wave infrared-1 band (Witjes et al., 2023)
    • swir2_landsat.glad.ard_{..}: Quarterly time-series of Landsat short-wave infrared-2 band (Witjes et al., 2023)
  • Temperature images:
    • lst_mod11a2.daytime_{..}: Monthly time-series of MOD11A2 daytime land surface temperature (EarthData)
    • lst_mod11a2.daytime.{month}_{..}: Long-term monthly aggregation (2000-2022) of MOD11A2 daytime land surface temperature (EarthData)
    • lst_mod11a2.daytime.trend_{..}: Deseasonalized monthly time-series (trend component of STL) of MOD11A2 daytime land surface temperature (EarthData)
    • lst_mod11a2.daytime.trend.ols.alpha_{..}: Alpha coefficient / intercept (derived by OLS) over the deseasonalized monthly time-series of MOD11A2 daytime land surface temperature (EarthData)
    • lst_mod11a2.daytime.trend.ols.beta_{..}: Beta coefficient / trend (derived by OLS) over the deseasonalized monthly time-series of MOD11A2 daytime land surface temperature (EarthData)
    • lst_mod11a2.nighttime_{..}: Monthly time-series of MOD11A2 nighttime land surface temperature (EarthData)
    • lst_mod11a2.nighttime.{month}_{..}: Long-term monthly aggregation (2000-2022) of MOD11A2 nighttime land surface temperature (EarthData)
    • lst_mod11a2.nighttime.trend_{..}: Deseasonalized monthly time-series (trend component of STL) of MOD11A2 nighttime land surface temperature (EarthData)
    • lst_mod11a2.nighttime.trend.ols.alpha_{..}: Alpha coefficient / intercept (derived by OLS) over the deseasonalized monthly time-series of MOD11A2 nighttime land surface temperature (EarthData)
    • lst_mod11a2.nighttime.trend.ols.beta_{..}: Beta coefficient / trend (derived by OLS) over the deseasonalized monthly time-series of MOD11A2 nighttime land surface temperature (EarthData)
    • thermal_landsat.glad.ard_{..}: Quarterly time-series of Landsat thermal band (Witjes et al., 2023)
  • Climate layers:
    • accum.precipitation_chelsa.annual_{..}: Accumulated precipitation over the entire year according to the CHELSA time-series, in mm of water (Karger et al., 2017)
    • accum.precipitation_chelsa.annual.3years.dif_{..}: 3-year difference of the yearly accumulated precipitation according to the CHELSA time-series, in mm of water (Karger et al., 2017)
    • accum.precipitation_chelsa.annual.log.csum_{..}: Cumulative sum, in logarithmic space, of the yearly accumulated precipitation according to the CHELSA time-series (Karger et al., 2017)
    • accum.precipitation_chelsa.montlhy_{..}: Accumulated precipitation for each month according to the CHELSA time-series, in mm of water (Karger et al., 2017)
    • bioclim.var_chelsa.{variable_code}_{..}: Bioclimatic variables derived from the monthly min, max and mean temperature, and mean precipitation values. For variable_code descriptions see chelsa-climate.org (Karger et al., 2017)
  • Accessibility & distance maps:
    • accessibility.to.ports_map.ox.{variable_code}_{..}: Travel time required to access ports of different sizes according to Nelson et al., 2019
    • burned.area.distance_global.fire.atlas_{..}: Distance to burned areas mapped by Global Fire Atlas
    • cost.distance.to.coast_gedi.grass.gis_{..}: Cumulative cost of moving (derived by r.cost) to the coast
    • road.distance_osm.highways.high.density_{..}: Distance to areas with a high density of roads according to OpenStreetMap
    • road.distance_osm.highways.low.density_{..}: Distance to areas with a low density of roads according to OpenStreetMap
    • water.distance_glad.interanual.dynamic.classes_{..}: Distance to permanent / seasonal water bodies according to Pickens et al., 2020
  • Digital terrain model (DTM):
    • elev.lowestmode_gedi.eml_{..}: Mean estimate of the terrain elevation in dm filtered using SAGA GIS Gaussian filter (Witjes et al., 2023)
    • slope.percent_gedi.eml_{..}: Mean slope in % derived from terrain elevation (Witjes et al., 2023)
  • Other existing maps:
    • pop.count_ghs.jrc_{..}: Annual time-series of population count in number of people mapped by Schiavina et al., 2023
    • snow.duration_global.snowpack_{..}: Annual duration of snow occurrence mapped by Global SnowPack
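
Because the layers are identified by F1 and F2, columns can be bucketed by their leading "F1_F2" prefix. The sketch below is one possible way to do this, assuming the column-name convention described earlier; group_by_layer is an illustrative helper, and every example column name except the first is hypothetical (constructed only to demonstrate the grouping).

```python
from collections import defaultdict

# Non-feature columns named in the file descriptions.
METADATA_COLUMNS = {"sample_id", "land_cover", "land_cover_label", "year"}


def group_by_layer(columns):
    """Map each 'F1_F2' prefix to the feature columns that share it."""
    groups = defaultdict(list)
    for name in columns:
        if name in METADATA_COLUMNS:
            continue
        parts = name.split("_")
        if len(parts) < 2:
            continue  # not a feature column under the naming convention
        groups[f"{parts[0]}_{parts[1]}"].append(name)
    return dict(groups)


columns = [
    "sample_id",
    "year",
    "red_landsat.glad.ard_p50_30m_jun25_sep12",  # example from the description
    "red_landsat.glad.ard_p25_30m_jun25_sep12",  # hypothetical
    "nir_landsat.glad.ard_p50_30m_jun25_sep12",  # hypothetical
]
groups = group_by_layer(columns)
for prefix, names in sorted(groups.items()):
    print(prefix, len(names))
```

Grouping this way makes it straightforward to select or drop an entire layer family (for example, all Landsat red-band columns) when building a feature matrix.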

Files

  • train.csv: Training set with 42,237 rows and 420 columns, including sample id (sample_id - index column), land cover code (land_cover), land cover label (land_cover_label), reference year (year) and 416 features / covariates
  • test.csv: Test set with 42,271 rows and 418 columns, including sample id (sample_id - index column), reference year (year) and 416 features / covariates
  • sample_submission.csv: Sample submission file with 42,271 rows and 2 columns, including sample id (sample_id - index column) and predicted land cover code (land_cover)
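
The file layouts above suggest a simple workflow: separate the metadata columns from the features, then write predictions in the two-column submission layout. The following sketch uses only the Python standard library; split_header and write_submission are illustrative helpers, and the ids and land cover codes in the demonstration are placeholders rather than real data.

```python
import csv
import io

# Non-feature columns named in the file descriptions.
METADATA_COLUMNS = ("sample_id", "land_cover", "land_cover_label", "year")


def split_header(header):
    """Separate the metadata columns from the feature columns."""
    metadata = [c for c in header if c in METADATA_COLUMNS]
    features = [c for c in header if c not in METADATA_COLUMNS]
    return metadata, features


def write_submission(sample_ids, predictions, out):
    """Write sample_id / land_cover pairs in the sample_submission.csv layout."""
    sample_ids, predictions = list(sample_ids), list(predictions)
    if len(sample_ids) != len(predictions):
        raise ValueError("one prediction is required per sample id")
    writer = csv.writer(out)
    writer.writerow(["sample_id", "land_cover"])
    writer.writerows(zip(sample_ids, predictions))


# Tiny in-memory demonstration with placeholder values.
header = ["sample_id", "year", "red_landsat.glad.ard_p50_30m_jun25_sep12"]
metadata, features = split_header(header)
buffer = io.StringIO()
write_submission([101, 102], [1, 2], buffer)
print(buffer.getvalue())
```

Applied to the header row of test.csv, the feature list would be expected to contain 416 names according to the description above; checking that count is a cheap sanity test before training.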

Notes

More information about the hackathon is available at https://www.kaggle.com/competitions/oemc-hackathon-eu-land-cover-classification/overview

Files (75.0 MB)

  • md5:2e65274a03e65ea567d678928c4f14c1 (2.0 MB)
  • md5:e1c8a95864f58b6b7c1ff41f2083efee (363.4 kB)
  • md5:21ac96daaab62fb7cd29a63abad18f08 (36.0 MB)
  • md5:af28c970d1ff60ee0bffeb2541628cb4 (36.6 MB)
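
A downloaded file can be checked against the md5 values listed above. The sketch below is a generic checksum routine using the standard hashlib module; the listing here does not state which checksum belongs to which file, so the pairing has to be taken from the record itself. The helper names and the temporary demonstration file are illustrative only.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].lower()
    return md5_of_file(path) == expected_hex


# Demonstration on a temporary file (not one of the dataset files).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"sample_id,land_cover\n")
    demo_path = tmp.name
demo_digest = md5_of_file(demo_path)
demo_ok = matches_checksum(demo_path, "md5:" + demo_digest)
os.remove(demo_path)
print(demo_ok)
```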

Additional details

Related works

Is continued by
10.5281/zenodo.8306613 (DOI)

Funding

European Commission
OEMC - Open-Earth-Monitor Cyberinfrastructure 101059548