OEMC Hackathon 2023: EU Land Cover Classification Dataset
Description
Dataset organized by the Open-Earth-Monitor (OEMC) project within the context of Hackathon 2023.
The dataset (both train and test) was produced by stratified sampling of the ground-truth data provided by the LUCAS survey, funded by the European Commission. The target land cover uses the level-3 classes of the harmonized legend, resulting in 72 classes distributed over 5 years (2006, 2009, 2012, 2015, 2018).
All samples were overlaid with 416 raster spatial layers, including satellite images (spectral bands and indices), temperature images (land surface temperature), climate images (precipitation, air temperature), accessibility and distance maps (highways, water bodies, burned areas), a digital terrain model (slope and elevation) and other existing maps (population count and snow cover). The resulting values were organized in columns, one for each spatial layer, which together form the feature space available for ML modeling.
Column names:
Each column name is formed by six metadata fields separated by an underscore (_):
- Example: red_landsat.glad.ard_p50_30m_jun25_sep12
- Metadata fields:
  - F1 - Variable name: red
  - F2 - Variable procedure including product name: landsat.glad.ard
  - F3 - Position in the probability distribution: p50
  - F4 - Spatial resolution: 30m
  - F5 - Start date: jun25
  - F6 - End date: sep12
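The naming scheme above can be unpacked programmatically. The following is a minimal sketch, assuming that none of the six fields contains an underscore itself (the example uses dots inside fields); the names `ColumnFields` and `parse_column_name` are hypothetical helpers, not part of the dataset.

```python
from typing import NamedTuple


class ColumnFields(NamedTuple):
    variable: str    # F1, e.g. "red"
    procedure: str   # F2, e.g. "landsat.glad.ard"
    position: str    # F3, e.g. "p50"
    resolution: str  # F4, e.g. "30m"
    start_date: str  # F5, e.g. "jun25"
    end_date: str    # F6, e.g. "sep12"


def parse_column_name(name: str) -> ColumnFields:
    """Split a feature column name into its six underscore-separated fields."""
    parts = name.split("_")
    if len(parts) != 6:
        # Assumption: a name that does not split into six parts
        # (e.g. sample_id) is not a feature column.
        raise ValueError(f"expected 6 fields, got {len(parts)}: {name!r}")
    return ColumnFields(*parts)


fields = parse_column_name("red_landsat.glad.ard_p50_30m_jun25_sep12")
print(fields.variable, fields.procedure, fields.resolution)
```

Non-feature columns such as sample_id do not follow the six-field pattern, so the sketch raises a `ValueError` for them rather than guessing.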
Column description:
All columns can be aggregated into six thematic groups according to F1 and F2:
- Satellite images (spectral reflectance & vegetation indices):
  - blue_landsat.glad.ard_{..}: Quarterly time-series of Landsat blue band (Witjes et al., 2023)
  - blue_mod13q1_{..}: Monthly time-series of MOD13Q1 blue band (EarthData)
  - evi_mod13q1.stl.trend.ols.alpha_{..}: Alpha coefficient / intercept (derived by OLS) over the deseasonalized monthly time-series of MOD13Q1 Enhanced Vegetation Index (EVI) (EarthData)
  - evi_mod13q1.stl.trend.ols.beta_{..}: Beta coefficient / trend (derived by OLS) over the deseasonalized monthly time-series of MOD13Q1 Enhanced Vegetation Index (EVI) (EarthData)
  - evi_mod13q1.stl.trend_{..}: Deseasonalized monthly time-series (trend component of STL) for MOD13Q1 Enhanced Vegetation Index (EVI) (EarthData)
  - evi_mod13q1_{..}: Monthly time-series of MOD13Q1 Enhanced Vegetation Index (EVI) (EarthData)
  - green_landsat.glad.ard_{..}: Quarterly time-series of Landsat green band (Witjes et al., 2023)
  - mir_mod13q1_{..}: Monthly time-series of MOD13Q1 mid-infrared band (EarthData)
  - ndvi_mod13q1_{..}: Monthly time-series of MOD13Q1 Normalized Difference Vegetation Index (NDVI) (EarthData)
  - nir_landsat.glad.ard_{..}: Quarterly time-series of Landsat near-infrared band (Witjes et al., 2023)
  - nir_mod13q1_{..}: Monthly time-series of MOD13Q1 near-infrared band (EarthData)
  - red_landsat.glad.ard_{..}: Quarterly time-series of Landsat red band (Witjes et al., 2023)
  - red_mod13q1_{..}: Monthly time-series of MOD13Q1 red band (EarthData)
  - swir1_landsat.glad.ard_{..}: Quarterly time-series of Landsat short-wave infrared-1 band (Witjes et al., 2023)
  - swir2_landsat.glad.ard_{..}: Quarterly time-series of Landsat short-wave infrared-2 band (Witjes et al., 2023)
- Temperature images:
  - lst_mod11a2.daytime_{..}: Monthly time-series of MOD11A2 day time land surface temperature (EarthData)
  - lst_mod11a2.daytime.{month}_{..}: Long-term monthly aggregation (2000-2022) for MOD11A2 day time land surface temperature (EarthData)
  - lst_mod11a2.daytime.trend_{..}: Deseasonalized monthly time-series (trend component of STL) for MOD11A2 day time land surface temperature (EarthData)
  - lst_mod11a2.daytime.trend.ols.alpha_{..}: Alpha coefficient / intercept (derived by OLS) over the deseasonalized monthly time-series of MOD11A2 day time land surface temperature (EarthData)
  - lst_mod11a2.daytime.trend.ols.beta_{..}: Beta coefficient / trend (derived by OLS) over the deseasonalized monthly time-series of MOD11A2 day time land surface temperature (EarthData)
  - lst_mod11a2.nighttime_{..}: Monthly time-series of MOD11A2 night time land surface temperature (EarthData)
  - lst_mod11a2.nighttime.{month}_{..}: Long-term monthly aggregation (2000-2022) for MOD11A2 night time land surface temperature (EarthData)
  - lst_mod11a2.nighttime.trend_{..}: Deseasonalized monthly time-series (trend component of STL) for MOD11A2 night time land surface temperature (EarthData)
  - lst_mod11a2.nighttime.trend.ols.alpha_{..}: Alpha coefficient / intercept (derived by OLS) over the deseasonalized monthly time-series of MOD11A2 night time land surface temperature (EarthData)
  - lst_mod11a2.nighttime.trend.ols.beta_{..}: Beta coefficient / trend (derived by OLS) over the deseasonalized monthly time-series of MOD11A2 night time land surface temperature (EarthData)
  - thermal_landsat.glad.ard_{..}: Quarterly time-series of Landsat thermal band (Witjes et al., 2023)
- Climate layers:
  - accum.precipitation_chelsa.annual_{..}: Accumulated precipitation over the entire year according to CHELSA time-series, in mm of water (Karger et al., 2017)
  - accum.precipitation_chelsa.annual.3years.dif_{..}: 3-year difference of the yearly accumulated precipitation according to CHELSA time-series, in mm of water (Karger et al., 2017)
  - accum.precipitation_chelsa.annual.log.csum_{..}: Cumulative sum, in logarithmic space, of the yearly accumulated precipitation according to CHELSA time-series (Karger et al., 2017)
  - accum.precipitation_chelsa.montlhy_{..}: Accumulated precipitation for each month according to CHELSA time-series, in mm of water (Karger et al., 2017)
  - bioclim.var_chelsa.{variable_code}_{..}: Bioclimatic variables derived from the monthly min, max, and mean temperature, and mean precipitation values. For variable_code descriptions see chelsa-climate.org (Karger et al., 2017)
- Accessibility & distance maps:
  - accessibility.to.ports_map.ox.{variable_code}_{..}: Time required to access ports of different size according to Nelson et al., 2019
  - burned.area.distance_global.fire.atlas_{..}: Distance to burned areas mapped by Global Fire Atlas
  - cost.distance.to.coast_gedi.grass.gis_{..}: Cumulative cost of moving (derived by r.cost) to the coast
  - road.distance_osm.highways.high.density_{..}: Distance to high density of roads according to OpenStreetMap
  - road.distance_osm.highways.low.density_{..}: Distance to low density of roads according to OpenStreetMap
  - water.distance_glad.interanual.dynamic.classes_{..}: Distance to permanent / seasonal water bodies according to Pickens et al., 2020
- Digital terrain model (DTM):
  - elev.lowestmode_gedi.eml_{..}: Mean estimate of the terrain elevation in dm, filtered using SAGA GIS Gaussian filter (Witjes et al., 2023)
  - slope.percent_gedi.eml_{..}: Mean slope in % derived from terrain elevation (Witjes et al., 2023)
- Other existing maps:
  - pop.count_ghs.jrc_{..}: Annual time-series of population count in number of people mapped by Schiavina et al., 2023
  - snow.duration_global.snowpack_{..}: Annual duration of snow occurrence mapped by Global SnowPack
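The six thematic groups above can be recovered from the column names. The sketch below is one possible way to do it, keyed on F1 alone, which happens to be enough to separate the prefixes listed in this description. The group labels and the names `GROUP_BY_VARIABLE` and `group_columns` are hypothetical, and the mapping is transcribed from the list above rather than taken from any official lookup table.

```python
from collections import defaultdict

# F1 (variable name) -> thematic group, transcribed from the column description.
GROUP_BY_VARIABLE = {
    **dict.fromkeys(
        ["blue", "green", "red", "nir", "mir", "swir1", "swir2", "evi", "ndvi"],
        "satellite",
    ),
    **dict.fromkeys(["lst", "thermal"], "temperature"),
    **dict.fromkeys(["accum.precipitation", "bioclim.var"], "climate"),
    **dict.fromkeys(
        [
            "accessibility.to.ports",
            "burned.area.distance",
            "cost.distance.to.coast",
            "road.distance",
            "water.distance",
        ],
        "accessibility_distance",
    ),
    **dict.fromkeys(["elev.lowestmode", "slope.percent"], "dtm"),
    **dict.fromkeys(["pop.count", "snow.duration"], "other"),
}


def group_columns(columns):
    """Bucket column names by thematic group using the text before the first '_'."""
    groups = defaultdict(list)
    for name in columns:
        variable = name.split("_", 1)[0]
        # Anything not listed (e.g. year, sample_id) falls into "unknown".
        groups[GROUP_BY_VARIABLE.get(variable, "unknown")].append(name)
    return dict(groups)


# Only the first name is a full column name from this description;
# the others are bare F1_F2 prefixes used for illustration.
example = [
    "red_landsat.glad.ard_p50_30m_jun25_sep12",
    "slope.percent_gedi.eml",
    "lst_mod11a2.daytime",
    "year",
]
print(group_columns(example))
```

Metadata columns end up under "unknown", which makes it easy to spot anything the mapping does not cover.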
Files
- train.csv: Training set with 42,237 rows and 420 columns, including sample id (sample_id, index column), land cover code (land_cover), land cover label (land_cover_label), reference year (year) and 416 features / covariates
- test.csv: Test set with 42,271 rows and 418 columns, including sample id (sample_id, index column), reference year (year) and 416 features / covariates
- sample_submission.csv: A sample submission file with 42,271 rows and 2 columns, including sample id (sample_id, index column) and predicted land cover code (land_cover)
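As a quick sanity check on the file layout described above, the header of train.csv or test.csv can be split into metadata columns and feature columns. This is a minimal sketch using only the standard library; `split_header` is a hypothetical helper, and the in-memory CSV stands in for the real file, with made-up placeholder values in its single data row.

```python
import csv
import io

# Non-feature columns named in the file description.
META_COLUMNS = ("sample_id", "land_cover", "land_cover_label", "year")


def split_header(header):
    """Return (metadata columns, feature columns) for a CSV header row."""
    meta = [c for c in header if c in META_COLUMNS]
    features = [c for c in header if c not in META_COLUMNS]
    return meta, features


# Tiny stand-in for train.csv; the values in the data row are placeholders.
sample = io.StringIO(
    "sample_id,land_cover,land_cover_label,year,"
    "red_landsat.glad.ard_p50_30m_jun25_sep12\n"
    "1,0,placeholder,2018,42\n"
)
header = next(csv.reader(sample))
meta, features = split_header(header)
print(len(meta), len(features))
```

If the counts in the file description hold, the real train.csv should give 4 metadata and 416 feature columns, and test.csv should give 2 and 416. Whether year is also used as a model input is a modeling choice this sketch leaves open.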
Additional details
Related works
- Is continued by
- 10.5281/zenodo.8306613 (DOI)