Published January 27, 2023 | Version 1.0.0
Dataset Open

A dataset of Earth Observation Data for Lithological Mapping using Machine Learning

  • 1. IIT/NCSR Demokritos, Greece
  • 2. School of Rural, Surveying and Geo-Informatics Engineering/NTUA, Greece

Description

Dataset Information

Machine Learning (ML) algorithms have contributed successfully to the creation of automated methods for recognizing patterns in high-dimensional data. Remote sensing data cover wide geographical areas and can be used to address the demand for various in-situ data. Lithological mapping using remotely sensed data is one of the most challenging applications of ML algorithms. In the framework of the “AI for Geoapplications” project, ML and especially Deep Learning (DL) methodologies are investigated for the identification and characterization of lithology based on remote sensing data in various pilot areas in Greece. In order to train and test the various ML algorithms, a dataset was created consisting of 30 ROIs, selected mainly from low-vegetated areas, that cover 2% of the total area of Greece.

Dataset Preprocessing

Dataset preprocessing was executed using a combination of SNAP, QGIS and ENVI tools.

Preprocessing steps:

Defining areas with the following properties:

  • Zero cloud and snow coverage

  • No water bodies

  • Minimum vegetation

For the ASTER images:

  • Subset on defined areas

  • Mosaic images when needed

  • Digitise clouds

For the Labels:

  • Obtain the soil map from YPEN (https://ypen.gov.gr/)

  • Subset on defined areas

  • Check that all categories are represented in good proportions

  • Clip label files with digitised clouds

  • Rasterize

 

The labels comprise eighteen categories across the twenty-eight areas for which we collected data. We use the following coding for the labels of our dataset:

Category                        Code
Alluvial deposits               0
Limestone colluvial deposits    1
Limestones                      2
Schists                         3
Quaternary sediments            4
Gneiss                          5
Slope fan debris                6
Mixed flysch                    7
Flysch shale and cherts         8
Dolomites                       9
Granite                         10
Sandstone flysch                11
Flysch colluvial deposits       12
Peridotite and Gabbro           13
River bed deposits              14
Gneiss colluvial deposits       15
Not available                   -100
Cloud coverage                  -999
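The coding above can be expressed as a small lookup, shown here as a minimal sketch. The names `LABEL_NAMES`, `IGNORED_CODES`, `is_valid_label` and `label_name` are hypothetical and do not come from the dataset repository. Treating -100 and -999 as codes to exclude from training is an assumption, not something the dataset description states.

```python
# Label codes exactly as listed in the dataset description.
LABEL_NAMES = {
    0: "Alluvial deposits",
    1: "Limestone colluvial deposits",
    2: "Limestones",
    3: "Schists",
    4: "Quaternary sediments",
    5: "Gneiss",
    6: "Slope fan debris",
    7: "Mixed flysch",
    8: "Flysch shale and cherts",
    9: "Dolomites",
    10: "Granite",
    11: "Sandstone flysch",
    12: "Flysch colluvial deposits",
    13: "Peridotite and Gabbro",
    14: "River bed deposits",
    15: "Gneiss colluvial deposits",
    -100: "Not available",
    -999: "Cloud coverage",
}

# Assumption: these two codes mark pixels without a usable lithology label.
IGNORED_CODES = {-100, -999}


def is_valid_label(code: int) -> bool:
    """Return True when the code names a lithology class, not a no-data or cloud marker."""
    return code in LABEL_NAMES and code not in IGNORED_CODES


def label_name(code: int) -> str:
    """Return the category name for a code, or a placeholder for unlisted codes."""
    return LABEL_NAMES.get(code, "Unknown code")
```

For example, `label_name(2)` returns "Limestones", and `is_valid_label(-999)` is False because -999 marks cloud coverage.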

The following table lists the available areas and the categories that each contains:

[Table: Lithology_Dataset]

 

For the Sentinel-2 images, we applied the following steps:

  • Resample to 10 m

  • Subset on defined areas

The Sentinel-2 map shows a Sentinel-2 false colour composite (bands 11/8/4) over an OSM background.

The final step is the collocation of the above into a datacube, i.e. a multidimensional array with 25 bands (the datacube dimensions differ for every area), using the ASTER image as the base (15 m spatial resolution).

  • Bands 1-14: ASTER

  • Bands 15-24: Sentinel-2 (S2)

  • Band 25: Label
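The band layout above can be turned into training samples roughly as follows. This is a sketch under stated assumptions, not the preprocessing code from the project repository. It assumes the datacube has already been loaded as a NumPy array in band-first order `(25, height, width)`, and that pixels labelled -100 or -999 should be dropped. The function names `split_datacube` and `to_samples` are hypothetical.

```python
import numpy as np


def split_datacube(cube):
    """Split a band-first (25, H, W) datacube into ASTER, Sentinel-2 and label parts."""
    if cube.ndim != 3 or cube.shape[0] != 25:
        raise ValueError("expected an array of shape (25, height, width)")
    aster = cube[0:14]    # bands 1-14
    s2 = cube[14:24]      # bands 15-24
    labels = cube[24]     # band 25
    return aster, s2, labels


def to_samples(cube, ignored=(-100, -999)):
    """Return per-pixel features (N, 24) and labels (N,), skipping ignored codes."""
    aster, s2, labels = split_datacube(cube)
    features = np.concatenate([aster, s2], axis=0)   # (24, H, W)
    features = features.reshape(24, -1).T            # (H*W, 24)
    y = labels.reshape(-1).astype(np.int64)
    keep = ~np.isin(y, ignored)                      # drop no-data and cloud pixels
    return features[keep], y[keep]
```

Because the datacube dimensions differ for every area, flattening to per-pixel samples is one simple way to combine several areas into a single training set. Patch-based approaches would need a different split.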

The code for preprocessing the dataset for use with machine learning algorithms can be found at the following link:

https://github.com/georgegiannop/Lithology

Citation

If you use this dataset in your work, please cite our paper:

Vernikos, I., Giannopoulos, G., Christopoulou, A., Begaj, A., Stefouli, M., Bratsolis, E., and Charou, E.: A dataset of Earth Observation Data for Lithological Mapping using Machine Learning, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-17570, https://doi.org/10.5194/egusphere-egu23-17570, 2023.

 

 

Files

areas_greece.png

Files (908.0 MB)

MD5 checksum                        Size
f87d862cc41187ca08906b1ff5f5c21d    11.1 MB
1329499e81af4fa174c42cac1bb76f64    326.0 kB
6ee29276d3ce60b9acf98a4f11da807f    294.7 kB
7b46f8e82c955d3ba6bd6f547f8da8a0    896.3 MB
584f591f040a6d1b50504837dceaa830    9.2 kB

Additional details

Related works

Is described by
Conference paper: 10.5194/egusphere-egu23-17570 (DOI)