Published January 4, 2022 | Version 0.3

Dataset Open

Presence-Absence Points for Tree Species Distribution Modelling for Europe

1. OpenGeoHub foundation
2. Institute for Geoinformatics, Münster
3. University of Bremen
4. Wageningen University & Research

The dataset is a collection of presence and absence points for forest tree species for Europe. Each unique combination of longitude, latitude and year was considered as an independent sample. Presence data was obtained from the harmonized tree species occurrence dataset by Heisig and Hengl (2020) and absence data from the LUCAS (in-situ source) dataset.

A set of 50 forest tree species was selected from the harmonized tree species dataset. Points lacking an observation year were overlaid with yearly forest masks derived from the land cover maps produced by Parente et al. (2021). For this overlay we used the probability maps for the following classes:

311: Broad-leaved forest,
312: Coniferous forest,
313: Mixed forest,
323: Sclerophyllous forest,
324: Transitional woodland-shrub,
333: Sparsely vegetated area.

Points were included in the dataset only if the probability value extracted for at least one of the above classes was ≥ 50% for all the years considered. An additional quality flag was added to distinguish points coming from this operation and the points with original year of observation coming from source datasets.
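
The inclusion rule above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name, the nested-dictionary layout of the extracted probabilities, and the reading of the rule as "in every year, at least one of the listed classes reaches 50%" are all assumptions.

```python
# Hypothetical sketch of the inclusion rule; names and data layout are assumptions.
FOREST_CLASSES = (311, 312, 313, 323, 324, 333)
THRESHOLD = 50.0  # probability in percent


def keep_point(probabilities, years, classes=FOREST_CLASSES, threshold=THRESHOLD):
    """Return True if, in every year, at least one class reaches the threshold.

    `probabilities` maps year -> {class code -> probability in percent}.
    A missing year or class counts as probability 0.
    """
    return all(
        any(probabilities.get(year, {}).get(code, 0.0) >= threshold for code in classes)
        for year in years
    )


example = {
    2000: {311: 72.0, 312: 10.0},
    2001: {311: 48.0, 313: 55.0},
}
print(keep_point(example, years=[2000, 2001]))        # True
print(keep_point(example, years=[2000, 2001, 2002]))  # False: nothing extracted for 2002
```

If the intended rule is instead that a single class must stay at or above 50% across all years, the `all` and `any` calls would swap places.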

The final dataset contains 4,359,999 observations and a total of 630 columns.

The first 8 columns of the dataset contain metadata information used to uniquely identify the points:

id: unique point identifier,
year: year of observation,
postprocess: quality flag to identify if the temporal reference of an observation comes from the original dataset or is the result of spatiotemporal overlay with forest masks,
Tile_ID: contains the tile id from the eu_tiling_system (30 km grid),
easting: x coordinate (easting, in metres) in Coordinate Reference System ETRS89 / LAEA Europe (= EPSG code 3035),
northing: y coordinate (northing, in metres) in Coordinate Reference System ETRS89 / LAEA Europe (= EPSG code 3035),
Atlas_class: name of the tree species according to the European Atlas of Forest Tree Species or NULL in case of absence point,
lc1: contains original LUCAS land cover class or NULL if it's a presence point.

The remaining columns contain the extracted values of a series of predictor variables (temperature, precipitation, elevation, topographical information, spectral reflectance) useful for species distribution modeling applications. These points were used to model the potential and realized distribution of 16 target species for the period 2000-2020. The approach involved training three ML models (Random Forest, XGBoost and GLM) to predict the probability of presence. Their predictions served as input to train a linear meta-model (a logistic regression classifier), which predicts the final probability of presence for each species.
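
The meta-model step can be illustrated with a small, dependency-free Python sketch. It is not the pipeline used for this dataset: the base-model probabilities are made-up toy numbers standing in for Random Forest, XGBoost and GLM outputs, and the logistic regression is fitted with plain gradient descent purely for illustration. All function and variable names are hypothetical.

```python
import math

# Toy inputs (assumed): each row holds the probabilities of presence predicted
# by three base models for one point; labels are 1 (presence) or 0 (absence).
TOY_BASE_PROBS = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.2, 0.1, 0.3], [0.1, 0.3, 0.2]]
TOY_LABELS = [1, 1, 0, 0]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_meta(weights, bias, base_probs):
    """Final probability of presence from one row of base-model probabilities."""
    return sigmoid(bias + sum(w * p for w, p in zip(weights, base_probs)))


def fit_meta_model(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit logistic regression weights by full-batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(rows, labels):
            error = predict_meta(weights, bias, row) - label
            for j, value in enumerate(row):
                grad_w[j] += error * value
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


weights, bias = fit_meta_model(TOY_BASE_PROBS, TOY_LABELS)
print(predict_meta(weights, bias, [0.85, 0.85, 0.65]))  # above 0.5 for this toy data
print(predict_meta(weights, bias, [0.15, 0.20, 0.25]))  # below 0.5 for this toy data
```

In a real stacking setup the base-model probabilities fed to the meta-model would normally come from out-of-fold predictions, so that the meta-model is not trained on values the base models have already memorized.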

The RDS file is created from a data.table object and is suitable for fast reading in the R programming environment. The CSV.GZ file contains the records as a table with easting and northing in Coordinate Reference System ETRS89 / LAEA Europe (= EPSG code 3035) and can be loaded into a GIS after being decompressed.
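
Outside R, the CSV.GZ file can also be streamed without decompressing it to disk first. The sketch below uses only the Python standard library and counts rows per species. It assumes a comma-separated file with a header row, and because the textual encoding of NULL in the CSV is not documented here, it treats several common spellings as missing. The function name and the `NULL_TOKENS` set are hypothetical.

```python
import csv
import gzip
from collections import Counter

# Assumption: any of these strings in the species column marks an absence point.
NULL_TOKENS = {"", "NULL", "NA", "NaN"}


def count_species(path, species_column="Atlas_class"):
    """Stream the gzipped CSV and count rows per species.

    Absence points (no species name) are counted under the key None.
    """
    counts = Counter()
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            value = row[species_column]
            counts[None if value in NULL_TOKENS else value] += 1
    return counts
```

For example, `count_species("regression_matrix.csv.gz").most_common(5)` would list the five most frequent values of Atlas_class, with the absence points grouped under `None`.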

As an example, we provide RDS files for one 30 km tile (ID 8766) containing raster stacks at 30 m resolution of all the covariates included in the regression matrix. You can find the geographical location of the tile in Europe using the attached GeoPackage ("eu_tiling_system_30km"): open it in QGIS and filter by "ID".
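
As an alternative to QGIS, a GeoPackage is an SQLite database and can be queried with the Python standard library. The sketch below is an illustration under assumptions: it relies on the standard `gpkg_contents` table to discover feature layers, takes the "ID" column name from the description above, and reads the bounding box from the GeoPackage geometry header only if the file stores one. The function names are hypothetical and the layer names inside the file have not been checked.

```python
import sqlite3
import struct


def read_envelope(blob):
    """Parse the optional bounding box from a GeoPackage geometry header.

    Returns (min_x, max_x, min_y, max_y), or None if no envelope is stored.
    """
    if len(blob) < 8 or blob[:2] != b"GP":
        return None
    flags = blob[3]
    byte_order = "<" if flags & 0x01 else ">"
    envelope_code = (flags >> 1) & 0x07
    if envelope_code == 0 or envelope_code > 4 or len(blob) < 40:
        return None
    # The first four doubles after the 8-byte header are min_x, max_x, min_y, max_y.
    return struct.unpack(byte_order + "4d", blob[8:40])


def find_tile(gpkg_path, tile_id, id_column="ID"):
    """Return layer name, attributes and envelope of the first matching feature."""
    connection = sqlite3.connect(gpkg_path)
    connection.row_factory = sqlite3.Row
    try:
        layers = connection.execute(
            "SELECT table_name FROM gpkg_contents WHERE data_type = 'features'"
        ).fetchall()
        column = id_column.replace('"', '""')
        for record in layers:
            layer = record["table_name"].replace('"', '""')
            row = connection.execute(
                f'SELECT * FROM "{layer}" WHERE "{column}" = ?', (tile_id,)
            ).fetchone()
            if row is None:
                continue
            attributes, envelope = {}, None
            for key in row.keys():
                value = row[key]
                if isinstance(value, bytes):
                    envelope = envelope or read_envelope(value)
                else:
                    attributes[key] = value
            return {"layer": record["table_name"], "attributes": attributes, "envelope": envelope}
        return None
    finally:
        connection.close()
```

Calling `find_tile("eu_tilling_system_30km.gpkg", 8766)` (file name as it appears in the file list of this record) would return the tile's attributes and, when available, its extent in the layer's coordinate reference system.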

In our approach we considered both static and dynamic covariates. Dynamic covariates are calculated as averages over a 4-year time window (for example, 2004 contains averages from 2002 to 2006). To get the predictions for a specific year, the covariates in the static RDS file need to be combined with the covariates in the RDS file of the respective year.
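
The pairing of static and dynamic covariates can be sketched as below. This is a hedged illustration: the reference years are taken from the example tile files in this record, while the rule of mapping a target year to the nearest reference year, the function names, and the dictionary representation of covariate layers are assumptions made for the example.

```python
# Reference years of the example tile files listed in this record.
REFERENCE_YEARS = (2000, 2004, 2008, 2012, 2016, 2020)


def reference_year(year, reference_years=REFERENCE_YEARS):
    """Pick the reference year closest to `year` (ties go to the earlier year)."""
    return min(reference_years, key=lambda ref: (abs(ref - year), ref))


def covariate_files(tile_id, year):
    """Return the (static, dynamic) file names to combine for a target year."""
    ref = reference_year(year)
    return (f"tile_{tile_id}_30m_static.rds", f"tile_{tile_id}_30m_{ref}.rds")


def bind_covariates(static, dynamic):
    """Combine two {layer name -> values} mappings, refusing duplicate layer names."""
    overlap = sorted(static.keys() & dynamic.keys())
    if overlap:
        raise ValueError(f"layer names present in both inputs: {overlap}")
    return {**static, **dynamic}


print(covariate_files(8766, 2005))
# ('tile_8766_30m_static.rds', 'tile_8766_30m_2004.rds')
```

Reading the RDS files themselves is left to R, where the stacks were created.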

To access the predictions (probabilities and uncertainties) produced for the target species, use either of the following:

Open Data Science Europe viewer: https://maps.opendatascience.eu
Check the Related identifiers section of this repository to access each species individually

If you instead would like to know more about the creation of this dataset and the modeling:

watch the talk at Open Data Science Workshop 2021 (TIB AV-PORTAL)
access the repository with our R/Python scripts and follow the instructions (GitLab)

A publication describing in detail all processing steps, the accuracy assessment and a general analysis of the species distribution maps is available on PeerJ. To suggest an improvement or fix, open an issue at https://gitlab.com/geoharmonizer_inea/spatial-layers/-/issues.

Notes

This work is co-financed under Grant Agreement Connecting Europe Facility (CEF) Telecom project 2018-EU-IA-0095 by the European Union (https://ec.europa.eu/inea/en/connecting-europe-facility/cef-telecom/2018-eu-ia-0095).

Files (5.7 GB)

Name	MD5	Size
00-preview.png	636d9474a59c3da1e38229bfbbfef916	4.5 MB
eu_tilling_system_30km.gpkg	4e42b7475c7edec002766fc4089c5c33	1.8 MB
regression_matrix.csv.gz	ba68c4e7d18d6ba9b215203b27659cb6	2.5 GB
regression_matrix.rds	8c2202c14ed13f8dfdc3eb4fa9e384b2	2.0 GB
tile_8766_30m_2000.rds	fc576c29459a543c96499bb0026c9c81	204.4 MB
tile_8766_30m_2004.rds	43d5e9bbffed04d4e15ec016ea032ed8	204.5 MB
tile_8766_30m_2008.rds	3fb0ae0c2a72f851b2d24376dff93d1a	201.9 MB
tile_8766_30m_2012.rds	ffd746e409afc9f5a12c1e018731123f	205.1 MB
tile_8766_30m_2016.rds	2339f1a25fcb1959224c8a7ab9dd6a45	206.7 MB
tile_8766_30m_2020.rds	a458a2ece08fa3e6b92857c74d3b73c3	208.0 MB
tile_8766_30m_static.rds	ffea9e0ed8158965b573e15df976b672	40.4 MB
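
The MD5 checksums listed above can be used to check a download for corruption. A minimal sketch with the Python standard library follows; the function names are hypothetical, and the file is read in chunks so that the multi-gigabyte files do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `verify("regression_matrix.rds", "8c2202c14ed13f8dfdc3eb4fa9e384b2")` should return True for an intact download of that file.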

Additional details

Is source of (Dataset, DOI):
10.5281/zenodo.5873412
10.5281/zenodo.5873917
10.5281/zenodo.5874796
10.5281/zenodo.5877786
10.5281/zenodo.5879371
10.5281/zenodo.5881699
10.5281/zenodo.5882763
10.5281/zenodo.5883180
10.5281/zenodo.5883781
10.5281/zenodo.5884261
10.5281/zenodo.5885196
10.5281/zenodo.5886708
10.5281/zenodo.5887007
10.5281/zenodo.5887415
10.5281/zenodo.5896620
10.5281/zenodo.5896665

Resource type: Dataset
Publisher: Zenodo
Languages: English

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: May 4, 2022
Modified: July 16, 2024
