There is a newer version of the record available.

Published October 1, 2020 | Version 0.1
Dataset | Open access

Harmonized Tree Species Occurrence Points for Europe

  • 1. Institute for Geoinformatics, Münster
  • 2. OpenGeoHub foundation

Description

This data set is a harmonized collection of existing data from GBIF, the EU-Forest project and the LUCAS survey. It contains about 3 million observations, supplemented by variables (e.g. location accuracy, land cover type, canopy height) that allow users to filter the records precisely for their specific needs.

The .rds file was created from an sf object in R. The .csv file contains the records as a plain table, with easting and northing in the coordinate reference system ETRS89 / LAEA Europe (EPSG:3035).
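A minimal Python sketch of reading the .csv with only the standard library. It assumes that the column headers match the variable names listed below, that the file is UTF-8 encoded, and that missing values are written as NA or left empty; none of this is specified here, so check the header first. The helper names clean and read_points are hypothetical. Coordinates stay in EPSG:3035 (meters); converting them to latitude/longitude would need a projection library such as pyproj.

```python
import csv
import io
from typing import Iterable, Iterator, Optional

# Assumption: missing values appear either as "NA" (the default of R's write.csv)
# or as empty fields; both are mapped to None here.
MISSING = {"", "NA"}


def clean(value: Optional[str]) -> Optional[str]:
    """Return None for the assumed missing-value markers, else the raw string."""
    return None if value is None or value in MISSING else value


def read_points(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per record; easting/northing become floats in EPSG:3035 (meters)."""
    for row in csv.DictReader(lines):
        record: dict = {key: clean(value) for key, value in row.items()}
        record["easting"] = float(record["easting"])
        record["northing"] = float(record["northing"])
        yield record


# Hypothetical two-row excerpt; the real file has about 3 million rows and more columns.
sample = io.StringIO(
    "id,easting,northing,species,year\n"
    "1,4321000.5,3210000.0,Fagus sylvatica,2015\n"
    "2,4400000.0,2900000.0,Quercus robur,NA\n"
)
points = list(read_points(sample))
print(points[1]["species"], points[1]["year"])  # Quercus robur None
```

For the real file, pass an open file handle instead, e.g. `with open("tree_species_occ_harmonized_final.csv", newline="", encoding="utf-8") as f:` and iterate over `read_points(f)`. The generator reads one row at a time, so the whole table never has to fit in memory.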

The code is publicly available on GitLab.

Variables:

  • id = unique point identifier
  • easting = x coordinate in ETRS89 / LAEA Europe (meters)
  • northing = y coordinate in ETRS89 / LAEA Europe (meters)
  • country = ISO country code
  • species = Latin species name
  • genus = genus name
  • scientific_name = long species name
  • gbif_taxon_key = taxon key from GBIF
  • gbif_genus_key = genus key from GBIF
  • taxon_rank = taxonomic rank of the record (species or genus)
  • year = year of observation
  • accessed_through = database through which data was accessed (GBIF, LUCAS, EU-Forest)
  • dataset_info = name of the individual source data set
  • citation = DOI citation of the individual data set
  • license = distribution license
  • location_accuracy = spatial accuracy of observation (meters)
  • flag_location_issue = known location issues present
  • flag_date_issue = known date issues present
  • eoo = extent of occurrence: the concept of natural geographical range used for the EU-Forest data set (Mauri et al., 2017), applied to all other data points (1 = point inside the species range; 0 = point outside; NA = no EOO polygon available for this species)
  • dbh = diameter at breast height (only recorded for observations from the EU-Forest data set)
  • lc1 = LUCAS land cover type 1 (only recorded for observations from LUCAS data)
  • lc2 = LUCAS land cover type 2 (only recorded for observations from LUCAS data)
  • landmask_country = land mask overlay at 30 m resolution (NA = not on land)
  • corine = CORINE 2018 land cover type (extracted from the 100 meter raster data set)
  • nightlights = light pollution observed by VIIRS (proxy for remoteness / distance to human structures)
  • canopy_height = canopy height derived from GEDI waveform LiDAR point data
  • natura_2000 = Natura 2000 site code: if a point falls inside a protected area of the Natura 2000 GIS layer, this variable contains the site identification code (all sites can be explored on an interactive map)
  • freq_location = number of points with an identical location (in some cases, one location has multiple observations differing in species and/or year, which may cause difficulties in certain modeling tasks)
  • geometry = point geometry in ETRS89 / LAEA Europe
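The documented codes for eoo, landmask_country, location_accuracy and freq_location can be combined into a record filter. A minimal Python sketch, assuming each record is a dict of strings as read from the .csv, with None standing in for NA; the functions as_float and keep, the 100 m threshold and the sample records (including the landmask_country values) are hypothetical illustrations, not recommended settings.

```python
from typing import Optional


def as_float(value: Optional[str]) -> Optional[float]:
    """Parse a CSV field to float, keeping None (NA) as None."""
    return None if value is None else float(value)


def keep(record: dict, max_accuracy_m: float = 100.0) -> bool:
    """Hypothetical filter based on the documented meanings of eoo, landmask_country,
    location_accuracy and freq_location. The default threshold is illustrative only."""
    accuracy = as_float(record.get("location_accuracy"))
    return (
        as_float(record.get("eoo")) == 1.0  # 1 = inside range; 0 and NA are dropped
        and record.get("landmask_country") is not None  # NA = not on land
        and accuracy is not None
        and accuracy <= max_accuracy_m  # location accuracy in meters
        and as_float(record.get("freq_location")) == 1.0  # no other record at this location
    )


# Hypothetical records (strings as read from the CSV, None standing in for NA).
records = [
    {"id": "1", "eoo": "1", "landmask_country": "DE", "location_accuracy": "30", "freq_location": "1"},
    {"id": "2", "eoo": None, "landmask_country": "DE", "location_accuracy": "30", "freq_location": "1"},
    {"id": "3", "eoo": "1", "landmask_country": None, "location_accuracy": "30", "freq_location": "1"},
    {"id": "4", "eoo": "1", "landmask_country": "FR", "location_accuracy": "5000", "freq_location": "1"},
    {"id": "5", "eoo": "1", "landmask_country": "FR", "location_accuracy": "10", "freq_location": "3"},
    {"id": "6", "eoo": "0", "landmask_country": "FR", "location_accuracy": "10", "freq_location": "1"},
]
kept = [r["id"] for r in records if keep(r)]
print(kept)  # ['1']
```

Dropping records with eoo = NA also removes every species that has no EOO polygon. If those species matter, testing `as_float(record.get("eoo")) != 0.0` instead keeps them while still excluding points outside a known range.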

See the detailed documentation for more information on each variable.

If you would like to know more about the creation of this data set, see

  1. the R-Markdown documenting the process (GitLab repository)
  2. the talk at OpenGeoHub Summer School 2020 (YouTube)

Some advice: This data set is a puzzle with pieces from many different sources. Take some time to explore it before including it in your work. Use summary statistics to see which variables have NAs and how many. Choose your filtering criteria carefully. For example, some points with the highest location accuracy have no record of the year of observation. You would exclude these if "year > 1990" were your criterion.
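This advice can be sketched in a few lines of Python: count missing values per column, then compare a strict year filter with one that keeps undated records. The records and the function name na_counts are hypothetical, with None standing in for NA. In many tools a comparison against a missing year is simply not true (in a pandas boolean mask on a float column, NaN > 1990 is False, and dplyr::filter in R drops rows where the condition is NA), so undated records disappear without warning; plain Python would raise a TypeError on None > 1990, which is why the sketch tests for None explicitly.

```python
from collections import Counter
from typing import Iterable


def na_counts(records: Iterable[dict]) -> Counter:
    """Count missing values (None) per column, a quick stand-in for summary statistics."""
    counts: Counter = Counter()
    for record in records:
        for column, value in record.items():
            if value is None:
                counts[column] += 1
    return counts


# Hypothetical records: id 2 is highly accurate but has no year of observation.
records = [
    {"id": 1, "year": 2015, "location_accuracy": 10.0},
    {"id": 2, "year": None, "location_accuracy": 1.0},
    {"id": 3, "year": 1985, "location_accuracy": None},
]
print(na_counts(records))  # Counter({'year': 1, 'location_accuracy': 1})

# The strict filter silently drops the accurate but undated record (id 2).
strict = [r["id"] for r in records if r["year"] is not None and r["year"] > 1990]
# The lenient filter keeps undated records so they can be judged separately.
lenient = [r["id"] for r in records if r["year"] is None or r["year"] > 1990]
print(strict, lenient)  # [1] [1, 2]
```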

This work has received funding from the European Union's Innovation and Networks Executive Agency (INEA) under Grant Agreement Connecting Europe Facility (CEF) Telecom project 2018-EU-IA-0095 (https://ec.europa.eu/inea/en/connecting-europe-facility/cef-telecom/2018-eu-ia-0095).

Files

Total size: 937.8 MB

  • tree_species_occ_harmonized_final.csv (867.2 MB), md5:b4011fd844097e40055a1994539ac53e
  • .rds file (70.6 MB), md5:1d6e1761f07ed6609d50089a70e440a3
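Each file is listed with an MD5 checksum. A minimal Python sketch for checking a download against it, using only hashlib from the standard library; md5sum and verify are hypothetical helper names, not part of the data set. The file is read in chunks, so even a file of several hundred megabytes is hashed without loading it into memory.

```python
import hashlib
import os
import tempfile


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare a file against a checksum written like 'md5:<hex digest>'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected


# Self-check on a small temporary file (the MD5 of b"abc" is a well-known test vector).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
print(verify(tmp.name, "md5:900150983cd24fb0d6963f7d28e17f72"))  # True
os.remove(tmp.name)
```

Assuming the 867.2 MB entry is tree_species_occ_harmonized_final.csv, `verify("tree_species_occ_harmonized_final.csv", "md5:b4011fd844097e40055a1994539ac53e")` should return True for an intact download.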

Additional details

References

  • Mauri, A., Strona, G., & San-Miguel-Ayanz, J. (2017). EU-Forest, a high-resolution tree occurrence dataset for Europe. Scientific Data, 4(1), 1-8. https://doi.org/10.1038/sdata.2016.123
  • Hengl, T., Walsh, M. G., Sanderman, J., Wheeler, I., Harrison, S. P., & Prentice, I. C. (2018). Global mapping of potential natural vegetation: an assessment of machine learning algorithms for estimating land potential. PeerJ, 6, e5457. https://peerj.com/articles/5457/
  • de Rigo, D., Caudullo, G., Houston Durrant, T., & San-Miguel-Ayanz, J. (2016). The European Atlas of Forest Tree Species: modelling, data and information on forest tree species. In J. San-Miguel-Ayanz, D. de Rigo, G. Caudullo, T. Houston Durrant, & A. Mauri (Eds.), European Atlas of Forest Tree Species (pp. e01aa69+). Publications Office of the European Union, Luxembourg. https://forest.jrc.ec.europa.eu/en/european-atlas/