Dataset Open Access

Harmonized Tree Species Occurrence Points for Europe

Heisig, Johannes; Hengl, Tomislav

This data set is a harmonized collection of existing data from GBIF, the EU-Forest project and the LUCAS survey. It contains about 3 million observations, each supplemented by variables (e.g. location accuracy, land cover type, canopy height) that enable precise filtering for specific user applications.

The RDS file is created from an sf object and is suitable for fast reading in the R programming environment. The CSV.GZ file contains the records as a table with easting and northing in the coordinate reference system ETRS89 / LAEA Europe (EPSG:3035) and can be loaded into a GIS after decompression.
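
For readers working outside R, the following is a minimal sketch of reading the CSV.GZ table directly with Python's standard library, without decompressing it first. The file name `tree_points.csv.gz`, the sample rows and the assumption of a comma-delimited table with a header row are illustrative only; the column names follow the variable list of this record.

```python
import csv
import gzip
import os
import tempfile


def read_points(path):
    """Read a gzip-compressed CSV into a list of dicts, one per record."""
    # Mode "rt" opens the compressed stream as text, so no manual
    # decompression step is needed.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


# Write a tiny synthetic file so the sketch runs on its own.
# The file name and all values are made up for illustration.
sample_path = os.path.join(tempfile.mkdtemp(), "tree_points.csv.gz")
with gzip.open(sample_path, mode="wt", encoding="utf-8", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["id", "easting", "northing", "species"])
    writer.writerow(["1", "4321000", "3210000", "Fagus sylvatica"])
    writer.writerow(["2", "4500000", "2900000", "Picea abies"])

points = read_points(sample_path)

# Coordinates are metres in ETRS89 / LAEA Europe (EPSG:3035).
coords = [(float(p["easting"]), float(p["northing"])) for p in points]
print(len(points), coords[0])
```

With about 3 million records, loading everything into a list of dicts may use a lot of memory; iterating over the `csv.DictReader` row by row is likely the safer choice for the full file.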

The code producing this data set is publicly available on GitLab.

Data sets were last updated in September 2021.


  • id = unique point identifier
  • easting = x coordinate
  • northing = y coordinate
  • country = ISO country code
  • species = Latin species name
  • genus = genus name
  • scientific_name = long species name
  • gbif_taxon_key = taxon key from GBIF
  • gbif_genus_key = genus key from GBIF
  • taxon_rank = species or genus
  • year = year of observation
  • accessed_through = database through which data was accessed (GBIF, LUCAS, EU-Forest)
  • dataset_info = data set name (individual sub-data-set)
  • citation = DOI citation of the individual data set
  • license = distribution license
  • location_accuracy = spatial accuracy of observation (meters)
  • flag_location_issue = known location issues present
  • flag_date_issue = known date issues present
  • eoo = Extent of occurrence (applying the concept of natural geographical range used for the EU-Forest data set (Mauri et al., 2017) to all other data points. 1 = point inside species range; 0 = point outside; NA = EOO polygon not available for this species)
  • dbh = diameter at breast height (only recorded for observations from the EU-Forest data set (Mauri et al., 2017))
  • lc1 = LUCAS land cover type 1 (only recorded for observations from LUCAS data)
  • lc2 = LUCAS land cover type 2 (only recorded for observations from LUCAS data)
  • landmask_country = land mask overlay 30 meters (NA = not on land)
  • corine = CORINE 2018 land cover type (extracted from the 100 meter raster data set)
  • nightlights = light pollution observed by VIIRS (proxy for remoteness / distance to human structures)
  • canopy_height = canopy height derived from GEDI waveform LiDAR point data
  • natura_2000 = Natura 2000 site code (if a point falls inside a protected area (GIS-layer) this variable contains the site identification code; all sites can be explored on an interactive map)
  • freq_location = number of points with identical location (in some cases one location has multiple observations, differing in species and/or year; this may cause difficulties in certain modeling tasks)
  • geometry = point geometry in ETRS89 / LAEA Europe
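
The variables listed above can be combined into filters. The sketch below is one possible approach, assuming records are available as dictionaries of strings (as produced by a CSV reader) and that missing values appear as `NA` or as empty fields. The helper names, the 100 m threshold and the sample records are hypothetical, not part of the data set.

```python
def parse_number(value):
    """Return a float, or None when the field is missing ("NA" or empty)."""
    if value is None or value.strip() in ("", "NA"):
        return None
    return float(value)


def is_reliable(record, max_accuracy_m=100.0):
    """Keep records with a known location accuracy within the threshold
    that also fall inside the species range (eoo == 1)."""
    accuracy = parse_number(record.get("location_accuracy"))
    inside_range = parse_number(record.get("eoo"))
    return (
        accuracy is not None
        and accuracy <= max_accuracy_m
        and inside_range == 1
    )


# Made-up records for illustration only.
records = [
    {"id": "1", "species": "Fagus sylvatica", "location_accuracy": "30", "eoo": "1"},
    {"id": "2", "species": "Fagus sylvatica", "location_accuracy": "5000", "eoo": "1"},
    {"id": "3", "species": "Picea abies", "location_accuracy": "NA", "eoo": "0"},
    {"id": "4", "species": "Picea abies", "location_accuracy": "10", "eoo": "NA"},
]

selected = [r["id"] for r in records if is_reliable(r)]
print(selected)
```

Note that this filter drops records whose `eoo` is NA (no range polygon available for the species); whether that is desirable depends on the application.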

See this detailed documentation for more insights into each variable and individual GBIF data set citations.

If you would like to know more about the creation of this data set, see

  1. the R-Markdown documenting the process (GitLab repository)
  2. the talk at OpenGeoHub Summer School 2020 (YouTube)

Some advice: This data set is a puzzle with pieces from many different sources. Take some time to explore it before including it in your work. Use summary statistics to see which variables have NAs and how many. Choose your filtering criteria wisely. For example, some points with the highest location accuracy have no record for the year of observation. You would exclude these if "year > 1990" were your criterion.
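
As a rough illustration of this advice, the sketch below counts missing values per variable and shows how a year filter can either drop or keep records without a year. It assumes records are dictionaries of strings with missing values written as `NA` or left empty; the function names and sample records are hypothetical.

```python
from collections import Counter

MISSING = {"", "NA"}


def count_missing(records):
    """Count missing values per column across all records."""
    counts = Counter()
    for record in records:
        for column, value in record.items():
            if value is None or value.strip() in MISSING:
                counts[column] += 1
    return counts


def observed_after(record, min_year, keep_unknown_year=False):
    """Return True if the record was observed after min_year.
    Records without a year are kept only if keep_unknown_year is True."""
    year = record.get("year")
    if year is None or year.strip() in MISSING:
        return keep_unknown_year
    return int(float(year)) > min_year


# Made-up records for illustration only.
records = [
    {"id": "1", "year": "2005", "location_accuracy": "30"},
    {"id": "2", "year": "NA", "location_accuracy": "1"},
    {"id": "3", "year": "1985", "location_accuracy": "NA"},
]

missing = count_missing(records)
strict = [r["id"] for r in records if observed_after(r, 1990)]
lenient = [r["id"] for r in records
           if observed_after(r, 1990, keep_unknown_year=True)]
print(dict(missing), strict, lenient)
```

In this toy example the strict filter silently discards record 2, the most accurately located point, because its year is unknown; the lenient variant keeps it.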


This work has received funding from the European Union's Innovation and Networks Executive Agency (INEA) under Grant Agreement Connecting Europe Facility (CEF) Telecom project 2018-EU-IA-0095.

Files (140.6 MB): 1.3 MB, 71.9 MB, 67.4 MB

  • Mauri, A., Strona, G., & San-Miguel-Ayanz, J. (2017). EU-Forest, a high-resolution tree occurrence dataset for Europe. Scientific Data, 4(1), 1-8.

  • Hengl, T., Walsh, M. G., Sanderman, J., Wheeler, I., Harrison, S. P., & Prentice, I. C. (2018). Global mapping of potential natural vegetation: an assessment of machine learning algorithms for estimating land potential. PeerJ, 6, e5457.

  • de Rigo, D., Caudullo, G., Houston Durrant, T., & San-Miguel-Ayanz, J. (2016). The European Atlas of Forest Tree Species: modelling, data and information on forest tree species. In San-Miguel-Ayanz, J., de Rigo, D., Caudullo, G., Houston Durrant, T., & Mauri, A. (Eds.), European Atlas of Forest Tree Species. Publ. Off. EU, Luxembourg, pp. e01aa69+.
