Ecological niche models for mapping cultural ecosystem services (CES)
Creators
- Pérez-Girón, José Carlos¹
- Martínez-López, Javier²
- Alcaraz-Segura, Domingo (Project leader)²
- Tabik, Siham²
- Molina Cabrera, Daniel²
- del Águila, Ana²
- Khaldi, Rohaifa
- Pistón, Nuria
- Moreno Llorca, Ricardo Antonio²
- Ros-Candeira, Andrea²
- Navarro, Carlos Javier (Contact person)³
- Elghouat, Akram
- Arenas-Castro, Salvador⁴
- Irati, Nieto Pacho
- Manuel, Merino Ceballos
- Luis F., Romero
Description
This dataset includes the inputs and outputs generated in the spatial modeling of CES using social media data for eight mountain parks in Spain and Portugal (Aigüestortes, Sierra de Guadarrama, Ordesa, Peneda-Gerês, Picos de Europa, Sierra de las Nieves, Sierra Nevada and Teide). This spatial modeling is addressed in the article in preparation entitled: "What drives cultural ecosystem services in mountain protected areas? An AI-assisted answer using social media."
The variables used as inputs to generate the models come from different sources:
- CES presence points come from social media photos (Flickr and Twitter) labeled with AI models and validated by experts. Automatic labeling used DINOv2 and OpenAI's GPT-4.1 model. The labels from these two sources, which showed F1 values above 0.75, were combined by consensus, and the consensus labels were used as presence data.
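The consensus step and the F1 check described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the rule that a photo is kept only when both labelers return the same category, and the toy labels are all hypothetical, and the dataset's actual procedure may differ.

```python
from typing import Optional, Sequence


def consensus_label(label_a: str, label_b: str) -> Optional[str]:
    """Return the shared label when both sources agree, otherwise None.

    Assumption: consensus means exact agreement between the two labelers.
    """
    return label_a if label_a == label_b else None


def f1_score(truth: Sequence[str], predicted: Sequence[str], category: str) -> float:
    """F1 for one category: harmonic mean of precision and recall."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == category and p == category for t, p in pairs)
    fp = sum(t != category and p == category for t, p in pairs)
    fn = sum(t == category and p != category for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for five photos (not taken from the dataset).
labels_model_a = ["Sports", "Fauna/Flora", "Urban", "Sports", "Religious"]
labels_model_b = ["Sports", "Fauna/Flora", "Cultural", "Sports", "Religious"]
labels_expert = ["Sports", "Fauna/Flora", "Urban", "Recreational", "Religious"]

consensus = [consensus_label(a, b) for a, b in zip(labels_model_a, labels_model_b)]

# Score only the photos where a consensus label exists.
scored = [(e, c) for e, c in zip(labels_expert, consensus) if c is not None]
f1_sports = f1_score([e for e, _ in scored], [c for _, c in scored], "Sports")
```

Under this assumed rule, a category would pass the 0.75 threshold only if its consensus labels match the expert validation closely enough.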
The environmental variables used are mainly derived from:
- OpenStreetMap (OSM) https://www.openstreetmap.org/
- Variables derived from remote sensing
- Topographic variables
- Current and future climate variables derived from CHELSA (https://chelsa-climate.org/)
- Landscape metrics (calculated with Fragstats software https://www.fragstats.org/)
- Viewshed
- Land use and land cover maps (https://land.copernicus.eu/en/products/corine-land-cover)
The models were generated with the maximum entropy (MaxEnt) algorithm using the biomod2 R package, chosen for its suitability for presence-only data, small sample sizes, and mixed predictor types. To address sampling bias, we generated 10 pseudo-absence replicates based on the “target-group background” method. Models were evaluated using the area under the ROC curve (AUC-ROC) and the True Skill Statistic (TSS), with 10-fold cross-validation, resulting in 100 runs per model (10 pseudo-absence replicates × 10 folds). Ensemble models were built from runs with AUC-ROC > 0.6, using the median for spatial projections of CES and the coefficient of variation to estimate uncertainty. We implemented two modeling approaches: one assuming consistent CES preferences across parks, and another assuming park-specific preferences shaped by local environmental contexts.
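The evaluation and ensemble logic described above can be illustrated with a minimal Python sketch. It is not the biomod2 implementation: the function names, the data layout (one list of suitability values per run), the use of the sample standard deviation for the coefficient of variation, and the toy numbers are assumptions made for the example.

```python
from statistics import mean, median, stdev

AUC_THRESHOLD = 0.6  # runs at or below this AUC-ROC are excluded


def true_skill_statistic(tp: int, fn: int, tn: int, fp: int) -> float:
    """TSS = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def build_ensemble(runs, auc_threshold=AUC_THRESHOLD):
    """Combine runs cell by cell into a median map and a CV map.

    `runs` is a list of dicts with keys "auc" (float) and "suitability"
    (list of floats, one value per grid cell). Hypothetical layout.
    """
    kept = [run["suitability"] for run in runs if run["auc"] > auc_threshold]
    if len(kept) < 2:
        raise ValueError("need at least two runs above the AUC threshold")
    median_map, cv_map = [], []
    for cell_values in zip(*kept):
        cell_mean = mean(cell_values)
        median_map.append(median(cell_values))
        # Coefficient of variation: sample SD divided by the mean (assumed form).
        cv_map.append(stdev(cell_values) / cell_mean if cell_mean else float("nan"))
    return median_map, cv_map


# Hypothetical runs over a two-cell grid; the second run is filtered out.
example_runs = [
    {"auc": 0.82, "suitability": [0.2, 0.8]},
    {"auc": 0.55, "suitability": [0.9, 0.1]},
    {"auc": 0.71, "suitability": [0.4, 0.6]},
    {"auc": 0.64, "suitability": [0.3, 0.7]},
]
median_map, cv_map = build_ensemble(example_runs)
example_tss = true_skill_statistic(tp=8, fn=2, tn=9, fp=1)
```

In this sketch a higher coefficient of variation in a cell means the retained runs disagree more about suitability there.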
Table 1. Categories used in social media photo tagging: Stoten, based on the scientific framework proposed by Moreno-Llorca et al. (2020) (https://doi.org/10.1016/j.scitotenv.2020.140067).
| Stoten |
|---|
| Cultural |
| Fauna/Flora |
| Gastronomy |
| Nature & Landscape |
| Not relevant |
| Recreational |
| Religious |
| Rural tourism |
| Sports |
| Sun and beach |
| Urban |
Table 2. Table of contents of the dataset
| Folder | Format | Description |
|---|---|---|
| Inputs > Base layers > by National Park > 100-meter grid > grid_wgs84_atrib | .shp | 100 × 100 m grid covering the study area of each national park studied |
| Inputs > Base layers > by National Park > Biosphere Reserve > MAB_wgs84 | .shp | Biosphere reserve layers present in each of the national parks studied |
| Inputs > Base layers > by National Park > Municipality > Municipality | .shp | Municipalities that overlap the park area, biosphere reserve, Natura 2000 sites, and the socioeconomic influence area, with a 100-meter buffer |
| Inputs > Base layers > by National Park > National park limit > National_park_limit | .shp | Boundaries of each of the national parks studied |
| Inputs > Base layers > by National Park > Natura 2000 > RN2000 | .shp | Natura 2000 layers for each of the national parks studied |
| Inputs > Base layers > by National Park > Socioeconomic influence area > AIS | .shp | Area of socioeconomic influence of each of the parks studied |
| Inputs > Base layers > Readme | .txt | File containing layer metadata, including download locations and descriptions of shape attributes |
| Inputs > by National Park > Accessibility | .tif | Accessibility variables, including routes, streets, parking, and train tracks |
| Inputs > by National Park > Climate | .tif | CHELSA-derived climate variable layers and solar radiation layers |
| Inputs > by National Park > Ecosystem functioning | .tif | Remote sensing layers related to the functional attributes of ecosystems |
| Inputs > by National Park > Ecosystem structure | .tif | Landscape and spectral diversity metrics |
| Inputs > by National Park > Geodiversity | .tif | Topographic and derived variables |
| Inputs > by National Park > Land use Land cover | .tif | Layers related to land use and land cover |
| Inputs > by National Park > Tourism and Culture | .tif | Layers related to tourism infrastructure (bars, restaurants, lodgings) and places of cultural interest such as monuments |
| Scripts > Modeling to get output data > Biomod_modelling_by_park | .R | Script that models CES from social media data by fitting one ecological niche model (ENM) for each park and CES |
| Scripts > Modeling to get output data > Biomod_modelling_all_parks | .R | Script that models CES from social media data by fitting one ENM for each CES |
| Scripts > Modeling to get output data > Downloading and processing variables > EFAS > EFAs code | .js | GEE scripts used to download the Ecosystem Functional Attributes (EFAs) (Paruelo et al. 2001; Alcaraz-Segura et al. 2006) derived from the Sentinel-2 dataset for each of the national parks studied |
| Scripts > Modeling to get output data > Downloading and processing variables > OSM > 1) Download layers | .py | Python scripts used to download the OpenStreetMap layers of interest for each of the national parks studied |
| Scripts > Modeling to get output data > Downloading and processing variables > OSM > 2) Join layers | .py | Scripts used to merge OSM layers belonging to the same category, e.g., primary, secondary, and tertiary highways |
| Scripts > Modeling to get output data > Downloading and processing variables > OSM > 3) Count point | .py | Scripts that count the number of points in each 100-meter grid cell for each park; used for point-type data |
| Scripts > Modeling to get output data > Downloading and processing variables > OSM > 4) Presence and absence | .py | Scripts that assess presence in each 100-meter grid cell for each park; used for point, line, and polygon data |
| Scripts > Modeling to get output data > Downloading and processing variables > Remote sensing > Canopy | .js | GEE scripts used to download the canopy layers (https://gee-community-catalog.org/projects/canopy/), cropped for each of the national parks studied |
| Scripts > Modeling to get output data > Downloading and processing variables > Remote sensing > ESPI | .js | GEE scripts used to download the Ecosystem Service Provision Index (ESPI), cropped for each of the national parks studied |
| Scripts > Modeling to get output data > Downloading and processing variables > Remote sensing > European disturbance map | .js | GEE scripts used to download European disturbance maps (https://www.eea.europa.eu/data-and-maps/figures/biogeographical-regions-in-europe-2), cropped for each of the national parks studied |
| Scripts > Modeling to get output data > Downloading and processing variables > Remote sensing > LST | .js | GEE scripts used to download land surface temperature (LST) maps from the Landsat Collection, cropped for each of the national parks studied |
| Scripts > Modeling to get output data > Downloading and processing variables > Remote sensing > Night lights | .js | GEE scripts used to download nighttime light maps (https://developers.google.com/earth-engine/datasets/catalog/NOAA_VIIRS_DNB_ANNUAL_V22), cropped for each of the national parks studied |
| Scripts > Modeling to get output data > Downloading and processing variables > Remote sensing > Population density | .js | GEE scripts used to download population density maps (https://developers.google.com/earth-engine/datasets/catalog/CIESIN_GPWv411_GPW_Population_Density), cropped for each of the national parks studied |
| Scripts > Modeling to get output data > Downloading and processing variables > Remote sensing > Soil groups | .js | GEE scripts used to download Hydrologic Soil Group maps (https://gee-community-catalog.org/projects/hihydro_soil/), cropped for each of the national parks studied |
| Scripts > Modeling to get output data > Downloading and processing variables > Remote sensing > Solar radiation | .js | GEE scripts used to download solar radiation maps (https://globalsolaratlas.info/support/faq), cropped for each of the national parks studied |
| Scripts > Modeling to get output data > Downloading and processing variables > RGB diversity > Seasonal KMeans clustering | .js | GEE scripts used to calculate seasonal clusters from Sentinel-2 RGB bands with GEE's wekaKMeans algorithm; the resulting layers were downloaded and cropped for each of the national parks studied |
| Scripts > Modeling to get output data > Downloading and processing variables > RGB diversity > Colour diversity analysis | .R | R script used to calculate spectral diversity (Shannon, Simpson, and inverse Simpson) from the cluster layers and RGB bands derived from Sentinel-2 |
| Scripts > Modeling to get output data > Downloading and processing variables > Post processing > Align_and_Clip_rasters | .py | Python scripts used to align and clip the downloaded layers to a 100-meter grid reference layer for each of the national parks studied |
| Outputs > CES projections > proj_Aiguestortes_Sports_ensemble | .tif | Spatial projections for the best models obtained for each CES and park |
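The spectral diversity metrics named in Table 2 (Shannon, Simpson, and inverse Simpson) can be illustrated with a short Python sketch. The dataset computes them in R; the version below is only a hedged illustration that assumes the indices are calculated from the proportions of cluster labels within a neighborhood of pixels and that "Simpson" refers to the Gini-Simpson form (1 minus the sum of squared proportions). The function name and the toy input are hypothetical.

```python
from collections import Counter
from math import log
from typing import Dict, Hashable, Sequence


def diversity_indices(cluster_labels: Sequence[Hashable]) -> Dict[str, float]:
    """Compute diversity indices from the cluster labels found in one window."""
    if not cluster_labels:
        raise ValueError("at least one cluster label is required")
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    proportions = [count / total for count in counts.values()]
    # Shannon entropy with the natural logarithm.
    shannon = -sum(p * log(p) for p in proportions)
    # Sum of squared proportions (Simpson dominance).
    dominance = sum(p * p for p in proportions)
    return {
        "shannon": shannon,
        "simpson": 1 - dominance,          # assumed Gini-Simpson form
        "inverse_simpson": 1 / dominance,
    }


# Hypothetical window containing two equally frequent clusters.
example = diversity_indices([1, 1, 2, 2])
```

A window containing a single cluster gives a Shannon and Simpson value of 0 and an inverse Simpson value of 1, the minimum diversity under these definitions.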
References:
Alcaraz-Segura, D., Paruelo, J., Cabello, J., 2006. Identification of current ecosystem functional types in the Iberian Peninsula. Global Ecology and Biogeography 15, 200–212. https://doi.org/10.1111/j.1466-822X.2006.00215.x
Karger, D.N., Conrad, O., Böhner, J., Kawohl, T., Kreft, H., Soria-Auza, R.W., Zimmermann, N.E., Linder, H.P., Kessler, M., 2017. Climatologies at high resolution for the earth’s land surface areas. Sci Data 4, 170122. https://doi.org/10.1038/sdata.2017.122
Lobo, J.M., Jiménez-Valverde, A., Hortal, J., 2010. The uncertain nature of absences and their importance in species distribution modelling. Ecography 33, 103–114. https://doi.org/10.1111/j.1600-0587.2009.06039.x
Paruelo, J.M., Jobbágy, E.G., Sala, O.E., 2001. Current distribution of ecosystem functional types in temperate South America. Ecosystems 4, 683–698. https://doi.org/10.1007/s10021-001-0037-9
Phillips, S.J., Anderson, R.P., Schapire, R.E., 2006. Maximum entropy modeling of species geographic distributions. Ecological Modelling 190, 231–259. https://doi.org/10.1016/j.ecolmodel.2005.03.026
Phillips, S.J., Dudík, M., Elith, J., Graham, C.H., Lehmann, A., Leathwick, J., Ferrier, S., 2009. Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecological Applications 19, 181–197. https://doi.org/10.1890/07-2153.1
Sillero, N., Arenas-Castro, S., Enriquez‐Urzelai, U., Vale, C.G., Sousa-Guedes, D., Martínez-Freiría, F., Real, R., Barbosa, A.M., 2021. Want to model a species niche? A step-by-step guideline on correlative ecological niche modelling. Ecological Modelling 456, 109671. https://doi.org/10.1016/j.ecolmodel.2021.109671
Thuiller, W., Georges, D., Gueguen, M., Engler, R., Breiner, F., Lafourcade, B., Patin, R., 2023. biomod2: Ensemble Platform for Species Distribution Modeling.
Valavi, R., Guillera-Arroita, G., Lahoz-Monfort, J.J., Elith, J., 2022. Predictive performance of presence-only species distribution models: a benchmark study with reproducible code. Ecological Monographs 92, e01486. https://doi.org/10.1002/ecm.1486
Files
INPUTS.zip
Additional details
Funding
- Ministerio de Ciencia, Innovación y Universidades
- EarthCul project PID2020-118041GB-I00