SCARFACE: a harmonized spatio-temporal dataset integrating socio-economic, environmental, and agricultural indicators for the Po Valley (Italy), 2011--2024
Authors/Creators
Description
The SCARFACE research initiative
SCARFACE (Sequestering CARbon through Forests, AgriCulture, and land usE - https://www.paolomaranzano.net/scarface) is a research initiative funded by the University of Milano-Bicocca (UniMiB), Italy. The project combines complementary, interdisciplinary research expertise in statistics, data science, and environmental and atmospheric chemistry at UniMiB. Along with researchers from UniMiB, the project involves researchers from the Italian Council for Agricultural Research and Economics - Research Centre for Agricultural Policies and Bioeconomy (CREA-PB), Italy, and the School of Mathematics and Statistics of the University of Glasgow (Scotland, UK).
The SCARFACE dataset
The project assembles a harmonized spatio-temporal dataset that integrates several domains, such as climate, air quality, pollution emissions, land cover, soil properties, agro-industry dynamics and socio-economic indicators. Its purpose is to jointly investigate the interconnected processes linking agricultural systems, atmospheric dynamics, emissions, and socio-economic conditions in the Po Valley (Northern Italy), an area characterized by strong interactions among agricultural systems, environmental processes, and human activities.
The spatial reference unit adopted in SCARFACE is the Agrarian Sub-Region (ASR), a territorial classification defined by the Italian National Statistics Office (ISTAT). ASRs represent groups of contiguous municipalities that are considered relatively homogeneous with respect to natural conditions, agronomic characteristics, and agricultural production systems. The Po Valley can be partitioned into m=256 ASRs with different sizes and shapes.
The SCARFACE dataset covers the period from 2011 to 2024 (i.e., T=14 annual time stamps); the first and last year actually available for each variable depend on the coverage of the individual data sources. The final database adopts an annual panel structure defined over ASR spatial units, with a total of N=m×T=256×14=3584 spatio-temporal observations for each variable.
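As a quick check of the panel dimensions stated above, the following minimal Python sketch enumerates the (ASR, Year) pairs of a balanced annual panel. The integer ASR identifiers are placeholders introduced for illustration only; they are not the ASR codes used in the dataset.

```python
from itertools import product

# Assumption: ASRs are represented by placeholder integers 1..256;
# the real dataset uses its own 'ASR' identifiers.
M_ASR = 256
YEARS = range(2011, 2025)  # 2011..2024 inclusive, i.e. T = 14

# One (ASR, Year) pair per row of the balanced annual panel
panel_index = list(product(range(1, M_ASR + 1), YEARS))

n_obs = len(panel_index)
print(len(YEARS), n_obs)  # prints: 14 3584
```

Each variable in the extended dataset is expected to have one cell per pair in such an index, with missing values wherever a source does not cover a given year or ASR.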
Overall, SCARFACE comprises a set of p=2748 variables (plus three unique identifiers, that is, year, ASR and geometry) that include administrative records, gridded environmental products, satellite-derived land information, and survey-based socio-economic indicators sourced from national and international public institutions, covering a wide range of thematic domains. Farm activity and agro-economic indicators are derived from the Farm Accountancy Data Network (FADN) survey coordinated by the Italian Council for Agricultural Research and Economics (CREA). Emissions data are obtained from the EDGAR inventories developed by the European Commission, while air quality information is sourced from both the European Environment Agency (EEA) and the Copernicus Atmosphere Monitoring Service (CAMS). Meteorological variables are retrieved from the ERA5-Land reanalysis produced by the European Centre for Medium-Range Weather Forecasts (ECMWF), and extreme weather indicators are provided by the European Drought Observatory (EDO). Land cover information is based on the CORINE Land Cover dataset from Copernicus and the Global Dynamic Land Cover (GDLC) dataset. Livestock data are obtained from the Italian National Livestock Registry (BDN) managed by the Italian Ministry of Health, while socio-economic indicators are produced by ISTAT. Finally, geographical features and administrative metadata are derived from a combination of Amazon Web Services (AWS), ISTAT and Eurostat.
The dataset is designed as a versatile resource supporting both methodological and applied developments, as well as policy-relevant analyses, including:
- Panel data analyses at moderate spatial and temporal resolutions
- Advanced spatio-temporal modeling in the presence of heterogeneous covariates and high-dimensional settings
- Spatial and spatio-temporal clustering exercises, facilitating the identification of regional typologies and underlying patterns in agricultural and environmental systems.
- Reproducible, cross-domain policy-oriented analyses, particularly in relation to agricultural transitions, air quality management, and climate variability in one of Europe’s most critical environmental hotspots.
The building process of the dataset is detailed in the companion paper.
Table of contents (English)
This repository contains the following files:
- SCARFACE_DatasetSingleObjects_April2026.xlsx: this is an Excel (.xlsx) file containing the 14 source-specific datasets that constitute the SCARFACE framework. Data are organized into 14 distinct sheets that can be matched using the primary keys 'ASR' (unique geographical/spatial ID) and 'Year' (unique temporal ID). Geometries can be added by matching the shapefile contained in ASRs_Geometries.zip (with primary key 'ASR');
- SCARFACE_DatasetSingleObjects_April2026.RData: this is an RData file containing the 14 source-specific datasets that constitute the SCARFACE framework. Data are organized into 14 distinct data frames that can be matched using the primary keys 'ASR' (unique geographical/spatial ID) and 'Year' (unique temporal ID). The object ASRsPoValley_sf is of class 'sf' and contains the geometry of each polygon;
- SCARFACE_DatasetExtended_April2026.xlsx: this is an Excel (.xlsx) file containing the whole SCARFACE dataset in a single sheet. Individual datasets were merged with a full join on the primary keys 'ASR' (unique geographical/spatial ID) and 'Year' (unique temporal ID). Geometries can be added by matching the shapefile contained in ASRs_Geometries.zip (with primary key 'ASR');
- SCARFACE_DatasetExtended_April2026.RData: this is an RData file containing the whole SCARFACE dataset in a data frame of class 'sf', that is, a spatial object. Individual datasets were merged with a full join on the primary keys 'ASR' (unique geographical/spatial ID) and 'Year' (unique temporal ID). Geometries can be added by matching the shapefile contained in ASRs_Geometries.zip (with primary key 'ASR');
- ASRs_Geometries.zip: this is a zip file containing the shapefiles with the geometries of the m=256 ASR polygons;
- SCARFACE_MissingAnalysis_April2026.xlsx: this is an Excel (.xlsx) file containing post-merging information about missing values for each individual dataset. Missing values are described for each year, variable and ASR. Information is reported in a separate sheet for each dataset;
- SCARFACE_StructureAnalysis_April2026.xlsx: this is an Excel (.xlsx) file containing information about the structure of each individual dataset. In particular, the number of rows, the number of ASRs with valid values, the temporal range (coverage), and the number of variables are reported for each dataset;
- SCARFACE - Methodological note.pdf: this is a PDF file containing a note on the statistical methodologies used to spatially align gridded datasets via spatial block kriging (Section S1), the post-stratification procedure adopted to generate the spatio-temporal weighting system (Section S2) and the Generalized Variance Function (GVF) methodology adopted to regularize direct estimates of the variance for FADN survey data (Section S3);
- SCARFACE - Tables and list of available indicators.pdf: this is a PDF file containing tables describing the available information (e.g., reclassification, aggregation, etc.) for all the data sources included in the SCARFACE dataset;
- SCARFACE - Data and replication code.zip: this is a zip file containing R and Python code to reproduce the final merged data frame. The zip file contains:
  - A separate folder for each source-specific dataset (i.e., Animals, CAMSgrid, EDGARgrid, EDOgrid, EEAconc, EnvironmentalVars, LandCover, ISTATSocioEconomicData and CREA);
  - A folder with auxiliary functions to apply the spatial block kriging algorithm used to upscale gridded data (i.e., AuxFuns_Kriging);
  - A folder containing data extracted from the 7th Italian Agricultural Census 2020, provided by ISTAT and used to check the validity of several data sources (i.e., ISTAT_AgroCensus2020);
  - A folder containing data and code to generate the geographical metadata at the ASR level (i.e., Match_ASR_Munic). Among others, the folder contains the matching table "Match_LAUs_RegAgrarie_PoValley" (CSV and RData), which reports the complete municipality--ASR correspondence to facilitate reproducibility of the spatial aggregation procedures;
  - A folder containing code to merge the individual datasets and to check the data quality (i.e., Merge dataset).
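The full join on the primary keys 'ASR' and 'Year' used to build the extended dataset can be illustrated with a minimal, standard-library Python sketch. It is not the project's replication code (which is written in R and Python and distributed in the zip file above); all ASR codes, variable names and values below are hypothetical.

```python
# Toy stand-ins for two source-specific tables, keyed by (ASR, Year).
# All ASR codes, variable names and values are hypothetical.
livestock = {
    ("ASR_001", 2020): {"cattle_heads": 1200},
    ("ASR_002", 2020): {"cattle_heads": 300},
}
air_quality = {
    ("ASR_001", 2020): {"pm25_mean": 18.5},
    ("ASR_003", 2020): {"pm25_mean": 22.1},
}


def full_join(*tables):
    """Full (outer) join of dict-based tables on their (ASR, Year) keys."""
    # Collect every variable name so that each output row has the same columns
    columns = []
    for table in tables:
        for row in table.values():
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Union of keys: a pair present in any table yields an output row
    keys = sorted(set().union(*(table.keys() for table in tables)))
    merged = {}
    for key in keys:
        row = {name: None for name in columns}  # None marks a missing value
        for table in tables:
            row.update(table.get(key, {}))
        merged[key] = row
    return merged


merged = full_join(livestock, air_quality)

# Count missing cells per variable after merging
missing = {
    name: sum(row[name] is None for row in merged.values())
    for name in next(iter(merged.values()))
}
print(len(merged), missing)
```

With these toy inputs the join keeps all three (ASR, Year) pairs, and each variable has one missing cell, which is the kind of post-merging information summarized in the missing-value workbook. With pandas, a comparable result would typically be obtained through `DataFrame.merge(..., on=["ASR", "Year"], how="outer")`.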
Notes (English)
Other (English)
Acknowledgements
This work is part of the "Sequestering CARbon through Forests, AgriCulture, and land usE (SCARFACE)" research project, funded by the University of Milano-Bicocca, under grant number 2024-ATEQC-0048. Further information about the project can be found at the link https://www.paolomaranzano.net/scarface.
We acknowledge the Italian Council for Agricultural Research and Economics -- Research Centre for Agricultural Policies and Bioeconomy (CREA-PB) for providing the research team with access to the RICA-FADN database within the AgroGeoStat research agreement.
We also acknowledge the GEMMA center in the framework of project MUR "Dipartimenti di eccellenza 2023-2027".
We also acknowledge researchers from Associazione Economia e Sostenibilità (EStà) and Terra! for the feedback provided within the joint research projects Allevamenti intensivi e sistemi alimentari sostenibili and Per il lavoro dignitoso e la transizione giusta: verso l’Osservatorio Lavoro e Ambiente nei sistemi alimentari.
We also thank all the colleagues who provided the research team with comments and suggestions, in particular Laura Marcis (University of Valle d'Aosta, IT), Renato Salvatore (University of Cassino and Southern Lazio, IT), Paul Parker (UCSC, USA) and Scott Holan (University of Missouri, USA) for the survey data integration.
Files
Total size: 14.0 GB

| MD5 checksum | Size |
|---|---|
| e84f53de07fecf3a3bf8091f680d7809 | 23.2 kB |
| dca04d461a1413ecba93634f640a0253 | 304.0 kB |
| b8739d57aa655043ec4724d2a6b61374 | 1.2 MB |
| c0b90e4e4a94f166eefd1bcf4f4527dd | 6.9 GB |
| 0437f0aac72de1a0355c55cc6c1c750e | 6.9 GB |
| 35f66732fb225c725d0746908615f87d | 550.0 kB |
| 300edcd2c3422dfe004a7ea2b682ec0a | 469.6 kB |
| 2f3c4a05545754d251f2da0bc7e39133 | 39.6 MB |
| 0e4c7fb429a1fe6be0ec34d1a6146675 | 75.5 MB |
| fdaadfd1c1f6002c4e971b26254cb2d6 | 39.3 MB |
| 29e5aec29f554dda608197e61a56de87 | 73.0 MB |
| 88d72c7be27ac45da049d1277bc6a380 | 74.2 kB |
| ea5a326e8b7f5a4b1a2e430194584b0a | 535.9 kB |
| 0d5f40f3b076dab0f919ece548218c63 | 7.2 kB |
Additional details
Funding
Software
- Repository URL: https://github.com/ScarfaceSeqCARForAgriCultLandusE
- Programming language: R, Python
- Development status: Active