Published March 3, 2026 | Version 1.0.0
Dataset Open

Global Facility-Level Solar Photovoltaic Inventory with Energy Generation and Loss Estimates

  • 1. University of Oxford, Department of Physics
  • 2. University College London, Department of Geography
  • 1. University College London, Department of Space and Climate Physics, Mullard Space Science Laboratory
  • 2. University of Leicester
  • 3. University of Oxford, Department of Physics
  • 4. University of Bath

Description

 

Overview

This Zenodo release provides a global facility-level solar photovoltaic (PV) inventory with facility-scale energy generation and aerosol-related loss estimates. It was prepared alongside the manuscript "Coal plants persist as a large barrier to the global solar energy transition" (Nature Sustainability).

The dataset was generated using the framework described in the manuscript and Methods. In brief, a three-step workflow was used: (1) identify candidate PV facilities globally by combining existing inventories, crowd-sourced records, and a CNN-based scan of Sentinel-2 imagery; (2) extract precise panel footprints from confirmed sites using SAM-based segmentation; and (3) integrate the resulting footprints with MERRA-2 atmospheric reanalysis and a validated PV model to estimate facility-level generation and losses from clouds and aerosols.

The release includes PV facility footprints and these core attributes: `PV_ID`, `latitude`, `longitude`, `country`, `year`, `area_m2`.

In the main inventory files, `year` is the facility's installation (build/commissioning) year, estimated from Sentinel-2 time-series classification as described in the manuscript Methods.

The release contains two complementary data components:

  • A global geospatial PV facility inventory (`.gpkg`, `.csv`, `.parquet`).
  • Annual facility-level PV generation/loss tables (`PV_facility_generation_year_YYYY.csv`, currently 2017-2023).

Package Contents

  • `global_pv_facility_inventory.gpkg`
    • Layer: `global_pv_facility_inventory`
    • Geometry: `MultiPolygon`
    • CRS: `EPSG:4326` (WGS 84)
  • `global_pv_facility_inventory.csv`
    • Attribute-only table (no geometry)
  • `global_pv_facility_inventory.parquet`
    • GeoParquet (geometry + attributes)
  • Year-specific facility-level generation/loss tables (top-level CSV files)
    • Generated to support the manuscript analysis of facility-level PV energy generation and losses.
    • Each year-specific file includes only facilities installed by that year, so facility counts differ across files.
    • Files:
      • `PV_facility_generation_year_2017.csv`
      • `PV_facility_generation_year_2018.csv`
      • `PV_facility_generation_year_2019.csv`
      • `PV_facility_generation_year_2020.csv`
      • `PV_facility_generation_year_2021.csv`
      • `PV_facility_generation_year_2022.csv`
      • `PV_facility_generation_year_2023.csv`
    • Each file includes:
      • Core facility columns: `PV_ID`, `latitude`, `longitude`, `country`, `year`, `area_m2`
      • `power_POA (kWh)`: energy generation under all-sky conditions, estimated from plane-of-array (POA) irradiance.
      • `power_POA_clr (kWh)`: POA-based energy generation under clear-sky (cloud-free) conditions.
      • `power_POA_cln (kWh)`: POA-based energy generation under clean-sky (aerosol-free) conditions.
      • `aerosol_loss (kWh)`: facility-level aerosol-related energy loss, computed as `power_POA (kWh) - power_POA_cln (kWh)`.
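The stated definition of `aerosol_loss (kWh)` can be checked directly against the two POA columns. The following is a minimal sketch using only the Python standard library; it assumes the CSV headers match the column names exactly as written above (including the ` (kWh)` suffix), and the helper name `max_loss_mismatch` and the sample values are illustrative, not taken from the dataset.

```python
import csv
import io

# Column names assumed to match the headers listed in this description.
ALL_SKY = "power_POA (kWh)"
CLEAN_SKY = "power_POA_cln (kWh)"
LOSS = "aerosol_loss (kWh)"


def max_loss_mismatch(rows):
    """Return the largest absolute gap between the stored aerosol loss and
    the value recomputed as power_POA - power_POA_cln."""
    worst = 0.0
    for row in rows:
        recomputed = float(row[ALL_SKY]) - float(row[CLEAN_SKY])
        worst = max(worst, abs(recomputed - float(row[LOSS])))
    return worst


# Synthetic two-row sample (made-up values, NOT from the dataset).
sample = io.StringIO(
    "PV_ID,power_POA (kWh),power_POA_cln (kWh),aerosol_loss (kWh)\n"
    "A,900.0,1000.0,-100.0\n"
    "B,450.5,500.0,-49.5\n"
)
mismatch = max_loss_mismatch(csv.DictReader(sample))
```

For a real file, the same function could be called as `max_loss_mismatch(csv.DictReader(f))` with `f` opened via `open("PV_facility_generation_year_2020.csv", newline="")`. Under the definition as written, the value is negative whenever clean-sky generation exceeds all-sky generation, so it is worth confirming the sign convention in the files before aggregating.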

How to Use This Dataset (Technical)

  • If you need geometry, use:
    • `global_pv_facility_inventory.gpkg` (GIS-friendly)
    • `global_pv_facility_inventory.parquet` (fast analytics with geometry)
  • If you need tabular attributes only, use:
    • `global_pv_facility_inventory.csv`
  • For energy generation/loss analysis, use:
    • `PV_facility_generation_year_YYYY.csv` (currently 2017-2023)
  • Linkages:
    • `PV_ID` is the facility identifier across all files.
    • `year` supports year-specific filtering and aggregation.
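The two linkages above can be sketched as follows: filter the inventory to facilities installed by a target year, then join a year-specific generation table on `PV_ID`. This is a minimal standard-library sketch, assuming rows are dictionaries as produced by `csv.DictReader`; the function names and the sample rows are hypothetical.

```python
def facilities_installed_by(inventory_rows, target_year):
    """Index inventory rows by PV_ID, keeping facilities whose installation
    year is on or before target_year."""
    return {
        row["PV_ID"]: row
        for row in inventory_rows
        # int(float(...)) tolerates years stored as "2019" or "2019.0".
        if int(float(row["year"])) <= target_year
    }


def join_on_pv_id(inventory_by_id, generation_rows):
    """Attach inventory attributes to each generation row with a matching PV_ID."""
    joined = []
    for gen in generation_rows:
        inv = inventory_by_id.get(gen["PV_ID"])
        if inv is not None:
            # Generation columns win if a column name appears in both tables.
            joined.append({**inv, **gen})
    return joined


# Synthetic rows (made-up values, NOT from the dataset).
inventory = [
    {"PV_ID": "A", "year": "2018", "area_m2": "1200.0"},
    {"PV_ID": "B", "year": "2022", "area_m2": "800.0"},
]
generation_2020 = [{"PV_ID": "A", "power_POA (kWh)": "900.0"}]

installed = facilities_installed_by(inventory, 2020)
joined = join_on_pv_id(installed, generation_2020)
```

Because the generation tables already repeat the core facility columns, the join is mainly useful for attaching attributes that exist only in the inventory files.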

How This Dataset Is Used in the Paper

  • To map and quantify global facility-level PV deployment (location, footprint area, and installation year).
  • To estimate facility-level PV generation from POA irradiance under:
    • all-sky conditions (`power_POA (kWh)`),
    • clear-sky conditions (`power_POA_clr (kWh)`),
    • clean-sky conditions (`power_POA_cln (kWh)`).
  • To quantify aerosol-related generation loss at facility level (`aerosol_loss (kWh)`), then aggregate by geography/year for manuscript analysis.
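The aggregation step in the last bullet can be illustrated with a small sketch that sums `aerosol_loss (kWh)` per country for one year-specific table. It is not the manuscript's analysis code; the function name, the placeholder country labels, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict

LOSS = "aerosol_loss (kWh)"  # header assumed to match this description


def total_loss_by_country(rows):
    """Sum the aerosol_loss column per country for one year-specific table."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += float(row[LOSS])
    return dict(totals)


# Synthetic rows (made-up values and placeholder country labels).
sample_rows = [
    {"PV_ID": "A", "country": "X", LOSS: "-100.0"},
    {"PV_ID": "B", "country": "X", LOSS: "-50.0"},
    {"PV_ID": "C", "country": "Y", LOSS: "-25.0"},
]
totals = total_loss_by_country(sample_rows)
```

Running the same function over each `PV_facility_generation_year_YYYY.csv` file in turn would give a country-by-year series.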

Potential Reuse in Other Research

  • National/regional assessments of aerosol impacts on PV generation.
  • Benchmarking climate and air-quality penalties for existing PV fleets.
  • Integration with grid, policy, or emissions datasets for energy-transition studies.
  • Geospatial analyses linking PV siting patterns with environmental and socioeconomic variables.

Snapshot Statistics

  • Facilities: 140,945
  • Countries: 181
  • Inventory years: 2017-2024
  • Generation/loss tables: 2017-2023
  • Latitude range: 41.61° S to 68.38° N
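The snapshot figures above could be re-derived from `global_pv_facility_inventory.csv` with a few lines of standard-library Python. The sketch below assumes one row per facility and the core column names listed earlier; the function name and sample rows are hypothetical.

```python
def snapshot(rows):
    """Summarise an inventory table: facility and country counts,
    installation-year range, and latitude range."""
    rows = list(rows)
    years = [int(float(r["year"])) for r in rows]
    lats = [float(r["latitude"]) for r in rows]
    return {
        "facilities": len({r["PV_ID"] for r in rows}),
        "countries": len({r["country"] for r in rows}),
        "year_range": (min(years), max(years)),
        "latitude_range": (min(lats), max(lats)),
    }


# Synthetic rows (made-up values, NOT from the dataset).
sample_inventory = [
    {"PV_ID": "A", "country": "X", "year": "2017", "latitude": "10.5"},
    {"PV_ID": "B", "country": "Y", "year": "2024", "latitude": "-20.25"},
    {"PV_ID": "C", "country": "X", "year": "2020", "latitude": "35.0"},
]
stats = snapshot(sample_inventory)
```

With the real file, `snapshot(csv.DictReader(f))` on an open file handle `f` should reproduce the counts and ranges listed above if those assumptions hold.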

Contact

  • Dr. Rui Song: rui.song@physics.ox.ac.uk or rui.song90@gmail.com

Files

Files (1.3 GB)

  • md5:780052777692c776e7c29f7acf1b4680 (10.7 MB)
  • md5:eda08d47a5a46996892c188b6dce279c (815.9 MB)
  • md5:987ba41d05961ec0334ef9ca6dd4513f (324.8 MB)
  • md5:244a21d479860fc0a8c87d7f73f8e83b (9.3 MB)
  • md5:8f18cf96f67703aad8b4ca83e41f97eb (11.1 MB)
  • md5:6c6045f24e2daa7fc6a268289fc749d9 (12.7 MB)
  • md5:67f92af5de1561e8a996d92cad4da212 (14.3 MB)
  • md5:ac1f1f790292653824dded69eb3589b1 (15.8 MB)
  • md5:4451bf9b5764a98b133d7e7511e1ef44 (17.0 MB)
  • md5:4d2e227f9507369f42cc9de18e7d0ee6 (19.1 MB)
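A downloaded file can be checked against one of the MD5 checksums listed above. The sketch below uses Python's `hashlib` and reads the file in chunks so that the larger files do not need to fit in memory; the function names are illustrative, and the `md5:` prefix handling assumes checksums are written in the form shown in the listing.

```python
import hashlib


def md5_of(path, chunk_size=1024 * 1024):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed_checksum.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of(path) == expected
```

For example, `matches_listing(downloaded_file, "md5:eda08d47a5a46996892c188b6dce279c")` returns `True` only if `downloaded_file` (a placeholder path for whichever file that checksum belongs to) was transferred intact.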

Additional details

Software

  • Programming language: Python
  • Development status: Active