Published May 29, 2024 | Version 1.0
Dataset Restricted

Solar Asset Mapper: A continuously-updated global inventory of solar energy facilities built with satellite data and machine learning

Description

TransitionZero’s Solar Asset Mapper is a global, satellite-derived dataset of utility-scale solar farms generated with a combination of machine learning and human annotation. Our Q1 2024 dataset contains the location and shape of 63,616 assets, along with estimated capacities. We estimate the construction date for over 80% of these assets. The dataset contains over 19,100 square kilometres of solar farms across 183 countries, with a total estimated capacity of 711 GW.

The data can also be downloaded from https://solar.transitionzero.org/. A map of the analysis polygons is available at https://solar-map.transitionzero.org/.

1. Dataset Description

We publish six files.

  • analysis_polygons.gpkg: our “analysis-ready” dataset containing geometries, capacity estimates and construction date estimates.
  • analysis_polygons.csv: a version of analysis_polygons.gpkg containing a central latitude and longitude in place of a geometry, to allow parsing without geospatial software.
  • sources.csv: a table mapping the IDs of our analysis-ready dataset to the raw geometries that make them up.
  • raw_polygons.gpkg: the raw geometries used to compose analysis_polygons.gpkg.
  • TZ Solar Asset Mapper Q1 2024.xlsx: an Excel-formatted version of the analysis_polygons.csv file.
  • tz-sam_scientific_data.pdf: a pre-print article that explains the methodology in detail.

1.1 Analysis-level datasets

Our analysis-level dataset comprises our most complete view of global asset-level solar installations, incorporating our own detections as well as known solar farm geometries from other datasets.

The geospatial dataset contains the following fields:

  • id: unique ID for the asset
  • geometry: Polygon or MultiPolygon defining the asset
  • capacity_mw: estimated capacity of the asset in megawatts
  • constructed_before: upper bound for construction date (estimated date of the image in which the solar plant was first seen in a constructed state)
  • constructed_after: lower bound for construction date (estimated date of the image in which construction began for the solar plant)

The CSV version replaces the geometry column with:

  • latitude: the latitude of the centroid of the asset
  • longitude: the longitude of the centroid of the asset
  • country: administrative country name

1.2 Raw datasets and sources

The analysis-level datasets hide some complexity in the underlying data, which we expose in the raw_polygons and sources files.

  • We produce new sets of polygons for each run. Often these overlap, sometimes in complicated ways.
  • We cluster together overlapping and nearby geometries from both our detections and external sources. Currently these sources are:
      • Large solar farms scraped from OpenStreetMap (OSM)
      • Validated geometries from Kruitwagen et al., A global inventory of solar photovoltaic generating units.

Each cluster corresponds to one row in the analysis-level dataset. To enable tracking of raw detections from run to run, and to provide detailed sourcing information, we provide all of these raw polygons, along with a sources file that lists all of the raw polygons contained in each analysis-level polygon.

raw_polygons.gpkg contains the following fields:

  • id: ID of the raw source polygon
  • geometry: Polygon or MultiPolygon defining the asset
  • source: either “solar asset mapper”, “osm” or “2019_global_pv”.
  • acquisition_date: for solar asset mapper polygons, this is the date of the inference run that produced the polygon; for OSM polygons it is the date that the polygon was scraped from OSM; for 2019_global_pv it is 2019-01-01, the approximate detection date of that dataset.

sources.csv contains the following fields:

  • cluster_id: ID of the corresponding item in the analysis-level dataset
  • source_id: ID of the raw source polygon
  • source: either “solar asset mapper”, “osm” or “2019_global_pv”.
  • acquisition_date: for solar asset mapper polygons, this is the date of the inference run that produced the polygon; for OSM polygons it is the date that the polygon was scraped from OSM; for 2019_global_pv it is 2019-01-01, the approximate detection date of that dataset.
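
The cluster-to-source mapping above can be explored without geospatial tooling. The sketch below groups sources.csv rows by cluster_id and lists the clusters that contain no "solar asset mapper" detection, i.e. those built only from external sources. The sample rows and IDs are invented for illustration, and sources_by_cluster is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict


def sources_by_cluster(csv_file):
    """Map each cluster_id to its (source_id, source, acquisition_date) rows."""
    clusters = defaultdict(list)
    for row in csv.DictReader(csv_file):
        clusters[row["cluster_id"]].append(
            (row["source_id"], row["source"], row["acquisition_date"])
        )
    return dict(clusters)


# Invented rows for illustration only; real IDs will look different.
sample = io.StringIO(
    "cluster_id,source_id,source,acquisition_date\n"
    "c1,r1,solar asset mapper,2024-03-15\n"
    "c1,r2,osm,2024-02-01\n"
    "c2,r3,2019_global_pv,2019-01-01\n"
)
clusters = sources_by_cluster(sample)

# Clusters with no raw polygon from our own detections.
external_only = sorted(
    cluster_id
    for cluster_id, members in clusters.items()
    if all(source != "solar asset mapper" for _, source, _ in members)
)
print(external_only)
```

The source_id values can then be looked up in raw_polygons.gpkg to retrieve the underlying geometries.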

1.3 Caveats and limitations

1.3.1 False Positives

While we have made every effort to remove false positives from the published dataset, some will remain due to the difficulty of manually validating detections in 10-metre satellite imagery. To estimate the prevalence of false positives, we selected a random subset of approximately 2000 detections from our positively labelled solar assets and validated each one more rigorously using high-resolution imagery. This analysis yielded an expected false positive rate of around 1%.
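
To give a sense of the sampling uncertainty behind a rate of this kind, the sketch below computes a 95% Wilson score interval. The exact counts are not published here, so 20 false positives out of 2000 is an assumption chosen only to match "approximately 2000" and "around 1%". It is not the authors' validation procedure.

```python
import math


def wilson_interval(failures, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = failures / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# ASSUMPTION: 20 of 2000 validated detections were false positives.
low, high = wilson_interval(20, 2000)
print(f"95% interval for the false positive rate: {low:.4f} to {high:.4f}")

# Applying a 1% rate to the 63,616 assets in the Q1 2024 dataset.
expected_false_positives = round(0.01 * 63616)
print(expected_false_positives)
```

Under these assumed counts, the interval spans roughly 0.6% to 1.5%, so a handful of misclassified samples more or less would not change the headline figure much.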

1.3.2 Plant Shapes

Our plant outlines are not perfect. They will occasionally be much smaller or larger than the underlying plant. Our tests show that these errors largely cancel out in aggregate.

1.3.3 Capacity Estimates

Our capacity estimation model should produce relatively unbiased country-level aggregates, since it is trained to learn the typical ground coverage ratio of plants by country. The model has no way to distinguish between a very dense and a very sparse (e.g. dual-axis-tracking) plant in the same country. Plants with unusually high or low ground coverage ratios will not have accurate capacity estimates.

1.3.4 Construction Date Estimates

We are not able to directly estimate the construction date of a plant. Instead, we estimate an upper bound (the date of the image in which the plant was first seen in a constructed state) and a lower bound (the date of the image in which the plant was last seen in an unconstructed state). For plants that were constructed before the launch date of Sentinel-2 in 2017, we produce only an upper bound.

We leave it to consumers of the data to interpret these bounds and/or estimate likely grid connection dates.
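
One simple way a consumer might interpret the two bounds is to take their midpoint and report the width of the window as the uncertainty. This is an illustrative choice rather than a recommendation from the dataset authors. The sketch assumes the dates are ISO-formatted strings (YYYY-MM-DD) and that a missing lower bound appears as an empty value; construction_window is a hypothetical helper name.

```python
from datetime import date


def construction_window(constructed_after, constructed_before):
    """Return (midpoint_date, window_days), or (None, None) if no lower bound.

    ASSUMPTION: dates are ISO strings and a missing lower bound is empty/None.
    """
    if not constructed_after:
        # Only an upper bound is known, so no midpoint can be computed.
        return None, None
    lower = date.fromisoformat(constructed_after)
    upper = date.fromisoformat(constructed_before)
    window = upper - lower
    return lower + window / 2, window.days


# Invented dates for illustration only.
midpoint, days = construction_window("2020-01-01", "2020-01-31")
print(midpoint, days)

# A plant with only an upper bound yields no midpoint.
print(construction_window("", "2016-05-01"))
```

Grid connection typically follows construction, so any estimate of a connection date would need an additional lag assumption on top of this window.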

2. Attribution

TZ-SAM is made available under a Creative Commons Attribution Non-Commercial 4.0 International License (CC-BY-NC-4.0). Attribution to TransitionZero is required. You must also clearly indicate if you have made any changes to the TZ-SAM dataset and what these are. Please refer to the suggested citation formats:

  • “TransitionZero Solar Asset Mapper, TransitionZero, May 2024 release.”
  • “TZ-SAM, TransitionZero, May 2024 release.”
  • “TransitionZero (2024) Solar Asset Mapper.”

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Additional details

Dates

Created
2024-05-29
Data publication date