Published June 8, 2025 | Version v1
Dataset Open

CPVPD-2024: A photovoltaic plant vector dataset derived from Chinese remote sensing imagery via a topography-enhanced deep learning framework with dynamic spatial-frequency attention

  • 1. Beijing University of Technology

Description

As a central pillar of the global energy transition, photovoltaic (PV) power generation plays a crucial role in achieving carbon peaking and carbon neutrality goals. China, the largest PV market in the world, has seen continuous and rapid growth in installed PV capacity. High-precision, high-resolution, and timely spatial PV data are urgently needed for precise planning, intelligent operation and maintenance, and sustainable development of China's PV industry. To address existing data issues such as fragmentation, heterogeneous standards, and spatiotemporal incoherence, this study introduces a technical framework that integrates deep semantic segmentation with geospatial verification and uses it to build the 2024 China Photovoltaic Power Plant Vector Dataset (CPVPD-2024). Using a spatially stratified sampling strategy, the study integrates the 30 m resolution annual China Land Cover Dataset (CLCD) with global elevation data from the General Bathymetric Chart of the Oceans (GEBCO) to construct a training sample library covering 15 terrain-landcover combination types. Combined with the Dynamic Spatial-Frequency Attention SwinNet (DSFA-SwinNet) semantic segmentation model and multi-level morphological post-processing, this framework enables panel-by-panel identification of PV power plants across China. The CPVPD-2024 dataset covers all 34 provincial-level administrative regions of China and achieves an overall Precision of 90.38% and an Intersection over Union (IoU) of 81.78% in test zones, with significant improvements in identifying PV array gaps and detecting small-scale distributed power plants. The results indicate that the total installed PV area in China reached 4,520.47 km² by 2024, with a characteristic spatial pattern dominated by agrivoltaic systems and concentrated in arid regions.
Notably, cultivated land (28.14%) and grassland (39.51%) together account for 67.65% of the total installed area. As the first panel-level vectorized mapping of PV power plants at the national scale, this dataset provides high-precision foundational data for optimizing PV site selection, conducting ecological-environmental assessments, and advancing deep learning-based intelligent interpretation of remote sensing data.
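For readers unfamiliar with the two accuracy figures quoted above, the following is a minimal sketch of how pixel-wise Precision and IoU are conventionally computed for a binary segmentation mask. It is not the authors' evaluation code: the function name `precision_and_iou` and the toy masks are hypothetical, and the record does not state how the test zones were rasterized or aggregated.

```python
def precision_and_iou(pred, truth):
    """Pixel-wise Precision and IoU for two binary masks of equal shape.

    pred, truth: nested lists of 0/1 values (1 = PV panel pixel).
    Hypothetical helper for illustration only.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # predicted PV, actually PV
            elif p and not t:
                fp += 1          # predicted PV, actually background
            elif t and not p:
                fn += 1          # missed PV pixel
    # Precision = TP / (TP + FP); IoU = TP / (TP + FP + FN)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return precision, iou


# Toy example: 2 true positives, 1 false positive, 1 false negative.
toy_pred = [[1, 1, 0],
            [0, 1, 0]]
toy_truth = [[1, 0, 0],
             [0, 1, 1]]
toy_precision, toy_iou = precision_and_iou(toy_pred, toy_truth)
```

In this toy case Precision is 2/3 and IoU is 2/4. IoU is never higher than Precision because its denominator also counts missed pixels, which is consistent with the reported IoU (81.78%) being lower than the reported Precision (90.38%).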

Files

CPVPD-2024.zip

Files (332.0 MB)

Name Size
CPVPD-2024.zip 332.0 MB
md5:ea04111a75f43143d534a9920a529547
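The MD5 checksum listed for the archive can be used to confirm that a download is complete and uncorrupted. The sketch below assumes the file has been saved locally as `CPVPD-2024.zip`; the helper names `md5_of_file` and `verify_download` are hypothetical, and only Python's standard `hashlib` module is used.

```python
import hashlib

# Checksum as published in the file listing of this record.
EXPECTED_MD5 = "ea04111a75f43143d534a9920a529547"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

A call such as `verify_download("CPVPD-2024.zip")` returns `True` only if the local file matches the published checksum. Reading in chunks keeps memory use low for the 332.0 MB archive.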