Dataset Open Access

# A comprehensive dataset for the accelerated development and benchmarking of solar forecasting methods

Carreira Pedro, Hugo; Larson, David; Coimbra, Carlos

### Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:creator>Carreira Pedro, Hugo</dc:creator>
<dc:creator>Larson, David</dc:creator>
<dc:creator>Coimbra, Carlos</dc:creator>
<dc:date>2019-06-24</dc:date>
<dc:description>Description
This repository contains a comprehensive solar irradiance, imaging, and forecasting dataset.
The goal of this release is to provide standardized solar and meteorological datasets to the research community for the accelerated development and benchmarking of forecasting methods.
The data consist of three years (2014–2016) of quality-controlled, 1-min resolution ground measurements of global horizontal irradiance (GHI) and direct normal irradiance (DNI) in Folsom, California.
In addition, we provide time-overlapping data for commonly used exogenous variables, including sky images, satellite imagery, numerical weather prediction (NWP) forecasts, and weather data.
We also include sample code for baseline models against which more elaborate models can be benchmarked.
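Benchmarking against a baseline is commonly summarized with a forecast skill score, s = 1 - RMSE_model / RMSE_baseline, where the baseline is typically a persistence forecast. The Python sketch below computes this score under that assumption. The function names are illustrative only, missing values (NaN) are skipped pairwise, and the metrics actually reported are those computed by Postprocess.py and described in the paper.

```python
import numpy as np

def rmse(forecast, observed):
    """Root-mean-square error, ignoring pairs where either value is missing (NaN)."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    valid = ~(np.isnan(forecast) | np.isnan(observed))
    return float(np.sqrt(np.mean((forecast[valid] - observed[valid]) ** 2)))

def skill(forecast, baseline, observed):
    """Skill s = 1 - RMSE(forecast) / RMSE(baseline); s > 0 means the forecast beats the baseline."""
    return 1.0 - rmse(forecast, observed) / rmse(baseline, observed)
```

For example, skill([110, 190, 310, 390], [120, 180, 320, 380], [100, 200, 300, 400]) returns 0.5, because the forecast RMSE (10) is half the baseline RMSE (20).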

Data usage
The datasets and sample code presented here are intended for research and development purposes only, and their use requires explicit reference to the following paper:
Pedro, H.T.C., Larson, D.P., Coimbra, C.F.M., 2019. A comprehensive dataset for the accelerated development and benchmarking of solar forecasting methods. Journal of Renewable and Sustainable Energy 11, 036102. https://doi.org/10.1063/1.5094494

Although every effort was made to ensure the quality of the data, no guarantees or liabilities are implied by the authors or publishers of the data.

Sample code
As part of the data release, we also include sample code written in Python 3.
The preprocessed data used by the scripts are also provided.
The code can be used to reproduce the results presented in the paper and as a starting point for future studies.
Besides the standard scientific Python packages (numpy, scipy, and matplotlib), the code depends on pandas for time-series operations, pvlib for common solar-related tasks, and scikit-learn for machine learning models.
All required Python packages are readily available for macOS, Linux, and Windows and can be installed with, e.g., pip.
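A quick way to confirm that the packages listed above are installed is sketched below, assuming their standard import names (scikit-learn is imported as sklearn). The helper missing_packages is hypothetical and is not part of the released scripts.

```python
import importlib.util

# Import names of the packages listed above (assumed standard names;
# scikit-learn is imported as "sklearn").
REQUIRED = ["numpy", "scipy", "matplotlib", "pandas", "pvlib", "sklearn"]

def missing_packages(names=REQUIRED):
    """Return the packages that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```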

Units
All time stamps are in UTC (YYYY-MM-DD HH:MM:SS).
All irradiance and weather data are in SI units.
Sky image features are derived from 8-bit RGB data (256 levels per color channel).
Satellite images are derived from 8-bit grayscale data (256 gray levels).
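Because all time stamps are in UTC, converting them to local California time requires handling daylight saving time. A minimal pandas sketch, assuming the time stamps have been read as strings in the format above (the two example values are made up):

```python
import pandas as pd

# Example time stamps in the documented "YYYY-MM-DD HH:MM:SS" format (UTC).
raw = pd.Series(["2014-06-01 19:00:00", "2014-12-01 20:00:00"])
utc = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S", utc=True)

# Convert to local time in California; this zone observes daylight saving time.
local = utc.dt.tz_convert("America/Los_Angeles")
print(local.dt.hour.tolist())
```

Both example times map to 12:00 local time, since June falls in Pacific Daylight Time (UTC-7) and December in Pacific Standard Time (UTC-8).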

Missing data
The string "NAN" indicates missing data.
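When loading the CSV files with pandas, the "NAN" marker can be passed explicitly as a missing-value string, which avoids depending on pandas' default list of missing-value markers. A sketch using an in-memory excerpt; the column names timeStamp, ghi, and dni and the values are assumptions for illustration and should be checked against the actual file headers:

```python
import io

import pandas as pd

# Hypothetical two-row excerpt; real column names and values may differ.
sample = io.StringIO(
    "timeStamp,ghi,dni\n"
    "2014-01-02 16:00:00,250.1,NAN\n"
    "2014-01-02 16:01:00,NAN,410.7\n"
)

df = pd.read_csv(
    sample,
    na_values=["NAN"],      # treat the string "NAN" as missing
    index_col="timeStamp",  # assumed name of the time-stamp column
    parse_dates=True,       # parse the index as datetimes
)
df.index = df.index.tz_localize("UTC")  # time stamps are given in UTC

print(df.isna().sum())  # number of missing values per column
```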

File formats
All time-series data files are in CSV (comma-separated values) format.
Images are provided as bz2-compressed tar archives (tar.bz2).
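The image archives can be read with Python's standard tarfile module without first extracting them to disk. A minimal sketch; the helper name list_sky_images is hypothetical, and the archive name follows the Folsom_sky_images_{YEAR}.tar.bz2 pattern listed under Files:

```python
import tarfile

def list_sky_images(archive_path, limit=5):
    """Return the names of the first `limit` files in a bz2-compressed tar archive."""
    names = []
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        for member in tar:  # iterate members one at a time
            if member.isfile():
                names.append(member.name)
                if len(names) == limit:
                    break
    return names
```

For example, list_sky_images("Folsom_sky_images_2014.tar.bz2") returns the names of the first five files in the 2014 archive; inside the same with block, tar.extractfile(member).read() returns a member's raw bytes.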

Files

File                                 Type         Description
Folsom_irradiance.csv                Primary      One-minute GHI, DNI, and DHI data.
Folsom_weather.csv                   Primary      One-minute weather data.
Folsom_sky_images_{YEAR}.tar.bz2     Primary      bz2-compressed tar archives of daytime sky images captured at 1-min intervals, one archive per year (2014, 2015, and 2016).
Folsom_NAM_lat{LAT}_lon{LON}.csv     Primary      NAM forecasts for the four nodes nearest the target location; {LAT} and {LON} are the node coordinates listed in Table I of the paper.
Folsom_sky_image_features.csv        Secondary    Features derived from the sky images.
Folsom_satellite.csv                 Secondary    10-by-10-pixel GOES-15 images centered on the target location.
Sky_image_features_intra-hour.csv    Secondary    Sky image features at the intra-hour forecast issue times.
Sat_image_features_intra-day.csv     Secondary    Satellite image features at the intra-day forecast issue times.
NAM_nearest_node_day-ahead.csv       Secondary    NAM forecasts (GHI, DNI computed with the DISC algorithm, and total cloud cover) for the node nearest the target location, prepared for day-ahead forecasting.
Target_{horizon}.csv                 Secondary    Target data for each forecast horizon.
Forecast_{horizon}.py                Code         Python script that generates the forecasts for each horizon.
Postprocess.py                       Code         Python script that computes the error metrics for all forecasts.
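The NAM node coordinates are encoded in the file names (Folsom_NAM_lat{LAT}_lon{LON}.csv), so the node files can be located and their coordinates recovered programmatically. A standard-library sketch, assuming the coordinates appear in the names as plain decimal numbers (with a leading minus sign for western longitudes); the helper name nam_nodes is hypothetical, and Table I of the paper remains the authoritative list of coordinates:

```python
import re
from pathlib import Path

# Assumed name pattern: coordinates written as (optionally negative) decimal numbers.
NAM_NAME = re.compile(r"Folsom_NAM_lat(-?\d+(?:\.\d+)?)_lon(-?\d+(?:\.\d+)?)\.csv")

def nam_nodes(data_dir="."):
    """Map (lat, lon) -> path for every NAM node file found in data_dir."""
    nodes = {}
    for path in sorted(Path(data_dir).glob("Folsom_NAM_lat*_lon*.csv")):
        match = NAM_NAME.fullmatch(path.name)
        if match:  # skip names that do not follow the assumed pattern
            lat, lon = (float(g) for g in match.groups())
            nodes[(lat, lon)] = path
    return nodes
```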

</dc:description>
<dc:identifier>https://zenodo.org/record/2826939</dc:identifier>
<dc:identifier>10.5281/zenodo.2826939</dc:identifier>
<dc:identifier>oai:zenodo.org:2826939</dc:identifier>
<dc:relation>doi:10.1063/1.5094494</dc:relation>
<dc:relation>doi:10.5281/zenodo.2826938</dc:relation>
<dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
<dc:subject>sky images</dc:subject>
<dc:subject>satellite images</dc:subject>
<dc:subject>numerical weather prediction</dc:subject>
<dc:subject>forecast benchmarking</dc:subject>
<dc:title>A comprehensive dataset for the accelerated development and benchmarking of solar forecasting methods</dc:title>
<dc:type>info:eu-repo/semantics/other</dc:type>
<dc:type>dataset</dc:type>
</oai_dc:dc>