Dataset Open Access

A comprehensive dataset for the accelerated development and benchmarking of solar forecasting methods

Carreira Pedro, Hugo; Larson, David; Coimbra, Carlos

Citation Style Language JSON Export

  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.2826939", 
  "title": "A comprehensive dataset for the accelerated development and benchmarking of solar forecasting methods", 
  "issued": {
    "date-parts": [
  "abstract": "<p><strong>Description</strong><br>\nThis repository contains a comprehensive solar irradiance, imaging, and forecasting dataset.&nbsp;<br>\nThe goal with this release is to provide standardized solar and meteorological datasets to the research community for the accelerated development and benchmarking of forecasting methods.&nbsp;<br>\nThe data consist of three years (2014&ndash;2016) of quality-controlled, 1-min resolution global horizontal irradiance and direct normal irradiance ground measurements in California.&nbsp;<br>\nIn addition, we provide overlapping data from commonly used exogenous variables, including sky images, satellite imagery, Numerical Weather Prediction forecasts, and weather data.&nbsp;<br>\nWe also include sample codes of baseline models for benchmarking of more elaborated models.</p>\n\n<p><strong>Data usage</strong><br>\nThe usage of the datasets and sample codes presented here is intended for research and development purposes only and implies explicit reference to the paper:<br>\n<em>Pedro, H.T.C., Larson, D.P., Coimbra, C.F.M., 2019. 
A comprehensive dataset for the accelerated development and benchmarking of solar forecasting methods.&nbsp;Journal of Renewable and Sustainable Energy 11, 036102.</em></p>\n\n<p>Although every effort was made to ensure the quality of the data, no guarantees or liabilities are implied by the authors or publishers of the data.</p>\n\n<p><strong>Sample code</strong><br>\nAs part of the data release, we are also including the sample code written in Python 3.&nbsp;<br>\nThe preprocessed data used in the scripts are also provided.&nbsp;<br>\nThe code can be used to reproduce the results presented in this work and as a starting point for future studies.&nbsp;<br>\nBesides the standard scientific Python packages (numpy, scipy, and matplotlib), the code depends on pandas for time-series operations, pvlib for common solar-related tasks, and scikit-learn for Machine Learning models.&nbsp;<br>\nAll required Python packages are readily available on Mac, Linux, and Windows and can be installed via, e.g., pip.&nbsp;</p>\n\n<p><strong>Units</strong><br>\nAll time stamps are in UTC (YYYY-MM-DD HH:MM:SS).<br>\nAll irradiance and weather data are in SI units.<br>\nSky image features are derived from 8-bit RGB (256 color levels) data.<br>\nSatellite images are derived from 8-bit gray-scale (256 color levels) data.</p>\n\n<p><strong>Missing data</strong><br>\nThe string &quot;NAN&quot; indicates missing data.</p>\n\n<p><strong>File formats</strong><br>\nAll time series data files are in CSV (comma-separated values) format.<br>\nImages are given in tar.bz2 files.</p>\n\n<p><strong>Files&nbsp;</strong></p>\n\n<ul>\n\t<li><em>Folsom_irradiance.csv</em>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Primary&nbsp; &nbsp; &nbsp; &nbsp;One-minute GHI, DNI, and DHI data.</li>\n\t<li><em>Folsom_weather.csv&nbsp;</em> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Primary&nbsp; &nbsp; &nbsp; 
&nbsp;One-minute weather data.</li>\n\t<li><em>Folsom_sky_images_{YEAR}.tar.bz2</em> &nbsp; &nbsp;Primary&nbsp; &nbsp; &nbsp; &nbsp;Tar archives with daytime sky images captured at 1-min intervals for the years 2014, 2015, and 2016, compressed with bz2.</li>\n\t<li><em>Folsom_NAM_lat{LAT}_lon{LON}.csv </em>&nbsp; &nbsp;Primary&nbsp; &nbsp; &nbsp; &nbsp;NAM forecasts for the four nodes nearest the target location. {LAT} and {LON} are replaced by the node&rsquo;s coordinates listed in Table I in the paper.&nbsp;</li>\n\t<li><em>Folsom_sky_image_features.csv </em>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Secondary&nbsp; &nbsp; Features derived from the sky images.</li>\n\t<li><em>Folsom_satellite.csv </em>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Secondary &nbsp; 10 pixel by 10 pixel GOES-15 images centered on the target location.&nbsp;</li>\n\t<li><em>Irradiance_features_{horizon}.csv</em>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Secondary &nbsp; Irradiance features for the different forecasting horizons ({horizon} = {intra-hour, intra-day, day-ahead}).&nbsp;</li>\n\t<li><em>Sky_image_features_intra-hour.csv</em>&nbsp; &nbsp; &nbsp; &nbsp;Secondary &nbsp; Sky image features for the intra-hour forecasting issuing times.&nbsp;</li>\n\t<li><em>Sat_image_features_intra-day.csv</em>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Secondary &nbsp; Satellite image features for the intra-day forecasting issuing times.&nbsp;</li>\n\t<li><em>NAM_nearest_node_day-ahead.csv </em>&nbsp; &nbsp; &nbsp;Secondary &nbsp; NAM forecasts (GHI, DNI computed with the DISC algorithm, and total cloud cover) for the nearest node to the target location prepared for day-ahead forecasting.</li>\n\t<li><em>Target_{horizon}.csv</em>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Secondary &nbsp; Target data for the different forecasting horizons.</li>\n\t<li><em>Forecast_{horizon}.py </em>&nbsp; 
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Code&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Python script used to create the forecasts for the different horizons.&nbsp;</li>\n\t<li><em></em>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Code&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Python script used to compute the error metric for all the forecasts.</li>\n</ul>\n\n<p>&nbsp;</p>", 
  "author": [
    {
      "family": "Carreira Pedro, Hugo"
    }, 
    {
      "family": "Larson, David"
    }, 
    {
      "family": "Coimbra, Carlos"
    }
  ], 
  "version": "V1", 
  "type": "dataset", 
  "id": "2826939"
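
The abstract above states that the time series are CSV files with UTC time stamps and the string "NAN" for missing values, and that images ship as tar.bz2 archives. A minimal sketch of reading both formats follows. It runs on small synthetic stand-ins built in memory, not on the real files: the column names (timestamp, ghi, dni) and the archive member names are hypothetical placeholders, since the record does not list the actual ones.

```python
import io
import tarfile

import pandas as pd


def load_timeseries(source):
    """Read a CSV time series into a DataFrame with a UTC-aware index."""
    frame = pd.read_csv(
        source,
        na_values=["NAN"],          # the record says "NAN" marks missing data
        parse_dates=["timestamp"],  # ASSUMPTION: hypothetical column name
        index_col="timestamp",
    )
    # The record says all time stamps are in UTC, so attach that time zone.
    frame.index = frame.index.tz_localize("UTC")
    return frame


def iter_archive_files(fileobj):
    """Yield (member_name, raw_bytes) for each regular file in a tar.bz2."""
    with tarfile.open(fileobj=fileobj, mode="r:bz2") as archive:
        for member in archive:
            if member.isfile():
                yield member.name, archive.extractfile(member).read()


def make_demo_archive():
    """Build a tiny in-memory tar.bz2 with made-up member names and bytes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:bz2") as archive:
        for name, payload in [("demo/img_0001.jpg", b"fake-a"),
                              ("demo/img_0002.jpg", b"fake-bb")]:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


# Synthetic sample only; the values and column names are invented.
SAMPLE_CSV = """timestamp,ghi,dni
2014-01-01 16:00:00,12.5,0.0
2014-01-01 16:01:00,NAN,0.0
2014-01-01 16:02:00,14.1,1.2
"""

frame = load_timeseries(io.StringIO(SAMPLE_CSV))
print(frame.isna().sum())  # number of missing samples per column

files = list(iter_archive_files(make_demo_archive()))
print([name for name, _ in files])
```

To apply the same functions to the downloaded data, a file path such as Folsom_irradiance.csv can be passed to load_timeseries once the column name is adjusted, and an archive opened with open(path, "rb") can be passed to iter_archive_files. Iterating member by member avoids unpacking a whole year of images to disk.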
                   All versions   This version
Views              1,103          1,103
Downloads          18,937         18,937
Data volume        277.1 TB       277.1 TB
Unique views       987            987
Unique downloads   3,126          3,126

