Dataset Open Access
Carreira Pedro, Hugo; Larson, David; Coimbra, Carlos
{ "publisher": "Zenodo", "DOI": "10.5281/zenodo.2826939", "title": "A comprehensive dataset for the accelerated development and benchmarking of solar forecasting methods", "issued": { "date-parts": [ [ 2019, 6, 24 ] ] }, "abstract": "<p><strong>Description</strong><br>\nThis repository contains a comprehensive solar irradiance, imaging, and forecasting dataset. <br>\nThe goal of this release is to provide standardized solar and meteorological datasets to the research community for the accelerated development and benchmarking of forecasting methods. <br>\nThe data consist of three years (2014–2016) of quality-controlled, 1-min resolution global horizontal irradiance and direct normal irradiance ground measurements in California. <br>\nIn addition, we provide overlapping data from commonly used exogenous variables, including sky images, satellite imagery, Numerical Weather Prediction forecasts, and weather data. <br>\nWe also include sample code for baseline models against which more elaborate models can be benchmarked.</p>\n\n<p><strong>Data usage</strong><br>\nThe usage of the datasets and sample code presented here is intended for research and development purposes only and implies explicit reference to the paper:<br>\n<em>Pedro, H.T.C., Larson, D.P., Coimbra, C.F.M., 2019. A comprehensive dataset for the accelerated development and benchmarking of solar forecasting methods. Journal of Renewable and Sustainable Energy 11, 036102. https://doi.org/10.1063/1.5094494</em></p>\n\n<p>Although every effort was made to ensure the quality of the data, no guarantees or liabilities are implied by the authors or publishers of the data.</p>\n\n<p><strong>Sample code</strong><br>\nAs part of the data release, we also include sample code written in Python 3. <br>\nThe preprocessed data used in the scripts are also provided. <br>\nThe code can be used to reproduce the results presented in this work and as a starting point for future studies. 
<br>\nBesides the standard scientific Python packages (numpy, scipy, and matplotlib), the code depends on pandas for time-series operations, pvlib for common solar-related tasks, and scikit-learn for machine learning models. <br>\nAll required Python packages are readily available on Mac, Linux, and Windows and can be installed via, e.g., pip. </p>\n\n<p><strong>Units</strong><br>\nAll time stamps are in UTC (YYYY-MM-DD HH:MM:SS).<br>\nAll irradiance and weather data are in SI units.<br>\nSky image features are derived from 8-bit RGB (256 color levels) data.<br>\nSatellite images are derived from 8-bit gray-scale (256 color levels) data.</p>\n\n<p><strong>Missing data</strong><br>\nThe string \"NAN\" indicates missing data.</p>\n\n<p><strong>File formats</strong><br>\nAll time series data files are in CSV (comma-separated values) format.<br>\nImages are given in tar.bz2 files.</p>\n\n<p><strong>Files </strong></p>\n\n<ul>\n\t<li><em>Folsom_irradiance.csv</em> Primary One-minute GHI, DNI, and DHI data.</li>\n\t<li><em>Folsom_weather.csv </em> Primary One-minute weather data.</li>\n\t<li><em>Folsom_sky_images_{YEAR}.tar.bz2</em> Primary Tar archives with daytime sky images captured at 1-min intervals for the years 2014, 2015, and 2016, compressed with bz2.</li>\n\t<li><em>Folsom_NAM_lat{LAT}_lon{LON}.csv </em> Primary NAM forecasts for the four nodes nearest the target location. {LAT} and {LON} are replaced by the node’s coordinates listed in Table I in the paper. </li>\n\t<li><em>Folsom_sky_image_features.csv </em> Secondary Features derived from the sky images.</li>\n\t<li><em>Folsom_satellite.csv </em> Secondary 10 pixel by 10 pixel GOES-15 images centered on the target location. </li>\n\t<li><em>Irradiance_features_{horizon}.csv</em> Secondary Irradiance features for the different forecasting horizons ({horizon} = {intra-hour, intra-day, day-ahead}). 
</li>\n\t<li><em>Sky_image_features_intra-hour.csv</em> Secondary Sky image features for the intra-hour forecasting issuing times. </li>\n\t<li><em>Sat_image_features_intra-day.csv</em> Secondary Satellite image features for the intra-day forecasting issuing times. </li>\n\t<li><em>NAM_nearest_node_day-ahead.csv </em> Secondary NAM forecasts (GHI, DNI computed with the DISC algorithm, and total cloud cover) for the nearest node to the target location prepared for day-ahead forecasting.</li>\n\t<li><em>Target_{horizon}.csv</em> Secondary Target data for the different forecasting horizons.</li>\n\t<li><em>Forecast_{horizon}.py </em> Code Python script used to create the forecasts for the different horizons. </li>\n\t<li><em>Postprocess.py</em> Code Python script used to compute the error metric for all the forecasts.</li>\n</ul>\n\n<p> </p>", "author": [ { "family": "Carreira Pedro, Hugo" }, { "family": "Larson, David" }, { "family": "Coimbra, Carlos" } ], "version": "V1", "type": "dataset", "id": "2826939" }
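The conventions stated in the description (UTC time stamps in YYYY-MM-DD HH:MM:SS form, comma-separated values, and the string "NAN" for missing data) can be illustrated with a minimal parsing sketch. This is not the authors' sample code: the function name `read_timeseries`, the assumption that the time stamp sits in the first column, and the column names in the inline sample are all hypothetical and chosen only for illustration.

```python
import csv
import io
import math
from datetime import datetime, timezone

MISSING = "NAN"  # missing-data marker stated in the dataset description
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # time stamp layout stated in the description


def read_timeseries(handle, time_column=0):
    """Yield (utc_timestamp, {column: float}) pairs from an open CSV file.

    Assumes (hypothetically) a header row and a time stamp in `time_column`.
    """
    reader = csv.reader(handle)
    header = next(reader)
    for row in reader:
        if not row:  # skip blank lines
            continue
        # The description says all time stamps are UTC, so attach that zone.
        stamp = datetime.strptime(row[time_column], TIME_FORMAT)
        stamp = stamp.replace(tzinfo=timezone.utc)
        values = {
            name: math.nan if cell.strip() == MISSING else float(cell)
            for i, (name, cell) in enumerate(zip(header, row))
            if i != time_column
        }
        yield stamp, values


# Hypothetical two-row sample; the real files' column names may differ.
sample = io.StringIO(
    "timeStamp,ghi,dni,dhi\n"
    "2014-01-02 16:00:00,10.5,NAN,9.8\n"
    "2014-01-02 16:01:00,11.0,3.2,10.1\n"
)
rows = list(read_timeseries(sample))
```

For a downloaded file, the same function could be applied to `open("Folsom_irradiance.csv", newline="")`, provided the layout assumptions above hold. Since the sample code depends on pandas, a roughly equivalent call there would be `pandas.read_csv(path, na_values=["NAN"], index_col=0, parse_dates=True)`, again assuming the time stamp is the first column.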
Metric | All versions | This version |
---|---|---|
Views | 1,103 | 1,103 |
Downloads | 18,937 | 18,937 |
Data volume | 277.1 TB | 277.1 TB |
Unique views | 987 | 987 |
Unique downloads | 3,126 | 3,126 |