A Comparison of PV Power Forecasts Using PVLib-Python

Abstract: We used the open-source PVLib-Python library to create PV power forecasts for a fleet of utility-scale power plants and assessed their accuracy. PVLib-Python allows users to easily retrieve standardized weather forecast data relevant to PV power modeling from NOAA models including the GFS, NAM, RAP, HRRR, and the NDFD. A PV power forecast can then be obtained using the weather data as inputs to the comprehensive modeling capabilities of PVLib-Python. We used these models to benchmark the performance of the University of Arizona's configuration of the Weather Research and Forecasting model. Standardized, open-source reference implementations of forecast methods using publicly available data may help advance the state of the art of solar power forecasting.


I. INTRODUCTION
PVLib-Python is an open source toolbox for PV modeling [1], [2]. Holmgren et al. developed a forecasting module for PVLib-Python to help the PV modeling community create benchmark solar power forecasts [3]. In this paper, we use the PVLib-Python forecasting tool to create hourly average PV power forecasts for a fleet of utility-scale power plants, and we compare the forecasts to observed plant generation. We compare the forecasts derived from NOAA weather models, including the GFS, NAM, and RAP, with forecasts derived from a high resolution mesoscale model run by the University of Arizona.

II. IRRADIANCE FORECAST DATA
The most critical component of a PV power forecast is the forecast of global horizontal irradiance (GHI). A GHI forecast can be obtained directly from a weather model forecast or it can be inferred from a model's cloud cover forecast [4]. The suitability of each method depends on the parameterizations of the model, the data availability of the model, and the temporal resolution of the desired PV forecast. Parameterization issues include the accuracy of the solar position equation and the impact of aerosols in their radiative transfer algorithms [5]. Model data availability and temporal resolution depend on the data source.

The PVLib-Python forecast module described by Holmgren et al. [3] accesses forecast data from the Unidata THREDDS server. However, the Unidata THREDDS server currently hosts only the most recent 2 weeks of forecast model output. To study a longer period of time, we wrote a Python script to download the relevant point forecast data from the NOAA NOMADS THREDDS data service [6].

GHI is inferred from a model's cloud cover forecast by linearly scaling the clear sky GHI:

$$\mathrm{GHI} = \left[\mathrm{offset} + (1 - \mathrm{offset})\,(1 - \text{cloud cover})\right]\,\mathrm{GHI}_{\mathrm{clear}} \qquad (1)$$

where offset = 0.35, cloud cover is the total cloud cover, and $\mathrm{GHI}_{\mathrm{clear}}$ is determined by PVLib's clearsky.ineichen function and PVLib's climatological Linke turbidity table. The DISC model is then used to calculate DNI and DHI. Here, we use default values for all functions; however, forecasters may tune the parameterizations to minimize forecast errors.

The University of Arizona Department of Hydrology and Atmospheric Sciences runs a convective-permitting, 1.8 km resolution configuration of the Weather Research and Forecasting (WRF) model [7] for operational forecasts of weather, solar power, and wind power in Arizona and New Mexico [8]. As an example of the utility of PVLib-Python for creating benchmark forecasts, we will compare PV forecasts derived from the UA-WRF configuration to the PVLib-Python forecasts. WRF versions 3.7 and 3.8 were used for this study. The model parameterization was adjusted throughout the year, but includes SBU-YLIN and Morrison microphysics schemes, and ACM2 and BouLac planetary boundary layer schemes [9]. UA-WRF namelists are available at [8]. Forecasts from the 0Z, 6Z, and 12Z GFS and NAM runs provide the initial and lateral boundary conditions for a 5.4 km resolution outer domain, which in turn provides initial and boundary conditions for the 1.8 km resolution domain. For this study, we analyzed WRF models initialized with 6Z GFS data.
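The cloud cover to GHI conversion of equation 1 can be illustrated with a few lines of Python. This is a minimal sketch that assumes the linear form GHI = [offset + (1 - offset)(1 - cloud cover)] GHI_clear, with cloud cover expressed as a fraction between 0 and 1 and a clear sky GHI that has already been computed. The function name `cloud_cover_to_ghi` and the sample values are illustrative and are not taken from the paper.

```python
def cloud_cover_to_ghi(cloud_cover, ghi_clear, offset=0.35):
    """Scale clear sky GHI by total cloud cover (fraction, 0 to 1).

    Hypothetical helper for illustration: under a fully overcast sky,
    `offset` times the clear sky GHI still reaches the surface.
    """
    if not 0.0 <= cloud_cover <= 1.0:
        raise ValueError("cloud_cover must be a fraction between 0 and 1")
    return (offset + (1.0 - offset) * (1.0 - cloud_cover)) * ghi_clear


# Illustrative clear sky GHI of 800 W/m^2 under three sky conditions.
for cc in (0.0, 0.5, 1.0):
    print(cc, round(cloud_cover_to_ghi(cc, 800.0), 1))
```

With the default offset, a fully clear sky returns the clear sky GHI unchanged and a fully overcast sky returns 35% of it. Tuning `offset` is one of the parameterization choices a forecaster could explore.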
The UA-WRF model generates 3 minute resolution forecasts of GHI, DNI, 2 m temperature, and 10 m wind speeds, among other variables. This configuration of WRF does not account for the impact of aerosols on irradiance, so we post-processed the model's irradiance forecasts using measurements of the previous day's average aerosol optical depth (AOD) obtained from the Aeronet site in Tucson, AZ [10]. We calculated daily average broadband AOD, $\tau_{bb}$, from AOD measured at 380 nm and 500 nm [11], and then computed modified DNI and GHI as $\mathrm{DNI} = \mathrm{DNI}_{wrf} \exp(-\tau_{bb} / \cos\theta_z)$ and $\mathrm{GHI} = \mathrm{GHI}_{wrf} \exp(-0.01 / \cos^{0.4}\theta_z)$, where $\theta_z$ is the solar zenith angle. We also studied UA-WRF derived PV forecasts in which DNI and DHI were inferred from the uncalibrated GHI using the DISC model.
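The aerosol post-processing step can be sketched as a small function. This is an illustration only: it assumes the GHI exponent in the text reads as cos θ_z raised to the power 0.4, and it returns zero irradiance when the sun is at or below the horizon, a guard that the text does not specify. The function name and the sample inputs are hypothetical.

```python
import math


def aerosol_corrected_irradiance(dni_wrf, ghi_wrf, tau_bb, zenith_deg):
    """Attenuate WRF DNI and GHI for aerosols.

    dni_wrf, ghi_wrf: uncorrected WRF irradiance in W/m^2.
    tau_bb: daily average broadband aerosol optical depth.
    zenith_deg: solar zenith angle in degrees.
    """
    cos_z = math.cos(math.radians(zenith_deg))
    if cos_z <= 0:
        # Assumption: no irradiance when the sun is at or below the horizon.
        return 0.0, 0.0
    dni = dni_wrf * math.exp(-tau_bb / cos_z)
    # Assumption: the exponent is read as 0.01 / cos(theta_z) ** 0.4.
    ghi = ghi_wrf * math.exp(-0.01 / cos_z ** 0.4)
    return dni, ghi


# Illustrative values: tau_bb = 0.1 at a 60 degree zenith angle.
print(aerosol_corrected_irradiance(900.0, 500.0, 0.1, 60.0))
```

The DNI correction grows quickly with zenith angle because the optical path through the atmosphere lengthens, while the GHI correction stays small for most daylight hours.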
For all models, we linearly interpolate the model forecast data from its native resolution, shown in Table I, to 5 minute resolution. For the GFS, NAM, and RAP models we then apply equation 1 and the DISC model, as discussed above, to determine a forecast GHI, DNI, and DHI. For the NAM model, we also create forecasts directly from its GHI forecasts. Table II summarizes the combinations of weather model data and processing algorithms studied in this paper. Figure 1 (top) shows the result of the 3-hourly cloud cover to 5 minute irradiance conversion for the half-degree 2016-01-06-6Z GFS forecast. Figure 1 (bottom) shows the UA-WRF GHI forecast and DISC-generated DNI and DHI forecasts. This UA-WRF model was initialized using the same 2016-01-06-6Z GFS forecast as shown in Figure 1 (top).
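The interpolation from native model resolution to 5 minute resolution can be sketched without any external libraries. The helper below is a hypothetical illustration, not the code used in the study. It assumes sample times are given in minutes, are strictly increasing, and contain at least two points.

```python
def interpolate_linear(times_min, values, step_min=5):
    """Linearly interpolate samples onto a regular time grid.

    times_min: strictly increasing sample times in minutes (two or more).
    values: sample values at those times.
    Returns a list of (time, value) pairs every `step_min` minutes.
    """
    out = []
    seg = 0
    t = times_min[0]
    while t <= times_min[-1]:
        # Advance to the segment whose right edge is at or after t.
        while times_min[seg + 1] < t:
            seg += 1
        t0, t1 = times_min[seg], times_min[seg + 1]
        frac = (t - t0) / (t1 - t0)
        out.append((t, values[seg] + frac * (values[seg + 1] - values[seg])))
        t += step_min
    return out


# Illustrative 3-hourly total cloud cover fractions over 6 hours.
grid = interpolate_linear([0, 180, 360], [0.2, 0.8, 0.5])
print(len(grid), grid[18])
```

Interpolating cloud cover first and converting to irradiance afterwards lets the clear sky model supply the diurnal shape at 5 minute resolution, which a direct interpolation of coarse irradiance values cannot reproduce.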

III. FORECASTING PV POWER
We created forecasts for six PV systems in Arizona. The systems included three single-axis trackers, totaling 63 MW AC, and three fixed-tilt systems, totaling 14 MW AC. Five of the six systems are located near Tucson, Arizona, and the sixth system is located near Kingman, Arizona. Of the five systems in Tucson, three are in the same forecast model grid box. All of the studied PV systems are smaller than a forecast model grid box.

IV. RESULTS
We studied forecast errors for all 2016 6Z GFS, 6Z NAM, and 9Z RAP models that were available on the online NOMADS server. Half-degree GFS data was available for all of 2016, 12 km NAM data was available for August through December 2016, and 13 km RAP data was available for June through December 2016. We downloaded GFS data through 168 hours (out of a possible 384), NAM data through 72 hours (out of a possible 84), and RAP data through 18 hours (out of a possible 18 hours). These initialization times and time ranges ensured that an integer number of local sunrise through sunset periods was available for each model. The initialization times also ensured that data from these models would have been available by local sunrise of the first forecast day.

Metered 1 minute resolution PV data was manually filtered for errors and, where possible, scaled to correct for partial system outages. We focus our analysis on the accuracy of the forecast for all systems added together because this is often a more relevant number for a utility company that manages generation from a fleet of power plants. Hourly average forecasts derived from each day's weather models and the observed power are shown for four days in Figure 2.
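The aggregation of 1 minute system data into hourly fleet averages can be sketched as follows. This is an assumed workflow for illustration: the function name, the minute-index keys, and the choice to use only minutes present for every system are not described in the paper.

```python
from collections import defaultdict


def fleet_hourly_average(systems):
    """Sum per-minute power across systems, then average by hour.

    systems: list of dicts mapping minute index (minutes since the
    start of the record) to power. Only minutes present for every
    system are used (an assumption made for this sketch).
    """
    shared = set(systems[0])
    for s in systems[1:]:
        shared &= set(s)
    buckets = defaultdict(list)
    for minute in shared:
        buckets[minute // 60].append(sum(s[minute] for s in systems))
    return {hour: sum(v) / len(v) for hour, v in sorted(buckets.items())}


# Two hypothetical systems over two hours of 1 minute data (MW).
a = {m: 10.0 for m in range(120)}
b = {m: (5.0 if m < 60 else 7.0) for m in range(120)}
print(fleet_hourly_average([a, b]))
```

Summing before averaging keeps the fleet total as the quantity being evaluated, which matches the utility perspective described above.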
We calculated absolute and normalized mean bias error, mean absolute error, and root mean squared error under many conditions, only some of which can be shown here. Additional information is available upon request. Errors were normalized [...]

First, we examined GFS forecast errors as a function of forecast day for each plant in the study period. Figure 3 shows that forecast errors grow as a function of forecast horizon for all systems. Other forecast models demonstrate similar trends for all systems, and these trends serve as a sanity check of the algorithms. For different systems, normalized mean absolute error (NMAE) for hours 0-23 ranges from 8% to 10%, while NMAE for hours 144-167 ranges from 9% to 12%. The remainder of this paper analyzes the aggregated generation and forecasts for all systems.
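The error metrics named above can be computed as in the sketch below. The normalization constant is an assumption: the text does not survive intact on which quantity was used, so the example divides by a fleet AC capacity of 77 MW (63 MW + 14 MW) purely for illustration. The function name and sample data are hypothetical.

```python
import math


def forecast_errors(forecast, observed, norm):
    """Return MBE, MAE, RMSE and their normalized versions.

    forecast, observed: equal-length sequences of power values.
    norm: normalization constant (assumed here to be AC capacity).
    """
    diffs = [f - o for f, o in zip(forecast, observed)]
    n = len(diffs)
    mbe = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    return {
        "MBE": mbe, "MAE": mae, "RMSE": rmse,
        "NMBE": mbe / norm, "NMAE": mae / norm, "NRMSE": rmse / norm,
    }


# Illustrative hourly fleet power in MW.
errors = forecast_errors([50.0, 60.0, 40.0], [55.0, 50.0, 40.0], norm=77.0)
print({k: round(v, 4) for k, v in errors.items()})
```

MBE keeps the sign of the error and so reveals systematic over- or under-forecasting, while MAE and RMSE measure magnitude, with RMSE weighting large misses more heavily.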
We compared the accuracy of all of the NOAA forecast models as a function of the forecast horizon, shown in Figure 4. Times at which any forecast was missing were removed from the comparative analyses. Therefore, the analysis of the NOAA models comprises most dates in August through December 2016. PV forecasts derived from NAM cloud cover and GHI have the lowest errors for hours 0-23, while forecasts derived from RAP cloud cover have the highest errors. The GFS and the two NAM forecasts have similar errors in hours 24-47, while the GFS slightly outperforms both NAM forecasts in hours 48-71. The cloud cover and GHI-based NAM forecasts perform similarly until forecast hours 48-71, at which point the cloud cover based forecast is more accurate than the GHI based forecast. This is likely because the NAM's temporal resolution switches from hourly to 3-hourly at 36 hours, and the interpolation of the 3-hourly GHI data has significant errors.

Next, we use the GFS forecasts to benchmark the accuracy of the UA-WRF model initialized on the GFS data. The NMAE in Figure 5 shows that, for the systems studied here, the UA-WRF model's dynamic downscaling of the GFS forecast yields a more accurate PV power forecast than the GFS under most forecast horizons and data subsets. The GFS forecast errors are similar for hours 0-23 and 24-47, and steadily increase beyond that. In contrast, the UA-WRF model's hours 0-23 forecasts are more accurate than its hours 24-47 forecasts, but its hours 48-71 forecasts are no worse than its hours 24-47 forecasts. There is little difference in accuracy between the UA-WRF forecasts based on GHI and the DISC model, and UA-WRF forecasts based on Aeronet-corrected DNI and GHI.
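Removing times at which any forecast is missing, so that every model is scored on the same set of times, can be sketched as below. The data layout (one dictionary per model keyed by timestamp, with `None` marking a missing value) and the function name are assumptions made for this illustration.

```python
def common_valid_times(forecasts):
    """Keep only timestamps at which every model has a value.

    forecasts: dict mapping model name to a dict of
    {timestamp: value or None}. A timestamp is dropped if it is
    absent or None for any model.
    """
    models = list(forecasts)
    shared = set(forecasts[models[0]])
    for name in models[1:]:
        shared &= set(forecasts[name])
    valid = sorted(
        t for t in shared
        if all(forecasts[m][t] is not None for m in models)
    )
    return {m: {t: forecasts[m][t] for t in valid} for m in models}


# Hypothetical hourly forecasts (MW) keyed by hour of day.
data = {
    "GFS": {9: 30.0, 10: 42.0, 11: 50.0},
    "NAM": {9: 28.0, 10: None, 11: 49.0},
    "RAP": {10: 40.0, 11: 51.0},
}
print(common_valid_times(data))
```

Restricting the comparison to shared times shortens the usable record but prevents one model from appearing more accurate simply because it was evaluated on easier days.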
Finally, we examined forecast accuracy as a function of month of year. Figure 6 shows the accuracy of each method for each month. The model errors exhibit similar trends, with some outliers. For most models, forecast accuracy is worst June through September, and best in May and November. These results led us to examine the relationship between forecast accuracy and clear sky condition. We downloaded irradiance observations from the NREL OASIS station located at the University of Arizona [18]. We used PVLib-Python's detect_clearsky function to determine if a minute is clear or not, summed the number of cloudy minutes in each month, and normalized this by the number of minutes with GHI greater than 1 W/m². Figure 7 shows NMAE versus the percentage of cloudy minutes per month. Relatively clear months, such as February, May, October, and November 2016, have lower errors, especially for the UA-WRF model.
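The cloudy minute statistic can be sketched as a short function that takes per-minute GHI observations and a matching list of clear sky flags, such as those a clear sky detection routine would return. Counting cloudy minutes only among minutes that pass the GHI threshold is an assumption of this sketch, as are the function name and the sample data.

```python
def cloudy_minute_fraction(ghi, is_clear, ghi_min=1.0):
    """Fraction of daylight minutes that are not flagged clear.

    ghi: per-minute GHI observations in W/m^2.
    is_clear: per-minute booleans, True when the minute is clear.
    ghi_min: minutes with GHI at or below this value are ignored.
    """
    daylight = [(g, c) for g, c in zip(ghi, is_clear) if g > ghi_min]
    if not daylight:
        return float("nan")
    cloudy = sum(1 for _, c in daylight if not c)
    return cloudy / len(daylight)


# Six hypothetical minutes: two at night, four in daylight.
ghi = [0.0, 0.5, 200.0, 300.0, 400.0, 500.0]
flags = [False, False, True, False, True, False]
print(cloudy_minute_fraction(ghi, flags))
```

Applying the function to each month of observations gives the horizontal axis of a plot like Figure 7.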

V. FUTURE WORK
The emphasis of this work is to illustrate the use of PVLib-Python to facilitate comparison and benchmarking of forecasts. The benchmarks suffer due to the restricted data availability of the NOMADS THREDDS service. NAM and RAP data were only available for half of 2016. The RAP model's GHI variable was not available on the archive. Furthermore, the High Resolution Rapid Refresh model was not available on the NOMADS THREDDS service. A more comprehensive forecast archive would enable more accurate comparisons of forecast accuracy.
It is likely that the benchmark forecasts can be improved with modest effort. The GFS data used here is from the half-degree model. The newer quarter-degree GFS model may yield more accurate predictions, particularly at longer forecast horizons. A bias correction algorithm with a rolling training period could reduce seasonal trends in forecast skill. Scaling clear sky GHI by a different amount for low, mid, and high level cloud cover, rather than one scaling factor for total cloud cover, may yield significant improvements. Forecasts of aerosol optical depth and precipitable water could be used in the clear sky model for GHI.

VI. CONCLUSION
We used the PVLib-Python forecasting module to compare the accuracy of solar power forecasts for a fleet of PV power plants. The PVLib-Python library enables users to easily access weather forecasts and process them into PV power forecasts. The tool creates a standard set of data for PV modeling from the mixed data types provided by weather models. This work supports the standardization of PV power forecast methods, simplification of the use of weather forecast data for PV modeling, and fair and transparent PV power forecast model performance evaluation. As an example, we used PVLib-Python forecasts to benchmark the accuracy of the UA-WRF model.
The PVLib-Python documentation [16] provides examples of how to use the forecasting tool along with general PV system modeling. Readers are encouraged to participate in the PVLib-Python community via its GitHub page [17] and the pvlib tag on stackoverflow.com.

Fig. 1. 3.5 days of GHI (blue), DNI (green), and DHI (red) forecasts derived from a GFS forecast (top) and the UA-WRF initialized with the same GFS forecast (bottom) for Tucson, Arizona. GFS model irradiance is derived using equation 1 to determine GHI and the DISC model to determine its DNI and DHI. The UA-WRF model directly forecasts GHI and, for this figure, the DISC model was used to infer its DNI and DHI.

Fig. 2. Four days of observed PV generation (black) and forecasts (colors) derived from the GFS, NAM, RAP, and UA-WRF models using cloud cover (CC) or irradiance forecasts. Generation and forecasts are summed from 6 PV systems in Arizona.

Fig. 4. NMAE forecast errors from the PVLib-Python processed forecasts as a function of forecast horizon for the studied NOAA forecast models. The GFS-CC (blue), NAM-CC (green), and RAP-CC (purple) PV forecasts were derived from cloud cover forecasts, while the NAM-GHI (red) PV forecasts were derived from the NAM's GHI forecast and the DISC model. The time range for this analysis is Aug. 2016-Dec. 2016. Times at which any forecast was missing were removed from the analysis.

Fig. 5. NMAE forecast errors from the PVLib-Python processed forecasts as a function of forecast horizon for the GFS with GHI derived from cloud cover (blue), UA-WRF with DNI derived from DISC (green), and UA-WRF with DNI post-processed with Aeronet data (red). The UA-WRF model was initialized with the GFS forecasts. The time range for this analysis is Jan. 2016-Dec. 2016. Times at which any forecast was missing were removed from the analysis.

Fig. 7. Aggregate NMAE intraday forecast errors vs. percentage of cloudy minutes per month in Tucson, AZ. Points are labeled by month of year. Forecast errors are correlated with non-clear conditions.
Table I describes the model variable data availability and temporal