Statistical Bias Correction of Fifth Coupled Model Intercomparison Project Data from the CGIAR Research Program on Climate Change, Agriculture and Food Security - Climate Portal for Mount Makulu, Zambia

Although Global Climate Models (GCMs) are regarded as the best tools available for future climate projections, their simulated precipitation and temperature are biased because of their coarse spatial resolution, and these biases affect comparisons of modelled impacts between future climate scenarios and the baseline. A widely used bias correction method is quantile mapping (QM), which adjusts a GCM value by mapping quantiles of the model's distribution onto quantiles of the observed time series data. Nudging is a widely used technique for online bias reduction, in which modelled fields are continuously forced toward observed climatology; although it is robust and easy to implement, it suppresses high-frequency variability and introduces artificial phase shifts. The change factor (CF) method cannot provide information on future changes in high-frequency variability that may be critical for specific impact applications, such as estimates of peak discharge in hydrological catchments or inputs for crop models. Future climate signals show that the number of days with precipitation and the amount of precipitation for 2020-2050 would range from 62-92 days and 211.9-906 mm/year, respectively. Minimum and maximum temperatures would increase by 1.23-1.97°C and 1.45-2.68°C, respectively. QM can be used for precipitation, while CF can be used for temperature.


Global Climate Models (GCMs) from the Intergovernmental Panel on Climate Change (IPCC) Third and Fifth Coupled Model Intercomparison Projects (CMIP3 and CMIP5) are the tools currently available for simulating the response of the global climate system to increasing greenhouse gas (GHG) concentrations [1,2]. GCMs are the primary source of information for constructing climate scenarios, and they provide the basis for climate change impact assessments at local, regional and global scales. Climate information for assessments of future crop yields tends to come from Atmosphere-Ocean Global Climate Models (AOGCMs) [3]. The impact of climate change on natural resources is usually assessed at the local scale [4]. Despite the improvements in CMIP5 model resolution and in the description of physical processes, the modelling of precipitation is still inadequate for use in most local impact studies [5].
Although GCMs are regarded as the best tools available for future climate projections, their outputs are biased because of their coarse spatial resolution (50 km or more). They therefore cannot be used directly at local or regional scale for impact studies, particularly in the tropics, where orographic and climatic conditions vary significantly across relatively small distances [6]. The biases are the deviations of GCM output from the observations [7,8], and researchers such as [9] have reported that errors in GCM simulation outputs relative to historical observations are large. Therefore, statistical downscaling methods such as delta-based approaches [10-12] and stochastic weather generators (e.g., the Long Ashton Research Station Weather Generator [LARS-WG]) [13] are used to generate future climate scenarios with high spatial resolution for a point or station (local-scale variables) [14,15]. A scenario is a coherent, internally consistent and plausible description of a possible future state of the world [2,16]. Statistical downscaling is an empirical approach that establishes statistical relationships between predictor variables (pressure, geopotential height, humidity) and predictand variables (temperature, precipitation) [17].
Evaluating the potential impact of climate change on society requires scenarios that accurately project future climate [18]. Many statistical bias correction approaches have been developed and are being used to remove systematic model errors [19]. According to [6], it is important to bias-correct and downscale raw climate model outputs in order to produce climate projections that can be used in impact studies such as agricultural modeling. [20] noted that statistical bias correction is commonly applied within climate impact modeling to correct climate model data for systematic deviations of the simulated historical data from observed time series data. Bias correction methods are based on transfer functions that map the distribution of the simulated historical weather data onto that of the observed time series. Statistical bias correction (BC) needs to be performed to better match the GCM outputs to the observed daily time series data [21]. The BC approach corrects the projected raw daily GCM output using the differences in the mean and variance between the GCM and observations in a baseline or reference period [6]. Bias correction methods are designed to bridge the gap between the information provided by the climate modeling community and the GCM output required for quantitative climate impact projections [20].
Correcting and accounting for biases in climate model output is vital for producing reliable climate model simulations. Any method for correcting biases in GCM outputs requires baseline or reference datasets, so the quality of the bias adjustment is restricted by the quality and availability of the observed time series or reanalysis data. Three calibration approaches are used to produce reliable daily climate data for future periods in the CGIAR Research Program on Climate Change, Agriculture and Food Security (CCAFS)-Climate portal interface (www.ccafsclimate.org/data_bias_corrected/): (a) 'nudging' (bias correction) [3], (b) change factor (CF) (a delta-based approach) [3,6,22], and (c) quantile mapping (QM) [4,23]. The nudging bias-correction and change factor approaches work well for non-stochastic variables such as temperature, a continuous variable that can assume any value within its possible range. Precipitation, in contrast, is intermittent: it is zero on dry days and strongly skewed on wet days. As elaborated by [6], QM is a more sophisticated approach for bias-correcting stochastic variables such as precipitation and solar radiation. All three methods are used to adjust the bias in GCMs.
The CF is a simple downscaling method that uses the average values of observations and predictions [24], and [24] noted that it is implemented simply by applying the average change factor to each day. Because of its simplicity, it has been used in many climate-related bias-correction applications. Correcting bias with the CF changes only the average, maxima and minima of the climatic index in the scenarios, while all other properties, such as the number of wet/dry days and the variance of temperature, remain unchanged. The QM scheme, on the other hand, corrects GCM outputs based on the cumulative distribution function (CDF), so that the corrected distribution statistically matches the observed one. This method has been widely employed to correct biases in GCMs, but it has limitations in capturing extreme values beyond the range of the observed time series data. The nudging bias correction approach adds the difference between AOGCM and observed time series data in a baseline to the future AOGCM data to correct the mean bias [25]. However, this method retains the AOGCM distributions of daily climate, including aspects that may also need correcting, such as the temporal correlation. As reported by [26], nudging is robust and easy to implement, but it suppresses high-frequency variability and introduces artificial phase shifts. Both CF and QM are more computationally efficient than other, physically based approaches and are able to handle higher-order moments [19].
The Agricultural Climate Forecast System Reanalysis (AgCFSR), Global Risk Assessment toward Stable Production of Food (GRASP), Agricultural Modern-Era Retrospective Analysis for Research and Applications (AgMERRA), Princeton, WFD and WFDEI are six widely used datasets for "calibrating" daily outputs of GCMs from the IPCC CMIP5 [6,27-30]. The AgCFSR and AgMERRA climate forcing datasets provide daily, high-resolution, continuous meteorological series over the 1980-2010 period and are designed for applications examining climate variability and climate change in agricultural modeling [27]. The six datasets (Table 1) are bias-corrected from existing reanalysis datasets. Reanalysis involves reprocessing observational data spanning a long historical period with a consistent analysis system to produce a dataset that can be used for agrometeorological and climatological studies. The CCAFS-Climate data portal provides global and regional high-resolution future climate datasets that serve as a basis for assessing climate change impacts and adaptation in a variety of fields, including biodiversity, agricultural and livestock production, ecosystem services and hydrology [6]. The objective of this study was to investigate how bias correction methods affect the modelled future climate change under Representative Concentration Pathway 8.5 (RCP8.5) for 2020-2050.

AgMERRA Dataset
Historical daily rainfall and minimum and maximum temperature data from the AgMERRA Climate Forcing Dataset for Agricultural Modeling [27,32] were used as the baseline data. The datasets are stored at 0.25°×0.25° horizontal resolution (~25 km), with global coverage and daily values from 1980-2010, forming a "baseline or current period" climatology. Furthermore, [32] elaborated that the AgMERRA climate forcing datasets were created as part of the Agricultural Model Intercomparison and Improvement Project (AgMIP) to provide consistent daily time series over the 1980-2010 period, with global coverage, of the climate variables required by agricultural models [32,33]. These datasets were designed for AgMIP coordinated, protocol-based studies of agricultural impacts, ranging from biophysical process studies to global agricultural economic models [33].

Statistical Downscaling of Precipitation and Temperature
GCMs are tools used to project future climate change. Bias correction was performed on daily output from four GCMs (GFDL-ESM2M, MIROC-MIROC5, MPI-ESM-MR and NCAR-CCSM4) using AgMERRA site observational data for three metrics: projected change, rainy days and time series. Basic bias correction methods adjust the mean value, either by adding a temporally constant offset or by applying a correction factor to the simulated data. This additive or multiplicative constant quantifies the average deviation between the simulated and observed time series over the historical period. The daily GCM data were calibrated using observations (reanalysis) and three bias correction approaches: delta (change factor), nudging (bias correction) and quantile mapping. Two windows were used in this study: 1980-2000 as the baseline and 2020-2050 as the future climate scenario period. The methods described below were applied to the baseline to produce calibrated projections of future climate change scenarios.
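The two basic mean adjustments described above, a temporally constant additive offset and a multiplicative correction factor, both estimated over the historical period, can be sketched as follows. This is a minimal illustration assuming NumPy arrays of daily values; the function names `additive_offset` and `multiplicative_factor` are hypothetical and are not the CCAFS-Climate implementation.

```python
import numpy as np


def additive_offset(obs_ref, mod_ref, mod_target):
    """Add the mean baseline deviation (observed minus simulated) to model data.

    Typically used for temperature-like variables.
    """
    offset = np.mean(obs_ref) - np.mean(mod_ref)  # temporally constant offset
    return np.asarray(mod_target, dtype=float) + offset


def multiplicative_factor(obs_ref, mod_ref, mod_target):
    """Scale model data by the ratio of observed to simulated baseline means.

    Typically used for precipitation; assumes the simulated baseline mean is non-zero.
    """
    factor = np.mean(obs_ref) / np.mean(mod_ref)  # temporally constant correction factor
    return np.asarray(mod_target, dtype=float) * factor
```

The multiplicative form keeps dry days at zero and cannot produce negative precipitation, which is why it is usually preferred over the additive form for precipitation.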

Bias correction (BC) approach
The BC approach corrects the projected raw daily GCM output using the differences in the mean and variance between the GCM and baseline daily time series data [6,21]. As reported by [25,34], the bias-correction method corrects both the mean and the temporal variance of the GCM output to match the observations, as represented by the equation below. The bias-correction procedure can be applied to both the historical and future periods of the GCM output.
Where $\sigma_{T,REF}$ and $\sigma_{O,REF}$ represent the standard deviation ($\sigma$) in the baseline (reference) period of the daily GCM output and observations, respectively.
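The equation itself did not survive extraction. A reconstruction consistent with the symbols defined above, assuming the standard mean-and-variance form in which $T_{RAW}(t)$ is the raw daily GCM value, $\overline{T}_{REF}$ and $\overline{O}_{REF}$ are the baseline means of the GCM output and observations, and $T_{BC}(t)$ is the corrected value (these four symbols are assumed, not taken from the source), is:

$$T_{BC}(t) = \overline{O}_{REF} + \frac{\sigma_{O,REF}}{\sigma_{T,REF}}\left(T_{RAW}(t) - \overline{T}_{REF}\right)$$

A minimal NumPy sketch of this correction, with a hypothetical function name, is:

```python
import numpy as np


def bias_correct(gcm_raw, gcm_ref, obs_ref):
    """Mean-and-variance bias correction of daily GCM output.

    Anomalies of the raw GCM series relative to the GCM baseline mean are
    rescaled by sigma_O,REF / sigma_T,REF and re-centred on the observed
    baseline mean.
    """
    gcm_raw = np.asarray(gcm_raw, dtype=float)
    gcm_ref = np.asarray(gcm_ref, dtype=float)
    obs_ref = np.asarray(obs_ref, dtype=float)
    ratio = obs_ref.std() / gcm_ref.std()  # sigma_O,REF / sigma_T,REF
    return obs_ref.mean() + ratio * (gcm_raw - gcm_ref.mean())
```

Applied to the baseline GCM series itself, the output reproduces the observed baseline mean and standard deviation exactly; applied to future output, the GCM's projected change in the mean is retained but rescaled by $\sigma_{O,REF}/\sigma_{T,REF}$.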

Change factor (CF)
In this approach, the raw GCM values for the current period are subtracted from the future simulated values, resulting in "climate anomalies" that are then added to the present-day observational or historical monthly dataset [21]. As defined by [21], the change factor can also be expressed as a ratio between values of the current climate and future GCM simulations. Change factor methods combine the coarse-resolution change 'signal' from GCM outputs with finer-resolution observed datasets. This method is quick and convenient and produces data that look like observed weather datasets [35]. The change factor is the simplest bias correction method; it consists of adding the mean change signal to the observations. It is applicable to any kind of variable, but it is preferable not to apply it to bounded variables such as precipitation, solar radiation and wind speed, because out-of-range values could be obtained. The basic CF assumes that the daily variance is of the same magnitude in the future and baseline periods; the corrected daily time series data are computed by the equation below, which also accounts for changes in variance, as reported by [34].
Where $\sigma_{T,RAW}$ and $\sigma_{T,REF}$ represent the standard deviation ($\sigma$) of the daily GCM output in the future and baseline periods, respectively.
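As with the BC equation, the CF equation was lost in extraction. Assuming the same notation, with $O_{REF}(t)$ the observed daily value in the baseline and $\overline{T}_{RAW}$ the future-period mean of the GCM output (both assumed symbols), a form that adds the mean change signal to the observations and rescales observed anomalies by the change in GCM variance is:

$$T_{CF}(t) = \overline{O}_{REF} + \left(\overline{T}_{RAW} - \overline{T}_{REF}\right) + \frac{\sigma_{T,RAW}}{\sigma_{T,REF}}\left(O_{REF}(t) - \overline{O}_{REF}\right)$$

When the variance ratio is 1, this reduces to $O_{REF}(t) + (\overline{T}_{RAW} - \overline{T}_{REF})$, the plain additive change factor. A minimal sketch, with a hypothetical function name:

```python
import numpy as np


def change_factor(obs_ref, gcm_ref, gcm_fut, scale_variance=True):
    """Additive change factor applied to the observed baseline series.

    The GCM mean change (future minus baseline) is added to the observations;
    if scale_variance is True, observed anomalies are also rescaled by
    sigma_T,RAW / sigma_T,REF.
    """
    obs_ref = np.asarray(obs_ref, dtype=float)
    gcm_ref = np.asarray(gcm_ref, dtype=float)
    gcm_fut = np.asarray(gcm_fut, dtype=float)
    delta = gcm_fut.mean() - gcm_ref.mean()  # mean change signal
    ratio = gcm_fut.std() / gcm_ref.std() if scale_variance else 1.0
    return obs_ref.mean() + delta + ratio * (obs_ref - obs_ref.mean())
```

Because the day-to-day sequence comes from the observations, this method cannot represent future changes in high-frequency variability, which is the CF limitation noted in this paper.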

Quantile mapping (QM)
Quantile-quantile mapping (QM) uses the empirical cumulative distributions of the observed and modelled precipitation for downscaling [5]. GCM-simulated values are "mapped" by quantile onto historical observed data, and each simulated quantile value receives its own adjustment. As reported by [6], QM is a more sophisticated approach for bias-correcting stochastic variables such as precipitation and solar radiation. Furthermore, [6] noted that GCM outputs are known to have a "drizzle problem" (too many low-magnitude rain events compared to observations) and that they do not capture the realistic interannual variance associated with events such as El Niño and La Niña. GCM outputs are bias-corrected for monthly totals and wet-day frequency using the qmap package for the R statistical software, which ensures realistic daily and interannual variability. QM is routinely applied to correct biases of climate model simulations relative to observed time series data [36]. Where observations are of similar resolution to the climate model, QM is a feasible approach; where observations are of much higher resolution, QM also attempts to bridge this scale mismatch. [24] stated that the QM method minimizes the differences between observed and predicted data based on empirical probability distributions, as presented in the equations below.
Where $F_o$ is the cumulative distribution function (CDF) of the observed daily data, $F_s$ is the CDF of the simulated data from the historical simulations, and $Y_i$ and $Z_i$ are the simulated and transformed (bias-corrected) data, respectively, for day $i$. [24] described that the transformed predictions have the same probability distribution as the observations, but QM is limited in generating distributions on a monthly basis because of the small number of available data points.
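A reconstruction of the mapping described above, assuming the usual empirical form, is

$$Z_i = F_o^{-1}\left(F_s(Y_i)\right)$$

The sketch below implements it with NumPy, together with a simple wet-day threshold step that addresses the "drizzle problem" by choosing the model threshold that reproduces the observed dry-day fraction. It is an illustrative assumption, not the qmap package's algorithm; the function names and the 0.1 mm observed wet-day threshold are hypothetical.

```python
import numpy as np


def empirical_qm(obs_ref, mod_ref, mod_target):
    """Empirical quantile mapping: Z = F_o^{-1}(F_s(Y))."""
    obs_ref = np.asarray(obs_ref, dtype=float)
    mod_sorted = np.sort(np.asarray(mod_ref, dtype=float))
    y = np.asarray(mod_target, dtype=float)
    # F_s(Y): empirical non-exceedance probability in the simulated baseline
    p = np.searchsorted(mod_sorted, y, side="right") / mod_sorted.size
    # F_o^{-1}(p): observed quantile at that probability; values outside the
    # simulated baseline range are pinned to the observed minimum or maximum
    return np.quantile(obs_ref, np.clip(p, 0.0, 1.0))


def wet_day_threshold(obs_ref, mod_ref, obs_wet=0.1):
    """Model threshold that gives the same dry-day fraction as the observations."""
    dry_fraction = np.mean(np.asarray(obs_ref, dtype=float) < obs_wet)
    return np.quantile(np.asarray(mod_ref, dtype=float), dry_fraction)


def qm_precip(obs_ref, mod_ref, mod_target, obs_wet=0.1):
    """Set model 'drizzle' below the threshold to zero, then map wet-day amounts."""
    obs_ref = np.asarray(obs_ref, dtype=float)
    mod_ref = np.asarray(mod_ref, dtype=float)
    y = np.asarray(mod_target, dtype=float)
    threshold = wet_day_threshold(obs_ref, mod_ref, obs_wet)
    out = np.zeros_like(y)
    wet = y > threshold
    out[wet] = empirical_qm(obs_ref[obs_ref >= obs_wet], mod_ref[mod_ref > threshold], y[wet])
    return out
```

Values beyond the simulated baseline range are pinned to the observed extremes, which illustrates the limitation of QM in capturing extremes outside the observed range.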

Statistical analysis
The qqPlot function in the R car package was used to generate quantile-comparison plots of modelled and observed data for the baseline. Four statistical measures were used in the analysis: the coefficient of determination (R²), Nash-Sutcliffe efficiency (NSE), root mean square error (RMSE) and normalized root mean square error (RMSEn) [24]. R² measures the degree of collinearity between observations and simulations. NSE is a normalized statistic that gives the relative magnitude of the residual variance compared to the observed variance, and RMSE is one of the most commonly used error index statistics for observed and simulated data. RMSEn provides a measure (%) of the relative difference between observed and simulated output [37-39]. A simulation is considered excellent if RMSEn < 10%, good if 10-20%, acceptable or fair if 20-30%, and poor if > 30% [38,40].
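The four measures and the RMSEn rating bands above can be written compactly as below. This sketch assumes that R² is the squared Pearson correlation and that RMSEn is the RMSE divided by the observed mean and expressed in percent; other normalizations exist, and the function names are hypothetical.

```python
import numpy as np


def r_squared(obs, sim):
    """Coefficient of determination taken as the squared Pearson correlation."""
    return float(np.corrcoef(obs, sim)[0, 1] ** 2)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - residual variance / observed variance."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def rmse(obs, sim):
    """Root mean square error."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))


def rmsen(obs, sim):
    """RMSE normalised by the observed mean, in percent."""
    return 100.0 * rmse(obs, sim) / float(np.mean(obs))


def rate_rmsen(value):
    """Rating bands for RMSEn (%)."""
    if value < 10:
        return "excellent"
    if value <= 20:
        return "good"
    if value <= 30:
        return "acceptable"
    return "poor"
```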

Application and Suitability of the CF, QM and Nudging Bias Correction Methods
Annual comparisons of historical observed precipitation against QM- and nudging-corrected precipitation are presented in Fig. 1, with statistics in Table 1. The RMSEn values of QM and nudging were >30%, which is considered poor as specified by [38,40], and this can also be seen in Fig. 1. A realistic representation of precipitation amounts in future climate projections from GCMs is crucial for impact and vulnerability assessment [41]. The correlation coefficient between observed and modelled precipitation was poor, as presented in Table. The observed and CF precipitation and temperature have similar distributions, as presented in Figs. 2, 3, 4 and 5, and the CF method performed better in correcting the bias of the annual precipitation data. The CF [21] used in the Agricultural Model Intercomparison and Improvement Project (AgMIP) protocols can be applied to most adaptation activities. However, CF cannot provide information on future changes in high-frequency variability that may be critical for specific impact applications, such as estimates of peak discharge in hydrological catchments or inputs for crop models. Selecting the best bias correction method can help obtain reliable projected precipitation changes at Mt Makulu, which can be used in impact studies or as inputs to crop models. Statistical bias correction is commonly applied within climate impact modeling to correct climate model data for systematic deviations of the simulated historical data from observations. All three bias correction methods can also be applied to seasonal forecasts, with the proviso that biases are then a function not only of time of year but also of lead time. The statistical properties of the data apply only to the specific timescale of the fluctuations under consideration.
Bias correction methods enable the comparison of observed time series and simulated impacts between future climate scenarios and the baseline period [20]. Without correcting for bias in the simulated historical period, future impacts that depend on the exceedance of critical absolute thresholds, such as temperature thresholds, cannot be accurately described. Among the bias correction methods for climate data that serve as input to impact models such as hydrological and crop simulation models, QM is widely accepted. QM is able to bias-correct GCM output appropriately for monthly totals and wet-day frequency while ensuring realistic daily and interannual variability [6].
[42] reported that many bias correction methods have been applied in climate impact studies and that one widely used method is quantile mapping (QM), which adjusts a GCM value by mapping quantiles of the model's distribution onto quantiles of the observed time series data. QM has been applied to GCM output globally. Researchers such as [4,43] suggested that QM is one of the best bias correction methods and that quantile mapping does not systematically degrade projected differences in seasonal precipitation trends. As can be seen in Fig. 3, the QM precipitation for the baseline differs from the observed precipitation after the correction is applied. This finding is supported by [43], who explains that QM can change the GCM trend so much that the raw GCM modelled change is modified during the bias correction process. This effect is largely due to variability among GCMs. Moreover, it has raised concerns about modifying the precipitation change simulated by GCMs for water-constrained regions, where climate adaptation plans rely on projected changes in water resources [43]. Nudging (with and without variability) underestimates daily, monthly and annual precipitation amounts during bias correction, as presented in Figs. 2, 6 and 7. This is supported by [26], who observed that, despite being robust and easy to implement, nudging suppresses high-frequency variability and introduces artificial phase lags or shifts. [26] further indicated that nudging is a widely used technique for online bias reduction, in which modelled fields are continuously forced toward observed climatology, and that conventional nudging is widely used in biogeochemical ocean models. These ocean models are nudged toward climatological nutrient distributions in order to infer net community production and other biogeochemical processes in the oceans.
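To illustrate the behaviour attributed to nudging by [26], the toy sketch below relaxes a simple autoregressive "model" toward a fixed climatology with a relaxation time scale `tau`. It is an illustrative assumption of online (in-run) nudging, not the offline "nudging" correction used on the CCAFS-Climate portal, and the model, parameters and function name are hypothetical. The nudging term removes most of the mean bias, but it also damps the model's own fluctuations.

```python
import numpy as np


def toy_run(n_steps=5000, bias=0.2, phi=0.9, tau=None, seed=0):
    """Toy AR(1) 'model' around an observed climatology of zero.

    tau=None gives a free run; otherwise a nudging term (clim - x) / tau
    pulls the state toward the climatology at every step.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, 1.0, n_steps)  # same forcing noise for every run with this seed
    clim = 0.0
    x = np.zeros(n_steps)
    for t in range(1, n_steps):
        # persistence plus a constant drift that produces a mean bias
        x[t] = phi * x[t - 1] + bias + noise[t]
        if tau is not None:
            x[t] += (clim - x[t - 1]) / tau  # Newtonian relaxation (nudging) term
    return x
```

Comparing `toy_run()` with `toy_run(tau=2.0)`: with these settings the effective persistence drops from 0.9 to 0.4, so the nudged run has a much smaller mean bias and a much smaller standard deviation than the free run.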
Bias correction, as a statistical method, fails to discriminate between the physical processes that determine trends associated with anthropogenic forcing and shorter-term fluctuations associated with natural internal climate variability [42]. Bias correction is an integral part of downscaling GCM output [42]. It is not normally expected to replicate the baseline climate perfectly and hence cannot substitute for real observations in representing the present climate. Nevertheless, bias-corrected data can contribute considerably to the preparation of adaptation options, despite the uncertainty in climate change signals.

Empirical Quantile-Quantile (QQ) Plots of Modelled and Observed Precipitation
Figs. 10, 11 and 12 show quantile-quantile (QQ) plots of modelled precipitation and temperature (QM, and nudging with and without variance correction) against baseline observations for Mt Makulu. The discrepancy between the modelled and observed data represents the overall effect of model biases and illustrates the problem. At the lower and upper tails, a significant fraction of the discrepancy is caused by the scale mismatch between grid-box and local scale, as also reported by [44]. Additionally, [44] explained that even with perfect boundary conditions, the trajectories of modelled precipitation might diverge slightly, both randomly and systematically, from the observed trajectories. Furthermore, because precipitation exhibits high spatial and temporal variability, the temporal correspondence between grid-box modelled and observed local-scale daily precipitation is relatively weak.
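The study generated its QQ plots with `qqPlot` from the R car package. An equivalent way to obtain the plotted pairs, sketched here in NumPy under the assumption of a common set of probability levels (the levels and function names are hypothetical), is to compute matching empirical quantiles of the observed and modelled series and compare them, particularly in the tails where the discrepancies discussed above are largest.

```python
import numpy as np


def qq_pairs(obs, mod, probs=None):
    """Matching empirical quantiles of observed and modelled data."""
    if probs is None:
        probs = np.linspace(0.01, 0.99, 99)  # hypothetical probability levels
    obs_q = np.quantile(np.asarray(obs, dtype=float), probs)
    mod_q = np.quantile(np.asarray(mod, dtype=float), probs)
    return obs_q, mod_q


def tail_discrepancy(obs, mod, prob=0.95):
    """Modelled minus observed quantile at a single (e.g. upper-tail) probability."""
    obs_q, mod_q = qq_pairs(obs, mod, np.array([prob]))
    return float(mod_q[0] - obs_q[0])
```

Plotting `mod_q` against `obs_q` gives the QQ plot; points on the 1:1 line indicate matching distributions.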

Projected Change of Precipitation, Minimum and Maximum Temperature
The daily time series from a selected decade for present-day conditions (1980-2000) and future (2020-2050) precipitation, minimum and maximum temperature are presented in Fig. 3. The methods described above were applied to the historical observations (reanalysis) to produce calibrated projections of future climate (2020-2050). For many scenarios there is consensus on the direction of change, such as a warming climate due to increasing greenhouse gases, but the models may differ greatly on the projected magnitude of the change, as observed in Figs. 1-15. One widely used method for bias-correcting precipitation is quantile mapping (QM). The total monthly precipitation from the baseline and from the GCM output under quantile mapping (Fig. 20) is not significantly different. The figure below shows that the minimum and maximum temperatures for the 2020-2050 time slice indicate an increase. The GCM multi-model average (ensemble) of annual mean minimum and maximum temperature and of total precipitation is plotted for all bias correction methods. [45] observed that the ensemble mean serves to filter out biases of individual models. There is some evidence that the multi-model mean field is often in better agreement with observations than any of the fields simulated by individual GCMs. This supports continued reliance on a diversity of modeling approaches in projecting future climate change and provides further interest in evaluating the multi-model mean results [45].
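One way to compute the multi-model (ensemble) averages described above is sketched below, assuming each GCM's series is held in a dictionary keyed by model name (for example the four GCMs used in this study) and that all series have the same length. The function names are hypothetical; the change for each model is taken as its future mean minus its own baseline mean, and the ensemble change is the average over models.

```python
import numpy as np


def ensemble_mean(series_by_gcm):
    """Time-step-by-time-step multi-model mean of equally long series."""
    stacked = np.vstack([np.asarray(s, dtype=float) for s in series_by_gcm.values()])
    return stacked.mean(axis=0)


def ensemble_change(baseline_by_gcm, future_by_gcm):
    """Per-GCM change in the mean (future minus baseline) and its ensemble average."""
    changes = {
        name: float(np.mean(future_by_gcm[name]) - np.mean(baseline_by_gcm[name]))
        for name in future_by_gcm
    }
    return changes, float(np.mean(list(changes.values())))
```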

Annual Change in Mean Temperature and Total Precipitation
The number of days with precipitation and the precipitation amount during the baseline are 86 days and 753 mm/year, respectively, as presented in Table 2. For the future scenarios, the number of days with precipitation and the precipitation amounts range from 62-92 days and 211.9-906 mm/year, respectively. Results indicated that the number of days with precipitation would decrease while the amounts of precipitation would increase during the 2020-2050 time slice. Calibration results for precipitation show that there will be an increase of 9.1% in the amount of precipitation when quantile mapping is used (Table 2).
The change factor (CF) approaches overestimate the number of days with precipitation and the precipitation amount during the baseline because of inherent biases. The use of nudging and CF shows that the amount of precipitation during 2020-2050 would decrease by 3.90% to 71.86%. Future climate changes in CMIP5 GCM emission scenarios are largely uncertain. It is generally difficult to project the climate change signal accurately, especially for precipitation, and uncertain precipitation and temperature values in the original GCMs introduce uncertainties into the generated future scenarios [46]. Only the quantile mapping method is able to reduce the errors in the high- and low-precipitation characteristics, a finding supported by [47]. These findings agree with [48], who noted that utilizing climate model projections poses a challenge for climatologists, crop modelers and decision makers. Projections from a set of models often exhibit considerable scatter and may even differ on the sign of a future climate change (whether a location would become wetter or drier). Three bias correction methods have been used in this study, and their baseline and future climate signals differ. [47] stated that these "bias correction" methods have been developed in an attempt to minimize the biases associated with GCM outputs. [49] showed that the uncertainty due to the choice of calibration methodology is a significant contributor to the uncertainty in future climate scenarios used as inputs to crop simulation models. Applying different calibration methods to different GCM outputs is therefore vital for producing climate data that support robust and reliable crop growth and yield projections.
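The annual indicators and relative changes discussed above (wet-day counts, annual totals and percentage changes against the baseline) can be computed as in the sketch below. The 0.1 mm wet-day threshold and the function names are hypothetical assumptions; the paper does not state the threshold it used.

```python
import numpy as np


def annual_wet_days_and_total(daily_precip, years, wet_threshold=0.1):
    """Mean annual number of wet days and mean annual total (mm/year)."""
    daily_precip = np.asarray(daily_precip, dtype=float)
    years = np.asarray(years)
    wet_days, totals = [], []
    for year in np.unique(years):
        in_year = daily_precip[years == year]
        wet_days.append(np.sum(in_year >= wet_threshold))  # days at or above threshold
        totals.append(np.sum(in_year))                      # annual precipitation total
    return float(np.mean(wet_days)), float(np.mean(totals))


def percent_change(baseline, future):
    """Relative change of a future value against the baseline, in percent."""
    return 100.0 * (future - baseline) / baseline
```

For example, `percent_change(753.0, 211.9)` is about -71.86, which corresponds to the largest decrease reported above relative to the 753 mm/year baseline.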
The mean minimum and maximum temperatures during the baseline are 15.44°C and 28.88°C, respectively (Figs. 1 and 2). Results from the GCM outputs show that temperatures would increase by 1.23-1.97°C for the minimum and 1.45-2.68°C for the maximum. This study used an ensemble mean, which is supported by [48], who reported that the multi-model average (ensemble) is the most commonly cited single estimate of future climate. Although the multi-model ensemble provides a single estimate of future climate, [48] reiterated that the availability of GCM output from CMIP5 climate model ensembles has greatly expanded information about future projections, but that there is unfortunately no accepted blueprint for utilizing these outputs.

Monthly Change in Temperature and Precipitation
The CF is the simplest bias correction method; it consists of adding the mean change signal to the observations. This method is applicable to any kind of variable, but it is preferable not to apply it to bounded variables such as precipitation, because out-of-range values may be obtained, as documented by [50]. The ensemble total monthly precipitation and monthly mean minimum and maximum temperatures are presented in Figs. 3, 4 and 5. Results show that precipitation would decrease during 2020-2050 for all bias correction methods and the raw data, except in January. On the other hand, the minimum and maximum temperatures would increase. The output of all bias correction methods, including the raw data, indicates that the largest temperature increase would occur in November. Statistical bias correction, such as adding the mean deviation from the observed data to the simulated data, often destroys the physical consistency of the different climate variables. For instance, after applying bias correction, the temperature might be zero [20].
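The problem with applying an additive change factor to a bounded variable such as precipitation can be seen in a short sketch that compares it with a multiplicative (ratio) change factor. Both functions are hypothetical illustrations applied to an observed baseline series, assuming NumPy arrays of daily values.

```python
import numpy as np


def additive_cf(obs_ref, gcm_ref, gcm_fut):
    """Add the GCM mean change to every observed day (can go negative)."""
    return np.asarray(obs_ref, dtype=float) + (np.mean(gcm_fut) - np.mean(gcm_ref))


def multiplicative_cf(obs_ref, gcm_ref, gcm_fut):
    """Scale every observed day by the GCM ratio of future to baseline means."""
    return np.asarray(obs_ref, dtype=float) * (np.mean(gcm_fut) / np.mean(gcm_ref))
```

When the GCM projects drying, the additive version turns dry days into negative precipitation, while the ratio version keeps dry days at zero and therefore leaves the number of wet and dry days unchanged.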

CONCLUSION
Bias correction methods differ considerably and can influence the expected local- or regional-scale impacts of climate change. The daily GCM outputs were calibrated using observations (reanalysis) and three bias correction approaches.
The three bias correction methods may be widely adopted for assessing calibration methodologies for crop modeling. Adaptation plans for climate change are being prepared or implemented throughout the world, and this study suggests the CF method to help prepare adaptation and mitigation options against the negative effects of climate change. Quantile mapping can be used to correct bias in GCM output for variables such as precipitation, while bias correction (nudging) and change factor could be used for variables such as temperature. The most direct means of obtaining higher-spatial-resolution climate scenarios is to apply coarse-scale climate change projections to a high-resolution observed climate baseline using the CF method.