Prospects for decadal climate prediction in the Mediterranean region

The Mediterranean region stands as one of the most sensitive to climate change, both in terms of warming and drying. On shorter time‐scales, internal variability has substantially affected the observed climate and in the next decade might enhance or compensate long‐term trends. Here we compare the multi‐model climate predictions produced within the framework of the CMIP5 (Coupled Model Intercomparison Project Phase 5) project with historical simulations to assess the level of multi‐year climate prediction skill in the Mediterranean region beyond that originating from the model accumulated response to the external radiative forcings. We obtain a high and significant skill in predicting 4‐year averaged annual and summer mean temperature over most of the study domain and in predicting precipitation for the same seasons over northern Europe and sub‐Saharan Africa. A lower skill is found during the winter season but still positive for temperature. Although most of this high skill originates from the model response to the external radiative forcings, the initialization contributes to the temperature skill over the Mediterranean Sea and surrounding land areas. The high and significant correlations between the observed Mediterranean temperatures and the observed Atlantic multidecadal oscillation (AMO) in the summer and annual means are captured by the CMIP5 ensemble which suggests that the added skill is related to the ability of the CMIP5 ensemble to predict the AMO. Such a link to the AMO seems restricted to western Africa and summer means only for the precipitation case.


Introduction
The Mediterranean region is projected to be among the most heavily affected by the twenty-first century greenhouse gas induced climate change, with significant regional warming and drying by the end of the century (e.g. Mariotti et al., 2008). In the shorter term, a few decades into the future, decadal variability of both internal and external origin will largely determine the conditions to be experienced in the region, as already seen over the course of the twentieth century (Mariotti and Dell'Aquila, 2012). A key question and one of great societal relevance, is to what extent will natural variability enhance or reduce externally forced changes and for how long. In view of the seemingly robust climate change signal in the Mediterranean region, projected external forcing constitutes a source of decadal predictability for this region. What remains to be understood is the degree of predictability of regional internal decadal variability and its impact on our ability to predict future changes a few years ahead.
Early potential predictability studies highlighted the Atlantic Ocean as the most promising region for decadal climate prediction (Griffies and Bryan, 1997;Boer, 2000;Pohlmann et al., 2004;Collins et al., 2006). The Atlantic Ocean exhibits strong multidecadal variability which involves variations in the strength of the ocean thermohaline circulation associated with a large-scale signature in sea-surface temperatures (SST) referred to as the Atlantic multidecadal oscillation (AMO: Kerr, 2000;Knight et al., 2005Knight et al., , 2006. Much of the current quest to predict future decadal climate variations hinges on the potential predictability associated with the variations of the ocean gyres and the Atlantic meridional overturning circulation (AMOC). At the edge between seasonal forecasting and climate change projections, the decadal climate prediction exercise consists of initializing a climate model from an estimate of the observed climate state while also prescribing the changes in external radiative forcings, assuming these are known (Hawkins and Sutton, 2009;Meehl et al., 2009;Murphy et al., 2010;Doblas-Reyes et al., 2011). Based on this approach, hindcast (i.e. retrospective climate prediction) studies (Smith et al., 2007(Smith et al., , 2012; Keenlyside et al., 2008;Pohlmann et al., 2009) showed that the Atlantic Ocean stands out as a region of increased forecast skill with initialization. The initialization allows for more accurate predictions of the AMO compared to coupled experiments accounting only for the model accumulated response to changes in external radiative forcings , in particular in the multi-model ensemble climate hindcasts produced in the framework of the Climate Model Intercomparison Project Phase 5 (CMIP5: Doblas-Reyes et al., 2013;. A recent study using observational data and a regional climate model (Mariotti and Dell'Aquila, 2012) suggests a linkage between the Atlantic multidecadal oscillation/variability (AMO/AMV) and decadal climate variability in the Mediterranean region. Mediterranean SSTs exhibit AMO-like variability, which in turn is reflected in AMO-like evaporation variability (Fenoglio-Marc et al., 2013). There is also a linkage between the AMO and Mediterranean surface air temperatures which appears to be strong in summer, also present to a lesser extent in the transition seasons, but negligible in winter. The modelling results from Mariotti and Dell'Aquila (2012) suggest that atmospheric processes are the primary mechanism connecting the AMO with the Mediterranean rather than a direct influence of the Atlantic on the Mediterranean Sea. This study therefore suggests the potential for a certain degree of decadal climate (namely, temperature and evaporation) predictability in the Mediterranean region, to the extent that the AMO is predictable. In contrast, decadal variability of regional precipitation was found to be largely affected by the seasonal manifestations of the North Atlantic Oscillation (NAO), rather than the AMO. It is currently unknown whether there is any predictability in NAO variability. Hence the prospects for predicting regional precipitation variability, outside of what may be the externally forced long-term drying trend, are also unclear.
In this article, we aim at assessing the level of decadal climate prediction skill in the Mediterranean region beyond that originating from the model accumulated response to the external radiative forcings, and the seasonal dependence of this skill, contrasting summer (defined as June to August throughout our study) and winter (December to February) values. We also aim at assessing to what extent the added-value from the initialization is originating from the predictability of the AMO and its linkage to the Mediterranean region. It should be noted that the focus of the present study is not on the AMO drivers (Ottera et al., 2010;Booth et al., 2012;Terray, 2012;Zhang et al., 2013) but on its prediction skill and the predictability gained from simulating its teleconnections, in particular over the Mediterranean region as suggested by Matei et al. (2012). These assessments rely on the CMIP5 multi-model ensemble (MME) climate hindcasts (Taylor et al., 2012) which are introduced in section 2 together with the observational data used for verification and the skill scores used to measure the performance of this MME. Section 3 then illustrates the performance of the CMIP5 MME in predicting Mediterranean temperatures and precipitation, and compares this skill with the one from historical climate simulations which do not take into account the internal variability as a source of predictability and which contain potential accumulated errors in the model response to the external radiative forcings. The added-value from the initialization is then related to the ability of the MME to capture the linkage between the AMO and those regional climate variables. Section 4 and 5 provide respectively a discussion of major results and the conclusions from our study.

Model data
The multi-model ensemble (MME) analysed here includes the  contributions to the CMIP5 project produced with the HadCM3,  MRI-CGCM3, MIROC4h, MIROC5, CanCM4, CNRM-CM5,  MPI-M, GFDL-CM2, CMCC-CM, IPSL-CM5 and EC-Earth  v2.3 coupled climate models. These contributions consist in 10year long hindcasts initialized from an estimate of the observed climate state every 5 years over the period 1960-2006 between 1 November and the following 1 January. This ensemble is referred to as Init hereafter. Complete calendar years from January to December are considered as forecast years here. Two ranges of forecast years are analysed in the following: from forecast years 2 to 5 and 6 to 9, which are classically used as target periods for analyses of climate predictions . The MME hindcast set gives ten predicted values for a specific forecast range. The forecast fields are interpolated using a conservative method onto the grid of the observational dataset used for verification before any analysis. More details on the model experiments analysed in this work are provided in Table 1. A sister ensemble has been built from the 'historical' coupled simulations and the first years of the climate projections following the CMIP5 RCP4.5 (Representative Concentration Pathway 4.5 W m −2 : Taylor et al., 2012) scenario run with the same models. In these experiments, the simulated natural variability is not constrained to follow the contemporaneous observed one except through the effect of the external radiative forcings, which are exactly the same as in the Init MME. Additionally, errors in the response to the external radiative forcings could accumulate over time. This ensemble is referred to as NoInit hereafter. Both Init and NoInit CMIP5 experiments include observed timedependent solar irradiance variations and volcanic aerosols until 2005 whereas these would not be known in a true prediction exercise.
The Init and NoInit ensembles do not contain the same number of members. We have checked that restricting those ensembles to the subsets allowing for the same number of members for each model in the two ensembles do not affect our main conclusions (not shown).

Observational data
For temperature verification, we have used a combination of land surface temperatures from the National Centers for Environmental Prediction (NCEP) GHCN/CAMS dataset (Fan and van den Dool, 2008), SST from the National Climatic Data Center (NCDC) ERSST v3b dataset (Smith et al., 2008) and, north of 60 • N, the GISTEMP dataset with 1200 km decorrelation scale (Hansen et al., 2010). This dataset covers the 1960-2012 period. For precipitation, a land-only monthly dataset is considered because of the sparse observational coverage over the oceans before 1979. We use the Global Precipitation Climatology Centre (GPCC) v5 data (Rudolf et al., 2010) for the 1960-2012 period. For sea-level pressure, we have used the HadSLP2 dataset (Allan and Ansell, 2006) for the 1960-2004 period.

Forecast skill assessment
In our study we focus on the analysis of the skill for the annual values and contrast the summer (JJA) and winter (DJF) values. We compute JJA, DJF and annual means for both the model experiments and observations. The performance of the MME is assessed for the average of the predicted values over forecast years or seasons 2-5 and 6-9. For the winter season, the average over forecast seasons 2-5 and 6-9 uses the December values from years 1 to 4 and 5 to 8 respectively. When initialized from an estimate of the observed climate state, climate models drift towards their attractor, which leads to systematic errors in Table 1. CMIP5 model experiments analysed in this work. The number of ensemble members analysed in the Init and NoInit MME are specified.

Model
Initialization methodology No. of Init members No. of NoInit members HadCM3 (Gordon et al., 2000;Pope et al., 2000) Coupled anomaly assimilation: ERA-40 and ERA Interim atmospheric reanalyses, ocean observations 10 10 MRI-CGCM3 (Yukimoto et al., 2001;Tatebe et al., 2012) Full-field assimilation 9 1 MIROC4  Assimilation in the coupled model of ocean anomalies of gridded subsurface observations of temperature (T) and salinity (S) 3 3 MIROC5 (Hasumi and Emori, 2004;Watanabe et al., 2010) Assimilation in the coupled model of ocean anomalies of gridded subsurface observations of temperature (T) and salinity (S) 6 1 CANCM4 (Fyfe et al., 2011;Merryfield et al., 2013) Coupled assimilation of the ERA40 and ERA-Interim atmospheric reanalyses, observed SSTs and SODA and GODAS subsurface ocean T and S, beforehand adjusted to preserve T-S relationship 10 10 CNRM-CM5 (Voldoire et al., 2013) Ocean T and S nudged (3D Newtonian damping) toward NEMOVAR-COMBINE in the coupled model; NEMOVAR-COMBINE reanalysis produced using multivariate 3D-VAR data assimilation in NEMO ocean model 10 6 MPI-M (Jungclaus et al., 2006;Matei et al., 2012) Nudging in the coupled model of T and S anomalies obtained from an ocean-only run forced with NCEP atmospheric reanalyses 10 3 GFDL-CM2 (Delworth et al., 2006;Yang et al., 2013) Coupled assimilation of atmospheric reanalysis and ocean observations of 3D T and S and SST 10 10 CMCC-CM (Scoccimarro et al., 2011;Bellucci et al., 2013) Full field ocean initialization from three different realizations of CMCC-INGV ocean synthesis of T and S 3 1 IPSL-CM5 (Dufresne et al., 2013;Swingedouw et al., 2013) Nudging in the coupled model of SST anomalies to Reynolds observations 6 3 EC-Earth v2 (Hazeleger et al., 2010(Hazeleger et al., , 2013Du et al., 2012) Full-field initialization: ERA-40 and ERA-Interim atmosphere/ land reanalyses; NEMOVAR-S4 ocean reanalysis 10 11 climate forecasts. These systematic errors depend on the forecast time and we therefore apply a drift correction to our multi-model ensemble (Goddard et al., 2013;Meehl et al., 2014). Anomalies are computed using the so-called 'per-pair' methodology (García-Serrano and Doblas-Reyes, 2012): the modelled or observed climatology is defined as a function of the forecast time (multiyear or multi-season average), by averaging the predicted variable (temperature or precipitation) across the start dates taking into account only the predicted values for which observations are available at the corresponding date; a forecast-time-dependent climatology is computed for each individual model averaging all members. The climatology of each model is then subtracted from each hindcast to obtain anomalies over the whole predicted period for each member of each model: where X msf is the model variable at forecast time f , for start date s and member m, O sf is the corresponding observation, X msf and O sf , their respectively anomalies, M is the number of members, S the number of start dates and NA refers to a missing value. Then, ensemble-mean anomalies are computed for each model averaging all its members and the multi-model ensemble-mean anomalies are computed giving an equal weight to each model, regardless of the number of ensemble members available for that model. The same method is applied to the observations to obtain anomalies over the whole verification period. All the climate prediction scores described below are computed from those anomalies.
Hindcast prediction skill is measured using the anomaly correlation coefficient (ACC), the mean square skill score (MSSS) or the Brier Skill Score (BSS). The ACC is computed by correlating the predicted MME anomalies with the observed ones across the start date dimension: where X sf is the MME mean anomaly for forecast time f and start date s, and O sf is the corresponding observed anomaly. A t-test, after a Fisher-Z transformation, is performed to assess the significance level of an ACC or of a difference in ACC, accounting for the autocorrelation of the data when computing the degrees of freedom following von Storch and Zwiers (2001). Indeed, hindcasts initialized every 5 years cannot be considered as independent when attempting to predict decadal climate variability, hence our reduction of the degrees of freedom based on von Storch and Zwiers (2001). It is worth noting that only 9 or 10 observations (when verifying predictions for forecast years 6-9 or 2-5 respectively) can be used to compute the ACC or any other skill score, and those 9-10 data are reduced by about a factor 2 to obtain the number of effective independent data. Increasing the frequency of the start dates would not increase significantly the number of independent data, since the additional hindcasts would still not be independent for analyses of decadal climate variability. Extending backward the period which is sampled by our start dates would be necessary to obtain more independent hindcasts and more robust estimates of the added-value from initialization in terms of skill scores. Obtaining accurate initial conditions is unfortunately harder to achieve as we extend the hindcast period back into the past. The MSSS is computed as 1 minus the ratio of the mean square error (MSE) of the predicted per-pair MME anomalies over the MSE of a climatological forecast (Goddard et al., 2013): A Fisher test is performed to assess the significance level of an MSSS accounting for the autocorrelation of the data following von Storch and Zwiers (2001). Since the MME is used in the forecast skill assessment through the ACC and the MSSS, externally forced signals and any signal due to the initialization will stand out (that is, only the robust patterns consistent across members and models will stand out). In contrast, any 'noise' will be to a large extent eliminated by the MME averaging operation.
The BSS is computed for the lower, middle and upper terciles: where p sft is the forecast probability for a given tercile t to occur, for forecast time f and start date s, o sft is its observed probability (0 or 1) and ot is its observed climatological probability. The model terciles and the model forecast probabilities are estimated from the multi-model ensemble of model mean anomalies which makes an ensemble of 11 values (models) for each start date, each variable and each forecast average. This approach ensures equal representativeness for all models in a similar fashion as we operated for the deterministic skill measures (ACC and MSSS). The skill gained by the initialization process is assessed by computing the difference between Init and NoInit ACC. Whenever this difference in skill is positive the initialization process increases the skill beyond what is achievable based on the integration of the model response to external radiative forcing only. For temperature, we compute additionally the skill scores after detrending the observed, Init and NoInit grid-point temperature by subtracting their respective global land mean averaged between 60 • S and 60 • N at land points and global ocean mean averaged between 60 • S and 60 • N at ocean points. We have checked that a linear detrending led to similar qualitative conclusions (not shown). The robustness of our conclusions on the added-value from the initialization is also assessed by computing the ratio between the root mean square error (RMSE) of Init over the RMSE of NoInit. A Fisher test is performed to assess the significance level of this ratio accounting for the autocorrelation of the data following von Storch and Zwiers (2001).
The linkage between the AMO and Mediterranean temperature and precipitation is assessed via correlation maps of observed temperature and precipitation with the observed AMO index. The AMO index used here is defined as in Trenberth and Shea (2006): the SST averaged over the North Atlantic (80 • W-0 • E/EQ-60 • N) minus the global (60 • S-60 • N) SST average. Previous research shows that this definition largely decontaminates the AMO index from the long-term warming (e.g. van Oldenborgh et al., 2009van Oldenborgh et al., , 2012. This AMO index represents the differential warming of the North Atlantic Ocean with respect to the global oceans. The ability of the CMIP5 MME to capture the linkage between the AMO and Mediterranean temperature and precipitation is assessed by computing similar correlation maps but for the predicted variables. The temperature is detrended prior to the computation of those correlation maps whereas the precipitation is not. The subsequent comparison between the AMO correlation patterns and the ACC patterns aims at highlighting any AMOderived predictability. By extending these analyses to the sea-level pressure, we aim to highlight the atmospheric mechanisms by which the AMO predictability contributes to the Mediterranean temperature and precipitation skill and to which extent those are captured by our MME.

Temperature skill and role of the AMO
The Init ACC maps (Figure 1(a,b)) show a high and significant skill over most of the study domain for the temperature averaged over forecast years 2-5 (Figure 1(a)) and 6-9 (Figure 1(b)). Though slightly lower than the ACC, the MSSS maps show a very consistent pattern of high and broadly significant skill (Figure 1(g,h)). Most of this high skill originates from the external radiatively forced climate signal since the differences in ACC between Init and NoInit are small (Figure 1(c,d)). None of the differences in ACC are significant. However, as mentioned in section 2.3, only 9 ( Figure 1(b,d,f,h,j)) or 10 ( Figure 1(a,c,e,g,i)) data are used to compute the ACC, and those 9-10 data, only 5 years apart, cannot be considered as independent, hence a high significance threshold. Additionally, the skill in NoInit is already very high due to the long-term warming trend. Hence, any increase in ACC when capturing the superimposed internal variability or correcting the model response to the external radiative forcings prior to the initialization date is of minor magnitude compared to the NoInit ACC. Despite the lack of significance of the differences in ACC, those differences show an increase in ACC over the eastern Atlantic and the Mediterranean Sea. To highlight the added-value from the initialization, we re-compute the maps of difference in ACC between Init and NoInit after detrending the grid-point observed and modelled (both Init and NoInit) temperature as described in section 2.3 ( Figure 1(e,f)). It highlights the increase in ACC thanks to the initialization over the eastern Atlantic and the Mediterranean Sea and a decrease over northern Europe and sub-Saharan Africa. The ratio between the Init RMSE and the NoInit RMSE ( Figure 1(i,j)) exhibits again a similar pattern of improved performance with initialization over the eastern Atlantic and the Mediterranean Sea and of decreased performance over northern Europe and sub-Saharan Africa. The probabilistic verification ( Figure S2) shows broadly consistent features apart from over sub-Saharan Africa where there is no clear signal. In agreement with the conclusions from the deterministic verification, we observe: (i) a larger BSS in Init than NoInit over the eastern Atlantic for all terciles and both forecast averages except for the lower tercile and forecast years 6-9, (ii) a larger BSS in Init than NoInit over the Mediterranean Sea for the middle tercile but a noisier response for the other two terciles, and (iii) a lower BSS in Init than in NoInit over northern Europe for the lower and upper terciles but a noisier response for the middle tercile.
The increase in ACC over the eastern Atlantic and the Mediterranean Sea with initialization might be associated with their linkage to the AMO, the latter already known to have an increased skill with initialization in the CMIP5 MME . The observed annual AMO index underwent a positive phase during the first decade of the hindcast period, followed by a negative phase that lasted about three decades and a final positive phase, stronger than the first positive phase (Figure 2(a,b)). Those multidecadal oscillations produce a slight positive trend over the hindcast period. NoInit does not exhibit more than this slight positive trend at both forecast averages. A higher and significant skill in predicting the annual AMO index is obtained in Init which captures the multidecadal oscillations, but also successfully reproduces the minimum in the 1970s and the recent maximum at forecast years 2-5. The positive trend is less pronounced in the winter (Figure 2(c,d)) than in the annual mean AMO index. Accordingly a larger added-value of the initialization appears in winter: the difference between the Init and NoInit ACC is larger. By contrast, the positive trend is more pronounced in summer (Figure 2(e,f)) and the differences between Init and NoInit performance weaker. The difference between Init and NoInit performance decreases at forecast years 6-9 as compared to forecast years 2-5 ( Figure 2(b,d,f)).
The correlation maps of observed temperature with the observed AMO index (Figure 3(a,b)) exhibit a rotated T-structure, with positive correlation covering the eastern Atlantic, northern Africa and the Mediterranean Sea while negative correlations appear over Europe. The maps for forecast years 2-5 ( Figure  3(a)) and 6-9 (Figure 3(b)) are very similar even though they are based on different observed time windows (1962-1965 to 2006-2009 for forecast years 2-5; 1966-1969 to 2006-2009 for forecast years 6-9) which confirms the robustness of this correlation pattern against sampling (which is not the case for Skill in annual temperature.  trend, related to its negative-to-positive phase transition described above, which might correlate well with the external radiatively forced warming signal. Subtracting the correlation between the NoInit AMO and temperature from the correlation between the Init AMO and temperature allows isolating the linkage between the temperature and the multidecadal AMO oscillations around the model accumulated response to the external radiative forcings, identifying the AMO-related signature coming from the initialization. Such differences between Init and NoInit correlations of the AMO index with temperatures ( Figure 3(e,f)) share strong similarities with the differences between Init and NoInit ACC after detrending (see Figure 1(e,f)). The ability of the Init MME to capture the observed AMO variability seems to contribute substantially to its skill in predicting multi-year averaged temperature over our study domain. In boreal summer, the Init ACC is similarly high and significant in most of the study domain (Figure 4(a,b)), consistent with the Init MSSS pattern, and mostly originating from the externally forced climate signal. The differences in ACC between Init and NoInit, though still small, show a slightly different pattern than for the annual mean temperature, and are less spatially coherent. The ratio of the Init RMSE over the NoInit RMSE provides consistent patterns of added-value from the initialization with the differences between Init ACC and NoInit ACC. There is still an increased skill over the eastern Atlantic with initialization but the added-value over the Mediterranean Sea is less pronounced. There are also two additional areas where the initialization tends to increase the skill: over eastern Europe and sub-Saharan Africa. In those two regions, the correlation between the observed JJA AMO index and the observed JJA temperature has an opposite sign to the one for annual temperature: it becomes positive over eastern Europe and negative over sub-Saharan Africa (Figure 5(a,b)). The latter is probably due to the positive link between the AMO and the West African monsoon (Mohino et al., 2011;van Oldenborgh et al., 2012), which negatively feedbacks on local land-surface temperature (Giannini et al., 2003(Giannini et al., , 2005Kucharski et al., 2013). The correlation between the Init JJA AMO index and the Init JJA temperature is positive over most of the domain except for sub-Saharan Africa ( Figure 5(c,d)), as in the annual correlation maps (Figure 3(c,d)). The Init and observed correlation maps are therefore more consistent over eastern Europe and sub-Saharan Africa for the JJA means than for the annual ones, hence a larger added-value of the initialization in JJA over those two regions. The probabilistic verification ( Figure S3) consistently shows an increased BSS with initialization in the eastern Atlantic for the  upper and middle terciles and in the Mediterranean region and over eastern Europe for all the terciles and forecast averages. The winter (DJF) stands out as the season for which the increase in ACC with initialization is the largest, especially over the oceans (Figure 6(c-f)), and consistently the season for which the decrease in RMSE is the largest (Figure 6(i,j)). The skill, both deterministic ( Figure 6) and probabilistic ( Figure S4), remains improved over the eastern Atlantic but it is also improved over eastern North Africa, the eastern Mediterranean and eastern Europe, in terms of RMSE and ACC. The skill is strongly reduced though over western Africa and western Europe, especially for forecast years 2-5, and over the Arabian Peninsula. DJF is also the season for which the Init ACC (Figure 6(a,b)) and MSSS (Figure 6(g,h)) skill are the lowest, as well as the BSS for all terciles ( Figure S4). The warming trend has indeed lower relative amplitude in this season compared to the amplitude of the internal natural variability ( Figure S1). The correlation maps of observed DJF temperature with the observed DJF AMO index rather exhibit an L-pattern with positive values over the eastern Atlantic, northern Africa, eastern Mediterranean Sea and the Arabian Peninsula, and negative values over Europe (Figure 7(a,b)). This pattern is however not robust against the sampling (Figure 7(a,b)) and slightly different from the one obtained by Mariotti and Dell'Aquila (2012) over a different period. The observed linkage between the DJF AMO and the DJF temperature is not properly captured by Init over Europe. Over the Mediterranean Sea, the signature is noisy (Figure 7(c,d)). The increase in skill with initialization in those regions therefore does not seem to be related to the ability of the MME in predicting the AMO. However, Init is able to simulate the observed AMO correlation pattern over the eastern Atlantic and northern Africa reasonably well and over the Arabian Peninsula to some extent. Although boreal winter (DJF) is the season for which the initialization allows for the largest increase in skill in predicting the AMO index (Figure 2(c,d)), there is little agreement between the pattern of differences in ACC skill Init-NoInit and the pattern of Init-NoInit differences in correlations of the temperature with the AMO index (Figure 7(e,f)). The ability of the MME to capture the observed winter AMO seems to contribute to the skill in predicting winter temperature only over confined regions such as the Egypt-Sudan area and the eastern Atlantic.

Precipitation skill and role of the AMO
As discussed in the introduction, previous studies have not highlighted any AMO-related precipitation variability in the Mediterranean. Hence the basis for decadal precipitation prediction is currently unclear. The precipitation patterns additionally exhibit less spatial coherence than the temperature patterns. We therefore face a double challenge for precipitation forecast skill: the lack of knowledge about predictability sources and the spatial noise. Nevertheless, an exploratory analysis of  decadal prediction skill is presented here to potentially gain further insight in regional precipitation predictability including that from externally forced causes. Despite the noisy ACC Init patterns for total precipitation (due to the low spatial coherence of precipitation events compared to temperature), significant ACC appear over northern Europe and over sub-Saharan Africa for both forecast year ranges: 2-5 and 6-9 (Figure 8(a,b)). A high MSSS is also seen in those two regions, though not significant (Figure 8(e,f)) as well as a high BSS for the lower and upper terciles ( Figure S5). The Init skill is however generally lower for precipitation than for temperature, mainly because the externally forced signal has lower relative amplitude compared to the internally generated variability ( Figure S1). The high skill in northern Europe and sub-Saharan Africa originates mainly from the accumulated model response to the external radiative forcing since weak differences in ACC appear between Init and NoInit (Figure 8(c,d)), except for part of western Africa and for the south-eastern tip of the Arabian Peninsula. A larger BSS in Init than NoInit is also seen over part of western Africa for the lower and upper terciles and for both forecast averages ( Figure S5). The longterm warming has been shown to have an impact on the West African monsoon, with maximum amplitude over the western Sahel (Ting et al., 2009;Mohino et al., 2011;Smith et al., 2012). A larger added-value from the initialization (Figure 8(c,d,g,h)) is obtained for precipitation than for temperature, because the signal from internal variability has larger relative amplitude compared to the externally forced signal (Figure S1), hence a larger room for improvement with initialization. The patterns of precipitation forecast skill are however much less spatially coherent than those of temperature hence we have a lower confidence in their physical robustness. The correlation maps of observed precipitation with the observed AMO index exhibit a tripolar pattern, with a positive band over northern Europe, a negative band extending from western Europe toward the Arabian Peninsula and a positive band over northern Africa (Figure 9(a,b)). This tripolar pattern is captured by Init, although with a slight southward (northward) shift of the negative band over western (eastern) Europe (Figure 9(c,d)). The correlations are higher and less noisy in Init than in the observations since the MME mean operator smooths out any signal other than Skill in DJF temperature. externally forced or persisting initial conditions. The core of the positive correlation bands, i.e. over northern Europe and sub-Saharan Africa, which are well captured by Init, corresponds to high ACC and MSSS in Init. In transitional areas between positive and negative correlation bands of precipitation with the AMO index (southern Europe and the Mediterranean Sea), Init fails to capture accurately the linkage between precipitation and the AMO (slight shift of the patterns) and the ACC and MSSS are low in those areas. The only region where the Init-NoInit differences in correlation with the AMO index (Figure 9(e,f)) are consistent with the pattern of Init-NoInit differences in ACC skill (Figure 8(c,d)) locates over the westernmost part of North Africa.  Figure 7. Correlation between the AMO index and the winter (DJF) temperature anomalies from (a,b) the observational datasets, and (c,d) the ensemble-mean decadal hindcasts, averaged along the forecast times ranging from (a,c,e) the second to fifth years and (b,d,f) the sixth to ninth years. Third row: difference between (c,d) and its equivalent in NoInit. The temperature anomalies have been detrended by subtracting the global mean SST to the grid-point SST and the global land temperature to the grid-point land temperature prior to the computation of correlation. The black dots indicate the regions where the correlation or difference in correlation is significant at the 95% level.

RMSE Init / NoInit
In summer (JJA), high and significant ACC cover also most of northern Africa and a few points in northern Europe (Figure 10(a,b)). The pattern of Init skill is robust against the score considered but the MSSS (Figure 10(e,f)) and the BSS ( Figure S6) show more modest performances. MSSS are significant over Egypt but this performance is not supported by the other skill scores. Although the differences in ACC between Init and NoInit are not significant, a spatially coherent increase in ACC with initialization appears over western Africa (Figure 10(c,d)), together with a decrease in RMSE (Figure 10(g,h)). This low but positive added-value from the initialization in multi-year prediction skill over the West African monsoon region might be associated with the positive, although not statistically significant, skill of the Sahelian precipitation regime found by García-Serrano et al. (2013). The correlation maps of observed JJA precipitation with the observed JJA AMO index exhibit a similar tripolar pattern to the annual mean maps, but with a negative band centred on the Mediterranean Sea (Figure 11(a,b)). The positive centre of action over Africa is well captured by Init, including its shape, which could explain Init's high skill in predicting JJA precipitation in this area (Figure 11(b,c)). As for annual means, the westernmost part of North Africa is the region showing the largest consistency between Init-NoInit differences in correlation with the AMO index ( Figure 11(e,f)) and Init-NoInit differences in ACC skill (Figure 10(c,d)). This result is consistent with previous evidence on the predictor role of the AMO upon the Sahelian rainfall (Mohino et al., 2011;van Oldenborgh et al., 2012;García-Serrano et al., 2013).
The winter Init ACC patterns are particularly noisy, with a few points of significant ACC surrounded by negative ones spread over northern Africa and a few positive ACC over northern Europe ( Figure S7). The MSSS show a significant skill over the eastern Sahel which does not correspond to any positive ACC but with a decrease in BSS for all terciles and forecast averages ( Figure S8). The correlation maps of observed DJF precipitation with the observed DJF AMO index exhibit a very different pattern from the summer one with rather a dipolar structure across the western Mediterranean which Init fails to capture ( Figure S9).

Discussion
As mentioned in section 3.1, the AMO fluctuations exhibit a recent phase transition from negative conditions to positive Skill in annual precipitation. conditions, which shapes a positive trend. Note that this index is computed as the SST averaged over the North Atlantic minus the global SST average to minimize the effect of the long-term warming. For consistency, the temperature has been detrended by subtracting its global ocean mean at ocean grid points and its global land mean at land grid points prior to computing the correlation between the AMO and the grid-point temperature as explained in section 2.3. The correlation between the AMO and grid-point temperature anomalies shown in Figures 3, 5 and 7 might be affected by a residual warming signal in the gridpoint temperature. Similar qualitative results were obtained with a linear detrending method (not shown) but such a method does not filter out perfectly the warming signal either. Whether this residual warming and the warming trend in the AMO are related to a larger sensitivity to radiative forcings in our study domain than in other regions or whether it is related to internal variability remains unclear. Given the shortness of the available time series, a proper assessment is highly challenging. For precipitation, we Correlation annual precipitation with AMO index Figure 9. Correlation between the AMO index and the precipitation anomalies from (a,b) the observational datasets and (c,d) the ensemble-mean decadal hindcasts, averaged along the forecast times ranging from (a,c,e) the second to fifth years and (b,d,f) the sixth to ninth years. (e,f) Difference between (c,d) and its equivalent in NoInit. The black dots indicate the regions where the correlation or the difference in correlation is significant at the 95% level.
have chosen not to detrend because of the regional sensitivity to climate change and the uncertainties about whether those changes are attributable to climate change or not (Trenberth et al., 2007). As explained in section 2.3, only 9-10 data can be used to compute the skill scores shown in this article and those 9-10 data are not independent since the CMIP5 retrospective predictions are initialized every 5 years and we focus on decadal climate variability. An extension of the period sampled by our start dates would allow for more independent hindcasts and more robust estimates of the added-value of initialization in terms of skill scores. However, an extension backward in time is highly challenging given the lack of accurate observational data to initialize the predictions. The denser observational network that grows with time will allow for a forward extension in the coming decades and therefore for more robust estimates of the prediction skill scores. Likewise, 9-10 data are used to detrend time series, which questions the accurate applicability of any method. This leads to consider that detrending methodologies represent an important source of uncertainty in decadal climate prediction. Mariotti and Dell'Aquila (2012) suggested that the linkage between the AMO and the Mediterranean region relies mainly on atmospheric processes. To assess to what extent those atmospheric mechanisms are captured by the CMIP5 MME, we computed correlation maps of the AMO index with the sea-level pressure in both the observations and the model data for annual means ( Figure S10), summer ( Figure S11) and winter ( Figure S12) values. The pattern obtained for the observations is not robust against the sampling, contrary to the patterns obtained previously for temperature or precipitation, which suggests that the sea-level pressure variability is dominated by other factors and we do not have enough data to isolate accurately the AMO influence on this variable. The MME fails to capture the observed patterns, but it remains unclear whether this failure is only an apparent failure due to a sampling issue or a real failure suggesting some room for improvement of the skill in predicting Mediterranean decadal climate variability through a better representation of the atmospheric mechanism linking the Atlantic to the Mediterranean region.

Conclusions
In this article, we have used the extensive CMIP5 database of multi-model ensemble decadal hindcasts to assess the stateof-the-art skill in predicting Mediterranean temperature and  precipitation and the reasons behind this skill. High and significant skill in predicting 4-year averaged annual and summer mean temperature is found over most of our study domain. Still positive but slightly lower skill is found during the winter season. Correlation JJA precipitation with AMO index Figure 11. Correlation between the AMO index and the summer (JJA) precipitation anomalies from (a,b) the observational datasets, and (c,d) the ensemble-mean decadal hindcasts, averaged along the forecast times ranging from (a,c,e) the second tofifth years and (b,d,f) the sixth to ninth years. (e,f) Difference between (c,d) and its equivalent in NoInit. The black dots indicate the regions where the correlation or difference in correlation is significant at the 95% level.
skill with initialization, the linkage between the AMO and the Mediterranean temperatures is poorly captured by the multimodel ensemble. The increased skill in annual mean temperature with initialization seems to originate from the winter for the eastern Mediterranean and from the summer for the western Mediterranean. The results for the annual means are more robust than the ones for the seasonal means though. High and significant skill is also found for annual and summer mean precipitation over northern Europe and sub-Saharan Africa, which is mostly explained by the model accumulated response to the external radiative forcings except over western Africa in summer. The positive correlation between summer observed AMO and precipitation in the latter region are well captured by the multi-model ensemble which seems to contribute to the precipitation forecast skill. The AMO-related skill in summer precipitation does not seem to compensate for the weak signal in winter. The winter precipitation forecast skill is much lower and noisier than the summer and annual mean ones but improving the winter precipitation forecast skill would be highly relevant. Improving the precipitation forecast skill stands as a challenge though, given the current lack of knowledge about its predictability sources. NA10OAR4310208. Oriol Mula-Valls and Domingo Manubens-Gil are thoroughly acknowledged for their invaluable technical support.

Supporting information
The following supporting information is available as part of the online article: Figure S1. Ratio of the slope of the long-term trend over the standard deviation of the departures from the long-term trend in the observational datasets of temperature (first column) and precipitation (second column). The trend and the departures from the trends are computed from yearly means (first row), summer means (second row) and winter means (third row) after applying a 4-year running mean. Figure S2. Brier Skill Score (BSS) of the multi-model ensemble temperature anomalies averaged along the forecast times ranging from the second to the fifth years (left) and the sixth to the ninth years (right). The first two rows provide the BSS for the upper tercile in the decadal hindcasts and the historical simulations. The third and fourth rows provide the BSS for the middle tercile. The bottom two rows provide the BSS for the lower tercile. Figure S3. Brier Skill Score (BSS) of the multi-model ensemble summer (JJA) temperature anomalies averaged along the forecast times ranging from the second to the fifth years (left) and the sixth to the ninth years (right). The first two rows provide the BSS for the upper tercile in the decadal hindcasts and the historical simulations. The third and fourth rows provide the BSS for the middle tercile. The bottom two rows provide the BSS for the lower tercile. Figure S4. Brier Skill Score (BSS) of the multi-model ensemble winter (DJF) temperature anomalies averaged along the forecast times ranging from the second to the fifth years (left) and the sixth to the ninth years (right). The first two rows provide the BSS for the upper tercile in the decadal hindcasts and the historical simulations. The third and fourth rows provide the BSS for the middle tercile. The bottom two rows provide the BSS for the lower tercile. Figure S5. Brier Skill Score (BSS) of the multi-model ensemble precipitation anomalies averaged along the forecast times ranging from the second to the fifth years (left) and the sixth to the ninth years (right). The first two rows provide the BSS for the upper tercile in the decadal hindcasts and the historical simulations. The third and fourth rows provide the BSS for the middle tercile. The bottom two rows provide the BSS for the lower tercile. Figure S6. Brier Skill Score (BSS) of the multi-model ensemble summer (JJA) precipitation anomalies averaged along the forecast times ranging from the second to the fifth years (left) and the sixth to the ninth years (right). The first two rows provide the BSS for the upper tercile in the decadal hindcasts and the historical simulations. The third and fourth rows provide the BSS for the middle tercile. The bottom two rows provide the BSS for the lower tercile. Figure S7. First row: Correlation between the ensemble-mean winter (DJF) predicted and observed precipitation anomalies averaged along the forecast times ranging from the second to the fifth years (left) and the sixth to the ninth years (right). Second row: Difference between the correlations in the decadal hindcasts and the equivalent in the historical simulations. Third row: Mean Square Skill Score of the ensemble-mean predicted anomalies. Fourth row: Ratio of the root mean square error (RMSE) of the decadal hindcasts with respect to the historical simulations. The black dots indicate the regions where the correlation, the difference in correlation, the MSSS or the ratio of the RMSE is significant at the 95% level. Figure S8. Brier Skill Score (BSS) of the multi-model ensemble winter (DJF) precipitation anomalies averaged along the forecast times ranging from the second to the fifth years (left) and the sixth to the ninth years (right). The first two rows provide the BSS for the upper tercile in the decadal hindcasts and the historical simulations. The third and fourth rows provide the BSS for the middle tercile. The bottow two rows provide the BSS for the lower tercile. Figure S9. Correlation between the AMO index and the winter (DJF) precipitation anomalies from the observational datasets (first row), and the ensemble-mean decadal hindcasts (second row), averaged along the forecast times ranging from the second to the fifth years (left) and from the sixth to the ninth years (right). Third row: Difference between second row and its equivalent in NoInit. The black dots indicate the regions where the correlation or difference in correlation is significant at the 95% level. Figure S10. Correlation between the AMO index and the sealevel pressure anomalies from the observational datasets (first row), the ensemble-mean decadal hindcasts (second row), and the historical simulations (third row), averaged along the forecast times ranging from the second to the fifth years (left) and from the sixth to the ninth years (right). The black dots indicate the regions where the correlation is significant at the 95% level. Figure S11. Correlation between the AMO index and the summer (JJA) sea-level pressure anomalies from the observational datasets (first row), the ensemble-mean decadal hindcasts (second row) and the historical simulations (third row), averaged along the forecast times ranging from the second to the fifth years (left) and from the sixth to the ninth years (right). The black dots indicate the regions where the correlation is significant at the 95% level. Figure S12. Correlation between the AMO index and the winter (DJF) sea-level pressure anomalies from the observational datasets (first row), the ensemble-mean decadal hindcasts (second row) and the historical simulations (third row), averaged along the forecast times ranging from the second to the fifth years (left) and from the sixth to the ninth years (right). The black dots indicate the regions where the correlation is significant at the 95% level.