Improving the long-lead predictability of El Niño using a novel forecasting scheme based on a dynamic components model

El Niño (EN) is a dominant feature of climate variability on inter-annual time scales driving changes in the climate throughout the globe, and having wide-spread natural and socio-economic consequences. In this sense, its forecast is an important task, and predictions are issued on a regular basis by a wide array of prediction schemes and climate centres around the world. This study explores a novel method for EN forecasting. In the state-of-the-art the advantageous statistical technique of unobserved components time series modeling, also known as structural time series modeling, has not been applied. Therefore, we have developed such a model where the statistical analysis, including parameter estimation and forecasting, is based on state space methods, and includes the celebrated Kalman filter. The distinguishing feature of this dynamic model is the decomposition of a time series into a range of stochastically time-varying components such as level (or trend), seasonal, cycles of different frequencies, irregular, and regression effects incorporated as explanatory covariates. These components are modeled separately and ultimately combined in a single forecasting scheme. Customary statistical models for EN prediction essentially use SST and wind stress in the equatorial Pacific. In addition to these, we introduce a new domain of regression variables accounting for the state of the subsurface ocean temperature in the western and central equatorial Pacific, motivated by our analysis, as well as by recent and classical research, showing that subsurface processes and heat accumulation there are fundamental for the genesis of EN. An important feature of the scheme is that different regression predictors are used at different lead months, thus capturing the dynamical evolution of the system and rendering more efficient forecasts. The new model has been tested with the prediction of all warm events that occurred in the period 1996–2015. Retrospective forecasts of these events were made for long lead times of at least two and a half years. Hence, the present study demonstrates that the theoretical limit of ENSO prediction should be sought much longer than the commonly accepted “Spring Barrier”. The high correspondence between the forecasts and observations indicates that the proposed model outperforms all current operational statistical models, and behaves comparably to the best dynamical models used for EN prediction. Thus, the novel way in which the modeling scheme has been structured could also be used for improving other statistical and dynamical modeling systems.


143
Steady state is reached after weak convergence relative to 1e-007. The level component is modeled as a random walk process and is given by where NID(0, σ 2 ) refers to a normally independently distributed series with mean zero and variance σ 2 . The 147 disturbance series η t is serially independent and mutually independent of all other disturbance series related to 148 y t . The initial trend µ 1 is for simplicity treated as an unknown coefficient that needs to be estimated together 149 with the unknown variance σ 2 η . This component is included in the model to account for the long-term fluctuations 150 of the N3.4 time series around its mean, assuming that it is stationary (see Figure 1(a)). A simple interpretation 151 is that the level at the current step is equal to the level in the previous step plus a white noise disturbance 152 (Harvey, 1989). in the model. More specifically, γ t represents the seasonal effect at time t that is associated with season s = s(t) 156 for s = 1, . . . , S, where S is the seasonal length (S = 12 for monthly data). In particular, we adopt the notation if we need to emphasize that γ t represents the seasonal effect for season s.
We use a seasonal pattern that is fixed over time (see Figure 1(b)), i.e. we have a set of S seasonal effects γ 1 , . . . , γ S which are taken as unknown coefficients that need to be estimated together with the other coefficients 160 in the model. The seasonal effects must have the property that they sum to zero over the full year to make sure 161 that they are not confounded with the trend component, that is 162 γ 1 + . . . + γ S = 0, γ t = γ t−S , t = S + 1, . . . , n. ( The seasonal pattern could also change slowly over time, by relaxing the summing-to-zero requirement with a 163 stochastic equation In the present study the disturbance variance ω t =0. In this way we have S − 1 unknown seasonal coefficients 165 that need to be estimated by the Kalman Filter. 166 There is a marked seasonal cycle in the tropical Pacific, which has a substantial impact on the ENSO cycle  time-varying trigonometric process, but with frequency λ c associated with the typical length of a cycle. We have where the discount factor 0 < ϕ ψ < 1 is introduced to enforce a stationary process for the stochastic cycle 182 component. The disturbances and the initial conditions for the cycle variables are given by where the disturbances κ t and κ + t are serially independent and mutually independent, also with respect to 184 disturbances that are associated with other components. The coefficients ϕ ψ , λ c and σ 2 κ are unknown and need 185 to be estimated together with the other parameters. This stochastic cycle specification is further discussed by 186 (Harvey, 1989). 187 Generally, the ENSO oscillation is said to have a period of between two and seven years. As we already 188 pointed out, seasonality has an important role in the evolution of the overall irregular cycle (Tziperman et al., not only for understanding ENSO, but also for its simulation.

Near-Annual Cycle Component, ψ 1 t
Previous studies have identified one significant signal at a frequency close to the annual cycle, recognized as a 195 separate coupled mode of variability of the equatorial Pacific ocean-atmosphere system (Zebiak, 1985), (Neelin,196 1990), (Mantua and Battisti, 1995), (Jiang et al., 1995), (Fedorov and Philander, 2001), (Jin et al., 2003). Its 197 relevance to the evolution of the ENSO cycle and its prediction has been demonstrated especially by (Jin et al.,198 2003), where this mode is called a "near-annual" mode and defined to have a period of 12-18 months. It is 199 characterized by a westward propagation of SST and zonal wind stress anomalies from the eastern equatorial  (Gill, 1985), (Neelin, 1991), and some others. In the present

208
This near-annual mode has been shown to be modulated both by the annual cycle and ENSO. It has been 209 hypothesized by (Jiang et al., 1995) that it is produced through the nonlinear resonance of the low-frequency 210 ENSO mode, which is discussed later, with the annual cycle. Then, its interaction with the main ENSO cycle  Similarly to the fast-frequency component, the biannual oscillation also tends to be phase-locked to the annual

244
The evolution of SSTs in the biannual mode, on the other hand, develops from near neutral anomalies in 245 March-April (Figure 4(a)) to a typical ENSO-like structure in September-October of the first year of the cycle 246 (Figure 4(b)). In the second year, the SSTA pattern is reversed and the opposite phase of the ENSO-like 247 structure develops (Figure 4(d) and (e)). In this way, equatorial negative (positive) SSTAs in the CPAC and 248 EPAC are preceded by easterly (westerly) surface zonal wind anomalies in the WPAC and CPAC (Figure 4).

249
Thus, the estimation of a period close to two years (between 26-30 months; Figure 1

274
The robustness of the analysis about the significance of these modes of variability holds both in the case of 275 using raw data or anomalies (Jiang et al., 1995). Our results confirm this, as the estimated frequencies of the 276 three cycle components were the same regardless of filtering the mean annual cycle (not shown). As can be seen 277 in Figure 2, they are associated with the main frequency signals in the spectrum of the N3.4 time series (also 278 see Figure 1). We then assume that these are fundamental time scales of the ENSO phenomenon, and that its 279 irregularity could be to a large degree explained in terms of the relative phase positions and amplitudes of the 280 three cycle components described here. with constant amplitude and periodicity (Section 3.2, Figure 1(b)), we incorporate regression variables that not 289 only aim to explain more variability in the N3.4 time series, but also to model more subtle seasonal fluctuations.

290
Thus, we extend the decomposition with a multiple regression effect x t δ for t = 1, . . . , n (grey time series 291 in Figure 1(a)), where x t is a K × 1 vector of predetermined covariates and δ is a K × 1 vector of regression 292 coefficients. Elements of δ could also be allowed to change over time. However, as we want to establish stable 293 relationships between the dependent variable and a set of explanatory variables, δ is kept constant for the full 294 sample length or, for at least, a large part of the sample. Therefore, for the present study we have kept the 295 coefficients δ fixed in time, and we will explore the time-varying case in a future study.

296
In very general terms, ENSO represents the alternation of heat buildup and release from the equatorial region 297 (Wyrtki, 1985), (Zebiak, 1989), (Jin, 1997a Table 1). It is interesting to note that one of these regions (Region  Therefore, another critical region for the model is located over the easternmost part of the WPAC and over the 331 CPAC (see Figure 6(c), Region III in Table 1). Over this region we average zonal wind stress to obtain a time 332 series used as a regression variable at short lead times (5-8 months before the typical December peak).

352
In this way, a variable that accounts for this heat buildup and the onset of its eastward migration could be  Table 2). The first set of regions corresponds to the early subsurface warming in the WPAC (between latitudes Table 2). As can be seen in Figure 7  Based on the recharge-discharge oscillatory theory (Jin, 1997a), and as discussed previously, the buildup of heat 384 at the equatorial Pacific is a prerequisite for El Niño, and the event itself discharges this excess heat poleward.
In addition, the time between two consecutive events could generally be determined by the time needed for the 386 tropical ocean to accumulate enough heat, and normally the amplitude of the event is in a direct proportion to  (Tables 3, 4, 5 and 6 in Appendix B, (Harvey, 1989)). The core of the model 401 described in Section 3 is kept constant, but the regression variables given by x t δ are varied depending on the 402 time before the peak when a forecast is started.

403
As seen in Tables 7 and 8 Niño is clearly present even at these very long lead times. The 21 months ahead predictions (beige curves in 423 Figure 9(a)-(e)) already represent the events very well in terms of timing and amplitude. It is important to note 424 that at 26 months lead the peak of the 1997/98 EN is predicted to reach an anomaly of +2 • C (dark green 425 curve in Figure 9(a) and blue curve in Figure 10, Table 9), and that an event of extreme magnitude is foreseen 426 more than two years in advance, long before the appearance of a series of westerly wind bursts in early 1997, 427 assumed by some studies to be the prime reason for the strength of this El Niño (McPhaden and Yu, 1999).

428
This is in support of the theory proposed by (Chen et al., 2004) that the evolution especially of the big events 429 is to a greater extent determined by the initial condition of the ocean, and the atmospheric forcing is rather 430 a secondary modulator or consequence. In addition, the 2014/15 weak EN is forecast with high accuracy 24 months ahead (dark green curve in Figure 9(e), black curve in Figure 10,  (Glantz, 2015).

434
Medium-range (10-19 months in advance; red, blue and green curves in Figure 9) forecasts of the same events 435 are depicted in Figure 9(f)-(j). It is evident that for some of the events the forecast improves as the lead time  Table 9), and the consistency of the forecast indicating a minor event (black curve in Figure 10) instead of a 440 major one.

441
As expected, the overall performance of the model is improved at short lead times (3-6 months; dark blue and 442 dark yellow curves in Figure 9(k)-(o)). The peak of the temperature anomaly is already accurately forecast in 443 all 5 cases (Table 9, Figure 10). An interesting feature is that most events (the 1997/98, the 2002/03, and the 444 2009/10) are better forecast 12-14 months in advance than 9-11 months ahead (velvet curves in Figure 9(k)-(o), 445 blue, red and yellow curves in Figure 10, Table 9). We relate this issue to the fact that 8-9 months before a 446 typical boreal winter peak is the time of the spring barrier, when the signal to noise-ratio is lower and when 447 most models tend to lose their predictability skills (Barnston et al., 2012). This result confirms that some El

448
Niños are more reliably forecast at medium-and long-lead times (Izumo et al., 2014), rather than at shorter 449 ones, something that could seem counterintuitive and calls for further investigation.  (Figure 11(b)). With its current formulation, however, the cold anomalies The results presented in the previous section suggest that the structural time series model (Section 3) described 507 in this study outperforms the statistical ENSO models included in the cross-validation exercise, and it is com-508 parable to some of the most accurate dynamical models. Our prediction scheme successfully goes beyond the 509 spring barrier, and also through the one after that, which is an implication for the much longer predictability Pacific, and zonal wind stress; and it does not require extensive computations. Therefore, our model appears to 518 have a clear advantage over the more complicated statistical schemes, as well as over the computationally more 519 expensive dynamical models. We would also like to discuss one important advantage that is especially due to 520 the state space approach applied here.

521
Two of the forecast events in Figure 9 were unusual in the respect that the frequencies of the three cycle  The other event, for which the frequencies of the cycle components were estimated differently was the promi-542 nent 1997/08 event. In the case of this El Niño, which is also the most extreme on record, the quasi-biennial  The model has been tested and cross-validated through the retrospective forecasts of the El Niños events in the recent period 1996-2015, and it has successfully predicted all the events that occurred at both long lead 579 times (two years and a half in advance), as well as at shorter lead times before and after the spring predictability 580 barrier. Essentially, the model has accurately forecast a moderate EN for the end of 2104/beginning of 2015, 581 unlike the majority of the operational models that warned of an event of substantial magnitude similar to that 582 of the 1997/08 one.

583
A direct conclusion of high relevance is that the predictability limit of ENSO could be extended to at least 584 two and a half years before the El Niño peak. Our study also clearly demonstrates that there is still room for 585 improvement in the prediction of ENSO, not only in the forecasting of the cold phase of the oscillation, but also 586 in that of EN itself. One immediate avenue for future work should be the inclusion of explanatory covariates 587 especially aimed at improving the prediction of LN events, by taking into account the La Niña specific dynamics.

588
Another area for improvement would be the transformation of the modeling scheme, so that all the important 589 cycles discussed here -the additional quasi-quadrennial cycle, the decadal cycle, as well as independent near-  Table 2). Data is  Table 2). . . 34   The linear Gaussian state space model form that we have used is as in (de Jong, 1991), that is:  where 0 k is a k × 1 vector of zeros, I k is a k × 1 vector of ones, p(i) = cos λ i and q(i) = sin λ i for λ i = 2πi S , 667 i=1,2,..., S 2 ; c(j) = ϕ ψ,j cos λ c,j and s(j) = ϕ ψ,j sin λ c,j for j = 1, 2, 3.
Every explanatory regression variable that has been used in the analysis (surface temperature, subsurface 671 temperature at different depth levels and regions, and zonal wind stress at different regions) has been tested 672 separately with the model described in Section 3 during the fitting procedure. In this way each variable was        Table 9 Predictions of the January target month for all EN events shown in Figure 9. Given in brackets is the probability that the respective value would occur based on a kernel normal probability density estimation of the N3.    Table 2). Data is filtered using a low-pass Butterworth filter (cut-off frequency 18, order 10).  Table 2).   Table 9, and lead month 0 corresponds to the observations. Bulletin of the American Meteorological Society.