Pairwise Markov models for stock index forecasting

Well-known properties of financial asset time series include volatility clustering and the asymmetric volatility phenomenon. Hidden Markov models (HMMs) have been proposed for modeling these characteristics; however, due to their simplicity, HMMs may lack two important features. We identify these features and propose modeling financial time series by the recent Pairwise Markov models (PMMs) with a finite discrete state space. PMMs extend HMMs and allow more flexible modeling. A real-world application example demonstrates substantial gains of PMMs over HMMs.


I. INTRODUCTION
Stock market prediction remains a significant challenge of modern risk management theory and practice. Universally acknowledged features of financial time series include volatility clustering, autocorrelation in returns and the asymmetric volatility phenomenon (AVP). A well-established methodology consists in using a mathematical model to describe the available data and to project it into the future. The autoregressive integrated moving average (ARIMA) and generalized autoregressive conditional heteroscedasticity (GARCH) models are popular among practitioners. These models are reviewed in [1]. The GARCH model describes the volatility clustering in the data, and some of its variants describe the AVP as well, while the ARIMA model describes autocorrelation in returns. Alternative techniques include artificial neural networks [2], fuzzy logic [3], support vector machine classifiers [4] and their combinations.
In recent years, there has been increasing interest in regime-switching models, reviewed e.g. in [5]. In financial markets, these models allow identifying alternating bull and bear regimes. A bull state is characterized by positive expected log-returns and low volatility, while a bear state is driven by negative expected log-returns and high volatility. Hidden Markov models (HMMs) provide a suitable framework for modeling regime switching; an important example of such a framework is available in e.g. [6]. These models use a hidden sequence of the same length as the sequence of observed log-returns. HMMs are known to be robust and straightforward to implement. However, HMMs do not take the following potential features of stock dynamics into account:
• (F1): log-returns may be correlated conditional on the state variables;
• (F2): the future state and the current log-return may not be independent conditional on the current state.
The Pairwise Markov models (PMMs) are introduced and studied in [7] as a general statistical concept. In particular, they are able to include both features (F1) and (F2) in the HMMs for the same processing cost.
The purpose of this paper is to introduce a modeling of financial time series with PMMs. Specifically, we investigate whether PMMs can improve forecasting performance and whether both features (F1) and (F2) should be taken into account. Throughout this paper, we assume that the state space is finite and discrete in both HMMs and PMMs.
The paper is organized as follows. In Section II we recall the Hidden and Pairwise Markov models. Section III is devoted to modeling stock dynamics with PMMs and to related estimation methods. Section IV contains experiments on real-world data and Section V is a discussion of the results. Section VI concludes the paper and presents perspectives for further research.

II. MODELS

Let Y_{1..N} be a time series and Ω be a finite discrete set. The idea is to describe the probability distribution of Y_{1..N} by using a hidden time series R_{1..N}, where for each n in {1, .., N}, R_n ∈ Ω. Specifically, one defines the probability distribution p(r_{1..N}, y_{1..N}) of the pair (R_{1..N}, Y_{1..N}). In this case, we have

p(y_{1..N}) = Σ_{r_{1..N} ∈ Ω^N} p(r_{1..N}, y_{1..N}).

Both HMMs and PMMs are used to define p(r_{1..N}, y_{1..N}).
In this section we recall the definition and statistical properties of these models.

A. Hidden Markov Models
Any HMM has the following properties:
• (P1): R_{1..N} is a Markov chain;
• (P2): conditional on R_{1..N}, the variables Y_1, .., Y_N are independent;
• (P3): for each n in {1, .., N}, the distribution of Y_n conditional on R_{1..N} depends on R_n only.

B. Pairwise Markov Models

In a PMM, the pair (R_n, Y_n)_{1≤n≤N} is assumed to be a Markov chain, so that its distribution is given by the initial distribution p(r_1, y_1) and the transitions p(r_{n+1}, y_{n+1} | r_n, y_n), which factorize as

p(r_{n+1}, y_{n+1} | r_n, y_n) = p(y_{n+1} | r_n, r_{n+1}, y_n) p(r_{n+1} | r_n, y_n).

From the above equation, we see that a PMM is an HMM if, and only if, for each n in {1, .., N − 1}:

p(y_{n+1} | r_n, r_{n+1}, y_n) = p(y_{n+1} | r_{n+1}); (1a)
p(r_{n+1} | r_n, y_n) = p(r_{n+1} | r_n). (1b)

We also consider two subclasses of PMMs where only one of the constraints (1a)-(1b) is relaxed.

III. METHODS
In this section, we introduce a Pairwise Markov modeling of asset log-returns. Specifically, we explain how PMMs allow modeling the features (F1) and (F2) mentioned in Section I. We also outline various types of PMM data processing, such as state estimation, forecasting and parameter inference.

A. Modeling financial time series
Let S_n be the stock price at time n, n ∈ N. The log-return Y_n at time n > 0 is defined by

Y_n = ln(S_n / S_{n−1}).

In the classic Black-Scholes model, the log-returns Y_{1..N} are assumed to be normally distributed and to have the same mean μ and standard deviation σ. In other words, we have, for each n > 0,

Y_n = μ + σ U_n,

where {U_n}_{n>0} are zero-mean, unit-variance independent Gaussian random variables, also known as standard Gaussian white noise. μ and σ are known as the average return (or drift) and the volatility of the stock.
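The definition above can be sketched in a few lines of code; the function name `log_returns` is illustrative, not from the paper.

```python
import math

def log_returns(prices):
    """Compute log-returns y_n = ln(S_n / S_{n-1}) from a price series."""
    return [math.log(prices[n] / prices[n - 1]) for n in range(1, len(prices))]

# Example: a price that doubles has log-return ln(2); one that halves, -ln(2).
prices = [100.0, 200.0, 100.0]
ys = log_returns(prices)
```

Note that log-returns are additive over time, which is what makes the exponential formulas for absolute returns in Section IV convenient.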
The HMM allows extending the classic Black-Scholes model by making μ and σ dependent on hidden variables. Let R_{1..N} be a Markov chain, and let

Y_n = μ(R_n) + σ(R_n) U_n, (3)

with {U_n}_{1≤n≤N} standard Gaussian white noise variables.
The parameters of this model include the initial state distribution p(r_1 = i) for each i ∈ Ω, the Markov chain transition matrix p(r_{n+1} = j | r_n = i) for each i, j ∈ Ω, and the values of the drift and volatility per state {μ(i), σ(i)}_{i∈Ω}. For example, if ω_1 is associated with the bear market state and ω_2 with the bull state, one would expect μ(ω_1) < 0 < μ(ω_2) and σ(ω_1) > σ(ω_2). The Hidden Markov modeling of Y_{1..N} is given by (P1)-(P3) and

p(y_n | r_n) = N(μ(r_n), σ²(r_n)), (4)

where N(., .) denotes the normal probability distribution with specified mean and variance.
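A minimal simulation sketch of this regime-switching HMM follows. The bear/bull parameter values are purely illustrative, not estimated from data, and the two states are encoded as 0 (bear) and 1 (bull).

```python
import random

def simulate_hmm(n, pi, A, mu, sigma, rng):
    """Simulate (R_1..n, Y_1..n) for a two-state Gaussian HMM.
    pi: initial state probabilities; A: 2x2 transition matrix;
    mu, sigma: per-state drift and volatility."""
    r = 0 if rng.random() < pi[0] else 1
    states, ys = [], []
    for _ in range(n):
        states.append(r)
        # (P2)-(P3): observation depends on the current state only.
        ys.append(rng.gauss(mu[r], sigma[r]))
        # (P1): next state drawn from the transition row of the current state.
        r = 0 if rng.random() < A[r][0] else 1
    return states, ys

rng = random.Random(0)
# Hypothetical parameters: state 0 = bear (negative drift, high volatility).
states, ys = simulate_hmm(5000, pi=[0.5, 0.5],
                          A=[[0.95, 0.05], [0.02, 0.98]],
                          mu=[-0.002, 0.001], sigma=[0.03, 0.01], rng=rng)
```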
The PMMs provide a more flexible framework than that of HMMs. In order to fulfill the requirement (F1) presented in Section I, we define a first-order autoregressive model of Y_{1..N} conditional on R_{1..N}. We set

Y_1 = μ(R_1) + σ(R_1) U_1;
Y_{n+1} = μ(R_{n+1}) + ρ(R_n, R_{n+1}) (Y_n − μ(R_n)) + σ(R_{n+1}) V_{n+1}, (5)

where n > 0, U_1 and {V_n}_{n>0} are standard Gaussian white noise variables, and for each i, j ∈ Ω, |ρ(i, j)| < 1.
As regards the feature (F2), we make R_{n+1} dependent on Y_n conditional on R_n by using the concept of the logistic function. Specifically, in the case where Ω has only two elements {ω_1, ω_2}, we set

p(r_{n+1} = ω_1 | r_n = i, y_n) = ζ(a(i) + b(i) y_n), (6)

where ζ(x) = 1 / (1 + e^{−x}) is the logistic function and a(i), b(i) are real parameters for each i ∈ Ω. Finally, we combine (3), (5) and (6) to define a pairwise Markov modeling of Y_{1..N}:

p(y_1 | r_1) = N(μ(r_1), σ²(r_1)); (7a)
p(r_{n+1} | r_n, y_n) given by (6); (7b)
p(y_{n+1} | r_n, r_{n+1}, y_n) = N(μ(r_{n+1}) + ρ(r_n, r_{n+1})(y_n − μ(r_n)), σ²(r_{n+1})). (7c)

The parameters of this model are

θ = {π(i), μ(i), σ(i), a(i), b(i), ρ(i, j)}_{i,j∈Ω}, (8)

where π(i) = p(r_n = i). This model is presented for Ω = {ω_1, ω_2}, but one can consider a more general definition by using the multinomial logistic function, as explained in [8].
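The full pairwise model can be simulated as follows. This is a sketch under the notation above; the logistic coefficient names `a`, `b` and all numeric values are illustrative assumptions, and states are encoded as 0 and 1.

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_pmm(n, mu, sigma, rho, a, b, rng):
    """Simulate a two-state PMM: the next state depends on the current
    log-return through a logistic link (feature F2), and observations
    follow a conditionally AR(1) scheme (feature F1)."""
    r = rng.randrange(2)
    y = rng.gauss(mu[r], sigma[r])
    states, ys = [r], [y]
    for _ in range(n - 1):
        # (F2): P(R_{n+1} = state 0 | R_n, Y_n) is a logistic function of Y_n.
        r_next = 0 if rng.random() < logistic(a[r] + b[r] * y) else 1
        # (F1): Y_{n+1} is correlated with Y_n conditional on the states.
        y = mu[r_next] + rho[r][r_next] * (y - mu[r]) \
            + sigma[r_next] * rng.gauss(0.0, 1.0)
        r = r_next
        states.append(r)
        ys.append(y)
    return states, ys

rng = random.Random(1)
states, ys = simulate_pmm(1000, mu=[-0.002, 0.001], sigma=[0.03, 0.01],
                          rho=[[0.3, 0.1], [0.1, 0.2]],
                          a=[0.0, -2.0], b=[-5.0, -5.0], rng=rng)
```

Setting every ρ(i, j) to zero and every b(i) to zero recovers the HMM of (3)-(4), which is the content of constraints (1a)-(1b).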

B. State estimation and forecasting
Real-time processing of incoming data {Y_n}_{n>0} in a PMM involves determining p(r_n | y_{1..n}), known as the filtering distribution. Algorithm 1, derived in [7], allows computing the filtering distribution with a complexity linear in n.

Algorithm 1. Filtering in PMMs
• Consider, ∀n > 0, r_n ∈ Ω, α_n(r_n) = p(r_n, y_{1..n});
• Initialization: ∀r_1 ∈ Ω, α_1(r_1) = p(y_1 | r_1) p(r_1);
• Recursion: Given {α_n(r_n)}_{r_n∈Ω} and y_{n+1}, compute, ∀r_{n+1} ∈ Ω,

α_{n+1}(r_{n+1}) = Σ_{r_n∈Ω} α_n(r_n) p(r_{n+1}, y_{n+1} | r_n, y_n).

The filtering distribution is given by

p(r_n | y_{1..n}) = α_n(r_n) / Σ_{r'_n∈Ω} α_n(r'_n).

Forecasting consists in computing p(y_{n+1..n+p} | y_{1..n}) for p > 0. An important case of forecasting is one-step-ahead forecasting, for which p = 1. In this case, it is also particularly important to forecast Z_{n+1}, where Z_{n+1} represents the direction of the stock price change during the day n + 1. The anticipated price change at n + 1 given the information available at n is defined by

ẑ_{n+1|n} = 2 if ŷ_{n+1|n} > 0, and ẑ_{n+1|n} = 1 otherwise, (10)

and ŷ_{n+1|n} as the mean of the mixture, that is

ŷ_{n+1|n} = Σ_{r_n, r_{n+1}∈Ω} p(r_n | y_{1..n}) p(r_{n+1} | r_n, y_n) [μ(r_{n+1}) + ρ(r_n, r_{n+1})(y_n − μ(r_n))]. (11)

Contrary to one-step-ahead forecasting, there is no apparent closed-form expression for p(y_{n+1..n+p} | y_{1..n}) in the case of multistep forecasting in PMMs.
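Algorithm 1, specialized to the Gaussian PMM of (7a)-(7c), can be sketched as below. Parameter names follow the simulation sketches above (illustrative, not the paper's notation); the alphas are renormalized at each step for numerical stability, which leaves the filtering distribution unchanged.

```python
import math

def normal_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def pmm_filter(ys, mu, sigma, rho, a, b, pi):
    """Recursive computation of p(r_n | y_1..n) for a two-state Gaussian PMM.
    Returns a list of [p(r_n = 0 | y_1..n), p(r_n = 1 | y_1..n)]."""
    # Initialization: alpha_1(r_1) = p(y_1 | r_1) p(r_1), then normalize.
    alpha = [pi[r] * normal_pdf(ys[0], mu[r], sigma[r]) for r in range(2)]
    z = sum(alpha)
    alpha = [x / z for x in alpha]
    filt = [list(alpha)]
    for n in range(1, len(ys)):
        new = [0.0, 0.0]
        for rp in range(2):          # r_{n+1}
            for r in range(2):       # r_n
                # Transition p(r_{n+1} | r_n, y_n) via the logistic link (6).
                p0 = logistic(a[r] + b[r] * ys[n - 1])
                p_trans = p0 if rp == 0 else 1.0 - p0
                # Observation density p(y_{n+1} | r_n, r_{n+1}, y_n) from (7c).
                m = mu[rp] + rho[r][rp] * (ys[n - 1] - mu[r])
                new[rp] += alpha[r] * p_trans * normal_pdf(ys[n], m, sigma[rp])
        z = sum(new)
        alpha = [x / z for x in new]
        filt.append(list(alpha))
    return filt
```

Each step costs O(|Ω|²), so the total complexity is linear in n, as stated.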

C. Parameter estimation
Let N > 0 and let Y_{1..N} be an observed time series of log-returns. The goal of PMM parameter estimation is to infer the parameter vector θ of (8) from the observed data Y_{1..N}.
The Expectation-Maximization (EM) and the Iterative Conditional Estimation (ICE) are well-known parameter estimation algorithms related to maximum likelihood estimation. These algorithms are well suited for both HMMs and PMMs, and the details may be found in [9].
Alternatively, θ can be estimated by using the principle of empirical risk minimization (ERM). Several methods for proving consistency of such estimators are provided in e.g. [10]. Let us recall the general idea of the ERM. Assume a training set (x_{1..N}, y_{1..N}) in (X × Y)^N, a prediction function h : X → Y and a loss function L : Y × Y → R_+. The empirical risk associated with the prediction function h is defined as

R(h) = (1/N) Σ_{n=1}^{N} L(h(x_n), y_n).

Thus, the idea of the ERM is to find a function h for which the risk is minimal.
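The empirical risk is just an average loss over the training set; a minimal sketch, using a squared-error loss purely as an illustrative example:

```python
def empirical_risk(h, xs, ys, loss):
    """Average loss of prediction function h over the training pairs."""
    return sum(loss(h(x), y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical illustration: the constant predictor h(x) = 0 under squared error.
squared_error = lambda y_hat, y: (y_hat - y) ** 2
risk = empirical_risk(lambda x: 0.0, xs=[1, 2, 3],
                      ys=[0.1, -0.1, 0.0], loss=squared_error)
# risk = (0.01 + 0.01 + 0.0) / 3
```

ERM then searches for the h (here, the θ parameterizing the forecaster) minimizing this quantity.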
In the context of forecasting, we have x_n = y_{1..n} and h(x_n) = ŷ^θ_{n+1|n}(y_{1..n}), where ŷ^θ_{n+1|n}(y_{1..n}) is computed from θ and y_{1..n} by (11). We consider two loss functions, yielding the risk functions R_1(θ) and R_2(θ) in (13). Let λ > 0; the following risk function realizes a trade-off between R_1(θ) and R_2(θ):

R(θ; λ) = λ R_1(θ) + R_2(θ). (14)

In our study, we estimate θ by minimizing (14) for various values of λ. No closed-form expression is known for the corresponding update equations, and we solve the optimization problem by particle swarm optimization (PSO). PSO methods [11] are non-convex global optimization algorithms.
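A minimal PSO sketch follows, to fix ideas: each particle remembers its personal best position, the swarm tracks a global best, and velocities blend inertia with attraction toward both bests. The hyperparameter values are standard illustrative choices, not those used in the paper, and the demo objective is the sphere function rather than (14).

```python
import random

def pso_minimize(f, dim, n_particles=30, n_iter=200, lo=-5.0, hi=5.0, seed=0):
    """Minimal particle swarm optimization sketch (illustrative settings)."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social coefficients
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(x) for x in xs]
    pbest_f = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = list(pbest[g]), pbest_f[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                vs[i][d] = (w * vs[i][d]
                            + c1 * rng.random() * (pbest[i][d] - xs[i][d])
                            + c2 * rng.random() * (gbest[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            fi = f(xs[i])
            if fi < pbest_f[i]:
                pbest[i], pbest_f[i] = list(xs[i]), fi
                if fi < gbest_f:
                    gbest, gbest_f = list(xs[i]), fi
    return gbest, gbest_f

# Demo objective: the sphere function, whose minimum 0 is at the origin.
best, best_f = pso_minimize(lambda x: sum(t * t for t in x), dim=2)
```

In the paper's setting, f would evaluate R(θ; λ) of (14), each evaluation running the filter and forecaster over H_training.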

IV. EXPERIMENTS
Let us present our methodology to compare the efficiency of PMMs with that of HMMs on historical stock quotes. Given a data set H = {y_1, .., y_M} of successive daily log-returns of an asset E, we split H into two juxtaposed sets: H_training = {y_1, .., y_N} and H_test = {y_{N+1}, .., y_M}. The first set is used to estimate the parameter θ by minimizing (14) for a given λ, while the second set serves only to assess the efficiency of each model considered.

The models are compared in terms of the outcome produced by the following trading system. At the beginning of each day n + 1, N ≤ n < M, the system buys asset E only if the one-day-ahead forecast (10) produced by the model is positive, i.e. if ẑ_{n+1|n} = 2, and sells the asset at the end of the day. In the case of a negative forecast, the system avoids any trading operations on E. Next, we compute the absolute return of the system on H_test and compare it with that of the asset. Let us recall that the absolute return of E relative to date N is defined as

τ(n; N) = (S_n − S_N) / S_N

for n ≥ N. Equivalently, τ(n; N) can be written as a function of the log-returns:

τ(n; N) = exp(Σ_{k=N+1}^{n} y_k) − 1.

Thus, the absolute return of the trading system considered can be written as

τ_TS(n; N) = exp(Σ_{k=N+1}^{n} y_k 1{ẑ_{k|k−1} = 2}) − 1.

We apply this methodology to Cliffs Natural Resources stock prices (NYSE:CLF). Stock quotes are taken from the Yahoo! database and correspond to the business days from 01/02/1990 to 12/13/1993 for H_training and from 12/14/1993 to 09/29/1994 for H_test. In this configuration, the size of H_training is N = 1000, the size of H_test is 200 and the total size of the data set H is M = 1200. In every experiment, the state space consists of only two elements.

Figures 3 and 4 display the values of the risks R_1(θ) and R_2(θ), cf. (13), for θ minimizing (14), as functions of λ. Absolute returns generated by the four models on the test set are given in Table I for various values of λ. Let us make several brief observations. Figures 3 and 4 are consistent with the definition of θ as the minimizer of (14): when λ increases, R*_1(λ) = R_1(θ) decreases while R*_2(λ) = R_2(θ) increases, and vice versa. We can see from Figure 5 that PMM-F1 implies a more risk-averse trading strategy than that of HMM, and the related generated return increases almost monotonically. However, PMM-F1 may not be well suited for a bull market. PMM-F2 and HMM appear to be better suited for bull dynamics, while PMM-F2 seems to be less vulnerable than HMM to abrupt drops of asset value.
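The backtest accounting above can be sketched directly from the log-return formulas; function names and the toy data are illustrative, and a perfect direction forecaster is used only to exercise the code.

```python
import math

def absolute_return(log_returns):
    """tau(n; N) = exp(sum of the log-returns over the period) - 1."""
    return math.exp(sum(log_returns)) - 1.0

def trading_system_return(log_returns, forecasts):
    """Absolute return of the long-or-flat system: a day's log-return is
    earned only when the one-step-ahead direction forecast is 'up' (z = 2)."""
    kept = [y for y, z in zip(log_returns, forecasts) if z == 2]
    return absolute_return(kept)

# Toy illustration: a perfect direction forecaster skips every losing day.
ys = [0.01, -0.02, 0.015, -0.005]
perfect = [2 if y > 0 else 1 for y in ys]
r_asset = absolute_return(ys)               # buy-and-hold over the period
r_system = trading_system_return(ys, perfect)
```

With real forecasts, `perfect` would be replaced by the model's ẑ_{k|k−1} sequence computed from the filtering distribution.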

V. DISCUSSION
We proposed a meaningful parameterization of PMMs for modeling financial time series. The results show that both features (F1) and (F2), mentioned in Section I, can be captured by PMMs, as expected. One can intuitively understand why using the feature (F1) should improve forecasting, while (F2) is more difficult to interpret. Suppose, for example, that during the bull state the return Y_n appears to be excessively negative compared to the average return of the bull market. In this case, the current state may become fairly uncertain in an HMM, i.e. p(r_n = ω_1 | y_{1..n}) ≈ p(r_n = ω_2 | y_{1..n}). The PMM incorporates (F2) through the distribution p(r_{n+1} | r_n, y_n), which allows deciding to which extent Y_n should affect the expectation of R_{n+1}.
Table I indicates that the outcome produced by each model is sensitive to the value of λ. In general, such a parameter should be chosen by a cross-validation procedure according to the application considered.
Our experiments indicate that the more complex structure of PMMs may allow identifying regimes better suited to a specific application. We believe that the presented use of the flexibility of PMMs will allow overcoming principal constraints of HMMs.
This study has several limitations. First, we assume only two regimes in our models. Second, Gaussian mixture densities and non-Gaussian heavy-tailed observation distributions could be considered as well. Third, we consider only the daily closing price, while daily opening, low and high prices are also available. Finally, our study concerns only one period of stock prices, and only one stock was used in the experiment. An upcoming research article will contain more extensive experiments and address the outlined points.

VI. CONCLUSION
The paper introduces a Pairwise Markov model for financial time series, obtained by incorporating the features (F1) and (F2), mentioned in the Introduction, into the classic Hidden Markov model. The results show that both of these features contribute to improving the performance of the model in some applications. Let us also mention the triplet Markov models, which allow, in particular, dealing with mixture observation distributions and a semi-Markovian hidden process simultaneously [12]. Such general models are also potentially capable of improving the outcome of the classic HMMs.

Figure 2
Figure 2 presents the directed dependency graphs of PMM-F1 and PMM-F2. In practice, one should specify the families of distributions to which p(y_n | r_n), p(r_{n+1} | r_n, y_n) and p(y_{n+1} | r_n, r_{n+1}, y_n) belong.
Progressive inclusion of the features (F1) and (F2) in the HMM improves both risk values computed on H_training, as expected, independently of the value of λ.