MODEL SELECTION AND COMPARISON OF TIME SERIES MODELS

P. Mariyappan 1 and P. Arumugam 2. 1. Research Scholar, Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli; Mahabararhi Engineering College, Vasudevanur. 2. Associate Professor in Statistics, Annamalai University, Annamalai Nagar, Chidambaram.

The choice of a particular econometric model is not prespecified by theory, and many competing models can be entertained. Models can be compared formally in a Bayesian framework through so-called posterior odds, the product of the prior odds and the Bayes factor. The Bayes factor between any two models is the ratio of their likelihoods with the parameters integrated out with respect to the corresponding priors, and it summarizes how the data favor one model over the other. Given a set of possible models, this immediately leads to posterior model probabilities. Rather than choosing a single model, a natural way to deal with model uncertainty is to use the posterior model probabilities to average the inference (on parameters) corresponding to each of the separate models. This is called Bayesian model averaging.
In modern time series analysis, considerable attention is paid to the question of the number of times d an individual series must be differenced to achieve stationarity. For the ARIMA(p, d, q) class of models, introduced by Box & Jenkins (1970), this is the question of the number of unit autoregressive roots in the generating process. Most often in practical applications the choice is between d = 0 and d = 1, though of course a choice between d = 1 and d = 2 can be made along identical lines by working with the first differences of the original time series.

The most commonly applied approach to this issue is formal hypothesis testing, with the null hypothesis taken as d = 1 and the alternative as stationarity in levels (d = 0). In practical applications, the parametric test of Dickey and Fuller (1979) has been employed far more often than any other, though the nonparametric approach of Phillips and Perron (1988) is also used on occasion. As discussed by, for example, Banerjee (1993), the decision on the degree of differencing for individual series is an important first step in a methodology for constructing models linking those series.
The prototypical problem is differentiating between a random walk and a stationary first order autoregressive model. For a time series y_1, ..., y_n, the Dickey-Fuller test is based on the least squares fit of

y_t = μ + ρ y_{t-1} + ε_t.   ... (1)

The test statistic is the usual t-ratio associated with (ρ̂ - 1), though the null distribution is nonstandard (Dickey & Fuller, 1979). More generally, equation (1) is augmented by adding terms in lagged first differences to account for autocorrelation beyond first order autoregression. The decision on the level of differencing is then based on the outcome of a test at a significance level that the user must specify. It is not at all clear that this approach accurately reflects the analyst's prior beliefs about ρ. Indeed, tests where stationarity is the null hypothesis have also been developed (see, for example, Kwiatkowski et al. (1992) and Leybourne and McCabe (1994)). When both kinds of test are applied to the same series, it is common to find that neither null hypothesis (stationarity or a unit autoregressive root) can be rejected at the usual significance levels.
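As a concrete illustration, the least squares fit (1) and its t-ratio can be computed directly by ordinary least squares. This is only a sketch of the classical test discussed above; the function name and the simulated series are our own, and because the null distribution is nonstandard the statistic must be compared with tabulated Dickey-Fuller critical values (roughly -2.9 at the 5% level for the constant-only case), not normal quantiles.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-ratio for (rho_hat - 1) in the least squares fit of
    equation (1): y_t = mu + rho * y_{t-1} + eps_t."""
    y = np.asarray(y, dtype=float)
    ylag, ycur = y[:-1], y[1:]
    X = np.column_stack([np.ones_like(ylag), ylag])  # intercept and lag
    beta, _, _, _ = np.linalg.lstsq(X, ycur, rcond=None)
    resid = ycur - X @ beta
    n, k = X.shape
    s2 = (resid @ resid) / (n - k)                   # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)                # OLS covariance matrix
    rho_hat = beta[1]
    tau = (rho_hat - 1.0) / np.sqrt(cov[1, 1])       # Dickey-Fuller t-ratio
    return rho_hat, tau

# illustrative random walk of 100 observations
rng = np.random.default_rng(0)
rw = np.cumsum(rng.standard_normal(100))
rho_hat, tau = dickey_fuller_stat(rw)
```

For a true random walk, ρ̂ typically falls just below one and τ is small in magnitude, so the unit root null is not rejected.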
It seems natural to explore the Bayesian approach to the comparison of stationary models with those involving a unit autoregressive root, and there has been interest in this possibility in the econometric literature, dating from Sims (1988) and Sims and Uhlig (1991). Here we address fundamental issues arising from the practical application of that approach. Our concern is not with asymptotic properties of decision rules, an issue addressed in Phillips and Ploberger (1996) and the references therein. Indeed, we view it as a strength of the Bayesian approach that recourse to asymptotics for justification is unnecessary. Instead, we consider inference from the perspective of an analyst with a single time series, requiring posterior odds for a unit root model compared with a stationary competitor.
The practical issues involved can be dealt with through the analysis of the simple model (1). A critical issue is the specification of a prior for the parameter ρ under stationarity. Possibilities that are sometimes adopted are the uniform prior on (-1, 1) and the Jeffreys prior (Berger & Yang, 1994; Uhlig, 1994). It seems to us unreasonable that an analyst could simultaneously hold such vague prior beliefs while at the same time attaching non-zero probability mass to ρ = 1, as would be implied by testing hypotheses with this null. Presumably the same considerations that lead to the suspicion that the "true generating model" is a random walk also suggest the likelihood of large positive values for ρ under stationarity. Intuitively, a prior density function that approaches infinity as ρ approaches one seems more plausible. Accordingly, we explore the use of the beta distribution as a prior specification. It emerges from our analysis that it is incumbent on the analyst to give careful consideration to what constitutes an appropriate prior for ρ, as this can have a substantial impact on the posterior odds for sample sizes commonly analyzed in practice.
A further issue to be faced is the specification of a relatively uninformative prior for the mean of the series, or equivalently for the parameter μ of equation (1), under stationarity. We show how this issue can be circumvented by carrying out the analysis in first differences of the given series. In deriving posterior odds, we work with the exact likelihood, assuming a Gaussian process. A by-product of our analysis is a demonstration that this leads to superior decision rules compared with the usual practice of basing tests on the least squares estimation of (1).
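For the first order case, the exact Gaussian likelihood referred to here differs from the conditional least squares likelihood only in its treatment of the first observation, which under stationarity has variance σ²/(1 - ρ²). A minimal sketch for a zero-mean AR(1), with σ² treated as known (our own simplification; the analysis in this chapter handles the mean via first differences and integrates σ out):

```python
import math

def exact_loglik_ar1(y, rho, sigma2=1.0):
    """Exact Gaussian log-likelihood of a zero-mean stationary AR(1):
    y_1 ~ N(0, sigma2 / (1 - rho^2)), y_t | y_{t-1} ~ N(rho * y_{t-1}, sigma2)."""
    assert abs(rho) < 1.0, "stationarity requires |rho| < 1"
    v1 = sigma2 / (1.0 - rho ** 2)          # stationary variance of y_1
    ll = -0.5 * (math.log(2 * math.pi * v1) + y[0] ** 2 / v1)
    for t in range(1, len(y)):
        e = y[t] - rho * y[t - 1]           # one-step prediction error
        ll += -0.5 * (math.log(2 * math.pi * sigma2) + e * e / sigma2)
    return ll

ll_exact = exact_loglik_ar1([0.5, 0.2, -0.1], 0.8)
```

Dropping the y_1 term recovers the conditional likelihood underlying least squares; it is the extra term that the exact-likelihood analysis exploits.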
Finally, in section 5 of the chapter we briefly demonstrate how our approach can be extended to more general models. Specifically, we consider the comparison of an ARIMA model with a stationary ARMA competitor.
Much of the published discussion of this problem has concerned the choice of priors for ρ and μ. Schotman and Van Dijk (1991) point out that the lack of identification of μ in the random walk model means that a uniform prior cannot be used for this parameter. Several different priors are considered by the cited authors, many of whom model an interaction between ρ and μ. Schotman (1994), in a useful chapter that reviews much of the work in this area, concludes that prior assumptions about the dependence between ρ and μ have a marked effect on the posterior location of ρ. In many practical applications, however, the analyst may be reluctant to specify an informative prior on μ. Indeed, it is difficult to contemplate a situation where the analyst simultaneously feels able to specify a sharp prior for μ while entertaining non-zero prior probability for the random walk model, where that parameter is undefined. Our concern here is with the case where the analyst's inclination is to use an improper prior for μ. This causes no difficulty in deriving a posterior for ρ under the stationary autoregressive model. However, when improper priors are used for parameters occurring in one model and not the other, posterior odds ratios are undefined. (See O'Hagan, 1995, for a general discussion and Schotman & Van Dijk, 1991, for analysis in the context of the present problem.) In our approach we remove this problem by formulating both of the above models in terms of the first differences w_t = y_t - y_{t-1}, so that the two models considered here are the random walk model M_0, under which w_t = ε_t, and the stationary first order autoregressive model M_1, under which w_t = ρ w_{t-1} + ε_t - ε_{t-1} with |ρ| < 1. Given a sample y = (y_1, ..., y_n), the Bayesian comparison of the two models proceeds by computing the posterior model probabilities, which are given by Bayes' theorem as

P(M_i | y) = p(y | M_i) P(M_i) / Σ_j p(y | M_j) P(M_j).   ... (4)

In (4), P(M_i) is the prior probability assigned to model M_i, and

p(y | M_i) = ∫ p(y | θ_i, M_i) π(θ_i | M_i) dθ_i   ... (5)

is the integrated joint density of y under M_i, where π(θ_i | M_i) is the joint prior density for the parameters of M_i and p(y | θ_i, M_i) is the likelihood. It is straightforward to show that, under the standard improper prior π(σ) ∝ σ^{-1}, the integrated joint density in the case of M_0 is proportional to [Σ_{t=2}^{n} w_t²]^{-(n-1)/2}. We note here that if a conjugate (inverted gamma) prior is adopted for σ², we obtain the same expression for p(w | M_0), but now with the sum of squares and the exponent adjusted by the prior hyperparameters.

It should be noted that the analysis of first differences involves no information loss about the parameter ρ of the autoregressive model compared with an analysis of levels with an improper prior on μ. To see this, suppose that inference is based on y = (y_1, ..., y_n) and consider the transformation, with Jacobian one, to (y_1, w_2, ..., w_n). Then we have

p(y | ρ, μ, σ) = p(w | ρ, σ) p(y_1 | w, ρ, μ, σ).

Now, μ appears only in p(y_1 | w, ρ, μ, σ), which necessarily takes the form g(ρ, σ) h((y_1 - c(μ, ρ)) g(ρ, σ)) for particular functions g, h and c that need not be specified. It then follows, on integrating μ out against a uniform improper prior, that the contribution of this factor is a constant. Hence the posterior density of (ρ, σ) is the same whether it is based on the levels or on the first differences. What is lost in working with first differences is the opportunity to specify an informative prior on μ, or more generally on (ρ, μ).

As we have noted, a considerable debate has ensued over the choice of prior for ρ. For example, Berger & Yang (1994) considered reference priors, and Poirier (1991) considered a proper Gaussian prior. All of these authors allocated some prior probability to ρ > 1. Our view is conservative in that we believe, along with Schotman & Van Dijk (1991), Sims (1991) and Kwiatkowski et al. (1992), that evidence of explosive roots is indicative of an alternative type of model not considered here. In any event, we believe that the explosive behaviour implied by allowing some prior probability for ρ > 1 is not seen in time series arising in practice, and so we explicitly exclude this type of behaviour by considering only priors restricted to |ρ| < 1.

Another important consideration is how much prior information is actually available for ρ. Can a uniform prior really be considered a sensible choice if the investigator seriously believes that a random walk model could provide a reasonable explanation of the behaviour of the time series? It seems to us inconsistent to attach non-zero prior probability mass to ρ = 1 on the one hand, while on the other adopting a uniform prior for ρ under stationarity. Surely, if ρ = 1 is a likely value, then ρ close to 1 is a priori more plausible than ρ distant from 1. We must bear in mind that acceptance of a uniform prior implies that this is held to be a plausible prior belief. In what follows we use, for purposes of comparison, the uniform prior for ρ, together with two sharper beta priors on (0, 1).
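The three priors compared below can all be written as beta densities on (0, 1): the uniform prior is Beta(1, 1), while sharper alternatives pile mass near ρ = 1. A minimal sketch; the shape parameters used here are illustrative assumptions, not necessarily those of the beta 1 and beta 2 priors in the text.

```python
import math

def beta_pdf(rho, a, b):
    """Density of a Beta(a, b) prior for rho on (0, 1); zero outside."""
    if not 0.0 < rho < 1.0:
        return 0.0
    # log of the beta function B(a, b) via log-gamma, for numerical stability
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(rho) + (b - 1) * math.log(1 - rho) - log_beta)

flat = beta_pdf(0.9, 1.0, 1.0)       # uniform prior: density 1 everywhere on (0, 1)
sharp = beta_pdf(0.9, 20.0, 1.5)     # illustrative sharp prior with mass near rho = 1
```

Under a sharp prior of this kind, values of ρ near one receive far more prior density than central values, which is exactly the belief the text argues an analyst entertaining a random walk should hold.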

Numerical Study:-
We generated 1000 samples of 100 observations from first order autoregressive processes with ρ = 1, 0.95, 0.9, 0.85 and 0.8, and computed the empirical distribution function of P(M_0 | y) for each of the three priors and each value of ρ. We also computed the proportion of the samples for which P(M_0 | y) > 0.5, that is, the proportion of the time that the random walk model would have been chosen. The results are presented in Table 1.1.
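A stripped-down version of this experiment can be sketched as follows. To keep it short we use the conditional likelihood with σ² fixed at one and a uniform prior for ρ on (0, 1); the study described above instead uses the exact likelihood and also the beta priors, so the numbers will differ. All function names are our own.

```python
import numpy as np

def log_cond_lik(y, rho, sigma2=1.0):
    """Gaussian log-likelihood of y_2, ..., y_n conditional on y_1
    for a zero-mean AR(1); sigma^2 is treated as known for brevity."""
    e = y[1:] - rho * y[:-1]
    return -0.5 * e.size * np.log(2 * np.pi * sigma2) - 0.5 * (e @ e) / sigma2

def posterior_prob_rw(y, n_grid=999):
    """P(random walk | y), as in equation (4), under equal prior model
    probabilities and a uniform prior for rho on (0, 1) under stationarity."""
    grid = np.linspace(0.001, 0.999, n_grid)
    logs = np.array([log_cond_lik(y, r) for r in grid])
    m = logs.max()                                   # rescale to avoid underflow
    # integrated likelihood under stationarity: Riemann sum over the rho grid
    lik_stat = np.sum(np.exp(logs - m)) * (grid[1] - grid[0])
    lik_rw = np.exp(log_cond_lik(y, 1.0) - m)        # point likelihood at rho = 1
    return lik_rw / (lik_rw + lik_stat)

rng = np.random.default_rng(1)
rw_series = np.cumsum(rng.standard_normal(100))      # true rho = 1
ar_series = rng.standard_normal(200)                 # true rho = 0
p_rw = posterior_prob_rw(rw_series)
p_ar = posterior_prob_rw(ar_series)
```

With equal prior model probabilities, P(M_0 | y) > 0.5 corresponds to choosing the random walk, and repeating this over many simulated samples yields the proportions reported in Table 1.1.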
It is clear that, when the "true" process is stationary autoregressive, the very sharp prior, beta 2, performs extremely well and would clearly be the best of the three in such a situation. However, the opposite is the case for a random walk generating process. The sharp prior gives more weight to values of ρ very close to 1 than does the uniform prior; the random walk model is therefore chosen much more readily for "true" random walks in the uniform prior case. The message here is clear: priors matter in this problem, and the investigator should think carefully before blindly following a "formula" of noninformative or uniform priors for ρ, which clearly favour the unit root model even when the process generating the data is stationary with ρ close to one. Of course, appropriate prior specification does require careful thought. For example, values of ρ very close to one are a priori more plausible for high frequency data than for low frequency data.
The shapes of the empirical distribution functions are interesting. For the beta 2 prior, these functions exhibit very steep growth around the median. The implication is that only very rarely, when this prior is adopted, will the posterior odds favoring a particular model be either very high or very low. This relative posterior uncertainty results, of course, from the shape of the beta 2 prior. With 100 observations it would be difficult to distinguish with great certainty between a random walk and a stationary first order autoregressive model in which the parameter was drawn at random from the beta 2 distribution. Somewhat greater posterior certainty can result if the alternative to a random walk is a stationary first order autoregressive model with parameter drawn from the more dispersed beta 1 distribution.

The results of Table 1.1 clearly demonstrate that, for samples of 100 observations, the posterior odds favouring a particular model, and the outcome of any decision rule based on those odds, can depend strongly on the prior for the autoregressive parameter. The dependence of posterior on prior is sometimes viewed as a shortcoming of the Bayesian approach compared with alternative "objective" approaches to statistical inference. We take the opposite stance and claim that the flexibility in specifying a prior is a strength of that approach to our problem. In effect, the prior for the parameter ρ is part of our model. The comparison is then between a random walk model and a stationary first order autoregressive model for which nature first draws the autoregressive parameter from some distribution. An analyst who believes that this parameter is more likely than not to be close to one could, and should, incorporate this belief into the prior distribution. We feel that attaching labels such as "objective" and "uninformative" to an analysis based on a prior such as the uniform is misleading.
Indeed, such a prior carries the "information" that the autoregression viewed as an alternative to the random walk is a model in which the autoregressive parameter is just as likely to be negative as positive. It seems absurd to hold this view while simultaneously attaching probability mass to the belief that this parameter is exactly one. The classical hypothesis testing paradigm does not truly provide an objective criterion for deciding between models. Indeed, in the context of our problem, that approach is tantamount to imposing a uniform prior for ρ and manipulating the prior odds for the two models to achieve a particular desired significance level.
Although our interest is not in the usual hypothesis testing framework, the results in Table 1.1 can be interpreted within that framework. In that context, the results for the uniform prior are interesting. The implication is that our decision rule has the properties of a test of size 0.039 and power 0.446 against the stationary alternative. The usual Dickey-Fuller test of the random walk null against a stationary first order autoregressive model is based on least squares estimation of the autoregressive model. However, it has recently been recognized that tests of considerably more power can be achieved through alternative estimators, including maximum likelihood. Interesting simulation results are given by Pantula (1994), where, for samples of 100 observations, it is reported that the usual Dickey-Fuller test at the 5% level has power 0.311, while a test based on maximum likelihood estimation has power 0.644. It appears from the results of Table 1.1 that our Bayesian approach also captures these power gains over the usual Dickey-Fuller test. Presumably that outcome arises through our use of the exact likelihood function of the model.

ARIMA versus Stationary ARMA:-
The approach of the previous section can be extended to allow comparison of an ARIMA model with a stationary ARMA model. As one possible generating process, we consider the ARIMA(p, 1, q) model

φ(B)(1 - B) y_t = θ(B) ε_t,   ... (6)

where φ(B) = 1 - φ_1 B - ... - φ_p B^p, θ(B) = 1 - θ_1 B - ... - θ_q B^q, and B is the backshift operator. It is assumed that p and q are given, and that the conditions for stationarity and invertibility are satisfied. The stationary alternative is the ARMA(p+1, q) model

φ(B)(1 - ρB)(y_t - μ) = θ(B) ε_t,   |ρ| < 1,   ... (7)

which reduces to equation (6) when ρ = 1. Again, we work with the first differences w_t = (1 - B) y_t, so that from equation (6) φ(B) w_t = θ(B) ε_t. The posterior model probabilities again follow from equation (4), where now

p(w | M_i) = ∫ p(w | θ_i, M_i) π(θ_i | M_i) dθ_i,   ... (8)

where θ denotes the vector of parameters (φ_1, ..., φ_p, θ_1, ..., θ_q, ρ, σ). The likelihoods can be determined in terms of the elements of w, either through the technique in Newbold (1974) or the computationally more efficient algorithm of Ansley (1979). The same principles as in the previous section apply to the prior for ρ, while, as we saw there, either a noninformative or a conjugate prior can be employed for σ. We would propose a noninformative prior for the autoregressive and moving average parameters of equation (6), following, for example, Box & Jenkins (1970), Monahan (1983), and Marriott & Smith (1992). This seems reasonable in practice, as typically one would expect the analyst to have little genuine prior information about these parameters. Of course, the same priors should be applied to M_0 as to M_1, except that in the former ρ is taken to be one. Finally, we note that the general form of the ARMA likelihood is

p(w | θ, σ) = (2πσ²)^{-n/2} |Ω(θ)|^{-1/2} exp{ -w′ Ω(θ)^{-1} w / (2σ²) },

where σ²Ω(θ) is the covariance matrix of w, so that analytic integration over σ in equation (8) is possible. The numerical integration over the remaining parameters can easily be accomplished using the Bayes integration rules employed by Marriott & Smith (1992).

Conclusion:-
It often occurs in empirical work that an investigator wants to assess the strength of evidence in the data for a particular stationary model compared with a model with a unit autoregressive root. The use of Bayes' theorem to compute posterior odds provides a natural and attractive mechanism for such model comparison. Presumably this issue arises when an investigator believes that an autoregressive root is either large and positive or precisely equal to one. The Bayesian approach allows such prior belief to be incorporated into the analysis. We have demonstrated how the Bayesian calculations can be carried out, noting the importance of the analyst giving careful thought to the question of what might be an appropriate prior. Of course, one possible strategy is to compute posterior model probabilities for a range of priors. This should allow the analyst insight into the impact of the prior, and into the extent to which the data are able to distinguish between, for example, a random walk and a first order autoregressive model whose parameter is drawn from a distribution with substantial mass close to one. Such insight is of course not possible from the testing of a null hypothesis at some arbitrary significance level. As a by-product of our investigation, we noted the superiority of decision rules when the exact likelihood function is employed to calculate posterior probabilities, and we strongly recommend the use of the exact likelihood for this problem, whether a Bayesian or a classical hypothesis testing approach is used.