Shrinkage Parameters for Each Explanatory Variable Found Via Particle Swarm Optimization in Ridge Regression

Ridge regression is an improved method for regression analysis when the assumption of independence among the explanatory variables cannot be met, a situation known as the multicollinearity problem. One way to eliminate the multicollinearity problem is to give up the unbiasedness of $\hat{\beta}$. Ridge regression estimates the regression coefficients with bias in order to decrease their variance. One of the most important problems in ridge regression is deciding the value of the shrinkage parameter (k). In almost all studies in the literature, k was found as a single value. In this study, unlike those studies, we find different k values corresponding to each diagonal element of the variance-covariance matrix of $\hat{\beta}$, instead of a single value of k, by using a new algorithm based on particle swarm optimization. To evaluate its performance, the proposed method is first applied to real-life data sets and compared with other methods suggested in the ridge regression literature. Finally, two different simulation studies are performed, and the performance of the proposed method under different conditions is evaluated against other methods suggested in the ridge regression literature.
Research Article
Eren Bas1*, Erol Egrioglu1 and Vedide Rezan Uslu2
1Department of Statistics, Faculty of Arts and Science, Forecast Research Laboratory, Giresun University, Giresun, 28200, Turkey
2Department of Statistics, Faculty of Arts and Science, University of Ondokuz Mayis, Samsun, 55139, Turkey
Dates: Received: 02 December, 2016; Accepted: 30 December, 2016; Published: 31 December, 2016
*Corresponding author: Eren Bas, Giresun University, Faculty of Arts and Science, Department of Statistics, Gure Campus, Giresun, Turkey, Tel: +90 454 3101400; Fax: +90 454 3101477; Email:


Introduction
Multiple regression analysis examines the functional relation between a dependent variable and more than one independent variable. Its purpose is to build the best model for predicting the dependent variable from the independent variables. The most common method for this purpose is ordinary least squares (OLS) estimation, in which the model parameters are chosen to minimize the sum of squared errors.
Multiple regression analysis rests on several assumptions: there is no multicollinearity among the independent variables, the variance of the error term is constant for all independent variables, and the covariance between the error term and the independent variables is zero.
One of the major problems in multiple regression analysis is multicollinearity, which arises when there is an exact or high-degree linear relationship among the independent variables. Multicollinearity has important effects on the OLS estimates of the regression coefficients: the estimates have large variance, the coefficients can be estimated incorrectly, and their standard errors can be inflated. Incorrectly estimated coefficients in turn lead to incorrect statistical conclusions. Ridge regression has therefore been suggested to overcome the multicollinearity problem and to obtain stable coefficient estimates.
In the literature, it is commonly accepted that variance inflation factor (VIF) values greater than 10 indicate a multicollinearity problem. This is a rule of thumb rather than an exact criterion. Similarly, the condition number can be used to detect multicollinearity through a rule of thumb. Multicollinearity can thus be diagnosed by using such criteria.
The two measures most commonly used to assess the effects of multicollinearity are the VIF and the condition number, given in Equations 1 and 2. In Equation 2, $\lambda$ denotes the eigenvalues of $X'X$. The relationship between the condition number and multicollinearity is given in Table 1.
In summary, multicollinearity can be diagnosed by two rules of thumb. The first is that VIF values greater than 10 indicate high multicollinearity. The second is to check the condition number against Table 1.
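As an illustration of the two diagnostics, the following is a minimal sketch rather than the authors' code. It assumes the VIF values are taken as the diagonal of the inverse correlation matrix and the condition number as the ratio of the largest to the smallest eigenvalue of the correlation matrix; some authors use the square root of that ratio instead. The data and the function names `vif_values` and `condition_number` are hypothetical.

```python
import numpy as np

def vif_values(X):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

def condition_number(X):
    """lambda_max / lambda_min of the correlation matrix (assumed definition)."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
    return eigvals.max() / eigvals.min()

# Hypothetical data: x2 is almost a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(200)
x2 = x1 + 0.1 * rng.standard_normal(200)
x3 = rng.standard_normal(200)
X = np.column_stack([x1, x2, x3])

vif = vif_values(X)
cn = condition_number(X)
print("VIF:", np.round(vif, 2))
print("Condition number:", round(float(cn), 2))
```

With data of this kind, the two nearly collinear columns show VIF values well above 10 while the unrelated column stays close to 1.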
Another problem in ridge regression is finding the optimal value of the biasing parameter (k). This k value is a very small constant determined by the researcher [1], and several methods have been proposed in the literature for finding it.
Many other methods for ridge regression are also available in the literature [23][24][25][26][27][28][29]. In addition, [30] proposed new methods that take care of the skewed eigenvalues of the matrix of explanatory variables, [31] proposed an iterative approach to minimize the mean squared error in ridge regression, [32] proposed new ridge parameters for ridge regression, [33] proposed an optimal estimation for the ridge regression parameter, and [34,35] proposed new estimators for the ridge parameter.
In almost all of these studies, k was found as a single value. In this study, by contrast, we find different k values corresponding to each diagonal element of the variance-covariance matrix of $\hat{\beta}$, instead of a single value of k, by using a new algorithm based on particle swarm optimization.
The rest of the paper is organized as follows. Section 2 describes ridge regression. The methodology is given in Section 3, and the implementation of the proposed method in Section 4. Two different simulation studies are performed in Section 5, and finally, discussions are presented in Section 6.

Ridge regression
Ridge regression is a remedy used in the presence of multicollinearity and was first proposed by [1]. It has two important advantages over OLS: it addresses the multicollinearity problem, and it decreases the mean square error (MSE). The solution technique of ridge regression is similar to that of OLS; the difference between them is the k value. This k value, also called the biasing parameter or shrinkage parameter, takes values between 0 and 1. It is added to the diagonal elements of the correlation matrix, and thus biased regression coefficients are obtained.
The OLS and ridge estimates of the regression coefficients are shown in Equations 3 and 4, respectively.
As noted above, ridge regression is a biased regression method. This is shown in Equation 5.
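For reference, the standard textbook forms of the OLS and ridge estimators, and the expectation that exhibits the bias, can be written as below. The notation ($X$, $Y$, $I$, $\hat{\beta}_{OLS}$, $\hat{\beta}_{R}$) and the model $Y = X\beta + \varepsilon$ with $E(\varepsilon) = 0$ are assumptions made here, and the layout may differ from the original Equations 3 to 5.

```latex
\begin{align}
\hat{\beta}_{OLS} &= (X'X)^{-1} X'Y \\
\hat{\beta}_{R}   &= (X'X + kI)^{-1} X'Y, \qquad k > 0 \\
E(\hat{\beta}_{R}) &= (X'X + kI)^{-1} X'X \beta
                    = \beta - k (X'X + kI)^{-1} \beta \neq \beta
\end{align}
```

The last line follows from writing $X'X = (X'X + kI) - kI$, so the bias term $-k (X'X + kI)^{-1} \beta$ vanishes only when $k = 0$.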
It is clearly seen that the ridge estimates of the regression coefficients, $\hat{\beta}_R$, are biased. One of the most important points to consider in ridge regression is the k value. Many methods have been proposed in the literature to find the optimal k value; the ridge trace is one of them.
Ridge trace is a plot of the elements of the ridge estimator versus k usually in the interval (0, 1) [1].
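A ridge trace can be computed as in the following minimal sketch. It assumes the explanatory variables are centered and scaled to unit length, so that the cross-product matrix is the correlation matrix; the data and the function name `ridge_trace` are hypothetical.

```python
import numpy as np

def ridge_trace(X, y, ks):
    """Ridge coefficients for each k in ks; result has shape (len(ks), p)."""
    Xc = X - X.mean(axis=0)
    Z = Xc / np.sqrt((Xc ** 2).sum(axis=0))   # unit-length columns, Z'Z = correlation matrix
    yc = y - y.mean()
    R = Z.T @ Z
    p = R.shape[0]
    return np.array([np.linalg.solve(R + k * np.eye(p), Z.T @ yc) for k in ks])

# Hypothetical collinear data: the second column nearly duplicates the first.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(50)
X = np.column_stack([x1,
                     x1 + 0.05 * rng.standard_normal(50),
                     rng.standard_normal(50)])
y = X.sum(axis=1) + 0.1 * rng.standard_normal(50)

ks = np.linspace(0.0, 1.0, 11)
trace = ridge_trace(X, y, ks)
for k, coefs in zip(ks, trace):
    print(f"k = {k:.1f}  coefficients = {np.round(coefs, 3)}")
```

Plotting each column of `trace` against `ks` gives the ridge trace; a value of k is usually read off where the coefficient paths stabilize.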
The other methods used in the literature to find the optimal k value are given in Equations 6-14.
In this paper, for the purpose of comparing results, we consider only the methods briefly introduced below.
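[2] suggested another method for finding the k value, which is
$$k = \frac{p\,\hat{\sigma}^2}{\hat{\beta}'\hat{\beta}} \qquad (15)$$
In this Equation, $\hat{\sigma}^2$ and $\hat{\beta}$ are the OLS estimates and p is the number of explanatory variables. This method is called the fixed point ridge regression method (FPRRM).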
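The fixed point rule can be sketched as follows, assuming it takes the commonly cited form $k = p\hat{\sigma}^2 / \hat{\beta}'\hat{\beta}$ with OLS estimates computed on centered, unit-length explanatory variables. The scaling choice, the data, and the function name `fixed_point_k` are assumptions made for illustration.

```python
import numpy as np

def fixed_point_k(X, y):
    """k = p * sigma2_hat / (beta_hat' beta_hat), using OLS estimates."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Z = Xc / np.sqrt((Xc ** 2).sum(axis=0))        # unit-length columns
    yc = y - y.mean()
    beta_hat, *_ = np.linalg.lstsq(Z, yc, rcond=None)
    resid = yc - Z @ beta_hat
    sigma2_hat = resid @ resid / (n - p - 1)       # residual variance estimate
    return p * sigma2_hat / (beta_hat @ beta_hat)

# Hypothetical collinear data.
rng = np.random.default_rng(1)
x1 = rng.standard_normal(100)
X = np.column_stack([x1,
                     x1 + 0.1 * rng.standard_normal(100),
                     rng.standard_normal(100)])
y = X.sum(axis=1) + rng.standard_normal(100)

k_fp = fixed_point_k(X, y)
print("fixed point k:", k_fp)
```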
[39] introduced an iterative method for finding the optimal k value, referred to below as IRRM, in which k is calculated by Equation 16. The generalized ridge regression estimator of Hoerl and Kennard [1,40] is given in [41] by the following Equations 17-20.
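Let $\Lambda$ and $Q$ be the matrices of eigenvalues and eigenvectors of $X'X$. In the orthogonal version of the classical linear regression model,
$$Y = Z\alpha + \varepsilon, \qquad Z = XQ, \qquad \alpha = Q'\beta \qquad (17)$$
$$\hat{\alpha}_{GR} = (\Lambda + K)^{-1} Z'Y, \qquad K = \mathrm{diag}(k_1, k_2, \ldots, k_p) \qquad (18)$$
is the generalized ridge estimator of $\alpha$. Hoerl and Kennard [1,40] have shown that the values of $k_i$ which minimize the MSE of the regression coefficients are given by
$$k_i = \frac{\sigma^2}{\alpha_i^2} \qquad (19)$$
and estimates of the $k_i$ values can be obtained by using
$$\hat{k}_i = \frac{\hat{\sigma}^2}{\hat{\alpha}_i^2} \qquad (20)$$
In [41], other estimation formulas for the optimum shrinkage parameters are also given.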
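The generalized ridge estimator can be sketched in code as below. This is a minimal illustration assuming the canonical form $Z = XQ$, $\alpha = Q'\beta$ and the commonly cited rule $\hat{k}_i = \hat{\sigma}^2 / \hat{\alpha}_i^2$, applied to centered, unit-length explanatory variables. The function name `generalized_ridge` and the data are hypothetical.

```python
import numpy as np

def generalized_ridge(X, y):
    """Return (k, beta_gr): one k_i per canonical direction and the back-transformed estimate."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Xs = Xc / np.sqrt((Xc ** 2).sum(axis=0))   # unit-length columns
    yc = y - y.mean()
    lam, Q = np.linalg.eigh(Xs.T @ Xs)         # eigenvalues and eigenvectors of X'X
    Z = Xs @ Q                                 # canonical regressors, Z'Z = diag(lam)
    zty = Z.T @ yc
    alpha_ols = zty / lam                      # OLS in canonical form
    resid = yc - Z @ alpha_ols
    sigma2_hat = resid @ resid / (n - p - 1)
    k = sigma2_hat / alpha_ols ** 2            # assumed rule: k_i = sigma2_hat / alpha_i^2
    alpha_gr = zty / (lam + k)                 # (Lambda + K)^{-1} Z'y
    return k, Q @ alpha_gr

# Hypothetical collinear data.
rng = np.random.default_rng(2)
x1 = rng.standard_normal(100)
X = np.column_stack([x1,
                     x1 + 0.1 * rng.standard_normal(100),
                     rng.standard_normal(100)])
y = X.sum(axis=1) + rng.standard_normal(100)

k_vec, beta_gr = generalized_ridge(X, y)
print("k_i:", k_vec)
print("generalized ridge coefficients:", beta_gr)
```

Note that these $k_i$ attach to the canonical (eigenvector) directions rather than to the original explanatory variables.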

Methodology
Finding the optimal k value is an important problem in ridge regression. The k values recommended in the literature were given in the previous section. Heuristic methods such as genetic algorithms have also been proposed to find the optimal k value [18,21], and [22] found the k value by using particle swarm optimization (PSO). In all these methods, k was found as a single value. In this study, by contrast, we find different k values corresponding to each explanatory variable, instead of a single value of k, by using an algorithm based on particle swarm optimization. This paper is thus an improved form of the study of [22].
The objective function was created by considering the mean absolute percentage error (MAPE) criterion and the VIF values at the same time. Its aim is to find the optimal k values such that the VIF values are less than 10 and the SSE (sum of squared errors) is minimal. We also add a parameter, which can be called the penalty parameter, to the second part of the objective function. If the VIF value corresponding to any explanatory variable is greater than 10, which is an undesirable result, the penalty parameter increases the value of the objective function.
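One way such a penalized objective could look is sketched below. This is an assumption-laden illustration, not the objective of Equation 21: it takes the error part to be MAPE, the ridge VIF values to be the diagonal of $(R+K)^{-1} R (R+K)^{-1}$ with $K = \mathrm{diag}(k_1, \ldots, k_p)$ and $R$ the correlation matrix, and the penalty to grow with the amount by which any VIF exceeds 10. The names `penalized_objective`, `penalty` and `vif_limit` are hypothetical.

```python
import numpy as np

def standardize(X):
    """Center columns and scale them to unit length."""
    Xc = X - X.mean(axis=0)
    return Xc / np.sqrt((Xc ** 2).sum(axis=0))

def ridge_vif(R, k):
    """Ridge VIFs for a vector k: diagonal of (R+K)^-1 R (R+K)^-1."""
    A = np.linalg.inv(R + np.diag(k))
    return np.diag(A @ R @ A)

def penalized_objective(k, X, y, penalty=100.0, vif_limit=10.0):
    """MAPE plus a penalty on VIF values above vif_limit (assumed form)."""
    Z = standardize(X)
    R = Z.T @ Z
    beta = np.linalg.solve(R + np.diag(k), Z.T @ (y - y.mean()))
    y_hat = y.mean() + Z @ beta
    mape = 100.0 * np.mean(np.abs((y - y_hat) / y))
    excess = np.maximum(ridge_vif(R, k) - vif_limit, 0.0).sum()
    return mape + penalty * excess

# Hypothetical collinear data; y is shifted away from zero so MAPE is well defined.
rng = np.random.default_rng(3)
x1 = rng.standard_normal(100)
X = np.column_stack([x1,
                     x1 + 0.1 * rng.standard_normal(100),
                     rng.standard_normal(100)])
y = 20.0 + X.sum(axis=1) + 0.1 * rng.standard_normal(100)

obj_no_shrink = penalized_objective(np.zeros(3), X, y)
obj_shrink = penalized_objective(np.full(3, 0.1), X, y)
print("objective at k = 0:  ", obj_no_shrink)
print("objective at k = 0.1:", obj_shrink)
```

With no shrinkage the VIF values of the collinear columns exceed the limit and the penalty dominates, so a small positive k gives a much lower objective value.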
The optimization problem in the proposed method is given in Equation 21.
(p denotes the number of explanatory variables.) The optimization problem defined in (21) was solved by using PSO. PSO is a popular artificial intelligence technique and was first proposed by [42]. The algorithm of the proposed method is given below.

Algorithm
Step 1. The parameters such as pn, $c_1$, $c_2$, etc., are determined. These parameters are as follows:
pn: number of particles in the swarm
Step 3. The fitness function is defined as in (21), and the fitness values of the particles are calculated.
Step 4. The Pbest and Gbest particles given in (24) and (25) are determined. Pbest is constructed from the best results obtained in the related positions at iteration t. Gbest is the best result in the swarm at iteration t.
Step 5. New velocities and positions of the particles are calculated by using Equations (26) and (27),
where $rand_1$ and $rand_2$ are random numbers generated from U(0,1).
Step 6. Steps 3 to 5 are repeated while t < maxt.
Step 7. The optimal $(k_1, k_2, \ldots, k_p)$ values are obtained as Gbest.
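The steps above can be sketched as a generic PSO loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the velocity update uses the common inertia-weight form, positions are clipped to [0, 1], and the inertia weight `w`, the initial ranges, and the default parameter values are all assumptions. The names `pn`, `c1`, `c2` and `maxt` mirror the text; `pso_minimize` and the test function are hypothetical.

```python
import numpy as np

def pso_minimize(fitness, p, pn=30, c1=1.5, c2=1.5, w=0.7, maxt=200, seed=0):
    """Minimize fitness over [0, 1]^p and return (Gbest position, Gbest value)."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(0.0, 1.0, size=(pn, p))      # initial positions
    vel = rng.uniform(-0.1, 0.1, size=(pn, p))     # initial velocities
    pbest = pos.copy()
    pbest_val = np.array([fitness(x) for x in pos])
    g = pbest_val.argmin()
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    for t in range(maxt):
        rand1 = rng.uniform(size=(pn, p))
        rand2 = rng.uniform(size=(pn, p))
        # Velocity and position updates (assumed inertia-weight form).
        vel = w * vel + c1 * rand1 * (pbest - pos) + c2 * rand2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
        vals = np.array([fitness(x) for x in pos])
        # Update each particle's personal best, then the swarm's global best.
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = pbest_val.argmin()
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val

# Hypothetical test function with a known minimizer inside the unit cube.
target = np.array([0.2, 0.5, 0.8])
best_k, best_val = pso_minimize(lambda k: float(np.sum((k - target) ** 2)), p=3)
print("Gbest:", best_k, "fitness:", best_val)
```

In the setting described above, `fitness` would be an objective of the k vector, such as a penalized error measure, and `p` the number of explanatory variables.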

Implementation
The proposed algorithm was applied to two different, well-known data sets in order to investigate its performance. These two data sets are named "Import Data" and "Longley Data". Import Data was analyzed by [43]. The results obtained for these data sets are given in Tables 2 and 3, respectively.
As can be seen from Table 2, the proposed method has the minimum SSE and MAPE values. Moreover, there is no multicollinearity problem when "Import Data" is solved by the proposed method. In contrast, a multicollinearity problem remains when "Import Data" is solved by the FPRRM and IRRM methods, because the VIF values of these methods are greater than 10. Even when other methods give small SSE and MAPE values, they still do not solve the multicollinearity problem, because some of their VIF values are clearly greater than 10.
As can be seen from Table 3, the proposed method has the minimum MAPE value when compared with the other methods. However, its SSE value is not the smallest: the SSE value of OLS is smaller than that of the proposed method. Nevertheless, OLS clearly suffers from multicollinearity when "Longley Data" is solved by it, whereas the proposed method does not.
As a result, finding k values for each explanatory variable gives better results than finding a single k value, and the proposed method has no multicollinearity problem.

Simulation study
Two different simulation studies are performed in this section in order to show the performance of the proposed method at different levels of multicollinearity and of the standard deviation of the error term, and its superiority over other methods.

The First Simulation Study:
In this simulation study, the proposed method was compared with the ridge regression methods given in [2,22,39]. The number of observations (n) was taken as 100, 500 and 1000, and the standard deviation of the error term ($\sigma$) was taken as 0.01 and 1, so comparisons were made for a total of 6 cases. For each case, 1000 data sets exhibiting multicollinearity were generated.
The first three independent variables were generated from the standard normal distribution as given in Equation 28.
The last two independent variables were generated by using Equation 29. In this way, multicollinearity is induced in the data set through a high correlation between the independent variables $X_1$ and $X_4$, and between $X_1$ and $X_5$.
The observations of the dependent variable were obtained by using Equation 30, so that all the coefficients in the regression model are taken as 1.
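A data-generating process of this kind can be sketched as follows. The exact forms of Equations 28 to 30 are not available here, so this sketch assumes that $X_4$ and $X_5$ are each $X_1$ plus a small independent normal perturbation, and that the model includes an intercept equal to 1 as well as unit slopes. The function name `simulate` and the parameter `noise_sd` are hypothetical.

```python
import numpy as np

def simulate(n, sigma, noise_sd=0.05, seed=0):
    """Generate one data set with X4 and X5 highly correlated with X1."""
    rng = np.random.default_rng(seed)
    X123 = rng.standard_normal((n, 3))                    # X1, X2, X3 ~ N(0, 1)
    x4 = X123[:, 0] + noise_sd * rng.standard_normal(n)   # assumed form for X4
    x5 = X123[:, 0] + noise_sd * rng.standard_normal(n)   # assumed form for X5
    X = np.column_stack([X123, x4, x5])
    # All coefficients equal to 1 (the intercept of 1 is an assumption).
    y = 1.0 + X.sum(axis=1) + sigma * rng.standard_normal(n)
    return X, y

X, y = simulate(n=100, sigma=0.01)
corr = np.corrcoef(X, rowvar=False)
print("corr(X1, X4) =", round(float(corr[0, 3]), 4))
print("corr(X1, X5) =", round(float(corr[0, 4]), 4))
```

Repeating the call with different seeds for each combination of n and $\sigma$ gives the replicated data sets for each case.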
The most important indicator for the comparison of the methods is that the VIF and condition number (CN) values should be small. The methods of [2] and [39] do not guarantee a solution to the multicollinearity problem, as seen in the numerical examples, whereas the method of [22] and the proposed method guarantee that all VIF values are smaller than 10. Therefore, it is suitable to compare the proposed method with [22].
In this simulation study, different levels of the standard deviation of the error term are also employed. The results clearly show that when the standard deviation of the error term is 1 or greater, the model deviates strongly from the linear regression model: MAPE values of about 60 are obtained, which is not suitable. The tables of the second simulation study also clearly show that the prediction performance of the proposed method is affected quite negatively when the standard deviation of the error term is increased.

Discussion
Building a model in multiple regression analysis rests on several assumptions. One of them is that there should be no multicollinearity among the independent variables.
Ridge regression is often used in the literature when there is multicollinearity among the independent variables.
However, ridge regression also has some problems. One of the most important is deciding the value of the shrinkage parameter (k). In future studies, different artificial intelligence optimization techniques can be used to find these k values for each explanatory variable.