Circular Complex-Valued GMDH-Type Neural Network for Real-Valued Classification Problems

Recently, applications of complex-valued neural networks (CVNNs) to real-valued classification problems have attracted significant attention. However, most existing CVNNs are black-box models with poor explanation performance. This study extends the real-valued group method of data handling (RGMDH)-type neural network to the complex field and constructs a circular complex-valued group method of data handling (C-CGMDH)-type neural network, which is a white-box model. First, a complex least squares method is proposed for parameter estimation. Second, a new complex-valued symmetric regularity criterion is constructed with a logarithmic function to represent explicitly the magnitude and phase of the actual and predicted complex outputs, to evaluate and select the middle candidate models. Furthermore, the property of this new complex-valued external criterion is proven to be similar to that of the real external criterion. Before training this model, a circular transformation is used to transform the real-valued input features to the complex field. Twenty-five real-valued classification data sets from the UCI Machine Learning Repository are used to conduct the experiments. The results show that both the RGMDH and C-CGMDH models can select the most important features from the complete feature space through a self-organizing modeling process. Compared with RGMDH, the C-CGMDH model converges faster and selects fewer features. Furthermore, its classification performance is statistically significantly better than that of the benchmark complex-valued and real-valued models. Regarding time complexity, the C-CGMDH model is comparable with other models when dealing with data sets that have few features. Finally, we demonstrate that the GMDH-type neural network can be interpretable.


I. INTRODUCTION
COMPLEX numbers are used to express real-world phenomena (such as signal amplitude and phase) and to analyze various mathematical and geometrical relationships. To process complex values directly with artificial neural networks (ANNs), complex-valued neural networks (CVNNs) have been developed [1]-[3]. The states, connection weights, and activation functions of CVNNs are complex-valued. Thus, they not only deal with complex-valued signals but also have the following excellent properties [4]-[6]: 1) the average convergence speed is two or three times faster than that of real-valued neural networks (RVNNs); 2) the number of required hidden parameters is approximately half that of the RVNNs; 3) the orthogonal decision boundaries of the CVNNs help them to solve classification problems more efficiently than their real-valued counterparts; and 4) the CVNNs have a better generalization ability than the RVNNs. In addition, as extensions of RVNNs, the CVNNs have some other advantages, such as strong nonlinear modeling and generalization capabilities.
To apply CVNNs to real-valued classification problems, the real-valued input features must first be transformed into the complex domain; two types of transformations are commonly used.

The first type is a phase-encoded transformation [27]-[36]. First, the real-valued input feature $x^R$ is encoded as the phase $\phi$ of a unit-magnitude complex number. Next, the complex-valued input feature $x$ is obtained by Euler's formula: $e^{i\phi} = \cos\phi + i\sin\phi$. For example, $x = \exp(i\pi x^R)$, where $x^R$ is normalized in $[0, 1]$; then $x$ is bounded and distributed along the unit circle in quadrants I and II of the complex plane [30].
The second type is a nonlinear transformation [37]-[42]. First, the real-valued input feature $x^R$ is mapped to a complex number by a linear function, $cx^R + d$, where $c$ is a random complex number and $d$ is a constant. Next, the final complex-valued input feature $x$ is obtained by a nonlinear function. For instance, $x = \sin(a x^R + i b x^R + \alpha)$, where $x^R$ is normalized in $[0, 1]$, $a, b \in [0, 1]$ are randomly chosen scaling constants, and $\alpha \in [0, 2\pi]$ is used to shift the origin so as to effectively use all four quadrants of the complex plane [39].
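As a concrete illustration, the following minimal NumPy sketch implements the two transformations exactly as described above; the function names and the toy input are ours, not from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def phase_encode(x_r):
    """Phase-encoded transformation [30]: x_r normalized in [0, 1] is mapped
    to exp(i*pi*x_r), a unit-magnitude point in quadrants I and II."""
    return np.exp(1j * np.pi * x_r)

def nonlinear_transform(x_r, a=None, b=None, alpha=None):
    """Nonlinear transformation of [39]: x = sin(a*x_r + i*b*x_r + alpha),
    with a, b in [0, 1] and alpha in [0, 2*pi] chosen at random."""
    a = rng.uniform(0, 1) if a is None else a
    b = rng.uniform(0, 1) if b is None else b
    alpha = rng.uniform(0, 2 * np.pi) if alpha is None else alpha
    return np.sin(a * x_r + 1j * b * x_r + alpha)

x_r = np.linspace(0.0, 1.0, 5)   # a toy normalized feature
print(phase_encode(x_r))         # on the unit circle, Im >= 0
print(nonlinear_transform(x_r))  # can spread over all four quadrants
```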

B. Motivation
Previous work significantly advanced the state of the art for CVNNs and their applications, especially for real-valued classification problems. However, some limitations of the CVNNs remain. First, these models pose difficulties in determining the optimal training parameters [43], such as the number of hidden layers and the number of neurons in each layer. Thus, the model must be tuned repeatedly to find the optimal parameters. Second, they are prone to overfitting the noisy data in the training set [44]. An overfitted neural network model fits the training data perfectly but fails to predict the unseen testing data well. Third, because most ANNs are black-box models [45], so are the CVNNs; these networks are not interpretable, and the explanatory power of their modeling results is therefore poor.
In recent years, deep learning has received widespread attention [46]-[49], and the group method of data handling (GMDH)-type neural network, a heuristic self-organizing modeling technology, is perhaps the first deep learning system [50]. The GMDH-type neural network is a good approach to overcoming the above shortcomings. It was first developed by Ivakhnenko [51] as a method of multivariate analysis for the modeling and identification of complex systems. In GMDH-type neural networks, the training set $T_r$ is randomly divided into two subsets: a model learning set $A$ and a model selecting set $B$. The basic idea of this algorithm is to build a multilayer feedforward network structure. It starts with the input layer, generates new candidate models in each layer by combining two models of the previous layer, and estimates the model parameters on $A$. In particular, the transfer function between the input and output variables can be expressed by the Kolmogorov-Gabor (KG) polynomial [52], [53]. Next, the algorithm uses an external criterion to evaluate and select the middle candidate models on $B$. Finally, a model with optimal complexity is found by the termination principle [54], [55]. Therefore, GMDH-type neural networks can automatically determine the number of layers of the network, the variables that enter the optimal-complexity model, and the model parameters. In summary, GMDH-type neural networks add several advantages to contemporary ANNs [52], [56]-[59]: 1) they have strong antinoise capabilities and generate polynomial equations that are more interpretable than those of any other ANN model (i.e., they are white-box models) and 2) they learn the weights rapidly in a single step by standard ordinary least squares (LS), which eliminates the need to search for their values and guarantees finding locally good weights thanks to the reliability of the fitting technique. Thus, this approach can avoid the limitations of traditional ANNs to some extent. In recent years, the real-valued GMDH (RGMDH)-type neural network has been successfully applied in various fields, such as engineering, science, and economics [60]-[66].
Therefore, if an RGMDH model is extended to the complex field, the resulting complex-valued GMDH-type neural network may outperform the RGMDH and avoid the limitations of the existing CVNNs to some extent. Recently, Xiao et al. [67] proposed a phase-encoded complex-valued GMDH (PE-CGMDH)-type neural network for real-valued classification. The experimental results show that PE-CGMDH outperforms RGMDH and four other CVNN and RVNN models.

C. Our Contributions
In fact, several shortcomings of the PE-CGMDH model remain, as follows.
1) A complex-valued symmetric regularity criterion (CSRC) was constructed, but the properties of the CSRC were not studied, that is, whether its properties are sufficiently similar to those of the real external criterion [68] or whether its global minimum value exists.
2) The model used a simple mean square error deviation between the actual and predicted outputs (i.e., the regularity criterion) as the external criterion in the complex field. This approach has problems with phase approximation, because it explicitly minimizes only the magnitude error.
3) The phase encoding used to transform the real-valued inputs may be ineffective, because it restricts the projection to quadrants I and II of the complex plane.

To overcome the shortcomings of the PE-CGMDH model, a new model, the circular complex-valued GMDH (C-CGMDH)-type neural network, is proposed in this study. First, the parameter estimation method of the RGMDH model is extended to the complex field. Next, a new, reasonable complex-valued external criterion is proposed and studied theoretically to solve the first and second problems. Finally, a circular transformation is adopted to transform the real-valued input features into the complex field so as to effectively use all four quadrants of the complex plane. The experimental results on 25 real-valued UCI classification data sets show that C-CGMDH significantly outperforms PE-CGMDH and RGMDH, as well as other complex-valued and real-valued models.
The novelty of this article can be summarized as follows.
1) This article provides sufficient theoretical study of the properties of the CSRC, which makes the work technically sound and complete, and proposes a new CSRC (NCSRC) with a logarithmic function to overcome the limitations of the CSRC.
2) The circular transformation is introduced for the first time to construct a C-CGMDH model, overcoming the limitations of the phase-encoded transformation.

D. Structure of the Article
This article is organized as follows. Section II describes the model of RGMDH. Section III presents the C-CGMDH model in detail, including the parameter estimation, the construction of the external criterion, and the modeling process. The experiments are shown in Section IV to demonstrate the performance of the C-CGMDH model, including the ablation study, convergence speed, feature selection, classification performance, time complexity, and explanation performance. Finally, the conclusions are summarized in Section V.

II. RGMDH-TYPE NEURAL NETWORK
In the RGMDH model, a system can be represented as a set of neurons; different pairs of neurons in each layer are connected through a polynomial to produce new neurons in the next layer. Let $X^R = (x_1^R, x_2^R, \ldots, x_n^R)$ and $Y^R$ be the real-valued input vector and the actual output, respectively. Given $m$ observations of "multidimensional input, single output" data pairs $(y_j^R, x_{j1}^R, \ldots, x_{jn}^R)$ $(j = 1, 2, \ldots, m)$ in the model learning set $A^R$, we hope to train an RGMDH model to predict the output values $\hat{y}_j^R$. The transfer function between the input and output variables can be expressed by a complicated discrete form of the Volterra functional series

$$y = a_0 + \sum_{i=1}^{n} a_i x_i + \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j + \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} a_{ijk} x_i x_j x_k + \cdots$$

which is also known as the KG polynomial [52], [53].
In particular, the linear transfer function consists of only two variables (i.e., features or neurons). For different values of $p, q \in \{1, 2, \ldots, n\}$, $p \neq q$, we have $x_{jp}^R$, $x_{jq}^R$ in the form

$$g_j = a_0 + a_1 x_{jp}^R + a_2 x_{jq}^R \quad (4)$$

where $a_0, a_1, a_2$ are calculated with LS [52], [53]. Thus, the coefficients of each linear function $g_j$ are obtained to optimally fit the output $y_j^R$ on set $A^R$, that is, to minimize $\sum_{j=1}^{m} (y_j^R - g_j)^2$. In the basic form of the RGMDH model, any two out of the $n$ input variables are combined to construct the regression polynomial in the form of (4). Consequently, $C_n^2 = n(n-1)/2$ candidate models are built in the first layer of the feedforward network from the observations. In other words, it is now possible to construct $m$ data triples $\{(y_j^R, x_{jp}^R, x_{jq}^R)\}$ from the observations for each pair $p, q$. Furthermore, the corresponding matrix equation can be readily obtained as $D\mathbf{a} = Y$, where $Y = (y_1^R, y_2^R, \ldots, y_m^R)^T$ is the vector of observations, $\mathbf{a} = (a_0, a_1, a_2)^T$, and $D$ is the $m \times 3$ design matrix whose $j$th row is $(1, x_{jp}^R, x_{jq}^R)$. The LS leads to the solution of the normal equations

$$\hat{\mathbf{a}} = (D^T D)^{-1} D^T Y$$

which determines the vector of the best coefficients of (4) for the entire set of $m$ data triples. The above procedure is repeated for each neuron of the next hidden layer per the connectivity topology of the network. In each layer, the algorithm uses LS to estimate the parameters of the candidate models on the model learning set $A^R$ and uses an external criterion to evaluate and select the candidate models on the model selecting set $B^R$. The algorithm continues and stops when it finds the optimal model by the termination principle, presented by the theory of optimal complexity [52]. Along with the increase in model complexity, the value of the external criterion first decreases and then increases, and the global extreme value corresponds to the optimal-complexity model [53]. The modeling process of the RGMDH model is shown in Fig. 1.
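To make the layer-construction step concrete, the following is a minimal sketch, assuming real-valued NumPy arrays, of fitting the linear transfer function (4) by LS for every feature pair in the first layer; the helper names are ours.

```python
import itertools
import numpy as np

def fit_pair(y, xp, xq):
    """LS fit of the linear transfer function (4): y ~ a0 + a1*xp + a2*xq."""
    D = np.column_stack([np.ones_like(xp), xp, xq])  # m x 3 design matrix
    a_hat, *_ = np.linalg.lstsq(D, y, rcond=None)    # solves the normal equations
    return a_hat

def first_layer_candidates(XA, yA):
    """Build all C(n, 2) = n(n-1)/2 first-layer candidate models from the
    learning set A; each candidate is the LS fit for one feature pair."""
    models = {}
    for p, q in itertools.combinations(range(XA.shape[1]), 2):
        models[(p, q)] = fit_pair(yA, XA[:, p], XA[:, q])
    return models
```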
III. C-CGMDH-TYPE NEURAL NETWORK

By Occam's razor principle, entities should not be multiplied beyond necessity [69]; thus, complex models are not necessarily superior to simple ones. Hence, in the proposed C-CGMDH model for real-valued classification, the transfer function between the input and output variables is also described by the linear function in (4), as in the RGMDH model. In the C-CGMDH model, parameter estimation and selection of the external criterion are two key steps. Therefore, in this section, we first extend the LS parameter estimation method of the RGMDH model to the complex field. Furthermore, we introduce the construction of the complex-valued external criterion for the C-CGMDH model, including the relation between the complex-valued external criterion and the complexity of the model, and propose a new, reasonable complex-valued external criterion. Finally, we summarize the detailed modeling steps of the C-CGMDH model for real-valued classification.

A. Complex LS Parameter Estimation
In the RGMDH model, all the middle candidate models are trained on the model learning set $A^R$, and the parameters are estimated by LS. In the C-CGMDH model, we need to adopt a parameter estimation method similar to that of the RGMDH model.
To meet this requirement, let $A = (Y_A, x_1, x_2, \ldots, x_n) \in \mathbb{C}^{m \times (n+1)}$ be the model learning set in the complex field, containing $m$ training samples. Suppose $x_p, x_q$ ($p, q \in \{1, 2, \ldots, n\}$, $p \neq q$) are the two initial variables. The transfer function between the inputs $x_p, x_q$ and the output $Y_A$ can be expressed by (4) as

$$Y_A = a_0 + a_1 x_p + a_2 x_q \quad (7)$$

Similarly, the corresponding complex matrix equation can be readily obtained as $D\mathbf{a} = Y_A$, where $D = (l_{m \times 1}, x_p, x_q) \in \mathbb{C}^{m \times 3}$ and $\mathbf{a} = (a_0, a_1, a_2)^T$. Then, by the theory of matrices [70], we can obtain a solution to the complex matrix equation

$$\hat{\mathbf{a}} = (D^* D)^{-1} D^* Y_A \quad (8)$$

where $(\cdot)^*$ and $(\cdot)^{-1}$ denote the conjugate transpose and the inverse of a matrix, respectively. The rank of $D$ mostly equals its column number, because the number of samples is usually much larger than that of the variables; thus, the matrix $D^* D$ is mostly nonsingular. In the rare case that $D^* D$ is singular, we compute the so-called Moore-Penrose pseudoinverse [71] instead. The method mentioned in [67] can also be used.
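A minimal sketch of the complex LS estimate (8), using the conjugate-transpose normal equations and NumPy's pseudoinverse to cover the (rare) singular case; the toy data are ours.

```python
import numpy as np

def complex_ls(xp, xq, yA):
    """Complex LS (8): a_hat = (D* D)^{-1} D* Y_A, with D* the conjugate
    transpose; pinv plays the role of the Moore-Penrose pseudoinverse."""
    D = np.column_stack([np.ones_like(xp), xp, xq])  # m x 3 complex design matrix
    H = D.conj().T @ D                               # H = D* D
    return np.linalg.pinv(H) @ D.conj().T @ yA

# Toy check: recover known complex coefficients from noiseless data.
rng = np.random.default_rng(1)
xp = rng.standard_normal(20) + 1j * rng.standard_normal(20)
xq = rng.standard_normal(20) + 1j * rng.standard_normal(20)
y = 0.5 + (1 - 2j) * xp + (0.3 + 0.7j) * xq
print(complex_ls(xp, xq, y))  # approximately [0.5, 1-2j, 0.3+0.7j]
```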

B. Selection of the External Criterion
When we model real-valued systems, different requirements may be proposed, reflecting the aims of the model or prior knowledge of the systems. In a GMDH-type neural network, an external criterion is a mathematical description of these specific requirements that can select the optimal model among the candidate models according to the theory of optimal complexity [53]. Therefore, when we construct a GMDH-type neural network model, the selection of the external criterion is a key issue [52].
In this study, we select the symmetric regularity criterion (SRC) [53] to measure the performance of the middle candidate models. For the training set $T_r^R$ of a real-valued problem, the SRC has the following form:

$$d^2 = \Delta^2(B^R|A^R) + \Delta^2(A^R|B^R) = \| y_{B^R} - X_{B^R}\hat{a}_{A^R} \|^2 + \| y_{A^R} - X_{A^R}\hat{a}_{B^R} \|^2 \quad (9)$$

where $y_{B^R}$ and $y_{A^R}$ are the actual outputs of the model selecting set $B^R$ and the model learning set $A^R$, $X_{B^R}$ and $X_{A^R}$ are the input matrices of $B^R$ and $A^R$, and $\hat{a}_{A^R}$ and $\hat{a}_{B^R}$ are the coefficient vectors of the models trained on $A^R$ and $B^R$, respectively. Therefore, $\Delta^2(B^R|A^R)$ in (9) indicates the classification error on $B^R$ of the model constructed on $A^R$, and $\Delta^2(A^R|B^R)$ indicates the classification error on $A^R$ of the model constructed on $B^R$. As shown, the SRC uses the information in sets $A^R$ and $B^R$ equally and considers the error of the model on the different parts. Mueller and Lemke demonstrated that the SRC satisfies the theory of optimal complexity [53].
To construct a CSRC, the complete complex-valued training set $T_r$ is equally but randomly partitioned into two subsets: the model learning set $A$ and the model selecting set $B$. Furthermore, let $Y_A$ be the actual output of $A$, let $\hat{Y}_A$ be the output predicted on $A$ by the model trained on $B$, let $Y_B$ be the actual output of $B$, and let $\hat{Y}_B$ be the output predicted on $B$ by the model trained on $A$. Then, we define the CSRC as

$$d^2 = \Delta^2(B|A) + \Delta^2(A|B) = \| Y_B - \hat{Y}_B \|^2 + \| Y_A - \hat{Y}_A \|^2 \quad (10)$$

where $\Delta^2(B|A)$ and $\Delta^2(A|B)$ are the prediction errors on $B$ and $A$, respectively. The next question is whether the CSRC has a good property similar to that of the SRC [53].
To answer this question, we first investigate the relation between the complexity of the model and the former part of the CSRC [i.e., $\Delta^2(B|A)$], which is also called the complex-valued asymmetric regularity criterion (CARC). Hence, consider (3) and (7) again, and suppose that the number of variables for learning the model (i.e., the complexity of the model) is $S - 1$; then the model is of the form

$$Y = D_S a_S$$

where $D_S = (l_{m \times 1}, x_1, \ldots, x_{S-1}) \in \mathbb{C}^{m \times S}$, $l_{m \times 1}$ is the all-ones column vector, and $a_S = (a_0, a_1, \ldots, a_{S-1})^T \in \mathbb{C}^{S \times 1}$. Then, the rank of $D_S$ is predominantly $S$, because the number of samples is usually much larger than that of the variables.
In the CARC, the data set $A$ is used to learn the model first. The estimates of the coefficients are obtained by (8) as

$$\hat{a}_{AS} = H_{AS}^{-1} D_{AS}^* Y_A$$

where $H_{AS} = D_{AS}^* D_{AS}$. Therefore, the estimate of the output on a set $G$ is

$$\hat{Y}_G = D_{GS} \hat{a}_{AS}$$

where $D_{GS}$ is the design matrix of $G$ at complexity $S$. Next, the property of the CARC in the mean sense (i.e., the mathematical expectation) is analyzed in this study to discuss the relation between the CARC and the complexity. A natural approach is to check how the CARC changes as the complexity increases.
Typically, data contain noise. Hence, we have the following propositions about CARC.
Proposition 1: The expected value of the CARC is related to the model structure and the noise. The proof is given in the supplementary material.
Theorem 1: If U is an n × m complex matrix and V is an m × n complex matrix, then tr(U V ) = tr(V U).
Proof: According to the definition of the trace, we have

$$\mathrm{tr}(UV) = \sum_{i=1}^{n}\sum_{j=1}^{m} u_{ij} v_{ji} = \sum_{j=1}^{m}\sum_{i=1}^{n} v_{ji} u_{ij} = \mathrm{tr}(VU).$$

Here, we first analyze the special situation $G = A$ and then extend the analysis to $G = B$.
Proposition 2: If $G = A$, then as the complexity of the model increases, there is a minimum expected value of $\Delta^2(G|A)$: it first monotonically decreases and then monotonically increases. The proof is given in the supplementary material.
Proposition 3: If $G = B$, then as the complexity of the model increases, there is a minimum expected value of $\Delta^2(B|A)$: it first decreases, although the decrease may not be monotonic, and then monotonically increases. The proof is given in the supplementary material.
Because (10) is symmetric, a similar conclusion holds for the latter part of the CSRC, $\Delta^2(A|B)$, as the complexity $S$ increases. Thus, we can obtain Proposition 4.
Proposition 4: The property of the CSRC is similar to that of the SRC. The proof is given in the supplementary material.
In summary, the CSRC satisfies the theory of optimal complexity [52] and can be used as an external criterion for the CGMDH-type neural network.
However, the CSRC is limited: it explicitly minimizes only the magnitude error. For a better phase approximation, we improve the CSRC in this study. To minimize both the magnitude and phase errors simultaneously, we use the logarithmic function to represent explicitly the magnitude and phase of the actual and predicted complex outputs

$$\Delta^2(B|A) = \sum_{j=1}^{m} \left| \ln y_{Bj} - \ln \hat{y}_{Bj} \right|^2 \quad (14)$$

where $\ln(\cdot)$ represents the natural logarithmic function and $m$ is the sample number of $B$. Because $\ln z = \ln|z| + i \arg z$, each term decomposes as $(\ln|y_{Bj}| - \ln|\hat{y}_{Bj}|)^2 + (\arg y_{Bj} - \arg \hat{y}_{Bj})^2$, so (14) penalizes the magnitude and phase errors explicitly and simultaneously.

Proposition 5: The improved CARC [see (14)] satisfies Propositions 1 and 3 of the CARC. The proof is given in the supplementary material.
Therefore, we construct an NCSRC as follows:

$$d^2 = \Delta^2(B|A) + \Delta^2(A|B) = \sum_{j=1}^{m} \left| \ln y_{Bj} - \ln \hat{y}_{Bj} \right|^2 + \sum_{j=1}^{m} \left| \ln y_{Aj} - \ln \hat{y}_{Aj} \right|^2 \quad (15)$$

In addition, to facilitate the comparison of experimental results, we first divide the NCSRC by the sample number of $A \cup B$ (i.e., a constant) and then take the square root of the result. Thus, the final form of the NCSRC used in this study is

$$\mathrm{NCSRC} = \left[ \frac{1}{2m} \left( \sum_{j=1}^{m} \left| \ln y_{Bj} - \ln \hat{y}_{Bj} \right|^2 + \sum_{j=1}^{m} \left| \ln y_{Aj} - \ln \hat{y}_{Aj} \right|^2 \right) \right]^{1/2} \quad (16)$$

Importantly, by elementary calculus, (16) does not affect the monotonicity of the NCSRC with respect to the complexity of the model.
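The criterion is straightforward to compute with the principal branch of the complex logarithm; the following is a minimal sketch under the normalization assumed in (16), with helper names of our own.

```python
import numpy as np

def log_error(y_true, y_pred):
    """Sum of squared complex-log errors; each term |ln y - ln y_hat|^2
    decomposes into a squared log-magnitude error plus a squared phase error."""
    d = np.log(y_true) - np.log(y_pred)  # principal branch of the complex log
    return np.sum(np.abs(d) ** 2)

def ncsrc(yA, yA_hat, yB, yB_hat):
    """Final NCSRC form (16): symmetric log-error over A and B, divided by
    the sample number of A ∪ B and square-rooted. yA_hat is predicted by the
    model trained on B, and yB_hat by the model trained on A."""
    total = log_error(yB, yB_hat) + log_error(yA, yA_hat)
    return np.sqrt(total / (len(yA) + len(yB)))
```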

C. Modeling Process
Suppose that a real-valued classification data set $D = \{y^R, x_1^R, x_2^R, \ldots, x_n^R\} \in \mathbb{R}^{n+1}$ includes two parts: a training set $T_r^R$ and a testing set $T_e^R$, with $m + m'$ and $u$ samples, respectively. Then, the process of the C-CGMDH model for real-valued classification proposed in this study is shown in Fig. 2.
In many real-world classification problems, the data sets are usually imbalanced: some classes have many more instances than others [72]. The level of imbalance (i.e., the ratio of the majority class to the minority class) can be as large as $10^6$ [73]. Class imbalance has a serious impact on model performance. To address this issue, scholars have proposed several methods [74], mostly resampling techniques and cost-sensitive learning. Among those, resampling techniques are used more often, the most common being random undersampling and random oversampling. Marqués et al. [75] compared logistic regression and the support vector machine (SVM) with and without resampling on five real customer classification data sets and demonstrated superior performance with oversampling compared with undersampling. Hence, in this study, we first adopt random oversampling to balance the class distribution of the training data set $T_r^R$. In general, training a C-CGMDH model for real-valued classification requires the class labels to be coded in the complex field and the real-valued input features to be mapped onto the complex space, $\mathbb{R}^m \to \mathbb{C}^m$. As for the class labels, taking the training set $T_r^R$ of a binary classification problem with labels 1 and $-1$ as an example, the coded class label $y_j$ in the complex field is given by

$$y_j = y_j^R + y_j^R i \quad (17)$$

where $y_j^R$ $(j = 1, 2, \ldots, m + m')$ is the real-valued actual output. That is, in the complex field, label 1 and label $-1$ become $L_1 = 1 + 1i$ and $L_2 = -1 - 1i$, respectively.
For each real-valued input feature $x_{jk}^R$ $(j = 1, 2, \ldots, m + m'$, $k = 1, 2, \ldots, n)$ of $T_r^R$, we adopt the circular transformation in this study, because it performs a one-to-one mapping of the real-valued inputs to the complex field and effectively uses all four quadrants of the complex plane, thereby overcoming the issues of the existing transformations [39]. Let $x_{jk}^R$ be normalized in $[0, 1]$; then the complex-valued input feature $x_{jk}$ is obtained by the circular transformation

$$x_{jk} = \sin(a x_{jk}^R + i b x_{jk}^R + \alpha_k) \quad (18)$$

where $a, b \in [0, 1]$ and $\alpha_k \in [0, 2\pi]$ are all randomly chosen. Next, we train the C-CGMDH model with the complex-valued inputs and outputs until the optimal-complexity model $y_{opt}$ is found, and then classify the testing set with $y_{opt}$. The prediction output of the model is a complex number; thus, the final prediction result (i.e., the real-valued class label) is the label whose code $L_i$ minimizes

$$d(\hat{y}_j, L_i) = \| \ln \hat{y}_j - \ln L_i \|^2, \quad j = 1, 2, \ldots, u \quad (19)$$

where $\hat{y}_j$ is the output predicted by the model $y_{opt}$.
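A minimal sketch of the preprocessing and decoding steps (17)-(19); the sine form of (18) follows the description above, and the helper names are ours.

```python
import numpy as np

rng = np.random.default_rng(2)
L = {1: 1 + 1j, -1: -1 - 1j}  # coded binary class labels, per (17)

def circular_transform(XR, a=None, b=None, alpha=None):
    """Circular transformation (18) of features XR normalized in [0, 1]:
    x = sin(a*xR + i*b*xR + alpha_k), with one random alpha_k per feature."""
    n = XR.shape[1]
    a = rng.uniform(0, 1) if a is None else a
    b = rng.uniform(0, 1) if b is None else b
    alpha = rng.uniform(0, 2 * np.pi, n) if alpha is None else alpha
    return np.sin(a * XR + 1j * b * XR + alpha)

def decode(y_hat):
    """Decision rule (19): assign each prediction to the class whose coded
    label L_i is closest under d(y, L_i) = |ln y - ln L_i|^2."""
    labels = np.array(list(L.keys()))
    codes = np.array(list(L.values()))
    d = np.abs(np.log(y_hat[:, None]) - np.log(codes[None, :])) ** 2
    return labels[np.argmin(d, axis=1)]
```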
Finally, the detailed steps of the C-CGMDH model for real-valued classification are as follows.

1) Data Preprocessing:
a) Balance the class distribution of the real-valued training set $T_r^R$ by random oversampling.
b) Transform the real-valued training set $T_r^R$ and testing set $T_e^R$ into the complex-valued sets $T_r$ and $T_e$ by (17) and (18).

2) Train the CGMDH-Type Neural Network (a compact sketch of this loop is given after the complete list of steps):
a) Equally divide the complex-valued training set $T_r$ into the model learning set $A$ and the model selecting set $B$ at random, and let the $n$ features of $A$ be the base models of the initial layer.
b) Set the layer $L = 0$, $F_0 = n$, and the smallest external criterion value $V = g$ ($g$ is a large positive number, such as $g = 10^6$).
c) Combine every two models of layer $L$ by (7) to generate $n_{L+1} = C_{F_L}^2$ middle candidate models of layer $L + 1$ and estimate their parameters by (8) on $A$.
d) Compute the NCSRC values of all candidate models by (16) on $B$ and sort them in ascending order.
e) Let $V_{min}$ be the smallest external criterion value of layer $L + 1$; if $V_{min} \geq V$, then STOP and return the optimal-complexity model $y_{opt}$ with the smallest external criterion value in layer $L$; else CONTINUE.
f) Select the $F_{L+1}$ $(\leq F_L)$ models with the smallest external criterion values to enter the next layer.
g) Repeat steps c)-f) with $L = L + 1$ and $V = V_{min}$.

3) Classify the Testing Set:
a) Classify the complex-valued testing set $T_e$ with the optimal-complexity model $y_{opt}$ to obtain the complex-valued classification result.
b) Obtain the real-valued classification result by (19).
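The following compact sketch strings the training steps together for a single binary output, reusing complex_ls and log_error from the earlier sketches; a full implementation would also record the symbolic composition of the winning model for interpretability.

```python
import itertools
import numpy as np

def train_cgmdh(XA, yA, XB, yB, F_max=8):
    """Self-organizing loop of steps 2a)-2g): combine surviving models
    pairwise, fit by complex LS (8) on A, score by the NCSRC-style log error
    on B, and stop when the best score of a new layer no longer improves."""
    cols_A, cols_B = list(XA.T), list(XB.T)  # layer-0 "models" = raw features
    V, best = np.inf, None
    while len(cols_A) >= 2:
        scored = []
        for p, q in itertools.combinations(range(len(cols_A)), 2):
            a = complex_ls(cols_A[p], cols_A[q], yA)            # step 2c)
            fA = a[0] + a[1] * cols_A[p] + a[2] * cols_A[q]
            fB = a[0] + a[1] * cols_B[p] + a[2] * cols_B[q]
            scored.append((log_error(yB, fB), fA, fB))          # step 2d)
        scored.sort(key=lambda t: t[0])
        if scored[0][0] >= V:                                   # step 2e): stop
            return best
        V, best = scored[0][0], scored[0]
        keep = scored[:F_max]                                   # step 2f)
        cols_A = [t[1] for t in keep]                           # step 2g)
        cols_B = [t[2] for t in keep]
    return best
```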

IV. EXPERIMENTS
We demonstrated the classification performance of the C-CGMDH model on a set of binary and multiple-category classification data sets. First, we performed an ablation study to analyze the influence of the external criterion NCSRC and the circular transformation on the performance of the C-CGMDH model. Second, we analyzed the convergence speed and feature selection performance of the C-CGMDH, PE-CGMDH, and RGMDH models. Next, we compared the classification performance of the C-CGMDH model with that of existing models, including four CVNNs: PE-CGMDH [67], the fully complex-valued relaxation network (FCRN) [39], and the fully complex-valued fast learning classifier (FC-FLC) with AF$_1$ and AF$_2$ [41]; and three real-valued classification models: RGMDH [54], SVM, and the real-valued multilayer perceptron (RMLP). Then, we analyzed the time complexity of the above eight models. Finally, we analyzed the explanation performance of the three GMDH-type neural networks on one data set.

A. Data Sets and Experimental Setup
We used 25 classification data sets from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets.php). Table I summarizes the characteristics of the data sets used in this study. Among them, there are 17 binary classification problems and eight multiple-category ones.
Because the class distributions of most data sets are imbalanced, we defined the imbalance factor (IF) [73] as $\mathrm{IF} = \max_{j=1,2,\ldots,M} N_j / \min_{j=1,2,\ldots,M} N_j$, where $M$ is the number of classes in the data set and $N_j$ is the total number of samples belonging to class $j$. Then, $\mathrm{IF} \geq 1$, and a larger value of IF means a more imbalanced class distribution; if $\mathrm{IF} = 1$, the class distribution is balanced. The last column of Table I gives the IF of each data set. The data sets "Seeds" and "Iris" are balanced, whereas the data set "Zoo" is the most imbalanced (its IF value is 10.25).
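The IF is a one-line computation over the class counts; a small sketch with our own helper name, assuming nonnegative integer class labels:

```python
import numpy as np

def imbalance_factor(labels):
    """IF = max_j N_j / min_j N_j over the M classes; IF = 1 means balanced."""
    counts = np.bincount(labels)  # assumes nonnegative integer class labels
    counts = counts[counts > 0]   # drop labels that do not occur
    return counts.max() / counts.min()

# Any split with max/min = 41/4 gives IF = 10.25, the value reported for "Zoo".
print(imbalance_factor(np.array([0] * 41 + [1] * 4)))
```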
For the parameters of the FCRN, FC-FLC(AF$_1$), FC-FLC(AF$_2$), SVM, and RMLP models, we used the values that gave the best performance in repeated experiments. The external criteria of the RGMDH, PE-CGMDH, and C-CGMDH models were the SRC, CSRC, and NCSRC, respectively. Except for the C-CGMDH and PE-CGMDH models, the models did not consider the impact of class imbalance on performance; to ensure a fair comparison, we therefore balanced the class distribution of the training data set using the random oversampling technique before training the other six classification models. Furthermore, we adopted fivefold cross-validation in our experiments to evaluate the classification performance of the different models. All experiments were performed on the MATLAB 2016a platform with a dual-processor, 2.40-GHz Core i7 Windows 10 PC. In each case, the reported result is the average of ten experiments. We implemented the SVM with the radial basis function kernel and the RMLP with two hidden layers. Finally, for the C-CGMDH and RGMDH models, a one-versus-one approach was used to deal with the multiple-category classification problems, as in the SVM model.
To evaluate the classification performance of each model, we used the following performance measures.

1) Total accuracy ($T_{ac}$):

$$T_{ac} = \frac{u_c}{u} \times 100\%$$

where $u$ is the number of total samples in the testing set and $u_c$ is the number of correctly classified samples.

2) Average accuracy ($A_{ac}$):

$$A_{ac} = \frac{1}{M} \sum_{j=1}^{M} \frac{u_{cj}}{u_j} \times 100\%$$

where $u_j$ is the number of samples with class label $j$ in the testing set and $u_{cj}$ is the number of correctly classified samples with class label $j$.
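Both measures are simple to compute; a minimal sketch with our own helper names:

```python
import numpy as np

def total_accuracy(y_true, y_pred):
    """T_ac = u_c / u * 100%: the fraction of correctly classified samples."""
    return 100.0 * np.mean(y_true == y_pred)

def average_accuracy(y_true, y_pred):
    """A_ac: per-class accuracies u_cj / u_j averaged over the M classes,
    weighting minority and majority classes equally."""
    classes = np.unique(y_true)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return 100.0 * np.mean(per_class)
```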

B. Influence of the External Criterion NCSRC and Circular Transformation on C-CGMDH Performance
To analyze the influence of the external criterion NCSRC and the circular transformation on the performance of the C-CGMDH model, we performed an ablation study comparing it with the following three models: 1) PE-CGMDH; 2) PE-CGMDH$_1$, which is similar to the PE-CGMDH model but uses the circular transformation; and 3) PE-CGMDH$_2$, which is also similar to the PE-CGMDH model but uses the NCSRC. Table II shows the $T_{ac}$ of the four GMDH-type models on the 25 data sets, and the last row shows the average $T_{ac}$. For each data set, boldface marks the highest $T_{ac}$.
To analyze whether the differences in $T_{ac}$ among the above four models are statistically significant, we used the nonparametric Wilcoxon signed-rank test [76] to perform pairwise tests; a minimal sketch of this test is given after the conclusions below. The null hypothesis is that the two compared classification models have equivalent performance. In this study, we let $R^+$ be the sum of ranks for the data sets on which the former model outperforms the latter, and $R^-$ be the sum of ranks for the opposite case; the results are reported in Table III. In each row, if $T = \min(R^+, R^-)$ is less than or equal to the corresponding critical value (CV), we reject the null hypothesis; that is, there is a significant difference between the two models. Furthermore, $T = R^- \leq 89$ means that the former model significantly outperforms the latter, and $T = R^+ \leq 89$ the opposite. From Table III, we can obtain the following conclusions.
1) C-CGMDH significantly outperforms the PE-CGMDH$_1$, PE-CGMDH$_2$, and PE-CGMDH models, which indicates that the C-CGMDH model, which considers the circular transformation and the NCSRC simultaneously, achieves the best performance; the model proposed in this study is therefore reasonable and effective.
2) PE-CGMDH$_1$ significantly outperforms PE-CGMDH, whereas the performance of the PE-CGMDH$_2$ model is significantly worse than that of the PE-CGMDH model. This indicates that the circular transformation can significantly improve the performance of the PE-CGMDH model, possibly because it uses all four quadrants of the complex plane effectively. In contrast, the NCSRC alone degrades the performance of the PE-CGMDH model, possibly because it does not conform to the class label prediction method of the PE-CGMDH model, especially in the multiple-category classification problems. We can obtain similar conclusions for $A_{ac}$.
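For reference, the pairwise test described above can be reproduced with SciPy, whose wilcoxon statistic equals min(R+, R-) under the default two-sided alternative; the accuracy vectors are placeholders.

```python
from scipy.stats import wilcoxon

def compare_models(t_ac_a, t_ac_b, alpha=0.05):
    """Pairwise Wilcoxon signed-rank test on per-data-set accuracies; the
    null hypothesis is that the two models perform equivalently."""
    stat, p = wilcoxon(t_ac_a, t_ac_b)  # stat = T = min(R+, R-)
    return stat, p, p < alpha           # reject the null if p < alpha
```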

C. Convergence Speed of C-CGMDH, PE-CGMDH, and RGMDH Models
Because the convergence speed of GMDH-type neural networks is usually related to the number of features, we selected data sets with fewer features (such as "Monk3" and "MAGIC") and with more features (such as "Australia-c" and "Ionosphere") to comprehensively analyze the convergence speed of the C-CGMDH, PE-CGMDH, and RGMDH models. In addition, to facilitate the comparison of experimental results, we applied the same normalization as in (16) to the external criteria of the PE-CGMDH and RGMDH models.
The learning curves of the three models in one experiment on the four selected data sets are shown in Fig. 3. Taking "Monk3" as an example [see Fig. 3(a)], the vertical axis denotes the external criterion value, and the horizontal axis denotes the number of layers of the GMDH-type neural network. According to the modeling mechanism, the maximum number of layers is the number of features of the data set; hence, the possible maximum number of layers is six for "Monk3." The absolute external criterion values of the three models are not directly comparable, because the criteria differ; what matters is where the external criterion value reaches its minimum, which corresponds to the optimal-complexity model. From Fig. 3(a), the optimal-complexity model of the RGMDH and PE-CGMDH models is found in the second layer, whereas that of the C-CGMDH is found in the first layer, meaning that the external criterion value of the C-CGMDH does not have a decreasing phase on this data set; this is a special case of the theory of optimal complexity. Similar conclusions can be reached for the three other data sets. In short, the convergence speed of the C-CGMDH model is faster than that of the RGMDH and PE-CGMDH models.
Considering that conclusions based on only one experiment may be biased, we repeated the experiment ten times and found interesting results. On the data set "Australia-c," the three models can be ranked according to convergence speed, from fast to slow, as C-CGMDH, PE-CGMDH, and RGMDH; the optimal-complexity model of C-CGMDH was mostly found in the first or second layer. Overall, the probability that the convergence speed of the C-CGMDH model was the fastest was approximately 0.9. Therefore, according to our experiments, the convergence speed of the C-CGMDH model proposed in this study was the fastest.

D. Feature Selection Performance of C-CGMDH, PE-CGMDH, and RGMDH Models
Feature selection is related to the convergence speed: a faster convergence speed typically means fewer selected features. In Section IV-C, we compared the convergence speed of the different GMDH-type neural networks. In this section, we further analyze the feature selection performance of the three models.
We experimented on the four data sets from Section IV-C. On each data set, for each model, we performed feature selection ten times and counted the number of selected features; the results for one representative data set are shown in Table IV. Because it would be inconvenient to compare the results of ten selections for each model directly, the last column of Table IV gives the mode of the ten experiments for each model. In most cases, the RGMDH model selected 9 out of 15 features from this data set, the PE-CGMDH model selected six, and the C-CGMDH (the proposed) model selected only three. Table V shows the mode of the number of features selected by each GMDH-type neural network in ten experiments on the selected data sets. On the four data sets, the number of features selected by the C-CGMDH model was typically smaller than that of the PE-CGMDH and RGMDH models.

E. Comparison of Classification Performance
Table VI reports the $T_{ac}$ of the eight classification models on the 25 data sets, with the average $T_{ac}$ value of each model in the last row. From this table, it can be observed that the average $T_{ac}$ value of the C-CGMDH model is the largest. On "Monk3," compared with the RGMDH model, the improvement of the PE-CGMDH model is approximately 12.45%, whereas that of the C-CGMDH model is approximately 16.25%.

To find significant differences among the results obtained by the eight models, a statistical analysis is necessary. Thus, to check whether there are significant differences in the classification performance of the eight models, we employed the Friedman test [77], [78] and the Iman-Davenport test [79]. If there are statistically significant differences in performance, we can proceed with the post hoc Nemenyi test [80].
As shown in Table VI, we first conducted the Friedman test to establish the statistical significance of the C-CGMDH model. The performance rank of the different models was calculated for each data set: for every data set, we ordered the accuracies from largest to smallest with ranks $1, 2, \ldots, 8$. If the accuracies of two or more models were the same, we assigned them the average of the corresponding ranks. For example, on the second data set, "Australia-c," the $T_{ac}$ of five models was 85.51, the largest, so their rank values are all $(1+2+3+4+5)/5 = 3$. The average ranks of $T_{ac}$ of the eight models over the 25 data sets are shown in the last row of Table VI; a smaller average rank means better model performance. The null hypothesis states that all models are equivalent and, thus, that their ranks must be equal. The Friedman statistic is 74.54. A better statistic, derived by Iman and Davenport and following the F-distribution, is 17.81. The modified statistic follows the F-distribution with 7 and 168 degrees of freedom, and the CV for rejecting the null hypothesis is 2.06. Because the modified Friedman statistic is greater than the CV (17.81 > 2.06), we can reject the null hypothesis and infer that the models used in this study are not equivalent.
After the null hypothesis was rejected, we applied the Nemenyi test. This test considers the performance of two models significantly different if their average ranks differ by at least the critical difference (CD). With eight models and a significance level of 0.05, the CV is 3.03, and $CD = 3.03\sqrt{8 \times 9/(6 \times 25)} = 2.10$. Fig. 4 shows the test results, where the models connected by a line segment are not significantly different. Thus, from Fig. 4, we can conclude that the $T_{ac}$ of the C-CGMDH model is statistically significantly better than that of the others.
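The statistics above follow the standard formulas; a minimal sketch (our helper names) that reproduces the reported CD:

```python
import numpy as np

def friedman_iman_davenport(avg_ranks, N):
    """Friedman chi-square for k models on N data sets, given each model's
    average rank, plus the Iman-Davenport F correction."""
    k = len(avg_ranks)
    R = np.asarray(avg_ranks, dtype=float)
    chi2 = 12 * N / (k * (k + 1)) * (np.sum(R**2) - k * (k + 1) ** 2 / 4)
    ff = (N - 1) * chi2 / (N * (k - 1) - chi2)  # follows F(k-1, (k-1)(N-1))
    return chi2, ff

def nemenyi_cd(q_alpha, k, N):
    """Nemenyi critical difference: CD = q_alpha * sqrt(k*(k+1)/(6*N))."""
    return q_alpha * np.sqrt(k * (k + 1) / (6 * N))

# With k = 8 models, N = 25 data sets, and q_0.05 = 3.03 as in the text:
print(nemenyi_cd(3.03, 8, 25))  # ~ 2.10
```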
Similarly, the average accuracies among the classes ($A_{ac}$) and the corresponding ranks of the eight models on the 25 data sets are summarized in Table VII; the last row shows the average $A_{ac}$ value and the average rank of each model. The average $A_{ac}$ value and the average rank of the C-CGMDH model remain the best. Furthermore, using the Friedman test and the post hoc Nemenyi test, we can also infer that the performance of the C-CGMDH model is statistically significantly better than that of the others in terms of $A_{ac}$.

F. Comparison of Time Complexity
To compare the time complexity of the eight models, we compared their computation times under the same conditions.
Table VIII presents the average computation time for each model on every data set. As in Tables VI and VII, the ranks are marked with superscripts. Furthermore, considering the modeling characteristics of the GMDH-type neural networks, Table VIII is divided into three parts. The last row of each part shows the average computation time and the average rank of different models. From Table VIII, we obtain the following conclusions.
1) When the number of features in the data set is greater than ten (the first part of Table VIII), the eight models can be ranked according to their average rank, from low to high, as FC-FLC(AF$_1$), SVM, FCRN, RMLP, FC-FLC(AF$_2$), RGMDH, PE-CGMDH, and C-CGMDH. The computation time of the three GMDH-type neural networks is not ideal, owing perhaps to their modeling mechanism: each new candidate model in a layer is a combination of two models of the previous layer, so the time complexity is high if the classification problem has many features.
2) When the number of features in the data set is between 7 and 10 (the second part of Table VIII), the ranking from low to high is FCRN, FC-FLC(AF$_1$), PE-CGMDH, RGMDH, SVM, RMLP, C-CGMDH, and FC-FLC(AF$_2$).
3) When the number of features in the data set is fewer than seven (the third part of Table VIII), the ranking from low to high is FCRN, RGMDH, PE-CGMDH, C-CGMDH, FC-FLC(AF$_1$), SVM, FC-FLC(AF$_2$), and RMLP. The three GMDH-type neural networks are slower only than the FCRN model. Regarding time complexity, the C-CGMDH model is thus comparable with the others when dealing with data sets having few features.

G. Explanation Performance
In this section, we present the final models of the three GMDH-type neural networks on the same data set to show their explanatory power.
For convenience, we take the data set "Monk3" as an example. Its IF is 1.08, which does not require balancing of the training set and, thus, excludes the impact of the randomness of resampling on the experimental results. In addition, there is a substantial difference among the three models on this data set, as shown in Tables VI and VII. This data set contains 554 samples and two classes. For reproducibility of the results, the data set was divided in its original order: we selected the first 222 samples from each class to constitute the training set, and the remaining 110 samples constituted the testing set. Furthermore, the training set was equally divided into a model learning set and a model selecting set. Finally, the modeling results of the three GMDH-type neural networks are shown as follows.

1) The final form of the RGMDH model:

$$y = 0.2547 + 0.3350 x_2 + 0.0427 x_4 + 0.1889 x_5$$

$T_{ac} = 76.36\%$, $A_{ac} = 75.00\%$.
2) The final form of the PE-CGMDH model.

3) The final form of the C-CGMDH model.

H. Discussion
From our experiments, we reach the following conclusions.
1) Compared with the PE-CGMDH and RGMDH models, the C-CGMDH model converges faster and selects fewer features, and its classification performance is significantly better. Combined with the results of the ablation study, the improvement over the PE-CGMDH model is quite necessary: the C-CGMDH model effectively overcomes the disadvantages of the PE-CGMDH model through the circular transformation and the new external criterion NCSRC together, and the contribution of the circular transformation is greater than that of the NCSRC in the C-CGMDH model.
2) Compared with the three real-valued models, RGMDH, SVM, and RMLP, the classification performance of C-CGMDH is statistically significantly better in both $T_{ac}$ and $A_{ac}$. This is, perhaps, owing to the orthogonal decision boundaries of the C-CGMDH model, which help it to classify more efficiently than the real-valued models [5], [39].
3) The C-CGMDH model has significantly better $T_{ac}$ and $A_{ac}$ than the three other CVNNs: FCRN, FC-FLC(AF$_1$), and FC-FLC(AF$_2$). This result seems reasonable: the three CVNNs require the optimum model parameters to be determined, which is very difficult, whereas the C-CGMDH model is a heuristic self-organizing modeling technology that can automatically determine the number of layers of the network, the variables entering the optimal-complexity model, and the model parameters. Meanwhile, the convergence speed of the C-CGMDH model is fast, and its time complexity is comparable with that of the others when dealing with data sets having few features, whereas it is not ideal when the number of features is relatively large, perhaps owing to its modeling mechanism. In fact, the three GMDH-type neural networks have the same order of magnitude of time complexity, because they use the same modeling mechanism; however, the time complexity of the PE-CGMDH model is lower than that of the RGMDH and C-CGMDH models on some data sets, mainly, perhaps, because it does not use the one-versus-one strategy for multiple-category classification problems.
In addition, compared with other CVNNs, the C-CGMDH model has some advantages. First, once the external criterion and the transfer function are selected, it can complete the entire modeling process by self-organizing modeling, including selecting the most important features from the complete feature space. Second, it is a white-box model that yields an interpretable expression; thus, its explanation performance is stronger than that of most other CVNNs. However, it also has some disadvantages. If the real-valued classification problem has many features, then the number of initial models participating in the combination is large, and the number of middle candidate models in each layer is also relatively large; thus, the time complexity may be high. The model is therefore best suited to classification problems with relatively low dimensions.
Finally, compared with the RGMDH model, the C-CGMDH model has the following differences.
1) This study extends the GMDH method from the real field to the complex field and constructs a complex-valued classification model, C-CGMDH, for real-valued classification problems.
2) To improve the classification performance of the C-CGMDH model, a new complex-valued external criterion, the NCSRC, is proposed and proved to satisfy the theory of optimal complexity, which indicates its effectiveness and rationality.
3) Its convergence speed is faster.
4) It selects a smaller set of important features.
5) Its classification performance is significantly better.

V. CONCLUSION
To overcome the shortcomings of existing CVNNs, this study presented a C-CGMDH model. First, we proposed a complex LS method for parameter estimation. Next, a new complex-valued external criterion, the NCSRC, was constructed with a logarithmic function to represent explicitly the magnitude and phase of the actual and predicted complex outputs and to evaluate and select the middle candidate models; its property was proven to be similar to that of the real external criterion. Finally, before training the model, a circular transformation was used for data preprocessing to transform the real-valued input features into complex-valued input features. Experiments on 25 UCI data sets showed that our improvements were quite necessary. Among the three GMDH-type neural networks, the C-CGMDH model converged the fastest and selected the fewest features. Furthermore, its classification performance was statistically significantly better than that of the four complex-valued and three real-valued models in terms of both $T_{ac}$ and $A_{ac}$. The time complexity of the C-CGMDH model was comparable with that of the other models when dealing with data sets that have few features. In addition, it is a white-box model that yields an interpretable expression.
The C-CGMDH model proposed in this study uses a first-order linear transfer function, and its classification performance remains poor on some data sets. For example, on the data set "Teaching-a," as shown in Table VI, its $T_{ac}$ is 2.65% higher than the best of the other seven models, but this value is still below 60%. In fact, for data sets with a complex structure, nonlinear transfer functions may achieve better classification performance. Therefore, to improve the classification performance on these data sets, we should adopt more complex transfer functions in the future, such as the second-order and third-order KG polynomials. In addition, owing to the unique modeling mechanism of GMDH-type neural networks, each middle candidate model in the first layer is a combination of two input features; thus, the time complexity may be very high for data sets with many features. To address this issue, a possible approach is to calculate the correlation coefficients between all pairs of features and rank the pairs in ascending order; in theory, combining pairs of features with lower correlation may achieve better performance. Therefore, if we select only the top-ranking feature pairs to construct the model, we can expect to reduce the time complexity without affecting the model's performance. We plan to work on this in the future.
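As a starting point, the pre-screening we have in mind could look like the following sketch; the helper name is hypothetical, and the kept fraction is chosen arbitrarily.

```python
import itertools
import numpy as np

def top_feature_pairs(X, fraction=0.3):
    """Rank all feature pairs by the absolute correlation between the two
    features (ascending, so weakly correlated pairs come first) and keep
    only the leading fraction for combination in the first layer."""
    n = X.shape[1]
    corr = np.corrcoef(X, rowvar=False)
    pairs = sorted(itertools.combinations(range(n), 2),
                   key=lambda pq: abs(corr[pq[0], pq[1]]))
    return pairs[:max(1, int(fraction * len(pairs)))]
```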