System Diagnosis of Coronary Heart Disease Using a Combination of Dimensional Reduction and Data Mining Techniques: A Review

Coronary heart disease is the disease with the highest mortality rate in the world. This makes the development of diagnostic systems a very interesting topic in the field of biomedical informatics, aiming to detect whether a heart is normal or abnormal. The literature contains diagnostic system models that combine dimension reduction and data mining techniques; unfortunately, no review paper to date discusses and analyzes this theme. This study reviews articles from the period 2009-2016, with a focus on dimension reduction methods and data mining techniques validated on datasets from the UCI repository. The dimension reduction methods comprise feature selection and feature extraction techniques, while the data mining techniques include classification, prediction, clustering, and association rules.

This paper is composed of several sections. After the introduction, the second section gives a brief overview of data mining and dimension reduction methods and compares the diagnostic performance reported across articles, with reference to the dimension reduction algorithms and data mining techniques used. The third section discusses the development of the research, while the fourth section draws conclusions.

Research Method
Most coronary heart disease diagnosis systems are based on a combination of data mining techniques and dimension reduction. Data mining is often also called knowledge discovery in databases (KDD). KDD is a computer-assisted process to explore and analyze large datasets and to extract useful information and new knowledge [7]. KDD can be decomposed into a step-by-step process: data selection, data cleaning, data reduction, data mining, and interpretation/evaluation [8]. The data mining techniques considered in this study are classification, clustering [9], prediction [10], and association rules [9]. Classification algorithms can be grouped into two approaches, namely the black-box and non-black-box approaches [11].
Dimension reduction is the process of reducing high-dimensional data to a low dimension with minimal loss of information [8]. The dimension of a dataset is the number of attributes that represent objects in the dataset; low-dimensional data tend to be qualitatively easier to handle than high-dimensional data. Dimension reduction can be grouped into two approaches, namely feature selection and feature extraction. Feature selection is a process that aims to select the subset of data features best suited to infer almost complete knowledge about the problem represented by those data. Feature selection is usually divided into three types: embedded, filtering, and wrapper. Feature extraction, in contrast, transforms high-dimensional data into a low-dimensional space through the adoption of some mapping [12]; it can also create new features based on transformations and combinations of the original features [13]. Feature extraction can be grouped into two categories, namely linear and non-linear.
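To make the distinction concrete, the following minimal sketch (assuming scikit-learn; the random data merely stand in for a 13-attribute heart-disease dataset such as Cleveland) contrasts the two approaches: selection keeps a subset of the original attributes, while extraction builds new features from combinations of them.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 13))      # placeholder for 13 clinical attributes
y = rng.integers(0, 2, size=300)    # placeholder normal/abnormal labels

# Feature selection: keep a subset of the ORIGINAL attributes.
selector = SelectKBest(score_func=mutual_info_classif, k=7)
X_selected = selector.fit_transform(X, y)     # shape (300, 7)

# Feature extraction: build NEW features as combinations of the originals.
extractor = PCA(n_components=7)
X_extracted = extractor.fit_transform(X)      # shape (300, 7)
```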

Feature Selection Embedded Type
The embedded type of feature selection utilizes machine learning in the selection process: a feature is removed if the learning algorithm judges it to have little influence. This selection process is similar to a diagnosis system that uses classification algorithms with the non-black-box approach. The results of the review of diagnosis systems using the embedded feature selection type are shown in Table 1 and summarized as follows.

C4.5 and bagging ensemble: the study compares the performance of systems built with C4.5 and with a bagging ensemble, both validated using 10-fold cross validation. The bagging ensemble provides improved performance compared to using C4.5 alone.

C4.5 and fuzzy decision tree (FDT): the test results show that the average performance of C4.5 is better than that of FDT, but FDT is superior in terms of the number of reduced attributes. Fewer attributes make the resulting FDT tree smaller, so decisions are reached faster.

[16] Decision tree (DT): a study comparing a number of classification algorithms, one of which is a decision tree. The results show that the decision tree is better than the logistic regression algorithm but is outperformed by the neural network.

[17] C4.5, CART, RIPPER, and fuzzy inference system (FIS): a fuzzy-based diagnosis system in which rules are extracted using C4.5, CART, and RIPPER and then transformed into fuzzy rules. The membership functions of the fuzzy rules are optimized with the imperialist competitive algorithm (ICA). The resulting performance indicates that rule extraction with C4.5 works best. Unfortunately, the study does not report which attributes were reduced.

[18] Fuzzy inference system (FIS) with weighted fuzzy rules (WFR): the system is divided into two phases, namely building the weighted fuzzy rules and building a fuzzy rule-based decision support system. The WFR are built using mining techniques on the selected attributes, while decision making is done with the FIS. The test results give accuracies of 62.3% on the Cleveland dataset, 46.93% on Hungarian, and 51.22% on Switzerland. Unfortunately, the number of reduced attributes is not reported.

[19] Decision tree with fuzzy inference system (FIS): the method comprises several steps, namely applying mining techniques, selecting attributes, and building a tree with a decision tree algorithm. The resulting tree rules are then converted into fuzzy rules, and decision making is done with an FIS model. The test results show an accuracy of 68.35% on the Cleveland dataset and 49.9% on Hungarian, an improvement over past studies. Again, the number of reduced attributes is not reported.

[20] Combination of k-means, Weighted Associative Classifier (WAC), and C5.0: the system uses the k-means algorithm to cluster the data, WAC to produce a number of rules for classification, and C5.0 for the classification process itself. The use of k-means and WAC improves the performance of C5.0, but the study does not indicate the number of attributes produced by the dimension reduction process.

[21] Fuzzy inference system (FIS) and C4.5 with a tiered approach: the model uses a three-level tiered approach, consisting of prediction using risk factors, diagnosis with reference to symptoms and electrocardiography (ECG), and specialist examination. At levels 2 and 3, rules are extracted with C4.5 and transformed into fuzzy rules, while level 1 refers to the Framingham risk score. Dimension reduction occurs at each level; the advantage of tiered models is the possibility of dynamic dimension reduction. When a diagnosis at a given level is negative, no further examination is required, which also reduces the cost of inspection.
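As a concrete illustration of the embedded idea, the following minimal sketch (assuming scikit-learn; its DecisionTreeClassifier is a CART-style learner rather than the original C4.5, and the data are random placeholders) lets the classifier's own importance scores decide which attributes survive:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 13))      # placeholder for 13 clinical attributes
y = rng.integers(0, 2, size=300)    # placeholder normal/abnormal labels

# The learner itself scores the attributes during training;
# attributes with below-average importance are dropped.
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
embedded = SelectFromModel(tree, threshold="mean")
X_reduced = embedded.fit_transform(X, y)
kept = embedded.get_support(indices=True)   # indices of retained attributes
```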

Feature Selection through Filtering
In the filtering type, the selection process works by assigning a score to each attribute and then selecting attributes based on the assigned score. Filtering has several advantages: the process is fast, scalable, independent of the classification algorithm, and reduces computational complexity [22]. Its weakness is the lack of interaction with the classification algorithm. Examples of filtering algorithms are information gain [23], genetic search [12], correlation-based feature selection [12], and rough sets [8]. The reviewed articles on coronary heart disease diagnosis systems that use this type are divided into two groups: diagnosis systems with binary output (normal and abnormal) and with multiclass output (the abnormal output is divided into 4 types/levels). The binary approach precedes the learning process with a conversion from multiclass to binary class, whereas the multiclass approach is performed without any conversion. Diagnosis systems with binary output are shown in Table 2, while the multiclass systems are reported in Table 3. Selected entries are summarized as follows.

Fuzzy logic with DMS-PSO: the combined use of fuzzy logic with DMS-PSO provides a more effective diagnosis system and improved accuracy. In addition, the system performs better than a number of other studies.

[35] Correlation coefficient method, R-squared method, and weighted least squares method, with a fuzzy inference system whose weighted fuzzy rules are generated using a genetic algorithm (GA): the combination provides a more effective diagnosis system and improved accuracy. In addition, the system is able to reduce attributes.

[36] Relief approach integrated with the rough set method (RFRS), with an ensemble classifier based on the C4.5 algorithm: dimension reduction is done in two stages, first with the Relief algorithm and then with a rough-set heuristic algorithm. It obtains 7 attributes, with an accuracy of 92.59%.

[37] Logistic regression with a tiered approach, with a multilayer perceptron neural network trained by gradient descent: the tiered approach can remove two complex attributes, namely scintigraphy and fluoroscopy. The proposed MLP-based system performs better, especially without these two expensive attributes.

Another entry reports reducing 3 of 19 attributes while providing improved accuracy and true positive rate, with an average accuracy of 90%, whether feature selection is used or not.
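The score-then-rank workflow of the filtering type can be sketched as follows; this is an illustrative example only, approximating information gain with scikit-learn's mutual information estimator on placeholder data:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 13))      # placeholder attribute matrix
y = rng.integers(0, 2, size=300)    # placeholder class labels

# Score each attribute independently of any classifier...
scores = mutual_info_classif(X, y, random_state=0)   # one score per attribute
# ...then rank and keep the top-scoring subset.
ranking = np.argsort(scores)[::-1]                   # best attribute first
top7 = ranking[:7]
X_filtered = X[:, top7]
```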

Feature Selection Based on Wrappers
The wrapper type of feature selection works by selecting features concurrently with the modeling: the selection criterion is the classification rate of the classification method being used. Wrapping has several advantages: it is simple, interacts with the classification algorithm, depends on the classification algorithm, and provides good classification accuracy. Its disadvantage is that it requires intensive computation [22]. Commonly used wrapper algorithms are the genetic algorithm (GA) [12, 44-46], artificial bee colony [47], and particle swarm optimization (PSO) [48]. The reviewed articles that combine wrapper feature selection with data mining techniques for the diagnosis of coronary heart disease can be grouped into two categories, according to whether or not they consider the cost of inspecting each attribute. The results of this review are shown in Table 4 and Table 5, respectively, and summarized as follows.

GA with Naive Bayesian, support vector machine, C4.5, and multilayer perceptron (MLP): the reduction process produces 7 of 13 attributes, and the best accuracy is given by the combination of GA and Naive Bayesian.

[45] GA with a multiobjective function, with a back-propagation neural network (NN): the combination of GA with NN produces 8 attributes, with an accuracy of 89.58%.

[46] Genetic algorithm with fuzzy inference system: it produces 7 attributes and obtains 86% accuracy under 10-fold cross validation.

[47] Artificial bee colony (ABC) with support vector machine: feature selection results in 7 attributes and an accuracy of 86.76%.

[48] CFS and PSO with k-means and multiple classifiers: feature selection results in 7 attributes, and diagnosis accuracy is improved by combining with k-means, with classification performed by several algorithms such as MLR, MLP, FURIA, and C4.5. Under 10-fold cross validation, the model combining CFS, PSO, k-means, and MLP obtains an 11.4% improvement compared to using all 13 attributes.

[49] Association rule with multi-layer perceptron (MLP): the combination of association rules and MLP provides an accuracy of 84.97%, better than without association rules. The reduction process produces 8 attributes from 13.

[50] GA with a cost-based fitness function: the use of the expensive attributes, namely scintigraphy and fluoroscopy, is reduced, while accuracy remains comparable to using those costly attributes, at 85.18% with the Naive Bayesian algorithm.

[51] PSO using a fitness function that takes the cost of inspection into account, with a feed-forward neural network (FFNN): dimension reduction yields 8 attributes, including the expensive attributes scintigraphy and fluoroscopy. The resulting performance is better than that of a number of previous studies.
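The reviewed systems wrap GA, ABC, or PSO around the classifier; the sketch below illustrates the same wrapper principle with a simpler greedy search (scikit-learn's SequentialFeatureSelector), in which each candidate attribute subset is scored by the cross-validated accuracy of the classifier it wraps. The data and parameters are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import SequentialFeatureSelector

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 13))      # placeholder attribute matrix
y = rng.integers(0, 2, size=300)    # placeholder class labels

svm = SVC(kernel="rbf")
# Each candidate subset is evaluated by actually training the classifier,
# which is what distinguishes wrappers from filters.
wrapper = SequentialFeatureSelector(
    svm, n_features_to_select=7, scoring="accuracy", cv=10
)
X_wrapped = wrapper.fit_transform(X, y)
```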

Feature Extraction
Feature extraction transforms high-dimensional data into a low-dimensional space through the adoption of some mapping [12]. It can be classified into two categories, depending on the type of transformation used: linear and non-linear. Linear feature extraction algorithms include principal component analysis (PCA) [12] and linear discriminant analysis (LDA) [12], while the non-linear category includes locally linear embedding and ISOMAP [8]. ISOMAP is a non-linear extension of multidimensional scaling [8]. The algorithm most often used in the case of coronary heart disease is PCA.
The authors of [52] proposed a diagnosis model combining PCA with SVM. PCA transforms the data into a new domain and yields the most influential principal components; the rotated principal components are then mapped back to the original attributes. The PCA step produces 8 attributes. Classification with these 8 attributes and an SVM with a linear kernel achieves an accuracy of 84.1%; with a radial basis function (RBF) kernel, the accuracy increases to 88.6%.
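A hedged reconstruction of that pipeline is sketched below: reduce to 8 principal components, then classify with a linear or RBF-kernel SVM. The scaling step, hyperparameters, and random data are assumptions for illustration, not the exact setup of [52]:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 13))      # placeholder attribute matrix
y = rng.integers(0, 2, size=300)    # placeholder class labels

for kernel in ("linear", "rbf"):    # [52] reports RBF beating linear
    model = make_pipeline(StandardScaler(), PCA(n_components=8),
                          SVC(kernel=kernel))
    acc = cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()
    print(kernel, round(acc, 3))
```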
PCA is also used in the research of [53], which combines PCA with an adaptive neuro-fuzzy inference system (ANFIS). Dimension reduction with PCA yields 7 attributes from the original 13, and the test demonstrates an accuracy of 93.2%. The authors of [54] also use PCA, combined with an ANN, for the diagnosis of coronary heart disease. Their study generates 10 principal components after transformation; the ANN classifier obtains its highest accuracy (86.1%) when using 5 principal components. Unfortunately, the study does not specify how many attributes the use of the 5 principal components reduces.
Feature extraction is also widely approached with the Fisher discriminant ratio, also known as LDA. The authors of [55] proposed a hybrid diagnosis system consisting of several stages. First, the attributes are ranked using LDA. Second, the top-ranking attributes are used as the initial population of a genetic algorithm, which provides an optimal set of attributes. Third, a modified k-NN is applied to improve the performance of the back-propagation neural network. The test results show that the system performs better than a number of classification algorithms, such as ANFIS, ID3, NB, and radial basis function.
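The first stage of [55], ranking attributes by class separability, can be sketched with the standard two-class Fisher discriminant ratio (between-class mean separation over within-class variance); the exact variant used in [55] may differ, and the data here are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 13))      # placeholder attribute matrix
y = rng.integers(0, 2, size=300)    # placeholder class labels

X0, X1 = X[y == 0], X[y == 1]
# Fisher discriminant ratio per attribute: squared distance between the
# class means, divided by the sum of the within-class variances.
fdr = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2 / (
    X0.var(axis=0) + X1.var(axis=0)
)
ranking = np.argsort(fdr)[::-1]     # highest-FDR attributes first
```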

Results and Analysis
The articles reviewed on coronary heart disease diagnosis systems that use a combination of dimension reduction and data mining techniques, taken over the period 2009-2016, number 41. The distribution of these articles per year is shown in Table 6. Referring to Table 6, feature selection is dominated by filtering methods, while the embedded and wrapper types are less common. The advantage of the embedded type is that the dimension reduction process is attached to the classification process, and the classification algorithm used follows the non-black-box approach, such as a fuzzy system. This approach models the knowledge base in IF-THEN form, so it is easily understood by clinicians. However, most studies on diagnosis systems do not consider the clinician's ease of understanding the diagnostic process when choosing a classification algorithm; such considerations appear only in a few studies, such as [21], [26], and [43].

The data mining techniques used in models of coronary heart disease diagnosis systems within 2009-2016 are shown in Table 7. The most widely used technique is classification, and the most used classification algorithms are C4.5, ANN, and fuzzy systems; C4.5 appears almost every year as an option for developing a coronary heart disease diagnosis system. Clustering is the least used technique, because coronary heart disease diagnosis is essentially a classification problem; although clustering can also solve classification cases, solving classification with clustering is not optimal, and the technique is mostly used to optimize the performance of classification algorithms, as done by [48] and [20]. Prediction techniques are likewise mostly used for the optimization of classification algorithms, and are even used as dimension reduction algorithms, as in the research of [27] and [26]. The last technique, association rules, is also mostly used as a method for dimension reduction, as in the research of [49]. (Table 7 also tallies, per year from 2009 to 2017, the less frequent algorithms: the bagging (2 uses) and AdaBoost (4) ensembles, the SFCM clustering algorithm (1), the LR (1) and MLR (1) prediction algorithms, and the WAC association rule algorithm (1).)

The development of diagnostic system models combining dimension reduction and data mining techniques is dominated by the aim of achieving the best accuracy with few attributes; the cost of examining each attribute has rarely been considered. The studies that do use this consideration are [50] and [51]. The research of [50] uses a genetic algorithm for dimension reduction whose fitness function is a cost function. Combined with the Naive Bayesian or support vector machine classification algorithm, it is able to remove the two attributes whose examination is expensive, while system performance is not much different from when these two attributes are used. The same concept is applied by [51], which uses a particle swarm optimization algorithm whose fitness function is a cost-of-examination function.
That research combines it with a feed-forward neural network classification algorithm. Both studies, [50] and [51], use classification algorithms with the black-box approach, which makes it difficult for clinicians to understand how the system works. Another disadvantage is that they do not use the tiered approach commonly followed in the clinical diagnostic process.

Conclusion
The review of articles on coronary heart disease diagnosis systems that combine dimension reduction and data mining techniques yields several important points. First, the development of diagnostic systems focuses on improving accuracy and the number of reduced attributes. Second, little research has considered the clinician's ease of understanding the diagnostic process when choosing a classification algorithm. Third, the diagnostic approaches used rarely refer to the tiered diagnostic approach commonly used by clinicians. Fourth, most diagnostic systems have not used cost variables in dimension reduction. Referring to these points, diagnostic system models can be developed in the direction of a tiered approach, using non-black-box classification algorithms and dimension reduction that considers the cost of attribute inspection.