Arrhythmia Classification Based on Combined Chaotic and Statistical Feature Extraction

Received Feb 6, 2018; Revised May 8, 2018; Accepted Jun 14, 2018

The information content of the electrocardiogram (ECG) is essential for revealing abnormalities in heart function. Arrhythmia is a common heart disorder that can be fatal if it is not identified and treated in time. The straightforward approach to such diagnosis is to extract salient features from the ECG data using signal processing methods and then apply a suitable classifier. In this work, 16 classes of arrhythmia are classified by adopting this traditional approach to abnormality detection while introducing novelty in the type of features extracted. Lyapunov exponents, Kolmogorov-Sinai entropy density, Kolmogorov-Sinai entropy universality, and R-R interval features based on kurtosis and skewness are used to classify the heartbeats from the benchmark MIT-BIH arrhythmia database. With these alternative features, a common Support Vector Machine based classifier achieves an accuracy of 98.95% with just 13 features.


INTRODUCTION
Though arrhythmias may not cause abrupt death, they are a sign of heart abnormality that can become lethal in due course. Besides automating the detection of abnormalities in heart function, digital signal processing methods help doctors reach an accurate diagnosis and plan the subsequent therapy. Since an abnormality is identified purely from the electrocardiogram (ECG) morphology and the time instants at which its peaks and peculiar shapes occur, human diagnosis over a large database is difficult and error-prone. Employing signal processing techniques therefore yields substantial benefits in biomedical instrumentation. Automatic ECG analysis has attracted researchers in recent decades, and these methodologies have had a clear positive impact. Unfortunately, the results depend strongly on the noise in the ECG data, and both accuracy and time consumption remain concerns. Moreover, classification accuracy tends to fall as the number of classes handled increases.
The morphological and temporal characteristics of the ECG signal vary considerably among test subjects. This naturally creates challenges for automatic analysis and classification through digital signal processing techniques. All such techniques involve a feature extraction method followed by a classification method in order to capture the complete characteristics of the ECG signal. Time domain analysis [1][2][3][4], frequency domain analysis [5], time-frequency analysis [6][7][8], statistics based analysis [9][10][11] and hybrid feature based methods [12][13] are common in the study of ECG signals. The extracted features are classified using neural networks, neuro-fuzzy systems [14], Linear Discriminant Analysis (LDA) [15] and Support Vector Machines (SVM) [16][17] to detect irregularities in heart function. In ECG classification, the merit of a work is judged by the number of classes involved in the diagnosis. Among the many works reported, the state of the art is [29], as it involves all 16 classes of the MIT-BIH database with a classification accuracy of 98.82%. Those researchers reached this accuracy through a new approach, namely the discrete orthogonal Stockwell transform using the discrete cosine transform, for efficient representation of the ECG signal in time-frequency space. Dimension reduction was then performed using principal component analysis, representing the morphological characteristics of the ECG signal. In addition, dynamic R-R interval features were computed and concatenated to form the final feature set. In total, 20 features were extracted to classify the MIT-BIH arrhythmia database using an SVM classifier optimized through Particle Swarm Optimization (PSO). The experimental results yielded an improved overall accuracy, sensitivity (Se), and positive predictivity (Pp) of 98.82% in comparison with the conventional approaches available before that work.
Heartbeat detection based on the Pythagorean theorem was carried out in [30]. However, that algorithm does not classify the various abnormalities in the ECG. The works in [31] and [32] are useful to researchers for preprocessing ECG signals.
A careful review of the earlier literature on arrhythmia classification shows that the foremost problem is to obtain good classification accuracy when a larger number of ECG abnormality classes is involved. It is the task of the researcher to choose a feature extraction method that represents the complete morphological characteristics of the various classes of ECG signal. As a novel approach, statistical features and chaotic metrics are concatenated with conventional R-R interval features to form the feature sets, based on which the classification is done using Support Vector Machines (SVM).

RESEARCH METHOD

MIT-BIH Arrhythmia Database
For all the experiments conducted, ECG recordings of 47 different subjects, comprising 48 records studied by the BIH Arrhythmia Laboratory, have been used [33]. The MIT-BIH database contains 110109 beat labels, and the data are passed through a band pass filter between 0.1 Hz and 100 Hz. The signals are digitized at a sampling frequency of 360 samples/s, and each sample is represented with 11 bit resolution over a 10 mV range. The modified limb lead signal from the database has been used, specifically the heartbeat segments obtained using a window across each R-peak. Ground truth is obtained from the class annotations provided with the benchmark database. The summary of the data sets and the details of the 16 ECG signal classes are provided in Table 1. In order to maintain generality, and as selected in the state of the art work, the whole data set is divided into 16 groups, one per category, and ECG signals are chosen randomly from each group to constitute the training and testing data sets. In particular, 15% of the normal category, 35% of each of the "L", "A", "R", "V", and "P" categories, and 40% of each of the remaining ten classes of ECG signals are selected randomly for the training data set. In total, 21.79% of the events of the whole data set (the smaller proportion) are selected for training, while the rest of the ECG signals are used for testing the proposed method.
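The class-wise random selection described above can be sketched as follows. This is a minimal illustration in Python, not the MATLAB code used in this work: the label "N" for the normal class, the function names `training_fraction` and `split_by_class`, the fixed seed, and rounding the per-class count to the nearest integer are all assumptions made for the example.

```python
import random
from collections import defaultdict

# Classes that contribute 35% of their beats to training, as listed in the text.
MID_FRACTION_CLASSES = {"L", "A", "R", "V", "P"}


def training_fraction(label):
    """Per-class training fraction; "N" as the normal label is an assumption."""
    if label == "N":
        return 0.15
    if label in MID_FRACTION_CLASSES:
        return 0.35
    return 0.40  # each of the remaining ten classes


def split_by_class(labels, seed=0):
    """Return (train_indices, test_indices), sampled randomly within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label, indices in by_class.items():
        rng.shuffle(indices)
        # Rounding rule is an assumption; the text only states the percentages.
        n_train = round(training_fraction(label) * len(indices))
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)
```

Because the normal class dominates the database and receives the smallest fraction, the overall training share ends up well below the per-class percentages of the rarer classes.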

Lyapunov Exponents
Lyapunov Exponents (LE) are very useful in analyzing dynamical systems. They measure how sensitively trajectories in phase space diverge or converge with respect to the initial conditions. A system with at least one positive exponent is considered to be in the chaotic region. LE is a measure of how far the lattices diverge during each time iteration, and it is given by Equation (1).
where n refers to the number of iterations and λ(i) is the ith LE. The λ(i) are calculated from the eigenvalues of R_n as given in [34]. R_n is calculated using Equation (2), starting from the initial values of the lattices, from the Jacobian matrix J_n constructed in each iteration as done in [35]. We then define

$R_n = \prod_{k=1}^{n} J_k$ (2)
After the LE values are calculated, the lattices with positive values are understood to be in the chaotic region. In this work, the lattice values are simply the time series values obtained from the test database. The sum of the Lyapunov exponents reveals the damping nature of a system, and any change in damping can be monitored with LE. LE can be calculated by many methods; the one given in Equation (1) relates to discrete time systems. A few other approaches, for calculating LE from a continuous time series, are reported below. The computation of LE and Instantaneous Lyapunov Exponents (ILE) in [36] used a phase space and tangent space approach. The algorithm developed in [37] introduced Short term averaged Lyapunov Exponents (SLE), which are needed when computational errors cause the experimental time series to give inaccurate ILE. A concept similar to SLE, Local Lyapunov Exponents (LLE), was proposed in [38]. It is convenient to model a continuous time dynamical system by ordinary differential equations of the form given in Equation (3) and Equation (4).
where $x = [x_1, x_2, \ldots, x_n]^T$. The above equations give a set of trajectories in phase space. The ith Lyapunov exponent is calculated as given in Equation (5).
where the eigenvalues are ordered from largest to smallest. Since the integration time is infinite, this calculation is not practically possible for a measured time series. Hence, the LE calculation based on a finite number of iterations is given in Equation (6).
LE gives a good indication of how nearby orbits diverge because of their initial conditions. Closely related methods of calculating Lyapunov exponents have already been dealt with in [39][40][41][42].
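The finite-iteration idea can be illustrated on a simple one-dimensional discrete-time system. The sketch below is not the lattice computation of this work: it assumes the logistic map x -> r x (1 - x) as a stand-in system and estimates its single Lyapunov exponent as the average of ln|f'(x)| along the orbit. The function name `logistic_lyapunov` and all parameter values are hypothetical choices made for the example.

```python
import math


def logistic_lyapunov(r, x0=0.1, n_transient=1000, n_iter=100_000):
    """Finite-iteration estimate of the Lyapunov exponent of x -> r*x*(1-x)."""
    x = x0
    # Discard transient iterations so the orbit settles onto its attractor.
    for _ in range(n_transient):
        x = r * x * (1.0 - x)

    total = 0.0
    for _ in range(n_iter):
        # |f'(x)| for the logistic map; the floor avoids log(0) at x = 0.5.
        derivative = abs(r * (1.0 - 2.0 * x))
        total += math.log(max(derivative, 1e-300))
        x = r * x * (1.0 - x)
    return total / n_iter


if __name__ == "__main__":
    print("r = 3.2 (periodic):", logistic_lyapunov(3.2))  # negative exponent
    print("r = 3.9 (chaotic): ", logistic_lyapunov(3.9))  # positive exponent
```

A positive estimate signals divergence of nearby orbits (chaos), while a negative one signals convergence onto a periodic orbit, which is the sign test applied to the lattices above.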

Kolmogorov Sinai Entropy Density
The spatiotemporal chaotic system considered here can be treated as L-dimensional dynamics, and the Kolmogorov-Sinai entropy (KSE) of L-dimensional dynamics is the sum of its positive LEs. Without loss of generality, the Kolmogorov-Sinai entropy density is employed here to eliminate the effect of the number of lattices, as presented in Equation (7).

$h = \frac{\sum_{i=1}^{L} \lambda^{+}(i)}{L}$ (7)

where h is the KSE density and the numerator is the sum of the positive values of LE, λ⁺(i) [43].

Kolmogorov-Sinai Entropy Universality
The Kolmogorov-Sinai entropy density indicates whether or not the spatiotemporal chaotic system is in chaos. However, a positive KSE density alone does not show what proportion of the L lattices is chaotic. Hence, we employ the KSE generality (or universality) h_u as given in Equation (8).

$h_u = \frac{L'}{L}$ (8)

where h_u is the KSE generality and L′ is the number of positive Lyapunov exponents in the spatiotemporal chaotic system [43]. The KSE generality is the percentage of lattices in chaos, which evaluates the space complexity of the L-dimensional dynamics [44].
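Reading the two definitions above as h = (sum of positive LEs) / L and h_u = L′ / L, both quantities follow directly from a list of Lyapunov exponents. The sketch below is only an illustration of that reading; the function names and the example values are hypothetical.

```python
def kse_density(lyapunov_exponents):
    """Sum of the positive Lyapunov exponents divided by the number of lattices L."""
    L = len(lyapunov_exponents)
    if L == 0:
        raise ValueError("at least one Lyapunov exponent is required")
    return sum(le for le in lyapunov_exponents if le > 0) / L


def kse_universality(lyapunov_exponents):
    """Fraction of lattices whose Lyapunov exponent is positive (L' / L)."""
    L = len(lyapunov_exponents)
    if L == 0:
        raise ValueError("at least one Lyapunov exponent is required")
    return sum(1 for le in lyapunov_exponents if le > 0) / L


if __name__ == "__main__":
    les = [0.5, -0.2, 0.3, -0.1]  # made-up exponents for four lattices
    print(kse_density(les), kse_universality(les))
```

With these made-up exponents, two of the four lattices are chaotic, so the universality is 0.5 even though the density alone (0.2) would only say that the system as a whole is chaotic.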

Standard Deviation
Standard deviation is a measure of the dispersion of the data from its mean [45]. The lower the standard deviation, the closer the data points tend to be to the mean, and vice versa. The standard deviation is computed over the given M × N matrix, where M = 1 is the number of rows and N is the number of columns.

Kurtosis
Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. That is, data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak [45]. The kurtosis is computed over the selected window, treated as an M × N matrix, where M = 1 is the number of rows and N is the number of columns.

Skewness
Skewness is a measure of the asymmetry of the data. Qualitatively, a negative skewness indicates that the tail on the left side of the histogram of the data is longer than the right side, and the bulk of the values (including the median) lie to the right of the mean. A positive skewness indicates that the tail on the right side is longer than the left side and the bulk of the values lie to the left of the mean [45]. The skewness is computed over the given M × N matrix, where M = 1 is the number of rows and N is the number of columns.
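The three statistical features of a one-row window can be sketched with the usual moment-based definitions. The normalization by N (rather than N - 1) and the use of non-excess kurtosis (a normal distribution gives 3) are assumptions of this sketch; toolboxes differ in their defaults, so the values may not match the MATLAB output exactly. The function name `window_statistics` is hypothetical.

```python
import math


def window_statistics(window):
    """Return (standard deviation, skewness, kurtosis) of a 1 x N window."""
    n = len(window)
    if n < 2:
        raise ValueError("the window needs at least two samples")
    mean = sum(window) / n
    # Central moments, normalized by N (assumption, see lead-in).
    m2 = sum((x - mean) ** 2 for x in window) / n
    m3 = sum((x - mean) ** 3 for x in window) / n
    m4 = sum((x - mean) ** 4 for x in window) / n
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    std = math.sqrt(m2)
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return std, skewness, kurtosis


if __name__ == "__main__":
    print(window_statistics([1.0, 2.0, 3.0, 4.0, 5.0]))
```

For the symmetric window in the usage line the skewness is zero, and the kurtosis of 1.7 is below 3, which matches the "flat top" description given for low-kurtosis data.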

R-R Interval Features
By nature, the pumping of blood is not synchronized to any standard clock. Depending on the biological clock of the individual, the rhythm of the heart can vary. This variation is usually captured by the R-R interval between two heartbeats, and this feature is a good representative of the dynamic characteristics of the ECG signal. Four R-R features that correspond to the pattern of the ECG signal are computed, namely the pre R-R, post R-R, local R-R, and average R-R intervals. In this paper, the interval between the previous R-peak and the current R-peak gives the pre R-R feature, while the interval between the current R-peak and the following R-peak gives the post R-R feature. Together, the pre and post R-R interval features correspond to an instantaneous rhythm characteristic. The average R-R interval feature is derived by averaging the R-R intervals of the past 3-min episode of a particular event. Likewise, the local R-R feature is derived by averaging all the R-R intervals of the past 8-s episode of a particular event. The local and average features represent the average characteristics of a series of ECG signals. Further, kurtosis, skewness and standard deviation are calculated, so that 3 statistical features are determined to represent each denoised input ECG, and these are further processed for classification. Prior to the extraction of the aforementioned features, the ECG data is filtered using a low pass filter with cut off frequency 400 Hz to remove unwanted noise signals.
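The four R-R features described above can be sketched as follows, assuming the R-peak locations are already available as a list of times in seconds. The function name `rr_features`, the dictionary keys, and the rule that an interval belongs to a window when it ends inside that window are assumptions of this illustration, not details taken from this work.

```python
def rr_features(r_times, i, local_window_s=8.0, average_window_s=180.0):
    """R-R features of beat i, given R-peak times in seconds (ascending)."""
    if i < 1 or i > len(r_times) - 2:
        raise ValueError("beat i needs both a previous and a following R-peak")

    pre_rr = r_times[i] - r_times[i - 1]    # previous R-peak to current R-peak
    post_rr = r_times[i + 1] - r_times[i]   # current R-peak to following R-peak

    def mean_rr_within(window_s):
        # Average the past R-R intervals that end inside the window
        # (assumption); the interval ending at beat i is always included.
        start = r_times[i] - window_s
        intervals = [
            r_times[k] - r_times[k - 1]
            for k in range(1, i + 1)
            if r_times[k] > start
        ]
        return sum(intervals) / len(intervals)

    return {
        "pre_rr": pre_rr,
        "post_rr": post_rr,
        "local_rr": mean_rr_within(local_window_s),      # past 8 s
        "average_rr": mean_rr_within(average_window_s),  # past 3 min
    }


if __name__ == "__main__":
    peaks = [0.0, 1.0, 2.0, 3.0, 4.5, 5.0]  # made-up R-peak times
    print(rr_features(peaks, 4))
```

In the usage line, the beat at 4.5 s arrives late (pre R-R of 1.5 s) and is followed quickly (post R-R of 0.5 s), so the instantaneous pair deviates from the local and average values, which is the kind of contrast these features are meant to expose.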
All the aforementioned features are extracted from the MIT-BIH benchmark database by means of MATLAB toolboxes. Proper use of MATLAB functions (both built-in and user defined) and of toolboxes such as the statistical, digital signal processing and mathematical toolboxes makes it possible to process and analyze ECG signals, both in real time and by simulation, with great accuracy and convenience [46]. The evaluation of the statistical features has been inspired by [45] and [47]. Kurtosis, skewness and standard deviation are calculated for the window of the R-R interval. Since automatic knowledge discovery is essential in the proposed arrhythmia classification, a chaotic map algorithm is proposed to recognize the patterns based on the chaotic metrics shown in Figure 1. This chaotic map algorithm succeeds in efficiently classifying normal and abnormal patterns with better sensitivity and specificity [48]. In addition, KSE density and KSE universality are extracted as further features in the proposed method in order to improve the classification performance.

RESULTS AND ANALYSIS

Performance Evaluation Parameters
The performance for each class of event is estimated by computing the true positive (TP), true negative (TN), false positive (FP) and false negative (FN) counts, where TP and TN represent the correct classification of the normal and abnormal ECG signals. On the basis of these TP, TN, FP and FN counts, the performance metrics for each class of signal are calculated, namely sensitivity, specificity and positive predictivity. Sensitivity is the rate of correctly classified events among all actual events of a class, specificity is the rate of correctly rejected events among all events that do not belong to the class, and positive predictivity is the rate of correctly classified events among all detected events. Using these definitions, the metrics can be written as

$Se = \frac{TP}{TP + FN}, \quad Sp = \frac{TN}{TN + FP}, \quad Pp = \frac{TP}{TP + FP}$ (12)

The overall accuracy and error rate are defined as given in Equations (13) and (14) respectively.

$Accuracy = \frac{\text{number of correctly classified signals}}{\text{total number of signals}} \times 100$ (13)

$Error\ rate = \frac{\text{number of misclassified signals}}{\text{total number of signals}} \times 100$ (14)

All the above-mentioned parameters are computed from the simulation carried out using the MIT-BIH database.
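The per-class counts and metrics can be read off a confusion matrix as sketched below, assuming rows hold the actual class and columns the detected class. The function names and the small 3-class matrix are hypothetical and serve only to show the bookkeeping; they are not the results of this work.

```python
def per_class_metrics(confusion, k):
    """Sensitivity, specificity and positive predictivity of class k.

    confusion[i][j] = number of signals of actual class i detected as class j.
    """
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                  # class k signals detected elsewhere
    fp = sum(row[k] for row in confusion) - tp   # other classes detected as class k
    tn = total - tp - fn - fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictivity": tp / (tp + fp),
    }


def overall_accuracy(confusion):
    """Percentage of signals on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 100.0 * correct / total


if __name__ == "__main__":
    toy = [[8, 1, 1],
           [2, 7, 1],
           [0, 0, 10]]  # made-up counts
    print(per_class_metrics(toy, 0))
    print(overall_accuracy(toy), 100.0 - overall_accuracy(toy))
```

The error rate is simply the complement of the accuracy, so only one of the two needs to be accumulated.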

Results and Analysis
The proposed feature extraction and classification methods discussed in the previous section are implemented using the mathematical and statistical toolboxes available in MATLAB (version 9.0, R2016a) installed on a Windows 8 Pro platform (AMD E1-2100 processor, 1 GHz, 4 GB RAM) for the analysis of ECG signals. The experiments for the proposed methodology are carried out and validated using the benchmark MIT-BIH arrhythmia database. The SVM classifier is trained on the training data set described in Table 1, and its performance is analyzed for each tested ECG signal. The prediction of the tested ECG signals into their respective categories by the proposed methodology is presented as the confusion matrix shown in Table 2.
In classification work generally, increasing the number of training signals leads to higher classification accuracy. In [37], five classes of ECG signals are classified with an overall average accuracy of 93.48%. In that work, third-order higher order statistics features are classified using least squares SVMs with a limited number of testing signals. However, the experiments in [37] are performed only on selected records and beats, a choice that the respective researchers leave unjustified. The advantage of the proposed feature set is that it combines morphological features with dynamic features, i.e., R-R interval information and chaotic features that represent different characteristics of the ECG signal. This combination yielded improved classification accuracy. Nonetheless, the computational complexity of the proposed methodology needs to be evaluated for real-time applications. In addition, the proposed technique is validated on all the ECG data of the benchmark MIT-BIH arrhythmia database (i.e., without excluding any segment) with 21.8% training data, which reduces the training time and memory consumed on the hardware. This matters because the running time of the algorithm is a real concern during training.

Confusion Matrix for Proposed Model
To explain the confusion matrix, the normal class of signals and its counts are taken as an example. The first row corresponds to the normal category and shows that 63187 signals are correctly detected as normal by the proposed methodology out of 63764 actual normal signals, while the rest of the normal signals are misclassified into other categories. In column 1, 63328 signals are detected in the normal category: 63187 normal signals are correctly classified, and the remaining signals come from other classes that are misclassified as normal. The classification results for the other 15 categories of ECG signals are calculated in the same way and presented in Table 3. Moreover, out of 86113 test signals in total, 85209 signals are correctly classified and 904 signals are misclassified across the 16 classes of ECG signals. The accuracy and error rate of the proposed methodology, computed using (13) and (14), are 98.95% and 1.05% respectively. The performance of the proposed methodology is assessed by computing TP, FP and FN from Table 2 and using (12) to evaluate the sensitivity and positive predictivity for each class of ECG signal, as presented in Table 3. The average sensitivity and the average positive predictivity reported for the proposed methodology are both 98.95%.
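The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the per-class values for the normal category are derived here from those counts and are not copied from Table 3.

```python
# Counts quoted in the text.
total_test = 86113
correct = 85209
normal_tp = 63187        # normal signals detected as normal
normal_actual = 63764    # all actual normal signals (row total)
normal_detected = 63328  # all signals detected as normal (column total)

accuracy = 100.0 * correct / total_test
error_rate = 100.0 * (total_test - correct) / total_test

# Sensitivity and positive predictivity of the normal class, derived here.
normal_sensitivity = 100.0 * normal_tp / normal_actual
normal_predictivity = 100.0 * normal_tp / normal_detected

if __name__ == "__main__":
    print(f"accuracy              = {accuracy:.2f} %")
    print(f"error rate            = {error_rate:.2f} %")
    print(f"normal sensitivity    = {normal_sensitivity:.2f} %")
    print(f"normal predictivity   = {normal_predictivity:.2f} %")
```

The accuracy and error rate reproduce the 98.95% and 1.05% stated above, and the 577 normal signals sent to other categories together with the 141 foreign signals detected as normal give a sensitivity of about 99.10% and a positive predictivity of about 99.78% for the normal class.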
The proposed method and the conventional works reported in the literature [8][9][10], [29], [49][50][51] are compared in Table 4 on the basis of the number of ECG signal classes classified and the classification accuracy. Though its noise immunity was better, the Hilbert transform used in [52] could not yield the exact R-R interval because of the variation in R-R interval between beats. Unlike [53], which detects only one class of arrhythmia, the proposed method classifies 16 classes of ECG signal with significantly better classification accuracy than the other works reported in the literature. All the works taken for comparison suffer from either insufficient accuracy or a smaller number of classes. In [50], though an acceptable accuracy is obtained for all 16 arrhythmia classes, the experiments are performed using 66% of the data for training and only 33% for testing. Training the system with less data and fewer features is considered more efficient when maximum classification accuracy can still be reached. In the proposed work only 21.8% of the data is used for training, and classification is done with only 13 features.

                   Approach                                  Classes   Accuracy (%)
Melgani [5]        Morphology + PCA + SVM                    6         91.67
Li et al. [46]     PCA + k-ICA + SVM                         5         97.78
Osowski [6]        Higher order statistics + Hermite + SVM   13        95.91
Raj et al. [13]    Wavelet

CONCLUSION
The major hurdles tackled in this paper include the selection of an appropriate R-peak detection algorithm and a novel application of chaotic metrics together with statistical features, namely the kurtosis, skewness and standard deviation of the R-R window (used for the first time in ECG data mining for abnormality detection). This paper has presented an automated ECG signal analysis scheme for long-term monitoring and for analyzing the nonstationary behavior of ECG signals. A new statistical and chaotic feature extraction methodology is proposed to produce features representing the dynamic variations in the ECG morphology. Finally, 13 features obtained by combining the statistical features, chaotic features and R-R interval features are used to predict 16 ECG signal classes with the SVM classifier. It is to be noted that no optimization methods have been used to fine-tune the classifier performance. The proposed method achieves an improved accuracy of 98.95% on the benchmark MIT-BIH arrhythmia database. This research can be extended to perform the classification with an even smaller number of features for arrhythmia analysis. Time consumption has not been analyzed in this paper. However, in real-time ECG processing, parallel processing schemes and FPGA or ASIC implementations would reduce the overall computation time in both online and offline modes.