Wavelet packet analysis of disease-altered recurrence dynamics in the long-term spatiotemporal vectorcardiogram (VCG) signals

Vectorcardiogram (VCG) signals contain a wealth of dynamic information pertinent to space-time cardiac electrical activities. However, few, if any, previous investigations have studied disease-altered nonlinear dynamics in spatiotemporal VCG signals. Most previous nonlinear dynamic methods considered the time-delay reconstructed state space from a single ECG trace. This paper presents a novel multiscale recurrence approach that not only explores VCG recurrence dynamics but also resolves the issue of recurrence computation for large-scale datasets. As opposed to the traditional single-scale recurrence analysis, we characterize and quantify the recurrence behaviors in multiple wavelet scales. In addition, wavelet dyadic subsampling enables large-scale recurrence analysis, which used to be highly expensive for a long-term time series. The classification experiments show that multiscale recurrence analysis detects myocardial infarctions from 3-lead VCG with an average sensitivity of 96.8% and specificity of 92.8%, which is superior (i.e., a 5.6% improvement) to the single-scale recurrence analysis.


I. INTRODUCTION
The human heart is a 3-dimensional object and its electrical activities are near-periodically conducting across space and time. Electrocardiogram (ECG) contains a wealth of dynamic information pertinent to cardiac functioning, but 1-lead ECG only captures one directional view of spatiotemporal heart activities. In contrast, 3-lead vectorcardiogram (VCG) monitors the spatiotemporal cardiac electrical activity along three orthogonal X, Y, Z planes of the body, namely, frontal, transverse, and sagittal [1]. However, 3-lead VCG is not as commonly used as 12-lead ECG because medical doctors are accustomed to using the time-domain ECG in clinical applications. Dower et al. [2] and our previous study [3] showed that 3-lead VCG can be linearly transformed to 12-lead ECG without a significant loss of clinically useful information. Thus, 3-lead VCG surmounts not only the information loss in 1-lead ECG but also the redundant information in 12-lead ECG. Orthogonal VCG signals provide an unprecedented opportunity to investigate the disease-altered electrical activity in both space and time.
This work is supported in part by the National Science Foundation (IOS-1146882).
In addition, technological advancements make an enormous amount of ECG/VCG data readily available in the healthcare system of the 21st century. It is often tiresome and impractical for human experts to visually inspect large-scale datasets for disease patterns. There is an urgent need to develop novel methodologies that can efficiently recognize disease-altered patterns underlying long-term spatiotemporal VCG signals. However, real-world physiological systems show a high level of nonlinear and nonstationary behaviors in the presence of extraneous noise. Conventional frequency-domain analysis and linear statistical approaches are limited in their ability to capture such nonlinear and nonstationary behaviors. Nonlinear dynamic methods (e.g., recurrence analysis) have been widely developed and utilized to extract knowledge pertinent to cardiac disorders from the new perspective of complex systems.
However, most previous nonlinear methods only considered the lag-reconstructed state space from 1-lead ECG signals. Although 3-lead VCG provides a new way to investigate cardiac dynamical behaviors, few previous approaches have studied the disease-altered recurrence dynamics in space-time VCG signals. This paper develops a novel multiscale recurrence approach that not only explores recurrence dynamics but also resolves the computational issues for large-scale datasets. The main contributions are as follows: (1) Single-scale vs. multi-scale recurrence analysis: As opposed to the traditional recurrence analysis in a single scale, we delineate the recurrence dynamics into multiple wavelet scales. (2) Long-term recurrence analysis: Few, if any, previous approaches have been capable of quantifying the recurrence dynamics from a long-term time series. Recurrence computation is highly expensive (i.e., on the order of $O(N^2)$ pairwise comparisons) for a long-term time series of size $N$. The dyadic subsampling in wavelet packet decomposition effectively resolves the computational issues for large-scale recurrence analysis. (3) Disease-altered recurrence dynamics: It is shown that recurrence dynamics are significantly different in wavelet scales between healthy control (HC) and myocardial infarction (MI) subjects. Multiscale recurrence analysis identifies MI with an average sensitivity of 96.8% and specificity of 92.8%, which is much better (i.e., a 5.6% increase) than the single-scale recurrence analysis.
The paper is organized as follows: Section II introduces the research methodology. Materials and implementation results are presented in Section III. Section IV discusses and concludes this study.

II. MULTISCALE RECURRENCE ANALYSIS
Wavelet analysis is an effective time-frequency decomposition tool, including continuous wavelet transformation (CWT), discrete wavelet transformation (DWT) and wavelet packet decomposition (WPD). In CWT, sub-signals in wavelet scales maintain the same length as the original signal, resulting in redundant information [4]. DWT introduces both the wavelet function and scaling function for decomposing the original signal into the approximations and details [5]. WPD is similar to DWT, but it further divides not only the approximations but also the details in each wavelet scale. This provides a better resolution in both time and frequency scales [6]. As shown in figure 1, the long-term VCG signal, followed by the dyadic subsampling, is decomposed into wavelet subseries. Each subseries is iteratively decomposed to produce $2^k$ subsets of wavelet sub-signals, denoted as $w_{k,n}$, $n = 0, \ldots, 2^k - 1$, in the $k$th level. These shorter subseries make the expensive recurrence computations not only feasible for the long-term time series but also more effective under the stationarity assumptions in multiple wavelet scales.
Within each wavelet scale, recurrence analysis is utilized to quantify the underlying dynamics of nonlinear systems. We have previously utilized recurrence quantification analysis (RQA) methods to characterize and quantify cardiac electrical dynamics [5][6][7]. This paper integrates, for the first time, wavelet packet decomposition with recurrence analysis to quantify disease-altered dynamics underlying long-term VCG signals. The recurrence plot (RP) is an effective tool to visualize the recurrences of system states in the state space. As shown in figure 2, the recurrence plot captures topological relationships existing in the 3-lead VCG vector loops. The recurrence plot, $R_{i,j} = \Theta(\varepsilon - \|x(i) - x(j)\|)$, defines the recurrence of states $x(i)$ and $x(j)$, where $\Theta$ is the Heaviside function and $\varepsilon$ is the recurrence threshold.
The texture patterns in recurrence plots reveal nonlinear characteristics of the 3-lead VCG (see figure 2(b)). For example, diagonal structures represent the near-periodic patterns and vertical structures show the nonstationary behaviors. Furthermore, six RQA features are extracted to quantify the recurrence dynamics of nonlinear systems, including recurrence rate (RR), determinism (DET), longest diagonal line (LMAX), entropy (ENT), laminarity (LAM) and trapping time (TT) [8].
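The decomposition and per-scale recurrence computation described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Haar wavelet (the paper does not name the wavelet family), a toy 3-lead sinusoid in place of real VCG data, an arbitrary threshold `eps`, and it assumes that the state x(i) in each wavelet scale is formed by combining the same packet node from the three leads. All function names are illustrative.

```python
import math

def haar_step(x):
    """Split x into (approximation, detail); each has half the length (dyadic subsampling)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def wavelet_packet(x, level):
    """Return the 2**level subseries of the given level (natural order).

    Unlike the DWT, both the approximation and the detail are split again.
    """
    nodes = [list(x)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            next_nodes.extend(haar_step(node))
        nodes = next_nodes
    return nodes

def recurrence_matrix(states, eps):
    """R[i][j] = 1 if ||x(i) - x(j)|| <= eps, else 0 (Heaviside of eps minus distance)."""
    n = len(states)
    return [[1 if math.dist(states[i], states[j]) <= eps else 0
             for j in range(n)] for i in range(n)]

# Toy 3-lead signal standing in for VCG leads X, Y, Z (assumption, not real data).
n_samples, level = 256, 3
leads = [[math.sin(2 * math.pi * t / 32 + phase) for t in range(n_samples)]
         for phase in (0.0, 1.0, 2.0)]
packets = [wavelet_packet(lead, level) for lead in leads]  # packets[lead][node]

# Assumed state construction: the same node taken from each of the three leads.
plots = []
for node in range(2 ** level):
    states = list(zip(packets[0][node], packets[1][node], packets[2][node]))
    plots.append(recurrence_matrix(states, eps=0.1))

print(len(plots), len(plots[0]))  # number of scales, side length of each plot
```

Each of the 2^k recurrence matrices is built from a subseries that is 2^k times shorter than the original signal, which is what keeps the pairwise distance computation manageable.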
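As a hedged illustration of how such quantifiers are derived from a recurrence matrix, the sketch below computes three of the six features (RR, DET, LMAX). Conventions differ between RQA implementations; this version assumes a minimum diagonal length `lmin` of 2, excludes the main diagonal when collecting diagonal lines, and uses a synthetic periodic 3-D trajectory rather than VCG data. ENT, LAM and TT are omitted for brevity.

```python
import math

def recurrence_matrix(states, eps):
    """R[i][j] = 1 if the distance between states i and j is at most eps."""
    n = len(states)
    return [[1 if math.dist(states[i], states[j]) <= eps else 0
             for j in range(n)] for i in range(n)]

def diagonal_lengths(R):
    """Lengths of diagonal line segments above the main diagonal."""
    n = len(R)
    lengths = []
    for k in range(1, n):            # k-th upper diagonal
        run = 0
        for i in range(n - k):
            if R[i][i + k]:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return lengths

def rqa(R, lmin=2):
    """Recurrence rate, determinism and longest diagonal line."""
    n = len(R)
    rr = sum(map(sum, R)) / (n * n)
    lengths = diagonal_lengths(R)
    total = sum(lengths)
    det = sum(l for l in lengths if l >= lmin) / total if total else 0.0
    return {"RR": rr, "DET": det, "LMAX": max(lengths, default=0)}

# Synthetic near-periodic loop (period 20 samples), a stand-in for a VCG vector loop.
states = [(math.cos(2 * math.pi * t / 20),
           math.sin(2 * math.pi * t / 20),
           0.5 * math.cos(2 * math.pi * t / 20)) for t in range(60)]
R = recurrence_matrix(states, eps=0.1)
feats = rqa(R)
print(feats)
```

For a strictly periodic trajectory, all recurrence points off the main diagonal fall on long diagonal lines, so DET is at its maximum; nonstationary segments would instead show up as vertical structures and reduce it.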

A. Databases
The database contains 448 VCG recordings (368 MIs and 80 HCs) available in the PhysioNet PTB Database [9]. Each recording contains 15 simultaneous heart-monitoring signals, namely, the conventional 12-lead ECG and the 3-lead VCG. The signals were digitized at a 1 kHz sampling rate with a 16-bit resolution over a range of ±16.384 mV. The 80 HC recordings were acquired from 54 healthy subjects and the 368 MI recordings from 148 patients. The VCG recordings are typically of ~2 min duration, and all the signals are recorded for at least 30 s. Our previous investigation [3] demonstrated that the 12-lead ECG can be derived from 3-lead VCG using a customized transform, which shows better performance than the traditional Dower transform.

B. Feature selection
Because six RQA quantifiers, namely RR, DET, LMAX, ENT, LAM and TT, are extracted for each wavelet subseries, the $k$th level wavelet packet decomposition leads to a high-dimensional feature space (i.e., $6 \times 2^k$ features). In this study, there are 288 features extracted from the VCG database, i.e., $\sum_{k=4}^{5} 6 \times 2^k = 288$ for the level 4 and 5 decompositions. As a result, this may bring the "curse of dimensionality" to classification models, e.g., increased model parameters and overfitting problems. Hence, sequential forward feature selection is utilized not only to surmount the complexity and overfitting problems in classification models, but also to provide cost-effective models with the optimal feature subset.
As shown in figure 3, error rates decrease when the optimal features are sequentially added into classification models. It may be noted that the error rate oscillates rather than decreases for a set of features larger than 10. Thus, the optimal size of the feature subset is selected as 10 to avoid a complex model.
[Table I]
In addition, Table II shows that logistic regression (LR) models yield a mean sensitivity from 93.8% to 96.5% when the K-fold number is varied from 2-fold to 10-fold. The mean specificity of LR models increases from 87.8% to 91.1% with respect to the K-fold number, which is significantly better (about 10%) than the KNN models. The correct rates of LR models increase from 92.7% to 95.5% with small deviations (<0.7%) when the K-fold number is varied from 2-fold to 10-fold.
However, Table II shows that the ANN models achieve the best classification performance for the identification of myocardial infarctions based on multiscale recurrence features extracted from 3-lead VCG signals. The mean sensitivity ranges from 95.1% to 96.8%, the mean specificity from 86.3% to 92.8%, and the mean correct rate from 93.6% to 96.1% when the K-fold number is varied from 2-fold to 10-fold.
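The feature count and the greedy selection step of this subsection can be sketched as follows. This is a generic sequential forward selection, not the authors' code: `error_of` is a hypothetical user-supplied scorer (in practice it would be a cross-validated error rate of a classifier), and the toy scorer below, with three arbitrarily chosen informative features, exists only to make the example runnable.

```python
def n_features(levels, n_rqa=6):
    """Number of RQA features when n_rqa quantifiers are taken from all 2**k subseries."""
    return sum(n_rqa * 2 ** k for k in levels)

def forward_select(candidates, error_of, max_size):
    """Greedy sequential forward selection.

    At each step, add the candidate feature that gives the lowest error
    when combined with the features already selected.
    """
    selected, history = [], []
    remaining = list(candidates)
    while remaining and len(selected) < max_size:
        best = min(remaining, key=lambda f: error_of(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append(error_of(selected))
    return selected, history

# Toy scorer (assumption): features 3, 7 and 11 are informative, and every
# added feature carries a small complexity penalty.
INFORMATIVE = {3, 7, 11}

def toy_error(subset):
    return 0.5 - 0.1 * len(INFORMATIVE & set(subset)) + 0.001 * len(subset)

total = n_features(levels=[4, 5])
selected, history = forward_select(range(12), toy_error, max_size=3)
print(total, selected)
```

Plotting `history` against the subset size gives the kind of error curve used to choose the subset size; the cut-off itself remains a judgment based on where the curve stops decreasing.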
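For reference, the three reported measures follow from the confusion counts of a binary classifier. The sketch below uses the standard definitions, with MI taken as the positive class; the labels are made up for illustration and are not results from the paper.

```python
def classification_rates(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, correct rate) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)          # fraction of MI recordings detected
    specificity = tn / (tn + fp)          # fraction of HC recordings correctly cleared
    correct_rate = (tp + tn) / len(pairs)
    return sensitivity, specificity, correct_rate

# Illustrative labels only: 1 = MI, 0 = HC.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
sens, spec, acc = classification_rates(y_true, y_pred)
print(sens, spec, acc)
```

Because the database is imbalanced (368 MI versus 80 HC recordings), the correct rate is dominated by sensitivity, which is why specificity is reported separately.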

D. Single-scale vs. multi-scale recurrence
We have previously extracted RQA features from the 3-lead VCG in the original scale for the identification of myocardial infarctions [7]. In addition, we utilized the DWT to decompose VCG signals into multiple wavelet scales, and computed RQA features from not only the original single scale but also multiple wavelet scales [5]. It is worth mentioning that only 4000 data points in the 3-lead VCG were used for the single-scale and DWT recurrence analysis due to the computational complexity. In this paper, we further utilize wavelet packet decomposition not only to quantify multiscale recurrence dynamics but also to resolve the computational issues for large-scale datasets.
In this study, 16000 data points of the 3-lead VCG are used for recurrence quantification analysis, which is made feasible by the WPD dyadic subsampling. As shown in figure 4, multiscale recurrence analyses (i.e., DWT and WPD) show better performance (in terms of correct rates) than the single-scale recurrence analysis. The correct rate using DWT recurrence analysis (93.2% from 10-fold cross validation) is 2.7% higher than the single-scale recurrence analysis (90.5% from 10-fold cross validation). Moreover, the proposed WPD recurrence analysis increases the correct rate by about 2.9% over the previous DWT recurrence analysis. In summary, the correct rate for the identification of MI subjects is 96.1% in the WPD recurrence analysis, which is about a 5.6% increase from the single-scale analysis.
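A rough operation count illustrates why dyadic subsampling makes the longer series tractable. The sketch assumes idealized subseries lengths of exactly N / 2^k (ignoring filter boundary effects) and counts only the distinct state pairs whose distance must be evaluated; the function names are illustrative.

```python
def pairwise_distances(n):
    """Distinct state pairs in a recurrence matrix of n states."""
    return n * (n - 1) // 2

def wpd_pairwise_distances(n, level):
    """Total pairs over all 2**level subseries, each of length n // 2**level."""
    m = n // 2 ** level
    return 2 ** level * pairwise_distances(m)

single_4000 = pairwise_distances(4000)           # single-scale, 4000 points
single_16000 = pairwise_distances(16000)         # single-scale, 16000 points
wpd_level4 = wpd_pairwise_distances(16000, 4)    # 16 subseries of 1000 points
wpd_level5 = wpd_pairwise_distances(16000, 5)    # 32 subseries of 500 points

print(single_4000, single_16000, wpd_level4, wpd_level5)
```

Under these assumptions, the level-4 decomposition of 16000 points needs about as many distance evaluations per lead as a single-scale analysis of 4000 points, roughly a 16-fold saving over a single-scale analysis of the full 16000 points.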

IV. DISCUSSION AND CONCLUSIONS
Real-world physiological systems show a high level of nonlinear and nonstationary behaviors in the presence of extraneous noise. Nonlinear dynamic methods provide a great opportunity to explore hidden patterns and relationships underlying complex cardiovascular systems. However, most previous nonlinear dynamic methods only considered the time-delay reconstructed state space from 1-lead ECG signals to investigate cardiovascular dynamics. Few, if any, previous approaches considered disease-altered recurrence dynamics underlying long-term spatiotemporal VCG signals.
In this paper, we developed a novel multiscale recurrence approach to analyze the 3-lead VCG signals for the detection of MIs. Computer experiments demonstrate that the proposed approach yields better performance by delineating nonlinear and nonstationary behaviors in multiple wavelet scales. The multiscale recurrence analysis of VCG signals leads to a superior classification model that detects myocardial infarction with an average sensitivity of 96.8% and specificity of 92.8%, which is much better (i.e., a 5.6% increase in terms of correct rates) than the single-scale recurrence analysis in previous investigations.