Multi-lead ECG signal analysis for myocardial infarction detection and localization through the mapping of Grassmannian and Euclidean features into a common Hilbert space

Background and Objective : Electrocardiogram is commonly used as a diagnostic tool for the monitoring of cardiac health and the detection of possible heart diseases. However, the procedure followed for the diagnosis of heart abnormalities is time consuming and prone to human errors. Thus, the development of computer-aided techniques for the automatic analysis of electrocardiogram signals is of vital importance for the diagnosis and prevention of heart diseases. The most serious outcome of coronary heart disease is the myocardial infarction, i.e. the rapid and irreversible damage of cardiac muscles, which, if not diagnosed and treated in time, continues to damage further the myocardial structure and function. In this paper we propose a novel approach for the automatic detection and localization of myocardial infarction from multi-lead electrocardiogram signals. Methods: The proposed method initially reshapes the multidimensional signal into a third-order tensor structure and subsequently extracts feature representations in both Euclidean and Grassmannian space. In addition, two different methods are proposed for the mapping of the two different feature representations into a common Hilbert space before the final classification of signals. The first approach is based on the mapping of both Grassmannian and Euclidean features in a Reproducing Kernel Hilbert Space (RKHS), while the second one

directly to Grassmann manifold and then concatenate the two VLAD representations.
Both approaches are generic and can be easily applied to various application fields.
The remainder of this paper is organized as follows: Section 2 presents the proposed methodology including data preprocessing, ECG signals modeling, fusion of feature representations and classification. Subsequently, the dataset and experimental results are discussed in Section 3, while finally conclusions are drawn in Section 4.

Methods
The overall structure of the proposed methodology for the classification and localization of MI is shown in Figure 1. More specifically, multi-lead ECG signals are initial pre-processed and reshaped into a third-order tensor structure, while, subsequently, Euclidean and Grassmannian feature representations are extracted and fused in a common Hilbert space. In the case of Euclidean features, VLAD encoding is applied after a dyadic discrete wavelet transform and a subsequent multiscale higher-order SVD analysis on sub-band tensors. On the other hand, Grassmannian features, i.e., points on Grassmann manifold, are extracted by feeding the third-order tensor structure to a higher-order linear dynamical system.

Multi-lead ECGs signals
Tensor formation Modeling through LDS Wavelet transformation ( + 1) = ( ) + ( ) (1) where ∈ is the hidden state process, ∈ ℝ is the observed data, ∈ ℝ × is the transition matrix of the hidden state and ∈ ℝ × is the mapping matrix of the hidden state to the output of the system. The quantities ( ) and ( ) are the measurement and process noise respectively, with ( )~( , ) and ( )~(0, ), while ̅ ∈ ℝ is the mean value of the observation data. The extracted tuple LDS descriptor, = ( , ), models both the appearance and dynamics of the observation data, represented by and , respectively. The descriptor's parameters, and , can be estimated through a suboptimal method initially proposed by Doretto et. al [19].
However, in our case the multi-lead ECG signal is represented by the third-order tensor . To this end, we decompose the ECG formation using a higher order singular value decomposition: where ∈ ℝ × × is the core tensor, while (1) ∈ ℝ × , (2) ∈ ℝ × and (3) ∈ ℝ × are orthogonal matrices containing the orthonormal vectors spanning the column space of the matrix and × denotes the -mode product between a tensor and a matrix. Since the columns of the mapping matrix of the stochastic process need to be orthonormal, we can consider = (3) and Then, the transition matrix A can be estimated using least squares [20] as follows: Furthermore, to improve the stability of the dynamical system (i.e., to estimate the stabilized transition matrix ), we obtain an approximate solution, based on a convex optimization technique [21], by solving the following quadratic problem: subject to ≤ 1 where = ( ), = ( 1 2 ), = ( 2 2 ) and = ⊗ ( 1 1 ), is the identity matrix, (•) indicates the trace of a matrix and (•) operator converts a matrix to vector and ⊗ denotes the Kronecker product. Also, = ( 1 1 ) where vectors 1 and 1 correspond to the first eigenvalue of the transition matrix .
Having modeled each ECG signal using a higher-order linear dynamical system approach, our next step is to represent the parameters of each dynamical system, = ( , ), as a point on the space of the extracted descriptors. Towards this end, we initially estimate the finite observability matrix of each dynamical system, ( ) = [ , ( ) , ( 2 ) , … , , ( −1 ) ] and then, we apply a Gram-Schmidt othonormalization [22] procedure, i.e., = , in order to represent each descriptor as a point, ∈ ℝ ×T×3 , on the Grassmann manifold (in our experiments we set the size of the observability matrix equal to 3, while T is the number of samples).

Modelling of ECG signal using locally aggregated descriptors
In this section, we propose the representation of ECG signals, i.e., the third-order tensor , as a Vector of Locally Aggregated Descriptors (VLAD). The extracted VLAD descriptor can be considered intrinsically as Euclidean, as it encodes the features' distribution in their native vector space. More specifically, we initially apply a dyadic discrete Wavelet Transform (with Daubechies 9/7 Biorthogonal wavelet filters as mother wavelet) on every vector ( , , : ) of the tensor ∈ ℝ × × where = 1, … , and = 1, … , . This transformation results 2 × sub-band tensors ( is the number of levels and depends on the sampling frequency of the signal) comprising of one approximation ∈ × × , with = /2^, and number of details (where = 1, … , ) sub-band tensors with dimensions × × , with = /2^ [14].
Subsequently, each sub-band tensor is decomposed according to equation (3) and a feature vector is formed by the concatenation of mode-singular values σ (in our case is equal to 3) of all extracted core tensors S. In particular, for the 3 modes of each sub-band tensor and , we form the corresponding feature vectors: , 1 (2) , … , (2) , … , 1 (3) , … , and and then, we concatenate the individual features to form the final feature vector as follows: In our experiments we extracted the singular values of sub-band tensors 7 , 7 , 6 , 5 and 4 and used the first three singular values of each unfolded submatrix for the construction of feature vectors in equations (8) and (9). These sub-bands contain 'PQRST' segmented information, while the rest do not contain any meaningful information [14].
Finally, for the modelling of each ECG signal, we apply VLAD encoding, which is considered as a simplified coding scheme of the earlier Fisher Vector (FV) representation and has shown to outperform histogram representations in bag of features approaches [23], [24].
while the final VLAD representation is determined by the L2-normalization of vector ̅ :

Fusion through the mapping of data in a Reproducing Kernel Hilbert Space
To fuse the extracted feature representations, we propose in this section their mapping into a common Hilbert space (it is defined as proposed-1). Our main problem here stems from the fact that the two feature representations, i.e., Grassmannian points and VLAD encodings, lie in different geometrical spaces. More specifically, in the first case we have points in the non-Euclidean space of the dynamical model, known as Grassmann manifold, which is a quotient of the special orthogonal group SO(n), i.e., the subset of all orthogonal matrices with determinant equal to +1 (this simply means that we can extend the notion of tangent spaces, geodesics etc. from the base manifold SO(n) to the quotient space of Grassmann manifold).
On the other hand, the second feature representation is a VLAD descriptor, which lies in Euclidean space ( Figure 2).
To address the problem, we attempt to transform the two feature representations into a common Hilbert space using two kernel functions, : → for the Grassmann manifold and : ℝ → for the Euclidean space. In the first case, the Grassmannian kernel ( 1 , 2 ), which shows the similiarity between two Grassmannian points 1 and 2 is estimated using the inverse exponential map on the Grassmann manifold: where ‖•‖ is the matrix Frobenius norm. For estimating the inverse exponential map, we first need to compute the orthogonal completion of 1 and then the thin CS decomposition of matrix 2 to find the direction matrix that specifies the direction and speed of geodesic flow [25]. On the other hand, for the Euclidean space, we can simply apply a Radial Basis Function (RBF) kernel for two feature vectors 1 and 2 : Using equations (14) and (15) as similarity metrics in Grassmannian and Euclidean space respectively, we can easily estimate the elements of kernel matrices and . To improve the robustness and the discriminative ability of the method, we create kernels of equal size, i.e., , ∈ ℝ , with = * , where c is the number of classes, while is the number of the most representative samples in each class. For the Euclidean space, we apply a simple kmeans algorithm, while for the Grassmannian manifold, we select the most representative Grassmannian Points, using as a distance between two points on the manifold, the similarity metric of equation (14). To this end, we apply a k-medoids classification approach considering as medoid, the local minimizer of function F: where indicates the total number of Grassmannian points in a medoid and (•) denotes the distance between two Grassmannian points (see equation (14)). Having defined the most representative Grassmannian points among the existing ones, we estimate the Grassmannian kernel matrix as , = ( , ), with , = 1,2, … . Similarly, each element of the Euclidean kernel matrix is defined as , = ( , ) for each , ∈ [1, ].
Subsequently, we estimate the common kernel matrix for the two subspaces as =°, where ∈ ℝ and 〈°〉 is the Handamard product of kernel functions.
Finally, to classify the ECG signals we apply a sparse representation using the following equation according to [26], which enables us to map the input signal to the Hilbert space of a sparse representation : where y = 1/2 and C = 1/2 , with For the detection and localization of MI, ECG signals corresponding to myocardial infarction cases were initially discriminated from those of health controls and then they were classified in the following categories: anterior (AMI), antero-lateral (ALMI), antero-septal (ASMI), inferior (IMI) and infero-lateral (ILMI). To this end, each ECG signal represented by a set of sparse coefficients is classified to a class = 1 … , whose training samples provide the best reconstruction of it. Specifically, the classification is performed by assigning each multilead ECG signal to the class minimizing the following residual: where sets to zero the coefficients of that do not correspond to class . The estimation of the sparse representations of ECG signals is achieved using the SPAMS toolbox of Matlab.

Fusion in a Hilbert space through VLAD encoding
In this second fusion approach, we attempt to apply VLAD encoding on Grassmann manifold, as in the case of the Euclidean space, and then concatenate the two VLAD representations to form a joint representation for the two spaces (it is defined as proposed-2). In other words, we use VLAD encoding as a means to map our features into a common space.
In contrast with the previous approach, in this case, we divide each signal into overlapping equally-sized elementary signals (using a sliding window of a constant size W) that are modeled by a linear dynamical system. In this way, each ECG signal is finally represented as a set of points on the Grassmann manifold instead of a single point.
Subsequently, we apply a Karcher mean algorithm [28] to estimate the codewords in equation (12). We re-identify the members of each class, i.e., = ( ), using the dissimilarity metric defined in equation (14). Hence, the VLAD encoding of an ECG signal on the Grassmann manifold for a codebook of q representative words, { } =1 , can be defined as:

Results
For the evaluation of the proposed method we conducted extensive tests using a publicly available dataset, namely PTB Diagnostic ECG database [29], containing multi-lead ECG data. Each record in the dataset includes 15 simultaneously measured signals, i.e., the conventional 12 leads as shown in Figure 3 (i, ii, iii, avr, avl, avf, v1, v2, v3, v4, v5, v6) together with the 3 Frank lead ECGs (vx, vy, vz) (in our experiments the 3 Frank lead ECGs were not used, Figure 3). Figure 4 shows a 3D representation of a beat period of an ECG signal of the PTB Diagnostic database. Each signal is digitized at 1000 samples per second.
Nevertheless, in our experiments five groups of these have been considered (anterior (AMI), antero-lateral (ALMI), antero-septal (ASMI), inferior (IMI) and infero-lateral (ILMI)). The records of these categories are 47 from AMI, 43 from ALMI, 77 from ASMI, 89 from IMI and location, but these groups have a very limited number of ECG records to be used for training the classifier.
The goal of this experimental evaluation is three-fold. Firstly, we aim to define the optimal parameters for the two proposed fusion approaches. Secondly, a detailed experimental evaluation for the contribution of each extracted feature and fusion approach is performed estimating the MI detection and localization rates. Finally, in order to validate the efficiency of the proposed method, we compared its detection and localization rates with those of various state-of-the-art approaches using the same dataset.

Estimating the optimal parameters
The first evaluation phase concerns the selection of the optimal parameters for both fusion approaches in order to achieve the best detection rates. Initially, we carried out experiments in order to define the most appropriate kernel function for the mapping of Euclidean data into the Reproducing Kernel Hilbert Space (proposed-1 approach). In particular, we experimented using three kernel functions, namely radial basis function (RBF), polynomial and exponential chi-square distance. Figure 5 presents the classification rates for each kernel function when they are applied for the mapping of Euclidean data into a common Hilbert Space. As we can easily see, the best classification rate (100%) is achieved by using the RBF kernel function, while the polynomial and chi-square kernels provide lower detection rates of 99.6% and 93.8%, respectively. length) for creating the elementary equally sized signals. As seen in Figure 6, the best classification rate is achieved by setting the window size equal to 20, yielding a detection rate of 95.83% (this detection rate refers to the VLAD encoding ̅ and not to the concatenated ̅ vector). It is worth mentioning that when small signal sizes are used, the detection rates decrease, apparently due to the lack of sufficient information in each sub-signal, while the detection rate for signal sizes larger than 20 also seems to decrease.

Contribution of different feature representations to the detection of myocardial infraction
In this subsection, we elaborate a more detailed analysis in order to evaluate the contribution of different feature representations to the MI detection process. Specifically, we analyze the contribution of four different feature representations: a) The representation of ECG signal as a single Grassmann point (proposed-1 approach). b) The VLAD encoding representation ̅ of the ECG signal after the dyadic discrete wavelet transform and the subsequent multiscale higher-order SVD analysis on sub-band tensors (proposed-1 and proposed-2 approach). c) The VLAD encoding representation on the Grassmann manifold (proposed-2 approach) using a single Grassmann subspace i.e., ̅ and d) The concatenated VLAD encoding representation of the five sub-bands, A7, D4, D5, D6 and D7 (proposed-2 approach). For the classification of the ECG signals, in the first case, i.e., Grassmannian subspace, we used as a similarity metric the distance between two Grassmannian points, as defined in equation (14), while in the cases of VLAD encodings we used a standard SVM classifier. The experimental results in Figure 7 show that VLAD encoding on Grassmann manifold, i.e., ̅ , achieves the best results, with a detection rate of 95.8% against 94.6%, 92.3% and 91.3% for the VLAD encoding representation ̅ , the Grassmann feature representation and the concatenated VLAD vector, respectively. In the next section, we show that by mapping these features to a common Hilbert space, we can further improve the classification accuracy of individual features.

Comparison of fusion approaches
In this subsection, we aim to evaluate the effect of the two proposed fusion approaches to the detection and localization of MI. In the first approach (proposed-1), we use kernel functions to map the Grassmann feature representation and VLAD encoding ̅ into a common Hilbert As we can see in Figure 8, all approaches provide excellent detection rates and can easily distinguish MI cases from those of healthy controls. Additionally , considering the five   types of MI, namely AMI, ALMI, ASMI, IMI and ILMI, the

Comparison with state-of-the-art approaches
In this last section, we present a comparative analysis of the proposed method against a number of state-of-the-art approaches. More specifically, we compare the Sensitivity, Specificity and Accuracy rates of the proposed method (Table I) against those of ten state-ofthe-art approaches that have been used in the past for the detection and localization of MI on PTB Diagnostic ECG database.
To ensure a fair comparison, we adopted the same experimental protocol followed in [14]. The experimental results in Table I show that the proposed method (both proposed-1 and proposed-2 approaches) outperform all other methods achieving improvements up to 0.7% in detection accuracy from [17] and up to 1.9%, 1.3%, 0.4% and 1.2% in localization accuracies from [14], [31], [13] and [12], respectively.  [32] 78% 97.3% NA NA

Discussion
The method of 12-lead simultaneous recording of electrocardiographs allows the capturing of the ECG signal of the same cardiac cycle on 12 leads at the same time. This approach can significantly increase the accuracy of all measurements and reduce the variability of ECG measurement [33]. In a multi-lead ECG, the leads refer to imaginary lines between two ECG electrodes. To exploit this information and better model possible beat and lead correlations, we use a third-order tensor structure and then we attempt to extract different feature representations containing complementary information with regard to the dynamics of ECG signals. This fact justifies the superiority of the proposed method against all other state of the art approaches in Table I. While discrete wavelet transform has been widely used in the past for modeling ECG signals, to the best of our knowledge this is the first time that linear dynamical systems, and their projection to Grassmann manifold, are used for the modeling of such data. Linear dynamical systems have shown great ability to model dynamical information in video sequences [34], while recently they have been used for the extension of residual networks, i.e., ResNets, and the improvement of Faster-CNN network's accuracy in object detection applications [35]. The experimental results in Figure 8 show that the combination of LDS descriptors with those extracted from DWT increases significantly the detection rates of individual feature representations presented in Figure 7.

Conclusions
In this paper, we presented a novel methodology for assisting doctors in detection and localization of MI. The main advantage of the proposed approach is that it exploits better the intercorrelations between signals of different ECG leads by extracting feature representations that lie in different geometrical spaces and contain complementary information with regard to the dynamics of signals. More specifically, we initially reshape the multidimensional signal into a third-order tensor structure and subsequently extract feature representations in both Euclidean and Grassmannian spaces. Moreover, two different methods are proposed for the mapping of two different feature representations into a common Hilbert space before the final classification of signals. The first approach is based on the mapping of both Grassmannian and Euclidean features in a Reproducing Kernel Hilbert Space (RKHS), while the second one attempts to apply VLAD encoding directly to Grassmann manifold and then concatenate the two VLAD representations. The experimental results showed that the proposed method improved significantly the performance of the automated computer-based detection and localization of MI. In the future, more data will be collected in order to assess the effectiveness of the proposed methodology. Finally, we aim to extend and apply the proposed methodology to other application fields, using different types of signals e.g. EEG or EMG, in order to provide a generalized automated electrodiagnostic tool.

Acknowledgement
The research leading to these results has received funding from EC under grant agreement no. H2020-690494 "i-PROGNOSIS".

Conflict of Interest Statement
We declare that we have no financial and personal relationships with other people or organizations that can inappropriately influence our work, there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled, "Multi-lead ECG Signal Analysis for Myocardial Infarction Detection and Localization through the Mapping of Grassmannian and Euclidean Features into a Common Hilbert Space".
[2] National Heart, Lung, and Blood Institute, What is a Heart Attack.