Nonlinear dynamics and Poincaré sections to model gait impairments in different stages of Parkinson’s disease

Parkinson’s disease is a progressive neurological disorder that affects the motor system and produces several problems to control muscles and limbs. One of the major manifestations of the disease appears in gait and typically causes disability in the most advanced stages of the disease. Gait assessment is suitable for the diagnosis and monitoring of the neurological state of the patients. Gait is mainly evaluated from signals collected with inertial sensors attached to the limbs, and kinematics features are commonly extracted. Besides the classical kinematic methods, there are nonlinear phenomena in the gait process that cannot be properly modeled with kinematic features. This study proposes the use of several nonlinear dynamics features to assess gait impairments of Parkinson’s patients. We consider classical nonlinear features including correlation dimension, largest Lyapunov exponent, Hurst exponent, and others. In addition, we propose a novel nonlinear analysis based on the construction of Gaussian mixture models to find clusters in Poincaré sections. The nonlinear dynamics features proposed here are used to discriminate between Parkinson’s patients and healthy subjects, and to classify patients in different stages of the disease. To the best of our knowledge, this is the first study that considers nonlinear dynamics analysis including Poincaré sections to assess gait impairments of patients with Parkinson’s disease.


Introduction
Parkinson's disease (PD) is a neuro-degenerative disorder characterized by the progressive loss of dopaminergic neurons in the mid-brain [1]. There are motor and non-motor impairments that appear as a result of suffering from the disease. Non-motor symptoms include depression, sleep disorders, and cognitive decline in the most advanced stages. Motor symptoms include lack of coordination, tremor, rigidity, and postural instability. Gait impairments appear in most of the patients and include freezing, shuffling, and festinating gait. The standard scale to evaluate the neurological state of the patients is the Movement Disorders Society-Unified Parkinson's Disease Rating Scale (MDS-UPDRS) [2]. Fourteen of the thirty-three items included in the third section of the scale (MDS-UPDRS-III) are related to movement and control of the lower limbs. Each item ranges from 0 (healthy) to 4 (severely impaired). Gait changes are a hallmark of PD, where the main symptoms include speed reduction, smaller steps length, altered cadence, and increased gait variability. Although gait abnormalities are not clearly exhibited in early stages, their prevalence and severity increase with the disease progression [3]. More than 85% of PD patients develop gait impairments after three years of diagnosis [3]. The potential consequences of gait impairments in PD may include increased disability, risk of falls, and reduced quality of life. As the disease progresses, PD patients typically exhibit shuffling gait with a forward-stooped posture and festinating gait. These characteristics make the patients to spend a lot of energy while walking, leading them to their maximum metabolic capacity every day [4]. PD patients consider mobility and walking limitations the most disabling aspects of the disease and consistently identify improvement in walking as the most relevant outcome when rating the success of the treatment [5].

State of the art
There is a growing interest in the research community to study gait impairments in PD patients to perform automatic detection of the disease and to evaluate the neurological state of the patients. Most of the studies are based on signals captured from inertial sensors, e.g., accelerometers and gyroscopes attached to the shoes. Typical tasks performed by the patients during the evaluation include straight walking on intervals of about 10 m, heel-toe tapping, and circling foot movements. One of the first studies on this topic was introduced in [6]. The signals were captured using the eGait system 1 and the authors evaluated a total of 42 PD patients and 39 healthy control (HC) subjects. The study was mainly focused on analyzing information of energy content in several frequency bands, variance, root-mean square energy, and others. The reported accuracy ranged from 82 to 91% when classifying HC subjects versus PD patients in early and advanced stages of the disease, respectively. In [7] the authors considered also inertial sensors attached to the chest and knees and evaluated the neurological state of 34 PD patients according to the UPDRS score. The participants were asked to perform several tasks including 20 m walking, rising from a chair, and foot tapping. The authors computed kinematic features such as standing time, stride length, and stride velocity. The prediction of the UPDRS score was performed with a regression algorithm based on K-nearest neighbors (KNN). Spearman's correlation coefficients (ρ) of up to 0.60 were reported. In [8] the authors proposed a statistical transformation called shifted one-dimensional local binary pattern domain to classify gait signals from PD patients and HC subjects. Signals from 8 sensors were captured to measure the pressure of the foot while walking. A total of 93 PD patients and 73 HC subjects were included in the study. Accuracies of up to 88.9% were reported using a multi-layer perceptron. In [9] the authors considered a total of 15 PD patients and 16 HC subjects who were asked to use ultra-thin force-sensitive switches placed inside the shoes while walking. The authors computed a phase synchronization coefficient and the conditional entropy to model gait rhythm fluctuations between both feet. A multilayer perceptron was used to perform the automatic classification of PD versus HC subjects and accuracies of 92.8% were reported. Recently, in [10] the authors performed experiments to detect freezing of gait (FoG) episodes in 21 PD patients using deep learning methods. Several walking tasks were performed by the subjects including indoor and outdoor free walking and rising from a chair. The authors considered a six-layer one-dimensional convolutional neural network (CNN). The inputs to the CNN were formed by spectral representations of consecutive time intervals. A total of 64 frequency bins obtained from 9 inertial sensors (3axis with accelerometer, gyroscope and magnetometer) were considered. The authors reported that FoG episodes were detected with accuracies of up to 90%. Besides the aforementioned studies, where only one kind of signal is considered (mainly from inertial sensors), multi-modal strategies have also been addressed to model movement impairments due to PD. Recently, in [11] the authors combined information from gait, speech, and handwriting to perform the automatic discrimination of PD patients versus HC subjects. The strategy to merge information from the different biosignals was based on CNNs applied to the specific intervals where the participants start or stop a movement. Accuracies of up to 80.3% were reported when using only information from gait. When the three biosignals were combined, the accuracy improved to 97.6%.
Apart from the majority of studies that use kinematics, spectral and statistical features to model gait impairments of PD patients, there are some works that consider nonlinearities and the chaotic behavior that appear in human locomotion. This approach has shown to be suitable to model not only abnormal or impaired movements but also normal motor behavior [12]. For instance, in [13] the authors combined spectral, statistical, kinematics, and nonlinear dynamics (NLD) features to model gait signals of 14 HC subjects, 10 PD patients and 11 participants with peripheral neuropathy. The authors extracted different NLD features including largest Lyapunov exponent (LLE), Lempel Ziv complexity (LZC), and several entropy measures. Significant differences between the two groups of patients were reported when using LZC and crossentropy. In [14] the authors performed several experiments of automatic classification considering the Physionet dataset [15] which includes four groups of subjects: 13 PD patients, 13 Amyotrophic lateral sclerosis patients, 13 Huntington's disease patients, and 13 HC participants. The authors estimated NLD features such as the Shannon entropy, recurrence rate, and recurrence quantification analysis (RQA). Two different classifiers were used, a support vector machine (SVM) and a probabilistic neural network. The experiments were focused on discriminating between HC subjects and individuals from each group of patients. The authors reported accuracies of up to 100% depending on the disease.

Contribution of this study
This study introduces the use of classical and novel NLD features to model gait of PD patients and HC subjects. The set of classical features includes correlation dimension (CD), LLE, Hurst exponent (HE), LZC, and several entropy measures. This set of features has proved to be suitable to model impairments exhibited by PD patients in different motor activities including gait [13,16], speech [17], and handwriting [18]. The set of novel NLD features includes the use of probability distributions created with Gaussian Mixture Models (GMM) to model the dispersion of Poincaré sections. These features are used to discriminate between PD patients and HC subjects, and to classify patients in several stages of the disease. Classification exper-iments are performed considering two different classifiers: SVM and random forest (RF). Additionally, the neurological state of the patients according to the MDS-UPDRS-III score is predicted using a support vector regressor (SVR). Accuracies of up to 86.7% are obtained when discriminating between PD and HC subjects. The automatic classification of different stages of the disease shows accuracies of around 65%. To the best of our knowledge, this is the first paper that considers classical NLD features and probabilistic models of Poincaré maps to evaluate gait impairments in PD patients.

Phase space reconstruction
The phase space reconstruction is the first step in NLD analysis. According to the Takens's theorem [19], for a noise-free time-series there exists a dimension m such that the vectors X t are equivalent to the phase space vectors [20]. For a time series x t , a phase space can be constructed using Eq. 1, where the time-delay τ is computed by the first minimum of the mutual information function, and the embedding dimension m is found using the false neighbors method [21].
(1) Figure 1 shows the phase space reconstructed from gait signals corresponding to 20 mwalking with one stop every 10 m. The resulting phase space of four subjects is displayed: (A) HC subject, (B) PD patient in initial stage of the disease, (C) PD patient in intermediate stage of the disease, and (D) PD patient in severe stage of the disease. Note that the phase space of the HC subject exhibits well defined trajectories with a clear recurrence. Conversely, the trajectories for the patients are more dispersed, especially when the neurological state is severe. These figures allow us to think that several NLD features can be computed from the phase space to assess the complexity and stability of the walking process in PD patients.

Correlation Dimension (C D)
This feature is a measure of the dimensionality of the space where the attractor is embedded [20]. The computation of CD starts with the estimation of the corre- N is the number of points in the attractor, Θ is the Heaviside step function, m is the dimensionality of the embedded space, and ||·|| indicates the norm defined in any consistent metric space. C(ε) can be interpreted as a counting of the points that are inside a hyper-sphere of radius ε. CD is then computed as the slope of the linear regression of ln(C(ε)) versus ln(ε) for a small ε [20].

Largest Lyapunov Exponent (LLE)
The LLE measures the sensitivity of a dynamical system to its initial conditions according to the convergence/divergence rate of nearby trajectories in the phase space. The feature provides information about the average divergence rate of neighbor trajectories in the phase space (also called embedded attractor). The divergence rate is estimated according to the Rosenstein's method [22], which requires the estimation of the nearest neighbors to each point in the trajectories. Only those points which distance to the rest is larger than the average "period" of the time series are considered nearest neighbors. The separation of points in a trajectory can be defined as d(t) = Ce λ 1 t , where λ 1 is the LLE, d(t) is the divergence at the time t, and C is a constant. For the j-th pair of points that diverge at a rate λ 1 in an embedded trajectory, LLE is found at the i-th time interval iΔt as the slope of the average line that appears when drawing ln(d j (i)) = ln(C j ) + λ 1 (iΔt) in a logarithmic plane.

Hurst exponent (H )
This feature allows to evaluate long-term dependencies of a dynamical system. H is a smoothness measurement of a fractal time series, which quantifies the chaotic dynamics of a system. According to the rank scaling method proposed in [23], for a segment T of the time series, the following expression is defined: S is the range of the time series re-scaled by its standard deviation, and c is a scaling constant.

Lempel-Ziv Complexity (LZC)
This feature is related to the number of different patterns that appear in a given binary sequence. For practical reasons a very simple binary coding scheme is considered to convert the original signal into a binary string: when the difference between two successive samples is negative, the assigned binary is 0; when the difference is positive or null, the binary is 1. The process to compute LZC is based on the reconstruction of a sequence X by means of copying and insertion of symbols into a new sequence. Let's consider the binary sequence X = x 1 , x 2 , . . . , x n from left to right. The first bit is taken by default as the initial point. A variable S is defined to store the inserted bits. Thus, x 1 is the only value in S at the beginning of the process. An additional variable Q is defined to accumulate the bits that have already been analyzed from left to right. S and Q are attached on each iteration to form the binary sequence S Q. The last bit of S Q is withdrawn to form the sequence S Qπ which is compared to Q. If Q does not belong to the string S Qπ , the insertion process stops. The value of LZC is the number of subsets used to represent the original signal [24].

Recurrence and fractal-scaling analyses
This approach, originally introduced by Little, et al. for pathological voice assessment in [25], makes the assumption that the signal is formed by two components, deterministic and stochastic. The deterministic component is analyzed by the recurrence probability density entropy (RPDE). A hypersphere with radius ε > 0 that contains an embedded data point x(k j ) is considered. The recurrence time is defined as k r = k j − k i , where k j is the instant when the trajectory returned to the same hypersphere. If R(t) is the normalized histogram of the recurrence times estimated for all points into a reconstructed attractor, the RPDE can be defined as RPDE = where t max is the maximum recurrence time in the attractor. The stochastic component of the signal is modeled by detrended fluctuation analysis (DFA). Frames of the signal are integrated as between each integrated frame x(n) and the leastsquare straight-line that is optimized on each frame considering the parameters a n and b. The process is repeated for all frames with L samples length. At the end of the process, the log L versus log F(L) plane is created. When there is self-similarity in the original signal, the proportionality F(L)αL β is found. The DFA feature corresponds to the sigmoid normalization of the scaling exponent β.

Approximate Entropy (A E )
Entropy is an uncertainty measurement of a random variable. Its typical definition is known as Shannon's entropy, expressed as where X is a random variable with alphabet χ and probability mass function p(x). In a random process, where a set with n independent but not identically distributed variables is considered, the joint entropy has a growing rate given by Since it is not possible to compute entropy for n → ∞, there is an approximated version, namely A E , which consists on dividing the trajectories of the embedded space into several intervals to estimate the average con-ditional information generated by diverging points on a trajectory according to Eq. 3.
The computation of A E considers the distance among all possible pairs of points in the embedded attractor, which makes it non suitable in most of the cases. This drawback can be solved by computing the sample entropy which was introduced in [26] and defined as The main difference between φ(ε) (in Eq. 3) and Γ (ε) is that the second one does not include self-comparisons of points in the phase space, making it the computation of S E more efficient. A E and S E with Gaussian kernel The use of the Heaviside function Θ when counting points in the trajectories of the embedded attractor introduces discontinuities. In [27], the authors proposed to use a Gaussian kernel function to give greater weights to nearby points. The process consists in replacing the Heaviside function in C m i (ε) in the computation of A E and S E with a Gaussian kernel to measure the distance between every point in the embedded attractor [28]. The function to estimate such a distance is shown in Eq. 4.

Poincaré sections
A Poincaré section is formed in the intersection of a trajectory in the phase space with a hyperplane transverse to the flow of the trajectory. It is used to reduce the dimensionality of the phase space, and to transform the continuous flow time into a discrete time map. The time variation of a trajectory between two successive intersection points depends on the trajectories in the phase space, and on the surface chosen for the intersection. Each quasi-period in the time series leads to at least one point in the Poincaré section. Let ϕ(x, t) be the solution of f (x) which is the function that represents trajectories in the embedded space. Lets consider that for t = 0, ϕ(x, 0) = x, which means that the starting point that we will analyze in the trajectory passes through the point x. Orbits/trajectories are approximated via numerical methods considering small increments t , as it is shown in Eq. 5.
Let x n be a point such that T is the period of the orbit, i.e., the point x n returns to itself after T time units. The distribution of points in the map is given by the areas where the trajectories intersect the surface S which is transversal to the phase space in an specific direction, as it is shown in Fig. 2. As a result, the Poincaré map (P) is formed with the neighbors inside the surface S of all the points x n . S are also known as hyperplanes of the Poincaré sections. x n returns to S after kT time units, where k is the number of times in which a given point in an orbit returns to x n .
Note that T = τ , where τ is the time delay of the phase space and P is now our first return map to x as in Eq. 6, where R(x) is a positive number such that R(x n ) = T and P(x n ) = x n . Return maps plot pairs of points (x n , x n+1 ), where x n is the n-th point in the Poincaré section and x n+1 is the next point that arrives to the same return map after τ time units.
There is no general method to create Poincaré maps except under the most trivial circumstances where the equations associated to the phenomenon are known. In this paper we implemented the first return map to maxima method, which consists of finding the points that are local maxima. The first return map occurs when a given point (x n , x n+1 ) maps two consecutive maximum values in a trajectory. Maximum mapping points are expressed as (max n , max n+1 ).
After the reconstruction of Poincaré sections, GMM models are created with the aim of considering the probabilistic information of the sections. Figure 3 shows examples of the Poincaré maps and their corresponding GMM distributions. Note that there are differences in the maps obtained from the different subjects. The HC participant (Fig. 3a) and PD patient in the low stage of the disease (Fig. 3b) show more concentrated distributions, while the patients in intermediate (Fig. 3c) and severe state of the disease (Fig. 3d) show more sparse distributions.

Gaussian mixture model (GMM)
GMM-based systems are capable of representing arbitrary probabilistic densities. GMMs are parametric models represented as a weighted sum of M Gaussian densities. For a set with D-dimensional features (like points on a Poincaré section) x a GMM is defined as: The Gaussian densities p i (x) are parameterized by the mixture weights ω i , a D×1 mean vector μ i , and a D×D covariance matrix Σ i [29]. The parameters of the density models can be denoted as λ = (ω i , μ i , Σ i ) and the Gaussian densities as The parameters λ of the maximum likelihood function that better represent the set of points in a Poincaré section are estimated with the Expectation Maximization (EM) algorithm, which is typically initialized by the K-Means algorithm. Further details about how the GMM models are created can be found in [29].

Support Vector Machine (SVM)
The aim of an SVM is to discriminate data samples using the optimal hyperplane that better separates the classes. The decision function of the SVM is given by Eq. 9, which corresponds to a soft margin SVM, i.e., the classifier allows certain amount of errors which are penalized by the slack variable ξ n in the optimization process. y n ∈ {−1, +1} are the class labels, φ(x n ) is a kernel function that allows to transform the original feature space x into a higher-dimensional space where a linear solution exists. The weight vector w and the bias b are the terms that define the separating hyperplane.
The optimization problem to find the optimal hyperplane is shown in Eq. 10, where the hyper-parameter C controls the trade-off between ξ n and the width of the margin between the two classes. The samples x n where the condition y · We consider a Gaussian kernel function φ(x n ), given by Eq. 11, where the hyper-parameter γ is the bandwidth of the kernel function. For a more complete and a deeper understanding on the support vectors theory, we recommend the readers to see [30].
Random Forest (RF) This is one of the most popular methods based on the ensemble of several classifiers to make the final decision. RF combines several simpler classifiers such as decision trees. Each decision tree makes a local decision using different sets of features. The final decision is made following a majority voting scheme. An optimal RF can be found by optimizing the hyper-parameters N and D which correspond to the number of trees and the maximum depth of the trees, respectively.

Support Vector Regression (SVR)
Besides the binary classification of participants from the two different groups (HC and PD), and the multiclass classification of patients in different stages of the disease, the neurological state of the patients according to the MDS-UPDRS-III scale is predicted using an insensitive ε-SVR with linear kernel. This technique allows to predict the value of the scale ( y) using a loss function L(y, y) such that a global optimum exists. This function is calculated by Eq. 12, where ε is an insensitive hyper-parameter that determines the maximum error allowed in the prediction.
Similar to the process in the SVM, the feature vectors x are mapped to an m-dimensional space using a kernel function φ(x). The predicted values y are estimated in the same way as in the SVM classifier, according to Eq. 9, but in this case the class labels are real numbers (y n ∈ R) [30]. Fig. 4 Interface of the eGaiT system and a shoe with the inertial sensor a 3D gyroscope (range (5)00 • /s) attached to the lateral heel of the shoes [6]. Figure 4 shows the eGait system and the inertial sensor attached to the lateral heel of the shoe. The signals were transmitted via blue-tooth to a tablet using an Android App. Data from both foot were captured with a sampling frequency of 100 Hz and 12-bit resolution. The tasks performed by the patients include 20 m walking with a stop at 10 m (10 m walk twice, namely 2 × 10 m), and 40 m walking (Four times 10 m walk, 4 × 10 m).
A total of 45 patients with PD and 45 HC subjects participated in this study. Both groups of participants are balanced in age (t-student test with p = 0.52). All of the patients were evaluated by a neurologist expert who labeled their neurological state according to the MDS-UPDRS-III scale. During the evaluation, the neurologist filled in the form indicated in Fig. 10 of "Appendix". The exclusion criterion for participants in the HC group was to have any movement disorder, impairment or any injury/surgery in the lower limbs. Table 1 summarizes clinical and demographic information of the participants. The table also includes the scores of the items related with the movement and control of the lower limbs, namely LL-score. Anonymous subject-per-subject information is provided in Tables 17 and 18 of "Appendix". Additionally, details of the MDS-UPDRS-III scale and the items related to the evaluation of the movement and control of lower limbs are provided in Fig. 10 and Table 19 of "Appendix", respectively. We believe that the LL-score is more specific to assess gait impairments, therefore more suitable to perform automatic evaluation of problems exhibited by patients to move and control the lower limbs.

Methodology
A summary of the methodology proposed in this study is depicted in Fig. 5. The main stages include: (1) data acquisition, (2) feature extraction and (3) classification/regression. The first stage consists in capturing the gait signal from the participants. The second stage is the feature extraction which includes three different sets: classical nonlinear dynamics, entropy measures and Poincaré sections extracted from the attractors. The third section consists in discriminating between PD patients and HC subjects, and evaluating the neurological state of the patients according to the MDS-UPDRS-III scale and the LL-score.

Experiments and results
Three different experiments are performed in this study: (1) automatic classification of PD patients and HC subjects, (2) prediction of the neurological state of the patients according to the MDS-UPDRS-III score and the LL-score, and (3)   The performance of the classifiers is evaluated according to their Accuracy (Acc), Sensitivity (Sen), Specificity (Spe), and the Area Under the receiver operating characteristic Curve (AUC).

Classification of PD patients and HC subjects
Within the classification of PD versus HC subjects, six experiments were performed considering different feature sets: (1) NLD features, (2) entropy measures,  AUC, area under the ROC curve; l, left foot; r, right foot; C and γ are the optimal parameters for RBF-SVM, N and D are the optimal parameters for RF classifier AUC, area under the ROC curve; l, left foot; r, right foot; C and γ are the optimal parameters for RBF-SVM; N and D are the optimal parameters for RF classifier fusion), and the evaluation of each foot (r: right and l: left). The best result in the bi-class classification with an accuracy of 89% is obtained with the combination of NLD features and Poincaré sections using the RBF-SVM as classifier (see Table 6). The second best result is obtained with the combination of all feature sets and the RF classifier. In terms of tasks, the early fusion of 2 × 10 and 4 × 10 exhibited the best results when the movement of left foot is considered.
The classical nonlinear dynamics features seem to be the most accurate among the three considered feature sets. However, the different combinations between the features sets show improvements especially when the left foot is considered. This result indicates that the feature sets in this work are complementary for the task of discriminating between PD patients and HC subjects considering gait signals collected with inertial sensors. Note that specificity is higher than sensitivity in most of the experiments. This is likely explained because there is no a large number of patients with severe impairments in their lower limbs. In addition, the combination of feature sets allows to increase the classifier's ability to discriminate patients. This result indicates that the proposed approach seems to be more accurate to correctly classify HC subjects. AUC values are also included in the results with the aim of showing a more compact statistic of the system's performance. AUC, area under the ROC curve; l, left foot; r, right foot; C and γ are the optimal parameters for RBF-SVM; N and D are the optimal parameters for RF classifier The scores obtained from the two classifiers are used to create the distributions of both classes and their corresponding box-plots shown in Fig. 6. The scores of the RBF-SVM, in Fig. 6a, correspond to the distance of each subject to the hyperplane. The scores of RF, in Fig. 6b, correspond to the difference between the posterior probabilities assigned to each subject for the two classes. Dark gray bars are for HC subjects while white bars are for PD patients. The light gray bars correspond to the intersection between both sets, and give an idea of the low misclassification rate obtained with the proposed approach. It is worth to highlight that the distribution of subjects in the representation space shows a clear trend where HC participants are far away from most of the PD patients, especially those in the most advanced stages of the disease. This will be extensively discussed in Sect. 5.2.2. Besides the distribution of the scores obtained with both classifiers, we want to show the performance of the proposed approach more compactly. Figure 7 shows the ROC curves obtained for the fusion of tasks considering the RBF-SVM and RF classifiers. The four feature sets are included. These curves allow to perform direct comparisons among models, for instance the better performance of the early fusion of features can be easily observed with the RBF-SVM classifier. AUC, area under the ROC curve; l, left foot; r, right foot; C and γ are the optimal parameters for RBF-SVM; N and D are the optimal parameters for RF classifier

Regression
The regression experiments considered here include feature sets extracted from the signals obtained with the left foot. This is because those signals exhibited the best performance in the bi-class classification experiments presented in Sect. 5.1. Tables 8, 9     ρ, Spearman's correlation; MAE, Mean absolute error; C and are the optimal meta-parameters for SVR obtained in development Table 9 Regression results with 4 × 10 m task (left foot) using different feature sets  ρ, Spearman's correlation; MAE, Mean absolute error; C and are the optimal meta-parameters for SVR obtained in development using the classical NLD features with ρ = 0.63. The highest correlation with respect to the LL-score is obtained with NLD + Entropy measures and with NLD + Poincaré sections with ρ = 0.38. Although the LL-score is more specific to describe impairments in the movement and control of lower limbs and muscles, it seems like the reduced variability of such a score affects its correlation with respect to quantitative measurements like those based on nonlinear analysis. Furthermore, except NLD features, individual feature sets seem to be not suitable for a regression-based analysis. A possible reason is because the variability of the entropy-based and Poincaré-based measurements is limited and not enough to find high/strong correlations with the two clinical scores considered in this work. Further research is required in this point to evaluate whether other strategies improve the correlation results.

Classification of patients in different stages of the disease
Although there is some correlation between the predicted and the real MDS-UPDRS-III scores, we believe that from the clinical point of view it is more suitable for the patients to know in which stage of the disease they are, rather than to have the prediction of a continuous scale. In addition, for medical applications it is difficult to have a great amount of data to train suitable regression algorithms like an SVR. For these reasons we believe that it is better to divide the patients into three groups according to their neurological state: lower, intermediate, and severe. Figure 8 shows the distribution of the neurological stage of the patients into the three groups according to MDS-UPDRS-III scores and to LL-scores. HC subjects are considered as a separate group to perform four-class classification experiments. Tables 11, 12 and 13 show the classification results  for the neurological state of the patients according to  the MDS-UPDRS-III score, and Tables 14, 15 and 16 according to the LL-score. As in Sect. 5.2, all results correspond to experiments considering signals from the left foot sensors. Note that the multi-class classification experiments are more challenging due to the distribution of classes. In Fig. 8a and b, it can be observed that class PD3 has a wider range than PD1 and PD2. The best result among the multi-class classification experiments with respect to the MDS-UPDRS-III score is obtained with NLD features (accuracy = 68.2, F1-score = 0.65, Cohen's κ = 0.47). When the classification is with respect to LL-scores, the best result is obtained with the combination of NLD features and Poincaré sections (accuracy = 66.7, F1score = 0.60, Cohen's κ = 0.45). In both cases the reliability according to the Cohen's κ is moderate, which indicates that further experiments are required to find optimal points in the Poincaré sections. This process could be addressed through a grid search, but it is out of the scope of the present work.
With the aim of improving the interpretability of the proposed approach, the feature space resulting after the fusion of the three feature sets extracted from the left-foot signals in the two tasks (2 × 10 and 4 × 10) is reduced down to its first two principal components using the classical principal components analysis (PCA) algorithm. Figure 9 shows the resulting twodimensional space. Although we did not perform any classification experiment with these principal components, the optimal hyperplane and the decision boundary, found with a RBF-SVM, are included to provide a reference to the readers. Additionally, the criterion to distribute the participants among the four classes (HC, PD1, PD2, and PD3) was based on the values of the LL-score. Note that the four classes are reasonably well distributed in the space. Most of the circles and stars which represent healthy subjects and patients

Discussion and conclusions
Gait signals captured from inertial sensors of PD patients were characterized with several NLD-based features to model nonlinear effects in the walking process. The extracted features include classical NLD features such as CD, LLE, HE, and several entropy measures. In addition, we proposed a new characterization scheme based on a clustering analysis of Poincaré sections using a GMM algorithm. The extracted features were used to classify PD patients and HC subjects, to predict the neurological state of the patients, and to classify patients in several stages of the disease. The results show that it is possible to discriminate between PD patients and HC subjects with accuracies of up to 88%. The proposed NLD features based on Poincaré sections also provide complementary information with the classical NLD features to discriminate between PD patients and HC subjects. The results indicate the presence of a contra-laterality effect [31] because higher accuracies are obtained with features computed from the left foot and most of the participants of this study are right-handed. The proposed approach seems to be promising to classify PD patients in different stages of the disease. The combination of the classical NLD features are the most accurate approach to classify PD patients in several stages of the disease according to the MDS-UPDRS-III score (up to 68.2%). Patients in lower stages are mainly misclassified with HC subjects, which is explained because in early stages of the disease, symptoms have less impact in lower limbs. Regarding misclassification of patients in severe stages, we believe that those errors are because there is more variability in the neurological state in the more advanced stages [32]. We believe that the nonlinear dynamics analysis can be extended to other applications, for instance to compensate or correct abnormal gait patterns [33], and also the discrimination between PD and other neurological disorders with similar symptoms, such as Huntington's disease, amyotrophic lateral sclerosis, and essential tremor [14,34].      Figure 10 shows the MDS-UPDRS score-sheet used by the neurologists when doing clinical evaluations of PD patients [2]. Part III of the scale is related to the motor symptoms, and the items related to the movement and control of lower limbs are indicated in Table 19.