A Self-Tuned Architecture for Human Activity Recognition Based on a Dynamical Recurrence Analysis of Wearable Sensor Data

Human activity recognition (HAR) is encountered in a plethora of applications, such as pervasive health care systems and smart homes. The majority of existing HAR techniques employs features extracted from symbolic or frequency-domain representations of the associated data, whilst ignoring completely the behavior of the underlying data generating dynamical system. To address this problem, this work proposes a novel self-tuned architecture for feature extraction and activity recognition by modeling directly the inherent dynamics of wearable sensor data in higher-dimensional phase spaces, which encode state recurrences for each individual activity. Experimental evaluation on real data of leisure activities demonstrates an improved recognition accuracy of our method when compared against a state-of-the-art motif-based approach using symbolic representations.


I. INTRODUCTION
Human activity recognition (HAR) and classification using wearable sensor data is attracting ever-increasing interest, mainly due to its utility in a plethora of applications, such as pervasive health care systems [1], smart homes [2], and surveillance systems for indoor and outdoor activities [3]. The latest trends in HAR primarily focus on data processing architectures that can analyze and decipher complex activities while the data are being captured [4], [5]. To this end, motif discovery attempts to extract new, meaningful and previously unknown knowledge from data, as well as to monitor and track structural similarities in streaming data generated by a network of sensors. More specifically, time series motifs, which are approximately repeated subsequences in a longer time window [6], are exploited for representing the meaningful information content of the original data. Numerous studies have shown the potential of using motifs for detecting and classifying activities and events [7]-[9], by addressing the problem of motif discovery in unidimensional data. Among the several existing motif discovery techniques, the symbolic aggregate approximation (SAX) method [10] has a prominent role, due to its conceptual simplicity and computational efficiency. Recently, a modified grammar induction algorithm, namely, the modified Sequitur algorithm [8], [11], [12], has been introduced to improve the performance of SAX. Although such methods can lead to high-precision results in the unidimensional case for relatively smooth data, their performance often degrades in more general cases. Furthermore, their enhanced performance comes at the cost of an increased sensitivity to the values of a set of parameters, whose accurate tuning is a demanding task.
To overcome these limitations, this work proposes an alternative approach for accurate human activity recognition, which exploits the temporal variability of the underlying dynamical system that generates the data associated with a specific activity. To this end, recurrence quantification analysis (RQA) [13] is exploited to perform a nonlinear analysis of sensor streams, while also being able to treat nonstationary and short data series. RQA comprises a set of quantitative measures for the quantification of recurrent, typically small-scale, structures, and for the detection of critical transitions in the system's dynamics (e.g., deterministic, stochastic, random).
RQA has been used recently in the field of HAR, demonstrating a promising performance. In [14], the complexity and recurrence properties of daily non-complex human activities measured by wearable sensors are investigated. Along these lines, [15] combines RQA and multiscale entropy to characterize trunk postural control and motor complexity during childhood, while [16] employs recurrence plots to quantify the local dynamic stability of human walking kinematics. On the other hand, the methods proposed in [17]-[19] focus on offline experimentation with benchmark datasets and various classifiers applied on time- or frequency-domain features.
Although these previous studies also try to exploit the inherent time-varying dynamics of the recorded data, they either rely on the time variations of statistical or recurrence measures, or employ complex classifiers, in order to study qualitatively the differences between distinct activities. To address the above limitations, the contributions of our work are the following: (i) the underlying data generating processes are modeled directly in a higher-dimensional phase space, identifying more accurately the time-evolving dynamics of sensor streams; (ii) an efficient feature extraction scheme is designed for the discovery of information-rich patterns that best capture the underlying data dynamics; and (iii) a totally self-tuned architecture is designed for unsupervised HAR.
The rest of the paper is organized as follows: Section II introduces briefly a well-established motif discovery method, the so-called SAX, which has been extensively used for HAR. Section III analyzes in detail our proposed HAR architecture, based on RQA features and a linear-kernel support vector machine for activity recognition. Section IV evaluates the performance of our method on a real dataset of complex leisure activities, and compares its accuracy with a SAX-based approach. Finally, Section V summarizes the main outcomes of this work and gives directions for future extensions.

II. SAX-BASED MOTIF DISCOVERY FOR HAR
This section describes briefly the core architecture of a state-of-the-art online motif discovery method, shown in Fig. 1, which will be used as a benchmark to compare against our proposed RQA-based framework. The rationale for choosing motif discovery stems from the fact that the occurrence frequency of the motifs is used as a classification feature, instead of the symbolic representation itself. This resembles the RQA approach, which is based on the number of state recurrences in the phase space.
Discovering similarities directly in streaming data is typically a demanding task, in terms of computational and memory complexity. The symbolic aggregate approximation (SAX) is a well-established technique for improving the efficiency of data mining from time series. SAX transforms the sensor data streams into strings of discrete symbols. The main benefit of this algorithm is the effective dimensionality reduction, while satisfying a lower bounding property, which guarantees that a distance measure applied on two symbolic strings lower bounds the true distance between the original time series [10]. Specifically, a sliding window runs over the time series, and a vertical segmentation is applied first, which divides the current window into equally sized non-overlapping segments. Then, for each segment the average value is computed, followed by a horizontal segmentation, which associates the average with a symbol selected from a predetermined alphabet. This alphabet is constructed by dividing the whole range of values of a given time series into $2^Q$ intervals, where $Q \in \mathbb{N}$ is defined by the user according to the required granularity. Then, each symbol from the alphabet is assigned to an interval determined by a pair of breakpoints. In our implementation, the set of breakpoints is defined by the real numbers that divide the area under the standard Gaussian distribution into $2^Q$ equal regions.
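The two segmentation steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names, the z-normalization of each window (customary for SAX, but not stated above), and the requirement that the window length be divisible by the number of segments are all assumptions made for illustration.

```python
from statistics import NormalDist, mean, pstdev
import string


def gaussian_breakpoints(alphabet_size):
    """Breakpoints that split the standard Gaussian into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax_word(window, n_segments, alphabet_size):
    """Map one window to a SAX word.

    Assumes len(window) is divisible by n_segments and alphabet_size <= 26.
    """
    mu, sigma = mean(window), pstdev(window)
    # Assumption: z-normalize the window so Gaussian breakpoints are meaningful;
    # a constant window is mapped to all zeros.
    z = [(v - mu) / sigma if sigma > 0 else 0.0 for v in window]
    seg_len = len(z) // n_segments
    # Vertical segmentation: average of each equally sized, non-overlapping segment.
    averages = [mean(z[s * seg_len:(s + 1) * seg_len]) for s in range(n_segments)]
    breakpoints = gaussian_breakpoints(alphabet_size)
    # Horizontal segmentation: the symbol index is the number of breakpoints
    # lying below the segment average.
    symbols = string.ascii_lowercase
    return "".join(symbols[sum(1 for b in breakpoints if a > b)] for a in averages)
```

For instance, `sax_word([0, 0, 0, 0, 10, 10, 10, 10], n_segments=2, alphabet_size=4)` maps the low half of the window to the first symbol and the high half to the last one.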
To extract re-occurring patterns (a.k.a. motifs) from a SAX-based symbolic representation of the current window, the modified Sequitur algorithm is applied next. This recursive algorithm detects repetitions of bigrams, that is, of two consecutive symbols, and extracts them from the symbolic string by defining rules constituting a grammar. In order to obtain motifs of variable length, each rule corresponds to a set of bigrams, instead of matching a single bigram, as in the original Sequitur algorithm [11]. Having applied these rules, a bag of motifs is extracted for each streaming window, followed by a feature extraction process. In particular, a supervised learning approach is employed, where bags of motifs representing streams that correspond to the same target activity are merged, forming a new bag of unique core activity motifs. Then, for each motif in the core activity bag, the frequency of occurrence among the set of already extracted bags is measured. Finally, the frequencies of all motifs over all the available time windows constitute the feature matrix, to be further used for activity classification.
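As a rough illustration of the feature construction only, the following Python sketch counts repeated bigrams in each symbolic window and assembles a motif-frequency matrix over a merged core bag. It is deliberately much simpler than the modified Sequitur algorithm: no grammar rules are induced, motifs are fixed at length two, and overlapping occurrences are counted. The function names are hypothetical.

```python
from collections import Counter


def repeated_bigrams(word):
    """Bigrams (two consecutive symbols) occurring at least twice, with counts.

    Simplification: overlapping occurrences are counted, and no grammar
    rules are built, unlike in Sequitur.
    """
    counts = Counter(word[i:i + 2] for i in range(len(word) - 1))
    return {bigram: c for bigram, c in counts.items() if c >= 2}


def motif_frequency_features(window_bags, core_motifs):
    """One row per window: frequency of each core motif in that window's bag."""
    return [[bag.get(motif, 0) for motif in core_motifs] for bag in window_bags]


# Usage: two symbolic windows, merged into a bag of unique core motifs.
bags = [repeated_bigrams("abcabd"), repeated_bigrams("cdcdab")]
core = sorted(set().union(*bags))
features = motif_frequency_features(bags, core)
```

Each row of `features` then plays the role of one window's entry in the feature matrix.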

III. PROPOSED RQA-BASED FEATURE EXTRACTION FOR HAR
In contrast to a SAX-based approach, our proposed method capitalizes on the efficiency of RQA to extract the underlying dynamics of a recorded data stream by mapping the time series in a higher-dimensional phase space of trajectories. A major advantage of our feature extraction approach, when compared with symbolic representations for motif discovery, is its fully self-tuned nature, in the sense that no manual parameter fine-tuning is required beforehand, as is the case with SAX-based techniques.
More specifically, a recurrence plot (RP) is derived first, which depicts those times at which a state of a dynamical system recurs, thus revealing all the times when the phase space trajectory of the dynamical system visits roughly the same area in the phase space. To this end, RPs enable the investigation of an m-dimensional phase space trajectory through a two-dimensional representation of its recurrences. Such recurrence of a state occurring at time i, at a different time j is represented within a two-dimensional square matrix with ones (recurrence) and zeros (non-recurrence), where both axes are time axes.
Given a time series of length $N$, $\{r_i\}_{i=1}^{N}$, a phase space trajectory can be reconstructed via time-delay embedding,

$$x_i = \left(r_i, r_{i+\tau}, \ldots, r_{i+(m-1)\tau}\right), \quad i = 1, \ldots, N_s,$$

where $m$ is the embedding dimension, $\tau$ is the delay, and $N_s = N - (m-1)\tau$ is the number of states. Having constructed a phase space representation, an RP is defined as follows,

$$R_{i,j} = \Theta\left(\varepsilon - \|x_i - x_j\|_p\right), \quad i, j = 1, \ldots, N_s,$$

where $x_i, x_j \in \mathbb{R}^m$ are the states, $\varepsilon$ is a threshold, $\|\cdot\|_p$ denotes a general $\ell_p$ norm, and $\Theta(\cdot)$ is the Heaviside step function, whose discrete form is defined by

$$\Theta(z) = \begin{cases} 1, & z \geq 0, \\ 0, & \text{otherwise.} \end{cases}$$

The resulting matrix $R$ exhibits the main diagonal, $R_{i,i} = 1$, $i = 1, \ldots, N_s$, also known as the line of identity (LOI). Typically, several linear (and/or curvilinear) structures appear in RPs, which give hints about the time evolution of the high-dimensional phase space trajectories. Besides, a major advantage of RPs is that they can also be applied to rather short and even nonstationary data. Fig. 2 shows the RPs for two distinct time windows corresponding to core and non-core activities, respectively. Clearly, in the former case the RP is able to identify state recurrences, which are visualized in the form of shorter or longer diagonal and vertical line segments, whereas in the latter case, the absence of structure in the associated time window is expressed in the form of isolated points in the RP.

The visual interpretation of RPs, which is often difficult and subjective, is enhanced by means of several numerical measures for the quantification of the structure and complexity of RPs [20]. These quantification measures provide a global picture of the underlying dynamical behavior during the entire period covered by the HAR data. The temporal evolution of RQA measures and the subsequent detection of transient dynamics are enabled for each recorded data stream by employing a windowed version of RQA. In doing so, the corresponding quantification measures are computed in small windows, which are then merged to form our feature matrix.
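The construction of an RP from a single window can be sketched in a few lines of Python. This is an illustrative sketch rather than the CRP toolbox routine used in our experiments: it assumes the standard delay-coordinate embedding, uses the maximum norm as the distance, and the function names are hypothetical.

```python
def delay_embed(series, m, tau):
    """Time-delay embedding: state i is (r_i, r_{i+tau}, ..., r_{i+(m-1)tau})."""
    n_states = len(series) - (m - 1) * tau
    return [[series[i + k * tau] for k in range(m)] for i in range(n_states)]


def recurrence_plot(states, eps):
    """Binary recurrence matrix: entry (i, j) is 1 when states i and j are
    within eps of each other in the maximum norm, and 0 otherwise."""
    def max_norm_distance(a, b):
        return max(abs(x - y) for x, y in zip(a, b))

    return [[1 if max_norm_distance(a, b) <= eps else 0 for b in states]
            for a in states]


# Usage: a period-2 series recurs at every second time step.
states = delay_embed([0, 1, 0, 1, 0, 1], m=2, tau=1)
R = recurrence_plot(states, eps=0.1)
```

For this toy series the number of states is 6 - (2 - 1) * 1 = 5, and the first row of `R` alternates between recurrence and non-recurrence.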
Note that the length of the sliding window yields a compromise between resolving small-scale local fluctuations and detecting recurrence structures located farther away from the LOI. The following ten RQA measures are utilized in order to form our feature matrix (see [21] for the definitions): recurrence rate, determinism, average diagonal length, length of longest diagonal line, entropy of diagonal length, laminarity, trapping time, length of longest vertical line, clustering coefficient, and transitivity. Finally, a linear-kernel support vector machine (SVM) is applied on the feature matrix for activity recognition. Fig. 3 shows the overall architecture of our proposed RQA-based HAR system.
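To make the line-based measures concrete, the sketch below computes three of the ten measures (recurrence rate, determinism and laminarity) from a binary recurrence matrix, using their commonly used definitions as fractions of recurrence points. The minimum line length of 2 and the exclusion of the LOI from the diagonal-line statistics are assumptions of this sketch, not settings taken from our experiments, and the function names are hypothetical.

```python
def line_lengths(sequence):
    """Lengths of maximal runs of ones in a 0/1 sequence."""
    lengths, run = [], 0
    for value in sequence:
        if value:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


def rqa_measures(R, l_min=2):
    """Recurrence rate, determinism and laminarity of a square 0/1 matrix R."""
    n = len(R)
    recurrence_rate = sum(map(sum, R)) / (n * n)
    # Diagonal lines parallel to the LOI; the LOI itself (offset 0) is skipped.
    diagonal = []
    for k in range(1, n):
        diagonal += line_lengths([R[i][i + k] for i in range(n - k)])
        diagonal += line_lengths([R[i + k][i] for i in range(n - k)])
    # Vertical lines, taken column by column.
    vertical = []
    for j in range(n):
        vertical += line_lengths([R[i][j] for i in range(n)])

    def fraction_on_lines(lengths):
        # Share of recurrence points lying on lines of length >= l_min.
        total = sum(lengths)
        return sum(l for l in lengths if l >= l_min) / total if total else 0.0

    return {
        "recurrence_rate": recurrence_rate,
        "determinism": fraction_on_lines(diagonal),
        "laminarity": fraction_on_lines(vertical),
    }
```

Applied window by window, the returned values would form one row of the feature matrix; the remaining measures follow the same pattern on the diagonal and vertical line-length histograms.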
Estimation of embedding parameters: In our implementation, the optimal time delay $\tau$ is estimated as the first minimum of the average mutual information (AMI) function [22]. Concerning the embedding dimension $m$, a minimal sufficient value is estimated using the method of false nearest neighbours (FNN) [23]. Furthermore, the maximum norm is used as our selected distance metric for the construction of the RP, which is defined as $\|x\|_{\max} = \max_{k=1,\ldots,m} |x_k|$ for a vector $x \in \mathbb{R}^m$ with components $x_k$, while a rule of thumb is currently used to set the threshold $\varepsilon = 0.2\sqrt{m}$.
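The delay selection and the threshold rule can be sketched as follows. The AMI is estimated here with a plain equal-width histogram, which is only one of several possible estimators; the number of bins, the maximum lag, the fallback to the global minimum and all function names are assumptions of this sketch. The FNN estimation of the embedding dimension is not shown.

```python
import math
from collections import Counter


def average_mutual_information(series, lag, n_bins=8):
    """Histogram estimate of the mutual information between r_t and r_{t+lag}."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant series
    bins = [min(int((v - lo) / width), n_bins - 1) for v in series]
    pairs = list(zip(bins[:-lag], bins[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    p_x = Counter(a for a, _ in pairs)
    p_y = Counter(b for _, b in pairs)
    # sum over occupied cells of p(a,b) * log( p(a,b) / (p(a) p(b)) )
    return sum((c / n) * math.log(c * n / (p_x[a] * p_y[b]))
               for (a, b), c in joint.items())


def first_minimum_delay(series, max_lag=20, n_bins=8):
    """Smallest lag at which the AMI curve has a local minimum."""
    ami = [average_mutual_information(series, lag, n_bins)
           for lag in range(1, max_lag + 1)]
    for k in range(1, len(ami) - 1):
        if ami[k] < ami[k - 1] and ami[k] <= ami[k + 1]:
            return k + 1  # lags are 1-based
    return ami.index(min(ami)) + 1  # assumption: fall back to the global minimum


def rp_threshold(m):
    """Rule-of-thumb recurrence threshold, 0.2 * sqrt(m)."""
    return 0.2 * math.sqrt(m)
```

Since both quantities are derived from the data window and the embedding dimension alone, no manual tuning step is involved.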

IV. PERFORMANCE EVALUATION
In order to evaluate the performance of our proposed RQA-based HAR architecture, and compare against the motif-based counterpart, we employ a publicly available dataset [7], which includes leisure activities data for a group of six volunteers. Each volunteer performs a single activity once a day over a period of five days. A remarkable aspect of this dataset is that the sampling rate varies, being higher during the target activity. We particularly focus on three leisure activities, namely, "cycling", "playing the guitar" and "dancing flamenco". These activities are selected such that both periodic and intense (in "cycling"), as well as sharp and delicate movements (in "dancing flamenco" and "playing the guitar"), are examined.
The recorded data correspond to the acceleration measured along the x-axis. The length of the non-overlapping sliding windows is set equal to 1 minute. Instead of performing calculations on the entire dataset, we select a total of 100 consecutive windows, including time intervals before, during, and after the target activity. The sensor data and metadata of each activity during the five days of execution are concatenated and then divided randomly into training and testing subsets containing 75% and 25% of the data, respectively. For the comparison with the state-of-the-art motif-based SAX-Sequitur scheme, the parameters of the associated algorithms are tuned using a nested cross-validation process. Specifically, the word length is set equal to 4, the alphabet size equal to 5, and the sliding window length to 80 samples.

Fig. 4. Confusion matrices for "cycling" (1st row), "dancing flamenco" (2nd row) and "playing the guitar" (3rd row) activities, using motif discovery (left column) and RQA (right column). Green boxes represent true positive (TPR) and true negative (TNR) rates for class 1 (activity) and 2 (non-activity). Red boxes correspond to false positive (FPR) and false negative (FNR) rates. Gray boxes represent the precision and recall percentage (in green) and the error rate (in red). Blue boxes contain the classification accuracy (in green) and expected error rate (in red).
The two HAR architectures are implemented in MATLAB, on a desktop computer equipped with an Intel Core i5-4590 CPU clocked at 3.30 GHz and 8 GB of RAM, while the CRP toolbox (http://tocsy.pik-potsdam.de/CRPtoolbox/) has been employed for implementing the RQA. Lastly, a linear-kernel SVM is applied on the generated feature matrix in order to perform activity recognition, which is addressed as a classification problem. The choice of this classifier is motivated by its fast execution, as well as by its high accuracy, especially in the case of a large number of available features. We emphasize, though, that the classification step is decoupled from the feature extraction step, thus the overall performance of a HAR architecture can be further improved by employing a better classifier.
The performance of our RQA-based HAR architecture is compared against the SAX-Sequitur approach, in terms of classification accuracy, F-score and memory complexity. As illustrated in Fig. 4, our method significantly outperforms the motif-based scheme for every activity. In particular, the results corresponding to the "cycling" and "playing the guitar" activities reveal that, for periodic yet complex activities, our proposed RQA-based feature matrix is better capable of detecting accurately the temporal variations of the underlying dynamical system that generates the corresponding data, yielding accuracies of about 92% and 93.6%, respectively. Moreover, Table I demonstrates that, although the motif-based method achieves a 72.8% accuracy in correctly classifying a window as a core or non-core activity, it yields a low F-score for class 1 (target activity). On the contrary, our RQA-based method achieves a high accuracy, while recognizing precisely the target activity, as demonstrated by the high F-scores for both classes and the zero false negative rate (FNR). This also verifies the capability of the employed classifier to discriminate accurately between the core activity and any other activity.
Considering the "dancing flamenco" activity, the motif-based scheme performs rather poorly, achieving a total accuracy of 45.6% and an F-score equal to 46.88% for the core activity class. This result reveals a degraded performance of the motif discovery scheme, which is not capable of detecting accurately such a complex activity with varying sampling rate. On the contrary, our RQA-based method yields significantly higher accuracy and F-score values, of 75.2% and 79.19%, respectively. This also verifies the efficiency of our RQA-based feature extraction scheme in generating features with an increased discriminating capability between the core and non-core activity, yielding an FNR of about 4%. Finally, Fig. 5 indicates that for the motif-based scheme, the motifs referring to the target activity appear both in target and non-target activity windows. Specifically, for "dancing flamenco" and "playing the guitar", the low performance is primarily due to the fact that these activities include various irregular and complex motions, which are quite difficult to capture via the extracted motifs. To this end, Table II indicates that the number of features extracted by the motif-based method not only differs significantly between the activities, but is also much larger compared with our RQA-based approach.
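For reference, the evaluation metrics used in this section follow from the four confusion-matrix counts of the target class, as in the short Python sketch below. The function name is hypothetical, and the counts in the usage line are invented for illustration; they are not results from our experiments.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, F-score, accuracy and false negative rate for the
    positive (target activity) class, from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    fnr = fn / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall, "f_score": f_score,
            "accuracy": accuracy, "fnr": fnr}


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=8, fp=2, fn=0, tn=10)
```

A zero FNR, as in this toy example, means that no window of the target activity is missed, even if some non-target windows are still misclassified.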

V. CONCLUSIONS AND FUTURE WORK
In this work, we designed and implemented a HAR architecture based on a representation of wearable sensor data in higher-dimensional phase spaces using the RQA method for capturing the underlying dynamics of the data. The experimental evaluation on real leisure data revealed the superiority of our RQA-based framework in extracting and exploiting the underlying temporal dynamics of the data generating processes, resulting in significantly higher classification accuracy for complex activities, when compared against a state-of-the-art online motif discovery scheme based on symbolic representations. An extension of this work will consider the use of multidimensional (multichannel) HAR data based on joint RPs and a multidimensional extension of RQA. We expect that the incorporation of potential correlations and joint dynamics between the multiple data streams will further increase the overall classification accuracy.