Model-based and Class-based Fusion of Multisensor Data

In recent years, advances in technology, a steadily aging population and developments in medicine have led to the creation of numerous ambient assisted living systems. Most of these systems consist of a variety of sensors that provide information about the health condition of patients and their activities, and raise alerts in case of harmful events. Successfully combining and utilizing all the multimodal information is an important research topic. The current paper compares model-based and class-based fusion for recognizing activities by combining data from multiple sensors or from sensors at different body placements.


Introduction
The population of elderly people is rising steadily and is expected to continue increasing significantly. Although this creates many problems in health care, including higher costs and a larger number of people who have difficulty caring for themselves, advances in technology can provide solutions [1]. Systems that employ artificial intelligence applications can recognize the daily activities of a subject, provide information about heart rate, blood pressure or temperature, and detect the occurrence of a harmful event, such as a fall.
Assisted living systems utilize a variety of sensors, devices and technologies. Inertial sensors are usually included in such systems; however, they can also be used separately to provide information about the activities performed by a person. Nowadays, such sensors can easily be found in mobile and wearable devices. The most commonly used sensors are accelerometers, gyroscopes and magnetometers. Accelerometers, which are the most popular, are quite effective in recognizing activities with repetitive body movement [2], while gyroscopes are capable of recognizing the orientation of an object [3]. Accelerometers and gyroscopes can be used separately and produce adequate results, while magnetometers are known to perform poorly when used individually [4]. Inertial sensors in general are very prone to environmental noise, so combining them tends to increase recognition accuracy. Sensors are combined algorithmically either by combining the features extracted from each sensor, referred to as early fusion, or by combining the classification results of each individual sensor, known as late fusion. Late fusion can also be applied to an individual sensor, combining the results of different classification algorithms to improve its performance.
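As a minimal illustration of the two strategies, the sketch below uses made-up feature vectors and class probabilities; all values and function names are hypothetical and are not taken from the paper. Early fusion concatenates the per-sensor features before a single classifier is applied, whereas late fusion combines the class probability vectors produced by per-sensor classifiers, here by plain averaging.

```python
# Illustrative sketch only: values and helper names are hypothetical.

def early_fusion(feature_vectors):
    """Concatenate per-sensor feature vectors into one vector for a single classifier."""
    fused = []
    for features in feature_vectors:
        fused.extend(features)
    return fused


def late_fusion_average(prob_vectors):
    """Average per-sensor class probability vectors element-wise."""
    n_models = len(prob_vectors)
    n_classes = len(prob_vectors[0])
    return [sum(p[c] for p in prob_vectors) / n_models for c in range(n_classes)]


# Features of one window from two sensors (made-up numbers).
acc_features = [0.12, 0.98, 0.03]
gyro_features = [0.40, 0.07]
fused_features = early_fusion([acc_features, gyro_features])

# Class probabilities for the same window from two per-sensor classifiers.
acc_probs = [0.7, 0.2, 0.1]
gyro_probs = [0.3, 0.6, 0.1]
fused_probs = late_fusion_average([acc_probs, gyro_probs])

# The predicted class is the one with the highest fused probability.
predicted_class = max(range(len(fused_probs)), key=fused_probs.__getitem__)
```

In the early case a single model sees a five-dimensional feature vector; in the late case each sensor keeps its own model and only the three-class probability vectors are merged.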
In human activity recognition problems, fusion can also be applied to combine the results of the same sensor placed on different body parts, since it has been claimed that the placement of an accelerometer affects its performance [5]. Although it is not considered a mistake to ignore the location of the sensor when analyzing its recordings, there are studies that investigate the effect of body placement on the recognition of activities. In [5], the authors propose a late fusion methodology that combines accelerometers placed on the ankle, wrist and chest. The authors of [4] investigate how five different body locations of a smartphone affect the performance of its built-in sensors in recognizing particular activities. The performance of individual sensors and of their combination by concatenating feature vectors is also studied. The studies in [6] and [7] combine accelerometers at different body placements by applying early fusion.
In the current study, late fusion is employed to combine three inertial sensors, namely accelerometer, gyroscope and magnetometer, by body placement. Furthermore, the fusion of different placements of the same sensor is investigated. For the late fusion, two weighting schemes are incorporated and compared with a baseline late fusion method. The first one is a model-based method, the weighted accuracy [1], which reflects the total performance of a classifier. The second scheme is a class-based method, recently proposed by the current authors in [16], that utilizes the detection rate of a class, so as to emphasize the ability of a classifier to detect certain classes rather than its overall performance. In [16], we suggested using fusion weights equal to the complement of the class detection rate. We incorporated the suggested weights in a typical late fusion function and in a framework with posterior adapted class probabilities. The suggested method was compared with known late fusion methods, such as averaging and stacking, and with a novel framework suggested in [5]. Finally, in the current paper we compare the model-based and class-based fusion schemes with a simple form of late fusion, the averaging of class probability vectors produced by different models.
The rest of the paper is organized as follows: Section 2 includes a brief mention of related work. Section 3 explains the theory used and Section 4 presents the results of the application. Section 5 concludes the paper.

Related work
Although fusion has only recently started gaining popularity, a wide variety of research work is already available. As already stated, fusion can be applied to combine different sensors, even quite heterogeneous ones, to combine sensors placed at different locations, or to improve the performance of an individual sensor by combining the results of several algorithms. A methodology for recognizing activities from wearable sensors is proposed in [8]. The methodology is based on the late fusion of classification results obtained from Neural Networks (NN) and Hidden Markov Models (HMM) using two sensors placed on different body locations of the participants. In [9] a wireless sensor network is developed to observe the status of persons in need of assistance. Accelerometer data are combined with images from three different cameras to detect falls and improve the system's ability to create true alerts. In [10] data from a wearable inertial sensor and two cameras are combined in order to detect events. For the fusion step, the authors propose a probabilistic scheme. Many studies use audio-visual sensors to recognize activities, although this process is usually more time-consuming and the equipment requires a larger budget. As already mentioned, input from wearable sensors is often combined with video images, especially in platforms that provide support at home for people with disabilities. In [11] the authors implement three types of fusion, early, intermediate and late, to analyze input from wearable cameras and recognize activities. An HMM algorithm is incorporated in the process to classify the data. Different forms of fusion allow multiple modalities to be combined at different levels. In [13] accelerometer and video data are combined at different stages of the classification process. The authors combine the different inputs before the feature extraction step, after feature extraction by concatenating the different vectors, and at the results level by combining the algorithm outputs.
In [4] the authors explored the individual performance of each of the three smartphone sensors (accelerometer, gyroscope and magnetometer), the combination of two of them, and the effect of the sensor location on the recognition process. For the combination of sensors or placements, the authors use the simplest form of early fusion, concatenation. The current paper, in contrast, uses late fusion and combines three sensors rather than two. In [12], the authors use built-in sensors of smart devices, separately and combined, and apply several well-known classifiers to recognize activities. With the assistance of GPS and light sensors, the activities can be further categorized as indoor or outdoor. Accelerometer, gyroscope and magnetometer data are combined with light and pressure data and, after being interpolated and filtered, features are extracted and combined with GPS information as input to a classification algorithm. The authors do not apply any fusion techniques; they focus instead on preprocessing the data so as to eliminate noise and heterogeneity. Accelerometers and gyroscopes are the sensors most often combined in fusion schemes, probably due to their satisfactory performance in recognizing daily activities. Early and late fusion are applied in [17] to combine accelerometer and gyroscope features. The authors use concatenation for early fusion, a weighted scheme for late fusion and a descriptor-based framework for the activity recognition. The authors of [18] fuse accelerometers and gyroscopes to recognize activities and detect falls, and focus on the importance of the window size for signal segmentation.
Weights are quite often utilized in fusion schemes, especially in late fusion ones. In [14], out-of-bag errors acquired from the random forest algorithm are used to combine classifier results of different modalities. The classification problem there is not related to activity recognition, but the methodology could easily be applied to multisensor data. In [1] the authors make use of a multisensor platform and apply several fusion weights combined with different fusion functions in order to recognize activities. The authors also propose a genetic algorithm to calculate weights. The weighted accuracy included in [1] is also utilized in the current paper.

Methodology
The current human activity recognition framework comprises the following steps:
1. Sliding window segmentation
2. Feature extraction
3. Classification algorithm
4. Model-based or class-based fusion
Sliding windows of 2 seconds without overlap were taken in order to extract features, similar to [5]. Time domain features, mentioned in Table 1, were extracted from the sliding windows without any further filtering or preprocessing of the data. The initial dataset was then split into the required subsets, corresponding to the sensor and body placement we wanted to analyze. Several multiclass classification algorithms were tested and the four that achieved the best results are included here: Support Vector Machines (SVM), Random Forests (RF), C5 trees and k-Nearest Neighbors (kNN). Each classifier was applied separately to each sensor and the classification results of the algorithms were combined afterwards. For the fusion step, two types of fusion were tested: model-based fusion, which characterizes the overall performance of the classifier, and class-based fusion, which pays attention to the recognition of specific classes. The results were compared with those of the simple late fusion method of averaging class probabilities. To derive both types of weights for the fusion step, the typical steps of a classification framework were applied. An algorithm was trained on the training set and then applied to the test set in order to predict the types of activities. The fusion schemes applied combine sensors according to the following two scenarios:
1. Different sensors with the same placement
2. Identical sensors of different placement

Model-based fusion
For the model-based approach the weighted accuracy was used. The accuracy (Eq. 2) of a classifier applied to a sensor, divided by the sum of accuracies, was multiplied by the class probability vectors (Eq. 1), and the products of all three sensors were finally added together to create a final class probability vector [1]. The class with the maximum probability was assigned to each test case. This method gives an advantage to the model that has the best overall performance. The formula for the weighted accuracy, as described in [1], is given in Eq. 3 and is calculated for each model i, where the sum runs over the models being fused:
WACC_i = ACC_i / \sum_{k} ACC_k    (3)
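The weighted-accuracy scheme described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the three-class setup, the accuracies and the probability values are all hypothetical.

```python
# Sketch of weighted-accuracy (model-based) late fusion.
# Accuracies and probabilities are made-up; names are illustrative.

def weighted_accuracy_weights(accuracies):
    """Divide each model's accuracy by the sum of accuracies (weights sum to 1)."""
    total = sum(accuracies)
    return [acc / total for acc in accuracies]


def model_based_fusion(prob_vectors, accuracies):
    """Scale each model's class probabilities by its weight and add the products."""
    weights = weighted_accuracy_weights(accuracies)
    n_classes = len(prob_vectors[0])
    fused = [0.0] * n_classes
    for weight, probs in zip(weights, prob_vectors):
        for c in range(n_classes):
            fused[c] += weight * probs[c]
    return fused


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# One test window, three per-sensor models, three classes.
probs = [
    [0.6, 0.3, 0.1],  # accelerometer model
    [0.2, 0.7, 0.1],  # gyroscope model
    [0.5, 0.4, 0.1],  # magnetometer model
]
accuracies = [0.90, 0.50, 0.60]  # hypothetical accuracies per model

weights = weighted_accuracy_weights(accuracies)
fused = model_based_fusion(probs, accuracies)
predicted = argmax(fused)
```

With these numbers the most accurate model dominates: the fused vector favours class index 0, whereas an unweighted average of the same three vectors would favour class index 1.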

Class-based fusion
For the class-based fusion, a novel method proposed by the authors of the current paper in [16] is applied. Class-based methods pay attention to the recognition of each class, which is usually characterized by the F1-score [5] or the balanced accuracy. This method multiplies the class probabilities by the complement of the detection rate, which was chosen as the weight in order to assist the recognition of classes that are not so easily predicted. This is performed for each sensor separately and the weighted probability vectors of all sensors are afterwards summed together. Again, to assign a class to a test case, we find the class with the maximum fused probability. The detection rate is defined in Eq. 4.

DR = TP / (TP + …)    (4)
The weights are calculated for each class j as the complement of the class detection rate (Eq. 5).
Both types of weights are then multiplied by the class probabilities of an algorithm (Eq. 6). In the model-based fusion, all class probability vectors of a model are multiplied by the same number, the WACC, while in the class-based fusion each class probability vector is multiplied by a different weight.
To combine the results, a probability vector is calculated for each class, by averaging the respective class probability vectors of the m models combined (Eq. 7).
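The class-based weighting and the final combination step can be sketched as follows. Because the denominator of the detection rate is not recoverable from the text here, the sketch assumes the detection rate of class j to be the number of true positives of class j divided by the total number of evaluated cases; this is an assumption, as are all function names, labels and probability values.

```python
# Sketch of class-based late fusion with weights equal to 1 - detection rate.
# ASSUMPTION: detection rate of class j = TP_j / N (N = number of evaluated cases).

def detection_rates(y_true, y_pred, n_classes):
    """Per-class detection rate under the assumption stated above."""
    n_cases = len(y_true)
    true_positives = [0] * n_classes
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            true_positives[truth] += 1
    return [tp / n_cases for tp in true_positives]


def class_based_weights(y_true, y_pred, n_classes):
    """One weight per class: the complement of that class's detection rate."""
    return [1.0 - dr for dr in detection_rates(y_true, y_pred, n_classes)]


def class_based_fusion(prob_vectors, weight_vectors):
    """Weight each model's class probabilities per class, then average over models."""
    n_models = len(prob_vectors)
    n_classes = len(prob_vectors[0])
    return [
        sum(w[c] * p[c] for p, w in zip(prob_vectors, weight_vectors)) / n_models
        for c in range(n_classes)
    ]


# Made-up labels used to derive the weights of two models (three classes).
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred_a = [0, 0, 1, 2, 2, 2, 2, 2]
y_pred_b = [0, 1, 1, 1, 2, 2, 0, 0]
weights_a = class_based_weights(y_true, y_pred_a, 3)
weights_b = class_based_weights(y_true, y_pred_b, 3)

# Class probabilities of both models for one new window.
probs_a = [0.2, 0.3, 0.5]
probs_b = [0.3, 0.4, 0.3]
fused = class_based_fusion([probs_a, probs_b], [weights_a, weights_b])
predicted = max(range(len(fused)), key=fused.__getitem__)
```

In this toy case the weighting moves the decision from class index 2, which a plain average would select, to class index 1, the class that model A detects least often.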
To evaluate the performance of the individual classifiers and the fusion results, we report two typical evaluation metrics: the accuracy of the model (Eq. 2) and the F1-score (Eq. 8). For multiclass classification, the F1-score is calculated for each class and then averaged. The F1-score combines precision and recall (sensitivity) and, especially in multiclass problems, is considered more indicative than the accuracy metric, since it assesses the recognition of each class. In multiclass problems, high accuracy values may arise even though some classes have not been recognized at all, a finding that is also confirmed in the current paper, as can be seen in the following section.
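The gap between the two metrics is easy to reproduce on a toy example. The sketch below computes the accuracy and the class-averaged F1-score from scratch; the labels are made-up and the helper names are illustrative. One of the three classes is never predicted, yet the accuracy stays high.

```python
# Accuracy versus class-averaged (macro) F1-score on made-up labels.

def accuracy(y_true, y_pred):
    """Share of cases whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true, y_pred, n_classes):
    """F1-score computed per class and then averaged over the classes."""
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class with no true positives, false positives or false negatives scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


# 20 cases: classes 0 and 1 are always recognized, class 2 never is.
y_true = [0] * 9 + [1] * 9 + [2] * 2
y_pred = [0] * 9 + [1] * 9 + [0, 1]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, 3)
```

Here the accuracy is 18/20, while the averaged F1-score is pulled down to 12/19 by the class that is never detected.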

Application
In order to compare the model-based fusion method of weighted accuracy with the class-based fusion method proposed by the authors in [16] and with the simple late fusion method of averaging, and to assess the influence of location on the sensors' performance, we utilize the MHEALTH dataset [15]. MHEALTH contains recordings of numerous sensors, some of which were placed on different parts of the subject's body. The activities were performed by ten participants; the recordings of nine of them constituted the training set and the recordings of the remaining participant constituted the test set. The subjects performed 13 daily activities, 8 of which are kept as a subset for the current application. The activities were "Standing still", "Sitting and relaxing", "Lying down", "Walking", "Climbing stairs", "Cycling", "Jogging" and "Running". The time domain features mentioned in Table 1 are extracted from a sliding window of 2 seconds without overlap, which in the current dataset corresponds to 100 samples, since the sampling frequency is 50 Hz. Since it has been reported that the body placement of a sensor affects its performance [5], we chose to combine a) the results of accelerometers, gyroscopes and magnetometers at the same body location and b) accelerometers placed at three different locations, as well as gyroscopes placed at two different locations. More specifically, the results reported here for a) refer to the fusion of:
1. The three aforementioned sensors placed on the left ankle
2. The three sensors placed on the right lower arm
and the results reported for b) refer to the fusion of:
1. The accelerometer placed on the chest with those placed on the left ankle and the right lower arm
2. The gyroscope placed on the left ankle with the gyroscope on the right lower arm
The flowchart in Fig. 1 provides a graphic demonstration of the procedure. The flowchart describes the fusion of two sensors as an example, while three sensors are fused in the current application. The same procedure is repeated for the sensors on the right lower arm and for the fusion of identical sensors at different locations. Table 2 presents the evaluation metrics of the four classifiers applied separately to each sensor on the left ankle, and of the three fusion methods: the weighted accuracy, which characterizes the performance of the algorithm, the detection-rate-based weighted fusion, which evaluates the ability of an algorithm to detect each class, and averaging. Table 3 contains the respective results for the sensors placed on the right lower arm. For most of the classifiers, the fusion schemes exceed the best performing classifier on individual sensor data. In half of the cases the class-based fusion outperforms the model-based one. SVM has the worst performance among the four classifiers in all cases. In general, although the accuracy and F1-score may indicate good performance, some classes were not recognized at all by certain sensors and classifiers. Three classifiers, namely SVM, RF and C5, applied to the accelerometer placed on the left ankle, failed to recognize one activity, "sitting and relaxing", while SVM did not predict two further activities at all: "climbing stairs" and "cycling". SVM applied to the gyroscope of the left ankle again does not recognize "cycling", while kNN on gyroscope data does not predict "standing still". Three activities were not detected by the worst performing classifier, SVM, on the magnetometer of the left ankle, namely "lying down", "running" and "jogging". Each sensor evidently provides better recognition of some activities, so the fusion of all of them is expected to utilize all the information in order to detect all performed activities. SVM, however, failed to detect "cycling" with both fusion methods, which may be an indication that the sensors need to be placed elsewhere to better recognize that activity.
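The segmentation used in this application (non-overlapping 2-second windows, i.e. 100 samples at 50 Hz) can be sketched as below. The signal is made-up, the helper names are hypothetical, and the four features shown are only common time-domain examples chosen for illustration; the actual feature set is the one listed in Table 1.

```python
import statistics

SAMPLING_HZ = 50
WINDOW_SECONDS = 2
WINDOW_SIZE = SAMPLING_HZ * WINDOW_SECONDS  # 100 samples per window


def sliding_windows(signal, size=WINDOW_SIZE):
    """Split a 1-D signal into consecutive non-overlapping windows.

    An incomplete window at the end of the signal is dropped.
    """
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def time_domain_features(window):
    """Example time-domain features (ASSUMED set, for illustration only)."""
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),
        "min": min(window),
        "max": max(window),
    }


# Made-up single-axis signal: 250 samples, i.e. 5 seconds at 50 Hz.
signal = [float(i % 10) for i in range(250)]
windows = sliding_windows(signal)
features = [time_domain_features(w) for w in windows]
```

The 250-sample signal yields two full windows; the trailing 50 samples do not fill a window and are discarded in this sketch.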
The accelerometer of the right lower arm failed to recognize different activities than the accelerometer placed on the left ankle. More specifically, SVM, RF and C5 do not predict "walking", while SVM still does not recognize "cycling". "Lying down" is the activity that the gyroscope of the right lower arm cannot recognize with three of the tested classifiers: SVM, RF and C5. SVM also does not detect "walking" and "cycling". SVM on magnetometer data does not detect "lying", "jogging", "cycling" and "running". Two activities are not predicted by the model-based fusion method when using SVM, while RF, which in general performed very well in most cases, does not predict "walking" with the class-based fusion. The results suggest that the right lower arm may be a worse placement for these sensors than the left ankle.
The following tables give the results of the four classifiers applied to each sensor separately for each location, together with the results of fusing the same sensor across all placements. As can be seen in Table 4, for most algorithms the best results are obtained from the accelerometer placed on the right lower arm. Fusion, whether model-based or class-based, did not improve the recognition rate for any classification algorithm except SVM. Table 5 shows the results for the gyroscope. The gyroscope seems to perform better when placed on the left ankle. Here, fusion of gyroscopes at different placements improves the recognition rate compared to that of the individual gyroscopes.
In general, fusion of gyroscopes results in better recognition than fusion of accelerometers, an indication that these activities are better detected by a gyroscope. However, in some cases there were still activities not detected at all, such as "lying", which was not recognized from gyroscope data by three algorithms (kNN, RF and SVM) when using class-based fusion.

Conclusion
The fusion of the three sensors improved the recognition rate, whether the sensors were placed on the left ankle or the right lower arm. For half of the classification algorithms, class-based fusion outperformed the others. In almost all cases, model-based fusion and class-based fusion outperform the baseline method of averaging. Furthermore, for the current application, the left ankle placement achieves higher recognition rates than the right lower arm. Fusion of the same sensor placed at different body locations did not prove as promising for the prediction of the specific activities. The fusion of accelerometers at three placements did not exceed the individual sensor's performance in most cases, while for gyroscopes, model-based fusion with the weighted accuracy improved the recognition rate for half of the algorithms applied. Fusion of magnetometers at different placements was not attempted due to the poor performance of the sensor.
Overall, for the particular implementation, combining different sensors at the same location proved better than combining the same sensor placed at different locations. The class-based fusion scheme suggested in [16] performed as well as the model-based fusion using the weighted accuracy. Both fusion schemes outperform fusion by averaging of class probabilities. The fact that in some tests there were classes that were not predicted may have affected the performance of the class-based fusion.
For future work, the authors will investigate fusion frameworks that combine different sensors and different placements and that eliminate the heterogeneity caused by both factors.