Optimal Reconstruction of Human Motion From Scarce Multimodal Data

Wearable sensing has emerged as a promising solution for enabling unobtrusive and ergonomic measurements of the human motion. However, the reconstruction performance of these devices strongly depends on the quality and the number of sensors, which are typically limited by wearability and economic constraints. A promising approach to minimize the number of sensors is to exploit dimensionality reduction approaches that fuse <italic>prior</italic> information with insufficient sensing signals, through minimum variance estimation. These methods were successfully used for static hand pose reconstruction, but their translation to motion reconstruction has not been attempted yet. In this work, we propose the usage of functional principal component analysis to decompose multimodal, time-varying motion profiles in terms of linear combinations of basis functions. Functional decomposition enables the estimation of the <italic>a priori</italic> covariance matrix, and hence the fusion of scarce and noisy measured data with <italic>a priori</italic> information. We also consider the problem of identifying which elemental variables to measure as the most informative for a given class of tasks. We applied our method to two different datasets of upper limb motion D1 (joint trajectories) and D2 (joint trajectories + EMG data) considering an optimal set of measures (four joints for D1 out of seven, three joints, and eight EMGs for D2 out of seven and twelve, respectively). We found that our approach enables the reconstruction of upper limb motion with a median error of <inline-formula><tex-math notation="LaTeX">$0.013 \pm 0.006$</tex-math></inline-formula> rad for D1 (relative median error 0.9%), and <inline-formula><tex-math notation="LaTeX">$0.038 \pm 0.023$</tex-math></inline-formula> rad and <inline-formula><tex-math notation="LaTeX">$0.003 \pm 0.002$</tex-math></inline-formula> mV for D2 (relative median error 2.9% and 5.1%, respectively).


I. INTRODUCTION
A CORRECT sensing of human bio-mechanics, and more specifically human body motion, is of paramount importance in many trans-disciplinary fields, which include rehabilitation and assistive applications [1], and human-robot interaction [2]. With the aim of providing unobtrusive and comfortable sensing solutions that can be suitably employed in everyday-life as well as in unstructured environments (like human-robot workspace), the research and technological effort have been focused on the development of wearable systems for evaluating human ergonomics [3]. These systems usually gather kinematic information (i.e., joint angle information) [4] and/or record human muscular activity, e.g., through surface electromyography (sEMG) electrodes [5]. This multimodal information can be also used to devise informed models of the human neuro-musculo-skeletal system [6], e.g., for a correct biomechanical risk assessment.
However, correctly retrieving human body motion is a challenging task because of its complex architecture. Indeed, while it is ideally feasible under a technological point of view, there are several implementation issues that are difficult to overcome, such as the difficulty of tracking, with common sensors, the movement of certain degrees of freedom (DoFs) such as the rotation and the elevation of the scapula, and the uncertainties due to relative motion between bones and sensors, see e.g., [7], [8].
A possible solution to deal with such abundancy-and hence to reduce the dimensionality of the problem-can come from neuroscience. Indeed, there is a vast neuroscientific literature that has highlighted the dependencies between human body DoFs [9]. These dependencies have been studied and formalized within the general concept of motor synergies [10], which, in a broad sense, provides a theoretical framework to analyze motion coordination in terms of goal-oriented functional couplings of elemental variables (e.g., joints, muscles) [10], [11]. This framework has found fertile ground in the study of human hands, in grasping, manipulation, and haptic exploration tasks (see, among the others, [12]), producing interesting transdisciplinary outcomes in robotics [13]. Indeed, the dimensionality reduction that is intrinsic to the concept of synergies has been successfully exploited for the design [14], the control [15], and the planning [16] of artificial robotic systems [13].
Under a pure observability point of view, i.e., for the simplification of the problem of retrieving information on human 2168-2291 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
body posture and motion, the synergy-inspired dimensionality reduction has also produced interesting results [17]. In [18], the authors presented CODE, a COordination-based action DEscriptor, which considers minimum and maximum joint velocities and the correlations between the most informative joints (i.e., the joints that are mostly involved in the execution of a certain action), for whole-body action classification. In [19], a compact 6-D view-invariant skeletal feature (skeletal quad) based on a local skeleton descriptor for encoding the relative position of joint quadruples was proposed for action recognition from single depth images.
In the field of computer vision and computer graphics, datadriven approaches have been also proposed, e.g., to address the problem of motion infilling for 3-D human motion data. In [20], the authors presented a convolutional autoencoder that, given a start and end sequence, can complete the missing gaps in between. In [21], a recurrent neural network based approach was presented, for the prediction of future motion when a seed sequence is given. Hierarchical non linear principal component analysis NLPCA neural networks were successfully employed to reduce the dimension of the configuration space, for the representation of human/humanoid motion patterns [22].
Low-dimensional feature extraction methods such as principal component analysis (PCA) and locally smooth manifold learning (LSML) were also proposed to extract linear and non linear manifolds of human motion data, e.g., for hand avatar animation [23] or for humanoid robot interface design [24].
Although promising, all the aforementioned methods lack of a theoretical understanding of the statistical properties of human muscular-skeletal data, which prevents their use to solve the twofold problem of the estimation of the temporal evolution of multimodal data, in spite of scarce and noisy sensory information, and the definition of optimal design strategies for engineering wearable sensing devices. Furthermore, data-driven and learning approaches are usually strongly dependent on the size and quality of the training data, and have important requirements in terms of computational power, which are hardly achieved with on-board electronics used for data processing in wearable applications (whose memory resources are usually in the order of some megabites or lower).
In [25], we proposed a minimum variance estimation (MVE) technique to fuse the a priori information, organized in terms of interjoint covariation patterns of the most frequent human grasping postures [12], with a reduced number of noisy measures, to reconstruct the most probable joint angles in a Bayesian sense. In [26], we pushed further this investigation to identify the optimal hand joints that should be measured to maximize the reconstruction accuracy (i.e., minimizing in average the reconstruction error), as the result of an optimization problem based on the minimization of the a posteriori covariance matrix. These outcomes led to the development and the design of wearable sensing gloves for hand pose reconstruction, using a reduced number of sensing elements (affected by noise) [27].
However, the methods presented in [25] and [26] relied on a priori information of static postures, and then they cannot be applied in a straightforward manner for estimating temporal trajectories. Furthermore, they cannot be applied to the Fig. 1. Schematics of the estimation method proposed in this article. Temporal measures of a limited number of joints and muscles are mapped (encoding) on the extended state space of weights and average trajectories/muscles envelopes, via a basis of functional Principal Components (fPCs) extracted in advance on an a priori dataset. Then, the missing portion of the state is estimated via MVE. Finally, the whole muscle-skeletal system temporal evolution is reconstructed by properly combining the fPCs with the estimated extended state (decoding). simultaneous reconstruction of multimodal motion-related quantities (e.g., joint angles and EMG signals), due to the intrinsic difficulties of providing a reliable estimation of the covariance matrix from heterogeneous data [28]. Finally, an application of these techniques to human upper limb data has not been performed yet.
In this article, we propose a method to generalize the estimation based on minimum variance to the whole upper limb temporal evolution of joint and EMG measurements, capitalizing on two pillars: 1) the existence of covariation patterns in human upper limb motions, which we demonstrated in [29]; 2) the application of functional analysis for enabling the reconstruction of the whole trajectory over time and the estimation of the covariance matrix from multimodal data (see Fig. 1). In brief, our solution relies on three phases: encoding phase, estimation phase, and decoding phase. The idea is to use functional principal component analysis (fPCA) to identify a basis of functions, whose combination can approximate any generic joint and EMG signal temporal evolution [30]. The coefficients of this decomposition (encoding phase) or weights, which are extracted from an a priori dataset, can be organized together with the average trajectories to form an extended state space and represented by a static vector. As soon as a novel motion is performed, it can be encoded, through the basis of fPC, in terms of the extended state. We then apply MVE to estimate all the components of the state corresponding to all the joints and EMG recorded signals of the kinematic chain, even when some of them are not directly determined by the measurements or are corrupted by noise (estimation phase). Finally, a decoding procedure is applied to reconstruct the whole motion of the upper limb joints, by properly combining the fPCs with the estimated extended state.
We validated our framework with two different datasets, which are freely available online [31]. The first is a dataset of 30 gestures performed by a cohort of 33 healthy subjects and collected at the University of Pisa (hereinafter referred to as the dataset D1). Data consisted of the 3-D positions of the markers fastened on the upper-limb links over time (see [31] for details). From the Cartesian position of the optical markers, we extracted joint angular values by solving a mapping problem on a 7 DoFs kinematic model (three for the shoulder, two for the elbow and two for the wrist, see [29]). To prove that our work can be generalized to multimodal data, we also included a second dataset, which was recorded at the Hannover Medical School (MHH) (hereinafter referred to as the dataset D2) and also available in [31]. Also, in this case, movements were recorded with a motion capture system, but also complemented with surface electormyographic recordings (sEMG) from 12 muscles. The motion capture recordings and hence the 3-D optical marker positions allow the definition of a 7 DoFs kinematic model, which is however different from the one that characterizes the first dataset, since it enable to retrieve two additional DoFs for the shoulder and no DoFs for the wrist. In both cases, we splitted the dataset (where the same upper limb actions were considered) in two parts, one (containing the 70% of all the entries) was used as a priori information, and the remaining part for testing. We quantified the norm of the error between the real and the estimated signal on the testing data, without considering a number of independent joints/EMG signals and adding different magnitudes of noise to the measured ones. Our results demonstrate the capabilities of accurately reconstructing the trajectories of the whole kinematic chain, with respect to two alternative methods that we used for comparison, one based on the usage of the pseudoinverse (as done in [25]) and a second one which recursively estimates the current joint static posture through an iterative MVE based on a static covariance matrix [25] (the latter cannot be applied in a direct manner to multimodal data).
Motivated by these outcomes, we designed an optimization problem to identify the optimal selection of independent joints/EMG that one should measure to maximize the estimation accuracy, by leveraging on a metric of uncertainty estimation based on the a-posteriori covariance matrix as in [26]. We believe that our findings can open interesting perspectives for the design of an optimized undersensorized wearable system for multimodal upper limb tracking, and potentially for the whole body, which could be successfully exploited for every day life human body monitoring in different application fields, e.g., rehabilitation, assistance, and advanced human-robot interaction and cobotics.

A. Motion Decomposition via fPCA: Encoding Phase
As anticipated before, our reconstruction of human multimodal motion data leverages on the decomposition of signal provided by fPCA, a statistical method to identify functional primitives from time-varying data. In the following, we will briefly introduce the theory behind fPCA, while we refer the reader to [32] for more details.
Following the definitions of Table I, let us consider, without any loss of generality, a l-DoFs kinematic model moved by r muscles, N independent observations of joints temporal evolution q 1 (t), . . . , q N (t) and (eventually) of muscles activation envelopes m 1 Let us also assume that the time-series are defined over a normalized time t ∈ [0, 1] sampled in T evenly spaced points. The latter can be achieved for example through the solution of a dynamic time warping (DTW) problem, i.e., a preprocessing technique that, given a dataset of time series of arbitrary length, produces a time-normalization to generate entries with the same number of time-points. In our implementation, given two time series, v 1 and v 2 , the affinity between the two signals is maximized by the following optimization problem: where the operator ρ is the cross correlation between two vectors. Note that this implementation also imposes time-monotonicity, to preserve the coherence in time of the samples, and linear distortion of time with a constant scaling factor to the time domain. All the elements of the dataset are time-warped w.r.t. a common reference signal identified as the dataset entry of average duration. Of note, the optimal parameters S * , T * for each dataset entry are shared across all the DoFs to preserve the coordination of movements. Then, a generic signal (representing either a joint temporal profile or a muscle activation envelope) can be decomposed as a weighted sum of basis elements ξ s j (t), aka fPCs where q/m α i ∈ R l/r is a vector of weights, q/m ξ i (t) ∈ R l/r is the ith basis element or fPC and s max is the number of basis elements considered. The operator • is the element-wise product (Hadamard product),q ∈ R l andm ∈ R r is the average of q(t) and m(t), respectively. The output of fPCA, which is calculated independently for each joint or muscle j, is a basis of functions {ξ 1 j , . . . , ξ s max j } that maximizes the explained variance of upper limb motions collected in the dataset. Therefore, the first fPC ξ 1 j (t) is the function that solves the following problem: Subsequent fPCs ξ s j (t) are the functions that solve the same problem of (3), with an additional constraint of orthogonality, i.e.
Note that (3) and 4 refer to joint angular values, but the extension to muscles is straightforward and here omitted to improve readability. For practical implementations of fPCA that bypass the solution of the optimization problem the interested reader can refer to [32].
Given the optimal selection of basis elements, the movements of the upper limb are now described by an average configuration plus a reduced number of weighting coefficients, i.e., a generic trajectory q i j (t) is now represented by the averageq i j and by the vector where k is the cardinality of the fPCs basis and (•) is the transpose operation (an analogous nomenclature holds for the characterization of muscles activation envelopes). Note that k can be intended as a free integer parameter, arbitrarily selected in the interval [1, s max ]. It is important to note the significant reduction of dimensionality, given that in practical cases k (the number of fPCs, in our case equal to 5, see Section III-B) is usually at least two order of magnitude smaller than T (the number of time frames, in our case 100 samples for each second of recordings, i.e., 9.7 s for the signal used for the DTW procedure and ≈ 4.7s for the shortest signal in our dataset, see Section III-A). More importantly, this change of domain comes with the additional benefit that a movement can be now described by a single static vector of coefficients or weights. Clearly, higher values of k enable a more precise approximation of the real motion, yet augmenting the dimensionality of the state in the weights domain. For each joint, the relationship in matrix form between the joint and the weights domains is then where and q Θ j ∈ R k,N is composed by columns collecting the weights θ i j associated to the dataset elements. An equivalent matrix form exists for muscles activation envelopes

B. Estimating Nonmeasured Motion-Related Quantities via MVE (Estimation and Decoding Phase)
As briefly introduced in the previous sections, in [25], we proposed a method to estimate the posture of human hands leveraging on a set of observations previously collected and used as a priori information. This information was organized and described in terms of joint mean μ 0 and covariance matrix P 0 . When some of the joints are not directly measured or the measurements are corrupted by noise, we can fuse such insufficient sensory information with the prior data to estimate the whole posture through MVE. In the following, we will provide only a basic description of how to use MVE to improve the posture estimation, while we invite the reader to refer to [25] for more theoretical details. Let us consider a set of measures y ∈ R d provided by a selection of d sensors, and let us also assume a linear relationship between the joint variables q ∈ R l and the measures y where H ∈ R d,l is a full row rank measurement matrix (hereinafter we consider, without loss of generality, H as a selection matrix or discrete measurement matrix [26]) and ν is the measurement noise. The goal is to estimate q given y when d < l.
Because the number of unknown is higher than the number of measures, the problem admits an infinite number of solutionŝ where H † stands for the pseudoinverse of H, N H is the null space of H and ψ ∈ R l−d a free vector of parameters. The leastsquared solution isq Yet, the output of (10), although it represents the vector of minimum Euclidean norm, can be very far from the real one. However, from neuroscientific studies, it is known that human movements are characterized by stable and coherent covariation patterns between joints [12]. This is indeed a source of information that can be used to improve the estimation in (9). Given a (large enough- [33]) number of realizations of q i collected in a matrix of a priori X ∈ R l,N , we can compute the covariance matrix whereq is a matrix whose columns contain the average μ 0 of X. Given P 0 , the best estimateq of q is the vector that solves the following optimization problem: Under the hypothesis of ν with zero mean and Gaussian distribution with covariance matrix R, the solution of (12) can be found in closed form and iŝ and the a posteriori covariance matrix (which brings information about the uncertainty of the corresponding state estimation) is This work generalizes the MVE-based posture estimation method to heterogenous data and motion profiles by applying (13) to the extended state of the weights q Θ for joint angular values, eventually complemented with m Θ for the estimation of muscles activation. Indeed, considering the kinematic domain only, we can leverage on the relationship introduced in (5) to build an extended state vector, where a general motion trajectory for the whole upper limb can be represented as in the following: where the subscript (•) j stands for the ith joint,q j is the mean value of the jth joint and q α k j are the coefficients in weight space identified in Section II-A. Obviously, the extension to the muscle-skeletal case is straightforward and can be obtained by expanding x e with the terms associated to muscles activation envelopes [see (7)]. x e can be considered as an extended space, represented by a vector ∈ R (k+1) * l for the kinematics-only case and ∈ R (k+1) * (l+r) for the muscle-skeletal case (i.e., joint and muscle activation signals). If some of the q/m α k j -related components cannot be directly computed (because the related measurements are missing) or are affected by noise, we can perform an estimation of the whole extended state by applying the static formalization of [25]. The estimated coefficients can then be used to reconstruct the motion of all the elements of the kinematic chain according to (5).

C. Optimal Sensors Placement
As anticipated in the previous section, MVE relies on a dataset of complete measures available a priori to provide, given partial measurements of the current state, a complete estimation of the whole muscle-skeletal configuration. Such an estimation is affected by a certain degree of uncertainty, which is directly related to the amount of noise (with covariance matrix R) that characterizes the measurements y, and, most importantly, to the choice of matrix H, i.e., the choice of which DoF to measure [see (14)]. For this reason, a proper selection of the DoFs to be measured can improve the accuracy of the estimation. This problem, which is dual to the estimation problem described in the previous section, targets the optimal placement of a given number of sensors, with the goal of minimizing the estimation uncertainty, and therefore maximizing the accuracy. To do this, one could rely on the a posteriori covariance matrix P P and minimize a cost function associated to a given norm of P P , where the argument of the minima is the matrix H itself [26]. Among the possible choices of norm definitions available, it is worth mentioning the Frobenius Norm [26], which leads to the definition of a matrix H that-used with the estimation approach described in (12) leads to the minimization of the average uncertainty across joints. However, with this choice, it could happen that some DoFs could be affected by large uncertainties, compensated by other DoFs, where the estimation is more accurate. To avoid this problem, in this work, we rely on a differentiable approximation of the maximum eigenvalue of P P , which we can get considering the Schatten norm when p >> 1 (ideally p = ∞) [34]. A minimization of the index defined in (16) with H as the argument of the minima, then, will converge toward a matrix H, i.e., a sensor placement distribution, which minimizes the uncertainty in the worst case.
Exploiting the differentiability properties of the Schatten norm, it is possible to calculate a gradient of the cost function by differentiating (16) after substituting (14). However, because of the particular structure of H for our extended state, which is composed of blocks of squared k-dimensional diagonal matrices, it is not possible to introduce a constraint in the gradientbased optimization procedure, which can also guarantees that H preserves the block structure. For this reason, gradient-free methods should be applied to minimize (16) in our case.

A. Dataset of Upper Limb Actions
To test our framework over a wide range of functional tasks, we considered a dataset of upper limb movements collected during daily living activities [31]. More specifically, we picked from [31] two blocks of data. The first, which hereinafter we refer to as the dataset D1, consists of motions performed by 33 healthy subjects (17 Female, age 26.56 ± 2.77, all right-handed) who were asked to execute three times with their right arm a selection of 30 daily living tasks (enumerated in [31]), while the motion of the upper limb was recorded via optical motion tracking (PhaseSpace, sampling frequency 100 Hz). The 3-D position of the 20 optical markers used to track the movement were given as input to a two-step procedure: 1) an optimization routine to estimate the subject-specific physical parameters of a predefined 7 DoFs kinematic model (i.e., length of the links, distances between joints and markers, etc.-see Fig. 2(a)); 2) an Extended Kalman filter to estimate the time evolution of joint profiles that generate the measured markers' motion [16].
To further validate our method with multimodal data, we complemented our analysis with a second dataset, the dataset D2, also available in [31]. This is a collection of signals acquired by the Hannover Medical School and consists of motions performed by 20 healthy subjects (12 Female, age 46.8 ± 15.3, of which eight right-handed) who were asked to perform the same set of tasks of the first dataset, but in this case, the placements of optical markers was different. In this manner, the estimation of the relative motion of the shoulder with respect to (w.r.t.) the torsum, but not of the hand w.r.t. the forearm can be obtained. This resulted in a kinematic model that is different from the one used for the dataset D1 [see Fig. 2(b)]. The Cartesian position of markers was processed using the same procedure introduced above for the dataset D1 to extract joint angular values. In addition, the activation of 12 different muscles was also collected during the execution of the tasks (see list of involved muscles in Table II). Muscular data were preprocessed using standard techniques to extract muscle activation envelopes, namely: rectification, filtering (low-pass, cut-off frequency equal to 20 Hz), Resampling. These data-and the related information-are also available at [31].
To enable a meaningful comparison between different timeseries (different joint trajectories and/or EMG signals) for each dataset we applied DTW independently. The output of this procedure is a dataset of signals with the same number of time frames T , which for the dataset D1 is equal to 970-duration of the reference signal used for DTW [30].
Finally, for each dataset independently, the total number recordings was splitted in two blocks, one (containing a random selection of 70% of all samples) that was then used to build the a priori dataset (see Section II-B) and the second (containing the remaining entries) that served for testing.
Once data were properly warped in time w.r.t. a common reference, we used fPCA on the a priori dataset to identify a low-dimensional set of primitives/basis functions that can be combined with the measurements of few joint/EMG trajectories to optimally reconstruct the joint evolutions and/or the muscles activation envelopes in time. This set of basis elements extracted from data can then be used to map any new movement from the testing dataset to the weight space by simply inverting (5) and 7-see Section II-A.

B. Estimation and Metrics for Performance Evaluation
As already mentioned, the usage of MVE as in [25] cannot be used for estimating time-varying and multimodal signals, and the aim of this article is to propose a solution, which generalizes to the estimation of multimodal temporal data. Our approach relies on the decomposition of signals via fPCA, which enables the definition of an extended state space x e in place of joint trajectories. All the experimental validation performed in this article used the first five functional PCs only (i.e., k = 5), because these are sufficient to explain over 93% of the total variance in the a priori dataset for all the joints [16]. Of note, the number of fPCs used is to be considered as a tradeoff between accuracy and computational burden. To quantify the effectiveness of our estimation method, we used as quality index the norm of the estimation error over all the DoF w.r.t. the real motion. For performance comparison, we considered two different alternative estimation methods, i.e., a direct application of static MVE over each time frame, and a standard pseudoinverse [ (10)] as in [25]. The first is a straightforward application of [25] and consisted in the definition of a static covariance matrix, which we built starting from the matrix of average poses (i.e., for each movement, we picked the average joint configuration). Note that this method can be used for unimodal data only, due to the problems of computing the matrix P 0 from heterogeneous data. To this aim, we implemented a recursive MVE algorithm in which each time frame is estimated through standard MVE relying on the updated measure (i.e., values of joint angular values/EMG measured at the specific time frame) and the same a priori information. The second, instead, refers to the implementation of (10) applied to the extended state. All the methods were tested over multiple examples, where we discarded the information coming from one or more DoFs, and considering random values of noise amplitude. For each testing condition, we evaluated the norm of the reconstruction error (in rad) for all the entries of the testing dataset. Statistical significance of differences between estimations performed through pseudoinverse, static MVE and the proposed method was evaluated using the Wilcoxon signedrank test applied to the norm of the reconstruction error.
It is worth noticing that the whole procedure described in this article relies on the information of the set of movements considered in the a priori dataset. In our case, these are complete actions, i.e., the initial and final arm kinematic postures are close to each other. Therefore, the description of generic movements in terms of functional components is more effective when the new signal considered for testing is a complete action as well. It is important to state that this is not a limitation of the work per-se, since it depends on the particular selection of tasks included in the a priori dataset. The choice we made in our paper makes our method particularly suitable to estimate cyclic motions (as the ones commonly performed by humans in human-robot industrial collaborative tasks [3]) and, given a novel stream of sensory data, to (likely) automatically detect the end of a cycle and the beginning of the following one, based on the estimation performance. This point will be further discussed later in the manuscript and analyzed as future work.

C. Minimization of a Posteriori Covariance Matrix: Optimal Sensing Design
As mentioned in Section II-C, we target the identification of an optimal placement of sensors to minimize the estimation uncertainty. Unfortunately, because of the block structure of H, it is not possible to employ gradient based optimization algorithms, and search methods over discrete states should be considered. We developed an implementation in MATLAB, which uses a genetic algorithm (Population size = 150, Max number of generations = 200, Elite Count = 10) to solve the optimization problem where

IV. RESULTS
To verify the accuracy in reconstruction, we applied our method and the two alternative solutions to estimate all the entries of the test dataset D1, assuming to measure only a limited number of DoFs and artificially adding a Gaussian noise on the measured ones, with standard deviation (std) equal to 0.1 rd ( [25]) in the exemplary tests, and in the range [0, 2] rad to test how the results are affected by the Signal-to-Noise (SN) ratio. As anticipated in the previous section, the accuracy of the estimation highly depends on the number and the selection of joints that are measured. For this reason, without any loss of generality, we will first discuss the optimization of the sensor placement, i.e., the solution of the minimization problem of (17), considering a tradeoff between number of sensors used and accuracy of the estimation. Then, we will use the optimal placement of sensors to evaluate the estimation accuracy of our method compared to the alternative solutions.

A. Optimal Sensing Design
As mentioned above, we can identify an optimal placement of sensors by minimizing a norm of the a posteriori covariance matrix reported in (18). However, the number of DoFs to be measured must be known in advance, and this is a free parameter of the problem that has to be set as a tradeoff between estimation uncertainty and complexity of the sensing setup. Indeed, it is reasonable to expect that an higher number of available measures contributes toward a reduction of the maximum uncertainty error (see Fig. 3). Interestingly, there is a consistency over the joints that are subsequently enrolled. In other words, if the optimal placement of three sensors is the one that measures joints number Fig. 3. Minimum value of the P Schatten norm calculated over the a posteriori covariance matrix (i.e., an approximation of the maximum estimation uncertainty) versus the number of different DoFs measured for the dataset D1. For each point, the corresponding label reports the n-tuple of measured joints that provides the minimum estimation uncertainty. The label associated to seven DoFs is trivial and omitted for sake of clarity. Results are drawn considering a random added noise in the range [0, 2] rad. The filled line refers to the average minimum of ||P P || P w.r.t. different instances of noise, while the colored area marks the standard deviation of ||P P || P over the mean. one, four and seven, this n-tuple will be part of all the optimal n-tuples with higher cardinality. Furthermore, we also studied the effect of noise on the identification of the optimal set of DoFs to measure, and verified that a reduction in the signal-to-noise ratio (coherently across joints) does not affect the resulting optimal design, as depicted with the shaded area in Fig. 3, while we anticipate that it will play a crucial role in the accuracy of the estimation itself. For this reason, and without any loss of generality, hereinafter we will consider and discuss the optimal sensing setup corresponding to four measured DoFs. In this case, we obtained that the optimal placement that minimizes the objective function of (18) is composed by the n-tuple 1,4,6,7 (see. Fig. 2 for joints definition). Of note, the most important joints in terms of reduction of the estimation accuracy is the flexion-extension of the elbow, which indeed represents one of the DoFs that is largely involved in reaching tasks. Then, the algorithm gives particular relevance to one of the joints that move the shoulder and the two DoFs of the wrist, which are highly variable across different movements. Of note, although we can explain the biomechanical reason underlying the particular selection of the most informative joints, their identification is not feasible a priori without our analysis.
Considering the multimodal case (dataset D2), similar observations could be made, and are here omitted for sake of clarity, leaving space to discuss the optimal setup we identified. Indeed, in this case, it is important to mention that, considering the strong causality relationship between muscle activation and consequent joint temporal evolution, one could impose a number of constraints on a minimum selection of muscles/joints that needs to be recorded. For example, given the list of muscles available in the dataset D2, it is possible to observe that four of them (i.e., Flexor digitorum superficialis, Extensor carpi ulnaris, Abductor pollicis Brevis, and Abductor digiti minimi) are specifically enrolled in the movement of the wrist and of the fingers and, therefore, cannot be estimated relying on the kinematic counterpart. Without any loss of generality, we report here on Fig. 4. Motion estimation for a random entry of the test dataset (D1) when the n-tuple of joints [2,3,5] is assumed to be not measured and the remaining measurements are corrupted by Gaussian noise (std. 0.1 rd)). In this plot, we compare the estimation provided by three different methods: the pseudoinverse (in red) a static MVE (in magenta) and our fPCA+MVE approach (in blue). Note how our method (blue lines) provides a good and clean approximation of the real signal (green line) unlike the pseudo-inverse technique, which cannot estimate joints not measured, and the static MVE, which is highly affected by noise.
the optimal sensor placement that minimizes the a-posteriori covariance matrix when only three joints and eight muscles are directly measured. In this case, the optimal sensor placement resulted in the joint n-tuple 3,5,6 complemented with the muscles n-tuple 1,2,4,7,8,9,11,12. It is interesting to observe that our optimization converged to an optimal sensing setup in which the system exploits the readings of M1 and M2, i.e., the Deltoideus pars clavicularis and the Biceps brachii muscles, which are involved in most of the movements considered in the dataset. In addition, also M7 (Flexor carpi ulnaris) and M9 (Pronator teres), which are responsible for elbow pronation-supination, are enrolled in the optimal setup.

B. Estimation Accuracy
To assess the estimation accuracy of our method, we used the optimal placement of sensors identified in the previous section. Considering the dataset D1 and joint trajectory estimation corrupted by Gaussian noise with standard deviation equal to 0.1 rd, numerical results demonstrate that the n-tuple that yields the minimum estimation uncertainty (1,4,6, and 7) provides a norm of estimation error equal to 0.013 ± 0.006 rad (expressed in median ± interquartile range, values are normalized over T). To quantify the relative error, we calculated the average absolute difference between maxima and minima of each reconstruction (1.43 rd) and verified that the relative median error of our estimation for D1 is equal to 0.9%. Our approach outperforms the pseudoinverse method (0.05 ± 0.014 rad, i.e., on average 280% higher than our approach) and the static mve (0.017 ± 0.08 rad, i.e., on average 35% higher than our approach). A Wilcoxon signed-rank test applied to the norm of the reconstruction error, with Bonferroni correction to compensate for multiple tests, verified that the difference in performances is statistically significant for all the three methods (p < 0.0001), with higher performances provided by our method. In Fig. 4, we show a random task selected from the test dataset. It is possible to observe that the estimation performance between the pseudoinverse, the static MVE and our approach are similar for the joints that can be directly measured, while our method strongly outperforms the alternative solutions in case of no direct measure of the joints. Both the methods based on MVE are able to provide an estimation of the joint temporal evolutions in case of missing data, however, only our method based on fPCA+MVE can guarantee the likelihood of the estimated movement, since it is directly embedded in the functional components that encode the a priori information [16]. Furthermore, it is worth noticing that, although on average the estimation accuracy of the static MVE is close to our approach yet statistically significantly different, it provides a very noisy estimation (see Fig. 4), which depends on the SN ratio and is intrinsically unavoidable, because there is no direct relationship between subsequent time frames. We also verified the performances of our method for different values of standard deviation of the Gaussian noise applied to the data, testing values ranging in the interval [0, 2] rad. Results are depicted in Fig. 5. It is possible to observe that, although-as expected-the error increases for all the methods at high noise level, numerical values obtained with the pseudoinverse and with the static MVE are consistently higher than our approach, especially in the range [0,0.2] deg, which corresponds to a reasonable amount of noise observed in common sensors [35] and therefore will be the proper working range of our system. Statistical significance of the differences in estimation error was proved also in this case through Wilcoxon signed-rank test with Bonferroni correction, proving the difference between the three conditions (p < 0.0001).
Finally, to evaluate the effectiveness of our approach with different sensing sources, we also quantified the estimation accuracy obtained with our method using the dataset D2 and obtained a reconstruction error of 0.038 ± 0.023 rad for the kinematic joints and 0.003 ± 0.002 mV for the muscles (values expressed in median ± interquartile range and normalised over T). In this case, considering that for D2 we observed an average absolute difference between maxima and minima of each reconstruction equal to 1.27 rd and 0.059 mV, the Fig. 6. Motion estimation for a random entry of the test dataset (D2) when we use a noised recording of joints [3,5,6] and muscles [1,2,4,7,8,9,11,12] (Gaussian noise with std. 0.1 rd for joints, with std. 10 −2 mV for muscles). In this plot, we compare the estimation provided our fPCA+MVE approach (in green) versus the real signal. In A the kinematic signals, in B the muscles. relative error for joints and muscles is equal to 2.9% and 5.1%, respectively. An example of motion estimation carried out on a random entry of the test dataset D2 is given in Fig. 6. Using the pseudoinverse, instead, we obtained 0.224 ± 0.046 rad for the kinematic joints and 0.005 ± 0.002 mV for the muscles. Of note, in this second dataset, the reconstruction error obtained on the kinematic temporal profiles is slightly higher that in the D1 case. This is caused by the fact that here we measure only three joints. It is also interesting to observe that our approach is able to provide a reasonable estimation of muscles envelopes, which is not feasible with the static MVE implementation.

V. CONCLUSION
In this article, we presented an integrated framework, which extends the MVE to the more challenging case of multimodal time-varying temporal kinematics of multimodal upper limb muscular-skeletal data. The idea is to leverage on fPCA to establish a mapping from the domain of the time series to a static extended state of weighting coefficients. We demonstrated that, despite the measurement noise, our approach can provide an accurate estimation of all motion profiles, including those referred to the joints not directly measured, leveraging on the covariation patterns identified on a priori dataset of motions. However, since the estimation strongly relies on the set of movements used as a priori information, it is reasonable to expect that the estimation accuracy could degrade when the novel signal is significantly different with respect to the ones that are present in the a priori dataset. This is an intrinsic limitation of MVE procedures, and it could be particularly relevant for the muscular domain, where the signals shape may vary significantly as a consequence of stiffness modulation during the action.
With the choice of a priori data reported in this article, our method seems particularly suitable to estimate cyclic tasks, as the ones commonly performed in industrial human-robot collaboration. The development of an index based on the estimation performance could be used to identify whether the analyzed motion is a complete action or not, and-on line-detect the end of a cycle. Furthermore, this could also provide important information on how a specific cycle is performed w.r.t. the previous iterations. Indeed, if significant changes in the estimated parameters can be observed while analyzing subsequent iterations of a given task, it is reasonable to expect that these changes could be correlated with modifications in the execution of the cycle, which can be caused for example by fatigue or other nonergonomic conditions. Our future work will investigate these aspects together with the capability to generalize to other, significantly different, classes of motions and prior information, for on-line motion reconstruction and prediction. The extension to whole-body reconstruction is also envisioned. Another interesting result is that specific joints seem to be more informative than others, and hence they should be preferably measured, if some constraints due to wearability as well as to economic and environmental conditions impose limits on the number of sensing elements and their accuracy. This opens to the perspective of devising optimal design strategies for the development of under-sensorized wearable sensing systems [26].
Finally, it is worth mentioning that there are no constraints on the kind of signals used with our method. We demonstrated this by applying our method to a second dataset that included both kinematic and muscular data. We observed that, while a static implementation of static MVE cannot provide an estimation of multimodal data because of the difficulties in defining a reliable covariance matrix, with our approach, we are able to provide a reasonable estimation of the whole muscle-skeletal temporal evolution relying on a reduced set of measures. However, we noticed that the estimation of muscles profiles is more difficult than the kinematic counterpart. This could be also partially ascribed to the fact that not all the muscular activity is devoted to generate motion. Indeed, a not negligible part of muscular activation is used to preserve muscular tone and to regulate the stiffness during the movement. Our method is not able to discriminate between the different components of the muscle activity and, therefore, this could result in reduced estimation performance for certain muscles and tasks. It is reasonable to expect that this phenomenon can be accentuated when the kinematic component of the signal cannot compensate for the missing information, as for Muscle 10 (Flexor carpi radialis FCR) in Fig. 6. Indeed, FCR muscle is mostly involved in the flexion and abduction of the wrist, which are not part of the available a priori kinematic data in the dataset D2. It is likely that the availability of joint angular values of wrist flexion and abduction could also improve the muscular estimation. This point is left for future developments of our work.
Interestingly, in principle, using as a priori information a set of data extracted in a low-stiffness condition, one could also use our method to develop a reliable online estimation of arm stiffness even in functional tasks. This will be one of the focuses of future works of our group, in conjunction with the extension of our framework to other sensing modalities, such as pressure forces at the feet level (which we can record with sensorized insoles), electroencephalographic data, etc. This extension will also take into consideration that the relationship between measurements and DoFs could be, in general, nonlinear (i.e., H will be a block-based matrix of nonlinear terms). This will require the integration in our framework of a mapping tool from measurement recordings to the DoFs, as for example done in [8] and in our previous work [29], [30]. We strongly believe that this work can provide interesting insights for the development of undersensorized tracking solutions for multidomain applications, opening interesting perspectives in different scenarios, such as advanced human robot interaction (including brain machine interfaces), tracking of working conditions, assessment of muscular fatigue, rehabilitation and modelling of the neuro-muscular system.