A Framework for Autonomous Impedance Regulation of Robots Based on Imitation Learning and Optimal Control

In this work, we propose a framework to address the autonomous impedance regulation problem of robots in a class of constrained manipulation tasks. In this framework, a human arm endpoint stiffness model is used to extract the task stiffness geometry along the constrained trajectory, which is then encoded offline and reproduced online by a Gaussian Mixture Model (GMM) and the Gaussian Mixture Regression (GMR), respectively. Furthermore, the full Cartesian impedance of the robot is formulated through an optimal control problem, i.e., the Linear-Quadratic Regulator (LQR), in which the task stiffness geometry (extracted from human demonstrations) is considered as the time-varying weighting matrix Q. The optimal impedance is eventually realised by the robot through a task geometry consistent Cartesian impedance controller. A tank-based passivity observer is implemented to give evidence on the stability of the system during online impedance variations. To evaluate the performance of the framework, a comparative experiment with three different impedance settings (i.e., the proposed framework, the framework without LQR and the framework without GMM/GMR) for Franka Emika Panda to perform a door opening task was conducted. The results reveal that our framework outperforms the other two, in terms of tracking error and the interaction forces.


I. INTRODUCTION
I N THE last decades, Cartesian impedance control techniques have been developed for the new generation of torque controlled light-weight robots to guarantee safe interaction with humans and their surroundings [1]. In such control techniques, the impedance parameters regulate the relationship between the trajectory tracking errors and the interaction forces, playing a crucial role in the interaction performance of the robots for a given task. Nevertheless, for tasks that require continuously varying interaction patterns, the adjustment of the impedance parameters remains a difficult issue, also considering the dynamic uncertainties [2], [3].
Inspired by the human manipulation capability, researchers have been trying to understand how human beings regulate their arm impedance in performing interactive tasks. In this direction, authors in [4] proposed a 2D stiffness model of the human arm by assuming that the arm endpoint stiffness is linearly related to the magnitude of the joint torques. Relying on this assumption, an arm control model including the adaptation of force and impedance was proposed in [5], leading to a similar adaptation behavior observed in human experiments. The online human arm impedance estimation techniques in 3D was first proposed under the concept of teleimpedance control [6], and then extended to the entire arm workspace through a reducedcomplexity representation in [7]. An extension of this work to enable transferring of whole-body human motor behaviours to teleoperated robots was presented in [8], where a MObile Collaborative robotic Assistant (MOCA) was operated to execute complex loco-manipulation tasks. Although teleimpedance control has shown a high potential in the execution of remote interactive tasks, in most applications, an autonomous approach is preferred.
To achieve autonomous interaction control of robots, the stiffness profile can be encoded offline and reproduced online. An elegant and efficient way to implement this is through imitation learning. In this direction, Dynamic Movement Primitives (DMPs) [9], which consists of a canonical system, a transformation system, and a nonlinear function estimator is widely used to encode complex behaviors in human motor control and robotics. However, due to the dependency of the canonical system on time or phase input in this framework, it can hardly deal with multi-dimensional input data. Another popular movement representation technique is through Gaussian Mixture Regression (GMR) [10]. Firstly, it models the joint probability density of the input and output data in the form 2377-3766 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
of a Gaussian Mixture Model (GMM), which can be estimated by an Expectation-Maximization (EM) procedure. Furthermore, the regression function is derived by computing the conditional probability distribution of GMM. It directly maps the input to output and both can be multi-dimensional. In addition, GMR is a probabilistic model, which can not only predict the mean value but also covariance of the output variables. In [11], the robot's movement was encoded and reproduced by GMM/GMR. They consider the predicted covariance as uncertainty in the environment and then use its inverse as the stiffness in Cartesian impedance control of robots. The same idea but with a minimal intervention control framework was established in [12]. However, the inverse of the covariance embedded in multiple movement demonstrations doesn't correspond to the physical meaning of stiffness, and this framework will not work in the constrained motion scenarios, such as door opening and valve turning. Instead of extracting stiffness information from variability in multiple demonstrations, there are some works about estimating it based on interaction forces and tracking errors, followed by imitation learning algorithms to reproduce the learned stiffness profiles online. The authors in [13] proposed an approach to collaboratively assemble a wooden table by a human subject and a robot. By taking into account the sensed force information, a Weighted Least-Squares (WLS) method is employed to estimate the stiffness. In [14], the authors used a similar framework but adopted a sliding window technique to carry out local stiffness estimation instead of WLS. However, in this way, the position and force data are usually demonstrated via kinesthetic teaching, which may cause discontinuity in a continuous interaction process. It is not very reliable also due to the fact that the first estimation result of stiffness usually does not comply with Symmetric Positive-Definite (SPD) constraints. A more direct way to teach robot compliant manipulation is through human-robot interaction proposed in [3]. However, for a task requiring continuously varying impedance, this way may bring inconvenience to the teacher.
Some works consider impedance regulation as an optimal control problem [15]. In this field, most of the works are based on Linear Quadratic Regulator (LQR) formulation, however, the choices of Q and R matrices play an important role which are usually hard to design for individual tasks. In most cases, Q and R are fixed which results in a hard-coded behavior. To achieve an online adaptation, adaptive optimal control or Reinforcement Learning (RL) can be employed. In [16], an approach called Out-Put FeedBack Adaptive Dynamic Programming (OPFB ADP) is proposed to solve the problem by only using the measured output data, considering that in most robot-environment interaction applications the dynamic processes are partially observable. An example of using RL method to accomplish variable impedance control for a given task is presented in [2], where a Policy Improvement with Path Integrals (PI 2 ) algorithm is proposed and a DMP model was embedded in their framework for modeling the reference trajectory. In [17], the same framework and algorithm were used to learn variable impedance control in stochastic force fields, exhibiting the generability of RL methods. To transfer highly dynamic manipulation skills from humans to robots, such as pancake flipping, not only reference trajectories but also impedance profiles must be learnt. Hence, the authors in [18] proposed a similar framework to the work in [2], but with a different learning algorithm called Policy learning by Weighting Exploration with the Returns (PoWER). Even though the impedance regulation problem can be solved through the above RL methods, the training efficiency is quite low for learning impedance parameters due to the lack of proper initialization of impedance profile compared with the learning speed of the reference trajectory.
In this work, we propose a framework that merges the advantages of imitation learning and optimal control to achieve an adaptive and optimal interaction performance with a generalization potential. The feasibility of this approach is under the premise of the formulation of human arm endpoint stiffness model which represents task stiffness geometry. Our aim is to transfer task stiffness geometry, provided by human demonstrations, to robots in an autonomous and optimal way. The door opening has been chosen as a representative example of tasks with constrained and continuously interactive motion as shown in Fig. 1(a). Due to a potential instability of the system acting under variable impedance, we developed a tank-based passivity observer to monitor the stability of the system. In addition, to verify the synergistic contribution of the imitation learning and optimal control in this framework, two additional impedance settings were tested: one used GMR to reproduce the varying task stiffness geometry but with a constant volume coefficient instead of LQR, the other one was the opposite, i.e., a constant task stiffness geometry with LQR. The performances of the controllers under different impedance settings were compared in terms of stiffness profiles, tracking error, and interaction forces.

II. FRAMEWORK
The overall framework is depicted in Fig. 1(b). Inspired by the imitation learning approach, we attempt to learn the task stiffness geometry from human demonstrations. In contrast with the work in [12]- [14], we propose to utilize physically meaningful stiffness information, which is extracted directly from a human arm endpoint stiffness model. In this model, the arm endpoint stiffness is decomposed into two components, i.e., the Configuration Dependent Stiffness (CDS) and the Common Mode Stiffness (CMS), as proposed in [19], [20]. The idea of CMS and CDS is inspired by human motor control principles on the predominant use of the arm configuration in directional adjustments of the endpoint stiffness profile, and the synergistic effect of muscular activations, which contributes to a coordinated modification of the endpoint stiffness in all Cartesian directions. Under the assumption that human beings adapt their arm endpoint stiffness ellipsoid's geometry (represented by CDS) and volume (represented by CMS) to match the interaction requirements of a task, we propose to only encode human demonstrated CDS by GMM offline (pose as input while CDS as output in the model) and reproduce it online by GMR. The CMS component, which regulates the volume of the endpoint stiffness ellipsoid, is not observed from human demonstrations since it can depend on task specifics. For instance, in door opening or valve turning tasks, the CMS would depend on the friction properties or the elastic response of a spring-loaded door or a valve. Instead, to construct the complete impedance (i.e. the full stiffness and the damping term), the LQR is adopted to take CDS as the weighting The architecture of the proposed framework. The full impedance is regulated by an imitation learning algorithm (GMM/GMR) and an optimal control method (LQR), which is then realised by the robot Cartesian impedance controller. matrix Q in an optimal control problem to match different environment dynamics. Eventually, the optimal impedance profile is implemented by a task geometry consistent Cartesian variable impedance controller, whose desired pose is provided by a trajectory planner and a Finite State Machine (FSM). In the following sections, each part of the overall framework will be explained in more detail.

A. Learning CDS Based on GMM/GMR
Imitation learning has been widely used in modeling of the task kinematics, while there are very few works about learning the dynamics. The barrier is the difficulty to extract physically meaningful dynamic parameters from human demonstrations. To exploit the potential of imitation learning in the execution of physical interaction tasks in this work, a human arm endpoint stiffness model is first introduced. Next, the GMM/GMR is applied to encode and reproduce the task stiffness geometry through the CDS.

1) Human Arm Stiffness Model and Demonstrations:
In [20], we propose a principally simplified, and intuitive online model of the human arm endpoint stiffness. The new model is based on the large dependency of the shape and orientation of the stiffness ellipsoid on the arm configuration. In fact, previous studies [21], [22] indicate that i) the major principal axis of the arm endpoint stiffness ellipse passes close to the hand-shoulder vector, and ii) when the arm is extended and the hand moves further from the shoulder, the ellipse becomes more elongated and conversely, it becomes more isotropic. Inspired by these findings, the new model in 3D is constructed by the ellipsoid's principal axes and their lengths, namely the eigenvectors and eigenvalues of the stiffness matrix, based on the arm configuration.
As shown in Fig. 2, we use a two-segment human arm skeleton structure in the 3D space. The hand-forearm and the upper arm segments in this model permit us to form an arm triangle at any non-singular configuration. Relying on the dominant contribution of the arm configuration to the endpoint stiffness geometry, we propose to use the vector from the centre of shoulder joint to the position of the hand ( l ∈ R 3 ), to identify the major principal direction of the human arm endpoint stiffness ellipsoid. The minor principal axis direction ( n ∈ R 3 ), instead, is defined to be perpendicular to the arm triangle plane where r ∈ R 3 represents the vector from the centre of shoulder to the centre of elbow. The remaining principal axis of the stiffness ellipsoid, which lies on the arm triangle plane, is calculated based on the orthogonality of the three principal axes. Under the assumption that the ratio of the length of median principal axis to the major principal axis of the stiffness ellipsoid is inversely proportional to the distance d 1 ∈ R, from the hand position to the centre of shoulder, while the ratio of the length of the minor principal axis to the major principal axis is proportional to the distance d 2 ∈ R, from the centre of the elbow to the major principal axis, the estimated endpoint stiffness matrix K c ∈ R 3×3 is formulated bŷ Here, A cc ∈ R is the co-contraction activation index of human arm muscles corresponding to CMS, while V D s V T ∈ R 3×3 corresponding to CDS, and V ∈ R 3×3 and D s ∈ R 3×3 are respectively eigenvectors representing the orientation and eigenvalues representing the shape of the stiffness ellipsoid and formulated by: where α 1 ∈ R and α 2 ∈ R are subject related parameters in the model. For the detailed explanation and further model parameter identification, please refer to [20].
According to the introduced model, to extract CDS information from human demonstrations for a given task, only positional information of the shoulder, elbow and hand is necessary which is quite easy to get. The CDS information can be extracted followed by a post-processing of the collected 3D position data according to the proposed model.
2) Encoding CDS Profile by GMM/GMR: In this part, we first give a recap of GMM/GMR algorithm and then introduce how to apply it to encode the CDS profiles. The Gaussian mixture distribution can be written as a linear superposition of Gaussians in the form where ξ ∈ R d and p(ξ) ∈ R, respectively, represent the vector of variables and joint probability distribution. π k ∈ R, μ k ∈ R d and Σ k ∈ R d×d represent the prior probability, mean and covariance of the k-th Gaussian component respectively, while K ∈ Z + is the total number of Gaussian components. The goal of the offline training phase is to maximize the log likelihood function Eq. (6) with respect to the model parameters (π k , μ k and Σ k ), which results in an EM procedure (E-step in (7) and M-step in (8) and (9)) to iteratively update the model parameters until convergence Here, N ∈ Z + is the total number of data points in a training dataset and ξ n ∈ R d is the n-th training data point. After modeling the joint probability distribution of the training data offline, to derive GMR, we use superscripts I and O to denote the dimensions of the input and output variables. At iteration n, η I n and η O n respectively represent the input and output variables. With this notation, the data point η n ∈ R d , mean μ k and covariance Σ k of the k-th Gaussian component can be decomposed as In the online reproduction phase, the best estimation of output η O n for a given input η I n is the meanμ n of the conditional probability distributionη O n |η I n ∼ N (μ n ,Σ n ), with parameters which was reported in [23]: where In this framework, only the mean value is used although covariance can be derived as well. human arm endpoint pose and CDS are respectively considered as input and output of GMM. Pose data can be represented in vector form while the CDS profile is a series of SPD matrices, it's not possible to apply GMM to encode it directly considering GMM can only deal with independent variables in vector form. In [14], two different ways to represent stiffness matrix are provided. To make it simple, we use Cholesky decomposition to decompose CDS into the product of a lower triangular matrix and its transpose, and then we arrange its elements in a vector form. Next, the offline training phase takes the compact pose and the CDS vectors as the vector of variables ξ. Subsequently, the GMR reproduces the CDS vector online by taking robot end-effector pose as input, followed by a reverse process of Cholesky decomposition to construct CDS matrix whose SPD property is guaranteed.

B. Constructing Full Impedance Through Optimal Control
For the control of robots' interactions, we need to derive the full stiffness, and eventually the full impedance from the CDS information. To take environment dynamics into account and construct the full impedance from an optimal point of view, [12] proposed to formulate it as a LQR problem, in which the predicted covariance extracted from multiple movement demonstrations was used to define the weighting matrix Q. This consideration implements a similar idea to our proposed method, with the difference that instead of using the predicted covariance, we are using the physically meaningful CDS formulated by the human arm endpoint stiffness model.
The use of a Cartesian impedance controller, which will be introduced in the next subsection is more effective than a pure position controller [1] since the impedance parameters, especially the stiffness parameter, which regulates the relationship between the interaction force and the tracking error, can be tuned in arbitrary directions to implement a trade-off between position accuracy and the 'softness' of the response. For LQR, the weighting term also regulates the relationship between position errors and control inputs in different directions. It should be a large value along the free motion direction while a small value along the constrained motion direction. It can be found out that the function of stiffness parameter in Cartesian impedance control coincides with the function of weighting term in LQR, thus in this framework, we are using CDS which represents the task related stiffness recorded from human as the weighting term in LQR. In the following, a brief description of LQR problem is given. Consider the discrete state space equation of a linear environment system as: where ζ n = [x n ,ẋ n ] ∈ R 2×p is the system state, x n ∈ R p andẋ n ∈ R p respectively represent the pose and velocity. u n ∈ R r is the system input, and A ∈ R (2×p)×(2×p) and B ∈ R (2×p)×r are the discrete state and input matrices respectively. For the subsequent form of control input: wherex n ∈ R p is the desired pose,K n ∈ R r×p andD n ∈ R r×p are complete stiffness and damping matrices estimated by LQR with time-varying CDS. For an infinite-horizon, discretetime LQR, the solution is obtained by minimizing the following cost function: where Q n ∈ R p×p and R ∈ R r×r are the weighting matrix and the time-varying Q n will be set to CDS matrix generated online from GMR according to the current input variables in this framework. The feedback gain is updated by where P ∈ R (2×p)×(2×p) is the unique positive definite solution to the following Discrete time Algebraic Riccati Equation (DARE):

P = A P A − A P B(R + B P B) −1 B P A
where Q n ∈ R (2×p)×(2×p) and Q n = Q n 0 p×p 0 p×p 0 p×p .
In this framework, compared with the heuristic way to construct impedance in [11], LQR is a more systematic method, which takes the environment dynamics into account and holds a good balance between the tracking error and the control input (which equals to the interaction force in the point view of Cartesian impedance control). The environment dynamics is set to a double integrator system similar to [12] in our experiment. In the following, we discuss how to implement the full impedance profile on robots by using a task geometry consistent Cartesian impedance controller whose choice of parameters is further discussed in Section III.

C. Robot Controller 1) Task Geometry Consistent Cartesian Impedance Control:
At present, the most widely used Cartesian impedance controller for the new generation light-weight robots (such as KUKA LWR and Franka Emika Panda) follows the Jacobian transpose scheme, in which the measurement of external wrench is not necessary by avoiding the inertia shaping. A systematic description of this scheme is reported in [1], where the analytic Jacobian is used due to the minimal representation of the orientation. Such a representation is known to suffer from singularity and the task geometry consistency problem, i.e., the equivalent rotational stiffness between the orientation displacement and the elastic moment, which depends on the desired end-effector orientation. This issue was first discussed in [24], in which an inverse dynamics strategy is employed to implement Cartesian impedance control, resulting in the requirement of measuring contact forces and moments. Although only the translational stiffness is modeled in Section II-A and is dominant in most applications, the rotational stiffness still needs to be set when the robot executes a task. To eliminate the effects brought by task geometry consistency, a derivation of task geometry consistent Cartesian impedance control is performed by following the Jacobian transpose scheme.
In contrast with [1], instead of representing the operational space dynamic equation by using analytic Jacobian, here the geometric Jacobian J (q) ∈ R 6×n is adopted to address the task geometry consistency problem. Without an explicit representation of Cartesian coordinates by using v e = J (q)q anḋ v e =J (q)q + J (q)q and neglecting the gravity term for simplicity, the dynamic model for the manipulator in Cartesian space is μ(q,q) where v e ∈ R 6 is the end-effector velocity, q ∈ R n is the joint coordinates. F τ ∈ R 6 and F ext ∈ R 6 are respectively the Cartesian control input and external wrench. Λ(q) ∈ R 6×6 and μ(q,q) ∈ R 6×6 are the Cartesian inertia and Coriolis/centrifugal matrices respectively where M (q) and C(q,q) are respectively the joint space inertia and Coriolis/centrifugal matrices. The desired Cartesian impedance relationship derived from an energy-based argument when considering task geometry consistency is [24]: whereṽ ed ∈ R 6 is the velocity error and it equals to v d − v e , v d ∈ R 6 is the desired end-effector velocity, p ed ∈ R 3 is the position error and the orientation error is represented in axis/angle o ed ∈ R 3 and o ed (r, θ) = r sin(θ), R(θ, r) = R d R T e . R d ∈ R 3×3 and R e ∈ R 3×3 are respectively the rotation matrices from the desired and actual end-effector frames to base frame.D n andK n are the impedance parameters derived from Eq. (17), with r = p = 6 in Cartesian space.
By adding Eq. (19) and Eq. (22) to eliminatev e as in [1], the following control law is derived: The joint torque command can be computed by following the Jacobian transpose scheme.
If v d = 0, then the simplified task geometry consistent Cartesian impedance controller is Fig. 3. The left subplot shows human arm endpoint trajectories (black lines) and stiffness profiles (green ellipses) during the demonstration phase. The task stiffness was calculated by the model (resulting by Eq. (2)) and only CDS was extracted from human demonstrations. The right subplot shows CDS reproduction (green ellipses) based on a planned circular trajectory (black line) as input.
2) Tank-Based System Passivity Observer: Considering that the overall framework assumes a variable impedance control, the passivity of the system and hence the stability may be lost if the impedance values change too fast. In this framework, a tank-based approach proposed in [25] is used to monitor the stability of the system. Formally, the extended dynamics is given by: is the time-varying stiffness. The scalar x t ∈ R is the state associated with the tank and the tank energy T ∈ R + and T = 1 2 x 2 t . σ ∈ R and ω ∈ R 6 respectively are otherwise .
(27) Here,T u ∈ R + is a suitable, application dependent, upper bound that avoid excessive energy storing in the tank whileT l ∈ R + is a lower bound below, which energy cannot be extracted by the tank for avoiding singularities in Eq. (26) and thus the time-varying stiffnessK v n will be removed. For a detailed analysis of the system passivity, please refer to [25].

III. EXPERIMENTS AND RESULTS
The experimental setup is shown in Fig. 1(a). On the human side of this setup, the MVN Biomech suit (Xsens Tech) was used for collecting the 3D position information required to construct CDS patterns and to train the GMM. On the robot side of this setup, the torque-controlled Franka Emika Panda arm was used, which was equipped with the Pisa/IIT SoftHand for grasping of the door handle. For the door opening task, the significant motion and interaction happened on X-Y plane. Therefore, in the following, all the stiffness profiles, trajectories and forces data are illustrated on X-Y plane.
As shown in Fig. 3, the left subplot shows the subject repeated the door opening task for five times. The black lines represent human arm endpoint trajectories, while the green ellipses represent the demonstrated human arm endpoint stiffness profile.
The CDS part was constructed according to the collected 3D position data of human hand, elbow, and shoulder at 100 Hz with model parameters (α 1 = 0.2450, α 2 = 2.7118), which was reported in [20]. The CMS value was selected experimentally to match the range of trajectories, since it was neither used in the training of the GMM nor for reproduction by GMR. Since the motion was constrained, the demonstrated trajectories didn't show much variability, which means that the method proposed in [12] would fail to provide meaningful stiffness profiles along the trajectory.
After constructing the input (endpoint trajectory) and the output (endpoint stiffness) data of the GMM, an EM training process was performed to estimate the model parameters, in which the convergence is reached by meeting the iteration stopping criteria that the increase of average log-likelihood value in current iteration is less than 1e-4. The total number of Gaussian components K in GMM is decided by the gradient of converged average log-likelihood value of all data points with respect to accumulative values of K. Hence, we chose K = 5 in the model. The reproduction of CDS by GMR was simulated by taking a planned circular trajectory as input. The results are depicted in Fig. 3 on the right.
To reveal the effectiveness and the added value of the two main components of the proposed framework (i.e., imitation learning and optimal control), a comparative experiment was conducted under three different cases.
Case 1: This case implements the proposed framework: The CDS extracted from human demonstrations was encoded by GMM offline and reproduced by GMR online (current endeffector position as the input). For LQR, the weighting matrix Q was set to CDS and R was set to 1 × 10 −6 I (I is identity matrix), which was decided by limiting the resulted largest stiffness within the stiffness range of Franka Emika Panda.
Case 2: In this case, the process of CDS information was the same as in Case 1. But instead of using LQR, a constant CMS was set to 1040.37 which is decided by keeping average area of stiffness ellipses on X-Y plane along the trajectory is the same in Case 1 and Case 2 since comparison focuses on the difference in resulted task stiffness geometry which means the shape and orientation of the stiffness, so the total volume or area should be kept as the same to make them comparable. The damping term was set to guarantee a damping ratio of 1 √ 2 to guarantee the response is fast and avoid overshooting at the same time.
Case 3: Here, instead of encoding and reproducing the timevarying CDS by GMM/GMR, a constant CDS was used, which equals to the value of the generated CDS at the starting point in Case 1 (this choice is made for aligning initial interaction force and tracking error of Case 1 and Case 3 in the constrained direction and thus, the resulted interaction force and tracking error in Case 1 and Case 3 could be conveniently compared.), and then the LQR procedure was followed.
In all above cases, the desired path was generated by the same circular trajectory planner, and since the position of the door is not exactly known, there exist position uncertainties in the experiment. The environment dynamics was set to a double integrator system due to the fact that the door opening procedure features low-friction and free rigid-body rotation, which means the matrices A and B were set as following (for X-Y plane, The tank-based stability observer was used to monitor system passivity. As shown in Fig. 4(b), the tank energy was above the lower bound (T l = 0.2 J) in all three cases, which means that the full stiffness including the constant (i.e. the average stiffness along the whole path was considered as the constant stiffness in each case) and varying parts was realised. The resulted trajectories and the stiffness profiles of Franka Emika Panda in three cases are shown in Fig. 4(a). Compared with Case 1, the stiffness ellipses in Case 2 without the LQR process were more elongated, which may cause large interaction force when the door is in the middle. The stiffness ellipses in Case 3, which used a constant CDS with the LQR process demonstrated a more isotropic shape in comparison to those in Case 2. However, due to the use of a constant CDS, although the resulted stiffness matched the task requirements in the beginning of the trajectory, however, it didn't yield a desired stiffness ellipse geometry in the end.
Due to the difference in impedance behaviors in the above three cases, the trajectory tracking and the interacting performance of the robot were different. In our experiment, the interaction force is measured by a built-in external force observer of Franka Emika Panda and it represents the force acting on the end-effector frame, expressed relative to the robot base frame. The tracking error is measured through a built-in forward kinematics of Franka Emika robot. The resulting tracking errors and the interaction forces along the constrained-motion and free-motion directions (as shown in Fig. 4(a), the solid red and yellow arrows attached on the door respectively indicate the constrained-motion and free-motion directions.) are illustrated in Fig. 5(a) and (b), respectively. The door opening task lasted 10 s, starting and ending at 17 s and 27 s respectively. The door was in the middle at 22 s. In the free-motion direction, the tracking errors (top in Fig. 5(a)) in three cases were initialised by the same zero value, thus resulted in the same zero force initial values (bottom in Fig. 5(b)). Case 1 showed less tracking errors and less control input compared with Case 2 and Case 3 along almost the entire path. In the constrained-motion direction in which a compliant behavior was expected, due to the existence of an initial position mismatch and different stiffness settings, Case 2 showed a different initial tracking error (bottom in Fig. 5(a)) and interaction force (top in Fig. 5(b)) behavior compared with Case 1 and Case 3 which were using the same stiffness values at the starting point. During the door moving to the middle, the tracking error in all three cases were decreasing, and it was less in Case 2 and Case 3 due to a larger stiffness than that in Case 1. However, the interaction force in Case 2 was much larger than the other two, which was unnecessary. When the robot moved the door from the middle to the end, the tracking error in Case 1 and Case 2 was increasing since the stiffness in that direction was decreasing. In Case 3 the tracking error didn't show obvious changes because of the near constant stiffness settings, however, it resulted in a larger interaction force due to the inconsistency between stiffness and task geometry in the end.
Due to the fact that the varying CDS encoded by GMM/GMR is not directly related to time but to current end-effector position, the online generated impedance parameters were always synchronized with the task geometry even if external interventions (external forces were applied to confront the door's motion at around 11 s and 27 s) were applied, as shown in Fig. 5(c). In the plots, when an external force was applied and the end-effector stopped moving, GMR stopped updating as well to avoid task stiffness geometry mismatch which would cause large interaction forces in the constrained-motion direction. This highlights the dependency of the GMR on task states and not on task timing, guaranteeing the adaptation of the stiffness profile according to the task requirements.
Although the proposed method outperforms the other two, one may argue that the resulted stiffness profile doesn't perfectly match the task requirements, e.g., in the middle of the task, where the major principle axis of the stiffness ellipse was in the constrained-motion direction. If we consider the problem from the view of a moving task frame attached to the door, a more intuitive and easier solution for the stiffness regulation problem can be a constant task stiffness geometry ellipse in the task frame, whose major and minor principle axes are respectively aligned with the constrained-motion and free-motion directions. However, this approach is rather heuristic based on prior assumptions and knowledge of the task.
The proposed framework aims to address the autonomous impedance regulation problem of robots in a class of constrained manipulation tasks. Door opening task is a typical representation, which shares some crucial requirements with other complex interactive tasks, such as the need for a continuous regulation of the task stiffness geometry to comply with the task/environment constraints. For other "hybrid" interaction tasks such as non-flat surface polishing, which requires force tracking and impedance regulation at the same time, our framework can also be applied with an indirect force control through a Cartesian impedance controller (e.g, see our previous controller in [26]) and the task required stiffness can be learnt and controlled by using our framework. Indeed, the strength of the proposed approach is the ability to learn, replicate, and optimise the interaction behaviour of human arm that can deal with different manipulation tasks.

IV. CONCLUSION
In this work, we proposed a framework for autonomous impedance regulation of robots based on imitation learning and optimal control, which resulted in task-and dynamics-aware impedance planning without an explicit time dependency. The effectiveness of the framework was evaluated by conducting a comparative experiment including three different cases for a door opening task, as an illustrative example of constrained and continuously interactive actions. Results provide evidence on the superior performance of the proposed framework in terms of tracking error and interaction forces, through optimal and adaptive planning of robot interaction parameters.