A control scheme for haptic inspection and partial modification of kinematic behaviors*

Over the last decades, Learning from Demonstration (LfD) has become a widely accepted solution to the problem of robot programming. In LfD, the kinematic behavior is "taught" to the robot based on a set of motion demonstrations performed by the human-teacher. The demonstrations can be captured either via kinesthetic teaching or via external sensors, e.g., a camera. In this work, a controller that provides haptic cues of the robot's kinematic behavior to the human-teacher is proposed. Guidance is provided during kinesthetic coaching, both while inspecting and while partially modifying the encoded motions. The proposed controller is based on an artificial potential field, designed to adjust the intensity of the haptic communication automatically according to the human's intentions. The control scheme is proved to be passive with respect to the robot's velocity and its effectiveness is experimentally evaluated on a KUKA LWR4+ robotic manipulator.


I. INTRODUCTION
Learning from Demonstration (LfD) has recently been proposed as a promising solution to the problem of robot programming, according to which the robot's kinematic behavior is taught based on a set of motions demonstrated by the human-teacher. Learning from demonstration provides robots with flexibility and allows them to exploit the workers' experience, as opposed to classical programming, which is inflexible, cognitively and physically demanding, time-consuming and requires technical skills from the human-teacher. The set of motion demonstrations can be captured either via external sensors [1], e.g., a camera, or via internal sensors, i.e., the robot's proprioception, during physical human-robot interaction, an approach widely known as kinesthetic teaching [2]-[4]. To encode and generalize the demonstrated kinematic behavior, the most popular approaches involve the utilization of Dynamical Systems (DS) with parameters learned to optimally reflect the set of demonstrated motions. The most popular dynamical systems for LfD are the Dynamic Movement Primitives (DMP) [5], [6] and Gaussian Mixture Models (GMMs) [7]. Both involve function approximation methods, most commonly utilizing a weighted sum of Gaussian basis functions, with the weights of those functions being the parameters which encode the kinematic behavior. Dynamical systems generalize motions from different initial states to new targets and allow on-line adaptation and modification.
LfD can be significantly accelerated if feedback on the current knowledge is provided from the robot-learner to the human-teacher [8]-[10]. Many works propose bi-directional communication between the human-teacher and the robot-learner [1], [2], [4], [8]-[15]. Some works utilize graphical interfaces to display the path of the robot [12], [13], while others simulate the robot's kinematic behavior [1], [12], [14], [15].
However, the planar nature of the utilized monitors hinders the user's ability to get a clear understanding of the current knowledge of the robot and to navigate in such an environment. A more effective approach involves the autonomous execution of the learned behavior, while allowing physical intervention for modifications [2], [4], [11], [16]. However, in these cases, the user has to wait for the evolution of the dynamical system, or synchronize with it, and intervene exactly at the instant at which the modification is intended, which can be time-consuming and cognitively demanding. In physical human-robot interaction, the passivity of the control system is needed in order to ensure that the energy produced by the system will in any case be less than the absorbed one, so that the safety of both the system and the operator is guaranteed. In some of the above works, a passivity proof is not given [2], [11]; in others [16], the passivity analysis involves the velocity error and not the velocity of the robot, which is more appropriate, as it corresponds to the energy transferred between the system and the environment.
A common technique for providing haptic feedback to the user involves the utilization of Virtual Fixtures (VF), which were first introduced in tele-robotic manipulation [17], [18] and have later been utilized in surgical [19], micro [20], industrial [21], [22], and even underwater robotic tasks [23], to enhance operator performance in terms of execution time, precision and error rates. Virtual fixtures for guidance are usually enforced either via controllers that utilize artificial potential fields [8], [24], [25], or via controllers which do not store energy [20], [26], [27]. The controllers of the latter category do not provide haptic cues when the robot velocity is zero, which interrupts the communication between the robot and the human.
In our previous work [8], a passive control scheme was proposed which imposes penetrable virtual fixtures around the spatial properties of the kinematic behavior. The human-teacher has the ability to haptically inspect and validate the spatial properties of the already learned kinematic behavior and to modify any segments of it by penetrating the virtual fixture, reducing, in this way, the time required for modifications. However, haptic cues do not communicate temporal properties and they are not provided during the modification. Thus, when a human introduces a modification, he/she is unaware of the already learned behavior and is therefore obliged to re-demonstrate the rest of the behavior. The passivity of the overall scheme of [8] is proved under a restrictive assumption, namely that the nearest pose from the spatial properties of the kinematic behavior is found by an optimization algorithm within one control cycle. Since the optimization procedure dynamics are not included, the proof is valid only for the ideal case.
In this work, a control scheme is proposed with the following properties: a) not only the spatial, but also the temporal properties of the kinematic behavior are haptically communicated to the human-teacher, as opposed to [8], while maintaining passivity, b) the proposed virtual fixture is based on a novel artificial potential field which is designed to provide haptic cues both during the inspection and during the modification of the learned kinematic behavior, and c) the passivity proof includes the optimization procedure dynamics, as opposed to [8]. The main contribution of this work is a control scheme that significantly reduces the effort and time of human coaching as compared to previously published works.

II. PROBLEM DESCRIPTION AND HAPTIC CUES CONCEPT
Consider a robot with an already learned kinematic behavior, utilizing a DS. Our aim is to design a passive control scheme which assists the human-teacher by haptically providing him/her with the information of the spatial and temporal properties of the currently known kinematic behavior. Furthermore, the control scheme should allow the human-teacher to kinesthetically modify any segment of the already learned kinematic behavior and easily return to the already learned segments with the assistance of the proposed haptic cues during the whole procedure. In particular, the control scheme should fulfill the following key objectives:
• The control action applied by the robot's motors should communicate haptically both the spatial and the temporal properties of the learned kinematic behavior to the user and allow him/her to discriminate between the phase of inspection and the phase of modification.
• The control scheme should be passive in terms of the transferred power between the system and the environment, which involves the velocity of the robot and the interaction force.
The basic concept for providing the spatial and temporal properties of the learned kinematic behavior via haptic cues is illustrated in Fig. 1. The control action generates a force pointing from the current end-effector pose towards the future evolution of the DS. It contains a component which points to the nearest pose on the learned path, and is thus dependent on the spatial properties of the behavior similarly to [8], and a component dependent on the temporal properties of the behavior, in particular on its velocity profile; thus higher velocities will yield larger angles with the nearest pose direction. The magnitude of this force depends on whether the user is inspecting the learned behavior or modifying it.
During inspection, i.e., within a preset distance around the learned behavior, the force magnitude is produced by a non-linear spring and is therefore dependent on the deviation. In contrast, during modifications, which are defined as movements beyond the preset distance, the force magnitude is small and independent of the deviation. This signal allows the user to significantly modify segments of the learned trajectory while always being aware of the direction of the encoded kinematics, so that he/she can easily return to the learned segments.

III. PROPOSED CONTROL SCHEME
Let $x = [p^T \; Q^T]^T$ be the generalized pose of the end-effector, with $p \in \mathbb{R}^3$ being the position and $Q \in \mathbb{S}^3$ being the orientation in the form of a unit quaternion. The mapping between the generalized velocity $v = [\dot{p}^T \; \omega^T]^T \in \mathbb{R}^6$ of the end-effector, with $\dot{p}$, $\omega$ being the translational and angular velocities respectively, and $\dot{x}$ is
$$\dot{x} = \begin{bmatrix} I_3 & 0_{3\times 3} \\ 0_{4\times 3} & J_Q(Q) \end{bmatrix} v,$$
with $J_Q(Q) \in \mathbb{R}^{4\times 3}$ being the matrix that maps the angular velocity of the frame to the unit quaternion rates of $Q$; it is built from the scalar part $\eta \in \mathbb{R}$ and the vector part $\epsilon \in \mathbb{R}^3$ of the unit quaternion $Q$ and from $S(\epsilon) \in \mathbb{R}^{3\times 3}$, the skew-symmetric matrix derived from $\epsilon$. Let the kinematic behavior be encoded by a second order Dynamical System (DS), e.g., a DMP, of the general form (4), where $x_d$, $v_d$ are the pose and velocity reference respectively, $z \in \mathbb{R}$ is an auxiliary state variable called the "phase variable", $h_s$, $h_z$ are smooth functions of the state and phase variable respectively, and $w \in \mathbb{R}^N$ are the parameters which are tuned or "learned" in order for the DS to optimally reflect the desired kinematic behavior. Let the term "spatial properties" of the kinematic behavior refer to the path of the evolution of $x_d$ and the term "temporal properties" to the velocity profile.
To achieve the set objectives and realize the basic idea of the haptic cues described above, we propose the control scheme depicted in Fig. 2. The control scheme synthesizes a robot control input at the torque level, given an already learned kinematic behavior encoded by a Dynamical System (DS), and includes active damping. The evolution of the DS is controlled to remain synchronized with the motion of the end-effector driven by the human-teacher via the introduction of a virtual time variable $\sigma \in \mathbb{R}_{\geq 0}$, which acts as a replacement of time; hence the evolution of the DS can be expressed parametrically with respect to $\sigma$, and (4) becomes (5). To achieve synchronization, we utilize a gradient descent optimization algorithm with respect to $\sigma$, as described later in Subsection III-B. In effect, we find the pose on the learned path that is nearest to the current pose. Then we advance $\sigma$ (by one step in a discrete time implementation) to incorporate the temporal properties. As the passivity of the overall system is lost by this advancement, we utilize an energy tank state to collect the energy dissipated by the active damping and provide the required energy back to the system, as shown in Fig. 2. In this way, we can guarantee the passivity of the system, which is one of our objectives. Notice that the energy provided by the energy tank is utilized for the advancement of the evolution of the DS, which differs from the common methods involving energy tanks for interaction with dynamical systems [16], [28], [29]. The latter utilize the stored energy to adjust the impedance. In our case, it is used to communicate the temporal properties of the kinematic behavior, as long as the energy tank is not depleted.
To realize the basic concept of the haptic cues, and in particular the magnitude of the communicated force, we impose via the control signal a novel artificial potential field $U(x, x_d(\sigma))$, which forms a penetrable spherical virtual fixture in position and orientation, of preset radius, around the pose $x_d(\sigma)$ for the advanced $\sigma$ value. In this way, the spherical fixture is one step ahead of the user. The artificial potential field is designed to reflect a relatively high apparent stiffness within the spherical virtual fixture, while outside this fixture it induces forces of small magnitude, independent of the deviation. In fact, when the end-effector is driven out of the spherical virtual fixture, the human's intention for modification is identified and communicated. The re-entrance within the virtual fixture is followed by the synthesis of a new dataset, similarly to [8], and consequently by the re-training of the DS.
The control input is analytically expressed with the following generalized force, which is then mapped to the joint space by the Jacobian transpose: with $D \in \mathbb{R}^{6\times 6}$ a positive definite damping matrix, introducing the active damping.
In the rest of this section, details of the proposed control scheme (Fig.2) are given. First the artificial potential field is presented, followed by the kinematic behavior synchronization which details the finding of the virtual time instance σ depicted in Fig. 1. Last, the training dataset synthesis mechanism is briefly presented for completeness.

A. Artificial potential
The following artificial potential is proposed, combining translation and orientation deviations: with $\|\cdot\|$ denoting the Euclidean norm, $e_p \triangleq p - p_d(\sigma) \in \mathbb{R}^3$ the translation error, $e_o \triangleq J_Q^T(Q_d(\sigma))Q = -J_Q^T(Q)Q_d(\sigma) \in \mathbb{R}^3$ the orientation error, $k_p, k_o \in \mathbb{R}_{>0}$ tunable gains, $r \in \mathbb{R}_{>0}$ and $s = \sin(\theta/2)$, with $\theta \in \mathbb{R}_{>0}$, preset thresholds which determine the radius of the spherical virtual fixtures in position and orientation respectively, and $f(x) : \mathbb{R}_{\geq 0} \to \mathbb{R}_{\geq 0}$ the following $C^1$-smooth function: with $g \in (0, 3]$. Function $f(x)$ is depicted in Fig. 3 for two different values of $g$, namely 0.1 and 0.5. Its derivative is proportional to the force and torque magnitude transmitted to the user. Hence, above the thresholds $r$ and $s$ for translation and orientation respectively, $\partial f(x)/\partial x$ is constant and equal to $g$. Consequently, the magnitude of the haptic cue transmitted to the user will be constant during any performed modification, as described in Section II.
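To make the stated requirements on $f$ concrete (C^1 smoothness, and a constant slope $g$ beyond the threshold), the sketch below builds one function with those properties. The cubic form, the normalization of the argument (deviation divided by $r$ or $s$, so that the threshold sits at 1) and the names `f` and `df` are assumptions made for illustration only; this is not necessarily the function used in this work.

```python
def f(x: float, g: float) -> float:
    """Hypothetical C1 profile on x >= 0, with x = deviation / threshold."""
    if x < 0.0:
        raise ValueError("x must be non-negative")
    if x <= 1.0:
        # Cubic inside the fixture: f(0) = 0, f'(0) = 0, f'(1) = g.
        return x**2 - (2.0 - g) / 3.0 * x**3
    # Linear outside the fixture: constant slope g, continuous at x = 1.
    return (1.0 + g) / 3.0 + g * (x - 1.0)


def df(x: float, g: float) -> float:
    """Derivative of the hypothetical profile (proportional to the cue magnitude)."""
    if x < 0.0:
        raise ValueError("x must be non-negative")
    if x <= 1.0:
        return 2.0 * x - (2.0 - g) * x**2
    return g
```

With this particular choice the slope inside the fixture varies with the deviation (a non-linear spring), while outside it stays at $g$ regardless of how far the end-effector has moved.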
The potential $U(x, x_d)$ has the following properties: , $\forall \lambda \in (0, 1)$. Furthermore, its global minimum is found at $x = 0$ and it does not possess any stationary points other than this. Similarly, $U$ has no stationary points other than its global minimum, which is found at $x = x_d$.
The partial derivative $\partial U/\partial x$ in (6) is given by: where

B. Kinematic behavior synchronization
To incorporate the temporal properties of the kinematic behavior as described in the concept solution, the following continuous update law for $\sigma$ is proposed:
$$\dot{\sigma} = 1 - k_g \frac{\partial U}{\partial \sigma}, \quad (11)$$
with $k_g \in \mathbb{R}_{>0}$ being the gradient descent gain. The update law (11) induces the DS evolution in parallel with the search for the optimum. In particular, the first term of (11) alone would result in the virtual time being equal to the real time. The second term of (11) is a gradient descent update law for $\sigma$, which finds the pose on the 1-dof curve $x_d(\sigma)$ that is nearest to $x(t)$. The combination of these two terms results in the acceleration of the DS evolution when $\dot{\sigma} > 1$ and in its deceleration when $\dot{\sigma} < 1$. To look ahead in time, one has to enforce positive values on $\dot{\sigma}$.
Remark 2: The second term of (11) is the on-line solution to the following optimization problem:
$$\min_{\sigma \in [0, T]} U(x(t), x_d(\sigma)), \quad (12)$$
with $T \in \mathbb{R}_{>0}$ being the total duration of the generated motion.
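The interplay of the two terms of (11) can be illustrated on a toy planar path. The sketch assumes that the update law has the form $\dot{\sigma} = 1 - k_g\,\partial U/\partial\sigma$, as the description of its two terms suggests, together with a position-only quadratic potential $U = \frac{1}{2}\|x - x_d(\sigma)\|^2$, a unit-circle path and explicit Euler integration. These choices and all function names are illustrative assumptions, not the potential or the DS of this work.

```python
import math


def path(sigma):
    """Toy learned path x_d(sigma): unit circle traversed at unit speed."""
    return (math.cos(sigma), math.sin(sigma))


def path_velocity(sigma):
    """Derivative of x_d with respect to sigma."""
    return (-math.sin(sigma), math.cos(sigma))


def dU_dsigma(x, sigma):
    """Gradient of U = 0.5 * ||x - x_d(sigma)||^2 with respect to sigma."""
    xd = path(sigma)
    vd = path_velocity(sigma)
    return -((x[0] - xd[0]) * vd[0] + (x[1] - xd[1]) * vd[1])


def step_sigma(sigma, x, k_g, dt, advance=1.0):
    """One Euler step of sigma_dot = advance - k_g * dU/dsigma."""
    sigma_dot = advance - k_g * dU_dsigma(x, sigma)
    return sigma + dt * sigma_dot


def settle(x, sigma0, k_g=50.0, dt=1e-3, steps=5000, advance=1.0):
    """Iterate the update for a stationary end-effector position x."""
    sigma = sigma0
    for _ in range(steps):
        sigma = step_sigma(sigma, x, k_g, dt, advance)
    return sigma
```

With `advance=0.0` the iteration is a plain gradient descent and, for a stationary `x`, settles at a local minimiser of $U$ over $\sigma$. With `advance=1.0` it settles slightly ahead of that point along the path, which is the look-ahead effect described above.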

C. Final control synthesis and passivity proof
Let the robot's task-space model with gravity compensation be:
$$\Lambda_x(x)\dot{v} + C_x(x, v)v = u + F_x, \quad (13)$$
where $\Lambda_x(x), C_x(x, v) \in \mathbb{R}^{6\times 6}$ are the task-space inertia and Coriolis matrices respectively, for which it holds that $a^T(\dot{\Lambda}_x - 2C_x)a = 0$, $\forall a \in \mathbb{R}^6$, $F_x \in \mathbb{R}^6$ is the interaction force applied by the human-teacher and $u \in \mathbb{R}^6$ is the control input, i.e., the force/torque control action (6). It is easy to show that, by utilizing the update law (11), the closed loop system consisting of (13), (6) and (11) is not passive with respect to the velocity of the end-effector $v$, under the exertion of the external force $F_x$, since the first term of (11) generates energy. To resolve this problem, we utilize an energy tank state $L(t) \in \mathbb{R}_{\geq 0}$, which stores a part of the energy dissipated by the active damping term. To exploit this stored energy, we replace the first term of (11) with a smooth switch function of $L$, depicted in Fig. 4 with a black solid line; namely $h(y) : \mathbb{R} \to [0, 1]$, which equals 1 if $y > \delta$ and equals 0 if $y \leq \delta$, with $\delta \in \mathbb{R}_{>0}$ being a very small preset parameter. This allows the temporal properties to be communicated as long as the energy tank is not empty. Further, we impose the forward evolution of the DS from the current state by keeping $\dot{\sigma}$ non-negative via a smooth ramp function $h(y)y$, depicted in Fig. 4 with a red dashed line. We assume that when the user drives the robot in the direction opposite to $v_d$, he/she has the intention to modify and not to inspect backwards. Thus, the final proposed update law for $\sigma$ becomes:
$$\dot{\sigma} = h(y)\,y, \quad (14)$$
with
$$y = h(L) - k_g \frac{\partial U}{\partial \sigma}. \quad (15)$$
The dynamics of the energy tank level $L$ are:
$$\dot{L} = a\,v^T D v - h(L)\,h(y)\,\frac{\partial U}{\partial \sigma}, \quad (16)$$
where $a \in (0, 1)$ is the percentage of the dissipated energy rate which is stored in $L$. The energy tank level $L$ will always be non-negative, since $L = 0$ implies $\dot{L} \geq 0$, while $\dot{L} > 0$ and $\dot{L} < 0$ mean that energy is stored and released respectively. Equation (16) implies that energy is released only when the energy tank is not empty ($h(L) > 0$) and $\dot{\sigma} > 0$.
This is the case when the user modifies the trajectory in the direction opposite to $v_d$, and hence the evolution of $\sigma$ should stop and energy should be released so that the system remains passive.
Theorem 1: The closed loop system consisting of (13), (6), (16) and (14) is strictly output passive with respect to the velocity output $v$, under the exertion of the interaction force $F_x$.
Proof: The proof is given in the Appendix.
Hence, the proposed control scheme consists of the control law (9), the update law (14) and the energy tank dynamics (16). Algorithm 1 describes its discrete time implementation, where $T_c \in \mathbb{R}_{>0}$ is the control cycle.

Algorithm 1
4: Compute $u$ from (9), given $x_k$, $v_k$, $x_{d,k}$
5: Send $u$ as a force/torque command to the robot
6: Compute $\partial U/\partial \sigma$, given $x_k$, $x_{d,k}$, $v_{d,k}$
7: Compute $\dot{\sigma}$ from (14), given $\partial U/\partial \sigma$, $L_k$
8: Integrate DS (5)
10: Compute $\dot{L}$ from (16), given $L_k$, $v_k$, $\partial U/\partial \sigma$
11: $L_{k+1} := L_k + T_c \dot{L}$ (update $L$)
12: end while

Remark 3: In order to make the calculation of $y$ appearing in the update law (14) independent of the magnitude of $v_d$, one can select the following varying gain in the gradient descent law: $k_g(\sigma) = \frac{\kappa}{\|v_d(\sigma)\| + \varepsilon}$, with $\kappa \in \mathbb{R}_{>0}$ being a positive constant and $\varepsilon \in \mathbb{R}_{>0}$ a relatively small value. Notice that such a choice does not affect the passivity proof.
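The $\sigma$ and $L$ updates of one control cycle can be sketched as follows. The sketch assumes the forms $\dot{\sigma} = h(y)y$ with $y = h(L) - k_g\,\partial U/\partial\sigma$, and $\dot{L} = a\,v^T D v - h(L)h(y)\,\partial U/\partial\sigma$, which follow the description in this subsection and the terms reported in Section IV. The smoothstep realization of $h$ (rising between 0 and $\delta$), the isotropic damping $D = d\,I$, the clamping of $L$ at zero and all function names are assumptions for illustration, not the authors' implementation.

```python
def h(y, delta=1e-10):
    """Smooth switch (assumed shape): 0 for y <= 0, 1 for y >= delta."""
    if y <= 0.0:
        return 0.0
    if y >= delta:
        return 1.0
    t = y / delta
    return t * t * (3.0 - 2.0 * t)  # smoothstep between 0 and delta


def control_cycle(sigma, L, v, dU_dsigma, k_g, T_c=1e-3, a=0.9, d=0.2):
    """One explicit Euler step of the virtual time sigma and tank level L.

    v is the end-effector velocity (sequence of floats), dU_dsigma the
    gradient of the potential with respect to sigma, and D = d * I an
    assumed isotropic damping matrix.
    """
    y = h(L) - k_g * dU_dsigma                 # tank switch replaces the constant term
    sigma_dot = h(y) * y                       # ramp keeps sigma_dot non-negative
    dissipated = d * sum(vi * vi for vi in v)  # v^T D v
    L_dot = a * dissipated - h(L) * h(y) * dU_dsigma
    sigma_next = sigma + T_c * sigma_dot
    L_next = max(0.0, L + T_c * L_dot)         # clamp guards against Euler overshoot
    return sigma_next, L_next
```

In this sketch an empty tank with zero gradient freezes $\sigma$, a charged tank advances it at unit rate, and a large positive gradient stops the advancement while the damping term keeps refilling the tank.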

D. Training dataset synthesis
Similarly to [8], the intention of the human-teacher for modification is identified when the thresholds $r$, $s$ are exceeded. Towards this direction, let us define the following region: $\Omega(\sigma) \triangleq \{x \in \mathcal{T} : \psi(x, \sigma) \leq 1\}$, with $\psi(x, \sigma) \triangleq \max\left\{\frac{\|e_p(p, \sigma)\|}{r}, \frac{\|e_o(Q, \sigma)\|}{s}\right\}$. Based on region $\Omega$, the old segment of the previous training dataset between $\sigma_o = \sigma(t_o)$ and $\sigma_i = \sigma(t_i)$ is replaced with the newly demonstrated segment of the modification between $t_o$ and $t_i$, with $t_o$ being the actual instant of exceeding the thresholds $r$, $s$ and $t_i$ being the actual instant of returning within $\Omega$. If no intention is identified within a single iteration of the kinesthetic teaching, i.e., if $x(t) \in \Omega$, $\forall t : \sigma(t) < T$, the kinesthetic teaching has finished and the system is ready to autonomously execute the task.
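The membership test for $\Omega$ and the detection of the exit and re-entry samples can be sketched as below. The sketch assumes that the error vectors $e_p$ and $e_o$ are already available as sequences of floats and that $\psi$ compares their Euclidean norms against $r$ and $s = \sin(\theta/2)$. The function names and the sample-index interface are hypothetical.

```python
import math


def norm(vec):
    """Euclidean norm of a sequence of floats."""
    return math.sqrt(sum(c * c for c in vec))


def psi(e_p, e_o, r, theta):
    """Normalized deviation: max of position and orientation error ratios."""
    s = math.sin(theta / 2.0)
    return max(norm(e_p) / r, norm(e_o) / s)


def inside_region(e_p, e_o, r, theta):
    """True when the pose lies within the virtual fixture (psi <= 1)."""
    return psi(e_p, e_o, r, theta) <= 1.0


def modification_window(psi_values):
    """Return (k_o, k_i): the first sample outside the region and the first
    later sample back inside it, or None if no complete window exists."""
    k_o = None
    for k, value in enumerate(psi_values):
        if k_o is None and value > 1.0:
            k_o = k
        elif k_o is not None and value <= 1.0:
            return (k_o, k)
    return None
```

The indices returned by `modification_window` play the role of $t_o$ and $t_i$ on a sampled recording; a return value of `None` corresponds to an iteration in which no complete modification was recorded.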

IV. EXPERIMENTAL EVALUATION
Experiments are conducted utilizing a KUKA LWR4+ robotic manipulator, with a control cycle of $T_c = 1$ ms. The proposed controller is compared to the control scheme proposed in [8], i.e., $g = 0$ and not utilizing the energy tank in the update law (14), i.e., $L = 0$, $\dot{L} = 0$ in (14). The parameters of the proposed control scheme are set as follows: $\kappa = 900$ (reflecting the convergence rate of the gradient descent term), $k_p = 0.2$, $k_o = 0.4$ (reflecting the VF signal intensity in position and orientation respectively), $g = 0.4$ (reflecting the magnitude of the force felt by the user outside the VF), $r = 0.02$ m, $\theta = 10^\circ$ (reflecting the radius of the VF in translation and orientation respectively), $D = 0.2 I_6$, $\varepsilon = 0.05$, $\delta = 10^{-10}$ (relatively small values), $a = 0.9$ (being the percentage of the dissipated energy stored in the energy tank). For the encoding of the kinematic behavior, a DMP is utilized with a total of 400 kernels in each axis.
The experimental scenario emulates the task of applying liquid material on the upper edges of the cylinders of an internal combustion engine. The robot has already been taught how to apply material on two of the engine's cylinders, as shown in Fig. 5a, but a variant of the task requires the modification of the behavior, as shown in Fig. 5b, which involves the application of material on the edge of an additional cylinder of the engine. Notice that the additional cyclic edge is between the two others and has a different orientation. In [8], the results are compared with the case of using a gravity compensated robot agnostic of the previously learned behavior, revealing the reduction of the duration of teaching and thus of the user's cognitive load. In this experimental evaluation, our aim is to reveal the extra advantages of the proposed control scheme as compared to [8] in terms of the time required and the physical load reduction.
The path of the end-effector is depicted in Fig. 6, utilizing the proposed controller and the controller proposed in [8]. Notice that, due to the lack of the term $h(L)$ in [8], the solution is trapped in a local minimum and hence the user is unable to return to the tube of the virtual fixtures, as opposed to the proposed control scheme, which is able to overcome this local minimum by utilizing the stored energy. With the controller of [8], the user identifies that the end-effector did not re-enter the virtual sphere defined by $r$, $\theta$ and demonstrates the whole segment until the end of the motion. As a result, the time required for this procedure was approximately 21 s for [8] and 30% less for the proposed controller.
In Fig. 7, the two terms of the right-hand side of (16) are shown separately to demonstrate the necessity and the functionality of the energy tank, for two different time windows, namely after the penetration of the virtual fixture and after the re-entrance within it. Notice that during the whole procedure the energy tank is never depleted, since the collected energy rate $\dot{L}_{in} = a\,v^T D v$ is generally larger than the provided energy rate $\dot{L}_{out} = -h(L)h(y)\frac{\partial U}{\partial \sigma}$ in this experiment, as also indicated in Fig. 7. However, during the time windows depicted in Fig. 7, the energy tank had to provide energy to the system in order to maintain passivity, which is visible in Fig. 7 from the fact that $\dot{L}_{out}$ also takes negative values.
The evolution of $x = [p^T \; Q^T]^T$ in time is shown in Fig. 8, utilizing the proposed control scheme. For comparison purposes, the synthesized training dataset is also depicted, which corresponds to the demonstration that the user would have to execute with a system agnostic of the previously learned behavior. Notice that with the proposed controller, the user is able to inspect and validate the segments of the kinematic behavior before and after the modification (light grey boxes) faster than in the training dataset, while it takes the same time to demonstrate the modified segment, i.e., the dark grey boxes have the same length. Notice that, due to this fact, the time needed for the whole modification procedure is significantly reduced, as compared to the re-demonstration of the whole behavior.
The evaluation in terms of the physical load required by the human-teacher is done using a validation demonstration along the modified trajectory, i.e., $x(t) \in \Omega$, $\forall t$, utilizing the proposed controller and the controller proposed in [8]. The total energy transferred is utilized as a representative metric of the physical load required by the user, which is calculated as $E = \int_0^{T_d} v^T F_x \, dt$, with $T_d$ being the total duration of the demonstration and $F_x$ the external force estimation provided by the robot. The control scheme of [8] required a total energy of 4.861 J to be transferred from the user to the robot, while the proposed control scheme required approximately 17% less, as expected, due to the enhancements of the proposed method.
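For sampled data, the energy metric $E = \int_0^{T_d} v^T F_x\,dt$ can be approximated numerically. The sketch below uses the trapezoidal rule over uniformly sampled velocity and force estimates; the uniform sampling period, the trapezoidal rule and the function name `transferred_energy` are assumptions, since the integration method used in the experiments is not stated.

```python
def transferred_energy(velocities, forces, T_c):
    """Trapezoidal approximation of E = integral of v^T F_x dt.

    velocities and forces are equally long sequences of 6-D samples,
    taken every T_c seconds.
    """
    if len(velocities) != len(forces):
        raise ValueError("velocities and forces must have the same length")
    # Instantaneous power v^T F_x at every sample.
    powers = [
        sum(vi * fi for vi, fi in zip(v, F))
        for v, F in zip(velocities, forces)
    ]
    # Sum the trapezoid areas between consecutive samples.
    return sum(0.5 * (p0 + p1) * T_c for p0, p1 in zip(powers, powers[1:]))
```

For example, a constant 0.1 m/s motion against a constant 2 N force along the same axis for 1 s yields approximately 0.2 J.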

V. CONCLUSIONS
In this work, a controller is proposed that provides the human-teacher with haptic cues of both the spatial and the temporal properties of the robot's kinematic behavior, and which enables the inspection and partial modification of kinematic behaviors outside the VF. The proposed controller is proved to be passive and it is experimentally validated and evaluated. The experimental comparison of the proposed work with the previous one [8] shows its superiority: both the energy that the human-teacher must provide to the system and the time required for the whole modification procedure are significantly reduced. One limitation of the proposed method is that it does not address cases of singular configurations, since the haptic rendering is inaccurate in these cases. Future work will involve a subjective evaluation by users with different backgrounds and expertise.