A Shared-Autonomy Approach to Goal Detection and Navigation Control of Mobile Collaborative Robots

Abstract: Autonomous goal detection and navigation control of mobile robots in remote environments can help to unload human operators from simple, monotonous tasks, allowing them to focus on more cognitively stimulating actions. This can result in better task performance, while creating user interfaces that are understandable by non-experts. However, full autonomy in unpredictable and dynamically changing environments is still far from becoming a reality. Thus, teleoperated systems integrating the supervisory role and instantaneous decision-making capacity of humans are still required for fast and reliable robotic operations. This work presents a novel shared-autonomy framework for goal detection and navigation control of mobile manipulators. The controller exploits human-gaze information to estimate the desired goal. This estimate is used together with control-pad data to predict the user's intention and to activate the autonomous control for executing a target task. Using the control-pad device, a user can react to unexpected disturbances and halt the autonomous mode at any time. When the user releases the control-pad device (e.g., after avoiding a sudden obstacle), the controller smoothly switches back to the autonomous mode and navigates the robot towards the target. Experiments for reaching a target goal in the presence of unknown obstacles are carried out with seven subjects to evaluate the performance of the proposed shared-autonomy framework. The results demonstrate the accuracy, time-efficiency, and ease-of-use of the presented shared-autonomy control framework.


I. INTRODUCTION
In the mid-1940s, Ray Goertz introduced a remotely controlled manipulator [1] that led to one of the most promising fields in robotics, called teleoperation. Since then, several application domains have benefited from this concept, such as medical surgery [2], space exploration [3], and disaster response [4]. The stability and transparency of such teleoperation systems [5] have undergone significant progress in an attempt to ensure safety and interaction efficiency. However, the intuitiveness-of-use of the teleoperation interfaces, another equally important aspect, has received less attention [6]. This is of particular importance when users must operate multi-degree-of-freedom follower systems with dexterous loco-manipulation (simultaneous locomotion and manipulation) potential, such as mobile manipulators or humanoids [7]. In fact, in most of the related frameworks (see, e.g., [8]), a user, while watching a graphical user interface (GUI), constantly generates the desired motions for the follower robot to accomplish the desired task. Nevertheless, full teleoperation of such devices may be perceived as tedious during prolonged or repetitive tasks, e.g., for long-distance target reaching operations. A practical solution could be the use of fully autonomous systems that can navigate through selected indoor [9] and outdoor [10] coordinates. However, these systems lack the flexibility, reactivity, and supervisory skills of human operators, which can be crucial, e.g., when avoiding unexpected obstacles. Therefore, the combination of autonomous and corrective control inputs with human supervisory commands arises as an effective solution to improve the user experience during the performance of remote robotic tasks. These combined systems can be categorized as shared-autonomy teleoperation [11], [12], [13].
Fig. 1. Block diagram of the proposed framework, which consolidates the advantages of both the motion control interface and the eye-gaze tracking system. The mode detection block decides which mode (teleoperation or autonomous) should be active at each moment based on the signal S received from the target detection block. As a result, the proper twist commands are sent to the robot, allowing the user to change mode at any time.
A key factor in shared-autonomy control is the choice of the appropriate human input sensory data to be used for intention detection. In [14], the authors utilized hand sketches on a screen to generate a desired path for a mobile robot to follow while avoiding obstacles. The user has to look at the screen and sketch the path simultaneously, which requires continuous attention and conscious effort. To limit this effort, other approaches propose to use more intuitive data such as gaze tracking [15].
Gaze estimation poses several challenges [16], such as eye-tracker inaccuracies, rapid eye movements, and eye-blinking. Hence, while this method is not recommended for use on its own, it is well suited to be employed together with a motion-intention interface. In [15], the authors developed a shared-autonomy system in which pick-and-place tasks were remotely performed by utilizing gaze and joystick data. Moreover, a predictive system based on gaze information and body motion sensors is proposed in [17] and [18], which improves the maneuverability of leader-follower systems in performing reach and grasp tasks with a fixed-base manipulator. Although these solutions have made remote grasping tasks more intuitive and faster, the proposed frameworks cannot be applied to mobile robot navigation due to the different nature of the motions.
Consequently, the objective of this work is to introduce a novel shared-autonomy framework for remote navigation control of mobile platforms. This study proposes a hybrid interface that merges information from a control-pad device and human eye-gaze to calculate the goal that the user intends the mobile robot to reach. Once a goal is detected, the robot's autonomous navigation controller brings the platform to the target. Still, during this autonomous motion, the framework keeps a human-in-control approach, in which the user is given full control over the robot's motion through the control-pad device. Our proposed method unloads users from the low-level control of the robot, allowing them to focus on higher-level decision-making commands. For example, in a logistics environment, the user would be able to command the final location of the robot without needing to manage its motion during the full trajectory.
The performance and intuitiveness-of-use of the shared-autonomy navigation interface are evaluated on 7 users and compared to the full teleoperation mode, where no autonomous goal detection and navigation are exploited.

II. METHODOLOGY
The schematic of the proposed framework is displayed in Fig. 1, which incorporates the advantages of a motion control interface and an eye-gaze tracking system. The mode detection block decides which mode (teleoperation or autonomous) should be active at each moment based on the signal S received from the target detection block. The target detection block includes the gaze estimation (Section II-A) and the control-pad interface.
As an overview, once a target is established by the gaze tracker system and the user starts to navigate the robot towards it, a message is shown in the GUI window prompting the user to release the control-pad device. When the user does so, the mode detection block resets the corresponding mode flag and the autonomous controller sends the required twist commands $V_a = [v_a, \omega_a]$ to the mobile robot (the twist command $V$ is composed of the linear velocity $v$ and the angular velocity $\omega$, see Fig. 2). If the operator starts using the control-pad device, either at the beginning or in the middle of the autonomous control, the mode detection block triggers the mode flag. This activates the teleoperation mode, in which the mobile robot is controlled by the user through the desired twist commands $V_t = [v_t, \omega_t]$. Within the proposed method, teleoperation is most useful in case of failures due to sudden and unpredictable environment changes, as it gives the user full control over the robot's motion. The user continuously receives visual feedback from an RGB camera sensor.
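The switching behaviour described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Twist` container, the `deadband` value used to decide whether the control-pad is in use, and the `"idle"` fallback when no target exists are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Twist:
    v: float      # linear velocity
    omega: float  # angular velocity


def pad_in_use(pad: Twist, deadband: float = 0.05) -> bool:
    """True when the control-pad command exceeds a (hypothetical) deadband."""
    return abs(pad.v) > deadband or abs(pad.omega) > deadband


def select_command(pad: Twist, auto: Twist, target_set: bool) -> Tuple[str, Twist]:
    """Return the active mode and the twist forwarded to the robot."""
    if pad_in_use(pad):
        # Any pad input triggers the mode flag: the user is in control (V_t).
        return "teleoperation", pad
    if target_set:
        # Pad released and a target is available: the autonomous twist V_a is sent.
        return "autonomous", auto
    # Assumption: with no pad input and no target, the robot stands still.
    return "idle", Twist(0.0, 0.0)


if __name__ == "__main__":
    v_a = Twist(0.2, 0.1)
    print(select_command(Twist(0.0, 0.0), v_a, target_set=True)[0])  # autonomous
    print(select_command(Twist(0.3, 0.0), v_a, target_set=True)[0])  # teleoperation
```

Giving the pad unconditional priority mirrors the human-in-control idea: the autonomous twist is only forwarded while the user is not commanding the robot.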
The different components of the shared-autonomy framework are explained in detail hereafter.

A. Target detection
To detect the desired target for autonomous navigation, the gaze tracker information is enhanced with the inputs coming from the control-pad and the robot's actual heading angle, the latter being updated by the odometry sensors.
Starting with the gaze tracking system, this measures the gaze coordinates in pixels with respect to (w.r.t.) the top-left corner of the computer monitor, i.e., frame $\{T\}$ (Fig. 2-a), as $p_T = [u_T, v_T]^T$, where $u_T$ and $v_T$ are the horizontal and vertical pixel coordinates in $\{T\}$, respectively. Since the origin of the GUI's coordinates is considered to be located at frame $\{S\}$ (Fig. 2-a), $p_T$ needs to be first transformed to this frame by making use of a map function, $p_S = f_{TS}(p_T)$, which depends on $W_{gui}$ and $H_{gui}$, the pixel-based width and height of the GUI window, respectively. Next, another function $f_{SC}(\cdot): \mathbb{R}^2 \to \mathbb{R}^3$ is utilized to express $p_S$ in camera coordinates $\{C\}$ (Fig. 2-b). In this work, a predefined discrete mapping is employed to convert pixel coordinates into camera coordinates. Thus, a set of specific coordinates (with $n$ points) in pixel space ($S_{uv} = \{s_{uv_i} \mid i \in \{1, \cdots, n\}\}$) and its representative set of points in the camera coordinates in the Cartesian space ($S_{xyz} = \{p_{C_i} \mid i \in \{1, \cdots, n\}\}$) are considered to be known beforehand. $p_C$ is expressed in the robot's base frame, $\{R\}$ (Fig. 2-b), by employing the constant transformation matrix between the camera and the robot's base frame, ${}^{R}T_{C}$, as

$$[p_R^T, 1]^T = {}^{R}T_{C}\,[p_C^T, 1]^T.$$

Based on the nearest neighbor search (NNS) algorithm [19], at each iteration, the measured gaze information in $\{S\}$, $p_S$, is compared with the stored data in $S_{uv}$ during the gaze fixation time $T_{fix}$ (the time period in which the eye is kept aligned around a certain point on the screen, which depends on the person who carries out the experiment and usually ranges from 2 to 4 seconds [20]). If the shortest Euclidean distance between $p_S$ and a particular $s_{uv_i} \in S_{uv}$ is lower than a selected threshold ($d_{tsh}$) during $T_{fix}$, $s_{uv_i}$ is chosen as the intended pixel-space goal. In the camera's coordinates, $s_{uv_i}$ is expressed as $p_{C_i} \in S_{xyz}$, which is then converted to $p_{R_i}$ by employing ${}^{R}T_{C}$. Thus, the fixed-gaze direction is obtained by:

$$\phi_{fix} = \tan^{-1}\left(p_{R_i}^{y},\, p_{R_i}^{x}\right).$$
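The target-detection chain described above (nearest stored pixel, fixation check, frame transformation, fixed-gaze direction) can be sketched as follows. This is only an illustration under stated assumptions: the brute-force search stands in for the NNS algorithm of [19], gaze samples are assumed to arrive as `(time, pixel)` pairs, and all function names and the example transformation matrix are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Pixel = Tuple[float, float]
Point3 = Tuple[float, float, float]


def nearest(p_s: Pixel, s_uv: Sequence[Pixel]) -> Tuple[int, float]:
    """Brute-force nearest neighbour: index of and distance to the closest stored pixel."""
    dists = [math.dist(p_s, s) for s in s_uv]
    i = min(range(len(dists)), key=dists.__getitem__)
    return i, dists[i]


def detect_goal(samples: Sequence[Tuple[float, Pixel]], s_uv: Sequence[Pixel],
                d_tsh: float, t_fix: float) -> Optional[int]:
    """Return index i once the gaze stays within d_tsh of s_uv[i] for t_fix seconds."""
    current, start = None, 0.0
    for t, p_s in samples:
        i, d = nearest(p_s, s_uv)
        if d >= d_tsh:
            current = None          # gaze left every candidate: reset the fixation
            continue
        if i != current:
            current, start = i, t   # fixation on a new candidate begins
        if t - start >= t_fix:
            return i
    return None


def to_robot_frame(p_c: Point3, T_rc: Sequence[Sequence[float]]) -> Point3:
    """Apply the 4x4 homogeneous transform from camera frame {C} to robot frame {R}."""
    h = (p_c[0], p_c[1], p_c[2], 1.0)
    x, y, z = (sum(T_rc[r][c] * h[c] for c in range(4)) for r in range(3))
    return (x, y, z)


def fixed_gaze_direction(p_r: Point3) -> float:
    """Two-argument arctangent of the goal's y and x coordinates in {R}."""
    return math.atan2(p_r[1], p_r[0])


if __name__ == "__main__":
    s_uv = [(100.0, 100.0), (500.0, 300.0)]                  # hypothetical stored pixels
    s_xyz = [(0.5, 0.0, 1.0), (-1.0, 0.0, 1.8)]              # hypothetical camera points
    gaze = [(0.0, (498.0, 302.0)), (1.0, (503.0, 299.0)),
            (2.0, (500.0, 301.0)), (3.0, (499.0, 300.0))]
    # Hypothetical camera mounting: camera z points along robot x, camera x along robot -y.
    T_rc = [[0, 0, 1, 0.2], [-1, 0, 0, 0.0], [0, -1, 0, 0.5], [0, 0, 0, 1]]
    i = detect_goal(gaze, s_uv, d_tsh=20.0, t_fix=3.0)
    if i is not None:
        print(i, fixed_gaze_direction(to_robot_frame(s_xyz[i], T_rc)))
```

For a small, fixed set of candidate points a linear scan is sufficient; a spatial index would only matter for much larger lookup tables.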
This direction can be updated at the user's will: the user is allowed to restart the operation to modify the estimated location.
For calculating the intended direction at instance $k$, first, $\phi_{fix}$ is updated to account for the robot's movements during the time interval $t_0 - t_1$, by making use of the robot's odometry data ($x$-$y$ position of the robot) and the vector addition property. Second, the average value of the heading angle measured by the odometry sensors over the past $N$ samples, $\bar{\theta}(k)$, is calculated. If $|\bar{\theta}(k) - \phi_{fix}| < \Psi_{tsh}$, the user is informed through the GUI to release the control-pad to activate the autonomous controller, setting $\bar{\theta}(k)$ as the intended direction.
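The heading-agreement test can be sketched as below. The sketch makes two assumptions that are not taken from the paper: the average over the past $N$ samples is computed as a circular mean (to avoid wrap-around problems near $\pm\pi$), and the check is only evaluated once $N$ samples are available. The class and method names are hypothetical.

```python
import math
from collections import deque


def wrap(angle: float) -> float:
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


class IntentionChecker:
    """Compares the averaged odometry heading with the fixed-gaze direction."""

    def __init__(self, n_samples: int, psi_tsh: float) -> None:
        self.headings = deque(maxlen=n_samples)  # sliding window of N headings
        self.psi_tsh = psi_tsh                   # angular threshold Psi_tsh [rad]

    def mean_heading(self) -> float:
        # Circular mean (an assumption): average the unit vectors, not the raw angles.
        s = sum(math.sin(h) for h in self.headings)
        c = sum(math.cos(h) for h in self.headings)
        return math.atan2(s, c)

    def update(self, theta: float, phi_fix: float) -> bool:
        """Add a heading sample; True means the GUI may prompt a pad release."""
        self.headings.append(theta)
        if len(self.headings) < self.headings.maxlen:
            return False
        return abs(wrap(self.mean_heading() - phi_fix)) < self.psi_tsh


if __name__ == "__main__":
    checker = IntentionChecker(n_samples=3, psi_tsh=0.1)
    for theta in (0.50, 0.52, 0.48):
        ready = checker.update(theta, phi_fix=0.5)
    print(ready)  # True
```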

B. Autonomous controller
The autonomous controller is designed based on the bicycle model for differential-drive mobile robots [21]. According to this model, the platform motion is described as:

$$\dot{x} = v\cos\theta, \quad \dot{y} = v\sin\theta, \quad \dot{\theta} = \omega,$$

where $v$ is the linear velocity and $\omega$ is the angular velocity. $X = [x, y]^T$ and $\theta$ are the current position and orientation of the platform, respectively, with $p = [x, y, \theta]^T$ being the current pose of the platform. The controller is used during two phases: path-following and pose-correction. The former is in charge of navigating the robot along the generated path (which starts at $t_1$ and ends in the target pose at $t_4$), while the latter brings the robot to its goal pose ($p_g = [x_g, y_g, \theta_g]^T$). For safety reasons and to avoid collisions, this pose is located at a distance $d_{safe}$ before the real goal.
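As a rough illustration of the kinematic model, the sketch below integrates the planar motion with a forward-Euler step, assuming the common form in which the heading rate equals the commanded angular velocity. It also shows one possible way to place the goal pose a distance `d_safe` before the real goal; measuring that offset along the approach heading is an assumption, and both function names are hypothetical.

```python
import math
from typing import Tuple

Pose = Tuple[float, float, float]  # (x, y, theta)


def step(pose: Pose, v: float, omega: float, dt: float) -> Pose:
    """One forward-Euler step of x' = v cos(theta), y' = v sin(theta), theta' = omega."""
    x, y, theta = pose
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)


def safe_goal_pose(goal_xy: Tuple[float, float], approach_heading: float,
                   d_safe: float) -> Pose:
    """Goal pose placed d_safe before the real goal, along the approach heading (assumed)."""
    gx, gy = goal_xy
    return (gx - d_safe * math.cos(approach_heading),
            gy - d_safe * math.sin(approach_heading),
            approach_heading)


if __name__ == "__main__":
    pose = (0.0, 0.0, 0.0)
    for _ in range(10):                      # drive straight for 1 s at 0.5 m/s
        pose = step(pose, v=0.5, omega=0.0, dt=0.1)
    print(pose)
    print(safe_goal_pose((2.0, 0.0), approach_heading=0.0, d_safe=0.5))
```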
To assign desired initial and final values for position, velocity, and acceleration, a 5th-order trajectory ($X_d = [x_d, y_d]^T$) is generated for the path-following phase, i.e., from the instant the user releases the control-pad to the moment the robot reaches its target position (time interval $t_1 - t_4$ in Fig. 2-b). The robot's positions at the release and target moments are defined as $X_r = [x_r, y_r]^T$ and $X_t = [x_t, y_t]^T$, respectively. As a result, $X_d$ is given by:

$$X_d(t) = X_r + \Delta X\,(10\tau^3 - 15\tau^4 + 6\tau^5), \quad (5)$$

where $\Delta X = X_t - X_r$, $\dot{X}_r = \dot{X}_t = 0$, $\ddot{X}_r = \ddot{X}_t = 0$, and $\tau = t/T$ with a total trajectory time $T$ (time interval $t_1$ to $t_4$ in Fig. 2-b), and $v$ is chosen based on the maximum linear velocity $v_{max}$ allowed for the mobile platform ($v = \alpha v_{max}$, with $\alpha$ being the percentage of the maximum velocity). After the device is released, the robot autonomously navigates, on a straight line, towards the target pose following the path-following control laws (6) of [21], in which $d$ is a distance behind the pursuit point, and $e$ and $\theta$ are defined as in [21].

During this phase, the user-in-control approach allows the operator to take full control over the platform. There are several cases where this can be needed, e.g., when obstacles are not detectable (glass objects), when the task goal changes, or when sensors fail. When the user takes control of the robot by using the control-pad (time $t_2$ in Fig. 2-b), the teleoperation mode is activated by triggering the mode variable and, consequently, the autonomous navigation is put on hold. After bypassing the obstacle at time $t_3$, the user activates the autonomous mode simply by releasing the control-pad. Consequently, the robot resumes the autonomous navigation, generating a path (5) towards the previously detected target pose ($p_t$). The path-following controller (6) then navigates the robot until it reaches the goal at time $t_4$ (Fig. 2-b).
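A minimal sketch of the path-following phase is given below. The quintic blend is the unique 5th-order polynomial with zero velocity and acceleration at both ends, which matches the boundary conditions stated above. The `pursuit_twist` function is only an assumed, proportional pure-pursuit-style law with hypothetical gains `k_v` and `k_h`; it is not claimed to be the controller used in the paper.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def quintic_blend(tau: float) -> float:
    """s(tau) = 10 tau^3 - 15 tau^4 + 6 tau^5, with tau clamped to [0, 1]."""
    tau = min(max(tau, 0.0), 1.0)
    return 10 * tau**3 - 15 * tau**4 + 6 * tau**5


def desired_position(x_r: Vec2, x_t: Vec2, t: float, T: float) -> Vec2:
    """Point on the straight line from release position x_r to target position x_t."""
    s = quintic_blend(t / T)
    return (x_r[0] + (x_t[0] - x_r[0]) * s,
            x_r[1] + (x_t[1] - x_r[1]) * s)


def pursuit_twist(pose: Tuple[float, float, float], target: Vec2,
                  d: float, k_v: float, k_h: float) -> Tuple[float, float]:
    """Assumed proportional pursuit law: keep distance d behind the pursuit point."""
    x, y, theta = pose
    e = math.hypot(target[0] - x, target[1] - y) - d           # distance error
    heading_ref = math.atan2(target[1] - y, target[0] - x)     # bearing to the point
    heading_err = math.atan2(math.sin(heading_ref - theta),
                             math.cos(heading_ref - theta))    # wrapped to [-pi, pi]
    return k_v * e, k_h * heading_err


if __name__ == "__main__":
    print(desired_position((0.0, 0.0), (2.0, 4.0), t=5.0, T=10.0))  # (1.0, 2.0)
    print(pursuit_twist((0.0, 0.0, 0.0), (2.0, 0.0), d=0.5, k_v=1.0, k_h=2.0))
```

Because the blend starts and ends with zero slope, regenerating the path after a teleoperation interruption gives a smooth restart from the new release position.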
After reaching the target pose, the move-to-pose controller brings the robot to the goal pose $p_g$ at time $t_5$ by employing the control rules of [21], whose gains satisfy $k_\rho > 0$, $k_\beta < 0$, and $k_\alpha > k_\rho$.
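For illustration, the sketch below implements the widely used polar-coordinate move-to-pose law, $v = k_\rho \rho$ and $\omega = k_\alpha \alpha + k_\beta \beta$, whose stability conditions coincide with the gain constraints above. That this is the exact form used in the paper is an assumption; the gains `(3.0, 8.0, -1.5)`, the stopping tolerance, and the function names are hypothetical.

```python
import math
from typing import Tuple

Pose = Tuple[float, float, float]  # (x, y, theta)


def wrap(a: float) -> float:
    """Wrap an angle to [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def move_to_pose_twist(pose: Pose, goal: Pose, k_rho: float,
                       k_alpha: float, k_beta: float) -> Tuple[float, float]:
    """Polar-coordinate law; requires k_rho > 0, k_beta < 0, k_alpha > k_rho."""
    x, y, theta = pose
    xg, yg, theta_g = goal
    dx, dy = xg - x, yg - y
    rho = math.hypot(dx, dy)                 # distance to the goal
    line_of_sight = math.atan2(dy, dx)
    alpha = wrap(line_of_sight - theta)      # bearing of the goal w.r.t. the heading
    beta = wrap(theta_g - line_of_sight)     # remaining rotation at the goal
    return k_rho * rho, k_alpha * alpha + k_beta * beta


def simulate(pose: Pose, goal: Pose, gains=(3.0, 8.0, -1.5),
             dt: float = 0.01, tol: float = 1e-3, max_steps: int = 5000) -> Pose:
    """Forward-Euler rollout that stops once the robot is within tol of the goal."""
    for _ in range(max_steps):
        if math.hypot(goal[0] - pose[0], goal[1] - pose[1]) < tol:
            break
        v, w = move_to_pose_twist(pose, goal, *gains)
        x, y, th = pose
        pose = (x + v * math.cos(th) * dt, y + v * math.sin(th) * dt, th + w * dt)
    return pose


if __name__ == "__main__":
    print(simulate((0.0, 0.0, 0.0), (1.0, 1.0, math.pi / 2)))
```

The rollout stops at a small distance tolerance because the bearing to the goal becomes numerically ill-defined once the position error approaches floating-point resolution.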

III. EXPERIMENTS
The performance of the proposed framework is evaluated in a navigation task in two scenarios and compared to the full teleoperation mode. The software architecture is implemented based on the Robot Operating System (ROS) and MATLAB. In the first set of experiments, i.e., scenario 1, the aim is to analyze the robot motion and the subjects' efforts in an obstacle-free environment in two modes: shared-autonomy and teleoperation. In scenario 2, we repeat scenario 1 in the presence of an obstacle in the environment and with more subjects. It should be noted that, in this case, we use a fixed-position obstacle to allow a better comparison between users. Still, the same method is applicable to moving and unpredictable obstacles.

A. Experimental Setup
The experimental setup is shown in Fig. 3. The mobile platform used in our experiments (MOCA [8]) is a recent integration of an omnidirectional base (SUMMIT XL STEEL from Robotnik), a robotic arm (Franka Emika Panda), and an under-actuated hand (Pisa/IIT soft hand [22]). As this work mainly focuses on the remote navigation control of the mobile platform towards a target goal, the arm and the hand are controlled to keep a fixed configuration.
During the experiments, the user sits in front of a computer monitor with a dual-shock SONY PS4 control-pad. Visual data are fed back from the platform's camera through the provided MATLAB GUI running on the Microsoft Windows 10 operating system (Fig. 3-a). In the shared-autonomy mode, the task's current state is also shown to the user. The eye-tracker application is based on Myex, which is specifically developed for the Tobii EyeX system [23] to estimate gaze location. The resolutions of the computer monitor, for which the eye-tracker is calibrated (Tobii eye-tracking core software), and of the GUI window are 1680×1050 pixels and 640×480 pixels, respectively (the robot's RGB camera resolution is also 640×480 pixels). The goal is set as one object located on a desk (Fig. 3-d). In addition, several boxes are placed to enhance the perception of the final safe pose while the robot approaches the desk (Fig. 3-c). These boxes are never used for the control of the platform, but only to aid the user's perception. For the second scenario, an obstacle is additionally placed along the path of the platform (Fig. 3-c).
The first scenario is performed by one person who is not familiar with the shared-autonomy concept (male, 27 years old, without eyeglasses). He executes the task five times in each mode. For the second scenario, 6 subjects are selected to perform the navigation task in each mode while avoiding the obstacle. The subjects differ in age (24 to 37) and gender (4 males and 2 females); 5 wear glasses and 1 does not. They have little knowledge about shared-autonomy navigation but are familiar with teleoperation.
During the teleoperation mode in both scenarios, subjects are asked to navigate the robot from the start pose ( Fig. 3-b) to the goal pose (Fig. 3-d) while watching the camera data on the screen. In the shared-autonomy mode, the subjects are asked to perform the same task with the developed shared-autonomy framework. Performance of the subjects during the experiments will be evaluated in terms of task completion time, and position and orientation errors of the final platform's pose. Moreover, the subjective perception will also be considered through a Likert scale subjective questionnaire [24].
The controller parameters during the shared-autonomy mode are selected as:

B. Experimental Evaluation
Quantitative and subjective analyses are carried out for both scenarios. Quantitatively, the interpretation of quantile-quantile plots [25] indicates that the data do not follow a normal distribution in either mode or scenario. Thus, the following parameters are studied: the median (M) and interquartile range (IQR) of the position error $e_p = \lVert X_g - X_f \rVert$, with $X_f$ being the final position of the platform; and of the orientation error $e_\theta = |\theta_g - \theta_f|$, with $\theta_f$ being the platform's final heading angle. The final pose of the platform, $p_f = [X_f, \theta_f]^T$, is given by the odometry sensors when the task is finished. These quantitative results are presented in Table I. Also, the task completion time is recorded in each trial, and the average values for each scenario and mode are then calculated. It should be noted that the gaze fixation time is included in the shared-autonomy task completion time. Additionally, statistical tests are used to compare both modes in terms of position and orientation errors. Due to the small number of subjects available during the experiments, it is not possible to establish a particular distribution type for the data. Therefore, non-parametric tests should be considered. Also, in each scenario, the experimental conditions are kept fixed in both modes, and the same person is compared when executing the trials in mode 1 and in mode 2, so the experiments are defined as paired. Based on these experimental characteristics, the Wilcoxon signed-rank test [26] is employed for the statistical evaluation with the significance level $\alpha$ set to 0.05.
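The descriptive statistics and the paired test can be illustrated with a short standard-library sketch. All error values below are hypothetical and are not the experimental data. The IQR relies on the default (exclusive) quartile method of `statistics.quantiles`, and the test is an exact one-sided Wilcoxon signed-rank test computed by enumerating sign assignments, which is only practical for small samples; whether the authors used the exact one-sided variant is an assumption.

```python
from itertools import product
from statistics import median, quantiles
from typing import List, Sequence


def iqr(data: Sequence[float]) -> float:
    """Interquartile range using the default (exclusive) quartile method."""
    q1, _, q3 = quantiles(data, n=4)
    return q3 - q1


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_one_sided(x: Sequence[float], y: Sequence[float]) -> float:
    """Exact P(W+ >= observed) under H0, for the alternative that x tends to exceed y."""
    d = [a - b for a, b in zip(x, y) if a != b]        # zero differences are dropped
    ranks = average_ranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    hits = sum(
        1 for signs in product((0, 1), repeat=len(d))  # every possible sign pattern
        if sum(r * s for r, s in zip(ranks, signs)) >= w_plus - 1e-9
    )
    return hits / 2 ** len(d)


if __name__ == "__main__":
    teleop = [0.30, 0.25, 0.40, 0.35, 0.28]   # hypothetical position errors [m]
    shared = [0.10, 0.12, 0.15, 0.11, 0.09]   # hypothetical position errors [m]
    print(median(teleop), iqr(teleop))
    print(wilcoxon_one_sided(teleop, shared))  # 0.03125
```

With five pairs whose differences all share the same sign, the smallest attainable one-sided exact p-value is 1/32 ≈ 0.0313, and with six pairs it is 1/64 ≈ 0.0156, which illustrates how strongly the sample size bounds the achievable significance.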
The subjective analysis is done through a Likert scale questionnaire [24] during scenario 2. The answers are mapped to numerical grades from -1 (strongly disagree) to +1 (strongly agree). The following statements are included in the survey: (Q1) It was physically tiresome to accomplish the task, (Q2) It was psychologically tiresome to accomplish the task, (Q3) The GUI accurately detected the intended location in around 3 seconds, (Q4) I had good control over the task performance (controlling the robot and obstacle avoidance), (Q5) It was not intuitive to activate/deactivate the autonomous motion controllers, (Q6) It was not intuitive to understand the current action to perform (GUI), (Q7) It was easy to keep the focus on the task execution (robot motions), (Q8) I think by using the proposed framework I can repeat the same task for a longer time and in a better way (more precisely and faster) than in pure teleoperation, (Q9) I felt satisfied with the current task. Statements Q3, Q5, Q6, and Q8 are only applicable to the shared-autonomy control structure.
Table I shows the median and IQR values of the position and orientation errors, which are greater in teleoperation than in the shared-autonomy mode in both scenarios. Moreover, the presence of an obstacle in the second scenario increases the final position error in both modes. This is more notable in the teleoperation mode, which may be due to the users losing their perception of the relative orientation between the platform and the goal after the obstacle is passed.

C. Experimental Results and Discussion
As illustrated in Fig. 4, during scenario 1 in teleoperation mode, the robot deviates from the optimal path in all trials. Indeed, the robot moves more accurately towards the goal in the shared-autonomy mode (smaller position and orientation errors, Table I). Moreover, the average time that the user commands the robot with the control-pad is, as expected, higher for the teleoperation mode (28.80 sec) than for the shared-autonomy mode (27.80 sec).
Fig. 5 shows the robot's odometry data after adding an obstacle to the environment in scenario 2 with the 6 subjects in both modes. In the presence of an obstacle, the subjects tend to start deviating from the optimal path sooner, which may be related to the fear of collision. Additionally, in the teleoperation mode, the participants perform several forward-backward motions with the robot, which does not happen in the shared-autonomy mode. As shown in Table I, this is again reflected in larger final pose errors. During these experiments, though, the average execution time with the shared-autonomy mode (47.67 sec) is greater than with the teleoperation mode (44.65 sec), which may be due to the need to change between modes during the obstacle avoidance. Moreover, it must be noted that it takes about 3 ± 0.5 sec for goal detection through gaze and around 1.5 ± 1 sec to estimate the desired path by comparing $\phi_{fix}$ and $\bar{\theta}$. Gaze data during the goal detection are shown in Fig. 6. It can be seen that, overall, the data are concentrated around the goal. The variations present in the data are due to eye-blinks, head motions, etc.
Finally, the p-values calculated by the Wilcoxon signed-rank test indicate that the shared-autonomy mode yields significantly smaller position errors than the teleoperation mode both in scenario 1 (p = 0.0313) and in scenario 2 (p = 0.0156). However, this test yields p > 0.05 for the orientation errors. The p-values for the orientation error tests of scenario 1 and scenario 2 are p = 0.0625 and p = 0.0781, respectively. A possible reason for the p-values being greater than 0.05 is the small sample size, as the obtained p-values are close to the threshold. Specifically, from the individual data, we could observe that one of the subjects performed very differently from the rest in terms of orientation error in the second scenario. Due to the small size of our data set, this could have a large impact on the overall evaluation, yielding p > 0.05. Future works envisage the collection of larger data sets of subjects to minimize the influence of possible individual deviations. The use of more subjects will also have an impact on the average execution time, allowing a better comparison of both modes in terms of completion time.
Regarding the subjective questionnaire, results are illustrated in Fig. 7. The participants report less physical and psychological effort for the navigation execution with the activated shared-autonomy algorithm than with the teleoperation mode (Q1-Q2). Instead, for the motion control and obstacle avoidance (Q4-Q7), differences are quite low and the satisfaction level of both modes is almost the same (Q9). Regarding the developed framework, the participants are generally satisfied with the GUI, stating in Q3 that it detects the intended target accurately and in about 3 sec. In addition, subjects have indicated no problems in understanding the current action to perform (Q6) or in activating/deactivating the autonomous motion controller (Q5). Finally, subjects agree that they can repeat the same task for a longer time and in a more precise and faster way with the developed interface (Q8).
Fig. 4. Robot's position in teleoperation and shared-autonomy modes (scenario 1). Each color is assigned to one trial.

IV. CONCLUSIONS
In this work, we proposed a shared-autonomy framework for the navigation control of a mobile robot in remote environments. The developed control interface processes the data from an eye-tracking system and a control-pad device. From these data, the target goal is quickly identified, activating the autonomous navigation controller. The users can take back control of the robot's movements at any time through the control-pad, e.g., when an obstacle is observed through the monitor. After the control-pad device is released, the control framework smoothly switches back to the autonomous mode and navigates the mobile platform towards the previously identified goal.
Experiments with 7 subjects in pure teleoperation and shared-autonomy control modes revealed the accuracy, time-efficiency, and ease-of-use of the proposed shared-autonomy interface. The latter was evidenced by a reduction in the physical and cognitive loads thanks to the intelligent goal detection and robot navigation algorithms developed in this work.
Future works envisage the addition of an obstacle-avoidance protocol that will handle easy-to-detect obstacles, so that the operator's cognitive skills are only required in case of unexpected events or trajectory corrections.