Intuitive and Safe Interaction in Multi-User Human Robot Collaboration Environments through Augmented Reality Displays

As autonomous collaborative robots become more widely used in work environments alongside humans, it is of great importance to facilitate communication between people and robotic systems in a way that promotes safety and productivity. To this end, we propose an Augmented Reality (AR) based system that allows workers in a human-robot collaborative environment to interact with a robot while also receiving information regarding the robot state and plans that relate to the human’s safety and trust, such as the intended movement of the robotic arm or the navigation plan of the mobile platform. To evaluate the effectiveness of the proposed system we conducted experiments with 13 participants, where two users had to work in the same workspace while being assisted by a mobile manipulator. We measured the task completion time as well as the robot idle time using our AR-based human-robot interaction system and compared them to a conventional setup without the use of augmented reality. Additionally, subjective evaluations related to user satisfaction, system usability, perceived safety and trust showed that users assessed the system in a positive way and preferred AR visualization over more traditional interfaces.


I. INTRODUCTION
Human robot collaboration is a central component in Industry 4.0 [1], which necessitates robots' operation outside their restrained areas, in close distance to humans [2]. Efficient and effective human-robot cooperation, i.e. humans and robots working alternately on different tasks within a process in the same workspace, and collaboration, i.e. humans and robots interacting in a shared workspace, play an increasingly important role, especially in large-scale industrial, manufacturing and logistics settings. The extended capabilities offered by modern autonomous, collaborative service robots increase the need for workers to perceive the intentions and status of the robot for improved safety and interaction.
Augmented Reality (AR) is an emerging technology that enhances our perception of the real world by overlaying virtual computer-generated information on top of it. According to Azuma [3], an AR system must combine real and virtual content, be interactive in real time and be registered in 3D. AR applications are becoming popular in various domains of our everyday lives such as manufacturing, repairs, maintenance, architecture and education. The rapid adoption of AR technology can facilitate the development of various AR-based human-robot collaboration tools.
All authors are with: Centre for Research and Technology-Hellas, Information Technologies Institute (CERTH / ITI), 6th Km Charilaou-Thermi Road, Thessaloniki, Greece, 57001, Email: {gtsamis, geochan, dgiakoum, gkostave, akargakos, atsakir, dimitrios.tzovaras}@iti.gr
AR can enhance human-robot collaboration (HRC) in many ways. By displaying critical information to the user it can increase their workspace awareness and productivity [4]. Novel input modes offered by AR systems, like gesture and voice commands, provide intuitive means of communication [5]. Increased user safety provided by AR can lead to reduced task completion time and costs. Additionally, AR can contribute to boosting the adoption and benefits of "explainable AI" [6] into robotics, making collaborative service robots more predictable and their actions and next steps better explainable to their human collaborators. Through an AR display, a service robot can provide indications on both its current state, action and its safety implications, as well as on its next action and plan, aspects that can be particularly precious both in real human-robot collaboration spaces, as well as in initial, training sessions of human workers that will then collaborate with robots on a daily basis.
The main focus of our work is to develop a system that provides intuitive communication for HRC and increases safety, trust and productivity in the workspace.
In this scope, our contributions are two-fold, as we present an AR system that (a) is capable of receiving intuitive human worker commands for a collaborative mobile manipulator and (b) is capable of providing indications to the human worker over the robot state, next action and safety constraints in relation to them (Fig. 1). Notably, the majority of AR applications developed so far are oriented towards intuitive and fast robot programming [7], while, as further explained in Section II, the few applications realized for the actual HRC are mostly focused on the visualization of the movement of a robotic arm or the planned path of a mobile robotic platform for a specific user. Our proposed system can be used by multiple users at the same time and receives gesture-based commands in an intuitive way, while also evaluating the worker's position relative to robot safety zones and visualizing corresponding information along with moving robots. In addition, the communication in the proposed system is bidirectional, meaning that both the robot and the users can exchange information about their position in the workspace and possible collisions between them in real time.
The rest of the paper is organized as follows: Section II presents related work and Sect. III describes the proposed human-robot collaboration approach. Section IV provides implementation details for a real, functional system of multi workers-robot collaboration. Preliminary experimental results from user trials are reported in Sect. V and conclusions are drawn in Sect. VI.

II. RELATED WORK
Human-robot collaboration in a work environment is a rapidly expanding field of research and there have been multiple proposed ways to achieve intuitive, effective and seamless communication among humans and robots therein. One way to enable this communication is through the use of various "smart" devices. In [8] tablets were used to render mixed-reality visual environments and give commands to the robot for object manipulation. Smartwatches were used in conjunction with AR in [9]; the users could interact with the robot by giving commands through a User Interface on the watch or they could select what was visible in the AR display from the watch.
Augmented Reality displays have been used by the authors of [10] and [11], in human-robot collaborative industrial environments. The AR system provided users with immersive assembly instructions in their field of view along with production data, enabled them to give commands to the robotic agent and receive information regarding the state of the robot and its intended actions. These led to shorter production times and increased the "safety feeling" of the workers. In [12] Hernández et al. reported on an AR interface that allowed a user to specify high-level requests to a robot, to preview, approve or modify the computed robot motions. They evaluated the proposed approach, by presenting a proof of concept case, in which a user could manipulate a virtual object in order to command a robot to fetch and place a real object.
In [13] Hietanen et al. proposed a depth-sensor based model for workspace monitoring and an interactive AR User Interface for safe HRC. The proposed system was implemented in a projective AR setup as well as wearable AR head-mounted display (HoloLens) and they reported increased task completion times. Another critical aspect in modern industrial environments that has attracted research interest is the mental workload of workers and operators. In [14], [15] AR devices were used to implement an end-user programming system allowing regular shop-floor workers to program industrial robotic tasks. They reported a decrease in the mental workload of the users which can affect their performance during a task.
Head mounted displays (HMDs) and specifically HoloLens have recently been used to map robotic environments based on their depth sensing capabilities [16]. Puljiz et al. evaluated the point cloud quality provided by the HoloLens sensors, compared to a high-end laser scanner and reported more than adequate results for setting up safety zones in a previously unknown robot's workspace.
Our approach aims to extend these ideas by combining multiple user AR-based interaction with a mobile manipulator, during a human-robot collaborative task, while providing safety constraints related to both the planned robot navigation path and its intended arm movement.

III. AR-BASED COLLABORATIVE SYSTEM APPROACH
Our proposed AR-based human robot communication framework takes a modern work environment into consideration, one where robots and workers collaborate to complete the tasks at hand. We assume a collaborative industrial space with multiple workbenches, where the human workers disassemble devices and sort specific components into boxes that the robot should then transport to another workbench. Further cases of HRC can be considered in a very similar setup, e.g. where human workers fill boxes that are then transferred by the robots once full. In this respect, the main points that deserve attention while designing corresponding human robot communication systems include the ease of using the human-robot interaction system (AR-based in our case) for passing commands to a robot, the improvement of the workers' safety and trust when they are exposed to vital information of the robot related to its state, next steps and safety zones, as well as objective, quantitative issues of high importance to the shopfloor, such as the duration of the tasks and the minimization of the robot's idle time.
At the center of the workflow, exhibited in Fig. 2, the users stationed at their respective workbenches are able to set goals and send commands to the robot for box pick up. In our framework, user commands are given through pointing gestures to the box that needs to be picked-up. Then, through the communication protocol established between the AR display and the robot, the robot receives the user input and infers the required movement. The intended robot movement and status are then visualized in the workers' field of view through their HMDs using three different visualization modules.
First the navigation path is rendered on the workshop floor as a sequence of 3D green spheres. The robot navigates towards the pickup goal and when it is in grasping distance, the object detection and pose estimation are used for arm motion planning. Then, the end-effector's planned trajectory is visualized using 3D spheres along with the maximum arm movement radius as a red semi-transparent sphere. A 3D representation of the arm also animates according to the robot's real arm's intended movement. Then the robot initiates the pickup routine.
This way, all workers can perceive the safe and dangerous zones of the workspace through their AR display, having a clear understanding of what the robot intends to do before its actual action. The optical observation of the workspace of the robot in the HMD's field of view enables the workers to bypass it and not trigger the safety stop. In case the human enters the robot's workspace, a potential collision with the user is detected and thus, a warning message appears in their field of view through the HMD in real-time, informing them that they are in the workspace of the robot. The warning message on the HMDs is only cleared when intersection of the robot workspace and the human is no longer detected and the collision risk is eliminated.

A. Hardware Configuration
The proposed collaboration framework has been implemented on a real mobile manipulator suitable for industrial applications. We employed the Robotnik© RB-Kairos heavy duty mobile platform integrated with a UR10 arm manipulator from Universal Robots©. The mobile robot is an omnidirectional platform equipped with laser sensors suitable for mapping and navigation. An RGB-D camera along with a standard gripper have been integrated on the manipulator's flange to enable box detection and grasping. The UR10 manipulator has a payload of 10kg, which is adequate for our pick-and-place application. Regarding the employed human machine interaction interface, a Microsoft© HoloLens device has been selected as a head mounted display for the AR-based human-system interaction. HoloLens is an optical see-through HMD developed by Microsoft equipped with a complete suite of sensors such as an inertial measurement unit, four environment understanding cameras, a depth camera, an ambient light sensor, built-in speakers and an array of four microphones. Through its holographic lenses (waveguides), HoloLens can display holograms in the user's field of view. It can track the user's head gaze, recognize hand gestures and voice commands and understand the surrounding space through the spatial mapping [17].
In order to enable the ambient interplay of the mobile platform and the HMD, a specific integration pipeline has been adopted. The mobile manipulator was linked to the HMD through Robot Operating System (ROS) middleware using the ROS# library which provides ROSbridge clients for .Net applications. Robot-related messages were visualized on HoloLens through the Unity3D application, operating on the Universal Windows Platform (UWP), and the communication exchange among the robot and the HMD was performed through the TCP/IP protocol. An overview of the proposed system architecture is exhibited in Fig. 3.
To enable the operation of the mobile platform in the indoor industrial environment the Gmapping 2D SLAM method has been utilized, which employs the robot's wheel odometry and laser data to create a 2D map of the environment [18]. This map is then utilized for robot path planning and navigation, which is optimized for minimizing the execution time for dynamic obstacle avoidance. The robot localization within the 2D map is performed with the adaptive Monte Carlo localization (AMCL) method that uses a particle filter to track the robot pose in the map, built in the ROS navigation stack [19].
2) Object Detection and Grasping Points: To enable robot interaction with boxes in our collaborative scenario an object detection method based on NVIDIA's Deep Object Pose Estimation framework [20] has been utilized. A neural network has been trained using synthetic images to identify the selected object in the image and estimate its 6DoF pose. Since the detected object will be grasped and transported from one workstation to another, additional pose refinement steps are required to facilitate object grasping. Thus, the output of the network is further processed using the depth data from the RGB-D camera, where Random Sample Consensus (RANSAC) surface detection is applied in order to fix the object detection error in elevation (Z-axis) in the detected pose and to discard irrelevant parts of the point cloud. In order to minimize the error on the XY-plane, a constrained Iterative Closest Point (ICP) algorithm is employed, which minimizes the difference between the actual point cloud and the object reference model, when positioned on the initial detected pose. The next step comprises the calculation of the grasping poses towards which the manipulator should plan its trajectory. The candidate grasping poses are generated on the 3D model of the handle of the box and transformed to the respective frame on the detected object.
3) Arm Trajectory Planning and Motion: The arm trajectory planning utilizes the MoveIt framework as an out-of-the-box solution [21], yet parameterized for the given robot setup and collision space. Each candidate grasping pose is checked against an arm trajectory which is calculated with the RRTConnect planner [22]. Moreover, the trajectory plan is performed in several steps starting from the arm parking pose, to the pre-grasping pose, the grasping pose and the retract pose, so that each candidate grasping pose can be examined faster in case of a planning failure. In case all parts of the planning are successful, the trajectory is executed in a single, smooth movement, while always checking for collisions with the environment or nearby people. The collision area is passed to the MoveIt framework as an Octomap representation formed by the depth data obtained from the RGB-D sensor. In case the robot detects a human in its workspace while the manipulator is in motion, the safety stop is triggered; the trajectory execution is paused and it only continues when there is no one in the robot's workspace.
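To make the elevation-correction idea concrete, the following is a minimal Python sketch of RANSAC plane fitting on a point cloud. It is not the implementation used in the system (which is not specified beyond the use of RANSAC surface detection); the function names, the inlier threshold, the iteration count and the fixed random seed are all illustrative assumptions.

```python
import math
import random

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def fit_plane(p0, p1, p2):
    """Plane through three points as (unit normal n, offset d) with n.x + d = 0."""
    n = _cross(_sub(p1, p0), _sub(p2, p0))
    norm = math.sqrt(_dot(n, n))
    if norm < 1e-12:  # collinear sample, no unique plane
        return None
    n = (n[0] / norm, n[1] / norm, n[2] / norm)
    return n, -_dot(n, p0)

def ransac_plane(points, threshold, iterations, seed=0):
    """Return (plane, inliers) for the plane supported by the most points."""
    rng = random.Random(seed)  # fixed seed only to keep the sketch repeatable
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = fit_plane(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = [p for p in points if abs(_dot(n, p) + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers
```

For a roughly horizontal supporting surface, the elevation of the fitted plane can be read off as -d / n[2], and points outside the inlier set can be discarded before a subsequent ICP refinement.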
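The pause-and-resume behaviour of the safety stop can be illustrated with a small Python sketch. It is a deliberately simplified, discrete-time model and not the controller used on the robot: the function name, the one-waypoint-per-tick assumption and the boolean presence signal are hypothetical and serve only to show how safety stops translate into idle time.

```python
def run_with_safety_stop(num_waypoints, human_present_per_tick):
    """Simulate trajectory execution that pauses while a human is detected.

    One waypoint is executed per control tick when the workspace is clear;
    a tick with a detected human is counted as idle and the arm holds still.
    Returns (waypoints_done, total_ticks, idle_ticks).
    """
    done = 0
    total = 0
    idle = 0
    for human_present in human_present_per_tick:
        if done >= num_waypoints:
            break  # trajectory finished, remaining observations are ignored
        total += 1
        if human_present:
            idle += 1  # safety stop: execution is paused
        else:
            done += 1  # workspace clear: execution continues
    return done, total, idle


done, total, idle = run_with_safety_stop(3, [False, True, True, False, False, False])
idle_share = 100.0 * idle / total  # share of execution time spent idle, in percent
```

In this toy run the arm needs five ticks for three waypoints, two of which are idle, which is the kind of idle share reported in the experimental evaluation.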

C. Augmented Reality Components
The contextual information to be provided in HoloLens is processed from a Unity3D application developed for the UWP platform employing Mixed Reality Toolkit and ROS# library. Unity3D is suitable for the development of AR applications as it can render 3D meshes in a device's camera view. The main component of the Unity application is a scene which initially is an empty 3D space that consists of a virtual camera aligned with the device's physical camera. As the user wearing HoloLens moves, so does the virtual camera of the scene, rendering the 3D content from the correct perspective.
1) User Interface and Visualization: The HoloLens app user interface (UI) consists of a single panel containing buttons to calibrate the view and to send commands to the robot. It also contains textual information regarding the current state of the robot. This panel implements a tag-along component so it follows the user unobtrusively, allowing intuitive interaction. The contextual information is rendered on HoloLens proactively prior to any robot motion, informing the user on the intended motion and the occupied space (Fig. 4). In particular, before any movement of the mobile platform, the robot global path is visualized as 3D spheres on the floor, while in the case of the manipulator, we visualize the planned trajectory of the end effector for the selected plan as a sequence of 3D spheres as well.
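As an illustration of how a planned path could be turned into a sequence of sphere markers, the Python sketch below places markers at a fixed spacing along a 2D polyline. The spacing rule is an assumption made for this example; the text does not state how the spheres are distributed (one marker per waypoint of the global plan would be an equally plausible choice), and the function name is hypothetical.

```python
import math

def sample_path_markers(waypoints, spacing):
    """Return (x, y) marker positions every `spacing` metres along a polyline."""
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    if not waypoints:
        return []
    markers = [waypoints[0]]
    distance_to_next = spacing  # path length left until the next marker
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        travelled = 0.0
        # Drop as many markers as fit on this segment.
        while seg_len - travelled >= distance_to_next:
            travelled += distance_to_next
            t = travelled / seg_len
            markers.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            distance_to_next = spacing
        # Carry the unused remainder of the segment over to the next one.
        distance_to_next -= seg_len - travelled
    return markers
```

Each returned position would then be converted to the HMD's coordinate frame and used as the centre of one rendered sphere.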
The maximum distance from the base of the arm to the end-effector, that indicates the manipulator's operation space during arm motion, is visualized as another separate and semi-transparent sphere. Those 3D shapes contain colliders that are triggered once a collision is detected with the user wearing the Head-Mounted Display. In order to detect such collisions, a collider is also attached to the virtual camera and subsequently to the user's transformation in the 3D scene.
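The geometric test behind the workspace sphere can be sketched in a few lines of Python. In the actual system this check is delegated to Unity colliders; the functions below are only a stand-in that assumes the radius is taken as the largest base-to-end-effector distance over the planned poses and that the user is approximated by a small sphere around the head position (the `user_radius` parameter is hypothetical).

```python
import math

def max_reach_radius(base, end_effector_positions):
    """Largest distance from the arm base to any planned end-effector position."""
    return max(math.dist(base, p) for p in end_effector_positions)

def user_in_workspace(user_position, base, radius, user_radius=0.0):
    """True if a sphere around the user overlaps the arm's workspace sphere."""
    return math.dist(user_position, base) <= radius + user_radius


base = (0.0, 0.0, 0.0)
radius = max_reach_radius(base, [(0.3, 0.0, 0.4), (0.6, 0.0, 0.8)])
warn = user_in_workspace((0.5, 0.5, 0.0), base, radius)  # user inside the sphere
```

A warning would be shown while this test is true and cleared once it becomes false again.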
Using the universal robot description format file (URDF) provided by ROS we also visualize a 3D model of the robot's arm inside the user's field of view overlaid on top of its real world counterpart. The virtual arm animates following the planned arm poses offering an immersive real time preview of the intended movement of the arm to the user.
2) Localization and coordinate system: In order to align the virtual content rendered by HoloLens with the real environment we need to establish a common reference coordinate system for the robot and HoloLens. To achieve this, we used AprilTag markers placed on a fixed position in the workspace, which the user wearing the HoloLens has to scan as an initial step. The HoloLens camera is used to detect and track the image target. Once the calibration is completed the users can move freely in the workspace as the HMD keeps track of their origin and movements through its self-localization and mapping capabilities. As a result, each transformation exchanged between ROS and the Unity app uses the same coordinate frame as reference. However, due to the discrepancies between Unity and ROS conventions regarding the coordinate system, an intermediate transformation from one coordinate system to another is required each time geometrical information is exchanged.
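The intermediate transformation between the two conventions can be written down explicitly for positions. The Python sketch below assumes the usual conventions, namely that Unity uses a left-handed frame with x right, y up and z forward, while ROS uses a right-handed frame with x forward, y left and z up; the function names are illustrative and the code is not taken from the system itself.

```python
def unity_to_ros(position):
    """Map a Unity position (x right, y up, z forward) to ROS (x forward, y left, z up)."""
    x, y, z = position
    return (z, -x, y)

def ros_to_unity(position):
    """Inverse mapping from a ROS position to a Unity position."""
    x, y, z = position
    return (-y, z, x)


# A point 1 m in front of the Unity origin lies on the ROS x axis.
forward_in_ros = unity_to_ros((0.0, 0.0, 1.0))
```

Orientations need an analogous conversion that also accounts for the change of handedness, which is omitted here for brevity.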
3) Spatial Mapping: In order to detect surfaces and map the user's workspace we used the spatial mapping API provided by Microsoft. The sensors of HoloLens recreate a detailed spatial mesh of the surroundings which enables precise collision detection. HoloLens continuously scans the environment in order to receive spatial mapping data and create spatial surfaces. Each spatial surface is a representation of a real-world surface as a triangle mesh attached to the application's coordinate system. During the application's runtime, the spatial map is continuously updated as new data are gathered from the environment through the sensors. For each new spatial surface acquired, a spatial collider component is calculated that is later used for collision detection and user input.
4) User input: The user sends commands to the robot using an Air-Tap gesture. A cursor is displayed at the center of their view that follows their head movement. In the studied collaborative scenario, the user taps on the spatial mesh of the workspace to select the box that is ready for pickup. In order to calculate the actual position of the pick up goal that the user indicates in the common reference coordinate system, a ray casting method [23] has been implemented. An invisible ray is launched from the center of HoloLens through the cursor and detects whether any colliders lie in its path. The point in 3D world space where the ray intersects with the spatial collider is the position of the pick up goal in the common reference frame. Then, by clicking the "Send" button on the main panel, the coordinates of the pick up point are sent to the robot.
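Since the spatial colliders are triangle meshes, the ray casting step amounts to intersecting the gaze ray with triangles and keeping the nearest hit. The Python sketch below uses the Möller-Trumbore ray-triangle test as one standard way to do this; in the application itself the query is handled by the Unity physics engine, so the code and its function names are purely illustrative.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle_distance(origin, direction, v0, v1, v2, eps=1e-9):
    """Distance t along the ray to the triangle, or None if there is no hit."""
    edge1, edge2 = _sub(v1, v0), _sub(v2, v0)
    pvec = _cross(direction, edge2)
    det = _dot(edge1, pvec)
    if abs(det) < eps:  # ray is parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    tvec = _sub(origin, v0)
    u = _dot(tvec, pvec) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    qvec = _cross(tvec, edge1)
    v = _dot(direction, qvec) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = _dot(edge2, qvec) * inv_det
    return t if t > eps else None  # ignore hits behind the ray origin

def pick_goal(origin, direction, triangles):
    """Nearest intersection point of the ray with a list of triangles, or None."""
    hits = [t for tri in triangles
            if (t := ray_triangle_distance(origin, direction, *tri)) is not None]
    if not hits:
        return None
    t = min(hits)
    return tuple(o + t * d for o, d in zip(origin, direction))
```

The returned point plays the role of the pick up goal; it would still have to be converted to the robot's coordinate conventions before being sent.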
Regarding the communication between the Unity3D application and the robot, a ROSbridge client framework has been utilized and implemented by the ROS# library. Through the client, the application can subscribe and publish to topics set up by ROS and can exchange data encoded in JSON format, containing information related to the robot's status, position, navigation path and arm pose. The data are streamed through a WebSocket transport layer over a local wireless network.
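For illustration, the JSON exchanged over the WebSocket can be sketched in Python following the rosbridge protocol, in which each message carries an "op" field such as "advertise" or "publish". The topic name `/ar/pickup_goal`, the frame name `map` and the choice of `geometry_msgs/PoseStamped` as message type are assumptions made for this example and are not taken from the system; only the pick up position is filled in and the orientation is left as the identity.

```python
import json

def advertise_op(topic, msg_type):
    """rosbridge request announcing that the client will publish on `topic`."""
    return json.dumps({"op": "advertise", "topic": topic, "type": msg_type})

def publish_goal_op(topic, frame_id, position):
    """rosbridge request publishing a pose with the given position."""
    x, y, z = position
    msg = {
        "header": {"frame_id": frame_id},
        "pose": {
            "position": {"x": x, "y": y, "z": z},
            "orientation": {"x": 0.0, "y": 0.0, "z": 0.0, "w": 1.0},
        },
    }
    return json.dumps({"op": "publish", "topic": topic, "msg": msg})


# Hypothetical topic and frame names, used only for this sketch.
advertise = advertise_op("/ar/pickup_goal", "geometry_msgs/PoseStamped")
publish = publish_goal_op("/ar/pickup_goal", "map", (1.2, -0.4, 0.0))
```

Both strings would be sent as text frames over the WebSocket connection; subscribing to robot status topics works analogously with a "subscribe" operation.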

V. EXPERIMENTAL EVALUATION
In order to evaluate our system we performed user trials within a laboratory environment, simulating the industrial setup described in Section III. The basic purpose was to assess the usability of the system, user satisfaction, perceived safety and trust, as well as the required time for the mobile robot to complete manipulation tasks when workers are passing nearby.

A. Experiment Setup
Two workbenches were set up in a confined space in the laboratory, as shown in Fig. 5. Each workbench was manned by a person, performing an action, simulating a real working environment. In this space the mobile manipulator, as described in Sect. IV-B, was also placed, to assist the two persons on the workbenches. When the user on workbench A finishes the task, s/he indicates to the robot which box to come and pick up. This communication can happen either through a conventional tablet app (baseline approach), or through a head-mounted display providing AR-based human robot communication. Upon the arrival of the robot to the workbench A, the box pick-up task is initialized. During the manipulator's movement the person from workbench B passes near the robot, simulating normal worker movements in a collaborative industrial space. After a while, the worker returns to the workbench B passing from the same point. The human working at workbench B can either be equipped with the HMD or not.
In case both users wear HoloLens devices, they are both able to observe the same virtual content in real time. For example, when one user sets a pick up goal for the robot, the planned navigation path also appears on the HoloLens device of the other user.

B. Non-AR Baseline
As a baseline, a separate application was developed for an Android tablet without the AR visualization offered by HoloLens. The application consisted of a single screen with four buttons. The user had to select the box to be picked by the robot, by pressing the corresponding button. The positions of the boxes were predefined and thus, the button selection was translated into coordinates in the map that were sent to the robot as target. When the worker in our experiments uses the baseline application, they have no awareness regarding the intended motions of the robot and the manipulator's workspace area. For the safety stop, the on-board laser sensor of the robot was used. In case a movement in close vicinity to the robot was detected while the grasping motion was executed, the arm would stop and wait until no one was passing near the robot. However, in this case, no explicit visual feedback regarding the workspace of the robot was provided to the person passing by.

C. Participants & Procedure
A total of 13 participants took part in the study. The mean age of the participants was 28.9 (±4.35) years. The total completion time of the scenario, the time spent for the arm movement routine and the idle time of the arm due to safety stops were measured.
Each participant performed the same task four times; they assumed the role of Worker A, stationed at workbench A, with and without the HMD and then they assumed the role of Worker B, stationed at workbench B and moving in the collaborative space, with and without our AR system, as described in Sect. V-A and graphically illustrated in Fig. 5.
After the test, the participants were asked to fill in two different questionnaires that included questions for both (a) the usability of the system, as well as (b) their satisfaction with the system, the perceived safety and trust. The first questionnaire was based on the System Usability Scale (SUS) [24], a common tool for assessing the usability of a product such as an application. It consists of 10 questions with five response options each, from Strongly agree to Strongly disagree. The second questionnaire included a set of 7 Likert (1-5) questions (Table I) measuring the subjective satisfaction, perceived safety and trust of the users in both scenarios (with and without the AR display). Each participant filled in the same questionnaire two times, once after being worker A and worker B with the HMD and once after being worker A and worker B without the HMD (baseline).
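For reference, the standard SUS scoring rule can be expressed as a short Python function: odd-numbered (positively worded) items contribute their response minus 1, even-numbered (negatively worded) items contribute 5 minus their response, and the sum is multiplied by 2.5 to give a score between 0 and 100. The function name is illustrative, and the example responses are made up and are not data from the study.

```python
def sus_score(responses):
    """System Usability Scale score (0-100) from ten responses on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = 0
    for item, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("responses must be between 1 and 5")
        # Odd items are positively worded, even items negatively worded.
        total += (response - 1) if item % 2 == 1 else (5 - response)
    return total * 2.5


# Made-up example: the most favourable possible answers.
best = sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])
```

Averaging the per-participant scores gives the kind of mean SUS value that is compared against the commonly cited usability threshold of 68.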

D. Results
All participants completed the tasks successfully. The average SUS score for the HoloLens application and the baseline solution was 85.83 and 77.3, respectively. According to [24], any system scoring above 68 can be considered usable, and the higher the score, the more usable the system is. We can deduce from the score that the HoloLens AR application was found to be highly intuitive for the users. The average of positively-worded questions related to the ease of use was higher in the HoloLens case, while the average of negatively-worded questions related to system complexity was lower. Results for both systems are presented in Fig. 7.
Based on the user satisfaction questionnaire results, when it comes to information visualization, perceived user safety and feeling of trust towards the robot, the AR system was found to outperform the baseline solution significantly, though it may offer limited ergonomics due to the weight and the narrow field of view of the device, as can be seen in Table I.
Fig. 7. Average SUS scale rating results for HoloLens (proposed AR method) and baseline (Score 5 denotes "strongly agree" and 1 "strongly disagree"). Q1: I think that I would like to use this system frequently, Q2: I found the system unnecessarily complex, Q3: I found the system easy to use, Q4: I think that I would need the support of a technical person to be able to use this system, Q5: I found that the functionalities of the system were well integrated, Q6: I thought there was too much inconsistency in this system, Q7: The system is easy to learn, Q8: The system is very cumbersome to use, Q9: I felt very confident using the system, Q10: I needed to learn a lot of things before I could use this system.
Regarding the completion times of the task with or without the AR systems, the results are also indicative. In the tests, where the users were wearing the HMD and were able to see both the current and the planned state of the robot, they could avoid the workspace of the robot much more easily and thus, not trigger the safety stop. As a result, the idle time of the robot was reduced in the AR display case compared to the baseline (Fig. 8). On average, 4.50% of the total arm movement execution time was idle time using the HoloLens app, while this value was 18.79% without AR. This constitutes a reduction in idle time of 80.37%.
Lastly, overall completion times of the scenario were also reduced by 5.60% on average. This was due to the reduction in idle times, as well as due to the better ability of the HMD-wearing users to avoid the planned navigation path of the robot while moving around.

VI. CONCLUSIONS
In this work we proposed a novel AR-based system enabling safe interaction and collaboration between multiple workers and a robot in a shared industrial environment. The system utilizes a state of the art head-mounted display offering an immersive user interface with visual cues related to the status and intended movement of the robot. In our experimental setup of a simulated industrial task we found improved task completion and robot idle times using our system, with fewer interruptions to the overall workflow and a clearer representation of the task state. In addition, users perceived the AR system in a positive manner compared to a more traditional interface as it increased their feeling of safety and trust towards the robot during the task.