A Human-Aware Method to Plan Complex Cooperative and Autonomous Tasks using Behavior Trees

This paper proposes a novel human-aware method that generates robot plans for autonomous and human-robot cooperative tasks in industrial environments. We modify the standard Behavior Trees (BTs) formulation to take into account action-related costs, and we design suitable metrics and cost functions to account for cooperation with a worker, considering human availability, decisions, and ergonomics. The developed approach allows the robot to adapt its plan to the human partner online by choosing the tasks that minimize the execution cost(s). Through simulations, we first tuned the weights of the cost function for a realistic scenario. Subsequently, the developed method was validated through a proof-of-concept experiment representing the boxing of 4 different objects. The results show that the proposed cost-based BTs, along with the defined costs, enable the robot to react online and plan new tasks according to dynamic changes of the environment in terms of human presence and intentions. Our results indicate that the proposed solution has high potential for increasing robot reactivity and flexibility while, at the same time, optimizing the decision-making process according to human actions.


I. INTRODUCTION
Nowadays, compliant lightweight robots are increasingly exploited to address the emerging needs of new industries, where cyber-physical systems continuously communicate and collaborate with each other and with their human counterparts. Beyond the requirement for high flexibility to accomplish a wide variety of tasks, improving worker ergonomics is highly relevant. Moreover, while achieving such tasks, robot behaviors should adapt to the worker's intentions and commands, from the control level to the task-planning level [1]. To address these specifications, several attempts have been made to develop dexterous robots with agile motions, ranging from humanoids [2], [3] to single- or dual-arm wheeled manipulators [4], [5]. The latter category has received slightly more attention in industry owing to its inherently greater postural stability in heavy manipulation tasks, for example in logistics [6]-[8] and manufacturing [9] scenarios. The mobility of such systems allows exploiting their loco-manipulation capabilities while ensuring safe human-robot collaboration through intuitive interfaces that let human operators interact with the robot [10], [11]. Nevertheless, such interfaces should not only regulate the interaction at the control level but also adapt robot behavior to human intentions, actions and preferences. In the literature, different methods have been used to adapt robot behavior to human intentions or interactions. In auction-based planning, the plan is generated through human-robot communication using gestures [9] or verbal statements [12]. Other approaches address this problem by considering human motion [13], or human activities together with dynamic changes of the environment [14]. For instance, [15] proposes a Human Aware Task Planner (HATP) for collaborative and interactive robotic applications that produces socially acceptable plans for several agents.
To anticipate human activities, the authors of [16] presented a learning method based on Markov decision processes that learns human actions from RGB-D videos. One shortcoming of this approach is that the dataset is limited to human actions in domestic environments, and the method requires re-training to learn new tasks.
To solve the task planning problem, it is necessary to exploit models that take into account all the possible robot actions and reactions in accomplishing a set or sequence of tasks. This can be achieved by means of heuristic [17] and deterministic [18] methods. Specifically, in [17], the authors proposed a method that decomposes the causal graph of a translated planning task into a sequence of tasks and plans with a heuristic system. In [18], the presented approach generates plans including probabilistic values, which are obtained from Markov Logic Networks (MLNs), a statistical relational learning model. The problem of blending decision making with task execution has been widely tackled with methods such as Decision Trees (DTs) [19] and Hierarchical Finite State Machines (HFSMs) [20]. DTs consist of predicates and control statements that map possible consequences, such as event outcomes and utility. An HFSM is an FSM in which the states can themselves be FSMs. This structure is easy to design and implement, and its hierarchy limits the growth of the number of states as the complexity increases. Nevertheless, the state transitions have to be defined manually and cannot change dynamically; hence, each scenario-specific behavior is usually not reusable.
Behavior Trees (BTs) are another method, extensively used to develop the behavior of Non-Player Characters (NPCs) in game AI. BTs can be considered an evolution of HFSMs in which the states are replaced by atomic actions and the state transitions are defined implicitly by the tree structure [21]. Moreover, they have the advantages of being reactive, modular, maintainable and reusable. Recently, researchers have started to develop methods that integrate task planning concepts with BTs. For instance, the CoSTAR system was developed to enable robot programming by non-expert users, exploiting the user-friendly BT structure [22]. The architecture allows users to plan abstract goals by defining generic actions. Extended BTs [23] were developed to generate a plan with a planner for a problem defined in the Planning Domain Definition Language (PDDL), to create a hierarchical tree, and then to optimize it to minimize execution time and resource usage. In [24], the authors merged PDDL with Hierarchical task and motion Planning in the Now (HPN) to update the BT dynamically every time it fails. Similarly, the approach in [25] extended BTs by adding pre- and post-conditions. Finally, Utility BTs [26] allow creating behaviors in which each action is selected from a set of candidates as the one that maximizes the utility value; the other actions are then discarded from the plan.
The standard BT structure and the above-mentioned extensions were not designed to operate in human-populated environments, let alone in industrial scenarios. They do not allow the structure of the BT to be changed online. Furthermore, they do not envision the presence of other agents in the plan. In Human-Robot Collaboration, it is important not only to adapt to dynamic changes of the environment but also to modify the generated plan according to human intentions, activities, motion and availability. To overcome these limitations, this paper proposes an online adaptation of the robot plan to human decision-making as a first step towards human-aware task planning using Behavior Trees. We extend Utility BTs to plan robot behaviors according to production-related indices, such as time performance, distance traveled by the agents and human ergonomics, ordering the sequences of actions to maximize the utility factor. One advantage of the proposed method is that the same BT can handle different levels of engagement between humans and robots: from coexistence to cooperation and autonomous task execution. The developed method (see Figure 1) models the task-related cost considering the distances of the human and the robot from the task, human ergonomics in terms of object weights and locations, and the task duration. The performance of the proposed approach is evaluated both in simulation and experimentally in a real scenario, where the MObile Collaborative robotic Assistant (MOCA) plans each new task, adapting and reacting to human presence while performing object transportation tasks.

A. Preliminaries on Behavior Trees
A BT is a directed rooted tree consisting of internal nodes for control flow and leaf nodes for action execution or condition evaluation. For each pair of connected nodes, the upper one is called the parent and the lower one the child. The root, the only node without a parent, periodically sends a signal, called tick, which is propagated down the tree to allow the execution of its nodes. The ticked child then immediately returns a status to its parent, depending on the type of node: SUCCESS if the node completed its execution successfully, FAILURE if it failed, and RUNNING if the execution is not yet complete. There are four standard types of control nodes (Fallback, Sequence, Decorator, Parallel) and two standard categories of execution nodes (Condition, Action). Table I summarizes the standard node types, with their symbols and the status returned in each case. A more detailed overview of standard BTs can be found in Colledanchise et al. [21].
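To make the tick semantics concrete, the following is a minimal Python sketch of memoryless (reactive) Sequence and Fallback nodes with Condition and Action leaves. The class names and the callback-based leaves are illustrative assumptions, not the interface of any specific BT library; Decorator and Parallel nodes are omitted.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    RUNNING = "RUNNING"


class Condition:
    """Leaf that evaluates a predicate; it never returns RUNNING."""

    def __init__(self, predicate: Callable[[], bool]) -> None:
        self.predicate = predicate

    def tick(self) -> Status:
        return Status.SUCCESS if self.predicate() else Status.FAILURE


class Action:
    """Leaf wrapping a non-blocking step function that reports its status."""

    def __init__(self, step: Callable[[], Status]) -> None:
        self.step = step

    def tick(self) -> Status:
        return self.step()


class Sequence:
    """Ticks children left to right and stops at the first non-SUCCESS one."""

    def __init__(self, children: List) -> None:
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            status = child.tick()
            if status != Status.SUCCESS:
                return status  # FAILURE or RUNNING is propagated to the parent
        return Status.SUCCESS


class Fallback:
    """Ticks children left to right and stops at the first non-FAILURE one."""

    def __init__(self, children: List) -> None:
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            status = child.tick()
            if status != Status.FAILURE:
                return status  # SUCCESS or RUNNING is propagated to the parent
        return Status.FAILURE


def demo() -> Status:
    # "Object grasped?" is false, so the fallback runs the grasp action,
    # which is still in progress; the sequence therefore reports RUNNING.
    grasp = Fallback([Condition(lambda: False), Action(lambda: Status.RUNNING)])
    root = Sequence([grasp, Action(lambda: Status.SUCCESS)])
    return root.tick()
```

In this sketch `demo()` returns `Status.RUNNING`: the Fallback propagates the RUNNING status of its action, and the Sequence stops at that child.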

B. Custom Cost Behavior Trees
First, we developed a custom decorator node to allow the execution of repetitive actions or sub-behaviors. The so-called Keep Running Until Success node continuously ticks its only child until the child returns SUCCESS, so that a failed action is repeated until it is achieved. The pseudocode of the Keep Running Until Success node, represented by the ↻ symbol, is summarized in Algorithm 1.
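A possible Python rendering of the Keep Running Until Success decorator is sketched below, assuming non-blocking ticks: instead of looping inside a single tick, the decorator maps every non-SUCCESS result of its child to RUNNING, so the retry happens on the following ticks of the root. The `Status` and `Action` classes are illustrative assumptions, and this is not a transcription of the authors' Algorithm 1.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    RUNNING = "RUNNING"


class Action:
    """Leaf wrapping a non-blocking step function."""

    def __init__(self, step: Callable[[], Status]) -> None:
        self.step = step

    def tick(self) -> Status:
        return self.step()


class KeepRunningUntilSuccess:
    """Decorator that keeps re-ticking its only child until it succeeds."""

    def __init__(self, child) -> None:
        self.child = child

    def tick(self) -> Status:
        status = self.child.tick()
        if status == Status.SUCCESS:
            return Status.SUCCESS
        # FAILURE is hidden from the parent and turned into RUNNING,
        # so the child is ticked (i.e., retried) again at the next tick.
        return Status.RUNNING


def demo() -> List[str]:
    # Hypothetical grasp that fails twice before succeeding.
    outcomes = iter([Status.FAILURE, Status.FAILURE, Status.SUCCESS])
    node = KeepRunningUntilSuccess(Action(lambda: next(outcomes)))
    return [node.tick().name for _ in range(3)]
```

Here `demo()` returns `['RUNNING', 'RUNNING', 'SUCCESS']`: the two failures are never seen by the parent, which only observes the subtree as still running.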
Moreover, in the standard BT formulation, the execution order of the children of Sequence and Fallback (selector) nodes is intrinsically fixed. Thus, the execution order of conditions and actions must be established beforehand by the programmer, which is non-trivial for complex tasks. To enable adaptation to human presence, we employ Utility BTs, which assign a utility value to each action. This value is normally used by the Fallback node to pick the action that maximizes it. We extend this concept by enabling a Sequence node to order its children according to this value, here expressed as a cost to be minimized. In this way, the next child to be ticked is no longer fixed but is selected online as the one with minimal cost. The pseudocode of the Sequence Costs node, represented by the →$ symbol, is summarized in Algorithm 2.
Algorithm 1: Tick() function of the "KeepRunningUntilSuccess" node (procedure KeepRunningUntilSuccess::Tick()).
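The Sequence Costs idea can be sketched in Python as follows, under assumptions not stated in the text: costs are re-evaluated at every tick, children that already succeeded are skipped until the whole sequence completes, and a child preempted by a cheaper one is simply no longer ticked (no explicit halt). `CostAction` and `SequenceCosts` are hypothetical names, and this is not a transcription of the authors' Algorithm 2.

```python
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    RUNNING = "RUNNING"


class CostAction:
    """Hypothetical leaf: a non-blocking action with a time-varying cost."""

    def __init__(self, name: str, cost: Callable[[], float],
                 step: Callable[[], Status]) -> None:
        self.name = name
        self.cost = cost
        self.step = step

    def tick(self) -> Status:
        return self.step()


class SequenceCosts:
    """Sequence whose pending children are re-ordered by cost at every tick."""

    def __init__(self, children: List[CostAction]) -> None:
        self.children = children
        self.done = set()  # indices of children that already returned SUCCESS

    def tick(self) -> Status:
        pending = [i for i in range(len(self.children)) if i not in self.done]
        # Online ordering: the cheapest pending child is ticked first.
        pending.sort(key=lambda i: self.children[i].cost())
        for i in pending:
            status = self.children[i].tick()
            if status != Status.SUCCESS:
                return status  # RUNNING or FAILURE ends this tick
            self.done.add(i)
        self.done.clear()  # every child succeeded: reset for a new run
        return Status.SUCCESS


def demo_replanning():
    costs: Dict[str, float] = {"a": 1.0, "b": 2.0}  # hypothetical costs
    results = {"a": Status.RUNNING, "b": Status.SUCCESS}
    log: List[str] = []

    def make(name: str) -> CostAction:
        def step() -> Status:
            log.append(name)
            return results[name]
        return CostAction(name, lambda: costs[name], step)

    seq = SequenceCosts([make("a"), make("b")])
    first = seq.tick()             # "a" is cheaper and still running
    costs["a"] = 3.0               # the cost of "a" increases online...
    results["a"] = Status.SUCCESS
    second = seq.tick()            # ...so "b" is now ticked before "a"
    return first, second, log
```

`demo_replanning()` returns `(Status.RUNNING, Status.SUCCESS, ['a', 'b', 'a'])`: the order of the children changes between the two ticks because the costs changed, which is the behavior the Sequence Costs node is meant to provide.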

C. Metrics and Cost Function Design
The meaning of the utility factor in game AI, and how it should change in relation to the situation, feelings, risks, etc., is well established [27]; in human-populated environments, and especially in industrial ones, however, other factors influence the cost evaluation. We propose to use three metrics that are relevant in these scenarios and influence task performance: task execution time, human ergonomics, and the total travel cost of each agent.
a) Duration Index: We consider the duration of a task a suitable metric to measure the agent's performance in achieving the task. In manufacturing processes, such as assembly lines, the same sequence of tasks is continuously repeated, and the average task duration is a simple index of task performance. Moreover, this index could help reduce the overall execution time in future executions. The duration index cost $c^{d}_i$ is computed as
$$c^{d}_i = \frac{\bar{t}_i}{t^{n}_i}, \tag{1}$$
where $\bar{t}_i$ is the average execution time of the $i$-th task in the previous shift of the production line and $t^{n}_i$ is the nominal duration of the $i$-th task dictated by the manufacturing process. The higher the duration index, the slower the execution of the task. Minimizing this index makes the robot execute first the tasks that it can achieve in time and the human cannot. In this way, the robot leaves to the worker the tasks in which the latter is faster. Since the worker is not obliged to execute those tasks, the robot schedules them last. Moreover, if the worker is not present in the workcell, the robot executes all tasks from the fastest to the slowest. If the human enters the scene later, the remaining tasks are those the robot performs more slowly than the previous ones.
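As a worked illustration of the duration index, the sketch below assumes that the index is the ratio between the average execution time in the previous shift and the nominal duration; task names and times are hypothetical. Tasks are ordered from the lowest cost (executed first) to the highest. Normalizing by the maximum value would not change this ordering, since it divides all values by the same positive constant.

```python
from typing import Dict, List


def duration_costs(avg_time: Dict[str, float],
                   nominal_time: Dict[str, float]) -> Dict[str, float]:
    """Duration index per task, assuming average time / nominal time."""
    return {task: avg_time[task] / nominal_time[task] for task in avg_time}


def order_by_cost(costs: Dict[str, float]) -> List[str]:
    """Tasks sorted from lowest cost (planned first) to highest."""
    return sorted(costs, key=costs.get)


# Hypothetical times in seconds: previous-shift averages and nominal values.
AVG = {"press": 12.0, "laser_cutter": 30.0, "screwing": 9.0}
NOMINAL = {"press": 10.0, "laser_cutter": 20.0, "screwing": 10.0}
```

With these numbers the indices are 1.2, 1.5 and 0.9, so `order_by_cost(duration_costs(AVG, NOMINAL))` yields `['screwing', 'press', 'laser_cutter']`: the task completed faster than nominal comes first, and the slowest relative to its nominal time comes last.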
b) Ergonomics Index: A fundamental aspect of employing cobots in industry is the opportunity to improve worker ergonomics while maintaining productivity. Therefore, we want the robot to take on the heaviest and most uncomfortable tasks. Ergonomics indicators make it possible to assign to each task an execution-related cost that reflects the level of ergonomic risk for the worker in achieving that task. Different ergonomics indicators exist in the literature, from posture-related ones (RULA, REBA, OWAS) to more task-specific ones (OCRA, NIOSH, EAWS). In this work, we select an ergonomic risk assessment for manual material handling tasks, the Washington Industrial Safety and Health Act (WISHA) [28]. This index takes into account not only the weights of the objects but also the position of the task relative to the human (in terms of height and horizontal distance), the frequency of task execution in a day, the task duration, and the body twist angle required to achieve the task. The cost related to the ergonomics index is defined as
$$c^{e}_i = \frac{W^{u}_i\, A^{t}_i\, L}{m_i}, \tag{2}$$
where $W^{u}_i$ is the unadjusted weight limit (i.e., the weight limit that a worker can lift, not considering the twisting adjustment), which changes according to the vertical position and horizontal distance of the $i$-th task from the human, $A^{t}_i$ is the twisting adjustment for the $i$-th task, $L$ is the limit reduction multiplier that takes into account the frequency of repetition and the duration of the tasks in a day, and $m_i$ is the weight of the $i$-th task. With this definition, the tasks with the highest ergonomic risk for the worker have the lowest cost for the robot. For the sake of simplicity, in this paper we consider the horizontal distance, $A^{t}_i$ and $L$ as constant, hence not affecting the computed value.
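The sketch below assumes that the ergonomics cost is the adjusted WISHA weight limit (unadjusted limit times twisting adjustment times limit reduction multiplier) divided by the object weight, with a small constant added to the denominator. WISHA table lookups are not reproduced: the limits and multipliers are passed in as plain numbers, and all object names and values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

EPS = 1e-6  # small constant added to the denominator to avoid division by 0


@dataclass
class LiftTask:
    weight: float                  # object weight (same unit as the limits)
    unadjusted_limit: float        # read from the WISHA tables (input here)
    twist_adjustment: float = 1.0  # 1.0 means no twisting reduction
    limit_reduction: float = 1.0   # frequency/duration multiplier


def ergonomics_cost(task: LiftTask) -> float:
    """Adjusted weight limit over object weight: low cost = risky for the human."""
    adjusted_limit = (task.unadjusted_limit * task.twist_adjustment
                      * task.limit_reduction)
    return adjusted_limit / (task.weight + EPS)


def robot_preference(tasks: Dict[str, LiftTask]) -> List[str]:
    """Tasks ordered so that the riskiest lifts for the worker come first."""
    return sorted(tasks, key=lambda name: ergonomics_cost(tasks[name]))


# Hypothetical objects and limits (kg).
TASKS = {
    "stator": LiftTask(weight=10.0, unadjusted_limit=20.0),
    "gear_1": LiftTask(weight=2.0, unadjusted_limit=20.0),
    "rotor": LiftTask(weight=6.0, unadjusted_limit=15.0),
}
```

Here the costs are about 2.0 (stator), 2.5 (rotor) and 10.0 (gear_1), so `robot_preference(TASKS)` returns `['stator', 'rotor', 'gear_1']`: the heavy lift close to its limit is the one the robot takes first.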

c) Travel Distance Index:
We want to account for the relative distance between tasks and agents. For instance, the human is more likely to execute the tasks close to his current position than to start a new task far from him. For this reason, we consider a human travel cost. In addition, we evaluate, with lower priority, a robot travel cost. In this way, among the tasks far from the human, the robot picks the one closest to its current position. Moreover, this component allows optimizing the travel cost when the human is not present (and hence the human travel cost is 0). As travel costs, we simply consider the Euclidean distance between task and robot positions ($c^{r}_i$) and the inverse of the Euclidean distance between task and human ($c^{h}_i$):
$$c^{r}_i = d^{r}_i, \tag{3}$$
$$c^{h}_i = \frac{1}{d^{h}_i}, \tag{4}$$
where $d^{r}_i$ and $d^{h}_i$ are the distances of the robot and the human, respectively, from the $i$-th task. In this way, a task close to the human has a high cost, whereas a task close to the robot has a low cost.
d) Cost Function Design: To allow a meaningful comparison between the terms, each cost is normalized by dividing it by the maximum value of the same metric over all tasks, e.g., $\bar{c}^{d}_i = c^{d}_i / \max_j c^{d}_j$. Moreover, a small constant $\epsilon > 0$ is added to the denominators of $c^{e}_i$ and $c^{h}_i$ to avoid numerical issues. The total cost of each task is then computed as a weighted sum of the factors explained above. Hence, the total cost of the $i$-th task is defined as
$$c_i = w_d\, \bar{c}^{d}_i + w_e\, \bar{c}^{e}_i + w_r\, \bar{c}^{r}_i + w_h\, \bar{c}^{h}_i, \tag{5}$$
where $w_d$, $w_e$, $w_r$ and $w_h$ are the non-negative weights of the respective costs, which can be tuned to obtain the desired behavior. The cost weights can be set arbitrarily; in our scenario, we tuned them to obtain the desired robot behavior. For instance, the distance cost weights must satisfy the constraint
$$w_r + w_h = 1. \tag{6}$$
This condition reflects the fact that the human-task and robot-task distances are jointly normalized; if the human is not available, $w_r = 1$. Moreover, only if the human is present in the workcell, we assume $w_h > w_r$.
This additional condition reflects the fact that if the two agents have the same distance from a task, and other tasks are available, the robot will leave such task to the human.
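The following Python sketch puts the travel costs, the max-normalization and the weighted sum (5) together, and checks the distance-weight conditions around (6). It assumes 2D task positions, takes the robot travel cost as the robot-task distance and the human travel cost as the inverse human-task distance, treats a missing human as zero human travel cost, and uses a hypothetical layout; only the weight values 0.49 and 0.51 come from the text.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]
EPS = 1e-6  # small constant for the inverse human-task distance


def travel_costs(tasks: Dict[str, Point], robot: Point,
                 human: Optional[Point]) -> Dict[str, Dict[str, float]]:
    """Raw robot (distance) and human (inverse distance) travel costs per task."""
    c_r = {t: math.dist(p, robot) for t, p in tasks.items()}
    if human is None:
        c_h = {t: 0.0 for t in tasks}  # no human in the workcell
    else:
        c_h = {t: 1.0 / (math.dist(p, human) + EPS) for t, p in tasks.items()}
    return {"r": c_r, "h": c_h}


def normalize(costs: Dict[str, float]) -> Dict[str, float]:
    """Divide each value by the maximum of the same metric (all-zero stays zero)."""
    peak = max(costs.values())
    return {t: (c / peak if peak > 0 else 0.0) for t, c in costs.items()}


def total_costs(raw: Dict[str, Dict[str, float]],
                weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted sum (5) of the normalized metrics; missing weights count as 0."""
    norm = {metric: normalize(values) for metric, values in raw.items()}
    tasks = next(iter(raw.values()))
    return {t: sum(weights.get(m, 0.0) * norm[m][t] for m in norm) for t in tasks}


def distance_weights_ok(w_r: float, w_h: float, human_present: bool) -> bool:
    """Constraint (6) plus the presence-dependent conditions from the text."""
    if not math.isclose(w_r + w_h, 1.0):
        return False
    return w_h > w_r if human_present else w_r == 1.0


# Hypothetical 2D layout (m): robot near the shelf, human near the table.
TASKS = {"shelf": (0.0, 0.0), "table": (5.0, 0.0)}
ROBOT, HUMAN = (1.0, 0.0), (6.0, 0.0)
WEIGHTS = {"r": 0.49, "h": 0.51}
```

With this layout, the shelf task (close to the robot, far from the human) gets a total cost of about 0.21, while the table task gets 1.0, so the robot would pick the shelf first. Duration and ergonomics costs could be added to the `raw` dictionary under further keys with their own weights.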

III. SIMULATION EXPERIMENTS
First, we test the proposed approach in a simulated scenario. We consider 4 different tasks: the manipulation of an object, with two different tools, a press
In the simulation, the robot's goal is to accomplish all tasks. The robot is not aware of the human's intentions and planned tasks; hence, it has to react to the worker's actions. In all the simulations, the object locations and the worker's actions are fixed: the worker always inserts the object in the laser cutter at the same instant. The BT of the simulation is depicted in Figure 3. All the tasks can be accomplished either by MOCA autonomously or in cooperation with the human. The Reactive Fallback continuously checks the condition while the action is being executed, in order to react if the worker decides to accomplish a task. The goal of the simulation is to verify the robot behavior according to the selected costs and weights. To do so, we first test the proposed indices individually and then combine them by means of (5).
b) Results: First, the robot considers only the task duration when choosing which task to accomplish. The costs of the tasks over time are shown in Figure 4. They are constant since the execution times do not change. The plan, in Figure 6a, is updated when the human accomplishes the task using the laser cutter. Note that, due to the normalization, the unit cost is always assigned to the most expensive task. The BT selects the tasks starting from the one performed most efficiently. Next, the weights are tuned to take into account only the ergonomics of the worker. The robot then plans starting from the riskiest task for the human, which is the one with the lowest cost for the cobot, as shown in Figure 5. In this case too, the plan, in Figure 6b, is updated only when the human achieves the task related to the laser cutter.
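To illustrate why a Reactive Fallback lets the robot react while an action is in progress, here is a small Python sketch: the condition (e.g., "task already done by the worker?") is re-evaluated at every tick before the action, and a running action is halted once the condition succeeds. The `halt()` method and the toy action are assumptions for illustration, loosely following the reactive control nodes described in [21]; this is not the authors' implementation.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    RUNNING = "RUNNING"


class Condition:
    def __init__(self, predicate: Callable[[], bool]) -> None:
        self.predicate = predicate

    def tick(self) -> Status:
        return Status.SUCCESS if self.predicate() else Status.FAILURE

    def halt(self) -> None:
        pass  # conditions are instantaneous: nothing to stop


class Action:
    """Hypothetical long-running robot action (e.g., a pick-and-place)."""

    def __init__(self) -> None:
        self.halted = False

    def tick(self) -> Status:
        self.halted = False
        return Status.RUNNING  # never finishes within this toy example

    def halt(self) -> None:
        self.halted = True  # a real action would stop the motion here


class ReactiveFallback:
    """Re-ticks every child from the first one at each tick.

    If an earlier child starts returning SUCCESS, the later, possibly
    still-running children are halted.
    """

    def __init__(self, children: List) -> None:
        self.children = children

    def tick(self) -> Status:
        for idx, child in enumerate(self.children):
            status = child.tick()
            if status != Status.FAILURE:
                for later in self.children[idx + 1:]:
                    later.halt()
                return status
        return Status.FAILURE


def demo():
    worker_did_task = {"value": False}
    robot_action = Action()
    node = ReactiveFallback([Condition(lambda: worker_did_task["value"]),
                             robot_action])
    first = node.tick()              # worker idle: the robot action runs
    worker_did_task["value"] = True  # e.g., the worker inserts the object
    second = node.tick()             # condition succeeds: robot action halted
    return first, second, robot_action.halted
```

`demo()` returns `(Status.RUNNING, Status.SUCCESS, True)`: as soon as the worker completes the task, the fallback succeeds through its condition and the robot's own attempt is stopped.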
Next, the weights are tuned to consider only the human and robot distances to the tasks. The costs are shown in Figure 7 and the plan is illustrated in Figure 6c. The weights are chosen as $w_h = 0.51$, $w_r = 0.49$ and $w_d = w_e = 0$. The cobot starts with the task farthest from the human and, at the same time, closest to its own position. The robot then updates the plan when the human achieves the manipulation task with the laser cutter. Finally, the combined cost case is considered. The chosen weights are $w_h = 0.51$, $w_r = 0.49$, $w_e = 0.6$ and $w_d = 0.35$. Since the human is present in the workcell, $w_h = 0.51$ and $w_r = 0.49$ are chosen to satisfy the conditions in (6), with the two weights kept close so as not to privilege either agent. Moreover, in our scenario, we want to favor worker ergonomics over execution time; for this reason, we select $w_e > w_d$. The costs over time are plotted in Figure 8. The plan, in Figure 6d, is not updated, since the laser cutter task, later achieved by the human, is planned last. Once the robot has executed the transportation of box 2, the plan ends. Note that tuning the weights differently generates different plans and hence different robot behaviors. Therefore, the design of the weights has a high impact on the cobot behavior, and each user can design them according to the index they want to prioritize.
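The weight choices of the simulations can be written down as a small configuration and checked against the design rules stated with (6). The weight values come from the text, while the dictionary layout, the metric keys (`d`, `e`, `r`, `h` for duration, ergonomics, robot distance and human distance) and the helper name are assumptions made for this sketch.

```python
import math
from typing import Dict

# Weight sets reported for the distance-only and combined simulations.
SIMULATION_WEIGHTS: Dict[str, Dict[str, float]] = {
    "distance_only": {"d": 0.0, "e": 0.0, "r": 0.49, "h": 0.51},
    "combined": {"d": 0.35, "e": 0.6, "r": 0.49, "h": 0.51},
}


def satisfies_design_rules(w: Dict[str, float], human_present: bool,
                           favor_ergonomics: bool = False) -> bool:
    """Check non-negativity, (6), the presence conditions and optionally w_e > w_d."""
    if any(value < 0 for value in w.values()):
        return False  # weights must be non-negative
    if not math.isclose(w["r"] + w["h"], 1.0):
        return False  # constraint (6)
    if human_present and not w["h"] > w["r"]:
        return False  # equally distant tasks are left to the human
    if not human_present and w["r"] != 1.0:
        return False  # human absent: only the robot distance matters
    if favor_ergonomics and not w["e"] > w["d"]:
        return False  # ergonomics favored over execution time
    return True
```

Both reported sets pass the checks with the human present, and the combined set also satisfies the ergonomics-over-duration preference; the same combined set would be rejected if the human were absent, since then $w_r$ must be 1.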

IV. EXPERIMENTS
The proposed approach is further validated in a proof-of-concept experiment with a fast-reconfigurable and flexible setup, inspired by the use case of the SOPHIA project.
Setup: A single cobot, MOCA, has to accomplish different tasks in sequence. MOCA is a robotics research platform consisting of a torque-controlled redundant manipulator mounted on top of a velocity-controlled mobile base. It is controlled by a Cartesian weighted whole-body impedance controller [7], [11] that executes smooth polynomial Cartesian trajectories. The considered task is the boxing of four parts of an electric motor, i.e., two gears, a rotor and a stator, with different dimensions and weights, located in different places in space. These 4 spots represent the end of the production line, where objects of the same type are manufactured, or different temporary containers where these objects are placed before being boxed and delivered. In our experiment, gear 1 and the stator are located on the shelf, and the rotor and gear 2 on the table. The experimental setup is depicted in Figure 9. To track the human pose and skeleton joint positions in real time, we placed an RGB-D camera (Intel RealSense D435i) at a fixed position in the workcell, running OpenPose [29], similarly to [11]. Thanks to the skeleton tracker, it was possible to detect human presence in the workcell and to automate the change of the cost weights. Moreover, the positions of the skeleton keypoints were used to calculate the task-related distances and to estimate the heights of the shoulder, waist and knee of the human, which are needed to compute the ergonomics index. The positions of the objects were considered known. To estimate the relative transformation between the objects and MOCA, we employed an OptiTrack motion capture system, which also made it possible to avoid large drifts in the robot's odometry-based localization. Moreover, we developed a web GUI that runs in the browser of the worker's cellphone. Through the web GUI, the worker could inform the system of the intention to achieve a specific task simply by pressing the task-related button (see Figure 9). The BT of the experiments is illustrated in Figure 10. To show that the same BT can handle different situations, such as cooperation and autonomous task execution, we consider two different scenarios. In the first scenario, no workers are present in the workcell and MOCA has to accomplish all the tasks autonomously.
Fig. 9: Experimental setup. The human uses a web GUI, pressing on the phone screen the button of the task he wants to accomplish. On the right of the image is a pole to which an RGB-D camera is attached to track the worker's skeleton. The objects are in their starting positions, except gear 2, whose initial location is on the table next to the rotor. The goal location is the box.
Then, we repeated the same experiment, asking a human subject to enter the workcell and place some of the objects, acknowledging each decision through the web GUI. A video of the cooperative experiment is available in the multimedia extension. In the experiments, we do not consider the task duration, since the scenario does not involve the repeated execution of the same task.
Results: In the first experiment, since no workers are present, the cost weights are set as $w_r = 1$ and $w_d = w_e = w_h = 0$. In this way, only the robot's distance to each task is considered. The time evolution of the costs is shown in Figure 11. Since the cobot starts next to the shelves, it picks gear 1 as the first task. Then, while MOCA moves towards the box with the grasped item, it detects that another task is now closer to it than the planned one. For this reason, the planned sequence of tasks is updated accordingly.
In the second experiment, the robot starts executing the tasks autonomously. In this situation, the weights are exactly the same as in the previous case ($w_r = 1$ and $w_d = w_e = w_h = 0$), while the initial robot position is changed. After the first object (gear 2) is grasped, the human enters the workcell. From then on, the costs also consider the human-related quantities, with new weights $w_h = 0.51$, $w_r = 0.49$, $w_e = 0.6$ and $w_d = 0$. Since, at the beginning, the human is not in the workspace, gear 2 is the closest object to MOCA and hence the one with the lowest cost (Figure 12). Then, while the cobot is placing gear 2 in the box, the human is detected (dash-dot magenta line). As a result, the costs change instantaneously and the plan, depicted in Figure 13b, is updated. At this moment, the task with the lowest cost is the stator, which is the farthest from the human and, at the same time, the nearest to the robot and the one with the largest ergonomic risk for the worker. Then, the worker decides to place the rotor (the task closest to him). To acknowledge this decision, the worker selects the task in the web GUI (this moment is illustrated in Figure 9). Once the button is pressed, the decision is communicated to the BT and the plan is updated online, leaving the selected task to the human. As can be noticed in Figure 13b, the rotor and gear 1 tasks are subsequently accomplished by the human. We would like to highlight that both experiments were conducted using a single BT. This shows that the presented method allows encoding different robot behaviors, depending on the considered cost metrics and weights. Moreover, the human-centric planning strategy adapts the robot behavior to the human's actions and intentions, with the robot taking over the unpleasant tasks and those riskiest for human ergonomics.
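The online behavior in the second experiment can be summarized by two operations: switching weight sets when the worker is detected, and dropping a task from the robot plan once the worker claims it through the web GUI. The sketch below assumes that costs are already normalized per metric; the weight values are those reported above, while the task costs are hypothetical and the function names are illustrative.

```python
from typing import Dict, List, Set

# Weight sets reported for the experiments (task duration not considered).
WEIGHTS_ALONE = {"d": 0.0, "e": 0.0, "r": 1.0, "h": 0.0}
WEIGHTS_WITH_HUMAN = {"d": 0.0, "e": 0.6, "r": 0.49, "h": 0.51}


def select_weights(human_detected: bool) -> Dict[str, float]:
    """Switch weight sets when the skeleton tracker detects the worker."""
    return WEIGHTS_WITH_HUMAN if human_detected else WEIGHTS_ALONE


def plan(normalized: Dict[str, Dict[str, float]], weights: Dict[str, float],
         claimed_by_human: Set[str]) -> List[str]:
    """Order the robot's remaining tasks by the total cost (5).

    Tasks the worker claimed through the GUI are removed from the robot plan.
    """
    tasks = [t for t in next(iter(normalized.values()))
             if t not in claimed_by_human]
    total = {t: sum(weights[m] * normalized[m][t] for m in normalized)
             for t in tasks}
    return sorted(tasks, key=total.get)


# Hypothetical normalized costs per metric (not measured values).
NORMALIZED = {
    "d": {"stator": 0.0, "rotor": 0.0, "gear_1": 0.0},
    "e": {"stator": 0.4, "rotor": 0.7, "gear_1": 1.0},
    "r": {"stator": 0.5, "rotor": 0.2, "gear_1": 1.0},
    "h": {"stator": 0.3, "rotor": 1.0, "gear_1": 0.6},
}
```

With these numbers, the robot alone would go for the rotor first (`['rotor', 'stator', 'gear_1']`); once the worker is detected, the order becomes `['stator', 'rotor', 'gear_1']`, and after the worker claims the rotor in the GUI the robot plan shrinks to `['stator', 'gear_1']`.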

V. DISCUSSION AND CONCLUSION
In this work, we proposed a novel human-aware task planner that takes advantage of the Behavior Trees paradigm. The approach enables the robot to plan the execution of tasks online, obtaining different robot behaviors depending on the user's choice. The developed methods allow costs to be considered in the selection of the task to execute. Moreover, we presented three different metrics, suitable for manufacturing environments, to compute the cost values. Therefore, the robot can plan while adapting to dynamic changes of the environment and, especially, to human intentions, motion, decisions and availability. The same structure allows considering different levels of engagement between robots and humans: coexistence, cooperation and even autonomous task execution. The presented results show the high potential of the developed methods for improving robot reactivity and flexibility while, at the same time, considering human motion, decisions and ergonomics. Future work will focus on extending the approach to multi-robot and multi-human teams and on merging robot task planning with an interactive task allocator.