Towards an Intelligent Collaborative Robotic System for Mixed Case Palletizing

In this paper, a novel human-robot collaborative framework for mixed case palletizing is presented. The framework addresses several challenges: the detection and localisation of boxes and pallets through visual perception algorithms, the high-level optimisation of the collaborative effort through effective role-allocation principles, and the maximisation of packing density. A graphical user interface (GUI) is additionally developed to ensure an intuitive allocation of roles and the optimal placement of the boxes on target pallets. The framework is evaluated in two conditions where humans operate with and without the support of a MObile Collaborative robotic Assistant (MOCA). The results show that the optimised placement can reduce the execution time by up to 20% with respect to a manual execution of the same task, and reveal the high potential of MOCA in increasing the performance of collaborative palletizing tasks.


I. INTRODUCTION
The e-commerce revolution, whose growth in Europe is estimated at 10% every year [1], has placed significant demands on supply chain performance. In response to the increasing demand, several industrial sectors have automated the processes of goods storage and shipment. In particular, recent robotic palletizing solutions have considerably increased process performance in terms of throughput (products per hour) and time (running products for longer periods), and have helped to realise the benefits of e-commerce. In the food and beverage industry, for instance, palletizing similar package types using heavy robotic arms that can lift up to one thousand kilos has become a benchmark for quality and high performance [2].
In high-mix environments such as warehouses, however, not all the processes are automated and human presence is still a requirement. Heavy manipulators cannot populate these environments, since they are not safe for humans and, to avoid potential injuries, need to be enclosed in proper fences and equipped with reliable safety monitoring systems. Besides, in distribution centres, a wide variety of articles must be placed in boxes and then palletized, which demands a certain level of manipulation flexibility. In such situations, redundant lightweight robotic arms (cobots), while dealing with different case sizes and weights, can ensure human safety thanks to their ability to regulate interaction forces both with humans [3] and with the environment [4]. More details on the application of cobots in industrial environments can be found in [5], while a review of recent developments in robotized and autonomous warehouses is presented by Azadeh et al. [6]. Thanks to their flexibility, cobots can be exploited for different tasks and easily taught with different learning techniques. Nevertheless, they lack mobility, which is a fundamental requirement for warehouse operations.
A possible solution to address the mobility requirements is to attach the robot base to a mechanism that can roll or slide along rails. This solution restricts the motion of the robot base to the rails, which makes the construction of such warehouses space-inefficient and costly. Other flexible solutions such as humanoid robots, whose kinematic structure resembles that of humans, offer a high potential in dealing with the variability of tasks in warehouses. However, their performance is still not satisfactory, especially when it comes to executing logistics tasks that demand simultaneous locomotion and manipulation.
A very promising solution towards addressing the problem of mobility has been the introduction of Autonomous Ground Vehicles (AGVs), i.e., wheeled mobile robots. AGVs can achieve repetitive and monotonous transportation of goods and increase process efficiency, while, at the same time, sharing their workspace with humans [7]. More advanced and recent forms of the AGVs have been developed to fulfil the manipulation requirements. Such platforms, due to robust loco-manipulation capabilities, have the potential to navigate in flat indoor environments using planning algorithms [8], avoid obstacles [9], pick goods from shelves, conveyors, etc., and place them in pallets and pushcarts, autonomously or in collaboration with humans [10].
In this direction, the goal of this work is to evaluate the potential of a MObile Collaborative robotic Assistant (MOCA) in autonomous palletizing of mixed-size and mixed-weight box containers (see Fig. 1). We propose a novel loco-manipulation framework for logistic applications, which enables MOCA to navigate in free space and, using an embedded RGB-D camera, to scan all the boxes on a conveyor that must be palletized. After recording the sizes and the weights (through markers) of all boxes, an optimisation algorithm is activated to maximise the number of boxes that can be sorted on the detected pallet space. Subsequently, a role allocation algorithm is activated to designate the carrying and sorting actions to MOCA, in autonomous mode, or in collaboration with a human partner. The latter mode is activated when dealing with large-sized boxes, where single-arm manipulation poses a limit.
The control of MOCA loco-manipulation in different phases of the task is achieved by a weighted whole-body impedance controller, which is also capable of assigning larger movements to the arm or to the mobile base based on the task requirements/constraints. The algorithm gives higher mobility to arm movements in close-proximity reaching/sorting actions, and enables larger base mobility when navigating in free spaces.
Two different experiments were performed to evaluate the proposed framework. The first aims to prove that the guidance of the GUI, which displays the solution of the optimised mixed case palletizing, improves performance in terms of time and throughput. The second is a proof-of-concept collaborative mixed case palletizing, executed by a human subject with MOCA as a coworker.

II. MODULAR FRAMEWORK FOR THE COLLABORATIVE MIXED CASE PALLETIZING
We aim to create a modular framework that enables and optimises collaborative mixed case palletizing. The aspects it covers range from the allocation of roles between the agents (humans and robots) to the exploration of the environment, the detection of relevant features from the perception sensors, and the planning of (collision-free) motions in space.
The framework consists of three main modules (see Fig. 2), listed here by their operating frequency, from the slowest to the fastest. First, a task allocation algorithm, depending on the throughput of the system, computes, for each box, its destination (pose in the pallet) and the agent in charge of the placement. The placement modes considered in this work are: (MODE A) MOCA places the box in the pallet; (MODE B) the human operator may choose to place the box in the pallet, e.g., for lightweight boxes, to speed up the process, otherwise MODE A takes over; (MODE C) MOCA and the human operator place the box in the pallet collaboratively, sharing the load (e.g., for large boxes where dual-arm manipulation is a requirement, which cannot be achieved by MOCA alone and might pose health risks to human workers if handled individually). Second, the visual perception system uses the RGB-D data from the camera to estimate the poses of the boxes on the conveyor and of the pallet in the area, expressed in the frame of MOCA. Finally, the motion handler is in charge of generating feasible paths and sending spatial references to the controller. The controller is a weighted whole-body Cartesian impedance controller, designed to regulate the behaviour of the mobile platform with respect to the end-effector task. The joint-space weights of the controller are selected to obtain, alternately, higher flexibility of the arm in close-proximity reaching/sorting operations and larger mobile base mobility when navigating in free spaces. The controller settings, such as the impedance parameters and the weights for the priorities, can be changed online. During each operation, the modules are always active, but their services and actions are triggered by different Finite State Machine (FSM) requests.

A. MOCA: Platform and Control
In this section, the hardware and the prioritised whole-body impedance control algorithm of the MObile Collaborative robotic Assistant are presented. MOCA [11] is a research platform designed for physical human-robot collaboration (HRC), with loco-manipulation capabilities that make it potentially suitable for logistics and flexible manufacturing. It is composed of a lightweight torque-controlled 7-DoFs Franka Emika Panda robotic arm, equipped with the underactuated Pisa/IIT SoftHand, which is mounted on top of a velocity-controlled 3-DoFs Robotnik SUMMIT-XL STEEL mobile platform. An ASUS Xtion Pro Live RGB-D camera supported by a pole is also attached to the mobile base.
In Wu et al. [11], we presented the first attempt to deal with the control framework of MOCA. The loco-manipulation capabilities were addressed using two different control modes: the whole-body manipulation mode, which features a whole-body Cartesian impedance controller for the manipulation tasks, and the locomotion mode, which consists of a Cartesian impedance controller on the arm decoupled from an admittance controller on the base. The latter mode was implemented to activate only the base mobility while navigating between two distant points in the environment, since the whole-body mode would otherwise have generated unnecessary arm movements.
In this paper, we exploit a similar whole-body control strategy for MOCA, since the targeted logistics tasks involve multiple contacts with humans and the external environment. However, the decoupling of the arm and the base movements in locomotion phases is not practical anymore, since the collaborative palletizing actions require simultaneous arm and mobile base movements and interactions. In addition, different phases of such logistics tasks require higher arm or mobile base mobility while maintaining contact at hand, to ensure their successful execution.
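In order to fully exploit the redundancy provided by MOCA, we designed a two-level priority impedance controller by solving a weighted inverse dynamics problem. The whole-body decoupled dynamics of MOCA can be written as the parallel of the admittance controller of an m-DoFs mobile base and the dynamics of an n-DoFs torque-controlled arm [11]:
$$\begin{pmatrix} M_{adm} & 0 \\ 0 & M_r \end{pmatrix} \begin{pmatrix} \ddot{q}_v \\ \ddot{q}_r \end{pmatrix} + \begin{pmatrix} D_{adm}\,\dot{q}_v \\ C_r \end{pmatrix} + \begin{pmatrix} 0 \\ g_r \end{pmatrix} = \begin{pmatrix} \tau_v^{vir} \\ \tau_r \end{pmatrix} + \begin{pmatrix} \tau_v^{ext} \\ \tau_r^{ext} \end{pmatrix} \qquad (1)$$
where $M_{adm} \in \mathbb{R}^{m \times m}$ and $D_{adm} \in \mathbb{R}^{m \times m}$ are the virtual inertia and virtual damping, $\dot{q}_v \in \mathbb{R}^m$ is the input velocity sent to the mobile platform, and $\tau_v^{ext} \in \mathbb{R}^m$ and $\tau_v^{vir} \in \mathbb{R}^m$ are the external and the virtual torque. Concerning the manipulator, $q_r \in \mathbb{R}^n$ is the joint angles vector, $M_r \in \mathbb{R}^{n \times n}$ is the symmetric and positive definite inertia matrix of the arm, $C_r \in \mathbb{R}^n$ is the Coriolis and centrifugal force, $g_r \in \mathbb{R}^n$ is the gravity vector, and $\tau_r \in \mathbb{R}^n$ and $\tau_r^{ext} \in \mathbb{R}^n$ are the commanded torque vector and the external torque vector, respectively.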
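The admittance part of this model maps the virtual and external torques acting on the base to the velocity command through the virtual inertia and damping. The following is a minimal sketch of one integration step, assuming the standard admittance law M_adm * dv/dt + D_adm * v = tau_vir + tau_ext, diagonal virtual inertia and damping, and explicit Euler integration. The function name, the numerical values and the integration scheme are illustrative assumptions and are not taken from the MOCA implementation.

```python
from typing import List, Sequence


def admittance_step(
    velocity: Sequence[float],
    torque_virtual: Sequence[float],
    torque_external: Sequence[float],
    inertia_diag: Sequence[float],
    damping_diag: Sequence[float],
    dt: float,
) -> List[float]:
    """One explicit-Euler step of M_adm * dv/dt + D_adm * v = tau_vir + tau_ext.

    Assumes diagonal M_adm and D_adm (an illustrative simplification), so each
    base degree of freedom is integrated independently.
    """
    next_velocity = []
    for v, t_vir, t_ext, m, d in zip(
        velocity, torque_virtual, torque_external, inertia_diag, damping_diag
    ):
        acceleration = (t_vir + t_ext - d * v) / m
        next_velocity.append(v + dt * acceleration)
    return next_velocity


if __name__ == "__main__":
    # Hypothetical values: a constant virtual force along x, no external force.
    v = [0.0, 0.0, 0.0]
    for _ in range(5000):  # 5 s at 1 kHz
        v = admittance_step(
            v, [10.0, 0.0, 0.0], [0.0, 0.0, 0.0],
            [20.0, 20.0, 5.0], [50.0, 50.0, 10.0], dt=0.001,
        )
    print(v)  # the x velocity approaches tau / damping
```

With a constant input, the commanded velocity converges to the ratio between the applied torque and the virtual damping, while the virtual inertia sets how quickly it gets there.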
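The two-level priority impedance torque controller is inspired by the work by Mingo Hoffman et al. [12]. For the sake of simplicity, the dependencies on $q$ and $x$ will be dropped from now on. Let us consider the problem of finding the closest input torques $\tau$ to some desired $\tau_0$ that realise the operational forces $F$:
$$\min_{\tau}\ \frac{1}{2}\,\|\tau - \tau_0\|_W^2 \quad \text{s.t.} \quad \bar{J}^T \tau = F, \qquad (2)$$
where $\|\cdot\|_W$ is the norm weighted by the matrix $W$, $M$ is the whole-body joint-space inertia matrix, $J$ is the whole-body Jacobian, $\Lambda = (J M^{-1} J^T)^{-1}$ is the Cartesian inertia, $\bar{J} = M^{-1} J^T \Lambda$ is the dynamically consistent pseudo-inverse of $J$, and the constraint $\bar{J}^T \tau = F$ is the general relationship between the generalised joint torques and the operational forces [13]. The set of solutions of problem (2) can be found using the method of Lagrange multipliers. Differentiating the Lagrangian function
$$\mathcal{L}(\tau, \lambda) = \frac{1}{2}\,(\tau - \tau_0)^T W (\tau - \tau_0) + \lambda^T (F - \bar{J}^T \tau) \qquad (3)$$
leads to
$$\tau = W^{-1} M^{-1} J^T \Lambda_W \Lambda^{-1} F + \left(I - W^{-1} M^{-1} J^T \Lambda_W J M^{-1}\right) \tau_0, \qquad (4)$$
where $\Lambda_W = (J M^{-1} W^{-1} M^{-1} J^T)^{-1}$ is the weighted Cartesian inertia, analogous to the Cartesian inertia $\Lambda$. In Eq. (4) it is possible to recognise the two tasks in the controller: the Cartesian force task $F$, with higher priority, and the joint-space torque $\tau_0$, projected in the null space of the first task through a dynamically consistent null-space projector. The formulation in (4) contains the prioritised tasks, but the input torques of the controller also need to compensate for the terms that are not present in (2), like the gravity and Coriolis/centrifugal terms $g$ and $C$:
$$\tau = W^{-1} M^{-1} J^T \Lambda_W \Lambda^{-1} F + \left(I - W^{-1} M^{-1} J^T \Lambda_W J M^{-1}\right) \tau_0 + C + g. \qquad (5)$$
In order to obtain the desired impedance behaviour, we need to define $F$ and $\tau_0$ properly. $F$ is computed according to the following relationship:
$$F = K_d\,\tilde{x} + D_d\,\dot{\tilde{x}}, \qquad (6)$$
where $\tilde{x} = x_d - x \in \mathbb{R}^6$ is the Cartesian error computed with respect to the desired Cartesian pose $x_d$, and $K_d \in \mathbb{R}^{6 \times 6}$ and $D_d \in \mathbb{R}^{6 \times 6}$ are the desired Cartesian stiffness and damping matrices, respectively. The desired joint-space impedance behaviour $\tau_0 \in \mathbb{R}^{m+n}$ is computed according to [14]:
$$\tau_0 = K_0\,\tilde{q} + D_0\,\dot{\tilde{q}}, \qquad (7)$$
where $\tilde{q} = q_0 - q \in \mathbb{R}^{m+n}$ is the joint position error computed with respect to the reference joint position $q_0$, and $K_0 \in \mathbb{R}^{(m+n) \times (m+n)}$ and $D_0 \in \mathbb{R}^{(m+n) \times (m+n)}$ are the desired joint-space stiffness and damping. We used a weight matrix $W$ of the form
$$W = H^T M^{-1} H, \qquad (8)$$
where $H \in \mathbb{R}^{(m+n) \times (m+n)}$ is the tunable positive semidefinite weight matrix of the controller.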
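In particular, in this paper, we select $H$ as a positive definite diagonal matrix, dynamically selected depending on the task, of the form
$$H = \mathrm{diag}(\eta_{v1}, \dots, \eta_{vm}, \eta_{r1}, \dots, \eta_{rn}), \qquad (9)$$
where $\eta_{vi}$ and $\eta_{rj}$ are the weights associated with the mobile base and with the arm joints, respectively. The diagonal elements are tuned to fit each particular task. For example, in manipulation tasks $\eta_{vi} > \eta_{rj}$, while, during locomotion, $\eta_{vi} < \eta_{rj}$. Notably, if $H = I$, then $W = M^{-1}$ and Eq. (4) simplifies to the well-known result [15]:
$$\tau = J^T F + \left(I - J^T \Lambda J M^{-1}\right) \tau_0, \qquad (10)$$
with $\Lambda = (J M^{-1} J^T)^{-1}$ the Cartesian inertia. In other words, the control algorithm proposed by Wu et al. [11] is the solution of problem (2) weighted by the inverse of the joint-space inertia matrix.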
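For reference, the closed-form solution of the weighted problem can be obtained in a few lines. The derivation below is a sketch that assumes a symmetric positive definite $W$, a symmetric inertia matrix $M$ and a full-row-rank Jacobian $J$. The symbol $A$ for the constraint matrix is introduced here only for compactness and does not appear in the original formulation.

```latex
\begin{aligned}
&\min_{\tau}\ \tfrac{1}{2}(\tau-\tau_0)^{T} W (\tau-\tau_0)
  \quad \text{s.t.} \quad A\tau = F,
  \qquad A = \bar{J}^{T} = \Lambda J M^{-1},
  \quad \Lambda = (J M^{-1} J^{T})^{-1} \\
&\text{stationarity in } \tau:\quad
  W(\tau-\tau_0) = A^{T}\lambda
  \ \Rightarrow\ \tau = \tau_0 + W^{-1}A^{T}\lambda \\
&\text{constraint:}\quad
  A\tau = F
  \ \Rightarrow\ \lambda = (A W^{-1} A^{T})^{-1}(F - A\tau_0) \\
&A W^{-1} A^{T}
  = \Lambda \,(J M^{-1} W^{-1} M^{-1} J^{T})\, \Lambda
  = \Lambda\, \Lambda_W^{-1} \Lambda
  \ \Rightarrow\ (A W^{-1} A^{T})^{-1} = \Lambda^{-1}\Lambda_W\Lambda^{-1} \\
&\tau = W^{-1} M^{-1} J^{T} \Lambda_W \Lambda^{-1} F
  + \left(I - W^{-1} M^{-1} J^{T} \Lambda_W J M^{-1}\right)\tau_0 \\
&W = M^{-1}\ \Rightarrow\ \Lambda_W = \Lambda,
  \qquad \tau = J^{T}F + \left(I - J^{T}\Lambda J M^{-1}\right)\tau_0
\end{aligned}
```

The last line recovers the classical dynamically consistent formulation, which is why choosing the identity for the weight matrix $H$ reproduces the earlier whole-body controller.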

B. Vision Module
Although warehouses can be considered structured environments, human presence increases the variability of the scenario. Moreover, collaborative robots, for safety reasons, must integrate robust perception systems. For instance, random box positions on conveyors and imprecise pallet locations are problems that have to be considered in the framework. For this reason, a robust and reliable visual feedback is designed to increase the robot state awareness and the accuracy in the pick and place task. In this section, the vision module, in charge of the detection of boxes and pallets, is presented.
Box Detection: For the depicted task, it was necessary to detect the pose of several boxes on a conveyor with a high level of accuracy. These boxes have to be grasped precisely from the top and placed on the pallet close to one another; the precision in the detection of position and orientation affects the computation of the grasping point, and therefore the consistent placement of boxes on the pallet. The current literature proposes several techniques for general object recognition in the scene: for example, in [16] a learning-based method has been implemented for pose estimation of various heterogeneous objects, while in [17] the author takes advantage of a template matching approach for the same purpose. In this work, ArUco marker detection was used, relying on the robustness and speed of the marker pose computation, as reported in [18] and [19]. Moreover, since the role allocation algorithm requires information about box weights and dimensions, such information can be easily integrated into a list, where each box is identified by its marker ID. This approach is promising since, in logistic scenarios, boxes are already equipped with markers or bar codes that specify weight, dimensions, address, etc. The ArUco marker detection has been performed by integrating the aruco_detect ROS package in the module, using the images provided by the ASUS Xtion Pro Live camera mounted on MOCA. Once the ArUco marker pose is estimated (Fig. 3(a)), the grasping location is computed through a rigid transformation based on prior information about the box dimensions.
Pallet Detection: The second perception requirement consists of the detection and the pose estimation of pallets. In theory, placing ArUco markers on the pallet could also solve this problem. In general, however, pallets are not equipped with such artificial tags and are subject to wear over time. In any case, different features can be exploited for the detection. We implemented a detection algorithm based on the pallet geometrical dimensions. This algorithm processes the point cloud acquired from the Orbbec Astra depth camera integrated on the Robotnik platform. As reported in [20], in order to perform the pallet recognition, the point cloud is first pre-processed using a pass-through filter implemented in the Point Cloud Library (PCL). This removes all the points which lie outside a certain region of interest, defined as $d \le d_{threshold}$, where $d_{threshold}$ is a parameter which defines the extent of the region from the ground. The detection is performed by looking at the long side of the pallet and exploiting its particular structure: the pallet is characterised by the presence of three equidistant wooden blocks on its side, which are very distinctive. The algorithm uses a region growing segmentation, which merges the points that belong to the same smooth surface. This approach returns several clusters; to extract the ones corresponding to the wooden blocks, the procedure searches for three clusters whose centroids lie on the same line and at a certain distance from one another. Once these clusters are found, the pallet is detected. Then, a frame is placed at the centre of the top surface: its pose can be obtained through a rigid transformation from the central wooden block, based on the pallet dimensions (Fig. 3(b)).
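The rigid transformation from the marker pose to the grasping location can be sketched as follows. The sketch assumes, purely for illustration, that the marker sits at a known corner of the top face with its x and y axes aligned with the box edges, so that the grasp point (the centre of the top face) is a fixed offset in the marker frame. The marker placement, the function names and the numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple

Vector = Tuple[float, ...]


def transform_point(
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
    point: Sequence[float],
) -> Vector:
    """Apply the rigid transformation p_out = R * p + t (R is a 3x3 matrix)."""
    return tuple(
        sum(rotation[i][k] * point[k] for k in range(3)) + translation[i]
        for i in range(3)
    )


def grasp_point_from_marker(
    marker_rotation: Sequence[Sequence[float]],
    marker_position: Sequence[float],
    box_length: float,
    box_width: float,
) -> Vector:
    """Centre of the top face, expressed in the frame the marker pose is given in.

    Assumption: the marker origin is a corner of the top face and its x/y axes
    run along the box length/width, so the centre is at (L/2, W/2, 0).
    """
    offset_in_marker_frame = (box_length / 2.0, box_width / 2.0, 0.0)
    return transform_point(marker_rotation, marker_position, offset_in_marker_frame)


if __name__ == "__main__":
    # Marker rotated by 90 degrees about z with respect to the robot frame.
    rotation = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    position = (1.0, 2.0, 0.5)
    print(grasp_point_from_marker(rotation, position, box_length=0.4, box_width=0.2))
```

Because the box dimensions are looked up from the marker ID, the same function can serve every box on the conveyor once the offset convention is fixed.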
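The final step of the pallet detection, the search for three collinear and equidistant cluster centroids, can be illustrated with a short sketch. It assumes that the segmentation has already produced a list of 3D centroids; the block spacing, the tolerance and the function names are hypothetical parameters chosen for the example, not values from the paper.

```python
from itertools import combinations
from math import dist, sqrt
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def distance_to_line(point: Point, line_start: Point, line_end: Point) -> float:
    """Perpendicular distance of a point from the line through two other points."""
    ax, ay, az = (point[i] - line_start[i] for i in range(3))
    bx, by, bz = (line_end[i] - line_start[i] for i in range(3))
    cross = (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)
    return sqrt(sum(c * c for c in cross)) / dist(line_start, line_end)


def find_pallet_blocks(
    centroids: Sequence[Point], spacing: float, tolerance: float = 0.03
) -> Optional[Tuple[Point, Point, Point]]:
    """Return (end, middle, end) centroids that look like the three pallet blocks.

    A triple qualifies when the middle centroid is at the expected spacing from
    both ends and lies on the line joining them, within the given tolerance.
    """
    for triple in combinations(centroids, 3):
        for i in range(3):  # try each centroid as the central block
            middle = triple[i]
            first, last = triple[(i + 1) % 3], triple[(i + 2) % 3]
            equidistant = (
                abs(dist(first, middle) - spacing) <= tolerance
                and abs(dist(middle, last) - spacing) <= tolerance
            )
            if not equidistant or dist(first, last) <= spacing:
                continue  # wrong spacing, or the triple folds back on itself
            if distance_to_line(middle, first, last) <= tolerance:
                return first, middle, last
    return None


if __name__ == "__main__":
    clusters = [(0.0, 0.0, 0.05), (0.5, 0.0, 0.05), (1.0, 0.0, 0.05), (0.3, 0.8, 0.05)]
    print(find_pallet_blocks(clusters, spacing=0.5))
```

The middle centroid of the returned triple plays the role of the central wooden block, from which the pallet frame is then obtained by a fixed rigid transformation.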

C. Task Optimisation
In this section, we discuss how the framework intends to improve the execution of mixed case palletizing tasks in terms of time and packing density, and to decrease heavy lifting requirements for human workers. The mixed case palletizing problem is NP-complete, and the stacking rules, based on heuristics, must be adapted to each customer, taking into account all the system constraints. Optimising the packing of goods in this way has several benefits: the packing density is maximised (90% with respect to the average 70%-80% of manually packed pallets), while the number of pallets needed for the same volume of units is reduced. Usually, the solution is pre-computed offline and consists of the pick sequence of the packing units and their locations in the pallet, where large, heavy products are placed at the bottom of the pallet and small, lightweight products on top. During the package placement design process, a small gap between the packages is added as a safety tolerance to avoid potential collisions. In the literature, the mixed case palletizing problem is formulated as the well-known Bin Packing problem [21]-[23], whose goal is to orthogonally pack a set of rectangular-shaped items into the minimum number of identical containers (bins). Each item can be rotated by 90 degrees about each axis. The theoretical formulation of the 3D bin packing problem and its application to the pallet loading problem, including technological constraints and the stability of the load on the pallet, is presented by Terno et al. [22]. Since most 3D packing heuristics are built upon 2D ones, in this paper we focus on solving the 2D problem. To evaluate the packing in different scenarios, we enabled our framework to run three different Bin Packing algorithms: the Guillotine algorithm, the Maximal Rectangles algorithm and the Skyline algorithm. These algorithms differ mainly in the strategy used to split the remaining free space.
A complete description of the implementation and the computational complexity of the algorithms can be found in [24]. Once the Bin Packing algorithm returns the desired box placement, a suitable policy for the role allocation problem must be designed. Introducing human-robot collaboration in such mixed case palletizing tasks, as a matter of fact, raises the problem of systematically designating a suitable agent to perform each box placement in the pallet. To design the role allocation algorithm, we assign each task according to the capabilities of each agent. This concept was introduced by the authors in [25]: robots are exploited for repetitive and hard tasks, while humans are employed for monitoring and solving complex tasks. Among the three metrics proposed there, namely task complexity, agent dexterity and agent effort, only the last one is addressed to allocate the pick and place tasks considered in this work. The limitations for the role allocation are imposed by the size and weight of the boxes. We assume that the boxes weigh less than 25 kg (the value imposed by law as the maximum weight that can be lifted by a male worker). On the other hand, the maximum load that MOCA can carry depends on the payload of the mounted robotic arm. Thus, the algorithm negotiates three different modes (MODE A, B, and C, as explained above) for achieving the task. [...] periodically the GUI and to acknowledge the execution of the task. To do that, and also to achieve a greater level of cooperation between the agents by minimising waiting times, mobile devices or AR headsets [25], [26] could be exploited.
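To make the 2D formulation more concrete, the following is a minimal sketch of a bottom-left Skyline heuristic for a single pallet layer, seen from above. It is not the implementation used in the framework (which relies on the algorithms described in the cited reference); the class and function names, the largest-footprint-first ordering and the tie-breaking rule are assumptions made for the example, and the safety gap between boxes is omitted.

```python
from typing import List, Optional, Sequence, Tuple

EPS = 1e-9
Segment = Tuple[float, float, float]           # (x_start, level, width)
Placement = Tuple[float, float, float, float]  # (x, y, size_x, size_y)


class SkylinePacker:
    """Bottom-left Skyline heuristic for a single rectangular pallet layer."""

    def __init__(self, size_x: float, size_y: float) -> None:
        self.size_x, self.size_y = size_x, size_y
        self.skyline: List[Segment] = [(0.0, 0.0, size_x)]

    def _resting_level(self, index: int, w: float) -> Optional[float]:
        """Level reached by a box of width w whose left edge starts at a segment."""
        x = self.skyline[index][0]
        if x + w > self.size_x + EPS:
            return None  # the box would stick out of the pallet
        level, covered = 0.0, 0.0
        while covered < w - EPS and index < len(self.skyline):
            level = max(level, self.skyline[index][1])
            covered += self.skyline[index][2]
            index += 1
        return level

    def _best_position(self, w: float, h: float) -> Optional[Tuple[float, float]]:
        """Lowest, then left-most, feasible (level, x) for a w-by-h box."""
        best = None
        for index, (x, _, _) in enumerate(self.skyline):
            level = self._resting_level(index, w)
            if level is None or level + h > self.size_y + EPS:
                continue
            if best is None or (level, x) < best:
                best = (level, x)
        return best

    def _raise_skyline(self, x: float, w: float, top: float) -> None:
        """Replace the part of the skyline under the new box by its top edge."""
        end, updated = x + w, [(x, top, w)]
        for seg_x, seg_level, seg_w in self.skyline:
            seg_end = seg_x + seg_w
            if seg_end <= x + EPS or seg_x >= end - EPS:
                updated.append((seg_x, seg_level, seg_w))  # not under the box
                continue
            if seg_x < x - EPS:
                updated.append((seg_x, seg_level, x - seg_x))  # left remainder
            if seg_end > end + EPS:
                updated.append((end, seg_level, seg_end - end))  # right remainder
        self.skyline = sorted(updated)

    def place(self, w: float, h: float) -> Optional[Placement]:
        """Place one box, trying both 90-degree orientations; None if it does not fit."""
        best = None
        for cw, ch in ((w, h), (h, w)):
            position = self._best_position(cw, ch)
            if position is not None and (best is None or position < best[0]):
                best = (position, cw, ch)
        if best is None:
            return None
        (level, x), cw, ch = best
        self._raise_skyline(x, cw, level + ch)
        return (x, level, cw, ch)


def pack_layer(boxes: Sequence[Tuple[float, float]], size_x: float, size_y: float):
    """Pack boxes (largest footprint first) and return placements and density."""
    packer = SkylinePacker(size_x, size_y)
    ordered = sorted(boxes, key=lambda box: box[0] * box[1], reverse=True)
    placements = [packer.place(w, h) for w, h in ordered]
    used = sum(p[2] * p[3] for p in placements if p is not None)
    return placements, used / (size_x * size_y)


if __name__ == "__main__":
    layout, density = pack_layer(
        [(0.6, 0.4), (0.6, 0.4), (0.4, 0.4), (0.4, 0.4), (0.4, 0.4)], 1.2, 0.8)
    for placement in layout:
        print(placement)
    print(f"packing density: {density:.2f}")
```

The skyline stores only the upper contour of the boxes already placed, so any free space trapped below a box is lost. This is the kind of free-space bookkeeping in which the Skyline, Guillotine and Maximal Rectangles strategies differ.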
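The effort-based role allocation described above can be sketched as a simple rule set. The thresholds below (robot payload, largest side that can be handled with a single arm, and the weight under which the human may take over) are hypothetical values chosen for illustration, as is the choice of assigning boxes heavier than the robot payload to the collaborative mode; only the 25 kg upper bound and the three modes come from the text.

```python
from dataclasses import dataclass
from enum import Enum

MAX_BOX_WEIGHT_KG = 25.0  # upper bound assumed in the text for any box


class Mode(Enum):
    A = "MOCA places the box"
    B = "the human may place the box, otherwise MOCA does"
    C = "MOCA and the human place the box together"


@dataclass(frozen=True)
class Box:
    marker_id: int
    weight_kg: float
    longest_side_m: float


def allocate_mode(
    box: Box,
    robot_payload_kg: float = 3.0,       # hypothetical arm payload
    single_arm_max_side_m: float = 0.4,  # hypothetical size limit for one arm
    light_box_kg: float = 1.0,           # hypothetical "lightweight" threshold
) -> Mode:
    """Pick a placement mode from box size and weight (illustrative rules only)."""
    if box.weight_kg > MAX_BOX_WEIGHT_KG:
        raise ValueError(f"box {box.marker_id} exceeds the {MAX_BOX_WEIGHT_KG} kg limit")
    too_large = box.longest_side_m > single_arm_max_side_m
    too_heavy = box.weight_kg > robot_payload_kg
    if too_large or too_heavy:
        return Mode.C  # shared load between MOCA and the human
    if box.weight_kg <= light_box_kg:
        return Mode.B  # the human can speed things up if they wish
    return Mode.A


if __name__ == "__main__":
    for box in (Box(1, 0.5, 0.2), Box(2, 2.0, 0.3), Box(3, 2.0, 0.6)):
        print(box.marker_id, allocate_mode(box).name)
```

Keeping the thresholds as parameters makes it possible to adapt the same rules to a different arm payload or to different ergonomic limits without changing the logic.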

III. EXPERIMENTAL RESULTS
The proposed framework was validated with two different experiments, executed by 6 different naive subjects who were not aware of the scope of the experiments. In the first experiment, we asked the subjects to place 11 numbered boxes, of different sizes and weights, on the surface of a pallet, in such a way that all the boxes fit in the lower layer of the pallet. Later, with the same settings, we computed the box placement through the Skyline algorithm and asked the subjects to place the boxes according to the algorithm result, displayed in the GUI. We compared the results in terms of the time spent in completing the palletizing. The results (Fig. 5 (left)) show that using such an algorithm to help human workers improved their performance, saving up to 20% of the execution time.
The second experiment consists of a proof-of-concept collaborative mixed case palletizing. In this experiment, the subject and MOCA have to collaborate to achieve the pallet loading. Three boxes, each marked with an ArUco marker, were placed on top of a simulated conveyor. The pallet was placed close to the conveyor (Fig. 4). The placement is computed, as in the previous experiment, by the Skyline algorithm, using as input the total number of boxes that are supposed to be placed in that pallet. Moreover, the agent in charge of the placement is computed by the role allocation algorithm, which assigns one of the three modes (MODE A, MODE B, MODE C). The robot has just a rough idea of the pallet and conveyor locations and, because of considerable errors in the odometry, has to re-detect the pose of the pallet before each new placement. The boxes also have to be detected before every new placement since, in general, the number of boxes on the conveyor might change due to newly delivered boxes. Only the detected boxes appear in the GUI, with a different colour depending on the mode (red for MODE A, green for MODE B, and blue for MODE C). To enable MOCA to grasp boxes we planned to use a vacuum gripper but, due to a delay in the shipment, we had to opt for a temporary solution. We placed an electromagnet in the wrist of MOCA and a sheet made of ferromagnetic material on the upper surface of each box. In this way, we could emulate the functioning of the vacuum gripper: the electromagnet was activated in the box picking phase and deactivated in the box placement phase. Finally, the control parameters were designed. We selected a compliant behaviour, keeping a low-valued diagonal Cartesian stiffness (K x = K y = K z = 300, K roll = K pitch = K yaw = 30), except in the picking and placing phases, in which, to ensure higher precision, we increased the stiffness along the x and y axes up to 1000. The Cartesian damping was computed using the Double Diagonalisation design [14]. Fig. 6 depicts the main phases of the experiment. Finally, we asked the subjects to fill in a Likert scale-based questionnaire, to have a subjective evaluation of the experiments, approved by the ethics committee Azienda Sanitaria Locale Genovese (ASL) N.3 (Protocollo IIT HRII 001 (rif. interno:108/2018)). The questionnaire included 9 sentences. Q.1 The palletizing task was easy to perform; Q.2 It was physically tiresome to accomplish the palletizing; Q.3 It was psychologically tiresome to accomplish the palletizing; Q.4 The cognitive load to achieve the task was high; Q.5 Overall, I felt satisfied with the current task performance; Q.6 It was easy to understand where to place each box in the pallet; Q.7 I felt safe in performing the palletizing with the robot; Q.8 Overall, I think that using the current collaborative framework I could perform the same task for a longer duration and better quality time; Q.9 Given the current task performance, I think that collaborative robots do not help to improve such logistic tasks. The participants stated that, even though the complexity of the task remains acceptable (Q.1), the manual performance requires greater physical and psychological effort (Q.2-3), with a higher cognitive load (Q.4), than the smart collaborative approach. On the other hand, the proposed framework shows promising results in terms of performance and work quality (Q.8-9), leading to broader satisfaction with the proposed collaborative system (Q.5) (Fig. 5 (right)).

IV. CONCLUSION
In this work, we presented a novel human-robot collaborative approach to the mixed case palletizing problem. Several problems were addressed: the detection and localisation of boxes and pallets, achieved by the visual perception module; the task and role allocation algorithm, which computes the optimal location in the pallet and the agent in charge of the placement; and the design of a Cartesian impedance controller capable of changing the joint-space behaviour of the robot. The results demonstrated that such a framework, together with MOCA, has a high potential for improving the productivity and ergonomics of human workers, and represents a promising first step towards an intelligent collaborative robotic system for mixed case palletizing.