Ant Colony Optimization with Heuristic Repair for the Dynamic Vehicle Routing Problem

Ant colony optimization (ACO) algorithms have proved to be suitable for solving dynamic optimization problems. The intrinsic characteristics of ACO algorithms enable them to transfer knowledge from past optimized environments via their pheromone trails to shorten the optimization process in the current environment. In this work, change-related information is also utilized when a dynamic change occurs. The dynamic vehicle routing problem is addressed where nodes are removed, representing customers that have already been visited, or added, representing customers that placed a new order and need to be visited. This change-related information is used to heuristically repair the solution of the previous environment, based on effective moves of the unstringing and stringing operator. Experimental results show that utilizing change-related information is beneficial in the generated dynamic test cases.


I. INTRODUCTION
Ant colony optimization (ACO) algorithms have proved to be powerful problem-solving tools. They are able to provide the optimal (or near-optimal) solution for difficult vehicle routing problems (VRPs) [1], [2]. Traditionally, researchers have focused their attention on static optimization problems, where the environment of the problem remains fixed during the optimization process of an algorithm. However, many real-world applications are subject to dynamic environments. Dynamic optimization problems (DOPs) are challenging, since the aim of an algorithm is not only to locate the optimum of the problem quickly, but also to efficiently track the moving optimum when changes occur [3]. A dynamic change may involve factors such as the objective function, input variables, problem instance, and constraints.
This work has been supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No. 739551 (KIOS CoE) and from the Government of the Republic of Cyprus through the Directorate General for European Programmes, Coordination and Development.
A simple way to address DOPs is to restart the optimization process of an algorithm whenever a dynamic change occurs. However, this strategy is usually used in case the dynamic changes are severe. On the contrary, when dynamic changes are small to medium, it is more efficient to adapt to the changing environment by transferring past knowledge since the new environment will be in some sense related to the previous one. ACO is a good choice in adapting to dynamic changes because it naturally implements a memory structure via the pheromone trails, allowing ACO to remember and transfer the past knowledge [4].
Furthermore, previous works on dynamic versions of the well-known traveling salesperson problem (TSP) proved that a dynamic change also contains information that could be useful in the optimization process of the newly generated environment. Guntsch and Middendorf [5] utilized the location of the dynamic changes to locally repair the pheromone trails of ACO. Later on, the same authors utilized change-related information to repair previous infeasible solutions [6].
In this work, change-related information is utilized for the dynamic VRP (DVRP), where nodes are inserted (i.e., representing orders from new customers) and removed (i.e., representing already visited customers). Suppose that new orders have arrived and need to be served, causing a dynamic change to the current solution, but the vehicles have already left the depot to serve the already scheduled orders. A new feasible solution that includes the new orders and omits the already served orders is required. Therefore, change-related information is used to repair the previous solution, which becomes infeasible by the insertion and/or removal of nodes, when a dynamic change occurs. The Unstringing and Stringing (US) [7] moves are used to heuristically repair the solution. In particular, the unstringing moves are used to remove the affected nodes from the solution, whereas the stringing moves are used to insert the new nodes in the solution. Although the US heuristic was designed for TSP solutions, in this work we extend it to VRP solutions, which is the main contribution of this paper.
The rest of the paper is organized as follows. Section II describes the VRP and the construction of the dynamic test cases used in the experiments. Section III describes the application of one of the best variations of ACO, i.e., the MAX-MIN Ant System (MMAS) [8], to the DVRP. The core logic of repairing the solution heuristically when dynamic changes occur is also described. Section IV gives the experimental results and analysis. Finally, Section V concludes this paper.
978-1-7281-2547-3/20/$31.00 ©2020 IEEE

A. Problem Formulation
The VRP is a challenging NP-hard combinatorial optimization problem [9]. The problem can be described as follows: given a fleet of vehicles with limited cargo load capacity, we need to find the best possible route for each vehicle, starting and ending at the central depot, while satisfying the delivery demands of a set of customers.
Typically, a VRP instance is modeled by a fully connected weighted graph $G = (N, A)$, where $N = \{1, \ldots, n\} \cup \{0\}$ is a set of $n + 1$ nodes and $A = \{(i, j) \mid i, j \in N, i \neq j\}$ is a set of arcs connecting these nodes. A non-negative value $w_{ij} \in \mathbb{R}^{+}$ is associated with each arc, representing the Euclidean distance between nodes $i$ and $j$. Node 0 denotes the central depot whereas the remaining nodes denote the customers. Each customer $i \in N$ is assigned a positive value $\delta_i$ indicating the customer's delivery demand. Each vehicle has a maximal cargo capacity $C$.
Let $x_{ij}$ denote the binary decision variables with the following interpretation:
$$x_{ij} = \begin{cases} 1, & \text{if a vehicle visits node } j \text{ immediately after node } i \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$
Then, the VRP objective and constraints are defined by Eqs. (2)-(7), where Eq. (2) defines the objective to minimize the total distance traveled, Eq. (3) ensures that the vehicle's cargo capacity constraint is satisfied, Eq. (4) ensures that if a vehicle visits a customer it also leaves the customer, Eq. (5) requires that all customers are visited once, Eq. (6) ensures subtour elimination, and finally, Eq. (7) is the aforementioned binary decision variable.
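As a rough illustration of what the objective and the main constraints amount to for a concrete solution, the following sketch evaluates a set of routes. The representation (each route as a list of node indices starting and ending at depot 0, `coords` and `demand` as dictionaries) and the function names are assumptions made for this example only, not part of the formulation above.

```python
from math import hypot

def route_distance(route, coords):
    """Total Euclidean length of one route given as [0, ..., 0]."""
    return sum(
        hypot(coords[a][0] - coords[b][0], coords[a][1] - coords[b][1])
        for a, b in zip(route, route[1:])
    )

def evaluate(routes, coords, demand, capacity):
    """Return (total_distance, feasible) for a list of depot-to-depot routes."""
    visited = [c for r in routes for c in r[1:-1]]
    customers = set(coords) - {0}
    # every customer appears exactly once over all routes
    once = len(visited) == len(set(visited)) and set(visited) == customers
    # every route starts and ends at the depot
    closed = all(r[0] == 0 and r[-1] == 0 for r in routes)
    # the load of each route respects the vehicle capacity
    within = all(sum(demand[c] for c in r[1:-1]) <= capacity for r in routes)
    total = sum(route_distance(r, coords) for r in routes)
    return total, once and closed and within
```

The total distance corresponds to the minimization objective, while the three boolean checks mirror the visit-once, depot-to-depot, and capacity requirements described in the text.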

B. Generating Dynamic Test Environments
A DVRP was introduced in [10], [11] in which customers are revealed in different time slices, and a DVRP was considered in [12], [13] in which the weights of arcs connecting the customers increase/decrease. For more details, a comprehensive survey of the DVRP is available in [14].
In this work, we adopt the dynamic framework recently proposed in [4] as follows. Every VRP instance consists of a weight matrix that contains all the weights associated with the arcs of the corresponding graph $G$. In order to generate dynamic test cases, the weight matrix of the problem is subject to changes according to Eq. (8), where $W(\cdot)$ is the weight matrix and $T$ is the environmental period index, which is synchronized with the algorithm during the optimization process. Therefore, the environmental period index is defined as $T = t/f$, where $f$ is the frequency of change and $t$ is the evaluation counter of the algorithm.
The key idea is to replace nodes from the current working node set $N_{in}(T)$, where $N_{in}(0) = N$, with newly introduced nodes drawn from another node set $N_{out}(T)$. The latter node set $N_{out}(T)$ is initially generated with $n$ new random nodes in the range of the $N$ set. A dynamic change occurs as follows. Every $f$ evaluations, exactly $mn$ nodes are randomly selected from $N_{out}(T)$ to replace exactly $mn$ randomly selected nodes from $N_{in}(T)$, where $m \in (0, 1]$ denotes the magnitude of change. The higher the value of $m$, the more nodes will be replaced. In this way, the weight matrix will be affected because the weights on the arcs connecting the nodes that have been replaced will be modified. Note that the introduced dynamic changes are synchronized with the optimization process of the algorithm. Hence, the parameter $f$ is expressed in algorithmic evaluations.
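The node replacement described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `apply_change` is hypothetical, $mn$ is rounded to the nearest integer, the depot is not part of either list, and replaced nodes are moved back into the outside pool (the text does not state what happens to them).

```python
import random

def apply_change(n_in, n_out, m, rng):
    """Swap round(m * n) randomly chosen customers between n_in and n_out.

    Returns the new (n_in, n_out) lists plus the (removed, added) pairs,
    i.e. the change-related information a repair step could use.
    """
    k = max(1, round(m * len(n_in)))
    leaving = rng.sample(range(len(n_in)), k)    # positions in the working set
    entering = rng.sample(range(len(n_out)), k)  # positions in the outside pool
    new_in, new_out = list(n_in), list(n_out)
    pairs = []
    for a, b in zip(leaving, entering):
        pairs.append((n_in[a], n_out[b]))
        # assumption: the removed node returns to the pool of outside nodes
        new_in[a], new_out[b] = n_out[b], n_in[a]
    return new_in, new_out, pairs
```

A caller would invoke this once every $f$ evaluations and then recompute the weights of the arcs that touch the exchanged nodes.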
Such a dynamic change to the node components will also cause a change to the weight matrix defined in Eq. (8) and, thus, it may affect the algorithm's output: the best output before a change may not be the best (or even feasible) after the change. Real-world applications that encompass the aforementioned dynamic change can be found in many fields, including transportation. For example, the visiting locations may change (e.g., removal of nodes denoting customers already served and addition of nodes denoting the arrival of new customer orders). Suppose that new customer orders have arrived and need to be served, causing a dynamic change to the current solution, but the vehicles have already left the depot to serve the already scheduled customer orders. A new feasible solution that includes the new orders and omits the already served orders is required.

III. ACO FOR DVRP
The MMAS [8] variant, which is one of the most-studied ACO variants, is used. In this section, we describe the application of MMAS to the DVRP, including the proposed method to heuristically repair solutions when dynamic changes occur.

A. Initialization
A colony of $\omega$ ants is initially positioned at the central depot, i.e., component 0. All the solution components of the problem are associated with a pheromone trail value, which is uniformly initialized at the start of the execution as follows:
$$\tau_{ij} \leftarrow \tau_0, \quad \forall (i, j),$$
where $\tau_{ij}$ is the pheromone trail value associated with arc $(i, j)$ connecting solution components $i$ and $j$, and $\tau_0$ is the initial pheromone trail value. A good value for $\tau_0$ was found to be $1/(\rho C^{nn})$, where $\rho$ is the evaporation rate (more details are provided later on) and $C^{nn}$ is the solution quality of the solution generated by the nearest-neighbor heuristic [8].
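The initial value $\tau_0$ can be computed along the following lines. The capacity-aware nearest-neighbor rule used here (drive to the closest customer that still fits, otherwise return to the depot) is an assumption for illustration, as are the function names and the list-of-lists distance matrix; the text does not specify how the heuristic handles capacity.

```python
def nearest_neighbor_cost(dist, demand, capacity):
    """Cost of a greedy solution: always drive to the closest customer that still fits."""
    unvisited = set(range(1, len(dist)))
    cost, current, load = 0.0, 0, 0
    while unvisited:
        fits = [j for j in unvisited if load + demand[j] <= capacity]
        if not fits:
            if current == 0:
                raise ValueError("a customer demand exceeds the vehicle capacity")
            cost += dist[current][0]  # return to the depot and start a new route
            current, load = 0, 0
            continue
        nxt = min(fits, key=lambda j: dist[current][j])
        cost += dist[current][nxt]
        load += demand[nxt]
        unvisited.remove(nxt)
        current = nxt
    return cost + dist[current][0]  # close the last route

def initial_pheromone(dist, demand, capacity, rho):
    """tau_0 = 1 / (rho * C_nn), with C_nn taken from the greedy solution above."""
    return 1.0 / (rho * nearest_neighbor_cost(dist, demand, capacity))
```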

B. Solution Construction
Each ant $k$ represents a complete VRP solution $T^k$ (i.e., the routes of all vehicles) and makes selections biased by the existing pheromone trails and some heuristic information associated with the solution components of the problem, until all the customers are selected.
The probability distribution with which ant $k$ selects the next customer component $j$ from solution component $i$ is defined as follows:
$$p_{ij}^{k} = \frac{[\tau_{ij}]^{\alpha} [\eta_{ij}]^{\beta}}{\sum_{l \in N_i^k} [\tau_{il}]^{\alpha} [\eta_{il}]^{\beta}}, \quad \text{if } j \in N_i^k,$$
where $\tau_{ij}$ and $\eta_{ij} = 1/w_{ij}(T)$ are, respectively, the existing pheromone trail and the heuristic information available a priori between components $i$ and $j$. Parameters $\alpha$ and $\beta$ determine the relative influence of $\tau_{ij}$ and $\eta_{ij}$, respectively, while $N_i^k$ is the set of unselected customer components for the $k$-th ant adjacent to component $i$. Each customer included in $N_i^k$ must satisfy the vehicle capacity constraint defined in Eq. (3). Whenever the $N_i^k$ set is empty, there is no unvisited customer that can be visited (e.g., because the capacity constraint would be violated). Subsequently, the depot component is added to the VRP solution to close the route of the vehicle (i.e., it denotes the return of the vehicle to the central depot). Note that the depot is never included in the $N_i^k$ set.
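One construction step of this decision rule might look as follows. The function name, the argument layout, and the use of indexable `tau` and `eta` matrices are assumptions for this sketch; only the proportional selection rule and the return to the depot when no customer fits come from the text.

```python
import random

def next_customer(i, unvisited, load, demand, capacity, tau, eta, alpha, beta, rng):
    """Pick the next customer from i, or 0 (depot) if none fits the vehicle."""
    # candidate set: unvisited customers that still fit in the vehicle
    feasible = [j for j in unvisited if load + demand[j] <= capacity]
    if not feasible:
        return 0  # close the current route at the depot
    # unnormalized weights tau^alpha * eta^beta; choices() normalizes them
    weights = [(tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in feasible]
    return rng.choices(feasible, weights=weights, k=1)[0]
```

An ant would call this repeatedly, resetting `load` to zero whenever the depot is returned, until `unvisited` is empty.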

C. Pheromone Update
In the MMAS variant [8], [15], the pheromone trails are updated by first decreasing the pheromone trails on all arcs (using pheromone evaporation), and then increasing the pheromone trails on the arcs of the solution constructed by the best ant (using pheromone deposit). The pheromone evaporation is applied as follows:
$$\tau_{ij} \leftarrow (1 - \rho)\,\tau_{ij}, \quad \forall (i, j),$$
where $\rho \in (0, 1]$ is the evaporation rate. After pheromone evaporation, the best ant deposits pheromone on the arcs of its solution components as follows:
$$\tau_{ij} \leftarrow \tau_{ij} + \Delta\tau_{ij}^{best}, \quad \forall (i, j) \in T^{best},$$
where $\Delta\tau_{ij}^{best} = 1/C^{best}(t)$ is the amount of pheromone that the best ant deposits and $C^{best}(t)$ is the quality of the best solution $T^{best}$ at time $t$. The "best" ant that is allowed to deposit pheromone may be either the best-so-far ant, in which case $C^{best}(t) = C^{bs}(t)$, or the iteration-best ant, in which case $C^{best}(t) = C^{ib}(t)$. These two ants are allowed to deposit pheromone in an alternating way. More precisely, the iteration-best ant deposits pheromone at each iteration and the best-so-far ant deposits pheromone occasionally (i.e., every 25 iterations in this work; more details are provided in [15]).
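The MMAS variant explicitly imposes lower and upper limits on the pheromone trail values, preventing in this way the excessive growth of pheromone trails on the arcs of the best ant, as follows:
$$\tau_{ij} \leftarrow \begin{cases} \tau_{max}, & \text{if } \tau_{ij} > \tau_{max} \\ \tau_{min}, & \text{if } \tau_{ij} < \tau_{min} \\ \tau_{ij}, & \text{otherwise,} \end{cases}$$
where $\tau_{min}$ and $\tau_{max}$ are, respectively, the minimum and maximum pheromone trails.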
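The three update steps (evaporation, deposit by the best ant, and trail limits) can be combined in one routine. This is a minimal sketch: the function name and the list-of-lists `tau` are assumptions, and the deposit is applied in both directions on the assumption that the instance is symmetric.

```python
def update_pheromone(tau, best_routes, best_cost, rho, tau_min, tau_max):
    """Evaporate all trails, deposit on the best solution's arcs, then clamp."""
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= (1.0 - rho)  # evaporation on every arc
    deposit = 1.0 / best_cost
    # collect undirected edges once so that a one-customer route
    # such as [0, 1, 0] does not receive the deposit twice
    edges = {frozenset((a, b)) for r in best_routes for a, b in zip(r, r[1:])}
    for edge in edges:
        a, b = tuple(edge)
        tau[a][b] += deposit
        tau[b][a] += deposit  # assumes symmetric distances
    for i in range(n):
        for j in range(n):
            tau[i][j] = min(tau_max, max(tau_min, tau[i][j]))  # trail limits
    return tau
```

Which ant supplies `best_routes` and `best_cost` (iteration-best or best-so-far) is left to the caller, following the alternation described in the text.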

D. Heuristic Repair When a Dynamic Change Occurs
ACO algorithms are able to use knowledge from previous environments via their pheromone trails and can be applied directly to DOPs without any modifications [16], [17]. For example, when the changing environments are similar, the pheromone trails of the previous environment may provide knowledge to speed up the optimization process in the new environment. However, the algorithm must be flexible enough to accept the knowledge transferred from the pheromone trails, or eliminate the pheromone trails, in order to better adapt to the new environment. When a dynamic change occurs, evaporation gradually eliminates the pheromone trails of the previous environment that were built up around the old optimum and helps ants to explore areas around the new optimum. If the changing environments are completely different, pheromone re-initialization may be a better choice than transferring the knowledge from previous pheromone trails [6], [16], [17].
In this work, we also transfer change-related information [4] (e.g., the nodes to be replaced) to the optimization process of the algorithm. The removal (or unstringing) and insertion (or stringing) moves of the US heuristic [7] are used to heuristically repair the current best-so-far solution generated by ACO when dynamic changes occur. In particular, the unstringing moves are used to remove the affected nodes from the solution, whereas the stringing moves are used to insert the new nodes in the solution.
The US heuristic was initially designed for the TSP, whereas in this work we extend it to the VRP. Since one route of a VRP solution is in fact a TSP solution (i.e., a Hamiltonian path starting and ending at the central depot), a segmentation phase is applied to the VRP solution initially to separate the vehicle routes as presented in Fig. 1. In the example of Fig. 1, four vehicle routes are extracted from the VRP solution.
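The segmentation phase can be sketched as below, assuming (for illustration only) that a VRP solution is stored as one flat list in which every return to the depot is written as node 0; the function name `segment` is hypothetical.

```python
def segment(solution):
    """Split a flat VRP solution such as [0, 3, 1, 0, 2, 0] into depot-to-depot routes."""
    routes, current = [], [0]
    for node in solution[1:]:
        current.append(node)
        if node == 0:
            if len(current) > 2:  # skip empty depot-to-depot segments
                routes.append(current)
            current = [0]
    return routes
```

Each returned route can then be treated as a small TSP tour by the repair moves.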
The new node will be added in the same route where the removed node existed to avoid violating the capacity constraint of the route. The overall procedure of the heuristic repair is presented in Algorithm 1. In the following, a description of the unstringing and stringing moves applied to replace nodes for the DVRP is provided.
1) Removal of Nodes: The main feature of the US heuristic is that the re-insertion of nodes occurs between non-adjacent nodes, resulting in a tour where both nodes become adjacent to the node being inserted [7], [18]. Suppose that we wish to insert $V_x$ between any two nodes $V_i$ and $V_j$. For a given orientation of a tour, consider node $V_k$ in the subtour from $V_j$ to $V_i$, and node $V_l$ in the subtour from $V_i$ to $V_j$. We also consider, for any node $V_h$ on the tour, node $V_{h+1}$ (successor) and node $V_{h-1}$ (predecessor). The re-insertion of $V_x$ between $V_i$ and $V_j$ can be done in several ways using different types of insertions and removals.
In [7], [19] two types of removals [i.e., Type I in Fig. 2(a) and Type II in Fig. 2(b)] for the unstringing procedure have been proposed. The unstringing procedure removes a given node from the tour and repairs the connections with the remaining nodes in order to have a closed tour. The procedure [...]

Algorithm 1 HeuristicRepair($T^{bs}$)
 1: INPUT: $T^{bs}$ % current best-so-far VRP solution
 2: Apply segmentation in $T^{bs}$
 3: for (each subtour $S_i$ in $T^{bs}$) do
 4:   if ($S_i$ contains a node to be replaced) then
 5:     while (no further improvement in $S_i$) do
 6:       for (each node $j$ in $S_i$) do
 7:         Calculate cost of all Type I and Type II removals
 8:         $S'_i$ ← Apply best removal move to $S_i$
 9:         Calculate cost of all Type I and Type II insertions
10:         $S'_i$ ← Apply best insertion move to $S'_i$
11:         if ($S'_i$ is better than $S_i$) then
    [...]

• Type I removal: [...] Also, the subtours $(V_{i+1}, \ldots, V_k)$ and $(V_{k+1}, \ldots, V_j)$ are reversed.
• Type II removal: Assume that $V_j$ belongs to the neighborhood of $V_{i+1}$, $V_k$ belongs to the neighborhood of $V_{i-1}$, with $V_k$ being part of the subtour $(V_{j+1}, \ldots, V_{i-2})$, and $V_l$ belongs to the neighborhood of $V_{k+1}$, with $V_l$ being part of the subtour $(V_j, \ldots, V_{k-1})$. The removal of node $V_i$ results in the deletion of arcs [...] and $(V_l, V_{k+1})$. As above, the subtours $(V_{i+1}, \ldots, V_{j-1})$ and $(V_{l+1}, \ldots, V_k)$ are reversed.
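The overall shape of a per-route repair (remove the affected node, then insert the new node where it is cheapest, within the same route) can be sketched with much simpler moves. The code below is explicitly a simplified stand-in: it uses plain removal and cheapest insertion between adjacent nodes, not the Type I and Type II unstringing and stringing moves, and the function names and distance-matrix layout are assumptions.

```python
def tour_cost(route, dist):
    """Length of a depot-to-depot route under the distance matrix dist."""
    return sum(dist[a][b] for a, b in zip(route, route[1:]))

def repair_route(route, old, new, dist):
    """Drop `old` from a depot-to-depot route and reinsert `new` at the cheapest position.

    Simplified stand-in: plain removal and cheapest insertion between adjacent
    nodes, not the Type I/II moves of the US heuristic.
    """
    reduced = [v for v in route if v != old]
    best = None
    for pos in range(1, len(reduced)):  # never insert before the starting depot
        candidate = reduced[:pos] + [new] + reduced[pos:]
        cost = tour_cost(candidate, dist)
        if best is None or cost < best[0]:
            best = (cost, candidate)
    return best[1]
```

Because the new node stays in the route of the node it replaces, the number of customers per route is unchanged, which matches the design choice stated in the text.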

A. Experimental Setup
In the experiments, we investigate the effect of repairing the best solution heuristically, utilizing change-related information when dynamic changes occur. The MMAS with the proposed heuristic repair (denoted MMAS+H) is compared against the MMAS with complete re-initialization of pheromone trails when dynamic changes occur (denoted MMAS+R), and the MMAS with simple repair in which the inserted nodes are placed in the positions of the removed nodes (denoted MMAS+S).
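For contrast with the heuristic repair, the simple repair used by MMAS+S amounts to a positional substitution. A minimal sketch, assuming a flat solution list and a hypothetical `replacements` mapping from each removed node to the node that takes its place:

```python
def simple_repair(solution, replacements):
    """Put each new node exactly where the node it replaces used to be."""
    # assumption: the depot (node 0) never appears as a key in `replacements`
    return [replacements.get(node, node) for node in solution]
```

No route is re-optimized here, which is the difference the comparison against MMAS+H is meant to expose.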
All MMAS algorithmic parameters were set to commonly used values: α = 1, β = 5, ρ = 0.8, and the number of ants was set to ω = 50. Dynamic test cases are generated from six static benchmark instances [22] obtained from CVRPLIB: X-n101-k25, X-n143-k7, X-n219-k73, X-n313-k71, X-n429-k61 and X-n561-k42, using the dynamic generator described in Section II. The frequency of change f was set to 10e4 algorithmic evaluations and the magnitude of change m was set to 0.1, 0.25, 0.5, and 0.75, indicating small to medium to severe dynamic changes. In total, four DVRP test cases were constructed from each static instance to systematically analyze the performance of the algorithms. For each algorithm on a DVRP, 30 independent runs were executed on the same set of random seed numbers. For each run, 50 environmental changes were allowed and an observation (i.e., the value of the best-so-far ant after a dynamic change) was recorded. For a fair comparison, all the algorithms performed the same number of evaluations. The proportional evaluations required when applying the moves for the heuristic repair used in MMAS+H are added to the total evaluations of the algorithm between the dynamic changes.
The offline performance [23] was used to evaluate the overall performance of the algorithms, which is defined as:
$$\bar{P}_{offline} = \frac{1}{E} \sum_{t=1}^{E} C^{bs}(t),$$
where $E$ is the total number of evaluations and $C^{bs}(t)$ is the best-so-far solution quality after a change.
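The measure can be computed from a trace of solution costs as sketched below. The function name, the per-evaluation cost list, and the rule that the best-so-far value restarts at every change (rather than continuing from a repaired solution) are assumptions made for this example.

```python
def offline_performance(costs_per_eval, change_every):
    """Average, over all evaluations, of the best cost found since the last change."""
    total, best = 0.0, float("inf")
    for t, cost in enumerate(costs_per_eval):
        if t % change_every == 0:
            best = float("inf")  # best-so-far is reset when the environment changes
        best = min(best, cost)
        total += best
    return total / len(costs_per_eval)
```

Since the DVRP is a minimization problem, lower values indicate better performance.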

B. Experimental Results and Discussion
The experimental results regarding the offline performance of the investigated algorithms for all DVRPs are presented in Table I. The corresponding statistical results are presented in Table II, in which pairwise Mann-Whitney statistical tests with a significance level of 0.05 were performed. In Table II, the results are shown as "+", "−" and "∼" when the first algorithm is significantly better than the second one, when the second algorithm is significantly better than the first one, and when the two algorithms are not significantly different, respectively. In Figs. 4 and 5, the dynamic average offline performance of MMAS+H, MMAS+S, and MMAS+R is plotted against the algorithmic iterations for the last ten dynamic changes, to better understand the behavior of the algorithms on the X-n219-k73 and X-n561-k42 problem instances, respectively. From the experimental results, the following observations can be drawn.
First, MMAS+H significantly outperforms MMAS+S in most DVRP cases (except for the X-n219-k73 problem instance). These results were expected because MMAS+H replaces a node in vehicle routes and at the same time optimizes the route using the heuristic moves. In contrast, MMAS+S replaces the nodes in the positions of the removed nodes. This observation confirms that proper utilization of the change-related information further improves the performance of the algorithm. For example, in Fig. 5 it can be observed that MMAS+H obtains better offline performance than the competing algorithms in most dynamic changes.
Second, MMAS+H significantly outperforms MMAS+R in most DVRP cases (except for the X-n219-k73 problem instance in which MMAS+R significantly outperforms MMAS+H). These results confirm that transferring knowledge from previous environments shortens the re-optimization process compared to the case where the re-optimization process restarts from scratch (it can also be observed in Fig. 5). However, the comparisons between MMAS+S and MMAS+R show that improper utilization of change-related information may have a negative impact on the performance of the algorithm.
Finally, a possible reason for the inferior performance of MMAS+H on the X-n219-k73 problem instance is that a solution for this instance consists of many vehicle routes (i.e., 73). Considering the size of the problem (i.e., 219), it is possible that vehicle routes of small size are formed that cannot be optimized by the heuristic moves. Therefore, the replacement procedure will be similar to the one used in MMAS+S. In fact, from Fig. 4 it can be observed that all three algorithms have similar dynamic performance on the X-n219-k73 problem instance.

V. CONCLUSIONS
In this work, we utilize change-related information to improve the solution quality of ACO when a dynamic change occurs. The DVRP is used in which nodes are inserted and removed. The unstringing and stringing moves are used to heuristically repair the best solution from the previous environment. The performance of ACO with the proposed heuristic repair is investigated on dynamic test cases of the DVRP that are systematically constructed. The experimental results confirm the positive effect on the performance of ACO when utilizing change-related information.
For future work, it would be interesting to investigate more effective ways of utilizing change-related information, e.g., with additional removal and insertion moves.