Ant Colony optimization for the Electric Vehicle Routing Problem

Ant colony optimization (ACO) algorithms have proved to be powerful tools to solve difficult optimization problems. In this paper, ACO is applied to the electric vehicle routing problem (EVRP). New challenges arise with the consideration of electric vehicles instead of conventional vehicles because their energy level is affected by several uncertain factors. Therefore, a feasible route of an electric vehicle (EV) has to consider visit(s) to recharging station(s) during its daily operation (if needed). A look ahead strategy is incorporated into the proposed ACO for EVRP (ACO-EVRP) that estimates whether at any time EVs have within their range a recharging station. From the simulation results on several benchmark problems it is shown that the proposed ACO-EVRP approach is able to output feasible routes, in terms of energy, for a fleet of EVs.


I. INTRODUCTION
Transportation has been the main contributor to CO 2 emissions. Due to global warming, pollution and climate changes, modern logistic companies such as FedEx, UPS, DHL and TNT have became more sensitive to the environment and they are looking for ways to reduce the CO 2 emissions of their daily operations. There is no doubt that using electric vehicles (EVs) instead of conventional vehicles will significantly contribute to the reduction of CO 2 emissions.
The conventional vehicle routing problem (VRP) can be described as follows: given a fleet of vehicles with a certain capacity, the objective is to find the shortest delivery route for each vehicle satisfying customers' demands starting from the central depot and returning to it. Exact [17], [22], [27], [28], heuristic [7], [8], [17], [18] and metaheuristics [14], [15] approaches were designed to solve the VRP. Since VRP is an N P-hard combinatorial problem, exact methods can only solve relatively small problem instances [13]. Therefore, heuristic and metaheuristic are more reliable and efficient since they trade optimality for efficiency.
Recently, the VRP with electric vehicles (EVRP) variation has attracted a lot of attention by the research community due to the extra constraints regarding energy and emissions. So far, the research on EVRP has focused on problem formulations for routing passenger cars (e.g., shortest routes from source to destination) [1], [16], problem formulations for charging coordination in recharging stations [29] and problem formulations for advanced logistics systems [4].
In this paper, an EVRP variation for an advanced logistic system to minimize the total operation time of a fleet of electric vans is formulated. There are also other objectives that can be considered such as operation costs [4] and energy efficiency [2]. The difference of the EVRP with the conventional VRP is that vehicles may require to visit a recharging station to address the driving anxiety of the limited range of EVs [16]. The recharging station may be visited by any vehicle multiple times (if required) or none (if not necessary). Although recharging is essential to generate a feasible route for EVs, it is time consuming. Depending on the technology of the recharging station and the EV's specifications (e.g., battery capacity and on-board charger) the recharging time varies from 1 to 12 hours for a full charge [10]. In addition, the EV has to travel extra distance to visit the recharging station. Hence, the recharging process of EVs will increase the operation time of companies (and possibly also their operation costs).
The ant colony optimization (ACO) metaheuristic [9] has state-of-the-art results on many variations of the conventional VRP [5], [6], [12], [20]. However, ACO applications focused on finding the shortest paths for commercial EVs [3]. In this paper, an algorithm based on ACO is proposed to find Hamiltonian paths for electric vans in an EVRP formulation. Based on the experimental results, the effectiveness of the proposed ACO to generate feasible routes in terms of energy for different EVRP scenarios is presented.
The rest of the paper is organized as follows. Section II describes the EVRP model used in this paper. Section III describes the proposed ACO for EVRP (ACO-EVRP) algorithm. Section IV presents the experimental results and analysis of ACO-EVRP. The effect of key algorithmic parameters is also analyzed. Finally, Section V concludes this paper with discussions on future work.

A. Problem Formulation
The EVRP is a variation of the conventional VRP and, thus, it follows a similar formulation as described in this section. An EVRP instance is modelled by a fully connected weighted graph G = (N ∪ F, L), where N = {1, . . . , n} is a set of n customers (nodes), F = {n + 1, . . . , n + s} ∪ {0} is a set of s energy recharging stations and a central depot 0 which is also a recharging station and L = {(i, j) | i, j ∈ N } is a set of links (arcs) connecting them.
Each arc (i, j) ∈ L is associated with non-negative value t ij = R + which represents the travel time between customers i and j and can be defined as: where d ij and s ij are the distance (km) and average speed (km/h) 1 associated with arc (i, j) ∈ L, respectively. Each customer i ∈ N is associated with a non-negative demand δ i of some goods that need to be delivered by a fleet of m vehicles as well as service time σ i to unload the goods from the vehicle. Note that σ i is proportional to the demand of the customer i ∈ N (e.g., the higher the demand, δ i , the more the service time required). The vehicle load of an EV, say k, on arc (i, j), is given be l k ij (i.e., 0 ≤ l k ij ≤ Q), where Q is the maximum vehicle capacity (which may be different for each EV). The maximum service time for each EV is J minutes (which is the same for all drivers of an EV and basically defines the maximum working hours of the drivers). Each recharging station i ∈ F is associated with a non-negative waiting time w i that represent possible waiting time. The battery recharging rate r in all charging stations is the same and is constant. Each EV has a battery level (i.e., e k i ) that determines the current energy when reaching customer or recharging station i ∈ N ∪ F which satisfies 0 ≤ e k i ≤ B C where B C is the maximum battery capacity. The charging time c k i of an EV at the charging station i ∈ F is determined by its current energy level e k i and the charging rate of the station such that c k i = (B C − e k i )/r assuming that the EV will be fully charged (i.e., to its maximum battery capacity B C ). Note that all EVs are fully charged and loaded when they leave the depot at the start of the day.
The objective of the problem is to find the minimum set of M = {1, . . . , m} EVs visiting all customers once and only once satisfying their demands, all starting from and ending at the depot, such that the total travel time (including driving time, and charging and waiting time at the energy recharging station(s)) is minimized. A complete EVRP solution π is defined by a permutation of nodes (both customers and energy recharging stations) and consists of the complete routes of all the EVs. An EVRP solution is evaluated as follows: 1 Note that the average speed is based on the speed limit of the roads. It may also be affected by other uncertainties such as traffic, road conditions, etc. x k 0j = m, where Eq. (2) defines the EVRP objective function (output in minutes), Eq. (3) ensures that each customer is visited exactly once, Eqs. (4) and (5) ensure that all EVs leave and return to the central depot, Eq. (6) ensures that when an EV visits a customer or a charging station it also leaves from it, Eq. (7) ensures that no EV is overloaded, Eq. (8) ensures that the maximum service time of an EV is respected, Eq. (9) ensures that the battery level is reduced by b k ij (described in more details in Eq. (15) later on) when visiting customer j from customer i, Eq. (10) ensures that there is enough energy to either return to the depot or visit a recharging station, Eq. (11) is a binary decision variable defined as follows: and Eq. (12) is another binary decision variable defined as follows:

B. Energy Consumption
The energy consumption model utilized to calculate the energy consumption of the k-th EV between customers i and j is defined as follows [19]: where a ij = a + g sin θ ij + gC R cos θ ij is an arc specific constant, z = 0.5C D AD is a vehicle specific constant, E F is the engine efficiency, w is the vehicle curb weight (kg), a is the acceleration (m/s 2 ), g is the gravitational constant (m/s 2 ), θ ij is the road angle (degree) associated with arc (i, j), A is Algorithm 1 ACO-EVRP 1: t ← 0 2: InitializePheromoneTrails(τ 0 ) 3: while (termination condition not satisfied) do 4: if (φ(π ib ) < φ(π bs )) then 7: PheromoneUpdate 10: t ← t + 1 11: end while 12: OUTPUT: π bs %best EVRP solution the frontal surface area of the vehicle (m 2 ), D is the air density (kg/m 3 ), C R is the coefficient of rolling resistance and C D is the coefficient of rolling drag.
Note that the energy consumption calculated using Eq. (15) for different EVs (even if we have a homogeneous fleet of EVs) travelling on the same arc may be different because their current loads may differ (i.e., l k ij ). An EV with a heavier load will consume more energy than an EV with a lighter load. Note that the load of an EV changes whenever a customer is served (e.g., unloading goods). There are other dynamic factors that can affect the energy consumption of an EV such as the traffic conditions [24], the number of idle and acceleration situations [1], the angle of the roads [23], the weather and cabin temperature [2], and many other uncertainties.

A. Constructing Solutions
Ants read pheromones to construct solutions and write pheromones to mark their constructed solutions (see Algorithm 1). Each ant h represents a complete EVRP solution (i.e., the routes of all EVs). With probability (1 − q 0 ) the h-th ant uses a probabilistic rule to choose customer j from customer i (i.e., exploration) as follows: and with probability q 0 the h-th ant chooses the next customer j from customer i with the highest probability (i.e., exploitation) as follows: where q 0 (0 ≤ q 0 ≤ 1) is a decision rule parameter, τ ij and η ij are the existing pheromone trail and the heuristic information available a priori between customers i and j, respectively. The heuristic information is calculated as where t ij is defined as in Eq. (1). N h i is the set of unvisited customers satisfying EV's capacity constraint in Eq. (7) and the maximum service time constraint in Eq. (8) for the h-th ant adjacent to customer i. When N h i = ∅ then the central depot (i.e., 0) is added to the EVRP solution that denotes the route of another EV. α and β are the two parameters which determine the relative influence of τ ij and η ij , respectively.

B. Visiting a Recharging Station
In order to ensure that EVs have enough energy to travel and satisfy the demands of their customers and return back to the central depot, they may need to make a visit (or visits) to energy charging stations. In other words, to satisfy the energy constraints in Eqs. (9) and (10) at each solution construction step each ant h must ensure that enough energy is left to the EV to visit at least one charging station from F set. Note that charging stations can be visited multiple times by the same or a different EV.
To construct a feasible solution in terms of energy the algorithm must look ahead to discover if any charging station (including the depot as charging station) is within its range after a customer is visited. Therefore, the N h i set of unvisited customers in Eqs. (16) and (17) must also satisfy the aforementioned energy constraints. If not, then their is a potential risk that the EV will run out of energy at some point. Suppose that an EV, say k, visits customer i using the decision rule defined in Eqs. (16) and (17). Then based on the energy level at customer i the range of energy required for visiting a recharging charging station or the central depot must satisfy e k i − b il ≥ 0, ∃ l ∈ F to be able to recharge at any time (if required). In this way, the algorithm ensures that no EV is left without energy and eliminates the violation of the energy constraints in Eqs. (9) and (10).
Finally, in case all customers within the energy range of the EV violate the capacity constraint in Eq. (7) or service constraint in Eq. (8) then the EV has to return back to the central depot according to the decision rule in Eq. (16).

C. Updating Pheromones
In this paper, the pheromone update policy of MMAS [25], [26] is used. At the beginning all the pheromone trails are initialized as follows: where ρ (0 < ρ ≤ 1) is the pheromone evaporation rate (see below) and C nn is the length of a tour generated by the nearest-neighbor heuristic 2 . Then, the pheromone trails are updated by applying evaporation as follows: where ρ is the pheromone evaporation rate and τ ij is the existing pheromone value on arc (i, j). After evaporation, the best ant deposits pheromone proportional to its solution quality as follows: F-n45-k4.evrp F-n72-k4.evrp F-n135-k7.evrp  where ∆τ best ij = 1/C best is the amount of pheromone that the best ant deposits and C best is the cost of the best solution π best (i.e., C best = φ(π best )). The "best" ant that is allowed to deposit pheromone may be either the best-so-far ant 3 (π bs ), in which case C best = C bs , or the iteration-best ant (π ib ), in which case C best = C ib , where C bs and C ib are the tour costs of the best-so-far ant (i.e., C bs = φ(π bs )) and the iterationbest (i.e., C ib = φ(π ib )), respectively. These two types of ants are applied in an alternate way. More precisely, the iterationbest ant is allowed as a default to deposit pheromone and the best-so-far ant is used only every fixed number of iterations (i.e., 25, more details in [26]).
The lower and upper limits τ min and τ max of the pheromone trail values are imposed. The τ max value is bounded by 1/(ρC bs ), where C bs is initially the solution quality of an estimated optimal tour (i.e., C nn ) defined in Eq. (18), and later on is updated whenever a new best-so-far ant solution quality is discovered. The τ min value is set to where avg is the average number of different choices available to an ant at each solution construction step.
Since only the best-so-far and iteration best ants are allowed to deposit pheromone, the algorithm may reach stagnation behavior quickly. To address this issue, the pheromone trails are occasionally reinitialized to the value τ max . For exam-ple, whenever the stagnation behavior 4 occurs or when no improved solution is found for a given number of iterations, the pheromone trails are reinitialized.

A. Experimental Setup
In the experiments, we apply ACO-EVRP to three typical VRP problem benchmark instances (i.e., F-n45-k4.vrp, F-n72-k4.vrp, and F-n135-k7.vrp) obtained from the VRPLIB 5 . Considering the EVRP described in Section II, charging stations are required for EVs for recharging purposes. Since EVRP benchmark instances are not available, charging stations are added to the three conventional VRP instances to generate three EVRP benchmark instances F-n45-k4.evrp, F-n72-k4.evrp, and F-n135-k7.evrp, respectively 6 . The charging stations are distributed in such a way to cover all the customers of the problem instance. The resulting problem instances are shown in Fig. 1 and their specifications in Table I. Furthermore, the capacities and customer demands of the conventional VRP benchmark instances are modified to fit the specifications of Smith Newton's electric vans [19]. Considering that these EVs have a gross vehicle weight rating (GVWR) of 7.5 tons and  (7)). The maximum service time for an EV driver is set up to 480 minutes (8 hours) for one day (i.e., parameter J in Eq. (8)). The rest of the parameter values used in our EVRP model for our simulation experiments are shown in Table II. The algorithms perform 10e4 iterations for 50 independent executions (on the same set of random seeds) and the mean total travel time (in minutes) is recorded together with the total distance travelled (in kilometres), feasibility (as a percentage of successful executions), number of recharges and number of EVs used. The feasibility of a complete EVRP solution in terms of energy for each execution must satisfy e k i ≥ 0, ∀i ∈ N ∪ F, ∀k ∈ M . Also the average number of iterations and CPU time (in seconds) to find the solution in an execution is recorded. The experiments were performed under a Linux System with an Intel Core i7-3930K 3.20GHz processor with 12MB cache and 16GB RAM.
For the experiments 50 ants were used and the common parameters used were set to typical values as follows: α = 1 and β = 5 whereas the effect of q 0 and ρ parameters are further investigated in the following subsection.

B. Results on the Effect of the Decision Rule and Evaporation Rate Parameters
It is well known that the q 0 and ρ parameters defined in Eq. (17) and Eq. (19), respectively, have a great impact on the exploration and exploitation of the ACO metaheuristic. Since they are key parameters, we systematically vary the values as follows: q 0 ∈ {0.0, 0.2, 0.5, 0.9} and ρ ∈ {0.02, 0.2, 0.4, 0.6, 0.8}. The experimental results with all the combinations of the parameters are given in Table III. Moreover, the solution quality (in minutes) against the iterations  Underlined values indicate best performing parameter combination.
graphs are plotted in Fig. 2   From Table III, Fig. 2, and Fig. 3 it can be observed that ACO-EVRP solution quality improves as the ρ value increases. The best mean results are obtained when the evaporation rate is set to ρ = 0.6 for F-n135-k7.evrp and ρ = 0.8 for the other two instances. The same observation holds for all values of q 0 . However, the performance of ACO-EVRP is better when the decision rule parameter is set to q 0 = 0.0. In terms of efficiency, ACO-EVRP outputs its best solution within a few seconds (e.g., 18.6secs for F-n135-k7.evrp which is the largest problem instance).
For the remaining experiments we set the decision rule parameter to q 0 = 0.0 for all problem instances, and the evaporation rate to ρ = 0.8 for F-n45-k4.evrp and F-n72-k4.evrp and to ρ = 0.6 for F-n135-k7.evrp. It is worth mentioning that for other problem instances a different parameter combination may perform better.

C. Results on the Effect of the Pheromone Trails
To investigate the effect of the pheromone trails generated by ACO-EVRP we set α = 0 from Eq. (16) so pheromone trails are not considered when solutions are constructed. The comparisons for ACO-EVRP with pheromone trails (denoted ACO-EVRP+) and ACO-EVRP without pheromone trails (denoted ACO-EVRP−) are given in Table IV. In addition,  Table IV it can be observed that ACO-EVRP+ performs significantly better than ACO-EVRP− in all three problem instances. The results show that the pheromone trails have a positive effect on the performance of the proposed ACO-EVRP. This is because the iterative pheromone update procedure is basically a reinforcement learning procedure: the arcs of the best path will be rewarded with additional pheromone. Therefore, at every iteration the optimization process exploits previous knowledge and is directed towards promising areas of the search space. After several iterations the arcs most probably belonging to the global optimum solution will have a higher probability (i.e., more pheromone than other arcs) to be selected according to the decision rule in Eq. (17).
When pheromone trails are not considered, then the algorithm behaves as a classic stochastic greedy constructive heuristic with multiple restarts. This is because the shortest arcs between two customers are more likely to be selected. However, in Hamiltonian paths the shortest arcs may not necessarily belong to the global optimum solution. Furthermore, there is no reinforcement learning therefore the optimization process will be evolve without exploiting previous knowledge.

D. Results on the Comparison of ACO-EVRP vs ACO-VRP
To investigate the ability of generating feasible solutions (i.e., no EV is left without energy while operating), the proposed ACO-EVRP is compared with an ACO for the conventional VRP (denoted ACO-VRP) [20]. The experimental results of the two aforementioned ACO variations are presented in Table V. Apart from the feasibility percentage, the number of EVs used, number of recharges, the total time and distance travelled, the waiting and charging times are also given.
From Table V it can be observed that ACO-EVRP is always able to find feasible solutions (50 times out of 50 executions) in all problem instances. This eliminates the anxiety of driving EVs due to the several factors affecting the already limited range of their batteries. On the contrary, ACO-VRP is able to find feasible solutions only for F-n72-k4.evrp. However, it can be observed that for this specific problem instance there is no need for EVs to recharge and this is why ACO-VRP is able to find a feasible solution. It can be observed that exactly the same solution is found by the proposed ACO-EVRP which shows that the potentially different neighborhoods (i.e., N h i ) generated in Eq. (16) during the solution construction are not degrading the solution quality.
Furthermore, the ACO-EVRP solutions on the remaining problem instances (i.e., F-n45-k4.evrp and F-n135-k7.evrp) are worst than ACO-VRP in terms of time and distance because some EVs require to visit recharging stations twice. Therefore, it is straightforward that the total operation time and distance travelled will be increased to ensure feasibility. Also, it is worth mentioning that the presence of recharging stations contributes to the feasibility of the routes. Otherwise, if only the central depot was available to EVs for recharging, it would not be always possible to generate feasible routes as in the ACO-VRP case.

V. CONCLUSIONS
In this paper, ACO is applied to the EVRP to minimize the travel time (and potentially the operation cost) of a fleet of EVs. The construction of solutions of the proposed ACO considers the energy levels of the EV and estimates the range of the charging stations before performing the next operation step (i.e., visit the next customer). The aim of ACO is to generate feasible solutions in terms of energy such that EVs will not run out of energy during their daily operation. The experimental results on different EVRP scenarios show that the proposed ACO algorithm can always output a feasible solution in a few seconds.
For future work, we are also considering other factors in the EVRP presented model that affect the energy consumption of EVs, such as traffic [21] and weather [2] conditions. In this way, the estimation of whether a recharging station is within the range of an EV will be more accurate.