Traffic-Light Control at Urban Intersections Using Expected Waiting-Time Information

We consider an optimal traffic-light control framework for urban traffic intersections to alleviate congestion phenomena. We analyze a scenario in which we provide drivers with information about the waiting time at the intersections. We model the drivers’ lane-changing information-based behavior as the solution of a convex optimization problem. We compute the optimal traffic-light control mechanism as the solution to a bi-level optimization problem. We provide a complete analysis in terms of (i) the existence of a solution; (ii) an iterative algorithm to compute it; (iii) sufficient conditions for the solution’s uniqueness and the algorithm’s convergence. Early simulation results show the proposed control scheme’s effectiveness compared with an optimal control algorithm in the absence of waiting-time information.


I. INTRODUCTION
Transportation is the energy end-use sector with the fastest growth rate in terms of greenhouse gas emissions, and road traffic is estimated to be responsible for over 80% of this increase since 1970 [1]. Road traffic, moreover, is associated with several other problems of environmental, financial, and social nature due to congestion, delays, and infrastructure maintenance or building. For example, traffic congestion costs the economy billions of dollars every year [2]. All these negative consequences are exacerbated in the presence of high-density traffic and congestion. Therefore, as the number of road vehicles steadily increases every year (see, e.g., https://www.acea.be/publications/article/report-vehicles-in-useeurope-2019 and https://www.gov.uk/government/statistics/road-trafficestimates-in-great-britain-2019), rethinking the way traffic is managed is necessary to guarantee a sustainable future for transportation.
In this paper, we focus on an urban traffic setting and, in particular, on intersection control. For more than 50 years, computer-aided traffic lights have been the standard tool for controlling intersections [3], and an extensive literature exists on the topic, covering many different approaches. Design solutions based on dynamic programming or informed by control theory have been proposed, for instance, in [4]-[11]. See also [12], [13] for broad reviews. Varaiya, in his seminal paper [14], proposed a traffic-light control algorithm to stabilize the queue length at the intersections. Since then, many algorithms have been proposed to achieve the same aim under various settings [15]-[18]. However, none of these papers considered the possibility of showing drivers how long they have to wait, on average, to cross the intersection. When this information is provided, drivers can react by changing lanes based on the expected waiting time displayed for each lane at an intersection. We seek to answer the following questions. First, if we inform the drivers of the waiting time at the intersections, can congestion be alleviated? Second, can we develop an optimal traffic-light control mechanism by considering the impact of the drivers' rerouting decisions based on the displayed information?
We consider a network of intersections where each intersection may consist of multiple lanes. We propose and analyze the addition to traffic lights of a visual indication of each lane's expected waiting time. Then, we propose a control policy deciding the green-light duration of each traffic light based on the observed and estimated future traffic flows. Compared to canonical traffic light approaches, a visual indication of the expected waiting time allows us to considerably increase the duration of red lights. Indeed, as drivers can see that their waiting time is too large, they can change lanes and find alternative paths. Thus, we enable an additional degree of freedom, for instance, to divert traffic to specific routes to avoid congestion actively.
Compared to the state of the art, it is worth noting that incentivization mechanisms such as toll prices and their impact on traffic flows have been considered before [19]-[21]. However, those papers did not consider the control of traffic lights at intersections. The closest to our work is [22], which investigated how vehicles reroute based on the traffic delays at various intersections, and then evaluated the performance of various traffic-light control algorithms in alleviating congestion. However, [22] did not formalize how an optimal traffic-signal control algorithm should be computed by incorporating the drivers' behavior in response to the traffic delay. Our proposed model formalizes how the drivers react to the information and provides an algorithm to optimally compute the green traffic-light durations at intersections based on the drivers' reactions to the expected waiting time. Further, [22] considered an information structure where the travel time for each vehicle is updated at every time instant, which is computationally costly. Instead, in our methodology, we provide the drivers with the waiting-time information at an intersection, which is easier to implement in practice.
We cast the design of the control policy at the traffic intersections into a convex optimization problem on the observed traffic characteristics. We model the fraction of the drivers changing lanes based on the displayed information as the solution to a second convex optimization problem. Since control decisions depend on the drivers' behavior, and vice versa, the traffic controller operates online and in closed loop, in which the loop is closed on humans [23], [24]. Indeed, the closed-loop system results in a bi-level optimization problem, which turns out to be non-convex. We show that an optimal solution exists. Further, we propose a fixed-point iterative algorithm to find an optimal solution, and we provide sufficient conditions under which the algorithm converges.
We provide numerical results showing promising performance of the proposed architecture compared with an optimal traffic-light control policy that does not provide visual information to the drivers. Specifically, our algorithm controls the flow at intersections in a way that significantly reduces the mean queue length across the network compared to that policy.

II. THE MODEL
In this section, we first describe the network structure that we consider (Section II-A). Subsequently, we characterize the traffic-light control architecture (Section II-B) and the dynamics of the traffic flow (Section II-C). Finally, we characterize the constraints on the decision variables, namely the green traffic-light durations at the different lanes across the intersections (Section II-D).

A. The Road Network
The road network is modeled by a directed graph $G = (V, E)$, in which $V$ is the set of nodes and $E \subseteq V \times V$ the set of edges. Nodes represent intersections and edges represent roads connecting adjacent intersections. In particular, if $(i, j) \in E$, then there is a road going from $i$ to $j$. A nonempty subset $N_e \subseteq V$ of nodes represents "terminal intersections", where traffic can only originate or terminate (Fig. 1). The incoming traffic from these terminal nodes cannot be controlled and is treated as an exogenous variable.
The set of nodes that have outgoing edges toward node $i$ is denoted by $N^{\mathrm{in}}_i$. Similarly, the set of nodes that have incoming edges from node $i$ is denoted by $N^{\mathrm{out}}_i$. A path through node $i$ is a triple $(j, i, k)$ with $j \in N^{\mathrm{in}}_i$ and $k \in N^{\mathrm{out}}_i$, and the set of such paths is denoted by $P_i$. For example, in Fig. 1, $(A, C, F) \in P_C$. Furthermore, we let $P = \cup_{i \in V} P_i$. At each intersection $i$, there is a queue for each path $(j, i, k) \in P_i$, containing the vehicles coming from node $j \in N^{\mathrm{in}}_i$ and going toward node $k \in N^{\mathrm{out}}_i$ via node $i$. For notational convenience, we shall assume that there is a unique queue for each path in $P$.
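As an illustration of the notation above, the following Python sketch builds the in-neighbour sets, the out-neighbour sets, and the path sets from a list of directed edges. The function name, the toy edge list, and the exclusion of U-turns are assumptions made for this sketch; they are not taken from the network in Fig. 1.

```python
from collections import defaultdict


def build_paths(edges):
    """Return (n_in, n_out, paths) for a list of directed edges (j, i).

    paths[i] holds every triple (j, i, k) with j an in-neighbour and k an
    out-neighbour of i. U-turns (k == j) are excluded; that exclusion is an
    assumption of this sketch, not something stated in the text.
    """
    n_in, n_out = defaultdict(set), defaultdict(set)
    for j, i in edges:
        n_out[j].add(i)
        n_in[i].add(j)
    nodes = set(n_in) | set(n_out)
    paths = {
        i: {(j, i, k) for j in n_in[i] for k in n_out[i] if k != j}
        for i in nodes
    }
    return dict(n_in), dict(n_out), paths


# Hypothetical toy network (not the one in Fig. 1).
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("C", "F"), ("C", "A")]
n_in, n_out, paths = build_paths(edges)
```

In this toy example, `paths["C"]` contains five triples, one per queue at intersection C, while nodes without outgoing edges have an empty path set.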

B. Traffic Lights
We assume that there is a different traffic light for each path $(j, i, k) \in P$. Each traffic light is characterized by its duty cycle, defined as the relative duration of the green light in each period. In particular, the duty cycle of the traffic light on path $(j, i, k) \in P$ is denoted by $g_{j,i,k} \in [0, 1]$. The duty cycles $(g_{j,i,k})_{(j,i,k) \in P}$ represent the variables controlled by the decision logic.
The controller updates its decision every $\Delta t \in \mathbb{R}_{>0}$ minutes. Between updates, the duty cycles are kept constant. In every decision interval $\Delta t$, each traffic light performs $T \in \mathbb{N}_{>0}$ cycles, each one of duration $\Delta t/T$, in which green and red lights are alternated according to the current value of the duty cycles (amber lights are neglected for simplicity). Hence, for the traffic light controlling the path $(j, i, k)$, the green and red light durations in each period are given, respectively, by $g_{j,i,k}\,\Delta t/T$ and $(1 - g_{j,i,k})\,\Delta t/T$.
Traffic lights belonging to the same intersection may not be independent of each other. Indeed, vehicles on potentially colliding paths cannot cross the intersection at the same time. For each intersection $i$, non-colliding paths are identified by a covering $L_i$ of $P_i$: each element $I_i \in L_i$ is a subset of $P_i$ such that any two paths $(j, i, k)$ and $(j', i, k')$ in $I_i$ are non-colliding. Each $I_i$ is maximal with respect to this property.

C. Traffic Dynamics
In this section, we formally describe the time evolution of traffic flows within an arbitrary decision interval of length $\Delta t$, in which each traffic light operates $T$ cycles. For every traffic-light cycle $t \in \{1, \dots, T\}$ and every node $i \in V$, we denote by $\Lambda^t_{j,i}$ the number of vehicles going from node $j \in N^{\mathrm{in}}_i$ to node $i$ within the traffic-light cycle $t$. If $j \in N_e$, i.e., $j$ is a terminal node, then $\Lambda^t_{j,i}$ is the amount of vehicles that come from outside the considered road network. We denote by $\alpha^i_{k,j}$ the fraction of vehicles in $\Lambda^t_{j,i}$ that goes toward $k \in N^{\mathrm{out}}_i$. When $k \in N_e$, those vehicles will then exit the network. We assume that $\alpha^i_{k,j}$ does not depend on the particular cycle $t$. Hence, the traffic going from node $j$ toward node $k$ through $i$ during cycle $t$ is given by $\alpha^i_{k,j}\,\Lambda^t_{j,i}$. Any remaining fraction of $\Lambda^t_{j,i}$ corresponds to the vehicles that have their destination at node $i$.
Vehicles may start their journey between any pair of connected nodes. In particular, we denote by $\zeta^t_{j,i,k}$ the amount of vehicles that initiate their journey between node $j \in N^{\mathrm{in}}_i$ and node $i$ during cycle $t$, and that go toward node $k$. Let $\bar{N}^t_{j,i,k}$ be the total amount of vehicles arriving in the queue of path $(j, i, k) \in P$ during cycle $t$. Then,
$$\bar{N}^t_{j,i,k} = \alpha^i_{k,j}\,\Lambda^t_{j,i} + \zeta^t_{j,i,k}. \tag{1}$$
For ease of exposition, we assume that the traffic controller has access to $\zeta_{j,i,k}$ and $\alpha^i_{k,j}$ for each $(j, i, k) \in P$. Furthermore, in the following we make the simplifying assumption that, during a traffic-light cycle, only the vehicles present in the queue at the beginning of the cycle can move toward the next node. This is a standard assumption (see, e.g., [15], [25]), and it is justified in our context by the fact that, in an urban area, paths are short and traffic speed is low.
Let $N^0_{j,i,k}$ denote the amount of vehicles in the queue $(j, i, k)$ at the end of the previous control interval and, for $t = 1, \dots, T$, let $N^t_{j,i,k}$ denote the amount of vehicles in the queue $(j, i, k)$ at the end of cycle $t$. We assume that the number of vehicles at each queue, $N^t_{j,i,k}$, can be measured. Furthermore, let $M^t_{j,i,k}$ be the total number of vehicles moving from node $j \in N^{\mathrm{in}}_i$ toward node $k \in N^{\mathrm{out}}_i$ during cycle $t$, and let $v_{j,i,k}$ be the maximum traffic outflow for the path $(j, i, k)$ within a traffic-light cycle (for simplicity, $v_{j,i,k}$ is assumed to be independent of $t$).
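Then, for all $(j, i, k) \in P$ and all $t = 1, \dots, T$,
$$M^t_{j,i,k} = \min\{N^{t-1}_{j,i,k},\ v_{j,i,k}\, g_{j,i,k}\}, \tag{2}$$
which represents the total amount of vehicles that exit the queue $(j, i, k)$ during cycle $t$. Note that $v_{j,i,k}\, g_{j,i,k}$ represents the amount of vehicles which can exit the intersection. Hence, the total amount of vehicles in queue $(j, i, k) \in P$ at the end of cycle $t$ is given by
$$N^t_{j,i,k} = N^{t-1}_{j,i,k} - M^t_{j,i,k} + \bar{N}^t_{j,i,k}, \tag{3}$$
where $\bar{N}^t_{j,i,k}$ denotes the arrivals in (1). Moreover, since in view of (2) the amount of vehicles going from node $i \in V$ to node $k \in V$ during cycle $t$ is the sum of $M^t_{j,i,k}$ over the incoming nodes $j$, we have
$$\Lambda^t_{i,k} = \sum_{j :\, (j,i,k) \in P_i} M^t_{j,i,k}. \tag{4}$$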
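The queue update described in this section can be illustrated for a single queue. The following Python sketch assumes that the outflow in a cycle is the smaller of the queue at the start of the cycle and the capacity given by the maximum outflow times the duty cycle, and that the queue then grows by the new arrivals. The function name and the numbers are hypothetical.

```python
def simulate_queue(n0, arrivals, v, g):
    """Evolve one queue over len(arrivals) cycles with a fixed duty cycle g.

    Each cycle, at most v * g vehicles leave, and only vehicles already
    queued at the start of the cycle may leave (assumed outflow rule).
    Returns the per-cycle outflows and the end-of-cycle queue lengths.
    """
    queue, outflows, queues = n0, [], []
    for a in arrivals:
        moved = min(queue, v * g)   # vehicles crossing in this cycle
        queue = queue - moved + a   # balance: leftover plus new arrivals
        outflows.append(moved)
        queues.append(queue)
    return outflows, queues


# Hypothetical numbers: 10 queued vehicles, 2 arrivals per cycle,
# maximum outflow of 8 vehicles per cycle, duty cycle 0.5.
outflows, queues = simulate_queue(n0=10.0, arrivals=[2.0, 2.0, 2.0], v=8.0, g=0.5)
```

With these numbers the capacity per cycle is 4 vehicles, which exceeds the arrivals, so the queue shrinks by 2 vehicles per cycle.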

D. Dependency Constraints on the Decision Variables
During each cycle $t = 1, \dots, T$, the traffic lights controlling the non-colliding paths in each set $I_i \in L_i$ of each intersection $i \in V$ can all be green at the same time. For example, in Fig. 1, the traffic lights corresponding to the paths in $I_C = \{(B, C, D), (B, C, F)\}$ can be green simultaneously. Moreover, due to the maximality of the sets $I_i$, no other traffic light associated with a path in $P_i \setminus I_i$ can be green during such time. For each $i \in V$ and each $I_i \in L_i$, we introduce a variable $g_{I_i} \in [0, 1]$ that indicates the fraction of time per cycle in which the traffic lights in $I_i$ can be green. For each $i \in V$, the variables $(g_{I_i})_{I_i \in L_i}$ must satisfy
$$\sum_{I_i \in L_i} g_{I_i} \le 1. \tag{5}$$
The inequality in (5) indicates that it may happen that, for a certain fraction of time, all the paths in $P_i$ are blocked by a red light. Moreover, since for every $i \in V$ a path $(j, i, k)$ may belong to multiple elements of $L_i$, the duty cycles must satisfy
$$g_{j,i,k} \le \sum_{I_i \in L_i :\, (j,i,k) \in I_i} g_{I_i}, \quad \forall (j, i, k) \in P_i. \tag{6}$$
If a path $(j, i, k)$ belongs to only one element $I_i \in L_i$, then the maximum duty cycle for the traffic light controlling $(j, i, k)$ is bounded by $g_{I_i}$. If all paths have this property, then we can simply set $g_{I_i} = \max_{(j,i,k) \in I_i} g_{j,i,k}$. Otherwise, the variables $(g_{I_i})_{I_i \in L_i,\, i \in V}$ represent a further set of control variables that must be decided by the controller.
The controller decides the values of the duty cycles $(g_{j,i,k})_{(j,i,k) \in P}$ and of the variables $(g_{I_i})_{I_i \in L_i,\, i \in V}$. Decisions are taken every $\Delta t$ minutes and are kept constant for $T$ traffic-light cycles of duration $\Delta t/T$ minutes, in which traffic lights alternate green and red lights.
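The dependency constraints of this section can be checked numerically for one intersection. The Python sketch below assumes that the group shares must sum to at most one, and that the duty cycle of a path cannot exceed the total share of the groups containing it. The function name, the optional lower bound `g_min`, and the example values are hypothetical.

```python
def duty_cycles_feasible(group_share, path_duty, g_min=0.0, tol=1e-9):
    """Check one intersection's duty cycles against the group shares.

    group_share: dict mapping a frozenset of non-colliding paths to the
                 fraction of the cycle in which that group may be green.
    path_duty:   dict mapping each path (j, i, k) to its duty cycle.
    """
    # Groups are green one at a time, so their shares fit in one cycle.
    if sum(group_share.values()) > 1.0 + tol:
        return False
    for path, g in path_duty.items():
        # A path can be green only while some group containing it is green.
        budget = sum(s for grp, s in group_share.items() if path in grp)
        if not (g_min - tol <= g <= min(1.0, budget) + tol):
            return False
    return True


# Hypothetical groups at an intersection C.
groups = {
    frozenset({("B", "C", "D"), ("B", "C", "F")}): 0.6,
    frozenset({("A", "C", "F")}): 0.3,
}
duty = {("B", "C", "D"): 0.6, ("B", "C", "F"): 0.5, ("A", "C", "F"): 0.3}
feasible = duty_cycles_feasible(groups, duty)
```

Here the group shares sum to 0.9, so for the remaining 10% of each cycle every light at the intersection is red.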

III. CONTROL OF TRAFFIC LIGHTS
In this section, we formalize the control problem as an optimization problem cast on the traffic characteristics. First, in Section III-A, we approach the "classic" problem, where no additional information is shown to the drivers. In Section III-B, we then consider the control problem in the case in which traffic lights display information about the expected waiting time and the vehicles redistribute based on this information. In Section III-C, we characterize the drivers' response to the expected waiting-time information. Subsequently, in Section III-D, we formulate the optimal control problem considering the drivers' response as a bi-level optimization problem.

A. Control Without Waiting-Time Information
As a further degree of freedom, we consider the case in which the duty cycles are lower bounded by an arbitrary designer-decided quantity $g_{\min} \ge 0$. Hence, the decision variables satisfy
$$g_{\min} \le g_{j,i,k} \le 1, \quad \forall (j, i, k) \in P. \tag{7}$$
The controller is then obtained as a solution to the following optimization problem
$Q$ : subject to (1), (2), (3), (4), (5), (6), (7),
in the decision variables $(g_{j,i,k})_{(j,i,k) \in P}$ and $(g_{I_i})_{I_i \in L_i,\, i \in V}$.
We observe that, instead of the nonlinear constraint in (2), we can equivalently consider the following linear inequalities
$$M^t_{j,i,k} \le N^{t-1}_{j,i,k}, \qquad M^t_{j,i,k} \le v_{j,i,k}\, g_{j,i,k}. \tag{8}$$
Then, $Q$ is equivalent to
$Q'$ :
in the decision variables $(g_{j,i,k})_{(j,i,k) \in P}$ and $(g_{I_i})_{I_i \in L_i,\, i \in V}$. A solution of $Q'$ is also a solution of $Q$ and vice versa. $Q'$ is a convex quadratic problem whose solution can be easily obtained with standard solvers.

B. Control With Waiting-Time Information
In this section, we suppose that traffic lights inform drivers in every queue $(j, i, k)$ about the time $w^t_{j,i,k} \in \mathbb{R}_{\ge 0}$ that a vehicle is expected to wait before crossing the intersection during cycle $t$. Recall that, at the end of cycle $t$, the amount of vehicles in the queue $(j, i, k)$ is $N^t_{j,i,k}$. Recall also that $v_{j,i,k}$ is the maximum outflow for queue $(j, i, k)$. Hence, the total amount of vehicles that can exit queue $(j, i, k)$ during a traffic-light cycle is $g_{j,i,k}\, v_{j,i,k}$. The waiting time depends on the position of the vehicle within the queue and on the time required for all the vehicles in front to cross the intersection. Thus, the average waiting time displayed at the beginning of cycle $t+1$ to drivers in the queue $(j, i, k)$ is given by (9), which depends on $N^t_{j,i,k}$ and on the per-cycle outflow $g_{j,i,k}\, v_{j,i,k}$. (Footnote 8: If $g_{\min} > 0$ in (7), then the waiting times $w^t_{j,i,k}$ are bounded. If, instead, $g_{\min} = 0$, then (9) may be saturated to ensure boundedness.)
Vehicles may want to change lanes depending on the displayed value of the expected waiting time. We let $\beta^{t+1}_{j,i,k,k'}$ be the fraction of the $N^t_{j,i,k}$ vehicles at queue $(j, i, k)$ that move toward queue $(j, i, k')$ at the beginning of cycle $t+1$, after receiving the information about the average waiting time at the end of cycle $t$. Clearly, the vehicles in queue $(j, i, k)$ can only move toward a queue $(j, i, k')$ that is accessible at node $i$ for vehicles coming from node $j$. In particular, we let $C_{j,i} := \{k \in V : (j, i, k) \in P\} \subseteq N^{\mathrm{out}}_i$ be the set of all accessible nodes for vehicles going toward node $i \in V$ from node $j \in N^{\mathrm{in}}_i$. Whether a vehicle decides to change lane given the new information provided by the traffic light may depend on many factors. We postpone the discussion to Section III-C, in which we propose a model for the drivers' average behavior, and in the remainder of this section we treat the quantities $\beta^t_{j,i,k,k'}$ as parameters, only assumed to satisfy
$$\sum_{k' \in C_{j,i}} \beta^t_{j,i,k,k'} = 1 \quad \text{and} \quad \beta^t_{j,i,k,k'} \ge 0, \ \forall k' \in C_{j,i} \tag{10}$$
for all $i \in V$, $j \in N^{\mathrm{in}}_i$, $k \in C_{j,i}$, and $t = 1, \dots, T$. The constraints (10) make the $\beta^t_{j,i,k,k'}$ probability factors, ensuring a redistribution that preserves the amount of vehicles.
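Let $\tilde{N}^t_{j,i,k}$ be the number of vehicles in queue $(j, i, k)$ at the start of cycle $t$. Then,
$$\tilde{N}^t_{j,i,k} = \sum_{k' \in C_{j,i}} \beta^t_{j,i,k',k}\, N^{t-1}_{j,i,k'}. \tag{11}$$
This is the state of the queue $(j, i, k) \in P$ after re-balancing at the start of the traffic-light cycle $t = 1, \dots, T$. As only the vehicles which are in the queue at the beginning of cycle $t$ can move to the next intersection, in view of (11), we thus replace (2) and (3) with
$$M^t_{j,i,k} = \min\{\tilde{N}^t_{j,i,k},\ v_{j,i,k}\, g_{j,i,k}\} \tag{12}$$
and
$$N^t_{j,i,k} = \tilde{N}^t_{j,i,k} - M^t_{j,i,k} + \bar{N}^t_{j,i,k}, \tag{13}$$
respectively, where $\bar{N}^t_{j,i,k}$ denotes the arrivals in (1). Denote $\beta := (\beta^t_{j,i,k,k'})_{i \in V,\, j \in N^{\mathrm{in}}_i,\, k, k' \in C_{j,i},\, t = 1, \dots, T}$. Then, similarly to the control design without information display (Section III-A), for every fixed $\beta$ satisfying (10) the control policy with waiting-time information is obtained as a solution to the following optimization problem
$H(\beta)$ : subject to (1), (5), (6), (7), (9), (11), (12), (13),
in the decision variables $(g_{j,i,k})_{(j,i,k) \in P}$ and $(g_{I_i})_{I_i \in L_i,\, i \in V}$. We observe that $H(\beta)$ is convex.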
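The re-balancing step can be sketched for the lanes that share one approach to an intersection. The Python function below assumes that each queue is split among the accessible lanes according to lane-change fractions that are non-negative and sum to one per origin lane. The function name, the dictionary layout, and the numbers are hypothetical.

```python
def rebalance(queues, beta):
    """Redistribute vehicles among the lanes sharing one approach (j, i).

    queues: dict k -> vehicles queued for exit k at the end of the last cycle.
    beta:   dict (k, k2) -> fraction of queue k that moves to queue k2;
            each row must be non-negative and sum to one.
    """
    exits = list(queues)
    for k in exits:
        row = [beta[(k, k2)] for k2 in exits]
        if min(row) < 0 or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError(f"row {k!r} of beta is not a probability vector")
    # New queue k2 collects the share beta[(k, k2)] of every old queue k.
    return {k2: sum(beta[(k, k2)] * queues[k] for k in exits) for k2 in exits}


# Hypothetical example: a quarter of the drivers queued for D switch to F.
queues = {"D": 10.0, "F": 2.0}
beta = {("D", "D"): 0.75, ("D", "F"): 0.25, ("F", "D"): 0.0, ("F", "F"): 1.0}
new_queues = rebalance(queues, beta)
```

Because each row of `beta` sums to one, the total number of vehicles is the same before and after the redistribution.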

C. Models of Drivers' Reaction
Whether a vehicle changes lane in reaction to the displayed information about the waiting time depends on many factors. In this paper, we consider that for each path $(j, i, k) \in P$ the value of the variables $\beta_{j,i,k,k'}$, $k' \in C_{j,i}$, in a given cycle $t$ is chosen as the solution of the following optimization problem:
$$D_{j,i,k}(w^t) : \quad \min_{(\beta_{j,i,k,k'})_{k' \in C_{j,i}}} \ \sum_{k' \in C_{j,i}} \Big( \eta\, \beta_{j,i,k,k'} \log \beta_{j,i,k,k'} + \delta\, \beta_{j,i,k,k'}\, w^t_{j,i,k'} \Big) - \beta_{j,i,k,k}$$
subject to (10), parametrized by the waiting times $w^t := (w^t_{j,i,k})_{(j,i,k) \in P}$, where $\eta$ and $\delta$ are weighting parameters. The first term in the cost function of $D_{j,i,k}(w^t)$ is a negative entropy term. Minimizing this implies maximizing dispersion. The second term, $\delta\, \beta_{j,i,k,k'}\, w^t_{j,i,k'}$, weights the expected waiting time. Minimizing this term leads to an arrangement toward the lanes for which the expected waiting time is minimum. Finally, the third term, $-\beta_{j,i,k,k}$, represents the inertia to change lane, since drivers may be reluctant to change lanes unless they obtain a larger saving in waiting time. This term is indeed minimum when $\beta_{j,i,k,k} = 1$, meaning that there is no change of lane.
Solving $D_{j,i,k}(w^t)$ therefore leads to a compromise between dispersion, reaction to waiting times, and inertia, in which the relative importance of the three terms is regulated by the parameters $\eta$ and $\delta$. In this respect, we observe that, even if $\eta$ and $\delta$ are much larger than one (meaning that the inertia term has no effect), the presence of the entropy term implies that vehicles will not always change to lanes where the average waiting time is smaller: they may move toward lanes where the waiting time is not minimum, albeit with a smaller probability. Indeed, when $\eta$ decreases to zero the probability distribution concentrates around the minimum-waiting-time lanes.
The negative entropy terms are important since vehicles may not move toward the lane with the smallest waiting time for various reasons. For instance, different vehicles may have different destinations or preferences for some specific routes. Finally, we remark that negative entropy terms are customarily used to model the decisions of agents in the context of learning theory [26], [27], where they are used for regularization. Throughout this paper, we shall assume that $\eta > 0$.
For fixed $w^t$, $D_{j,i,k}(w^t)$ has a unique solution given by the following lemma, whose proof is omitted for reasons of space.
Lemma 1 For every $(j, i, k) \in P$ and $t = 1, \dots, T$, let
$$c^t_{j,i,k,k'} := \delta\, w^t_{j,i,k'} - \mathbb{1}[k' = k], \quad k' \in C_{j,i};$$
then, the unique optimal solution of $D_{j,i,k}(w^t)$ is given by
$$\beta^t_{j,i,k,k'} = \frac{\exp\!\big(-c^t_{j,i,k,k'}/\eta\big)}{\sum_{k'' \in C_{j,i}} \exp\!\big(-c^t_{j,i,k,k''}/\eta\big)}, \quad k' \in C_{j,i}.$$
Note that the probability that the vehicles move toward a lane with a smaller waiting time is higher than the probability that they move toward a lane with a larger waiting time. As $\eta$ increases, the decision becomes more random.
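The drivers' response can be sketched numerically. The Python function below assumes a cost made of an entropy term weighted by eta, a waiting-time term weighted by delta, and a unit reward for staying in the current lane; under that assumption the minimizer over the probability simplex is a softmax. The function name and the example waiting times are hypothetical, while the default values eta = 1 and delta = 2 match those used in Section V.

```python
import math


def driver_response(waits, current, eta=1.0, delta=2.0):
    """Lane-choice probabilities for drivers currently queued for `current`.

    waits: dict lane -> displayed expected waiting time.
    Minimises eta * sum(b * log b) + delta * sum(b * wait) - b[current]
    over the probability simplex (assumed cost), which gives a softmax.
    """
    if eta <= 0:
        raise ValueError("eta must be positive")
    score = {
        k: ((1.0 if k == current else 0.0) - delta * w) / eta
        for k, w in waits.items()
    }
    top = max(score.values())  # shift scores for numerical stability
    weight = {k: math.exp(s - top) for k, s in score.items()}
    total = sum(weight.values())
    return {k: x / total for k, x in weight.items()}


# Equal waiting times: inertia keeps most drivers in their current lane D.
equal = driver_response({"D": 1.0, "F": 1.0}, current="D")
# A much longer wait on D outweighs inertia and most drivers switch to F.
skewed = driver_response({"D": 3.0, "F": 1.0}, current="D")
```

Larger values of `eta` flatten the distribution toward a uniform choice, whereas values close to zero concentrate it on the lane with the best score.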

D. The Closed-Loop System
For fixed $\beta$, Problem $H(\beta)$ has a unique optimal solution, which produces a value of the waiting times $w^t$ according to (9). Conversely, for fixed $w^t$, the problem $D_{j,i,k}(w^t)$ has a unique solution given by Lemma 1 for every $(j, i, k) \in P$. These solutions result in an optimal value for $\beta$. Therefore, the closed-loop system consists of a bi-level optimization problem, denoted by $K$, obtained as the interconnection between $H(\beta)$ and $(D_{j,i,k}(w^t))_{(j,i,k) \in P}$.
Unfortunately, $K$ is not convex, and its solution cannot be found efficiently in general. An approach to find an optimal solution to $K$ is discussed in the next section.

IV. PATHWAYS FOR SOLVING K
As anticipated in Section III-D, the overall control problem $K$ in the presence of waiting-time information is not convex. Nevertheless, it is given by the interconnection of two convex problems, and the following can be concluded by means of Brouwer's fixed-point theorem (details are omitted for reasons of space).

Theorem 1 K admits an optimal solution.
However, Theorem 1 is only an existence result: it does not guarantee uniqueness of the solution, nor does it give any analytical expression. In this section, we devise a solution procedure based on the Banach fixed-point iteration that can be used to solve $K$. Moreover, we give sufficient conditions guaranteeing that $K$ has a unique solution, and that the proposed procedure finds it.
For fixed $\beta$ satisfying (10) for all $i \in V$, $j \in N^{\mathrm{in}}_i$, $k \in C_{j,i}$, and $t = 1, \dots, T$, we denote by $\Gamma(\beta)$ the unique solution to $H(\beta)$. Likewise, we let $w := (w^t)_{t = 1, \dots, T}$, and we denote by $\Psi(w)$ the unique solution to $D(w) = (D_{j,i,k}(w^t))_{(j,i,k) \in P,\, t = 1, \dots, T}$ for fixed $w$. We recall that every assignment $g$ of the decision variables $(g_{j,i,k})_{(j,i,k) \in P}$ and $(g_{I_i})_{I_i \in L_i,\, i \in V}$ produces a value of $w$ according to (9), which we denote simply by $w(g)$.
As a consequence, every optimal solution $g^\star$ of $K$ satisfies $g^\star = \Gamma(\Psi(w(g^\star)))$, namely, it is a fixed point of the map $\Gamma \circ \Psi \circ w$. This motivates the following iterative procedure, which starts at $k = 0$ from an arbitrary initial condition $\hat{\beta}^0$:
S1. Compute $\hat{g}^{k+1} = \Gamma(\hat{\beta}^k)$, the optimal solution of $H(\hat{\beta}^k)$.
S2. Compute $\hat{\beta}^{k+1} = \Psi(w(\hat{g}^{k+1}))$ according to Lemma 1.
S3. Set $k \leftarrow k + 1$ and repeat from S1 until convergence, or until a stopping criterion is met.
Convergence to an optimal solution to $K$ cannot be established in general. Nevertheless, convergence can be concluded under a contraction-like property of the map $\Gamma \circ \Psi \circ w$. More precisely, we first notice that, under some conditions, the solution maps $\Gamma$ and $\Psi \circ w$ are both Lipschitz, as established by the lemma below.
Lemma 2 $\Gamma$ and $\Psi$ are Lipschitz. If, in addition, $g_{\min} > 0$ in (7), then $w$ and $\Psi \circ w$ are also Lipschitz.
Let $L_\Gamma$ and $L_{\Psi w}$ denote, respectively, the Lipschitz constants of $\Gamma$ and $\Psi \circ w$. Then, the following theorem guarantees that the procedure devised above converges to an optimal solution if $L_\Gamma L_{\Psi w} < 1$.
Theorem 2 Suppose that $L_\Gamma L_{\Psi w} < 1$. Then, $K$ has a unique optimal solution $g^\star$. Moreover, for every initial condition $(\hat{\beta}^0, \hat{g}^0)$, the procedure described by S1-S3 produces a sequence $(\hat{g}^k)_{k \in \mathbb{N}}$ that converges exponentially to $g^\star$.
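The procedure S1-S3 can be sketched as a generic fixed-point loop. In the Python code below, the two callables stand in for the solver of the upper-level problem and for the drivers' response evaluated on the resulting waiting times; they are placeholders, and the scalar toy maps used in the example are not the maps of this paper. They are chosen only because their Lipschitz constants (0.5 and 0.8) have a product below one, so the contraction condition holds.

```python
def fixed_point_iteration(gamma, psi_w, beta0, tol=1e-8, max_iter=500):
    """Alternate g = gamma(beta) and beta = psi_w(g) until g stops changing.

    gamma: callable standing in for the solver of H(beta) (step S1).
    psi_w: callable standing in for Psi(w(g)) from Lemma 1 (step S2).
    Both operate on lists of floats. Returns (g, beta, iterations, converged).
    """
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    beta = list(beta0)
    g_prev = None
    for k in range(1, max_iter + 1):
        g = gamma(beta)      # S1: best duty cycles for the current beta
        beta = psi_w(g)      # S2: drivers' response to the new waiting times
        if g_prev is not None and max(
            abs(a - b) for a, b in zip(g, g_prev)
        ) < tol:
            return g, beta, k, True
        g_prev = g           # S3: next iteration
    return g, beta, max_iter, False


# Toy stand-ins (NOT the paper's maps): their composition is a contraction
# with constant 0.5 * 0.8 = 0.4, and its fixed point is g = 0.25 / 0.6.
toy_gamma = lambda beta: [0.5 * beta[0] + 0.2]
toy_psi_w = lambda g: [0.8 * g[0] + 0.1]
g_star, beta_star, iters, ok = fixed_point_iteration(toy_gamma, toy_psi_w, [0.0])
```

The stopping rule compares successive iterates of the duty cycles; other criteria, such as a bound on the change in the lane-change fractions, would fit the same loop.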

V. NUMERICAL RESULTS
We consider the road topology shown in Fig. 1. The routing fractions $\alpha$ are set so that the incoming flow is always split evenly among all the possible outgoing paths. For example, the flow between F and G is divided equally between the two possible options: D and K. Therefore, $\alpha^G_{F,D} = 0.5$ and $\alpha^G_{F,K} = 0.5$. The amount of new vehicles that enter the network depends on a quantity $r_t$ sampled randomly from a uniform distribution on $[0.5, 1]$. Hence, no vehicle is dissipated and no additional vehicle is created outside the terminal nodes. The maximum flows $v^t_{j,i,k}$ are sampled randomly from a uniform distribution on $[2, 4]$, for all $t$ and all $(j, i, k) \in P$. The drivers react to the green light as explained in Lemma 1 with $\eta = 1$ and $\delta = 2$. The model is simulated using a 5-minute traffic-light cycle, and it runs for 480 minutes (8 hours).
In these settings, we tested three controllers:
A. The controller Q (without waiting-time display) with $g_{\min} = 10^{-4}$;
B. The controller Q (without waiting-time display) with $g_{\min} = 0.05$;
C. The controller K with $g_{\min} = 10^{-4}$, $\eta = 1$ and $\delta = 2$, computed using the algorithm proposed in Section IV.
Controller A computes the traffic-light duty cycles without considering that the drivers can change path in response to the traffic-light duration in the considered time interval. Controller B is a variation of A in which $g_{\min}$ is larger. This is usually a sensible constraint to avoid the use of excessive red times, which may frustrate drivers who are not informed about them. Finally, controller C takes into account the fact that the traffic-light duty cycles influence the drivers' decision to change their path in the considered time interval.
All the presented controllers act every $T = 5$ cycles and therefore $\Delta t = T \cdot 5 = 25$ minutes. In the first hour of the simulation all the controllers are disabled and the traffic lights have a fixed duty cycle, thus distributing the time equally among all the possible paths. The controllers assume that the values of $v_{j,i,k}$ and $\zeta^t_{j,i,k}$ are constant for the next $T$ traffic cycles. These constant values are selected as the mean of the last 60 minutes of measurements. The waiting time is limited to be less than 50 minutes. This constraint is not strict, and it is used mainly to avoid computational problems that can arise when dealing with extremely high values. The quadratic optimization problems Q and $H(\beta)$ are solved using YALMIP [28] equipped with Gurobi [29] as a solver.
Fig. 2 (left plot) shows the mean queue length at each time instant for the three controllers and for the case where no controller is enabled. An increasing queue length means that the network is congested and that more vehicles are coming in than going out.
Therefore, it is clear that controller C successfully avoids congestion, while the other approaches fail to do so. The same conclusion can be reached by looking at Fig. 2 (right plot), where the flow balance of the network, i.e., the difference between the total outgoing and incoming flows, is shown. Here, we can see that controllers A and B have a negative flow balance. This results in an increasing amount of vehicles inside the network and therefore in congestion. Conversely, controller C, after a short transient, is characterized by a flow balance that oscillates around zero. Therefore, the amount of vehicles inside the network does not increase, thus avoiding congestion. Fig. 3 shows the queue length (normalized with respect to the maximum queue) in all the 43 paths of the network. Here, it is possible to note that controller C manages to keep the distribution of vehicles more balanced across the network and to avoid the accumulation of traffic in a single queue.

Fig. 3. Queue length at time t = 420 minutes (7 hours) obtained using the three controllers (one panel per controller; horizontal axis: paths). The length of the queues in each scenario is normalized with respect to the maximum queue obtained in that particular case.

VI. CONCLUSIONS AND FUTURE WORK
We consider a scenario where the waiting-time information is provided to the drivers at the intersections and the drivers can change lanes based on that information. By considering the drivers' reaction to the traffic-light durations, we formulate the optimal traffic-light duration selection at a network of urban intersections as a bi-level optimization problem and propose an iterative algorithm to solve it. We show empirically that our approach can alleviate congestion compared to the scenario where the waiting-time information is not provided to the drivers.
We did not consider any driver-specific information, such as origin and destination, while computing the response of the driver. The characterization of the optimal traffic-light duration by incorporating such details constitutes a future direction for research. Designing a decentralized algorithm that can be implemented at each intersection using only local information is also left for future work.