Trajectory Planning for Automated Vehicles using Driver Models

Behavioral-specific trajectory planning for automated vehicles is an intensively explored research topic. Many situations in daily traffic, e.g. following a leading vehicle or stopping behind it, require knowledge about how the scene may evolve. In recent years, much effort has been put into developing driver models to predict traffic scenes as realistic as possible according to human behavior. In this paper, we present a method for behavioral-specific trajectory planning using dedicated driver models. The main idea is to first calculate a reference trajectory using a suitable model to achieve the desired behavior and then to incorporate this reference trajectory into an optimal control problem to obtain an acceleration- and jerk-optimal trajectory. A major strength of this method is in the small computation time, since the problem is formalized as a quadratic optimization problem and can thus be efficiently solved in real time, even for a huge number of optimization variables.


A. Motivation
Trajectory planning is a basic requirement to autonomous driving and must be present in every automatically controlled vehicle. Consequently much effort has been put into this field and several techniques have been developed already [1], [2]. In general it can be distinguished between sampling based and continuous approaches. The former discretize the search space by generating a finite set of candidate trajectories before selecting the "best" one, i.e. one that satisfies the constraints and optimizes the cost functional. Sampling based approaches are typically used in complex and unstructured environments, where a cost function contains many local minima and an initial guess may be hard to find. Since the search space is discrete, the trajectory can only be optimal up to the resolution limit of the discretization. If the cost functional contains higher derivatives of the trajectory, it is difficult to systematically discretize the search space in a way that really approaches the continuum of all trajectories. The methods commonly used are of heuristical nature. In contrast, continuous methods use optimal control techniques to optimize an initial guess for a trajectory in terms of a cost function while respecting constraints. In well structures environments, where a decent initial guess can be determined and the problem is well conditioned, convergence to a desired local optimum can be achieved reliably. This increases the quality of the optimal solution in contrast to sampling based approaches. Thus, the environmental structure plays an important role for the performance of trajectory planning algorithms and should be exploited if possible. In [3] a local continuous approach in a well structured environment is presented. This method was used for the Bertha Benz drive in 2013 [4] and performed well on the route of more than 65 miles from Mannheim to Pforzheim (Germany). The cost functional in [3] is designed in a way, that the resulting trajectory keeps the vehicle in the middle of the lane, by approaching the target speed (which is usually the speed limit). The cost functional and constraints used in [3] are tailored to the case of on-road cruising and the evasion of static obstacles that narrows down the driving corridor. Other behavior, such as car following or stopping is achieved heuristically by a conventional adaptive cruise controller that overrides the longitudinal commands to the vehicle controller if necessary. Hence, the optimal planning scheme does not always actually affect the longitudinal control input. In this paper we present an approach, that generates a behavioral-specific reference trajectory, that is incorporated into a quadratic optimal control problem (OCP) to obtain an acceleration-and jerk-optimal trajectory for achieving comfortable driving behavior. Subsequently, we will use the word "smooth" to indicate trajectories that are optimized in terms of acceleration and jerk. Furthermore, it is worth mentioning, that due to the quadratic nature, convergence can be achieved fast. This guarantees not only real-time capability of this method even in case of a large number of optimization variables but makes the approach also extendable. The remainder of the paper is as follows: In Section I-B we give an overview of related work, show common trajectory planning algorithms as well as approaches, closely related to ours. The theoretically part of this paper's contribution is described in section II. Therein the use of driver models for our approach as well as the optimal control problem are elaborated in more detail. In section III we show the results of our method in different driving scenarios.

B. Related Work
A popular approach for sampling based trajectory planning is presented in [5] where jerk-optimal polynomials are sampled in a Frenét frame. Combining every longitudinal with every lateral polynomial leads to a set of trajectories, from which the one, that satisfies constraints while being optimal in terms of a given cost function is chosen. This method was successfully applied in several autonomous driving applications. Nevertheless, because of the short temporal horizon, the method has only a very low foresight. Also trajectories are restricted due to the polynomial nature of the samples. Another well-known approach is presented in [6]. Therein a search tree consisting of path primitives is utilized. Temporal information is generated by sampling velocity profiles over the paths. A search graph algorithm is used to find the best combination of paths and velocity profiles to select the best collision free trajectory according to a cost function. To get a more foresighted behavior, the temporal horizon can be increased by investigating the search tree for more paths and samples of velocity profiles. This puts an heavy computational burden on the method. If the environment is structured, continuous methods can be utilized to overcome the disadvantages of sampling based approaches. An example can be found in [3]. Therein an initial guess is optimized utilizing methods from continuous optimization theory. The trajectory can be optimized over a relatively large horizon, by maintaining an appropriate computing time. The aspect of foresight, together with the good condition of the optimization problem make the approach in [3] very attractive. Since this paper focuses on behavioralspecific trajectory planning, we also want to present some preliminary work in this field. In [7] a concept is described, in which behavioral-specific spatio-temporal corridors are computed from a grid laid out over the road. These corridors are then used to constrain the search space and thus enforce the desired behavior. A similar concept is shown in [8], where a first rough trajectory is estimated by using A*-search on a discretized action set. The resulting strategy is then realized by using the trajectory planning algorithm in [5]. In [9] it is described, how overtaking maneuvers based on the concept described in [3] can be realized. Therefore maneuvers are enforced by constraints, without changing the cost function. The concept presented in [10] enforces the behavioral strategy by using a risk-averse cost map which is based on the prediction of the traffic situation. The according trajectory is then calculated by utilizing the RRT * algorithm. However, the concept suffers of a high computational effort which prohibits its employment in real time applications. An opportunity for cooperative motion planning in a platoon of cars is described in [11]. Therefore, only the longitudinal trajectory is optimized by solving an optimal control problem based on the concept in [3]. Distance keeping to the leading vehicle is achieved by adding an extra term to the optimization problem to penalize small distances. Calculating trajectories for lane keeping is a well-investigated topic and already integrated in diverse autonomous driving test carriers or even plugged in productive cars. An opportunity, to realize complex maneuvers is to integrate social interaction aware models into trajectory planning modules. None of the presented methods perform trajectory planning directly by using interaction aware driver models in real time. We want to show an opportunity of how to use such models for real time trajectory planning. In a first approach, we use the well-known "intelligent driver model" (IDM) [12] to show that the method works. The method is easily extendable for more complex models like e.g. [13].

A. Driver Model
The main idea of our approach is to efficiently integrate driver models into trajectory planning. In general arbitrary models, that allow the prediction of trajectories can be used. In a first approach, we use the IDM to plan trajectories for driving within a single lane (driving corridor). All other vehicles are assumed to move within the same driving corridor and in the same principal direction. Differential equations of the IDM are as follows: The desired minimum gap s * is given by . ( Where v 0 is the desired velocity (often considered as the speed limit), T the safe time headway, a the maximum acceleration, b the comfortable deceleration, δ the acceleration exponent and s 0 the standstill distance. The index α identifies the individual units of the convoy of vehicles. The distance to the leading vehicle (indicated by index α − 1) is referred to as ∆s α = s α−1 − s α − l, with the vehicle length l. The relative velocity is indicated as ∆v α . As becomes apparent, the equations describe the vehicles acceleration depending on its longitudinal state, the leading vehicle as well as several parameters, that have to be carefully identified to reproduce human like behavior. Equation 1 can be split into two terms: The free-cruising term 0 ) δ and the interaction term s * (v α , ∆v α )/s α . The free cruising term converges to zero for velocities going against v 0 . The interaction term decreases if the desired distance s * gets small and even vanishes for large distances ∆s α to the leading vehicle. The IDM can of course also be used if there are no further traffic participants on the street. In this case, the interaction term is omitted. Since equations 1 and 2 merely model longitudinal behavior, an extension has been made in [14] to also consider the road's curvature. The main idea is to limit the lateral acceleration a lat due to the velocity. Therefore, the curvature κ(s) of the line, to which the IDM is referred to, is calculated over the longitudinal coordinate s of the Frenét frame. In a second step, the maximum allowed velocity is calculated by v max (s) = min a lat κ(s) , v lim .
With the speed limit indicated by v lim . Subsequently, the velocity gradient dv max /ds must be limited to decrease the velocity already before sections of high curvature are reached to obtain a more foresighted behavior.
x 1 Step 2 Step 3 Step 1 Fig. 1. Calculation of the reference trajectory.
Step 1: Vehicle states are transformed into Frenét coordinates of the centerline.
Step 2: IDM prediction is done for each traffic participant.
Step 3: Only the predicted trajectory of the ego vehicle is transformed back into cartesian coordinates and used as the reference trajectory for the subsequent trajectory optimization.

B. Trajectory generation
The environment model consists of the lanes centerline, the speed limit, the position, orientation, velocity and acceleration of surrounding vehicles as well as of the ego vehicle. Planning takes place in a 2D cartesian coordinate system, where the centerline is represented by a 2D polygonal line and reference point positions of vehicles can be described as 2D points. For further calculations, the temporal planning interval is discretized by a number of N support points [t 0 , t 1 , . . . , t N −1 ]. Position states of the trajectory at time t i are referred as x(t i ) = x i , with i = 0, . . . , N − 1. The general procedure is then as follows: In a first step, each traffic participant has to be transformed into the Frenét frame of the centerline. Therefore, the method described in [3] is utilized (step 1). The points on the centerline are the projected positions and mark the corresponding s coordinates of the vehicles. If the orientation difference between a vehicle and the center line at the point of projection is small (which is usually the case), velocities can be maintained in Frenét coordinates. After transformation, the whole platoon of vehicles, including the ego vehicle is predicted over the planning horizon T using IDM equations 1 and 2 (step 2). The predicted trajectory of the ego vehicle must then be transformed back into cartesian coordinates (step 3). The individual steps are shown in figure  1. The color coding of the reference trajectory x ref (t) refers to the spatio-temporal relation between the trajectory support points, where blue depicts the beginning of the temporal planning interval and red its end respectively. There are three problems with the reference trajectory x ref (t): First, the path is directly on the centerline, which is not necessarily smooth. Second, the velocity calculated by IDM equations 1 and 2 using the prior calculated velocity profile may also take non-smooth values. And finally, the initial sample of the trajectory is not identical to the ego vehicle state, hence it does not form a smooth continuation of the past trajectory. A final optimization step remedies these problems and asserts both a sufficiently smooth trajectory and a proper continuation of the past trajectory. However, x ref (t) represents human behavior to an appropriate accuracy for our purposes and is sufficient for the subsequent procedure. The reference x ref (t) is only used to guide the subsequent trajectory optimization. In the next step, the final trajectory x(t) is calculated. Therefore it must be ensured, that x(t) is smooth on the one hand and on the other hand, the trajectory's initial state must be equal to the ego vehicles state. As illustrated in the last picture of figure 1, this is not the case for x ref (t). To achieve these goals, we state a cost function based on the work in [3], as follows: With L = w spatial j spatial + w acc j acc + w jerk j jerk , where The spatial term penalizes the deviation between the trajectory and the reference trajectory. If there was only a spatial term, the OCPs solution would be identical to the reference trajectory.
To obtain smooth behavior, acceleration as well as jerk are penalized by j acc = ||ẍ(t)|| 2 2 (7) and The weights w spatial , w acc and w jerk are used to penalize the terms individually. Since our trajectory is represented as a discrete set of N 2D points, derivatives have to be approximated. This is done as follows: for the acceleration and ...
for the jerk. Thus, the cost functional is approximated by a sum: The multiplication by the constant parameter ∆t can be neglected, since it has no influence on the optimal solution. To enforce a smooth continuation with the past trajectory, x 0 , x 1 , x 2 are set to constant values instead of being free parameters of the optimization problem. This enforces initial values for the trajectory and its first two derivatives and hence, continuous orientation and curvature of the trajectory. Since equation 11 contains only quadratic terms, the OCP is unconstrained quadratic and thus closed-form solvable.

III. EVALUATION
The effectiveness of our approach is shown in two simulated scenarios. To give a qualitative impression of the computed outputs, we show a planned example trajectory on a winding road. The second scenario emphasizes longitudinal planning. The setting contains two elements that influence longitudinal steering commands: A leading vehicle and a (red) traffic light. The traffic light forces both vehicles to stop. Since our algorithm is designed for situations, in which all other vehicles are moving in the same principal direction, we regard scenarios with no oncoming vehicles and no vehicles crossing the lane. For the following simulations, parameters for the IDM are chosen as given in table I. The temporal interval for trajectory planning [0, T ], has a horizon of T = 10s and is discretized by N = 101 nodes, leading to a temporal spacing of ∆t = T /(N − 1) = 100ms. The method was implemented within C++ and running on a PC with an Intel-i7 CPU and 32GB of RAM. The OCP was solved by using the WORHP library for sequential quadratic programming [15]. It is to mention, that due to the closed-form solvability of the OCP, one could also use less powerful solvers. The first scenario shows how the ego vehicle is driving on a winding rural road. The according results are depicted in figure 2. The trajectory is depicted as a sequence of points, where the spatio-temporal relation is color coded from blue to red. It becomes obvious, that the trajectory starts from the ego vehicles position and smoothly approaches the centerline. Without acceleration and jerk terms in the OCP, just the spatial terms would be optimized an thus the trajectory would lie exactly on the centerline, except for the first three points, since they are fixed as explained above. We want to emphasize, that trajectories calculated directly by the IDM are in general not smooth and will consequently not lead to good tracking performances of vehicle controllers. Due to the smoothing effect of acceleration as well as jerk terms in the optimization problem, our trajectory gets smoothed and will lead to comfortable driving behavior. Figure 3 shows the velocity which increases in the beginning to approach the target speed v 0 and decreases subsequently due to the winding segment. The absolute acceleration which is given in figure 4 takes on an initial value of ≈ 2.5m s −2 and decreases then to ≈ 0.5m s −2 , this is due to the initial longitudinal acceleration to quickly come up to the target speed and the lateral acceleration to steer back to the centerline. Braking acceleration is required to reduce the speed when getting closer to the winding road segment. The acceleration increases due to cornering and becomes smaller as soon as the trajectory again reaches the straight road segment. In the next scenario, the focus is more on behavioral aspects: The ego vehicle drives behind a leading one. The traffic light, whose position is marked by the black stop line remains in a red state. Thus, both vehicles have to come to a stop before reaching the stop line. Replanning is performed every t repl. = 100ms, hence the ego vehicle follows a trajectory for t repl. until a new one is calculated. To plan appropriate trajectories, knowledge of the IDM is used as follows: At first, the leading vehicles trajectory is predicted using the IDM. Therefore, the stop line is treated as a vehicle with zero velocity. In this case the interaction term of the IDM predominates the behavior and thus, the velocity decreases to zero. This predicted trajectory of the leading vehicle is then used to calculate the trajectory for the ego vehicle by also using the IDM. Figure 5 shows snapshots of the scene for different time periods. As can be seen, while both vehicles are in motion, the ego vehicle keeps a safety distance to the leading vehicle. This distance decreases monotonously and finally converges to the standstill distance. Figure 6 and 7 show the velocities and accelerations of the ego vehicle during the whole scenario duration of t = 20s. As depicted in figure 6, the ego vehicle starts with an higher velocity than the leading vehicle. Nevertheless, the IDM forces the ego vehicle to decelerate to even smaller velocities to maintain the sufficient safety distance. Therefore, as displayed in both figures, the ego vehicle brakes down rapidly  in the very beginning of the scene. This scenario also shows the advantage of incorporating model knowledge into the optimization process. While the trajectory planner presented in [3] is not designed for following or stopping, the calculation of a reference trajectory with a driver model allows the enforcement of different strategies using an almost identical optimization problem. In addition, the low computational costs enable interaction aware and foresighted behavior even for long horizons which is indispensable e.g. when driving towards a red traffic light with high velocity. As a result, the simulation shows that the presented concept is able to generate complex behavior with low computational effort. The scenario was repeated for various numbers N of trajectory support points. The averaged computation times needed for planning are shown in table II. Evidently, the computation time increases for higher numbers of trajectory support points. However, even for N = 1001, the run time is far below the cycle time in [3] and thus, the method is absolutely real-time capable. Such low run times can be realized, due to the comparably low computational burden for calculating IDM predictions and the subsequent optimization problem being quadratic allowing convergence in a single step using WORHP.

IV. CONCLUSION
In this work, we presented a trajectory planning approach for automated vehicles based on driver models. We used the intelligent driver model (IDM) to predict the behavior of all traffic participants, including the ego vehicle. Subsequently, a quadratic optimization problem containing appropriate acceleration and jerk terms is stated, to obtain smooth trajectories. The method is real-time capable, since the IDM prediction is fast to compute and the optimal control problem converges in one step using sequential quadratic programming. The benefit of using behavioral driver models is shown by means of a traffic light scenario, where both, the ego vehicle as well as the leading vehicle need to stop due to a red  traffic light. We want emphasize, that the IDM may be easily exchanged by another model, like e.g. the foresighted driver model [13], that is able to handle even more complex scenarios. For this reason, future work includes the analysis and development of more advanced driver models. Thereby, for example uncertainties of perception modules can be taken into account as shown in [16] or [13].