Bayesian Traffic State Estimation Using Extended Floating Car Data

Traffic state estimation is a challenging task due to the collection of sparse and noisy measurements from specific points of the traffic network. The emergence of Connected and Automated Vehicles (CAVs) provides new capabilities for traffic state estimation using extended floating car data such as position, speed and spacing information. In this work we propose a Bayesian Traffic State Estimation (BTSE) methodology for estimating the traffic density based on extended floating car data. BTSE utilizes the Bayesian paradigm to express any prior information to derive probability distributions of the traffic density of different road segments of the traffic network. Two variations of the BTSE methodology are developed to handle the offline and online estimation problem. The BTSE methodology is evaluated both using realistic SUMO micro-simulations for M25 Highway, London, U.K., and a real-life vehicle-trajectory dataset from German highways, extracted from videos recorded by drones. The efficiency and accuracy of the BTSE methodology is compared to an existing methodology in the literature. We present results for the estimation performance of the methods showing that the Bayesian methodology consistently results in lower mean absolute percentage error than the compared literature method. The BTSE methodology yields high-quality estimation results even for a low penetration rate of CAVs (e.g. 5%).


I. INTRODUCTION
T RAFFIC state estimation (TSE) is a critical process for traffic planning, management and operations. A traffic state is often defined through the main traffic variables, namely density (veh/km), flow (veh/h) and speed (km/h) and is fully described by at most two of the aforementioned variables. TSE refers to the process of inferring traffic variables from a limited Manuscript  amount of noisy observed traffic data on certain road segments. This process depends on the estimation approach utilised for the task, the traffic flow models considered to describe the traffic dynamics, and the available input data. Various methods have been proposed in the literature for efficient observation of traffic conditions [1]. Generally, estimation approaches can be categorised based on input data and physical or statistical assumptions.
In terms of physical or statistical assumptions, there are four main categories of approaches: (i) model-driven, (ii) datadriven (iii) hybrid, and (iv) streaming-data-driven. Modeldriven approaches typically rely on traffic flow models, divided in first-order (e.g. the Lighthill-Whitham-Richards (LWR) model [2], [3]) or higher-order extensions (e.g. the Payne-Witham (PW) model [4], [5]). Most model-driven methods make use of the Kalman filter and its variations (see among others [6], [7], [8], [9], [10]) to solve the model equations and assume data obtained from fixed-location sensors. In addition, particle filters (see for example [11], [12], [13]) and the adaptive smoothing filter (e.g. [14], [15]), among many other methods [16], have been proven powerful tools for TSE. Such methods make sure that the estimation process respects basic traffic principles, however they might fail to fully capture the phenomena of real-world traffic when the considered traffic model is not representative of reality. Data-driven approaches rely on statistical or machine learning methods and historical data to develop an internal representation between traffic variables, without explicitly considering traffic models. Bayesian inference is an established statistical method to derive accurate TSE results when data are limited [17], [18], [19], [20], while it has been used in combination with model-based approaches for more accurate estimation results [21], [22]. The most widely used machine learning tools include deep neural networks [23], [24], support vector machines [25] and decision trees [26]. The estimation results using data-driven methods are often more accurate, however they cannot guarantee that the results will be physically feasible and they need a large amount of high-quality data to provide accurate results. Also, supervised machine learning methods require data of the true value of the estimation parameter of interest, which often not available. Hybrid methods combine desirable features from both model-driven and data-driven approaches in order to ensure accurate results and reduce data requirements [27], [28]. A new framework in the deep learning literature, namely physics-informed deep learning, has gain significant attention the past few years with promising results as these physics-informed regularisers reduce the space of feasible solutions and approximate solutions that are consistent with the chosen models using a limited amount of data (see for example, [29], [30], [31], [32]). Despite the recent success of physics-informed learning some studies have shown that such methods might fail to train [33] or could be computationally expensive [34]. Besides, the available traffic data should match the traffic parameters used in the underlying model. Finally, streaming-data-driven approaches use only streaming data (i.e. real-time data) for traffic estimation [35]. Such methods are robust against uncertain phenomena and unpredictable incidents, however they require a large amount of streaming data in order to provide accurate estimations.
In terms of traffic measurements, data is collected using either stationary or fixed-location sensors (e.g., inductive loop detectors and camera-based sensors), termed stationary data, or mobile sensors (e.g., GPS, speedometer) mounted on floating vehicles, termed Floating Car Data (FCD) [36]. Furthermore, connected and automated vehicles (CAVs) can allow the collection of extended data about the neighbourhood of a vehicle, such as the space and time headway from surrounding vehicles, using sophisticated on-board sensors (e.g., lidars, radars), termed extended FCD (xFCD) [37].
Stationary data methods rely on traffic variables, such as flow and occupancy, measured at the fixed locations of sensors. The density and speed can then be estimated by making weak assumptions, e.g., knowledge of the average vehicle length. Stationary sensors have been widely used by road administrators on highways since the early 90's [38]. TSE based on stationary data has been well studied in the literature [39]. However, the capabilities of fixed-location methods are limited due to the sparsity and high installation cost of sensors [40] and the inability of measuring traffic states beyond their fixed locations [41]. Also, such sensors often deliver low-quality and unreliable information, as they are subject to considerable disruptions due to system errors [42].
On the other hand, FCD-based TSE methods provide great opportunities for reducing the dependency on conventional stationary sensors and offer the ability to collect data from a wider spatiotemporal domain. FCD include the position, speed and direction of travel of moving vehicles equipped with appropriate sensors. FCD-based TSE methods allow the continuous observation and analysis of traffic conditions, either offline using data from conventional floating vehicles or online using data from connected vehicles, to provide wide-range spatiotemporal TSE information [43]. Typically these methods make use of traffic flow models that explicitly manage vehicle trajectory data. A novel model-based approach to integrate GPS data into highway traffic models was proposed in [44] and an extended Kalman filtering approach was developed in [45] to incorporate mobile data using the LWR model. A generalised least squared estimation approach to estimate macroscopic traffic states using multiple data sources was proposed in [46].
Several research works have also developed TSE methods that use both FCD and stationary data. In [17] and [47], datadriven methods were proposed that extract traffic information, e.g. the fundamental diagram, using historical stationary data and use mobile data to estimate the current traffic states.
In [48], a Kalman filtering model-based TSE algorithm was developed utilizing only average speed measurements from connected vehicles and flow measurements from fixed-location sensors. The performance of the approach was investigated in a detailed microscopic simulation study under various penetration rates of connected vehicles and traffic conditions [49]. The approach was further extended in [50] and [51] to take into account multi-lane highways.
Recently, TSE methods that utilize xFCD have been proposed. Such data is collected from CAVs that have the capability to measure their net space headway or gap from the leading vehicle, in addition to their position and speed. In [35] a streaming-data-driven approach was proposed to estimate the traffic density in highways, based on Edie's definitions, using only xFCD with moderate estimation performance; this methodology was also utilised for TSE in an urban area of Barcelona [52]. Furthermore, a TSE method was proposed in [53] that estimates traffic states by differentiating the continuous cumulative counts of vehicles inside a region, obtained through spacing measurements from CAVs and using a conservation law. To reduce microscopic vehicle behaviour fluctuations, an extended approach was developed that estimates the traffic states and fundamental diagram parameters using xFCD, and further updates the traffic states using a model-based data assimilation method [54]. However, the use of deterministic traffic flow models requires offline calibration and may not fully represent reality, introducing extra uncertainty. A recent work proposed two methods based on Bayesian inference and deep learning for the estimation of the traffic flow rate only in free-flow conditions using headway data acquired by CAVs [55]. The proposed methods assume error-free measurements and a large amount of historical data (from multiple days to years) were used to form an initial probability density function of the traffic flow, updated as new measurements were obtained.
In this work we consider the problem of estimating the traffic density in mixed-traffic multi-lane highways with only xFCD without considering any underlying traffic flow model, similar to [35] and [55]. In this context, we propose a data-driven Bayesian methodology and develop algorithms for both offline and online estimation; to the authors knowledge, this is the first work that develops a Bayesian methodology for TSE using only xFCD and does not make use of a traffic model, historical data or data from any other source. The proposed methodology expresses prior beliefs about the traffic states, as a probability distribution, to provide additional information, especially due to the sparsity of xFCD. These prior beliefs combined with a fundamental traffic relationship that links measurements to state variables, yields the posterior probability density that is used to infer the traffic states, i.e. the traffic density in this case. We provide all details of the prior distributions obtained through a proper elicitation technique; other Bayesian methodologies provide general information regarding prior distributions which hinders their practical implementation by interested researchers, e.g. [29]. The way prior distributions are derived in this work eliminates the utilisation of a large amount of historical data needed by other Bayesian methodologies. Furthermore, to overcome the stationarity nature of probability density functions (PDFs), we propose the estimation of the traffic density for discrete time windows in order to avoid the utilisation of filtering approaches, such as particle filters [56], that are computationally expensive.
The developed Bayesian methodology offers the following main advantages compared to the state-of-the-art: (i) the estimation performance is superior, especially for low CAV penetration rates, due to the additional source of information used along with the measurements, (ii) the inference of the traffic state is obtained in a probability density form, providing more detailed information compared to a point-estimate [57,Chapter 4] and, (iii) the estimation accuracy of the traffic variables does not rely on historical data or traffic models.
The contributions of this work are the following: • We design a fully Bayesian methodology 1 that uses sparse xFCD collected from CAVs to derive the posterior predictive distribution of the traffic density in different spatiotemporal regions of a specific highway under study. • We elaborate on the proposed methodology to develop both an offline (data-driven) and an online (streamingdata-driven) Bayesian TSE (BTSE) algorithm. The offline BTSE algorithm uses measurements collected over the entire time-horizon to estimate the traffic density in different time-windows, whereas the online BTSE algorithm uses measurements only from the previous and current time-windows to estimate the traffic density of the current time-window. • The Bayesian methodology is compared with an existing methodology in the literature proposed by [35]. The superiority of the proposed approach is illustrated using both micro-simulations for a specific section of the M25 Highway in London, U.K., and a real-life dataset collected from German highways [59]. The remainder of the paper is organised as follows. Section II describes the TSE problem when considering mixed-traffic with xFCD. Section III presents an existing solution approach introduced by [35], which will be used to compare results obtained from the proposed method. In Section IV we focus in the solution approach where we develop an offline/online Bayesian approach to obtain the posterior predictive distribution of the traffic density. A simulation study using SUMO (Simulation of Urban MObility) and a real-life dataset are used to validate the proposed methodology in Section V. Section VI discusses the use conditions and limitations of the proposed methodology. Section VII concludes the paper and suggests future research directions.

A. Notation
In the remainder of this paper we use the following notation. All bold letters indicate vectors (lower case) or matrices (upper case), while calligraphic letters denote sets. If A is a set, we denote as |A| the cardinality of the specific set. Set 1 A fully Bayesian framework requires prior distributions for all unknown entities in the model, with the posterior distribution effectively capturing all aspects of uncertainties involved. Optimal inference and prediction is achieved through the selection of the most appropriate values for nuisance parameters and hyperparameters using the available data [58]. R + = [0, ∞) denotes the set of all positive real numbers. The superscripts (·) T and (·) −1 , denote the transpose and the matrix inverse respectively. I n×n is the n × n identity matrix and 0 n×n denotes a n × n matrix that all its elements are zero. The median of a vector x is defined as the value separating the higher half from the lower half of the vector and is denoted as median(x). Furthermore, x ∼ Normal(μ, σ 2 ), x ∼ IG(α, β) and x ∼ Unif(c, d) indicate that x is a random variable drawn from the normal distribution with mean μ, and variance, σ 2 , the inverse-gamma distribution with shape and scale parameters α > 0 and β > 0, and the uniform distribution in the range [c, d], respectively.

II. PROBLEM STATEMENT
We consider a traffic network modeled as a directed graph G = (V, E) where the set of vertices V represents road junctions and the set of edges E represents road links. An arbitrary road link is considered to be comprised of N S road segments and N L lanes. We define the (i, l) space region, where, x i−1,l and x i,l denote the upstream and downstream boundary in the l-th lane of the i -th road segment.
We consider the estimation of traffic densities of distinct lanes of different road segments over a time-horizon T . The estimation time-window duration considered is T W , such that N W = T /T W different density values are derived over the entire time-horizon. The j -th estimation time-window is defined as shown in Figure 1. The coordinates of the lower-left and upperright corner of region A il j are (x i−1,l , τ j −1 ) and (x i,l , τ j ), respectively. Because the analysis is uniform across all timespace regions, we hereafter drop indices i , l, j , from all measurements, variables and sets and consider that region A is the same as A il j , unless otherwise stated. We consider mixed traffic conditions with two types of vehicles: (i) conventional vehicles (CVs) that do not have measuring capabilities and (ii) CAVs or connected vehicles with the ability of measuring the distance to their leading vehicle; hereafter we use the term 'CAVs' for the second type of vehicles. Let N V and N C denote the set of all vehicles and the set of CAVs in time-space region A, respectively,  CVs do not have the capability to collect any data. The sampling period is set equal to T S such that each CAV collects a new measurement every T S time units. Based on these measurements, the following data are calculated from each CAV and communicated to an operation center: 1 are the time-stamps of the last and first measurements, respectively. • The area of the time-space region between CAV c ∈ N C and its leading vehicle in region A, denoted as a c = A ∩ S c [km×h]. By using measurements y c and z c , we derive S c which is the time-space region, between CAV c and its leading vehicle. Finally, we calculate a c , which is the value of the area between the intersection of A and S c , as shown in Figure 1. The true traffic density, ρ true , in time-space region A is defined according to [60] as Equation (2) shows that if all vehicles in the specific road segment under study are CAVs then the traffic density is known. Nonetheless, if only a proportion of traffic are CAVs, then the traffic density cannot be estimated exactly. Given the data provided by CAVs, u c , t c and a c , ∀c ∈ N C , the objective of this work is to estimate the traffic density of time-space region A, according to Equation (2).
We consider the estimation of the traffic density both in an offline and online framework. When offline estimation is performed, xFCD is collected by CAVs for the N W time-windows, T j , ∀ j ∈ N W , and the estimation method uses the total amount of information. For online estimation xFCD is collected for time-windows T j −1 and T j and the estimation is performed for time-window T j , ∀ j ∈ N W . The online procedure is repeated for the total N W time-windows.
For simplicity the following assumptions are made. First, we assume that CAVs are randomly selected and have the same driving behaviour as the rest of the vehicles. Furthermore, we consider that measurement errors from different sensors have a cumulative effect resulting in an additive zero mean Gaussian error on density.

III. EXISTING SOLUTION APPROACH
An online estimation approach for obtaining traffic variables using only xFCD was proposed in [35] making the assumption that CAVs 3 collect a c and t c , ∀c ∈ N C . The assumption of random sampling of CAVs defined in the previous section holds for this approach with the additional assumption that the measurements are collected without error. Edie's generalised definitions [60] are utilised to estimate traffic density, flow and speed for individual time-space regions. For example, to estimate traffic density, the set of all vehicles, N V , in (2) is replaced by N C , yieldinĝ Estimator (3) is biased and an inverse correlation exists between the bias and the penetration rate of CAVs, as shown in [35]. The authors concluded that their algorithm provides good estimates for 5-minute and hourly volumes; however, the error was large for low traffic demand, as well as for low CAV penetration rates. The developed Bayesian estimation algorithms are compared to this approach, which we refer to as point TSE (PTSE).

IV. SOLUTION APPROACH
We develop a fully Bayesian approach to derive the probability distribution of the traffic density in a specific time-space region, assuming we have available the measurements acquired by CAVs, as defined in Section II. Bayesian inference is a natural way to update prior beliefs for unknown parameters through the posterior distribution and obtain marginal distributions of interest. The main idea of Bayesian inference is to continually update prior beliefs about events as new evidence, i.e. measurements, is acquired. The proposed methodology is comprised of four phases.
• Phase A: Utilise a fundamental definition to link measurements to parameters. • Phase B: Formulate prior information about parameters. • Phase C: Combine the two sources of information from Phase A and Phase B using Bayes' theorem [61] to obtain the posterior distribution. • Phase D: Infer parameters using the resulting posterior distribution. In the next section we design a Bayesian inference methodology for traffic density estimation both in an offline (Section IV-B) and online (Section IV-C) framework. Note however that the methodology can be used to estimate other traffic variables such as traffic flow and speed.

A. Bayesian Traffic State Estimation Methodology 1) Phase A:
The main objective of Phase A is to utilize a fundamental definition that relates the measured parameters, namely, time spent and area between different CAVs and their leading vehicles of a specific time-space region, with the unknown parameter, the traffic density.
Towards this direction we consider Edie's equation of traffic density, i.e. Equation (2). Initially, we select the statistical formulation where t = [t 1 , . . . , t c , . . . , t N C ] T is the N C -vector of time-spent of all CAVs and α = [a 1 , . . . , a c , . . . , a N C ] T the N C -vector of each CAV's time-space region area with its leading vehicle. Furthermore, g(·, ·) is the model given by (2), and ε is the noise which is a Gaussian random variable, ε ∼ Normal(0, σ 2 ε ). In contrast with [35] and [55] that make an error-free assumption for the measurements obtained from the CAVs, we assume that the measurements are subject to noise, ε, as shown in (4). Note that if all vehicles in the specific region where CAVs, i.e. N C = N V , then (2) would be exact and known. Here, for a specific region A we choose to observe a c for CAV c ∈ N C , and assume a prior distribution for the parameters t c and the model noise σ 2 ε . We utilise the basic concepts of Bayesian inference to derive a probability distribution of the traffic density in region A by integrating out the unknown parameters.
The likelihood function for Equation (2), π L (ρ|t, α, σ 2 ε ), is 2) Phase B: The objective of Phase B is to formulate any prior beliefs about the parameters of interest, namely, the time spent and the model noise, as a known distribution. Note that a c is not treated as a random variable because a c = A ∩ S c depends on t c through S c . First, we assume that the time spent of each CAV in the specific region follows a prior distribution with a known mean, μ c (u), and variance, σ 2 c (u), which are functions of the average speed of vehicles, u. Without loss of generality, in order to describe the a priori information about t, we choose normal prior distributions, are hyperparameters that can be estimated from the collected data. As t 1 , . . . , t N C , are mutually independent random variables all having a normal distribution, the N C × 1 random vector t defined as t = [t 1 , . . . , t N C ] T has a multivariate normal distribution with mean the when c = c and zero otherwise. 4,5 Hence, without loss of generality, we have that The estimation of the mean and covariance matrix is challenging when the average speed takes continuous values, i.e., u ∈ R + . We address this issue separately for the offline and online estimation in Section IV-B and Section IV-C, respectively. We select an inverse-gamma prior distribution, π B (σ 2 ε ), for the error variance, σ 2 where again β, δ > 0 are known hyperparameters. The inverse-gamma prior distribution is a common assumption in Bayesian statistics as it results in an analytical tractable marginal distribution. For 0 < β,δ < 1, which will be used in this work, the inverse-gamma prior distribution is an uninformative prior or diffuse prior meaning that it expresses vague or general information, i.e. information that is not subjectively elicited, about the error variance [57].
The joint prior density of t and σ 2 ε is given by 3) Phase C: In this phase we combine the two sources of information as provided in Phases A and B, i.e. the prior distribution and the chosen definition, to obtain the posterior density, π A (t, σ 2 ε |ρ, α). This is achieved by using Bayes' Theorem to update our prior beliefs for parameters t and σ 2 ε by using the likelihood function (5). Note that the values used for the traffic density ρ, are obtained by using Equation (2) and the spacing data collected by CAVs. The unnormalised posterior density satisfies The unknown error variance σ 2 ε is a nuisance parameter, i.e. is not of immediate interest, but it still must be taken into account when studying parameters which are of interest, i.e. t c , c ∈ N C . Using known results from statistical theory [61], we can integrate out σ 2 ε and obtain the marginal posterior distribution for t: which is basically the prior distribution multiplied by a multivariate t-distribution, with 2β degrees of freedom, mean g(α, t) and variance δ β I N C ×N C . Phase D. The objective of Phase D is to use the posterior distribution for prediction, an important objective of Bayesian inference addressed through the predictive distribution. More specifically, assume that we have ρ, derived through (4) and the collected measurements by a fixed proportion of CAVs, and want to predictρ, hence we want to find the predictive distribution that represents uncertainty in a new observation given previously obtained observations. We denote asρ the derived traffic density over all vehicles in the specific timespace region, and the posterior predictive density π(ρ|ρ) is obtained by integrating out t with respect to its posterior distribution: where = [0, ∞) N C is the set of all possible values of time spent for the CAVs deployed in time-window T j and π(ρ|ρ, t, α) is the conditional distribution ofρ given t, α and ρ. The joint prior distribution of ρ andρ, conditional on all unknown parameters t, α and σ 2 ε is given by Standard results for normal distributions can be used to derive the following conditional posterior distribution [62] ρ | ρ, t, α, which is the likelihood function (5) at new values of the timespentt and area of time-space regionα, obtained from the marginal posterior density π M (t|ρ, α). The predictive distribution is given as an average over the marginal posterior density π M (t|ρ, α) which contains all the information we know about t [63]. Hence, to derive the predictive distribution we know that π(ρ|ρ, t, α) follows the likelihood distribution (5), which is a normal distribution as given in Equation (12).
The marginal posterior density π M (t|ρ, α), however, is not a standard distribution, hence numerical evaluation is required. We employ sampling techniques based on Markov Chain Monte Carlo (MCMC) methods (see [64,Chapter 11]). The idea of MCMC is in a sense to by-pass the mathematical
A widely used MCMC algorithm that is relative simple is the Metropolis-Hastings algorithm [66], [67], which will be used for the purposes of this work (see Algorithm 1). The first step is to initialise the chain with starting valuest (0) for the random variables. Let the current state of the chain bet (r) . The main loop of the algorithm consists of three components: (i) generate a sample from a proposal density q(·), (ii) compute the acceptance probability, P α , and (iii) accept or reject the candidate sample with probability P α , or 1 − P α , respectively. The Markov property specifies that the distribution oft (r+1) given all previous draws,t (r+1) |t (r) ,t (r−1) , . . . , depends only on the most recent value drawnt (r) .
The Metropolis-Hastings algorithm is a general approach for sampling from a target density, in our case π M (t|ρ, α). However, it requires the specification of a proposal density, which must be chosen carefully. Acceptance rates close to 1 4 where recommended for high dimensional models and close to 1 2 for models of dimension 1 or 2 [63]. For the purposes of this work, at each MCMC step we propose values for t from a Normal distribution, q(·) = Normal(·, ·), t * ∼ Normal(t (r) , ν 2 R), where R is the covariance matrix resulting from the nonlinear least squares fit of (2), scaled by a value ν 2 . 7 As in many MCMC methods, the draws are regarded as a sample from the 6 Markov chain: A sequence of random variables {θ (0) , θ (1) , θ (2) , . . . }, such that at each time t ≥ 0, the next state θ (t+1) is sampled from a distribution f (θ (t+1) |θ (t) ) which depends only on the current state of the chain θ (t) . This sequence is called Markov chain, and f (·|·) is the transition kernel of the chain [  target distribution only after the chain has passed the burn-in time 8 and the effect of the fixed starting value has become so small that it can be ignored. Once theM samples from the marginal posterior distribution, π M (t|ρ, α), are obtained using Algorithm 1, i.e. MCMC = [t (1)T , . . . ,t (r)T , . . . ,t (M)T ] T , we proceed to the calculation of the posterior predictive distribution, π(ρ|ρ), by marginalising π(ρ|ρ, t, α) over π M (t|ρ, α). As π(ρ|ρ) might not be available in closed form and it is often easier to sample from this distribution using Monte Carlo methods. Towards this task, each sample in MCMC ,t (r) is plugged in Equation (12) to sample from π(ρ|ρ,t (r) , α) as described in Algorithm 2. Thenρ * (r) , r = 1, . . . ,M is an iid sample from the target posterior predictive distribution π(ρ|ρ).
Estimating the posterior predictive distribution is very important as this distribution gives the necessary information about unobserved data, in our case the traffic densityρ of a specific time-space region, where there are no CAVs to provide measurements. Figure 2 summarises the basic idea of the proposed Bayesian TSE methodology, referred to as BTSE in the remainder of this paper.

B. Offline BTSE Algorithm
As mentioned in Section IV-A, the estimation of the mean μ(u) and covariance matrix (u) in Equation (6)  Parameterk denotes the range index of the median speedũ of the median speeds of all CAVs. We use the median,ũ, instead of the average speed, u, as it is a more robust measure against having a small proportion of extremely large or small values. By constructing measurement sets U k over all time-windows for a specific space region, we can derive parameters μ The proposed algorithm for offline BTSE, referred to as BTSE f for a specific time-space region is given as follows.
Step 7: Take a sample, MCMC , from the marginal posterior π M (t|ρ, α), if not available in closed form, using MCMC based on Algorithm 1.
Step 9: Use sample MCMC to numerically integrate out parameters t from π(ρ|ρ, t, α) to obtain π(ρ|ρ), as shown in Equation (10), using Algorithm 2. Summarising the above steps, BTSE f executes Part I once as an initialisation process and iterates over Part II for each individual time-window to result in the estimated traffic densities of a specific time-space region over the entire time-horizon T . This procedure is repeated for all regions of interest. In order to compare estimation results with the PTSE approach [35], we calculate the mean of π(ρ|ρ), used as the estimated traffic densityρ BT S E f .

C. Online BTSE Algorithm
The proposed algorithm for online BTSE, referred to as BTSE o , differs from BTSE f by using the ability of CAVs to report data while they travel through a specific time-space region A. The procedure is given as follows.
For each time-window T j , j ∈ N W repeat: Step 1: Step 2: Calculate time spent t = [t 1 , . . . , t N C ] T and the area of the time-space region between a CAV and its leading α = [a 1 , . . . , a N C ] T .
Step 7: Take a sample, MCMC , from the marginal posterior π M (t|ρ, α), if not available in closed form, using MCMC based on Algorithm 1.
Step 9: Use sample MCMC to numerically integrate out parameters t from π(ρ|ρ, t, α) to obtain π(ρ|ρ), as shown in Equation (10), using Algorithm 2. Steps 1 and 2 of Part I are the same with BTSE f . Step 3 is slightly modified in terms of the definition of the mean and variance of π B (t). We consider measurements from the previous and current time-windows, T j −1 and T j , respectively. Assuming that the congestion level does not change within two time-windows implies that μ(u) and (u) do not depend on u, i.e. μ(u) ≈ μ on and (u) ≈ on . Hence, parameters μ on and on are derived using the collected data t and α from the previous and current time-windows. The iterative process indicated as Part II in the previous section is no longer needed, hence we proceed to Steps 4-10 as shown in Section IV-B, to calculate the mean of π(ρ|ρ), used as the estimated traffic densityρ BT S E o . Hence, BTSE o iterates over Part I for each time-space region. The main difference is that BTSE f requires the collection of all data for all time-windows to set up the prior distributions and then proceed with the estimation process, whereas BTSE o utilises only the information provided at the current and previous time-window. As before, in order to compare estimation results with the PTSE approach [35], we calculate the mean of π(ρ|ρ), used as the estimated traffic densityρ BT S E o .

V. PERFORMANCE EVALUATION
To evaluate the performance of the proposed Bayesian methodology we examine: (i) a simulation study that represents a real network that operates under both free-flow and congested conditions (Section V-A) and (ii) a real-life dataset, the HighD dataset (Section V-B). We utilise the framework for both the offline and online BTSE approaches and compare estimation results with the PTSE approach [35], described in Section III. To obtain the estimation results we where S is the total number of estimations. The MAPE is a common measure for evaluating the estimation performance because the error values of the quantity of interest are scaled to percentage units, which makes it easier to interpret. 9 The mean value of the derived probability distribution, π(ρ|ρ), is calculated using the iid samples obtained using Algorithm 2 for both BTSE f and BTSE o , and is used as the estimated traffic densityρ in (15). The true value of traffic density is obtained through (2), assuming that all vehicles in the road segment under study are CAVs, hence we have full knowledge of vehicle trajectories.
For both experiments we assume that the model noise ε is a Gaussian random variable with zero mean and unknown variance σ 2 ε , for which we assume σ 2 ε ∼ IG(β, δ). We set the values of the hyperparameters to β = 0.3 and δ = 0.2. 10 To eliminate any biases regarding the choice of CAVs we average each result over 10 repetitions and randomly select a different set of CAVs in each experiment to simulate a specific penetration rate. All experiments were executed and algorithms were coded in Matlab.

1) Simulation Setup :
For the simulation study we utilise the SUMO microscopic simulator [69]. We simulate traffic of a multi-lane highway road stretch of the M25 motorway in London, England between junctions J 13 and J 14 , with the SUMO representation of the network shown in Figure 3. The road stretch is about 8-km long and is separated into seven road segments, with size varying between 0.7-1.6 km, as shown in Figure 3 (segment 1 = 1.6km, segment 2 = 0.6km, segment 3 = 0.75km, segment 4 = 1.5km, segment 5 = 1km, segment 6 = 1.1km, segment 7 = 1km). It consists of two off-ramps (in segments 2 and 5) and two on-ramps (in segments 3 and 7). The number of lanes varies from 4 to 6 and the speed limit is 120 km/h. Two distinct traffic scenarios are considered: In the simulation, we consider two types of vehicles: (i) CVs which do not collect any data, and (ii) connected vehicles which measure position z c , speed u c and spacing y c every 0.2 seconds (assuming a maximum detection range of 200 m). Based on these measurements, each connected vehicle calculates and communicates to an operation center its speed, u c , timespent, t c , and time-space region, a c . All results are averaged over ten (10) simulations due to the stochastic nature of SUMO.
The true densities of segments 3, 4 and 5 for Scenarios 1 and 2 are given in Figures 4(a) and 4(b), for T W = 2 minutes. Note that the true density is re-calculated as the time-window changes in order to obtain the MAPE. These represent different segment types found in a highway: on-ramp (segment 3), off-ramp (segment 5) and normal (segment 4).
For the BTSE f we use three speed ranges: [0, 40), [40,80) and [80, ∞) km/h to represent high and moderate congestion and free-flow conditions, respectively. These speed ranges can be subjectively chosen to represent different congestion levels, without affecting the basic steps of the methodology described in Figure 2.
2) Results: We present estimation results for BTSE f , BTSE o and PTSE algorithms obtained using different penetration rates and time-window durations for the simulation network given in Figure 3 under Scenarios 1 and 2. Table I illustrates the MAPE (15) of the three estimation approaches (BTSE f , BTSE o and PTSE) under both scenarios for penetration rates 5%, 10%, 20% and 30%. A different traffic density is estimated for each lane and each segment every T W = 2 minutes. As shown in Table I, the BTSE f approach results in significantly better MAPE for low penetration rates, i.e. 5% and 10%. For example, the resulting MAPE from BTSE f is about 2-3 times lower than BTSE o and 3-4 times lower than PTSE when the penetration rate is 5%. For high penetration rates, the three algorithms yield similar performance. Furthermore, notice that while the performance of BTSE o and PTSE improves significantly for increasing penetration rate, the corresponding improvement for BTSE f is negligible. This may occur because the BTSE f utilises information collected from all time-windows to form the prior in Eq. (7), instead from only T j and T j −1 used in the BTSE o . Interestingly, the performance of BTSE o and PTSE is sometimes marginally better compared to BTSE f for high penetration rates. This may happen because the information provided from the current and the previous time-windows used in BTSE o represents better the prior distribution compared to BTSE f that constructs the prior distribution using data from the entire traffic scenario. In addition, BTSE o has a better performance compared to PTSE for all segments, especially for low penetration rates. The estimation error of BTSE o increases from about 11% to 31% as the penetration rate decreases from 30% to 5%, whereas the estimation error of the PTSE increases from about 13% to 45% for the respective rates. For example, let us consider Segment 2. Although the MAPE for BTSE o and PTSE is relatively close for penetration rate 30%, when the penetration rate decreases to 5%, the MAPE for BTSE o is about 15% better (26.85% compared to 40.10%).
In addition, the impact of the duration of the time-window T W is examined in terms of the MAPE for the three algorithms. Results are obtained using varying time-window durations from 2 to 10 minutes, the four different penetration rates of connected vehicles and all segments. Figure 5 depicts the average MAPE for each penetration rate and time-window duration, where the first row of figures shows results for Scenario 1 and the second row for Scenario 2, for the four penetration rates. As shown, there is an overall decrease of the MAPE for all algorithms as the size of the time-window increases, with BTSE o and PTSE resulting in higher improvement compared to BTSE f . For example, for penetration rate 5%, increasing the time-window duration from 2 to 10 minutes improves the MAPE from about 32% to 15% for BTSE o and from about 45% to 20% for PTSE, in both scenarios, while the improvement of BTSE f is around 5%. Additionally, the increase of time-window duration has less impact on the MAPE as the penetration rate increases. For instance, for penetration rate 30% increasing the time-window duration from 2 to 10 minutes yields less than 10% improvement for all approaches and the two scenarios. The impact of the time-window duration is lower for higher penetration rates and the estimation results are more accurate for longer time-windows. These observations are consistent with the fact that more data is collected in longer time-windows, resulting in more accurate estimations. Another reason that the time-window variation might decrease the MAPE is the fact that as the time-windows are increased, the true densities become smoother functions and hence easier to predict. In addition, for higher connected vehicle penetration rates, sufficient data can be provided for smaller time-windows yielding lower MAPE.
The impact of the penetration rate is also examined for all algorithms in both scenarios. In Figure 6 the barplots summarise the average MAPE of each approach using the corresponding penetration rate and all time-window durations from 2 to 10 minutes in each scenario. It is evident that the BTSE f approach achieves estimation error of about 12% for Scenario 1 and 14% for Scenario 2 using 5% penetration rate.
Compared to BTSE f , BTSE o and PTSE yield about 2 and 3 times higher estimation error for Scenario 1 and about 1.5 and 2 times higher estimation error for Scenario 2, respectively. In order to achieve similar estimation error to BTSE f , BTSE o requires a 20% and 10% penetration connected vehicle rate for Scenarios 1 and 2, while the corresponding penetration rates for PTSE are 30% and 20%. In Figure 7 an example of traffic density estimation obtained from each algorithm (PTSE, BTSE f , BTSE o ) along with the real traffic density is shown for penetration rates 5% and 30%. These results are obtained for lane 3 of segment 4 under both scenarios, over a 2-minute time-window. As shown, the BTSE methodology provides a PDF whereas the PTSE approach provides a single point estimate of the traffic density. For 30% penetration rate the distributions of BTSE f and BTSE o have less variance compared to the ones provided for 5% penetration rate. This means that the traffic density estimation for 5% penetration rate is more 'uncertain' compared to the 30% penetration rate, which is a result of the amount of information utilised in the prior distribution. Both BTSE distributions are 'closer' to the true value of the traffic density compared to the PTSE estimate, resulting in the MAPE difference shown in Table I and Figures 5 and 6.
All the above results were obtained by applying the proposed Bayesian methodology to estimate the traffic density of each lane for a simulation scenario in M25 motorway in both free-flow and congested conditions. The BTSE o algorithm consistently yields better estimation results compared to PTSE, especially for low penetration rates. The BTSE f algorithm produces significantly better estimations compared to both BTSE o and PTSE for low penetration rates. The performance of the algorithms is similar for high penetration rates. Next, we examine the performance of BTSE f , BTSE o and PTSE for a real-life dataset.

1) Dataset Description and Estimation Setup:
HighD is an open source dataset consisting of vehicle trajectories recorded at six different locations in highways around Cologne, Germany, using unmanned aerial vehicles (UAV) [59]. The length of each observed road segment varies between 400-420 m. For improved image stability, video recordings were made between the hours 8am and 5pm on days with no wind. The resolution of each video recording is 4K (4096×2160 pixels) at 25 frames per second, with an average video length of 17 minutes. The total recording time is approximately 16.5 hours. The dataset contains information for more than 110,000 vehicles (90,000 passenger vehicles and 20,000 trucks) with a total driven distance of 45,000 km and a total driven time of 447 hours. The dataset consists of information about the frames in which each vehicle appears, as well as its position, velocity, acceleration and the IDs of its surrounding vehicles. Details regarding the time and space headway for each vehicle and each frame are also included. Finally, statistical values such as the minimum, maximum and mean velocity, time and space headway of each vehicle are provided.
We present results for the BTSE f , BTSE o and PTSE algorithms for traffic density estimation (see Sections IV-A and III). Due to the fact that the length of the videos is around 15 minutes the minimum and maximum time-window duration is set to 1 and 5 minutes, respectively. Four different connected vehicle penetration rates are assumed for the collection of measurements: 5%, 10%, 20% and 30%. Similar to Case Study I, the performance of the proposed methodology is evaluated in terms of the MAPE (15). Figure 8 illustrates the performance of each algorithm evaluated for the four different penetration rates and the results are averaged over all different time-window durations. As a general observation BTSE f yields the best performance, while BTSE o is better than PTSE. As the penetration rate of connected vehicles increases the average MAPE decreases with both the BTSE f and BTSE o consistently resulting in lower MAPE than the PTSE. Although for a penetration rate of 30% the BTSE o and BTSE f yield around 3.5% and 6.5% lower MAPE compared to PTSE, the corresponding performance difference for 5% penetration rate increases to 10.5% and 25.5%, respectively. Interestingly, for 5% and 10% penetration rates, BTSE f results in about two times lower MAPE compared to PTSE. Moreover, as the penetration rate increases from 5% to 30% the BTSE f algorithm is improved by around 4%. Note that the estimation performance of BTSE o and BTSE f is better than PTSE for 30% penetration rate, in contrast with the corresponding estimation results presented in Case Study I where all algorithms had similar performance. Figure 9 shows the average MAPE when varying the time-window duration for the four penetration rates. Three main observations can be made. First, all algorithms result in lower error as the time-window duration increases with the BTSE f and BTSE o resulting in up to 30% and 10% lower MAPE compared to the PTSE, respectively. Second, it can be seen that BTSE f provides high-quality estimation results resulting in less than 20% MAPE in almost all cases considered, irrespective of the time-window duration and penetration rate. Third, the rate of improvement of BTSE o and PTSE is similar, with BTSE o being consistently better by 3% to 10% in terms of the MAPE.

2) Results:
The derived results for Case Study II are in line with the results of Case Study I and support that the proposed Bayesian methodology exhibits more accurate estimation results of the traffic density even for small penetration rates. Finally, note that the estimation results for the PTSE algorithm are inline with the results presented in [35].

A. Use Conditions
This work proposes a Bayesian approach that makes use of only xFCD obtained by CAVs or connected vehicles with the capability of measuring the distance to their leading vehicle. The Bayesian framework is a natural way to express uncertainty about an unknown variable, i.e. the traffic density, particularly when the available data is limited and sparse. This method estimates a conditional distribution of the measurements (posterior distribution) by using the available prior information that represents all the available knowledge apart from the data themselves, meaning that no relevant  information is omitted from the analysis, and in turn a PDF that represents the traffic density if all vehicles in the highway were CAVs. The most important aspects of this work are that: (i) no traffic model is used, outlasting the need of sufficiently calibrating the model prior to its use as well as avoiding the risk of the model failing to capture real-world traffic due to the ideal assumptions and conditions assumed; (ii) no extensive historical data are required for accurate estimations; (iii) even small penetration rates, i.e. 5% of CAVs deployed in the highway, are sufficient to provide traffic density estimations with small error. Hence, the only requirement for the application of this method in real-life networks is the deployment of a small penetration rate of CAVs in the network of interest that will collect the measurements, which will be transmitted and processed centrally to provide the information needed to follow the steps of the methodology as illustrated in Figure 2 either in an offline or online setup, depending on the specifications of the estimation procedure objective of the study. This work has investigated the estimation accuracy for time-windows in the range [2,10] minutes and road segment lengths in the range [0.4, 1.5] km and penetration rates as small as 5%, showing that the methodology is robust in different time-space combinations.

B. Limitations
As discussed the applicability of this work is straightforward. One issue that might rise is when there are time-space regions with no available measurements, i.e. they are not visited by CAVs and hence no prior information is available. Such an issue might occur under small penetration rates of CAVs, small time-window durations, and short road segments. One approach to resolve it is to consider time-space interpolation techniques for such regions. It is important to note however that zero measurements indicate that the density in such regions is very low, implying that accurate traffic state estimation is not critical.

VII. CONCLUSION AND FUTURE WORK
In this paper, we proposed a novel Bayesian methodology, aiming to derive a probability distribution of the traffic density in a specific road segment assuming that we have available measurements acquired by CAVs. We have developed algorithms for both offline and online Bayesian traffic density estimation (BTSE f and BTSE o , respectively). We compared the BTSE f and BTSE o algorithms with an approach previously presented in the literature [35] (PTSE) under two case studies: (a) a simulation scenario in both free-flow and congested conditions and (b) a real-life traffic dataset (highD) [59]. The results of the two case studies are in line and can be summarized as follows. The BTSE f algorithm is the best among the examined algorithms, yielding up to three and four times better estimation performance compared to BTSE o and PTSE. The performance of the BTSE f algorithm is consistent for different penetration rates and time-window durations, yielding excellent results in all examined settings. Comparing the two online algorithms, BTSE o yields up to 1.5 times lower estimation error compared to PTSE, especially for low penetration rates.
Future plans include the comparison of the proposed Bayesian methodology with other TSE methods in the literature. In terms of the Bayesian methodology we intend to investigate the robustness and dependence of the methodology on the chosen prior distributions, assumed hyperparameters and the effective size of the MCMC algorithm. Furthermore, we intend to use our methodology to estimate traffic flow and speed. Finally, we aim to investigate the correlation between neighbouring time-space regions and integrate macroscopic traffic dynamics to achieve highly accurate network traffic state estimation.