Coordinated CRLB-based Control for Tracking Multiple First Responders in 3D Environments

In this paper we study the problem of tracking a team of first responders with a fleet of autonomous mobile flying agents, operating in 3D environments. We assume that the first responders exhibit stochastic dynamics and evolve inside challenging environments with obstacles and occlusions. As a result, the mobile agents probabilistically receive noisy line-of-sight (LoS), as well as non-line-of-sight (NLoS) range measurements from the first responders. In this work, we propose a novel estimation (i.e., estimating the position of multiple first responders over time) and control (i.e., controlling the movement of the agents) framework based on the Cramér-Rao lower bound (CRLB). More specifically, we analytically derive the CRLB of the measurement likelihood function which we use as a control criterion to select the optimal joint control actions over all agents, thus achieving optimized tracking performance. The effectiveness of the proposed multi-agent multi-target estimation and control framework is demonstrated through an extensive simulation analysis.


I. INTRODUCTION
When disasters occur, an immediate life-saving response is essential in order to rescue people from imminent danger. The objective of this life-saving response, i.e., the search and rescue (SAR) mission, is to rescue the affected population in the shortest possible time, while at the same time ensuring the safety of the rescuers. To achieve both aforementioned tasks, we envision that a team of autonomous flying robots, e.g., unmanned aerial vehicles (UAV), drones, etc., could assist the rescue team by monitoring the first responders. The goals are to a) ensure their safety and b) assist them during their search operations by providing information about their searching patterns and spreading. The latter information is vital for ground SAR teams since maintaining proximity between first responders ensures that the field is properly covered.
However, drones are currently used merely as an "eye-in-the-sky", providing only a bird's-eye view of the situation at hand using camera payloads. Decisions rest solely with the human operator, who analyses the received data and gives an expert opinion of the situation on the ground. Drone technology, however, is increasingly becoming capable of achieving more complex tasks, mainly due to the miniaturization and improvement of electronic components (including processors, sensors, actuators, batteries and communications circuitry). The coverage problem is among the most cited problems to tackle, whereby a fleet of drone units needs to spread across an area over the shortest time interval to obtain situational awareness or when searching for particular objects. The literature in this domain deals with task assignment, scheduling and path planning of the multiple agents, considering physical resource constraints such as the total number of agents, their battery levels and their communication ranges [1]-[6]. Solutions consider variants of the traveling salesman and vehicle routing problems [7]-[9], while heuristic [10], [11] and meta-heuristic algorithms [12]-[15] have been proposed to solve practical instances of the problem with short computation times. To further improve on the execution times, various decomposition techniques have also been investigated [16], [17]. To deal with uncertainty in the model, a number of studies have looked into stochastic modelling formulations, with partially observable Markov decision processes being the most prominent example [18]. Tracking single or multiple targets in challenging environments [19]-[21] and actively searching for multiple targets with multiple cooperative agents [22]-[25] are two other problems that have attracted significant research interest.
Collaborative agents have also been considered for the distribution of urgent supplies with the objective of minimizing delivery times or monetary cost subject to delivery time deadlines. Research in this domain has focused on job allocation, taking into consideration the limitations in battery and payload capacity [26], [27]. Multiple agents have also been considered to provide coordinated over-the-top connectivity services; to that end, problems of node placement for coverage and topology control are being studied to address communication and computation challenges [28]-[31]. Complementary to the research works discussed above, in this study we utilize the Cramér-Rao Lower Bound (CRLB) to actively control multiple mobile agents operating in 3D environments with the goal of accurately tracking a team of first responders in the presence of both line-of-sight (LoS) and non-line-of-sight (NLoS) measurements.
The CRLB is a tool used in estimation theory to derive a lower bound on the variance of any unbiased estimator. For instance, the CRLB has been employed to estimate the localization and tracking accuracy for several motion models and wireless sensor measurements (e.g., signal strength, time of arrival) [32]. In the analysis of location systems, the CRLB indicates that the localization error at a given location will be greater than or equal to X meters given the conditions in the area of interest, including the number of signal sources, the geometry (i.e., relative positions) of the sources, and the noise in the signal measurements. This puts a bound on the expected location accuracy at that location. This bound can be interpreted as a qualitative (i.e., relative positioning error), rather than a quantitative (i.e., absolute positioning error) metric. In particular, the CRLB becomes larger (i.e., higher positioning error should be expected) when fewer measurements (i.e., fewer UAVs) are available to track the first responders or when the geometry (i.e., relative locations) of the UAVs is not favorable. In this work, we employ this notion and consider the CRLB in the proposed control methodology to make the UAVs move jointly in such a way that the expected tracking error over all first responders is reduced.
An illustrative example of the intended application scenario is depicted in Fig. 1, where we assume the presence of multiple first responders that need to be tracked by multiple agents within an actual 3D environment. Traditionally, first responders carry Global Navigation Satellite System (GNSS) receivers for location awareness during their operations. However, in deep urban areas where tall buildings and urban canyons are common, the satellite signals may be severely attenuated or even blocked, resulting in decreased tracking accuracy. This inevitably impacts the coordination and operational capabilities of the team. In this scenario, a fleet of GNSS-equipped UAVs flying above the urban clutter will have accurate location information for all UAVs (due to LoS with the satellites) and can act as a "constellation" that is closer to the first responders and capable of tracking them with higher accuracy, e.g., using dedicated radio transmissions for computing the ranges (i.e., distances) between the UAVs and the members of the team.
Within this setting, the aim is to control the UAVs in a coordinated fashion to monitor the first responders and minimize the tracking error. To enable this behavior, we model our problem as a multi-agent multi-target system and derive a CRLB estimator that considers the visual LoS, as well as the beyond-visual NLoS characteristics of the environment. By controlling the agents to minimize the estimation error using the CRLB as a control criterion, we aim to accurately track the first responders on the ground and provide a reliable solution that is robust to dynamic changes in the fleet formation (e.g., when one or more drones malfunction and need to perform an emergency landing, or when the battery level is low and a drone needs to fly back to the base for recharging). To summarize, the contributions of this paper are:
• We formulate the problem of monitoring a team of first responders in challenging 3D environments by a team of mobile agents as an estimation and control problem, and we propose a novel framework that utilizes the CRLB as a control criterion in order to direct the mobile agents towards the location of the first responders and maximize the tracking accuracy.
• We consider the challenging scenario where the agents probabilistically receive noisy LoS and NLoS measurements from the first responders, and we analytically derive the CRLB of the measurement likelihood function.
• We demonstrate the effectiveness of the proposed approach through extensive simulation experiments.
The rest of the paper is structured as follows. Section II formulates the problem and gives an overview of the proposed framework. Section III develops the system model and Section IV discusses the details of the proposed UAV control and state estimation framework. Finally, Section V conducts an extensive performance evaluation and Section VI concludes the paper and discusses future work.

II. PROBLEM DEFINITION AND SYSTEM OVERVIEW
In this work we assume that a rescue crew consisting of N first responders (also referred to as targets of interest or simply targets) operates during a search and rescue mission in a bounded surveillance region as depicted in Fig. 1. In order to assist the first responders during their mission and provide additional safety, a team of M UAVs is tasked to monitor and track all N targets.
Specifically, we assume that the number of targets during the mission is fixed and equal to N. Additionally, the targets exhibit noisy motion dynamics and their positions are uncertain and unknown, and thus need to be estimated with the help of a team of UAVs. Each of the UAVs is equipped with a ranging sensor that provides noisy range measurements from the targets. For instance, the first responders may carry Bluetooth beacons or RFID tags which transmit beacon radio signals captured by receivers on board the UAVs. In this case, the range (i.e., distance) between an agent-target pair can be measured through timing readings, e.g., Time of Arrival (ToA), Time of Flight (ToF) or Round Trip Time (RTT) of the transmitted signals in the radio channel. Moreover, it is often the case that the first responders operate in challenging and harsh environments which usually contain obstacles and occlusions. As a consequence, the LoS radio transmission path between the UAVs and the targets is often blocked, as shown in Fig. 1. The problem that we address in this work can therefore be stated as follows:
Problem Definition: At each discrete time t, a team of M autonomous mobile agents must jointly decide their optimal mobility control actions in 3D such that the positioning error over all N targets of interest is minimized.
The proposed system is based on the theory of stochastic filtering, which we briefly review here. A more comprehensive description can be found in [33], [34]. In stochastic filtering we are interested in the posterior distribution $p(x_t \mid Y_{1:t})$ of some hidden state $x_t \in \mathcal{X}$ at time $t$ given all measurements up to time $t$, i.e., $Y_{1:t} = y_1, \ldots, y_t$, with $y_t \in \mathcal{Y}$. Assuming an initial density $p(x_0)$ on the state, the posterior distribution at time $t$ can be computed using the Bayes recursion as:

$$p(x_t \mid Y_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid Y_{1:t-1})\, dx_{t-1} \tag{1}$$

$$p(x_t \mid Y_{1:t}) = \frac{p(y_t \mid x_t)\, p(x_t \mid Y_{1:t-1})}{\int p(y_t \mid x_t)\, p(x_t \mid Y_{1:t-1})\, dx_t} \tag{2}$$

where Eqn. (1) and (2) are referred to as the time prediction and measurement update steps, respectively, while the functions $p(x_t \mid x_{t-1})$ and $p(y_t \mid x_t)$ denote the state transitional density and the measurement likelihood function, respectively. At each time step the state $x_t$ is usually extracted from the posterior distribution using the expected a posteriori (EAP) or the maximum a posteriori (MAP) estimators.

Fig. 2. Architecture of the proposed system (CRLB-based controller).

An overview of the proposed system architecture is illustrated in Fig. 2. For each target (i.e., first responder) $j \in (1, \ldots, N)$ we maintain a conditional probability distribution $p_j(x^j_t \mid Y^j_{1:t})$, which accounts for the uncertainty of the target state $x^j_t$ at time $t$, given all target measurements $Y^j_{1:t}$ up to time $t$. Without loss of generality we assume in this work that the target state $x^j_t$ is composed of position and velocity components in 3D space and the measurements $Y^j_{1:t}$ are range measurements. Let the posterior distribution of the target state at time $t-1$ be denoted by $p_j(x^j_{t-1} \mid Y^j_{1:t-1})$ for target $j$. In summary, the proposed system applies the following procedure recursively over time:
1) In the first step, the predictive density $p_j(x^j_t \mid Y^j_{1:t-1})$ is obtained through the Bayes prediction step, i.e., Eqn. (1), for each target as shown in Fig. 2. The predicted target states $\hat{x}^j_t$ for $j \in (1, \ldots, N)$ are then extracted from the corresponding predictive densities and, along with the hypothesized control actions $U^i_t$ of each agent $i \in (1, \ldots, M)$, serve as inputs to the proposed controller.
2) The control objective is to find the best combination of joint hypothesized control actions ($u^{i*}_t \in U^i_t$, $\forall i$) which, when applied to the agents, will result in optimized tracking performance. In this work, this is achieved by selecting the combination of joint control actions that minimizes the CRLB [35], [36], as discussed in more detail in Section IV.
3) The optimal control actions $u^{1*}_t, \ldots, u^{M*}_t$ are then applied to the agents, which move to their new states at time $t$ where they receive range measurements $y^{ij}_t$, $i \in (1, \ldots, M)$, $j \in (1, \ldots, N)$ from the targets. These measurements are then used to compute the posterior distribution $p_j(x^j_t \mid Y^j_{1:t})$ of each target $j$, from which the estimated target state $\hat{x}^j_t$ can be extracted. This procedure is repeated recursively over time.

III. SYSTEM MODEL
A. First Responder Dynamics
Let us assume that a total of N first responders (targets) $j \in (1, \ldots, N)$ operate in our environment, where N is known and fixed. The motion dynamics of the first responders can be expressed by the following discrete-time dynamical model:

$$x^j_t = \Phi x^j_{t-1} + \Gamma \nu_t \tag{3}$$

where $x^j_t = [x^j, y^j, z^j, \dot{x}^j, \dot{y}^j, \dot{z}^j]^\top_t \in \mathcal{X}$ denotes the state of first responder $j$ at time $t$ and consists of the position and velocity in 3D Cartesian coordinates, and $\nu_t \sim \mathcal{N}(0, \Sigma_v)$ denotes the perturbing acceleration, which is drawn from a zero-mean multivariate normal distribution with covariance matrix $\Sigma_v$. The matrices $\Phi$ and $\Gamma$ are given by:

$$\Phi = \begin{bmatrix} I_3 & \Delta T\, I_3 \\ 0_3 & I_3 \end{bmatrix}, \qquad \Gamma = \begin{bmatrix} 0.5\, \Delta T^2\, I_3 \\ \Delta T\, I_3 \end{bmatrix} \tag{4}$$

where $\Delta T$ is the sampling period, $I_3$ is the identity matrix of dimension 3 × 3 and $0_3$ is the zero matrix of dimension 3 × 3. We should point out that in this work we assume that the dynamics of the first responders obey the Markov property, i.e., the target state at the next time step depends only upon the target state of the previous time step, as shown in Eqn. (3).
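The target model can be illustrated with a short simulation step. This is a minimal sketch, assuming the standard near-constant-velocity form of Φ and Γ and a diagonal Σ_v with the same standard deviation on every axis; the function name `step_target` and the parameter `sigma_acc` are hypothetical and only serve the illustration.

```python
import random


def step_target(state, dt, sigma_acc, rng):
    """Propagate [x, y, z, vx, vy, vz] by one step of x_t = Phi x_{t-1} + Gamma nu_t.

    Assumption: Sigma_v is diagonal, so each axis receives an independent
    zero-mean Gaussian acceleration with standard deviation sigma_acc.
    """
    pos, vel = state[:3], state[3:]
    acc = [rng.gauss(0.0, sigma_acc) for _ in range(3)]  # nu_t
    # Position rows of Phi and Gamma: p + dt * v + 0.5 * dt^2 * a
    new_pos = [p + dt * v + 0.5 * dt ** 2 * a for p, v, a in zip(pos, vel, acc)]
    # Velocity rows of Phi and Gamma: v + dt * a
    new_vel = [v + dt * a for v, a in zip(vel, acc)]
    return new_pos + new_vel


if __name__ == "__main__":
    rng = random.Random(0)
    state = [30.0, 30.0, 0.0, 1.0, 1.0, 0.0]
    for _ in range(3):
        state = step_target(state, dt=1.0, sigma_acc=0.1, rng=rng)
    print(state)
```

With `sigma_acc = 0` the step reduces to straight-line motion, which is a quick way to check the structure of Φ.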

B. Agent Dynamics
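During the emergency response mission, a set of controllable mobile agents (e.g., UAVs) $S = \{1, 2, \ldots, M\}$ operates in the environment assisting the first responders, where $M$ is the total number of available agents. Each agent $i \in S$ is subject to the following discrete control dynamics:

$$s^i_t = s^i_{t-1} + \begin{bmatrix} \Delta_R[l_1] \sin(l_2 \Delta_\phi) \cos(l_3 \Delta_\theta) \\ \Delta_R[l_1] \sin(l_2 \Delta_\phi) \sin(l_3 \Delta_\theta) \\ \Delta_R[l_1] \cos(l_2 \Delta_\phi) \end{bmatrix} \tag{5}$$

where $s^i_{t-1}$ denotes the state, i.e., the $(x, y, z)$ position coordinates in 3D, of agent $i$ at time $t-1$, $\Delta_R$ is a vector of possible radial step sizes indexed by $l_1$, the integers $l_2$ and $l_3$ select multiples of the angular increments $\Delta_\phi = \pi/N_\phi$ and $\Delta_\theta = 2\pi/N_\theta$, and the parameters $(|\Delta_R|, N_\phi, N_\theta)$ control the number of possible control actions. We denote the set of all admissible control actions of agent $i$ at time $t$ by $U^i_t$, as computed by Eqn. (5). An illustrative example is shown in Fig. 3.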
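To make the size and shape of the admissible action set concrete, the following sketch enumerates candidate next positions on spherical shells around the current agent position. It rests on assumptions that the text does not confirm: the polar index is taken to run over 0..N_φ, the azimuth index over 1..N_θ, and the "stay in place" action is not included. The function name `admissible_positions` is hypothetical.

```python
import math


def admissible_positions(pos, radial_steps, n_phi, n_theta):
    """Enumerate candidate next positions of one agent on spherical shells."""
    d_phi = math.pi / n_phi            # polar-angle increment
    d_theta = 2.0 * math.pi / n_theta  # azimuth increment
    candidates = set()
    for r in radial_steps:
        for l2 in range(n_phi + 1):           # assumed index range 0..N_phi
            for l3 in range(1, n_theta + 1):  # assumed index range 1..N_theta
                phi, theta = l2 * d_phi, l3 * d_theta
                cand = (
                    pos[0] + r * math.sin(phi) * math.cos(theta),
                    pos[1] + r * math.sin(phi) * math.sin(theta),
                    pos[2] + r * math.cos(phi),
                )
                # Rounding merges the duplicates produced at the two poles,
                # where every azimuth maps to the same point.
                candidates.add(tuple(round(c, 9) for c in cand))
    return sorted(candidates)


if __name__ == "__main__":
    actions = admissible_positions((0.0, 0.0, 10.0), [1.0, 2.0, 5.0], 2, 4)
    print(len(actions), "candidate positions")
```

Under these assumptions, the parameters used later in Experiment 2 (three radial steps, N_φ = 2, N_θ = 4) give six distinct directions per radius: straight up, straight down and four horizontal moves.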

C. Agent Sensing Model
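Each agent $i$ is equipped with a range sensor that provides noisy range measurements $y^{ij}_t$ from each target $j$:

$$y^{ij}_t = \lVert H x^j_t - s^i_t \rVert_2 + w^{ij}_t \tag{6}$$

where $H x^j_t = [x^j_t, y^j_t, z^j_t]$ is the position of target $j$, $s^i_t$ is the position of agent $i$ and $w^{ij}_t$ is the measurement noise, which is distributed according to the following Gaussian mixture model:

$$w^{ij}_t \sim \lambda^{ij}\, \mathcal{N}(0, \sigma^2_{LoS}) + (1 - \lambda^{ij})\, \mathcal{N}(\mu_{NLoS}, \sigma^2_{NLoS}) \tag{8}$$

In the above model, $\mathcal{N}(0, \sigma^2_{LoS})$ denotes a zero-mean Gaussian distribution with variance $\sigma^2_{LoS}$, which models the noise of LoS range measurements, whereas $\mathcal{N}(\mu_{NLoS}, \sigma^2_{NLoS})$ models the noise characteristics of an NLoS range measurement with a Gaussian distribution centered at $\mu_{NLoS}$ with variance $\sigma^2_{NLoS}$. Moreover, the mixing component $\lambda^{ij}$ denotes the probability of agent $i$ receiving a LoS measurement from target $j$, which is further given by [37]:

$$\lambda^{ij} = \frac{1}{1 + a \exp\left(-b\,(\vartheta^{ij} - a)\right)} \tag{9}$$

where $\vartheta^{ij}$ denotes the elevation angle between agent $i$ and target $j$. The free parameters $a$ and $b$ depend on the environmental characteristics of the surveillance area and model the statistical properties of LoS between the agent and the target as a function of the agent's elevation angle $\vartheta$; see Fig. 1.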
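The sensing model can be sketched as a small simulator that draws one range measurement. The sketch assumes the sigmoid LoS probability 1 / (1 + a exp(-b(ϑ - a))) with the elevation angle in radians; this form is an assumption, chosen because it reproduces the LoS probabilities quoted for the urban and rural settings in Section V-D. The function names `p_los` and `sample_range` are hypothetical.

```python
import math
import random


def p_los(elev_rad, a, b):
    """LoS probability as a sigmoid of the elevation angle (radians, assumed form)."""
    return 1.0 / (1.0 + a * math.exp(-b * (elev_rad - a)))


def sample_range(agent, target, a, b, sigma_los, mu_nlos, sigma_nlos, rng):
    """Draw one noisy range measurement from the two-component Gaussian mixture."""
    dx, dy, dz = (agent[k] - target[k] for k in range(3))
    true_range = math.sqrt(dx * dx + dy * dy + dz * dz)
    # Elevation of the agent as seen from the target.
    elev = math.atan2(dz, math.hypot(dx, dy))
    if rng.random() < p_los(elev, a, b):
        noise = rng.gauss(0.0, sigma_los)       # LoS component
    else:
        noise = rng.gauss(mu_nlos, sigma_nlos)  # NLoS component (positive bias)
    return true_range + noise


if __name__ == "__main__":
    rng = random.Random(0)
    y = sample_range((0.0, 0.0, 30.0), (10.0, 0.0, 0.0),
                     a=0.5, b=15.0, sigma_los=0.5,
                     mu_nlos=10.0, sigma_nlos=5.0, rng=rng)
    print(round(y, 2))
```

Sampling the component first and then the noise is equivalent to sampling from the mixture density directly and keeps the LoS/NLoS interpretation explicit.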

IV. PROPOSED CRLB-BASED UAV CONTROL AND STATE ESTIMATION FRAMEWORK
In this section, we describe in detail the proposed control methodology that allows the system to determine the optimal joint control actions over all agents for high-accuracy target tracking.
Let us assume that at time $t$ the posterior filtering distribution of target $j$ is given by $p_j(x^j_t \mid Y^j_{1:t})$ and that the state of target $j$ at time $t$ can be estimated as $\hat{x}^j_t = \int x^j_t\, p_j(x^j_t \mid Y^j_{1:t})\, dx^j_t$. Observe that in order to compute the target state, we first need to compute the posterior distribution $p_j(x^j_t \mid Y^j_{1:t})$, which relies on the received target measurements through the measurement likelihood function as shown in Eqn. (2). Additionally, the quality of the received target measurement $y^{ij}_t$ depends on the state of the agent $s^i_t$, $i \in (1, \ldots, M)$ through Eqn. (6). In fact, we can observe from Eqn. (8) and (9) that the noise in $y^{ij}_t$ is linked to the elevation angle $\vartheta^{ij}_t$ between the agent and the target. Thus, the elevation angle $\vartheta^{ij}_t$, which can be computed as:

$$\vartheta^{ij}_t = \arctan\left( \frac{s^i_z - z^j}{\sqrt{(s^i_x - x^j)^2 + (s^i_y - y^j)^2}} \right) \tag{10}$$

where $(s^i_x, s^i_y, s^i_z)$ and $(x^j, y^j, z^j)$ are the positions of agent $i$ and target $j$ at time $t$, ultimately determines the quality of the received measurements and consequently the accuracy of the estimated target state. In other words, the control actions applied to the agents affect the received measurements, which in turn affect the estimated target states. Therefore, in order to optimize the tracking accuracy at a particular time $t$ it suffices to find the optimal control actions for all agents $u^i_t \in U^i_t$, $\forall i$. That said, the tracking cost depends on the applied control actions and on the received measurements. Let us denote by $\xi^{track}_t(Y_t, U_t)$ the tracking-cost objective function that would result if we applied at time $t$ the hypothetical control actions $U_t(i) = u^i_t \in U^i_t$, $\forall i$ and received measurements $Y_t(i, j) = y^{ij}_t$ for $i \in (1, \ldots, M)$ and $j \in (1, \ldots, N)$. The control problem now becomes:

$$U^*_t = \arg\min_{U_t}\; \xi^{track}_t(Y_t, U_t) \tag{11}$$

Observe that in order to optimize the above objective function we need the unknown future measurement set $Y_t$, which is received only after the control vector $U_t$ has been applied at time $t$. As explained earlier, this is because the applied control actions affect the measurements received. Next, we describe how we tackle this control problem.

A. CRLB-based Objective Function
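According to the agent sensing model of Section III-C, the joint measurement likelihood function over all targets, considering measurements from all agents, is defined as:

$$l(Y_t \mid X_t) = \prod_{j=1}^{N} \prod_{i=1}^{M} \frac{1}{\sqrt{2\pi (\sigma^{ij}_t)^2}} \exp\left( -\frac{(y^{ij}_t - \mu^{ij}_t)^2}{2 (\sigma^{ij}_t)^2} \right) \tag{12}$$

where $Y_t(i, j) = y^{ij}_t$ for $i \in (1, \ldots, M)$ and $j \in (1, \ldots, N)$ is the range measurement of target $j$ received by agent $i$ with state $s^i_t$, and $X_t(j) = x^j_t$, $j \in (1, \ldots, N)$ is the target state at time $t$. Moreover, in Eqn. (12) the terms $\mu^{ij}_t$ and $(\sigma^{ij}_t)^2$ are given by Eqn. (13) and (14), respectively. The log-likelihood $\ln l(Y_t \mid X_t)$ is then:

$$\ln l(Y_t \mid X_t) = \sum_{j=1}^{N} \sum_{i=1}^{M} \left[ -\frac{1}{2} \ln\left( 2\pi (\sigma^{ij}_t)^2 \right) - \frac{(y^{ij}_t - \mu^{ij}_t)^2}{2 (\sigma^{ij}_t)^2} \right] \tag{15}$$

Let $F_t = [H x^1_t, \ldots, H x^N_t]$ be a vector which contains the target positions $H x^j_t = [x^j_t, y^j_t, z^j_t]$ for all targets $j$. The Fisher Information Matrix (FIM) $J(F_t)$ can be obtained as:

$$J(F_t) = -\mathbb{E}\left[ \frac{\partial^2 \ln l(Y_t \mid X_t)}{\partial F_t\, \partial F_t^\top} \right] \tag{16}$$

The CRLB of the location of the first responders is obtained from the inverse of the FIM as follows:

$$\mathrm{tr}\left( \mathrm{Cov}(\hat{F}_t) \right) \geq \mathrm{tr}\left( J^{-1}(F_t) \right) \tag{17}$$

where $\mathrm{Cov}(A)$ and $\mathrm{tr}(A)$ denote the covariance matrix and the trace of matrix $A$, respectively. Let the FIM be denoted as

$$J(F_t) = \mathrm{diag}(J_1, \ldots, J_N) \tag{18}$$

where each $J_j$ is the 3 × 3 FIM block associated with the position of target $j$ (Eqn. (19)). The elements of matrix $J_j$ in Eqn. (19) are obtained using Eqn. (25) and (26), as shown in Appendix A. The inverse of the FIM can be obtained by inverting each block:

$$J^{-1}(F_t) = \mathrm{diag}(J_1^{-1}, \ldots, J_N^{-1}) \tag{20}$$

Substituting Eqn. (20) into Eqn. (17), the CRLB is equal to the sum of the traces of the inverted FIM blocks:

$$\mathrm{CRLB}(F_t) = \sum_{j=1}^{N} \mathrm{tr}\left( J_j^{-1} \right) \tag{21}$$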
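The role of geometry in the bound can be illustrated with a deliberately simplified FIM. The sketch below is not the derivation of Appendix A: it assumes that each range measurement has Gaussian noise whose variance does not depend on the target position, in which case the per-target FIM reduces to a sum of outer products of unit line-of-sight vectors scaled by the inverse variance. The helper `mixture_variance` moment-matches the LoS/NLoS mixture to a single variance, which is likewise an assumption. All names are hypothetical.

```python
import math


def mixture_variance(lam, sigma_los, mu_nlos, sigma_nlos):
    """Variance of lam*N(0, s_los^2) + (1-lam)*N(mu_nlos, s_nlos^2) (moment matching)."""
    mean = (1.0 - lam) * mu_nlos
    second = lam * sigma_los ** 2 + (1.0 - lam) * (sigma_nlos ** 2 + mu_nlos ** 2)
    return second - mean ** 2


def crlb_trace(agents, target, variances):
    """tr(J^-1) for one target under the simplified range-only FIM.

    Returns math.inf when the geometry makes J singular (e.g. fewer than
    three agents, or line-of-sight vectors lying in one plane).
    """
    J = [[0.0] * 3 for _ in range(3)]
    for s, var in zip(agents, variances):
        d = [target[k] - s[k] for k in range(3)]
        dist = math.sqrt(sum(c * c for c in d))
        u = [c / dist for c in d]  # unit vector from agent to target
        for r in range(3):
            for c in range(3):
                J[r][c] += u[r] * u[c] / var
    # Principal 2x2 minors are the diagonal of the adjugate, so
    # tr(J^-1) = (m00 + m11 + m22) / det(J).
    m00 = J[1][1] * J[2][2] - J[1][2] * J[2][1]
    m11 = J[0][0] * J[2][2] - J[0][2] * J[2][0]
    m22 = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    det = (J[0][0] * m00
           - J[0][1] * (J[1][0] * J[2][2] - J[1][2] * J[2][0])
           + J[0][2] * (J[1][0] * J[2][1] - J[1][1] * J[2][0]))
    if det < 1e-12:
        return math.inf
    return (m00 + m11 + m22) / det


if __name__ == "__main__":
    agents = [(10.0, 0.0, 10.0), (0.0, 10.0, 10.0), (-10.0, -10.0, 10.0)]
    var = mixture_variance(0.95, 0.5, 10.0, 5.0)
    print(crlb_trace(agents, (0.0, 0.0, 0.0), [var] * 3))
```

Even in this reduced form, the two effects discussed in the Introduction are visible: the bound grows when the measurement variance grows (lower LoS probability) and when the agent directions become nearly coplanar.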

B. Computation of the UAV Optimal Control Actions
In practice, the CRLB provides a bound on the Root Mean Square Error (RMSE), which can be interpreted as the achievable position error in meters [32]. However, the CRLB in Eqn. (21) is computed at known agent and target positions, i.e., the expected tracking error is computed at given positions of the first responders. Since the actual positions of the first responders are unknown (i.e., they are estimated jointly with the agents' control actions), the predicted positions of the first responders (see sub-section IV-C) are used when computing Eqn. (21). Subsequently, the CRLB is computed for all possible combinations of candidate agent locations (i.e., Eqn. (21) is computed for all the possible combinations of joint control actions of the agents). The combination that minimizes the overall responder tracking error indicates the optimal control actions, i.e., the locations where the drones should move in the next time step. This is formally given by:

$$U^*_t = \arg\min_{U_t} \sum_{j=1}^{N} \mathrm{tr}\left( J^{-1}_{j,U_t} \right) \tag{22}$$

where the notation $J^{-1}_{j,U_t}$ indicates that the elements of matrix $J_j$ have been computed using a combination of agent hypothetical control actions $U_t(i) = u^i_t \in U^i_t$, $\forall i \in S$.
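The search over joint control actions can be sketched as a brute-force enumeration. This is an illustrative sketch, not the authors' implementation: `cost` is a hypothetical callable standing in for the per-target term tr(J⁻¹) evaluated at a predicted target position, and `select_joint_actions` is a hypothetical name.

```python
import itertools
import math


def select_joint_actions(candidate_sets, predicted_targets, cost):
    """Exhaustively search all joint actions and return the cheapest one.

    candidate_sets[i] lists the candidate positions of agent i;
    cost(joint_positions, target) returns the per-target criterion.
    """
    best_cost, best_joint = math.inf, None
    for joint in itertools.product(*candidate_sets):
        total = sum(cost(joint, tgt) for tgt in predicted_targets)
        if total < best_cost:
            best_cost, best_joint = total, joint
    return best_joint, best_cost


if __name__ == "__main__":
    # Toy criterion for demonstration only: summed agent-target distance.
    def toy_cost(joint, tgt):
        return sum(math.dist(s, tgt) for s in joint)

    sets = [[(0.0, 0.0, 10.0), (5.0, 5.0, 10.0)],
            [(4.0, 4.0, 10.0), (9.0, 9.0, 10.0)]]
    print(select_joint_actions(sets, [(5.0, 5.0, 0.0)], toy_cost))
```

The number of combinations is the product of the sizes of the individual action sets, so the enumeration grows exponentially with the number of agents; this is why the parameters that control the size of each action set matter in practice.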

C. First Responders' State Estimation
The objective here is to use the available agents in order to accurately estimate the states of all targets at each time $t$, given their stochastic motion dynamics and the noisy range measurements received by the agents. To do that we use stochastic filtering in order to propagate in time the posterior distribution $p_j(x^j_t \mid Y^j_{1:t})$ of the target state $x^j_t$ given the history of measurements $Y^j_{1:t}$. In essence, we maintain N filtering densities, one for each target, which we compute recursively using Eqn. (1) and (2). We should point out that in this work we assume that the agents can distinguish which measurement originates from which target. In other words, we assume that the problem of data association is solved. This is a reasonable assumption, as the radio transmitters carried by the first responders will typically include their identity in every transmission to identify the source of the signal.
In Section III we observe that the target transitional density $p(x_t \mid x_{t-1})$ is described by a multivariate Gaussian distribution according to the target dynamics of Eqn. (3) and is given by $p(x_t \mid x_{t-1}) = \mathcal{N}(x_t; \Phi x_{t-1}, \Gamma \Sigma_v \Gamma^\top)$. Thus, the predictive density $p(x_t \mid Y_{1:t-1})$ can be computed directly from the Chapman-Kolmogorov equation, i.e., Eqn. (1), given the posterior density of the previous time-step $p(x_{t-1} \mid Y_{1:t-1})$.
The measurement likelihood function of target $j$ must account for the measurements received by all M agents. Thus, we need to compute $p(y^{1j}_t, \ldots, y^{Mj}_t \mid x^j_t, s^1_t, \ldots, s^M_t)$. Assuming pair-wise independence between the agents, the measurement likelihood function for target $j$ can be decomposed as:

$$p(y^{1j}_t, \ldots, y^{Mj}_t \mid x^j_t, s^1_t, \ldots, s^M_t) = \prod_{i=1}^{M} p(y^{ij}_t \mid x^j_t, s^i_t)$$
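The factorized likelihood can be sketched as follows for a single hypothesized target position. The sketch assumes the two-component Gaussian mixture noise of Section III-C and the sigmoid LoS probability with the elevation angle in radians; the function names and the `params` dictionary are hypothetical conveniences for this illustration.

```python
import math


def gauss_pdf(x, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def p_los(elev_rad, a, b):
    """Assumed sigmoid LoS probability (elevation in radians)."""
    return 1.0 / (1.0 + a * math.exp(-b * (elev_rad - a)))


def pair_likelihood(y, target_pos, agent_pos, params):
    """p(y | target position, agent position) under the LoS/NLoS mixture."""
    dx, dy, dz = (agent_pos[k] - target_pos[k] for k in range(3))
    rng_true = math.sqrt(dx * dx + dy * dy + dz * dz)
    lam = p_los(math.atan2(dz, math.hypot(dx, dy)), params["a"], params["b"])
    resid = y - rng_true
    return (lam * gauss_pdf(resid, 0.0, params["sigma_los"])
            + (1.0 - lam) * gauss_pdf(resid, params["mu_nlos"], params["sigma_nlos"]))


def joint_likelihood(measurements, target_pos, agents, params):
    """Product over agents, using the independence assumption."""
    like = 1.0
    for y, s in zip(measurements, agents):
        like *= pair_likelihood(y, target_pos, s, params)
    return like


if __name__ == "__main__":
    params = {"a": 0.5, "b": 15.0, "sigma_los": 0.5,
              "mu_nlos": 10.0, "sigma_nlos": 5.0}
    agents = [(0.0, 0.0, 30.0), (20.0, 0.0, 30.0)]
    ys = [30.0, math.sqrt(20.0 ** 2 + 30.0 ** 2)]
    print(joint_likelihood(ys, (0.0, 0.0, 0.0), agents, params))
```

For long products, working with the sum of log-likelihoods is numerically safer; the plain product is kept here to mirror the factorization.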

V. PERFORMANCE EVALUATION
A. Simulation Setup
In order to evaluate the performance of the proposed approach we have conducted several numerical experiments. For these experiments we have used the following setup: We assume that the agents are airborne and maneuver in a 3D space according to the dynamical model given by Eqn. (5). Additionally, the agents can sense the first responders in their environment according to the sensing model given by Eqn. (6). Without loss of generality, we assume that the first responders (i.e., targets) are moving in the ground plane according to the dynamical model described in Eqn. (3). Finally, the stochastic filtering approach described in subsection IV-C was implemented as a Sequential Importance Resampling (SIR) particle filter [34], [38] to handle the nonlinearities of the measurement model.
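One iteration of a SIR particle filter can be sketched generically as predict, weight and resample. This is a minimal sketch rather than the implementation used in the experiments: `propagate` and `likelihood` are hypothetical callables standing in for the target dynamics and the measurement likelihood, and multinomial resampling is assumed because the text does not state the resampling scheme.

```python
import random


def sir_step(particles, propagate, likelihood, rng):
    """One SIR iteration: predict, weight by the likelihood, resample."""
    # Prediction: sample each particle from the transition density.
    predicted = [propagate(p, rng) for p in particles]
    # Update: importance weights proportional to the measurement likelihood.
    weights = [likelihood(p) for p in predicted]
    if sum(weights) == 0.0:
        # Degenerate case: no particle explains the data; keep the prediction.
        return predicted
    # Multinomial resampling (assumed scheme) restores equal weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def eap_estimate(particles):
    """Expected a posteriori estimate: mean of equally weighted particles."""
    n = len(particles)
    return [sum(p[k] for p in particles) / n for k in range(len(particles[0]))]


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [[rng.uniform(0.0, 10.0)] for _ in range(500)]

    def move(p, r):
        return [p[0] + r.gauss(0.0, 0.1)]

    def like(p):
        return 1.0 if abs(p[0] - 3.0) < 1.0 else 1e-6

    particles = sir_step(particles, move, like, rng)
    print(eap_estimate(particles))
```

Resampling at every step, as done here, is the defining feature of SIR; it controls weight degeneracy at the cost of some particle diversity.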

B. Experiment 1: Indicative Application Scenario
This experiment aims to demonstrate the overall behavior of the proposed system and provide key insights regarding its operation. In this application scenario, 4 UAVs (i.e., agents) maneuver in a bounded 3D region (expressed in Cartesian xyz coordinates) of dimensions 350m × 350m × 30m and 2 first responders (i.e., targets) move in the ground plane of the same region. The objective is to control the 4 UAVs so that the 2 targets are tracked as accurately as possible.
The dynamical model of the UAVs is given by Eqn. (5).
Figure 4 shows the evolution of this experiment over 50 time-steps. In particular, Fig. 4a shows the evolution of the agent and target trajectories when the proposed control framework is applied. Fig. 4b shows the LoS model used in this experiment and Fig. 4c shows the tracking Mean Squared Error (MSE), i.e., the error between the estimated and true target positions, as well as the derived CRLB for the selected control actions taken by the UAVs. The tracking error and CRLB shown in this figure have been averaged over 100 Monte Carlo simulations of this experiment. In addition, Fig. 4d shows the height of the 4 UAVs in each time-step and finally Fig. 4e and Fig. 4f show the elevation angle ϑ between the UAVs and each target. Initially, the 4 UAVs start at a height of 10m as shown in Fig. 4d and with elevation angles ϑ close to 90 degrees as shown in Fig. 4e and Fig. 4f. During the first time-steps both the CRLB and the tracking error are high, as shown in Fig. 4c. This is because the geometry and formation of the UAVs during this time window do not allow for accurate localization. Note that the CRLB and the tracking error should be compared qualitatively, rather than quantitatively. This is because the CRLB derivation does not consider the dynamical model of the first responders, which is used by the filtering algorithm. In other words, the CRLB cannot be used to bound the tracking error in the proposed framework. Instead, the CRLB is used as a control criterion to guide the UAVs, as demonstrated by the results in this section.
To continue with our evaluation, we observe that between time-steps t = 5 and t = 10 the tracking error decreases significantly and so does the CRLB. This is due to the better formation that the UAVs take during this period. Intuitively, the agents would like to position themselves with respect to the targets in such a way that the probability of LoS measurements is increased (i.e., Eqn. (9)). This probability is determined by the elevation angle ϑ ij between agent i and target j, which is given by Eqn. (10). As we can see in Fig. 4b, for elevation angles larger than 40 degrees the probability of LoS measurements is close to 1. This explains the behavior of the UAVs during t = 1 to t = 10. Observe that during this period the elevation angle of the UAVs with the 2 targets decreases from around 86 degrees down to about 40 degrees, while the agents maintain their height. For time-steps t > 10, the targets continue to move away from each other and from the agents. The agents now follow a different strategy according to the CRLB control. In order to maintain a good elevation angle which provides a high probability of LoS measurements, they have two options as depicted in Fig. 4a: a) they can keep approximately the same (x, y) coordinates and increase their height (agent 2 and agent 3), or b) they can re-position themselves closer to the targets (agent 1 and agent 4) to achieve better angles. We should point out that in this experiment the maximum height that the agents can reach is limited to 30m, which explains the behavior of agents 2 and 3 after time-step t = 20. Also, observe that in this scenario, where the two targets move away from each other, the CRLB controller splits the 4 agents into 2 groups and assigns group 1 (agents 1 and 3) to target 2 and group 2 (agents 2 and 4) to target 1, so that 2 LoS measurements are received per target. Fig. 4e shows that during this experiment agents 1 and 3 maintain an angle of around 40 degrees with target 2, which almost ensures LoS measurements. The same is true for agents 2 and 4 with respect to target 1. Finally, we observe in Fig. 4c that the CRLB, which is used as the UAV control criterion, follows the same trend as the tracking error.

C. Experiment 2: Robustness to Agent Failures
The next experiment aims to provide more insights by investigating the robustness of the proposed control methodology in case one UAV fails and breaks down during the mission. Specifically, in this application scenario (Fig. 5), 3 UAVs are used to track 2 targets which start at the same position [x, y, z] = [50, 10, 0] and move away from each other as illustrated in Fig. 5a with velocity vectors [0.6, 0.6, 0] and [−0.6, 0.6, 0] for target 1 and target 2, respectively. For this experiment the following setup has been used: three agents are initialized at [100, 10, 20], [10, 100, 20] and [50, 80, 5] for agents 1, 2 and 3 respectively, inside a 3D surveillance region of size 100m × 100m × 50m. The agent dynamical model parameters are ∆ R = [1, 2, 5]m, N φ = 2 and N θ = 4, and the LoS model used is shown in Fig. 5f with a = 0.5 and b = 13. All the other parameters remain the same as in the previous experiment.
During time-steps t = 1 to t = 10 all 3 agents increase their height to about 50m above the ground in order to achieve an elevation angle of around 40 degrees with respect to both targets and receive LoS measurements. Once the agents have reached the allowed height limit they start to move towards the targets, as illustrated in Fig. 5a. Observe from Fig. 5c and Fig. 5d that all agents try to maintain an elevation angle of at least 40 degrees with all targets. Between time-steps t = 10 and t = 25 all agents try to maintain LoS with the targets, i.e., agent 1 moves towards the vicinity of target 1, agent 2 moves towards the vicinity of target 2 and finally agent 3 tries to maintain equal distance from the 2 targets.
At time-step t = 25, agent 3 fails and drops out. As a result, only agents 1 and 2 are left to track the 2 targets. During time-steps t ≥ 25 we observe that the two remaining agents move to positions which allow them to maintain an elevation angle of at least 40 degrees with the targets. The optimal formation which achieves this is illustrated in Fig. 5a, where the two agents move along the line segment which keeps the two targets at equal distance from them. Finally, Fig. 5e shows the average MSE tracking error and CRLB over 100 Monte Carlo trials for this scenario. First, we can verify that the tracking error and the CRLB follow the same trend. More interestingly, we can observe a jump at time-step t = 25 in both the CRLB and the tracking error, which corresponds to the time-step at which agent 3 broke down. This increase in the error and CRLB reflects the reduced tracking accuracy that the system can deliver when receiving only 2, instead of 3, measurements from each target.

D. Experiment 3: Application in Different Environments
This last experiment aims to demonstrate the behavior of the proposed system in different environments. Specifically, we investigate the performance of the proposed system in two different settings, i.e., urban and rural and the corresponding results are shown in Fig. 6. Intuitively, in urban environments, which exhibit tall structures like buildings, obstacles and obstructions, the UAVs will most likely receive LoS measurements from a potential target only for large elevation angles. This is because the UAVs will have to fly higher and above the environmental obstructions to maintain LoS with the target. On the other hand, in a rural setting with no obstructions the UAV can maintain LoS with a target at lower elevation angles, since there are no obstacles to block the LoS.
The LoS profiles for these two settings are illustrated in Fig. 6d. This figure is produced using Eqn. (9) with a = 0.9, b = 12 and a = 0.5, b = 15 for the urban and rural settings respectively, and shows that in urban environments reaching a LoS probability of 0.95 requires an elevation angle between the UAV and the target of about 65 degrees, whereas in rural settings the same is true for angles of approximately 37 degrees. This demonstrates that, due to the obstacles in urban settings, the agent needs to maintain a higher elevation angle with the targets in order to receive LoS measurements.
For this experiment, we allow 3 agents to move in a 3D bounded area of size 100m × 100m × 80m. The agent control parameters are given by ∆ R = [0.5, 0.8, 1, 3]m, N φ = 3 and N θ = 6. The agents are distributed randomly within the disk centered at [20, 20, 1] with a radius of 5m. Additionally, one target is initialized at x 0 = [30, 30, 0, 1, 1, 0] and moves for 50 time-steps as illustrated in Fig. 6a. The rest of the parameters used in this experiment are as follows: σ LoS = 0.5m, σ NLoS = 5m and µ NLoS = 10m. Figure 6 shows the trajectories that the 3 UAVs follow in order to track the target for the 2 different settings, namely urban and rural.
The result shown here is the average over 100 Monte Carlo simulations. Interestingly, we can see that in the urban setting the UAVs quickly begin to gain height and at the same time move towards the target until the elevation angle between every single UAV and the target exceeds 60 degrees. It is shown in Fig. 6d that for this elevation angle the probability of LoS measurements is higher than 0.9. During time-steps t = 10 to t = 50, we can observe that the 3 UAVs maintain their current height (at approximately 30m) and successfully track the target by maintaining an elevation angle which maximizes the probability of LoS measurements and thus the tracking performance. For the rural setting, we observe that the 3 UAVs follow a similar strategy (Fig. 6e). In this scenario, due to the different LoS profile (Fig. 6d), the UAVs maintain a lower height, i.e., they fly up to approximately 15m (Fig. 6f). This nevertheless allows them to maintain an elevation angle with the target (Fig. 6g) of approximately 38 degrees, which results in LoS measurements with a probability of around 0.95.
In summary, the control strategy followed in both scenarios optimized the tracking performance by selecting the control actions which result in the reception of LoS measurements. This is also shown in Fig. 6h, where the tracking error in both scenarios decreases over time until it reaches a plateau. It is also worth noting that despite the significantly different environments, the control strategy resulted in similar tracking performance, i.e., in both environments the UAVs tried to reach their optimal state, which achieves LoS with the target.

VI. CONCLUSION
We have investigated the problem of monitoring a team of first responders operating in challenging 3D environments using multiple autonomous mobile agents. We have proposed a novel estimation and control framework that takes into account the noisy dynamics of the first responders and the probabilistic nature of the received measurements, which can be either LoS or NLoS depending on the environmental characteristics. We have proposed a novel control criterion based on the Cramér-Rao Lower Bound which can be used to select the optimal joint control actions of the agents to attain high tracking accuracy. Finally, we have demonstrated the performance of the proposed approach through a series of simulated experiments. Future work will focus on the real-world implementation and testing of the proposed system, and its integration into our existing multi-drone tasking platform [39], [40].