Jointly-Optimized Searching and Tracking with Random Finite Sets

In this paper, we investigate the problem of joint searching and tracking of multiple mobile targets by a group of mobile agents. The targets appear and disappear at random times inside a surveillance region and their positions are random and unknown. The agents have limited sensing range and receive noisy measurements from the targets. A decision and control problem arises, where the mode of operation (i.e., search or track) as well as the mobility control action for each agent, at each time instance, must be determined so that the collective goal of searching and tracking is achieved. We build our approach upon the theory of random finite sets (RFS) and we use Bayesian multi-object stochastic filtering to simultaneously estimate the time-varying number of targets and their states from a sequence of noisy measurements. We formulate the above problem as a non-linear binary program (NLBP) and show that it can be approximated by a genetic algorithm. Finally, to study the effectiveness and performance of the proposed approach we have conducted extensive simulation experiments.

Predicted ideal measurement set (PIMS) for time k based on time k−1. X k|k−1 Estimation of X k based on time k−1. X k Estimation of X k based on time k. n k|k −1 Estimation of the true number of targets n k based on time k − 1.

INTRODUCTION
The main objective of a search and rescue mission (e.g. ground search and rescue, air-sea rescue, etc) is to search for and provide aid to people who are in imminent danger as efficiently and safely as possible. When disasters happen e.g. earthquakes, marine disasters and aircraft accidents, the search and rescue (SAR) missions are of critical importance for finding survivors and saving lives. SAR missions however could be very dangerous and expensive (e.g. the search for the Malaysian Airlines Flight 370 that disappeared on 8 March 2014 had costed a total of more than US $150 million).
On the other hand, the miniaturization and cost reduction of electronic components and the recent advances in robotics and specifically in small unmanned aerial vehicles (UAVs) have spurred an unprecedented interest on intelligent mobile agents for group missions. Motivated by this, we believe that a team of autonomous mobile agents could become an important aid in many search and rescue missions by improving the efficiency while at the same time reducing the need to place the rescuers in dangerous arXiv:2302.01845v1 [eess.SY] 1 Feb 2023 situations. Take for instance a maritime disaster where a team of drones could be deployed to search for and track survivors until further assistance is available.
In this work we are interested in the task of searching for and tracking multiple survivors or targets of interest in SAR missions with a group of autonomous mobile agents. This is a challenging problem since a) the number of survivors that needs to be tracked at a time instance is random, unknown and thus needs to be estimated, b) the agents need to estimate the location of the survivors through noisy measurements and c) the mobile agents exhibit a limited sensing range thus there is a need for efficient searching. Finally, the problem becomes even more challenging due to the tradeoff which arises on how many resources to allocate in each task i.e. searching or tracking at each time instance so that the joint search and track (SAT) objective is achieved as best as possible.
To address the above challenges, we propose a unified probabilistic approach to jointly tackle the problem of searching and tracking of multiple moving targets by a team of mobile agents. The proposed approach, refereed to hereafter as JoSAT (Jointly-optimized Searching and Tracking) is formulated in the framework of recursive Bayesian multiobject stochastic estimation using random finite sets (RFS) [1], [2] which allows us a) to simultaneously estimate the time-varying number of targets and their states from a sequence of noisy measurements and b) avoids the problem of data-association in multi-target tracking. The contributions of this paper are: • Provides a multi-agent probabilistic framework for jointly searching and tracking multiple targets inside a given surveillance area. We utilize the theory of random finite sets (RFS) in a multi-agent framework to accurately capture the inherent uncertainty present in many search-and-track (SAT) missions.
• Develops a novel decision (i.e. the agents can switch between search and track mode during the mission) and control (i.e. control the movement of all agents) algorithm which takes into account the stochasticity of the system in order to tackle the joint objective of searching and tracking.
• Formulates the decision and control problem as a non-linear binary optimization program which is then solved using a genetic algorithm.
The rest of the paper is organized as follows. Section 2 reviews the existing literature on searching and tracking by single and multiple agents, demonstrating in the way the contributions of this work. Section 3 outlines the problem addressed in this paper and Section 4 presents an overview of the proposed system. Section 5 provides a brief overview on the framework that the proposed approach is based on namely a) random finite sets (RFS) and b) Bayesian multiobject stochastic filtering and then Section 6 presents the details of the proposed approach. Section 7 reformulates the problem in the context of mathematical programming, demonstrates the complexity of the resulting non-linear binary program and elaborates on how the problem can be solved in practice using a genetic algorithm. Section 8 conducts an extensive performance analysis and finally, Section 9 concludes the paper and discusses future work.

RELATED WORK
Coordinated teams of agents unlock significantly greater capabilities than what is possible by single-handed missions. By taking this into consideration, existing literature has looked into several challenging problems that multi-agent systems can successfully tackle.
The coverage problem is among the most cited problems, where a fleet of airborne agents need to spread across an area over the shortest time interval for situational awareness or when searching for particular targets. Research works have focused around task assignment, scheduling and path planning of the multiple agents, considering physical resource constraints such as the total number of agents, their battery levels and their communication ranges [3] [4] [5] [6] [7] [8].
Another challenging problem is that of localization and tracking of single or multiple targets [9] [10] [11]. For this problem both centralized and distributed solutions are being investigated for the envisioned tracking strategies which generally seek to minimize the tracking error for all detected targets [12] [13] [14].
Interestingly though, many of these problems are interconnected and interrelated and thus need to be looked at jointly especially in certain settings such as in search and rescue missions (SAR) where efficiency and effectiveness are of essence. The problem of SAR with single or multiple robots has attracted a wide interest in the recent years with the research community proposing methods from multirobot cooperative learning [15] and path-planning strategies [16] to collaborative online task-planning by heterogenous robot teams [17] and autonomous multi-UAV systems for SAR missions [18]. Detailed surveys on multi-robot SAR can be found in [19], [20]. In this paper however, we are focusing on a particular sub-problem i.e. the problem of search and track which has also attracted a wide interest by the research community because of it importance in SAR missions. An interesting work in [21] develops a probabilistic approach for the SAT problem but only for the single-agent singletarget case. Similarly, a more recent work [22] investigates the SAT problem for the single-agent single-target scenario, and proposes a model-predictive control (MPC) framework for the path-planning problem. The work [23] considers the cooperative management of groups of UAVs as an optimization problem of the information obtained through multi-target tracking. Compared to our work however, the work in [23], is mainly focused on improving the tracking accuracy without including the search objective. The works in [24], [25] develop an information-theoretic multi-agent SAT approach which is used to search for a single stationary target by minimizing the entropy of the target distribution at each time step. Moreover, [26] looked at the problem of route planning for multi-agent SAT missions and proposed a Fisher information based approach to route planning. However, in [26] the problems of false alarms and data association are not considered. Similarly, the work in [27] presents a receding horizon controller (RHC) approach to jointly drive a group of UAVs to SAT missions. In contrast, in the proposed approach the data-association problem can be avoided altogether. Notably, the classical approaches to multi-target tracking require to first solve the data association problem and then estimate the states of the targets. RFS-based approaches can estimate the states of the targets directly without requiring to first solve the data-association problem [28]. In addition, existing techniques make weak assumptions regarding the number of targets, the target dynamics and the measurement models.
Complementary to the aforementioned studies, the proposed system advances the state-of-the-art, by considering a multi-agent approach where the agents can dynamically switch their mode of operation between searching and tracking during the mission in order to jointly optimize the coupled objective of searching and multi-target tracking and by developing a unified probabilistic search-and-track framework based on random finite sets which avoids the problem of data-association and accounts for the uncertainties in the number of targets, target dynamics and measurement models. Finally, this work derives both optimal and heuristic solutions to solve the resulting joint searching and tracking problem.

PROBLEM DEFINITION
In this work we assume that the agents can operate in either search or track mode and that they can switch modes dynamically during the mission in order to optimize the joint search and track objective. The problem to solve can be stated as follows: At each time step a fixed number of agents must find their mode of operation and the control actions which result in the best tradeoff between maximizing the search coverage and tracking all targets found in the surveillance area.
Throughout this paper we consider the following modeling assumptions.

Single Target Dynamics
The single target state vector x k ∈ X , k ∈ N evolve in time according to the following equation: where the X ⊆ R nx denotes the state space and the known function ζ k : R nx → R nx models the dynamical behavior of the target. Eqn. (1) describes the evolution of the state vector as a first order Markov process with transitional density π k|k−1 (x k |x k−1 ) = p v (x k − ζ k (x k−1 )). The random process v k ∈ R nx , which is referred to as the process noise, is IID according to the probability density function p v (.). The role of the process noise is to model random disturbances in the evolution of the state. Without loss of generality, in this paper we assume that the state vector x k ∈ X ⊆ R 4 is composed of position and acceleration components i.e.
give the 2D position of the target in Cartesian coordinates and (ṗ x ,ṗ y ) are the velocities of the target in the x and y direction respectively. Suppose now that multiple targets x i k exists inside the surveillance region where we assume that the targets are independent of each other. The targets can spawn from anywhere in the state space X and target births and deaths occur at random times. This means that at each time k, there exist n k target states x 1 k , x 2 k , ..., x n k k , each taking values in the state space X where both the number of targets n k and their individual states x i k , ∀i are random and time-varying. The group of target states at time k can now be modeled as the set: where F(X ) denotes the space of all finite subsets of X . Note here that n k is the true but unknown number of targets at time k which needs to be estimated along with the target states x i k , ∀i. In Sec. 5-6 it will become evident that X k is actually a random finite set (RFS). We will refer to X k as the multi-target state.

Single Agent Sensing Model
The ability of an agent to sense its 2D environment is modeled by the function p D (x k , s k ) that measures the probability that a target with state x k at time k is detected by an agent with state (i.e. 2D position coordinates) More specifically the detection probability of a target with state x k at position p k = Hx k (where H is a matrix that extracts the xy-coordinates of a target from its state vector) from an agent with state s k , is given by: (3) where d k = Hx k − s k 2 denotes the Euclidean distance between the agent and the target, p max D is the detection probability for targets that reside within R 0 distance (i.e. the agents primary radius) from the agent's position and finally η captures the reduced effectiveness of the agent to detect distant targets.
When an agent detects a single target it receives a measurement vector z k ∈ Z which is related to the target state as follows: where Z ⊆ R nz denotes the measurement space and the function h k : R nx → R nz projects the state vector to the measurement space. The random process w k ∈ R nz is IID, independent of v k and distributed according to p w (.). The probability density of measurement z k for a target with state x k when the agent is at state s k is given by the measurement likelihood function g k (z k |x k , s k ) = p w (z k − h k (x k , s k )).
Without loss of generality, in this paper we assume that the measurement vector z k ∈ Z consists of range and bearing measurements.
That said an agent can receive m k measurements z 1 k , z 2 k , ..., z m k k at each time step k each taking values in the measurement space Z, where m k and z i k , ∀ i are random and time varying. Thus the group of received measurements by an agent at time k can be modeled as the set: where F(Z) denotes the space of all finite subsets of Z. In Sec. 5-6 it will be shown that Z k is also a random finite set. Z k will be referred to as the multi-target measurement set.

Agent Dynamics
Let S = {1, 2, ..., |S|} be the set of all mobile agents that we have in our disposal operating in a discrete-time setting. At time k the 2D surveillance area A ⊆ R 2 is monitored by |S| mobile agents with states s 1 k , s 2 k , ..., s |S| k , each taking values in A. Each agent is subject to the following dynamics with bounded control inputs: where s j k−1 = [s j x , s j y ] k−1 denotes the position (i.e. xycoordinates) of the j th agent at time k − 1, ∆ R is the radial

JoSAT Objective
The jointly optimized search and track objective can now be more formally defined as: subject to α ∈ P(S) The cost of tracking depends on the control u j k that is applied to agent j through the future received measurements (i.e. depending on the control action taken, different measurements will be received). The measurement generator function Z PIMS (X j k|k−1 , u j k ) generates the predicted ideal multi-target measurement set based onX j k|k−1 and u j k . Finally, the constraint u i k − u j k > d min , ∀ i = j does not allow any two agents to get close to each other. This is by design since in this work there is no benefit of two agents to be close to each other, and for instance search the same area or perform tracking of the same target.
We should note here that the optimization of Eqn. (7) finds the joint control actions over all agents, and at the same time partitions the agents into two groups i.e. search or track so that the joint search-and-track cost is minimum. In other words, at each time-step, the system automatically assigns the agents to searching or tracking mode and selects the joint control actions which minimize the JoSAT objective. Also note, that in this work we do not assume any time delays between agents i.e. all agents have access to the same clock and they are fully synchronized.

SYSTEM OVERVIEW
In this section we give an overview of the proposed system architecture, shown in Fig. 2, and we outline how we have incorporated the theory of multi-object filtering to the problem of multi-agent joint searching and tracking.
To summarize, an agent j ∈ S can operate in either search or track mode and in each mode the agent is subject to a set of admissible control actions denoted by U j k for time k. The agents can dynamically switch between the two modes during the mission in order to maximize the joint global objective of searching and tracking i.e. Eqn (7). When a subset of agents α ⊆ S is in search mode the collective objective over all agents j ∈ α is the maximization of the covered area. In other words the agents should traverse the surveillance area in such a way so that the coverage is maximized. On the other hand when a subset of agents α C ⊆ S (where α C denotes the complement of α with respect to S) is in track mode, their objective is the maximization of the tracking performance, i.e. to move in locations where the number of targets and their states is estimated as accurately as possible. Deciding on the subset of agents in search or track mode is governed by the respective objective functions that control the movement of the agents to achieve the best tradeoff. In other words the proposed controller solves the optimization of Eqn. (7).
In order to achieve this, the proposed system takes the following steps: The number of targets and their states at time k are jointly modeled as a random finite set X k (also known as multi-target state) which is a generalization of a random variable/vector (see Sec. 5).
Each agent maintains a multi-object probability distribution (see Sec. 5-6) which jointly accounts for the uncertainty in the number of targets and their states (i.e. position and velocity) inside its sensing range (this multi-object probability distribution is a generalization of the probability density function on random finite sets i.e. X). Let the posterior multi-object distribution at time k − 1 be denoted by f j k−1 for agent j. To be precise f j k−1 = f j k−1 (X j k−1 |Z j 1:k−1 ) where X j k−1 is conditioned on the received measurement sets Z j 1:k−1 up to time k − 1, however we will often use the short notation i.e. f j k−1 to denote this distribution. The following recursion is shown: a) for each agent j the predictive multi-object probability distribution f j k|k−1 is first computed from the posterior f j k−1 using the prediction step, b) the estimated numbed of targets and their states are extracted from each predictive multi-object distribution, c) an optimization problem is then solved where different hypothetical control actions from the set of admissible control actions U j k are generated for each agent. The mode of operation i.e. search or track and the corresponding control actions for all agents are jointly selected so that the total cost of searching (given by the searching objective function i.e. ξ search ) and tracking (given by the tracking objective function i.e. ξ track ) is minimized. d) Finally, the agents move to the new states according to the selected optimized controls {u 1 * k , . . . , u |S| * k } and compute the posterior multi-object distribution f j k using the received measurements Z j k at time k. The same procedure is the repeated for the next time instance.
In the first step the predictive density f j is obtained through a prediction step as the figure shows. This allows us to make predictions regarding the number of targets and their states for the next time-step. The estimated number of targets (n j k|k−1 ) and their states (X j k|k−1 ) for the next time-step are extracted from the predictive density.
Based on these predictions the proposed controller finds the combination of control actions (u j k ∈ U j k ∀j) as well as the mode of each agent which minimizes the convex sum of the search and track objective functions (see Sec. 6).
The optimal control actions u 1 * k , ..., u |S| * k are then applied to the agents. Each agent then moves to its new state at time k where it receives the multi-target measurement set Z j k (if it exists). This measurement set at time k is then used in the update step i.e. each agent uses the received measurement set to compute the posterior distribution f j k = f j k (X j k |Z j 1:k ). The updated (i.e. final) estimates of the number of targetŝ n j k and their statesX j k for time k are then computed from f j k for each agent j. The total estimated number of targets at time k is then computed asn k = jn j k , ∀j and their states asX k = jX j k , ∀j. This procedure is repeated recursively over time.
We should note here that the number of targets (and their states) is being estimated recursively over time. Based on these estimates, the mode of operation should be carefully chosen. For instance when no targets are inside the surveillance area, all agents should be in search mode, etc. It is also worth noting that the control actions taken, affect the received measurements which in turn affect the estimation of the target state during the update step. Hence, to optimize the control actions would require the knowledge of all the future measurements. In the next section we provide an overview of random finite sets for multiple target tracking which we use subsequently to address the aforementioned challenge in our proposed approach.

BACKGROUND ON MULTI-TARGET TRACKING
Multiple target tracking (MTT) [29], [30] refers to the problem of estimating the number and states of multiple targets from noisy sensor measurements in the presence of false alarms or clutter. MTT is nowadays found in many applications including surveillance, defense, autonomous vehicles, robotics, etc. In this section we only give a brief overview on the various MTT techniques. A more detailed description of MTT algorithms can be found in [29], [31].
In this work we divide MTT algorithms in two main categories namely data association-based and data association free. Data association-based MTT algorithms such as the MHT, JPDA and GNN [29], [31] require to first solve the measurements-to-tracks assignment problem and only then proceed with the multi-target state estimation. On the other hand, data association free methods such as the RFS-based PHD, CPHD and Multi-Bernoulli filters [28], [30] can bypass the data association problem and proceed directly with the multi-target state estimation. Depending on the application, this property could be highly desirable since it significantly reduces the computational complexity of MTT algorithms.
The PHD filter [32] is the first computationally desirable approximation of the multi-target Bayes posterior. In particular it propagates the first-order statistical moment of the multi-target posterior in place of the full posterior distribution. The computationally more expensive CPHD filter [33] provides a better performance compared to the PHD filter since it jointly propagates the first-statistical moment and the cardinality distribution. Unlike the PHD and CPHD filters which propagate moments the multi-Bernoulli filter [34] approximates the true multi-target posterior distribution as multi-Bernoulli and propagates the parameters of a multi-Bernoulli distribution. The multi-Bernoulli filter avoids many of the problems present in the PHD/CPHD filters i.e. these filters require a clustering step to extract state estimates which introduces an additional source of error and increases the computationally complexity. This makes the multi-Bernoulli filter a more attractive solution with linear complexity in the number of targets and measurements.
We should also point out that with the introduction of labeled RFSs [35], [36] and in particular with the inception of the generalized labeled multi-Bernoulli (GLMB) density the multi-target Bayes filter can tackle the data-association problem and provide target tracks. More specifically, the GLMB multi-target filter [37] is able to simultaneously estimate the number of targets and their states from a set of noisy observations in the presence of data association uncertainty, detection uncertainty and clutter. However, compared to the multi-Bernoulli filter the computational complexity of the GLMB filter is at best cubic in the number of measurements. An extension to the multi-Bernoulli filter i.e. the Labeled multi-Bernoulli (LMB) filter, was presented in [38]. The LMB filter achieves better performance compared to the multi-Bernoulli filter while at the same time outputs target tracks. This however comes with an increased computational cost i.e. its computational complexity is at worst cubic in the number of measurements. Recently the authors in [39] have presented a more efficient implementation of the GLMB filter with linear complexity in the number of measurements and quadratic in the number of targets by combining the prediction and update into a single step.

Random Finite Sets
A random finite set (RFS) is a finite-set-valued random variable, which differs from a random vector in two ways: a) the number of elements in a RFS is random and b) the order of the elements in a RFS is irrelevant. More specifically, a RFS X is completely specified by a) its cardinality distribution ρ(n) = p(|X| = n), n ∈ N 0 which defines a probability distribution over the number of elements in X and b) by a family of conditional joint symmetric probability distributions p(x 1 , ..., x n |n), x 1 , ..., x n ∈ X that characterize the distribution of its elements over the state space X . In connection with our problem, the random variable n can be used to model the time-varying number of targets (i.e. the number of survivors at some point in time) and thus ρ(n) is the probability distribution of the number of targets.
The belief density or otherwise known as the multi-object probability density function (pdf) f (X) of the RFS X is given by: f (X) = f ({x 1 , ..., x n }) = n!ρ(n)p(x 1 , ..., x n |n) and the notion of integration is given by the set-integral which is defined as: is the space of finite subsets of X . The following RFSs are relevant in this paper:

Bernoulli RFS
The Bernoulli RFS X can either be empty with probability 1 − r, r ∈ (0, 1) or be a singleton set (i.e. its set cardinality is equal to one) with (existence) probability r and with its element distributed over the state space X according to pdf p(x). The Bernoulli multi-object pdf is given by: Thus a Bernoulli RFS can be completely characterized by the parameter set (r, p(x)).

Multi-Bernoulli RFS
The multi- with multi-object pdf given by (see [30] pp. 102): Here the notation 1≤i1 =,..., =in≤v (.) enumerates the n- In other words, it enumerates all ordered arrangements of n elements taken from a set of size v. A multi-Bernoulli distribution is completely specified with the parameter set In connection with our problem v can be used to model the maximum number of hypothetical targets maintained by an agent.

Poisson RFS
The poisson RFS X has a cardinality distribution which is Poisson with parameter λ i.e. ρ(n) = e −λ λ n n! , n = 0, 1, 2... and elements which are independent and identically distributed (IID) random variables and distributed according to p(x) on X . The multi-object pdf is given by: where κ(x) = λ p(x) is called the intensity function for the Poison RFS.

Multi-object Stochastic Filtering
In stochastic filtering [40] we are interested in the posterior probability density p k (x k |z 1:k ) of some hidden state x k at time k given all measurements z 1:k = z 1 , ..., z k . Assuming an initial density on the state p 0 (x 0 ), the posterior density at time k can be computed using the Bayes recursion as: where Eqn. (12) and (13) are referred to as the prediction and update steps respectively. The function π k|k−1 (x k |x k−1 ) models the uncertainty of the current state given the previous state and is referred to as the transitional density and the function g k (z k |x k ) describes the distribution of measurements given the current state and is referred to as the measurement likelihood function. At each time step the hidden state is usually extracted from the posterior distribution using the expected a posteriori (EAP) or the maximum a posteriori (MAP) estimators. The above recursion can be extended [28] in the case of RFSs. Suppose now that the hidden state and measurements are RFS and that at time k − 1 the posterior multi-object pdf of X is given by f k−1 (X k−1 |Z 1:k−1 ). The predicted and updated multi-object densities are then given by the multiobject Bayes recursion: where the integrals in Eqn. (14)- (15) are set integrals, and the functions φ k|k−1 (X k |X k−1 ) and ϕ k (Z k |X k ) are the multiobject transitional density and the multi-object likelihood function respectively. Because the recursion above has no analytic solution in general, several approximations exist (as in [30]) which propagate only summary statistics instead of the full posterior density. A prominent approach is the multi-Bernoulli filter [34], [41] which is used in this work and elaborated in Sec. 6.

JOSAT MULTI-TARGET FRAMEWORK
This section develops the components of the JoSAT system using random finite sets and derives the multi-agent objective functions for the tasks of searching and tracking. Intuitively, the collection of targets at each time instance can be seen as a set which changes size as targets appear and disappear from the surveillance region (e.g. due to births and deaths). Since this set consists of a random number of random variables it can be modeled as a random finite set and the same is true for the received measurements as we detail below.

Multi-target Dynamics
Earlier in subsection 3.1 we have described the single target dynamics, in this subsection we will use random finite sets to describe the multi-target dynamics.
Using the RFS theory we can now define a multi-object dynamic model i.e. the evolution in time of the RFS X k as follows: where Ψ(x k−1 ) is a Bernoulli RFS which models the evolution of the set from the previous state, with parameters . Thus a target with state x k−1 continues to exists at time k with surviving probability p S (x k−1 ) and moves to a new state x k with transition probability π k|k−1 (x k |x k−1 ), (see Eqn. (1)). Otherwise the target dies with probability 1 − p S (x k−1 ). Note here that X k is a multi-Bernoulli RFS assuming that the RFSs constituting the union in Eqn. (16) are mutually independent. Finally, the term Γ k denotes the multi-Bernoulli RFS of spontaneous births with uniform spatial distribution over X . We now have a multi-object transition model i.e. Eqn. (16) which jointly incorporates motion, birth and death for multiple targets.

Multi-target Measurement Model
At each time step an agent can receive a time-varying number of measurements i.e. the number of measurements is random and depends on: a) the number of targets inside the agent's sensing range which is random as well, b) whether all targets have been detected or not, and c) whether measurements come from targets or from clutter (i.e. false alarms). That said, the measurements received at some time instance can be modeled as a random finite set. The multi-target measurement set Z j k of agent j at time k is given by: where X j k ⊆ X k is the multi-target state perceived by the j th agent. Θ(x j k ) is a Bernoulli RFS which models the target generated measurements with parameters . Thus a target with state x j k at time k is detected by the j th agent with state s j k with probability p D (x j k , s j k ) (see Eqn. (3)) and generates a measurement z ji k with likelihood g k (z ji k |x j k , s j k ) (see Eqn. (4)) or is missed with probability 1 − p D (x j k , s j k ) and generates no measurements. Additionally an agent can receive false alarms measurements i.e. the term K k is a Poisson RFS which models the set of false alarms or clutter received by an agent at time k with intensity function κ k (z ji k ) = λf c (z ji k ), where in this paper f c (.) denotes the uniform distribution over Z.
Note here that Eqn. (17) accounts for the detection uncertainty and clutter and that in this work the following holds by design X k = j X j k , ∀j and X j1 k ∩ X j2 k = ∅, ∀j 1 , j 2 i.e. a target cannot be tracked by more than one agent.

Tracking Multiple Targets
When the agent j is in track mode the objective is to estimate at each time k both the number of targets n j k and their respective states x ji k , ∀i ∈ {1, ..., n j k } inside its sensing range from a sequence of noisy measurement sets Z j 1:k . In essence we would like to compute the multi-object density of X j k given the measurement sets Z j 1:k i.e. f j k (X j k |Z j 1:k ) using the multi-object Bayes recursion of Eqn. (14) - (15). The RFS formulation of subsections 6.1-6.2 however, allows us to compute this recursion with a multi-Bernoulli filter [34] which approximates the multi-object posterior density as multi-Bernoulli distribution and propagates only its param- in time instead of the full distribution. For brevity, in the rest of the paper, we will denote the multi-Bernoulli distribution at time k as . Furthermore, the multi-Bernoulli filter recursion is summarized here. A detailed description of the multi-Bernoulli filter can be found in [34]. The multi-Bernoulli filter proceeds as follows: The predictive multi-object density at time k is given by: describes the set of persisting targets from the previous time-step and f birth are the parameters of a multi-Bernoulli RFS for the new-born targets at time k. f persist k|k−1 is further given by: where the notation < f, g > is defined as f (x)g(x)dx. Let the predictive multi-object density denoted by where now the term v k|k−1 denotes the number of hypothesized Bernoulli components after accounting for the target births and let the multi-object measurement set at time k be Z k = {z 1 k , ..., z m k k }. The posterior multi-object distribution at time k is given by a multi-Bernoulli distribution with parameters: contains legacy tracks i.e. updates of predicted tracks, assuming that no measurements were collected from them and given by: where p D (x k , s k ) is the probability of detecting a target as described in subsection 3.2. f meas k = {(r k (z), p k (., z))} z∈Z k contains new tracks i.e. joint updates of all predicted tracks using each of the measurements separately; and given by: where L z k (x k ) = g k (z k |x k )p D (x k , s k ). The posterior multi-object density at time k for agent j is thus given by from where the estimated number of targetsn j k and the multi-target statê X j k can be computed (see Sec. 6.4). The total estimated number of targets at time k in the surveillance area is then computed asn k = jn j k , ∀j and the multi-target state which accounts for all targets in the area asX k = jX j k , ∀j.

Multi-Agent Tracking Objective
from agent j where v j k is the maximum number of hypothesized targets, r ji k denotes the probability of existence of the i th hypothesized target (i.e. how likely the i th hypothesized target is a true target) and p ji k (.) denotes the corresponding posterior density of the state of the i th target.
The expected a posteriori (EAP) estimate of number of targetsn j k can be computed asn j k = round( i r ji k ) and the multi-target stateX j k can then be computed by taking then j k components with the highest probabilities of existence and extracting the single target states (i.e. x k ) from their individual posterior densities asX j k = n j k i=1 {x k = arg max p ji k (x k )}. In tracking, we wish to estimate as accurately as possible then j k andX j k for all j agents. Intuitively, one way to achieve this is to minimize the variance in the posterior distribution f j k from which we derive the above estimates. In this work we build upon this and we denote the tracking objective function we wish to minimize by ξ j track (u j k , Z j k ), u j k ∈ U j k if we were going to apply at time k the control action u j k and subsequently observe at the same time-step the measurement set Z j k . Note here that the received measurement set Z j k depends on u j k (i.e. the control action taken) by the agent j through Eqn. (3) -(4). Also observe that the objective function we wish to minimize depends on an unknown future measurement set, i.e. the measurement set Z j k will be received once the control u j k has been determined and applied. In other words, this objective function can be evaluated after the control action u j k has been taken, because only then the measurement Z j k becomes available.
A common approach [42], [43] around this problem is to take the statistical expectation of the objective function with respect to all possible values of measurements i.e. E[ξ j track (u j k , Z j k )] thus the optimization problem becomes u j k = arg min E ξ j track (u j k , Z j k ) . However, computing this expectation is computationally intensive because it requires the generation of an ensemble of measurement sets (pseudomeasurements) Z j k for each hypothesized control action. An alternative and computationally cheaper approach which we adopt in the paper uses the predicted ideal measurement set (PIMS) [44] i.e. the noise-free clutter-free measurement set which is most likely to be obtained when a particular control action is applied. We denote the PIMS measurement set as Z j k|k−1 and thus the control problem now becomes: To optimize Eqn. (26) let the multi-Bernoulli distribution at time k − 1 be given by f j where now by using this notation we explicitly show its dependence on the control action that was taken for the time-step k − 1. At time k the agent has available a set of admissible control actions u j k ∈ U j k given by Eqn. (6), one of which should be selected in order to take the agent to the next state at time k. First we compute the multi-Bernoulli predictive density which is obtained without yet performing any control action. Then we apply a pre-estimation step where the number of targetŝ n j k|k−1 is first estimated from the predictive density by counting the Bernoulli components which exhibit a probability of existence r ji k|k−1 > 0.5, ∀i ∈ {1...v j k|k−1 }. The states of those targets are then extracted from their spatial distributions to obtainX j k|k−1 . Then for a given hypothesized control action u j k ∈ U j k the corresponding PIMS Z j k|k−1,u j k is generated as: Suppose now that for each pair (u j k , Z j k|k−1,u j k ) a pseudoupdate is performed using Eqn. (22) -(25) to compute the (pseudo) posterior densityf j k (X j k |Z j k|k−1,u j k , u j k ) from where the number of targets is estimated. We consider the variance of this estimate as a measure of uncertainty which should be minimized. More specifically letf where σ j k denotes the variance of the number of targets estimate. In essence, by minimizing the cardinality variance σ j k agent j chooses the control action which results in the best estimate regarding the number of targets. This variance is maximized when r ji k (u j k , Z j k|k−1,u j k ) = 0.5, ∀i thus we can define the normalized variance asσ j k = 4σ j k /v j k , where we have dropped the dependence on the control action and measurements for notational clarity.
Using the normalized variance we define the single agent objective function to be minimized as: where V cap is a known design constant which specifies the maximum number of targets that our agent is able to track at a given time which is referred to as the agent tracking capacity in this paper. The above objective function is bounded in the closed interval [0..1] and it reaches the minimum cost (i.e. best) when the agent reaches its maximum tracking capacity i.e.n j k = V cap with the maximum accuracy (i.e. σ j k = 0). On the other hand the agent receives the maximum cost when it tracks targets with the worst accuracy (i.e. σ j k = 1). It is not desirable to perform poorly in tracking mode thus the agent will try to find alternative controls for which the number of targets tracked is maximized and/or the variance is minimized. Additionally, an agent might even switch to search mode if this results in better collective search-and-track performance, rather than inaccurately tracking targets. It is worth noting that the agent will receive the maximum cost in the case where the number of targets being tracked is 0 i.e.n j k = 0. This behavior is by design since in this situation we wish to switch the agent to search mode. Finally we define the multi-agent objective function for the tracking task as the average of their individual objective costs: 1 where α ⊆ S denotes the set of agents in tracking mode. As before we wish to minimize the objective function of Eqn. (31). In other words given a set of agents α ⊆ S we would like to drive the agents to the states which result in the minimization of Eqn. (31) or the maximization of the tracking accuracy. Thus the agents have incentive to move towards these regions.

Multi-Agent Search Objective
In the previous sub-section we have defined the objective function for the task of tracking for single and multiple agents. In this section we are going to do the same for the task of searching. Let us define the search value at time k for specific location p = (p x , p y ) ∈ A for the agent j just after control action u j k ∈ U j k has been applied as: where the function p D (p, u j k ) is the agent's sensing model andp = [p x , 0, p y , 0] constructs a location vector compatible with Eqn. (3). The idea of defining the search value as the complement of the probability of detection reflects the fact that locations with low probability of detection, should have increased value for searching. As a consequence distant locations with respect to the agent's location appear to have increased search value i.e. these areas are worth exploring in order to find new targets. Assuming pairwise detection independence between all agents and all targets, the search value at location p when accounting for a set of agents α ⊆ S is given by: where u j k ∈ U j k . This is shown in Fig. (3). Finally, we define the total search value over the surveillance area, i.e. the multi-agent search objective function as: where A ⊂ R 2 and A is the total area of the 2D surveillance region. We should note here that the above coverage problem does not consider the current or future target locations whatsoever. However, it will be very interesting to investigate in the future a multi-agent coverage problem which takes into account locations with high probability of target appearance.

MATHEMATICAL PROGRAMMING REFORMULA-TION
To solve the proposed problem, we derive hereafter a mathematical program formulation (P 1 ) which takes into account the mode of operation and control actions in order to jointly optimize the search and track objective. Let j ∈ {1, . . . , |S|} represent all available agents and i ∈ {1, . . . , |U k |} the control actions for each agent. The following formulation considers all possible control actions i.e., u j k ∈ U j k , as described in subsection 3.3. The binary variable a ji ∈ {0, 1} identifies which control is taken by agent j when in search mode, and binary variable b ji ∈ {0, 1} indicates the control decision for agent j when in tracking mode. Using this notation, P 1 can be transformed to the following binary nonlinear program: The objective function, as expressed in eq. (35), is a convex sum (using scalar value w) of the search cost and the track cost achieved for a particular task assignment (i.e. either searching or tracking) and control decision (i.e. which control action to take next). The w value is a user-defined parameter representing the level of emphasis given to the task of searching versus the task of tracking. The linear equality constraints in eq. (36) ensure that each agent is only in the searching or tracking mode at any instance in time and only a single control decision in made in that time instance. Finally, eq. (37) ensures that all agents are a minimum distance d min apart at any point in time which is achieved by computing the distance for any control points (i.e., u ji k ) multiplied by the decision made in any of the 2 modes of operation (searching or tracking) indicated by variables a ji , b ji using indicators λ 1 , λ 2 . Specifically, variables λ 1 , λ 2 check every possible combination of a ji and b ji values against all other a j i and b j i values to ensure that agents in either search or track mode are not closer that a minimum distance d min from each other.
Similarly to (P 1 ), the searching function can be expressed as follows: which is a product of the probabilities that no agent detects a target at a specific locationp in the field. These probabilities are then integrated over the whole field A to compute the total search value of the particular combination of agents and specific control decisions. As shown in Eqn. (39) this search value is normalized by 1 A where A is the maximum value attained when all probabilities equal to 0. The tracking accuracy can be expressed as follows: Generate a candidate GA population of hypothesized control actions u ji k ∈ U j k , ∀j using Eqn. (6) with constraints (36) -(37). 6: According to the generated hypothetical controls generate PIMS as Z j Perform the pseudo-update step using PIMS to calculate the posterior multi-object densityf j k , ∀j using Eqn. (21)-(25). 8: Let Λ iter = wξ search (a) + (1 − w)ξ track (b) according to (35) 9: Evaluate Λ iter and select the fittest solutions according to (36) - (38) 10: until (iter < iter max ) and (Λ iter − (Λ iter − 1) < ) 11: return optimal controls {u j * k }, ∀j ∈ S 12: Apply optimal control action u j * k . 13: Receive the actual measurement set Z j k . 14: Compute posterior multi-object density f j k using Eqn. Clearly, ξ search (a) jointly computes the search cost for a subset of agents being in search mode by the multiplication shown in Eqn. (39) that produce non-linear outputs. Coupled with the binary variables of the discrete control decisions results to an NP-hard optimization problem that is difficult to solve exactly in practice [45]. Nevertheless, for the particular structure of (P 2 ) and the binary encoding of the decision variables, a genetic algorithm (GA) can be applied to provide adequate solutions in practice. GA is a search-based technique that follows natural selection principles of solutions found to be relatively good [46]. To do that, a population of candidate solutions are generated over the search space and evaluated using the objective function of the problem (fitness function). The fittest solutions are then recombined and mutated to generate new populations for the next iteration. The process terminates when specific conditions are met, including the objective function improvement over consecutive iterations or at a maximum number of iterations.
A genetic algorithm is used to solve (P 2 ) with the follow-ing key elements: 1) Binary decision variables {a ji , b ji }, ∀j ∈ S, i ∈ {1, . . . , |U k |} 2) The fitness function is reflected by the objective function in (P 2), i.e., Eqn. (35), and so are the linear equality constrains in Eqn. (36). 3) Control decisions that result to agents coming at a distance closer to d min are prune to satisfy constraint Eqn. (37). 4) Candidate solutions are generated around controls of the same mode of each agent. 5) The algorithm terminates after a maximum number of iterations has been reached or when the fitness function did not improve more than .
The complete algorithm of the proposed system is shown in Algorithm (1). The algorithm implements the recursion which is shown in Fig. (2) and discussed throughout this paper.

Implementation Details
In order to evaluate the performance of the proposed approach we have conducted several numerical experiments involving various number of agents and targets for a variety of scenarios. In each experiment we compare the proposed approach with the ground-truth either qualitatively or quantitatively. In this section we include an extensive set of the conducted numerical simulations. More specifically, we assume that the targets maneuver in an area of 500m × 500m and that the single target state at time k is described by x k = [p x ,ṗ x , p y ,ṗ y ] k where (p x , p y ) give the position of the target in Cartesian coordinates and (ṗ x ,ṗ y ) are the velocities of the target in the x and y direction respectively. For the target dynamics in Eqn. (1), we consider that each target is moving according to the near constant velocity model with the process noise being Gaussian. Thus the single target transitional density is given by π k|k−1 (x k |x k−1 ) = N (x k ; F x k−1 , Q) where: with sampling interval T = 1s. Note that we have used a linear motion model in our evaluation for the target dynamics and so Eqn.
The target survival probability from time k − 1 to time k is constant and does not depend on the target's state i.e. p S (x k−1 ) = 0.99.
Once the agent detects a target it receives bearing and range measurements thus the measurement model of Eqn. (4) is given by: where H = 1 0 0 0 0 0 1 0 . The single target likelihood function is then given by g k (z k |x k , s k ) = N (z k ; h k (x k , s k ), Σ Σ) and sigma is defined as Σ = diag(σ φ , σ ζ ). The standard deviations (σ φ , σ ζ ) are range dependent and given by: with φ 0 = 2π/180 rad, β φ = 10 −5 rad/m, ζ 0 = 1 m, and β ζ = 5 × 10 −5 m −1 . False alarm measurements (i.e. clutter) are generated with a Poisson rate λ k = 10 uniformly distributed over the measurement space. The agent's sensing model parameters take the following values p max D = 0.99, η = 23 × 10 −4 and R 0 = 30m. The agent's dynamical model is shown in Fig. 1 where the radial displacement ∆ R = 5m, N R = 2 and N θ = 8 which gives a total of 17 control actions, including the initial position of the agent. The tuning parameter w which controls the weight given to searching and tracking is set to 0.5 unless otherwise specified. Finally, the parameter V cap is set depending on the simulation scenario to the maximum number of targets that we wish a single agent to be able to track. The genetic algorithm was implemented using the ga Matlab function. We should note here that any optimization/meta-heuristic method which can handle the binary program of Eqn. (35) with integer constraints, can be used in place of the genetic algorithm. However the genetic algorithm fits best to the structure of our problem. Finally, in order to handle the non-linear measurement model we have implemented a Sequential Monte Carlo (SMC) version [34], [47] of the multi-Bernoulli filter. When the target dynamics and the measurement model are linear and the probability of detecting targets is state independent the Gaussian version of the multi-Bernoulli filter provides a computationally more efficient solution [34]. Table 1 shows a list of variables and common values used in the experimental evaluation of the proposed approach.

Results
The first set of experiments is conducted in order to investigate the proposed system on searching. More specifically the desired behavior is for the agents to jointly find the best configuration i.e. select the appropriate controls such that the total search value of the surveillance area is minimized or the area coverage is maximized. Our experimental setup is as follows. For a given number of agents we have conducted 50 Monte-Carlo (MC) trials where the agents are spawned at random locations (uniformly distributed) inside the surveillance area. Then we let our system to run for 50 time-steps and we monitor the total search value given by Eqn. (34). Our intuition is that the total search value should decrease over time since the agents would try to move to locations where the overlap of their sensing ranges, is minimized. In other words, we would like to have the agents spread as much as possible inside the surveillance region to increase coverage. Figure 4 shows the average total search value for each time-step of this experiment for 2, 3, 4, and 5 agents. As we can see from this figure, the total search value is decreasing over time, which indicates that the agents are trying to cover as much area as possible. It is also clear that as the number of deployed agents inside the surveillance region increases, the covered area increases as well.
In order to gain more insights into the behavior of the agents during searching, we have being monitoring the control actions assigned to them. At the end of the experiment,     Fig. 5. Interestingly, we can see that the agents take the appropriate formation which results in maximum coverage.
For instance from Fig. 5a, we see that when the number of agents is 2 the optimal configuration would be to take the diagonal of the surveillance area. On the other hand when we have 3 and 4 agents, the optimal configuration forms a triangle and a square respectively, whereas in the case of 5 agents, the last agent moves into the middle of the area to fill the gap. The previous experiments showed that the solution to the problem of Eqn. (35) resulted in optimal coverage. In other words, depending on the number of agents, the obtained solution is a configuration of agent positions which maximizes the area coverage or minimizes the total search value.
We should note here that once the agents obtain a configuration which provides the optimal coverage (Fig. 5), they lock in that configuration (i.e. there is no movement) until something is changed (e.g. a target appears). This is quite reasonable since the search function of Eqn. (39) does not provide incentive for any movements once the optimal coverage is obtained. However, in certain situations it would be desirable to have the agents move to areas that have not been visited for some time and temporarily take a configuration which does not provide the optimal coverage.
In order to achieve this we extend our proposed approach to include the notion of memory as follows: The total search value of Eqn. (39) now becomes the instantaneous total search value at time k. The new total search value is calculated as the weighted moving average of the κ previous instantaneous total search values where κ is the memory of our system. Figure 6 shows the memory-based searching behavior of our system for the case of 4 agents and κ = 30. More specifically, the figure shows the behavior of 4 agents in memory-based searching for 150 time-steps. In comparison with Fig. 5c we observe that the agents in this experiment are continuously moving in the surveillance area as shown in Fig. 6a. As before the 4 agents at time k = 30 take a square formation which gives the optimal coverage. However, since the system now takes into account the 30 previous time-steps, at time k = 40, 50, ... we observe that a different formation has been taken. This is because the system remembers what areas have been visited and pushes the agents to new locations. At k = 80 the agents move again to form a square formation as shown in the figure. Fig. 6b shows the evolution of the total memorybased search cost for this experiment. As we can observe there is an evident periodicity in this graph which is due to the memory effect.
Next, in order to verify the effectiveness of the proposed approach on the task of joint searching and tracking, we have conducted several experiments with different numbers of targets and agents. In this section, we will give a representative example with 3 agents and 3 targets. This scenario is depicted in Fig. 7 during 3 different timewindows k = 1..26, k = 27..86 and k = 87..130. More specifically, 3 agents enter the surveillance area of size 500m × 500m at the locations marked with asterisks in Fig. 7a [27,489] and [16,492]. In Fig. 7  The agents enter the surveillance area in search mode at k = 1 and at each time-step the proposed controller decides which mode (i.e. search or track) to assign to each agent. As we have already discussed each agent maintains a probability distribution regarding the number of targets and their location. The controller monitors the cost of tracking and the cost of searching and decides what are the optimal controls for all agents which minimize the objective function of Eqn. (35). To summarize the cost of tracking takes into account whether or not targets exist in the area, the uncertainty of the estimated number of targets in the area and finally how many and how well targets are being tracked by each agent.
On the other hand, the cost of searching takes into account how well the area is covered by the agents. Figure 7a shows the trajectories of all agents during period k = 1..26. During this period no targets exist in the area and so the controller decides to assign all agents in search mode. The objective now is to increase coverage and so the agents move away from each other, trying to find the locations where the coverage is maximized. Figure 7d shows the search value for each location in the surveillance area for k = 26, i.e. at the end of the period. As we can observe the agents take a triangle formation at the right locations which results in the maximum coverage. To understand better what is happening inside the controller, we have recorded the raw values of the search and track objective functions of Eqn. (39) and Eqn. (40) respectively for the whole experiment. This is shown in Fig. 8. The first point to note here is that the search cost is decreasing during time-steps k = 1..26, which indicates that the coverage is increasing. On the other hand, we can observe that the tracking cost during the same period is maximum, since the number of estimated targets in the area is 0. Thus the objective function of Eqn. (35) is minimized when all agents are in search mode. Now, at time k = 27 target 1 appears in the area as shown in Fig. 7b which is then being detected by agent 3 at the next time-step k = 28. During time-steps k = 28..35 agent 3 moves towards target 1 as shown in the figure and then for the rest of the period k = 36..86, the same agent tracks target 1. It is also worth noting that during this whole period, agents 1 and 2 are in search mode, and are moving to locations which will result in optimal coverage. Figure 7e shows the search configuration of the two agents in search mode and the resulting search values for the whole area. Figure 8 sheds some more light into this process. During time-steps k = 28..35 we observe that the tracking cost is decreasing and afterwards during k = 36..86 to be reaching a plateau. The tracking cost is decreasing during k = 36..86 (i.e., the tracking accuracy is increasing) since the agent comes closer and closer to the target which allows the agent to perform better tracking. This is because the sensing and the measurement model depend on the distance of the agent to the targets. For distant targets, the uncertainty is higher thus the tracking accuracy lower. Once the target is locked in, the cost of tracking reaches a plateau. On the other hand, the search cost increases instantly since only 2 agents (out of 3) are left in search mode, and thus the coverage has been decreased. There is however a small improvement in the search cost during this period as the agents take a formation which results in better coverage.
Target 1 dies at time-step k = 86, which drops the number of estimated targets, of agent 3, from 1 to 0 at k = 87. This increases the tracking cost to the maximum since no targets exist inside the surveillance area and as a  consequence the controller assigns all agents to search mode. This is shown in Fig. 8 for k = 87..107. At k = 107 the agents take a formation similar to Fig. 7d. During period k = 107..130 two targets appear as shown in Fig. 7c which are being detected and tracked by agent 1. The tracking cost in Fig. 8 for this period reaches a new minimum since now two targets are being tracked and so the value of tracking over searching is increased. Agents 2 and 3 detect no target and remain in search mode moving towards locations which result in maximum coverage as shown in Fig. 7f. Finally, during period k = 131..180, no targets are being detected, thus the controller assigns all agents to search mode. The tracking cost reaches the maximum and the agents optimize the coverage as shown in Fig. 8 Fig. 9, values of w that are close to 0 force all agents to switch to search mode whereas values of w very close to 1 have the opposite effect i.e. all agents are forced into track mode. This is a direct consequence of Eqn. (35). Interestingly, in Scenario 1 the target located in [350, 50] is far away from all the agents and thus for moderate values of w close to 0.5 the system chooses to have all agents in search mode. In particular, in this scenario the target is detected by agent 2 however, the normalized tracking varianceσ of this agent is as high as 0.9 which results in very high tracking cost. This makes the system to prefer to have the particular agent in searching mode rather than in tracking mode for w ≤ 0.83. In scenario 2 however, agent 2 is tracking the target located at [260, 140] very accurately withσ = 0.05. As  a consequence the system assigns agent 2 in tracking mode for w ≥ 0.3 as is shown in Fig. 9. In this scenario however we can still force all agents in searching or tracking mode with the appropriate values for w. Dynamically adjusting w depending on a given situation is something interesting which we will investigate in a future work.
Moreover, we investigate the performance of the proposed approach with respect to the number of agents and the number of targets. In order to quantify the performance of the proposed approach we use the optimal sub-pattern assignment (OSPA) metric [48] which is widely used to measure the accuracy of multi-object filters. Our first experiment is conducted as follows: We randomly generate 10 targets inside the surveillance area and we use OSPA to quantify the effectiveness of the proposed search-andtrack approach with 2, 3, 4 and 5 agents. All agents are spawned from the center of the surveillance area at the start of each trial. This is done in order to simultaneously test the performance of search (i.e. coverage) and track. Intuitively, we would like to see the agents to start acquiring a configuration which results in maximum coverage and while doing so increasing their probability of finding and tracking the scattered targets in the surveillance region. Figure 10 shows the average OSPA error (with parameters c = 100 and p = 2) over 30 Monte Carlo trials for 100 time-steps for different number of agents. As we can see in all scenarios the tracking error starts high but drops significantly as time progresses. The agents start from the center of the surveillance area and jointly perform searching by covering as much surveillance region as possible. This is evident by the high OSPA error in the beginning of the experiment. In particular, we observe in the case of 2 agents that no targets are detected between time-steps 0 and 17. During that time the agents maximize coverage (i.e. see Fig. 5a). Around time-step 16 the OSPA error begins to drop which indicates that targets are being tracked. Between time-steps 20 to 40 the 2-agent system reaches its tracking capacity and achieves its best performance. Between timesteps 40-70 we can see that the error has been increased. We have observed that when this happens the agents are tracking single targets due to targets splits and target deaths which increases the tracking error (while elsewhere other targets might have been born which have not been detected by the already occupied agents). However, when no targets are being tracked, or the targets tracked are far away, the agents switch to searching in order to optimize coverage. This results in a decrease in the tracking error between time-steps 70 to 100. Similar behavior is also true for 3, 4 and 5 agents. As the number of agents increases the time to detect targets decreases as is shown in the figure. Finally, we observe that the tracking capacity of the system increases with the number of the agents, as a consequence the tracking error decreases as the number of agents increases.
In the next experiment we investigate how the number of targets affect the performance of our system. More specifically, we have fixed the number of agents to 2 and we varied the number of targets tracked by each agent. Figure 11 shows the OSPA error when 2 agents track 3, 5 and 7 targets each for 100 time-steps. We should note here that in this experiment, the targets per agent (i.e. 3, 5 and 7) are initialized inside the corresponding agent's sensing range. Additionally the targets per agent move towards the same destination and without target splits occurring for the duration of the experiment. This is done here in order to investigate the ability of an agent to track 3, 5 and 7 targets. The figure shows the average error over 30 Monte Carlo trials. As we can observe, the tracking accuracy drops as the number of targets per agent increases. However, this experiment shows that the proposed approach is able to keep track multiple targets per agent. In particular, 3 and 5 targets per agent can be tracked with relatively high accuracy in this scenario.
Up to this point, all of our experiments were based on the sensing model of Eqn. (3). In this model the parameter p max D determines the maximum detection probability of an agent with its value set to 0.99. In order to investigate the impact of this parameter to the tracking accuracy of our system, 30 Monte-Carlo trials have been conducted of length 100 iterations, for 3 agents and 3 targets with varying values of p max D . More specifically, the experimental set-up is as follows: for each Monte-Carlo run at time-step k = 0, three agents are randomly generated inside the surveillance area. Then for each agent at k = 0, a single target is spawned inside its sensing range. We leave the system to run for 100 time-steps and we measure the OSPA error. Figure 12 shows the average OSPA error over 30 trials for p max D = 0.99, p max D = 0.8 and p max D = 0.7. As we can observe the best accuracy is achieved with a p max D value of 0.99. On the other hand for p max D = 0.8 and p max D = 0.7 we observe significant OSPA oscillations which occur due to target missdetections. More specifically, we have observed that a target miss-detection especially in the area beyond the agent's primary radius R 0 (where the probability of detection drops below p max D ) results in a) increased positioning error and b) the agent to take incorrect control action and/or switch mode (from tracking to searching) in subsequent time-steps which causes cardinality errors. Finally, note that the multi-Bernoulli filter (i.e. Eqn. (19)-(25)) is best suited for situations where the signal to noise ratio is high i.e. high probability of detection and the clutter rate is low. These problems can be alleviated with the more accurate labeled multi-Bernoulli (LMB) filter [38]. The LMB filter however, is computationally more expensive.
Finally, we conclude the evaluation of the proposed approach by investigating some of its limitations. For this purpose we use the following setup: We generate 5 40] for targets 1, 2, 3, 4 and 5, respectively. The birth/death times for the 5 targets are k = 1/100, k = 1/100, k = 50/100, k = 55/100 and k = 60/100. Additionally at k = 0 the position of two agents are [100, 298] and [300, 248] for agents 1 and 2 respectively. This set-up is shown in Fig. 13. At k = 1 both agents detect their nearby targets (target 1 and 2) and switch to track mode. The figure shows with blue and orange squares the estimated target positions. Initially the estimated target states exhibit large positioning errors however, as the figure shows, in subsequent time-steps the estimation error improves as the agents have locked-in the targets. The agent trajectories are shown with black lines. We should note here that in this scenario we use the linear target dynamics described in the beginning of this section to track the non-linear motion of target 1 and target 2. It is evident from Fig. 13 that during time-steps 20 to 50 the OSPA error exhibits high oscillations which is due to the non-linear target behavior during that time-window. This is also shown by the high variance in the estimated target positions (blue and orange squares). At time-steps 50, 55 and 60 targets 3, 4 and 5 appear in the surveillance region following the trajectories shown in the figure. These targets, however are not detected by any of the agents since both agents are occupied tracking target 1 and 2. As we can observe the OSPA error increases dramatically from k = 50 onwards. This example shows the main limitation of the proposed system i.e. agents remain in a particular mode indefinitely. One way to approach this problem is by dynamically controlling the w parameter and periodically forcing an agent to enter either the search or track mode depending on the situation. Another way will be to keep track of the visited areas in the surveillance region and store statistics regarding the time of visit, the number of targets detected at each area, etc. and then perform a priority-based search in order to maximize the number of detected targets. These ideas will be explored in future works.

CONCLUSION AND FUTURE WORK
In this paper we have studied the problem of jointly optimized searching and tracking with multiple agents in stochastic environments. We have presented a novel unified probabilistic framework for this decision and control problem based on Bayesian multi-object stochastic filtering with random finite sets. We have defined suitable objective functions for the tasks of multi-agent searching and tracking and we have showed that the resulting non-linear binary program can be approximated adequately by a genetic algorithm. Finally, we have demonstrated the performance of the proposed decision and control algorithm through extensive numerical experimentation. Future work will look at simpler and computationally improved approximations of the non-linear binary program as well as alternative and simpler objective functions with equivalent behavior and performance. In addition, we are interested in investigating a rolling horizon (i.e. multiple-step look ahead) predictive control approach for this problem and analyzing its performance against the proposed single-step look ahead system. We are also interested in investigating this problem with heterogeneous agents i.e. agents which exhibit different types of sensing models.