Downing a Rogue Drone with a Team of Aerial Radio Signal Jammers

This work proposes a novel distributed control framework in which a team of pursuer agents equipped with a radio jamming device cooperate in order to track and radio-jam a rogue target in 3D space, with the ultimate purpose of disrupting its communication and navigation circuitry. The target evolves in 3D space according to a stochastic dynamical model and it can appear and disappear from the surveillance area at random times. The pursuer agents cooperate in order to estimate the probability of target existence and its spatial density from a set of noisy measurements in the presence of clutter. Additionally, the proposed control framework allows a team of pursuer agents to optimally choose their radio transmission levels and their mobility control actions in order to ensure uninterrupted radio jamming to the target, as well as to avoid the jamming interference among the team of pursuer agents. Extensive simulation analysis of the system's performance validates the applicability of the proposed approach.


I. INTRODUCTION
Drones have become an emerging trend and are currently being utilized in many applications, ranging from aerial photography and critical infrastructure inspection to emergency-response missions and aerial monitoring tasks. Unfortunately, drones have also become a threat and a risk to public safety. In particular, drones have repeatedly threatened public safety by targeting airports and restricted airspaces [1], attacking critical infrastructures [2], and endangering people's lives [3]. For this reason, there is a need for counter-drone systems that can detect, track, and interdict rogue or malicious drones [4]. Although counter-drone approaches and systems have already been proposed in the literature [5]-[7], these are still in their infancy and considerably more work is needed for this technology to reach the required level of maturity.
In this work, a distributed multi-agent control framework is proposed, in which a team of pursuer agents (i.e., pursuer drones) cooperate in order to continuously track and radio-jam a target (i.e., rogue drone) in 3D space (as depicted in Fig. 1), disrupting its communication and sensing circuitry and thus ultimately forcing it to execute its fail-safe protocols [8], i.e., auto-landing. The pursuer agents are assumed to be equipped with a 3D range-finding active sensor [9] with limited sensing range, which they use to obtain noisy target measurements (i.e., radial distance, azimuth angle, and inclination angle) in the presence of false-alarm measurements (or clutter). It is further assumed that the target detection process is uncertain and that the target can appear and disappear within the surveillance area at random times (i.e., the target can spawn anywhere inside the surveillance area and, during tracking, it can move behind obstacles or other occlusions, which results in its disappearance from the field of view of the pursuer agents). Finally, it is assumed that the pursuer agents can transmit power to the target via their on-board active sensor, at discrete power-levels, with the ultimate purpose of radio-jamming its circuitry.
The main contributions of this work are two-fold:
• A novel distributed control framework is proposed for the problem of target tracking and target radio-jamming in 3D by a team of cooperative mobile agents. The proposed approach allows the team of pursuer agents to optimally (a) choose their mobility control actions that result in accurate target tracking and (b) choose their transmit power levels to cause uninterrupted target radio-jamming. The agents cooperate to improve the target tracking-and-jamming performance of the team while minimizing the jamming interference amongst them.
• In the scenario considered, the target can exist in one of two states, i.e., present or absent. Thus, in order to be jammed, its existence probability along with its spatial density must be estimated from a sequence of noisy measurements in the presence of clutter, by a team of mobile agents equipped with conic directional antennas with limited sensing range.
The rest of the paper is organized as follows. Section II discusses the related work. Section III formulates the problem and illustrates the proposed system architecture. Section IV develops the system model and Section V discusses the details of the proposed approach. Finally, Section VI conducts an extensive performance evaluation and Section VII concludes the paper and discusses future directions.

II. RELATED WORK
A recent survey on detection, tracking, and interdiction techniques for malicious drones or unmanned aerial vehicles (UAVs) can be found in [5], where the authors discuss various drone threats and review several counter-measures that have been investigated in the literature. In [7] the authors present a detailed survey on the state-of-the-art anti-drone systems and discuss techniques and technologies used for drone surveillance. The work in [10] discusses the advantages and disadvantages of various drone detection methods including audio-visual, thermal, and RF and proposes a low-cost stationary system for drone detection and tracking which can be easily incorporated into third-party anti-drone platforms. The authors in [11] focus on the problem of detection and tracking of small and fast moving drones with the ultimate goal of developing a system that can be used to prevent such small drones from accessing restricted areas and facilities.
Moreover, in [12]-[16], the problems of formation control, target tracking, and target interception with multiple agents are investigated, but without considering jamming capabilities for the pursuers. The work in [17] develops a UAV-based solution for localizing a GPS jammer and the work in [18] proposes a low-cost ground jamming system to counteract the operation of small drones. In [19] a team of defense agents forms a cluster around a malicious target in order to prevent it from entering restricted airspaces. The authors assume, however, the availability of a tracking system for detecting and tracking the target. Finally, in [20], [21] the problem of jamming a rogue drone with a team of mobile agents is investigated, however without considering target appearance/disappearance events.
Compared to the existing techniques, the proposed framework develops a novel cooperative technique for the problem of target tracking and target radio-jamming in 3D by a team of mobile agents, while considering the induced jamming interference amongst the team. The problem is investigated in the presence of target detection uncertainty, clutter, and target appearance/disappearance events, a combination which to the best of our knowledge has not been investigated before.

III. SYSTEM OVERVIEW
The problem tackled in this work can be stated as follows: At each time-step $t$, each agent $j \in [1, .., N]$ must decide its mobility control action $u_t^j \in \mathcal{U}_t^j$ and transmit power-level $l_t^j \in \mathcal{L}_t^j$ that result in accurate target tracking and uninterrupted target radio-jamming. At each time-step the agents cooperate to maximize the joint tracking and radio-jamming performance of the system, while avoiding the induced radio-jamming interference amongst them. Figure 2 illustrates the proposed system architecture. Each agent $j \in [1, .., N]$ uses stochastic filtering in order to estimate at each time-step $t$: (a) the probability mass function of the target existence $p_t^j(\epsilon_t^j | Z_{1:t})$ given the target measurements up to time $t$, $Z_{1:t}^j$, and (b) the filtering distribution $f_t^j(x_t | Z_{1:t})$ of the target state $x_t^j$. This is illustrated by the probabilistic graphical model in Fig. 2, where the target state $x_t^j$ and the target existence $\epsilon_t^j$ are the random variables of an unobserved Markov process with observations $z_t^j$ (i.e., target measurements). In the problem considered, the target measurement $z_t^j$ received by agent $j$ depends on the agent's applied mobility control action $u_t^j$, as shown in the figure. As a result, to optimally estimate the state of the target, the agent must make optimized mobility control decisions, which is achieved by the proposed controller. The objective of the tracking-and-jamming controller is to find not only the optimal mobility control action $u_t^j$ but also the optimal transmit power-level $l_t^j$ for agent $j$, such that the received power at the rogue drone is maximized in order to achieve uninterrupted radio-jamming. The agents cooperate by exchanging information (i.e., their current state and the estimated target state) in order to optimize the target radio-jamming and avoid the jamming interference amongst them. This work does not consider a target that has the potential to jam the pursuer agents. Moreover, it is assumed that each agent is able to identify with absolute certainty whether another entity is a pursuer agent, i.e., an agent cannot be mistaken for a target by another pursuer agent.

IV. SYSTEM MODEL

A. Target Dynamics
This work considers one target (e.g., a rogue drone) which can appear/disappear at random times anywhere inside the surveillance area, and thus at any time-step the target can exist in one of two states, i.e., present or absent. When the target is present, it moves in 3D space according to a stochastic dynamical model. More specifically, the state of the target at time $t$ is modeled as a Bernoulli random finite set (RFS) [22], [23] $X_t$, which is either empty (target absent) or a singleton $\{x_t\}$ (target present), where $e_t = p_t(\epsilon_t = 1 | Z_{1:t})$ is the probability of target existence given measurements $Z_{1:t}$ up to time $t$ and $s_t = f_t(x_t | Z_{1:t})$ is the spatial density of the target with state $x_t$. The dynamics of the target are modeled as a Bernoulli Markov process [25] with transitional RFS density $\psi_{t|t-1}(X_t | X_{t-1})$ given by:

$$\psi_{t|t-1}(X_t | X_{t-1}) = \begin{cases} 1 - p_b, & X_t = \emptyset,\ X_{t-1} = \emptyset \\ p_b \, b_t(x_t), & X_t = \{x_t\},\ X_{t-1} = \emptyset \\ 1 - p_s, & X_t = \emptyset,\ X_{t-1} = \{x_{t-1}\} \\ p_s \, \pi_{t|t-1}(x_t | x_{t-1}), & X_t = \{x_t\},\ X_{t-1} = \{x_{t-1}\} \end{cases}$$

where $p_b$ denotes the probability of target birth, $b_t(x)$ is the birth density (uniform inside the surveillance area), $p_s$ is the probability that the target survives to the next time-step, and $\pi_{t|t-1}(x_t | x_{t-1})$ is the target transitional density, which in this work is assumed to be governed by the following discrete-time dynamical model:

$$x_t = \Phi x_{t-1} + \Gamma \nu_t \qquad (1)$$

where $x_t = [x, y, z, \dot{x}, \dot{y}, \dot{z}]^\top \in \mathcal{X}$ denotes the target state at time $t$, which consists of the position and velocity components in 3D Cartesian coordinates, and $\nu_t \sim \mathcal{N}(0, \Sigma_v)$ denotes the perturbing acceleration noise, which is drawn from a zero mean multivariate normal distribution with covariance matrix $\Sigma_v$. The matrices $\Phi$ and $\Gamma$ are defined as:

$$\Phi = \begin{bmatrix} I_3 & \Delta T \, I_3 \\ 0_3 & I_3 \end{bmatrix}, \qquad \Gamma = \begin{bmatrix} 0.5 \, \Delta T^2 \, I_3 \\ \Delta T \, I_3 \end{bmatrix}$$

where $\Delta T$ is the sampling period and $I_3$, $0_3$ are the $3 \times 3$ identity and zero matrix, respectively.
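As a minimal sketch of how such a target could be simulated, the Python snippet below propagates a state under a near-constant-velocity step and wraps it in a birth/survival transition. It assumes the standard constant-velocity form of $\Phi$ and $\Gamma$ and independent acceleration noise per axis; the function names, the `sample_birth` callback, and the use of `None` for an absent target are illustrative choices, not part of the original work.

```python
import random

def cv_step(state, accel, dt):
    """One near-constant-velocity step for state [x, y, z, vx, vy, vz].

    accel is the perturbing acceleration [ax, ay, az] (assumed form of Phi, Gamma).
    """
    pos, vel = state[:3], state[3:]
    new_pos = [p + dt * v + 0.5 * dt ** 2 * a for p, v, a in zip(pos, vel, accel)]
    new_vel = [v + dt * a for v, a in zip(vel, accel)]
    return new_pos + new_vel

def sample_accel(sigma, rng):
    """Draw zero-mean Gaussian acceleration noise, independent per axis (assumption)."""
    return [rng.gauss(0.0, sigma) for _ in range(3)]

def bernoulli_transition(state, p_birth, p_survive, sample_birth, dt, sigma, rng):
    """Sample the next target state; None encodes 'target absent'."""
    if state is None:
        # Absent target: it is born with probability p_birth.
        return sample_birth(rng) if rng.random() < p_birth else None
    if rng.random() >= p_survive:
        # Present target: it disappears with probability 1 - p_survive.
        return None
    return cv_step(state, sample_accel(sigma, rng), dt)

if __name__ == "__main__":
    rng = random.Random(1)
    birth = lambda r: [r.uniform(0, 100) for _ in range(3)] + [0.0, 0.0, 0.0]
    state = None
    for t in range(10):
        state = bernoulli_transition(state, 0.02, 0.98, birth, 1.0, 0.5 ** 0.5, rng)
        print(t, "absent" if state is None else [round(c, 2) for c in state[:3]])
```

Here `sigma` is a standard deviation, so a covariance of $0.5 I_3$ corresponds to `sigma = 0.5 ** 0.5`.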

B. Pursuer Agent Kinematics
A team of $N$ autonomous pursuer agents (e.g., UAV agents) operate inside the surveillance area, each of which is subject to the kinematic model of Eq. (3), where $u_t^j$ denotes the state (i.e., position) of pursuer agent $j$ at time $t$, $\Delta_R$ is a vector of possible radial step sizes, $\Delta_\phi = \pi/N_\phi$, $\Delta_\theta = 2\pi/N_\theta$, and the parameters $(|\Delta_R|, N_\phi, N_\theta)$ determine the number of possible mobility control actions. The set of all admissible control actions of agent $j$ at time $t$ is denoted as $\mathcal{U}_t^j$. Although a simplified kinematic model for the agents is utilized in this work in order to demonstrate the proposed approach, more realistic kinematic/dynamic models can also be incorporated, depending on the application scenario.
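One plausible reading of this kinematic model is that every control action moves the agent by one radial step along a direction on a discretized sphere. The sketch below enumerates the candidate next positions under that assumption. The spherical-to-Cartesian step, the inclusion of a "stay in place" action, and the merging of duplicate points at the two poles are all assumptions made for illustration, and `candidate_positions` is a hypothetical helper name.

```python
import math

def candidate_positions(pos, radial_steps, n_phi, n_theta):
    """Enumerate candidate next positions around pos = (x, y, z).

    Assumed model: next = pos + r * [sin(phi)cos(theta), sin(phi)sin(theta), cos(phi)]
    with phi = i * pi / n_phi and theta = k * 2 * pi / n_theta.
    """
    d_phi = math.pi / n_phi
    d_theta = 2.0 * math.pi / n_theta
    candidates = {tuple(float(c) for c in pos)}  # assumption: the agent may also hover
    for r in radial_steps:
        for i in range(n_phi + 1):
            phi = i * d_phi
            for k in range(n_theta):
                theta = k * d_theta
                point = (
                    pos[0] + r * math.sin(phi) * math.cos(theta),
                    pos[1] + r * math.sin(phi) * math.sin(theta),
                    pos[2] + r * math.cos(phi),
                )
                # Rounding merges the numerically identical points at the poles.
                candidates.add(tuple(round(c, 9) for c in point))
    return sorted(candidates)

if __name__ == "__main__":
    actions = candidate_positions((0.0, 0.0, 0.0), [1.0, 3.0, 5.0], 8, 8)
    print(len(actions), "candidate positions")
```

Increasing `len(radial_steps)`, `n_phi`, or `n_theta` enlarges the action set, which is the sense in which these parameters set the system's degrees of freedom.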

C. Agent Sensing and Jamming Model
The pursuer agents are equipped with an onboard directional active 3D range-finding radio, which they use to detect the target, acquire target measurements, and transmit power to the target in order to radio-jam its communications circuitry. The range-finding characteristics are as follows:
1) Sensing Profile: The radio's sensing profile $\mathcal{S}$ in 3D is modeled as a right circular cone in Cartesian coordinates $(x, y, z)$, where $h_a$ characterizes the effective sensing range, $r = \tan(\theta_a/2)\, h_a$ is the base radius of the cone, and $\theta_a$ is the opening angle of the cone. Thus, a target with position coordinates $Hx_t = [x, y, z]^\top$ (where $H$ is a matrix which extracts the position coordinates from the state $x_t$) resides inside the sensing range of agent $j$ when $Hx_t \in \mathcal{S}_j$. It should be mentioned that the direction of the agent's sensor is given at each time-step by the direction of the vector that points from the position of agent $j$, $u_t^j$, to the position of the target, $x_t$.
2) Measurement Model: Each agent $j$ uses its radio received signal to acquire 3D target measurements $z_t^j = [\rho, \theta, \phi]^\top \in \mathcal{Z}$ (i.e., radial distance $\rho$, azimuth angle $\theta$, and inclination angle $\phi$) according to the measurement model $z_t^j = h(x_t, u_t^j) + w_t$, where $h(x_t, u_t^j)$ maps the target state to the radial distance and the two angles relative to the agent position, and $w_t \sim \mathcal{N}(0, \Sigma_w)$ is zero mean Gaussian measurement noise with covariance matrix $\Sigma_w$. Due to sensing imperfections, at time $t$, agent $j$ also receives false-alarm measurements or clutter $\{c_t^{j1}, \ldots, c_t^{jn}\}$ (in addition to the target measurement) with a Poisson rate of $\lambda_c$ (i.e., $E(n) = \lambda_c$), which are uniformly distributed (i.e., with clutter density denoted as $p_c(c_t)$) inside the measurement space. To summarize, at each time-step $t$, agent $j$ receives a set $Y_t^j$ of measurements, which consists of zero or one target measurement $z_t^j$ and a set of false-alarm measurements.
3) Target Detection and Jamming: The agents use their onboard radio to detect and jam the target. A target detection occurs, with a certain probability, when the target is illuminated (i.e., when the agent transmits power). Hence, it is important to note that the same transmit signals are used to detect the target and at the same time jam it. The detection probability, i.e., $p_D^j = p_D^j(x_t, u_t^j, l_t^j)$, that agent $j$ with state $u_t^j$ and transmit power-level $l_t^j$ detects a target with state $x_t$ is given by Eq. (7), where $l_{max}^j$ is the maximum transmit power-level, $\eta_t^j = \|Hx_t - u_t^j\|_2$ denotes the Euclidean distance between the agent and the target in 3D space, $p_D^{max}$ denotes the maximum attainable detection probability of the sensor, which can be obtained when the target resides within distance $R_0$ from the agent's position and also inside the agent's sensing profile $\mathcal{S}_j$, and $n_e$ is the path-loss exponent. Note that $p_D^j = 0$ when $x_t \notin \mathcal{S}_j$. Finally, the transmit power-level $l_t^j$ takes its values from the discrete set of admissible transmit power-levels $\mathcal{L}^j$, i.e., $l_t^j \in \mathcal{L}^j = \{l_t^{j1}, ..., l_t^{jn}\}$ with $l_{max}^j = \max \mathcal{L}^j$. Additionally, the received power at the target with location $Hx_t$ from an agent with state $u_t^j$ which transmits at power-level $l_t^j$ is given by a path-loss model.
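To make the sensing geometry concrete, the following sketch implements a cone membership test and a noise-free range/azimuth/inclination measurement. It is an illustration under stated assumptions, not the authors' implementation: the cone apex is placed at the agent position, its axis is the (non-zero) pointing direction `axis`, $h_a$ is measured along that axis, and the angles follow the usual `atan2`/`acos` spherical convention. Both function names are hypothetical.

```python
import math

def spherical_measurement(target_pos, agent_pos):
    """Noise-free (rho, theta, phi) of the target relative to the agent."""
    dx, dy, dz = (t - a for t, a in zip(target_pos, agent_pos))
    rho = math.sqrt(dx * dx + dy * dy + dz * dz)
    theta = math.atan2(dy, dx)                       # azimuth
    phi = math.acos(dz / rho) if rho > 0 else 0.0    # inclination
    return rho, theta, phi

def in_sensing_cone(target_pos, agent_pos, axis, h_a, theta_a):
    """True if the target lies inside a right circular cone.

    Assumptions: apex at agent_pos, axis along the non-zero vector `axis`,
    height h_a, full opening angle theta_a (radians).
    """
    rel = [t - a for t, a in zip(target_pos, agent_pos)]
    axis_norm = math.sqrt(sum(c * c for c in axis))
    along = sum(r * c for r, c in zip(rel, axis)) / axis_norm  # distance along the axis
    if along < 0.0 or along > h_a:
        return False
    dist_sq = sum(r * r for r in rel)
    radial = math.sqrt(max(dist_sq - along * along, 0.0))      # distance from the axis
    # The cone radius grows linearly and equals tan(theta_a / 2) * h_a at the base.
    return radial <= along * math.tan(theta_a / 2.0)

if __name__ == "__main__":
    agent, axis = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
    print(in_sensing_cone((5.0, 0.0, 10.0), agent, axis, 40.0, math.radians(80)))
    print(spherical_measurement((3.0, 4.0, 0.0), agent))
```

A noisy measurement would add a zero-mean Gaussian sample with covariance $\Sigma_w$ to the output of `spherical_measurement`.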

V. COOPERATIVE TRACKING AND JAMMING
The tracking-and-jamming control module depicted in Fig. 2 seeks to find the joint mobility and power-level control actions that result in uninterrupted target tracking and target radio-jamming. This section discusses the details of the proposed control approach.

A. Target State Estimation
Each agent $j$ estimates the probability of target existence $e_t^j$ and the target spatial density $s_t^j$ by maintaining and propagating in time the multi-object probability distribution $F_t^j(X_t^j | Y_{1:t}^j)$ of the RFS target state $X_t^j$ given measurements $Y_{1:t}^j = \{Y_1^j, .., Y_t^j\}$, using multi-object stochastic filtering [23] as shown below (the index of the agent is dropped for notational clarity):

$$F_{t|t-1}(X_t | Y_{1:t-1}) = \int \psi_{t|t-1}(X_t | X_{t-1}) \, F_{t-1}(X_{t-1} | Y_{1:t-1}) \, \delta X_{t-1}$$

$$F_t(X_t | Y_{1:t}) = \frac{\varphi_t(Y_t | X_t) \, F_{t|t-1}(X_t | Y_{1:t-1})}{\int \varphi_t(Y_t | X_t) \, F_{t|t-1}(X_t | Y_{1:t-1}) \, \delta X_t}$$

where $\psi_{t|t-1}(X_t | X_{t-1})$ is the target RFS transitional density as discussed in Section IV-A, $\varphi_t(Y_t | X_t)$ is the RFS measurement likelihood function according to the measurement model discussed in Section IV-C, $F_{t|t-1}(X_t | Y_{1:t-1})$ is the multi-object predictive density, and finally $F_t(X_t | Y_{1:t})$ is the multi-object posterior filtering density. The recursion shown here is a generalization of the Bayes filter [26] on random finite sets [25]. The solution of the above recursion (see [24]) for the modeling assumptions discussed in Section IV allows each agent to compute $e_t$ and $s_t$ recursively over time via the Bernoulli filter recursion [24], which consists of a prediction step and an update step. In these steps, $p_D(x_t, u_t, l_t)$ is abbreviated as $p_D$, $q_{t|t-1}$ and $q_t$ are auxiliary quantities of the recursion, and finally $g_t(y | x_t, u_t) = \mathcal{N}(y; h(x_t, u_t), \Sigma_w)$ is the measurement likelihood function for the target generated measurement.
That said, each agent $j$ computes at each time-step $e_t^j$ and $s_t^j$ using the recursion shown above. The final target state is then fused as follows: The target existence probability is first exchanged among all agents and the fused estimate is computed as $\hat{e}_t = \mathrm{mean}\{e_t^j \mid j = 1, .., N\}$. Then, the agents that sense the presence of the target, i.e., $\tilde{N} = \{j \mid e_t^j > 0.5\}$, first extract the estimated target state $\hat{x}_t^j$ from $s_t^j$ together with its covariance matrix $C^j$. Then $[\hat{x}_t^j, C^j], \forall j \in \tilde{N}$ is exchanged among all $\tilde{N}$ agents and combined using covariance intersection [27], [28]. Finally, the fused results are communicated to all $N$ agents, which use them to sample from and update their filtering densities.
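For illustration, a particle-based Bernoulli filter step for the existence probability can be sketched as below. The update follows the widely used textbook form of the Bernoulli filter for point measurements in Poisson clutter; it is an assumption that this matches the recursion used here, whose notation (e.g., the $q$ terms) may differ. The prediction of the particle states themselves is omitted, and all function and argument names are hypothetical.

```python
def predict_existence(e_prev, p_birth, p_survive):
    """Predicted existence probability e_{t|t-1}."""
    return p_birth * (1.0 - e_prev) + p_survive * e_prev

def bernoulli_update(e_pred, weights, p_detect, likelihoods, clutter_rate, clutter_density):
    """Update the existence probability and the particle weights.

    weights[i]        : normalized weight of particle i (predicted spatial density)
    p_detect[i]       : detection probability evaluated at particle i
    likelihoods[i][m] : g(y_m | x_i) for measurement m (empty row if no measurements)
    """
    kappa = clutter_rate * clutter_density  # clutter intensity, assumed uniform
    factors = []
    for p_d, row in zip(p_detect, likelihoods):
        ratio = sum(g / kappa for g in row)
        factors.append(1.0 - p_d + p_d * ratio)
    norm = sum(w * f for w, f in zip(weights, factors))  # equals 1 - delta
    if norm == 0.0:
        # Degenerate case: a certain detection was expected but nothing was received.
        return 0.0, list(weights)
    delta = 1.0 - norm
    e_post = (1.0 - delta) * e_pred / (1.0 - delta * e_pred)
    new_weights = [w * f / norm for w, f in zip(weights, factors)]
    return e_post, new_weights

if __name__ == "__main__":
    e_pred = predict_existence(0.5, 0.02, 0.98)
    e_post, w = bernoulli_update(e_pred, [0.5, 0.5], [0.9, 0.9], [[], []], 3.0, 1e-3)
    print(round(e_pred, 3), round(e_post, 3))
```

A missed detection at a high $p_D$ pulls the existence probability down, while a measurement that is far more likely under the target model than under clutter pushes it toward one.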

B. Single Agent Tracking-and-Jamming Control
To achieve tracking-and-jamming control, first observe that the posterior target existence probability $e_t$, the posterior filtering density $s_t$, and the target received power $R_t$ are all influenced by the target detection events. Thus, optimizing $u_t$ and $l_t$ for maximum target detection performance results in optimized tracking-and-jamming performance: optimizing the target detection yields a more accurate estimation of $e_t$ and $s_t$, from which the target state can be extracted, and thus the target radio-jamming can be maximized. Hence, the optimal control actions $(\hat{u}_t^j, \hat{l}_t^j)$ for agent $j$ can be computed as the pair of admissible mobility and power-level actions that maximizes the detection probability $p_D^j(\hat{x}_t, u_t^j, l_t^j)$, where $\hat{x}_t$ is the estimated state of the target. It should be noted, however, that $\hat{x}_t$ is not available until the control actions $\hat{u}_t^j$ and $\hat{l}_t^j$ have already been chosen and applied. In order to bypass this problem, each agent $j$ approximates $\hat{x}_t^j \sim \tilde{x}_t^j$ using the predictive spatial density of the target $s_{t|t-1}^j(x_t)$ and the predicted probability of target existence $e_{t|t-1}^j$.

C. Centralized Tracking-and-Jamming Control
To tackle the cooperative tracking-and-jamming control problem, the induced jamming interference among the agents must be kept below the critical value so that the agents can remain operational at all times. The joint target detection probability for which exactly $m$ out of $N$ agents jointly track-and-jam the target is given by:

$$\xi_m^N = \sum_{1 \le i_1 < i_2 < \ldots < i_m \le N} \Big( \prod_{j \in \{i_1, \ldots, i_m\}} p_D^j \prod_{k \notin \{i_1, \ldots, i_m\}} (1 - p_D^k) \Big)$$

where $p_D^j(\tilde{x}_t, u_t^j, l_t^j)$ is abbreviated as $p_D^j$ for notational clarity, and $\tilde{x}_t = \mathrm{mean}\{\tilde{x}_t^j : \forall j\}$ denotes the mean predicted target state from all agents $j$ with $e_{t|t-1}^j > 0.5$. The operator $\sum_{1 \le i_1 < i_2 < \ldots < i_m \le N}(\cdot)$ runs over the $\binom{N}{m}$ combinations of $m$ agents. To ensure that at least $n$ out of $N$ agents effectively track-and-jam the target, while at the same time respecting the interference constraints amongst them, the cooperative tracking-and-jamming control objective is defined as $\Xi_n^N = \sum_{m=n}^{N} \xi_m^N$, and the problem becomes that of Eqs. (12a)-(12b): maximize $\Xi_n^N$ over the joint mobility and power-level control actions of all agents (12a), subject to the interference constraints (12b), where $\delta_i$ denotes the interference tolerance of agent $i$, i.e., the radio-jamming interference received by agent $i$ from all other agents $j$ should be less than $\delta_i$ in order for agent $i$ to remain operational.
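The "exactly $m$ out of $N$" and "at least $n$ out of $N$" detection probabilities described above can be evaluated by direct enumeration of agent subsets, as in the sketch below. It assumes that the detection events of different agents are independent given the predicted target state; the function names are illustrative.

```python
from itertools import combinations
from math import prod

def prob_exactly(p_detect, m):
    """Probability that exactly m agents detect the target (independent detections assumed)."""
    n = len(p_detect)
    total = 0.0
    for subset in combinations(range(n), m):
        chosen = set(subset)
        # Agents in the subset detect, all remaining agents miss.
        total += prod(p_detect[i] if i in chosen else 1.0 - p_detect[i] for i in range(n))
    return total

def prob_at_least(p_detect, n_min):
    """Probability that at least n_min agents detect the target."""
    return sum(prob_exactly(p_detect, m) for m in range(n_min, len(p_detect) + 1))

if __name__ == "__main__":
    print(prob_at_least([0.5, 0.5, 0.5], 2))
```

The enumeration visits $\binom{N}{m}$ subsets for each $m$, which is negligible for a three-agent team; the cost of the joint problem comes from searching over the control actions that determine each `p_detect[i]`.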

D. Distributed Tracking-and-Jamming Control
The joint optimization problem of Eqs. (12a)-(12b) is a hard combinatorial problem which quickly becomes computationally intractable as the number of agents and the number of control actions increase.
For this reason, in order to tackle this problem and achieve the desired system behavior in real-time, a distributed suboptimal approach is proposed based on the greedy randomized adaptive search procedure, or GRASP [29], [30]. In essence, GRASP is an iterative randomized sampling technique which operates in two steps. In the first step the algorithm constructs an initial greedy randomized solution, which is further refined through local search in the second step. The algorithm alternates between these two steps, while keeping track of the best solution, until the stopping criteria are met (e.g., a maximum number of iterations). The two steps of GRASP are implemented as follows: 1) Greedy Randomized Solution: An initial randomized solution is obtained via random sampling to find the joint control actions $(\hat{u}_t^j, \hat{l}_t^j), \forall j$ with the following steps:
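The specific construction steps are not reproduced in this text, so the sketch below only illustrates the generic two-phase GRASP loop on an abstract joint-action problem, and should not be read as the authors' procedure. The `objective` and `feasible` callbacks, the one-agent-at-a-time neighborhood used for local search, and the toy problem in the usage example are all hypothetical; the distributed aspect of the approach is not modeled.

```python
import random

def grasp(candidates_per_agent, objective, feasible, n_iters, n_samples, rng):
    """Generic GRASP loop: randomized construction followed by local search."""
    best, best_val = None, float("-inf")
    for _ in range(n_iters):
        # Phase 1: randomized construction. Sample joint actions, keep the best feasible one.
        current, current_val = None, float("-inf")
        for _ in range(n_samples):
            joint = tuple(rng.choice(options) for options in candidates_per_agent)
            if feasible(joint):
                val = objective(joint)
                if val > current_val:
                    current, current_val = joint, val
        if current is None:
            continue  # no feasible sample in this iteration
        # Phase 2: local search. Change one agent's action at a time while it improves.
        improved = True
        while improved:
            improved = False
            for j, options in enumerate(candidates_per_agent):
                for option in options:
                    trial = current[:j] + (option,) + current[j + 1:]
                    if feasible(trial):
                        val = objective(trial)
                        if val > current_val:
                            current, current_val = trial, val
                            improved = True
        if current_val > best_val:
            best, best_val = current, current_val
    return best, best_val

if __name__ == "__main__":
    # Toy problem: two agents pick values 0..2, may not pick the same value, maximize the sum.
    options = [[0, 1, 2], [0, 1, 2]]
    print(grasp(options, sum, lambda j: j[0] != j[1], 5, 50, random.Random(0)))
```

In the setting of this section, each candidate would be a (mobility action, power-level) pair, `objective` would evaluate $\Xi_n^N$, and `feasible` would check the interference tolerances $\delta_i$.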

VI. PERFORMANCE EVALUATION

A. Simulation Setup
The simulation setup used to evaluate the performance of the proposed approach is as follows: The agents and the target maneuver inside a bounded 3D surveillance area of size 100m × 100m × 100m. The dynamics of the target are according to Eq. (1) with $\Sigma_v = 0.5 I_3$ m/s$^2$ and $\Delta T = 1$s. The dynamics of the pursuer agents are according to Eq. (3) with $\Delta_R = [1, 3, 5]$m, $N_\phi = 8$, and $N_\theta = 8$, unless otherwise specified. The pursuer agent detection and jamming model is according to Eq. (7) with $p_D^{max} = 0.95$ and transmit power-levels $\mathcal{L} = [\text{off}, −50, −7, 0.5]$dB, i.e., $l_{max}^j = 0.5$dB $\forall j$. The path-loss exponent is $n_e = 2$ and $R_0 = 6$m. The agent measurement model has $\Sigma_w = \mathrm{diag}([0.8\text{m}, \pi/50\,\text{rad}, \pi/50\,\text{rad}])$ and clutter rate $\lambda_c = 3$. The conic sensing profile $\mathcal{S}$ of each agent has $h_a = 40$m and $\theta_a = 80$deg, and the interference tolerance $\delta_j$ of each agent is set to −40dB. The Bernoulli filter parameters are $p_b = 0.02$ and $p_s = 0.98$, and the birth density $b_t(x)$ is uniform inside the surveillance space. The tracking-and-jamming objective function used in the experiments is given by Eq. (12a) as $\Xi_2^N$ with $N = 3$, and the jamming constraints are given by Eq. (12b). To handle the non-linear measurement model, the Bernoulli filter was implemented as a particle filter [24]. Finally, in the implementation of the GRASP-based optimization, $n_s = 10^4$ samples and $n_I = 100$ iterations were used.

B. Results
The first experiment, shown in Fig. 3, aims to demonstrate the overall behavior of the proposed approach. More specifically, in this simulated scenario, which takes place over 30 time-steps, a team of $N = 3$ pursuer agents cooperate in order to track-and-jam the target. In particular, Fig. 3(a) shows the trajectories of the agents (i.e., blue, purple, and orange) while tracking and jamming the target (i.e., the ground truth is shown with a black line and the estimate with a cyan line). The agents are initially located at the $(x, y, z)$ coordinates [30, 30, 30], [20, 10, 10], and [10, 15, 15], and the target is spawned inside their sensing range with initial state [20, 20, 20, 1.5, 2, 1.7]. As shown in the figure, the agents maneuver around the target in such a way that (a) the tracking-and-jamming performance is maximized while (b) the jamming interference amongst the team is kept below the critical value of −40dB. More specifically, Fig. 3(b) shows the transmit power-level of each agent during this experiment and Fig. 3(c) shows which agents jam each other (i.e., due to jamming interference). For instance, agent 1 is being jammed by agent 3 during time-steps 1 to 3 and then again during time-steps 16 and 17. Additionally, during time-steps 16 to 18 agent 1 is also being jammed by agent 2, as shown in the first row of Fig. 3(c). Then, Fig. 3(d) shows the target received power (i.e., red line) and the agent jamming interference in dB, indicated by the colored circles. Figure 3(e) shows the estimated target existence probability and Fig. 3(f) shows the tracking error, i.e., the optimal sub-pattern assignment (OSPA) metric [31] of order 2 with a cut-off value of 10m, i.e., OSPA(p=2, c=10). In this experiment the target is occluded between time-steps 14 and 17, and thus from the agents' perspective the target seems to disappear at time-step 14 and then to re-appear at time-step 18, as shown in Fig. 3(e). The agents try to estimate the target's appearance and disappearance events, which are captured by the existence probability (i.e., red line) in Fig. 3(e).
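Because at most one target is present, the OSPA error used here reduces to a simple case analysis, sketched below. With at most one element in each set, the order $p$ has no effect on the value: the error is zero when both sets are empty, equals the cut-off $c$ on a cardinality mismatch, and is otherwise the position error capped at $c$. The function name and the use of `None` for "no target" are illustrative choices.

```python
import math

def ospa_single(truth_pos, est_pos, cutoff=10.0):
    """OSPA distance when truth and estimate each contain at most one target.

    None encodes an empty set (target absent, or no target declared).
    """
    if truth_pos is None and est_pos is None:
        return 0.0
    if truth_pos is None or est_pos is None:
        # Cardinality error: a target is declared but absent, or present but missed.
        return cutoff
    return min(cutoff, math.dist(truth_pos, est_pos))

if __name__ == "__main__":
    print(ospa_single((20.0, 20.0, 20.0), (23.0, 24.0, 20.0)))  # localization error
    print(ospa_single(None, (23.0, 24.0, 20.0)))                # cardinality error
```

Under this metric, declaring a target that is actually absent is penalized by the full cut-off value, regardless of where the estimate lies.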
From the results it is also observed that during the first few time-steps the agents cause interference to each other. For this reason, the agents transmit at lower power-levels to accommodate this interference, as shown in Figs. 3(b)-(c). Moreover, at time-step 16, agent 1 is being jammed by agent 2 and agent 3, as shown in Fig. 3(c). Subsequently, agents 2 and 3 switch off their antennas, as shown in Fig. 3(b), in order to respect the interference limit of agent 1. Figure 3(d) verifies that at time-step 16 agent 1 does not receive any power from agents 2 and 3. Figure 3(g) shows the configuration of the agents' sensing profiles at this particular time-step and, as can be seen, agent 1 resides inside the jamming range of agents 2 and 3. Then, the spike in the tracking error at time-step 15 is due to a cardinality estimation error, i.e., the agents believe that the target is present at time-step 15, whereas in reality the target is absent at that time-step. This error, however, is corrected in the next time-step. Finally, Fig. 3(h) shows the configuration of the agents at time-step 30 along with their antenna orientations. As illustrated, the agents take positions which avoid interference with each other, while at the same time maximizing the target received power.
The second experiment aims to investigate in more detail the impact of the various parameters on the behavior of the proposed approach. In particular, it investigates the impact of the opening angle $\theta_a$ of the sensing profile $\mathcal{S}$ and of the total number of mobility control actions on the performance of the system. Intuitively, it is expected that as $\theta_a$ increases, the sensing range increases, and thus the number of times the agents interfere with each other increases as well. However, as the system's degrees of freedom, i.e., the number of admissible mobility control actions, increase, additional options are provided for the agents to try to avoid the interference.
To verify this intuition, the parameter configurations shown in Table I were used. This table shows that for configurations 1-4 each agent has a total of 11 mobility control actions, whereas configurations 5-8 allow for a total of 79 mobility control actions per agent. Moreover, the opening angle $\theta_a$ takes increasing values from 60deg to 120deg. The experimental setup is as follows (with all the other parameters taking values according to Section VI-A): First, the target is randomly spawned within the surveillance area. Then, the initial locations of 3 pursuer agents are randomly sampled from within a sphere with radius 20m centered at the target location. The antennas of the agents point to the center of the sphere. For each configuration shown in Table I the system runs for 30 time-steps (i.e., one trial). This process is performed 20 times for each configuration and the results are averaged and shown in Fig. 4. More specifically, Fig. 4(a) shows the average jamming incidents per trial per agent for each configuration, Fig. 4(b) shows the average received power for the target and the agents, Fig. 4(c) shows the average transmit power-level in Watts at each time-step for the 8 configurations, and finally Fig. 4(d) shows the average tracking error. It can be observed from these results that with 11 mobility control actions, as the opening angle $\theta_a$ becomes larger, the number of times that the agents jam each other increases and so does the agent received power due to interference, as shown in Figs. 4(a)-(b). This is due to the limited number of admissible control actions, which prevents the agents from acquiring positions that are interference-free. In addition, since the target detection is linked to the transmit power-level (as shown in Eq. (7)), the agents put themselves at a disadvantage regarding their tracking performance by turning off their antennas, which can result in complete tracking failure.
To overcome this issue, the agents try to transmit at low power-levels (i.e., Fig. 4(c)). This, however, not only causes reduced tracking performance (i.e., Fig. 4(d)) but also reduced target jamming performance (i.e., Fig. 4(b)). On the other hand, once the system's degrees of freedom are increased, large opening angles can be handled more efficiently, with improved jamming and tracking performance and reduced interference, as illustrated in Fig. 4 for configurations 5-8.
The last experiment aims to investigate how the interference constraints in Eq. (12b) affect the tracking and jamming performance of the system. Again, the same simulation setup discussed in the previous paragraph is followed, where a target is randomly spawned inside the surveillance area and then 3 agents are spawned within a sphere around the target location. The system runs for trials with a duration of 30 time-steps and is evaluated with the interference constraints enabled and disabled. Twenty (20) trials are conducted for each case with the simulation parameters and values set according to Section VI-A. The average results are shown in Fig. 5. As expected, by disabling the interference constraints the number of jamming incidents per agent increases, as shown in Fig. 5(a). As a result, the average received power per agent also increases significantly above the critical value, as shown in Fig. 5(b), and thus the system is driven into failure. On the other hand, it is shown in Fig. 5(c) that the target received power is higher with the interference constraints turned off. The results show that when the interference constraints are disabled the agents can get much closer to the target and to each other. As a result, the target receives much higher power compared to the power received with the interference constraints enabled. Similarly, with the interference constraints turned off, the agents optimize the joint target detection probability of Eq. (12a) without any constraints, and this results in better tracking performance, as shown in Fig. 5(d). However, from the aforementioned results it is clear that the interference constraints are vital for keeping the system operational at all times while achieving a satisfactory tracking and jamming performance, as shown in Fig. 5.

VII. CONCLUSION
This work investigated the problem of cooperative target tracking and target radio-jamming with a team of pursuer agents. A novel distributed control framework is presented, in which a team of pursuer agents select their mobility and transmit power-level control actions that result in accurate target tracking and uninterrupted target radio-jamming, while avoiding the jamming interference amongst them. Future directions include a real-world implementation of the proposed system and its extension to multiple targets.