## Abstract

Information-theoretic motion planning and machine learning through Bayesian inference are exploited to localize and track a dynamic radio frequency (RF) emitter with unknown waveform (uncooperative target). A target-state estimator handles non-Gaussian distributions, while mutual information is utilized to coordinate the motion control of a network of mobile sensors (agents) to minimize measurement uncertainty. The mutual information is computed for pairs of sensors through a four-permutation-with-replacement process. The information surfaces are combined to create a composite map, which is then used by agents to plan their motion for more efficient and effective target estimation and tracking. Simulations and physical experiments involving micro-aerial vehicles with time difference of arrival (TDOA) measurements are performed to evaluate the performance of the algorithm. Results show that when two or three agents are used, the algorithm outperforms state-of-the-art methods. Results also show that for four or more agents, the performance is as competitive as an idealized static sensor network.

## 1 Introduction

The target localization and tracking problem involves estimating the current and future states of a target using measurement data and a measurement model [1,2]. Applications include space exploration [3], transportation [4], military-threat monitoring [5] (see Fig. 1), search and rescue [6], and finding gas leaks [7,8]. Recently, increased attention on the use of unmanned autonomous systems, such as unmanned aerial vehicles, has drawn more interest in the target localization and tracking process [9].

This paper develops and evaluates the performance of an information-based technique to coordinate a network of mobile sensors (agents) to efficiently localize and track an uncooperative, dynamic target. Utilizing machine learning through Bayesian inference in the form of a particle filter and leveraging state estimates, an information-theoretic motion planner automatically configures the sensor network to maximize information gain and thus minimize measurement uncertainty. Few, if any, information-based motion planners have been developed for measured waveform parameters such as time difference of arrival (TDOA), frequency difference of arrival (FDOA), or differential received signal strength (DRSS). The most closely related approach assumes that one of the two sensor locations is fixed [10]. The major challenge in 2D-location estimation using either TDOA or FDOA measurements is that the sensor measurements are ambiguous (i.e., there is no unique answer); however, mobility provides more measurements with spatial variance to help eliminate incorrect estimates.

A novel approach is presented whereby the motion planner considers the two-locations-one-measurement case to calculate the mutual information (MI). Specifically, the MI is computed for pairs of sensors through a four-permutation-with-replacement process. The resulting MI surfaces are combined to create a composite map, which is then used to determine the optimal locations for any number of agents in a prescribed search space. The approach takes advantage of the geometry of the problem to reduce computation. Detailed simulation and experimental studies are performed to validate the method. The main contributions are:

Developing a mutual information-based algorithm that autonomously configures a team of mobile sensors for effective target estimation and localization;

Validating the algorithm through simulations for two to eight mobile robotic sensors;

Validating the algorithm through physical experiments with two and three micro-aerial vehicles with TDOA measurements tracking a moving mobile ground target; and

Quantifying the performance of the algorithms.

## 2 Prior Related Work

### 2.1 Cooperative and Uncooperative Targets.

Localization and tracking can be applied to two types of targets: cooperative [11] and uncooperative [12]. In the cooperative case, the target shares information with agents to aid state estimation. Examples of signals that are often used are time of arrival and received signal strength measurements, where the initial transmission time and the initial signal strength are shared with the agents, respectively. These measurements, combined with the information shared by the target, allow the target's distance to be determined. On the other hand, an uncooperative target does not share information with agent(s). In this case, the target may not be aware of an agent tracking its location. The location of uncooperative targets can be inferred by (1) reflecting a signal off the target to make the initial transmission time or signal strength available (such as in radar, sonar, and LiDAR) or (2) introducing another sensor to acquire additional information so that relative measurements can be made (such as using TDOA, FDOA, and DRSS). An additional measurement enables alternative means of calculating possible target positions and is described later in Sec. 3.2.

### 2.2 State-of-the-Art Approaches.

Algorithms have been developed for target localization and tracking using TDOA sensor measurements. Deterministic approaches that utilize explicit equations [13–15], least squares [16–18], Taylor series [19,20], and numerical algebraic geometry [21] have been investigated. These approaches work well when a sufficient number of TDOA measurements are available; however, when only one TDOA measurement is available, there is no unique solution. When there is a solution, it does not characterize the distribution of other likely target locations due to noise in the measurements.

Probabilistic methods include maximum likelihood estimators [22,23], genetic algorithms [24], Gaussian mixture models [25,26], bank of maximum a posteriori [27], Kalman filters [28–31], and particle filters [32–34]. Kalman [35] and particle filters [32] are particularly useful for recursive state estimation that leverage prior knowledge of states in combination with current state measurements. One advantage of particle filters is they can handle non-Gaussian distributions. Extended and unscented Kalman filters have been developed to expand the usefulness of the Kalman filter and to deal with nonlinearity [36,37]. Overall, these probabilistic solutions make available an idea of the distribution of possible target locations, such as the covariance matrix in Kalman filters and the particle distribution in particle filters. The information can be updated over time to gain a reliable estimate of the target position, regardless of the number of measurements.

#### 2.2.1 Static Sensor Networks.

In Ref. [38], a stationary distributed network of sensors uses TDOA (and FDOA) measurements with an extended Kalman filter to track a constant-velocity target. This is taken a step further for tracking a dynamically-moving target, where a stationary sensor network is used, with an extended Kalman filter embedded in an interactive multiple model architecture [31]. The stationary receiver case was expanded in Ref. [30], where TDOA and FDOA measurements were used with an extended Kalman filter to track a target traveling in a circular trajectory. In Ref. [39], a particle filter outperformed an extended Kalman filter in a static-sensor scenario with the target moving in a straight line. The effect of using more than just TDOA measurements was investigated in Ref. [40] for a target that moved dynamically in 3D. In practice, static sensor networks can be costly to implement and maintain. Their detection range is also confined to a certain area and they are only useful if the target ventures into the area covered by the network.

#### 2.2.2 Mobile-Sensor Networks.

A network of mobile TDOA sensors is described in Ref. [28], where extended and unscented Kalman filters localized a constant-velocity target. The movement of the two agents followed predetermined paths. In a similar problem, a Gaussian mixture model was included where the agents again traveled in predefined trajectories [26]. An unscented Kalman filter was considered for FDOA in addition to TDOA with a constant-velocity target and predetermined paths for the agents in Ref. [29]. Mobile-sensor networks are more flexible to implement and deploy, and they are not limited to tracking a target within an established area.

### 2.3 Advancements.

While probabilistic solutions can characterize the distribution of possible target locations and help inform future estimates, state-of-the-art target localization and tracking methods do not use this characterization of the possible target locations to guide the motion of or reconfigure the sensors. Instead, the sensor network remains static and is unable to perform as well when few sensors are available.

To advance the state-of-the-art, Bayesian estimation is combined with mutual information motion planning to optimally configure a network of mobile sensors. Specifically, MI is used to autonomously guide the motion of each mobile sensor to maximize information gain and minimize uncertainty for localization and tracking of an uncooperative radio frequency (RF) target. Recently, MI has been applied to problems such as search and rescue [41], emergency response [42], chemical-gas plume monitoring [8], and indoor target tracking [43]. The formulation described herein focuses on the use of TDOA measurements.

Computing MI is computationally demanding due to high-dimension integration. As the number of sensor nodes increases, the computation time increases exponentially. As such, prior MI work for mobile sensors often focuses on approximating the MI to reduce complexity and/or evaluating a single node and making pairwise comparisons [41]. The pairwise comparison process increases coordination between agents. In Ref. [44], the submodularity of MI is exploited. In Ref. [42], a decentralized method of calculating the MI is described. Each agent maintains its own belief of the environment and uses MI to position the sensor node. In Ref. [43], MI is approximated by reducing the number of particles. A simplified method is used to calculate the gradient of MI in Ref. [45]; however, it is best applied to a single sensor.

The use of MI to coordinate the motion of a network of TDOA sensors for uncooperative target localization and tracking has not been well studied. The most-related effort assumes one fixed node [10,46]. Other works have focused on the placement of TDOA sensors using the Fisher information matrix [47–49].

## 3 Technical Approach

### 3.1 Notation.

Lower-case letters are scalars, for example, $a \in \mathbb{R}$, where $\mathbb{R}$ is the set of real numbers. Bold lower-case letters are vectors, for example, $\mathbf{a} \in \mathbb{R}^n$, where $\mathbb{R}^n$ represents the $n$-dimensional vector space over the reals. The variables for continuous time and discrete time are $t \in \mathbb{R}$ and $k \in \mathbb{Z}^+$, respectively, where $\mathbb{Z}^+$ is the set of positive integers. A continuous-time function is represented by $f(t): \mathbb{R} \to \mathbb{R}$ and a discrete-time function by $z[k]: \mathbb{Z}^+ \to \mathbb{R}$. The sample time is assumed constant and denoted by $\Delta t \in \mathbb{R}$.

### 3.2 Time Difference of Arrival Measurement Model.

Let $x_1$ and $y_1$ be the $x$ and $y$ coordinates of one agent; $x_2$ and $y_2$ the $x$ and $y$ coordinates of the second agent; and $x_T$ and $y_T$ the $x$ and $y$ coordinates of the target. The distances between each agent and the target are

$$d_1 = \sqrt{(x_1 - x_T)^2 + (y_1 - y_T)^2}, \qquad d_2 = \sqrt{(x_2 - x_T)^2 + (y_2 - y_T)^2}$$

The TDOA can then be calculated as

$$\tau = \frac{d_2 - d_1}{c}$$

where *c* is the speed of light.
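This measurement model can be sketched in a few lines; the function name and the use of NumPy are illustrative rather than taken from the paper's implementation:

```python
import numpy as np

C = 299_792_458.0  # speed of light (m/s)

def tdoa(agent1, agent2, target):
    """TDOA (seconds) of the target's signal between two agents."""
    a1, a2, t = (np.asarray(p, dtype=float) for p in (agent1, agent2, target))
    d1 = np.linalg.norm(t - a1)  # distance from agent 1 to the target
    d2 = np.linalg.norm(t - a2)  # distance from agent 2 to the target
    return (d2 - d1) / C
```

Note that any target position on the hyperbola with the same range difference $d_2 - d_1$ yields the same TDOA, which is the source of the ambiguity discussed above.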

Figures 2(a) and 2(b) show examples of target location distributions for two TDOA measurements. The measurements produce a non-Gaussian distribution of the target location for a given measurement. Figures 2(c) and 2(d) show that increasing the number of agents can decrease ambiguity in the location of the target. Depending on the configuration of the agents and target, a network of three agents is sufficient for a unique solution. The explicit solution for the intersection point(s) is nonlinear and the topic of several research papers cited in Sec. 2.

### 3.3 Problem Statement and Hypothesis.

Given a network of mobile TDOA sensors (two or more), the goal is to maneuver the sensors to quickly and effectively localize an uncooperative target. An estimate of the target position is obtained using a machine learning process through Bayesian estimation in the form of a particle filter. The target states are estimated via a constant-velocity Kalman filter. Next, an MI-based motion planner [41] coordinates the motion of the sensors to reduce the uncertainty of the posterior and refine the target state estimation. This is done using a four-dimensional (4D) MI matrix which is computed for a specified grid space using a four-permutation-with-replacement process. The mutual information matrix is described in Sec. 3.7.

For this problem, the following assumptions are made:

Only one target exists;

The target altitude is known;

The maximum speed of the target is known;

The target moves around inside of a specified area (not necessarily the search area available to the agents);

Noise can be present in the TDOA measurement; and

Robot dynamics are ignored.

The following hypothesis is made: The MI-based motion planners will outperform the five uninformed methods. Specifically, the average percent root-mean-square error for the estimated target location will be lower for the MI-based motion planners. This hypothesis will be tested in simulations and physical experiments.

### 3.4 Overview of Algorithms for Comparison.

Figure 3 shows the block diagram for the information theory 1 (IT1) algorithm. There are four main components to IT1: (1) Bayesian estimation in the form of a particle filter, (2) position and velocity state estimation through a Kalman filter, (3) an information-based motion planner, and (4) a mobile-sensor motion controller.

Let the search domain for the target state-space be given by $A$. First, TDOA measurements are fed into the particle filter, which provides estimated states and weights of the true target location, $\alpha[k]$. For the $i$th particle, $\alpha_i = [x_T, y_T] \in A \subset \mathbb{R}^2$ is the set of *x* and *y* coordinates for a single target and $w_i \in [0, 1]$ is the importance weight for each particle. The mean of the particles from the particle filter serves as an estimate only of the position of the target and produces noisy measurements since no motion model is considered. To filter this output and estimate the velocity of the target state $\psi$, a constant-velocity Kalman filter is used. The information-based motion planner uses a subset of 500 particles and the corresponding weights from the particle filter to generate information surfaces for every combination of two agents in a grid that spans the search space. The subset of 500 particles is chosen because it provides a good approximation of the target state at a much lower computational cost when creating the MI surfaces. These surfaces are used to generate the best waypoints for the agents, represented as $q(t)$. Finally, the motion controller enables the agents to travel to their respective waypoints with the command signal $u(t)$.

To compare the impact of the feedback in guiding the motion of the agents, the IT1 algorithm is compared to a general version of the same algorithm. The general algorithm is identical to IT1 except that it lacks feedback from the information-theoretic motion planner and is pictured in Fig. 4.

An additional algorithm, information theory 2 (IT2), is also investigated; its structure is depicted in Fig. 5. The differences between IT1 and IT2 are the number of particles used (8000 versus 40,000, respectively) and the absence of a Kalman filter on the output of IT2's particle filter. More particles are used in the IT2 algorithm to produce a better approximation of the posterior; however, this comes with an increase in computational cost. Because there is no Kalman filter, the velocity estimate is calculated by differentiating the position estimates from the particle filter. The same information-based motion planner, which considers only a subset of 500 particles, is used here as well.

### 3.5 Particle Filter.

Let $\alpha[k]$ denote the target state, where $k$ denotes the discrete-time instant. Using the current measurement and the previous measurement, $z[k]$ and $z[k-1]$, respectively, in addition to the measurement history given by $z[k_s:k]$, the posterior probability distribution $p(\alpha[k] \mid z[k_s:k])$ can be found from Bayes' rule as

$$p(\alpha[k] \mid z[k_s:k]) = \frac{p(z[k] \mid \alpha[k])\, p(\alpha[k] \mid z[k_s:k-1])}{p(z[k] \mid z[k_s:k-1])}$$

The posterior is approximated by a weighted set of particles,

$$p(\alpha[k] \mid z[k_s:k]) \approx \sum_{i=1}^{\eta} w_i\, \delta(\alpha[k] - \alpha_i[k])$$

where $\alpha_i[k]$ is the $i$th particle, $\eta$ denotes the finite number of particles used, and $\delta(\cdot)$ is the Dirac delta function. Importance sampling is used to update the weights of the particles, i.e.,

$$w_i[k] \propto w_i[k-1]\, p(z[k] \mid \alpha_i[k])$$
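A minimal sketch of the importance-sampling weight update, assuming additive Gaussian measurement noise; the function and parameter names are illustrative, not the paper's implementation:

```python
import numpy as np

def update_weights(particles, weights, z, measure, sigma):
    """Importance-sampling update: w_i proportional to w_i * p(z | alpha_i).

    particles : (n, 2) array of candidate target positions alpha_i
    weights   : (n,) array of prior importance weights w_i
    z         : scalar TDOA measurement
    measure   : maps a particle to its predicted (noise-free) measurement
    sigma     : measurement-noise standard deviation (assumed Gaussian)
    """
    predicted = np.array([measure(p) for p in particles])
    likelihood = np.exp(-0.5 * ((z - predicted) / sigma) ** 2)
    w = weights * likelihood
    return w / w.sum()  # normalize so the weights sum to one
```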

### 3.6 Kalman Filter.

Kalman filters are recursive filters that can estimate states given noisy measurements [35]. A Kalman filter consists of three major steps: state prediction, Kalman gain update, and measurement update.

#### 3.6.1 State Prediction

$$\hat{\psi}_{k|k-1} = F_k \hat{\psi}_{k-1|k-1}, \qquad \hat{P}_{k|k-1} = F_k \hat{P}_{k-1|k-1} F_k^T + Q_{k-1}$$

#### 3.6.2 Kalman Gain Update

$$K_k = \hat{P}_{k|k-1} H_k^T \left(H_k \hat{P}_{k|k-1} H_k^T + R_k\right)^{-1}$$

#### 3.6.3 Measurement Update

$$\hat{\psi}_{k|k} = \hat{\psi}_{k|k-1} + K_k \left(z_k - H_k \hat{\psi}_{k|k-1}\right), \qquad \hat{P}_{k|k} = \left(I - K_k H_k\right) \hat{P}_{k|k-1}$$

For the Kalman filter equations, the state estimate $\hat{\psi}$ has the form $[x_T, y_T, \dot{x}_T, \dot{y}_T]^T$, where $x_T$ and $y_T$ are the target position estimates and $\dot{x}_T$ and $\dot{y}_T$ are the target velocity estimates. The initial state estimate $\hat{\psi}_{k-1|k-1}$ is set equal to the mean of the $x$ and $y$ particles from the particle filter, $\hat{\alpha}_k$. Furthermore, $z_k$ is the received measurement at time-step $k$. The initial covariance matrix is $\hat{P}_{k-1|k-1} = \mathrm{diag}(2, 2, 0.5, 0.5)$. The state transition matrix is defined as

$$F_k = \begin{bmatrix} 1 & 0 & \Delta t & 0 \\ 0 & 1 & 0 & \Delta t \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Since the particle filter in the previous step converts the TDOA measurement(s) into a position measurement, the observation matrix is defined as

$$H_k = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$

The process noise covariance is constant and defined as

$$Q_{k-1} = \Delta t \begin{bmatrix} 1/4 & 0 & 1/2 & 0 \\ 0 & 1/4 & 0 & 1/2 \\ 1/2 & 0 & 1 & 0 \\ 0 & 1/2 & 0 & 1 \end{bmatrix}$$

where $\Delta t$ is based on an update rate of 100 Hz, and the measurement error covariance is constant and defined as $R_k = \mathrm{diag}(0.1, 0.1)$. Although a constant-velocity Kalman filter is used to track the dynamically-moving target, the fast sampling rate allows for robust tracking, and the process noise plays a smaller role than the measurement noise.
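The filter defined by these matrices can be sketched as follows; the `kalman_step` helper is an illustrative implementation of the three steps above, with $\Delta t$ set by the 100 Hz update rate:

```python
import numpy as np

dt = 1 / 100  # 100 Hz update rate

# Constant-velocity model with state psi = [x_T, y_T, x_T_dot, y_T_dot]^T
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]])
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]])  # position-only observation
Q = dt * np.array([[1/4, 0, 1/2, 0], [0, 1/4, 0, 1/2],
                   [1/2, 0, 1, 0], [0, 1/2, 0, 1]])
R = np.diag([0.1, 0.1])

def kalman_step(psi, P, z):
    """One predict/update cycle of the constant-velocity Kalman filter."""
    # State prediction
    psi = F @ psi
    P = F @ P @ F.T + Q
    # Kalman gain update
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    # Measurement update
    psi = psi + K @ (z - H @ psi)
    P = (np.eye(4) - K @ H) @ P
    return psi, P
```

Fed a fixed position measurement, the position estimate converges to that measurement while the velocity estimate settles near zero.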

### 3.7 Information-Based Motion Planner.

Let $\theta = [x_1, y_1, x_2, y_2]$ represent the locations of two agents in $x$ and $y$ coordinates. The mutual information $I$ between the target state $\alpha$ and the measurement $z$ for the sensor pair at $\theta$ is expressed as

$$I(\alpha; z \mid \theta) = H(z \mid \theta) - H(z \mid \alpha, \theta)$$

where

$$H(z \mid \theta) = -\int p(z \mid \theta) \log p(z \mid \theta)\, dz$$

and

$$H(z \mid \alpha, \theta) = -\iint p(z, \alpha \mid \theta) \log p(z \mid \alpha, \theta)\, dz\, d\alpha$$

are the entropy of the measurement and the conditional entropy of the measurement given the target state, respectively. These equations are evaluated using numerical integration techniques.
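As one concrete, illustrative way to carry out this numerical integration for a single TDOA measurement with additive Gaussian noise of standard deviation `sigma` (an assumption of this sketch, not a statement of the paper's exact implementation), the predictive density $p(z \mid \theta)$ can be formed as a weighted mixture over the particles and the entropies evaluated on a grid of measurement values:

```python
import numpy as np

def mutual_information(theta, particles, weights, measure, sigma, n_grid=200):
    """Estimate I = H(z | theta) - H(z | alpha, theta) for one sensor pair.

    theta     : [x1, y1, x2, y2] candidate locations of the two agents
    particles : (n, 2) particle positions alpha_i (weights sum to one)
    measure   : measure(theta, alpha) -> predicted noise-free measurement
    sigma     : std. dev. of assumed additive Gaussian measurement noise
    """
    mu = np.array([measure(theta, a) for a in particles])
    # Grid over plausible measurement values for numerical integration
    z = np.linspace(mu.min() - 4 * sigma, mu.max() + 4 * sigma, n_grid)
    dz = z[1] - z[0]
    # Predictive density p(z | theta) = sum_i w_i N(z; mu_i, sigma^2)
    pdf = (weights[:, None]
           * np.exp(-0.5 * ((z - mu[:, None]) / sigma) ** 2)
           / (sigma * np.sqrt(2 * np.pi))).sum(axis=0)
    H_z = -np.sum(pdf * np.log(pdf + 1e-300)) * dz            # entropy of z
    H_z_given_a = 0.5 * np.log(2 * np.pi * np.e * sigma**2)   # Gaussian entropy
    return H_z - H_z_given_a
```

When all particles predict the same measurement the MI is near zero; when the particles split into clusters the sensor pair can distinguish, the MI approaches the entropy of the cluster labels.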

#### 3.7.1 Mutual Information Matrix.

Every candidate pair of agent locations ($x_1$, $y_1$, $x_2$, and $y_2$) must be evaluated in $H$ and stored in a four-dimensional mutual information matrix

$$I[i, j, k, l] = H\!\left(\left[x_g[i],\, y_g[j],\, x_g[k],\, y_g[l]\right],\, \hat{\alpha}_i,\, w_i\right)$$

where *i*, *j*, *k*, and *l* are indices that correspond to TDOA sensor pair configurations in a grid. By using a four-permutation-with-replacement process, all possible sensor pair configurations are evaluated. The vectors $x_g$ and $y_g$ contain the possible *x* and *y* locations for the agents in the grid. The pseudo-code is depicted in Algorithm 1. The maximum of $I$ corresponds to the quartet that provides the most information, i.e., the best locations to place two agents to maximize information gain and minimize uncertainty about the target states. The argmax function is defined to output the indices of the maximum value in a matrix.

Algorithm 1: Construction of the 4D mutual information matrix.

    1  for i = 0 to length(x_g) do
    2      θ[0] ← x_g[i]
    3      for j = 0 to length(y_g) do
    4          θ[1] ← y_g[j]
    5          for k = 0 to length(x_g) do
    6              θ[2] ← x_g[k]
    7              for l = 0 to length(y_g) do
    8                  θ[3] ← y_g[l]
    9                  I[i, j, k, l] ← H(θ, α̂_i, w_i)

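A sketch of Algorithm 1 in NumPy, exploiting the symmetry $I[i,j,k,l] = I[k,l,i,j]$ and the fact that co-located sensors yield zero information to roughly halve the computation; the `mi_pair` callback is a stand-in for the $H(\theta, \hat{\alpha}_i, w_i)$ evaluation:

```python
import numpy as np
from itertools import product

def build_mi_matrix(xg, yg, mi_pair):
    """Fill the 4D MI matrix over all sensor-pair placements in the grid.

    mi_pair(theta) returns the mutual information for a sensor pair at
    theta = [x1, y1, x2, y2].
    """
    nx, ny = len(xg), len(yg)
    I = np.zeros((nx, ny, nx, ny))
    for i, j, k, l in product(range(nx), range(ny), range(nx), range(ny)):
        if (i, j) > (k, l):
            # Already computed: MI is symmetric in the two sensor locations
            I[i, j, k, l] = I[k, l, i, j]
        elif (i, j) == (k, l):
            pass  # co-located sensors provide zero information
        else:
            I[i, j, k, l] = mi_pair([xg[i], yg[j], xg[k], yg[l]])
    return I
```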

Figure 6 provides examples of two-dimensional subsets of $I$. In Fig. 6(a), the information surface when agent 1 is fixed in a corner is depicted. The surface shows the amount of information available for possible locations of agent 2. From this surface, it can be determined that the best location to place agent 2 is the location with the maximum information, i.e., the corner opposite agent 1. Conversely, as seen in Fig. 6(b), if agent 2 is fixed at the location determined from Fig. 6(a), the information surface shows the best location to place agent 1, which is the fixed location described in Fig. 6(a). It is also important to recognize that the information surface in Fig. 6(a) is unchanged if the roles of the two agents are exchanged, since the mutual information is symmetric in the two sensor locations. Finally, it is important to note that if agent 2 is placed in the same position as agent 1, zero information will result. These facts allow for a reduction in computational time when generating the 4D matrix of information.

#### 3.7.2 Placement of Agents for Time Difference of Arrival Measurements.

Once computed, $I$ is used to place as many agents as desired in the order of maximum information gain. Algorithm 2 and Fig. 6 depict the process of placing the agents. First, the argmax of $I$ is determined, and the information surfaces corresponding to it are obtained as in Figs. 6(a) and 6(b). These surfaces correspond to where to place the first two agents. Next, the surfaces in Figs. 6(a) and 6(b) are combined via multiplication to produce a new surface, Fig. 6(c), referred to as $M_2$. The argmax of $M_2$ is used to determine where to place the third agent. To continue placing more agents, the information surface for the fixed third agent is obtained from $I$ and combined with $M_2$ (Fig. 6(c)) to create surface $M_3$. The argmax of $M_3$ is taken to place the fourth agent, and the process continues for as many agents as desired. The *x* and *y* locations to place the agents are stored in vectors $\tilde{x}$ and $\tilde{y}$.

Algorithm 2: Placement of a agents using the mutual information matrix.

    1   (i, j, k, l) = argmax(I)
    2   M_2 = I[i, j, :, :] * I[:, :, k, l]
    3   x̃ = [x_g[i], x_g[k]]
    4   ỹ = [y_g[j], y_g[l]]
    5   if a ≥ 3 then
    6       for n = 3 to a do
    7           (i, j) = argmax(M_{n-1})
    8           M_n = M_{n-1} * I[i, j, :, :]
    9           x̃.append(x_g[i])
    10          ỹ.append(y_g[j])

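Algorithm 2 maps directly onto NumPy array slicing; `place_agents` is an illustrative sketch, not the paper's exact code:

```python
import numpy as np

def place_agents(I, xg, yg, a):
    """Place `a` agents in order of maximum information gain (Algorithm 2)."""
    # Best pair of locations: argmax over the full 4D matrix
    i, j, k, l = np.unravel_index(np.argmax(I), I.shape)
    M = I[i, j, :, :] * I[:, :, k, l]     # composite map M_2
    x, y = [xg[i], xg[k]], [yg[j], yg[l]]
    for _ in range(3, a + 1):             # place the third, fourth, ... agents
        i, j = np.unravel_index(np.argmax(M), M.shape)
        M = M * I[i, j, :, :]             # fold the new agent's surface in
        x.append(xg[i])
        y.append(yg[j])
    return x, y
```

With a toy MI matrix that rewards separation on a 2 × 2 grid, the first two agents land on opposite corners and the third on one of the remaining corners.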

### 3.8 Error Metric.

The root-mean-square error (RMSE) is used to quantify estimation performance,

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(m_i - \hat{m}_i\right)^2}$$

where *n* is the number of observations, $m_i$ is the observed value, and $\hat{m}_i$ is the estimated value.

## 4 Experimental Design

Experiments compared the IT1, IT2, and uninformed methods. Figure 4 shows the uninformed algorithm without the feedback from the particle filter. The algorithms depicted in Figs. 3 and 4 are identical except for this feedback mechanism: waypoints for the agents in the algorithm of Fig. 4 are predefined rather than based on the current information. Comparing these scenarios helps determine how useful the feedback is in practice. The goal of the experiment is to track the target as closely as possible for the duration of the experiment.

The sampling frequency of the hardware (control and sensing) is 40 Hz; however, in the IT1 and IT2 cases, the agents are given new waypoints every second. The target follows the same path in all scenarios; slight deviations from the path are possible in the physical experiments but are not significant enough to impact the outcome. The size of the physical platform (4.2 m × 2.6 m) is also used as a constraint for simulations to allow for better cross-comparison between simulations and physical experiments. The agent search area is limited to a 1.5 m × 1.5 m area, where the *x*-direction search range spans 0.65–2.15 m and the *y*-direction search range spans 0.35–1.85 m. Figure 7 shows the area of interest and the path used in all simulations and physical experiments. The maximum speed of the agents in simulation and physical experiments is set at 0.15 m/s. The speed of the target is constant and equal to that of the agents, except in cases with only two agents, where the target speed is halved to 0.075 m/s.

### 4.1 Simulation Setup.

Simulations are created using Python (version 3.8.3) on a Windows 10 machine. A TDOA measurement error of 1% is used. Simulations are completed for two to eight agents using the following seven methods (100 runs each, 4900 total):

(M1) *Static border*: The agents are placed evenly spaced around the border and do not move. Agent 1 is placed in the bottom left-hand corner, and the remaining agents are placed evenly in a counterclockwise fashion.

(M2) *Static uniform*: The agents are placed randomly, in a uniform fashion, throughout the area and do not move.

(M3) *Rotation*: The agents are placed as described in M1, but move in a counterclockwise fashion on the border.

(M4) *Raster*: The agents are placed as described in M2, but then move in a raster-scan pattern.

(M5) *Random*: The agents are placed as described in M2, but then move randomly within the search space.

(M6) *IT1*: The agents are placed as described in M2, but then follow the motion trajectory generated by the information theory algorithm described in Fig. 3.

(M7) *IT2*: The agents are placed as described in M2, but then follow the motion trajectory generated by the information theory algorithm described in Fig. 5.

### 4.2 Physical Experiment Setup.

Physical experiments are performed to test the efficacy of the particle filter and MI-based motion planners (IT1 and IT2). Figure 9 summarizes the experimental system and setup. The experiments are performed on a desktop computer running Ubuntu Linux 18.04 and the Robot Operating System (ROS) Melodic. The uncooperative target is a custom-designed omnidirectional ground robot equipped with an Odroid XU4 single-board computer interfaced with a Robotis U2D2 that controls the Dynamixel servomotors (Fig. 9(a)). The single-board computer runs the ROS environment for vehicle control and navigation after receiving waypoint commands from the ground station (GS) through a 2.4/5 GHz WiFi module. The target robot is also equipped with an RF emitter, a loco positioning system (LPS) deck, which operates similarly to the LPS node without the added computing power of the STM32F072 MCU. The RF-sensing aerial vehicles (small custom-built quadcopters named DARCFly) are designed and developed based on the Crazyflie 2.1 (Bitcraze, Malmö, Sweden) platform. By incorporating an electronic speed controller and a BigQuad deck, brushless motors are used, powered by a 2S 1100 mAh battery (Fig. 9(b)). The upgraded motors and battery allow for greater payload capacity and longer flight times. Furthermore, each robot carries an LPS node, also by Bitcraze, which uses a Decawave DWM1000 module for RF sensing and an STM32F072 MCU for additional on-board computation. The GS is equipped with a 2.4 GHz industrial, scientific, and medical band radio that communicates with the DARCFly and sends flight commands. The GS performs all necessary calculations, including the particle filter, MI, and position controller for the robots. All experiments are conducted on a 4.2 m × 2.6 m × 1.7 m platform with ten Flex 13 OptiTrack motion capture (mocap) cameras connected to a GS computer (Fig. 9(c)).

#### 4.2.1 Sensor Characterization.

To obtain true TDOA measurements, the above-mentioned LPS node and LPS deck are used. These are built around the DWM1000 module, a single low-power chip containing all necessary RF circuitry, i.e., antenna, RF transceiver, power management, ultrawideband, and clock circuitry. The module supports two-way ranging or TDOA location systems and handles all signal processing needed to report a TDOA measurement. The onboard clock manages clock synchronization across sensors and is equipped with a 38.4 MHz reference crystal, which is trimmed during production to a frequency calibration offset of 2 ppm. Because TDOA measurements depend on the speed of light, submeter accuracy in small spaces requires the time of arrival of the signal to be accurate to the nanosecond, which is a high expectation for a low-cost sensor. The inexpensive sensor is therefore characterized to determine how accurately and reliably it performs in physical experiments. Various tests are conducted to determine the cause of sensor measurement errors; they reveal that the angle of the antenna impacts the performance of the sensor. This behavior, however, is nonlinear and cannot practically be inversely modeled. One simplifying assumption that makes the sensor behave more consistently is to direct the sensor antenna at the closest agent (see Fig. 10). Therefore, for the duration of the experiments, the target robot rotates to point in the direction of the closest agent. Figure 11 shows the percent error for two TDOA sensor measurements when the target follows the path in Fig. 7 and the agents are in the static border configuration.

#### 4.2.2 Physical Experiments.

Physical experiments are conducted only for the two- and three-agent scenarios, because the small 1.5 m × 1.5 m volume limits the number of agents that can fly safely. The IT1 and IT2 methods are both tested; however, to prevent collisions between the agents, the only uninformed methods used for comparison are the static border (M1) and rotation (M3) cases.

Based on the sensor characterization, physical experiments are conducted in two batches. The first batch uses a simulated sensor with the same assumed 1% error as in the simulations. The second batch uses the real sensor with an assumed 10% error.

## 5 Results

The percent error reported in this section is the RMSE normalized by the maximum separation in the search area,

$$\%\,\mathrm{error} = \frac{\mathrm{RMSE}}{l} \times 100$$

where $l = \sqrt{1.5^2 + 1.5^2}$ m is the maximum distance between the agents.
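As an illustrative sketch of this metric, interpreting each observation error as the Euclidean distance between the true and estimated target positions (an assumption of this sketch):

```python
import numpy as np

def percent_error(true_xy, est_xy, l):
    """Percent RMSE of the target position estimates, normalized by l."""
    e = np.linalg.norm(np.asarray(true_xy) - np.asarray(est_xy), axis=1)
    rmse = np.sqrt(np.mean(e ** 2))
    return 100.0 * rmse / l

l = np.hypot(1.5, 1.5)  # maximum distance in the 1.5 m x 1.5 m search area
```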

### 5.1 Simulation Results.

Simulations are conducted for two to eight agents for all seven methods (M1–M7). For each agent/method combination, 100 simulations are run. The mean and standard deviation for all cases are illustrated in the bar graphs in Fig. 12. No significant difference is found between methods M4 and M5 when three agents are used (*p *=* *0.17). Table 1 reports the mean and standard deviation values from Fig. 12.

*Number of agents*

| Methods | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|
| M1: static border | 59.04/0.35 | 6.76/0.28 | 2.89/0.16 | 3.59/0.23 | 3.08/0.17 | 3.30/0.20 | 2.65/0.15 |
| M2: static uniform | 35.76/9.84 | 15.31/10.52 | 9.01/2.82 | 8.03/3.81 | 6.39/1.77 | 5.77/1.71 | 5.24/1.39 |
| M3: orbiting | 16.07/0.33 | 5.81/0.38 | 3.49/0.17 | 3.71/0.22 | 2.95/0.15 | 3.10/0.16 | 2.79/0.13 |
| M4: raster | 13.91/0.25 | 9.92/2.56 | 5.58/0.20 | 5.29/0.25 | 4.14/0.18 | 3.85/0.17 | 3.93/0.17 |
| M5: random | 14.66/2.70 | 10.42/2.55 | 7.17/1.77 | 6.06/1.39 | 5.29/1.16 | 4.75/0.87 | 4.35/0.75 |
| M6: IT1 | 11.02/0.96 | 4.85/0.62 | 2.99/0.22 | 3.13/0.24 | 2.70/0.16 | 2.90/0.27 | 2.58/0.17 |
| M7: IT2 | 12.00/1.46 | 3.73/1.49 | 1.85/0.17 | 1.90/0.13 | 1.54/0.15 | 1.89/0.24 | 1.52/0.15 |


The mean is reported first followed by the standard deviation.
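The significance statements above (e.g., *p* = 0.17 for M4 versus M5 with three agents) come from comparing the 100 per-run error values of two methods. The paper does not name the exact test used; as an assumption, a Welch's (unequal-variance) two-sample t-test is a common choice, sketched here:

```python
import math
from statistics import mean, stdev

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent
    samples with unequal variances, e.g., the per-run percent-error
    values of two methods. A p-value follows from Student's t CDF."""
    na, nb = len(a), len(b)
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    se2 = va / na + vb / nb                     # squared standard error
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df
```

With SciPy available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns the p-value directly.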

Examples of how the average percent error and standard deviation evolve over time for methods M1, M3, M6, and M7 are shown for two and three agents in Figs. 13(a,1) and 13(b,1), respectively.

### 5.2 Physical Experiment With Simulated Sensor.

Physical experiments are conducted with two and three agents using a simulated sensor with a 1% error. The methods utilized are M1, M3, M6, and M7. For each of these methods, ten physical experiments are conducted. The mean and standard deviation for the experiments are illustrated in the bar graph in Fig. 16. When two agents are used, no significant difference is found between the M6 and M7 cases (*p* = 0.087). When three agents are used, no significant difference is found between the M3 and M6 cases (*p* = 0.63). Table 2 reports the mean and standard deviation values from the bar graph. Figure 17 shows three instances in time of an experiment with three agents using the IT2 algorithm (method M7).

Number of agents

| Methods | 2 | 3 |
|---|---|---|
| M1: static border | 51.15/5.29 | 6.75/0.30 |
| M3: orbiting | 16.38/0.37 | 5.85/0.29 |
| M6: IT1 | 9.62/1.22 | 5.62/1.34 |
| M7: IT2 | 8.65/1.07 | 3.69/0.49 |


Methods M1, M3, M6, and M7 are used in combination with two and three agents. The mean is reported first followed by the standard deviation.

In Fig. 13, plots (a,2) and (b,2) show the average percent error and standard deviation as they evolve over time for the ten physical experiments with two and three agents, respectively.

### 5.3 Physical Experiment With Real Sensor.

Physical experiments are conducted with two and three agents using a real sensor with a high degree of error (see Fig. 11). The methods investigated are M1, M3, M6, and M7. For each of these methods, ten physical experiments are conducted. The mean and standard deviation for the experiments are illustrated in Fig. 16. When using two agents, no significant difference is found between M6 and M7 (*p* = 0.53). When using three agents, no significant difference is found for three comparisons: M1 and M6 (*p* = 0.41), M1 and M7 (*p* = 0.97), and M6 and M7 (*p* = 0.51). Table 3 reports the mean and standard deviation values from Fig. 16.

Number of agents

| Methods | 2 | 3 |
|---|---|---|
| M1: static border | 42.35/3.60 | 20.63/0.59 |
| M3: orbiting | 30.03/1.49 | 18.88/0.67 |
| M6: IT1 | 33.16/2.91 | 21.10/1.58 |
| M7: IT2 | 34.12/3.52 | 20.64/1.32 |


The mean is reported first followed by the standard deviation.

Figures 13(a,3) and 13(b,3) show the average percent error and standard deviation as they evolve over time for the ten physical experiments with two and three agents, respectively.

## 6 Discussion

This study introduced a novel method of autonomously positioning a network of agents using mutual information to localize and track a dynamically moving, uncooperative target. The performance of the information-based algorithm and the uninformed methods was evaluated using Monte Carlo simulations and physical experiments. All tests were conducted on the same dynamic path.

An important observation is the difference in the estimation process between the first six methods and IT2 (M7). The first six methods share the same estimation process; each uses the same particle filter and constant-velocity Kalman filter. The only difference among them is the type of uninformed motion used or, in the IT1 case, the presence of particle filter feedback in the motion planner. IT2 differs in that it does not utilize a Kalman filter, and its particle filter uses 40,000 particles instead of 8000 (see Figs. 3–5).
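The measurement-update step shared by the particle filters can be sketched as a bootstrap reweighting against one TDOA reading. This is a minimal illustration assuming Gaussian measurement noise; the paper's actual filter details (dynamics model, resampling scheme, particle counts) are not reproduced here, and the sensor positions and noise level below are placeholders:

```python
import numpy as np

C = 3.0e8  # assumed straight-line propagation speed of the RF signal (m/s)

def tdoa_update(particles, weights, s1, s2, tdoa_meas, sigma):
    """Reweight 2D position particles by the Gaussian likelihood of one
    TDOA measurement between sensor positions s1 and s2 (a sketch)."""
    d1 = np.linalg.norm(particles - s1, axis=1)
    d2 = np.linalg.norm(particles - s2, axis=1)
    predicted = (d1 - d2) / C            # TDOA each particle would produce
    lik = np.exp(-0.5 * ((tdoa_meas - predicted) / sigma) ** 2)
    weights = weights * lik
    return weights / weights.sum()       # renormalize to a distribution
```

Particles far from the measured TDOA hyperbola receive near-zero weight, so repeated updates from spatially varied sensor pairs concentrate the posterior on the emitter.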

It is clear from Fig. 12 that the IT2 algorithm is superior to all other methods when there are three or more agents. This result is expected since there is no constant-velocity Kalman filter. Between the 30-s and 50-s time instances in Figs. 13(b,1) and 13(b,2), several error peaks are observed for methods M1, M3, and M6. These peaks are notably absent for the IT2 algorithm, and they correspond to the rapid changes in target velocity seen in Figs. 15(a,1)–15(a,4). It can be seen in Fig. 15(a,4) that the target estimate stays on track as the target moves. On the other hand, Figs. 15(a,1)–15(a,3) show significant overshoot due to the Kalman filter's constant-velocity assumption. The drawback of the IT2 algorithm relative to the IT1 algorithm is that, despite the more accurate position estimate, its velocity estimate has much higher error (see Fig. 19).
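The overshoot mechanism is visible directly in the constant-velocity prediction step: the filter extrapolates the last velocity estimate, so a sudden target turn or stop leaves the predicted position ahead of the true one. A minimal sketch follows; the paper's exact filter parameters are not given, so the process-noise scale `q` is a placeholder:

```python
import numpy as np

def cv_predict(x, P, dt, q):
    """One 2D constant-velocity Kalman prediction.
    State x = [px, py, vx, vy]; q scales a white-noise-acceleration
    process-noise model (a common choice for CV filters)."""
    F = np.array([[1.0, 0.0, dt,  0.0],
                  [0.0, 1.0, 0.0, dt ],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    G = np.array([[0.5 * dt**2, 0.0],
                  [0.0, 0.5 * dt**2],
                  [dt,  0.0],
                  [0.0, dt ]])
    return F @ x, F @ P @ F.T + q * (G @ G.T)
```

If the target stops abruptly while the state still carries a velocity estimate of 1 m/s, the predicted position runs ahead by `vx * dt` per step, which is the kind of transient overshoot the error peaks reflect.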

### 6.1 Simulations Results.

As shown in Fig. 12 and Table 1, the IT1 algorithm performs better than the next closest uninformed method by 2.89% error when two agents are used. It is emphasized that in the two-agent case the target and agents do not move at the same speed; rather, the agents can move at twice the speed of the target. Figure 2 visualizes the importance of more than one TDOA measurement for localization. Allowing the agents to move at up to twice the speed of the target lets pseudo-intersections of TDOA measurements occur, which is one reason the static methods (M1 and M2) yield poor performance. This two-agent result is significant because it shows that when one is limited to just two sensors, the IT1 algorithm outperforms state-of-the-art methods.
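The ambiguity that agent motion resolves can be illustrated numerically: two distinct emitter positions mirrored about one sensor baseline produce an identical TDOA for that pair, and only a second, differently oriented measurement separates them. The sensor and emitter positions below are illustrative, not those of the experiments:

```python
import numpy as np

def tdoa(p, s1, s2, c=3.0e8):
    """Time difference of arrival of an emitter at p between sensors
    s1 and s2, assuming straight-line propagation at speed c."""
    return (np.linalg.norm(p - s1) - np.linalg.norm(p - s2)) / c
```

With sensors on the x-axis, candidate emitters at (3, 4) and (3, −4) are indistinguishable from that single pair; a pair on the y-axis, as a moved agent could provide, yields different readings and eliminates the incorrect estimate.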

As illustrated in Fig. 12 and Table 1, the hypothesis that the MI-based motion planner would better localize and track a dynamic target than the five uninformed methods is confirmed in all simulation cases except when there are four agents. In that case, the static border method (M1) performed marginally better, by 0.1% error. In this configuration, the static agents are located in the four corners of the search area, with maximum separation. As the information surface in Fig. 6(c) shows, it is better to select agent combinations that are most separated. The static border (M1) and rotation (M3) methods maintain a large separation distance between the agents throughout the simulations: agents stay on the edges of the search area with a roughly uniform distance between them. The performance of the IT1 algorithm (M6) relative to the static border method in this case is very promising. It illustrates that the IT1 algorithm can perform as well as an ideal scenario containing four static sensors: if an ideal static sensor network does not exist, the algorithm autonomously realizes this configuration and yields behavior consistent with an idealized static sensor network. This contrasts with the static uniform, random, and raster methods, which perform consistently worse because they do not encode this notion of maximum separation between agents.

One seemingly strange behavior observed in the results is the increase in the normalized percent RMSE values for methods M1, M3, M6, and M7 when going from four to five agents and from six to seven agents. This is related to the initial starting locations of the agents. The importance of the corners in measurement was discussed earlier. When transitioning from four to five agents, the starting configuration for these four methods changes the number of occupied corners from four to one; when transitioning from six to seven agents, it changes from two to one. The importance of these corner positions is easily observed in Table 1, where the static border case with four agents (all of which reside in corners) performs better than with five, six, and even seven agents. The impact of a less ideal starting configuration therefore propagates into the normalized percent RMSE. This phenomenon does not occur for the M2, M4, and M5 uninformed methods because the initial positions and motion of their agents are not tied to one another or to the corners in the way the M1, M3, M6, and M7 cases are. Finally, as expected, the IT2 algorithm performed better than all other methods for position estimation.

### 6.2 Physical Experiments With Simulated Sensor.

The static border (M1) method performs practically identically to simulations in the three-agent case but slightly better than simulations in the two-agent case. This difference is due to the physical vehicle hovering in place with slight variations in its location, as seen in Fig. 14(b,1).

As seen in Fig. 16 and Table 2, in the two-agent case the IT1 and IT2 algorithms performed exceptionally well compared to the static border and rotation (orbiting) cases; both performed better than in the simulations. In the three-agent case, the IT1 algorithm still performed better than the static border and rotation cases, but not by as much as in the simulations, and the IT2 algorithm has slightly better performance. These discrepancies are likely due to the dynamics of the aerial robots. In the rotation (M3) method, the path of the agents, as well as the estimated path taken by the target, is almost identical to simulations (see plots (a,2) and (b,2) in Figs. 14 and 15), and the agents do not have any dramatic changes in their direction of motion. In the IT1 and IT2 cases, however, dramatic changes in the direction of motion are common, which cause the paths taken by the agents to differ from simulations. This in turn causes the estimated target path to differ (see plots (a,3) and (a,4) versus (b,3) and (b,4) in Figs. 14 and 15).

### 6.3 Physical Experiments With Real Sensor.

As expected, the physical experiments using a real sensor did not perform nearly as well as those with the simulated sensor. This is due to the nonlinear behavior and measurement uncertainty of the real sensor seen in Figs. 10 and 11. As shown in Figs. 14(c,1) and 14(c,4), the two-agent case performed very poorly. In the three-agent case, depicted in Figs. 15(c,1) and 15(c,4), the algorithm had much more success. As shown in Figs. 10 and 11, the sensor error is not consistent, so the poorer performance of IT1 and IT2 relative to the rotation (M3) method is possibly due to differing sensor error. In the three-agent case, the only method significantly different from the others is the rotation method.

## 7 Conclusions

A novel method of uncooperative target localization and tracking using TDOA measurements with a particle filter and MI to autonomously guide the motion of the agents was developed, described, simulated, and validated in physical experiments. The algorithm performs better than state-of-the-art methods for localization and tracking using TDOA measurements for the two-agent case. The algorithm also performs as well as an idealized static sensor network, which would allow it to be deployed when no existing architecture is present. The algorithm has applications in search and rescue, surveillance, and military threat monitoring. Future work will (1) explore using the algorithm to enable a swarm of agents to autonomously follow/chase a dynamically moving target instead of being confined to a search area, (2) investigate the computational demands of the algorithm as it relates to the number of agents being placed, and (3) incorporate other measurements such as FDOA or DRSS, and expand the scale of the experiments.

## Acknowledgment

The paper/data have been reviewed in accordance with the International Traffic in Arms Regulations (ITAR), 22 CFR PART 120.11, and the Export Admin. Regulations (EAR), 15 CFR 734(3)(b)(3), and may be released without export restrictions.

## Data Availability Statement

The authors attest that all data for this study are included in the paper.