On the Resolution Probability of Conditional and Unconditional Maximum Likelihood DoA Estimation

After decades of research in Direction of Arrival (DoA) estimation, Maximum Likelihood (ML) algorithms still provide the best performance in terms of resolution capabilities. At the cost of a multidimensional search, ML algorithms achieve a significant reduction of the outlier production mechanism in the threshold region, where the number of snapshots per antenna and/or the signal to noise ratio (SNR) are low. The objective of this paper is to characterize the resolution capabilities of ML algorithms in the threshold region. Both conditional and unconditional versions of the ML algorithms are investigated in the asymptotic regime where both the number of antennas and the number of snapshots are large but comparable in magnitude. By using random matrix theory techniques, the finite dimensional distributions of both cost functions are shown to be Gaussian in this asymptotic regime, and a closed form expression of the corresponding asymptotic covariance matrices is provided. These results allow us to characterize the asymptotic behavior of the resolution probability, which is defined as the probability that the cost function evaluated at the true DoAs is smaller than the values that it takes at the positions of the other asymptotic local minima.


I. INTRODUCTION
The determination of the direction of arrival (DoA) of one or multiple far-field sources continues to be a highly relevant problem in multiple fields, such as radar, sonar, seismology or radioastronomy. Among all the DoA determination methods available today, classical Maximum Likelihood (ML) procedures remain the ones that offer the best performance in terms of both precision and spatial resolution. The extraordinary performance of ML methods comes at the expense of an increased computational complexity, since they require a non-linear multidimensional search instead of a one-dimensional one (as is the case in subspace-based algorithms). However, ML methods still represent the only valid alternative in challenging scenarios with closely located and/or highly correlated sources. They are also the most attractive solution in offline processing applications.
Several alternatives have been proposed in the literature in order to alleviate the high computational complexity associated with the non-linear multidimensional search of ML methods. One can differentiate between algorithms that try to simplify the local search procedure and algorithms that aim to simplify the global search. Among the first group, we can include the alternating projection method [1], the IQML algorithm for Uniform Linear Arrays (ULA) [2], some methods based on large sample volume approximations of the ML objective function (such as MODE [3] or related alternatives [4], [5]), and some recently proposed approximate ML methods for the specific case where only two closely spaced sources are present in the scenario [6], [7]. All these methods achieve a significant reduction of the computational complexity at the expense of a certain performance loss, caused by the fact that the ML objective function is in fact approximated. The second family of ML-based DoA detection algorithms are aimed at simplifying the global multidimensional search, avoiding the computational burden associated with an evaluation of the ML objective function in a uniform multidimensional fine grid. One may include here genetic algorithms for global search [8], [9], or two-stage methods that first select a set of candidate DoAs with simpler one-dimensional search methods and then refine these initial estimations by a multidimensional ML search procedure [10], [11]. The aim of all these global search methods is to avoid convergence to a local extremum of the ML objective function while avoiding the need to evaluate this function at a high number of points in the multi-dimensional grid.
One must point out that there exist two alternative ML DoA estimation procedures with different objective functions, depending on whether the signals are modeled as stochastic or as unknown deterministic parameters. The Unconditional ML (UML) method is based on the assumption that the source signals are temporally white Gaussian random variables [12], [13], whereas the Conditional ML (CML) method simply treats them as unknown deterministic parameters that need to be estimated. It was recognized early on [14] that the UML method asymptotically outperforms CML when the number of samples tends to infinity for a fixed array dimension. In fact, it was shown in [15] that, contrary to UML, the CML method is statistically inefficient, in the sense that the algorithm does not achieve the Cramér-Rao Bound (CRB) corresponding to the deterministic signal assumption. This effect is caused by the fact that the total number of parameters that need to be estimated by the CML method increases with the sample size, whereas this is not the case in the UML algorithm. On the other hand, it was demonstrated in [16] that CML and UML are in fact equivalent at large signal to noise ratio (SNR), meaning that the difference between their estimates converges to zero in probability, even for finite values of the sample volume.
(This work was partially supported by the Catalan and Spanish grants 2017-SGR-01479 and RTI2018-099722-B-I00. This paper was presented in part at the European Signal Processing Conference EUSIPCO '13. arXiv:2008.00726v1 [eess.SP] 3 Aug 2020.)
All of the above performance results for CML and UML are only valid in the small error regime, that is, when the estimated DoAs are very close to the true ones, either because of a relatively large sample volume with respect to the array dimension [14] or a relatively low noise power level with respect to the source signals [17], [16]. Unfortunately, none of the above insights carry over to the more relevant case where the number of samples is comparable to the array dimension and the SNR is moderately low. Note that this is precisely the regime where ML offers its main advantages over other DoA estimation techniques based on one-dimensional searches, such as more conventional subspace-based approaches. The main objective of this paper is the performance characterization of the CML and UML techniques in this "threshold region", whereby both the number of samples per antenna and the SNR take moderate values. This region is typically characterized by a systematic appearance of outliers in the DoA estimates, which are mainly caused by the inability to resolve closely spaced sources.
Unfortunately, the performance of CML and UML DoA estimation methods in this threshold region has received less attention in the literature. One of the most relevant contributions in this direction was the work by Athley in [18], which characterized the probability of resolving closely spaced sources for finite values of the sample size and SNR. Some additional insights into the problem were given in [19], where the outlier production mechanism was related to the asymptotic eigendecomposition of the observation covariance matrix. Recently, the approach in [18] has been used to study the resolution probability of both CML and UML in the multi-frequency case [20]. In this paper, we also follow the approach in [18] and characterize the resolution probability of these two ML methods by studying the stochastic behavior of the CML and UML objective functions under the assumption that the observations are Gaussian random variables. Our approach will be asymptotic in both the number of antennas and the number of samples, although we will always assume that both the SNR and the number of samples per antenna are finite and bounded quantities. In practice, this asymptotic regime models the situation where the number of antennas is of the same order of magnitude as the number of available snapshots.
The approach followed by Athley in [18] conforms to the classical Method of Interval Errors (MIE), which predicts the threshold region performance of estimation methods based on minimizing a certain cost function. This method has also been used in [21] to characterize the threshold performance of the Capon method and in [22] to study the threshold of the ML estimation method corresponding to one signal in spatially colored noise. The main idea is that the mean squared error (MSE) of the estimated parameters can be decomposed into a weighted sum of a local error term (the small-error variance) and a global error term (outliers), that is

MSE ≈ P_res · MSE_small + (1 − P_res) · MSE_large,

where MSE_small is the small-error MSE (usually the Cramér-Rao Bound), MSE_large is the large-error MSE (typically approximated as the average MSE under a uniformly distributed parameter choice) and P_res is the resolution probability. In order to characterize the resolution probability, the MIE identifies all the local minima of the asymptotic (deterministic) cost function. Only one of these local minima will be associated with the true value of the parameters (the global minimum, assuming consistency), whereas the rest will contain the typical values taken by outliers, each of which can be associated with one of the other local minima. The resolution probability is defined as the probability that the value of the cost function at the position of the deterministic global minimum (true DoAs) is smaller than its values at the rest of the local minima. Observe that the global behavior of the random cost function is thus summarized by its behavior on a finite set of deterministic points.

The rest of the paper is structured as follows. Section II introduces the conditional and unconditional ML DoA estimation methods and presents the two main results of the paper, namely (i) almost sure convergence of the ML cost functions and (ii) convergence in law of the associated finite dimensional distributions.
The proof of these two main results is given in Section III and Section IV respectively. Section V provides a numerical evaluation of the asymptotic probability of resolution that is established according to the results in Section II and finally Section VI concludes the paper.

II. DOA ML METHODS AND MAIN ASYMPTOTIC RESULTS
Let us denote by y(n) ∈ C^{M×1} a column vector that contains the complex samples received by an array of M elements at the nth time instant. This signal contains the contribution from K < M far field sources with DoAs θ̄ = [θ̄(1), . . . , θ̄(K)]^T, so that we can express y(n) as

y(n) = A(θ̄)s(n) + n(n).
In this expression, s(n) ∈ C^{K×1} is a column vector with the source signal samples, A(θ) ∈ C^{M×K} is a matrix that contains as columns the steering vectors corresponding to the different directions of arrival, and n(n) ∈ C^{M×1} is a column vector with background noise entries, assumed independent and identically distributed (i.i.d.) according to a circularly symmetric Gaussian law with zero mean and unknown variance σ². Assume that N different snapshots or realizations of y(n) are available, and denote Y = [y(1), . . . , y(N)]. As mentioned above, two different ML methods for the estimation of θ̄ can be derived depending on the nature of the signal vectors s(n). In both cases, the minimization is carried out in a multidimensional compact domain Θ_K ⊂ R^K on which A(θ) has full column rank, for example a set enforcing a minimum separation of some small ε > 0 between the entries of θ.
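The observation model above can be sketched numerically. The following Python snippet is illustrative only: the function names, the half-wavelength element spacing default and the equal-power unconditional (Gaussian) sources are assumptions made here, not the paper's code.

```python
import numpy as np

def steering_matrix(doas_rad, M, spacing=0.5):
    # ULA steering vectors: entry m of a(theta) is exp(j*2*pi*spacing*m*sin(theta)),
    # with the element spacing expressed in wavelengths (assumed, adjustable).
    m = np.arange(M)[:, None]                                      # (M, 1)
    return np.exp(2j * np.pi * spacing * m * np.sin(doas_rad)[None, :])

def snapshots(doas_rad, M, N, snr_db=10.0, rng=None):
    # Draw N snapshots y(n) = A(theta) s(n) + n(n) under the unconditional
    # model: i.i.d. Gaussian sources of equal power and unit-variance noise.
    rng = np.random.default_rng() if rng is None else rng
    K = len(doas_rad)
    A = steering_matrix(doas_rad, M)
    p = 10.0 ** (snr_db / 10.0)                                    # per-source power
    S = np.sqrt(p / 2) * (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N)))
    W = np.sqrt(0.5) * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
    return A @ S + W
```

The matrix Y = [y(1), . . . , y(N)] then corresponds to the output of `snapshots`, from which the sample covariance matrix can be formed as `Y @ Y.conj().T / N`.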

A. Conditional ML
In the conditional model, the signals are assumed to be deterministic unknown parameters, denoted by S = [s(1), . . . , s(N)]. The corresponding ML estimator is the minimizer of the normalized negative log-likelihood

L_C(σ², S, θ) = log σ² + ||Y − A(θ)S||²_F / (MNσ²),

where ||·||_F is the Frobenius norm and where we constrain σ² > 0. The minimum with respect to the signals is achieved at Ŝ(θ) = (A^H(θ)A(θ))^{-1} A^H(θ)Y and, regardless of σ², the CML estimator of θ is obtained by minimizing the cost function L_C(σ², Ŝ(θ), θ) or, equivalently, the function

η̂_C(θ) = (1/M) tr[ P⊥_A(θ) R̂ ],

where R̂ is the sample covariance matrix, i.e. R̂ = (1/N) Y Y^H, and P⊥_A(θ) is the orthogonal projection matrix onto the orthogonal complement of the column space of A(θ), namely P⊥_A(θ) = I_M − A(θ)(A^H(θ)A(θ))^{-1}A^H(θ).
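A minimal numerical sketch of the concentrated CML cost is given below, assuming a ULA steering model; the function name, the half-wavelength spacing default and the use of a pseudo-inverse to form the projector are illustrative choices, not the paper's implementation.

```python
import numpy as np

def cml_cost(theta, Y, spacing=0.5):
    # eta_C(theta) = (1/M) tr[ P_A_perp(theta) R_hat ],  with R_hat = Y Y^H / N.
    M, N = Y.shape
    m = np.arange(M)[:, None]
    A = np.exp(2j * np.pi * spacing * m * np.sin(np.atleast_1d(theta))[None, :])
    R_hat = Y @ Y.conj().T / N
    P = A @ np.linalg.pinv(A)          # projector onto the column span of A(theta)
    P_perp = np.eye(M) - P             # projector onto its orthogonal complement
    return np.real(np.trace(P_perp @ R_hat)) / M
```

In a noiseless scenario the cost vanishes at the true DoAs, since P⊥_A(θ̄) annihilates every snapshot.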

B. Unconditional ML
The unconditional model assumes that the signals s(n) are independent circularly symmetric Gaussian vectors with zero mean and unknown covariance matrix P_s = E[s(n)s^H(n)]. In this case, the ML estimator can be formulated as the minimizer of the normalized negative log-likelihood

L_U(P_s, σ², θ) = (1/M) log det R(P_s, σ², θ) + (1/M) tr[ R^{-1}(P_s, σ², θ) R̂ ],

where here again R̂ is the sample covariance matrix defined above and where R(P_s, σ², θ) = A(θ) P_s A^H(θ) + σ² I_M. The above minimization is carried out under the constraints P_s ≥ 0, σ² > 0 and θ ∈ Θ_K. The solution to this optimization problem can be formulated as follows. Let us denote by α̂_1(θ) ≤ . . . ≤ α̂_K(θ) the eigenvalues of the K × K matrix U^H_A(θ) R̂ U_A(θ), where U_A(θ) = A(θ)(A^H(θ)A(θ))^{-1/2}, and let q̂_1(θ), . . . , q̂_K(θ) denote the associated eigenvectors. For k = 1, . . . , K, denote by Λ̂_k(θ) a k × k diagonal matrix that contains the k highest eigenvalues and let Q̂_k(θ) be a K × k matrix that contains the associated eigenvectors. Define also the candidate noise power estimates

σ̂²_k(θ) = (tr R̂ − tr Λ̂_k(θ)) / (M − k),   k = 1, . . . , K.

Now, let m denote the maximum integer such that Λ̂_m(θ) > σ̂²_m(θ) I_m. If no such integer exists (meaning that α̂_K(θ) < σ̂²_1(θ)), then the optimum is achieved at P_s = 0 and σ² = (1/M) tr R̂. Otherwise, the function L_U(P_s, σ², θ) reaches its minimum at σ² = σ̂²_m(θ) and

P_s = P̂_s,m(θ) = (A^H(θ)A(θ))^{-1/2} Q̂_m(θ) ( Λ̂_m(θ) − σ̂²_m(θ) I_m ) Q̂^H_m(θ) (A^H(θ)A(θ))^{-1/2}.

Furthermore, the corresponding negative log-likelihood takes the minimum value (up to an additive constant)

η̂_U(θ) = (1/M) [ log det Λ̂_m(θ) + (M − m) log σ̂²_m(θ) ].

The UML estimator can therefore be obtained by exploring the above cost function over the domain Θ_K. The problem here is that every evaluation of this expression at a particular θ requires the full eigendecomposition of the matrix U^H_A(θ) R̂ U_A(θ), so finding the true UML estimator is computationally demanding.
For this reason, typical implementations of the UML DoA estimator assume that N > K and α̂_1(θ) > σ̂²_K(θ) [12], [13], implying that m = K and turning the problem into the minimization of

η̂_U(θ) = (1/M) [ Σ_{k=1}^{K} log α̂_k(θ) + (M − K) log σ̂²_K(θ) ].

By using the identity det(I + AB) = det(I + BA) (valid for matrices of compatible dimensions), this cost function can also be expressed in the more conventional form

η̂_U(θ) = (1/M) log det[ P_A(θ) R̂ P_A(θ) + σ̂²_K(θ) P⊥_A(θ) ],

which can be evaluated without the need to compute an eigendecomposition at each θ. Furthermore, this cost function is well defined with probability one even if α̂_1(θ) ≤ σ̂²_K(θ) (because α̂_1(θ) > 0 almost surely). For this reason, the minimizer of the above cost function is usually taken to be, by definition, the UML estimator.
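The concentrated UML cost for the case m = K can be sketched as follows, under the oversampled assumption N > K; the QR-based orthonormal basis (which spans the same subspace as U_A(θ)) and the helper names are illustrative choices made here, not the paper's implementation.

```python
import numpy as np

def uml_cost(theta, Y, spacing=0.5):
    # Concentrated UML cost sketch for m = K:
    #   eta_U = (1/M) [ sum_k log alpha_k(theta) + (M - K) log sigma2_K(theta) ],
    # with alpha_k the eigenvalues of U_A^H R_hat U_A and
    # sigma2_K = (tr R_hat - sum_k alpha_k) / (M - K).
    M, N = Y.shape
    theta = np.atleast_1d(theta)
    K = len(theta)
    m = np.arange(M)[:, None]
    A = np.exp(2j * np.pi * spacing * m * np.sin(theta)[None, :])
    R_hat = Y @ Y.conj().T / N
    U_A, _ = np.linalg.qr(A)           # orthonormal basis of span(A(theta))
    alpha = np.linalg.eigvalsh(U_A.conj().T @ R_hat @ U_A)
    sigma2 = (np.real(np.trace(R_hat)) - alpha.sum()) / (M - K)
    return (np.log(alpha).sum() + (M - K) * np.log(sigma2)) / M
```

Note that σ̂²_K(θ) here equals tr[P⊥_A(θ)R̂]/(M − K), so the CML and UML costs share the same underlying projection residual.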
When N < K (undersampled regime), we will always have α̂_1(θ) = . . . = α̂_{K−N}(θ) = 0 and therefore the cost function in (4)-(6) is not well defined. Noting however that α̂_{K−N+1}(θ) > 0 with probability one, we can instead assume that α̂_{K−N+1}(θ) ≥ σ̂²_N(θ), implying that m = N in the original UML problem, which can readily be turned into the minimization of a cost function η̂_U(θ) expressible as in (5) by simply replacing K with N. This cost function can also be evaluated without the need for eigenvalue decompositions and is well defined even if the hypothesis α̂_{K−N+1}(θ) ≥ σ̂²_N(θ) does not hold. For this reason, it constitutes the natural extension of the simplified UML cost function to the undersampled regime (N < K), and from now on we will work with this extension.
In conclusion, for both the undersampled and the oversampled regimes, we will assume that the minimum positive eigenvalue of U^H_A(θ) R̂ U_A(θ) is larger than the associated noise power estimate, taken to be σ̂²_K̄(θ) as defined in (5) with K replaced by K̄ = min(K, N). In both cases, this assumption allows us to evaluate the corresponding cost function without the need for a parameter-dependent eigenvalue decomposition.
As can be readily seen, both the CML and the UML methods are based on the minimization of highly nonlinear multidimensional objective functions η̂_C(θ), η̂_U(θ) that typically present multiple local minima. In order to obtain the ML estimates, one should first identify all the local minima in the parameter space Θ_K and then select the one with the lowest cost as the estimated DoAs. The randomness of these ML cost functions will lead to fluctuations of the local minima, which will generate a loss in both precision and resolution. Fluctuations in the position of the global minimum around the true DoAs θ̄ will result in a loss of precision, whereas the event that the cost function takes a lower value at a distant local minimum than at the minimum close to θ̄ will produce an outlier, and hence a loss of resolution. The objective of this paper is the characterization of this second effect, following the approach established in [18].

C. First order asymptotic behavior
In order to overcome the difficulties in the statistical characterization of the ML cost functions, we will take an asymptotic approach and characterize the behavior of the two ML cost functions when both sample size (N ) and array dimension (M ) are large but comparable in magnitude. Furthermore, we will assume that the source signals follow the unconditional model, as formalized in the following.
(As1) The number of elements of the array depends on the sample size, M = M(N), and M(N) → ∞ as N → ∞. Furthermore, the quotient M/N converges to a positive constant as N → ∞, namely M/N → c, 0 < c < ∞. From now on, the expressions "large N" and "large M" will be used interchangeably to refer to this asymptotic regime.
(As2) The observations y(n), n = 1, . . . , N , are complex, circularly symmetric, independent and identically distributed Gaussian random variables with zero mean and covariance matrix R. Furthermore, the eigenvalues of R are allowed to fluctuate with increasing M but are always located inside a compact interval of R + (the real positive axis) independent of M.
(As3) The number of sources K is strictly lower than the number of sensors, that is K < M. Furthermore, K may increase with N, so that we can either have N > K (oversampled regime) or N < K (undersampled regime).

We consider here a family of L sequences (L fixed and independent of M) of K-dimensional points in Θ_K, which will be denoted by {θ^(ℓ)_M}, ℓ = 1, . . . , L. Our objective is to characterize the asymptotic behavior of the CML and the UML cost functions evaluated at these L distinct point sequences. Observe that, according to (As3), the dimensionality of these points will scale up with the array dimension. Consider now the M × M deterministic matrix R_A(θ) = P_A(θ) R P_A(θ). Theorem 1 below establishes that η̂_C(θ^(ℓ)_M) and η̂_U(θ^(ℓ)_M) converge almost surely to deterministic equivalents η̄_C(θ^(ℓ)_M) and η̄_U(θ^(ℓ)_M): in the oversampled case, η̄_U(θ) takes the form in (9), whereas when the undersampled condition (8) holds, η̄_U(θ) is expressed in terms of φ_0(θ), the only negative solution to a certain polynomial equation in φ that involves the eigenvalues of R_A(θ).

Proof: See Section III.

In the light of the above theorem, one may clearly establish a different asymptotic behavior for the CML and the UML cost functions. The CML cost function is asymptotically equivalent to its large-sample approximation M^{-1} tr[P⊥_A(θ)R], whereas the behavior of the UML cost function is somewhat different. In the oversampled situation, the UML cost function is asymptotically close to its large-sample equivalent (the first term in (9)) plus a constant factor that does not play any role in the parameter optimization process. In the undersampled case, however, the UML cost function presents quite a different behavior, in the sense that it becomes equivalent to a cost function that does not appear to bear any similarity with its large-sample approximation. We can see, however, that in all these cases the minimum of the asymptotic objective function is achieved at the true DoAs, as formally established in the following lemma.
Lemma 1. Assume that the array does not present ambiguities, so that whenever A(θ_1) = A(θ_2)T with T an invertible matrix, we have θ_1 = θ_2. Then, regardless of whether N > K or N < K, both η̄_C(θ) and η̄_U(θ) achieve their minimum at the true DoAs, namely

θ̄ = arg min_{θ∈Θ_K} η̄_C(θ) = arg min_{θ∈Θ_K} η̄_U(θ),

where Θ_K is as defined above.
Proof: See Appendix A.

The fact that both ML cost functions are asymptotically close to deterministic functions with a global minimum at the true parameters θ̄ supports the conjecture that both methods provide consistent estimates even with a finite number of samples per observation dimension. However, this is not formally proven here, since it would require extending the above pointwise convergence result to uniform convergence on the parameter space Θ_K. In this paper, we are more interested in the behavior of the cost functions themselves, and more specifically in their fluctuations around their deterministic equivalents, which will eventually lead to the presence of outliers. As mentioned above, the presence of outliers is typically a consequence of the cost function achieving its minimum at a local extremum that is far away from the one associated with the true parameters. In the next subsection, we will further characterize this effect by investigating the nature of the fluctuations of the CML and the UML cost functions at these local minima.
(As4) The matrices W_P and W_Q are defined as in (13) and (14), where Q_ℓ is as in (12). The minimum eigenvalue of both W_P and W_Q is bounded away from zero uniformly in M.

Under the above set of assumptions, it is possible to characterize the asymptotic fluctuations of the two ML cost functions. The following result establishes that η̂_C and η̂_U asymptotically fluctuate around their deterministic equivalents as Gaussian random vectors. We formulate the result so that it holds for both the undersampled (N < K) and the oversampled (N > K) regimes.
with Q denoting the matrix defined in (12).
Proof: See Section IV.

The above results provide a means of establishing the asymptotic probability of resolution of the UML and the CML methods. Before turning to the proof, let us draw some conclusions. First of all, it is interesting to observe that, when M, N → ∞, the two cost functions are asymptotically close to the two deterministic counterparts η̄_C(θ) and η̄_U(θ). Even if the two asymptotic equivalents present a single global minimum at the true value of the DoAs, these functions are in practice highly multimodal, i.e. they present several local minima. As shown in Lemma 1, one of these local minima will coincide with the true DoAs, namely θ̄. Let L denote the number of additional local minima inside the feasibility region, other than θ̄, and let θ^(ℓ)_M, ℓ = 1, . . . , L, denote the K-dimensional points where these minima are achieved. In general, both L and the points θ^(ℓ)_M may be different in η̄_C(θ) and η̄_U(θ). The probability of resolution is defined as the probability that the original cost function η̂_C(θ) takes a lower value at θ̄ than at any of these additional local minima, namely

P_res = P[ η̂_C(θ̄) < min_{ℓ=1,...,L} η̂_C(θ^(ℓ)_M) ],   (15)

and equivalently for η̂_U. As explained above, this definition of the resolution probability provides a very accurate description of both the breakdown effect and the expected mean squared error (MSE) of the DoA estimation process. Unfortunately, in our ML setting, (15) is difficult to analyze for finite values of M, N due to the complicated structure of the cost functions (3)-(6). For this reason, previous studies [18], [21], [22] focused instead on a union bound on the complement of (15), i.e. the outlier probability, obtained by assuming independent events. Theorem 2 provides a very simple way of approximating (15), by simply using the asymptotic distributions (as M, N → ∞) instead of the actual ones.
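In practice, the approximation models the cost-function values at the L + 1 deterministic minima as jointly Gaussian with the asymptotic means and a covariance matrix scaling like 1/M², following the CLT of Theorem 2. A Monte Carlo sketch of this computation is given below; the 1/M² scaling convention and the function name are illustrative assumptions, and the inputs eta_bar and Gamma must be supplied from the asymptotic formulas.

```python
import numpy as np

def resolution_probability(eta_bar, Gamma, M, n_mc=100000, rng=None):
    # Gaussian approximation of the resolution probability: index 0 of eta_bar
    # holds the cost value at the true DoAs, indices 1..L the values at the
    # other asymptotic local minima.
    rng = np.random.default_rng() if rng is None else rng
    samples = rng.multivariate_normal(np.asarray(eta_bar, dtype=float),
                                      np.asarray(Gamma, dtype=float) / M**2,
                                      size=n_mc)
    # Resolved: the value at the true DoAs is below the value at every
    # competing local minimum.
    return np.mean(samples[:, 0] < samples[:, 1:].min(axis=1))
```

This replaces the union-bound computation used in earlier studies by a direct Monte Carlo evaluation of the joint event in (15) under the asymptotic Gaussian law.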
It will be shown below via simulations that this approach provides a very accurate description of the actual probability, even for very small M, N. It should be mentioned here that the above results are merely concerned with finite dimensional distributions and do not formally imply convergence of the resolution probability of the CML and UML methods. This is because the number of local minima L of η̄(θ) may in practice increase with M, which substantially complicates the asymptotic behavior of (15). We conjecture that the approximation remains valid for reasonably well behaved A(θ), but a more rigorous study of this problem is left for future research.

III. PROOF OF THEOREM 1
The proof is based on the study of the eigenvalues of matrices of the type R̂^(ℓ)_A = P_A(θ^(ℓ)_M) R̂ P_A(θ^(ℓ)_M) and strongly relies on random matrix theory techniques. In order to introduce these methods, consider the complex random function

m̂_ℓ(z) = (1/M) tr[ (R̂^(ℓ)_A − z I_M)^{-1} ].

This function is the Stieltjes transform of the empirical distribution function of the eigenvalues of R̂^(ℓ)_A, and it is extremely useful in order to characterize the asymptotic behavior of the eigenvalues of this matrix. We will also identify R̂^(0)_A = R̂ and therefore denote by m̂_0(z) the Stieltjes transform of the empirical eigenvalue distribution of R̂, that is

m̂_0(z) = (1/M) tr[ (R̂ − z I_M)^{-1} ].

The Stieltjes transforms defined above allow us to characterize a number of quantities that depend on the eigenvalues, via the Cauchy integration formula. For example, the CML objective function in (17) can be expressed as a contour integral involving z m̂_ℓ(z), where C⁻_ℓ (ℓ ≥ 0) is a negatively (clockwise) oriented contour enclosing all the positive eigenvalues of R̂^(ℓ)_A and not zero. Hence, we can easily study the asymptotic behavior of η̂_C(θ^(ℓ)_M) through the asymptotic behavior of m̂_ℓ(z). A similar contour representation, given in (18), holds for the UML cost function, now involving log z, the principal branch of the complex logarithm, analytic on C\R⁻, with C⁻_ℓ as defined above. In particular, since C⁻_ℓ encloses the positive eigenvalues of R̂^(ℓ)_A and not zero, this representation is valid for both N > K and N < K. Hence, we conclude here again that we can infer the asymptotic properties of η̂_U(θ^(ℓ)_M) from the asymptotic behavior of m̂_ℓ(z).
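As a quick numerical illustration of such deterministic equivalents, for white noise (R = I_M) the Stieltjes transform of the sample covariance matrix converges to the Marchenko-Pastur transform. The sketch below (an assumption-laden illustration, not part of the paper's derivation) evaluates both at a real point z < 0, where the square-root branch below is the correct one.

```python
import numpy as np

def empirical_stieltjes(R_hat, z):
    # m_hat(z) = (1/M) tr[ (R_hat - z I_M)^{-1} ]
    M = R_hat.shape[0]
    return np.trace(np.linalg.inv(R_hat - z * np.eye(M))) / M

def mp_stieltjes(z, c):
    # Marchenko-Pastur Stieltjes transform (deterministic equivalent for
    # R = I_M and aspect ratio c = M/N), valid for real z < 0.
    disc = np.sqrt((1.0 - c - z) ** 2 - 4.0 * c * z)
    return ((1.0 - c - z) - disc) / (2.0 * c * z)
```

For moderately large M, N the two quantities agree up to fluctuations of order 1/M, which is the concentration phenomenon exploited throughout this section.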

On the other hand, the positive eigenvalues of R̂^(ℓ)_A are almost surely located inside a fixed compact interval S ⊂ R⁺ for all N large enough. This implies that for sufficiently large N we can fix the contour C⁻_ℓ in (17)-(18) so that it depends neither on N nor on the realization of the eigenvalues of R̂^(ℓ)_A, while still enclosing S and not {0}. This means that, for all N large enough, the only source of randomness in (17)-(18) is the integrand function m̂_ℓ(z), which has been well studied in the random matrix theory literature. In particular, the following result establishes that this function is asymptotically close to a deterministic counterpart, which will be referred to as its deterministic equivalent. We will use a uniform notation for R^(ℓ)_A and R by identifying R = R^(0)_A and defining M̄_ℓ + 1 as the total number of distinct eigenvalues of R^(ℓ)_A, which will be denoted by 0 < γ^(ℓ)_1 < . . . < γ^(ℓ)_{M̄_ℓ+1}.

Theorem 3 ([25], [26]). Let z ∈ C⁺ and assume that (As1)-(As4) hold. Then, for ℓ = 0, . . . , L, |m̂_ℓ(z) − m̄_ℓ(z)| → 0 almost surely as N → ∞, where m̄_ℓ(z) is the deterministic complex function defined in (19), and where the complex function ω_ℓ(z) appearing therein is defined as the unique solution in C⁺ of the polynomial equation (20).

By using the analyticity and boundedness of the function |m̂_ℓ(z) − m̄_ℓ(z)| on the set C\(S ∪ {0}), one may invoke Montel's theorem to establish uniform convergence on compact subsets of C\(S ∪ {0}). In this case, the definition of the function ω_ℓ(z) is extended by conventional analytic continuation. In particular, one can establish the almost sure convergence of the integrals in (17)-(18) after replacing m̂_ℓ(z) with m̄_ℓ(z), ℓ ≥ 0. The first error term converges almost surely to zero by the Dominated Convergence Theorem (DCT) together with Theorem 3. Regarding the second term, the same result follows from the convergence of |m̂_ℓ(z) − m̄_ℓ(z)| to zero together with the bound log(1 + x) < x for |x| < 1.
At this point, it only remains to prove that the CML and UML deterministic cost functions, defined by direct substitution of m̂_ℓ(z) with m̄_ℓ(z), ℓ ≥ 0, in (17)-(18), correspond to those in the statement of the theorem. In other words, it remains to solve the corresponding integrals. To do this, we consider the change of variables z → ω = ω_ℓ(z), where ω_ℓ(z) is as defined in (20), using the definition of m̄_ℓ(z) in (19), and where ω′ should be understood as the derivative defined in (22). Solving the resulting integral for the CML case directly yields the expression of η̄_C(θ) in the statement of the theorem, as we wanted to show. Regarding η̄_U(θ), we only need to solve the integral corresponding to the second term in (18), denoted I_ℓ in what follows; this integral is obtained by using the definition of m̄_ℓ(z) in (19) together with the fact that z = 0 is not enclosed by C⁻_ℓ (so that the term corresponding to m = 0 in the sum can be dropped), followed by the change of variables z → ω. The following proposition establishes the value of this integral.
Proposition 1. For every ℓ ≥ 1, the integral I_ℓ admits a closed form expression in terms of φ^(ℓ)_0.

In order to conclude the proof of Theorem 1, we need to ensure that the above integral makes sense even as N → ∞. In particular, we need to show that inf_{N,ℓ} |φ^(ℓ)_0| > 0 in the undersampled regime, so that the logarithm in the above expression is well defined for all large N. From the definition of φ^(ℓ)_0, one can lower bound |φ^(ℓ)_0| by a quantity that remains bounded away from zero, and consequently inf_{N,ℓ} |φ^(ℓ)_0| > 0 as a consequence of (As3) and (As4).

IV. PROOF OF THEOREM 2
Following the same approach as in the proof of Theorem 1, we can express both cost functions as contour integrals, where C⁻_0 and C⁻_ℓ respectively enclose all the positive eigenvalues of R and R^(ℓ)_A, and not zero. On the other hand, using (21), we can also rewrite the UML cost function in terms of an error term ε^(ℓ)_M defined in (25). Using the fact that |log(1 − x) − x| < |x|²(1 − |x|)^{-1} for |x| < 1, one can readily see that ε^(ℓ)_M → 0 in probability, so that we can disregard this term in the asymptotic analysis. Now, observe that we can express all the remaining terms in the form (26), where f(z) is a certain complex function that is holomorphic on the positive real axis. For every fixed ℓ, it was shown in [28] that such a statistic asymptotically fluctuates as a Gaussian random variable with zero mean and positive variance. However, in this paper we need a different result, one that describes the asymptotic joint distribution of a collection of variables of the form above.

Theorem 4. For r = 1, . . . , R, let R̂_r = R_r^{1/2} (XX^H/N) R_r^{1/2}, where X is an M × N matrix with i.i.d. CN(0, 1) entries. Let m̂_r(z), m̄_r(z) and ω_r(z) be defined as in (16), (19) and (20) respectively. Let {f_1(z), . . . , f_R(z)} be complex functions that are holomorphic on the positive real axis R⁺, and define η̂ = [η̂(1), . . . , η̂(R)]^T as in (27), where C⁻_r is a clockwise oriented contour enclosing all the positive eigenvalues of R̂_r and not {0}. Let η̄ = [η̄(1), . . . , η̄(R)]^T, where η̄(r) is defined as η̂(r) with m̂_r(z) replaced by m̄_r(z), and consider an R × R matrix Γ with entries as in (28), where C_{ω_r} = ω_r(C_r). Consider an L × R complex transformation matrix Ξ and assume that ΞΓΞ^H is invertible and that the spectral norm of (ΞΓΞ^H)^{-1} is bounded in M. Assume, finally, that M → ∞ as N → ∞ in such a way that the quotient M/N is contained in a compact subset of the positive real axis. Under these conditions, the column vector M (ΞΓΞ^H)^{-1/2} Ξ (η̂ − η̄) converges in law to a multivariate standardized Gaussian random vector.
Proof: See Appendix D.

We can readily particularize Theorem 4 to the problem at hand. For the CML cost function, we can use the above theorem with R = L, Ξ = I_L, f_r(z) = z and R_r^{1/2} = P_A(θ^(r)_M) R^{1/2}, r = 1, . . . , L. Computing the integrals in (28), or directly using the formula for the expectation of the product of four Gaussian random variables, we can establish the expression of Γ_C given in Theorem 2. The matrix Γ_C has its minimum eigenvalue bounded away from zero. Indeed, for any unit norm vector u ∈ C^{L×1}, the quadratic form u^H Γ_C u can be lower bounded in terms of λ_min(W_P), where λ_min(·) denotes the minimum eigenvalue of a matrix and W_P is defined in (13). Therefore, taking the infimum with respect to u and invoking (As2) and (As4), we see that the minimum eigenvalue of Γ_C is bounded away from zero uniformly in M. We can thus apply Theorem 4 to conclude that M Γ_C^{-1/2} (η̂_C − η̄_C) converges to a standardized Gaussian distribution.
Regarding the UML cost function, we see from (24) that we can express it in terms of a deterministic transformation matrix Ξ ∈ C^{L×(2L+1)} as in the statement of Theorem 4, a random column vector ξ ∈ C^{(2L+1)×1} with entries of the form in (26), and a vector ε_M ∈ C^{L×1} that contains the error quantities ε^(ℓ)_M defined in (25), which converge to zero in probability. More specifically, the column vector ξ can be written as ξ = [ξ^T(1), ξ^T(2)]^T, where ξ(1) ∈ C^{(L+1)×1} has its (ℓ+1)th entry given by the statistic associated with m̂_ℓ(z) for ℓ ≥ 0, whereas ξ(2) ∈ C^{L×1} has its ℓth entry given by the corresponding logarithmic statistic. On the other hand, the ℓth row of the matrix Ξ has zeros everywhere except for the 1st, the (ℓ+1)th and the (ℓ+L+1)th positions. We can now invoke Theorem 4 in order to establish a CLT on the random vector Ξξ. Let C_ξ ∈ C^{(2L+1)×(2L+1)} denote the asymptotic covariance matrix of ξ, equivalent to Γ in the statement of Theorem 4. The entries of C_ξ can be computed by evaluating the corresponding contour integrals in (28), with z_1 (resp. z_2) replaced by the right hand side of (20) as a function of ω_1 (resp. ω_2), and where Q_ℓ is defined in (12). An easy computation then shows that Ξ C_ξ Ξ^T = Γ_U as given in the statement of Theorem 2. However, we need to check that the resulting expression in (33) makes sense even for asymptotically large N. By the Cauchy-Schwarz inequality, it is enough to see that sup_{N,ℓ} N^{-1} tr[Q_ℓ²] < 1. In the oversampled case we have φ^(ℓ)_0 = 0 and this quantity is equal to K/N, which is obviously bounded away from 1 according to (As3). For the undersampled case, we use the fact that inf_{N,ℓ} |φ^(ℓ)_0| > 0, which is proven at the end of Section III, so that the quantity is bounded away from 1 uniformly in M, ℓ. Finally, it remains to show that the minimum eigenvalue of Γ_U is bounded away from zero.
To see this, we observe that Γ_U can be decomposed as above. We can readily see that Γ_U ≥ 0, so it suffices to show that Γ_U > 0 uniformly in M. Indeed, observe that Γ_U is a Hadamard matrix function [29, p. 449] of the L × L matrix W_Q defined in (14): it can be expanded as a series whose nth term involves the nth Hadamard product of W_Q with itself. The matrix W_Q is positive definite uniformly in M by (As4), which by the Schur product theorem implies that Γ_U > 0 uniformly in M, as we wanted to show.

V. NUMERICAL VALIDATION
In this section, we provide a numerical validation of the main results established in Section II. We considered a uniformly spaced antenna array of M = 10 elements spaced a quarter of a wavelength apart, receiving K = 4 sources with DoAs 16°, 18°, 60°, −50°. The four signals were received with the same power and were all uncorrelated, except for the first two (closely spaced), which presented a correlation coefficient ρ. Figures 1, 2 and 3 represent the resolution probability of the UML and CML methods when the sample size consisted of N = 100, 10 and 3 snapshots, respectively. In these figures, dotted lines represent the resolution probability simulated with 10^4 sample realizations, whereas solid lines represent the behavior predicted by the asymptotic formulas in Theorem 2. The values of the local minima were obtained by running an accelerated proximal gradient search method [30, Section 4.3] over the electrical angle space, taking as initial search values a collection of 331 points uniformly distributed over the feasibility set in (2). At each iteration, this algorithm performs a gradient update followed by an orthogonal projection onto the feasibility set in (2) with ε = 0.0262 radians, which corresponds to 1/4 of the minimum separation between sources in the scenario. The resulting values were then grouped into clusters using an agglomerative hierarchical clustering algorithm based on single linkage (distance between clusters defined as the minimum of the distances among any pair of elements). Clusters were formed by ensuring a minimum distance of at least 0.6 radians between clusters, and their centroids were used as representatives for the positions of the local minima.
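The clustering step described above can be sketched as follows: a minimal single-linkage agglomerative procedure in Python that merges clusters while their minimum pairwise distance is below the 0.6 radian threshold and returns the centroids as representatives. The function name and interface are illustrative, not the authors' implementation.

```python
import numpy as np

def cluster_minima(points, min_sep=0.6):
    """Single-linkage agglomerative clustering of candidate local minima.
    Clusters are merged while the minimum distance between any pair of
    their members is below min_sep; the cluster centroids are returned
    as representatives of the positions of the local minima."""
    pts = [np.atleast_1d(np.asarray(p, dtype=float)) for p in points]
    clusters = [[p] for p in pts]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between clusters is the
                # minimum distance among any pair of elements
                d = min(np.linalg.norm(a - b)
                        for a in clusters[i] for b in clusters[j])
                if d < min_sep and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:   # all inter-cluster distances >= min_sep
            break
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return [np.mean(c, axis=0) for c in clusters]
```

For instance, the candidate minima {0.0, 0.05, 2.0, 2.1} collapse into two clusters with centroids 0.025 and 2.05.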
We considered three different scenarios: (i) uncorrelated and equi-powered sources; (ii) uncorrelated sources with powers {2, 0.5, 1, 1} (so that the closely spaced sources are received with a 6 dB power difference); and (iii) almost coherent equi-powered sources (where the closely spaced sources were received with a correlation coefficient ρ = 0.95). The predicted probability of resolution is evaluated by assuming that the cost functions follow a multi-dimensional Gaussian distribution, i.e. η̂_C ∼ N(η̄_C, M^{−2}Γ_C) and η̂_U ∼ N(η̄_U, M^{−2}Γ_U), where we used the Matlab implementation of the multidimensional cumulative distribution function (mvncdf.m), a quasi-Monte Carlo integration algorithm based on methods developed by Genz and Bretz [31].
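Under this Gaussian approximation, the resolution probability reduces to a Gaussian orthant probability that any multivariate normal CDF routine can evaluate. A possible Python sketch using scipy (whose integrator follows the same Genz quasi-Monte Carlo methodology as Matlab's mvncdf.m) is shown below; the assumption that the first entry of the mean vector corresponds to the true DoAs, and the function name itself, are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def resolution_probability(eta_mean, Gamma, M):
    """P(eta[0] < eta[l] for all l >= 1) when eta ~ N(eta_mean, Gamma/M^2).
    The differences d[l] = eta[l] - eta[0] are jointly Gaussian, so the
    resolution probability is the Gaussian orthant probability P(d > 0)."""
    eta_mean = np.asarray(eta_mean, dtype=float)
    Sigma_eta = np.asarray(Gamma, dtype=float) / M**2
    L = len(eta_mean)
    # D maps eta to the vector of differences eta[l] - eta[0], l = 1..L-1
    D = np.hstack([-np.ones((L - 1, 1)), np.eye(L - 1)])
    mu = D @ eta_mean
    Sigma = D @ Sigma_eta @ D.T
    # P(d > 0) equals the CDF of N(-mu, Sigma) evaluated at the origin
    return float(multivariate_normal(mean=-mu, cov=Sigma).cdf(np.zeros(L - 1)))
```

As a sanity check, for two independent unit-variance minima with means 0 and 1 (and M = 1) this returns Φ(1/√2) ≈ 0.76.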
Observe that, in general terms, the asymptotic expressions are very good approximations of the actual resolution probability even for relatively low values of the sample size, the only exception being the performance of the CML method in the undersampled regime (N = 3). Surprisingly enough, the asymptotic resolution probability of the UML method provides an accurate prediction even for extremely small values of M, N. Regarding the actual performance of the methods, the CML method generally provides the better resolution probability. The only exceptions appear to be situations where sources are very highly correlated and the sample size is relatively high (Figure 1).
In order to illustrate the usefulness of the resolution probability, in Figure 4 we represent the mean squared error (MSE) of the CML and UML estimates as a function of the SNR for the same scenario as above, taking N = 100. The predicted MSE is obtained using the expression in (1), where MSE_small is as given in [14] and where MSE_large is the MSE that is obtained by choosing uniformly at random a point in the feasibility region Θ_K. Using [32, Integral 4.631], one can readily obtain a closed form for MSE_large (assuming ε → 0 in the feasibility region of (2)), where we recall that θ̄^{(1)} < . . . < θ̄^{(K)} are the true DoAs, assumed to take values on the interval (−π, π). Observe that the asymptotic expressions of the resolution probability are extremely useful in order to characterize the threshold SNR at which outliers begin to appear. In fact, the main limitation of the MSE prediction based on (1) is the fact that the small error term MSE_small is only valid as an approximation when N → ∞ and does not fully characterize the finite N scenario, especially in the presence of highly correlated sources (similar results are reported in [19], [33]).
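For reference, the large-error term can also be evaluated by plain Monte Carlo instead of the closed form obtained via [32, Integral 4.631]: draw K DoAs uniformly on (−π, π), impose the ordering of the feasibility region (ε → 0), and average the squared error against the true ordered DoAs. This is only a sketch of the quantity being computed; the per-source normalization is an assumption.

```python
import numpy as np

def mse_large(true_doas, num_draws=200000, seed=0):
    """Monte Carlo estimate of the large-error MSE: the estimate is a
    uniformly random point of the feasibility region, i.e. K ordered
    DoAs drawn uniformly on (-pi, pi)."""
    rng = np.random.default_rng(seed)
    K = len(true_doas)
    # sorting enforces theta_(1) < ... < theta_(K) as eps -> 0
    draws = np.sort(rng.uniform(-np.pi, np.pi, size=(num_draws, K)), axis=1)
    err = draws - np.sort(np.asarray(true_doas, dtype=float))
    return float(np.mean(np.sum(err**2, axis=1) / K))
```

For a single source at broadside this reproduces the variance of a uniform variable on (−π, π), namely π²/3.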

VI. CONCLUSIONS
The use of random matrix theory techniques has proven to be a very effective tool to characterize the behavior of conditional and unconditional ML algorithms for DoA estimation in the threshold region. By studying the first and second order behavior of these two ML cost functions, it has been shown that their finite dimensional distributions asymptotically fluctuate as multivariate Gaussian random vectors with a certain mean and covariance matrix, which have been derived in closed form. These results have been used in order to approximate and characterize the resolution probability of both methods in the threshold region. These studies corroborate the fact that, in general terms, the CML method provides better resolution probabilities than UML, except for very specific cases with highly correlated source signals and relatively large sample size.

APPENDIX A
The fact that η̄_C(θ) achieves its minimum at θ = θ̄ is well known in the literature, as is the case for η̄_U(θ) in the oversampled regime. Thus, we will only prove the result for η̄_U(θ) in the undersampled regime. Let us recall that U_A(θ) = A(θ)(A^H(θ)A(θ))^{−1/2}. Observe first that we can re-write the cost function in terms of a function Υ(U), where ξ(U_A(θ)) = φ_0(θ), so that ξ(U) is the unique negative solution of the corresponding equation. Consider the optimization of the function Υ(U) with respect to the set of M × K matrices with orthonormal columns. Let Σ ∈ C^{K×K} be a complex Hermitian matrix that contains the Lagrange multipliers corresponding to the set of constraints U^H U = I_K. By constructing the Lagrangian associated to this optimization problem and forcing the derivatives with respect to the entries of U to zero, we obtain the optimality conditions, where, with some abuse of notation, we have defined the corresponding quantities. Imposing the orthogonality constraint we obtain an expression for Σ, which inserted into (35) leads to (36). Hence, all local minima of Υ(U) must be zeros of one of the two factors in (36).
Let us first analyze the zeros of the first factor, namely the solutions to RU = U(U^H RU). Since U^H RU is Hermitian and positive definite, we can consider its eigenvalue decomposition, QΛQ^H. Using this, the above equation can be rewritten as R(UQ) = (UQ)Λ, so the columns of the matrix UQ are equal to K distinct eigenvectors of R. Since the value of the cost function is invariant to the choice of Q, we have determined the full set of solutions corresponding to the first factor of the identity in (36). Regarding the second factor in (36), U must belong to the kernel of R − (ξ(U) + σ_N^2(U)) I_M, so that here again the columns of U must be formed by distinct eigenvectors of R, up to right multiplication by a unitary matrix. Let λ_1 ≤ . . . ≤ λ_M denote the eigenvalues of R (some of which may be repeated), let I denote the index set of the K eigenvectors of R in the selected matrix U, and let I^c denote its complement. At the local extrema, the cost function Υ(U) can be expressed as a sum of four terms. The first term is independent of I and can be ignored. The second term is non-negative by the arithmetic–geometric mean inequality and reaches its minimum when the eigenvalues (λ_k)_{k∈I^c} are all equal, that is when I^c = {1, . . . , M − K}, so that all these eigenvalues are equal to the noise power. Since K > N, the third term is minimized under the same circumstances. Regarding the final term, it can readily be seen that it is decreasing in (λ_k)_{k∈I}, because its derivative with respect to λ_j is negative, implying that the minimum is achieved when the selected eigenvalues are maximal, that is I = {M − K + 1, . . . , M}. From all the above, we conclude that Υ(U) is minimized when the columns of U are selected as the eigenvectors associated with the K largest eigenvalues of R.
However, these eigenvectors span the column space of A(θ̄), and by the manifold regularity condition one must necessarily have θ̄ = arg min_θ η̄_U(θ), as we wanted to show.
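The stationarity condition RU = U(U^H RU) used in the argument above can be checked numerically: any matrix U whose columns are distinct eigenvectors of R satisfies it exactly, since then RU = UΛ_I and U^H RU = Λ_I. This is an illustrative check, not part of the proof; the function name is ours.

```python
import numpy as np

def stationary_check(R, idx, tol=1e-8):
    """Verify that U built from the eigenvectors of R indexed by idx
    satisfies the stationarity condition R U = U (U^H R U)."""
    w, V = np.linalg.eigh(R)      # R is assumed Hermitian
    U = V[:, idx]                 # columns: K distinct eigenvectors
    residual = R @ U - U @ (U.conj().T @ R @ U)
    return np.linalg.norm(residual) < tol
```

Any selection of eigenvector indices passes the check, in agreement with the fact that every such U is a stationary point; only the selection of the K largest eigenvalues gives the global minimum.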

APPENDIX B PROOF OF PROPOSITION 1
In this appendix, we drop the dependence on ℓ (the point sequence index) from the notation to simplify the exposition. Since ℓ ≥ 1, we will have 0 = γ_0 < γ_1 < . . . < γ_M̄. We note that we can rewrite L(ω) in (23) as L(ω) = log[ω(1 − Φ(ω))], where we have defined the function Φ(ω). Let φ_0 < φ_1 < . . . < φ_M̄ denote the distinct roots of the equation φ(1 − Φ(φ)) = 0. It is not difficult to see [27] that only the roots φ_1, . . . , φ_M̄ are enclosed by C_ω, whereas φ_0 always lies outside the contour C_ω. Furthermore, when K < N we have φ_0 = 0 and φ_1 > 0, whereas when K > N we have φ_0 < 0 and φ_1 = 0. In particular, we see that {0} is enclosed by C_ω only in the undersampled regime (K > N). First of all, note that, using the fact that φ_0(1 − Φ(φ_0)) = 0, we may re-write L(ω) in (23) in the form given above. Consider the function I(x) : [0, 1] → C, defined as above, where ω′ should be understood as the derivative defined in (22) as a function of ω. Note that the integral we want to solve can be expressed as I(1). It follows from the DCT (cf. [34, Lemma 4]) that I(x) is a differentiable function of x, and therefore we can compute I(1) by first computing the derivative I′(x) and its primitive, and then using I(0) to fix the value of the undetermined constant. We consider first the integral I(0). Noting that Φ(0) = 1, we see that φ_k = 0 does not constitute a singularity of the integrand, so the poles that contribute to this integral are {γ_k, k ≥ 1} and {φ_k, k ≥ 1, φ_k ≠ 0}. By simple residue calculus, we can establish its value. Next, we focus on the derivative I′(x). Let us consider x ∈ (0, 1) and let φ_1(x) < . . . < φ_M̄(x) denote the M̄ distinct solutions to the equation xΦ(ν) = 1. It can readily be seen following the approach in [27] that when x ∈ (0, 1) all the roots φ_1(x), . . . , φ_M̄(x) are located inside C_ω.
Observing that ω = 0 is not a singularity of the above integrand, we see that the poles inside the contour are {γ_k, k ≥ 1} and {φ_k(x), k ≥ 1}. Computing the corresponding residues, we obtain the expression for I′(x), where we have used the fact that, for k ≥ 1, we have xΦ(φ_k(x)) = Φ(φ_k) = 1, and that φ_k(x) is a differentiable function of x. One can easily find a primitive of I′(x) and fix the undetermined constant according to the value of I(0). In this process, one needs to use the limiting values of the roots φ_k(x), k ≥ 1, as x → 1. Once we have fixed the undetermined constant, we can obtain the expression for I(x), where 1_{·} is the indicator function and where we have defined the corresponding auxiliary quantities. Lemma 2: For any x ∈ (0, 1), the identity above holds. Proof: See Appendix E. From all the above, we see that when K < N the function I(x) is constant in x and equal to the value in the statement of the proposition. Finally, when K > N, the additional term can be evaluated explicitly, and therefore I(1) is equal to the value in the statement of the proposition.

APPENDIX C SOLUTION TO THE COVARIANCE INTEGRALS
In this appendix, we provide an alternative solution to the covariance integrals in (30)–(31). We will consider the case ℓ ≥ 1, m ≥ 1, since the cases ℓ = 0 and/or m = 0 can be obtained by a simple renumbering of the eigenvalues. Both integrals are particular instances of the generic covariance integral given in (28), where we recall that the partial derivatives are well defined because on the contours we have |Ψ_{ℓ,m}(ω_1, ω_2)| < 1 (simply use Cauchy–Schwarz and apply the results in [27, Appendix I]). Therefore, using the integration by parts formula we can alternatively express (28) in terms of the integral I(ω_1), where L_m(ω) is defined in (23). In order to solve this integral, we consider the singular value decomposition of the relevant matrix, where M̄ + 1 is the total number of singular values of R^{1/2} (including {0} when ℓ > 0). With all these definitions, we may express the function Ψ_{ℓ,m} accordingly, and note that this notation is valid for ℓ, m ≥ 0. Using the definition of φ_0^{(m)}, we can find an equivalent definition of the function L_m(ω_2) that will be more convenient from now on. In order to be able to handle both the undersampled and oversampled cases, we let φ̃_k^{(m)}, k = 1, . . . , M̄_m, denote the M̄_m roots of the equation Φ^{(m)}(φ̃) = 1, where Φ^{(m)}(φ) is defined as in (38) but replacing all quantities associated with R by the equivalent ones associated with R_m.

A. Computation of I(ω_1)
In order to find a closed-form expression for I(ω_1), we will follow the same approach as in the proof of Proposition 1 above. More specifically, we express I(ω_1) = I(ω_1, 1), where I(ω_1, x) is a differentiable function of x ∈ [0, 1] defined as above. Obviously, we will have I(ω_1, 0) = 0. On the other hand, using the DCT one can compute the partial derivative ∂I(ω_1, x)/∂x, for which a closed form expression follows from conventional residue calculus. One only needs to realize that, since φ_0^{(m)} is never enclosed by C_{ω_m} (the proof is similar to that of Lemma 3 below), the integrand only has singularities at the points {γ_k^{(m)}, k ≥ 1} together with the roots of the following lemma, which states that the corresponding equation has exactly M̄_m roots inside ∂Ω_m.
Proof: See Appendix F. It can easily be seen that {φ̃_k^{(m)}, k = 1, . . . , M̄_m} are always enclosed by the contour ∂Ω_m. Since the contour C_{ω_m}^+ always encloses the region Ω_m (cf. [27]), we can readily compute the value of ∂I(ω_1, x)/∂x using conventional Cauchy calculus, where Φ^{(m)′} denotes the derivative of Φ^{(m)}. We can therefore find a primitive of ∂I(ω_1, x)/∂x and force the value I(ω_1, 0) = 0 to fix the undetermined constant. In order to do this, we need to use the corresponding limits as x → 0. All this leads to the closed form expression of I(ω_1), where the auxiliary function has been defined so that it vanishes at zero, which fixes the undetermined constant. We can now simplify this expression using the following result.

Lemma 7
The following identities hold for any j ≥ 1. Proof: See Appendix J. By direct application of the identities in Lemma 7, we can simplify the resulting expression of Γ, which can alternatively be expressed as in (33).

APPENDIX D PROOF OF THEOREM 4
Let us consider the following real-valued random variable, with ω_ℓ(z) being the unique solution of the following equation that belongs to C^+ for z ∈ C^+. Let a_ℓ, ℓ = 1, . . . , L, denote real-valued bounded quantities, and consider the corresponding linear combination η. Let Ψ(u) = exp(iuη) and consider the characteristic function E[Ψ(u)]. The objective is to show that, in the limit when M, N → ∞, (47) holds pointwise in u, where a = [a_1, . . . , a_L]^T and where Γ is defined in the statement of Theorem 4. Given the boundedness assumptions in the statement of the theorem, the result will follow from a trivial modification of [35, Proposition 6]. The rest of the section is therefore devoted to showing (47). The first step of the proof consists in replacing the original contour C_ℓ, which depends on R̂_ℓ (it encloses all the eigenvalues of R̂_ℓ except zero), by a fixed deterministic contour that only depends on R_ℓ, which will be denoted by C̄_ℓ. To do this, we use the fact that, for all large M, N, the eigenvalues of R̂_ℓ are all located inside a certain support S_ℓ [23]. Unfortunately, the random variable η in (46) need not have a characteristic function for all M when C_ℓ is replaced with C̄_ℓ. This is because there might exist realizations for which the eigenvalues of R̂_ℓ become dangerously close to the contour C̄_ℓ, or even lie on it. In order to overcome this difficulty, we will follow the approach in [36], [37] and consider an equivalent (large-M) representation of η that is guaranteed to have a characteristic function for all M. Indeed, let us define S_ℓ^ε = {x ∈ R : dist(x, S_ℓ) ≤ ε} for ε > 0.
Assume that ε is small enough that S_ℓ^{2ε} does not contain {0}. Let φ_ℓ denote a smooth function φ_ℓ : R → [0, 1] such that φ_ℓ(x) = 1 for x ∈ S_ℓ^ε and φ_ℓ(x) = 0 for x ∈ R \ S_ℓ^{2ε}. We will write φ̂_ℓ = det φ_ℓ(R̂_ℓ). By [23], we know that φ̂_ℓ = 1 with probability one for all M sufficiently large. Therefore, we may represent η as in (48), almost surely for all M sufficiently large. The characteristic function of (48) exists for every realization and every possible M.
Having introduced this regularization term φ̂_ℓ and the deterministic contours, we are now in a position to introduce the main technical tools that will be used in the proof of this theorem. Following the approach in [35], our derivations will be based on the integration by parts formula for Gaussian functionals, together with the Poincaré–Nash inequality. We introduce these tools in the following proposition.
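For the reader's convenience, the real scalar prototypes of these two tools are standard (the complex matrix forms used below follow the same pattern): for x ∼ N(0, C) in R^n and a continuously differentiable, polynomially bounded f,

```latex
% Gaussian integration by parts (Stein's identity):
\mathbb{E}\left[x_i f(x)\right]
  \;=\; \sum_{j=1}^{n} C_{ij}\,
        \mathbb{E}\!\left[\frac{\partial f(x)}{\partial x_j}\right],
  \qquad x \sim \mathcal{N}(0, C),

% Poincare-Nash variance inequality:
\operatorname{Var}\left[f(x)\right]
  \;\le\; \mathbb{E}\!\left[\nabla f(x)^{T}\, C\, \nabla f(x)\right].
```

The first identity trades moments of Gaussian functionals for moments of their derivatives; the second shows that variances of smooth functionals of Gaussian vectors are controlled by their gradients, which is what produces the O(N^{−2}) variance decay used repeatedly below.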

Remark 1
In what follows, the symbol O(N^{−k}) will denote a general bivariate complex function that is bounded in magnitude by ε(z_1, z_2)N^{−k}, where ε(z_1, z_2) does not depend on N and satisfies (49). The function itself may be different from one line to another, and it may be matrix valued, in which case the bound in (49) is understood entrywise.

Proposition 2
Assume that, for each fixed z ∈ C, the function Ω(X, X^*, z) : R^{2MN} → C is continuously differentiable and such that both itself and its partial derivatives are polynomially bounded. If X is real valued, simply consider Ω as a function on R^{MN} with the same properties. Then, under (As1) we can write the integration by parts formula in (50). On the other hand, we can also write the Poincaré–Nash inequality in (51). The function φ̂_ℓ is continuously differentiable (on R^{2MN}) with polynomially bounded partial derivatives. If, in addition, sup_{z∈C} E|Ω(X, X^*, z)φ̂_ℓ|^2 < C for some positive deterministic C independent of M, then under (As1), for any r ∈ N, the stated approximations hold for E[Ω(X, X^*, z)φ̂_ℓ] and also for E[Ω(X, X^*, z)], where the term O(N^{−r}) should be understood as in Remark 1 above.
The above results are well known in the random matrix literature and the proof is therefore omitted. One of the conclusions of Proposition 2 is that we can essentially ignore the presence of the regularization term φ̂_ℓ up to an error of order O(N^{−m}) for any m ∈ N, which is irrelevant for the purposes of our derivations.
From now on we will therefore consider the definition of η in (48) and apply the above tools to investigate the asymptotic behavior of its characteristic function. Using the dominated convergence theorem, we can establish that E[Ψ(u)] is a differentiable function of u, and we can study the quantity inside the expectation of its derivative using the resolvent identity. We start by developing the corresponding term, which can be further expanded using the integration by parts formula in (50), together with the identity that follows directly from the definition of ω_ℓ(z_1). A direct application of these techniques and the resolvent identity allows us to write (54). In order to investigate the asymptotic behavior of E[m̂_ℓ(z_1)Ψ(u)φ̂_ℓ] from (54), we move the second term on the right hand side of (54) to the left hand side and multiply both sides by Q̄_ℓ(z_1), so that we obtain (56). Observe that the trace of the left hand side of this equation is directly equal to M E[(m̂_ℓ(z_1) − m̄_ℓ(z_1))Ψ(u)φ̂_ℓ], which is the original quantity that we want to analyze. Let us now analyze the behavior of the quantity E[α_ℓ(z_1)Ψ(u)φ̂_ℓ]. To do this, we multiply both sides of (56) by R_ℓ and take traces of the result, so that we can write (57), where we have defined the quantity β_ℓ(z_1) and where we have used the well known fact that |γ_{ℓ,m}(z_1, z_2)| < 1, so that the quantity 1 − γ_{ℓ,ℓ}(z_1, z_1) is always invertible. Consider the first term on the right hand side of (57). According to Lemma 8 presented below, the expectations of α_ℓ(z_1) and β_ℓ(z_1) are O(N^{−1}), and their variances decay as O(N^{−2}). Hence, we can write (by the Cauchy–Schwarz inequality) that the first term on the right hand side of (57) decays as O(N^{−1}).
Regarding the second term, we can use a similar argument together with Lemma 10 presented below to establish its asymptotic behavior. Furthermore, the approximations in Lemma 10 allow us to express the result in terms of the quantities γ_{ℓ,m}^{(r,s)}(z_1, z_2), defined for r, s ∈ N, so that in particular γ_{ℓ,m}(z_1, z_2) = γ_{ℓ,m}^{(1,1)}(z_1, z_2), and where we have used (55) together with the corresponding identity. Let us now go back to the expression in (56). Taking traces and applying the same decorrelation technique together with Lemma 10, we can write the result in terms of a quantity ξ(z_1). Now, the first term is of order O(N^{−1}) by the Cauchy–Schwarz inequality. As for the second term, we see that it corresponds to the one derived in (57). Hence, inserting the expression for E[α_ℓ(z_1)Ψ(u)] derived above and using the approximations in Lemma 10, we obtain an expression involving the terms γ_{ℓ,m}^{(2,1)}(z_1, z_2)/(1 − γ_{ℓ,m}(z_1, z_2))^2 + γ_{ℓ,m}^{(2,2)}(z_1, z_2)/(1 − γ_{ℓ,m}(z_1, z_2)).
Therefore, applying the change of variables z_1 → ω_ℓ(z_1), z_2 → ω_m(z_2), we arrive at the result of the theorem.

A. Auxiliary Lemmas
In this appendix, we provide some bounds on expectations and variances of different random functions of a complex variable. The notation O(N^{−k}) should be understood as a deterministic term whose magnitude is upper bounded by a quantity of the form ε(z_1, z_2)N^{−k}, where ε(z_1, z_2) is a bivariate real valued positive function independent of N such that sup_{(z_1,z_2)∈C_ℓ×C_m} ε(z_1, z_2) < ∞.

Lemma 8
Let A denote an M × M deterministic matrix with bounded spectral norm. Then, the expectation approximations displayed above hold, together with the corresponding variance bounds. Proof: It follows from the application of the integration by parts formula in (50) together with the Nash–Poincaré variance inequality in (51).

Lemma 9
Let A denote an M × M deterministic matrix with bounded spectral norm. Then, the approximations displayed above hold. Proof: The proof of the variance bounds follows directly from the Nash–Poincaré variance inequality in (51), so we will only prove the first two identities. Let us first consider the first identity. Using the resolvent identity on Q̂_m(z_2), developing with respect to X and applying the integration by parts formula in (50), we obtain an expression for N^{−1} E[tr(AQ̂_ℓ(z_1)R_{ℓ,m}Q̂_m(z_2))φ̂_ℓφ̂_m], where we have additionally decorrelated the expectations of two terms and used the approximations in Lemma 8 above. We can also apply the integration by parts formula to the second quantity, where we have again decorrelated expectations of products using the fact that all variances decay as O(N^{−2}), and subsequently applied the approximations in Lemma 8. Inserting this last equation, with A replaced by R_m^{H/2}A, into the first one, we obtain the desired identity. Particularizing this expression for A replaced with Q̂_m(z_2)R_{m,ℓ} and using Lemma 8, we obtain
(1/N) E[tr(R_{m,ℓ}Q̂_ℓ(z_1)R_{ℓ,m}Q̂_m(z_2))φ̂_ℓφ̂_m] = (1 − γ_{ℓ,m}(z_1, z_2))^{−1} (1/N) E[tr(Q̄_ℓ(z_1)R_{ℓ,m}Q̄_m(z_2)R_{m,ℓ})] + O(N^{−1}).