System-Level Modeling and Optimization of the Energy Efficiency in Cellular Networks -- A Stochastic Geometry Framework

In this paper, we analyze and optimize the energy efficiency of downlink cellular networks. With the aid of tools from stochastic geometry, we introduce a new closed-form analytical expression of the potential spectral efficiency (bit/sec/m$^2$). In the interference-limited regime for data transmission, unlike currently available mathematical frameworks, the proposed analytical formulation depends on the transmit power and deployment density of the base stations. This is obtained by generalizing the definition of coverage probability and by accounting for the sensitivity of the receiver not only during the decoding of information data, but during the cell association phase as well. Based on the new formulation of the potential spectral efficiency, the energy efficiency (bit/Joule) is given in a tractable closed-form formula. An optimization problem is formulated and is comprehensively studied. It is mathematically proved, in particular, that the energy efficiency is a unimodal and strictly pseudo-concave function in the transmit power, given the density of the base stations, and in the density of the base stations, given the transmit power. Under these assumptions, therefore, a unique transmit power and density of the base stations exist, which maximize the energy efficiency. Numerical results are illustrated in order to confirm the obtained findings and to prove the usefulness of the proposed framework for optimizing the network planning and deployment of cellular networks from the energy efficiency standpoint.


I. INTRODUCTION
The Energy Efficiency (EE) is regarded as a key performance metric towards the optimization of operational cellular networks, and the network planning and deployment of emerging communication systems [1]. The EE is defined as a benefit-cost ratio where the benefit is given by the amount of information data per unit time and area that can be reliably transmitted in the network, i.e., the network spectral efficiency, and the cost is represented by the amount of power per unit area that is consumed to operate the network, i.e., the network power consumption. Analyzing and designing a communication network from the EE standpoint necessitate appropriate mathematical tools, which are usually different from those used for optimizing the network spectral efficiency and the network power consumption individually [2]. The optimization problem, in addition, needs to be formulated in a sufficiently simple but realistic manner, so that all relevant system parameters appear explicitly and the utility function is physically meaningful.
Optimizing the EE of a cellular network can be tackled in different ways, which include [1]: the design of medium access and scheduling protocols for optimally using the available resources, e.g., the transmit power; the use of renewable energy sources; the development of innovative hardware for data transmission and reception; and the optimal planning and deployment of network infrastructure. In the present paper, we focus our attention on optimizing the average number of Base Stations (BSs) to be deployed (or to be kept operational) per unit area and their transmit power. Henceforth, this is referred to as "system-level EE" optimization, i.e., the EE across the entire (or a large portion of the) cellular network is the utility function of interest.
System-level analysis and optimization are useful when the network operators are interested in optimizing the average performance across the entire cellular network. Hence, they are relevant for optimally operating current networks, and for deploying and planning future networks. In the first case, given an average number of BSs per unit area already deployed, they may provide information on the average number of BSs that can be switched off based on the average load of the network, and on their optimal transmit power to avoid coverage holes. In the second case, they may guide the initial deployment of cellular infrastructure that employs new types of BSs (e.g., powered by renewable energy sources), new transmission technologies (e.g., large-scale antennas), or that operate in new frequency bands (e.g., the millimeter-wave spectrum).
In the last few years, the system-level modeling and analysis of cellular networks have been facilitated by capitalizing on the mathematical tool of stochastic geometry and, more precisely, on the theory of spatial point processes [3]- [5]. It has been empirically validated that, from the system-level standpoint, the locations of the BSs can be abstracted as points of a homogeneous Poisson Point Process (PPP) whose intensity coincides with the average number of BSs per unit area [6]. A comprehensive survey of recent results in this field of research is available in [7].
A relevant performance metric for the design of cellular networks is the Potential Spectral Efficiency (PSE), which is the network information rate per unit area (measured in bit/sec/m 2 ) that corresponds to the minimum signal quality for reliable transmission. Under the PPP modeling assumption, the PSE can be obtained in two steps: i) first by computing the PSE of a randomly chosen Mobile Terminal (MT) and by assuming a given spatial realization for the locations of the BSs and ii) then by averaging the obtained conditional PSE with respect to all possible realizations for the locations of the BSs and MTs. In the interference-limited regime, this approach allows one to obtain a closed-form expression of the PSE under the (henceforth called) standard modeling assumptions, i.e., single-antenna transmission, singular pathloss model, Rayleigh fading, fully-loaded BSs, cell association based on the highest average received power [3]. Motivated by these results, the PPP modeling approach for the locations of the BSs has been widely used to analyze the trade-off between the network spectral efficiency and the network power consumption, e.g., [8], as well as to minimize the network power consumption given some constraints on the network spectral efficiency or to maximize the network spectral efficiency given some constraints on the network power consumption [9]. The PPP modeling approach has been applied to optimize the EE of cellular networks as well. Notable examples for this field of research are [10]- [26]. A general study of the energy and spectral efficiencies of multi-tier cellular networks can be found in [27]. In the authors' opinion, however, currently available approaches for modeling and optimizing the systemlevel EE of cellular networks are insufficient and/or unsuitable for mathematical analysis. This is further elaborated in the next section.

A. Fundamental Limitations of Current Approaches for System-Level EE Optimization
We begin with an example that shows the limitations of the available analytical frameworks. In the interference-limited regime, under the standard modeling assumptions, the PSE is: where λ BS is the density of BSs, B W is the transmission bandwidth, γ D is the threshold for reliable decoding, β > 2 is the path-loss exponent, 2 F 1 (·, ·, ·, ·) is the Gauss hypergeometric function, P cov (·) is the coverage probability defined in [3,Eq. (1)], and (a) follows from [3,Eq. (8)]. The main strength of (1) is its simple closed-form formulation. This is, however, its main limitation as well, especially as far as formulating meaningful system-level EE optimization problems is concerned. Under the standard modeling assumptions, in fact, the network power consumption (Watt/m 2 ) is 1 P grid = λ BS (P tx + P circ ), where P tx is the transmit power of the BSs and P circ is the static power consumption of the BSs, which accounts for the power consumed in all hardware blocks, e.g., analog-to-digital and digital-to-analog converters, analog filters, cooling components, and digital signal processing [1]. The system-level EE (bit/Joule) is defined as the ratio between (1) and the network power consumption, i.e., EE = PSE/P grid . Since the PSE in (1) is independent of the transmit power of the BSs, P tx , and the network power consumption, P grid , linearly increases with P tx , we conclude that any EE optimization problems formulated based on (1) would result in the trivial optimal solution consisting of turning all the BSs off (the optimal transmit power is zero). In the context of multi-tier cellular networks, a similar conclusion has been obtained in some early papers on systemlevel EE optimization, e.g., [8], where it is shown that the EE is maximized if all macro BSs operate in sleeping mode. A system-level EE optimization problem formulated based on (1) would result, in addition, in a physically meaningless utility function, which provides a non-zero benefit-cost ratio, i.e., a strictly positive EE while transmitting zero power (EE (P tx = 0) = PSE/(λ BS P circ ) > 0). In addition, the EE computed from (1) is independent of the density of BSs. We briefly mention here, but will detail it in Section III, that the load model, i.e., the fully-loaded assumption, determines the conclusion that the EE does not depend on λ BS . This assumption, however, does not affect the conclusion that the optimal P tx is zero. This statement is made more formal in the sequel (see Proposition 1 and Corollary 1). It is worth nothing that the conclusion that the PSE is independent of P tx is valid regardless of the specific path-loss model being used 2 . It depends, on the other hand, on the assumptions of interference-limited operating regime and of having BSs that emit the same P tx .
Based on these observations, we conclude that a new analytical formulation of the PSE that explicitly depends on the transmit power and density of the BSs, and that is tractable enough for system-level EE optimization is needed. From an optimization point of view, in particular, it is desirable that the PSE is formulated in a closed-form expression and that the resulting EE function is unimodal and strictly pseudoconcave in the transmit power (given the density) and in the density (given the transmit power) of the BSs. This would imply, e.g., that the first-order derivative of the EE with respect to the transmit power of the BSs (assuming the density given) would have a unique zero, which would be the unique optimal transmit power that maximizes the EE [2]. Similar conclusions would apply to the optimal density of the BSs for a given transmit power. Further details are provided in Section IV. In this regard, a straightforward approach to overcome the limitations of (1) would be to abandon the interference-limited assumption and to take the receiver noise into account. In this case, the PSE would be formulated in terms of a single-integral that, in general, cannot be expressed in closed-form [3], [17,Eq. (9)]. This integral formulation, in particular, results in a system-level EE optimization problem that is not easy to tackle. This approach, in addition, has the inconvenience of formulating the optimization problem for an operating regime where cellular networks are unlikely to operate in practice.

B. State-of-the-Art on System-Level EE Optimization
We briefly summarize the most relevant research contributions on energy-aware design and optimization of cellular networks. Due to space limitations, we discuss only the contributions that are closely related to ours. A state-of-the-art survey on EE optimization is available in [2].
In [8], the authors study the impact of switching some macro BSs off in order to minimize the power consumption under some constraints on the coverage probability. Since the authors rely on the mathematical framework in (1), they conclude that all macro BSs need to be switched off to maximize the EE. In [9], the author exploits geometric programming to minimize the power consumption of cellular networks given some constraints on the network coverage and capacity. The EE is not studied. A similar optimization problem is studied in [11] and [17] for two-tier cellular networks but the EE is not studied either. As far as multi-tier cellular networks are concerned, an important remark is necessary. In the interference-limited regime, optimal transmit powers and densities for the different tiers of BSs may exist if the tiers have different thresholds for reliably decoding the data. The PSE, otherwise, is the same as that of single-tier networks, i.e., it is independent of the transmit power and density of the BSs. In [14], the authors study the EE of small cell networks with multi-antenna BSs. For some parameter setups, it is shown that an optimal density of the BSs exists. The EE, however, still decreases monotonically with the transmit power of the BSs, which implies that the EE optimization problem is not well formulated from the transmit power standpoint. More general scenarios are considered in [10], [12], [13], [15], [16], [18]- [25], but similar limitations hold. In some cases, e.g., [20], the existence and uniqueness of an optimal transmit power and density of the BSs are not mathematically proved or, e.g., in [24], the problem formulation has a prohibitive numerical complexity as it necessitates the computation of multiple integrals and infinite series. It is apparent, therefore, that a tractable approach for system-level EE optimization is missing in the open technical literature. In the present paper, we introduce a new definition of PSE that overcomes these limitations.

C. Research Contribution and Novelty
In the depicted context, the specific novel contributions made by this paper are as follows: • We introduce a new closed-form analytical formulation of the PSE for interference-limited cellular networks (during data transmission), which depends on the transmit power and density of the BSs. The new expression of the PSE is obtained by taking into account the power sensitivity of the receiver not only for data transmission but for cell association as well. • Based on the new expression of the PSE, a new systemlevel EE optimization problem is formulated and comprehensively studied. It is mathematically proved that the EE is a unimodal and strictly pseudo-concave function in the transmit power given the BSs' density and in the BSs' density given the transmit power. The dependency of the optimal power as a function of the density and of the optimal density as a function of the power is discussed. • A first-order optimal pair of transmit power and density of the BSs is obtained by using a simple alternating optimization algorithm whose details are discussed in the sequel. Numerical evidence of the global optimality of this approach is provided as well. • Two load models for the BSs are analyzed and compared against each other. It is shown that they provide the same PSE but have different network power consumptions. Hence, the optimal transmit power and density of the BSs that maximize their EEs are, in general, different. Their optimal EEs and PSEs are studied and compared against each other. The paper is organized as follows. In Section II, the system model is presented. In Section III, the new definition of PSE is introduced. In Section IV, the EE optimization problem is formulated and studied. In Section V, numerical results are shown. Finally, Section VI concludes the paper.
Notation: The main symbols and functions used in the present paper are reported in Table I.

II. SYSTEM MODEL
In this section, the network model is introduced. With the exception of the load model, we focus our attention on a system where the standard modeling assumptions hold. One of the main aims of the present paper is, in fact, to highlight the differences between currently available analytical frameworks and the new definition of PSE that is introduced. The proposed approach can be readily generalized to more advanced system models, such as that recently adopted in [5].

A. Cellular Network Modeling
A downlink cellular network is considered. The BSs are modeled as points of a homogeneous PPP, denoted by Ψ BS , of density λ BS . The MTs are modeled as another homogeneous PPP, denoted by Ψ MT , of density λ MT . Ψ BS and Ψ MT are independent of each other. The BSs and MTs are equipped with a single omnidirectional antenna. Each BS transmits with a constant power denoted by P tx . The analytical frameworks are developed for the typical MT, denoted by MT 0 , that is located at the origin (Slivnyak theorem [28,Th. 1.4.5]). The BS serving MT 0 is denoted by BS 0 . The cell association criterion is introduced in Section II-C. The subscripts 0, i and n identify the intended link, a generic interfering link, and a generic BS-to-MT link. The set of interfering BSs is denoted by Ψ (I) BS . As for data transmission, the network operates in the interference-limited regime, i.e., the noise is negligible compared with the inter-cell interference. PPP of base stations, mobile terminals, interfering base stations BS 0 , BS i , BS n Serving, interfering, generic base station P tx , P circ , P idle Transmit, circuits, idle power consumption of base stations r n , g n Distance, fading power gain of a generic link l (·), L n , L 0 Path-loss, shorthand of path-loss, path-loss of intended link κ, β > 0 Path-loss constant, slope (exponent) B W , N 0 Transmission bandwidth, noise power spectral density Noise variance, aggregate other-cell interference γ D , γ A Reliability threshold for decoding, cell association Probability that a base station is in transmission mode f X (·), F X (·) Probability density/mass, cumulative distribution/mass function of X ½ (·), 2 F 1 (·, ·, ·, ·), Γ(·) Indicator function, Gauss hypergeometric function, gamma function max {x, y}, min {x, y} Maximum, minimum between x and y Signal-to-interference-ratio, average signal-to-noise-ratio P cov , PSE, P grid Coverage, potential spectral efficiency, network power consumption First-order, second-order derivative with respect to x

B. Channel Modeling
For each BS-to-MT link, path-loss and fast-fading are considered. Shadowing is not explicitly taken into account because its net effect lies in modifying the density of the BSs [5]. All BS-to-MT links are assumed to be mutually independent and identically distributed (i.i.d.). a) Path-Loss: Consider a generic BS-to-MT link of length r n . The path-loss is l (r n ) = κr β n , where κ and β are the path-loss constant and the path-loss slope (exponent). For simplicity, only the unbounded path-loss model is studied in the present paper. The analysis of more general path-loss models is an interesting but challenging generalization that is left to future research [29].
b) Fast-Fading: Consider a generic BS-to-MT link. The power gain due to small-scale fading is assumed to follow an exponential distribution with mean Ω. Without loss of generality, Ω = 1 is assumed. The power gain of a generic BS-to-MT link is denoted by g n .

C. Cell Association Criterion
A cell association criterion based on the highest average received power is assumed. Let BS n ∈ Ψ BS denote a generic BS of the network. The serving BS, BS 0 , is obtained as follows: where the shorthand L n = l (r n ) is used. As for the intended link, L 0 = min rn∈ΨBS {L n } holds.

D. Load Modeling
Based on (2), several or no MTs can be associated to a generic BS. In the latter case, the BS transmits zero power, i.e., Ptx = 0, and, thus, it does not generate inter-cell interference.
In the former case, on the other hand, two load models are studied and compared against each other. The main objective is to analyze the impact of the load model on the power consumption and EE of cellular networks. Further details are provided in the sequel. Let N MT denote the number of MTs associated to a generic BS and B W denote the transmission bandwidth available to each BS. If N MT = 1, for both load models, the single MT associated to the BS is scheduled for transmission and the entire bandwidth, B W , and transmit power, P tx , are assigned to it. a) Load Model 1: Exclusive Allocation of Bandwidth and Power to a Randomly Selected MT: If N MT > 1, the BS randomly selects, at each transmission instance, a single MT among the N MT associated to it. Also, the BS allocates the entire transmission bandwidth, B W , and the total transmit power, P tx , to it. The random scheduling of the MTs at each transmission instance ensures that, in the long term, all the MTs associated to a BS are scheduled for transmission. b) Load Model 2: Equal Allocation of Bandwidth and Power Among All the MTs: If N MT > 1, the BS selects, at each transmission instance, all the N MT MTs associated to it. The BS equally splits the available transmission bandwidth, B W , and evenly spreads the available transmit power, P tx , among the N MT MTs. Thus, the bandwidth and power are viewed as continuous resources by the BS's scheduler: each MT is assigned a bandwidth equal to B W /N MT and the power spectral density at the detector's (i.e., the typical MT, MT 0 ) input is equal to P tx /B W .
In the sequel, we show that the main difference between the two load models lies in the power consumption of the BSs. In simple terms, the more MTs are scheduled for transmission the higher the static power consumption of the BSs is. The analysis of general load models, e.g., based on a discrete number of resource blocks [5], is left to future research due to space limits.

E. Power Consumption Modeling
In the considered system model, the BSs can operate in two different modes: i) they are in idle mode if no MTs are associated to them and ii) they are in transmission mode if at least one MT is associated to them. The widespread linear power consumption model for the BSs is adopted [1], [30], which accounts for the power consumption due to the transmit power, P tx , the static (circuit) power, P circ , and the idle power, P idle . If the BS is in idle mode, its power consumption is equal to P idle . If the BS is in transmission mode, its power consumption is a function of P tx , P circ , and depends on the load model. Further details are provided in the sequel. In the present paper, based on physical considerations, the inequalities 0 ≤ P idle ≤ P circ are assumed.

III. A NEW ANALYTICAL FORMULATION OF THE PSE
In this section, we introduce and motivate a new definition of coverage probability, P cov , and PSE, which overcomes the limitations of currently available analytical frameworks and is suitable for system-level optimization (see Section I-A). All symbols are defined in Table I.
Definition 1: Let γ D and γ A be the reliability thresholds for the successful decoding of information data and for the successful detection of the serving BS, BS 0 , respectively. The coverage probability, P cov , of the typical MT, MT 0 , is defined as follows: where the Signal-to-Interference-Ratio (SIR) and the average Signal-to-Noise-Ratio (SNR) can be formulated, for the network model under analysis, as follows: Remark 1: The definition of P cov in (3) reduces to the conventional one if γ A = 0 [3].
Remark 2: The average SNR, SNR, in (4) is averaged with respect to the fast fading. The SIR depends, on the other hand, on fast fading. This choice is discussed in the sequel.
Remark 3: The new definition of coverage probability, P cov , in (3) is in agreement with the cell selection criterion specified  (3)), which prevents one from taking into account the strong interplay between the transmit power and the density of the BSs for optimal cellular networks planning. In fact, the authors of [3] have shown that, in the interference-limited regime, P cov is independent of the transmit power of the BSs. If, in addition, a fullyloaded model is assumed, i.e., λ MT /λ BS ≫ 1, then P cov is independent of the density of BSs as well. This is known as the invariance property of P cov as a function of P tx and λ BS [5]. The tight interplay between P tx and λ BS is, on the other hand, illustrated in Fig. 1, where, for ease of representation, an hexagonal cellular layout is considered. Similar conclusions apply to the PPP-based cellular layout studied in the present paper. In Fig. 1, it is shown that, for a given λ BS , P tx needs to be appropriately chosen in order to guarantee that, for any possible location of MT 0 in the cell, two conditions are fulfilled: i) the MT receives a sufficiently good signal quality, i.e., the average SNR is above a given threshold, γ A , that ensures a successful cell association, i.e., to detect the presence (pilot signal) of the serving BS and ii) the BSs do not overprovision P tx , which results in an unnecessary increase of the power consumption. It is expected, therefore, that an optimal value of P tx given λ BS and an optimal value of λ BS given P tx that optimize EE exist [32].
b) Advantages of the New Definition of P cov : The new definition of P cov allows one to overcome the limitations of the conventional definition and brings about two main advantages. The first advantage originates from direct inspection of (4). In the conventional definition of P cov , only the SIR is considered and the transmit power of the BSs, P tx , cancels out between numerator and denominator. This is the reason why P cov is independent of P tx . In the proposed new definition, on the other hand, P tx explicitly appears in the second constraint and does not cancel out. The density of the BSs, λ BS , appears implicitly in the distribution of the path-loss of the intended link, L 0 . The mathematical details are provided in the sequel. The second inequality, as a result, allows one to explicitly account for the interplay between P tx and λ BS (shown in Fig.  1). If λ BS increases (decreases), in particular, L 0 decreases (increases) in statistical terms. This implies that P tx can be decreased (increased) while still ensuring that the average SNR is above γ A . The second advantage is that the new definition of P cov is still mathematically tractable and the PSE is formulated in a closed-form expression. This is detailed in Proposition 1.
Remark 4: The new definition of P cov in (3) is based on the actual value of L 0 because a necessary condition for the typical MT to be in coverage is that it can detect the pilot signal of at least one BS during the cell association. If the BS that provides the highest average received power cannot be detected, then any other BSs cannot be detected either. The second constraint on the definition of P cov , in addition, is based on the average SNR, i.e., the SNR averaged with respect to the fast fading, because the cell association is performed based on long-term statistics, i.e., based on the path-loss in the present paper, in order to prevent too frequent handovers.
Remark 5: Compared with the conventional definition of coverage based on the Signal-to-Interference+Noise-Ratio (SINR) [3], the new definition in (3) is conceptually different. Equation (3) accounts for the signal quality during both the cell association and data transmission phases. The definition of coverage based on the SINR, on the other hand, accounts for the signal quality only during the data transmission phase. In spite of this fundamental difference, P cov in (3) may be interpreted as an approximation for the coverage probability based on the SINR, and, more precisely, as an alternative method to incorporate the thermal noise into the problem formulation. Compared with the coverage based on the SINR, however, the new definition in (3) accounts for the impact of thermal noise when it is the dominant factor, i.e., during the cell association phase when the inter-cell interference can be ignored as orthogonal pilot signals are used.
Remark 6: Figure 1 highlights that the new definition of coverage in (3) is not only compliant with [31] but it has a more profound motivation and wider applicability. In PPPbased cellular networks, in contrast to regular grid-based network layouts, the size and shape of the cells are random. This implies that it is not possible to identify a relation, based on pure geometric arguments, between the cell size and the transmit power of the BSs that makes the constraint on SNR in (3) ineffective in practice. In equivalent terms, in this case, the threshold γ A may turn out to be sufficiently small to render the constraint on SNR ineffective. This is, e.g., the approach employed in [32,Eq. (1)], where the relation between the transmit power and density of BSs is imposed a priori based on the path-loss. In practice, however, cellular networks are irregularly deployed, which makes the optimal relation between the transmit power and density of BSs difficult to identify because of the coexistence of cells of small and large sizes. The constraint on SNR in (3) allows one to take into account the interplay between the transmit power and density of BSs in irregular (realistic) cellular network deployments.

A. Analytical Formulation of the PSE
In this section, we provide the mathematical definitions of the PSE for the two load models introduced in Section II-D. They are summarized in the following two lemmas, which constitute the departing point to obtain the closed-form analytical frameworks derived in Section III-B.
Remark 7: The PSE is defined from the perspective of the typical MT, MT 0 rather than from the perspective of the typical cell (or BS). This implies that the proposed approach allows one to characterize the PSE of the so-called Crofton cell, which is the cell that contains MT 0 . This approach is commonly used in the literature and is motivated by the lack of results on the explicit distribution of the main geometrical characteristics of the typical cell of a Voronoi tessellation. Further details on the Crofton and typical cells are available in [34] and [35].
LetN MT be the number of MTs that lie in the cell of the typical MT, MT 0 , with the exception of MT 0 .N MT is a discrete random variable whose probability mass function in the considered system model can be formulated, in an approximated closed-form expression, as [33, Eq. (3)]: Remark 8: The probability mass function in (5) is an approximation because it is based on the widely used empirical expression of the probability density function of the area of the Voronoi cells in [36,Eq. (1)]. A precise formula for the latter probability density function is available in [37]. It is, however, not used in the present paper due to its mathematical intractability, as recently remarked in [26]. Throughout the rest of the paper, for simplicity, we employ the sign of equality ("=") in all the analytical formulas that rely solely on the approximation in (5). This is to make explicit that our analytical frameworks are not based on any other hidden approximations.
Based on (5), a formal mathematical formulation for the PSE is given as follows.
Lemma 1: Let Load Model 1 be assumed. The PSE (bit/sec/m 2 ) can be formulated as shown in (6) at the top of this page.
Proof : It follows from the definition of PSE [5], where (a) originates from the fact that MT 0 is scheduled for transmission Pr N MT = u u + 1 .
with unit probability if it is the only MT in the cell, while it is scheduled for transmission with probability 1/(u + 1) if there are other u MTs in the cell.

Lemma 2:
Let Load Model 2 be assumed. The PSE (bit/sec/m 2 ) can be formulated as shown in (7) at the top of this page.
Proof : It follows from the definition of PSE [5], where (b) originates from the fact that MT 0 is scheduled for transmission with unit probability but the bandwidth is equally allocated among the MTs in the cell, i.e., each of the u + 1 MTs is given a bandwidth equal to B W /(u + 1).

Remark 9:
By comparing (6) and (7), we note that the same PSE is obtained for both load models. This originates from the fact that P cov in (3) is independent of the number of MTs in the cell. This property follows by direct inspection of (4) and has been used in the proof of Lemma 1 and Lemma 2. As far as the first load model is concerned, this property originates from the fact that a single MT is scheduled at every transmission instance. It is, however, less intuitive for the second load model. In this latter case, as mentioned in Section II-D, P tx and B W are viewed as continuous resources by the BS's scheduler. The transmit power per unit bandwidth of both intended and interfering links is equal to P tx /B W . Regardless of the number of MTs available in the interfering cells, MT 0 "integrates" this transmit power per unit bandwidth over the bandwidth allocated to it, which depends on the total number of MTs in its own cell. Let the number of these MTs be u+1. Thus, the receiver bandwidth of MT 0 is B W /(u + 1). This implies that the received power (neglecting path-loss and fast-fading) of both intended and interfering links is P rx = (P tx /B W ) (B W /(u + 1)) = P tx /(u + 1). As a result, the number of MTs, u+1, cancels out in the SIR of (4). Likewise, the received average SNR (neglecting the path-loss) is equal to P rx /(N 0 B W /(u + 1)) = (P tx /(u + 1))/(N 0 B W /(u + 1)) = P tx σ 2 N , which is independent of the number of MTs, u + 1, and agrees with the definition of average SNR in (4). In the next section, we show that the load models are not equivalent in terms of network power consumption.

B. Closed-Form Expressions of PSE and P grid
In this section, we introduce new closed-form analytical frameworks for computing the PSE. We provide, in addition, closed-form expressions of the network power consumption for the two load models under analysis. These results are summarized in the following three propositions.
Let N MT be the number of MTs that lie in an arbitrary cell. The probability that the BS is in idle mode, P where L (·) is defined in Table II. Using (8), PSE and P grid are given in the following propositions. Proposition 1: Consider either Load Model 1 or Load Model 2. Assume notation and functions given in Tables I and II. The PSE (bit/sec/m 2 ) can be formulated, in closed-form, as follows: Proof : See Appendix A. Corollary 1: If γ A = 0, i.e., the conventional definition of P cov is used, the PSE in (9) simplifies as follows: If, in addition, λ MT /λ BS ≫ 1, the PSE in (9) reduces to (1).
Remark 10: Corollary 1 substantiates the comments made above in this section about the need of a new definition of PSE, . (14) as well as the advantages of the proposed analytical formulation. In particular, (10) confirms that the PSE is independent of P tx if γ A = 0 and that the PSE is independent of P tx and λ BS if fully-loaded conditions hold, i.e., λ MT /λ BS ≫ 1. Proposition 2: Let Load Model 1 be assumed. P grid (Watt/m 2 ) can be formulated as follows: Proof : The network power consumption is obtained by multiplying the average number of BSs per unit area, i.e., λ BS , and the average power consumption of a generic BS, which is P tx + P circ if the BS operates in transmission mode, i.e., with probability L (λ MT /λ BS ), and P idle if the BS operates in idle mode, i.e., with probability 1 − L (λ MT /λ BS ).
Proposition 3: Let Load Model 2 be assumed. P grid (Watt/m 2 ) can be formulated as follows: Proof : It is similar to the proof of Proposition 2. The difference is that the power dissipation of a generic BS that operates in transmission mode is, in this case, equal to P tx +P circ Remark 11: The power consumption models obtained in (11) and (12), which account for the transmit, circuits, and idle power consumption of the BSs, have been used, under some simplifying assumptions, in previous research works focused on the analysis of the EE of cellular networks. Among the many research works, an early paper that has adopted this approach under the assumption of fully-loaded BSs and of having a single active MT per cell is [8].
Remark 12: grid by assuming the same P tx and λ BS for both load models. This originates from the fact that, in the present paper, we assume that the circuits power consumption increases with the number of MTs that are served by the BSs. It is unclear, however, the best load model to be used from the EE standpoint, especially if P tx and λ BS are optimized to maximize their respective EEs. In other words, the optimal P tx and λ BS that maximize the EE of each load model may be different, which may lead to different optimal EEs. The trade-off between the optimal PSE and the optimal EE is analyzed numerically in Section V for both load models.

IV. SYSTEM-LEVEL EE OPTIMIZATION: FORMULATION
AND SOLUTION In this section, we formulate a system-level EE optimization problem and comprehensively analyze its properties. For convenience of analysis, we introduce the following auxiliary function (LM = Load Model): A unified formulation of the EE (bit/Joule) for the cellular network under analysis is provided in (14) shown at the top of this page, where the parameters of interest from the optimization standpoint, i.e., P tx and λ BS , are explicitly highlighted. In the rest of the present paper, all the other parameters are assumed to be given.

A. Preliminaries
For ease of presentation, we report some lemmas that summarize structural properties of the main functions that constitute (14). Some lemmas are stated without proof because they are obtained by simply studying the sign of the first-order and second-order derivatives of the function with respect to the variable of interest and by keeping all the other variables fixed. Functions of interest for this section are given in Table  II. Also, we define ∆P = P circ − P idle ≥ 0. ..
Lemma 4: As far as Load Model 2 is concerned, the function M (λ MT /λ BS ) fulfills the following properties with respect to . ..
The function Q (λ BS , P tx , λ MT /λ BS ) fulfills the following properties with respect to P tx : Proof : The result in v) follows from ..

B. Optimal Transmit Power Given the Density of the BSs
In this section, we analyze whether there exists an optimal and unique transmit power, P (opt) tx , that maximizes the EE formulated in (14), while all the other parameters, including λ BS , are fixed and given. In mathematical terms, the optimization problem can be formulated as follows: where P (min) tx ≥ 0 and P (max) tx ≥ 0 are the minimum and maximum power budget of the BSs, respectively. One may assume, without loss of generality, P The following theorem completely characterizes the solution of (15).
Theorem 1: Let S P (·) be the function defined in Table II. The EE in (14) is a unimodal and strictly pseudo-concave function in P tx . The optimization problem in (15) has a unique solution given by P .
Proof : See Appendix B.

C. Optimal Density Given the Transmit Power of the BSs
In this section, we analyze whether there exists an optimal and unique density of BSs, λ (opt) BS , that maximizes the EE formulated in (14), while all the other parameters, including P tx , are fixed and given. In mathematical terms, the optimization problem can be formulated as follows: where λ The following theorem completely characterizes the solution of (17).
Theorem 2: Let S D (·) be the function defined in Table II. The EE in (14) is a unimodal and strictly pseudo-concave function in λ BS . The optimization problem in (17) has a unique solution given by λ  (14) that is obtained as the unique solution of the following equation: .
Proof : See Appendix C.

D. On the Dependency of Optimal Transmit Power and Density of the BSs
The optimal transmit power and BSs' density that maximize the EE are obtained from the unique solutions of (16) and (18), respectively. These equations, however, cannot be further simplified and, therefore, explicit analytical expressions for P (opt) tx and λ (opt) BS cannot, in general, be obtained. This is an inevitable situation when dealing with EE optimization problems, and, indeed, a closed-form expression of the optimal transmit power for simpler EE optimization problems does not exist either [1]. In some special cases, the transmit power can be implicitly expressed in terms of the Lambert-W function, which, however, is the solution of a transcendental equation [2]. Notable examples of these case studies include even basic point-to-point communication systems without interference [40]. Based on these considerations, it seems hopeless to attempt finding explicit analytical expressions from (16) and (18), respectively. However, thanks to the properties of the EE function, i.e., unimodality and strict pseudo-concavity, proved in Theorem 1 and Theorem 2, P can be efficiently computed with the aid of numerical methods that are routinely employed to obtain the roots of non-linear scalar equations, e.g., the Newton's method [42]. For example, the unique solutions of (16) and (18) may be obtained by using the functions FSolve in Matlab and NSolve in Mathematica. Theorem 1 and Theorem 2 are, however, of paramount importance, since they state that an optimum maximizer exists and is unique.
Even though explicit analytical formulas for P (opt) tx and λ (opt) BS cannot be obtained, it is important to understand how these optimal values change if any other system parameter changes. For instance, two worthwhile questions to answer are: "How does P (opt) tx change as a function of λ BS ?" and "How does λ (opt) BS change as a function of P tx ?". These questions are relevant to optimize the deployment of cellular networks from the EE standpoint, since they unveil the inherent interplay between transmit power and density of BSs discussed in Section III and illustrated in Fig. 1. A general answer to these two questions is provided in the following two propositions.
Proposition 4: Let P * tx be the unique solution of (16) if λ BS = λ BS . Let the optimal P tx according to Theorem 1 be P . Let λ BS ≶ λ BS be another BSs' density. Let . EE Ptx (·, ·) be the first-order derivative in (16). The following holds: Proof : Theorem 1 states that the EE function has a single stationary point that is its unique global maximizer. In mathematical terms, this implies tx for every λ BS ≥ 0. Therefore, the optimal transmit power needs to be increased (decreased) if the first-order derivative of the EE is positive (negative). Based on this, (19)  EE λBS (·, ·) be the first-order derivative in (18). The following holds: Proof : It follows from Theorem 2, similar to the proof of Proposition 4.
Remark 13: It is worth mentioning that the approach utilized to prove Proposition 4 and Proposition 5 is applicable to study the dependency of P (opt) tx and λ (opt) BS , respectively, with respect to any other system parameters. The findings in Proposition 4 and Proposition 5 are especially relevant for cellular network planning. Let us consider, e.g., (19). By simply studying the sign of the first-order derivative . EE Ptx (·, ·), one can identify, with respect to an optimally deployed cellular network, the set of BSs' densities that would require to increase or decrease the transmit power while still operating at the optimum. In Section V, numerical examples are shown to highlight that P (opt) tx may either decrease or increase as λ BS increases or decreases.

E. Joint Optimization of Transmit Power and Density of the BSs
In Sections IV-B and IV-C, either λ BS or P tx are assumed to be given, respectively. In practical applications, however, it is important to identify the optimal pair P that jointly maximizes the EE in (14). This joint optimization problem can be formulated as follows: where a notation similar to that used in (15) and (17) is adopted.
In Theorem 1 and Theorem 2, we have solved the optimization problem formulated in (21) with respect to P tx for a given λ BS and with respect to λ BS for a given P tx , respectively. By leveraging these results, a convenient approach for tackling (21) with respect to P tx and λ BS is to utilize the alternating optimization method, which iteratively optimizes P tx for a given λ BS and λ BS for a given P tx until convergence of the EE in (14) within a desired level of accuracy [41, Proposition 2.7.1]. The algorithm that solves (21) based on the alternating optimization method is reported in Table III. Its convergence and optimality properties are summarized as follows.
BS (m), and EE(m) be P tx , λ BS and EE obtained from the algorithm in Table III at the mth iteration, respectively. The sequence EE(m) is monotonically increasing and converges. In addition, every limit point of the sequence P Proof : At the end of each iteration of the algorithm in Table  III, the value of EE does not decrease. The sequence EE(m), hence, converges, because the EE in (14) is a continuous function over the compact feasible set of the problem in (21) and, thus, it admits a finite maximum by virtue of the Weierstrass extreme value theorem [41]. From [41, Proposition 2.7.1], the alternating optimization method fulfills the KKT optimality conditions, provided that i) the objective and constraint functions are differentiable, ii) each constraint function depends on a single variable, and iii) each subproblem has a unique solution. The first and second requirements follow by direct inspection of (21). The third requirement is ensured by Theorem 1 and Theorem 2.
Remark 14: The optimization problems in Theorem 1 and Theorem 2 can be efficiently solved by using the Newton's method, which allows one to find the root of real-valued objective functions via multiple iterations of increasing accuracy and at a super-linear (i.e., quadratic if the initial guess is sufficiently close to the actual root) convergence rate [42]. The properties of convergence of the alternating maximization algorithm in Table III to a stationary point of the objective function in (21) are discussed in [41, Proposition 2.7.1]. Under mild assumptions that hold for the specific problem at hand, the algorithm in Table III is locally q-linearly convergent to a local maximizer of the objective function provided that the initial guess is sufficiently close to the actual root [43,Section 2]. Further details can be found in [43].
In Section V, numerical evidence of the global optimality of the algorithm in Table III is given as well. In addition, numerical results on the average (with respect to the initial guess) number of iterations as a function of the tolerance of convergence, ǫ > 0, are illustrated.

V. NUMERICAL RESULTS
In this section, we show numerical results to validate the proposed analytical framework for computing the PSE and EE, as well as to substantiate the findings originating from the analysis of the system-level EE optimization problems as a function of the transmit power and density of the BSs. Unless otherwise stated, the simulation setup is summarized in Table  IV. For ease of understanding, the BSs' density is represented via the inter-site distance (R cell ) defined in Table IV. A similar comment applies to the density of the MTs that is expressed in terms of their average distance (R MT ). As far as the choice of the setup of parameters is concerned, it is worth mentioning that the power consumption model is in agreement with [8] and [30]. according to its definition in (6) and (7), as well as the power consumption based on the operating principle described in the proofs of Proposition 2 and Proposition 3. It is worth mentioning that, to estimate the PSE, only the definitions in the first line of (6) and (7) are used. The results depicted in Figs. 3 and 4 confirm the good accuracy of the proposed mathematical approach. They highlight, in addition, the unimodal and pseudo-concave shape of the EE as a function of the transmit power, given the BSs' density, and of the BSs' density, given the transmit power. If the same transmit power and BSs' density are assumed for both load models, we observe, as expected, that the first load model provides a better EE than the second load model. b) Validation of Theorem 1 and Theorem 2: In Figs. 5 and 6, we compare the optimal transmit power and BSs' density obtained from Theorem 1 and Theorem 2, i.e., by computing the unique zero of (16) and (18), respectively, against a brute-force search of the optimum of (15) and (17), respectively. We observe the correctness of Theorem 1 and Theorem 2 for the load models analyzed in the present paper. Figures 5 and 6, in addition, confirm two important remarks that we have made throughout this paper. The first is that a joint pair of transmit power and BSs' density exists. This is highlighted by the fact that the EE evaluated at the optimal transmit power, given the BSs' density, and at the optimal BSs' density, given the transmit power, is still a unimodal and pseudo-concave function. This motivates one to use the alternating optimization algorithm proposed in Section IV-E. The second is related to the difficulty of obtaining an explicit closed-form expression of the optimal transmit power as a function of the BSs' density and of the BSs' density as a function of the transmit power. Figure 6(a), for example,  clearly shows that the behavior of the optimal transmit power is not monotonic as a function of the BSs' density. This is in contrast with heuristic optimization criteria based on the coverage probability metric [32]. Figure 5(a), on the other hand, provides more intuitive trends according to which the optimal transmit power increases as the density of the BSs decreases. This is, however, just a special case that is parameter-dependent. A counter-example is, in fact, illustrated in Fig. 2, where, for a different set of parameters, it is shown that the optimal transmit power may increase, decrease and then increase again as a function of the average inter-site distance of the BSs (R cell ). In this case, the density of the MTs coincides with the average density of inhabitants in Sweden and a large path-loss exponent is assumed to highlight the peculiar performance trend. These numerical examples clearly substantiate the importance of Theorem 1 and Theorem 2, and  Table III  highlight the complexity of the optimization problem that is analyzed and successfully solved in the present paper.
c) Validation of the Alternating Optimization Algorithm in Table III: In Figs. 7 and 8, we provide numerical evidence of the convergence of the alternating optimization algorithm introduced in Section IV-E towards the global optimum of the optimization problem formulated in (21). The study is performed by computing the joint optimal transmit power and BSs' density as a function of the density of the MTs (Fig. 7) and of the reliability thresholds (Fig. 8). We observe a very good agreement between the algorithm in Table III and a bruteforce search of the optimum of (21). Similar studies have been conducted as a function of other system parameters, but they are not reported in the present paper due to space limitations. d) Comparison Between Load Model 1 and 2: With the exception of Figs. 3 and 4, all the figures reported in this section illustrate the achievable EE of the two load models analyzed in the present manuscript when they operate at their respective optima. Based on the obtained results, we conclude that, for the considered system setup, the first load model outperforms the second one in terms of EE. Figures  7 and 8 show, for example, that this may be obtained by transmitting a higher power but, at the same time, by reducing the deployment density of the BSs. It is worth mentioning that, even though both load models provide the same PSE and serve, in the long time-horizon, all the MTs of the network, they have one main difference: the MTs under the first load model experience a higher latency (i.e., the MTs experience a longer delay before being served, since they are randomly chosen among all the available MTs in the cell), since a single MT is served at any time instance. We evince, as a result, that the higher EE provided by the first load model is obtained at the price of increasing the MTs' latency. The analysis and optimization of energy-efficient cellular networks with latency constraints is, therefore, an important generalization of the study conducted in the present paper. e) Analysis of the EE vs. PSE Trade-Off: In Fig. 9, we illustrate the trade-off between EE and PSE, which is obtained by setting the transmit power and density of the BSs at the optimal values that are obtained by solving the optimization problem in (21) with the aid of the algorithm in Table III. Figure 9 provides a different view of the comparison between Load Model 1 and 2 introduced in Section II-D. The Load Model 1 is a suitable choice to obtain a high EE at low-medium PSEs, while the Load Model 2 is a more convenient option torequired to converge within obtain a good EE at medium-high PSEs. Based on these results, the optimization of the EE vs. PSE trade-off constitutes an interesting generalization of the study carried out in the present paper.
f) Convergence Analysis of the Maximization Algorithm in Table III: Motivated by Remark 14, Fig. 10 shows the average number of iterations of the alternating optimization algorithm in Table III as a function of the convergence accuracy ǫ. We observe that the algorithm necessitates more iterations for Load Model 1. In general, however, we observe that the number of iterations that are required to converge within the defined convergence accuracy is relatively small.

VI. CONCLUSION
In the present paper, we have introduced a new closedform analytical expression of the potential spectral efficiency of cellular networks. Unlike currently available analytical frameworks, we have shown that the proposed approach allows us to account for the tight interplay between transmit power and density of the base stations in cellular networks. Therefore, the proposed approach is conveniently formulated for the optimization of the network planning of cellular networks, by taking into account important system parameters. We have applied the new approach to the analysis and optimization of the energy efficiency of cellular networks. We have mathematically proved that the proposed closed-form expression of the energy efficiency is a unimodal and strictly pseudo-concave function in the transmit power, given the density, and in the density, given the transmit power of the base stations. Under these assumptions, as a result, a unique transmit power and density of the base stations exist, which can be obtained by finding the unique zero of a simple non-linear function that is provided in a closed-form expression. All mathematical derivations and findings have been substantiated with the aid of numerical simulations. We argue that the applications of the proposed approach to the system-level modeling and optimization of cellular networks are countless and go beyond the formulation of energy efficiency problems.
Extensions and generalizations of the analytical and optimization frameworks proposed in the present paper, include, but are not limited to, the system-level analysis and optimization of i) the energy efficiency versus spectral efficiency trade-off, ii) uplink cellular networks, iii) three-dimensional network topologies with elevated base stations and spatial blockages, iv) cache-enabled cellular networks, v) cellular networks with network slicing, vi) cellular networks with renewable energy sources and energy harvesting, and vii) multi-tier (heterogeneous) cellular networks.

APPENDIX A PROOF OF PROPOSITION 1
Under the assumption that MT 0 is selected, from (3) and (4), we have: where f L0 (x) = 2πλ BS κ 2/β β −1 x 2/β−1 e −πλBS(x/κ) 2/β is the probability density function of L 0 that is obtained by applying the displacement theorem of PPPs [5, Eq. (21)]. It is worth mentioning that (22) is exact if the Crofton cell is considered, while it is an approximation if the typical cell is considered (see Remark 7 for further details). The probability term, G (·; ·), in the integrand function of (22) can be computed as follows: where (a) follows from the probability generating functional theorem of PPPs [3] by taking into account that, based on (8), the interfering BSs constitute a PPP of intensity equal to λ BS is, in particular, obtained with the aid of the independent thinning theorem of PPPs, similar to [5] and [38]. The impact of the spatial correlation that exists among the BSs that operate in transmission mode [39], is, on the other hand, postponed to future research.
By inserting (23) in (22) and by applying some changes of variable, we obtain: The proof follows from (6) and (7) with the aid of some simplifications and by using the identity

APPENDIX B PROOF OF THEOREM 1
In this section, we are interested in the functions that depend on P tx . For ease of writing, we adopt the simplified notation: P tx → P, L (·) → L, M (·) → M, Q (·, P tx , ·) → Q (P), . Q Ptx (·, P tx , ·) → . Q (P), P circ = P c , P idle = P i , EE (P tx , ·) → EE (P), and . EE Ptx (P tx , ·) → . EE (P). A similar notation is adopted for higher-order derivatives with respect to P.
The stationary points of (14) are the zeros of the first-order derivative of EE (·) with respect to P. From (14), we obtain . EE (P) = 0 ⇔ P i − S P (P) = 0, which can be re-written as follows: With the aid of some algebraic manipulations and by exploiting Lemmas 3-6, the following holds: i) W right ≥ 0 is a non-negative function that is independent of P, ii) W left (P) ≥ 0 is a non-negative and increasing function of P, i.e., . W left (P) ≥ 0, since Q (P) ≥ 0 and .. Q (P) ≤ 0 from Lemma 5, iii) W left (P → 0) = 0 and W left (P → ∞) = ∞. This implies that W left (·) and W right intersect each other in just one point. Therefore, a unique stationary point, P * , exists. Also, . EE (P) > 0 for P < P * and . EE (P) < 0 for P > P * . Finally, by taking into account the constraints on the transmit power, it follows that the unique optimal maximizer of the EE is P (opt) = max P (min) , min P * , P (max) , since P ∈ P (min) , P (max) . This concludes the proof.
. . W 3 (λ) ≤ 0 for λ ≥ 0 and . W 5 (λ) ≤ 0 for λ ≥ 0 immediately follow by inserting into them the firstorder derivatives of L (·) and M (·) with respect to λ and with the aid of simple algebraic manipulations. Less evident is the behavior of W 1 (·) as a function of λ. Using some algebra, the first-order derivative satisfies the following: . ..
A sufficient condition for W 1 (·) to be a decreasing function in λ is that A ℓ (λ) ≤ 0 for ℓ = 1, . . . , 4. From Lemmas 3-6, this can be readily proved. In particular, A ℓ (λ) ≤ 0 for λ ≥ 0 if ℓ = 1, 2, 3 and A 4 (λ) ≤ 0 for λ MT /λ ≥ 2.8. Therefore, a unique stationary point, λ * , exists. Also, . EE (λ) > 0 for λ < λ * and . EE (λ) < 0 for λ > λ * . Finally, by taking into account the constraints on the density of BSs, it follows that the unique optimal maximizer of the EE is λ (opt) = max λ (min) , min λ * , λ (max) , since λ ∈ λ (min) , λ (max) . b) Case Study λ MT /λ ≤ 2.8: As for this case study, we leverage a notable result in fractional optimization [2]: the ratio between a i) non-negative, differentiable and concave function, and a ii) positive, differentiable and convex function is a pseudo-concave function. It is, in addition, a unimodal function with a finite maximizer if the ratio vanishes when the variable of interest (i.e., the BSs' density) tends to zero and to infinity. As for the case study under analysis, the EE in (14) can be re-written, by neglecting unnecessary constants that are independent of λ and do not affect the properties of the function, as follows: From Lemma 6, the numerator of (28) is a non-negative, differentiable, increasing and concave function for λ ≥ 0. From Lemma 7, the EE in (28) tends to zero if λ → 0 and λ → ∞. Therefore, a sufficient condition to prove the unimodality and pseudo-concavity of the EE is to show that the denominator of (28) is a positive, differentiable and convex function in λ for λ MT /λ ≤ 2.8. From Lemma 3 and Lemma 4, the first two properties are immediately verified. To complete the proof, the convexity of the denominator of (28) needs to be analyzed.
L (λ). The second-order derivative of Den (·), as a function of λ, is as follows: ..