Cache-Enabled Millimeter Wave Cellular Networks With Clusters

Wireless content caching in cellular networks is an efficient way to reduce service delay and alleviate backhaul pressure. Because clustering allows spectral and storage resources to be shared, clustering in cache-enabled networks has recently attracted significant research interest. Meanwhile, since the multimedia content (e.g., video) carried by caching networks may require very high transmission rates, millimeter wave (mmWave) communication is considered an efficient transmission scheme for cache-enabled networks. We investigate the ergodic rate and average service delay for a typical user terminal (UT) in clustered cache-enabled small cell networks (SCN) and ultra dense networks (UDN) with mmWave channels. In the SCN, each cluster consists of cache-enabled UTs, and in the UDN a cluster is formed by cache-enabled UTs and small base stations (SBSs) with non-uniform caching capacity. The clusters are assumed to be discs, and content sharing is only possible within clusters through mmWave device-to-device (D2D) tier and SBS tier communications. With stochastic geometry methods, the distributions of the content sharing distance and the signal-to-interference-plus-noise ratio (SINR) of a typical UT in a cluster are derived for both SCN and UDN scenarios. To minimize the average service delay in the high SINR region, we provide an algorithm to jointly optimize the caching scheme for SBSs and UTs. By simulations, we validate our theoretical analysis and the performance of the proposed caching scheme. The numerical results also show that there exists an optimal cluster radius in the design of clusters for UDNs.


I. INTRODUCTION
Wireless data traffic has experienced tremendous growth in recent years. According to [1], [2], data traffic is increasingly concentrated in hotspots and is expected to increase almost tenfold by 2020 compared with 2016. This heavy traffic is normally connected to the core network (CN) through backhaul links, which may have limited capacity. This makes the backhaul a bottleneck of wireless networks. One promising technology to reduce the expensive usage of the backhaul and to shorten service delay is wireless caching [3]-[11]. In a cache-enabled system, storage memory is used to prefetch popular contents into small cells during off-peak hours, before they are requested by user terminals (UTs). A request can be served directly through wireless links if the requested content is already stored in the local small base stations (SBSs), instead of being fetched from the remote CN. Thus the service delay can be largely reduced. Since UTs, e.g., smart phones and laptops, are normally equipped with storage units, popular contents can also be stored in the local cache of UTs. A request can then be served locally if the requested content is already available in local storage. Moreover, the saved contents can also be shared via device-to-device (D2D) communication among neighboring UTs [12]-[18], through which the backhaul usage and service delay can be further reduced. However, due to the co-channel interference caused by D2D communications and the limited energy resources of UTs, resource management for D2D communications should be well designed. In [15], a joint D2D link scheduling and power allocation problem is formulated in order to maximize the system throughput. The trade-off between energy efficiency (EE) and spectral efficiency (SE) is investigated in [19]. Cluster-based D2D communication is studied in [20]-[24]. Reference [20] presents a system model where the cell is divided into clusters.
Each cluster is formed by UTs with caching, and the UTs in the same cluster can share stored contents via D2D communication. With clustering, the same time-frequency resource can be reused in each cluster, through which the intra-cluster interference can be avoided. A cluster-based multicast transmission method for D2D communication with the objective of decreasing the data distribution time is proposed in [21]. In [22], a probabilistic caching scheme is designed to minimize the energy consumption of a clustered cache-enabled D2D network. Reference [23] investigates and maximizes the cache offloading gain and rate coverage probability for a clustered D2D caching network, where a sub-optimal caching solution is obtained. Besides, a joint caching and spectrum partitioning scheme is proposed in [24] to reduce the average service delay for clustered D2D networks.
0090-6778 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
Meanwhile, as one promising technology for the fifth generation (5G) and beyond mobile networks, millimeter wave (mmWave) communication can support higher data rates than microwave communications [25]-[27]. According to [28]-[30], mmWave communications can facilitate popular content exchange in cache-enabled small cell networks (SCNs), especially for video streaming, which may require very high transmission rates [31]. The offloading gain of D2D-aware device caching with mmWave D2D communication is revealed in [28], while the successful content delivery probability is studied in [29]. In [30], the success probabilities and area spectral efficiencies are analyzed in a novel cache-enabled heterogeneous network (HetNet), where macro base stations (BSs) operating in traditional sub-6 GHz bands are overlaid by dense mmWave pico BSs. Besides, it is shown that combining caching and mmWave at SBSs can significantly reduce the connection and retrieval delays [31]. Thus, it is expected that mmWave will be widely used in mobile networks, and it is valuable to investigate wireless caching networks with mmWave communications, which differ significantly from traditional microwave communications in terms of channel models, performance and design principles. None of the above-mentioned works investigates the clustering scheme for cache-enabled mmWave cellular networks.
Another capacity-increasing approach in 5G systems is to deploy ultra dense networks (UDN), where the network is closer to UTs than in traditional SCNs. Different from SCN, the density of access points (APs), including macro BSs, micro BSs (pico BSs and femto BSs), and relays, is much higher in UDN. The distance between the APs and UTs is therefore further decreased, and the network capacity is increased [32]. Cache-enabled SCN is investigated in [33]-[37], while cache-enabled UDN is investigated in [38]-[43]. Reference [35] considers a cluster-centric SCN with the combined design of cooperative caching and transmission policy. In [36], UTs with similar content popularity are grouped into a cluster and served by the same SBS. As for UDN, clustering is considered a promising way to simplify the topology of UDN and to reduce complexity [38]. In [39], a user-centric cooperative interference mitigation strategy is proposed, where dense SBSs are able to store contents. In [42], the small cells are grouped into disjoint clusters. However, to our knowledge, there are no results on clustering schemes for cache-enabled mmWave SCN and UDN.
We focus on studying the effect of cluster size, caching policies and the caching capacity of UTs and SBSs by characterizing the ergodic rate and average service delay for a typical UT in clustered cache-enabled SCN and UDN. The main contributions of this paper are listed as follows.
• We investigate the service delay for a typical UT located at the center of a cluster in cache-enabled SCN and UDN, where both mmWave D2D and SBS tiers are involved.
• We derive the exact distribution of the serving distance for the typical UT in cache-enabled SCN and UDN with clusters, based on which closed-form expressions of the SINR distributions with mmWave communications are obtained through the Gaussian-Chebyshev quadrature approximation.
• We derive the exact ergodic rate and average service delay for the typical UT in SCN and UDN scenarios. To gain further insights, we assume a high SINR and derive a closed-form approximation of the ergodic rates.
• To minimize the average service delay of the typical UT, we propose an Alternate Optimization based algorithm to jointly optimize the caching policy for SBSs and UTs in both SCN and UDN scenarios.
• Numerical results show that: 1) in mmWave SCN and UDN, the approximations of SINRs and ergodic rates in the high SINR region coincide with the theoretical results; 2) the proposed caching scheme can achieve a smaller service delay compared with other policies; 3) there exists a best size of clusters in cache-enabled SCN with respect to the service delay of the typical UT.
The rest of the paper is organized as follows. We present the model of clustered cache-enabled SCN and UDN with mmWave communication in Section II. Then, for both scenarios, the distributions of the content transmission distance and SINR for the typical UT are provided in Section III. In Section IV, the ergodic rate and average service delay for the typical UT are derived. Numerical experiments are presented in Section V to evaluate the performance of the clustering scheme. Finally, we draw conclusions in Section VI.

II. SYSTEM MODEL
The system model of clustered cache-enabled networks with mmWave channels is illustrated in Fig. 1. We assume that the locations of UTs follow a two-dimensional homogeneous Poisson point process (PPP) $\Phi_1$ with density $\lambda_1$. The locations of SBSs follow a two-dimensional homogeneous PPP $\Phi_2$ with density $\lambda_2$, defined separately for the SCN and UDN scenarios [44]. The popular contents are denoted by $\mathcal{F} = \{1, \ldots, F\}$ ($F \in \mathbb{N}$). Without loss of generality, we assume that each content has the same length $l$ and is stored as a whole at each UT and SBS. The storage capacities of a UT and an SBS are $\Theta_1$ and $\Theta_2$, respectively. Notation is summarized in Table I.

A. Network Architecture
We consider clustering schemes for two scenarios, where clusters are disjoint discs with common radius $d$. Fig. 1 (a) shows the clustered cache-enabled SCN, where the density of SBSs is much lower than that of UTs, i.e., $\lambda_1 \gg \lambda_2$. For this scenario, cells are divided into virtual clusters as demonstrated in [20]. Within each cluster, UTs can share stored contents with each other through D2D links, where content sharing is completed in one hop to avoid the delay and complex communication protocols of multi-hop transmission. When a content is requested by a UT, the request is served immediately if the content is stored in the UT's own cache. Otherwise, the desired content can be obtained from the nearest UT in the cluster where it is available. When the requested content is not available in the cluster, it needs to be retrieved from the cache of the nearest SBS or from the CN.
For the cache-enabled UDN shown in Fig. 1 (b), the considered area is divided into virtual clusters that consist of SBSs and UTs. In UDN, the density of SBSs is comparable with that of UTs. Hence, in the design of the clustering scheme, we need to take both SBSs and UTs into account.
According to the resource sharing schemes in [45]-[48], we assume that in each cluster, local D2D links reuse the same radio resource. That is, concurrent transmissions of D2D links are allowed. References [45], [46] have shown that the use of directional antennas and the high propagation loss in mmWave bands can result in relatively low mutual interference, or even no interference, when the concurrent links formed by geographically distributed wireless devices are properly selected. Thus, we ignore the interference among concurrent transmissions in each cluster, which may provide an upper bound on system performance. To make the interference of D2D links between clusters negligible, following [20], we limit the maximum number of D2D links in a disc of radius $d$ to $N_d = d / r_d$, where $r_d$ is the D2D collaboration distance [20]. With multiple D2D links in a cluster, we use orthogonal radio resources (frequency or time) to avoid mutual intra-cluster interference among D2D links [47]-[50]. We consider equal bandwidth partition among the D2D links in each cluster. Only one SBS link is allowed to be active at a time in each cluster. For tractable analysis, concurrent transmission of D2D links and the D2B link in a cluster is not allowed. Our models (SCN and UDN) are also applicable to other radio resource allocation schemes, including non-orthogonal ones (e.g., non-orthogonal multiple access [51]).

B. Transmission Model With mmWave Channel
We assume that the SBS-UT and UT-UT connections are modeled as mmWave channels, where the D2D tier and the SBS tier share the same frequency resource. We consider directional beamforming and use a sector model to analyze the effective beam pattern [25], [52]. Let $G_{g_M, g_m, \phi_m}(\phi)$ denote the sectored antenna pattern, where $\phi$ is the angle off the boresight direction, $\phi_m \in [0, 2\pi]$ is the main-lobe beamwidth, $g_M$ (dB) is the main-lobe gain, and $g_m$ (dB) is the side-lobe gain. The antenna gain $G$ between a transmitter and a receiver is then a discrete random variable that takes the value $g_l g_k$ with probability $\mathrm{Pr}_{lk}$ ($l, k \in \mathcal{M} = \{m, M\}$). We further assume that perfect beam alignment can be achieved on the desired link with perfect channel state information (CSI). Thus, the directivity gain for the desired signal link is $G_0 = g_M^2$. In both SCN and UDN scenarios, we assume that the total bandwidth for each cluster is $W$. Without loss of generality, we suppose that each user connects to only one UT (SBS) when it is served. In addition, we consider both path loss and small-scale fading. For the small-scale fading of each link, the Nakagami-$M$ fading channel is considered as in [52], [53]. Besides, in SCN and UDN, we consider line-of-sight (LOS) propagation for links within the cluster and non-line-of-sight (NLOS) propagation for links outside the cluster, according to [25]. The path loss exponents for the LOS and NLOS links are denoted as $\alpha_L$ and $\alpha_N$, respectively [27].
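As an illustration of the sectored antenna model, the following minimal sketch enumerates the discrete directivity gain seen on an interfering link. It assumes the probabilities commonly used with this model in the mmWave stochastic-geometry literature, namely that transmitter and receiver are oriented independently and uniformly, so that each main lobe covers the other node with probability $\phi_m / (2\pi)$. This choice, the function names `gain_distribution` and `db_to_linear`, and the numerical values are assumptions made for illustration and are not taken from the text above.

```python
import math


def db_to_linear(g_db):
    """Convert a gain in dB to linear scale."""
    return 10.0 ** (g_db / 10.0)


def gain_distribution(g_main, g_side, beamwidth):
    """Return {(l, k): (gain, probability)} for l, k in {"M", "m"}.

    Assumption: both ends are oriented independently and uniformly, so a
    main lobe covers the other node with probability beamwidth / (2 * pi).
    Gains are expected in linear scale, not dB.
    """
    if not 0.0 <= beamwidth <= 2.0 * math.pi:
        raise ValueError("beamwidth must lie in [0, 2*pi]")
    p_main = beamwidth / (2.0 * math.pi)
    lobe = {"M": (g_main, p_main), "m": (g_side, 1.0 - p_main)}
    return {
        (l, k): (lobe[l][0] * lobe[k][0], lobe[l][1] * lobe[k][1])
        for l in lobe
        for k in lobe
    }


# Illustrative values only: 10 dB main lobe, -10 dB side lobe, 30 degree beam.
dist = gain_distribution(db_to_linear(10.0), db_to_linear(-10.0), math.pi / 6)
mean_gain = sum(g * p for g, p in dist.values())
```

The dictionary keys mirror the index pair $(l, k)$ of $\mathrm{Pr}_{lk}$, so the four entries can be summed over directly when averaging an interference term over the antenna gain.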

C. Caching Model
Similar to [16], we consider homogeneous content preference in SCN and UDN scenarios, where all UTs have the same content preference distribution. In other words, each UT independently and randomly requests contents according to the same popularity distribution $\mathbf{p} \in (0, 1)^{1 \times F}$. According to the results in [54], $\mathbf{p}$ follows a Zipf-like distribution, where the element $p_f$ represents the probability of a UT requesting content $f$. Defining the content popularity index as the vector $\omega = \{\omega_1, \ldots, \omega_F\}$, the request probability of the $i$-th most popular content $\omega_i$ can be calculated as
$$p_{\omega_i} = \frac{i^{-\eta}}{\sum_{j=1}^{F} j^{-\eta}},$$
where $\eta$ is the shape parameter. The vector $\omega$ is a permutation of $\mathcal{F}$. All UTs (SBSs) follow the same policy $\Pi_1$ ($\Pi_2$) (e.g., random caching (RC)) to store contents. Denote the index set of contents cached at a UT as $\omega_{\Pi_1} \subset \omega$ with length $\Theta_1$, and the index set of contents cached at an SBS as $\omega_{\Pi_2} \subset \omega$ with length $\Theta_2$. The caching capacity of SBSs and UTs is non-uniform, i.e., $\Theta_2 > \Theta_1$. We define $\mathrm{Pr}_{1,f} \triangleq \Pr\{f \in \omega_{\Pi_1}\}$ and $\mathrm{Pr}_{2,f} \triangleq \Pr\{f \in \omega_{\Pi_2}\}$. When $\Pi_1$ ($\Pi_2$) is fixed, the probability $\mathrm{Pr}_{1,f}$ ($\mathrm{Pr}_{2,f}$) is identical among UTs (SBSs).
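The caching model can be sketched numerically as follows. The snippet assumes the usual normalized power-law form of a Zipf-like distribution with shape parameter $\eta$, and it treats the nodes that cache a given content as an independent thinning of the underlying PPP, so that their density is the caching probability times the node density. The function names and the values $F = 100$, $\eta = 1.2$ are illustrative assumptions.

```python
def zipf_popularity(num_contents, eta):
    """Request probability of the i-th most popular content, i = 1..F.

    Assumes the normalized power law i**(-eta) / sum_j j**(-eta).
    """
    weights = [i ** (-eta) for i in range(1, num_contents + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def thinned_density(density, caching_prob):
    """Density of the nodes that cache a given content.

    Assumes independent thinning of a homogeneous PPP: each node keeps the
    content with probability caching_prob, independently of the others.
    """
    return density * caching_prob


# Illustrative catalogue of F = 100 contents with shape parameter 1.2.
p = zipf_popularity(num_contents=100, eta=1.2)
```

A larger $\eta$ concentrates the requests on the first few contents, which is why caching the most popular items pays off more as $\eta$ grows.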

III. SINR ANALYSIS IN CLUSTERED CACHE-ENABLED NETWORKS
We first present the distribution of transmission distance for a typical UT in clustered SCN and UDN. Then considering mmWave channel in D2D and SBS tiers, the SINR distribution is derived for both scenarios.

A. Distance Distribution in Clustered SCN
Since the density of SBSs is much smaller than that of UTs, i.e., $\lambda_1 \gg \lambda_2$, SBSs in $\Phi_2$ are located in only a few clusters. Hence we consider a cluster $A$ with radius $d$ that does not include an SBS. Moreover, to simplify the analysis, we focus on the typical UT $o$, which is inserted at the center of $A$. As illustrated in Section II, the typical UT can retrieve the desired content from neighboring UTs within $A$ at distance $X_1$, or from the nearest SBS at distance $X_2$ ($> d$). The probability density functions (PDFs) of $X_1$ and $X_2$ are given in Lemma 1. Lemma 1: In clustered cache-enabled SCN, for the typical UT $o$ requesting content $f$, which is located at the center of $A$ with radius $d$, the PDF of $X_1$ is given in (3).
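A minimal sketch of the distance model described here, under stated assumptions: the UTs caching content $f$ form a thinned PPP of density $\lambda_{1,f}$, $X_1$ is the distance to the nearest such UT conditioned on at least one lying inside the disc of radius $d$, and $X_2$ is the distance to the nearest SBS conditioned on no SBS lying inside the disc. The closed forms below are the textbook void-probability results under those assumptions; they are not copied from Lemma 1, and the function names are hypothetical.

```python
import math
import random


def cdf_x1(x, density_f, radius):
    """P(X1 <= x | at least one caching UT within `radius`), for x in (0, radius]."""
    if x <= 0.0:
        return 0.0
    x = min(x, radius)
    num = 1.0 - math.exp(-density_f * math.pi * x * x)
    den = 1.0 - math.exp(-density_f * math.pi * radius * radius)
    return num / den


def cdf_x2(x, density_sbs, radius):
    """P(X2 <= x | no SBS within `radius`), for x > radius."""
    if x <= radius:
        return 0.0
    return 1.0 - math.exp(-density_sbs * math.pi * (x * x - radius * radius))


def sample_x1(density_f, radius, rng=random):
    """Draw one realization of X1 by inverting cdf_x1."""
    u = rng.random()
    den = 1.0 - math.exp(-density_f * math.pi * radius * radius)
    return math.sqrt(-math.log(1.0 - u * den) / (density_f * math.pi))
```

Sampling through the inverse CDF avoids generating a full point pattern when only the serving distance is needed, for example in a quick Monte Carlo check of an SINR expression.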

B. SINR Distribution in mmWave SCN
Based on the distance distribution of content sharing in Lemma 1, we then investigate the SINR distribution for the typical UT when the mmWave channel is applied. First, the SINR distribution for UT $o$ is derived for mmWave D2D transmission. Let $D_1 \triangleq \Phi_2 \setminus A$ and let $s \in D_1$ denote an interfering SBS outside cluster $A$. Following [25], the SINR between the typical UT and the serving UT is denoted by $\Gamma_1$, where $I_1$ is the inter-cluster interference with the mmWave channel and $r_s$ is the distance between the interfering SBS and the typical UT. $P_1$ is the transmission power of the D2D tier, while $P_0$ is the average transmission power of the SBSs within $D_1$. We then derive the cumulative distribution function (CDF) of $\Gamma_1$ in the following lemma.
Proof: See Appendix B. To obtain a closed-form expression for $F_{\Gamma_1}(\gamma)$, we adopt Gaussian-Chebyshev quadrature in the derivation, which approximates the definite integral of a function by a weighted sum of function values at specified points within the domain of integration. The parameter $t$ balances the trade-off between the complexity and the accuracy of the Gaussian-Chebyshev quadrature approximation. The detailed derivation of $\Psi_{\lambda_{1,f}, \lambda_2, \psi_{1,lk}}(\cdot)$ can be found in Appendix B.
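The quadrature step can be illustrated with a short sketch. It assumes the standard first-kind Gauss-Chebyshev rule, with nodes $u_i = \cos\big(\tfrac{(2i-1)\pi}{2t}\big)$ and equal weights $w_i = \pi / t$, applied to a general integral over $[a, b]$ after an affine change of variable. Whether these coincide exactly with the $w_i$ and $u_i$ of the paper cannot be confirmed from the text, so they should be read as an assumption; the function names are hypothetical.

```python
import math


def gauss_chebyshev_nodes(t):
    """Nodes u_i and weights w_i of the t-point Gauss-Chebyshev rule (first kind)."""
    nodes = [math.cos((2 * i - 1) * math.pi / (2 * t)) for i in range(1, t + 1)]
    weights = [math.pi / t] * t
    return nodes, weights


def integrate(func, a, b, t=30):
    """Approximate the integral of func over [a, b] using t Chebyshev nodes."""
    nodes, weights = gauss_chebyshev_nodes(t)
    half = 0.5 * (b - a)
    mid = 0.5 * (b + a)
    total = 0.0
    for u, w in zip(nodes, weights):
        # Multiplying by sqrt(1 - u^2) cancels the Chebyshev weight
        # 1 / sqrt(1 - u^2) that the rule integrates against.
        total += w * math.sqrt(1.0 - u * u) * func(half * u + mid)
    return half * total


# Example: the integral of x^2 over [0, 1] is 1/3.
approx = integrate(lambda x: x * x, 0.0, 1.0, t=30)
```

Increasing `t` adds function evaluations and shrinks the approximation error, which is the complexity-accuracy trade-off controlled by $t$.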
When $f$ is not available for the typical UT in $A$, the content is retrieved from the nearest SBS. Let $X_2$ denote the distance between the typical UT and the nearest SBS, and let $A_2$ denote the disc of radius $X_2$ centered at $o$. When the nearest SBS provides service, the SINR for UT $o$ is denoted by $\Gamma_2$, and $P_2$ is the transmission power of the mmWave SBS tier. Following an approach similar to that of Lemma 2, we can obtain the CDF of $\Gamma_2$.
Lemma 3: In cache-enabled mmWave SCN, the CDF of $\Gamma_2$ for the typical UT at cluster $A$ can be approximated by a Gaussian-Chebyshev sum in which $w_i$ and $u_i$ are given in (7) and $\psi_{2,lk} = \frac{\kappa P_0 g_l g_k}{M P_2 G_0}$. Proof: The CDF of $\Gamma_2$ follows from the proof of Lemma 2.
Fig. 3. Content sharing within a cluster in cache-enabled mmWave UDN.

C. Distance Distribution in Clustered UDN
For the cache-enabled mmWave UDN with clusters shown in Fig. 1 (b), the density of SBSs is much higher than that in SCN. The clusters in UDN are formed by UTs and SBSs, so two independent PPPs, one for UTs and one for SBSs, are involved. In cluster $A$, a UT has access to the contents stored in the caches of neighboring UTs and SBSs. Specifically, we consider the typical UT $o$, which is inserted at the center of cluster $A$. This is shown in Fig. 3, where $A$ is a disc with radius $d$. To differentiate UDN from the SCN scenario, the density of the SBS PPP in UDN is denoted separately and is much larger than the SCN density $\lambda_2$ [32]. Besides, without loss of generality, we assume that there exists at least one SBS within cluster $A$. In what follows, we first investigate the distribution of the transmission distance for the typical UT. Then the distribution of the SINR for UT $o$ with the mmWave channel is given.
Considering UT o requesting content f in cluster A, we denote the distance for the D2D tier between typical UT o and nearest UT which has content f as X 3 . The distance that o retrieves f from nearest SBS which stores f is denoted as X 4 . When f is not available in A, the typical UT is served by the nearest SBS with distance X 5 . Then denoting λ 2,f = Pr 2,f λ 2 , we have the following results.
$\Psi_{\lambda_{1,f}, \lambda_2, \psi_{1,lk}}(u_i) = \lambda_{1,f} \pi d^2 (u_i + 1)$
Lemma 4: In clustered cache-enabled UDN, for the typical UT requesting content $f$, which is located at the center of $A$ with radius $d$, the PDF $f_{X_3}(x)$ is the same as $f_{X_1}(x)$ given in (3), while the PDFs $f_{X_4}(x)$ and $f_{X_5}(x)$ are obtained by substituting $\lambda_{1,f}$ in $f_{X_1}(x)$ with $\lambda_{2,f}$ and $\lambda_2$, respectively.
Proof: Recalling the proof of Lemma 1, the only difference in UDN is the density of SBSs. Based on the assumption $N_2 > 0$, the distributions $f_{X_4}(x)$ and $f_{X_5}(x)$ can be derived by calculating $\Pr\{X_4 \le x \mid N_{2,f} \ge 1\}$ and $\Pr\{X_5 \le x \mid N_2 \ge 1\}$, respectively, as given in (33), where $N_2$ denotes the number of SBSs in $A$ and $N_{2,f}$ denotes the number of SBSs in $A$ that have content $f$.

D. SINR Distribution in mmWave UDN
Based on the distance distribution of content sharing given in Lemma 4, we then investigate the SINR distribution for the typical UT in cache-enabled mmWave UDN. Since only one SBS link is active in each cluster at a time, there is at most one active SBS in each neighboring cluster. Hence, to ensure that the expected number of SBSs in $D = \Phi_2 \setminus A$ equals that of the case where one SBS exists in each cluster on average, we let the density of SBSs in $D$ be $\lambda_2^c = \frac{1}{\pi d^2}$. Besides, not all the SBSs in $D$ are active, due to the presence of content exchange in the D2D tier. In order to characterize the SBSs that interfere with UT $o$, we further thin the density of SBSs in area $D$, where $\Xi$ represents the probability that an SBS is active. Let the point $s \in \Phi_2 \setminus A$ denote an interfering SBS outside disc $A$. The SINRs $\Gamma_j$ ($j = 3, 4, 5$) experienced by the typical UT in the three scenarios shown in Fig. 3 are then defined with $I = \sum_{s \in D} P_0 G r_s^{-\alpha_N} |h_u|^2$ as the inter-cluster interference with the mmWave channel. $P_0$ is the average transmission power of the SBSs within $D$. The CDFs of $\Gamma_j$ ($j = 3, 4, 5$) are given in Lemma 5.
Lemma 5: In cache-enabled mmWave UDN, the CDFs of $\Gamma_j$ ($j = 3, 4, 5$) for the typical UT requesting $f$ at cluster $A$ can be approximated by Gaussian-Chebyshev sums in which $w_i$ and $u_i$ are given in (7). Proof: Following the proof of Lemma 2, we can similarly obtain the results in (16), but with a different density of interfering SBSs and the densities given by (5).
The calculation of the interference in Lemma 5 using the thinned density is only an approximation of the true case. Compared with $F_{\Gamma_1}$ in (6), the CDFs $F_{\Gamma_{3,4,5}}$ have a similar form; the differences arise because the densities of the SBSs with $f$ and of the interfering SBSs in $D$ in UDN are different from those in SCN.
It should be noted that when we account for the interference in clustered cache-enabled SCN, we do not thin the density of SBSs $\lambda_2$. This is because the SBSs in SCN have much larger coverage than those in UDN, which makes $\Xi \to 1$. Besides, the CDFs given in Lemmas 2, 3 and 5 are only approximations, but as $t \to \infty$ they coincide with the true values.

IV. PERFORMANCE ANALYSIS AND OPTIMIZATION
In this section, we will derive the ergodic rate of a typical UT at a cluster for both cache-enabled mmWave SCN and UDN. Then the average service delay for typical UT will be studied.

A. Ergodic Rate Analysis
The ergodic rate refers to the mean data rate when adaptive modulation and coding is used over many fading realizations for a given average SINR [44]. Here $\gamma_{\max} = 2^{R_{\max}} - 1$ is the SINR threshold determined by practical constraints of the radio frequency circuit, and $R_{\max}$ is the maximum achievable rate.

Theorem 1: The ergodic rate of a typical UT in cluster A for cache-enabled mmWave SCN and UDN is given by
where $W_{1,3} = \frac{W}{N_d}$ and $W_{2,4,5} = W$. Proof: The result follows from the proof of Proposition 3 in [27].
Since the CDFs given in Lemmas 2, 3 and 5 are related to content $f$, (19) is the ergodic rate for the typical UT requesting $f$. Although Theorem 1 gives the expression of the ergodic rate, a closed-form expression is hard to obtain with the CDFs $F_{\Gamma_{1,2,3}}(\gamma)$. Hence we consider the high SINR scenario for the mmWave D2D and SBS tiers, and present approximate results for the ergodic rates.
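A rate of this kind can be evaluated numerically from any SINR CDF. The sketch below assumes that the ergodic rate is the mean of $W \log_2(1 + \min(\Gamma, \gamma_{\max}))$, which integrates by parts to $\frac{W}{\ln 2} \int_0^{\gamma_{\max}} \frac{1 - F_\Gamma(\gamma)}{1 + \gamma} \, d\gamma$, and it takes $R_{\max}$ in bit/s/Hz. This is a generic identity used for illustration, not the closed form of Theorem 1, and `ergodic_rate` is a hypothetical helper.

```python
import math


def ergodic_rate(sinr_cdf, bandwidth, rate_max, steps=20000):
    """Evaluate W / ln(2) * integral_0^gamma_max (1 - F(g)) / (1 + g) dg.

    sinr_cdf  : callable returning the SINR CDF F(g) for g >= 0
    bandwidth : bandwidth W allocated to the link
    rate_max  : maximum spectral efficiency, so gamma_max = 2**rate_max - 1
    The integral is computed with the composite midpoint rule.
    """
    gamma_max = 2.0 ** rate_max - 1.0
    h = gamma_max / steps
    total = 0.0
    for n in range(steps):
        g = (n + 0.5) * h
        total += (1.0 - sinr_cdf(g)) / (1.0 + g)
    return bandwidth * total * h / math.log(2.0)


# Sanity check: if the SINR always exceeds gamma_max, the rate saturates at
# bandwidth * rate_max.
saturated = ergodic_rate(lambda g: 0.0, bandwidth=1.0, rate_max=4.0)
```

With equal bandwidth partition, a D2D link would be evaluated with `bandwidth` set to $W / N_d$ and an SBS link with `bandwidth` set to $W$.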

Corollary 1: In high SINR region, the ergodic rate of D2D tier for typical UT at cluster A in cache-enabled mmWave SCN is approximated by
Proof: See Appendix C. Under the high SINR approximation, the closed-form expression of $\Upsilon_{\lambda_{1,f}, \lambda_2, \psi_{1,lk}}$ can be obtained directly through integration, without using the Gaussian-Chebyshev quadrature approximation or the Gauss hypergeometric function.

Corollary 2: In high SINR region, the ergodic rate of SBS tier for typical UT at cluster A in cache-enabled mmWave SCN is approximated by
Proof: This can be proved similarly by following the steps of Appendix C, substituting $X_1$, $I_1$ with $X_2$, $I_2$ and using the corresponding distributions.
In clustered cache-enabled UDN, the ergodic rate of typical UT in high SINR region can be also derived with Lemma 5.
Corollary 3: In the high SINR region, the ergodic rates $R_j^+$ ($j = 3, 4, 5$) of the typical UT at cluster $A$ in cache-enabled mmWave UDN are derived with $(a_j, b_j)$ given by (16). Proof: Following the proof of Corollary 2, we can derive the final results by changing the densities of the SBSs within $A$ and of the interfering SBSs in $D$.
It is worth noting that, except for $R_2$, the ergodic rates given in Theorem 1 and Corollaries 1, 2 and 3 are derived with respect to (w.r.t.) the requested content $f$. Besides, the feasible domain for the transmission powers $P_1$ and $P_2$ can be obtained by letting $R_i^+ \ge 0$ ($i = 1, 2, 3, 4, 5$).

B. Service Delay Analysis
With the derived ergodic rates, the delay of a transmission between two nodes is described as $T = l / R$. We then evaluate the average service delay for the typical UT, which is inserted at the center $o$ of cluster $A$.
Proposition 1: In cache-enabled mmWave SCN, the average service delay $T_1$ for the typical UT $o$ at cluster $A$ is evaluated by (26). In cache-enabled mmWave UDN, the average service delay $T_2$ for the typical UT at cluster $A$ is given by (27).
Proof: In cache-enabled mmWave SCN, consider the request for content $f$ from the typical UT at the center of cluster $A$. The delay for fetching the content from the local cache, from the UTs within $A$, from the cache of the nearest SBS and from the CN is $0$, $l / R_1$, $l / R_2$ and $l / R_2 + T_0$, respectively, where $T_0$ is the time consumed in fetching content from the CN. Each delay value occurs with the probability of the corresponding caching event. From the proof of Lemma 1, we have $\Pr\{N_{1,f} = 0\} = e^{-\lambda_{1,f} \pi d^2}$. Hence, with the probability $p_f$ of requesting content $f$, the average service delay $T_1$ for the typical UT in SCN is deduced in (26). Similarly, we can derive the expression for $T_2$.
Since all the rates except $R_2$, i.e., $R_i$ ($i = 1, 3, 4, 5$), are determined by $\Theta_1$ and $\Theta_2$, it is necessary to investigate how the average service delay behaves when the caching capacity is adjusted.
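The case analysis in the proof can be turned into a small numerical sketch for the SCN. It assumes the following event probabilities, which are inferred from the description above rather than copied from (26): a local hit occurs with probability $\mathrm{Pr}_{1,f}$; otherwise the content is fetched over D2D if at least one UT in the cluster caches it, with $\Pr\{N_{1,f} = 0\} = e^{-\lambda_{1,f} \pi d^2}$; otherwise the nearest SBS serves it from its cache with probability $\mathrm{Pr}_{2,f}$ or from the CN with the extra delay $T_0$. The function name and argument layout are hypothetical.

```python
import math


def scn_average_delay(popularity, pr_ut, pr_sbs, rate_d2d, rate_sbs,
                      density_ut, radius, length, backhaul_delay):
    """Average service delay of the typical UT in the clustered SCN (sketch).

    popularity : request probabilities p_f
    pr_ut      : caching probabilities Pr_{1,f} at UTs
    pr_sbs     : caching probabilities Pr_{2,f} at SBSs
    rate_d2d   : D2D ergodic rates R_1(f), one per content
    rate_sbs   : SBS ergodic rate R_2 (independent of f)
    """
    total = 0.0
    for f, p_f in enumerate(popularity):
        # Probability that no UT inside the cluster caches content f.
        p_empty = math.exp(-pr_ut[f] * density_ut * math.pi * radius ** 2)
        miss_local = 1.0 - pr_ut[f]
        d2d = miss_local * (1.0 - p_empty) * length / rate_d2d[f]
        sbs = miss_local * p_empty * pr_sbs[f] * length / rate_sbs
        core = (miss_local * p_empty * (1.0 - pr_sbs[f])
                * (length / rate_sbs + backhaul_delay))
        # A local cache hit contributes zero delay.
        total += p_f * (d2d + sbs + core)
    return total
```

Written this way, the role of each tier is visible: caching at SBSs only scales the backhaul term, while caching at UTs changes both the local hit probability and the chance of finding the content inside the cluster.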
Proposition 2: For a fixed caching policy $\Pi_1$ ($\Pi_2$), the probability $\mathrm{Pr}_{1,f}$ ($\mathrm{Pr}_{2,f}$) is non-decreasing with $\Theta_1$ ($\Theta_2$), while $T_{1,2}$ are non-increasing with $\Theta_2$ when $\frac{l}{R_3} \le \frac{l}{R_4} \le \frac{l}{R_5} + T_0$.
Proof: When $\Theta_1$ ($\Theta_2$) increases, more contents can be stored at UTs (SBSs). Hence $\mathrm{Pr}_{1,f}$ ($\mathrm{Pr}_{2,f}$) does not decrease when $\Pi_1$ ($\Pi_2$) is fixed. This can also be shown by looking at the derivative of $\mathrm{Pr}_{1,f}$ for the policies described in Section V. Considering the average service delay $T_1$, the derivative satisfies $\partial T_1 / \partial \mathrm{Pr}_{2,f} < 0$. Since $\mathrm{Pr}_{2,f}$ is non-decreasing with $\Theta_2$ from Proposition 1, it can be concluded that $T_1$ is non-increasing with $\Theta_2$. As for the UDN scenario, it can be shown that when $\frac{l}{R_4} \le \frac{l}{R_5} + T_0$, the average service delay $T_2$ decreases with $\mathrm{Pr}_{2,f}$ (i.e., $\partial T_2 / \partial \mathrm{Pr}_{2,f} \le 0$), while $\partial T_2 / \partial \mathrm{Pr}_{1,f} \le 0$ under the condition $\frac{l}{R_3} \le \frac{l}{R_4} \le \frac{l}{R_5} + T_0$.
The condition in Proposition 2 states that, for a typical UT in cache-enabled UDN, the delay of D2D content sharing is no larger than that of fetching content from the cache of an SBS, while fetching content from the CN incurs the largest delay. This condition can be easily satisfied when the density of SBSs is comparable with that of UTs in UDN. From the expressions of $T_1$ and $T_2$, it is easy to see that the advantage of enabling caching at SBSs is to offload backhaul traffic to the CN. Meanwhile, the caching capability of UTs determines the rates of the UT in both clustered mmWave SCN and UDN. In what follows, we study the caching policy that minimizes the service delay for the typical UT.
Proposition 3: The optimal caching scheme that minimizes the average service delays $T_{1,2}$ can be obtained by solving problem (P1) in (29).
Proof: The constraint (29c) holds because $\mathrm{Pr}_{1,f}$ and $\mathrm{Pr}_{2,f}$ are probabilities. Suppose the constraints (29b) are not active, i.e., $\sum_{f \in \mathcal{F}} \mathrm{Pr}^*_{1,f} < \Theta_1$ and $\sum_{f \in \mathcal{F}} \mathrm{Pr}^*_{2,f} < \Theta_2$, where $\mathrm{Pr}^*_{1,2}$ are the optimal solutions of (P1). Then we can always find $i$ with $0 < \mathrm{Pr}^*_{1,i} < 1$ and a small increment $\epsilon > 0$ that keeps (29b) and (29c) satisfied. Since $T_1$ decreases with $\mathrm{Pr}_{1,f}$ for fixed $\mathrm{Pr}^*_2$, we can substitute $\mathrm{Pr}^*_{1,i}$ with $\mathrm{Pr}^*_{1,i} + \epsilon$ to further reduce $T_1$, which contradicts optimality. Similarly, with fixed $\mathrm{Pr}^*_1$, we can also introduce a small increment on some $\mathrm{Pr}^*_{2,f}$ to reduce $T_1$. Hence the constraint (29b) must be active at the optimum. Following the same argument, it can be proved that the constraint (29b) should also be active in minimizing $T_2$.
Since $\mathrm{Pr}_1$ and $\mathrm{Pr}_2$ are coupled variables in the objective (29a), the caching strategies of UTs and SBSs should be optimized jointly. We adopt an Alternate Optimization (AO) approach to solve (P1), which is summarized in Alg. 1. At the first step of each iteration, we optimize the caching policy of UTs with the caching scheme of SBSs fixed. Then, at the second step, the caching policy of SBSs is optimized with the updated $\mathrm{Pr}_1$. It can be verified that the average service delay is reduced at each iteration of Alg. 1. In the high SINR region, where $P_1$ and $P_2$ are assumed to be large enough to make $R_i(f) \to R_{\max}$ ($i \in \{1, 2, 3, 4, 5\}$, $\forall f \in \mathcal{F}$), we can obtain approximations of $T_{1,2}$ by applying $R_i \to R_{\max}$ in (26) and (27).
It should be noted that Alg. 1 is a Probabilistic Caching (PC) based policy. Different from the PC scheme provided in [56], we jointly optimize the caching policy for SBSs and UTs to minimize the service delay for the typical UT. With the output $\mathrm{Pr}_1$ ($\mathrm{Pr}_2$) of Alg. 1, UTs (SBSs) randomly build a list of up to $\Theta_1$ ($\Theta_2$) contents to be cached, as in [56].
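The two-step structure of an alternate optimization loop can be sketched generically as follows. This is not Alg. 1: the per-block updates are supplied by the caller, and the toy quadratic objective merely stands in for the delay objective so that the loop can be run end to end. All names are hypothetical.

```python
def alternate_optimization(objective, update_x, update_y, x, y,
                           tol=1e-9, max_iter=100):
    """Generic two-block alternating minimization (illustrative only).

    update_x(y) must return a minimizer of the objective over x for fixed y,
    and update_y(x) a minimizer over y for fixed x, so that the objective
    value cannot increase from one iteration to the next.
    """
    value = objective(x, y)
    for _ in range(max_iter):
        x = update_x(y)  # step 1: optimize the first block, second fixed
        y = update_y(x)  # step 2: optimize the second block, first updated
        new_value = objective(x, y)
        converged = value - new_value <= tol
        value = new_value
        if converged:
            break
    return x, y, value


# Toy coupled objective; its exact block minimizers are known in closed form.
def toy(x, y):
    return (x - y) ** 2 + (x - 1.0) ** 2 + (y + 1.0) ** 2


x_opt, y_opt, f_opt = alternate_optimization(
    toy,
    update_x=lambda y: (y + 1.0) / 2.0,
    update_y=lambda x: (x - 1.0) / 2.0,
    x=0.0,
    y=0.0,
)
```

In the caching problem, the first block would play the role of the UT caching probabilities and the second block that of the SBS caching probabilities, each updated under its own capacity and probability constraints.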

V. NUMERICAL RESULTS
In what follows, we will provide some numerical results to validate the analysis of distance and SINR distributions, and the ergodic rate for typical UT. Moreover, the effects of caching capacity and policies of SBSs and UTs, as well as the radius of clusters, on the average service delay for both cache-enabled mmWave SCN and UDN with clusters will be provided. The parameters used for simulations are given in Table I.
We consider homogeneous content preference among UTs with $\omega = \mathcal{F}$. Besides, we apply the proposed caching scheme Alg. 1 and the following caching policies for SBSs and UTs:
• RC: UTs (SBSs) choose contents to store randomly until $\Theta_1$ ($\Theta_2$) is occupied.
• Most popular caching (MPC): UTs (SBSs) store contents in the order of $\omega$ until $\Theta_1$ ($\Theta_2$) is occupied.
• PC: UTs (SBSs) store contents up to $\Theta_1$ ($\Theta_2$) according to the probabilistic content caching method in [56], with caching probabilities in which $\mu^* \ge 0$ is chosen so that $\sum_{f \in \mathcal{F}} \mathrm{Pr}_{i,f} = \Theta_i$ is satisfied.
Since we assume homogeneous content preference and caching capacity for UTs, every UT stores the same contents if the MPC scheme is adopted. Therefore, to exploit the diversity of cached content at UTs, we adopt the PC scheme for UTs, i.e., $\Pi_1 = \mathrm{PC}$.
Fig. 4 presents the CDFs $F_{X_i}(x)$ with radius $d = 20$ m, $\eta = 1.2$, $\Theta_2 = 2$ Gb and $\Pi_2 = \mathrm{PC}$. Since the distributions of $X_1$ and $X_3$ are the same, we show the CDFs of $X_{1,4,5}$ in Fig. 4 (a). Because the transmission distance lies within the interval $(0, d]$, all three CDFs satisfy $F_{X_{1,4,5}}(d) = 1$. As for content $f = 10$, the probability $\mathrm{Pr}_{1,f=10}$ is less than 1. The UTs that have content $f = 10$ follow the PPP $\Phi_{1,f=10}$ with density $\mathrm{Pr}_{1,f=10} \lambda_1 < \lambda_1$. Hence the CDF $F_{X_1,f=10}(x)$ is not larger than $F_{X_1,f=1}(x)$ for fixed $x$. As the caching capacity $\Theta_1$ is enlarged, the probability $\mathrm{Pr}_{1,f=10}$ is non-decreasing according to Proposition 1. This densifies the PPP $\Phi_{1,f=10}$, which in turn increases the CDF $F_{X_1}(x)$. The CDF $F_{X_5}(x)$ is determined only by the SBS density $\lambda_2$. The CDF $F_{X_4}(x)$ is always smaller than $F_{X_5}(x)$ for fixed $x$ since $\lambda_{2,f} \le \lambda_2$. The CDF $F_{X_2}(x)$ is shown in Fig. 4 (b) and is determined only by the density $\lambda_2$. As the results show, the typical UT connects to an SBS within a distance of 200 m with probability close to 1 in SCN.
(12) and variants of (35) directly, while the approximations of $F_{\Gamma_i}(\gamma)$ are given in Lemmas 2, 3 and 5.
From Fig. 5, it can be concluded that Gaussian-Chebyshev quadrature with t = 30 provides an accurate approximation of F Γi (γ). For α L ∈ {2.5, 3} and Π 2 ∈ {RC, PC}, the SINR of the D2D tier for the typical UT in mmWave SCN is expected to be larger than both that of the SBS tier and the SINR of the typical UT in mmWave UDN. This is because the interference experienced by the typical UT in mmWave UDN is stronger than that in SCN. When the LOS pathloss exponent α L increases to 3, the SINR performance of all scenarios becomes worse, whereas the CDF F Γ2 (γ) stays the same since it is determined only by the NLOS pathloss exponent α N . For the considered caching setup, the SINRs described by F Γ1,3,4 (γ) in both SCN and UDN scenarios with caching policy Π 2 = PC are larger than those achieved by the Π 2 = RC policy. This is because Pr 2,f =1,Π2=PC ≥ Pr 2,f =1,Π2=RC , so more SBSs store content f = 1 when policy PC is adopted rather than RC. Hence the typical UT is likely to retrieve content f = 1 from a closer SBS with PC than with RC, i.e.,
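The Gaussian-Chebyshev quadrature mentioned above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function name `gauss_chebyshev` and the test integrand are hypothetical, and the sketch uses the textbook first-kind Chebyshev nodes, whereas the weights w i and nodes u i used in the analysis are those defined in (7) and (8) and may be normalized differently.

```python
import math


def gauss_chebyshev(g, a, b, t=30):
    """Approximate the integral of g over [a, b] with t Chebyshev nodes.

    Uses the rule  int_{-1}^{1} h(u) du  ~  (pi / t) * sum_i sqrt(1 - u_i^2) * h(u_i),
    with u_i = cos((2i - 1) * pi / (2t)), after mapping [a, b] onto [-1, 1].
    """
    half_width = 0.5 * (b - a)
    midpoint = 0.5 * (b + a)
    total = 0.0
    for i in range(1, t + 1):
        u = math.cos((2 * i - 1) * math.pi / (2 * t))  # Chebyshev node in (-1, 1)
        x = half_width * u + midpoint                   # same node mapped to [a, b]
        total += math.sqrt(1.0 - u * u) * g(x)
    return half_width * (math.pi / t) * total


if __name__ == "__main__":
    # Hypothetical check: integrate x^2 over [0, 1]; the exact value is 1/3.
    approx = gauss_chebyshev(lambda x: x * x, 0.0, 1.0, t=30)
    print("approximation:", approx, "absolute error:", abs(approx - 1.0 / 3.0))
```

For smooth integrands the error of this rule shrinks roughly with 1/t², which is consistent with a moderate number of nodes such as t = 30 being sufficient for plotting CDFs.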

B. Ergodic Rates
We evaluate the ergodic rates R i and the approximations R + i with Π 2 = PC, d = 20 m, η = 1.2, α L = 3 and α N = 3.5. The results presented in Fig. 5 show that adopting the PC policy at SBSs achieves better SINR performance than RC for popular contents (e.g., f = 1). Hence we only consider the Π 2 = PC policy here. The ergodic rates R 1,3 and R + 1,3 are shown in Fig. 6 (a). As P 1 increases from −40 dBm to −10 dBm, the ergodic rates R 1 (f = 2) and R 3 (f = 2) grow to the maximum achievable rate R max = 2 Gbps. Moreover, when P 1 is higher than −30 dBm, the approximation R + 1 (f = 2) approaches the theoretical rate R 1 (f = 2). As for R + 3 (f = 2), the gap from the theoretical rate R 3 (f = 2) becomes negligible when P 1 > −15 dBm. Fig. 6 (b) shows the ergodic rates R 2,4,5 and the approximations R + 2,4,5 . We can see that the approximation R + 2 gets close to the theoretical rate R 2 when P 2 > 30 dBm, where both R 2 and R + 2 are almost equal to R max . As for R 4,5 (f = 10), the approximation approaches the theoretical rate when P 2 is larger than 10 dBm and −10 dBm, respectively.

C. Average Service Delay
In Fig. 7 we evaluate the average service delay for the typical UT in clustered mmWave SCN and UDN w.r.t. the D2D tier transmission power P 1 . In the simulation, we set P 2 = 10 dBm, η = 1.2, Θ 1 = 0.5 Gb, Θ 2 = 2 Gb and α L = 2.5. For fixed radius d = 10 m, the average service delay for the different caching policies satisfies T 1,MPC < T 1,PC < T 1,RC and T 2,MPC < T 2,PC < T 2,RC . When the caching policies RC and PC are adopted by SBSs, the average service delay of the typical UT in mmWave UDN is smaller than that in mmWave SCN, i.e., T 2 < T 1 , when P 1 is larger than −15 dBm with dBm, the average delay of typical UT in clustered UDN is smaller than that in SCN with d = 10 m. This is because, with Π 2 = MPC, more requests from the typical UT are served by the D2D tier in both clustered SCN and UDN than in the cases Π 2 ∈ {RC, PC}. Furthermore, the interference for the typical UT in mmWave UDN is stronger than that in SCN. Hence the average delay presents the trend T 1,MPC < T 2,MPC when P 1 is small enough, which also explains the case d = 15 m.
By fixing Π 2 = PC, P 1 = 0 dBm and P 2 = 20 dBm, we present the average service delay over the caching capacity Θ 2 in Fig. 8. As shown in this figure, both delays T 1 and T 2 decrease as Θ 2 is enlarged. This is because a larger Θ 2 keeps more traffic off the backhaul to the CN. Besides, the delay can be reduced by enlarging Θ 1 . This phenomenon can be explained from two aspects. First, with a larger Θ 1 , more requests from the typical UT can be served by its local cache, which incurs zero delay. Second, the expected distance for D2D tier transmission is shortened, since the requested contents are cached with higher probability by the neighboring UTs within the same cluster. When the shape parameter of the Zipf-like distribution reduces from 1.2 to 0.9, the content preference of UTs becomes less concentrated. Hence the advantage brought by the caching units weakens, which leads to a larger service delay.
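The effect of the Zipf shape parameter can be illustrated with a short Python sketch. It is a toy calculation under stated assumptions: the catalog size (100 contents) and the cache size (10 equal-sized contents) are hypothetical and are not taken from Table I, and the function names are illustrative. It computes the probability mass of the most popular contents, i.e., the share of requests a cache holding those contents could serve.

```python
def zipf_popularity(num_contents, eta):
    """Zipf-like request probabilities: content f is requested with weight f^(-eta)."""
    weights = [f ** (-eta) for f in range(1, num_contents + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_k_hit_probability(num_contents, eta, k):
    """Probability that a request targets one of the k most popular contents."""
    return sum(zipf_popularity(num_contents, eta)[:k])


if __name__ == "__main__":
    catalog_size = 100  # hypothetical catalog size
    cache_size = 10     # hypothetical number of cached contents
    for eta in (1.2, 0.9):
        hit = top_k_hit_probability(catalog_size, eta, cache_size)
        print(f"eta = {eta}: top-{cache_size} hit probability = {hit:.3f}")
```

A smaller shape parameter spreads the requests over more contents, so the same cache covers a smaller share of them, which matches the trend of larger delay described above.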
Finally, we present the impact of the cluster radius d on the service delay for the typical UT in SCN and UDN in Fig. 9, with P 1 = 0 dBm, P 2 = 20 dBm, α L = 2.5, Θ 1 = 0.5 Gb and Θ 2 = 2 Gb. When caching policies RC, PC and MPC are adopted at SBSs, the delays T 1,2 decrease with d in the range [5, 15] m. The reason is that the increased amount of D2D tier content sharing reduces the delay. Besides, the interference for the typical UT is non-increasing with radius d for both mmWave SCN and UDN scenarios. However, the service delays T 2,RC , T 2,PC and T 2,MPC degrade when the cluster size is enlarged to d > r d = 15 m. This is because multiple D2D links are allowed when d > r d , and the D2D transmission bandwidth for the typical UT reduces due to spectrum partition. When the cluster size gets larger, we have T 2,RC < T 2,PC < T 2,MPC for the UDN scenario. This demonstrates that, with Π 1 = PC and a large caching capacity Θ 2 , the necessity of optimizing the caching policy for SBSs decreases in UDN. In Fig. 9, the average service delay achieved by the proposed caching policy Alg. 1 is smaller than that of the caching schemes Π 1 = PC, Π 2 ∈ {RC, PC, MPC} for both mmWave SCN and UDN scenarios. Besides, we simulate the service delay for the MPPC scheme [57], which greedily stores contents at SBSs and UTs to maximize the total hit-rate. As shown in Fig. 9, Alg. 1 outperforms MPPC over the considered range of cluster sizes. Note that the service delay achieved by Alg. 1 presents a trend similar to that of the benchmarks Π 1 = PC, Π 2 ∈ {RC, PC, MPC}, where the best performance is achieved with radius d = r d .

VI. CONCLUSIONS
We studied clustering schemes for cache-enabled mmWave SCN and UDN by investigating the ergodic rate and service delay for the typical UT in a cluster. By defining the clusters as disjoint discs of equal size, closed-form expressions of the SINR distribution and of the ergodic rates in the high SINR region are derived for both scenarios, which are characterized by the cluster radius d, the caching capacity of UTs and SBSs, and the caching policies. Besides, we propose a PC-based caching policy that jointly optimizes the caching scheme for SBSs and UTs to minimize the service delay. Through analysis and simulations, we show that the average service delay of the typical UT can be reduced by enlarging the caching capacity of UTs and SBSs. Moreover, we reveal that, owing to the dense deployment of SBSs, the service delay in cache-enabled mmWave UDN with clusters can achieve a conditional performance gain over that in mmWave SCN. The results also show that the proposed caching scheme can further reduce the service delay of the typical UT compared with benchmark policies. Finally, simulation results show that there exists a best radius d for the average service delay in mmWave SCN, which can guide the design of the clustering scheme.

2 (b(o, d)) as N 1,2 . The D2D tier transmission happens only when the requested content exists in cluster A. Let Pr 1,f denote the probability that content f is stored at any given UT. We thin Φ 1 by retaining each UT independently with probability Pr 1,f to obtain a new PPP, denoted as Φ 1,f , which has density λ 1,f = Pr 1,f λ 1 and consists of the UTs that store f . Denote by N 1,f (b(o, x)) the number of UTs in Φ 1,f located within disc b(o, x), with N 1,f = N 1,f (b(o, d)). The probability Pr{X 1 ≤ x 1 |N 1,f ≥ 1} is conditioned on N 1,f ≥ 1, i.e., on the existence of at least one UT with content f in area A with radius d, and is expressed as Then we can obtain the PDF of X 1 as shown in (3). Next we derive the distribution of the distance X 2 for SBS tier transmission.
Denote by N 2 the number of SBSs within cluster A. The probability Pr{X 2 ≤ x|N 2 = 0} is conditioned on N 2 = 0, i.e., on there being no SBS located in area A with radius d, and is expressed as According to the above expression, the PDF of X 2 can be derived as (4).
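The two conditional distance distributions discussed in this appendix can be sketched in Python using the standard void-probability argument for a homogeneous PPP. This is an illustration under stated assumptions rather than a reproduction of (3) and (4): it assumes the typical UT sits at the cluster center o, the function names are hypothetical, and the density and thinning probabilities in the usage example are made-up values, not those of Table I.

```python
import math


def d2d_distance_cdf(x, density, d):
    """Pr{X <= x | at least one point of a PPP with the given density lies in b(o, d)}."""
    if x <= 0.0:
        return 0.0
    if x >= d:
        return 1.0
    # Pr{some point within x} / Pr{some point within d}; -expm1(-z) equals 1 - exp(-z).
    numerator = -math.expm1(-density * math.pi * x * x)
    denominator = -math.expm1(-density * math.pi * d * d)
    return numerator / denominator


def sbs_distance_cdf(x, density, d):
    """Pr{X <= x | no point of a PPP with the given density lies in b(o, d)}."""
    if x <= d:
        return 0.0
    # Only the ring between radius d and radius x can contain the nearest point.
    return -math.expm1(-density * math.pi * (x * x - d * d))


if __name__ == "__main__":
    lambda_1 = 0.01  # hypothetical UT density per square meter
    radius = 20.0    # cluster radius in meters
    for caching_probability in (1.0, 0.3):  # hypothetical values of Pr 1,f
        thinned_density = caching_probability * lambda_1
        value = d2d_distance_cdf(5.0, thinned_density, radius)
        print(f"Pr 1,f = {caching_probability}: F(5 m) = {value:.3f}")
```

Under these assumptions, a smaller caching probability gives a sparser thinned process and a smaller CDF value at any fixed distance below d, in line with the ordering of the CDFs discussed for Fig. 4.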

APPENDIX B PROOF OF LEMMA 2
The CDF of Γ 1 can be expressed as where (a) follows from the CDF approximation of the normalized gamma random variable |h 0 | 2 ∼ Gamma(M, 1/M) [52], and (b) follows from the Binomial Theorem. The term B 1 in (35) is given by where (c) follows from the expectation of the antenna gain G, (d) follows from the moment-generating function of the gamma distribution, (e) follows from the probability generating functional of a PPP [58], and (f) follows from a property of the Gaussian hypergeometric function and holds for α L , α N > 2.
Then, plugging (36) into (6) and taking the expectation over X 1 , we obtain the CDF as where (g) is obtained by invoking the Gaussian-Chebyshev quadrature approximation, which follows . w i , u i and Ψ λ1,f ,λ2,ψ 1,lk (u i ) are given in (7) and (8)

where (a) follows from the approximation of (1 − exp(−x)) M ≈ M x when x → 0, Γ(x) is the gamma function, and (b) follows from the fact that $\mathbb{E}\left[\sum_{x \in \Phi} f(x)\right] = \lambda \int_{\mathbb{R}^2} f(x)\,dx$, where λ is the density of the PPP Φ. Then substituting (38) into (19)