How Much Can D2D Communication Reduce Content Delivery Latency in Fog Networks with Edge Caching?

A Fog-Radio Access Network (F-RAN) is studied in which cache-enabled Edge Nodes (ENs) with dedicated fronthaul connections to the cloud aim at delivering contents to mobile users. Using an information-theoretic approach, this work tackles the problem of quantifying the potential latency reduction that can be obtained by enabling Device-to-Device (D2D) communication over out-of-band broadcast links. Following prior work, the Normalized Delivery Time (NDT) --- a metric that captures the high signal-to-noise ratio worst-case latency --- is adopted as the performance criterion of interest. Joint edge caching, downlink transmission, and D2D communication policies based on compress-and-forward are proposed that are shown to be information-theoretically optimal to within a constant multiplicative factor of two for all values of the problem parameters, and to achieve the minimum NDT for a number of special cases. The analysis provides insights on the role of D2D cooperation in improving the delivery latency.

multicast fronthaul link was studied in [17] and [18], where the advantages of coded multicast delivery were investigated. An F-RAN with heterogeneous contents was studied in [19], and the NDT region was characterized for the case with two ENs and two users. A caching and delivery scheme was presented for a partially-connected F-RAN in [20] and in [21].
Under the constraints of linear precoding and uncoded fronthaul transmission, upper and lower bounds on the minimum NDT in an F-RAN were presented in [22], and the ratio between the bounds was shown to be less than 3/2 for all system parameters and equal to one for some special cases. This work was extended in [23] to include caches also at the users.
An F-RAN with imperfect Channel State Information (CSI) at the CP was studied in [24], and a non-orthogonal transmission scheme was shown to improve the latency performance.
To the best of our knowledge, F-RANs with D2D communication have not yet been considered, apart from the conference versions of this work [25], [26]. Content delivery in a multi-hop D2D caching network was instead studied in [27], where the per-node capacity scaling law was derived. In [28], it was shown that in-band transmitter or receiver cooperation cannot increase the sum DoF of an interference channel. In contrast, out-of-band D2D receiver cooperation was proven in [29] to increase the Generalized DoF metric for an interference channel. Importantly, reference [29] only imposes a rate constraint on the D2D links, hence not accounting for the latency overhead caused by D2D communications, which is of central interest in this work. The conference versions of this work cover the special case of an F-RAN with two ENs and two users, whereas, in this work, as discussed next, we consider arbitrary numbers of ENs and users.
Main Contributions: In this work, we study the general D2D-aided F-RAN system with M ENs and K users illustrated in Fig. 1. First, we propose two caching and delivery strategies based on a novel form of interference alignment and on compress-and-forward. The first strategy is developed for the special case M = K = 2 and is shown to be optimal. The approach is, however, difficult to scale to a larger system and suffers from the typical lack of robustness of interference alignment to imperfect CSI [30]. For the general case of arbitrary M and K, we prove that a more practical D2D strategy based on compress-and-forward achieves the minimum NDT to within a multiplicative factor of 2. This implies that the optimality gap of this strategy does not scale with the size of the system. Based on these results, we identify regimes in terms of fronthaul and cache capacities under which D2D communication is beneficial in reducing delivery latency.
Organization: The rest of the paper is organized as follows. In Sec. II, we present an information-theoretic model for a general D2D-aided F-RAN under serial or pipelined delivery policies. In addition, the metric of interest, namely the NDT, is defined. In Sec. III, we describe the proposed D2D-based caching and delivery strategies. In Sec. IV, upper and lower bounds on the minimum NDT under serial delivery are derived. In Sec. V, we present an exact characterization of the minimum NDT for the special case with M = K = 2 and a finite-gap characterization for arbitrary M and K. In Sec. VI, we discuss pipelined delivery policies, for which lower and upper bounds on the minimum NDT along with a finite-gap characterization are presented. Finally, in Sec. VII, we conclude the paper and highlight some open problems.

II. SYSTEM MODEL
We consider the F-RAN system with Device-to-Device (D2D) links depicted in Fig. 1, where K ≥ 2 single-antenna users are served by M ≥ 2 single-antenna Edge Nodes (ENs) over a downlink wireless channel. Each user is connected to all other users by an orthogonal out-of-band broadcast D2D link of capacity C D bits per symbol. The model generalizes the set-up studied in [11] by including D2D communications. Each EN is connected to a Cloud Processor (CP) by a fronthaul link of capacity C F bits per symbol. A symbol refers to a channel use of the downlink wireless channel.
Let F denote a library of N ≥ K files, F = { f 1 , . . . , f N }, each of size L bits. The library is fixed for the considered time period. The entire library is available at the CP, whereas the ENs can only store up to µN L bits each, where 0 ≤ µ ≤ 1 is the fractional cache size. During the placement phase, contents are proactively cached at the ENs, subject to the mentioned cache capacity constraints.
After the placement phase, the system enters the delivery phase, which is organized in Transmission Intervals (TIs). In every TI, each user arbitrarily requests one of the N files from the library. The users' requests in a given TI are denoted by the demand vector $\mathbf{d} \triangleq (d_1, \ldots, d_K) \in [N]^K$. This vector is known at the beginning of a TI at the CP and ENs. The goal is to deliver the requested files to the users within the lowest possible delivery latency by leveraging fronthaul links, downlink channel, and D2D links. For a given TI, let $T_E$ denote the duration of the transmission on the wireless downlink channel. At time $t \in [T_E]$, each user $k \in [K]$ receives a channel output given by
$$y_k[t] = \sum_{m=1}^{M} h_{km} x_m[t] + z_k[t], \qquad (1)$$
where $h_{km}$ is the channel coefficient between EN $m$ and user $k$, $x_m[t]$ is the symbol transmitted by EN $m$, and $z_k[t]$ is the additive noise at user $k$.
Note that, as per (2), we consider policies where only coding within each file is allowed, i.e., no inter-file coding (e.g., [31]).
The fronthaul message $u_m$ sent to EN $m$ has duration $T_F$ and cannot exceed $T_F C_F$ bits, i.e., $H(u_m) \le T_F C_F$.

3) Edge Transmission Policies:
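After fronthaul transmission, in each TI, the ENs transmit using a function $\pi_e^m(\cdot)$ that maps the local cache content $s_m$, the received fronthaul message $u_m$, the demand vector $\mathbf{d}$, and the global CSI $\mathbf{H}$ to the output codeword
$$\mathbf{x}_m \triangleq (x_m[1], \ldots, x_m[T_E]) = \pi_e^m(s_m, u_m, \mathbf{d}, \mathbf{H}).$$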

4) D2D Interactive Communication Policies:
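After receiving the signals (1) over $T_E$ symbols, in any TI, the users apply a D2D conferencing policy. For each user $k \in [K]$, this is defined by the interactive functions $\pi_{\mathrm{D2D},t}^k(\cdot)$ that map the received signal $\mathbf{y}_k \triangleq (y_k[1], \ldots, y_k[T_E])$, the global CSI, and the previously received D2D messages from users $[K] \setminus \{k\}$ to the D2D message
$$v_k[t] = \pi_{\mathrm{D2D},t}^k\big(\mathbf{y}_k, \mathbf{H}, v^{t-1}_{[K] \setminus \{k\}}\big), \qquad (5)$$
where $t \in [T_D]$, with $T_D$ being the duration of the D2D communication, and $v^{t-1}_{[K] \setminus \{k\}}$ collects the D2D messages received from the other users up to time $t-1$. All users broadcast the D2D messages (5) to all other users over orthogonal broadcast channels of capacity $C_D$. Hence, the total size of each D2D message $v_k \triangleq (v_k[1], \ldots, v_k[T_D])$ cannot exceed $T_D C_D$ bits, i.e., $H(v_k) \le T_D C_D$.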

5) Decoding Policy:
After D2D communication, each user $k \in [K]$ implements a decoding policy $\pi_d^k(\cdot)$ that maps the channel outputs, the D2D messages from users $[K] \setminus \{k\}$, the user demand, and the global CSI to an estimate of the requested file $f_{d_k}$ given as
$$\hat{f}_{d_k} = \pi_d^k(\mathbf{y}_k, V_k, d_k, \mathbf{H}),$$
where $V_k \triangleq \{v_1, \ldots, v_{k-1}, v_{k+1}, \ldots, v_K\}$ is the set of D2D messages sent by users $k' \in [K] \setminus \{k\}$ and received by user $k$.
The probability of error is defined as
$$P_e \triangleq \max_{\mathbf{d}} \max_{k \in [K]} \Pr\big(\hat{f}_{d_k} \neq f_{d_k}\big),$$
which is the worst-case probability of decoding error measured over all possible demand vectors $\mathbf{d}$ and over all users $k \in [K]$. A sequence of policies, indexed by the file size $L$, is said to be feasible if, for almost all channel realizations $\mathbf{H}$, we have $P_e \to 0$ when $L \to \infty$.

B. Performance Metric
We adopt the Normalized Delivery Time (NDT), introduced in [11], as the performance metric of interest. The NDT is the high-SNR ratio between the worst-case delivery time per bit required to satisfy any possible demand vector d and the delivery time per bit for an ideal reference system in which each user can receive the desired file at the maximum high-SNR rate of log(P) [bits/symbol]. To formalize the NDT, we parametrize fronthaul and D2D capacities as C F = r F log(P) and C D = r D log(P). With this parametrization, the fronthaul rate r F ≥ 0 represents the ratio between the fronthaul capacity and the high-SNR capacity of each EN-to-user wireless link in the absence of interference; a similar interpretation holds for the D2D rate r D ≥ 0.
As discussed, under serial delivery, in each TI, the CP first sends the fronthaul messages to the ENs for a total time of $T_F$ symbols; then, the ENs transmit on the wireless shared channel for a total time of $T_E$ symbols; and, finally, the users use the out-of-band D2D links for a total time of $T_D$ symbols. The corresponding NDT contributions are obtained by normalizing these terms by the delivery time needed on the mentioned reference system:
$$\delta_F \triangleq \lim_{P \to \infty} \lim_{L \to \infty} \frac{\mathbb{E}[T_F]}{L/\log(P)}, \quad \delta_E \triangleq \lim_{P \to \infty} \lim_{L \to \infty} \frac{\mathbb{E}[T_E]}{L/\log(P)}, \quad \delta_D \triangleq \lim_{P \to \infty} \lim_{L \to \infty} \frac{\mathbb{E}[T_D]}{L/\log(P)}. \qquad (9)$$
The factor $L/\log(P)$, used for normalizing the delivery times in (9), represents the minimal time to deliver a file in the reference system. The total NDT under serial delivery is hence defined as
$$\delta(\mu, r_F, r_D) \triangleq \delta_F + \delta_E + \delta_D,$$
where the notation emphasizes the dependence of the NDT on the fractional cache size $\mu$, and the fronthaul and D2D rates $r_F$ and $r_D$, respectively.
The minimum NDT is finally defined as the minimum over all NDTs achievable by some feasible policy:
$$\delta^*(\mu, r_F, r_D) \triangleq \inf\{\delta(\mu, r_F, r_D) : \delta(\mu, r_F, r_D) \text{ is achievable}\}.$$
By construction, we have the lower bound $\delta^*(\mu, r_F, r_D) \ge 1$. Furthermore, the minimum NDT can be proved by means of file-splitting and cache-sharing arguments to be convex in $\mu$ for any fixed values of $r_F$ and $r_D$ [11, Lemma 1].
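As a rough numerical illustration of the normalization described above, the following sketch evaluates the three NDT contributions for finite values of L and P. It is only an approximation of the definition, which involves the limits of large L and P; the function names, the base-2 logarithm, and the example numbers are assumptions made here for illustration.

```python
import math


def ndt_components(t_f, t_e, t_d, file_bits, snr):
    """Normalize fronthaul, edge, and D2D durations (in symbols) by L / log(P).

    Assumption of this sketch: the logarithm is in base 2, so rates are in bits.
    """
    reference_time = file_bits / math.log2(snr)  # time to deliver one file at rate log(P)
    return (t_f / reference_time, t_e / reference_time, t_d / reference_time)


def serial_ndt(t_f, t_e, t_d, file_bits, snr):
    """Total normalized delivery time under serial delivery: fronthaul + edge + D2D."""
    return sum(ndt_components(t_f, t_e, t_d, file_bits, snr))


if __name__ == "__main__":
    # Hypothetical numbers: L = 10^6 bits, P = 2^20, no fronthaul use,
    # T_E = 50000 symbols, and T_D = 40000 symbols.
    print(serial_ndt(0, 50_000, 40_000, 10**6, 2**20))
```

With these hypothetical numbers the reference time is 50000 symbols, so the edge transmission alone already accounts for a normalized delivery time of one, and the D2D phase adds to it.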

C. Pipelined Transmission
The system defined above is based on serial delivery as illustrated in Fig. 2a. Here we describe an alternative model, whereby, as seen in Fig. 2b, simultaneous transmissions on fronthaul, edge, and D2D channels are enabled. Specifically, the ENs can simultaneously receive messages over the fronthaul links and transmit on the wireless channel; and the users can receive on the wireless channel while, at the same time, transmitting messages on the D2D links. Following [11], we refer to this model as enabling pipelined delivery.
To elaborate, at time instant t ∈ [T], where T denotes the delivery latency in a TI, each EN and user transmits using the information received at times 1, . . . , t − 1, in a causal way.
Mathematically, each EN $m \in [M]$ at time $t \in [T]$ uses a function $\pi^m_{P,e,t}(\cdot)$ to map the local cache content, the fronthaul messages received up to time $t-1$, the demand vector, and the global CSI to the output symbol $x_m[t]$. Furthermore, user $k \in [K]$ transmits using the function $\pi^k_{P,\mathrm{D2D},t}(\cdot)$ that maps the received edge signal up to time $t-1$, global CSI, and the previously received D2D messages from users $[K] \setminus \{k\}$ to the D2D message $v_k[t]$.
Similar to the serial transmission case, the NDT and minimum NDT under pipelined delivery are defined as
$$\delta_P(\mu, r_F, r_D) \triangleq \lim_{P \to \infty} \lim_{L \to \infty} \frac{\mathbb{E}[T]}{L/\log(P)}$$
and
$$\delta_P^*(\mu, r_F, r_D) \triangleq \inf\{\delta_P(\mu, r_F, r_D) : \delta_P(\mu, r_F, r_D) \text{ is achievable}\},$$
respectively. Furthermore, we have the lower bound $\delta_P^*(\mu, r_F, r_D) \ge 1$, and the minimum NDT is a convex function of $\mu$ for any fixed values of $r_F$ and $r_D$. Finally, since serial delivery is a special case of pipelined delivery, by the definition of the minimum NDT, we have the inequality $\delta_P^*(\mu, r_F, r_D) \le \delta^*(\mu, r_F, r_D)$. The pipelined delivery model is studied in Sec. VI.

III. DELIVERY STRATEGIES FOR EDGE CACHING WITH D2D COOPERATION
In this section, we start by developing delivery schemes for the special case in which the fractional cache size is µ = 1/M and the fronthaul capacity is not used. This scenario corresponds to the important special case in which the edge cache capacity is the minimum necessary to guarantee that the entire library F is available across the caches of all ENs, and hence fronthaul resources may not be used for delivery. Note that, for any request vector, users need to download equal fractions of the requested file from all ENs. This set-up is also known as an X-channel [30]. We first introduce a delivery strategy based on a new interference alignment scheme for an F-RAN with M = K = 2. A more scalable strategy based on compress-and-forward is then introduced for any number of ENs and users.
A. Interference Alignment for M = K = 2
For the case of M = 2 ENs and K = 2 users, we present a delivery scheme that integrates D2D communication in the Real Interference Alignment (RIA) scheme introduced in [30]. Our main interest in this scheme stems from its optimality, which will be proved in Sec. V.
Proposition 1: For a D2D-aided F-RAN with M = 2 ENs, each with a fractional cache size $\mu = 1/2$, K = 2 users, a fronthaul rate $r_F \ge 0$, and a D2D rate $r_D \ge 0$, the minimum NDT under serial delivery is upper bounded as
$$\delta^*(\mu = 1/2, r_F, r_D) \le 1 + \frac{1}{2 r_D}. \qquad (14)$$
Prop. 1 was proved in the conference paper [25] by the authors by leveraging layered transmission, RIA, D2D cooperation, and successive cancellation decoding at the receivers.
While referring to [25] for details, we sketch here the main features of the scheme by comparing it to the original RIA scheme introduced in [30] for an X-channel model without D2D cooperation. In RIA, each EN applies layered transmission with two layers by transmitting a precoded superposition of two symbols, as given in (15), where symbols $a_1$, $a_2$, $b_1$, and $b_2$ are chosen from a discrete constellation. Each layer is coded using random coding with rate $R$. Layers $a_1$ and $b_1$ are intended for user 1, whereas $a_2$ and $b_2$ are intended for user 2. Note that the precoders in (15) are based on perfect knowledge of the CSI at the ENs. Given the resulting signals (1) received by the two users, as shown in [30], user 1 is able to decode the signal $\tilde{y}_1 \triangleq y_1 - z_1$ from $y_1$, in the high-SNR regime, if the rate is selected as $R = \log(P)/3$. Next, user 1, which has perfect CSI, searches for a set of symbols $\{a_1, b_1, a_2 + b_2\}$ that generates $\tilde{y}_1$. Since the ENs use a discrete constellation and the channel coefficients are drawn i.i.d. from a continuous distribution, almost surely, this set is unique. This implies that user 1 can decode the desired layers $a_1$ and $b_1$ once it has decoded $\tilde{y}_1$. Similarly, user 2 can decode layers $a_2$ and $b_2$. Note that the RIA scheme requires $T_E = 3L/(2\log(P))$ channel uses in order to satisfy the users' demands, since each layer consists of $L/2$ bits and is transmitted at a rate of $\log(P)/3$ bits per channel use. It follows that RIA without D2D cooperation achieves an NDT of 3/2.
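The NDT of 3/2 quoted for RIA without D2D cooperation can be checked directly from the layer size and rate stated above. The short derivation below only restates that arithmetic, assuming no fronthaul or D2D transmission takes place in this baseline.

```latex
\begin{align*}
T_E &= \frac{L/2}{\log(P)/3} = \frac{3L}{2\log(P)}, \\
\delta_E &= \frac{T_E}{L/\log(P)} = \frac{3}{2}, \qquad \delta_F = \delta_D = 0 .
\end{align*}
```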
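In order to leverage D2D cooperation, in the proposed scheme, the ENs apply layered transmission with $n_d$ layers, where $n_d$ is odd. The transmitted signals are hence given as
$$x_1 = \sum_{i=1}^{n_d} g_{1,i} a_i, \qquad x_2 = \sum_{i=1}^{n_d} g_{2,i} b_i,$$
where precoder gains $\{g_{m,i}\}$, with $m \in [2]$ and $i \in [n_d]$, are selected to satisfy $h_{11} g_{1,i} = h_{12} g_{2,i-1}$ and $h_{22} g_{2,i} = h_{21} g_{1,i-1}$ for $i = 2, \ldots, n_d$. The signals (1) received by the two users are hence given as
$$y_1 = h_{11} g_{1,1} a_1 + \sum_{i=2}^{n_d} h_{11} g_{1,i} (a_i + b_{i-1}) + h_{12} g_{2,n_d} b_{n_d} + z_1,$$
$$y_2 = h_{22} g_{2,1} b_1 + \sum_{i=2}^{n_d} h_{22} g_{2,i} (b_i + a_{i-1}) + h_{21} g_{1,n_d} a_{n_d} + z_2.$$
In a manner similar to the RIA scheme, it can be shown that user 1 is able to decode the signal $\tilde{y}_1 \triangleq y_1 - z_1$ and to identify the set $R_1 \triangleq \{a_1, a_2 + b_1, \ldots, a_{n_d} + b_{n_d-1}, b_{n_d}\}$. The uniqueness of this set is determined by the same arguments used for the RIA scheme. Likewise, user 2 is able to identify the set $R_2 \triangleq \{b_1, b_2 + a_1, \ldots, b_{n_d} + a_{n_d-1}, a_{n_d}\}$.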
In order to decode the desired layers, the users exchange the even-numbered layers over the D2D links, so that user 1 transmits the message $v_1 = \{a_2 + b_1, a_4 + b_3, \ldots, a_{n_d-1} + b_{n_d-2}\}$ to user 2, whereas user 2 transmits $v_2 = \{b_2 + a_1, b_4 + a_3, \ldots, b_{n_d-1} + a_{n_d-2}\}$ to user 1. User 1 is thus able to decode $\{a_1, b_2, a_3, b_4, \ldots, a_{n_d}, b_{n_d}\}$ by means of successive cancellation decoding. To this end, it starts by decoding $a_1$ in $R_1$; then, it uses $a_1$ together with $b_2 + a_1$ in $v_2$ to decode $b_2$; next, it uses $b_2$ and $a_3 + b_2$ in $R_1$ to decode $a_3$; and so on, until the desired layers are decoded. Similarly, user 2 decodes $\{b_1, a_2, b_3, a_4, \ldots, b_{n_d}, a_{n_d}\}$.
The scheme requires $T_E = (n_d + 1)L/(n_d \log(P))$ downlink channel uses since each EN conveys $L/2$ bits to each user over $n_d/2$ layers, which are transmitted at a rate of $\log(P)/(n_d + 1)$ bits per channel use. Unlike RIA, there is an additional latency overhead of $T_D = L/(2 r_D \log(P))$ due to sharing $(n_d - 1)/2$ layers over each D2D link. Therefore, assuming an arbitrarily large number of layers at the ENs, the NDT (14) is obtained.
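The successive cancellation steps at user 1 can be traced with a small noiseless sketch. Here integers stand in for the layer symbols, and the aligned sums are assumed to be available exactly; the constellation design, channel gains, and noise of the actual scheme are not modeled, and all function names are hypothetical.

```python
def user1_aligned_set(a, b):
    """Set R_1 = {a_1, a_2 + b_1, ..., a_n + b_{n-1}, b_n} seen by user 1 (n odd)."""
    n = len(a)
    return [a[0]] + [a[i] + b[i - 1] for i in range(1, n)] + [b[n - 1]]


def user2_d2d_message(a, b):
    """Message v_2 = {b_2 + a_1, b_4 + a_3, ...}: the even-numbered entries of R_2."""
    n = len(a)
    # 0-based indices 1, 3, ... correspond to the 1-based even layers 2, 4, ...
    return [b[i] + a[i - 1] for i in range(1, n, 2)]


def user1_decode(r1, v2):
    """Successive cancellation at user 1, alternating between v_2 and R_1."""
    n = len(r1) - 1  # number of layers n_d
    decoded = {("a", 1): r1[0]}  # a_1 is observed directly in R_1
    for i in range(2, n + 1):
        if i % 2 == 0:
            # Even step: remove a_{i-1} from the D2D entry b_i + a_{i-1}.
            decoded[("b", i)] = v2[i // 2 - 1] - decoded[("a", i - 1)]
        else:
            # Odd step: remove b_{i-1} from the R_1 entry a_i + b_{i-1}.
            decoded[("a", i)] = r1[i - 1] - decoded[("b", i - 1)]
    decoded[("b", n)] = r1[n]  # b_n is observed directly as the last entry of R_1
    return decoded


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5]       # layers sent by EN 1 (n_d = 5)
    b = [10, 20, 30, 40, 50]  # layers sent by EN 2
    print(user1_decode(user1_aligned_set(a, b), user2_d2d_message(a, b)))
```

Each decoded layer depends only on the previously decoded one, which is why a single D2D message per user suffices in this sketch.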

B. Compress-and-Forward D2D Transmission
The scheme discussed above appears to be cumbersome to generalize beyond the case M = K = 2. Furthermore, at a practical level, this approach is mostly of theoretical interest since the performance of RIA is known to degrade catastrophically when CSI at the transmitters is imperfect [32]. Therefore, here we present an achievable scheme that applies to all values of M and K and requires only CSI at the receivers. The scheme is based on Compress-and-Forward (CF) D2D communication, and its near-optimality properties will also be discussed in Sec. V.
Proposition 2: For a D2D-aided F-RAN with M ENs, each with a fractional cache size $\mu = 1/M$, K users, a library of $N \ge K$ files, a fronthaul rate $r_F \ge 0$, and a D2D rate $r_D \ge 0$, the minimum NDT under serial delivery is upper bounded as
$$\delta^*(\mu = 1/M, r_F, r_D) \le 1 + \frac{1}{r_D}, \qquad (19)$$
which is achieved by means of CF-based D2D communication and Zero-Forcing (ZF) equalization at the devices.
The NDT (19) . . . $[z_1, \ldots, z_K]^T$ represents the white Gaussian noise, and $\mathbf{q}_k \triangleq [q_1, \ldots, q_K]^T$ represents the compression noise vector. We have $q_k = 0$ since user $k$ receives $y_k$ directly over the downlink channel (1). The channel coefficients are drawn i.i.d. from a continuous distribution; therefore, almost surely, matrix $\mathbf{H}_K$ is invertible. Hence, each user can apply ZF equalization, i.e., multiply the received signals by $\mathbf{H}_K^{-1}$. Note that, after ZF equalization, the ENs' transmissions no longer cause interference. Therefore, the achievable rate is determined by the power of the additive noise after equalization. As shown in [11, App. II-A], by compressing with a rate equal to $\log(P)$ bits per downlink symbol, we can guarantee that the SNR after compression scales linearly with $P$. Thus, in the high-SNR regime, each EN is able to transmit with a rate of $R \approx \log(P)$ bits/channel use.
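The ZF equalization step can be illustrated for two ENs and two users with a minimal sketch. It assumes real-valued channel gains and lumps channel and compression noise into a single additive term; the helper names are hypothetical and the compression step itself is not modeled.

```python
def received_signals(h, x, noise=(0.0, 0.0)):
    """Signals available at a user after D2D exchange: H x plus additive noise.

    Assumption of this sketch: `noise` stands for channel plus compression noise.
    """
    (h11, h12), (h21, h22) = h
    return (h11 * x[0] + h12 * x[1] + noise[0],
            h21 * x[0] + h22 * x[1] + noise[1])


def zf_equalize_2x2(h, y):
    """Multiply the received vector by the inverse of the 2x2 channel matrix."""
    (h11, h12), (h21, h22) = h
    det = h11 * h22 - h12 * h21
    if det == 0:
        raise ValueError("channel matrix is singular; ZF equalization is undefined")
    return ((h22 * y[0] - h12 * y[1]) / det,
            (h11 * y[1] - h21 * y[0]) / det)


if __name__ == "__main__":
    h = ((2.0, 1.0), (1.0, 3.0))  # hypothetical channel gains
    x = (1.0, -1.0)               # symbols sent by EN 1 and EN 2
    print(zf_equalize_2x2(h, received_signals(h, x)))
```

In the noiseless case the equalizer returns the transmitted symbols, which reflects the statement that the ENs' transmissions no longer interfere after ZF equalization; with noise, the residual is the noise vector multiplied by the inverse channel matrix.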
To satisfy the users' demands, each EN must convey $L/M$ bits to each user. To this end, we cluster the ENs into all possible $\binom{M}{K}$ subsets of $K$ ENs, and schedule each cluster into distinct time intervals of duration $T_E/\binom{M}{K}$. Since each EN participates in $\binom{M-1}{K-1}$ clusters, and the total number of bits transmitted by each EN is $KL/M$, the duration of each interval is given as
$$\frac{T_E}{\binom{M}{K}} = \frac{KL/M}{\binom{M-1}{K-1} \log(P)}.$$
Therefore, since $\binom{M}{K} = \frac{M}{K}\binom{M-1}{K-1}$, the number of downlink channel uses is $T_E = L/\log(P)$, and hence the proposed scheme achieves an ideal edge NDT of $\delta_E = 1$. Since, for each downlink channel use, each user transmits $\log(P)$ bits over the D2D link, a latency overhead of $T_D = T_E \log(P)/C_D = T_E/r_D$ is added to the delivery time, and hence the total NDT is (19).
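The clustering argument above can be verified numerically with a short sketch. It assumes M is at least K, a base-2 logarithm, and hypothetical function names; it enumerates the clusters explicitly and checks that the interval durations add up to L/log(P).

```python
from itertools import combinations
from math import comb, log2


def cf_edge_schedule(num_ens, num_users, file_bits, snr):
    """Enumerate EN clusters of size K and compute the per-interval duration.

    Returns (clusters, interval_duration, total_edge_time) in downlink symbols.
    """
    if num_ens < num_users:
        raise ValueError("this sketch assumes at least as many ENs as users")
    clusters = list(combinations(range(num_ens), num_users))
    clusters_per_en = comb(num_ens - 1, num_users - 1)
    bits_per_en = num_users * file_bits / num_ens  # L/M bits to each of K users
    interval = bits_per_en / (clusters_per_en * log2(snr))  # rate log(P) per EN
    return clusters, interval, len(clusters) * interval


def cf_total_ndt(r_d):
    """Edge NDT of one plus the D2D overhead 1 / r_D."""
    return 1.0 + 1.0 / r_d


if __name__ == "__main__":
    clusters, interval, t_e = cf_edge_schedule(4, 2, 1200, 2**10)
    print(len(clusters), interval, t_e)
    print(cf_total_ndt(1.25))
```

For the hypothetical values M = 4, K = 2, L = 1200 bits, and log(P) = 10, the total edge time equals L/log(P) = 120 symbols regardless of how many clusters are scheduled.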

A. Upper Bounds and Achievable Strategy
In the previous section, we presented schemes for the special case in which the fractional cache size is $\mu = 1/M$. To obtain a policy that applies for any value of fractional cache size $\mu$, we combine, via file-splitting and cache-sharing, the D2D-based CF scheme (Prop. 2) with the best-known general strategies for an F-RAN model with no D2D cooperation. These strategies are described next for reference, followed by a review of file-splitting and cache-sharing.
To formulate the main result, we define the threshold values $r_F^{\mathrm{th}}$ and $r_D^{\mathrm{th}}$, the latter given in (22).
Proposition 3: For a D2D-aided F-RAN with M ENs, each with a fractional cache size $\mu \in [0, 1]$, K users, a library of $N \ge K$ files, a fronthaul rate $r_F \ge 0$, and a D2D rate $r_D \ge 0$, the minimum NDT under serial delivery is upper bounded as $\delta^*(\mu, r_F, r_D) \le \delta_{\mathrm{ach}}(\mu, r_F, r_D)$, where the achievable NDT $\delta_{\mathrm{ach}}(\mu, r_F, r_D)$ is obtained by combining the mentioned schemes as follows:
• Low cache, low fronthaul, and low D2D regime ($\mu \le 1/M$, $r_F \le r_F^{\mathrm{th}}$, and $r_D \le r_D^{\mathrm{th}}$): Combining EN coordination and soft-transfer policies yields the NDT (23).
• High cache, low fronthaul, and low D2D regime ($\mu > 1/M$, $r_F \le r_F^{\mathrm{th}}$, and $r_D \le r_D^{\mathrm{th}}$): Combining EN coordination and ZF precoding policies yields the NDT (24).
• High fronthaul and low D2D regime ($\mu \in [0, 1]$, $r_F > r_F^{\mathrm{th}}$, and $r_D \le r_D^{\mathrm{th}}$): Combining ZF precoding and soft-transfer policies yields the NDT (25).
• Low cache and high D2D regime ($\mu \le 1/M$, $r_F \ge 0$, and $r_D > r_D^{\mathrm{th}}$): Combining soft-transfer and CF policies yields the NDT (26).
• High cache and high D2D regime ($\mu > 1/M$, $r_F \ge 0$, and $r_D > r_D^{\mathrm{th}}$): Combining CF and ZF precoding policies yields the NDT (27).
Proof: See Appendix A.
For the special case of M = 2 ENs and K = 2 users, the following NDT is achieved by using the D2D-enhanced RIA scheme of Prop. 1.
Proof: Follows from Prop. 3 by replacing the D2D threshold r th D in (22) with r th D = max{1, r F }, and, for D2D rate r D > r th D , by applying the D2D scheme of Prop. 1 instead of the CF-based scheme.

B. Lower Bound
A general lower bound on the minimum NDT is given in Prop. 5. Following [11], the bound is derived by identifying subsets of information resources from which, for high-SNR, all requested files must be reliably decoded when a feasible policy is implemented.
Specifically, for $l = 0, 1, \ldots, \min\{M, K\}$, we consider a subset of the available signals indexed by $l$.
Proposition 5: For a D2D-aided F-RAN with M ENs, each with a fractional cache size $\mu \in [0, 1]$, K users, a library of $N \ge K$ files, a fronthaul rate $r_F \ge 0$, and a D2D rate $r_D \ge 0$, the minimum NDT under serial delivery is lower bounded as $\delta^*(\mu, r_F, r_D) \ge \delta_{\mathrm{lb}}(\mu, r_F, r_D)$, with $\delta_{\mathrm{lb}}(\mu, r_F, r_D)$ being the minimum value of the linear program (29), where (29b) is a family of constraints with $l = 0, 1, \ldots, \min\{M, K\}$, and $g(l)$ is defined in (30).
Proof: See Appendix B.
Note that, without D2D communication, i.e., r D = 0, the linear program (29) is identical to that of [11,Proposition 1]. For r D > 0, the additional term g(l)r D δ D in (29b) reflects the novel trade-off between the D2D NDT δ D and the edge and fronthaul NDTs δ E and δ F , respectively.

V. CHARACTERIZATION OF THE MINIMUM NDT FOR SERIAL DELIVERY
In this section, based on the lower and upper bounds presented in Sec. IV, we discuss the optimality properties of the D2D-based strategies.

A. 2 × 2 D2D-Aided F-RAN
For the case of M = 2 ENs and K = 2 users, as detailed in the following proposition, the D2D-based strategy of Prop. 4 is optimal.
Proof: See Appendix C.
Prop. 6 can be used to draw conclusions on the role of D2D cooperation in improving the delivery latency. We start by observing that, for r D ≤ max{1, r F }, the minimum NDT δ 2×2 (µ, r F , r D ) (28) is identical to the minimum NDT without D2D links derived in [11,Corollary 3]. Therefore, D2D communication provides a latency reduction only when we have r D > max{1, r F }.
The minimum useful value max{1, r F } for the D2D rate r D increases with fronthaul rate r F . This demonstrates that there exists a trade-off between fronthaul and D2D resources for the purpose of interference management, although their role is not symmetric. The use of fronthaul links is in fact necessary to obtain a finite NDT when the library is not fully available at the ENs, i.e., when µ < 1/2. D2D links can instead only reduce the NDT in regimes where fronthaul and edge resources would already be sufficient for content delivery with a finite NDT. In particular, when r D > max{1, r F }, D2D communication reduces the minimum NDT for all values 0 < µ < 1. Furthermore, when µ > 1/2, irrespective of the value of r F , the minimum NDT is achieved by leveraging only edge caching and D2D links, without having to rely on fronthaul resources, thus reducing the traffic at the network infrastructure.
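The threshold behavior discussed above for the 2 × 2 system can be summarized in a few lines. The sketch below only encodes the condition r_D > max{1, r_F} stated in the text for cache sizes strictly between 0 and 1; the function names and the sample rate pairs are hypothetical.

```python
def d2d_threshold_2x2(r_f):
    """Minimum useful D2D rate max{1, r_F} for the 2 x 2 system."""
    return max(1.0, r_f)


def d2d_reduces_min_ndt_2x2(r_f, r_d):
    """True when D2D links lower the minimum NDT, for any 0 < mu < 1."""
    return r_d > d2d_threshold_2x2(r_f)


if __name__ == "__main__":
    for r_f, r_d in [(0.5, 0.8), (0.5, 1.5), (2.0, 1.5), (2.0, 2.5)]:
        print(r_f, r_d, d2d_reduces_min_ndt_2x2(r_f, r_d))
```

The sample pairs show that a D2D rate that is useful at low fronthaul rate (r_D = 1.5 with r_F = 0.5) stops being useful once the fronthaul rate exceeds it (r_F = 2).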

B. General D2D-Aided F-RAN
For an arbitrary number of ENs and users, we start with the main result in the following proposition, which shows that the achievable CF-based strategy of Prop. 3 is optimal to within a multiplicative factor of two. The key result in Prop. 7 is that the multiplicative suboptimality factor of the CF-based D2D approach defined in the previous section does not scale with the size of the system. This is illustrated in Fig. 3, where we plot the achievable NDT δ ach (µ, r F , r D ) and the lower bound δ lb (µ, r F , r D ) as a function of the number of ENs and users, with M = K, fractional cache size µ = 1/M, fronthaul rate r F = 1, and D2D rate r D = 1.25. As seen, the suboptimality gap can be, in practice, significantly smaller than two.
While the gap identified in (31) is generally not zero, the next corollary states that CF is close to optimal for sufficiently high D2D rate.
Corollary 1: For a D2D-aided F-RAN with M ENs, each with a fractional cache size $\mu \in [0, 1]$, K users, a library of $N \ge K$ files, a fronthaul rate $r_F \ge 0$, and a D2D rate $r_D \ge \max\{r_D^{\mathrm{th}}, 1/\epsilon\}$ with $r_D^{\mathrm{th}}$ in (22) and $\epsilon > 0$, the achievable strategy of Prop. 3 is close to optimal in the sense that we have
$$\frac{\delta_{\mathrm{ach}}(\mu, r_F, r_D)}{\delta^*(\mu, r_F, r_D)} \le 1 + \epsilon.$$
Proof: Cor. 1 follows directly from the proof of Prop. 7 (App. D) since, for $r_D \ge r_D^{\mathrm{th}}$, we have $\delta_{\mathrm{ach}}(\mu, r_F, r_D)/\delta^*(\mu, r_F, r_D) \le 1 + 1/r_D$ (cf. (63) and (66)).
The result is illustrated in Fig. 4, where we plot the achievable NDT $\delta_{\mathrm{ach}}(\mu, r_F, r_D)$ and the lower bound $\delta_{\mathrm{lb}}(\mu, r_F, r_D)$ as a function of the D2D rate $r_D$, for M = 3 ENs, K = 3 users, fractional cache size $\mu = 1/3$, and fronthaul rate $r_F = 1$. As the D2D rate $r_D$ increases, the achievable NDT $\delta_{\mathrm{ach}}(\mu, r_F, r_D)$ is seen to approach the lower bound $\delta_{\mathrm{lb}}(\mu, r_F, r_D)$. For instance, for $r_D \ge 1/\epsilon = 10$, the gap to optimality is smaller than $\epsilon = 0.1$. This is because, for arbitrarily large D2D rate, the latency overhead caused by D2D communications is negligible, and an ideal NDT of one can be achieved by means of ZF-equalization at the users. In addition, the figure highlights the gains that can be achieved with sufficiently high D2D rate.
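The relation between the target gap and the required D2D rate in Cor. 1 can be made concrete with a short sketch. The threshold from (22) is passed in as a parameter because its expression is not reproduced here; the function names and the sample threshold values are hypothetical.

```python
def cf_gap_bound(r_d):
    """Upper bound 1 + 1 / r_D on the ratio between achievable and minimum NDT.

    Assumption of this sketch: the caller ensures r_D is at least the threshold in (22).
    """
    return 1.0 + 1.0 / r_d


def min_d2d_rate_for_gap(epsilon, r_th_d):
    """Smallest D2D rate for which the multiplicative gap is at most 1 + epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return max(r_th_d, 1.0 / epsilon)


if __name__ == "__main__":
    r_d = min_d2d_rate_for_gap(0.1, r_th_d=1.0)  # hypothetical threshold value
    print(r_d, cf_gap_bound(r_d))
```

For a target of 0.1 and a hypothetical threshold of 1, the required rate is 10, in line with the numerical example quoted for Fig. 4.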

VI. PIPELINED DELIVERY
In this section, we study the D2D-aided F-RAN model with pipelined delivery as defined in Sec. II-C. We proceed in a manner similar to serial delivery by first deriving lower and upper bounds on the minimum NDT, and then discussing the optimality of CF-based D2D delivery.

A. Lower Bound on the Minimum NDT
and $g(l)$ is defined in (30).
a serial delivery policy with its fronthaul, edge, and D2D transmission strategy. As illustrated in Fig. 5, in order to convert this strategy into one that leverages pipelining, every file in the library is split into $B$ blocks of size $L/B$ bits each, and every TI is divided into $B + 2$ slots.
In each slot $b \in [B]$, the CP uses the fronthaul links to deliver the $b$th block of the requested files using the fronthaul transmission strategy of the selected serial policy. At the same time, the ENs, having received the fronthaul message for the $(b-1)$th block in the previous slot, apply the edge transmission strategy of the serial policy to deliver the $(b-1)$th block of the requested files.
For a serial delivery scheme that achieves fronthaul, edge, and D2D transmission durations $T_F$, $T_E$, and $T_D$, respectively, the block-Markov approach, with arbitrarily large number of blocks $B$, achieves the pipelined NDT
$$\delta_P = \max\{\delta_F, \delta_E, \delta_D\},$$
where $\delta_F$, $\delta_E$, and $\delta_D$ are the fronthaul, edge, and D2D NDTs of the serial transmission scheme as defined in (9). Moreover, for two serial transmission schemes, one that achieves NDTs $\delta_F^{(1)}$, $\delta_E^{(1)}$, and $\delta_D^{(1)}$, whereas the other achieves NDTs $\delta_F^{(2)}$, $\delta_E^{(2)}$, and $\delta_D^{(2)}$, and for some $\alpha \in [0, 1]$, the following pipelined NDT is achievable [11, Sec. VI-B]
$$\delta_{P,\mathrm{ach}}(\mu, r_F, r_D) = \max\big\{\alpha \delta_F^{(1)} + (1-\alpha)\delta_F^{(2)},\; \alpha \delta_E^{(1)} + (1-\alpha)\delta_E^{(2)},\; \alpha \delta_D^{(1)} + (1-\alpha)\delta_D^{(2)}\big\}.$$
Proposition 8: For an M × K D2D-aided F-RAN with a fractional cache size $\mu \in [0, 1]$, a library of $N \ge K$ files, a fronthaul rate $r_F \ge 0$, and a D2D rate $r_D \ge 0$, the minimum NDT under pipelined delivery is upper bounded as $\delta_P^*(\mu, r_F, r_D) \le \delta_{P,\mathrm{ach}}(\mu, r_F, r_D)$, where the achievable NDT $\delta_{P,\mathrm{ach}}(\mu, r_F, r_D)$ is given for two distinct regimes of operation as follows:
• High fronthaul rate ($r_F \ge \min\{M, K\}/M$):
• Low fronthaul rate ($r_F < \min\{M, K\}/M$):
where $\mu_1$ and $\mu_2$ are defined in (38) and (39), respectively.
Proof: See Appendix E.
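The effect of block-Markov pipelining on latency can be sketched numerically. The code below assumes that every slot lasts as long as the slowest of the three per-block stages and that B + 2 slots are needed in total; under this assumption the normalized latency approaches the largest of the fronthaul, edge, and D2D NDTs as B grows. Function names and example values are hypothetical.

```python
def block_markov_latency(t_f, t_e, t_d, num_blocks):
    """Latency of a three-stage pipeline with B blocks and B + 2 slots.

    Assumption of this sketch: each slot lasts as long as the slowest per-block stage.
    """
    slot = max(t_f, t_e, t_d) / num_blocks
    return (num_blocks + 2) * slot


def pipelined_ndt(delta_f, delta_e, delta_d):
    """Limit of the normalized latency as the number of blocks grows."""
    return max(delta_f, delta_e, delta_d)


def time_shared_pipelined_ndt(scheme_1, scheme_2, alpha):
    """Pipelined NDT when a fraction alpha of each file uses scheme 1.

    Each scheme is a tuple (delta_F, delta_E, delta_D) of serial NDT components.
    """
    mixed = [alpha * d1 + (1 - alpha) * d2 for d1, d2 in zip(scheme_1, scheme_2)]
    return max(mixed)


if __name__ == "__main__":
    for blocks in (1, 10, 1000):
        print(blocks, block_markov_latency(0, 100, 80, blocks))
    print(pipelined_ndt(0.0, 1.0, 0.8))
    print(time_shared_pipelined_ndt((0.0, 1.0, 0.8), (1.0, 1.0, 0.0), 0.5))
```

With a single block the pipeline degenerates and the overhead of the two extra slots is large, whereas for many blocks the latency approaches the duration of the slowest stage.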

C. Characterization of the Minimum NDT
In the following propositions we discuss the optimality of the D2D CF-based strategy under pipelined delivery. First, we prove that the multiplicative suboptimality factor of two, identified in Prop. 7, applies also to pipelined delivery policies.
Proposition 9: For a D2D-aided F-RAN with M ENs, K users, a library of $N \ge K$ files, a fronthaul rate $r_F < \min\{M, K\}/M$, and a D2D rate $r_D < 1 - M r_F/\min\{M, K\}$, the strategy of Prop. 8 achieves the minimum NDT under pipelined delivery to within a factor of two, i.e.,
$$\frac{\delta_{P,\mathrm{ach}}(\mu, r_F, r_D)}{\delta_P^*(\mu, r_F, r_D)} \le 2.$$
Proof: See Appendix F.
Next, we show that the achievable strategy of Prop. 8 is optimal for the high fronthaul regime with r F ≥ min{M, K }/M; for the high D2D regime with r D ≥ 1 − Mr F /min{M, K }; for the low cache regime with µ ∈ [0, µ 1 ]; and for the high cache regime with µ ∈ [µ 2 , 1].
Proposition 10: For a D2D-aided F-RAN with M ENs, each with a fractional cache size $\mu \in [0, 1]$, K users, a library of $N \ge K$ files, a fronthaul rate $r_F \ge 0$, and a D2D rate $r_D \ge 0$, the minimum NDT is characterized for three distinct regimes of operation as follows:
• High fronthaul rate ($r_F \ge \min\{M, K\}/M$):
• Low fronthaul rate and high D2D rate ($r_F < \min\{M, K\}/M$ and $r_D \ge 1 - M r_F/\min\{M, K\}$):
• Low fronthaul and D2D rates ($r_F < \min\{M, K\}/M$ and $r_D < 1 - M r_F/\min\{M, K\}$):
where $\mu_1$ and $\mu_2$ are defined in (38) and (39), respectively.
Proof: See Appendix G.
In the pipelined case, as seen in Fig. 5, the latency is dictated by the largest among fronthaul, D2D, and edge NDTs. Therefore, whenever the fronthaul rate is large enough to enable ZF precoding on the wireless channel without causing a bottleneck, the minimum NDT can be achieved without using D2D communication. However, for low fronthaul rate and low cache capacity, cooperation via CF-based ZF equalization allows the delivery latency to be reduced by alleviating fronthaul load without increasing the edge NDT.
Comparing the results for serial and pipelined delivery policies, we observe that both the achievable NDT in Prop. 3 and the lower bound in Prop. 5 are strictly decreasing functions of r D for all r D ≥ r th D , and hence the minimum NDT under serial delivery is strictly decreasing as well (cf. Fig. 4). In contrast, under pipelined delivery, the minimum NDT (42) for large r D is a constant function of r D . This is because, when r D ≥ 1 − Mr F /min{M, K }, the duration of the D2D transmission in each slot of the optimal block-Markov strategy is smaller than the fronthaul or edge transmissions, and hence increasing the D2D rate further does not reduce the minimum NDT.
The role of D2D cooperation in improving the delivery latency under pipelined delivery policies is further illustrated in Fig. 6, where we plot the lower and upper bounds on the minimum NDT as a function of the fractional cache size µ for an F-RAN with M = 10 ENs, K = 10 users, and a fixed fronthaul rate r F = 0.4. For small cache capacities satisfying µ ≤ µ 1 , D2D communication cannot reduce the minimum NDT because, in this regime, the total delivery time is dictated by fronthaul communication, which is required to deliver a large part of the requested files. In addition, for µ ≥ 1 − Mr F /min{M, K }, the cache capacity is large enough to support delivery via cache-aided ZF with a fronthaul overhead that does not affect the achievability of the ideal NDT of one. However, for µ 1 < µ < 1 − Mr F /min{M, K }, a D2D-based scheme provides a latency reduction. For example, as depicted in Fig. 6, for r D ≥ 1 − Mr F /min{M, K }, an ideal NDT of one can be achieved with a fractional cache size M times smaller than is required when no D2D communication is allowed (r D = 0).
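The regimes used in the pipelined analysis depend only on M, K, and the two rates, and can be identified with a small helper. The sketch below encodes the conditions on r_F and r_D quoted in the text; the function names and regime labels are hypothetical, and the NDT expressions themselves are not reproduced.

```python
def d2d_rate_threshold(num_ens, num_users, r_f):
    """D2D rate 1 - M r_F / min{M, K} above which extra D2D rate no longer helps."""
    return 1 - num_ens * r_f / min(num_ens, num_users)


def pipelined_regime(num_ens, num_users, r_f, r_d):
    """Classify the operating point into one of the three pipelined regimes."""
    if r_f >= min(num_ens, num_users) / num_ens:
        return "high fronthaul"
    if r_d >= d2d_rate_threshold(num_ens, num_users, r_f):
        return "low fronthaul, high D2D"
    return "low fronthaul, low D2D"


if __name__ == "__main__":
    # Parameters of the example in the text: M = K = 10 and r_F = 0.4.
    print(d2d_rate_threshold(10, 10, 0.4))
    print(pipelined_regime(10, 10, 0.4, 0.7))
    print(pipelined_regime(10, 10, 0.4, 0.0))
```

For M = K = 10 and r_F = 0.4 the threshold evaluates to approximately 0.6, so any larger D2D rate falls in the high D2D regime of this example.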

VII. CONCLUSIONS
In this work, we have studied the benefits of out-of-band broadcast Device-to-Device
[Fig. 6: NDT versus fractional cache size µ. Curves: δ P,lb (µ, r F , r D ) for r D = 0; δ P,ach (µ, r F , r D ) for r D = 0; δ * P (µ, r F , r D ) for r D = 0.6.]
the case in which CSI at the ENs and cloud may be imperfect; the case in which inter-file coding is allowed; and the case in which security constraints are imposed on the ENs [33].

A. Proof of Proposition 3
For the first three regimes, i.e., for low D2D rate r D ≤ r th D , the NDTs in (23)-(25) are achieved by applying the strategy of [11,Proposition 4], which does not require D2D resources.
Next, for low cache and high D2D rate, i.e., for µ ≤ 1/M and r D > r th D , a fraction µM of each of the requested files is delivered via D2D-based CF, whereas the remaining (1 − µM) fraction is delivered via cloud-aided soft-transfer. The cache capacity constraint is satisfied since µM × 1/M + (1 − µM) × 0 = µ, and the overall NDT is the corresponding convex combination, with weights µM and 1 − µM, of the NDTs of the two schemes.
Finally, for high cache and high D2D rate, i.e., for µ > 1/M and r D > r th D , a fraction

B. Proof of Proposition 5
For the proof of Prop. 5, we use the notation introduced in [11, App. I]. Accordingly, and similarly for Z . . . We bound $H(f_{[1:l]} \mid \mathbf{Y}_{[1:l]})$ in (49) as follows . . . where $\epsilon_L \ge 0$ is a function of $L$, independent of $P$, such that $\epsilon_L \to 0$ as $L \to \infty$; and
where the last equality follows from µ ≤ µ 1 .