Efficient Multimedia Broadcast for Heterogeneous Users in Cellular Networks

Efficient Multimedia Broadcast and Multicast Services (MBMS) to heterogeneous users in cellular networks require adaptive video encoding, layered multimedia transmission, optimized transmission parameters, and dynamic broadcast area definition. This paper addresses MBMS by proposing a multi-dimensional approach to broadcast area definition that provides an effective solution to all of the above aspects. By using multi-criteria K-means clustering, our scheme provides users with a high Quality-of-Experience (QoE) for multimedia services. Adaptive video encoding and allocation of radio resources (i.e., time-frequency resource blocks and modulation and coding scheme) are performed based on user spatial distribution, channel conditions, service requests, and user display capabilities. Simulation results show that, compared to an existing multimedia broadcast scheme, our solution provides a 70% improvement in user QoE and an 86% improvement in the number of served customers.


I. INTRODUCTION
Digital Television (DTV) over wireless networks is a popular application that is becoming commonplace. It involves several service providers broadcasting multimedia content to stationary and mobile customers on heterogeneous devices such as smart TVs, car infotainment systems, and smartphones. Multimedia streaming over cellular networks is supported by the evolved Multimedia Broadcast Multicast Services (eMBMS) standard, defined in 3GPP Release 9. An important definition therein is the synchronization area, whose evolved Nodes B (eNBs) are required to be synchronized in time. Within the synchronization area, the standard defines Multicast/Broadcast Single Frequency Networks (MBSFNs), i.e., groups of eNBs that simultaneously transmit the same content using the same radio resources [1]. This way, the signals from the various eNBs can be combined at the receiver, resulting in better quality. Note that in each cell of an MBSFN, broadcast and unicast transmissions coexist and share the cell capacity.
Fig. 1 shows an example of a multimedia broadcast scenario in an LTE network. Some eNBs are associated with more than one MBSFN area. As a result, such MBSFN areas overlap: a set of MBSFN areas with at least one eNB in common will be referred to as an overlap set. MBSFN areas belonging to a particular overlap set should therefore operate on different, non-overlapping frequencies. In the given example, {1, 2, 3}, {2, 3, 4}, {1, 3, 5}, and {4, 5} are the overlap sets.
Previous studies on modeling television viewing patterns suggest that TV programs (channels) are consumed by viewers (customers) based on socioeconomic composition, social affiliation, and demographic characteristics [2]. This motivates us to use clustering to form dynamic MBSFN areas and allocate eMBMS resources based on user interests and multimedia content popularity within the MBSFN synchronization area. Specifically, we focus on DTV programs and propose a scheme for MBSFN area formation that accounts for user content demand and location, channel conditions, and user display capabilities, and that aims at grouping together cells whose users have similar features. Then, we present a problem formulation that, given the set of MBSFNs, maximizes the multimedia service quality perceived by the users by adapting video encoding to the radio resources available in the cellular network. Simulation results show a significant gain with respect to an existing broadcast scheme.

II. RELATED WORK
For mobile rich-media content delivery, the most widely used video coding standard is H.264/MPEG-4 AVC. The Joint Video Team of ITU-T VCEG and ISO/IEC MPEG has standardized the Scalable Video Coding (SVC) extension of H.264/AVC, which achieves a rate-distortion performance comparable to H.264/AVC, attaining the same perceived visual quality with at most 10% higher bit rate [3]. SVC is primarily used for adaptive multimedia services. Scalability is provided in terms of spatial resolution, frame rate, and quantization level. The content is organized in video layers: the base layer is the most important and essential one and ensures the delivery of a minimum acceptable video quality, while the enhancement layers improve the decoded video quality when received in addition to the base layer. LTE eMBMS resource allocation aimed at maximizing proportional fairness among users with heterogeneous channel conditions has been discussed in [4]; however, [4] did not consider adaptive multimedia encoding, QoE performance, or user device heterogeneity. [5] presented a formulation for MBSFN formation that maximizes the total system throughput, together with a heuristic solution; however, the scheme in [5] does not account for video coding or heterogeneous display capabilities. [6] proposed a static clustering deployment of eNBs in LTE systems in order to balance downlink spectral and energy efficiency. Clustering-based load balancing in LTE networks is presented in [7], and dynamic MBSFN area creation to optimize multicast transmission efficiency has been studied in [8]. However, user-centric multi-criteria clustering to dynamically define MBSFN areas, combined with adaptive multimedia encoding and optimized LTE eMBMS resource allocation, is novel and has not been addressed in the literature so far.

III. SYSTEM ARCHITECTURE AND COMPONENTS
The system architecture that we consider for efficient multimedia broadcast to cellular users is depicted in Fig. 2. User Equipments (UEs) notify their serving eNB of their capabilities as well as of the requested DTV service. In turn, eNBs forward this information to the eMBMS gateway and to the Multi-cell/multicast Coordination Entity (MCE), a node tasked with coordinating transmissions from multiple cells. In order to efficiently broadcast DTV content to heterogeneous users, the eMBMS gateway and the MCE define MBSFN areas and adaptively allocate radio resources. The multimedia server adaptively encodes multimedia content using SVC based on user requests, UE device capabilities, and radio resource constraints. In this paper, we consider various types of UE devices, each characterized by the spatial resolution level s of its display. Displays with more pixels have a higher s.

A. SVC video QoE and rate model
In order to broadcast the multimedia content to heterogeneous UEs, the video content is encoded into SVC layers. The SVC spatio-temporal scalability grid used for broadcasting is shown in Fig. 3, where SVC layers are indexed by the (s, f) pair, s being the spatial resolution level and f the frame rate level. Let P be the catalogue of TV programs. For a program $p \in P$, if a total of $L_p$ SVC layers are transmitted, users need to successfully receive all SVC layers up to layer l in order to reconstruct l layers ($l \le L_p$). Additionally, video may be encoded using different quantization levels q, which yields a tradeoff between bit rate and quality (the higher q, the poorer the quality but the lower the required bit rate). The overall video quality is assessed using a parametric function $0 \le Q(q, f) \le 1$ that approximates the Mean Opinion Score (MOS), a subjective measure of user QoE. Q(q, f) is linearly related to the MOS [9]: $\mathrm{MOS} = 4\,Q(q, f) + 1$. The numerical correspondence among Q(q, f), MOS, and QoE is listed in Table I.
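The layer-dependency rule and the MOS mapping above are simple enough to state in code. The Python sketch below (function names are ours, purely illustrative) counts how many layers a UE can actually reconstruct from the layers it received intact, and converts the normalized quality Q into MOS.

```python
def decodable_layers(received):
    """Number of SVC layers a UE can reconstruct.

    received[k] is True if layer k+1 arrived intact (k = 0 is the base layer).
    Layer l is usable only if layers 1..l all arrived, so only the leading
    run of True values counts.
    """
    count = 0
    for ok in received:
        if not ok:
            break
        count += 1
    return count


def mos_from_quality(q):
    """Map the normalized quality 0 <= Q <= 1 to MOS via MOS = 4*Q + 1."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("Q must lie in [0, 1]")
    return 4.0 * q + 1.0


# Layer 4 is useless without layer 3, so only two layers are decodable here.
print(decodable_layers([True, True, False, True]))
print(mos_from_quality(0.25))
```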
The parameters of the quality model are specific to a video and depend on its inherent features. In this paper, we use the parametric quality model defined in [9]. For a given spatial resolution s, Q(q, f) is a function of the quantization level q and frame rate f, as follows:

$$Q(q, f) = Q_{max}\, Q_q(q)\, Q_f(f), \qquad (1)$$

where $Q_{max}$ is the top quality level of the video received at the UE when it is encoded at the minimum quantization level $q_{min}$ and at the highest frame rate $f_{max}$. For normalization, we set $Q_{max} = 1$. $Q_f$ is an increasing function of f, while $Q_q$ decreases as q increases (further details can be found in [9]).
We use a similar parametric model for the bit rate, derived from [10], in which the bit rate is expressed as a function of the quantization level q, frame rate f, and resolution s:

$$R(q, f, s) = R_{max}\, R_q(q)\, R_f(f)\, R_s(s), \qquad (2)$$

where $R_{max}$ is the maximum bit rate of the video sequence, attained with the minimum quantization level $q_{min}$, maximum frame rate $f_{max}$, and maximum spatial resolution $s_{max}$. We remark that the higher the SVC layer, the larger the spatial resolution s or the frame rate f, and hence the required bit rate.
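To make the product structure of the quality and rate models concrete, the sketch below instantiates both with factor shapes that are assumptions on our part: an exponential decay in q and a saturating exponential in f for quality, and power laws for rate. The exact expressions and their per-video parameters are those of [9] and [10]; every constant here is a hypothetical placeholder, and s is taken as a pixel count. Each factor equals 1 at $(q_{min}, f_{max}, s_{max})$, so that $Q = Q_{max}$ and $R = R_{max}$ there.

```python
import math

# Hypothetical placeholders: [9] and [10] fit such parameters per video sequence.
Q_MAX = 1.0                    # normalized top quality
QSTEP_MIN = 16.0               # q_min, finest quantization level
F_MAX = 30.0                   # f_max, frames per second
S_MAX = 1280 * 720             # s_max, taken here as a pixel count
R_MAX = 4000.0                 # R_max in kbit/s
ALPHA_Q, ALPHA_F = 0.5, 3.0    # assumed shape parameters of the quality factors
A_Q, B_F, C_S = 1.2, 0.6, 1.0  # assumed exponents of the rate factors


def quality(q, f):
    """Q(q, f) = Q_max * Q_q(q) * Q_f(f), with assumed factor shapes."""
    q_q = math.exp(-ALPHA_Q * (q / QSTEP_MIN - 1.0))  # 1 at q_min, decreasing in q
    q_f = (1.0 - math.exp(-ALPHA_F * f / F_MAX)) / (1.0 - math.exp(-ALPHA_F))  # 1 at f_max
    return Q_MAX * q_q * q_f


def rate(q, f, s):
    """R(q, f, s) = R_max * R_q(q) * R_f(f) * R_s(s), with assumed power laws."""
    return R_MAX * (q / QSTEP_MIN) ** (-A_Q) * (f / F_MAX) ** B_F * (s / S_MAX) ** C_S


# Rates of three hypothetical SVC operating points at q = 32 grow with s and f.
for s, f in [(640 * 360, 15.0), (640 * 360, 30.0), (1280 * 720, 30.0)]:
    print(s, f, round(rate(32.0, f, s), 1))
```

Because the factors multiply, moving one step up the (s, f) grid scales the required rate by a fixed ratio regardless of q, which is what makes higher SVC layers uniformly more expensive in this sketch.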

B. eMBMS resource allocation
For the sake of concreteness, we refer to LTE cellular systems, where the channel quality experienced by each user is represented by the channel quality indicator (CQI). Depending on the CQI value, the eNB transmits to the user using an appropriate modulation and coding scheme (MCS). In LTE, downlink radio resources are grouped into resource blocks (RBs), each including 12 consecutive subcarriers and lasting 0.5 ms. In practice, resource scheduling is performed with a periodicity of one subframe, i.e., 1 ms. According to LTE Release 11, eMBMS can be allocated at most 192 out of 320 subframes [1], i.e., 60% of the available resources. In broadcasting, however, the choice of MCS is driven by the UE that experiences the worst CQI among those receiving the broadcast transmission in the area. Therefore, for broadcast data sent over a fraction σ of subframes (0 ≤ σ ≤ 0.6), channel bandwidth B MHz, worst-UE CQI = m, and resulting MCS m with spectral efficiency $e_m$, the capacity (in Mbit/s) is given by:

$$C(\sigma) = \sigma\, B\, e_m. \qquad (3)$$

Hence, the capacity of an MBSFN is determined by the MCS and by the fraction of subframes used to transmit data.
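A minimal sketch of the capacity expression, assuming that the spectral efficiency $e_m$ is read from the 4-bit CQI table of 3GPP TS 36.213 (Table 7.2.3-1); the paper's own MCS settings are in Table II, which is not reproduced here. Like the formula, the sketch ignores control and reference-signal overhead.

```python
# Spectral efficiency e_m (bit/s/Hz) per CQI, from the 4-bit CQI table of
# 3GPP TS 36.213 (Table 7.2.3-1). Using it as the MCS efficiency is an assumption.
CQI_EFFICIENCY = {
    1: 0.1523, 2: 0.2344, 3: 0.3770, 4: 0.6016, 5: 0.8770,
    6: 1.1758, 7: 1.4766, 8: 1.9141, 9: 2.4063, 10: 2.7305,
    11: 3.3223, 12: 3.9023, 13: 4.5234, 14: 5.1152, 15: 5.5547,
}
MAX_EMBMS_FRACTION = 0.6  # at most 192 of 320 subframes can carry eMBMS


def mbsfn_capacity_mbps(sigma, bandwidth_mhz, ue_cqis):
    """Capacity C(sigma) = sigma * B * e_m in Mbit/s.

    The MCS is dictated by the worst CQI among the UEs that must receive
    the broadcast, so the minimum CQI selects e_m.
    """
    if not 0.0 <= sigma <= MAX_EMBMS_FRACTION:
        raise ValueError("sigma must lie in [0, 0.6]")
    worst_cqi = min(ue_cqis)
    return sigma * bandwidth_mhz * CQI_EFFICIENCY[worst_cqi]


# 10 MHz carrier, 60% of subframes, three receiving UEs: the CQI-7 user sets the MCS.
print(round(mbsfn_capacity_mbps(0.6, 10.0, [12, 7, 9]), 2))  # 8.86
```

Adding a single poorly served UE to the area lowers the capacity for everyone, which is the motivation for grouping users with similar channel conditions.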

IV. MBSFN FORMATION AND RESOURCE ALLOCATION
Here we present our solution for the formation of MBSFNs (Sec. IV-A) and for the allocation of radio resources (Sec. IV-B) in the presence of heterogeneous users.

A. Multi-criteria K-means clustering
In accordance with the 3GPP specifications, we consider that each user periodically reports to its eNB its CQI, its device type, and the requested TV program. Based on this information, the MCE determines the MBSFN areas and the program(s) that each area should broadcast.
The algorithm that we propose to make such decisions is based on K-means clustering. Specifically, given users $u_i$ (i = 1, …, N), we devise a multi-criteria clustering approach that exploits the following information about each user: the experienced SINR ($\gamma_i$), which can be easily derived from the reported CQI; the UE type ($\tau_i$); the requested program ($p_i$); and the position of the eNB with which the user is associated ($\vartheta_i$). UEs are then clustered so that users with the same interest in a DTV program and with similar channel conditions, device type, and location are included in the same cluster. User clusters are then translated into groups of eNBs, i.e., MBSFN areas. Note that clustering users according to these criteria allows the formation of MBSFNs that adapt well to heterogeneous UE conditions and to an inhomogeneous UE distribution over the service area.
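One way to realize the multi-criteria clustering is to stack the four criteria into a single feature vector per user and run Lloyd's iterations with K-means++ seeding. How the criteria are scaled and weighted is not specified here, so the min-max scaling, the one-hot encoding of the requested program, and the use of the serving eNB's (x, y) coordinates for $\vartheta_i$ are assumptions of this NumPy sketch.

```python
import numpy as np


def build_features(sinr_db, ue_type, program, enb_xy, n_programs):
    """One row per user: [SINR, UE type, one-hot program, eNB x, eNB y] (hypothetical scaling)."""
    def minmax(x):
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)

    sinr = minmax(sinr_db)[:, None]
    tau = minmax(ue_type)[:, None]
    prog = np.eye(n_programs)[np.asarray(program)]  # programs are categorical
    xy = np.asarray(enb_xy, dtype=float)
    span = xy.max(axis=0) - xy.min(axis=0)
    xy = (xy - xy.min(axis=0)) / np.where(span > 0, span, 1.0)
    return np.hstack([sinr, tau, prog, xy])


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with K-means++ seeding; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # K-means++: first centroid uniform, then proportional to squared distance D(x)^2.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centroids)[None]) ** 2).sum(-1).min(axis=1)
        total = d2.sum()
        probs = d2 / total if total > 0 else None  # None -> uniform choice
        centroids.append(X[rng.choice(len(X), p=probs)])
    centroids = np.array(centroids, dtype=float)
    labels = None
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centroids[None]) ** 2).sum(-1)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            if np.any(labels == c):  # keep the old centroid if a cluster empties
                centroids[c] = X[labels == c].mean(axis=0)
    return labels, centroids
```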
Each MBSFN k is initially assigned a set of programs to broadcast, $P_k \subseteq P$, which includes all those requested by the users in the corresponding cluster. The actual set of programs transmitted by an MBSFN is then determined as described in the next subsection, so as to account for radio resource availability and different video quantization options.
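The translation from user clusters to MBSFN areas and to the initial program sets $P_k$ corresponds to steps 8 and 9 of Algorithm 1 and can be sketched as follows. The input layout (one list entry per user giving its eNB, cluster label, and requested program) is our assumption; the threshold of more than two UEs per eNB comes from step 8.

```python
from collections import Counter, defaultdict


def form_mbsfn_areas(enb_of_user, cluster_of_user, program_of_user, min_users=3):
    """Map user clusters to MBSFN areas and program sets (steps 8-9, sketch).

    An eNB joins MBSFN area k when cluster k contains more than two of its
    users (min_users=3); P_k collects every program requested in cluster k.
    """
    per_enb = defaultdict(Counter)  # eNB -> {cluster k: number of its users}
    programs = defaultdict(set)     # cluster k -> P_k
    for enb, k, p in zip(enb_of_user, cluster_of_user, program_of_user):
        per_enb[enb][k] += 1
        programs[k].add(p)
    areas = defaultdict(set)        # MBSFN area k -> set of eNBs
    for enb, counts in per_enb.items():
        for k, n in counts.items():
            if n >= min_users:
                areas[k].add(enb)
    return dict(areas), dict(programs)
```

Note that $P_k$ here is only the initial assignment; the optimization of Sec. IV-B decides which of these programs are actually broadcast.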
We recall that K-means clustering is NP-hard even in two dimensions; it follows that our multi-criteria (four-dimensional) clustering problem is NP-hard as well. We therefore apply Lloyd's K-means heuristic [11] to cluster the heterogeneous users into K MBSFN areas. We initially set K to 1 and increase it by 1 at each iteration. The procedure stops when the obtained performance is lower than for the previous value of K, which is then selected as the best one. Note that the standard mandates that an eNB cannot be part of more than 8 MBSFNs. Thus, for high values of K, an eNB might violate this constraint. In this case, we enforce the limitation by associating the eNB only with the (at most) 8 clusters that include the highest fraction of its users. The proposed clustering procedure for a given K is reported in Algorithm 1, where the initial cluster centroids are selected using the K-means++ approach [12]. Besides the K MBSFNs and the associated sets of TV programs $P_k$, the algorithm returns the overlap sets for each program to be broadcast. An overlap set for program p is a group of adjacent MBSFN areas that are supposed to broadcast p and that overlap, partially or totally, in space. The use of the K-means heuristic for user-based clustering and dynamic MBSFN area formation is a novel feature of our scheme.
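The outer loop over K, the 8-MBSFN cap per eNB, and the overlap-set computation can be sketched as follows. `evaluate(K)` is a hypothetical callback that runs Algorithm 1 and the resource allocation for a given K and returns a performance figure. The overlap-set routine follows one reading of the definition in Sec. I, namely maximal groups of areas that broadcast p and share at least one eNB; this interpretation is our assumption.

```python
from collections import Counter

MAX_MBSFN_PER_ENB = 8  # standard limit on MBSFN areas per eNB


def select_k(evaluate, k_max):
    """Grow K from 1; stop when performance drops below that of the previous K."""
    best_k, best_perf = 1, evaluate(1)
    for k in range(2, k_max + 1):
        perf = evaluate(k)
        if perf < best_perf:
            break
        best_k, best_perf = k, perf
    return best_k


def cap_enb_membership(enb_of_user, cluster_of_user, max_areas=MAX_MBSFN_PER_ENB):
    """For each eNB, keep only the clusters that hold the most of its users."""
    counts = {}
    for enb, k in zip(enb_of_user, cluster_of_user):
        counts.setdefault(enb, Counter())[k] += 1
    return {enb: {k for k, _ in c.most_common(max_areas)} for enb, c in counts.items()}


def overlap_sets(areas, programs, p):
    """Maximal groups of MBSFN areas that broadcast p and share at least one eNB.

    areas: {area k: set of eNBs}; programs: {area k: set of programs P_k}.
    """
    groups = set()
    for enb in set().union(*areas.values()):
        shared = frozenset(k for k, members in areas.items()
                           if enb in members and p in programs.get(k, ()))
        if len(shared) >= 2:
            groups.add(shared)
    # Discard groups strictly contained in a larger one.
    return sorted((g for g in groups if not any(g < h for h in groups)), key=sorted)


# Hypothetical eNB memberships ("a" to "d") for five areas broadcasting program 0.
areas = {1: {"a", "c"}, 2: {"a", "b"}, 3: {"a", "b", "c"}, 4: {"b", "d"}, 5: {"c", "d"}}
programs = {k: {0} for k in areas}
print(overlap_sets(areas, programs, 0))
```

With these memberships, the function returns {1, 2, 3}, {1, 3, 5}, {2, 3, 4}, and {4, 5}, i.e., the same overlap sets listed for the example in Sec. I.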

B. Adaptive multimedia encoding and resource allocation
Given the set of MBSFN areas, the associated set of programs to be broadcast and the overlap sets, we outline a scheme that further refines broadcasting decisions in terms of: (i) which programs should actually be broadcast, given the constraints on radio resources, (ii) which video quantization level should be adopted for each TV program, and (iii) how many resources should be used for broadcasting each program.
Fig. 4 shows the logical representation of the resource allocation (modulation and coding scheme $m_l^k$ and fraction of subframes $\sigma_l^k$) to the SVC layers ($1 \le l \le L_p$) of the various TV programs ($1 \le p \le |P_k|$) in MBSFN area k ($1 \le k \le K$). Our adaptive multimedia encoding and radio resource allocation scheme makes these decisions with the aim of maximizing the overall QoE, accounting for different UE display capabilities and channel conditions. For clarity of presentation, in the following we consider that each user requests only one program. Also, let $L_p$ be the maximum number of SVC layers for the generic program p. Given that, we define the user multimedia service quality as follows.
Definition 1. The multimedia service quality $Q_i$ of a user $u_i$ who has requested program p is the effective QoE of the user when it receives (at most) $l_i$ ($1 \le l_i \le L_p$) video layers of TV program p, subject to its channel conditions and display capabilities. It is defined as:

$$Q_i = \begin{cases} Q(q_p, f_{l_i}) & \text{if } \gamma_i \ge \mathrm{SINR}_{thr}(m_{l_i}) \text{ and } Q(q_p, f_{l_i}) \ge 0.25, \\ 0 & \text{otherwise,} \end{cases} \qquad (4)$$

where $q_p$ is the quantization level used for program p, $Q(q_p, f_{l_i})$ is given in (1), and $\mathrm{SINR}_{thr}(m_{l_i})$ is the SINR threshold of the MCS $m_{l_i}$ allocated to video layer $l_i$.

Algorithm 1 MBSFNs formation for a given value of K
Input: …
6) Reassign centroid $c_k$ to decrease the average measure
7) Repeat steps 5 and 6 until cluster assignments remain unchanged
8) An eNB with more than two UEs in cluster k belongs to MBSFN area k
9) The set of TV programs broadcast in MBSFN k, $P_k$, consists of the programs requested by the users in cluster k
10) For each TV program, the overlap sets include the MBSFN areas that are supposed to broadcast the program and spatially overlap
Output: K MBSFN areas, $\{P_k\}_k$, overlap sets
Note that the above definition assumes that the channel is stationary, i.e., the user SINR remains unchanged during the transmission of all $L_p$ layers of a group of frames of the requested TV program. Hence, a user can receive $l_i$ SVC layers if its SINR exceeds the threshold corresponding to the MCS $m_{l_i}$ assigned to layer $l_i$. It suffices to check this condition for the $l_i$-th layer only: the higher the SVC layer, the higher the required rate, and hence the MCS order and the corresponding SINR threshold, so a user that meets the threshold of layer $l_i$ also meets those of all lower layers. Furthermore, we remark that $Q(q_p, f_{l_i}) \ge 0.25$ corresponds to a video quality level better than 'Fair', as given in Table I. Thus, the service quality experienced by user $u_i$ is zero unless it corresponds to a sufficiently high MOS.
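Definition 1 translates directly into a small function. In this sketch (names and argument layout are ours), the per-layer SINR thresholds are assumed to be non-decreasing in l, as argued above, so the scan over layers stops at the first threshold the user misses; the display limit is passed as `max_layers`, and the quality model Q(q, f) of (1) is supplied as a callable.

```python
def service_quality(sinr_db, q_p, layer_frames, layer_sinr_thr_db, max_layers, quality):
    """Service quality Q_i of Definition 1 (sketch).

    layer_frames[l-1]      : frame rate f_l of SVC layer l
    layer_sinr_thr_db[l-1] : SINR threshold of the MCS m_l assigned to layer l
    max_layers             : highest layer the UE display can use (assumption)
    quality(q, f)          : the quality model Q(q, f)
    """
    best = 0
    for l in range(1, max_layers + 1):
        if sinr_db >= layer_sinr_thr_db[l - 1]:
            best = l  # thresholds grow with l, so all lower layers are decodable too
        else:
            break
    if best == 0:
        return 0.0  # not even the base layer can be received
    q = quality(q_p, layer_frames[best - 1])
    return q if q >= 0.25 else 0.0  # below 'Fair' counts as no service
```

For instance, with a toy quality model `lambda q, f: (f / 30) * (16 / q)`, layer frame rates 7.5/15/30 fps, and thresholds 0/5/12 dB, a user at 8 dB decodes two layers and obtains Q = 0.5 at q = 16.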
Next, recall that the encoding of a program p into SVC layers is determined by the values of spatial resolution and video frame rate. In each MBSFN, the maximum spatial resolution is determined by the types of UEs that have to be served. Given a generic layer l, the associated video quality $Q(q_p, f_l)$ is a function of the quantization level $q_p$ and of the frame rate $f_l$ (the latter being fixed once l is given). Thus, for adaptive multimedia encoding and optimal radio resource allocation, we optimize, in each MBSFN k ($1 \le k \le K$) and for each program $p \in P_k$, the quantization level $q_p^k$. The objective is to maximize the multimedia service quality of the N users of the system, subject to constraints (a)–(d) on the system capacity, which concern the per-eNB capacity, the 60% cap on broadcast resources, the consistency of encoding and resource allocation within each overlap set, and the admissible MCS values. The resulting formulation is as follows, where inequalities (5.a) to (5.d) express constraints (a) to (d), respectively.
$$\max_{q^1, \ldots, q^K} \; \sum_{i=1}^{N} Q_i \quad \text{s.t. (5.a)–(5.d)}, \;\; \forall \text{ eNB } j, \qquad (5)$$

where $q^k = [q_1^k, \ldots, q_{|P_k|}^k]$ (k = 1, …, K) is the vector of quantization levels associated with the programs to be broadcast in MBSFN k ($p \in P_k$, $1 \le k \le K$), and $\mathbb{1}_{p,j}$ is an indicator function equal to 1 if program p is actually broadcast by eNB j and 0 otherwise. Note that p is broadcast by j if j is part of at least one MBSFN k, with $p \in P_k$, such that a nonzero number of users enjoy a service quality $Q_i > 0$ for p. This implies that constraints (5.b) and (5.c) account for the resource requirement due to program p only if p is broadcast by j. Furthermore, these constraints are imposed for each eNB rather than for each MBSFN, since an eNB can be part of more than one MBSFN and may therefore be subject to more stringent conditions. Given program p and its generic layer l, the quantities with superscript (o) appearing in (5.b) and (5.c) are auxiliary variables referring to the overlap set o to which eNB j belongs; overlap set o is updated based on the programs that are actually broadcast in the MBSFN areas. The bit rate required to transmit SVC video layer l of program p, $R(q_p, f_l, s_l)$, and $C(\sigma_l)$ are given in (2) and (3), respectively.
The solution of our optimization problem provides: (i) the optimal set of programs to be actually broadcast in each MBSFN area (represented by the values of the indicator function); (ii) the optimal array of SVC quantization levels to be used for broadcasting TV programs in the K MBSFN areas (i.e., $[q^1, \ldots, q^K]$); and (iii) the optimal radio resource allocation strategy, i.e., the MCS level $m_l^k$ and the time-frequency resource allocation $\sigma_l^k$ for broadcasting the video layers of the different TV programs in each MBSFN. Moreover, the following proposition holds.
Proposition 1. The objective function in (5) is a strictly convex function of the quantization levels $[q^1, \ldots, q^K]$.
Proof: A non-negative weighted sum of convex functions is convex [13]. Hence, to prove that our objective function is a strictly convex function of the quantization levels, it suffices to prove it for a generic $Q_i$. Using (1) and (4), $Q_i = \alpha\, Q_q(q)$, where $\alpha = Q_{max}\, Q_f(f)$. Considering the detailed expression provided in [9], the second derivative of $Q_i$ with respect to $q_p^k$ (i.e., the quantization level of the program requested by $u_i$ in the MBSFN(s) to which the UE belongs) is of the form $c\, e^{-q_p^k/q_{min}}$, with c > 0 constant. This is positive, which proves the assertion.
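In conclusion, the optimization problem in (5) is a constrained convex maximization problem. Unlike convex minimization, such a problem is not a convex program in general, since local maxima need not be global; it can be solved using the branch-and-bound algorithm [13].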
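For small instances, exhaustive enumeration of the quantization levels is a convenient reference against which a branch-and-bound implementation can be checked; branch and bound explores the same search tree but prunes subtrees using bounds on the objective. The sketch below is such an enumeration, not branch and bound. It assumes a single eNB, a fixed MCS per layer, per-layer (incremental) rate functions, and a hypothetical `objective` callback that returns the total service quality; the smallest feasible $\sigma$ of each layer follows from requiring the layer rate not to exceed (3).

```python
from itertools import product


def best_quantization(programs, q_levels, bandwidth_mhz, sigma_max, objective, tol=1e-9):
    """Exhaustively search per-program quantization levels at one eNB (sketch).

    programs[p] is a list of SVC layers, each a pair (rate_fn, efficiency):
    rate_fn(q) is the layer's bit rate in Mbit/s at quantization level q and
    efficiency is e_m (bit/s/Hz) of the MCS assigned to that layer.
    objective(qs) returns the total service quality for the level vector qs.
    Returns (value, qs, sigmas) for the best feasible vector, or None.
    """
    best = None
    for qs in product(q_levels, repeat=len(programs)):
        # Smallest subframe share carrying each layer: R <= sigma * B * e_m.
        sigmas = [[rate_fn(q) / (bandwidth_mhz * eff) for rate_fn, eff in layers]
                  for q, layers in zip(qs, programs)]
        if sum(sum(s) for s in sigmas) > sigma_max + tol:
            continue  # exceeds the eMBMS resource budget at this eNB
        value = objective(qs)
        if best is None or value > best[0]:
            best = (value, qs, sigmas)
    return best
```

The search grows as $|q\text{-levels}|^{|P_k|}$, which is why a pruning method is preferable beyond a handful of programs.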

V. PERFORMANCE EVALUATION
To assess the performance of our scheme, we considered test videos with different spatial and temporal variances. Snapshots of these video sequences, along with their spatial perceptual information (SI) and temporal perceptual information (TI) measures [14], are shown in Fig. 5. SI quantifies the complexity of the spatial details of a video, while TI indicates the amount of temporal change in a video sequence. As for the network scenario, Fig. 6(a) shows the sample LTE network under study. It includes 1000 users, uniformly distributed at random across 10 eNB cells (the user density is in accordance with the studies in [6] and [8]). Three types of devices (shown in Fig. 3) are present in the proportion 2:1:1, i.e., 50% of the UEs are assumed to have a lower spatial resolution (smaller display size, as in smartphones). This assumption follows recent studies [15] on the proportions of user devices in multimedia and data networks. Each user randomly selects one of the DTV programs (4 programs in this scenario). The LTE system simulation parameters are listed in Table II, along with other settings, in accordance with [1], [4], and [8]. For each user in an LTE MBSFN area with a given number of interfering cells, the SINR is computed according to [6]. The performance of the proposed multimedia broadcast system is obtained by averaging the results over 100 iterations with uniformly randomly distributed users. Scenarios with increasing numbers of users, clusters, and TV programs have also been studied.
Given the above scenario, our approach leads to the formation of 4 MBSFN areas, as shown in Fig. 6(a), with overlap sets {1, 2} and {3, 4}. To better understand the significance of multi-criteria K-means area formation, Fig. 6(b)-(d) show the total number of users, the number of users requesting each TV program, and the number of users of each type within the MBSFN areas. As evident from Fig. 6(c), each MBSFN area has to allocate resources for only two TV programs.
Next, we use the average QoE and the churn count as performance metrics to compare our solution against the DTV broadcast scheme proposed in [4] for LTE eMBMS, which we refer to as "Fair allocation". The churn count is the total number of users dropped by the system from the ongoing multimedia service, i.e., the users experiencing a QoE level lower than 'Fair' during DTV reception. Fig. 7 depicts the effect of clustering in our multimedia broadcast solution as the number of broadcast TV programs increases. Clustering improves performance: more users are served, and with a higher QoE. Even as the number of TV programs to be broadcast increases, clustering yields better performance than a system without user-based clustering.
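Both metrics can be computed from the per-user service qualities $Q_i$ of Definition 1. The sketch below assumes that "lower than 'Fair'" means $Q_i < 0.25$, following the threshold in Definition 1, and that the average QoE is taken over all users, including dropped ones; these are our readings, and any input values are hypothetical rather than the paper's results.

```python
FAIR_Q = 0.25  # Q >= 0.25 is the 'Fair' threshold used in Definition 1


def average_qoe(q_values):
    """Mean service quality Q_i over all users (dropped users count as 0)."""
    return sum(q_values) / len(q_values)


def churn_count(q_values, fair_q=FAIR_Q):
    """Number of users whose QoE falls below the 'Fair' level."""
    return sum(1 for q in q_values if q < fair_q)


def relative_change_pct(new, old):
    """Signed percentage change of new with respect to old."""
    return 100.0 * (new - old) / old
```

A gain such as "71% higher average QoE" corresponds to `relative_change_pct(ours, baseline) == 71`, and a churn reduction to a negative change of the churn count.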
Fig. 8 depicts the performance of our multimedia broadcast solution with respect to the Fair allocation scheme as the number of users increases. Compared to Fair allocation, our methodology results in a higher average QoE (71% higher on average) and a lower churn count (86.4% lower on average), i.e., more users are served. This gain is due to the user-centric adaptive multimedia encoding, the multi-criteria K-means MBSFN area formation, and the optimal eMBMS resource allocation in the LTE multimedia broadcast network.

VI. CONCLUSIONS
We introduced a novel and efficient multimedia broadcast scheme for cellular networks, which significantly improves the overall user QoE. Our approach accounts for the heterogeneous display capabilities and channel conditions of the users in the network, as well as for different multimedia service requests. The scheme leverages the multi-criteria K-means method to dynamically define MBSFN areas, and optimally determines multimedia content encoding and radio resource allocation so as to maximize the user QoE. Simulations have shown that our solution greatly outperforms a recently proposed technique, serving 86% more heterogeneous users at an appreciably higher QoE level.
Future work will further investigate the proposed scheme in larger and more complex scenarios, and it will address revenue-based dynamic radio resource allocation for multimedia broadcast services.
Constraints (a)–(d) of the optimization problem in Sec. IV-B:
a. at each eNB, the total rate required by broadcast transmissions cannot exceed the available capacity;
b. the fraction of time-frequency resources allocated to broadcasting by each eNB cannot exceed 0.6;
c. all MBSFNs that belong to the same overlap set o for program p must use the same quantization level $q_p^{(o)}$, allocate the same fraction of resources $\sigma_l^{(o)}$ to each layer l, and use the same MCS $m_l^{(o)}$;
d. the MCS selected for each video layer must belong to the set of allowed values.

Fig. 6. (a) Multi-criteria K-means area formation in the sample LTE network, resulting in 4 MBSFN areas. Number of users (b) in each MBSFN area, (c) requesting each TV program, and (d) of each type in each MBSFN area.

Fig. 8. (a) Average QoE and (b) Churn count (number of users dropped), as the total number of users increases.