A Survey on Parametric QoE Estimation for Popular Services

As we move towards the 5G era, we are witnessing a transformation in the way networks are designed and behave, with the end-user placed at the center of every decision. One of the most promising contributors in this direction is the shift from Quality of Service (QoS) to Quality of Experience (QoE) service provisioning paradigms. QoE, i.e., the degree of delight or annoyance of a service as perceived by the end-user, paves the way for flexible service management and personalized quality monitoring. This is enabled by exploiting parametric QoE assessment models, namely specific formula-based QoE estimation methods. In this paper, recognizing a gap in the literature between the lack of a proper manual on objective QoE estimation and the ever-increasing interest of network stakeholders in QoE intelligence, we provide a comprehensive guide to standardized and state-of-the-art quality assessment models. More specifically, we identify and describe parametric QoE formulas for the most popular service types (i.e., VoIP, online video, video streaming, web browsing, Skype, IPTV and file download services), indicating the key performance indicators (KPIs) and major configuration parameters (MCPs) per type. Throughout the paper, it is revealed that KPIs and MCPs vary widely per service type, and that, even for the same service, different factors contribute with a different weight to the perceived QoE. This finding can enable a more meaningful resource provisioning across different applications compared to QoE-agnostic schemes. Overall, this paper is a stand-alone, self-contained repository of QoE assessment models for the most common applications, serving as a handy tutorial for parties interested in delving further into QoE network management topics.


Introduction
Quality of Service (QoS) is currently considered inadequate for the thorough characterization of a provided service. The reason is that QoS handles purely technical aspects of a service, so it neither incorporates human-related quality-affecting factors nor reflects the actual user experience. This means that the same QoS level might not guarantee the same perceived quality level for two different users. In addition, apart from the system's technical characteristics, other factors, such as the context of use (e.g., surrounding environment, type of device used, etc.), human-specific characteristics (e.g., demographics or current state), the delivered content and the pricing of a service, have a significant impact on the experience finally perceived by the user; these aspects are not captured by QoS. Furthermore, the effect of QoS technical factors on the user's quality perception is, in general, governed by a non-linear relation. Hence, towards the next generation of mobile communications, new terms, such as the Quality of Experience (QoE), have been coined to better depict the end-users' perception of a provided service.
QoE represents the overall quality of a provided service, as it is perceived by the end-users, and as such it is a very appealing alternative for evaluating the quality of a provided service. Similarly to QoS, QoE may be incorporated in network mechanisms and specifically in network decision processes. "QoE-driven" or "QoE-aware" algorithms can help the network function in a more efficient and effective way. For instance, QoE may be both the criterion and trigger factor of Mobility Management (e.g., [1]) and Radio Resource Management (e.g., [2]) mechanisms, replacing presently used criteria, such as Signal-to-Interference plus Noise Ratio (SINR) measurements. QoE has been even identified as a driver in Software-Defined Networking (SDN) architectures, e.g., in [3], where by monitoring video QoE metrics at the client side, and by dynamically selecting delivery nodes and routing paths, the QoE of video streaming applications is improved.
QoE-awareness may also drive a more resource-efficient network operation, by helping recognize when the provision of extra resources would not improve the QoE that is eventually perceived by the end-user (e.g., [4]). In other words, "over-engineering" could be avoided through QoE monitoring and control. Additionally, QoE awareness may be exploited in many other aspects by network designers and network operators. For instance, network protocol designers may evaluate their implementations based on QoE metrics. Moreover, network problems such as bottlenecks or local failures, may be identified by insufficient QoE values, leading to QoE-improving actions (proactively or reactively). QoE can also drive virtual machine placement, which is an important challenge for modern data centers, aiming to maximize the resource usage but also the final impact on the user experience. Therefore, optimization-based algorithms for virtual machines placement like the promising approach of [5] may be enhanced with QoE objectives, by slightly modifying the already proposed utility functions. However, without the existence of proper estimation methods, such QoE-based solutions are impossible, driving stakeholders to still rely on QoS metrics, despite the recent boost in QoE literature.
Recognizing the importance of QoE awareness, and motivated by the lack of a proper "repository" for QoE estimation methods, we study some approaches that can be used to efficiently acquire the QoE level of a provided service. We focus on parametric QoE models, a specific category of QoE estimators, which are appealing for live estimation of QoE based on monitored network performance indicators, namely during actual network operation. More specifically, we provide a thorough description of the dominant QoE estimation models in the literature for the most important service types and applications. Our main intention is to provide a detailed reference map towards efficiently and objectively acquiring the users' perception of a provided service. To this end, various numerical results for different parametric QoE estimation models are presented on a per service type basis. Therefore, the contribution of this paper is three-fold: a) the identification and description of QoE models and metrics that can be used for real-time QoE monitoring and QoE management inside an operating network, b) the quantification of the contribution and impact of key quality influence factors on the achieved QoE per service type, and c) the proposal of potential research and exploitation directions based on this collected knowledge.
The structure of this paper starts with a taxonomy of QoE estimation methods, dividing them into two main categories: namely subjective and objective. Emphasis is given on objective QoE estimation where various evaluation methods are discussed. The parametric category of objective QoE estimation is identified as the most appealing one for online network control, and therefore, the paper then focuses on this specific category only. Our study is done per-service, since QoE is highly service dependent. First, standardized QoE methods are described (referring to VoIP and real-time video services), followed by the non-standardized literature-based methods (referring to the popular services of FTP, IPTV, Skype and video streaming). Finally, a concise summary of this study is provided together with a discussion on possible research and exploitation directions, before the final conclusions.
Based on the previous, the remainder of this paper is organized as follows. In section 2, we classify the QoE estimation approaches, highlighting the importance of parametric QoE estimation. In section 3, we present and evaluate standardized QoE estimation models, while, in section 4, we describe literature-based dominant parametric QoE estimation models for different types of services. In section 5, we summarize guidelines and study directions towards designing and efficiently exploiting parametric QoE estimation models. Our conclusions are included in section 6.

QoE estimation taxonomy
Current literature on the topic of QoE modeling mainly offers classifications of existing standards and focuses on one specific service at a time. For instance, [6] studies speech quality estimation and provides a detailed taxonomy of standardized objective speech quality prediction models. Similarly, [7] conducts a thorough survey of QoE assessment approaches for VoIP services, while [8] focuses on IPTV. However, these considered service types are just a subset of the plethora of services available in current networks. With the availability of 4G and with 5G on the horizon, which allows the coexistence of multiple parallel resource-hungry requests, applications like video streaming and video on demand constitute the prevalent traffic over a network.
What is more, survey papers on QoE estimation usually focus on models that require the originally transmitted signal, or part of it, to deduce the QoE at the receiver side, and therefore do not target real-time network management applications (e.g., [9]). Parametric QoE models, on the contrary, are appropriate for this type of scenario; however, a handy collection of these models for different types of services is currently missing from the literature.
There are various different approaches for quantifying the QoE level of a provided service. A primary classification of the available approaches is based on whether the QoE is evaluated directly by humans or automatically through technical factors. In the first case, specific assessment processes are used, referred to as subjective tests, while in the second case mathematical formulas or algorithms are exploited, referred to as objective models. The main classification of QoE models is presented graphically in Figure 1, and is further discussed in the next subsections.

Subjective QoE estimation
Subjective tests are usually based on controlled real life experiments with human participants who directly evaluate their experience of an application or service. These users may be involved in the experiment in a passive way (just viewing/listening) or in an active/interactive way (participating in a conversation) and they judge the quality regarding some stimulus' presentation. For instance, the participants may be called to evaluate the listening or conversational quality of a phone service, the quality of a video, etc. These tests need to be thoroughly designed in advance and the user group needs to be properly selected based on guidelines and recommendations by standardization bodies. Perhaps the most important recommendation towards that direction is the ITU-T P.800 [10]. Various techniques may be used for subjective evaluation. For instance, users may score the quality using an absolute rating scale or they may compare sequential images/videos/sounds stating which one is better. The results are based on user opinions, past experiences, expectations, user perception, judgement and description capabilities, etc. and primarily quantify the effectiveness, efficiency and overall satisfaction of using a service.
These kinds of subjective tests are considered as the most reliable ones, since they incorporate any conscious and unconscious aspects of human quality evaluation, aspects that can otherwise not be captured. Only perceptual quality tests can validly and reliably express the internal state of the human factor. Nevertheless, such subjective techniques are considered reliable, if and only if they are designed carefully and users are unbiased and objective.
One drawback of the above method is that the results of such experiments are valuable only for the laboratory testing of some service, and not for real-time QoE support. One way to overcome this issue is to conduct "real-service" QoE evaluation, where users rate their experience on the run (in-service) or after a service has ended (post-service). Such an example is the "OneClick" paradigm, which may be used for real-time QoE monitoring and feedback, and consequently for QoE control. This framework only requires a subject to click a dedicated key whenever he/she feels dissatisfied with the quality of the application in use [11]. Furthermore, an example of post-service test is that of Skype, where users rate their experience once a session is terminated, using the Mean Opinion Score (MOS) scale.
Subjective experiments in controlled laboratory environments need thorough design that strictly follows guidelines provided by standardization bodies. These guidelines describe all aspects such as room conditions (e.g., isolated room, without any noise), the audio headset or generally the dedicated equipment used for hearing/viewing/talking, test methodologies, guidelines for the selection of the panel, etc. Regarding the latter, there are guidelines on the number of participants, their age, their background (experts or non-experts), their past involvement in similar experiments, the randomness in their selection, etc. However, lately there is also a trend to evaluate the quality of an application in a more relaxed way, i.e., in one's own familiar environment, using one's own equipment and so forth. In this kind of experiment, a service is evaluated using "streaming" or "download" approaches. These methods are considered more realistic and are open to a much broader public compared to laboratory experiments, thus allowing for better management. Indeed, a large number of participants may reveal very reliable and realistic QoE scores. Approaches that follow this paradigm are called "Crowdsourcing" techniques [12], because they outsource the task of quality evaluation to arbitrary anonymous online users. Examples are the Microworkers platform and Amazon Mechanical Turk, where an Internet user may conduct QoE experiments designed by other parties (such as researchers), who require a general public for an evaluation task.
Finally, an important issue in subjective test methodologies is the discrimination between "instantaneous" and "overall" quality evaluations. The former method implies a continuous evaluation of the perceived quality by the end-user during one experiment (see ITU-T P.880), whereas the latter simply requires that the user gives one cumulative score for his/her own experience at the end of each experiment. The first method gives a better insight to the system designers, since they can correlate the instantaneous quality with momentary technical parameters in the network; however the latter better describes the overall user experience.

Objective QoE estimation
Subjective tests are costly, time-consuming and not reproducible on demand. Moreover, they are usually not real-time and hence cannot be used for in-service quality monitoring. These constraints have raised the need for objective models that try to measure or predict the quality perceived by end-users, without their intervention. The objective models may be classified using various criteria [13], [14]:
• Reference signal utilization: Depending on whether the source signal, or part of it, is required in the QoE estimation process, we distinguish the Full Reference (FR), also called reference-based or double-ended, models, the Reduced Reference (RR) models and the No Reference (NR), or single-ended, models, where "reference" refers to the original signal. [9] conducts a survey on the evolution of video quality assessment methods using this classification.
• Evaluation method: Depending on the kind of input information that is used for QoE measurement, we distinguish the Media-layer (signal-based), Packet-layer / Bitstream, and Parametric models. Media-layer models make use of signals and may be Full-, Reduced- or No-Reference; Packet-layer models extract information from packet headers, while bitstream models may use both packet headers and payload data. Parametric (or parametric planning) models use network planning parameters and measure the values of specific network metrics. Packet-layer and bitstream models are also referred to in the literature as "protocol-information-based" models, because they base their estimations on parameters collected at run time from network processes and control protocols. Various surveys in the literature review media-layer models (e.g., [15] thoroughly discusses media-layer models for video quality assessment), while others focus on packet-layer/bitstream models (e.g., [16]). Finally, [17] conducts a study of the correlation models mapping QoS to QoE for multimedia services, providing in this way generic formulas that parametric models usually follow.
• Model mode: Depending on whether a test signal is injected into the system under test or not, we have Intrusive (active) and Non-intrusive (passive) modes, respectively.
• Model timeframe: Offline models refer to pre-service or post-service evaluation, while online models require in-service evaluation.
• Usage purpose: The usage purpose of a QoE estimation model may vary from network planning to real-time service monitoring, optimization, benchmarking, etc. QoE models should be used only within their intended scope.
Parametric QoE models are basically derived by conducting subjective experiments (lab or crowdsourcing) and then by performing statistical analysis (e.g., regression analysis) on the acquired evaluation results. The derived objective models may be then well-described by providing formulas for the direct computation of QoE based on specific input parameters. On the contrary, signal-based models are based on one-to-one comparison between the original source signal and the degraded destination signal, by exploiting knowledge from the area of Psychophysics.
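The derivation step described above (subjective scores followed by regression) can be sketched in a few lines. This is only an illustration under stated assumptions: the exponential form MOS = 1 + alpha * exp(-beta * x), the function name `fit_exponential_qoe` and the sample points are all hypothetical, and the sample scores are synthetic values generated from the assumed form, not measurements from any experiment.

```python
import math


def fit_exponential_qoe(xs, mos_values):
    """Fit MOS = 1 + alpha * exp(-beta * x) by least squares.

    The model is linearised as ln(MOS - 1) = ln(alpha) - beta * x,
    so ordinary linear regression on (x, ln(MOS - 1)) is enough.
    Requires every MOS value to be strictly greater than 1.
    """
    ys = [math.log(m - 1.0) for m in mos_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (alpha, beta)


# Synthetic, noise-free "scores" generated from the assumed model itself
# (alpha = 3.5, beta = 0.25); x plays the role of a packet loss percentage.
loss_percent = [0.0, 1.0, 2.0, 4.0, 8.0]
scores = [1.0 + 3.5 * math.exp(-0.25 * x) for x in loss_percent]

alpha, beta = fit_exponential_qoe(loss_percent, scores)
print(f"alpha={alpha:.3f}, beta={beta:.3f}")
```

With real subjective data the fit would not be exact, and the choice of functional form would itself need to be validated against the collected scores.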
Also worth mentioning is a third category of QoE modeling, which lies between the subjective and objective ones. It operates in a hybrid fashion, namely it works as an automatic and objective quality estimator, relying however on previously available subjective scores. These hybrid methods are based on Machine Learning tools, and they use subjective test scores as input to train a QoE model. This model then maps network parameters (e.g., codec used, packet loss rate, mean loss burst size, packetization interval, one-way delay, jitter, etc.) to Mean Opinion Score (MOS) values and it can be further used for real-time quality prediction. Characteristic examples of this approach are the Pseudo-Subjective Quality Assessment (PSQA) method [18] and MLQoE, a modular algorithm for user-centric QoE prediction [19].
What is more, some research works propose methodologies for the construction of objective QoE models. For instance, [20] describes basic principles for building a QoE model, which is based on the egress of parameterized mappings among three layers: the transport-layer, the service-layer and the end-user layer (bottom up). Similarly, in [21], the authors build a QoE estimation function based on a general regression model and prove its applicability to web-browsing and file upload/download scenarios.
At the moment, most objective models account for the human factor in terms of their inherent characteristics, but the context and content of the service are considered only to a limited extent. Given this observation, more work is needed on designing more accurate objective estimation models. More specifically, additional effort should be devoted to designing new parametric QoE estimation models, since they are currently the most appealing candidates for quantifying QoE levels in an indirect and user-transparent way in mobile networks [22]. Taking this into account, the dominant parametric QoE estimation models are studied in this paper.
Having described the taxonomy of QoE methods, we now focus on objective QoE estimation and specifically on parametric QoE evaluation methods. Our goal is to study parametric QoE estimation approaches for popular services, since this category is particularly useful for real-time network management and control. We, therefore, discuss the key models for file download (FTP), web browsing, voice over IP (VoIP), online/real-time video (e.g., IPTV or video-calls), video streaming services (e.g., YouTube), and Skype, explaining in detail the factors and the parameters that each model requires. Our structure is as follows: we first study standardized QoE approaches and then literature-based ones. The former study handles VoIP and online/real-time video QoE estimation, while the latter refers to the remaining popular services.

Standardized parametric QoE estimation
In this section, we present the two standardized parametric models that are available in the literature, namely G.107 for VoIP and G.1070 for online video. Our main goal is to study the impact of various factors on the user's QoE; however, for completeness, we also present the most substantial parts of these standards, the full versions of which may be found in the ITU-T portal. In this way, this document can serve as a stand-alone guide to QoE estimation models.

Parametric QoE estimation for VoIP services
In VoIP applications, similarly to conventional telephony, the QoE is expressed in terms of how clearly the user can hear and understand the interlocutor's speech, and how easy the communication is given potential arrival delays of speech packets. Because of this, the models for this service are divided into two categories: listening-only and conversational. Subjective assessment methods in VoIP services are based on four testing axes [23]: Comprehensibility, Multi-dimensional, Listening Quality and Conversational Quality tests. The MOS is the most extensively used measurement scale for observations of this kind. Concerning parametric objective methods, ITU-T Rec. G.107, a.k.a. the "E-model" [24], [25], is the most reliable and representative approach.

The basic rating factor
The E-model provides a formula for computing the transmission quality of voice communications by estimating the mouth-to-ear conversational quality as perceived by the user at the receive side, both as listener and talker (Figure 2). It is a parametric model that takes into account a variety of transmission impairments, producing the so-called Transmission Rating factor ($R$ factor). The conversational quality is estimated by means of this rating factor $R$, which scales from 0 (worst) to 100 (best):

$$R = R_o - I_s - I_d - I_{e\text{-}eff} + A$$

where:
• $R_o$ represents in principle the basic signal-to-noise ratio, including noise sources such as circuit noise and room noise.
• $I_s$ is a combination of all impairments which occur more or less simultaneously with the voice signal.
• $I_d$ represents the impairments caused by delay.
• $I_{e\text{-}eff}$ represents impairments caused by low bit-rate codecs (effective equipment impairment factor). It also includes impairments due to randomly distributed packet losses.
• The advantage factor $A$ allows for compensation of impairment factors when the user benefits from other advantages of access (e.g., mobility).

Figure 2. Reference connection for the E-model [24]

Worth mentioning at this point is the methodology proposed in [26] and presented in Figure 3. According to this, the E-model is reduced to transport-level parameters only, which can be easily measured within the network. The main idea is to combine transport-level measurements, such as delay and packet loss, with architecture-specific parameters, such as the de-jitter buffer at the receiver side, to get an estimation of the effective equipment impairment factor $I_{e\text{-}eff}$ presented above. This methodology can therefore be directly used for VoIP quality measurement and monitoring. The detailed calculation of each of the factors used in G.107 is provided in the following subsections, while summarized default values can be found in Appendix A.
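As a minimal sketch of how the rating factor combines its five terms, the function below simply evaluates R = Ro - Is - Id - Ie-eff + A. The function name and the numbers in the usage line are hypothetical and chosen only to make the arithmetic easy to follow; they are not values taken from the Recommendation.

```python
def transmission_rating(ro, i_s, i_d, ie_eff, advantage=0.0):
    """Combine the E-model terms into the transmission rating factor R.

    ro        -- basic signal-to-noise ratio term
    i_s       -- simultaneous impairment factor
    i_d       -- delay impairment factor
    ie_eff    -- effective equipment impairment factor
    advantage -- advantage factor A (0 for conventional wired access)
    """
    return ro - i_s - i_d - ie_eff + advantage


# Hypothetical inputs, for illustration only.
r_example = transmission_rating(ro=90.0, i_s=1.0, i_d=2.0, ie_eff=10.0, advantage=5.0)
print(r_example)
```

Keeping the five terms as separate arguments mirrors the structure of the model: each term can be replaced by a measured or a default value without touching the others.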

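Basic signal-to-noise ratio, $R_o$
$R_o$ represents in principle the basic signal-to-noise ratio, including noise sources such as circuit noise and room noise, and it is calculated as follows:

$$R_o = 15 - 1.5\,(SLR + N_o)$$

where $SLR$ (Send Loudness Rating) represents the loudness rating at the sender and $N_o$ the power addition of different noise sources. More specifically, $N_o$ is as follows:

$$N_o = 10 \log\left[10^{N_c/10} + 10^{N_{os}/10} + 10^{N_{or}/10} + 10^{N_{fo}/10}\right]$$

where $N_c$ is the sum of all circuit noise powers, all referred to the 0 dBr point, and $N_{fo}$ is the "noise floor" at the receive side.
$N_{os}$ is the equivalent circuit noise at the 0 dBr point, caused by the room noise $P_s$ at the send side:

$$N_{os} = P_s - SLR - D_s - 100 + 0.004\,(P_s - OLR - D_s - 14)^2$$

where $D_s$ is the D-value of the telephone at the send side (normally $D_s = 3$) and the overall loudness rating ($OLR$) is the sum of the loudness ratings at the sender and the receiver (denoted by $SLR$ and $RLR$, respectively), i.e.:

$$OLR = SLR + RLR$$

Similarly, $N_{or}$ is the equivalent circuit noise at the 0 dBr point for the room noise at the receive side:

$$N_{or} = RLR - 121 + P_{re} + 0.008\,(P_{re} - 35)^2$$

where $P_{re}$ is the effective room noise at the receive side, i.e., the room noise $P_r$ enhanced by the listener's sidetone path.

Simultaneous impairment factor, $I_s$
This factor includes all the potential degradations that can arise together with the voice transmission. Three main impairments are considered:

$$I_s = I_{olr} + I_{st} + I_q$$

where $I_{olr}$ represents the decrease in quality caused by too-low values of $OLR$, $I_{st}$ represents the impairment caused by non-optimum sidetone, and $I_q$ represents the impairment caused by quantizing distortion. More specifically, the formula used for the calculation of $I_{olr}$ is:

$$I_{olr} = 20\left[\left\{1 + \left(\frac{X_{olr}}{8}\right)^{8}\right\}^{1/8} - \frac{X_{olr}}{8}\right]$$

where:

$$X_{olr} = OLR + 0.2\,(64 + N_o - RLR)$$

For the factor $I_q$ the formula is the following:

$$I_q = 15 \log\left[1 + 10^{Y} + 10^{Z}\right]$$

$$Y = \frac{R_o - 100}{15} + \frac{46}{8.4} - \frac{G}{9}, \qquad Z = \frac{46}{30} - \frac{G}{40}$$

$$G = 1.07 + 0.258\,Q + 0.0602\,Q^2, \qquad Q = 37 - 15 \log(qdu)$$

where $qdu$ refers to the number of quantizing distortion units from the sender to the receiver.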
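The basic signal-to-noise ratio term can be evaluated directly from the loudness ratings and noise levels. The sketch below is one possible transcription of the G.107 expressions for this term. Several things in it are assumptions rather than content of this section: the function name, the default argument values (intended to correspond to commonly used G.107 defaults), and the two auxiliary relations for the effective receive-side room noise and the receive-side noise floor, which are written as recalled from the Recommendation and should be checked against it before use.

```python
import math


def basic_snr(slr=8.0, rlr=2.0, ps=35.0, pr=35.0, ds=3.0,
              lstr=18.0, nc=-70.0, nfor=-64.0):
    """Sketch of the basic signal-to-noise ratio term Ro of the E-model.

    All default values are assumptions (intended G.107 defaults).
    """
    olr = slr + rlr  # overall loudness rating
    # Equivalent circuit noise caused by room noise at the send side.
    nos = ps - slr - ds - 100.0 + 0.004 * (ps - olr - ds - 14.0) ** 2
    # Assumed relation: room noise enhanced by the listener sidetone path.
    pre = pr + 10.0 * math.log10(1.0 + 10.0 ** ((10.0 - lstr) / 10.0))
    # Equivalent circuit noise caused by room noise at the receive side.
    nor = rlr - 121.0 + pre + 0.008 * (pre - 35.0) ** 2
    # Assumed relation: noise floor referred to the receive side.
    nfo = nfor + rlr
    # Power addition of the four noise sources.
    no = 10.0 * math.log10(sum(10.0 ** (n / 10.0) for n in (nc, nos, nor, nfo)))
    return 15.0 - 1.5 * (slr + no)


ro_default = basic_snr()
print(round(ro_default, 2))
```

With the assumed defaults the result is close to 94.8. Raising the send-side or receive-side room noise lowers the returned value, which is the behaviour examined later in the numerical results.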

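Delay impairment factor, $I_d$
The impairment factor $I_d$ represents all impairments due to the delay of voice signals and it is further divided into three factors as follows:

$$I_d = I_{dte} + I_{dle} + I_{dd}$$

More specifically, the factor $I_{dte}$ represents the impairment due to echo at the talker:

$$I_{dte} = \left[\frac{R_{oe} - R_e}{2} + \sqrt{\frac{(R_{oe} - R_e)^2}{4} + 100} - 1\right]\left(1 - e^{-T}\right)$$

$$R_{oe} = -1.5\,(N_o - RLR), \qquad R_e = 80 + 2.5\,(TERV - 14)$$

$$TERV = TELR - 40 \log\frac{1 + \frac{T}{10}}{1 + \frac{T}{150}} + 6\,e^{-0.3\,T^2}$$

where $T$ is the mean one-way delay of the echo path and $TELR$ is the talker echo loudness rating. Note that when $STMR < 9$ dB, the term $TERV$ is replaced by the term $TERV_s$:

$$TERV_s = TERV + \frac{I_{st}}{2}$$

while when $STMR > 20$ dB, the term $I_{dte}$ is replaced by the term:

$$I_{dtes} = \sqrt{I_{dte}^2 + I_{st}^2}$$

The factor $I_{dle}$ represents impairments due to listener echo, calculated by:

$$I_{dle} = \frac{R_o - R_{le}}{2} + \sqrt{\frac{(R_o - R_{le})^2}{4} + 169}$$

where:

$$R_{le} = 10.5\,(WEPL + 7)\,(T_r + 1)^{-0.25}$$

with $WEPL$ the weighted echo path loss and $T_r$ the round-trip delay in the 4-wire loop. Finally, the factor $I_{dd}$ represents the impairment caused by too-long absolute delay $T_a$, which occurs even with perfect echo cancelling:

$$I_{dd} = \begin{cases} 0, & T_a \le mT \\ 25\left\{\left(1 + X^{6\,sT}\right)^{\frac{1}{6\,sT}} - 3\left(1 + \left[\frac{X}{3}\right]^{6\,sT}\right)^{\frac{1}{6\,sT}} + 2\right\}, & T_a > mT \end{cases} \qquad X = \frac{\log\left(T_a / mT\right)}{\log 2}$$

The parameters $sT$ and $mT$ reflect different use cases and user groups. Two aspects are addressed by these settings:
• The interactivity of the conversation and the sensitivity of the users to the delay effect.
• The application scenario, that is, whether a given call is being made in a business context or in an everyday situation. Even if users may not notice the delay, it may be very critical for the efficiency or even effectiveness of a given call, for example in a business context.
The lowest-sensitivity setting is applicable only in cases where it is known that users have very low sensitivity to delay, e.g., in primarily non-interactive cases, such as mainly listening to a conversation or a lecture.
As a consequence, when it is uncertain what user group or what application scenario is being addressed with the planned service, it is recommended that the default class is used. Any use of non-default values should be explicitly mentioned when reporting results. Based on these considerations, Table 1 depicts the settings that are recommended for parameters $sT$ and $mT$ for different use cases.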
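To make the role of the delay-sensitivity settings concrete, the following sketch evaluates the absolute-delay impairment for a given one-way delay. It assumes the piecewise form used in recent editions of G.107, with a minimum perceivable delay `m_t` (in ms) and a sensitivity exponent `s_t`; the function name and the particular class values in the usage lines are assumptions for illustration and are not copied from Table 1.

```python
import math


def absolute_delay_impairment(ta_ms, s_t=1.0, m_t=100.0):
    """Sketch of the absolute-delay impairment Idd of the E-model.

    ta_ms -- absolute one-way delay Ta in milliseconds
    s_t   -- delay sensitivity (assumed default 1.0)
    m_t   -- minimum perceivable delay in ms (assumed default 100.0)
    """
    if ta_ms <= m_t:
        return 0.0  # delays up to m_t cause no impairment
    x = math.log(ta_ms / m_t) / math.log(2.0)
    p = 6.0 * s_t
    return 25.0 * ((1.0 + x ** p) ** (1.0 / p)
                   - 3.0 * (1.0 + (x / 3.0) ** p) ** (1.0 / p)
                   + 2.0)


# Assumed example settings: a default class and a low-sensitivity class.
for delay in (100.0, 200.0, 400.0):
    default_class = absolute_delay_impairment(delay)
    tolerant_class = absolute_delay_impairment(delay, s_t=0.4, m_t=150.0)
    print(delay, round(default_class, 2), round(tolerant_class, 2))
```

Under these assumed settings the tolerant class accumulates impairment later and more slowly than the default class, which is the qualitative behaviour the two parameters are meant to capture.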

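Equipment impairment factor, $I_{e\text{-}eff}$
The $I_{e\text{-}eff}$ factor is based on metrics closely related to packet losses. More specifically:

$$I_{e\text{-}eff} = I_e + (95 - I_e)\,\frac{P_{pl}}{\dfrac{P_{pl}}{BurstR} + B_{pl}}$$

where:
• $I_e$ is the codec-related factor and its potential values have been defined based on subjective results. Recommended values can be seen in Table 2.
• $B_{pl}$ is the packet-loss robustness factor,
• $P_{pl}$ is the packet-loss probability (in percent), and
• $BurstR$ is the burst ratio, which is defined as:

$$BurstR = \frac{\text{average length of observed bursts in an arrival sequence}}{\text{average length of bursts expected for the network under random loss}}$$

When packet loss is random (i.e., independent) $BurstR = 1$, and when packet loss is bursty (i.e., dependent) $BurstR > 1$.
For example, for packet loss distributions corresponding to a 2-state Markov model with transition probabilities $p$ between a "found" and a "loss" state, and $q$ between the "loss" and the "found" state, the burst ratio can be calculated as:

$$BurstR = \frac{1}{p + q} = \frac{P_{pl}/100}{p} = \frac{1 - P_{pl}/100}{q}$$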
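The effective equipment impairment and the Markov-based burst ratio lend themselves to a short numerical sketch. The function names are hypothetical, the packet-loss probability is assumed to be expressed in percent, and the transition probabilities in the usage lines are illustrative values only. The G.711 pair (codec factor 0, robustness factor 4.3) is the one quoted in the numerical results of this section.

```python
def burst_ratio(p, q):
    """Burst ratio for a 2-state Markov loss model.

    p -- transition probability from the "found" to the "loss" state
    q -- transition probability from the "loss" to the "found" state
    """
    return 1.0 / (p + q)


def effective_equipment_impairment(ie, bpl, ppl, burst_r=1.0):
    """Effective equipment impairment factor.

    ie      -- codec-related equipment impairment factor
    bpl     -- packet-loss robustness factor
    ppl     -- packet-loss probability in percent (assumed unit)
    burst_r -- burst ratio (1.0 means random, independent loss)
    """
    return ie + (95.0 - ie) * ppl / (ppl / burst_r + bpl)


# Independent loss: p + q = 1, so the burst ratio is 1.
print(burst_ratio(0.02, 0.98))
# Bursty loss (illustrative probabilities): burst ratio above 1.
print(burst_ratio(0.02, 0.48))
# G.711-like codec at 2% loss, random versus bursty.
print(effective_equipment_impairment(0.0, 4.3, 2.0))
print(effective_equipment_impairment(0.0, 4.3, 2.0, burst_r=2.0))
```

For the same average loss, a burst ratio above 1 shrinks the denominator and therefore increases the impairment, which matches the intuition that clustered losses are more harmful than isolated ones.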

Advantage factor, $A$
The use of factor $A$ and its selected value in a specific application is up to the planner's decision. However, the values in Table 3 should be considered as absolute upper limits for factor $A$.

Communication system                                                           A (max value)
Conventional wired communication                                               0
Mobility by cellular networks in a building                                    5
Mobility in a geographical area or moving in a vehicle                         10
Access to hard-to-reach locations, e.g., via multi-hop satellite connections   20

Quality metrics derived by the basic rating factor
The transmission rating factor $R$ ranges from 0 to 100, where $R = 0$ represents an extremely bad quality and $R = 100$ represents a very high quality. Even if the $R$ factor can be directly used as an assessment value, in most cases it is preferable to transform it to MOS values to retrieve results comparable with those provided by subjective methods. The transformation formula is as follows:

$$MOS = \begin{cases} 1, & R < 0 \\ 1 + 0.035\,R + R\,(R - 60)\,(100 - R)\cdot 7\times 10^{-6}, & 0 \le R \le 100 \\ 4.5, & R > 100 \end{cases}$$
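The mapping from the rating factor to the MOS scale is a simple piecewise polynomial, sketched below as it appears in G.107. The function name is hypothetical, and the value 93.2 used in the usage line is assumed here to be the rating obtained when all E-model inputs are left at their defaults.

```python
def r_to_mos(r):
    """Map the transmission rating factor R to a MOS estimate."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


for rating in (0.0, 50.0, 70.0, 93.2, 100.0):
    print(rating, round(r_to_mos(rating), 2))
```

The cubic term makes the mapping S-shaped: changes in R around the middle of the scale move the MOS more than equal changes near either end.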

Numerical results
Below we quantify the impact of the key parameters discussed in the previous section on the MOS for VoIP services. First of all, Figure 4 depicts the MOS level for an increasing value of the echo path delay. As expected, an increase in the echo path delay leads to a QoE downgrade. However, the rate of reduction varies depending on the type of user and the delay value. When the user's profile refers to high delay sensitivity (e.g., user class 1), an increase of the delay causes a steeper decline in QoE, compared to a delay-tolerant user (e.g., user class 3). In Figure 4, we can also observe that for delays below 150 ms, all three user categories perceive the same, almost constant quality, whatever their requirements. This suggests that 150 ms is the maximum acceptable delay for VoIP services, namely the delay value up to which no rapid change in the perceived QoE of speech is observed. Once the delay exceeds this threshold, an exponential degradation is observed, the slope of which depends on the parameter $sT$, i.e., the term that expresses the user's sensitivity to delays.
Figure 5 studies the relationship between QoE and packet loss probability. It is shown that a higher MOS can be offered by a coding system with smaller equipment impairments and greater robustness (resistance) to packet loss. Specifically, the relationship between QoE and packet loss rate is exponential, depending on the $I_e$ and $B_{pl}$ values. As can be observed, the smaller the value of the parameter $I_e$, the higher the starting point of the curve (higher MOS). Additionally, the term $B_{pl}$ affects the slope of the curves, with steeper curves for small values of $B_{pl}$. A typical example is the case of G.711 in Figure 5. For this case, the small deterioration due to equipment impairments ($I_e = 0$) leads to a high QoE for a small percentage of lost packets ($P_{pl}$); however, a smaller packet-loss robustness factor ($B_{pl} = 4.3$) leads to a steep slope, i.e., to a sharp QoE degradation as the percentage of lost packets ($P_{pl}$) increases.
In Figure 6, we can observe that as the intensity of noise at the receiver increases, the perceived speech quality decreases. This is a reasonable result, but a quite interesting observation is that the second parameter varied in the figure only slightly affects the way that QoE is related to the receive-side room noise $P_r$. More specifically, for $P_r < 45$ dB(A), and for all evaluated values of that parameter, no effect on the QoE level is observed. This quantifies the noise range that does not influence the QoE. However, for higher noise values the QoE level reduces exponentially.
Finally, Figure 7 depicts the impact of the environment at the sender side on the QoE level at the receiver. As can be observed, as the send-side noise increases the QoE decreases exponentially. The shape of this exponential decrease depends on the value of the second parameter varied in the figure. Practically, since possible errors and faults in the device of the sender travel throughout the network, small noise degradations at the sender device escalate on their way to the receiver, causing a notable impact on QoE, as depicted by the shift of the curves in Figure 7.
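The packet-loss trend discussed for Figure 5 can be reproduced approximately with a few lines of code. The sketch below is not the procedure used to generate the figures: it assumes a baseline rating of 93.2 for a loss-free connection with all other E-model inputs at their defaults, random (non-bursty) loss, and a second, purely hypothetical codec with a higher equipment impairment but greater loss robustness. Only the G.711 pair (0 and 4.3) comes from the text above.

```python
def effective_equipment_impairment(ie, bpl, ppl, burst_r=1.0):
    """Effective equipment impairment for a loss probability ppl in percent."""
    return ie + (95.0 - ie) * ppl / (ppl / burst_r + bpl)


def r_to_mos(r):
    """Map the transmission rating factor R to a MOS estimate."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


BASELINE_R = 93.2  # assumed rating with no codec or loss impairment


def mos_under_loss(ie, bpl, ppl):
    """MOS when only the equipment impairment term varies."""
    return r_to_mos(BASELINE_R - effective_equipment_impairment(ie, bpl, ppl))


LOSS_PERCENT = [0.0, 1.0, 2.0, 5.0, 10.0]
g711 = [mos_under_loss(0.0, 4.3, p) for p in LOSS_PERCENT]
# Hypothetical codec: worse at zero loss, more robust to loss.
robust = [mos_under_loss(11.0, 19.0, p) for p in LOSS_PERCENT]

for p, a, b in zip(LOSS_PERCENT, g711, robust):
    print(f"loss={p:4.1f}%  G.711={a:.2f}  robust codec={b:.2f}")
```

Under these assumptions the two curves cross: the G.711-like codec starts higher but degrades faster, which is the interplay between the codec factor and the robustness factor described above.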

Parametric QoE estimation for online video
In this section we focus on the online video service and we examine ITU-T Rec. G.1070 [27] for QoE estimation. ITU-T Rec. G.1070 describes a parametric model applicable to online multimedia services over IP, such as a video conference. The model consists of three functions, namely the video quality estimation, the speech quality estimation and the multimedia quality integration functions.
The degradation caused by pure delay is considered only in the multimedia quality integration function. The model provides three output quality metrics in MOS scale, namely the multimedia quality ($MM_q$), the video quality influenced by speech quality ($V_q(S_q)$), and the speech quality influenced by video quality ($S_q(V_q)$). Note that various implementations can be found for a coding technology (e.g., MPEG-4 codecs) due to variations in coding-parameter settings and decoder characteristics. Therefore, the coefficients of the video and speech quality estimation functions in this model were determined by referring to tables prepared in advance for each video and speech codec.
The framework and methodology of G.1070 are presented in Figure 8, where the key influence parameters for each of the three aforementioned dimensions (multimedia, video, speech) are presented. By mapping video, speech and common assumptions into specific coefficients, the impact of, e.g., terminal type, monitor characteristics, environmental noise and conversational task on the multimedia quality is quantified.

Narrowband speech
For the estimation of narrowband speech quality, an index similar to the R factor used for VoIP communications is defined. This index is denoted by $Q$ and is calculated as follows:

Wideband speech
For wideband speech, the index is refined as follows: Similarly to the speech quality for narrowband speech, the speech quality for wideband speech is given by the following formula: where:

Video quality estimation function
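The function that provides the video quality ($V_q$) is:

$$V_q = 1 + I_{coding} \exp\left(-\frac{Ppl_V}{D_{PplV}}\right)$$

where $Ppl_V$ is the video packet loss rate and $I_{coding}$ is the basic video quality affected by the coding distortion:

$$I_{coding} = I_{Ofr} \exp\left(-\frac{\left(\ln Fr_V - \ln O_{fr}\right)^2}{2 D_{FrV}^2}\right)$$

Here $Fr_V$ is the video frame rate, $Br_V$ is the video bit rate, and $O_{fr}$ is the optimal frame rate that maximizes the video quality at each bit rate:

$$O_{fr} = v_1 + v_2 Br_V, \quad 1 \le O_{fr} \le 30$$

Note that when $Fr_V = O_{fr}$, then $I_{coding} = I_{Ofr}$, and $I_{Ofr}$ represents the maximum video quality at each bit rate:

$$I_{Ofr} = v_3 - \frac{v_3}{1 + \left(Br_V / v_4\right)^{v_5}}, \quad 0 \le I_{Ofr} \le 4, \quad v_3, v_4, v_5: \text{constants}$$

Additionally, $D_{FrV}$ represents the degree of video quality robustness due to frame rate and is given by:

$$D_{FrV} = v_6 + v_7 Br_V$$

Finally, parameter $D_{PplV}$ is as follows:

$$D_{PplV} = v_{10} + v_{11} \exp\left(-\frac{Fr_V}{v_8}\right) + v_{12} \exp\left(-\frac{Br_V}{v_9}\right)$$

For the case studies defined in Table 4, we depict in Table 5 the values of the $v_i$, $i = 1, 2, \ldots, 12$ coefficients.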
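As a rough illustration of how the video quality estimation function behaves, the following Python sketch evaluates it for a sweep of frame rates. It assumes the G.1070 functional form (an optimal frame rate that grows linearly with the bit rate, a Gaussian-shaped penalty in the logarithm of the frame rate, and an exponential penalty for packet loss). The function name, the units (fps, kbit/s, loss in percent) and the coefficient set `V_DEMO` are hypothetical placeholders for illustration and are not the Table 5 values.

```python
import math


def clamp(x, lo, hi):
    """Restrict x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def video_quality(fr_v, br_v, ppl_v, v):
    """Vq for frame rate fr_v (fps), bit rate br_v (kbit/s), loss ppl_v (%).

    v maps the index i = 1..12 to the coefficient v_i.
    """
    o_fr = clamp(v[1] + v[2] * br_v, 1.0, 30.0)        # optimal frame rate
    i_ofr = clamp(v[3] - v[3] / (1.0 + (br_v / v[4]) ** v[5]), 0.0, 4.0)
    d_frv = v[6] + v[7] * br_v                         # robustness to frame rate
    d_pplv = (v[10] + v[11] * math.exp(-fr_v / v[8])
              + v[12] * math.exp(-br_v / v[9]))        # robustness to packet loss
    i_coding = i_ofr * math.exp(
        -(math.log(fr_v) - math.log(o_fr)) ** 2 / (2.0 * d_frv ** 2))
    return 1.0 + i_coding * math.exp(-ppl_v / d_pplv)


# Placeholder coefficients for illustration only (NOT the Table 5 values).
V_DEMO = {1: 1.4, 2: 0.022, 3: 3.8, 4: 180.0, 5: 1.2, 6: 1.4,
          7: 0.0004, 8: 2.1, 9: 470.0, 10: 2.7, 11: 15.0, 12: 4.2}

if __name__ == "__main__":
    for fr in (2, 5, 7, 15, 30):
        print(fr, round(video_quality(fr, 256.0, 0.0, V_DEMO), 2))
```

With no packet loss, the sketch peaks where the frame rate equals the optimal frame rate for the chosen bit rate and falls off on both sides, which is the shape one would expect for a frame-rate sweep at a fixed bit rate.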

Multimedia quality integration function
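The multimedia quality, $MM_q$, is estimated as follows:

$$MM_q = m_1 MM_{SV} + m_2 MM_T + m_3 MM_{SV} MM_T + m_4$$

The first term in the formula above ($MM_{SV}$) is:

$$MM_{SV} = m_5 S_q + m_6 V_q + m_7 S_q V_q + m_8$$

while the term $MM_T$ is:

$$MM_T = \max\{AD + MS, 1\}$$

where:

$$AD = m_9 (T_s + T_v) + m_{10}$$

$$MS = \begin{cases} \min\{m_{11}(T_s - T_v) + m_{12}, 0\}, & T_s \ge T_v \\ \min\{m_{13}(T_v - T_s) + m_{14}, 0\}, & T_s < T_v \end{cases}$$

with $T_s$ and $T_v$ denoting the speech and video delay, respectively. Some values for the $m_i$, $i = 1, 2, \ldots, 14$ coefficients are given in Table 6.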
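The integration step can be sketched in Python as follows. The sketch assumes the G.1070 structure in which an audiovisual term (built from the speech and video quality) is combined with a delay term (built from the absolute delay and the speech/video synchronization offset). The function name, the millisecond unit, the final clamping to the 1 to 5 range and the coefficient set `M_DEMO` are our own assumptions, not the Table 6 values.

```python
def multimedia_quality(s_q, v_q, t_s, t_v, m):
    """MMq from speech/video MOS (s_q, v_q) and delays t_s, t_v in ms.

    m maps the index i = 1..14 to the coefficient m_i.
    """
    mm_sv = m[5] * s_q + m[6] * v_q + m[7] * s_q * v_q + m[8]
    ad = m[9] * (t_s + t_v) + m[10]            # absolute delay term
    if t_s >= t_v:                             # media synchronization term
        ms = min(m[11] * (t_s - t_v) + m[12], 0.0)
    else:
        ms = min(m[13] * (t_v - t_s) + m[14], 0.0)
    mm_t = max(ad + ms, 1.0)
    mm_q = m[1] * mm_sv + m[2] * mm_t + m[3] * mm_sv * mm_t + m[4]
    return max(1.0, min(5.0, mm_q))            # keep on the MOS scale (assumption)


# Placeholder coefficients for illustration only (NOT the Table 6 values).
M_DEMO = {1: 0.0, 2: 0.0, 3: 0.16, 4: 1.0, 5: 0.0, 6: 0.0, 7: 0.17,
          8: 0.7, 9: -0.001, 10: 5.0, 11: -0.005, 12: 0.0,
          13: -0.005, 14: 0.0}

if __name__ == "__main__":
    # Sweep the video delay while the speech delay stays at 200 ms.
    for t_v in (0, 100, 200, 400, 600):
        print(t_v, round(multimedia_quality(4.0, 4.0, 200.0, t_v, M_DEMO), 2))
```

With these placeholder coefficients the synchronization penalty outweighs the absolute-delay term, so the sweep rises linearly until the two delays match and falls linearly afterwards.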

Numerical results
Below, we study the influence of key quality parameters on the MOS level of online video services. First of all, we evaluate the impact of video delay on QoE and present the results in Figure 9. The curves in this figure show a "split-linear" behavior for the QoE level as the video delay increases. As can be observed, a high QoE level for a multimedia service requires, among other conditions, similar delay values for the speech and video signals. This explains why, in each curve of Figure 9, the highest point is observed when the speech and video delays are equal (i.e., full synchronization of speech and video).

According to Figure 10, the QoE decreases exponentially as the percentage of lost packets increases. However, the rate of the QoE degradation, i.e., the slope of the curves in this figure, depends on the characteristics of the video and of the display. Specifically, Figure 10 shows that with MPEG-4 encoding, QQVGA resolution (i.e., 160x120 or 120x160 pixels) and a 2.1-inch display, packet losses affect the end user's QoE less. In other words, in a network with high packet loss rates, multimedia services displayed on small (mobile) screens, such as those of smartphones, offer users a better experience than on large computer monitors.

Figure 10: MOS for increasing packet loss probability

Figure 11 depicts the relationship between the QoE level and the video frame rate. As can be observed, for an increasing frame rate the QoE increases up to a certain point (maximum) and then decreases exponentially. This behavior is reasonable considering that the frame rate ($Fr_V$) indicates the number of frames per second entering the network, while the bit rate, denoted by $Br_V$, represents the serving rate of the network: at a fixed bit rate, raising the frame rate beyond the optimum leaves fewer bits for each frame.
Having studied standardized parametric QoE estimation for VoIP and real-time video, next we move on to literature-based estimation methods for other popular services (namely FTP, web browsing, IPTV, video streaming, Skype). Since standardization efforts are still ongoing for these common services, we present and analyze well-cited models from research literature.

Literature-based parametric QoE estimation
In this section, we describe non-standardized parametric QoE models which can be used for a reliable estimation of QoE for various types of services, namely file download (FTP), web browsing, lossy video streaming (IPTV), lossless video streaming (conventional and adaptive YouTube) and Skype applications.

Parametric QoE estimation for FTP services
The main characteristic of FTP services is that there is no need for continuous, in-sequence packet arrival. Taking into account that the delay expected by the end user is proportional to the size of the downloaded file, and that the FTP service is not adapted at the application layer, the data rate is the dominant factor that affects the QoE level. More specifically, the model that provides the MOS for an FTP service is as follows [28]:

$$MOS_{FTP} = a_1 \log_{10}\left(a_2 \cdot \hat{R}\right)$$

where $\hat{R}$ represents the data rate of the correctly received data, i.e., $\hat{R} = R \cdot (1 - e)$, where $R$ is the data rate and $e$ the error ratio. The values of the $a_1$ and $a_2$ coefficients are obtained from the upper ($R^+$) and lower ($R^-$) rate expectations for the service. For instance, for $R^- = 8$ and $R^+ = 315$ it holds that $a_1 = 2.5037$ and $a_2 = 0.3136$, while the estimated MOS values are depicted in Figure 12.
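The FTP model lends itself to a short numerical check. The sketch below assumes a logarithmic form, i.e., the first coefficient times the base-10 logarithm of the second coefficient times the rate of correctly received data. With the coefficient values quoted above, this form yields a MOS of about 1 at the lower rate expectation (8) and about 5 at the upper one (315). The function name and the clamping to the 1 to 5 range are our own assumptions.

```python
import math

A1 = 2.5037  # coefficient values quoted in the text
A2 = 0.3136


def ftp_mos(rate, error_ratio=0.0, a1=A1, a2=A2):
    """MOS for a file download at the given data rate and error ratio."""
    goodput = rate * (1.0 - error_ratio)    # rate of correctly received data
    if goodput <= 0.0:
        return 1.0
    mos = a1 * math.log10(a2 * goodput)
    return max(1.0, min(5.0, mos))          # clamp to the MOS scale (assumption)


if __name__ == "__main__":
    for rate in (8, 50, 100, 200, 315):
        print(rate, round(ftp_mos(rate), 2))
```

Because the dependence is logarithmic, doubling the rate adds a constant MOS increment, so most of the perceived gain is obtained at low rates.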

Parametric QoE estimation for web browsing
The main observation for web browsing services is that delay is the key QoE performance indicator. A long waiting time for a web page to respond makes users lose patience and negatively affects their perception of the provided service. Taking this into account, the model described in [29] is a good candidate for QoE estimation. This model is based on subjective validation tests, and the resulting empirical formula expresses the MOS as a function of the response time. In Figure 13 the estimated MOS for different response times is depicted.

Parametric QoE estimation for video streaming
Due to their increasing popularity, video services generate the majority of traffic over the Internet, while they are characterized by high resource requirements. Therefore, the estimation of QoE for video streaming applications is of notable importance. In this section, we study two different types of video streaming: a) IPTV, which is a lossy type of service, and b) Video on Demand, which is a lossless service type. For the latter, we focus on the example of YouTube, and study it in two versions, namely non-adaptive (or monolithic) and adaptive streaming over HTTP.

IPTV
IPTV is a common video streaming service. It is UDP-based and is therefore prone to packet losses. In [30], [31] a MOS prediction formula is proposed for three video content types, named "Slight movement (SM)", "Gentle walking (GW)" and "Rapid movement (RM)". This formula considers the objective parameters Send Bitrate ($SBR$), Frame Rate ($FR$) and Packet Error Rate ($PER$):

$$MOS = \frac{a_1 + a_2 FR + a_3 \ln(SBR)}{1 + a_4 PER + a_5 PER^2}$$

where the coefficients $a_1, a_2, a_3, a_4, a_5$ are obtained by linear regression of the proposed model with the training set of video sequences. More specifically, the values depicted in Table 7 have been experimentally calculated.

In Figure 14 we consider the values included in Table 7
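To make the roles of the three IPTV parameters concrete, the following Python sketch evaluates a rational model in which the send bitrate and frame rate set the loss-free quality and the packet error rate scales it down. This functional form is our assumption about the model in [30], [31]. The coefficient tuple `A_DEMO`, the units (kbit/s, fps, error rate as a fraction) and the clamping to the 1 to 5 range are hypothetical placeholders, not the Table 7 values.

```python
import math


def iptv_mos(sbr, fr, per, a):
    """MOS from send bitrate (kbit/s), frame rate (fps) and packet error rate.

    a is the tuple (a1, a2, a3, a4, a5); per is a fraction between 0 and 1.
    """
    a1, a2, a3, a4, a5 = a
    loss_free = a1 + a2 * fr + a3 * math.log(sbr)    # quality without losses
    mos = loss_free / (1.0 + a4 * per + a5 * per ** 2)
    return max(1.0, min(5.0, mos))                   # clamp to MOS scale (assumption)


# Placeholder coefficients for illustration only (NOT the Table 7 values).
A_DEMO = (4.5, -0.005, 0.05, 2.0, 7.0)

if __name__ == "__main__":
    for per in (0.0, 0.05, 0.1, 0.2):
        print(per, round(iptv_mos(80.0, 10.0, per, A_DEMO), 2))
```

In practice, one coefficient tuple would be kept per content type (SM, GW, RM), since the regression is performed separately for each.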

YouTube -conventional streaming
Another type of video content delivery that deserves attention is the streaming of pre-encoded video, known as Video on Demand (VoD). YouTube is the most popular example in this category. YouTube is not subject to packet losses, since the connection is TCP-based. The following description of YouTube QoE analysis is based mostly on [32], whose authors provide an extensive literature review on the topic. The most popular models of YouTube QoE are built on subjective experiments, conducted either in controlled laboratory environments or through crowdsourcing tests and field studies. These well-designed experiments reveal the system-level key influence factors that affect the YouTube video delivery quality, which are:
• Number of stalling events,
• Duration of stalling events,
• Total video duration (what matters is the total stalling duration relative to the whole video duration),
• Initial delay (video start-up delay).
Regarding these influence parameters, some important findings from these subjective experiments are:
• The number of stalling events together with the stalling length (i.e., the stalling pattern) clearly dominates the user-perceived quality.
• Initial delays have almost no influence on MOS for videos of duration 60 s and 30 s, namely they are tolerated up to a reasonable level.
• User ratings are statistically independent of video characteristics such as resolution, video motion, content type, encoding scheme and video bit rate. The stalling pattern is what really influences the end user's experience.
In Figure 15, the MOS is depicted for various numbers of stalling events and stalling durations. As can be observed from this figure, the number of stalling events is the dominant factor that affects MOS. Also, up to a certain stalling duration the MOS degrades quickly, while beyond that point it degrades more slowly. This is a reasonable result, since once the MOS reaches a very low value (about 1.5 in Figure 15, e.g., for 2 and 3 stalling events this is reached when the stalling duration is 2 seconds), a longer stalling duration hardly matters to the end users.
QoE metrics tailored to the YouTube application have also been defined. These metrics can be used instead of the MOS scale to get an indication of the quality of the user's viewing experience. For instance, the reception ratio is calculated as defined in [32]. Although the reception ratio cannot be directly related to QoE, it is a good indicator of whether there are problems in the network: if it is greater than 1, the video has good quality, otherwise poor.
Moreover, the rate λ gives a good indication of YouTube video delivery quality according to [33], and its value should ideally be zero or close to zero. Following a similar logic, [34] proposes a metric for which a value greater than 1 indicates a stalling-free video, while a value below 1 implies a non-seamless video session.

YouTube -adaptive streaming
In adaptive streaming scenarios, a video file is broken into multiple segments, while each segment is available at different quality levels. These levels may differ in video bit rate (bps) or in the video resolution, etc. Then, each user independently requests the next segment in a specific quality level, based on the user's current perception of available bandwidth for this session.
For the case of HTTP Adaptive Streaming (HAS), [35] proposes a simple but highly accurate QoE model, that is:

$$MOS = 0.003 \, e^{0.064 \, t} + 2.498$$

where $t$ is the time on the highest layer, expressed as a percentage of the total viewing time.
Based on this formula, the QoE of HAS applications depends mainly on the fraction of time that the highest layer is being played out over the total viewing time. Moreover, as can be seen in Figure 16, the MOS is bounded by the quality that can be achieved by the highest and lowest layers (4.3 and 2.498, respectively).
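The bounds quoted above can be checked with a few lines of Python. The sketch assumes that the model is exponential in the time on the highest layer, with that time expressed as a percentage between 0 and 100; this reading reproduces the upper bound of about 4.3 at 100%. The function name is our own.

```python
import math


def has_mos(time_on_highest_layer_pct):
    """MOS for HTTP adaptive streaming as a function of time on highest layer (%)."""
    t = max(0.0, min(100.0, time_on_highest_layer_pct))  # keep within 0..100
    return 0.003 * math.exp(0.064 * t) + 2.498


if __name__ == "__main__":
    for t in (0, 25, 50, 75, 100):
        print(f"{t:3d}% on highest layer -> MOS = {has_mos(t):.2f}")
```

Because the dependence is exponential, under this reading the MOS stays close to the lower bound until the highest layer is played for most of the session.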
Another important influence factor of HAS QoE based on [35] is the "adaptation amplitude", which refers to the gap between two subsequent quality levels. If the highest and then the lowest quality levels are selected in sequence (or vice versa), the amplitude will be high and the QoE impression will be low; if, however, such intense switches are avoided, the amplitude will be lower. The higher the amplitude, the worse the end user's perception of the overall quality.
Moreover, some additional quality influence factors, although with lower impact, are the frequency of switches (i.e., adaptation events) and their direction. Last but not least, it has been shown that the buffer length of the user's application and the size of the segments encoded at the server side play an important role in the QoE [36]. For the case of HAS, the "activity factor" metric proposed in [34] additionally applies. If this metric is close to 1, the client was "struggling" to download each segment on time; if it is much lower than 1, the client had sufficient available bandwidth and could even afford higher video resolutions, if those were available. Note that gaps in the video download occur because the client does not buffer the full content at once, but only aims to maintain an acceptable buffer threshold.

Parametric QoE estimation for Skype
For Skype applications, a practical QoE estimation approach can be found in [37]. The proposed model has been derived from measurements conducted on Skype video calls. It has been found that three resolutions are available, namely 160x120, 320x240 and 640x480. Moreover, the maximum frame rate is 35 fps. The MOS level for this service type is then expressed in terms of the Image Quality, which ranges from 0 (worst) to 1 (best), and the Frame Rate.
In Figure 17, we evaluate the relationship between Skype QoE and the image quality. Moreover, we vary the frame rate to study its impact. As expected, the MOS degrades linearly as the image quality is reduced, and the frame rate also has a significant influence on the perceived QoE.
Based on this model, the authors in [37] also propose an adaptation mechanism of the Skype application to poor network conditions. Assuming a maximum acceptable threshold of the packet delay, if this threshold is reached, the Skype application starts to gradually degrade first the frame rate, then the image quality and finally, if required, the resolution. This adapting behavior helps sustain a viable and meaningful communication between two Skype applications, compromising on the quality though.
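The ordering of this adaptation mechanism (frame rate first, then image quality, then resolution) can be sketched as a small state update in Python. Only the three resolutions, the 35 fps maximum and the degradation order come from the text. The step sizes, the lower bounds and all names below are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass

RESOLUTIONS = [(640, 480), (320, 240), (160, 120)]  # from the text, best first
MAX_FRAME_RATE = 35                                  # fps, from the text
MIN_FRAME_RATE = 5         # hypothetical floor
FRAME_RATE_STEP = 5        # hypothetical step
MIN_IMAGE_QUALITY = 0.2    # hypothetical floor on the 0..1 scale
IMAGE_QUALITY_STEP = 0.2   # hypothetical step


@dataclass
class VideoSettings:
    frame_rate: int = MAX_FRAME_RATE
    image_quality: float = 1.0
    resolution_index: int = 0


def degrade(s, packet_delay_ms, delay_threshold_ms):
    """Return the settings after at most one degradation step."""
    if packet_delay_ms < delay_threshold_ms:
        return s                                      # threshold not reached
    if s.frame_rate > MIN_FRAME_RATE:                 # 1) lower the frame rate
        new_rate = max(MIN_FRAME_RATE, s.frame_rate - FRAME_RATE_STEP)
        return VideoSettings(new_rate, s.image_quality, s.resolution_index)
    if s.image_quality > MIN_IMAGE_QUALITY + 1e-9:    # 2) lower the image quality
        new_quality = round(
            max(MIN_IMAGE_QUALITY, s.image_quality - IMAGE_QUALITY_STEP), 2)
        return VideoSettings(s.frame_rate, new_quality, s.resolution_index)
    if s.resolution_index < len(RESOLUTIONS) - 1:     # 3) lower the resolution
        return VideoSettings(s.frame_rate, s.image_quality,
                             s.resolution_index + 1)
    return s                                          # nothing left to degrade


if __name__ == "__main__":
    settings = VideoSettings()
    for _ in range(13):
        settings = degrade(settings, packet_delay_ms=500, delay_threshold_ms=400)
        print(settings, RESOLUTIONS[settings.resolution_index])
```

Each call applies one step, so a monitoring loop could invoke it once per measurement interval while the delay stays at or above the threshold.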

Summarized results and exploitation directions
The parametric QoE estimation models described above define a major set of formulas that can be exploited by academia and industry to understand how the end users perceive the quality of a provided service. Summarizing the study in the previous sections, in Table 8 we indicate the key parametric QoE estimation models available in the literature and list the major configuration parameters (MCPs) and key performance indicators (KPIs) that affect the QoE performance per service type.

Packet loss ratio for audio and video packets, relative delay between video and audio packets, data rate, frame rate, monitor size

From an academic and research perspective, a clear collection of parametric QoE models allows an easy and straightforward "translation" of QoS research works into QoE vocabulary. To be more specific, a potential research direction that can be aided by our work is the direct quantification of the impact of existing research works on QoE. This may be possible either by applying appropriate QoE estimation models (column 2 of Table 8) or by quantifying a potential improvement in specific MCPs and KPIs per service (column 3 of Table 8). Furthermore, the collection of KPIs helps identify the specific influence factors that play the most important role in the end user's perceived quality, thereby guiding future works towards devising network and application mechanisms that target exactly those factors.
Regarding the impact of explicit QoE parametric models on the industry sector, this is twofold. On the one hand, they can help operators design their networks in a QoE-meaningful rather than a QoS-meaningful way. That is, operators are guided to design and maintain their networks in such a way that the requirements regarding the KPIs per service are met (e.g., through QoE-meaningful resource provisioning, or a proper positioning of network servers and gateways, e.g., close to the end user). On the other hand, attention has to be given to the per-service KPIs during the network management process, i.e., during the network's real-time operation. Mechanisms such as scheduling, mobility management and power control can be tuned so that proper weight is given to the actual QoE impact factors per service. In this way, an indirect QoE improvement will be achieved through the targeted enhancement of carefully selected QoS parameters. What is more, if we consider the recently emerged paradigm of "User Provided Networking" (UPN), like the one proposed in [38], a massive potential is unlocked. According to this paradigm, end users are not only actively involved in service evaluation tasks, by providing feedback about the experienced quality either passively (e.g., device capabilities, response times, context of use) or actively (e.g., MOS feedback), but can also participate in the service provisioning loop by becoming "micro-providers", given the proper incentives.
Another important, even though less obvious, capability exposed by the collection of the different KPIs per service is the opportunity to achieve a more meaningful cross-service resource provisioning towards a) higher QoE and b) higher resource utilization. All services currently compete for the same resources on an equal basis; nevertheless, it would make more sense to allocate the limited resources in a service-dependent rather than a service-oblivious (i.e., blindly fair) way. This may be possible a) by performing the scheduling process on a per-flow basis (e.g., prioritizing a more delay-critical service over another) or b) by optimizing the sum QoE in a cell through appropriate cross-service management decisions. Regarding the latter, a potential enabler is to exploit the adapting behavior of Skype or HAS applications and provoke a deliberate quality degradation in specific Skype or HAS flows, so that resources are moved to other applications whose QoE/KPI is currently at a critical level. The goal would be to keep all users' QoE above a critical threshold or to achieve the maximum possible summed QoE. Note that average QoE values per cell should not be considered appropriate indicators of quality, though (e.g., a cell-level MOS of 3 may be the result of user1's MOS=1 and user2's MOS=5); on the contrary, each flow should be treated independently or, at least, standard deviations should be considered as well (see [39]).
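The caveat about cell-level averages is easy to demonstrate numerically. The following Python sketch summarizes the per-user MOS values of a cell and reports the spread and the number of users below a critical level alongside the mean. The function name and the critical threshold of 2.5 are hypothetical and chosen only for illustration.

```python
import statistics


def summarize_cell(mos_values, critical_mos=2.5):
    """Summarize per-user MOS in a cell; critical_mos is a hypothetical threshold."""
    return {
        "mean": statistics.mean(mos_values),
        "stdev": statistics.pstdev(mos_values),   # population standard deviation
        "min": min(mos_values),
        "below_critical": sum(1 for m in mos_values if m < critical_mos),
    }


if __name__ == "__main__":
    print(summarize_cell([1, 5]))   # same mean as below, very different experience
    print(summarize_cell([3, 3]))
```

Both cells have a mean MOS of 3, but only the first has a non-zero standard deviation and a user below the critical level, which is exactly the situation that an average alone would hide.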

Conclusions
As we move closer to future network generations, the human factor is becoming the epicenter of attention and the driving force for network design. Thus, the comprehension and, by extension, the control of the QoE provisioned to the end users has become a necessity for network operators. Parametric QoE estimation models are a prerequisite for this purpose. They constitute the ideal tools for live network quality monitoring and, hence, QoE management. Nevertheless, despite the increased interest from academia and industry in pushing towards a QoE service provisioning model, a clear and comprehensive manual on the available parametric models and the critical QoE performance parameters per service type is currently missing. Identifying this gap, this paper aspires to become a thorough and handy "manual" that identifies and describes appropriate parametric models for popular services, such as YouTube, Skype and IPTV, and that also describes and studies standardized ones. Therefore, this paper may serve as a standalone, useful tutorial both for researchers and operators who are interested in moving from the purely technical QoS domain to a more meaningful QoE domain, so that they can really understand and influence the impact of their network decisions on the final recipient, the end user.