A Survey on Encrypted Network Traffic Analysis Applications, Techniques, and Countermeasures

The adoption of network traffic encryption is continually growing. Popular applications use encryption protocols to secure communications and protect the privacy of users. In addition, a large portion of malware is spread through the network traffic taking advantage of encryption protocols to hide its presence and activity. Entering into the era of completely encrypted communications over the Internet, we must rapidly start reviewing the state-of-the-art in the wide domain of network traffic analysis and inspection, to conclude if traditional traffic processing systems will be able to seamlessly adapt to the upcoming full adoption of network encryption. In this survey, we examine the literature that deals with network traffic analysis and inspection after the ascent of encryption in communication channels. We notice that the research community has already started proposing solutions on how to perform inspection even when the network traffic is encrypted and we demonstrate and review these works. In addition, we present the techniques and methods that these works use and their limitations. Finally, we examine the countermeasures that have been proposed in the literature in order to circumvent traffic analysis techniques that aim to harm user privacy.

the remaining sections. In Section 7, we discuss the countermeasures that have been developed against encrypted network traffic analysis, and finally in Section 8, we discuss the lessons that we acquire from this survey. In the following paragraphs of this section, we present the related surveys that exist in the literature.

Related Surveys
In this section, we present the related works that can be found in the literature. More specifically, we research the most recent and popular surveys that exist in the domain of network traffic classification with a focus on works that target current trends in the area, such as encrypted and mobile networks. In 2015, Velan et al. [156] surveyed methods for encrypted traffic analysis and classified the literature using a multi-level taxonomy proposed in [81]. The taxonomy distinguishes the different traffic classification methods based on (i) the input (e.g., traffic payload or traffic properties), (ii) the technique (e.g., payload inspection, statistical or machine learning), (iii) the output type and (i,v) the dataset used. The survey of Velan et al. covers the works that aim for application protocol and type detection in encrypted streams. Our survey is more extensive; application and protocol identification in encrypted traffic is just a subset of the works that we survey. In 2018, Conti et al. [48] published a study for network traffic analysis specifically for mobile devices. They categorized the surveyed works based on the goal of the analysis (e.g., application identification, user fingerprinting), the point of traffic capturing (e.g., mobile devices, Wi-Fi access points), and the targeted mobile platform (e.g., Android). Their study includes works that inspect either unencrypted or encrypted traffic, but specifically on mobile devices. In this survey, we do not focus only on mobile devices. In 2019, Wang et al. [157] published a survey of deep learning techniques for traffic classification in mobile and encrypted networks. Similarly, Aceto et al. [28,29] presented a review for mobile encrypted traffic classification works that use deep learning techniques. Kwon et al. [86] published a survey for deep learning-based network anomaly detection, which corresponds to a subcategory of the works that we survey. In 2019, Mohammed et al. [101] surveyed the  [156] ------Conti et al. [48] mobile Wang et al. [157] -----mobile Kwon et al. [86] ------Mohammed et al. [101] -----SDN Sultana et al. [142] -----SDN Zhang et al. [173] --mobile Tahaei et al. [143] ---IoT Salman et al. [127] ----Aceto et al. [28,29] ----mobile This survey counter/res We compare the surveys based on their coverage in (i) applications and use cases and (ii) techniques and methods.
techniques for traffic classification and prediction using machine and deep learning in Software Defined Networks (SDNs). Also in 2019, Sultana et al. [142] surveyed the SDN-based network intrusion detection systems that use machine learning algorithms. In 2020, Tahaei et al. [143] presented a survey regarding traffic classification in Internet of Things (IoT) networks. In 2020, Salman et al. [127] published a review on machine learning techniques for internet traffic classification, based on the different objectives for traffic classification, such as protocol classification, application classification, action classification, category classification, and device classification. Finally, Zhang et al. [173] presented how deep learning is used in wireless and mobile networking in an extensive survey. In Table 1, we present a comparison of this survey against other related surveys. In this survey, we present works in a wide range of encrypted network traffic classification and analysis applications, without targeting only one domain like related surveys (e.g., network security or analytics). In addition, we present the techniques and methods that have been proposed in the literature. Thus, we do not focus on surveying works that focus only on machine learning, for instance. Finally, along with the different encrypted network traffic analysis applications and techniques, we also present the countermeasures against traffic analysis that have been proposed in the literature.

NETWORK TRAFFIC PROCESSING AND INSPECTION
In this section, we present typical network traffic processing and inspection methods and techniques that have been widely used by the literature through the years. This section will help the reader comprehend how network traffic processing is achieved and identify the challenges that network encryption has introduced.
Anomaly detection is a technique applicable in a variety of domains, such as network intrusion detection, aiming to identify unusual or suspicious activity and behavior. An anomaly detectionbased approach estimates a model of "normal network activity, " and future activity on the network is evaluated with respect to its probability under the learnt model. This way, it is possible to distinguish a typical and potentially malicious network activity from benign. This classification is based on certain heuristics or rules and aims to detect misuses or abnormal behavior. Specifically, for an anomaly-based intrusion detection system to positively identify suspicious traffic, the system must be taught to recognize normal system activity [44]. For example, a number of volume-based detection techniques exist. These techniques monitor the traffic load of a network, to identify anomalies that trigger significant traffic volume changes (e.g., flooding attacks). Feature-based anomaly detection aims to fill the gap of volume-based systems by examining a range of network traffic features [83]. The two phases of the majority of anomaly detection systems consist of the training phase (where a profile of normal behavior is constructed) and testing phase (current traffic is compared with the model created in the training phase). Anomalies are detected mostly using machine or deep learning techniques. However, anomaly detection often suffers from (i) high false-positive rates, (ii) the difficulty to obtain reliable training data, and (iii) the longevity of training data and behavioral dynamics of the system.
Shone et al. [135] conclude that the dynamic nature of modern networks, such as the ever increasing traffic speed, volume, and diversity, introduces an inevitable deficiency in creating a holistic and widely established anomaly detection solution. Indeed, the number of different protocols and the diversity of data in todays' connections introduce increased difficulty and complexity levels when attempting to differentiate between normal and abnormal behavior, increasing the difficulty in establishing an accurate norm and widens the scope for potential exploitation or zeroday attacks. Although anomaly-based detection techniques can be very effective in the domain of network security, signature-based techniques offer the ability to process and inspect a wider area of applications.
Signature-based network inspection is another technique for traffic classification and characterization and is mostly performed using DPI techniques. DPI is the core component for many systems that are part of the network, such as traffic monitors, classifiers, packet filters, network intrusion detection, and prevention systems. Network components use DPI in different layers of the OSI model. Unlike the early years of packet inspection, when DPI was applied in packet headers only (e.g., proxies and firewalls), nowadays, protocol complexity and obfuscation has led us to inspect the packet content in all encapsulated layers. In fact, governments, Internet and Communications Service Providers (ISPs/CSPs), among other organizations, heavily rely on DPI systems for accurate traffic monitoring and characterization. The use of DPI can result in better quality of service (QoS) by identifying different styles of content and streaming them accordingly. Similarly, by examining the contents of the incoming packets, suspicious and malicious traffic can be identified and examined. In the domain of network security, signature-based network inspection techniques can be very effective when similar attacks are known to security software developers a priori.
As already mentioned, DPI is the core procedure of typical applications in the domain of network security, analytics, and others. The requirements for complex traffic inspection, as well as the constant increase in network speeds, have motivated numerous works for constantly proposing novel DPI approaches. Network intrusion detection systems have become very powerful tools in the hands of network administrators and security experts over past decades, assisting in the detection and prevention of a wide range of attacks. Popular network intrusion detection system (NIDS) solutions like Snort [22] and Suricata [21] utilize pattern matching and regular expressions in order to analyze network traffic while Zeek/Bro [23] utilizes scripts that allow easier automation. The research community has also put effort into improving the performance of NIDS using either commodity hardware, such as graphics processing units (GPUs) [69,70,76,116,138,[152][153][154][155] and parallel nodes [117,149], or specialized hardware, such as ternary content-addressable memories (TCAMs), application-specific integrated circuits (ASICs), and field-programmable gate array (FPGAs) [47,99,141,169]. However, these works are able to process network traffic that is unencrypted, since they extract meaningful information from network packet payload content.
Traffic classification and network analytics is an important QoS component at central network traffic ingress points, as well as at end-host computers. In order to identify the incoming traffic accurately, a fine-grained analysis should be performed, inspecting both packet header and payload. The majority of traffic classification approaches are flow-based techniques that aim to characterize flows instead of packets, according to the application that creates them. Many works have proposed a number of methods to identify the application associated with a traffic flow [79,103]. Wang et al. [158] iOS application classification 2015 Alan et al. [31] Android application classification 2016 Taylor et al. [146] Android application classification 2016 Lopez et al. [94] IoT traffic classification 2017 Taylor et al. [147] Android application classification 2018 Aceto et al. [26] Mobile application classification 2018 Aceto et al. [27] Mobile application classification 2019 Aiolli et al. [30] Identification of user activities on smartphone-based Bitcoin wallet apps 2019 Yao et al. [168] IoT traffic classification 2019 Ede et al. [151] Mobile application classification 2020 Coull et al. [54] Application Usage Identification Identification of user actions in iMessage 2014 Conti et al. [49,50] Identification of user actions on Android devices A simple approach is to examine TCP port numbers, since well-known applications have specific port numbers (e.g., HTTP uses port 80, SSH uses port 22). Yet, aiming to hide from firewalls and network security tools, many applications and tools use dynamic port negotiation; of course, this makes port-based traffic classification weak. Other approaches propose the deep packet inspection against packet payload contents. Of course, this technique expects unencrypted packet payloads; however, assuming unencrypted packet payloads is not a realistic scenario in the present. The research community has investigated other ways to extract meaningful, side-channel information (packet-or network-level) that is able to be expressed into signatures. We discuss the methods of these works in the following sections.

ANALYTICS AFTER NETWORK ENCRYPTION
In this section, we discuss the literature for network analytics in encrypted communications. More specifically, we present works that focus on protocol and application classification (Section 3.1), application usage identification (Section 3.2), as well as the investigation of quality and experience in encrypted networks (Section 3.3). The works that are presented in this section are summarized in Table 2.

Protocol/Application Identification
There is a large number of works that examine the feasibility of traffic classification even when the communications are encrypted. The majority of such works focus on classifying the traffic's nature (e.g., video streaming, p2p traffic) and automating the classification procedure. BLINC [80] was one of the pioneer papers that aimed to classify the network traffic in the dark, having no access to packet payloads, no knowledge of port numbers; only information that flow collectors provide, solely based on the host behavioral patterns. Bernaille et al. [36] propose a method to detect different applications even in encrypted communication channels, by observing the first packets of an SSL connection and their sizes. This enables them to recognize the application soon enough, achieving an 85% accuracy and early classification.
Wang et al. [158], taking advantage of the fact that mobile applications produce more identifiable traffic patterns, perform a packet-level analysis to determine what applications a single individual is using, exploiting side-channel information (e.g., traffic bursts) that is exposed inside the network traffic that is generated by mobile devices. To perform mobile app classification, the authors use random forests for 13 selected iOS applications achieving a classification accuracy of more than 68.59% with selected features and more than 87.23% with complete features.
Alan et al. [31] investigate whether Android applications can be identified through their network traffic launch-time using only the contents of TCP/IP headers. The experiments were conducted using 1,595 applications on four distinct Android devices. The authors made use of supervised learning methods to identify the apps that generated the traffic. Their approach is based on packet sizes, observed within the launch time traffic, since they are expected to yield good feature sets for application identification.
AppScanner [146] automatically fingerprints Android applications even in encrypted traffic. For the generation of the fingerprints, authors collect network traffic traces on the mobile device while running the corresponding applications. The application classification is conducted using a supervised learning algorithm that is fed with features that are exported through the collection of network traces. This scalable framework implementation is able to identify the profiled applications (110 most popular applications in Google Play Store) with more than 99% accuracy.
Stringoid [120] is a static analysis tool that estimates constructed URL strings using string concatenations in Android applications. The purpose of this work is to analyze web requests that originate from Android mobile applications. The authors use a dataset of 20 randomly selected Android applications, and with the Stringoid tool, they extract URLs from 30,000 Android applications.
Lopez-Martin et al. [94] propose using a recurrent neural network (RNN) combined with a convolutional neural network (CNN) for IoT traffic classification. The advantage of this work is that it outperforms alternative algorithms for traffic classification, while it does not require any feature engineering when applying new models.
In a succeeding work, Taylor et al. [147] show that a passive eavesdropper is able to identify mobile applications by fingerprinting the network traffic that they send, despite encryption. Again, using AppScanner and machine learning techniques they exploit the information that lays in network traffic, such as packet size and direction. In addition, they investigate how application fingerprints change over parameters like time, diversity of devices, and versions.
Aceto et al. [26] aim to improve the classification performance of mobile applications by proposing a Multi-Classification system, which combines specific decisions from base classifiers explicitly devised for mobile and encrypted traffic classification. The dataset that the authors used for testing, was collected by a mobile solutions provider. In a succeeding work, Aceto et al. [27] perform mobile traffic classification in encrypted network flow using deep learning techniques.
Shen et al. [133] perform encrypted traffic classification of decentralized applications on Ethereum using a feature selection of packet lengths, bursts, and time series. Aiolli et al. [30] identify user activities on Bitcoin wallet applications in mobile devices and are commonly used for sending, receiving, and trading Bitcoin.
Yao et al. [168] perform IoT traffic classification for smart cities using a method that relies on a deep learning aided capsule network for efficient classification. Their proposed work eliminates the process of manually selecting traffic features. FLOWPRINT [151] offers mobile application identification by analyzing the network traffic. It introduces an approach for application fingerprinting by combining destination-based clustering, browser isolation, and pattern recognition (in a semi-supervised manner). It is able to construct mobile application fingerprints for not known applications. Authors evaluate FLOWPRINT and they find that it is able to perform an accuracy of 89.2%. Even after application updates or newly encountered applications, FLOWPRINT has a precision of 93.5%. As ground-truth, authors use publicly available datasets. As features, authors extract all header values controlled by the communicating app as well as the sizes and inter-arrival times of packets. In addition, for the size and time-related features authors compute statistical properties, such as the maximum, standard deviation, and mean absolute deviation values.

Application Usage Identification
The works presented in this section offer fine-grained application event identification over encrypted traffic, often with the use of machine learning techniques.
Coull and Dyer [54] propose a method for traffic analysis of encrypted messaging services. More specifically, the authors aim to show that an eavesdropper would be able to retrieve fine-grained information by the communication channel, such as specific user actions, the size of messages that are exchanged, or even the language that is being used for the communication. Their results demonstrate the feasibility of gaining information by observing packet lengths, but their analysis is limited to Apple's iMessage application and is an offline study.
Conti et al. [49,50] propose a system to analyze encrypted network traffic to identify user actions on Android devices, such as email exchange, interactions over social network, and so on. Their framework uses TCP/IP packet fields, like IP addresses and ports, along with other features, like packet size, direction, and timing. They analyze numerous Android applications with diverse functionalities, such as Gmail, Facebook, Twitter, Tumblr and Dropbox. Using machine learning, they achieve high accuracy and precision for the identification of different user actions in each tested Android application (e.g., mail exchange, posting a photo online, or publishing a tweet).
NetScope [128] is a work that performs robust inference of users' activities, for both Android and iOS devices, based on inspecting IP headers. This work demonstrates how a passive eavesdropper is capable of identifying fine-grained user activities within a network (even over encrypted communication channels) generated by the applications used. Based on the intuition that the implementation of each individual mobile application leaves a fingerprint on its traffic behavior, such as transfer rates and packet exchanges, NetScope learns the subtle traffic behavioral differences between user activities becoming able to distinguish them.
Fu et al. [68] propose an approach to classify usage in mobile messaging applications. Their system, namely, CUMMA, classifies the usage in mobile messaging applications by taking into account user behavioral patterns, network traffic characteristics, and temporal dependencies. More specifically, they show that the observation of packet lengths and time delays can allow the classification of WhatsApp and WeChat traffic and identify the corresponding usage types (e.g., photo sharing). With this framework, the authors achieve 96% and 97% accuracy in WeChat and What-sApp, respectively.
Liu et al. [93] develop an analyzer to classify encrypted mobile traffic to application usage activity. Using similarity measurements, the authors select discriminative features from traffic packet sequences. For their online analyzer, the authors represent a traffic flow with a series of time windows. For their experiments, they analyze WeChat, WhatsApp, and Facebook applications.
OTTer [114] is a scalable engine that identifies fine-grained user actions, like voice/video calls or messaging, in popular Over-The-Top mobile applications, such as WhatsApp, Skype, Viber, and Facebook Messenger in encrypted network traffic connections. The engine operates at traffic loads with an average of 109 Gbps.
Wang et al. [162] identify financial transactions at the trading stage via analyzing the encrypted network traffic, by identifying the mobile payment app from traffic data, classifying specific actions on the mobile payment app, and finally, detecting the detailed steps within the action.
Jiang et al. [77] investigate if remote desktop traffic, even if encrypted, can reveal usage information. Indeed, their results show the feasibility of this, taking advantage of side-channel information leakage.

VoIP Conversation and Video Stream Decoding.
There are also works that use traffic analysis to extract voice information from encrypted VoIP conversations or identify encrypted video streams.
For example, Wright et al. [165] show that when the transmitted audio is encoded using variable bit rate codecs, the length of VoIP packets can be used to recognize words or phrases within a standard speech corpus. This means that a passive observer can identify phrases even in encrypted calls with an average accuracy of 50%. Schuster et al. [129] explain the root causes of burst patterns in encrypted video streams, show how to exploit these patterns for video identification, develop and evaluate a noise-tolerant identification methodology based on deep learning and, finally, they demonstrate how an attacker without direct observations of the network can identify videos being streamed. The features that authors use are flow attributes, such as down/up/all bytes per second, down/up/all packet per second, and down/up/all average packet length. The applications that authors examine are Netflix, Youtube, Amazon, and Vimeo.

Quality of Service/Experience Investigation
Streaming video content on mobile devices is a trend that is continually growing among users. This causes a tremendous demand for higher bandwidth and better provisioning throughout the network infrastructure. End-to-end encryption, though, leaves providers with limited indicators for identifying QoE issues. Thus, the works presented in this section aim to measure QoS and QoE from the perspective of a telecommunication service provider that has only visibility on the network traffic that is often encrypted.
Dimopoulos et al. [58] propose models able to detect different levels of QoE degradation that is caused by stalling, average video quality, and quality variations. The predictive models that the authors develop are evaluated on the production network of a large-scale mobile operator, where the authors show that their system is able to accurately detect Quality of Experience (QoE) problems with up to 92% accuracy. The significant features that the authors extract are Round-trip time (RTT)-related, bytes transmitted, packet loss percentage, and other network-related features.
Orsolic et al. [110] use machine learning for the estimation of YouTube Quality of Experience. To test their approach, the authors collect more than 1k different YouTube video traces under different bandwidth scenarios. Mazhar et al. [98] investigate the QoS of video in HTTPS and QUIC protocols. The set of features that expose usable information is based on (i) network and transport layer header information for TCP flows, and (ii) network layer features (based on inter-arrival time, packet sizes, packet/byte counts, throughput) for QUIC flows. Khokhar et al. [82] put YouTube under experimentation and perform network traffic measurements for QoE estimation using  [80] Protocol/Application Identification

Graphs, Statistics
Completeness, Accuracy Bernaille and Teixeira [36] Clustering with Gaussian Mixture Model True-/False-Positive Rates Wang et al. [158] Classification with Random Forests (RFs) Estimated accuracy, Overall accuracy, True-Positive Rate Alan and Kaur [31] Classification with Jaccard's coefficient, Gaussian Naïve Bayes, and Multinomial Naïve Bayes  [110] Classification with RF, Naïve Bayes, SVM, Decision Trees Accuracy Mazhar and Shafiq [98] Classification with Decision Trees Precision, Recall Khokhar et al. [82] Classification with RF, Linear Regression, and RF Regression Precision, Recall, F1-score, Accuracy, Root Mean Square Error (RMSE) Xu et al. [167] Fingerprint construction from chunk sizes Accuracy The table also includes the metrics that authors use to evaluate the work.
network-related features, as well. CSI [167] infers mobile ABR video adaptation behavior under HTTPS and QUIC using packet size and timing information.

Techniques
The majority of the works discussed in this section employ machine learning techniques to investigate the feasibility of traffic analysis and usage classification even when the communications are encrypted. In this section, we present the techniques and algorithms that are more popular among the works of this category. A detailed overview of the techniques used for classification can be found in Table 3. As illustrated in Table 3, machine and deep learning techniques are very popular in the domain of network analytics. More specifically, it seems that supervised or semi-supervised algorithms are mostly used for traffic classification and network analytics. These algorithms perform well with labeled and big datasets, while they require training. For instance, for mobile application classification, the authors choose Multinomial Naive Bayes (e.g., [26]), Support Vector Machine (e.g., [146]), and Hidden Markov Models (e.g., [165]) algorithms as well as other classifiers, such as Random Forest, Decision Trees, Gaussian Naive Bayes [31], and the k-Nearest Neighbors algorithm for pattern recognition [60]. For a more detailed classification (i.e, the identification of user actions and events inside mobile applications), the authors choose hierarchical clustering techniques [50,68]. Besides machine learning, it seems that recently, researchers have turned to neural networks since they perform better than single machine learning algorithms. A neural network combines different machine learning algorithms for modeling data using graphs of neurons, while it is able to make accurate decisions and learn from its own errors. This makes a neural network work independently without requiring any human intervention.
Being encrypted, network packet payloads do not offer significant information. Thus, most techniques discussed in this section take advantage of data that are available in packet headers. The majority of these techniques use information like network packet sizes, directions, and time-related data, which are treated as features to train the corresponding machine learning models. More specifically, many works use as features the packet size and the packet direction [31,40,50,54,75]. Herrman et al. also use the IP packet length distribution [75]. Selecting a subset of packets in a single network flow is also common. For instance, Lu et al. do not consider incoming MTU packets [96], while Bernaille et al. keep only the first packets [37].
In Table 3, we also display the metrics used for the evaluation of each technique. Since the majority of these works are based on machine learning, they mostly measure their technique's (i) accuracy, (ii) precision, (iii) recall, (iv) true-, and (v) false-positive rates. However, even though the majority of the works discussed in this section use similar evaluation metrics, we would not be able to properly compare their effectiveness since they use different datasets to train their systems. Finally, we notice that only few works present processing evaluation metrics, such as memory consumption, training time, or throughput. In the next section (Section 3.5), we discuss the datasets used for the works of traffic classification for network analytics. Table 4 presents the public datasets that were used in the domain of network analytics. In the category of application identification and classification, Yao et al. [168] used the USTC-TFC2016 dataset [12] that among others contains network traffic from BitTorrent, Facetime, Gmail, and Skype. Ede et al. [151] used the Recon dataset [14], which consists of labeled network traces of 512 Android apps from the Google Play Store, including multiple versions for over a period of 8 years.

Datasets
We notice that the vast majority of the works in this section do not use public datasets to train their models. The proprietary datasets used come either by real or emulated usage representing network usage with applications of interest, either in mobile or fixed networks. The absence of public datasets that contain network traffic that is both encrypted and labeled is apparent. Thus, authors are producing their own datasets.

Objectives and Limitations
The majority of the works in the category of network analytics focus on the functionality of their approaches. More specifically, authors focus on the thorough examination of the traffic analysis feasibility when the network traffic is encrypted. Indeed, they show that it is possible to detect the nature of the traffic in a fine-grained manner. For instance, Conti et al. [49] are able to identify different actions (e.g., post a tweet or send a message) in mobile applications (such as Twitter and Facebook) accurately even when the network traffic is encrypted. Yet, there are works that except for functionality, they aim for programmability and deployment as well [93,114]. OTTer achieves a detailed characterization of usage (i.e., video call, voice call, chat) in different Over-The-Top applications like Skype and WhatsApp [114]. It is also integrated into a DPI engine that is deployed in a live traffic test-bed with an average of 109 Gbps.
All the works that use machine learning algorithms for the classification of network traffic need to be retrained in order to remain robust across new and diverse data. In addition, the majority of such works choose a subset of applications or protocols in order to examine the feasibility of classification. This makes such solutions mostly effective only for applications and protocols used for training, something that introduces potential scalability and adaptability issues. Even while scoring high performance (with metrics like accuracy, precision, and recall) in close-world scenarios (i.e., under a specific ground-truth dataset), these systems will certainly produce high rates of false positives when tested against real-world datasets of traffic traces. Furthermore, numerous countermeasures exist that can perform against traffic analysis techniques (such countermeasures are discussed in Section 7), making many of the works discussed in this section unable to bypass them.

SECURITY AFTER NETWORK ENCRYPTION
In this section, we present the state-of-the-art on encrypted network traffic analysis for network security. The works that are presented in this section are summarized in Table 5.

Intrusion Detection
Some techniques focus on identifying malicious behavior in the network, examining the characteristics of the underlying traffic, using exclusively machine learning approaches.
Taleb et al. [65,144] propose an approach that identifies misuses in encrypted protocols with network packet inspection that focuses on processing of packet header information. Amoli et al. present a real-time unsupervised NIDS, able to detect new and complex attacks within encrypted and plaintext communications [32]. Anderson and McGrew [33] compare the properties of six different machine learning algorithms for encrypted malware traffic classification.
Shone et al. [135] propose a system that combines deep learning techniques for network intrusion detection. For the evaluation of their system, the authors use the KDD Cup '99 and NSL-KDD datasets with high accuracy (almost 90%). Kitsune [100] is a NIDS, based on neural networks, and designed for the detection of abnormal patterns in network traffic. It monitors the statistical patterns of recent network traffic and detects anomalous patterns.
Papadogiannaki et al. [113,115] perform network intrusion detection by generating signatures from packet metadata sequences. They evaluate their methodology using the UNSW-NB15 dataset [104] that contains network traffic traces for numerous attacks.
Tang et al. [145] present a deep learning approach for flow-based anomaly detection in SDN environments. The authors build a Deep Neural Network (DNN) model for an intrusion detection system and train it with the NSL-KDD dataset, using six basic features of the NSL-KDD dataset. Niyaz et al. [109] utilize deep learning in order to detect DDoS attacks in SDN environments. The proposed system identifies individual DDoS attacks with an accuracy of almost 96% and classifies the traffic into benign or attack traffic, with an accuracy of 99.82% with low false positives.

Intrusion and Malware Detection in Mobile Devices
While the ever increasing adoption of traffic encryption has significantly improved the user privacy and security, traditional intrusion detection systems based on inspecting unencrypted traffic are becoming obsolete. Thus, there is a number of works that aim to detect malicious bahavior on mobile devices, mainly relying on machine learning approaches. Andromaly [130] is a framework for malware detection on Android mobile devices. The hostbased malware detection system monitors features and events that are retrieved from mobile devices and applies anomaly detection for the classification of the collected data.
Shabtai et al. [131] propose a system that identifies (i) attacks or masquerading applications installed on a mobile device and (ii) injected applications with malicious code.
In TrafficAV [159], the mobile network traffic is mirrored from the wireless access point to the server for data analysis. The data analysis and malware detection are performed on the server side. TrafficAV performs network traffic analysis in multiple levels. The proposed method combines network traffic analysis with a machine learning algorithm (i.e., C4.5 decision tree) and is able to identify malware in Android devices with good accuracy results. In an evaluation with 8,312 benign apps and 5,560 malware samples, the TCP flow detection model and the HTTP detection model achieve detection rates of up to 98% and 99.65%, respectively.
Lashkari et al. [88] detect malicious and masquerading applications on mobile devices. Their proposed method shows a good average accuracy (91.41%) and precision (91.24%) with a low falsepositive rate. The authors use five different classifiers: Random Forest (RF), K-Nearest Neighbor (KNN), Decision Tree (DT), Random Tree (RT), and Regression (R). The authors have published a labeled dataset of mobile malware traffic that contains benign Android applications or injected applications (e.g., with adware or other types of malware).

Techniques
In this section, we present the techniques and algorithms that are more popular among the works of this category. Table 6 displays works that perform intrusion detection even in encrypted networks. The majority of these works utilize machine learning algorithms for intrusion detection and classification. Besides machine learning, researchers make use of neural networks and deep learning techniques. For intrusion detection, Shone et al. [135] propose a deep learning classification model constructed using stacked non-symmetric deep autoencoders (NDAEs). Tang et al. [145] present a deep learning approach for flow-based anomaly detection in SDN environments. The authors build a DNN model for intrusion detection and train it with the NSL-KDD dataset, using basic features found in the dataset. Niyaz et al. [109] utilize deep learning in order to detect DDoS attacks in SDN environments. Kitsune [100] is a NIDS, based on neural networks, and designed for the detection of abnormal patterns in network traffic. It monitors the statistical patterns of recent network traffic and detects anomalous patterns. Again, the majority of these techniques take advantage of data that are available in packet headers like network packet sizes, directions, and time-related data. More sophisticated techniques make use of additional, more advanced features, like the TLS handshake information [33,67].
In Table 6, we also display the metrics used for the evaluation of each technique. Similarly to Section 3, the most common evaluation metrics are (i) accuracy, (ii) precision, (iii) recall, (iv) true-, and (v) false-positive rates, while few of the works report classification time and processing throughput. Table 7 presents the public datasets that were used for intrusion detection. The DARPA dataset [1] is one of the most popular datasets to train and evaluate intrusion detection systems. The ISCX-IDS-2012 intrusion detection dataset [25] consists of 7 days of benign and malicious network activity. The KDD Cup '99 dataset [2] includes a wide variety of intrusions simulated in a military network environment. The NSL-KDD dataset [7] is suggested to solve some of the inherent problems of the KDD '99 dataset. Finally, Moustafa and Slay [104] build and make publicly available a handy dataset for network intrusion detection systems, namely, UNSW-NB15 [10]. This dataset is used for evaluation in [113,115].  [113,115] Public UNSW-NB15 [10] Tang et al. [145] Public NSL-KDD [7] As Anderson and McGrew [33] correctly point out, finding the most proper features for classification with high accuracy, recall, and precision, is not a trivial procedure, while it is highly dependable on the ground-truth dataset collection that is available each time. However, user privacyrelated regulations make data sharing a very challenging practice for organizations and data holders. In Table 7, we can notice that the NSL-KDD dataset [7] is the most popular dataset that is publicly available. Yet, we observe that the majority of the datasets that are used for intrusion detection are not very recent. For instance, Amoli et al. [32] and Shone et al. [135] use the DARPA [1] and KDD Cup '99 [2] datasets, respectively, both of which were more than 15 years old at the time the papers were published.

Objectives and Limitations
As most works that perform encrypted network traffic inspection for intrusion detection use machine learning techniques to examine the feasibility of attack classification, the primary objective is functionality. They train their models offline providing poor support for online intrusion detection when it comes to encrypted communications. The works that also aim for programmability, except for functionality, are [100,113,115,130,131]. In Section 6, we discuss middleboxes that can be deployed for online traffic inspection. Some of the network functions of these middleboxes can be used for intrusion detection and firewall applications.
The majority of the works discussed in this section can be bypassed by traffic analysis resistant techniques. Anderson et al. [33] examine and present mistakes and limitations in the network traffic analysis literature, such as the utilization of old and unreliable ground-truth datasets. Indeed, if we examine the datasets that are used as ground-truth for training most of the machine learning models in this section, are not very recent.

USER PRIVACY AFTER NETWORK ENCRYPTION
Besides network analytics and security, traffic analysis has been also used to monitor and profile the characteristics of mobile applications. This kind of tools empower the user to gain (i) insight regarding the applications used and (ii) control over personally identifiable information (PII) handling. Furthermore, in this section we discuss works that are able to fingerprint websites and applications even when anonymity tools like TOR [24] are used to hide the activity's nature, the user's identity, or the user's location. Finally, we present works that enable OS and device identification even in encrypted networks. The works that are presented in this section are summarized in Table 8.

Endpoint Device Tools and PII Leakage Detection
ProfileDroid [163] is a monitoring and profiling system that can characterize the behavior of Android applications at the static, user, OS, and network layers. The authors evaluate multiple Android applications, which present privacy and security, operational and performance issues. The authors evaluate the proposed tool using free and paid Android applications, observing (i) discrepancies between the app specification and app execution, (ii) higher costs resulting from free versions of apps, due to an order of magnitude increase in traffic, (iii) a great amount in network Lotfollahi et al. [95] Application identification in VPN traffic 2017 Shahbar and Zincir-Heywood [132] Identification of anonymity networks 2017 Montieri et al. [102] Traffic classification of anonymity tools 2019 Chen et al. [46] Device/OS identification OS identification, NAT and tethering detection 2014 Sivanathan et al. [136] IoT device identification using traffic features traffic that is not encrypted, (iv) excess communication with more than expected sources, and finally (v) the communication of the majority of the examined apps with Google. TaintDroid [64] identifies privacy leaks of Android applications with dynamic information-flow tracking. More specifically, the authors monitored the behavior of popular third-party Android applications and discovered potential misuse cases of user private information across applications. TaintDroid provides users with information regarding third-party applications. Haystack [121] is a mobile application distributed via popular app stores that can correlate contextual information (such as application identifiers and radio state) with specific traffic flows (encrypted or not) destined to remote services. To handle encrypted traffic, haystack employs a transparent man-in-the-middle (MITM) proxy for TLS traffic. PrivacyGuard [140] is an open source platform that intercepts the network traffic that is generated by mobile applications (through VPN) in order to detect sensitive information leakage. PrivacyGuard is able to effectively detect information leakage in the majority of the applications under examination and it is shown that it outperforms TaintDroid. AntMonitor [90] is a system that passively monitors and collects packet-level measurements from Android devices in order to provide a fine-grained analysis. AntMonitor provides users with control over their personal data and supports client-side traffic collection and analysis. The authors examine PIIrelated features, such as (i) the IMEI, which uniquely identifies a device within a mobile network, and Android Device ID, which is an identification code associated with a device, and (ii) the phone number, email address, and location, which can be uniquely associated with users. ReCon [122] is a cross-platform system that reveals personally identifiable information leakages and gives users the control over leaked information without requiring any special privileges or custom OSs. Re-Con uses machine learning to identify and reveal possible PII leakage by inspecting the network traffic, while it provides a visualization tool that empowers users to control how their information is being handled by blocking or substituting it. Continella et al. [51] propose a system that detects leakages of PII and is resilient to obfuscation methods and techniques (e.g., encoding, formatting), encryption, or any other kind of transformation performed on private information before it is leaked. Moreover, Rosner et al. [125] present a black-box approach for detecting and quantifying side-channel information leaks in TLS-encrypted network traffic.

Fingerprinting
In this section, we present works in the field of webpage and application fingerprinting, even while covered by privacy enhancing technologies. In addition, we study the state-of-the-art in the device/OS identification and location estimation from network traffic inspection. The works that are presented in this section are also summarized in Table 8.

Webpage Identification. Traffic analysis has been used to identify fine-grained information, like webpages and websites, even while transferred over encrypted tunnels established by technologies such as OpenVPN [20] or TOR [24].
Herrmann et al. [75] present a classifier that identifies up to 97% of web requests from packet traces. Lu et al. [96] identify dynamic websites that are transferred over SSH and SSL tunnels. Panchenko et al. [111,112] show how website fingerprinting in onion routing based anonymization networks, such as TOR, is still possible using information like packet sizes, total transmitted bytes, percentage of incoming packets, and others. Cai et al. [40] present a webpage fingerprinting attack that is resilient to recent traffic analysis countermeasures' methods, such as applicationlevel defenses like HTTPOS [97] and randomized pipelining over TOR [118]. Kwon et al. [84] perform a passive attack against hidden services and their users called circuit fingerprinting attack. Using the attack, an attacker can identify the presence of hidden service activity in the network with high accuracy.
Draper-Gil et al. [60] study flow-based and time-related features that can be analyzed to detect VPN traffic and to classify encrypted traffic according to the type of traffic (e.g., browsing or streaming) using two machine learning techniques to test the accuracy of the features. They show that time-related features, when properly handled can reveal enough information for encrypted traffic characterization. Cruz et al. [55] present a deep learning method that takes advantage of a feature set that is based on the statistical behavior of TCP tunnels proxying BitTorrent traffic. The next steps are the transformation of this feature set into multiple timestep sequences and the training of a recurrent neural network. The results of this work show that it is possible to identify the existence of BitTorrent traffic in SSH tunnels.
Lotfollahi et al. [95] present a system that is able to handle both traffic characterization and application identification by analyzing encrypted traffic with deep learning. The proposed scheme can characterize between VPN and non-VPN traffic, protocols (e.g., FTP or P2P), and end-user applications (e.g., Skype or BitTorrent).
Shahbar and Zincir-Heywood [132] identify multilayer-encryption anonymity networks and the obfuscation techniques they use with a small number of features and a small number of packets. Montieri et al. [102] perform hierarchical traffic classification of anonymity tools, like TOR.

Device/OS Identification.
There are many works that focus on extracting TCP or IP packet metadata, in order to investigate if the behavior of specific packet contents can be correlated with OSs, device types, and other characteristics.
Chen et al. [46] are able to perform OS identification, NAT, and tethering detection by examining multiple features in the TCP/IP packet headers. The authors use real network traffic traces to evaluate the accuracy of fingerprinting and show that several techniques that can successfully fingerprint desktop OSs are not similarly effective for fingerprinting mobile devices. For OS fingerprinting, the authors use the following packet header values: the IP TTL value, the IP ID monotonicity, the TCP timestamp option, the TCP window size scale option, and the clock frequency. Features like the TCP timestamp monotonicity, clock frequency, and boot time can be used for tethering detection. Ruffing et al. [126] aim for OS identification of mobile devices even in encrypted traffic. The authors propose a traffic content agnostic algorithm that implements spectral analysis of the encrypted traffic and they show that even a network traffic input of 30 seconds can be enough for high accuracy results. Sivanathan et al. [136] classify IoT devices using network traffic features. More specifically, the authors instrument an ecosystem of different IoT devices (e.g., cameras, plugs, and motion sensors) and examine traffic characteristics, such as port numbers, signaling patterns, and cipher suites that are used. Lastovicka et al. [89] examine traffic patterns of the TLS protocol and train a machine learning model using features from the TLS handshake in order to identify the operating system of a device. They focus their research on mobile devices connected in a wireless network.

Location Estimation.
The position of a mobile device can be calculated and estimated by collecting and monitoring the network traffic that is produced by applications that contain locationbased services, even when the communication channels are encrypted. For example, Ateniese et al. [35] show that an adversary could estimate the position of a mobile device by analyzing the timing and sizes of encrypted network packets that are exchanged between the user's mobile device and any location-based service provider that communicates with the device.

Techniques
In this section, we present the techniques and algorithms that are more popular among the works of this category. Table 9 briefly describes the techniques and evaluation metrics used.
The majority of works in the category of website or device/OS fingerprinting domain use machine learning techniques, while it is also common to build fingerprints for a webpage and then compare its similarity for classification. Works that detect PII leakages either work offline (e.g., [90,125]) or online (e.g., [51,121]). Offline works use similar machine learning techniques as previously discussed works, while online tools intercept the encrypted network traffic before processing it.
For instance, between works that examine tunnelled network traffic (e.g., over VPN or SSH protocols) for website classification and fingerprinting, the most popular algorithms are Multinomial Naive Bayes [75], Support Vector Machine [112], and Hidden Markov Models [40]. In addition, Levenshtein distance and the Jaccard classifier are used in a number of works to examine similarities between website fingerprints and properly classify them into categories [40,92,96].
In addition, more recent works like [55,95] use neural networks. Lotfollahi et al. [95] present a system that is able to handle both traffic characterization and application identification by analyzing encrypted traffic with deep learning, embedding stacked autoencoder, and CNN to classify network traffic. Cruz et al. [55] identify tunnelled BitTorrent traffic with a deep learning implementation. Their approach examines features that are related to the statistical behavior of TCP tunnels proxying BitTorrent traffic. Then, the authors transform the features into multiple time sequences and train a recurrent neural network. Profit combines techniques like network trace alignment, phase detection, feature selection, feature probability distribution estimation, and entropy computation to quantify the amount of information leakage that is revealed via the network traffic. Rosner et al. [125] present in "Profit" a dynamic technique to detect information leakages in applications that support encryption and communicate via TLS. Profit receives a "user-supplied profiling-input suite" where application data is annotated as secret or sensitive. Profit runs the  [96] Fingerprint similarity with Levenstein distance Accuracy, Effect of time on accuracy Panchenko et al. [111,112] Classification with SVM, Fingerprint similarity Accuracy, FPR, TPR, Runtime, Recall, Precision Cai et al. [40] Classification application over the user-supplied input and captures a set of variable-length network packet traces. The traces include information like packet sizes and timestamps along with their aggregations (e.g., total time and median size). Again, the authors agree that finding the features that leak the most information is challenging. To another end, device and OS identification techniques use network packet header contents, such as IP TTL value, the IP ID monotonicity, the TCP timestamp option, the TCP window size scale option, and the clock frequency. For the detection of tethering, similar approaches take into consideration the monotonicity of the TCP timestamp, the timestamp clock frequency, and the booting time [46]. Works that investigate PII leakage from mobile applications are also presented in Tables 8 and 9. In this category, works tend to perform network traffic interception to be able to process the encrypted traffic and follow the information flow that is exposed by mobile applications. After the traffic interception, the authors are able to extract information by inspecting plaintext packet contents. Table 10 contains the public datasets that were used by each work. Lastovicka et al. [89] produced a dataset that was made public after the publication of their work. The dataset contains TLS fingerprints collected. The VPN-nonVPN dataset (ISCXVPN2016) [11] contains traffic from user sessions in applications of browsing, email, chat, streaming, and others. The traffic is either regular or transferred over VPN. The LBNL/ICSI dataset [3] contains packet traces that span more than 100 hours of activity from a total of several thousand internal hosts. The Anon17 dataset [13] contains network traffic traces from TOR, JonDonym, and I2P anonymity tools. The CRAWDAD SIGCOMM  [132] Public LBNL/ICSI [3] Montieri et al. [102] Public Anon17 [13] Chen et al. [46] Device/OS Identification Public CRAWDAD SIGCOMM'08 [5], CRAWDAD OSDI'06 [4] Lastovicka et al. [89] Proprietary dataset that was made public TLS fingerprints for OS identification [17] Rosner et al. [125] PII Leakage Detection Public DARPA Space/Time Analysis for Cybersecurity program [9] '08 dataset [5] contains traces of wireless network activity from the SIGCOMM 2008 conference. Similarly, the CRAWDAD OSDI '06 dataset [4] contains network activity from the OSDI 2006 conference. Finally, the goal of the DARPA Space/Time Analysis for Cybersecurity program [9], among others, is to enable researchers to identify vulnerabilities related to the space and time resource usage behavior of algorithms.

Objectives and Limitations
In the categories of website, location, and device/OS identification, the major goal is to produce an effective methodology for accurate fingerprinting. Thus, all the works that we study in these categories have one primary target; functionality. On the other hand, works in the domain of PII leakage detection focus on the programmability and deployability. Hence, we encounter performancedriven solutions.
The limitation of high false-positive rates is major in any domain of traffic analysis. Unfortunately, it appears to be very difficult to produce a universal solution that will be able to cover a whole domain, due to the vast diversity and heterogeneity that has been introduced during recent years in every single aspect of th Internet. In addition, as Juarez et al. [78] argue regarding fingerprinting, accuracy scores reduce over time.
In addition, in the category of PII leakage detection, it is common to use proxy servers or VPN in order to redirect the traffic to a controlled environment for interception and decryption of encrypted traffic. Even if we neglect the latency that is added, this technique could eventually raise privacy-related concerns if users are not properly informed about the procedure of the processing and manipulation (e.g., storage) of their traffic and personal data. In Section 6, we discuss middleboxes that address this issue by processing sensitive information using hardware-assisted technologies for trusted execution.

NETWORK FUNCTIONS IN MIDDLEBOXES AFTER NETWORK ENCRYPTION
Quoting from RFC 3234, "a middlebox is defined as any intermediary device performing functions other than the normal, standard functions of an IP router on the datagram path between a source host and destination host" [42]. The typical use of a middlebox is to offer security (e.g., firewall, intrusion detection) or performance (e.g., caching, protocol accelerator), while other common uses are network address translation and protocol conversion. One challenge that occurred after the rapid growth of network encryption is that in order to process and operate on TLS traffic, the middlebox must perform a man-in-the-middle in a connection. Of course, this raises major concerns on user privacy preservation. Network middleboxes that aim to inspect encrypted traffic operate by acting as proxies. They terminate and decrypt the client-initiated TLS session, they analyze the HTTP plaintext content, and then they initiate a brand new TLS connection to the destination. TLS makes interception difficult by encrypting data and defending against attacks (like manin-the-middle) via the certificate validation. During the validation process, the client is responsible to authenticate the identity of the destination server, rejecting impostors. To circumvent this

Network Functions in Middleboxes
Sherry et al. [134] DPI on the encrypted traffic using encrypted rules 2015 Naylor et al. [106] Extend the TLS protocol to support middleboxes 2015 Asghar et al. [34] Trusted execution for Network Functions (NFs) in the cloud 2016 Canard et al. [41] DPI on the encrypted traffic using encrypted rules 2017 Lan et al. [87] Trusted execution for NFs in the cloud 2016 Yuan et al. [171] DPI on the encrypted traffic using encrypted rules 2016 Fan et al. [66] DPI on the encrypted traffic using encrypted rules 2017 Naylor et al. [105] Secure outsourcing of middlebox NFs in untrusted infrastructures 2017 Han et al. [74] Secure Secure outsourcing of middlebox NFs in untrusted infrastructures 2019 Ning et al. [107] DPI on the encrypted traffic using encrypted rules 2019 Guo et al. [72] Privacy preserving packet header processing for middleboxes in the cloud 2020 validation process, a self-signed CA certificate is injected into the client browser's root store at the time of installation. For network middleboxes, administrators deploy the middlebox certificate to the corresponding devices (e.g., of the organization) in a similar manner. Then, when the proxy intercepts a connection, it will dynamically generate a certificate for a website's domain name that is signed with its CA certificate. The proxy will deliver this certificate chain to the browser [62]. Some works that allow TLS interception to inspect encrypted network traffic have already been presented in Section 5.
In this section, we discuss the works that enable secure processing of encrypted traffic by middleboxes. Tables 11 and 12 present details about these works.

Network Functions in Middleboxes
BlindBox [134] performs deep-packet inspection directly on the encrypted traffic, utilizing a new protocol and new encryption schemes. Specifically, the functionality of BlindBox is provided through (i) a searchable encryption scheme [139] that enables the inspection of encrypted traffic for certain keywords, (ii) a technique to allow the middlebox to obtain encrypted rules (i.e., based on the rules from the middlebox and the private key of the endpoints), and (iii) a mechanism to allow flow decryption when a suspicious keyword is observed in the flow. mcTLS extends the traditional TLS protocol to support middleboxes by allowing endpoints and content providers to explicitly introduce middleboxes in secure end-to-end sessions while controlling which parts of the data they can read or write [106].
Asghar et al. [34] propose SplitBox, in which a cloud service provider privately computes network functions on behalf of the client. More specifically, SplitBox provides security guarantees in the honest-but-curious model and works based on cryptographic secret sharing. As proof-ofconcept, the authors implemented a firewall and measured the bandwidth and latency achieved.
Embark [87] enables a cloud provider to support middlebox outsourcing respecting user privacy. Embark encrypts the traffic that is transmitted to the cloud and enables the cloud to process the encrypted traffic without having to decrypt it. Yuan et al. [171] propose a system architecture for outsourced middleboxes to perform DPI over encrypted traffic, without revealing either packet payloads or inspection rules. SPABox [66] is a middlebox-based system that supports keyword-based and data analysis-based DPI functions over encrypted traffic without having to decrypt it.
Canard et al. [41] present BlindIDS, which is able to perform deep packet inspection directly on encrypted network packets for intrusion detection. BlindIDS does not assume knowledge over the traffic content or the patterns of detection signatures. The authors evaluate the performance of BlindIDS by presenting the overhead on sender and receiver sides (i.e., connection setup time, data encryption time) and the overhead on the service provider (i.e., detection time, memory usage). Ning et al. [107] propose PrivDPI, a tool that addresses the performance limitations of Blind-Box [134]. Guo et al. [72] propose a privacy preserving packet header processing approach for middleboxes that are outsourced to cloud infrastructures. The authors perform a security analysis and identify information leakages, while they evaluate the performance of the prototype (i.e., initialization time, memory cost, latency, throughput, and overhead).
To overcome the limitation of privacy violation when the network traffic is intercepted for further processing, there are works that propose hardware-assisted solutions that enable trusted execution. These works enable the secure processing of sensitive information, such as the network traffic, inside encrypted memory regions provided by Trusted Execution Environments (TEEs). One example of TEE is the Intel SGX technology, which is supported by Intel Skylake processors (and successors). TEEs offer secure processing when the execution environment could not be trusted (e.g., a cloud infrastructure).
Naylor et al. [105] propose mbTLS, a protocol that enables secure outsourcing middlebox functionality to untrusted infrastructure using the Intel SGX technology [8].
Han et al. [74] present SGX-Box, a secure middlebox system implementation that offers visibility in encrypted network traffic, taking advantage of the Intel's SGX technology [8]. SGX-Box ensures that the sensitive information, such as decrypted payloads and session keys, is securely protected within the protected memory enclave.
Coughlin et al. [53] propose the Intel SGX technology to overcome security issues of Network Function Virtualization (NFV) applications in cloud environments. Poddar et al. [119] present SafeBricks that also proposes the utilization of Intel SGX to shield the execution of NFV functions in untrusted environments like the cloud infrastructure. Similarly, Trach et al. [148] present ShieldBox, a middlebox framework for deploying network functions over untrusted commodity servers. ShieldBox takes advantage of the Intel SGX technology and the authors deploy two use cases: (i) a multiport IP router and (ii) an intrusion detection system. Goltzsche et al. [71] propose EndBox. EndBox executes middlebox functions on client machines at a network edge and combines a VPN with middlebox functions that are protected in Intel SGX hardware enclaves. Duan et al. [61] present LightBox, which offers efficient and protected middlebox functionality using the Intel SGX technology.

Techniques
Works in this category either process the encrypted network traffic using software-based cryptographic techniques, such as searchable encryption, or process the network traffic inside encrypted memory regions in order to ensure the preservation of privacy. With searchable encryption, tools are able to search directly on encrypted data without decrypting it, and thus, without leaking information in plaintext (e.g., for privacy preserving malware detection [56]). This is achieved by encrypting the rules to be searched against the already encrypted traffic (e.g., [41]). TEEs enable the secure code execution and information processing commonly using hardware-assisted features like memory enclaves. Intel SGX [8] is one of the most popular hardware-assisted TEEs and is frequently used to provide confidentiality and integrity guarantees to applications in environments where the OS could eventually become untrusted, like a cloud infrastructure (e.g., trusted antivirus in the cloud [57]). In [34,41,66,72,87,106,107,134,171], the authors perform network middlebox applications with core functionality the deep packet inspection using searchable encryption. In [53,61,71,74,105,119,148] the authors shield the network traffic processing using hardware enclaves that the Intel SGX technology provides. Table 12 displays the most common techniques used for network functions in middleboxes. Also, we present the metrics that each work used for evaluation.

Datasets
The datasets used to evaluate the works of this section are not other than the ones used by traditional works in network traffic processing inside middleboxes. Depending on the application goal (e.g., intrusion detection or firewall), the authors utilize the relevant rules (e.g., Snort rules [22] in [41,66,107,171]) and traffic traces (e.g., DARPA [1] in [171]).
In this section, we do not include a table, since the datasets used by works in this category do not provide labeled and encrypted network traffic to be used as ground-truth data for encrypted network traffic analysis.

Objectives and Limitations
While the main objective of the majority of the works presented in this survey is to ensure the functionality of their systems by providing knowledge on how to properly analyze encrypted network traffic in order to extract information about its nature, the works in this category focus also on the programmability and deployment of their systems. Thus, the authors provide evaluation results not only for the effectiveness of their solution but also for the processing performance and the overhead that is introduced using either a software-centric solution like the searchable encryption or a hardware-assisted technology like TEEs.
Although cryptographic tools like the searchable encryption are very effective and significantly preserve user privacy when it comes to network functions in middleboxes, the individual functions that are performed (e.g., rule encryption and connection setup) add an essential overhead to the end-to-end performance. Similarly, TEEs also increase the processing overhead. For instance, with Intel SGX, substantial overhead can be presented when the computations are I/O bound [43]. Even though both techniques are effective, they introduce the tradeoff of user privacy versus performance.

ENCRYPTED TRAFFIC ANALYSIS COUNTERMEASURES
As we discuss in this survey, numerous techniques have been employed in order to enable traffic processing when the network is encrypted or tunnelled. Even though encrypted traffic analysis techniques can be used by different actors (such as ISPs/CSPs, etc.) to benignly extract information about network usage, encrypted traffic analysis techniques can be also used by malicious actors (such as governments that censor websites or prohibit Internet usage) in order to harm the privacy that network encryption offers to Internet users. Some of the most popular solutions propose randomizing network packet sizes, padding bytes to packets for a fixed size and time tuning for inter-packet transmission. More sophisticated solutions are explicitly discussed in the following paragraphs. In this section, we discuss the (i) techniques that can be used against encrypted network traffic analysis and (ii) systems that exist and aim to defend against it.

Anonymity Tools
Onion routing serves as an overlay network designed to anonymize communications and applications (e.g., web browsing and instant messaging). TOR is one good example of onion routing. Among others, it uses fixed-size cells that are the unit of communication in TOR [24]. As we have already discussed in Section 5.2 though, anonymity tools are not enough to prevent website and hidden services identification [123], since the existence of anonymity tools can be identified using numerous encrypted traffic analysis techniques (Section 5.3).

Traffic Shaping
As already discussed, features and characteristics of network traffic that present patterns after encryption (e.g., packet sizes and timing), can reveal surprising information about the traffic's nature and contents. Even though encrypted traffic analysis can be legitimate, these techniques raise important concerns about privacy-related issues. An approach that typically mitigates such threats is the padding of packet sizes or the transmission of packets at fixed timing intervals; obfuscating the behavior of a communication mean. However, this method can become inefficient because it results in time overheads. Wright et al. propose a method for hindering statistical traffic analysis algorithms. Their approach proposes the modification of a certain network traffic "class" to look like another. The authors show how to modify packets' characteristics in real-time with low overhead in order to reduce the accuracy measurements of traffic classifiers. The morphed data is then sent to the network stack encrypted and then sent to the destination [166]. AnonRep [172] builds on top of anonymity and privacy guarantees for the case of reputation and voting systems. TARN [170] randomizes IP addresses, while TARANET [45] employs packet mixing and splitting to achieve constant-rate transmission, providing anonymity at the network layer. Luo et al. [97] design the HTTPOS fingerprinting defense at the application layer. HTTPOS acts as a proxy that receives HTTP requests and obfuscates them before allowing transmission. Specifically, it modifies network-related features, such as the total packet size, the packet timestamp, and the payload size. In addition, it uses HTTP pipelining to obfuscate the number of the transmitted packets. The authors show that HTTPOS was successful in defending against a number of traffic classifiers. Dyer et al. [63] create a defense, namely, BuFLO that combines previously proposed countermeasures, such as fixed packet sizes and constant rate traffic. The authors improve other related defenses at the expense of a high bandwidth overhead. Cai et al. [39] make modifications to the BuFLO defense proposing the rate adaptation technique. Yet, this adds a bandwidth overhead. Nithyanand et al. [108] propose Glove, which groups website traffic into clusters. This provides privacy guarantees and reduces the bandwidth overhead by grouping web traffic into similar sets. Panchenko et al. [112] propose "website camouflage, " which is actually an obfuscation technique that randomly requests a second website, simultaneously with the actually requested one. Frolov and Wustrow [67] propose uTLS that enables tool maintainers to automatically mimic other popular TLS implementations to prevent censorship. Walkie-Talkie is a website fingerprinting defense approach that modifies the browser to communicate in half-duplex mode, since it produces burst sequences that leak less information to the adversary. This makes sensitive and non-sensitive pages look the same [160].

Traffic Analysis Resistant IoT Devices.
Hafeez et al. [73] demonstrate that an adversary, with access to the network traffic of a "smart" home network, can lead to the identification of the device types and some user interactions with IoT devices. In order to defend against traffic analysis attacks, the authors propose a "traffic morphing" technique that shapes network traffic in ways that make it more difficult to achieve an attack that identifies devices and activities. In order to mask the background traffic, the authors send traffic on an upstream link at a constant rate, irrespective of real background traffic rate of an IoT device. Meanwhile, when an IoT device is inactive, the authors send dummy traffic representing device activity to upstream link, so that an adversary cannot identify real activity of the IoT device. While this approach is not very sophisticated, it points to the direction of defending against traffic analysis on IoT devices.

Traffic Analysis Resistant Messaging Applications.
There have been efforts to create messaging protocols that provide anonymity and privacy guarantees in the face of traffic analysis. Dissent [164] and Riposte [52] are systems that provide strong user privacy guarantees. They protect packet metadata, but they suffer from scalability issues. Herd [91] is another system that tackles the case of anonymity for VoIP calls, by addressing, like the former proposals, some of the limitations of the more general-purpose Tor anonymity network [59]. Vuvuzela [150] and Atom [85] are more scalable systems (thousands of messages for millions of users) that employ differential privacy to inject noise into observable metadata.

LESSONS LEARNED AND CONCLUSIONS
In this section, we summarize and review the findings that are present in the previous sections (Sections 3-6). As we have already mentioned, network traffic analysis and classification is very popular in many domains, such as network analytics, security, privacy, and fingerprinting. The proposed solutions could be divided using the techniques and methods used for encrypted traffic analysis and inspection into three categories: the works that perform (i) manipulation of traffic metadata and characteristics, (ii) interception of encrypted traffic, and (iii) utilization of cryptographic functions (see Figure 1).
Systems in the first category (i.e., manipulation of traffic metadata and characteristics) focus on investigating which network-related characteristics reveal the nature of the network traffic. Throughout all these works, we find similar methodologies that lay above different algorithms. As it occurs, we can divide the resulting network-related characteristics that make traffic classification feasible--even in encrypted networks-into three groups: (i) packet-level, (ii) connection-level, and (iii) time-related features. Of course, none of these characteristics include information extracted from packet payload contents, since it is assumed that network packets are encrypted. In Table 13, we present some of these characteristics (in the literature are also referred to as features, side-channel data, or/and metadata), which are either found inside packet headers or calculated. These features are the most widely used in the state-of-the-art for network analytics, security, privacy, and fingerprinting. Systems in the second and third categories (i.e., interception of encrypted traffic and utilization of cryptographic functions), focus on proposing high-end techniques to offer efficient, adaptive, and scalable solutions for live and real-time network traffic inspection. The majority of such works are able to inspect the network and report findings, always aiming for good computational performance that can cope with current network speeds.

Limitations and Possible Future Directions
In this subsection, we review the state-of-the-art in network traffic analysis and inspection techniques by demonstrating their limitations.
In the first category "manipulation of traffic metadata and characteristics, " we encounter a very large number of works that examine the feasibility of identifying the network traffic class, while encrypted (e.g., a webpage, a mobile application, or some malicious activity). As already stressed, all these works focus on machine learning techniques that signify the network-related characteristics that indicate this specific class. Then, to validate and evaluate their approaches, they calculate metrics like accuracy, recall, and precision, comparing themselves to related works. However, all these works utilize a certain ground-truth dataset that enables them to train their models and eventually, teach them how to spot specific patterns inside the network traffic in order to locate the occurrence of a specific event (e.g., visiting a certain website). Even while scoring high performance (in metrics like accuracy, precision, and recall) in close-world scenarios (i.e., under a specific ground-truth dataset), these systems will certainly produce high rates of false positives when tested against real-world datasets of traffic traces; surely, encrypted network traffic will not enable this kind of evaluation from an ISP/CSP level. The limitation of high false-positive rates is major in any domain of traffic analysis. Unfortunately, it appears to be very difficult to produce a universal solution that will be able to cover a whole domain, due to the vast diversity and heterogeneity that has been introduced during recent years in every single aspect of the Internet. In addition, as Juarez et al. [78] argue regarding fingerprinting, accuracy scores reduce over time.
In the second category "interception of encrypted traffic, " we encounter solutions that take advantage of behavioral patterns or signatures to report a specific event (e.g., malicious activity in a network). These solutions focus on the implementation of network inspection engines that perform directly on encrypted traffic that is actually intercepted and decrypted before the processing. This raises concerns about the privacy preservation and data integrity. In addition, decrypting and then re-encrypting network packets adds a significant end-to-end performance overhead. In order to offer privacy guarantees, numerous works propose hardware-assisted solutions, like TEEs. Using TEEs in untrusted infrastructures and OSs enables secure code execution in encrypted memory regions, something that offers strong privacy preserving guarantees. Of course, TEEs are prone to side-channel attacks [38,161], while swapping into enclaves adds performance overhead, especially when applications are I/O intensive like network processing.
Finally, in the third category "cryptographic functions, " we explore works that use techniques like searchable encryption that enable the direct processing of encrypted data without having to decrypt it. These techniques are very effective in terms of privacy preservation, but they also add important performance overhead for real-time systems (such as network processing systems).
Apparently, there is a gap in the literature when it comes to real-time traffic processing when the network is encrypted. While there are numerous machine learning techniques that effectively classify the nature of the traffic, there is no option for online traffic processing. Furthermore, works that enable the online encrypted traffic inspection add considerable processing performance overheads that harm the concept of real-time processing in Gbps networks. Thus, the future research directions that will significantly enhance the state-of-the-art must meet the following requirements: (i) effectiveness in terms of classification accuracy in fully encrypted networks, (ii) reduction of false positives, (iii) good processing performance, and (iv) real-time processing in Gbps networks with reduced packet losses.

Conclusions
The increasing adoption of network encryption has introduced major challenges to typical deep packet inspection systems. Traditional deep packet inspection engines that have been widely used until recently for network analytics and security are not able to process network packets in the same way, since packet payloads are now getting encrypted. In this survey, we examine how network traffic processing systems are adapting to current encryption trends and how network traffic inspection systems process the traffic even though it is encrypted.
Indeed, after the thorough examination of the literature, it seems that the research community has already started finding solutions on how to bypass the constraints that are caused by network encryption to traditional deep packet inspection systems. Many works focus on the functionality of their solutions using machine or deep learning methods and algorithms, while others develop sophisticated techniques to process encrypted traffic directly in an online fashion. Yet, we discover that there is still considerable room for improvement. We present the limitations that we identify in the works that we survey, enabling researchers to recognize the possible future directions within the research area of encrypted traffic analysis and processing.