Design and Implement a Hybrid WebRTC Signalling Mechanism for Unidirectional & Bi-directional Video Conferencing

ABSTRACT


INTRODUCTION
WebRTC (Web Real-Time Communication) was developed as a standard by the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF) [1]. It is open source and consists of a collection of protocols and standards [2]. WebRTC allows the transport of audio, video and data, and it does not need plug-ins, licensing or downloads [3]. It consists of three principal components [4]: getUserMedia, which allows a web browser to access the camera and microphone and capture media; RTCPeerConnection, which manages the peer-to-peer connection; and RTCDataChannel, which allows browsers to share arbitrary data. On the other hand, WebRTC does not specify any particular signalling mechanism or protocol between the client and the server [5]. Moreover, it does not support the multi-browser communication that conferencing among participating browsers requires [6], and a client-server architecture does not appear to be a feasible solution either [7]. Therefore, choosing a suitable network topology in the architectural design of a WebRTC application is considered one of the most significant problems, and an architecture must be selected for the application when dealing with multiparty audio/video calls in WebRTC [8].
A signalling mechanism is the core of peer discovery: it coordinates the communication between users, starts the media exchange and supports establishing communication among users [1]. Signalling connects the browser to a server and allows the participants to access this server. Many experiments have been carried out to offer video calls in WebRTC, and some of them used XMLHttpRequest (XHR/polling). However, XHR wastes bandwidth and adds delay, since the browser keeps polling for data regularly and the server continues responding even when there are no messages to send or receive [9]. XHR is suited to communication that does not need a full-duplex approach [10]. In addition, several developers used SIP (Session Initiation Protocol) with WebRTC to obtain video calls; nevertheless, SIP still requires additional software such as servers and installation [11]. Besides, using the existing real-time communication APIs in an application is more cost-efficient and faster than developing a SIP client [12]. Furthermore, SIP has higher bandwidth consumption and delay than other protocols such as Inter-Asterisk eXchange 2 (IAX2) [13].
In this paper, WebNSM was created for video conferencing based on the RTCPeerConnection API, using the socket.io mechanism to connect the browsers. The socket.io API offers real-time bi-directional communication between a client and a server [14]. RTCPeerConnection is configured with an array of ICE server URL objects; it gathers ICE (Interactive Connectivity Establishment) candidates and sends them to the other peer, handles the video stream, and starts the offer/answer negotiation process [15]. WebNSM provides the following hybrid characteristics: (a) one-to-one (simple) bi-directional video conferencing, (b) one-to-one (simple) unidirectional video conferencing, (c) one-to-many (star) unidirectional video conferencing, (d) many-to-many (mesh) bi-directional video conferencing, (e) two kinds of communication, so each peer is free to act as a broadcaster or a viewer, (f) determining the room initiator, (g) keeping a session active even when another participant leaves, (h) allowing participants to share with all users, (i) joining an existing session, (j) stopping self-streams and (k) sharing a new user with the current participants. Furthermore, WebNSM can be used for various kinds of communication, for example m-Health (many doctors can communicate with many technicians and patients), e-learning (many teachers can communicate with many students, and students can communicate with one another) and general communication applications. In addition, it gives users full flexibility to choose the appropriate topology according to their resources.
The essential objectives of this paper are to create a hybrid signalling mechanism that serves different topologies at the same time, and to design and implement WebRTC video conferencing for many users, including an evaluation of signalling performance, bandwidth consumption, CPU performance, memory usage and Quality of Experience (QoE) using a mesh topology (full duplex) and a star topology (simplex/unidirectional), together with a calculation of the maximum number of links and RTP (Real-time Transport Protocol) streams. The rest of this paper is organised as follows. Section 2 surveys related work on WebRTC. Section 3 explains the methodology along with the implementation and analysis. Section 4 discusses the evaluation. Finally, Section 5 presents the conclusion and future work.

RELATED WORK
Different developers have attempted to create or develop a signalling mechanism or protocol for WebRTC; however, most of them faced certain issues, some of which are described below. As mentioned in [16], signalling management has deliberately not been specified by WebRTC, which allows developers to modify or reuse existing protocols and gives them the freedom to design their own signalling to avoid redundancy and increase compatibility with established technologies [11]. Moreover, an overview of a WebRTC video conferencing architecture using an MCU (Multipoint Control Unit) was given in [17], including a demonstration of some challenges. However, that work does not discuss any signalling mechanism or protocol, and the proposed test relied on an MCU that can be applied using a single connection. Also, [17] ran a WebRTC video conferencing application using Licode-Erizo (MCU) with a Samsung Galaxy device for each participant. Licode offers a client API with Erizo, which handles connections for virtual rooms, and a server API for communication. Nevertheless, the application cannot run without this third-party component (Licode-Erizo). The test was carried out in three rooms, each with at most three participants, and the authors did not present anything about the signalling mechanism. On the other hand, as illustrated in [18], using an MCU is very expensive, and [19] mentioned that an MCU is costly and can be rented from service providers for a conference, although some video conferencing CODECs are able to support up to 4 users. In addition, [18] emphasised that an MCU consumes a significant amount of bandwidth. The authors of [20] implemented REST (Representational State Transfer) APIs interoperating with SIP (Session Initiation Protocol) over the WebSocket protocol to control the signalling message exchange for audio/video calls via Chrome. However, the signalling had to be supported by a central component (named the REST service) to exchange messages and establish the media channel; moreover, the communication had a 5-second delay and was carried out between only two browsers. Additionally, [21] evaluated the performance of WebRTC video calls using a Node.js server, the WebSocket protocol for signalling, and TURN servers. This evaluation was carried out over different topologies, such as mesh (using separate switches) and star (using an MCU). However, the calls were established between three participants in each topology using a fake device and video frames instead of a live camera, and all calls were forced to stream through the TURN servers. Moreover, [11] designed and implemented a novel WebRTC signalling mechanism for chat messages using WebSocket via cross-platform Node.js on localhost; the signalling of this application only supports chat between two peers.

METHODOLOGY, IMPLEMENTATION AND ANALYSIS
Methodology
Thirty computers were used: seventeen PCs (Xeon CPU, 16 GB RAM), three laptops (Core i5, 4-8 GB RAM) and ten PCs (Core i5 and i7 CPUs, 4-12 GB RAM). They were connected through a wired Local Area Network and a Wide Area Network and were equipped with Logitech cameras and microphones.

Implementation
A test-bed lab was created to implement a hybrid signalling mechanism for real-time video conferencing. To this end, several methods and APIs were embedded so that they work together coherently. The implementation can be divided into the following parts:

Setting Up a Browser Web Page
The main HTML page of this experiment was programmed in JavaScript and run in Firefox to provide several features, such as opening a room, muting audio/video, full-screen mode, a volume slider and screenshots. To open a room there must always be a room initiator, while the participants are free to select "As Viewer" to watch and listen to the broadcaster or "As Broadcaster" to set up bidirectional video conferencing; a communication can also include both broadcasters and viewers streaming and viewing video. Peers do not need to specify a "user-id", since they all use the same URL as the "user-id" to access the main page; otherwise, they cannot join the room. In this application, a communication has one initiator and different peers acting as viewers and broadcasters. When the room is opened, audio and video are requested to present a MediaStream, which can be obtained using the navigator.getUserMedia() method to create synchronised video and audio. When getUserMedia is called, the web browser requests permission to access the camera and microphone to capture the peer's audio and video. The camera starts streaming once permission is given, and the application is then ready for other peers to join the room. On the other hand, peers who join as viewers do not need to invoke their camera and microphone, since they only receive videos. These steps of opening/joining the room apply to every peer, as does stopping the streaming of their own camera/microphone without affecting the rest. Figure 1 shows the main page and the options.
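The role selection described above decides whether a peer captures media at all. The following TypeScript sketch illustrates that rule under stated assumptions: the names `captureConstraintsFor` and `canJoin` are hypothetical, not taken from the WebNSM code, and the returned object is only shaped like the constraints a browser's `getUserMedia()` call accepts; the sketch does not call any browser API.

```typescript
// Hypothetical sketch: map a participant's chosen role to the media it captures.
type Role = "initiator" | "broadcaster" | "viewer";

interface CaptureConstraints {
  audio: boolean;
  video: boolean;
}

// Viewers only receive media, so no camera/microphone permission is requested
// (null). Initiators and broadcasters capture both audio and video; in a
// browser this object would be passed to getUserMedia().
function captureConstraintsFor(role: Role): CaptureConstraints | null {
  if (role === "viewer") {
    return null;
  }
  return { audio: true, video: true };
}

// Assumption for illustration: the initiator is the peer that opens the room,
// and every other role can only join a room that is already open.
function canJoin(roomOpened: boolean, role: Role): boolean {
  if (role === "initiator") {
    return !roomOpened;
  }
  return roomOpened;
}
```

Keeping the decision in one function makes the "viewers never touch the camera" rule easy to audit.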

WebNSM (A Hybrid Signalling Mechanism)
This signalling must take place before a Peer-to-Peer (P2P) connection can be established [22]. WebNSM was created using the RTCPeerConnection API and the socket.io API for an instant handshake.

Therefore, WebNSM must be carried out before streaming can begin between peers. It relies on the offer/answer negotiation process to describe the session using SDP (Session Description Protocol). The offerer is the peer that initiates the session to connect to other peers; in contrast, the answerer is the peer that the offerer asks for a connection. The offerer is assumed to know the answerer's URL and then requests a connection through WebNSM. When the initiator opens the main room, WebNSM is ready to support any offerer and to detect room presence. Several functions and steps were employed to create it. First, it transmits data as a String and sets up a default channel passed through the constructor using "connection.channel = channel || RMCDefaultChannel". Additionally, it connects to a signalling channel only when the first participant is found, by invoking "getUserMedia" and then the initRTCMultiSession function.
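As a rough illustration of the relay role that socket.io plays here, the sketch below models a signalling channel in memory: messages are serialised to strings and forwarded either to one target peer or to every other member of the channel. Only the `RMCDefaultChannel` name comes from the text; `SignallingHub`, `resolveChannel` and the message shape are hypothetical, and a real deployment would use a socket.io server rather than this in-process object.

```typescript
// Hypothetical in-memory stand-in for the socket.io relay that carries
// signalling messages; it models routing only, not networking.
type SignalListener = (raw: string) => void;

const DEFAULT_CHANNEL = "RMCDefaultChannel"; // name taken from the text

// Mirrors the fallback "connection.channel = channel || RMCDefaultChannel".
function resolveChannel(channel?: string): string {
  return channel || DEFAULT_CHANNEL;
}

class SignallingHub {
  // channel name -> (peer id -> listener)
  private channels: { [channel: string]: { [peerId: string]: SignalListener } } = {};

  join(channel: string, peerId: string, listener: SignalListener): void {
    let members = this.channels[channel];
    if (!members) {
      members = {};
      this.channels[channel] = members;
    }
    members[peerId] = listener;
  }

  leave(channel: string, peerId: string): void {
    const members = this.channels[channel];
    if (members) {
      delete members[peerId];
    }
  }

  // The message is serialised to a string, then delivered to the peer named
  // in `to`, or to every member except the sender when `to` is omitted.
  // Returns the number of peers that received it.
  send(channel: string, from: string, payload: object, to?: string): number {
    const members = this.channels[channel];
    if (!members) {
      return 0;
    }
    const raw = JSON.stringify({ from: from, to: to, payload: payload });
    let delivered = 0;
    Object.keys(members).forEach(function (peerId) {
      if (peerId === from) return;
      if (to !== undefined && peerId !== to) return;
      const listener = members[peerId];
      if (listener) {
        listener(raw);
        delivered++;
      }
    });
    return delivered;
  }
}
```

Broadcasting suits presence announcements, while targeted delivery suits SDP offers/answers and ICE candidates meant for a single peer.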
WebNSM was built to provide many characteristics, such as determining the room initiator ("connection.initiator = true"); allowing a single user to join a room ("connection.join = joinSession"); keeping a session active even if the initiator leaves (the data are cloned from the initial moderator to a second initiator, and if the second initiator also leaves, control is shifted to a third person; when the initiator wants to close the entire session, the initiation control is shifted to another user); sharing a new user with the existing participants ("onNewParticipant(response)"); sharing participants with a single user or with all users; disconnecting the participants if the initiator disconnects its sockets; closing the entire session; rejecting a user-id; disconnecting for all; opening a private socket that is used to receive the offer-SDP ("newPrivateSocket"); asking other users to create an offer-SDP; and the PeerConnection function. These functions also use RTC (Real-Time Communication) to send data ("connection.send = function(data, _channel)") and initialise "RTCMultiSession", which is the backbone object. Custom devices are selected together with screen_constraints, such as screen.width and screen.height, and participants check whether the screen-capturing extension is installed. When a stream is stopped, it must be removed from the "attachStreams" array to allow the screen to be re-captured; if a muted stream is negotiated, audio/video are fired earlier than the screen. Further steps include stopping the local stream ("if (response.stopped)"), stopping the remote stream ("if (response.promptStreamStop)"), creating an SDP offer using the "createOffer()" function, creating an SDP answer using the "createAnswer()" function, the createDescription() and getBrowserInfo() functions, constructing a new RTCPeerConnection, triggering the STUN server request, matching just the IP address, removing duplicates and listening for candidate events.
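One of the listed characteristics, keeping a session alive when the initiator leaves, can be sketched as a hand-over rule. This is an assumption-based model rather than the WebNSM code: `Session` is a hypothetical class that keeps participants in join order and passes initiator control to the longest-standing remaining participant, so the session only ends when the last participant leaves.

```typescript
// Hypothetical model of initiator hand-over; not the WebNSM implementation.
class Session {
  readonly roomId: string;
  initiator: string | null = null;
  private participants: string[] = []; // kept in join order

  constructor(roomId: string) {
    this.roomId = roomId;
  }

  join(peerId: string): void {
    if (this.participants.indexOf(peerId) !== -1) return; // already present
    this.participants.push(peerId);
    if (this.initiator === null) this.initiator = peerId; // first peer opens the room
  }

  // Removes a peer; if it was the initiator, control moves to the earliest
  // remaining participant. Returns the (possibly new) initiator, or null
  // once the session has no participants left.
  leave(peerId: string): string | null {
    const index = this.participants.indexOf(peerId);
    if (index === -1) return this.initiator;
    this.participants.splice(index, 1);
    if (peerId === this.initiator) {
      this.initiator = this.participants.length > 0 ? this.participants[0] : null;
    }
    return this.initiator;
  }

  isActive(): boolean {
    return this.participants.length > 0;
  }
}
```

Choosing the earliest remaining participant is only one possible policy; it is deterministic, so every peer can compute the same successor without an extra election round.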
To establish a peer-to-peer connection, both clients need to create an RTCPeerConnection object. Then, each peer needs to obtain its session description, an object that describes, through the built-in methods of the RTCPeerConnection object, what kind of data it wants to send to the other client over the connection and what it is capable of. The offerer sends the answerer a request for availability, including an SDP offer to receive audio and video. The answerer (initiator/broadcaster) receives the request and sends a confirmation of availability as "room is active", together with the SDP constraints to receive audio and video. The offerer gets the remote stream and creates an offer using "getLocalDescription" with RTCPeerConnection. Additionally, the offerer calls the createDataChannel method of the RTCPeerConnection to create an "RTCDataChannel" object. Once the "RTCDataChannel" on the offerer's side has been created, the offerer invokes "createOffer" on the RTCPeerConnection, which returns the offerer's SDP message. The offerer populates the SDP-offer message with various information, for instance bandwidth information and the audio and video codecs in use, and sends it through WebNSM. The signalling state of both the offerer and the answerer is "stable" whenever no offer/answer exchange is in progress. Once the "SDP-offer" message reaches the answerer through WebNSM, the answerer also initialises its RTCPeerConnection instance to accept the request. The answerer applies the "SDP-offer" to its RTCPeerConnection to create an "SDP-answer" and then forwards it to the offerer. The two clients also need to exchange information about the network paths they can use to reach each other; these are known as ICE candidates, and they are also exchanged through WebNSM. The answerer and offerer are then able to respond, and both configure the transport of the real-time communication (RTC) packets. After the two peers have exchanged the SDP offer/answer and ICE candidates, they can create their session. The answerer and offerer each "add SDP" and the UDP host candidates derived from their host IP addresses. Other participants can join the session by following the same steps.
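The offer/answer steps above follow the signalling states that `RTCPeerConnection` exposes through its `signalingState` property. The sketch below is a simplified, hypothetical model of those transitions for the basic case (no rollback or provisional answers) and does not use the browser API. It also buffers ICE candidates that arrive before the remote description, because a real `RTCPeerConnection` cannot apply a remote candidate until a remote description has been set; the class name `NegotiationModel` is illustrative only.

```typescript
// Simplified model of offer/answer signalling states (no rollback/pranswer).
type SignalingState = "stable" | "have-local-offer" | "have-remote-offer";

class NegotiationModel {
  state: SignalingState = "stable";
  hasRemoteDescription = false;
  applied: string[] = []; // candidates that can be used now
  pending: string[] = []; // candidates received before the remote SDP

  private expect(required: SignalingState, action: string): void {
    if (this.state !== required) {
      throw new Error(action + " is invalid in state " + this.state);
    }
  }

  // Offerer side: stable -> have-local-offer -> stable.
  setLocalOffer(): void {
    this.expect("stable", "setLocalOffer");
    this.state = "have-local-offer";
  }

  setRemoteAnswer(): void {
    this.expect("have-local-offer", "setRemoteAnswer");
    this.state = "stable";
    this.onRemoteDescription();
  }

  // Answerer side: stable -> have-remote-offer -> stable.
  setRemoteOffer(): void {
    this.expect("stable", "setRemoteOffer");
    this.state = "have-remote-offer";
    this.onRemoteDescription();
  }

  setLocalAnswer(): void {
    this.expect("have-remote-offer", "setLocalAnswer");
    this.state = "stable";
  }

  // ICE candidates are buffered until a remote description exists.
  addRemoteCandidate(candidate: string): void {
    if (this.hasRemoteDescription) {
      this.applied.push(candidate);
    } else {
      this.pending.push(candidate);
    }
  }

  private onRemoteDescription(): void {
    this.hasRemoteDescription = true;
    this.applied = this.applied.concat(this.pending); // flush the buffer
    this.pending = [];
  }
}
```

Both peers end in "stable" once the answer has been applied, which is the state the text associates with "no exchange in progress".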
Regarding communication as viewers: when an initiator is actively streaming, a peer can enter the room as a viewer after detecting room presence using WebNSM. WebNSM sends a notification to the initiator that "a participant has asked for availability and the target has no stream". In other words, this is unidirectional video conferencing from the initiator to the viewer. The initiator receives the request, sends a confirmation of availability as "room is active" with the SDP constraints, and then starts broadcasting audio and video to the viewer. If there are other broadcasters, the viewer connects to all of them and receives their audio and video at the same time. In addition, a session remains active even if any broadcaster leaves, and all viewers communicate with all broadcasters simultaneously.
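The viewer handshake can be summarised as a request/reply pair followed by one receive-only link per broadcaster. The message shapes and function names below (`AvailabilityRequest`, `replyToAvailability`, `viewerLinks`) are hypothetical; only the "room is active" reply and the receive-only role of viewers come from the description above, and "recvonly" is borrowed from the WebRTC transceiver direction values.

```typescript
// Hypothetical message shapes for a viewer joining an active room.
interface AvailabilityRequest {
  type: "availability-request";
  from: string;
  hasStream: boolean; // viewers join without a local stream
}

interface AvailabilityReply {
  type: "room-active" | "room-inactive";
  to: string;
  sendAudio: boolean; // media the initiator will stream to the viewer
  sendVideo: boolean;
}

interface ViewerLink {
  peer: string;
  direction: "recvonly"; // the viewer only receives
}

// The initiator confirms availability only while it is streaming.
function replyToAvailability(initiatorStreaming: boolean, req: AvailabilityRequest): AvailabilityReply {
  if (!initiatorStreaming) {
    return { type: "room-inactive", to: req.from, sendAudio: false, sendVideo: false };
  }
  return { type: "room-active", to: req.from, sendAudio: true, sendVideo: true };
}

// A viewer opens one receive-only link to every active broadcaster.
function viewerLinks(broadcasters: string[]): ViewerLink[] {
  return broadcasters.map(function (b): ViewerLink {
    return { peer: b, direction: "recvonly" };
  });
}
```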

Analysis
This test was carried out among thirty peers over three to four minutes on a Local Area Network (LAN) and a Wide Area Network (WAN). Quality of Experience (QoE) was used because it offers developers significant insight into how peers experience the quality of their video and audio applications [3]. CPU and memory usage were also measured with the Windows 10 Task Manager while the connection was established, and WebNSM performance was measured through Firefox's Inspect Element tools during real-time communication. The analysis is explained as follows:

WebNSM (A Hybrid Signalling Mechanism)
The performance of WebNSM was analysed individually with two to thirty users according to two measures: the first is the delay to get ready, and the second is the time to send a request and receive a response. Over the LAN, WebNSM takes a minimum of 79 milliseconds (ms) and a maximum of 113 ms to get ready, and a minimum of 106 ms and a maximum of 120 ms to send a request and receive a response; on average, it takes 89 ms to be ready and 111 ms to send a request and receive a response. Over the WAN, WebNSM takes a minimum of 78 ms and a maximum of 89 ms to get ready, and a minimum of 106 ms and a maximum of 124 ms to send a request and receive a response; on average, it takes 83 ms to be ready and 111 ms to send a request and receive a response. These times show that the LAN and WAN networks exhibit similar latencies, and that WebNSM performs efficiently when setting up, establishing and ending a session.
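The latency figures above are minimum, maximum and mean values over repeated exchanges. The sketch below shows how such a summary could be computed from one's own measurements; `summarise` and `timeExchange` are hypothetical helpers, the latter using millisecond `Date.now()` timestamps, whereas the figures in this paper were read from Firefox's developer tools. The code produces no data of its own.

```typescript
// Summarises latency samples (milliseconds) into min, max and mean.
interface LatencySummary {
  min: number;
  max: number;
  mean: number;
}

function summarise(samplesMs: number[]): LatencySummary {
  if (samplesMs.length === 0) {
    throw new Error("no samples");
  }
  let min = samplesMs[0];
  let max = samplesMs[0];
  let sum = 0;
  for (const s of samplesMs) {
    if (s < min) min = s;
    if (s > max) max = s;
    sum += s;
  }
  return { min, max, mean: sum / samplesMs.length };
}

// Times one request/response exchange. `exchange` sends the request and must
// call `done` when the response arrives; `onResult` receives the elapsed ms.
function timeExchange(exchange: (done: () => void) => void, onResult: (ms: number) => void): void {
  const start = Date.now();
  exchange(function () {
    onResult(Date.now() - start);
  });
}
```

Reporting the mean alongside the extremes, as the text does, shows both typical behaviour and the worst case a user might notice.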

Quality of Video Conferencing
Real users participated in this scenario and gave their individual opinions on the perceived user experience through questionnaires. The quality of audio and video was analysed for three topologies:
a. Bidirectional (mesh): the quality of audio and video with up to ten peers using the bidirectional system was excellent. However, because of CPU limitations, increasing the number of peers degraded the quality of audio and video, so the number of users could not be raised further once the CPUs could no longer sustain additional connections.
b. Unidirectional (simplex & star): this scenario was intended for viewers. All viewers connected to all broadcasters from different devices concurrently, but viewers did not connect to one another. The quality of audio and video with up to thirty peers (one broadcaster and 29 viewers) using the unidirectional system was excellent. Nevertheless, the number of viewers could not be increased further, as the CPU could not sustain additional connections.
c. Hybrid (bidirectional & unidirectional) system: the quality of audio and video using both topologies was excellent. Nevertheless, because of CPU limitations, the number of users was limited, especially when the number of broadcasters was raised. Moreover, the fewer broadcasters there are, the more viewers can be supported, because the broadcasters communicate with each other over a mesh topology, which requires high CPU usage.

Mesh Topology
In a mesh, any conference member can invite another user to join, or can leave at any time, without affecting the remaining participants. In addition, all peers connect to one another to transmit data from different devices simultaneously. Thus, many links are created among peers: there are p(p-1) directional links, where p is the number of peers, corresponding to p(p-1)/2 peer-to-peer connections that each carry media in both directions. Moreover, each peer needs at least four RTP (Real-time Transport Protocol) streams for every connected participant: one for outgoing video, one for outgoing audio, one for incoming video and one for incoming audio. Therefore, communication in a mesh requires high CPU capacity and high bandwidth, since each peer sends and receives separate RTP streams to and from all connected participants at the same time.
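The link and RTP counts above can be computed directly. This small sketch assumes, as the text does, four RTP streams per remote peer, and reports both the p(p-1) directional links and the p(p-1)/2 peer connections; the function name `meshCounts` is hypothetical.

```typescript
// Link and RTP-stream counts for a full mesh of p peers.
function meshCounts(p: number) {
  if (p < 2) {
    throw new Error("a mesh needs at least two peers");
  }
  const directedLinks = p * (p - 1); // one-way media paths
  const peerConnections = directedLinks / 2; // one connection per pair of peers
  const rtpPerPeer = 4 * (p - 1); // in/out video + in/out audio per remote peer
  return { directedLinks, peerConnections, rtpPerPeer };
}
```

For example, at the eight-peer limit reported in the evaluation, `meshCounts(8)` gives 56 directional links, 28 peer connections and 28 RTP streams per peer, which shows how quickly per-peer load grows with p.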

CPU Performance
CPU performance plays a significant role in WebRTC video conferencing, especially with a mesh topology. In this experiment, Xeon CPUs were used; this newer generation offers very high performance and bandwidth connectivity to meet demanding camera viewing, management and processing needs [23]. Core i5 and i7 CPUs were also used. A mesh handles a high load because different sources send and receive video at the same time; this load affects CPU performance, which in turn affects the quality of audio and video. On the other hand, CPU usage in the unidirectional mode of the hybrid system was rather lower than in the bidirectional mode, since the unidirectional system requires less CPU capability. Each viewer requires at most two RTP streams from each broadcaster to receive data: one for incoming video and one for incoming audio. Using simplex therefore conserves resources, as it requires less CPU and bandwidth than a mesh topology. Figure 3 displays the CPU performance on the broadcaster side.
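Extending the same counting to the hybrid system gives a rough per-role stream budget. The sketch assumes that broadcasters form a mesh among themselves (four streams per other broadcaster), that each broadcaster sends one audio and one video stream to every viewer, and that viewers do not connect to each other; `hybridRtpCounts` is a hypothetical name.

```typescript
// RTP streams handled by each role in the hybrid topology (assumptions above).
function hybridRtpCounts(broadcasters: number, viewers: number) {
  if (broadcasters < 1 || viewers < 0) {
    throw new Error("need at least one broadcaster and a non-negative viewer count");
  }
  // Mesh among broadcasters plus one audio + one video stream per viewer.
  const perBroadcaster = 4 * (broadcasters - 1) + 2 * viewers;
  // Each viewer receives one audio + one video stream from every broadcaster.
  const perViewer = 2 * broadcasters;
  return { perBroadcaster, perViewer };
}
```

Under these assumptions, one broadcaster with 29 viewers sends 58 RTP streams while each viewer handles only 2, which illustrates why the load concentrates on the broadcaster side.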

Memory Usage
In practice, memory consumption was modest, since peers only need to hold a small amount of session state data, such as which peers are connected. In addition, because the conferencing was in real time, there was no need for the large amounts of memory required for storing or uploading data. Memory usage did not affect the quality of the audio and video or the communication; overall usage over the LAN and WAN networks was between 18% and 38%.

Bandwidth Consumption
Different users have different bandwidth speeds and may use different browsers, and the bandwidth required to handle the overall session grows with every new participant [14]. Each browser is built with, or can be forced to use, particular video and audio codecs, so browsers consume different amounts of bandwidth depending on their codecs. This system used Firefox, which relies on the Opus audio codec, whose bitrate can change dynamically from 6 kb/s to 510 kb/s [24], and on VP8 as the video codec. The analysis found that each peer needs at least 1 Mb/s of bandwidth for each video RTP stream and 52-55 kb/s for each audio RTP stream, over both the LAN and WAN networks. As a consequence, bandwidth consumption can create a bottleneck at the client end, which affects the Quality of Experience (QoE) of video and audio, and performance may drop significantly [25]. Figures 4, 5, 6 and 7 present the differences in bandwidth consumption for broadcasters and viewers on the LAN and WAN networks.
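Combining the per-stream figures above with the stream counts gives a back-of-the-envelope bandwidth estimate per peer. The default rates below (1000 kb/s per video stream, 55 kb/s per audio stream) are taken from the measurements reported in this section, the video rate being a minimum; the helper names are hypothetical, and real consumption varies with the codecs' adaptive bitrates.

```typescript
// Rough per-peer bandwidth estimate (kb/s) from per-stream rates.
interface StreamRates {
  videoKbps: number;
  audioKbps: number;
}

// Defaults from the measurements above: >= 1 Mb/s video, up to 55 kb/s audio.
const MEASURED_RATES: StreamRates = { videoKbps: 1000, audioKbps: 55 };

// A mesh peer sends and receives one video + one audio stream per remote peer.
function meshPeerKbps(p: number, r: StreamRates = MEASURED_RATES) {
  const perRemote = r.videoKbps + r.audioKbps;
  return { uploadKbps: (p - 1) * perRemote, downloadKbps: (p - 1) * perRemote };
}

// A viewer only downloads, one video + one audio stream per broadcaster.
function viewerDownloadKbps(broadcasters: number, r: StreamRates = MEASURED_RATES): number {
  return broadcasters * (r.videoKbps + r.audioKbps);
}

// A hybrid broadcaster uploads to the other broadcasters and to every viewer.
function broadcasterUploadKbps(broadcasters: number, viewers: number, r: StreamRates = MEASURED_RATES): number {
  return (broadcasters - 1 + viewers) * (r.videoKbps + r.audioKbps);
}
```

With these defaults, a peer in an eight-peer mesh needs about 7385 kb/s in each direction, and a single broadcaster serving 29 viewers needs about 30595 kb/s of upload, which is why upload capacity often limits a broadcaster before download capacity limits a viewer.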

Hybrid Topology
A host peer should initiate the session in its browser so that any user can join at any time without affecting the remaining participants; combining different systems allows all peers to connect with each other as viewers and broadcasters and to transmit data from different devices simultaneously. A hybrid uses different topologies and gives users flexibility, reliability and a choice of communication roles, such as initiator, broadcaster or viewer. Moreover, it allows various resources, such as devices, networks and users, to take part in video conferencing without any registration, download or installation, and it can be used in different applications. This scenario shows that a robust WebRTC application was built that works across multiple browsers, networks and topologies. Figure 8 shows the architecture of the hybrid system.
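The connection rules of the hybrid architecture can be expressed as a small planning function: senders (the initiator and broadcasters) connect to each other bidirectionally, each sender streams one way to every viewer, and viewers never connect to each other. `planLinks` and the `HybridLink` shape are hypothetical and only illustrate the topology, not the WebNSM implementation.

```typescript
// Hypothetical planner for the links of the hybrid topology.
type PeerRole = "initiator" | "broadcaster" | "viewer";

interface HybridPeer {
  id: string;
  role: PeerRole;
}

interface HybridLink {
  a: string;
  b: string;
  // "sendrecv": media flows both ways; "a-to-b": a streams to b only.
  direction: "sendrecv" | "a-to-b";
}

function planLinks(peers: HybridPeer[]): HybridLink[] {
  const sends = (p: HybridPeer): boolean => p.role !== "viewer";
  const links: HybridLink[] = [];
  for (let i = 0; i < peers.length; i++) {
    for (let j = i + 1; j < peers.length; j++) {
      const x = peers[i];
      const y = peers[j];
      if (sends(x) && sends(y)) {
        links.push({ a: x.id, b: y.id, direction: "sendrecv" }); // mesh between senders
      } else if (sends(x)) {
        links.push({ a: x.id, b: y.id, direction: "a-to-b" }); // sender -> viewer
      } else if (sends(y)) {
        links.push({ a: y.id, b: x.id, direction: "a-to-b" }); // sender -> viewer
      }
      // Two viewers: no link, since viewers do not connect to each other.
    }
  }
  return links;
}
```

Switching a participant between "broadcaster" and "viewer" and re-running the planner shows which links must be added or dropped, mirroring the role changes the hybrid system permits.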

EVALUATION
The results show that WebNSM is able to set up, establish and close a session over LAN or WAN networks. WebNSM is able to offer simplex (unidirectional), star (unidirectional) and mesh (bidirectional) topologies. However, it is constrained by the CPU, which limits the number of peers. CPU performance and bandwidth consumption are major issues in audio and video conferencing, since video conferencing requires the processor to decode, encode and render video and audio concurrently. This can be described as CPU stress, and it depends on various factors, e.g. the codecs used and the quality of the audio and video. In addition, the variety of bandwidth speeds among users can affect the quality of video and audio. Therefore, a mesh topology requires high CPU capacity and high bandwidth. For instance, a user with a Core i5 CPU cannot perform as well as another user with a Xeon CPU. In other words, the more capable the CPU, the more peers can join and the better the communication and the encoding and decoding. Thus, Xeon CPUs, which have very high performance and bandwidth connectivity, were used in order to find out the differences among the available CPUs. Given these limitations, the CPU plays a significant role in communication and in the number of peers, while bandwidth plays a leading role in the quality of audio and video. The available CPUs in the other computers used (e.g. Core i5 and Core i7) are not able to encode, decode, send and receive video conferencing streams at the same time for more than eight peers via a mesh topology in a real implementation. This is a productive system that offers two mechanisms for video conferencing. Users are free to choose the appropriate mechanism based on their available bandwidth and CPU capabilities, and the system is adaptable, since users can switch their role from broadcaster to viewer and vice versa. Additionally, participants can simply join the session as a broadcaster (using mesh) or as a viewer (using simplex), so the hybrid system reduces CPU load and bandwidth consumption efficiently without affecting other participants. The Quality of Experience (QoE) results verify that this test-bed environment works correctly and that it can be used to conduct more extensive experiments on user experience in the future with higher-core CPUs.

CONCLUSION AND FUTURE WORK
In this paper, a hybrid WebRTC signalling mechanism and video conferencing using unidirectional and bidirectional systems were designed and tested in a real implementation among thirty computers. WebNSM can be considered a novel signalling mechanism, as it enables flexible communication among users. Moreover, it can be applied in different applications, such as bringing a group of people together on one call at the same time, conferencing among users, entertainment, e-learning between teachers and students, and m-Health among patients and doctors or specialists and technicians. WebNSM takes an average of 89 milliseconds to be ready and 111 milliseconds to send a request and receive a response, even when the network is congested. A detailed explanation of CPU performance, memory usage, signalling performance, RTP calculation, QoE, mesh topology and simplex topology in a physical implementation was given. This scenario is efficient, as it provides visual demonstrations across various devices and networks for users who require detailed explanation and face-to-face communication. It also improves communication, reinforces relationships and increases productivity among users and teams. In future work, we intend to extend this work towards more scalable video conferencing using a MATLAB simulator to explore the effectiveness of resources in WebRTC.

Figure 1. The main web page using Firefox
Int J Elec & Comp Eng ISSN: 2088-8708 Design and Implement a Hybrid WebRTC Signalling Mechanism for... (Naktal Edan)

Figure 3. CPU performance at the initiator end over both LAN and WAN networks

Figures 4, 5, 6 and 7.
Figure 4. Bandwidth consumption of audio and video over the LAN network for broadcasters. The unit of bandwidth is kb/s.

Figure 8. Architecture of the hybrid system