Time-sensitive autonomous architectures

Autonomous and software-defined vehicles (ASDVs) feature highly complex systems, coupling safety-critical and non-critical components such as infotainment. These systems require the highest connectivity, both inside the vehicle and with the outside world. An effective solution for network communication lies in Time-Sensitive Networking (TSN) which enables high-bandwidth and low-latency communications in a mixed-criticality environment. In this work, we present Time-Sensitive Autonomous Architectures (TSAA) to enable TSN in ASDVs. The software architecture is based on a hypervisor providing strong isolation and virtual access to TSN for virtual machines (VMs). TSAA latest iteration includes an autonomous car controlled by two Xilinx accelerators and a multiport TSN switch. We discuss the engineering challenges and the performance evaluation of the project demonstrator. In addition, we propose a Proof-of-Concept design of virtualized TSN to enable multiple VMs executing on a single board taking advantage of the inherent guarantees offered by TSN.


Introduction
The increasingly high level of automation of current Autonomous Vehicles prototypes (AVs) requires unprecedented computing power, at reduced Size, Weight and Power (SWaP), but at the same time must meet the tight latency requirements of advanced control loops, providing real-time guarantees.To accomplish higher automation, we foresee that AVs will be equipped with a number of increasingly complex sensors, processors, and Electronic Control Units (ECUs), which when connected all together further exacerbate the (already critical) integration tasks of Extended author information available on the last page of the article 1 3 Real-Time Systems (2023) 59:568-608 the in-vehicle network architecture, and the amount of data transiting throughout the system.Moreover, these systems involve communications with external edges (Vehicle-to-everything, V2X), such as traffic lights, intelligent pedestrian assistants, nearby vehicles (V2V), and much more.
Nowadays, the in-vehicle network uses different types of buses to communicate, such as the Controller Area Network (CAN) (Corrigan 2016) and its derivatives, FlexRay (FlexRay Consortium 2010), Local Interconnect Network (LIN) (Hackett 2022), amongst others.However, these types of networks do not have a high bandwidth capacity; hence, Ethernet-based communications were introduced for both off-/in-vehicle networks (Wang et al. 2019).Since safety is one of the most important requirements in the automotive domain, it is necessary to evaluate and set all characteristics of the used communication in a worst case scenario, i.e., latencies, maximum bandwidth, jitters, packet loss rate, etc. Modern high-performance embedded ECUs, such as those powered by NVIDIA or Xilinx accelerators, can work jointly with off-vehicle cloud servers to serve this purpose.However, the key point of predictable data transmission amongst them is still an open research area.A more recent technology enabling networking capabilities for high bandwidth and low latency communication in a mixed criticality environment is named Time-Sensitive Networking (TSN) (Farkas et al. 2018).TSN is a set of standards that extend the Ethernet protocol by focusing on transmission time guarantees (Finn 2022).Ethernet TSN communications can be achieved with specific components providing switching and regulation to frames traffic, such as silicon components or Field Programmable Gate Array (FPGA) based switches.The first challenge of our interest is applicability of TSN-based networking to complex applications like autonomous cyber-physical systems.
Another fast-growing automotive trend aims to provide unprecedented flexibility and ease of development to hardware/software architecture.A Service-Oriented Architecture, on the one hand, can deliver a Software-Defined Vehicle (SDV) that can evolve and adapt along with time and the user's needs.The edge-cloud continuity first enables a convenient development and testing environment during the product conception, and then allow computing distribution along vehicle and infrastructure during the product operation (Kane et al. 2022;Andreozzi and Shirasat 2022;SOAFEE 2022).The second challenge of our interest is enabling softwaredefinedness to TSN-networked hardware/software architectures.
This paper introduces Time-Sensitive Autonomous Architecture (TSAA), the next-generation software multi-zonal architecture that leverages TSN protocols to manage mixed-criticality system data flows in software-defined vehicles.
Our first contribution is the description of the TSAA concepts and its applicability in real-time applications.Our case-study is based on a real self-driving car that is capable of accomplishing safety-critical L4 functionality, i.e., following a predetermined trajectory in a safe manner, even in the presence of high network traffic/ interference.This case-study was instantiated, tested, and evaluated directly on-field in our Maserati Quattroporte, a luxury sports car targeted by this project, shown in Fig. 1.
Our second contribution is a Proof-of-Concept (PoC) of Virtual TSN (VTSN), a software solution that brings TSN switching capabilities to a virtualized Ethernet endpoint.Flexibility and scalability of the hardware/software are provided to a hypervisor-based environment where virtual machines can enjoy TSN and its quality-of-service.With VTSN, a single Ethernet port is managed by a dedicated VM executing the TSN scheduling algorithm.This dedicated VM is used as a proxy by the other ones for incoming and outgoing traffic.An early prototype of VTSN prototype is evaluated to demonstrate feasibility and validity of the approach.
This paper is organized as follows: in Sect.2, we review the literature of existing technologies in the realm of virtualization and timing sensitive networks.In Sect.3, we detail the TSAA future prototype by describing and motivating our choices.In Sect.4, we introduce a PoC as well as a feasibility analysis of the VTSN solution.In Sect.5, we explore the engineered solutions taken for the case-study of the TSAA applicability to a self-driving vehicle capable of following a predefined trajectory even in the presence of interference.In Sect.6, we present the prototype in its current state.Finally, in Sect.7 we summarize the experience we gained by engineering the autonomous car, and open future research directions.

Related works
Systems designed with virtualization reported great advantages related to security, cost, reliability, availability, and adaptability, making it a valid choice with remarkable performance (Obasuyi and Sari 2015).In recent years, paravirtualization has exhibited higher performance compared to full virtualization, where all the hardware is emulated, like in Qemu (Motika and Weiss 2012;Fayyad et al. 2013).In addition to partitioning computing resources, paravirtualization enables multiple applications to gain access to the hardware resources on the host machine (Obasuyi and Sari 2015).
One hardware resource concerns connectivity, as the scarcity of communication connectors raises new challenges for hypervisors: they need to also act as a transparent proxy between the network and the OSes operating in a virtualized environment.Techniques have therefore been proposed in order to enable time-sharing mechanisms for shared hardware resources across all guest machines through different protocols.The most well-known protocol used in system automation is the Controller Area Network (CAN) (Gergeleit and Streich 1994), for which an extension for virtual environments vCAN has been proposed with great results (Herber et al. 2014;Breaban et al. 2016).The field-bus protocol EtherCAT (Jansen and Buttner 2004) is another example.This protocol is suitable for both hard and soft real-time computing requirements in automation technology.It has been used in a real-time virtualized context for industrial automation, and EtherCAT has proven to be a suitable choice for real-time control systems with guaranteed performance (Huang and Lu 2014).However, EtherCAT has strong requirements in terms of hardware supportspecific controllers are required for 'slave' hosts-and topology-only lines and rings are allowed.
The original Ethernet has been extended to allow sharing a Network Interface Card (NIC) between multiple guest machines, where, once again, it is shown that paravirtualization offers better performance compared to emulated contexts (Motika and Weiss 2012).Dong et al. (2012) propose the use of the single-root I/O virtualization (SR-IOV) standard for sharing the Network Interface Card.This technique uses PCI Express (PCIe) and needs to be supported by the hardware device.
The typical approach for Ethernet virtualization consists of having a virtual machine acting as a proxy for the others.For example, the hypervisor Xen (Barham et al. 2003) uses bridges for connecting the VMs to the device, and research is ongoing on how to improve this type of communication for real-time scenarios (Li et al. 2015(Li et al. , 2022).An alternative technique involves having a specific VM for managing the NIC.This method is at the base of our VTSN Proof-of-Concept, and it has also been explored by the Ethernet virtualization investigation by Borgioli et al. (2022).Although their work considers similar reference hardware and virtualization designs, it uses the proprietary CLARE hypervisor and focuses on the memory regulation capabilities offered by QoS-400.Our approach instead is based on the open source Jailhouse hypervisor, targets the real-world integration and virtualization of TSN technologies.A comparison with the Ethernet virtualization approach proposed by Borgioli et al. (2022) cannot be performed because it is based on a closed-source hypervisor.However, we compared our solution against Xen, and we show that we greatly outperform it in Sect.4.3.
In recent years, the virtualization of TSN started to be addressed.Biondi et al. (2021) advocate the use of this technology for autonomous driving architectures.Leonardi et al. (2020), Caruso et al. (2021) propose to enable TSN communications in heterogeneous platforms handled by a hypervisor, while Garbugli et al. (2022) propose a novel approach to support the TSN protocol in virtual machines through a precise clock synchronization method.In Garbugli et al. (2023), a solution is presented for meeting the time-sensitive requirement in containers based on Kubernetes (Cloud Native Computing Foundation 2014).
While the use of TSN technology has been explored in V2X communication scenarios (Ding et al. 2022;Boutin et al. 2021), Ethernet TSN is a natural match for in-vehicle communication in autonomous driving (AD) scenarios.As highlighted by Brunner et al. (2017), ECUs in commercial vehicles are growing in number and complexity.This increases the need for a communication technology like TSN to guarantee the functional safety of the vehicle.Lee and Park (2019) proposed a TSN integrated environment simulator and measured the overall reduction of end-to-end latency in autonomous driving use-cases, while Park and Park (2023) presented the use of Time-Sensitive Networking for zone-based in-vehicle network architecture.The benefits of TSN for in-vehicle communications are also explored by Farzaneh and Knoll (2017), who developed an experimental test bench to demonstrate the low latency and jitter of Ethernet TSN technologies.
The aforementioned work, regarding TSN-enabled autonomous driving architectures, used emulated environments.In this work, we have deployed and evaluated the actual possibility of using a TSN-enabled autonomous driving architecture in a real environment.

Architecture description
In this section, we describe the different components that are embedded in our car to form a complete TSAA solution.As shown in Fig. 1, our car was modified to enable research projects like this one.It has a custom low-level vehicle controller, implemented in a Centralized Controller Unit (CCU), which gives full control of both the steering wheel and the throttle/brake pedals via CAN bus signals.In general, the tasks performed by an autonomous driving system are to follow a reference trajectory, to detect and avoid possible obstacles.As depicted in Fig. 2, our self-driving software stack features the following key elements: -Localization: provide the coordinates of the vehicle in its surrounding area according to the information gathered by the different sensors; -Detection: allow the detection of other vehicles, pedestrians, etc., to avoid collisions and a safe journey; -Local planning: dynamically generate alternative trajectories according to the position and detected obstacles; -Actuation layer: determine the correct and safe commands to send to the CCU to follow the selected path according to the current position; and -Monitoring: display current information about the system state.
In the near future, more and more features will be added to autonomous driving systems, which will require an ever-increasing amount of computational power (more ECUs) with more communication, real-time guarantees, and predictability, even for communications.As stated in Fumio et al. (2022), Robert Bosch GmbH (2017), the current auto-market is pushing for a multi-zonal architecture in place of a multidomain one.A zone groups ECUs/sensors based on their proximity instead of their functionalities.The main benefit of having multiple zones lies in the reduction of the necessary wires to connect the involved components.
Since realizing a multi-zonal architecture with a different number of ECUs has a high cost, we are limiting our prototype to a dual-zone architecture, which is enough to test and evaluate a TSAA system.Therefore, in our use-case architecture, depicted in Fig. 2, we identified two zones: 1. Zone 1 to localize and monitor the vehicle, 2. Zone 2 to control the vehicle (accelerating, steering, braking, ...).This design is still extendable by either augmenting zones or by adding more, for example, for infotainment.
Due to the heavy computational demand, in addition to zones, a typical design solution for autonomous driving systems is to partition the computation between multiple computing platforms.
The following subsections describe our choices regarding the different hardware, middleware, and software components.
By pulsing infrared light, and measuring the return travel time upon colliding with target objects, the LiDAR maps the distance between itself and all objects surrounding it.The VLP-16 provides the ranging data of 1800 points for each of its 16 vertical channels on a 360 • field of view and delivers them via an Ethernet connection.The sensor packets are transmitted by the LiDAR at a frequency of 10 Hz.

3
The IMU provides an accelerometer, a magnetometer and a gyroscope, which are used to estimate the movements and rotations of the vehicle.The IMU exposes two interfaces: a parallel serial (RS232/RS422/UART) and a USB.The latter was chosen as a physical interface for ease of use and port availability on the used ECUs.The IMU provides data at a frequency of 400 Hz.
The camera generates an image stream at 60 frames per second with a resolution of 1440×1080 and is connected via USB.
All the sensors are used for the navigation of the vehicle in its surrounding environment.The combination of their gathered data allows them to provide a robust and accurate localization.In particular, the IMU provides position information at a very high frequency but with low measurements accuracy, while the LiDAR provides high accuracy measurements at a much lower sampling frequency.Moreover, the high frequency of the IMU allows it to correct the LiDAR distortion.For a similar reason, the fusion of camera and LiDAR data allows for high accuracy and responsiveness in object detection, such as pedestrians.
Note, the CCU, already present in the car, also provides speed and odometry information that can be used for safety and debugging purposes or to increase localization algorithm precision.We reserve their usage for future work.

ECUs
We target two well-known platforms from the Xilinx Zynq UltraScale+ MPSoC ZCUs family: ZCU102 and ZCU106.They both feature a multi-core CPU and an FPGA accelerator that serves as an energy-efficient co-processor (Qasaimeh et al. 2019).Such platforms are widely adopted in several application domains, e.g., automotive, autonomous drones, computer vision, etc (AMD Xilinx 2022b).Nonetheless, FPGA-based architectures are less widely adopted than General-Purpose computing on Graphics Processing Units (GPGPUs), due to their complexity, but have shown to provide a comparable or higher Performance/Watt trade-off and an increased predictability (Brilli et al. 2018;Liu et al. 2019).
As shown in Fig. 3, the two platforms feature two CPU clusters: one with 4 × ARM Cortex-A53 cores and the other with 2 × ARM Cortex-R5F cores.Both include an FPGA with slightly different characteristics: the ZCU102 is composed of 600 K system logic cells, 32 MB of memory, and 2520 Digital Signal Processor (DSP) slices, while the ZCU106 consists of 504 K system logic cells, 38 MB of memory, 1728 DSP slices, and has a Video Codec Unit.
The FPGA of the ZCU102 is occupied by a further described TSN connectivity module, while the ZCU106's FPGA embeds an acceleration unit for the detection.This choice is motivated to maximize the occupancy of the FPGAs.
The sensors are placed as close as possible to their main consumer.This reduces communication latencies, granting better responsiveness.The LiDAR and IMU are then directly connected to the ZCU102 as their data is only needed by the localization.The camera is connected to the ZCU106 in order to facilitate the access to its main consumer, which is the detection algorithm that needs fresher image data 1 3 Real-Time Systems (2023) 59:568-608 and requires more reactivity than the localization.Hence, the image stream from the camera will be sent to the ZCU102 for localization.A different design could connect the camera to the ZCU102, but it would increase the pressure on this board.
On the connectivity side, both platforms have only one built-in Ethernet port.Hence, to connect all the involved devices (including the LiDAR and monitor), the ZCU102 hosts the Opsero Ethernet FPGA Mezzanine Card (FMC) 4-port switch.

Multiport TSN switch (MTSN)
To perform the packet switching in the network in a timely manner, we rely on the System-on-Chip Engineering (SoC-e) Multiport TSN switch (MTSN), it is sold as: "SoC-e solution for any customer that requires an all-in-one solution to introduce TSN capabilities in their equipment" (SoC-e 2010).The MTSN switch comes in the form of an Intellectual Property (IP) core bitstream loadable on a FPGA.In sectors like the automotive industry, it offers a good level of flexibility, leading to great portability and future extensions, e.g., more zones, more platforms, more sensors.This IP core is designed for a variety of configurations from a simple 2-port endpoint to a complex multiport TSN switch.It can be configured with a wide range of parameters, including the number of ports (up to 32) and the size of the queues for each port.These configurations can be changed using the Xilinx Vivado Tool (AMD Xilinx 2022a).
Figure 4 shows our system ports configuration and how the Gigabit Ethernet Media Access Control (MAC) controllers (GEM) are connected to the MTSN switch to provide an interface for connecting to a 10 Mbps, 100 Mbps, or 1 Gbps network in full-duplex capability.All four ports of the FMC located on the ZCU102 are routed to MTSN ports (PORT_0,1,2,3), in order to perform external (to the board) communication.In our setup, the ZCU102 receives incoming camera frames through GEM1, and sends localization coordinates through GEM2.Note that PORT_3 is connected to an external computer that we use to generate interfering traffic during our experiments, see Sect.5.2.This computer would not be part of a real deployed system, and so the port would remain unused.
Of the three internal ports, PORT_4 is used as the management port of the switch and is connected to GEM0, while PORT_5 and PORT_6 are connected to GEM1 and GEM2, respectively.By describing this information in the device-tree, Linux is able to recognize how the GEMs are mapped into the port of the MTSN and it attaches an Ethernet interface for each of them, guaranteeing better isolation between the transited data flows.
From the various features that the MTSN includes, it is important to mention the full support for Virtual Local Area Network (VLAN) (IEEE 802.1Q), and the two TSN schedulers: Credit Based Shaper (CBS-IEEE 802.1Qav) and Time Aware Traffic Shaping (IEEE 802.1Qbv).The Qbv scheduler separates the communication on the Ethernet network into fixed duration, periodic time cycles, while Qav permits the reservation of the maximum bandwidth needed by traffic classes with different criticality levels or priorities.In Zhao et al. (2022), a comparison of various individual traffic shapers was conducted across different network topologies, evaluating their end-to-end latency bounds.The experiments revealed that Qbv outperforms Qav.Despite this finding, the paper also proposes the combination of both protocols, as also suggested in Meyer et al. (2013) and Alderisi et al. (2012).The combination of Qav and Qbv can provide a more robust and reliable communication infrastructure for safety-critical applications.Qav can dynamically manage bandwidth allocation for different flows, while Qbv guarantees deterministic timing for critical flows, minimizing potential delays and jitter.When using both protocols together, it is essential to carefully configure and coordinate their settings to avoid conflicts and ensure optimal performance.Proper configuration of time slots and credit assignments is crucial to achieving the desired real-time behavior and bandwidth prioritization.Additionally, the specific hardware and software components in the network must support both Qav and Qbv.Compatibility between devices and adherence to the TSN standard are essential to achieving seamless integration and interoperability.
In our case-study, we only rely on Qav, as Qbv requires a global notion of time shared by all platforms, and thus requires additional mechanisms for time synchronization which are not available at the moment from the upper OS layer.Moreover, the end-to-end latency upper-bound of Qav is suitable for our current application.As of now, the combination of both protocols is reserved for future work.
SoC-e's implementation of Qav gives full control to the user, i.e., it allows to set up to eight priority queues (one per priority level), where different priorities can share the same queue.For each couple (queue, port), the user can decide to enable CBS, which schedules the different queues based on the given credit and priority values.This requires to also set the idleSlope value to specify the slope of increasing credits as a percentage.If CBS is disabled for a particular (queue, port), then no TSN regulation is employed on this data flow.A configuration example for a queue is given by Listing 1.We experimented with different configurations for the queue in Sect.5.2 to show how these parameters can affect the transmission of the packets.
In our setup, we require six data flows with different priorities.From the ZCU102, both data flows coming out (localization coordinates), and coming in (camera feed) are of high criticality and require a high priority.The data reported by features for monitoring purposes are, however, of less criticality and need a lower priority.More details on the final configuration and data flows arising in the car are given in Sect.5.2 when describing the experiment.

TSN protocol IEEE 802.1Qav
This section acts as a reminder on CBS, defined in IEEE 802.1Qav.It schedules the traffic in sorted queues based on an Ethernet frame's priority level (7 highest to 0 lowest) given by the Priority Code Point (PCP), a 3-bit field defined in the IEEE 802.1Q standard (Fig. 5).
Each queue has a credit value used by the algorithm to determine if the queue is allowed to forward a frame through the network or not.From the definitions in IEEE (2018), the credit changes according to two parameters: 1. idleSlope: The charging rate of credit, in bits per second, when the value of the credit is increasing; 2. sendSlope: The discharging rate of change of credit, in bits per second, when the value of the credit is decreasing.
In Mohammadpour et al. ( 2019), a summary of the forwarding rules for the CBS is given.The main principle is pictured in Fig. 6, and briefly presented as follows: -If the transmission line is free, the scheduler transmits a frame from the highest priority class that satisfies all of these conditions: 1. its queue is not empty; 2. it has a non-negative credit.
(1) -The credit of a traffic class is reduced linearly with the rate sendSlope when the class transmits.-The credit of a traffic class increases linearly with rate idleSlope when the following conditions hold simultaneously for that class: 1. its queue is not empty; and 2. other classes are transmitting.
-Whenever a traffic class has a positive credit and its queue becomes empty, the credit is set to zero: this is called a credit reset.-If the credit is negative and the queue becomes empty, the credit increases with the rate idleSlope until the zero value is reached.
CBS permits the regulation of the maximum bandwidth that a specific traffic class can use, ensuring the stability of critical traffic over the best-effort.For this prototype, where multiple sources generate critical traffic, Qav de-bursts the lower critical traffic that can interfere with the critical flow, ensuring communication stability and predictability.The high level of customization provided by this protocol makes it an efficient tool for ensuring that communications data flows are secure while other traffic runs on the same network.Moreover, its simplicity allows us to select it as the first algorithm that our VTSN solution will provide.

Virtualization & OSes
Hypervisors allow to follow the aforementioned partitioning design principles.They enable multiple Operating Systems (OS) to execute on the same hardware platform without interfering with each other.Using hypervisors also allows to reduce costs and fully utilize all the capabilities of the hardware platforms while still guaranteeing temporal and spatial isolation.A hypervisor can indifferently run different OSes (e.g., Linux), Real-Time OSes (e.g., Erika), or even bare-metal applications.
Jailhouse ( 2015) is an open-source, Linux-based, static partitioning type-1 hypervisor whose design has been condensed to include only essential features in order to serve as an appropriate base for certifiable software.This is accomplished through a simple, yet highly effective method.The system is first booted using Linux, and then the hypervisor is loaded through a Linux Kernel Module.Moreover, static partitioning means that it does not emulate hardware resources, but rather distributes them into isolated virtual systems, known as cells, allowing for guaranteed resource access and predictable performance.This makes Jailhouse an excellent choice for safety-critical environments.
Jailhouse includes an Inter-VM Shared Memory (IVSHMEM) device that allows messages to be exchanged between virtual machines.This device is being seen by the attached virtual machines as a virtualized Peripheral Component Interconnect (PCI) device with different shared memory regions that can be used to communicate with other peers.Jailhouse defines the IVSHMEM device with some useful features, including the ability to connect up to 65,536 peers, different permissions on the shared memory regions, support for life-cycle management, event signaling via interrupts, and much more.The IVSHMEM device has a register region with five 32-bit registers, each with a specific role; e.g., the Doorbell register is a writeonly register used for triggering an interrupt into a targeted peer.Table 1 presents the usually used standard permissions of the IVSHMEM memory regions over N attached peers.
We performed a preliminary feasibility study of this mechanism in order to check if it could support the necessary 1 Gbps Ethernet connectivity.We considered Linux-based applications as well as bare-metal ones.It showed us that the IVSH-MEM device can sustain such speed in both environments.
In our TSAA design, we employ two different OSes: Linux and Erika Enterprise v3.While Linux needs no more introduction, Erika Enterprise v3 is an open-source AUTOSAR RTOS for automotive environments.It supports different architectures, such as Aurix, ARM Cortex-M, Cortex-R, and much more.Its kernel offers the essentials for setting up a multi-threading environment.It implements stack sharing techniques, semaphores, three alternative scheduling algorithms, and the OSEK Implementation Language (OIL), which is an OSEK standard for statically defining the setup of real-time applications.The main issue with Erika is its lack of Ethernet driver, we therefore need a solution through virtualization to connect it to the network.With the help of Jailhouse and its IVSHMEM device, it is possible to overcome this issue and allow the actuator running on Erika to receive the localization information.
Having such two OSes allows for performance as Linux is able to benefit from the different hardware accelerators present on the platform, i.e., the FPGA for the detection.However, Linux does not offer guarantees regarding its responsiveness.Hence, for critical cases such as actuating the driving commands, the certified Erika comes in handy and increases the safety of the whole system as well as reducing its certification time with enhanced predictability.
We therefore have Linux running alone on the ZCU102 to support the heavy computation of the sensor fusion and localization.On the ZCU106, Jailhouse is configured to embed a Linux OS for the detection, a Erika RTOS for the actuation.In consequence, multiple VMs on the ZCU106 need to access the TSNenabled network.In Leonardi et al. (2020), a key design challenge is to understand where the TSN scheduling algorithm should be executed to maintain high Real-Time Systems (2023) 59:568-608 performance without affecting the complexity of the hypervisors.Section 4 paves the way for a possible solution that locates the TSN scheduling algorithm into an additional VM which executes a bare-metal application to manage the single Ethernet port.Using a specific proxy VM also allows the Erika RTOS to receive/send data through the network as this RTOS is not shipped with an Ethernet driver.Moreover, this solution reduces costs and energy consumption since it does not require an additional Ethernet FMC card.

Middleware communication
The different features of the self-driving application are implemented on top of Robot Operating System 2 (ROS2) (Macenski et al. 2022), a distributed robotic development framework.ROS2 offers a runtime environment based on a publish/subscribe middleware referred to as Data Distribution Service (DDS).The DDS implementation chosen for this scenario is FastDDS, a free and opensource middleware developed by eProsima (2019).FastDDS is compliant with the Object Management Group (OMG) Real-Time Publish-Subscribe (RTPS) 2.2 and OMG DDS 1.4, thus providing publisher-subscriber communications over the UDP/IP protocol stack.Since a robotic system code base is inherently complex, ROS2 enhances software modularity by modeling each software element as a node.Each node implements one or more specific functionalities, e.g., reading data from a sensor, building a map of the surrounding environment, etc.Since nodes are expected to exchange data between them, ROS2 also defines the abstraction of communication channels with topics.A topic is identified by a string and a message type.Each node declares a set of publishers and subscribers.Each publisher allows the node to send messages on specific topics, while a subscriber receives the messages sent.
ROS2 provides Quality of Service (QoS) policies, allowing to tune communication between nodes.These policies can define different aspects of a communication, such as the reliability or durability of messages.The ROS2 software stack is highly scalable, especially with respect to the communication middleware; hence, it is possible to implement custom policies for network traffic management/shaping directly at the middleware/user-space level.By combining a set of QoS policies, it is possible to define a QoS profile that can be applied to specific topics.Reliability between nodes means that a publisher of a topic will not send the next message if all the registered subscribers do not acknowledge the reception of the previous one.
To fully benefit from the scalability of ROS2 and to account for the constraints of our system, each aforementioned feature of the autonomous driving application is a specific ROS2 node.Similarly, each sensor generates a specific type of data, and so, their transmission is mapped to a specific topic.Moreover, additional topics are created to allow the localization node, as well as the planning node, to communicate with the actuation node.

Self-driving software
As presented above, the self-driving software stack is composed of five main ROS2 nodes: Localization, Detection, Local planning, Actuation and Monitoring.
To localize the vehicle, the localization uses the three sensors, i.e., LiDAR, IMU and camera, for an optimum compromise between data efficiency and accuracy.However, to get the highly precise current localization of the AV, a sensor-fusion algorithm needs to merge the information from the different sensors, hence exploiting the best of both of them.To perform the sensor-fusion, we use the state-of-theart GPS-free precise localization algorithm Fast and Tightly-Coupled Sparse-Direct LiDAR-Inertial-Visual Odometry (FAST-LIVO) (Zheng et al. 2022).It is an evolution of FAST-LIO (Xu and Zhang 2021) which appends camera data to estimate the position of the vehicle.We allocate the whole ZCU102 to the localization due to the high computational power required by the algorithm to fuse sensor data and estimate the position of the vehicle.The estimated coordinates can then be sent to the actuation for maneuver computation.
The detection algorithm receives the information from the camera.Each image is processed by a neural network based on YOLO v3 which is accelerated by a Deep-Learning Processor Unit (DPU) core instantiated on the FPGA of the ZCU106.Recall that we cannot have the detection algorithm running on the ZCU102 as its FPGA is mostly utilized for the MTSN switch.
The detection and actuation are isolated and instantiated into two different VMs on the ZCU106.This isolation is performed through the virtualization provided by Jailhouse.Using the IVSHMEM feature of Jailhouse, both VMs can communicate.Due to its high-criticality properties (failing to control the vehicle can have dramatic consequences), the actuation task is encapsulated into a single-core VM and ported to Erika Enterprise v3 (Evidence 2017), enhancing safety and predictability.

Virtual TSN: a proof of concept
The VTSN solution aims to enhance TSN features for multiple VMs running on the same platform, on which the available Ethernet ports are fewer than the number of VMs.In other words, VTSN improves the predictability of data flow for multiple VMs accessing the network through a single virtually shared Ethernet port.The difficulty of such technology lies in the necessity to properly schedule the generated Ethernet frames transmitted to/from the various VMs.
To implement a VTSN solution in a system, the choice of the hypervisor plays a crucial role in achieving the desired level of predictability and security.Albeit full virtualization hypervisors, such as Xen, can be employed, our Proof-of-Concept is restricted to static partitioning hypervisors, such as Jailhouse.Static partitioning hypervisors have a smaller code base and fewer layers of abstraction, resulting in less variability, less overhead, and more deterministic behavior in terms of timing and performance.With this approach, the Ethernet port access is assigned exclusively to a single virtual machine, allowing it to handle and control network traffic without any interference from other VMs.
Our VTSN solution is based on the Inter-VM shared memory (IVSHMEM) feature, which needs to be provided by the hypervisor, as introduced in Sect. 3 for Jailhouse.Using IVSHMEM helps reducing the amount of copy operations necessary for exchanging messages between VMs.Moreover, this communication must also provide a wake-up mechanism, e.g., by sending an interrupt to the interested VM.
This solution is then composed by two main components: 1. a virtual Ethernet switch; and 2. at least, TSN protocol.
In Sect.5, we will demonstrate the effectiveness of TSN on a complex system performing in a real and simulated environment.Hence, this Proof-of-Concept focuses solely on the virtual Ethernet switching mechanism, which when combined with TSN will form a complete VTSN solution.At the time of writing, the last software brick that we are missing is a traffic scheduler that will enforce the Qav or Qbv protocols, which we argue to be important but less technically challenging as the code base exists.

Virtual switch
Figure 7 illustrates an overview of a possible system enabling VTSN.The central point of our solution is a specialized VM, hereafter named the Virtual Switch (VS), which allows other VMs to send and receive Ethernet frames to and from the TSN network connected to it.The VS is the only VM having access to the Ethernet port; therefore, any VM's traffic needs to pass through it, enabling a Fig. 7 The representation of a system with VTSN and multiple virtual machines running an applications and generating data single scheduling point.For brevity, we hereafter use VMs to refer to all VMs, excluding the VS if not specifically stated.
To enable communication between the different VMs and the VS, they are all connected to a dedicated shared memory using the IVSHMEM device.This shared memory region is divided into sections, on which different permissions are applied in order to ensure safety and security through address space isolation.As a result, a section cannot be assigned to multiple VMs, and the number of sections assigned to a VM is only limited by the memory size and other VMs requirements.This means that the Ethernet frames sent by, e.g., VM0 cannot be read or overwritten by, e.g., VM1. Figure 8 pictures the mapping layout of these shared memory regions.For each VM, we assigned a section with R/W permissions, shared with the VS.Moreover, the section itself is divided into three zones in order to exchange Ethernet frames.In the first one, we are saving the metadata of the circular buffers that handle the other two zones.The second one is used by the VS to write the receiving external Ethernet frames to the VM, while the last one is used by the VM to write the Ethernet frames that need to be sent.Obviously, VS reads from the VM's zone and vice versa.
We implemented VS as a bare-metal application using Newlib (RedHat 1999), a C standard library designed for embedded systems.Due to its traits and the communication mechanism that it provides, Jailhouse has been selected for hosting this version in a single-core VM.For implementing VS, bare-metal programming makes it easier to develop small drivers and prove the feasibility of an idea.It does not have multitasking ability, nor the presence of a scheduler, hence allowing further timing analysis and facilitating certifications.
On the VM side, we develop an Inter-VM driver in Linux (frontend).The driver is an adapted version of the ivshmem-net driver, from Jailhouse (2020), that uses the circular buffer and IVSHMEM device to communicate with the VS (backend).

Evaluations
In order to evaluate our driver's performance, we experimented the following four configurations on the ZCU102 board: 1.A Linux VM with direct access to a GEM (Conf.1); 2. VS and a Linux VM (Conf.2); 3. VS and two Linux VMs (Conf.3); 4. VS and three Linux VMs (Conf.4).
This first system is directly wired to a ZCU106 board, as shown in Fig. 9.The Linux systems are hosted by Jailhouse and built with the PetaLinux Tools (AMD Xilinx 2020), a tool that provides a set of functionalities useful to build, develop, test, and deploy an embedded environment, including a Linux distribution.Configurations 2, 3, and 4 have also been compared to a similar one where the Xen hypervisor is employed instead of Jailhouse.The setup is detailed in Sect.4.2.1.
Note that three Linux VMs are the maximum number of VMs that we can run on the ZCU102 using only the Cortex-A cluster (4 cores).
Software components versions used are summarized below.
-The Linux kernel version is the 5.15.0-xilinx-v2022.2;-The ping tool is provided by BusyBox version 1.34.1;-The iperf2 version is 2.0.13;-The Jailhouse version is 0.12; and -The Xen version is 4.13.0.

Fig. 9 A representation of the experiments setup
To measure networking latency, we use the ping tool, from iputils (2022), to trigger and measure ICMP packets round trips between the ZCU102 and the ZCU106.
In configurations 1 and 2, the Linux VM is executing ping, while in configuration 3 and 4, both Linux VMs are pinging each other simultaneously.Experiments were executed using different packet sizes-from 64 bytes up to 1024 bytes, using the powers of 2, including also 1480 bytes, i.e. the maximum size allowed by the ICMP packets.Each experiment transmits 100,000 packets.We also measured the maximum reachable bandwidth using iperf2 (Dugan et al. 2016) tests on the same previous configurations.This tool allows us to measure the maximum TCP and UDP bandwidth as well as other characteristics.iperf2 usually requires at least two entities: a server and a client.Once the server is active, the client can start sending Ethernet frames to the server.Both entities measure the bandwidth reached.In this case, we installed the server on the ZCU106 and tested the VMs as clients.Bandwidth experiments were executed for 30 s (-t option) for both TCP and UDP protocols.The bandwidth set ( − < _ > option) and used for the UDP tests is equal to 1 Gbps.

Xen configuration
Xen (The Linux Foundation 2003) is a bare-metal hypervisor that allows the creation and management of multiple virtual machines on a single machine.This hypervisor refers to virtual machines as domains.The first virtual machine to boot has privileges, referred to as Domain0(Dom0), and can manage the unprivileged domains, known as DomUs.We evaluated this hypervisor Ethernet virtualization features using a paravirtualized configuration that reflects, as much as possible, our VS.Hence, Dom0 acts as VS by granting Ethernet access to the three Linux VMs, DomU1, DomU2 and DomU3.To properly implement a static partitioning setup, like the Jailhouse's ones, we used the null scheduler, which pins each virtual CPU to a physical CPU, and we assigned one core for each virtual machine.
In order to provide Ethernet connection to the DomUs, we followed the most configuration suggested by the Xen Networking Guide (Xen Project 2018)-a software bridge in Dom0 connecting the DomUs.As a consequence, in order to have Ethernet access, virtual machines are attached to this bridge by declaring a virtual interface (vif) in their configuration files.We also tuned the networking configurations according to the Xen Performance Networking Guide (Xen Project 2014), adopting, inter alia, the recommended TCP settings for Dom0, DomU1, DomU2 and DomU3.

Results
Experiments results are cleaned up by the worst 0.002% measurements, in order to remove outliers caused by uncontrollable external factors such as hardware failures or kernel subroutines execution.We then compared the last three configurations with a Xen-based solution.
Figure 10 shows the min, average, and max latency of each configuration.For the last two configurations, we reported the measured latency per VM.In this 1 3 Real-Time Systems (2023) 59:568-608 experiment, we can see that the introduction of the VS in the system brings a slight increase in latency, but at the same time, it reduces the standard deviation, thereby improving the stability.Moreover, VS allows the communication with the network with two VMs slightly impacting the cases with small packet sizes, but remaining quite the same for larger ones.By comparing a Xen-based solution with our VS, it is clear that with VS the latencies are lower in all the cases, as shown in Fig. 11.Moreover, the worst case for our solution is still better than the best case for Xen.
Bandwidth test results obtained with iperf2 tool are shown in Fig. 12.They emphasize that our solution is not currently able to reach the maximum possible bandwidth for both protocols; however, it is better than Xen.Moreover, due to the actual round-robin schedule, the bandwidth is split correctly between the two or three running VMs.
Bandwidth measurements using iperf as client for both TCP and UDP protocols.
On the receiver side, VS is able to reach the theoretical Gigabit bandwidth for both protocols and fairly split the bandwidth between the different running VMs.Xen is always outperformed by VS, especially with the UDP protocol, where to avoid a drastic drop in performance, we had to limit the bandwidth of the iperf client with the -b option.
In conclusion, the first evaluation of the VS prototype is quite satisfactory.However, still missing TSN traffic control features, as well as detection stack from the self-driving vehicle, VS was not considered for inclusion in our real use-case scenario.

Deploying TSAA on the road
We selected, adapted and integrated a realistic autonomous driving use-case into the first iteration of Time-Sensitive Autonomous Architecture.Before proceeding with the architecture design for our use-case scenario, we provide a more detailed description of the autonomous driving system that has been deployed and tested.

Autonomous driving system
The purpose of this autonomous driving system is twofold.In the first place, while the car is being navigated manually by a person, the system has the objective to record live sensor data.The sensor data is then used by the offline mapping software to produce a high resolution point cloud map of the environment.We define this as the mapping phase.After the mapping phase is complete, the objective of the autonomous driving system is to steer the vehicle along a given reference path.We define this as the autonomous driving phase.To achieve these objectives, multiple ROS2 nodes are implemented.
During both phases, the sensor driver nodes collect sensor data from the IMU and LiDAR to make it available inside the ROS2 middleware.In the mapping phase, the sensor data is recorded as a ROS2 bag file that will be used for offline mapping, while in the autonomous driving phase the previously generated map is used by the localization node in conjunction with the sensor data to localize the vehicle in the mapped environment.The localization output is a ROS2 Odometry message with an estimate of the vehicle pose composed of a three-dimensional position and a quaternion describing the rotation state of the vehicle.This data is then passed to the high-level actuation node that keeps track of the vehicle's position and generates an adequate speed and steering value to maintain the vehicle on the desired reference path.Finally, the drive parameters are passed to the low-level actuation layer, whose job is the smoothing of the vehicle commands and their forwarding via the vehicle's CAN network to the CCU, which in turn controls the engine and steering wheel.
Here follows a more detailed description of the software solutions implemented for each phase.
The first sensing layer contains the LiDAR and IMU drivers, which transform the raw data of the sensor into ROS2 messages.LiDAR messages are published at 10 Hz rate, using an average bandwidth of 5.0 MB/s.The IMU messages, on the other hand, are published at a frequency of 400 Hz, with an average bandwidth of 160 kB/s.
In the literature, various methods have been proposed to solve the Simultaneous Localization And Mapping (SLAM) problem, like LIO-SAM (Shan et al. 2020) and FAST-LIO (Xu et al. 2022).While FAST-LIO provides a better SLAM latency, it comes at a cost: FAST-LIO does not implement any type of loop closure, so it can accumulate some long-term odometry drifts and deviations in the mapped environment.On the other hand, LIO-SAM is computationally heavier, as it implements feature extraction and loop closure methods.The feature extraction phase generally reduces noise inside the generated map, while a loop closure method helps to correct drifting issues.This makes LIO-SAM optimal for the mapping phase, as the maps generated with it depict a more realistic environment when compared against FAST-LIO generated maps.
The localization is the most complex and critical node in the autonomous driving phase since any communication delay from and to this node can lead to a higher reaction time of the actuation software.The main objective of this node is to filter the position and orientation of the vehicle in the environment.The vehicle pose is continuously estimated using the live sensor data and published to a ROS2 topic.Given the embedded nature of the project and the low computational power of the ECUs used, FAST-LIO was chosen as it provides a better localization latency.
The goal of FAST-LIO in the localization layer is to provide an accurate estimate of the vehicle odometry inside the mapped environment.To reduce the localization latency, FAST-LIO leverages iKD-trees (Cai et al. 2021), a data structure used to index map points in an efficient way.The iKD-tree structure reduces the k-nearestneighbor search time, which is the most time-consuming section of the software, thus improving the overall local registration latency.Since the IMU data come in at a much higher frequency than the LiDAR, the two measurements are integrated in an iterated Extended Kalman Filter (EKF) (He et al. 2023).The IMU data is also used in a preprocessing section to deskew the LiDAR point cloud, which improves localization accuracy.After the LIDAR measurement is integrated inside the EKF, its output state is the best estimate for the vehicle movement, so it is published as an Odometry message.To avoid publishing high covariance poses, the system does not publish in between LiDAR measurements processing, so the estimated odometry will be published at the same LiDAR frequency (10 Hz).These odometry messages published by FAST-LIO use an average bandwidth of 7.30 kB/s.
After the pose of the vehicle is estimated by the localization node, it is received by the high level actuation node, which determines the command to send to the CCU, through the CAN bus, in order to maintain the predefined trajectory.As described in Sect.3, the actuation node has high-criticality and has been ported into a single-core virtual machine running the RTOS Erika Enterprise v3 accentuating the isolation properties gained by using a hypervisor.However, Erika Enterprise v3 does not support ROS2, hence, in order to receive the Odometry messages, we opted to implement a node running in another VM.This node, hereafter referred to as ProxyROS2, is responsible for extracting the coordinates of the vehicle from the received Odometry messages and forwarding them to Erika Enterprise V3 using the Inter-VM Shared Memory communication.Since we also want to monitor the actuation data, ProxyROS2 creates ROS2 messages with the actuation data, received using the same mechanism, and sends them to adx_gui.This led us to choose the ZCU106 for the global planner, too.The virtualized environment is then divided into two virtual machines: 1. Erika on one core, as previously described; and 2. the global planner and the ProxyROS2 on three cores running a Linux-based distribution.
The actuation node is composed of three parts: 1.The control algorithm; 2. a PID controller; and 3. a CAN driver.

3
Real-Time Systems (2023) 59:568-608 The control algorithm used to calculate the driving parameters is based on the regulated pure pursuit approach (Coulter 1992).In order to compute the necessary driving parameters, the node needs to be initialized with some vehicle-specific parameters like the wheelbase (the distance between the front and rear axles) and the maximum steering angle of the wheels.The pure pursuit node is also fed with a set of ordered waypoints that represent the reference path to follow.Each waypoint is composed of a three-dimensional position and a linear speed that should be kept around that waypoint.These waypoints are used at initialization time to calculate an interpolation function which will be used during the execution to interpolate the required points in the reference path.
Each time the pure pursuit node receives a pose, it geometrically calculates the steering angle required to keep the vehicle on the reference path.This is done by finding a point a certain number of meters ahead in the reference path (the lookahead distance) and by geometrically calculating the optimal steering angle to reach that point.This look-ahead distance is adjusted based on the current vehicle speed.The output steering value is then combined with the desired speed contained in the reference path at the current position to build the drive parameters.
Once the drive parameters (desired speed and steering angle) are computed, a simple PID controller is used to smooth the driving parameters.This step is done to avoid abrupt changes in the speed of the vehicle.The PID receives the vehicle odometry data from the CCU via the CAN network.This odometry is computed by the CCU by using the live data from the engine and steering wheel.This odometry is used by the PID in combination with the drive parameters received from the pure pursuit to compute a new set of smooth drive parameters.These are relayed to the CCU via the CAN network, which finally actuates the command by controlling the engine speed and steering angle of the vehicle.
In order to monitor the status of the nodes during execution and for safety reasons, two visualization nodes are presented.To monitor the general status of the mapping and localization, we deployed RViz, the standard ROS2 GUI.To plot and monitor the driving parameters and real-time actuation data, a custom GUI named adx_gui was implemented.This GUI is also used as a safety system to manually enable or disable the CCU in case of any issue with the navigation software stack.The two GUIs are meant to be used together to monitor the status of the vehicle and act as a general switchboard for the entire system.
With this autonomous driving system, we want to demonstrate the importance of TSN-enabled communications in real-time applications using TSAA-involved technologies.

Iteration description
Figure 13 shows how the components and nodes were placed in this first iteration and the involved data flows.This iteration is composed of different communication mechanisms.The ZCU102 receives LiDAR sensor data over a gigabit Ethernet cable, while the IMU sensor data is collected over a USB connection.Moreover, the communication with the TSN switch is managed by different GEMs, granting better isolation between flows.The MTSN setup used for this use-case is shown in Fig. 4.While on the ZCU106, there is an Inter-VM Shared Memory communication to exchange messages between the two virtual machines and a CAN bus that communicates with the vehicle CCU.Furthermore, in this iteration, we used only the A53-cluster on both boards.
The data flows involved in this system are shown in Fig. 13, where numbers match between the figure and the following enumeration (green and red numbers are respectively for the start and end points of each data flow): 1.The LiDAR produces data and sends it via an Ethernet cable connected to one of the FMC switch ports.Due to hardware limitations, this sensor sends broadcast data; 2. Once the LiDAR and IMU's data are received, the localization node uses them to calculate the car's pose.The pose is then published to the topic shared with the ProxyROS2 node.The latter simply forwards the pose to the actuation node using the IVSHMEM device and a cache-aligned circular buffer.In this case, due to the low frequency required for this communication and its high criticality, the shared topic between the localization and the ProxyROS2 uses the default Quality of Service which ensures reliability; 3. The same calculated pose described in flow 2 is sent to RViz for the visualization.
In this case, since the monitoring system is not critical, we use a different topic and QoS, which is not reliable.This ensures that the localization node will not stay in a waiting state for missing acknowledgements from this flow; 4. Starting from the car CCU, the actuation data are being propagated in the system until they reach adx_gui node, where they will be displayed; 5.As already explained, adx_gui is also capable of enabling or disabling the CCU.
As a result, this flow has a high criticality but is activated only in emergency; and 6.To demonstrate the benefit of TSN, we introduce this interference flow.Without TSN, this flow will introduce an ever-growing delay to messages received by the actuation.Since the localization node uses a shared topic with the actuation node and uses the same connection, the introduction of this interference flow creates noises on the Ethernet cable between the ZCU102 and ZCU106.
The complete flow number 2 (from the localization to the actuation) is the most critical of the system because any interruption or even too large delays can leave the autonomous vehicle unable to localize itself in the environment, leading to unexpected behaviors or dramatic consequences.
Due to their criticality level, Qav parameters, i.e., priority and idleSlope, must be set so that they will privilege flow 2 and limit flow 6.Hence, flow 2 must have a higher priority and bandwidth than flow 6.
The advised reader may note that in flow 2, the ProxyROS2 node has to forward the pose to the actuation node located on the RTOS.This indirection necessitates an extra overhead.To avoid such a design, it is necessary to first have a TSN driver available for the second VM OS (here, Erika RTOS) and, second, to allow multiple VMs to access a single Ethernet port.As Erika does not provide such a driver and Jailhouse does not allow this configuration, a more evolved solution would be to virtualize the Ethernet traffic, which summarizes our first contribution in Sect. 4. Furthermore, we could use the VTSN to prioritize the acknowledgement flows.
Another aspect that needs to be considered is the interference flow.We have selected an additional laptop connected to the ZCU102 to generate the flow 6 in order to minimize the impact on the system.
For safety reasons, this system is previously tested in a Hardware-In-The-Loop (HiL) setup using an open-source simulator for autonomous driving named CAR Learning to Act (CARLA) (Dosovitskiy et al. 2017).CARLA is based on Unreal Engine 4, a well-known 3D graphics engine.This simulator provides multiple features needed to test autonomous driving systems.
In the first place, CARLA offers many open assets, like vehicle models and readymade urban layouts.Secondly, CARLA supports the implementation of a CAN bus interface simulator; therefore, we developed a new CAN bus interface that simulates the CCU of the real vehicle.Finally, using the carla-ros-bridge package, the simulator can be directly interfaced to the ROS2 framework, on which our autonomous driving stack already runs.
We have tried our best to simulate the real environment as much as possible, nonetheless, we want to advise the reader that, albeit CARLA simulator is a comprehensive tool for simulating the real-world scenarios, it can generate a partial point cloud, and it is not currently able to correctly simulate all the real components, e.g., the generated LiDAR messages do not include all the data fields, as documented in this issue from the source repository of the previous cited package (ROS-Bridge Community 2020).Keeping this in mind, the difference between the software stack on the vehicle and the one on the simulator relies on the LiDAR and IMU sensors and driver nodes, which are not used since the sensor data is directly published by the simulator bridge as ROS2 topics.Moreover, the IMU and LiDAR send data via the same Ethernet cable and work at a frequency of 33 Hz and 14 Hz, respectively.These frequencies allowed us to prioritize the generation of a complete LiDAR point cloud.
The purpose of the Hardware-In-The-Loop setup is twofold: 1.It improves safety in the early testing phases since bugs in the software can be found and replicated without a real vehicle; 2. It enables the functionality and performance testing of the system without the long setup times of the sensors and ECUs on the vehicle; and 3.It reduces the costs and speeds up the researcher.Some of the CARLA experiments are presented in Sect.5.3.Figure 19 depicts the simulation environment data flows (without considering the VS solution).
Only when the system reaches a certain level of stability is it possible to proceed with the real context, where all the necessary equipment is mounted in the Maserati Quattroporte.Both ECUs are placed in the trunk and powered by the car's power supply system.Moreover, in order to control the system or launch/ stop ROS2 nodes, two UART cables (one for the ZCU102 and one for the ZCU106) are passed through the seats and connected to the laptops (to facilitate the control of them, one for each board), so that it will be possible to use serial communications to address the command prompts.We also configure the IP addresses and VLANs on the ZCU102, ZCU106, and the two laptops, in order to use the TSN properties on the aforementioned flows.
The first testing phase requires checking that everything is properly mounted using preliminary tests such as sensor acknowledgement and connection checks.Once the preliminary checks are completed, the mapping phase starts.In this phase, the vehicle is driven manually around the test environment.This phase has two goals: one is to create a map of the surrounding environment, as described in Sect.5.1.Secondly, the reference trajectory is recorded as a set of ordered waypoints, which will be followed later on in the autonomous driving phase.Once the mapping phase is successfully completed the car is back at the starting point, and the reference trajectory gets saved and sent to the monitoring software.
If the mapping phase is successfully completed the autonomous driving phase can start.Before enabling TSN properties and the interference component, a first round (five laps of trajectory) is completed.This first autonomous run serves as the baseline and is hereafter referred to as Scenario A. We then activate the interference flow without activating TSN, and perform a second run, referred to as Scenario B. For the subsequent runs, we keep the interference flow activated and enable TSN, allowing us to sharpen the actual effect of the activation of TSN.
At this point, an iterative experiment phase starts with the following actions while varying the TSN parameters: 1. Manually drive the car to the starting point; 2. Enable autonomous driving system ROS2 nodes; 3. Load the path and start the Jailhouse cell containing Erika; 4. Configure the Qav parameters, priority and idleSlope for both VLANs; The previous phases and steps are equal in the simulation environment, too.
The experiments were considered successful if at least five laps were completed, a failure otherwise.We evaluated different scenarios in the simulator environment; however, since experimenting in the real vehicle is expensive (at least 3 people are required), in that case, we have selected and evaluated five meaningful scenarios.Consequently, we will show these five scenarios for both environments.
Table 2 shows some of the Qav parameters values that were used for the experiments; these values were conveniently selected in order to evaluate the impact on the system.Moreover, the last line of the table reports the result of the corresponding test case.The scenarios A and B are executed without the usage of the TSN protocol Qav.The scenarios C and D privilege the localization flow by assigning a high priority and high idleSlope compared to the interference flow, while the scenario E privileges the interference flow instead of the localization one.
By cross-checking parameters and results, it is clear that when the localization and interference flows are running uncontrolled, the autonomous driving system is not able to achieve the goal of 5 laps, although with Qav, it is able to localize even in the scenario E, where we assign the lowest idleSlope and priority values to the localization flow and the highest idleSlope and priority values for the  Next sections present and discuss the obtained results in the simulated environment as well as the real environment.

Simulated environment results
In the simulation environment, Scenario B was not able to run even a single lap, not allowing us to plot it.Figure 14 shows the difference in terms of accuracy of the Scenario E and Scenario C with the Scenario A. It emphasizes the efficiency of the Qav protocol when we prioritize the localization flow instead of the interference one.
Since the time synchronization between the two Ultrascale+ is not yet accomplished, we have evaluated the tardiness between the publishing of the Odometry messages on the FAST-LIO node and the tardiness of their reception on the Proxy-ROS2 node.The tardiness is then calculated as follows: where the sequence t i represents the observed time while T the expected period.
(3) In Scenario E, we can see a strange behavior: the median value of the tardiness of the ProxyROS2 node is less than 0. This behavior can be attributed to the default Quality of Service used for the communication between FAST-LIO and a 100-size buffer for the messages.This buffer is emptied each time an acknowledgement of the previous sent message is received.As a consequence, two consecutive messages in the ProxyROS2 node can be received really close to each other, causing a reduction of the tardiness median value.

Real environment results
All experiments in the real vehicle were performed on the public parking lot of the Department of Engineering at the Università degli studi di Modena e Reggio Emilia, which was controlled (no jumping pedestrians) and populated with some parked cars.As a consequence, due to the limited space, we could not test really different paths; however, we have tested and evaluated the results on the same parking lot in counterclockwise and clockwise senses, as shown respectively Figs. 16 and 17.
In all our tests, the designed trajectories have an oval-form and since we had to perform them for the two senses due to the presence of the already-parked cars, they can present slight differences.These figures illustrate only the last executed lap.
Due to the lower speed of the vehicle in the experiments conducted in the real environment, the car was able to execute some laps in Scenario B, allowing us to plot them.Hence, in both senses, we can clearly see the loss of accuracy in Scenario B, where the interference flow is not being controlled by the TSN protocol and the exact moment we had to stop the vehicle.Once again, the results demonstrate the efficiency of the TSN protocol, and moreover, we can see the behavior of the reliability of the involved Quality of Service for the communication between the FAST-LIO and the ProxyROS2 nodes, since in scenarios A, C, D, and E, the box plots are very similar.However, the uncontrolled Fig. 16 The poses of the worst and best case scenarios for the clockwise path, compared with Scenario A (dotted black line) Fig. 17 The poses of the worst and best case scenarios for the counterclockwise path, compared with Scenario A (dotted black line) interference flow denies the reliability property of the QoS in Scenario B.Moreover, Scenario B presents a significant amount of packet loss, which is more than 35% between the two nodes in both senses.
By comparing scenarios A and C, the applied TSN protocol reduces the tardiness since the localization flow has a higher priority than flow 1 and flow 6.From the difference in the tardiness between the scenarios C, D, and E, it is clear that by assigning a greater idleSlope to the interference flow, or even both priority and idleSlope, the resulting tardiness increases as well.In this case, Scenario E is not experiencing the same behavior as the simulated environment due to the correct frequency of the LiDAR (10Hz).In this section, we explain the current status of the Time-Sensitive Autonomous Architecture prototype.At the current state, we do not have a way to extract the necessary content of the Odometry messages in VS or in the RTOS, leading to the use of Proxy-ROS2.We are evaluating the possibility of including micro-ROS (2022) in the RTOS that will use the IVSHMEM device as a custom transport layer.micro-ROS is an optimized implementation for microcontrollers that provides a ROS2 framework.
The VS solution has reached a reasonable level of stability and performance; we evaluated the tardiness in CARLA, hereafter referred to as Scenario F. This scenario was able to complete all the required five laps.
Figure 19 pictures the simulation environment the data flows.Once again, the involved flows are equal to the Sect.5.2, the only difference is the introduction of the additional communication (VS ⟷ ProxyROS2).
Figure 20 depicts that VS introduces an overhead; however, the resulting standard deviation is reduced.This is by virtue of the simpler and smoother environment used for the development of VS, which does not involve less predictable kernel subroutines.

Conclusion
The first part of this paper presents a reference architecture for next-generation autonomous systems that involves a partitioning hypervisor for enabling mixedcriticality and heterogeneous OSes (including RTOS) and real-time networking with TSN protocols.
The second part paves the way towards the deployment of VTSN-based design.We demonstrated the applicability of virtualized communication on the one hand and the applicability of TSN on the other hand by reaching performance levels that Real-Time Systems (2023) 59:568-608 make it an adoptable solution for enabling TSN on such platforms where the number of available Ethernet ports is limited.In the future, we plan to combine these two technologies to finally provide a complete VTSN.Moreover, different aspects can be improved, e.g., reduce the number of copy operations for exchanging frames between VMs, manage the life-cycle of VMs, increase the stability of the drivers as well as the performance, use more than three VMs, add one of the TSN protocols starting from IEEE 802.1Qav and port it into the R5-cluster.We believe that the VTSN mechanism will be integrated into TSAA as a possible way to manage mixedcriticality data flows, guaranteeing high determinism, as shown in Fig. 2. We also plan to use the same technique to develop other kinds of autonomous systems, such as robotics in industry, e.g., (Čech et al. 2022).
The third section presents a first iteration of the proposed architecture in both a simulated and a more relevant real-world context.The architecture has been engineered, tested, and deployed with a complex autonomous vehicle application running in a dual-zone environment.We proved the feasibility of TSAA and the increase in stability in communications governed by TSN protocols.
Minerva Systems, Hipert, and Università degli studi di Modena e Reggio Emilia are still working together in order to accomplish the complete TSAA architecture, where a more complex autonomous driving system that includes the detection and uses a more robust localization algorithm, will be instantiated in the car and it will use physical and virtual TSN.Benjamin Rouxel received his M.Sc.degree (2015) and a Ph.D. degree (2018) in Computer Science, both from the University of Rennes 1 in France.After more than 2 years as a Post-doctorat at the University of Amsterdam, Netherlands, he is currently a Post-doctorat at the University of Modena and Reggio Emilia, Italy, in the HiPeRT Lab group.His research interests include, but are not limited to, Real-Time Systems from design concepts to concrete implementations with the addition of energy efficiency, security, and he is starting to gain interest in neural networks.

Fig. 1
Fig. 1 A picture of Maserati Quattroporte used for the experiments

Fig
Fig. 8 A representation of the IVSHMEM mapping layout

Fig. 10
Fig. 10Ping results over 100,000 packets using an interval of 1 ms between two consecutive ICMP messages (ping -i option).The configurations referenced as Conf.X correspond to the configurations listed in Sect.4.2

Fig. 11
Fig. 11 Ping results over 100,000 packets using an interval of 1 ms between consecutive ICMP messages (ping -i option).The figure compares the Xen-based solution (DomUX) and our VS.The configurations referenced as Conf.X correspond to the configurations listed in Sect.4.2

Fig. 12
Fig. 12 The figure illustrates the achieved bandwidth in various configurations: a non-virtualized environment, virtual machines with direct GEM access on respective hypervisors (Conf.1),and two solutions utilizing VS as the backend and Dom0 in a Xen-based setup (Conf.2 to Conf.4)

Fig. 13
Fig. 13 System design and its data flows.The green circles represent the flows starting point, while the red squares their conclusions (Color figure online) iperf as server on the ZCU106's Linux (2nd and subsequent runs); 6. Use adx_gui to monitor the actuation and CAN communication; 7. Run iperf as client on the interference laptop's Linux; 8. Let the car drive, meanwhile: (a) Wait for the completion of the five laps; OR (b) Interrupt if the localization has been lost or the car is about to crash due to the latency in the communication of the odometry; 9. Restart from 1.
interference flow.As documented in the standard definition of this protocol by The Institute of Electrical and Electronics Engineers (IEEE) (2010), the transmission of the highest priority flow can be delayed by the maximum Ethernet frame size supported by the MAC, and, thanks to the low bandwidth necessary to exchange the Odometry messages, this allows the vehicle to locate under such circumstances.

Fig. 14
Fig. 14 The poses accuracy of the worst and best scenarios in CARLA compared with Scenario A (dotted black line)

Fig. 15
Fig. 15 Tardiness of the FAST-LIO and ProxyROS2 nodes in CARLA

Fig. 18
Fig. 18 Tardiness of the FAST-LIO and ProxyROS2 nodes

Fig. 20
Fig. 20 Tardiness of the FAST-LIO and ProxyROS2 nodes in CARLA Systems (2023) 59:568-608 Donato Ferraro is an industrial Ph.D. student in Computer and Data Science for Technological and Social Innovation at the University of Modena and Reggio Emilia.Currently working with Minerva Systems, his research interests revolve around safety, Real-Time, and embedded systems.He is also gaining experience in capability-based operating systems.Luca Palazzi is a master student in Computer Science at the University of Modena and Reggio Emilia, and he is working at Minerva Systems.His research is focused on hypervisors and networking for embedded systems.Federico Gavioli is a master student in Computer Science at the University of Modena and Reggio Emilia.His main research interests are in high performance reconfigurable systems, with a particular focus on hardware-software co-design for autonomous driving applications.Michele Guzzinati obtained his B.Sc. degree in Computer Science from the University of Modena and Reggio Emilia.He is currently employed at Hipert Srl, a spin-off of the University of Modena and Reggio Emilia.His research focuses on Robot Navigation and Autonomous Driving on Embedded and Real-Time Systems.Andrea Bernardi is a computer scientist with expertise in high-performance computing platforms, autonomous systems, perception, and software engineering.He holds a B.Sc. in Computer Science from the University of Modena and Reggio Emilia.Currently serves as a software engineer at Hipert Srl.His research and practical experience contribute to the advancement of these fields.

Table 2
Tested scenarios and their Qav parameter values and corresponding results in both simulated and real environment LF localization flow, IF interference flow.✓ : success run with five complete laps.: the car was not able to complete all the five laps a Qav not enabled