HEPSYCODE-RT: a Real-Time Extension for an ESL HW/SW Co-Design Methodology

This work focuses on the definition of a methodology for handling embedded real-time applications, starting from an existing HW/SW co-design methodology able to support the design of dedicated heterogeneous parallel systems. The state of the art in similar tools and methodologies is presented, and the reference framework is introduced together with the proposed extension to the real-time world. A case study is then described to show a design space exploration able to consider such an extension.


INTRODUCTION
In recent years, the spread and importance of embedded systems have kept increasing, but it is still not possible to completely engineer their system-level design flow. Designers commonly adopt one or more system-level models (e.g. block diagrams, UML, SystemC, etc.) to have a complete problem view, to perform a check on HW/SW resources allocation and to validate the design by simulating the system behavior. In this scenario, SW tools that help designers reduce the cost and overall complexity of system development are of fundamental importance. Unfortunately, there are no fully engineered general methodologies defined for this purpose, and often the best option is still to rely on the indications of experienced designers, taking advantage of empirical criteria and qualitative assessments. For example, systems based on heterogeneous multi-processor architectures (Heterogeneous Multi-Processor Systems, HMPS) have recently been exploited in a wide range of application domains, especially in the System-on-Chip (SoC) form factor (e.g. [2]). In particular, such architectures are often used to implement Dedicated Systems (DS), i.e. digital electronic systems with an application-specific HW/SW architecture designed to satisfy an a priori known application with functional and non-functional (F/NF) requirements. In such a case, they are called Dedicated Heterogeneous Multi-Processor Systems (D-HMPS).
D-HMPS are so complex that the adopted HW/SW Co-Design Methodology plays a major role in determining the success of a product. The situation is even worse if the considered system is a hard/soft real-time one. In such a case, time constraints are normally defined considering the worst possible case (hard) or an average situation (soft).
In such a context, this work focuses on the definition of a HW/SW co-design methodology and the development of a related prototype tool to reduce the design time of embedded real-time applications. Specifically, the whole framework drives the designer from an Electronic System-Level (ESL) behavioral model, with related NF requirements, including real-time ones, to the final HW/SW implementation, considering specific HW technologies, scheduling policies and Inter-Process Communication (IPC) mechanisms. The remainder of the paper is organized as follows: Section II provides an overview of HW/SW co-design tools related to embedded real-time computing systems. Section III describes the reference HW/SW Co-Design methodology, whereas Section IV discusses the extension that adapts it to the real-time world. Section V presents a case study that shows a design space exploration able to consider such an extension. Finally, Section VI reports some concluding remarks and outlines future work.

HW/SW CO-DESIGN OF REAL-TIME EMBEDDED SYSTEMS
A remarkable number of research works have focused on the system-level HW/SW co-design of D-HMPS [3]. In such a context, the most critical steps are always related to the System Specification and the Design Space Exploration (DSE) activities [4]. The main differences between the approaches lie in the amount of information and actions explicitly requested from the designer, and therefore influenced by the designer's experience. In particular, many approaches (especially those based on the Y-Chart principle [5]) explicitly require as an input the HW architecture to be considered for mapping purposes. Other works [6] address the problem of designing embedded real-time systems starting from the input/output constraints on the final implementation. Offline schedulability and feasibility analysis is the subject of different research works [7] [8], which look for the algorithms that can guarantee optimality depending on the load parameters. To analyze the behavior of a system, many tools have been developed to evaluate/estimate timing parameters, to validate scheduling and to perform simulations. In such a domain, the work presented in [9] starts from three sub-models: a model of the SW application (Platform Independent Model) on one side, a model of the platform (Platform Description Model) on the other side, and a Platform Specific Model that connects the two by defining the mapping of SW onto HW. By exploiting a specific extension for DSE and performance evaluation [10], which considers non-functional properties such as real-time, power, temperature and reliability constraints, the tool offers different simulation and estimation outputs that drive the designer from the system-level model to the final implementation.
Among the works that rely heavily on Models of Computation (MoC) theory, ForSyDe (Formal System Design) [11] is a methodology for modeling and designing heterogeneous embedded and cyber-physical systems. The starting application is modeled by a network of processes interconnected by signals. Then, the model is refined by different design transformations into a target implementation language.
An interesting academic tool is SynDEx [12], a system-level EDA tool based on the Algorithm-Architecture Adequation (AAA) methodology and intended to find implementation solutions, under real-time constraints, for embedded applications on multicomponent HW/SW architectures.
Finally, looking also at SystemC-based commercial products, it is worth citing Intel CoFluent [13] as a promising system-level modeling and simulation environment. In addition to the model of the system behavior, it explicitly requires manual modeling of both the hardware architecture and the mapping.
So, to the best of our knowledge, there are very few system-level HW/SW co-design methodologies that try to fully address the problem of both "automatically suggesting a HW/SW partitioning of the system specification" and "mapping the partitioned entities onto an automatically defined heterogeneous multi-processor architecture" while also considering real-time constraints.

HW/SW CO-DESIGN FRAMEWORK
In the context of embedded real-time systems design, this work starts from a specific framework (called HEPSYCODE: HW/SW Co-Design of Heterogeneous Parallel Dedicated Systems) [14], based on an existing System-Level HW/SW Co-Design Methodology [15] [19] [21], and introduces the possibility to specify real-time requirements in the set of non-functional ones (the resulting framework is called HEPSYCODE-RT). The main items composing such a methodology and its extension are discussed in the next paragraphs, while the reference ESL HW/SW Co-Design Flow is shown in Fig. 1.

Modeling Language
The system behavior modeling language introduced in HEPSYCODE-RT, named HML (HEPSY Modeling Language) [17], is based on the Communicating Sequential Processes (CSP) MoC [16]. It allows modeling the behavior of the system as a network of processes communicating through unidirectional synchronous channels. By means of HML it is possible to specify the System Behavior Model (SBM), an executable model of the system behavior, a set of Non-Functional constraints (NFC) and a set of Reference Inputs (RI) to be used for simulation-based activities. It is worth noting that another HEPSYCODE extension able to exploit more formal approaches is currently under development [17].
In particular, SBM = {PS, CH} is a CSP-based executable model of the system behavior that explicitly defines communication among processes (PS) using unidirectional point-to-point blocking channels (CH) for data exchange. PS = {ps1, ps2, ..., psn} is a set of concurrent processes that communicate with each other exclusively by means of channels and use only local variables. Each process is described by a sequence of statements written in a suitable modeling language. Each process can have a priority p, from 1 (lowest) to 100 (highest), imposed by the designer. The concept of statement is fixed once a proper specification/modeling language has been selected. Languages suitable to describe CSP are SystemC (chosen for this work), OCCAM, Handel-C, ADA and so on. More abstract languages are UML, SysML, Simulink and so on. CH = {ch1, ch2, ..., chn} is a set of channels where each channel is characterized by source and destination processes, and some details (i.e. size, type) about the transferred data. Each channel can also have a priority p, from 1 (lowest) to 100 (highest), imposed by the designer.
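The SBM definition above can be pictured as a small data model. The following is only an illustrative sketch in Python, not the HML/SystemC representation used by the framework: the class names, field names and the validation rules other than the 1 to 100 priority range are assumptions introduced here.

```python
from dataclasses import dataclass, field


def _check_priority(owner: str, priority: int) -> None:
    # The text fixes the designer-imposed priority range to 1..100.
    if not 1 <= priority <= 100:
        raise ValueError(f"priority of {owner} must be in 1..100")


@dataclass(frozen=True)
class Process:
    name: str
    priority: int = 1  # 1 (lowest) to 100 (highest)

    def __post_init__(self):
        _check_priority(self.name, self.priority)


@dataclass(frozen=True)
class Channel:
    name: str
    src: str         # name of the writing process
    dst: str         # name of the reading process
    data_type: str   # type of the transferred data
    size_bits: int   # size of one transferred item
    priority: int = 1

    def __post_init__(self):
        _check_priority(self.name, self.priority)


@dataclass
class SBM:
    processes: dict = field(default_factory=dict)  # PS, indexed by name
    channels: dict = field(default_factory=dict)   # CH, indexed by name

    def add_process(self, ps: Process) -> None:
        self.processes[ps.name] = ps

    def add_channel(self, ch: Channel) -> None:
        # Channels are point-to-point: both endpoints must be known processes.
        for end in (ch.src, ch.dst):
            if end not in self.processes:
                raise ValueError(f"{ch.name}: unknown process {end}")
        self.channels[ch.name] = ch
```

Keeping processes and channels as separate collections mirrors the SBM = {PS, CH} notation and makes it easy to attach per-process or per-channel estimates later in the flow.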
RI = {(i1, o1), …, (in, on)} is a set of inputs (possibly timed), representative as much as possible of typical operating conditions of the system, and related expected outputs to be used for analysis and simulation-based validation.
The Non-Functional Constraints (NFC) are composed of Timing Constraints (TC), Architectural Constraints (AC) and Scheduling Directives. Two different TC can be considered by the designer: Time-to-Completion (TTC), unique and related to the whole SBM, is the time available to complete the SBM execution from the first input trigger to the complete output generation; Time-to-Reaction (TTR) is a set of real-time constraints related to the time available for the execution of leaf CSP processes (i.e. the time available to execute the statements inside an input/output pair that delimits the CSP process main body, see Fig. 2). Different leaf processes can have different associated TTR. These real-time constraints are not strictly related to classical RT requirements, but impose a timing bound on the execution of some specific processes. Both TTC and TTR constraints shall be satisfied by each element of RI.
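The requirement that TTC and TTR hold for every element of RI can be sketched as a simple check over simulation results. This is a minimal Python illustration, not the framework's implementation: the shape of `runs`, the field names and the treatment of the boundary case (a measured time equal to the bound counts as satisfied) are assumptions.

```python
def check_timing(runs, ttc, ttr):
    """Return the timing violations found over all reference inputs.

    runs: one dict per RI element, with
          'completion': time from first input trigger to last output,
          'reactions':  {process name: list of measured reaction times}.
    ttc:  Time-to-Completion bound for the whole SBM.
    ttr:  {leaf process name: Time-to-Reaction bound}.
    """
    violations = []
    for idx, run in enumerate(runs):
        if run["completion"] > ttc:
            violations.append((idx, "TTC", "SBM"))
        for ps, bound in ttr.items():
            # Worst reaction time observed for this leaf process in this run.
            worst = max(run["reactions"].get(ps, []), default=None)
            if worst is not None and worst > bound:
                violations.append((idx, "TTR", ps))
    return violations


# Hypothetical measurements for two reference inputs.
runs = [
    {"completion": 90.0, "reactions": {"ps3": [4.0, 5.0]}},
    {"completion": 120.0, "reactions": {"ps3": [4.0, 9.0]}},
]
result = check_timing(runs, ttc=100.0, ttr={"ps3": 8.0})
```

An empty result means that both constraints hold for every reference input; here the second run violates both TTC and the TTR of ps3.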

Technologies Library and Basic Blocks
The target HW architecture is composed of different basic HW components. These components are collected into a Technologies Library (TL). TL can be considered as a generic "database" that provides the characterization of the available technologies. EIL elements are characterized by some parameters related to bandwidth, number of connectable items and concurrency properties.
The designer will use such components to build a set of Basic Blocks (BB). So, BB = {bb1, bb2, ..., bbb} is the set of BB available during the DSE step to automatically define the HW architecture. A generic BB is composed of a set of PU, a set of MU and a Communication Unit (CU). CU represents the set of EIL that can be managed by a BB. The BB internal architecture depends on TFF and TTA. In particular, each BB element is generally composed of one or more PU elements, some MU elements and one CU element. BB elements are the ones effectively taken as input by the system-level flow for analysis, estimations and DSE steps. So, the target HW architecture can be seen as a set of BB elements interconnected by means of one or more EIL elements.

ESL HW/SW Co-Design Flow
The first step of the adopted co-design flow is the Functional Simulation, where the SBM is simulated to check its correctness with respect to RI. The next step aims at extracting as much information as possible about the system by analyzing the SBM while considering the available BB. This step is supported by Co-Analysis and Co-Estimation activities to evaluate/estimate several metrics related to the BB involved in the design flow.
Co-Analysis performs evaluation of two metrics. The first one is called Affinity [19]. Co-Estimation performs two kinds of estimations: Static Estimations of Timing and Size, and Dynamic Estimations of Load and Bandwidth. The Timing metric is the number of clock cycles needed to execute each statement j of each process psi by means of each processor k in the available BB, with k=1..n. The goal is to estimate how many clock cycles are needed by a specific BB to execute the implementation of a specific statement (e.g. [18] and [20] present two possible approaches).
Size is a set of estimations for each statement of each process with respect to each available processor. It is related to bytes or area/resources metrics depending on SW or HW implementations. L is the Load (i.e. the processor utilization percentage) that each process would impose on each non-SPP processor to satisfy the imposed TTC/TTR timing constraints (see Section IV). Finally, the Bandwidth (B) is the number of bits sent/received over each channel (i.e. bits exchanged by communicating process pairs in PS) during an interval of time equal to TTC.
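As an illustration of the Bandwidth metric, the following Python sketch counts the bits moved over each channel within a window of length TTC. The transfer log format (timestamp, channel name, bits) and the choice of a window starting at time 0 are assumptions made for this example; the text does not say how the framework collects these figures.

```python
from collections import defaultdict


def bandwidth(transfers, ttc):
    """Bits exchanged over each channel during the interval [0, ttc).

    transfers: iterable of (timestamp, channel name, number of bits),
               e.g. as logged by a timed simulation (hypothetical format).
    """
    bits = defaultdict(int)
    for timestamp, channel, n_bits in transfers:
        if 0 <= timestamp < ttc:
            bits[channel] += n_bits
    return dict(bits)


# Hypothetical log: the last transfer falls outside the TTC window.
log = [(0.0, "ch1", 32), (1.0, "ch1", 32), (2.5, "ch2", 8), (5.0, "ch1", 32)]
b = bandwidth(log, ttc=5.0)
```

Dividing each count by TTC would give an average bit rate per channel, which can then be compared against the bandwidth parameters of the available interconnection links.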
After these steps, the reference co-design flow reaches the DSE step (as shown in Figure 3). It includes two iterative activities: "HW/SW Partitioning, Mapping and Architecture Definition", based on a genetic algorithm that explores the design space looking for feasible mapping/architecture items suitable to satisfy the imposed constraints; and "Timing Co-Simulation", which considers the suggested mapping/architecture items to actually check for timing constraints satisfaction. When the mapping/architecture item proposed by the DSE step is acceptable, it is possible to proceed with system implementation (i.e. Algorithm-Level Flow).
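To make the idea of searching for a feasible mapping concrete, the sketch below enumerates all process-to-processor assignments and keeps the cheapest one whose per-processor load does not exceed 100%. It deliberately swaps plain exhaustive enumeration in for the genetic algorithm used by the framework, and it is only practical for very small examples. The cost figures, the load table and the 100% capacity rule are assumptions introduced for illustration.

```python
from itertools import product


def feasible(mapping, load, capacity=1.0):
    """True if no processor is loaded beyond its capacity."""
    total = {}
    for ps, pu in mapping.items():
        total[pu] = total.get(pu, 0.0) + load[ps][pu]
    return all(value <= capacity for value in total.values())


def explore(processes, processors, load, cost):
    """Return (cost, mapping) of the cheapest feasible mapping, or None."""
    best = None
    for choice in product(processors, repeat=len(processes)):
        mapping = dict(zip(processes, choice))
        if not feasible(mapping, load):
            continue
        # Only processors that are actually used contribute to the cost.
        total_cost = sum(cost[pu] for pu in set(choice))
        if best is None or total_cost < best[0]:
            best = (total_cost, mapping)
    return best


# Hypothetical loads (fraction of processor time) and costs.
load = {"ps1": {"cpuA": 0.6, "cpuB": 0.2}, "ps2": {"cpuA": 0.6, "cpuB": 0.2}}
cost = {"cpuA": 1, "cpuB": 3}
best = explore(["ps1", "ps2"], ["cpuA", "cpuB"], load, cost)
```

In this toy case the slower cpuA cannot host both processes, so the search settles on the faster and more expensive cpuB alone. A candidate found this way would still need to be checked by timing co-simulation.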

HEPSYCODE-RT: PROPOSED EXTENSION
With respect to NF requirements, this work provides an extension that allows the methodology to better consider architectural and timing constraints. With regard to the SBM, it is now possible to identify two classes of CSP processes: classical CSP processes and real-time CSP processes. In the current version, the latter shall be leaf processes and their body (i.e. a never-ending loop) shall start with a channel read and end with a channel write towards the same process. The TTR constraint refers to this input/output pair. Moreover, in such a context, a CSP to Task Model transformation has been defined to allow the use of classical real-time world notations. Such a transformation involves concepts related to both processes and channels.

Figure 4: Time-To-Reaction Constraint.
The general transformation is shown in Figure 4. In this example, the CSP SBM is first expanded into a Process Interaction Model (PIM), where the processes A and B are split into different pieces of code, delimited by channel calls. The final transformation starts from the PIM and associates the single pieces of code with specific tasks in the classical task representation models (i.e. Process-Task Model, PTM). At present, the designer should write an SBM that avoids cycles, so as to match the classical real-time DAG representation of tasks. With respect to the real-time CSP processes, the actual transformation is the one shown in Figure 5. With this specific kind of representation it is possible to consider concurrently timing constraints related to the whole SBM (TTC) and real-time constraints related to the reaction of specific processes (TTR), while treating real-time leaf processes as periodic ones.
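The first part of this transformation, splitting a process body into pieces of code delimited by channel calls, can be sketched as follows. This is a minimal Python illustration under assumptions: the representation of a body as a list of (kind, label) pairs and the task naming scheme are hypothetical, and the derivation of precedence edges between tasks from the channels is left out.

```python
def split_at_channel_calls(process, body):
    """Split a process body into code pieces delimited by channel calls.

    body: list of (kind, label) pairs, where kind is 'read', 'write'
          or 'compute' (hypothetical encoding of the statements).
    Returns a list of (task name, list of statement labels).
    """
    pieces = []
    current = []
    for kind, label in body:
        if kind in ("read", "write"):
            # A channel call closes the piece of code collected so far.
            if current:
                pieces.append(current)
                current = []
        else:
            current.append(label)
    if current:
        pieces.append(current)
    return [(f"{process}_t{i}", stmts) for i, stmts in enumerate(pieces, start=1)]


# Hypothetical body of process A.
body_a = [
    ("compute", "a1"),
    ("write", "ch1"),
    ("compute", "a2"),
    ("compute", "a3"),
    ("read", "ch2"),
    ("compute", "a4"),
]
tasks_a = split_at_channel_calls("A", body_a)
```

Each returned piece corresponds to one candidate task; the channel calls that separated the pieces are where precedence relations between tasks of different processes would be attached.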
With this assumption, it is possible to adapt the Load Estimation step to consider such real-time constraints. In particular, the load can be defined in two different ways.
The first is the Load Li that each non real-time process psi would impose on each non-SPP processor to satisfy TTC. Li is estimated by allocating all the n processes to a single instance of each software processor, where ⁄ [operands missing] is the average period of each process on processor puj. By imposing that the execution time shall be equal to TTC, it is possible to evaluate the Load Li that process psi would impose on the SW processor to satisfy TTC itself. In fact, setting FRTj equal to TTC, for each process/processor pair, such as: [equation missing] The value of the estimated Load Li that the system imposes on processor puj to satisfy TTC is: [equation missing]
The second is the Load Li that each real-time process psi would impose on each software processor to satisfy its input real-time constraint TTRi, which is directly set equal to: [equation missing] TTRi is the real-time constraint related to the process psi. In this way it is possible to consider two different situations: hard real-time process, if ti < TTRi, the constraints are fulfilled and it is possible to consider the value Li as an input to the DSE step; soft real-time process, if ti < (TTRi + δ(t)), then the constraints could be considered as soft real-time ones.
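One plausible reading of the two load definitions above is sketched below in LaTeX. It should be taken as an assumption-laden reconstruction rather than as the original formulas: N_i (the number of activations of ps_i observed while simulating on pu_j) and T_{i,j} (the resulting average period) are symbols introduced here, and t_i is read as the execution time of one activation of ps_i on pu_j. Only L_i, FRT_j, TTC, TTR_i, t_i and δ(t) appear in the text.

```latex
\begin{align}
  T_{i,j} &= \frac{FRT_j}{N_i}
    && \text{(assumed average period of } ps_i \text{ on } pu_j\text{)} \\
  L_i &= \frac{t_i}{T_{i,j}} = \frac{N_i \, t_i}{FRT_j}
    && \text{(load during the simulated run)} \\
  L_i &= \frac{N_i \, t_i}{TTC}
    && \text{(non real-time process, setting } FRT_j = TTC\text{)} \\
  L_i &= \frac{t_i}{TTR_i}
    && \text{(real-time process)}
\end{align}
```

Under this reading, the hard real-time condition ti < TTRi is the same as requiring a real-time load below 100%, while the soft real-time case tolerates a load slightly above 100% within the margin δ(t).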
Then, thanks to all the estimated TTC/TTR loads, it is possible to perform the DSE step so that RT constraints are also fulfilled. Moreover, an additional architectural constraint deriving from TTR is that non-SPP processors executing real-time processes have to adopt a scheduling policy suitable for real-time scheduling (e.g. fixed-priority preemptive scheduling). Finally, the effect of such a scheduling policy shall be considered during the timing co-simulations performed to validate the proposed solutions.
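The hard/soft distinction and the architectural constraint on the scheduling policy can be combined in a small per-processor check. The Python sketch below is illustrative only. It assumes that the load of a real-time process is ti / TTRi, that δ(t) can be approximated by a constant tolerance, that the only accepted policy is the one named in the text, and that a total load above 100% marks the processor as overloaded; none of these choices is confirmed as the framework's actual rule.

```python
# Assumption: only the policy explicitly named in the text is accepted.
RT_POLICIES = {"fixed_priority_preemptive"}


def classify(t, ttr, delta=0.0):
    """Classify a real-time process from its execution time t."""
    if t < ttr:
        return "hard"      # constraint fulfilled
    if t < ttr + delta:
        return "soft"      # fulfilled only within the tolerance
    return "violated"


def validate_processor(policy, hosted, delta=0.0):
    """Return (total load, issues) for one non-SPP processor.

    hosted: list of dicts; real-time processes carry 't' and 'ttr',
            the others carry a precomputed 'load' (hypothetical format).
    """
    total_load = 0.0
    issues = []
    for ps in hosted:
        if "ttr" in ps:
            total_load += ps["t"] / ps["ttr"]  # assumed real-time load
            if policy not in RT_POLICIES:
                issues.append(f"{ps['name']}: policy {policy} is not real-time capable")
            if classify(ps["t"], ps["ttr"], delta) == "violated":
                issues.append(f"{ps['name']}: TTR violated")
        else:
            total_load += ps["load"]
    if total_load > 1.0:
        issues.append("processor overloaded")
    return total_load, issues


# Hypothetical processor hosting one non real-time and one real-time process.
hosted = [{"name": "ps1", "load": 0.5}, {"name": "ps3", "t": 2.0, "ttr": 8.0}]
ok = validate_processor("fixed_priority_preemptive", hosted)
bad = validate_processor("round_robin", hosted)
```

A check of this kind only filters candidate mappings; the effect of the scheduling policy still has to be confirmed by timing co-simulation, as stated above.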

CASE STUDY
This section presents a simple case study used to show the effects of the proposed real-time extension to HEPSYCODE.

Figure 6: CSP MoC Example
The reference SBM is shown in Figure 6, where the processes PS = {ps1, ..., ps4}, with the priorities of {ps1, ps2, ps4} equal to each other and the priority of ps3 higher than the others, exchange data using the channels CH = {ch1, ..., ch7}. In this scenario there are three non real-time processes {ps1, ps2, ps4} and one process {ps3} with a real-time constraint equal to TTR3. The whole SBM is also subject to a TTC. So, for a given processor puj, the load parameters for the four processes are: [equation missing]
At most one instance of each BB is allowed, and the BB are supposed to communicate by means of a shared bus. Moreover, each SW-PU uses a fixed-priority preemptive scheduling algorithm. The results in Table 1 show that the DSE step with the real-time extension is able to satisfy TTC/TTR constraints, at least with respect to timing simulations. In particular, by setting TTR and decreasing TTC, the DSE suggests solutions that fulfill the timing requirements most of the time (the two unsatisfactory suggestions are underlined in Table 1). When TTR is decreased, the DSE suggests allocating the real-time process on a puj that fulfills the constraints. It is worth noting that, if the TTR is very strict, the only valid mapping involves the use of a more expensive FPGA.

CONCLUSIONS
This work has proposed an extended Electronic Design Automation (EDA) methodology (and related tools) in the ESL domain supporting the development of real-time embedded systems. The final result is a methodology able to support real-time systems development by suggesting both the platform and the mapping solutions for the specific application. Future work will involve the introduction of other parameters associated with PU, such as Power (peak power [W] or other metrics) and Energy. Further analyses, use cases and tests will be carried out in the future, but these preliminary results already indicate that the DSE step with load estimation and real-time extension is quite effective with respect to execution times estimated by simulation. Validation on the final HW/SW implementation remains to be done in order to reduce errors at design time.