# PERFORMANT AND FLEXIBLE ON-BOARD PROCESSING MODULES USING RECONFIGURABLE FPGAS

Björn Fiethe<sup>1</sup>, Lei Jia<sup>1</sup>, Harald Michalik<sup>1</sup>, Jamin Naghmouchi<sup>1,2</sup>

<sup>1</sup>IDA, TU Braunschweig, Hans-Sommer-Str. 66, 38106 Braunschweig, Germany <sup>2</sup>iTUBS mbH, Wilhelmsgarten 3, 38100 Braunschweig, Germany *E-mail: fiethe@ida.ing.tu-bs.de* 

### ABSTRACT

Current and future space missions demand sophisticated on-board data processing functionalities, while low resource consumption remains a constraint. Using in-flight dynamically reconfigurable FPGAs allows enhancement of on-board processing with unprecedented levels of flexibility, enabling the adaptation of the system to functional and fault-tolerance requirements, which are subject to change during mission lifetime. After having demonstrated the usage of in-flight reconfigurability for SRAM-based FPGAs for the PHI instrument on Solar Orbiter (SO/PHI), we have developed a universal module for high-performance on-board data processing, based on the cPCI Serial Space standard and a state-of-the-art Xilinx Zynq UltraScale+ MPSoC device on a single board (3U) for maximum flexibility. For improved reliability against SEEs, additional measures have been implemented, including a dedicated DDR3 memory error correction. This module is being used within the H2020 project S4Pro, which investigates how to combine state-of-the-art industrial computing technologies and space-qualified embedded computing platforms in order to optimize the data processing chain and support the next generation of data-intensive missions. A derivative version based on the Xilinx XQRKU060 space-grade FPGA provides dedicated connections for an external rad-hard reconfiguration engine, supporting different types of reconfiguration and scrubbing.

#### 1. INTRODUCTION

Current and future low-Earth orbit (LEO) space missions, such as Earth observation and 5G satellite communication, produce large amounts of data that need to be transferred, processed, and further downlinked. The same is true for deep space scientific missions, which suffer from a very limited telemetry data rate. Both types of missions demand sophisticated on-board data processing functionalities, while low resource consumption remains a constraint. Thus, in-flight reconfigurable architectures are mandatory. Using dynamically reconfigurable radiation-tolerant FPGAs allows enhancement of on-board processing with unprecedented levels of operational flexibility. Dynamic reconfiguration during flight enables the adaptation of the system to functional and fault-tolerance requirements, improving both performance and maintenance. This is necessary to handle very high data rates, to extract and process final physical values with an autonomous, intelligent, and reliable application already on board the spacecraft, and to adapt to changing mission needs.

The benefits of such an adaptable processing platform are a superior data yield and a reduced risk of a total instrument loss. An additional advantage of adaptability is the possibility to time-share resources for more efficient hardware and power utilization when dedicated functions are not needed at the same time. Compared to general-purpose processors (GPPs), FPGAs offer high processing performance combined with low power consumption. FPGAs combine the performance of a hardware implementation with the flexibility of a software realisation.

#### 2. FIRST SCIENTIFIC MISSIONS

We have already demonstrated the successful usage of SRAM-based FPGA devices for scientific instruments, e.g. with the Venus Monitoring Camera (VMC) on the Venus Express mission, launched in 2005. VMC was operational for more than 7 years with only a small number of resets due to Single Event Effects (SEEs), even fewer than expected. However, the reconfigurability was only used during the development phase on ground, and no support for in-flight reconfiguration was built in.

For instrument control and data processing of the PHI instrument on the Solar Orbiter mission (SO/PHI), we have partially adapted results of the ESA study for a Dynamically Reconfigurable Processing Module (DRPM) and implemented a flexible, in-flight reconfigurable, power-efficient, and radiation-tolerant processing module based on Xilinx Virtex-4 SRAM-based FPGAs. Additionally, the module is equipped with a rad-hard processor and a one-time programmable system supervisor FPGA. A detailed description of the PHI DPU architecture is given in [1],[2]. The very regular (e.g. every minute) reconfiguration of the FPGAs according to current processing needs allows a very flexible use of the available resources in a time-space partitioning (TSP) manner. As a drawback, the individual designs are highly specialised and restricted in terms of flexibility. To overcome these disadvantages, we have also implemented a flexible processing pipeline for FPGAs that are regularly reconfigured in flight [3].

With these devices an advanced System-on-Chip (SoC) like system can be built. However, they are susceptible to radiation effects, and system reliability and qualification have to be guaranteed in the harsh space environment. Therefore, different mitigation techniques against SEEs, such as configuration memory scrubbing, have to be implemented. These require additional effort and may reduce the FPGA resources available for implementation of the actual processing cores.

While the Xilinx Virtex-4 and Virtex-5 have been used for many current space missions and are still available (at least Virtex-5), they do not provide sufficient logic resources and embedded memory for future usage. Instead, the Kintex UltraScale XQRKU060 has become the state-of-the-art rad-tolerant FPGA. As the Xilinx Zynq UltraScale+ shows, the integration of dedicated hard-wired processor cores is already in use for military and commercial space applications.

### 3. UNIVERSAL PROCESSING MODULE

After having demonstrated the usage of in-flight reconfigurability for SRAM-based FPGAs on SO/PHI, we have developed a universal module for high-performance on-board data processing, based on the cPCI Serial Space standard and a state-of-the-art Xilinx Zynq UltraScale+ MPSoC device on a single board (3U). To ensure the needed flexibility, the system builds on the cPCI Serial Space standard, whose fully standardized backplane guarantees the modular extensibility of the system.

The processing module is designed to perform data-intensive applications, e.g. SAR image processing, with a throughput of multiple tens of gigabits per second. It is equipped with a Zynq UltraScale+ XCZU17EG device, which combines a so-called processing system (PS) with classical programmable logic (PL). The PS includes two different processor units, the Application Processing Unit (APU, Quad ARM Cortex-A53) and the Real-Time Processing Unit (RPU, ARM), and thus allows processing of even very complex algorithms. The PL provides ample resources for implementing application-specific logic and interfaces. Generally, tightly coupled Processor-FPGA (MP)SoC systems benefit from high-bandwidth interfaces between the hard-wired processor cores and the FPGA fabric. This allows efficient acceleration of data-intensive computations by parallelizing the execution in the programmable logic. Dynamic partial reconfiguration can place hardware tasks in the programmable logic at appropriate times. However, the limited resources in the FPGA require efficient scheduling: it needs to be decided when and where to execute a task. The scheduling proposed in [4] needs to satisfy real-time requirements and guarantee latency or throughput within tight constraints on power and energy, which are inherent to space missions.

Within the German DFG research group "Controlling Concurrent Change (CCC)" (http://ccc-project.org/) we have developed methods and architectures for a high degree of autonomy of such embedded systems, supporting run-time platforms and adapted system functionality (time-space partitioned FPGA) under demanding requirements on real-time capability, safety, availability, and security, as found e.g. in space missions. The implemented demonstrator [5] addresses three typical scenarios, which cover application, environment, and platform change. This included improved availability by adapting the level of reliability to changing environmental conditions, i.e. Single Event Effects (SEEs). Redundancy such as TMR could be configured as needed, reducing processing performance but preventing malfunction in critical mission phases.

#### 3.1. System Overview

The complete architecture of the S4Pro processing board is depicted in Fig. 1. The board includes two banks of ECC-protected 64 Gibit DDR3-SDRAM memory to provide sufficient buffer and working memory capacity for high performance computing. For FPGA configuration and storage of application software, three identical 2 Gibit NOR flash memory devices are used in Triple Modular Redundancy (TMR) configuration, which improves reliability against SEUs or SEFIs. A correct boot process is crucial for the whole system.
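The TMR principle used for the three flash devices can be illustrated with a bitwise two-out-of-three vote. The following is a minimal sketch only: it assumes the three copies are read as equally long byte strings, and the function name and sample data are hypothetical and not taken from the board design.

```python
def majority_vote(a: bytes, b: bytes, c: bytes) -> bytes:
    """Bitwise 2-of-3 majority over three equally long byte strings."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("all three copies must have the same length")
    # A result bit is 1 exactly when at least two of the three input bits are 1.
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


# Hypothetical example data: one bit flipped in copy B and another in copy C.
golden = bytes([0xDE, 0xAD, 0xBE, 0xEF])
copy_a = bytes([0xDE, 0xAD, 0xBE, 0xEF])
copy_b = bytes([0xDE, 0xA5, 0xBE, 0xEF])  # bit 3 of byte 1 flipped
copy_c = bytes([0xDE, 0xAD, 0xBE, 0x6F])  # bit 7 of byte 3 flipped
voted = majority_vote(copy_a, copy_b, copy_c)
```

As long as upsets in different copies do not hit the same bit position, the voted result equals the original content; two copies corrupted at the same bit would defeat the vote.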

For external connections, various types of interfaces are available. The high-speed interface consists of an 8-lane PCIe 2.0 block with a maximum data throughput of up to 32 Gbit/s per direction and an optional pre-processing block, e.g. for SAR image alignment. Additionally, a SpaceWire interface is available for general-purpose lower-speed data. The configuration & housekeeping interface consists of a Gigabit Ethernet controller (GEM) and PHY, which implement a tri-mode Ethernet interface supporting data rates of 10, 100, and 1000 Mbit/s.
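The quoted 32 Gbit/s can be reproduced from the PCIe 2.0 line rate of 5 GT/s per lane and its 8b/10b line encoding. The short sketch below shows only this arithmetic; the function and parameter names are illustrative, and protocol overhead above the line encoding (packet headers, flow control) is deliberately ignored.

```python
def pcie_rate_gbit(lanes: int, gt_per_s: float, payload_bits: int, coded_bits: int) -> float:
    """Per-direction data rate in Gbit/s after line encoding, before packet overhead."""
    return lanes * gt_per_s * payload_bits / coded_bits


# PCIe 2.0: 5 GT/s per lane, 8b/10b encoding (8 data bits per 10 line bits).
pcie2_x8 = pcie_rate_gbit(lanes=8, gt_per_s=5.0, payload_bits=8, coded_bits=10)
print(f"PCIe 2.0 x8: {pcie2_x8:.1f} Gbit/s per direction")
```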



*Figure 1. S4Pro Processor Board System Architecture Overview*

The power supply consists of a set of high-accuracy DC-DC power modules and measurement & control logic for all lower voltages (< 5 V) and correct power sequencing. It is designed to provide a maximum power of 15 W and to detect latch-up effects by monitoring the individual power modules for overcurrent. Once a latch-up occurs, the control logic will reset the power supplies and subsequently the whole board.
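The latch-up reaction described above amounts to a per-rail threshold comparison followed by a board-level power cycle. The sketch below illustrates that decision logic only. The rail names, current limits, and measured values are hypothetical placeholders, not values from the S4Pro board, and on the actual module this function is performed by the measurement & control logic rather than by software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rail:
    name: str       # hypothetical rail label
    limit_a: float  # hypothetical overcurrent threshold in ampere


def supervise(rails, measured_a):
    """Return ("POWER_CYCLE", [names of rails over limit]) or ("OK", [])."""
    tripped = [r.name for r in rails if measured_a[r.name] > r.limit_a]
    return ("POWER_CYCLE", tripped) if tripped else ("OK", [])


RAILS = [Rail("rail_0v85", 6.0), Rail("rail_1v8", 1.5), Rail("rail_3v3", 1.0)]

nominal = supervise(RAILS, {"rail_0v85": 4.2, "rail_1v8": 0.6, "rail_3v3": 0.4})
latchup = supervise(RAILS, {"rail_0v85": 4.2, "rail_1v8": 2.9, "rail_3v3": 0.4})
```

A practical design would additionally filter short inrush peaks, for instance by requiring the overcurrent to persist for a minimum time, which is omitted here.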

### 3.2. FPGA Configuration Scrubbing

Configuration scrubbing is the process of detecting and correcting upsets in the volatile configuration memory of an SRAM-based FPGA, thus preserving the integrity of the FPGA configuration. Configuration scrubbing on state-of-the-art FPGAs does not interrupt normal FPGA functionality and may operate continuously if needed. Scrubbing has to be used to prevent the accumulation of SEUs in the configuration memory of FPGAs. Approaches range from fixed periodic device reconfiguration at dedicated time steps up to transparently checking and rewriting individual frames in the background throughout FPGA operation.
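The frame-wise variant can be pictured as a loop that reads back each configuration frame, compares it with a reference, and rewrites only the frames that differ. The following is a simplified software model under stated assumptions: frames are modelled as plain integers and a golden copy is assumed to be available for comparison. It is not a description of how any particular scrubber detects errors (internal scrubbers typically rely on per-frame ECC and CRC instead of a golden copy), and all names are hypothetical.

```python
def scrub_pass(config_frames, golden_frames):
    """Rewrite every frame that differs from its golden copy; return repaired indices."""
    repaired = []
    for idx, (live, ref) in enumerate(zip(config_frames, golden_frames)):
        if live != ref:
            config_frames[idx] = ref  # rewrite only the corrupted frame
            repaired.append(idx)
    return repaired


# Hypothetical 16-frame configuration memory with two injected single-bit upsets.
golden_frames = [i * 0x01010101 for i in range(16)]
config_frames = list(golden_frames)
config_frames[3] ^= 1 << 7
config_frames[11] ^= 1 << 20

repaired = scrub_pass(config_frames, golden_frames)
print("repaired frames:", repaired)
```

Running such a pass periodically bounds the time an upset can remain in the configuration memory, which is what prevents SEUs from accumulating.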

For this industrial implementation, we have implemented configuration readback and configuration scrubbing completely internal to the FPGA by using the Xilinx Soft Error Mitigation (SEM) core. The main advantages of internal scrubbing are the high speed of single-error detection and correction and the absence of any need for external resources. However, the ICAP and the internal scrubber logic are themselves susceptible to radiation, so internal scrubbers may be less reliable than external ones. Another problem that needs to be taken into account is the start-up latency of the SEM controller.

For supervision of the functionality of the XCZU17EG device, a rad-hard supervisory circuit is implemented as a last resort. This watchdog performs a complete power cycle once the device gets stuck.

#### 3.3. Memory EDAC for SEE Mitigation

The EDAC for the PL buffer memory (DDR3 SDRAM) is based on the ESA study for the Next Generation Mass Memory Architecture (NGMMA). It uses a Reed-Solomon (RS) code with an RS(12, 8) scheme, which is capable of detecting and correcting up to two symbol errors (8 bit/symbol) and thus can tolerate the complete loss of a memory device. To offer a gross capacity of 64+32 Gibit, 12 DDR3 SDRAM devices in two groups are needed, each providing 8 Gibit at a data word width of 16 bit.
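The figures in this paragraph are mutually consistent, which the following sketch checks step by step. It uses only the numbers given in the text plus one assumption that is not stated there: that each 16-bit device contributes exactly two 8-bit symbols to a code word, which is what makes the loss of a whole device correctable.

```python
# RS(n, k) with 8-bit symbols, as given in the text.
n, k, symbol_bits = 12, 8, 8
t = (n - k) // 2                      # correctable symbol errors per code word

# Memory organisation, as given in the text.
devices, device_gibit, device_width = 12, 8, 16
gross_gibit = devices * device_gibit  # total installed capacity
net_gibit = gross_gibit * k // n      # share carrying user data
parity_gibit = gross_gibit - net_gibit

# Assumption: one device supplies device_width / symbol_bits symbols per code word.
symbols_per_device = device_width // symbol_bits
tolerates_device_loss = symbols_per_device <= t

print(t, gross_gibit, net_gibit, parity_gibit, tolerates_device_loss)
```

Under this assumption a group of six 16-bit devices delivers 96 bits, i.e. one complete 12-symbol code word, per access.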

The Hamming-code-based EDAC block within the hard-wired Xilinx PS DDR3 controller is capable of double-bit error detection but only single-bit error correction. Eight check bits are needed in addition to the 64-bit data word width. Correspondingly, the 64+8 Gibit working memory consists of 10 of the same DDR3 SDRAM devices in two groups.
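The eight check bits match what a standard extended Hamming (SECDED) code needs for a 64-bit word. The sketch below derives that number and the resulting device count. It assumes the usual construction of r Hamming bits with 2^r >= m + r + 1 plus one overall parity bit, and a 72-bit wide interface per group; neither detail is stated explicitly in the text, and the function name is illustrative.

```python
import math


def secded_check_bits(data_bits: int) -> int:
    """Check bits of an extended Hamming code: r Hamming bits plus one overall parity bit."""
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


check_bits = secded_check_bits(64)
bus_width = 64 + check_bits

# With 16-bit wide devices, a 72-bit bus needs five devices per group.
devices_per_group = math.ceil(bus_width / 16)
total_devices = 2 * devices_per_group

print(check_bits, bus_width, devices_per_group, total_devices)
```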

To mitigate SEUs, periodic memory scrubbing is recommended for both memory areas. Additionally, so-called software conditioning should be performed to handle device SEFIs without data loss.



*Figure 2. S4Pro Processor Board (100x160mm<sup>2</sup>, 3U)* 

### 3.4. Data Rate

In order to meet the requirements of the S4Pro project, especially a throughput of up to 20 Gbit/s, the local buffer memory has a 64-bit net data bus and is operated at a frequency of 400 MHz. Thus, its memory interface can operate that memory at a burst data rate of 51.2 Gbit/s, which fulfills the throughput requirement while leaving margin for background tasks such as memory refresh and memory scrubbing. To cope with the high-performance computing in the APU, the working memory also implements a 64-bit data width and is clocked at 600 MHz, providing a burst data rate of 76.8 Gbit/s. The data exchange between PS and PL is carried out via AXI interconnects. By using two AXI interconnects, a burst data rate of 76.8 Gbit/s can be achieved, assuming that the interconnects are operated at a frequency of 150 MHz.
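The burst rates follow from the bus width, the clock frequency, and the two transfers per clock cycle of double-data-rate memory. The sketch below reproduces the quoted numbers; the function name is illustrative. The last line is an inference rather than a statement from the text: it only shows how many bits per clock cycle the two AXI interconnects must move in total to reach 76.8 Gbit/s at 150 MHz, without assuming how that is split between ports or read and write channels.

```python
def ddr_burst_gbit(bus_bits: int, clock_mhz: float) -> float:
    """Burst data rate in Gbit/s of a double-data-rate bus (two transfers per clock)."""
    return bus_bits * clock_mhz * 2 / 1000


buffer_rate = ddr_burst_gbit(64, 400)   # PL buffer memory
working_rate = ddr_burst_gbit(64, 600)  # PS working memory
margin = buffer_rate / 20               # ratio to the 20 Gbit/s requirement

# Inferred: total bits per AXI clock cycle needed for 76.8 Gbit/s at 150 MHz.
axi_bits_per_cycle = 76.8 * 1000 / 150

print(buffer_rate, working_rate, round(margin, 2), axi_bits_per_cycle)
```

The buffer memory thus offers roughly 2.5 times the required throughput as burst rate, which is the headroom available for refresh, scrubbing, and concurrent read and write traffic.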

#### 4. S4PRO

The processing module is being used within the H2020 project S4Pro (Smart and Scalable Satellite High-Speed Processing chain, https://www.s4pro-h2020.eu/), which investigates how to combine state-of-the-art industrial computing technologies (namely Xilinx Zynq UltraScale+) and space-qualified embedded computing platforms (e.g. Gaisler GR740) in order to optimize the data processing chain and support the next generation of data-intensive missions [6]. The approach targets not only the enhancement of technology transfer to nano- and small satellites, but also the enabling of institutional satellite missions that rely on operational tasks with very high bandwidth, processing, and storage requirements. Hence, the Zynq UltraScale+ is used for on-board data elaboration, e.g. for SAR and multispectral imaging applications, due to its strong interfacing capabilities and integrated software processing units based on the ARM A53 core architecture.

The S4Pro system hardware is designed around the cPCI Serial Space standard, with each module connected to a cPCI-SS backplane for inter-module communication and power distribution, see Fig. 3. This delivers a flexible solution for payload processing, on-board computing, data storage, and downlink subsystems. The backplane routes power and data between the S4Pro modules, with secondary power provision performed on each module individually. The standardized cPCI-SS backplane will allow multiple hardware vendors to collaborate on a single mission while reducing design risk around the critical interfaces. It will also be scalable, able to accommodate large and small missions with minimal changes to the hardware design.



*Figure 3. S4Pro cPCI-SS Architecture*

The nominal design avoids any single point of failure by installing redundant modules. Additionally, the interconnect standard provides dual-redundant PCIe and Ethernet connections between modules.

#### 5. ENHANCED MODULES

#### 5.1. Optional Mass Memory and Video Codec

Alternatively, the processing module can be built using an EV-MPSoC FPGA to include a video codec for compression and recording of data streams from video cameras. The implemented module is able to command four video cameras and receive their image data. The video data streams, at 500 Mbit/s per camera, are pre-processed, subsequently compressed according to the H.265 standard, and transferred to the host computer via Ethernet. Optionally, the module can be expanded to control a NAND flash mass memory of 256 GiByte user capacity. This allows raw data of all four cameras to be stored at about 12 frames per second. Again, complete Reed-Solomon single-symbol error correction with a symbol width of 8 bits is foreseen, which requires an additional 25 percent overhead in memory devices.
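The aggregate camera input rate and the flash capacity to be installed follow directly from these figures, as the sketch below shows. The RS(10, 8) code in the last lines is an assumption, not a parameter from the text: it is simply one code with 8-bit symbols that corrects a single symbol error and has exactly 25 percent overhead.

```python
# Aggregate input rate of the four camera streams, as given in the text.
cameras = 4
stream_mbit_s = 500
aggregate_mbit_s = cameras * stream_mbit_s

# Installed flash capacity for 256 GiByte of user data with 25 percent EDAC overhead.
user_gibyte = 256
overhead = 0.25
gross_gibyte = user_gibyte * (1 + overhead)

# Assumed example code: RS(10, 8) corrects (10 - 8) // 2 = 1 symbol per code word.
n, k = 10, 8
t = (n - k) // 2
code_overhead = (n - k) / k

print(aggregate_mbit_s, gross_gibyte, t, code_overhead)
```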

### 5.2. Rad-Hard Version

Currently, we are implementing a derivative with similar cPCI Serial Space interfaces, based on the Xilinx XQRKU060 space-grade FPGA. To support different types of reconfiguration and scrubbing, a dedicated system controller connector is available for attaching an external rad-hard reconfiguration engine. This can also be used to perform fault injection into the FPGA and thus exercise FDIR aspects. Additionally, all provisions for potential future radiation tests, such as current sense resistors, are already included.

Again, high-performance Reed-Solomon (RS) code EDAC is applied for the external buffer memory with an RS(12, 8) scheme, which is capable of detecting and correcting up to two symbol errors (8 bit/symbol) and thus can tolerate the complete loss of a memory device. For configuration of the FPGA, rad-tolerant QSPI TMR NOR flash modules from 3D-Plus are envisaged. These embed triplicated NOR flash memories and thus implement a majority-voting system, ensuring complete SEU immunity. All internal voltages of the processing module have to be generated locally, and power management is needed. A radiation-hardened and qualified power solution based on Intersil POL parts is proposed, some of which have already been used for the PHI DPU.

This derivative implementation is one option in the pre-development study for the Lagrange PMI instrument DPU as a follow-on to SO/PHI. Moreover, it will be used as a demonstrator option within the ESA study 'Strategies for reliable on-board reconfiguration of FPGAs'.

#### 6. CONCLUSION

For high-performance on-board payload processing in the S4Pro project, we have developed and implemented a flexible processing module on a single 3U board using state-of-the-art reconfigurable FPGAs, namely the Xilinx Zynq UltraScale+ MPSoC device. This benefits from high-bandwidth interfaces between the hard-wired processor cores and the FPGA fabric and provides a data processing throughput of tens of Gbit/s. The S4Pro system hardware relies on the cPCI Serial Space standard for maximum flexibility. For improved reliability against SEEs, additional measures have been implemented, including a dedicated DDR3 memory error correction. An enhanced processing module includes a video codec MPSoC device and optional mass memory. A derivative version based on the Xilinx XQRKU060 space-grade FPGA provides dedicated connections for an external rad-hard reconfiguration engine, supporting different types of reconfiguration and scrubbing.

### ACKNOWLEDGEMENTS

The processing module was partly funded by the EU's Horizon 2020 research and innovation programme under grant agreement No 822014.

### REFERENCES

- [1] B. Fiethe, F. Bubenhagen, T. Lange, et al., "Adaptive hardware by dynamic reconfiguration for the Solar Orbiter PHI instrument," in Adaptive Hardware and Systems (AHS), 2012 NASA/ESA Conference on. IEEE, 2012, pp. 31–37.
- [2] T. Lange, B. Fiethe, H. Michel, et al., "Onboard processing using reconfigurable hardware on the Solar Orbiter PHI instrument," in 2017 NASA/ESA Conference on Adaptive Hardware and Systems (AHS), July 2017, pp. 186–191.
- [3] T. Lange, B. Fiethe, H. Michalik, et al., "Evaluation of a Hardware Accelerated Onboard Processing Pipeline for Solar Orbiter's PHI Instrument," in Eurospace DASIA 2018, 2018.
- [4] A. Dörflinger, M. Albers, P. Keldenich, et al., "Hardware and Software Task Scheduling for ARM-FPGA Platforms," in NASA/ESA Conference on Adaptive Hardware and Systems (AHS), Edinburgh, UK, August 2018.
- [5] A. Dörflinger, M. Albers, B. Fiethe, et al., "Demonstrating Controlled Change for Autonomous Space Vehicles," in NASA/ESA Conference on Adaptive Hardware and Systems (AHS), 2019.
- [6] F. Queiroz de Almeida, J. Naghmouchi, et al., "S4Pro: Prototype Implementation of Staggered SAR On-Board Processing," in Advanced Remote Sensing Instruments (ARSI), 2019.