

# HPCB **High Performance Compute Board** A Fault-Tolerant Module for On-board Vision Processing

**Cobham Gaisler AB** Joaquín España Navarro 2021-06-17 EUROPEAN WORKSHOP ON ON-BOARD DATA PROCESSING 2021



- Introduction
- Hardware Architecture
- FPGA VHDL design
- Microcontroller software
- VPU software
- Conclusion





#### **Introduction** Main objective

- ESA FPGA Accelerated DSP Payload Data Processor Board activity
- Develop a high-performance platform leveraging the AI techniques from the commercial domain to enable in-situ research in space
- Higher processing capabilities allow processing before downlink, reducing bandwidth requirements
- Desired features:
  - Capable of handling very high bit-rates
  - Interface multiple instruments simultaneously
  - Scalable, reconfigurable
  - System-level fault mitigation
  - On-board image processing
  - Data compression




#### Introduction Consortium

- Prime
  - Cobham Gaisler AB, Göteborg, Sweden
- Subcontractors
  - Ubotica Technologies Limited, Ireland
  - National and Kapodistrian University of Athens, Greece
  - QinetiQ Space NV, Belgium
- External services
  - Pender Electronic Design, Switzerland


















## **Hardware Architecture**

Platform overview

- Payload Module
  - 6U by 160 mm
  - OpenVPX (VITA 65)
- Three main components:
  - Kintex Ultrascale XCKU060 FPGA
  - CG GR716 microcontroller
  - Intel Movidius Myriad 2 VPU
- Carrier board + 3 FMC cards
  - VITA 57.1 standard
- Main interfaces:
  - SpFi and SpW for control/data
  - SPI and CIF/LCD for Myriad
  - Debug interfaces









- New FPGA board: GR-VPX-XCKU060
  - Xilinx XCKU060, in 1517 pin FCBGA package
  - GR716B (initially with GR716A)
  - SODIMM DDR3 up to 8 GiB
  - SPI flash for FPGA configuration (512 Mbit), for GR716 boot (256 Mbit), and for data (256 Mbit)
  - Power, Reset, Clock and Auxiliary circuits
  - Three FMC Mezzanine expansion connectors
  - Scrubbing interface for FPGA
  - Backplane I/F: SpaceWire (control), SpaceFibre (data), VPX utility management
  - Front panel I/F and drivers: 4x SpaceFibre, 2x SpaceWire, USB/FTDI UART/JTAG Links, USB I/F to FMC
  - OpenVPX compatible, 6U format, Payload profile



www.gaisler.com/index.php/products/boards/gr-vpx-xcku060

#### **Hardware Architecture** GR-HPCB-FMC-M2

- New Mezzanine board: GR-HPCB-FMC-M2
  - Intel Myriad2 MA2450 (initially)
  - SPI flash 256 Mbit for optional boot mode
  - Local supply and power sequencing
  - Latch-up protection evaluation circuitry
  - The mezzanine board is intended for prototyping only






The GR-VPX-XCKU060 carrier board and the GR-HPCB-FMC-M2 mezzanine board are developed as part of the High-Performance Compute Board of



## **Hardware Architecture**

#### GR716B microcontroller



- LEON3FT Fault-tolerant SPARC V8 32-bit processor, **100** MHz
  - LEON-REX extension with 16-bit instructions: improved code density
  - **Pipelined** Floating Point Unit
  - Memory protection units
  - Non-intrusive advanced on-chip debug support unit
  - Determinism: Multi-bus, fixed interrupt latency, cache-less architecture...
- External EDAC memory: 8-bit PROM/SRAM, SPI (4 Byte)
- Hardware FPGA programming and scrubber
- 2-Port SpaceWire Router with time distribution support, 200 Mbps
- MIL-STD-1553B interface
- 2x CAN-FD controller interface with CANopen support for remote boot
- PacketWire with CRC acceleration support
- Programmable Enhanced PWM interface with Digital voltage control loop support
- SPI with SPI-for-Space protocols
- 10/100 Ethernet, UARTs, I2C, GPIO, Timers with Watchdog
- Programmable Enhanced DMA, Interrupt controller, Status registers, JTAG debug, etc.
- 4x ADC, 13-bit resolution @ 500 ksps, 4 differential or 8 single-ended channels
- DAC, 12-bit @ 3 Msps, 4 channels
- LVDS with ColdSpare and Fail-Safe support, Mixed GPIOs
- Power-on-Reset and Brown-out-detection
- 12x analogue comparators, Temperature sensor, Integrated PLL
- On-chip regulator for 3.3V single supply
- 132 pin QFP, 24 mm x 24 mm






## **Hardware Architecture**

Myriad 2 VPU

- Myriad 2 VPU architecture
  - 28nm ultra-low power (0.5W@600MHz) with power islands
  - Heterogeneous SoC: 2 LEON4@fp64 + 12 Shaves@fp32
  - 256+32KB LRAM, 2MB CMX, DDR3 support, DMAs
  - Power efficiency of 2Tops/W (max 16-bit equivalent)
- Applications
  - Visual analytics / indoor navigation
  - 360° panoramic video
  - Computational photography 3D modeling
  - Immersive gaming, augmented reality
  - DJI Phantom 4 / Mavic Pro Drones
  - FLIR Thermal Imaging
  - Google IoT
  - Microsoft Windows 10 Devices
  - Neural Compute AI Stick











Figure: Myriad 2 block diagram. Recoverable labels: DDR controller; 2 MB CMX SRAM; 12 SHAVEs with VRF 32x128 and IRF 32x32 register files; 256 kB 2-way L2 cache (SHAVE); two LEON4 RISC cores (RISC1, RISC2) with their own I-caches, D-caches and L2 caches; 128-bit AHB and 32-bit APB buses; 128/256 MB LPDDR2/3 stacked die.





## **FPGA VHDL design**

Overview



- High capacity XCKU060-FFVA1517C
  - Potential path to flight by upgrading to the rad-tolerant XQRKU060
- HPCB flow manager and VPU handling:
  - Implements controllers for the interfaces of the board: SpW, SpFi, SPI, I<sup>2</sup>C, CIF/LCD
  - Buffer configuration and application data in SDRAM
  - Control Myriad boot process and configuration
  - Communication with the System Controller
  - Support for non-redundant (SINGLE) and redundant modes (DMR, TMR)
  - Hardware acceleration
    - CCSDS 123.0 data compression
    - Temporal binning
- IPs from ESA portfolio + custom IPs
  - LEON2FT package, SpFi, SpW with RMAP, ShyLoC
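Temporal binning, listed above as a hardware-acceleration target, can be pictured as combining groups of consecutive frames pixel by pixel. The following is a minimal Python sketch of that idea only. It assumes binning by summation, frames stored as flat lists of integers, and a hypothetical `bin_size` parameter; none of these details come from the HPCB design, and the sketch says nothing about how the FPGA block is implemented.

```python
from typing import Iterable, Iterator, List


def temporal_binning(frames: Iterable[List[int]], bin_size: int) -> Iterator[List[int]]:
    """Sum each group of `bin_size` consecutive frames pixel by pixel.

    Assumptions (not from the slides): every frame is a flat list of the
    same length, and a trailing group with fewer than `bin_size` frames
    is dropped.
    """
    if bin_size < 1:
        raise ValueError("bin_size must be >= 1")
    accumulator = None
    count = 0
    for frame in frames:
        if accumulator is None:
            accumulator = list(frame)  # first frame of a new bin
        else:
            if len(frame) != len(accumulator):
                raise ValueError("all frames must have the same size")
            accumulator = [a + p for a, p in zip(accumulator, frame)]
        count += 1
        if count == bin_size:
            yield accumulator
            accumulator, count = None, 0


# Four 3-pixel frames binned in pairs give two output frames.
frames = [[1, 2, 3], [10, 20, 30], [100, 200, 300], [1, 1, 1]]
binned = list(temporal_binning(frames, bin_size=2))
print(binned)
```

Binning by a factor of N reduces the output frame rate, and hence the data volume to compress or downlink, by the same factor.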





## **FPGA VHDL design**

VHDL block diagram





#### **FPGA VHDL design** Preliminary figures

- Design currently being finalized
- AMBA clock of 50 MHz
- 32-bit central AMBA bus, 128-bit memory bus
- SpaceFibre links running at 3.125 Gbps
  - Support for switching bit-rate automatically by accessing the ports of the GTH transceiver
- SpaceWire links running at 100 Mbps
- FPGA Myriad SPI link running at 5 Mbps
- CIF/LCD interfaces running at 50 MHz, 16bpp
- Utilization: 23% LUTs  $\rightarrow$  TMR possible
  - Before adding hardware accelerators
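The link figures above can be put side by side with a little arithmetic. The sketch below is a rough upper-bound estimate, not a measured result. It relies on SpaceFibre using 8b/10b line coding, and it assumes that the CIF/LCD interface moves one 16-bit pixel per clock with no blanking. Framing and protocol overheads are ignored, and all names are illustrative.

```python
# Figures quoted on the slide.
SPFI_LINE_RATE_BPS = 3_125_000_000  # SpaceFibre line rate
CIF_CLOCK_HZ = 50_000_000           # CIF/LCD clock
CIF_BITS_PER_PIXEL = 16


def payload_rate_8b10b(line_rate_bps: int) -> int:
    """8b/10b coding carries 8 data bits in every 10 line bits."""
    return line_rate_bps * 8 // 10


def cif_peak_rate(clock_hz: int, bits_per_pixel: int) -> int:
    """Assumption: one pixel per clock cycle and no blanking intervals."""
    return clock_hz * bits_per_pixel


spfi_payload = payload_rate_8b10b(SPFI_LINE_RATE_BPS)
cif_peak = cif_peak_rate(CIF_CLOCK_HZ, CIF_BITS_PER_PIXEL)
streams_per_link = spfi_payload // cif_peak

print(f"SpaceFibre data rate after 8b/10b: {spfi_payload / 1e9:.2f} Gbit/s")
print(f"CIF/LCD peak rate:                 {cif_peak / 1e6:.0f} Mbit/s")
print(f"Peak CIF streams per SpFi link:    {streams_per_link}")
```

Under these assumptions one SpaceFibre link at 3.125 Gbps has room for the peak rate of three CIF/LCD streams, before any protocol overhead is counted.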

| Resource | Utilization | Available | Utilization % |
|----------|-------------|-----------|---------------|
| LUT      | 76026       | 331680    | 22.92         |
| LUTRAM   | 2952        | 146880    | 2.01          |
| FF       | 64298       | 663360    | 9.69          |
| BRAM     | 77          | 1080      | 7.13          |
| DSP      | 25          | 2760      | 0.91          |
| IO       | 438         | 624       | 70.19         |
| GT       | 5           | 32        | 15.63         |
| BUFG     | 33          | 624       | 5.29          |
| MMCM     | 5           | 12        | 41.67         |
| PLL      | 3           | 24        | 12.50         |
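The "TMR possible" remark can be sanity-checked from the table with a deliberately naive estimate: triplicate every logic resource and see whether it still fits. The sketch below assumes that TMR costs exactly three times the current usage, which ignores voter logic and any selective triplication, and it leaves out IO and clocking resources because they would not necessarily be triplicated. It is an illustration of the headroom argument, not a statement about the actual mitigation strategy.

```python
# (used, available) pairs copied from the utilization table.
LOGIC_RESOURCES = {
    "LUT": (76026, 331680),
    "LUTRAM": (2952, 146880),
    "FF": (64298, 663360),
    "BRAM": (77, 1080),
    "DSP": (25, 2760),
}


def percent(used: int, available: int) -> float:
    """Utilization as a percentage, rounded to two decimals."""
    return round(100 * used / available, 2)


def fits_after_replication(used: int, available: int, copies: int = 3) -> bool:
    """Naive check: `copies` times the current usage must fit in the device."""
    return copies * used <= available


for name, (used, available) in LOGIC_RESOURCES.items():
    print(
        f"{name:7s} now {percent(used, available):6.2f} %  "
        f"x3 {percent(3 * used, available):6.2f} %  "
        f"fits: {fits_after_replication(used, available)}"
    )
```

With these numbers every logic resource stays below 100 % after tripling, which is consistent with the slide's remark; the figures predate the hardware accelerators.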







## **Microcontroller software**

#### Overview

- Rad-tolerant component of the platform
- Present on the carrier board
- GR716A currently used
  - To be replaced by GR716B (under development)
- System supervisor:
  - Access to the SPI flash memories with the golden copy of the uC SW (self-boot), FPGA configuration and Myriad 2 boot images
  - Program and scrub the configuration area of the FPGA via SelectMAP (GR716B only)
  - Transfer the Myriad 2 SW to the FPGA via SPI
    - Trigger the VPU boot process once the transfer completes
  - Monitor the VPUs' heartbeats and verify the processing results
    - If errors are detected or a heartbeat is missing, the GR716 resets the corresponding VPU
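The monitoring rule in the last two bullets (reset a VPU when its heartbeat goes missing or its results fail verification) can be sketched as follows. This is a host-side Python illustration, not the GR716 flight software. The class name, the timeout value, the `reset_vpu` callback and the use of explicit timestamps are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class VpuSupervisor:
    """Tracks the last heartbeat per VPU and resets silent or faulty VPUs."""

    timeout_s: float                   # hypothetical heartbeat timeout
    reset_vpu: Callable[[int], None]   # hypothetical hook that resets one VPU
    last_seen: Dict[int, float] = field(default_factory=dict)

    def heartbeat(self, vpu_id: int, now: float) -> None:
        """Record a heartbeat (the first one also registers the VPU)."""
        self.last_seen[vpu_id] = now

    def report_result(self, vpu_id: int, ok: bool, now: float) -> None:
        """Reset the VPU when result verification failed."""
        if not ok:
            self._reset(vpu_id, now)

    def poll(self, now: float) -> List[int]:
        """Reset every VPU whose heartbeat is older than the timeout."""
        late = [v for v, t in self.last_seen.items() if now - t > self.timeout_s]
        for vpu_id in late:
            self._reset(vpu_id, now)
        return late

    def _reset(self, vpu_id: int, now: float) -> None:
        self.reset_vpu(vpu_id)
        self.last_seen[vpu_id] = now  # restart the timeout window after a reset


resets: List[int] = []
supervisor = VpuSupervisor(timeout_s=1.0, reset_vpu=resets.append)
for vpu in (0, 1, 2):
    supervisor.heartbeat(vpu, now=0.0)
supervisor.heartbeat(0, now=0.9)
supervisor.heartbeat(1, now=0.9)
supervisor.poll(now=1.5)                        # VPU 2 has been silent too long
supervisor.report_result(1, ok=False, now=1.6)  # VPU 1 produced a bad result
print(resets)
```

Passing the time in explicitly keeps the sketch deterministic; real firmware would read a hardware timer instead.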







## **VPU Software**



- Myriad 2 VPU
  - Performs all ISP, CV & AI tasks in the system
  - Executes user-designed heterogeneous hardware-software pipelines
    - Internal HW blocks perform common ISP/CV tasks
    - Parallelised and optimized SW blocks run on VLIW vector processors
  - Up to 400MPix/s sustained throughput per Myriad for ISP/CV
  - >1TOPS of compute per Myriad
- VPU software designed for task flexibility
  - Vision and NN inference application development without requiring any embedded coding
- Myriad 2 radiation characterisation
  - 6 radiation test campaigns completed (SEL, SEU, TID)
  - No critical effects observed



#### **VPU Software** Features & Block Diagram





- Myriad 2 firmware driven by FSM
- Two-phase boot process
  - Enables high-bandwidth interface for boot firmware transfer
- Features
  - Vision Pipeline and NN blob replacement, on-device memory management
  - Dynamic reconfigurability of input and output image sizes, bit depths
  - Built In Self Test for chip level performance monitoring
  - Per-processor heartbeats for GR716 monitoring
  - Junction temperatures available on demand
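An FSM-driven firmware with a two-phase boot can be illustrated with a small table-driven state machine. The slides do not list the actual states or events, so every name below (`RESET`, `STAGE1_LOADING`, `STAGE2_LOADING`, `RUNNING` and the events) is hypothetical. The sketch only shows the general pattern of a first boot stage that has to complete before the second transfer may start.

```python
from enum import Enum, auto


class State(Enum):
    RESET = auto()
    STAGE1_LOADING = auto()   # hypothetical: first-stage boot image
    STAGE2_LOADING = auto()   # hypothetical: main firmware transfer
    RUNNING = auto()


class Event(Enum):
    START_BOOT = auto()
    STAGE1_DONE = auto()
    STAGE2_DONE = auto()
    FAULT = auto()


# Allowed transitions; anything not listed is rejected.
TRANSITIONS = {
    (State.RESET, Event.START_BOOT): State.STAGE1_LOADING,
    (State.STAGE1_LOADING, Event.STAGE1_DONE): State.STAGE2_LOADING,
    (State.STAGE2_LOADING, Event.STAGE2_DONE): State.RUNNING,
}


def step(state: State, event: Event) -> State:
    """Return the next state; a FAULT always falls back to RESET."""
    if event is Event.FAULT:
        return State.RESET
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"event {event.name} not allowed in state {state.name}"
        ) from None


state = State.RESET
for event in (Event.START_BOOT, Event.STAGE1_DONE, Event.STAGE2_DONE):
    state = step(state, event)
print(state.name)
```

Keeping the transitions in one table makes illegal sequences, such as starting the second stage before the first has finished, fail loudly instead of silently.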



#### **VPU Software** Vision, ISP and AI Compute

- Flexible ISP & CV pipelines
  - Drag & drop development of pipelines enabled via CVAI Toolkit software
- Wide AI model and framework support
  - Fully compatible with Intel OpenVINO toolchain and common frameworks: PyTorch, TensorFlow, Keras...
- Dynamic updates
  - New vision pipelines and AI models can be uploaded and selected at runtime from the System Controller
  - Enables frame level switching of pipelines
- Image pre-processing followed by NN inference tightly coupled on device
  - User and runtime reconfigurable

Figure: drag-and-drop pipeline editor screenshot showing Data Source, Data Sink and Debayering blocks. Visible block parameters include `vertical`, `horizontal`, `planes_in_0`, `planes_out_0`, `bytes_per_pixel_in_0`, `bytes_per_pixel_out_0`, `bytes_per_pixel_out_1`, `cfg`, `thresh` and `dewormCfg`.

Figure: logos of the supported frameworks and toolchain (TensorFlow, Keras, PyTorch, Caffe2, MXNet, OpenVINO) and of the Movidius MA2450 Myriad 2.



#### **Conclusions** Summary



- High-performance platform to do science in space using commercial AI parts
- System-level mitigation techniques by using a rad-tolerant microcontroller as the system supervisor
  - FPGA scrubbing
  - VPU monitoring and reset
- FPGA upgradeable with the Xilinx rad-tolerant counterpart
- Up to three VPUs operating simultaneously:
  - SINGLE mode to improve throughput
  - Redundant modes (DMR, TMR) to detect anomalous processing
- FPGA working memory: dual DDR3 SDRAM
  - Interface of 64 + 32 bits
  - Memories can be protected by EDAC
- Image processing in the VPUs, hardware accelerators in the FPGA
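The difference between the SINGLE, DMR and TMR modes can be made concrete with a small result comparator. The sketch below is an assumption-laden illustration: the slides say only that redundant modes detect anomalous processing, not where or how results are compared, so the function name, the byte-string results and the majority-vote rule are all hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def compare_results(results: Sequence[bytes]) -> Tuple[Optional[bytes], bool]:
    """Return (accepted_result, anomaly_detected) for one processed frame.

    1 result  (SINGLE): accepted as-is, nothing to compare against.
    2 results (DMR):    a mismatch is detected but cannot be corrected.
    3 results (TMR):    a single deviating result is detected and outvoted.
    """
    if not results:
        raise ValueError("at least one result is required")
    value, votes = Counter(results).most_common(1)[0]
    anomaly = votes != len(results)
    if votes * 2 > len(results):  # strict majority agrees on `value`
        return value, anomaly
    return None, anomaly


print(compare_results([b"cloud"]))                       # SINGLE
print(compare_results([b"cloud", b"clear"]))             # DMR mismatch
print(compare_results([b"cloud", b"cloud", b"clear"]))   # TMR, one bad VPU
```

The trade-off stated in the summary follows directly: SINGLE mode spends all three VPUs on throughput, while DMR and TMR spend them on detecting (and, with three copies, masking) a faulty result.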

#### **Conclusions** Future work



- Current status:
  - HPCB under TRB review with ESA
  - Design fully tested with a single FMC card
  - Characterization of the CIF/LCD interfaces completed
- Future work:
  - Support for higher bit-rates:
    - SpW at 200 Mbps
    - SpFi up to 6.25 Gbps
  - Hardware acceleration
    - CCSDS 123.0 compression
    - Temporal binning
  - Backplane validation: HPCB as payload module in CORA rack
  - System verification with 3 FMC cards in parallel and benchmarking
    - Maximum throughput in SINGLE mode
    - Fault-injection in redundant modes
- Validation to be completed by end of Q3 2021



## For further information and inquiries

- www.caes.com/gaisler
- sales@gaisler.com

#### Thank you for listening!



