

Manuel Peña Fernández, Electronics Engineer

IP to detect and diagnose errors in COTS microprocessors through the Trace Interface

2nd European Workshop on On-Board Data Processing (OBDP 2021)



16th June 2021



#### **About our company: ARQUIMEA**

We believe in technology as a driver for social development and progress.

Our continuous activity in R&D&i allows us to create innovative solutions and products based on our technologies for the highly demanding sectors where we operate.

ARQUIMEA is a cross-sectoral international technology company

- Turnover: 71 M€
- Professionals: 380+
- Operations: 25+ countries



#### About our company: ARQUIMEA AEROSPACE & DEFENCE







**DEFENCE & SECURITY**

**AERONAUTICS**

**SCIENCE**









# **OUTLINE**

- 1 Motivation
- 2 Trace-based error detection and diagnosis
- 3 Applications
- 4 Conclusions



# **Motivation**

"Enhance observability of hard COTS processors to provide error detection and diagnosis capabilities"



# Microprocessor errors and hardening techniques

#### Types of errors in microprocessors

- Control-flow errors
- Data errors

#### Microprocessor hardening techniques

- Software
  - Data replication, signatures, assertions
- Hardware
  - TMR, watchdogs, lockstep
- Hybrid

Hardware cannot be modified in COTS!
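
As an illustration of the software category above, data replication can be combined with a majority vote over the replicas. The sketch below is a generic example, not part of the presented IP; the function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(replicas):
    """Return the value held by a strict majority of the replicas.

    Raises ValueError when no value has a strict majority, which
    signals an error that replication alone cannot correct.
    """
    value, count = Counter(replicas).most_common(1)[0]
    if count * 2 <= len(replicas):
        raise ValueError("no majority among replicas")
    return value


# Three copies of the same variable, one of them corrupted.
voted = majority_vote([7, 7, 3])
```

With three replicas, a single corrupted copy is outvoted. Two disagreeing corruptions leave no majority and are only detected, not corrected.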





# Microprocessor error diagnosis

Radiation testing quantifies device susceptibility but commonly disregards error causes

- Error diagnosis may:
  - Identify circuit vulnerabilities
  - Assess error criticality
  - Improve mitigation techniques
  - Support risk management

The quality and completeness of diagnosis information are key: it should be collected immediately after the error

- Existing error diagnosis approaches:

  - Fault injection to create error dictionaries: limited by error aliasing
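
To make error aliasing concrete, the sketch below builds a toy error dictionary from fault injection results and lists the signatures that more than one fault produces. All fault and signature names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_error_dictionary(injections):
    """Map each observed error signature to the set of faults that caused it."""
    dictionary = defaultdict(set)
    for fault, signature in injections:
        dictionary[signature].add(fault)
    return dict(dictionary)


def aliased_signatures(dictionary):
    """Return the signatures that cannot be traced back to a single fault."""
    return {sig: faults for sig, faults in dictionary.items() if len(faults) > 1}


# Hypothetical injection campaign: (injected fault, observed signature).
injections = [
    ("bitflip_r3", "wrong_output"),
    ("bitflip_pc", "hang"),
    ("bitflip_r7", "wrong_output"),
]
error_dictionary = build_error_dictionary(injections)
aliases = aliased_signatures(error_dictionary)
```

Here `wrong_output` is aliased: observing it does not reveal whether `bitflip_r3` or `bitflip_r7` occurred, so the dictionary alone cannot complete the diagnosis.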



#### The trace interface

- Software debugging tool commonly available in modern microprocessors
- Non-intrusive, low latency information
- Unused in deployed applications
- Deals with asynchronous events
- Useful for error detection and diagnosis
- Error detection and diagnosis are not natively supported by the interface
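
One way trace information can reveal control-flow errors is to compare each observed branch against the set of branches the program is allowed to take. The sketch below assumes the trace has already been decoded into (source, target) address pairs. Real trace streams are compressed packets, and this is not necessarily how the presented IP works. The addresses and the name `first_control_flow_error` are hypothetical.

```python
# Hypothetical control-flow graph: allowed (source, target) branch pairs.
ALLOWED_EDGES = {
    (0x1000, 0x1010),
    (0x1010, 0x1020),
    (0x1010, 0x1000),
}


def first_control_flow_error(branches, allowed_edges=ALLOWED_EDGES):
    """Return the index of the first branch outside the allowed set, or None."""
    for index, edge in enumerate(branches):
        if edge not in allowed_edges:
            return index
    return None


good_trace = [(0x1000, 0x1010), (0x1010, 0x1000), (0x1000, 0x1010)]
bad_trace = [(0x1000, 0x1010), (0x1010, 0x2000)]  # jump to an unexpected target

good_result = first_control_flow_error(good_trace)
bad_result = first_control_flow_error(bad_trace)
```

The check runs entirely on observed trace data, so the monitored software does not need to be modified.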







## **ARM & CoreSight**

- High penetration in commercial electronics
- New ARM-based space-oriented initiatives (NASA HPSC or NanoXplore)
  - Scalable
  - Flexible
  - Low power and high performance
- CoreSight technology is the family of ARM components to support trace and debug
- CoreSight trace is compatible with almost any ARM processor core
- Availability of specific components is implementation dependent
  - Common functionalities
  - Common interfaces

# Trace-based error detection and diagnosis









#### **IP** interfaces



\*Freq(TRACE\_CLK) = Freq(TRACE\_CLK\_N) × N / 8, where N is the trace data width in bits
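
The footnote above reads most naturally as a throughput-conservation rule: an N-bit trace port clocked at Freq(TRACE\_CLK\_N) carries the same data rate as an 8-bit path clocked N / 8 times as fast. This is an interpretation, and the sketch below only evaluates that arithmetic. The function name and the example clock values are hypothetical.

```python
def equivalent_8bit_clock_mhz(port_clock_mhz: float, port_width_bits: int) -> float:
    """Clock of an 8-bit path carrying the same throughput as an N-bit port.

    Assumes one bit per data line per clock cycle.
    """
    return port_clock_mhz * port_width_bits / 8


# A 4-bit port at a hypothetical 230 MHz matches an 8-bit path at 115 MHz.
narrow_port = equivalent_8bit_clock_mhz(230.0, 4)

# An 8-bit port needs no scaling.
full_port = equivalent_8bit_clock_mhz(166.0, 8)
```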



#### IP architecture



1. Configurable
2. Generates error signals



#### Historical data



1. Configurable
2. Generates error signals



#### **IP** specifications

| Parameter               | Condition                                             | Min  | Typ  | Max  | Units            | Comment                                         |
|-------------------------|-------------------------------------------------------|------|------|------|------------------|-------------------------------------------------|
| Pin count               | SPI interface option, no error signals                | 6    | 10   |      |                  | Each error signal adds extra pins               |
| Error detection latency | No nested events in event evaluator                   |      |      | 23   | TRACE_CLK cycles | Event evaluator adds one cycle per nested event |
|                         | @ 1333 Mbps                                           | 140  |      |      | ns               |                                                 |
| Operating frequency     | Implemented on Xilinx XC7Z010                         |      |      | 166  | MHz              | TRACE_CLK frequency                             |
| LUT count               | Synthesis for Xilinx Artix 7 series                   | 2500 | 6000 |      |                  | 6-input LUTs                                    |
| Flip-flop count         | Synthesis for Xilinx Artix 7 series                   | 2700 | 7000 |      |                  | D-type FFs                                      |
| Trace data throughput   | On-chip XC7Z010 over EMIO, 8-bit data width           |      |      | 1333 | Mbps             |                                                 |
|                         | Off-chip XC7Z010 over MIO, 4-bit data width, LVCMOS33 |      |      | 920  | Mbps             |                                                 |
|                         | Off-chip XC7Z010 over EMIO, 4-bit data width, TMDS33  |      |      | 1200 | Mbps             |                                                 |
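
The two latency rows can be cross-checked with a short calculation. The sketch below converts the cycle count into nanoseconds and adds one cycle per nested event, as the table comment states. The function name is hypothetical, and applying the 166 MHz operating frequency to the latency figure is an assumption.

```python
def detection_latency_ns(trace_clk_mhz: float, nested_events: int = 0,
                         base_cycles: int = 23) -> float:
    """Error detection latency in nanoseconds.

    base_cycles is the 23-cycle figure from the table; each nested event
    in the event evaluator adds one TRACE_CLK cycle.
    """
    cycles = base_cycles + nested_events
    return cycles * 1000.0 / trace_clk_mhz


# 23 cycles at 166 MHz: roughly 139 ns, in line with the 140 ns row.
flat_latency = detection_latency_ns(166.0)

# Two nested events add two cycles: roughly 151 ns.
nested_latency = detection_latency_ns(166.0, nested_events=2)
```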

# **Applications**

- Device evaluation
- System design
- System operation



## Device evaluation & system design

Successfully detecting and classifying errors in an ARM Cortex-A9 on a Xilinx Zynq-7000 device.

#### **IP** supports the following tasks:

- Online error detection of control-flow and data errors in different applications with up to 99.9% coverage
- Integration with other system-level hardening techniques such as lockstep or hardware acceleration
- Identification of the most radiation-sensitive resources in the processing system
- Selection of the version of a given application with the lowest cross section
- **Evaluation of error criticality through effective error diagnosis**

#### Flexible integration options



## **Binary integration**





## **Ternary integration**





# **IP** highlights

- Online, low latency, error detection and diagnosis
- 140 ns detection latency
- Comprehensive error traceability and diagnosis
- Seamless integration as a system peripheral
- Scalable, flexible, parametric design
- User configurable



- Already tested under neutron and proton irradiation with up to 99.9% error coverage
- Selected for contract by ESA through the Open Space Innovation Platform

# Conclusions

- New solutions for reliably using COTS processors in space are of great interest to the space industry
- Trace monitoring brings new possibilities to the designer's toolbox
- Trace-based error detection and diagnosis is available from ARQUIMEA as an IP core
- Further IP developments are ongoing with ESA support

# **ARQUIMEA**

Manuel Peña Fernández, Electronics Engineer

mpena@arquimea.com

#### arquimea.com

Parque Científico Leganés Tecnológico, Margarita Salas, 10, 28918 Leganés (Madrid), Spain