Sensor Fault Diagnosis

This tutorial investigates the problem of the occurrence of multiple faults in the sensors used to monitor and control a network of cyber-physical systems. The goal is to formulate a general methodology, which will be used for designing sensor fault diagnosis schemes with emphasis on the isolation of multiple sensor faults, and for analyzing the performance of these schemes with respect to the design parameters and system characteristics. The backbone of the proposed methodology is the design of several monitoring and aggregation cyber agents (modules) with specific properties and tasks. The monitoring agents check the healthy operation of sets of sensors and infer the occurrence of faults in these sensor sets based on structured robustness and sensitivity properties. These properties are obtained by deriving analytical redundancy relations of observer-based residuals sensitive to specific subsets of sensor faults, and adaptive thresholds that bound the residuals under healthy conditions, assuming bounded modeling uncertainty and measurement noise. The aggregation agents are employed to collect and process the decisions of the monitoring agents, while they apply diagnostic reasoning to isolate combinations of sensor faults that have possibly occurred. The design and performance analysis methodology is presented in the context of three different architectures: for cyber-physical systems that consist of a set of interconnected systems, a distributed architecture and a decentralized architecture, and for cyber-physical systems that are treated as monolithic, a centralized architecture. For all three architectures, the decomposition of the sensor set into subsets of sensors plays a key role in their ability to isolate multiple sensor faults. A discussion of the challenges and benefits of the three architectures is provided, based on the system scale, the type of system nonlinearities, the number of sensors and the communication needs. Lastly, this tutorial concludes with a discussion of open problems in fault diagnosis.

V. Reppa, M. M. Polycarpou and C. G. Panayiotou. Sensor Fault Diagnosis. Foundations and Trends® in Systems and Control, vol. 3, no. 1-2, pp. 1–248, 2016. DOI: 10.1561/2600000007.



Introduction to Sensor Fault Diagnosis
Recent advances in information and communication technologies, embedded systems and sensor networks have generated significant research activity in the development of so-called cyber-physical systems. These systems consist of two components: (i) physical, biological or engineered systems and (ii) a cyber core, comprised of communication networks and computational availability, that monitors, coordinates and controls the physical part [Antsaklis et al. (2013)]. In this tutorial, we will consider a network of interconnected cyber-physical systems, where each subsystem may be characterized by simple dynamics, but the overall dynamics can be large-scale and complex. The focus of research on cyber-physical systems is to improve the collaborative link between physical and computational (cyber) elements for increased adaptability, efficiency and autonomy. The key motivation for the advancement of cyber-physical systems is the need to better coordinate the interactions between the software and hardware designs by facilitating self-awareness in evolving environments and the handling of a huge amount of data of different time and space characteristics. However, reaching such a level of system intelligence necessitates the development of mechanisms capable of assessing the reliability of information acquired by distributed sensors and sensor networks through wired and wireless links [Ding et al. (2006)], or by internet-of-things devices.
A representative example of a large network of cyber-physical systems is a smart city with intelligent infrastructures for supporting the environment, energy and water distribution, transportation, telecommunication, health care, home automation and many more [Chourabi et al. (2012)]. Each of these critical infrastructures consists of a large number of distributed, interconnected subsystems, which need to be monitored and controlled using a large number of sensing/actuation devices and feedback control algorithms.
Although the benefits of automated monitoring and control procedures are widely accepted, their use has made critical infrastructures more susceptible to faults [Kröger and Zio (2011)]. Thus, supervision schemes capable of diagnosing and accommodating faults are applied for ensuring system reliability and safety. From a systems point of view, safety, reliability and fault tolerance become key challenges in designing cyber-physical systems. For meeting these challenges, the cyber core should be empowered with supervision capabilities for diagnosing faults in the physical part and compensating for their effects by taking appropriate remedial actions [Blanke et al. (2016); Isermann (2006)].
Fault detection addresses the problem of determining the presence of faults in a system and estimating their instant of occurrence [Isermann (2006); Gertler (1998); Chen and Patton (1999); Blanke et al. (2016); Ding (2008)]. Fault detection is followed by fault isolation, which deals with finding which components in the system are faulty, or the type of fault. Fault identification is described as the procedure of determining the size and the time-variant behavior of the fault. In some cases, during the fault identification procedure, we also seek to assess the extent of the fault and the risks associated with it [Chen and Patton (1999); Ding (2008)]. The result of fault identification is essential for performing fault accommodation by either changing the control law or using virtual sensors or actuators in response to a fault, without switching off any system component [Blanke et al. (2016)]. In this tutorial, we will consider mainly the fault detection and fault isolation problems, which, for simplicity, we will refer to together as fault diagnosis.
Various methodologies have been developed for the fault diagnosis problem in general [Isermann (2006)], but the detection and isolation of sensor faults has become a key challenge in the last few years. This is due to the large number of sensors and sensor networks used for (i) monitoring and controlling large-scale cyber-physical systems; (ii) providing rich and redundant information for executing safety-critical tasks; and (iii) offering information to citizens and governmental agencies for resolving problems promptly in emergency situations. For instance, in intelligent transportation systems, vehicles may be equipped with odometers, lasers, frontal camera video-sensors, GPS, speed or object tracking sensors, in order to be able to acquire and broadcast information relevant to performing tasks such as cooperative or fully autonomous driving, avoiding lane departure and collision, etc. In smart buildings, multiple sensors are installed in different zones (measuring quantities such as temperature, humidity, CO2, contaminant concentration, occupancy), as well as in heating, ventilation and air-conditioning systems for measuring supply/return/mixed air temperature, supply/return air differential pressure, return air humidity, etc. Such sensing information may be used for reducing the energy consumption of a building and maintaining the desired living conditions, as well as for executing evacuation plans in safety-critical situations (e.g., fire). Undetected sensor faults can severely impact automation and supervision schemes [Sherry and Mauro (2014)], possibly leading to system instability, loss of information fidelity, incorrect decisions and misdirected remedial actions [BEA (2012)].
Sensor fault detection and isolation (FDI) methods are classified into physical redundancy-based and model-based methods [Betta and Pietrosanto (2000)]. In many applications, the physical redundancy approach is not used due to the high cost of installation and maintenance, as well as due to space restrictions. However, the evolution of microtechnology in recent years has contributed to the reduction of the size and fabrication cost of sensors, making physical-redundancy methods more cost effective. Current technological advances are geared towards the use of multiple, possibly heterogeneous sensors, which are not necessarily co-located; however, the measured variables may carry redundant information, which is useful for fault diagnosis purposes. For example, in a smart building, there may exist two sensors measuring the temperature in adjacent rooms; in such a case, the relation (either known a priori or learned during operation) between the two measured quantities may be used to determine if one of the two sensors is faulty [Alippi et al. (2013)]. With the current trend towards utilizing larger and larger numbers of sensors, there is also a higher probability of multiple sensor faults occurring, which is an issue that has not been well studied in the fault diagnosis literature.
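The adjacent-room example can be illustrated with a minimal sketch. It assumes that the healthy relation between the two temperature readings is affine and is learned from fault-free data by least squares; the function names, the synthetic data and the safety margin are illustrative assumptions and are not taken from Alippi et al. (2013).

```python
import numpy as np

def learn_relation(t_a, t_b, margin=1.5):
    """Fit t_b ~ a * t_a + b on fault-free data and derive a residual bound."""
    a, b = np.polyfit(t_a, t_b, deg=1)      # coefficients, highest degree first
    residual = t_b - (a * t_a + b)
    bound = margin * np.max(np.abs(residual))  # assumed safety margin
    return a, b, bound

def relation_violated(t_a, t_b, a, b, bound):
    """Flag samples where the learned relation between the two sensors breaks."""
    return np.abs(t_b - (a * t_a + b)) > bound

# Synthetic fault-free training data (hypothetical room temperatures in deg C).
rng = np.random.default_rng(0)
t_a_train = 20.0 + 2.0 * np.sin(np.linspace(0.0, 6.0, 200))
t_b_train = 0.9 * t_a_train + 1.5 + rng.uniform(-0.1, 0.1, 200)
a, b, bound = learn_relation(t_a_train, t_b_train)

# New readings: the third sample of sensor B carries a 3-degree offset fault.
t_a_new = np.array([20.0, 21.0, 19.5])
t_b_new = 0.9 * t_a_new + 1.5
t_b_new[2] += 3.0
flags = relation_violated(t_a_new, t_b_new, a, b, bound)
print(flags)
```

A violated relation only indicates that at least one of the two sensors is faulty; deciding which one requires additional relations involving other sensors.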
The majority of sensor FDI techniques rely on the utilization of models only [Isermann (2006)]. These techniques are further categorized as quantitative or qualitative methods [Venkatasubramanian et al. (2003a,b)]; the first category relies on a nominal mathematical model describing the system, while the second one uses symbolic and qualitative system representations. The equivalence and the differences between these two categories, as well as the design of a unified framework taking advantage of the benefits of each approach, have been studied by several researchers [Cordier et al. (2004); Pulido and González (2004); Gentil et al. (2004)].
Qualitative model-based techniques are typically used by the artificial intelligence diagnostic (DX) community [Travé-Massuyès (2012)]. The design of these techniques is based on the utilization of either causal models, such as signed digraphs, bond graphs, fault trees, etc. [Vedam and Venkatasubramanian (1997); Bregon et al. (2012)], or functional or structural abstraction hierarchies [Daigle et al. (2012); Blanke et al. (2016); Monteriu et al. (2007)]. The nature of these models facilitates especially the fault isolation procedure. Moreover, the qualitative approach treats fault detection and isolation as a unified problem, and exploits reasoning techniques, thereby providing by design more straightforward methods for multiple sensor fault isolation [Nyberg (2006); De Kleer and Williams (1987); Daigle et al. (2012); Frisk et al. (2012)].
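The kind of reasoning used for multiple fault isolation can be sketched with a common consistency-based formulation. The sketch assumes that each monitoring module watches a known subset of sensors and that an alarm from a module means that at least one sensor in its subset is faulty. The candidate diagnoses are then the minimal sensor combinations that explain every alarm (minimal hitting sets). The sensor names, the subsets and the brute-force search are illustrative assumptions, not a method taken from the cited works.

```python
from itertools import combinations

def minimal_diagnoses(alarmed_sets, sensors):
    """Return all minimal sensor combinations that intersect every alarmed set."""
    if not alarmed_sets:
        return []                      # no alarm, nothing to explain
    diagnoses = []
    for size in range(1, len(sensors) + 1):
        for combo in combinations(sensors, size):
            candidate = set(combo)
            # Skip supersets of a smaller diagnosis that was already found.
            if any(d <= candidate for d in diagnoses):
                continue
            # A diagnosis must contain a sensor from every alarmed subset.
            if all(candidate & s for s in alarmed_sets):
                diagnoses.append(candidate)
    return diagnoses

# Hypothetical decomposition of four sensors into three overlapping subsets.
sensors = ["s1", "s2", "s3", "s4"]
module_sets = {"M1": {"s1", "s2"}, "M2": {"s2", "s3"}, "M3": {"s3", "s4"}}
decisions = {"M1": True, "M2": False, "M3": True}   # True = alarm raised

alarmed = [module_sets[m] for m, alarm in decisions.items() if alarm]
result = minimal_diagnoses(alarmed, sensors)
print(sorted(sorted(d) for d in result))
```

In this sketch a silent module is not used to exonerate its sensors, since a fault may be too small to trigger an alarm; that is a modeling choice made here for simplicity.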

While qualitative model-based approaches have mostly been adopted by the DX community, quantitative model-based approaches such as parity equations and observers are widely used for sensor FDI by the control-oriented FDI community [Gertler (1998); Chen and Patton (1999)]. Among the quantitative methods, observer-based approaches have been applied to nonlinear systems, using a single nonlinear observer [Rajamani and Ganguli (2004); Narasimhan et al. (2008); Yan and Edwards (2007); Talebi et al. (2009)], or a bank of observers [Mattone and De Luca (2006); Rajaraman et al. (2006); Samy et al. (2011); Reppa et al. (2012, 2014b)]. Several researchers have developed sensor FDI methods, which treat sensor faults as actuator faults and apply observer-based approaches for nonlinear systems [Kabore and Wang (2001); De Persis and Isidori (2001)], as well as methodologies for tackling the problem of actuator and sensor faults in a unified framework [Du et al. (2013)]. One of the common characteristics of the majority of observer-based methods is the use of the open-loop system model and the input and output data. Recently, observer-based sensor FDI techniques have been proposed, which take advantage of the information about the closed-loop operation of the system (i.e., reference signals and controller's structure), when this is available [Olaru et al. (2010); Seron et al. (2012, 2013)]. The control-oriented FDI community focuses mostly on making methods robust against modeling uncertainties and views fault detection and fault isolation as two different tasks.
Clearly, mathematical models never fully capture the real behavior of the modeled system, due to the presence of uncertainties including parametric uncertainty, unmodeled system dynamics, or faults occurring in the system, which can be functions of the system state and input. A powerful approach to robust FDI for nonlinear uncertain systems is based on the use of learning techniques [Polycarpou and Helmicki (1995); Trunov and Polycarpou (2000)]. The main concept behind the learning approach for FDI is the approximation of the unknown system behavior using adaptive approximation models (e.g., sigmoidal neural networks, radial basis functions, support vector machines) and nonlinear estimation schemes [Trunov and Polycarpou (2000); Caccavale et al. (2008); Talebi et al. (2009); Thumati and Jagannathan (2010)]. Under healthy conditions, adaptive approximation based schemes can be used to learn the modeling uncertainty during the initial stage of nonlinear system operation (training period) [Caccavale et al. (2009); Reppa et al. (2014b)]. Then, the nonlinear functional approximator of modeling uncertainty can be used for optimizing the adaptive thresholds, thus enhancing the fault detectability and isolability of a quantitative model-based scheme [Reppa et al. (2014b)]. Another approach for reducing the modeling error is offline identification of uncertainties. In general, adaptive approximation schemes provide a flexible methodology for learning the uncertainties in the sense that the training time can be adjusted online based on some criterion involving the estimation error. Under faulty conditions, adaptive approximation based schemes can be applied initially for learning the faults for isolation and identification purposes [Zhang et al. (2005, 2008)], and then for compensating the fault effects [Zhang et al. (2004); Reppa et al. (2014a)].
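The role of an adaptive threshold that bounds an observer-based residual under healthy conditions can be sketched on a deliberately simple case. The sketch assumes a scalar, linear, discrete-time plant x[k+1] = a x[k] + b u[k] + eta[k] with measurement y[k] = x[k] + d[k] + f[k], a known noise bound d_bar, and a known input-dependent uncertainty bound eta_bar(u). The plant, the bounds, the observer gain and all names are illustrative assumptions and not the scheme developed in the tutorial.

```python
import numpy as np

def eta_bar(u_k):
    """Assumed known bound on the modeling uncertainty as a function of the input."""
    return 0.05 + 0.02 * abs(u_k)

def run_detector(y, u, a, b, gain, e0_bar, d_bar):
    """Scalar observer with a threshold that bounds the healthy residual."""
    n = len(y)
    residual = np.zeros(n)
    threshold = np.zeros(n)
    x_hat, e_bar = 0.0, e0_bar          # e_bar bounds |x - x_hat| when healthy
    for k in range(n):
        residual[k] = y[k] - x_hat      # healthy case: estimation error + noise
        threshold[k] = e_bar + d_bar
        x_hat = a * x_hat + b * u[k] + gain * residual[k]
        # Healthy error dynamics: e[k+1] = (a - gain) e[k] + eta[k] - gain d[k]
        e_bar = abs(a - gain) * e_bar + eta_bar(u[k]) + abs(gain) * d_bar
    return residual, threshold

# Simulated plant with an offset sensor fault appearing at step k_fault.
rng = np.random.default_rng(1)
a, b, gain, d_bar = 0.9, 0.5, 0.5, 0.05
n, k_fault, fault_size = 200, 120, 1.0
u = np.sin(0.05 * np.arange(n))
y = np.zeros(n)
x = 0.8                                 # initial error 0.8 is within e0_bar = 1.0
for k in range(n):
    noise = rng.uniform(-d_bar, d_bar)
    fault = fault_size if k >= k_fault else 0.0
    y[k] = x + noise + fault
    x = a * x + b * u[k] + rng.uniform(-1.0, 1.0) * eta_bar(u[k])

residual, threshold = run_detector(y, u, a, b, gain, e0_bar=1.0, d_bar=d_bar)
alarms = np.abs(residual) > threshold
detection_time = int(np.argmax(alarms))
print(detection_time)
```

Because the threshold is propagated through the same error dynamics as the observer, it cannot be crossed before the fault as long as the assumed bounds hold. A tighter uncertainty bound, for instance one obtained by learning, lowers the threshold and makes smaller faults detectable. The alarm need not persist after detection, since the observer partly tracks the faulty measurement.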
The majority of model-based sensor FDI methods are deployed in a centralized framework (see Fig 1.1-1), but these approaches are less suitable for large-scale and complex systems such as a network of interconnected cyber-physical systems. In this context, centralized approaches have the following disadvantages: (i) increased computational complexity of the FDI algorithms, since centralized architectures are tailored to handle (multiple) faults globally, (ii) increased communication requirements due to the transmission of information to a central point, (iii) vulnerability to security threats, because the central cyber core in which the sensor FDI algorithm resides is a single point of failure, and (iv) reduced scalability in case of system expansion, due to the utilization of a global physical model or black-box. A common design characteristic of non-centralized methods is that they handle the large-scale and complex system as a set of interconnected subsystems and they employ local agents that perform diagnosis based on local subsystems' models. The local agents are commonly deployed in either a distributed (Fig 1.1-2) or decentralized (Fig 1.1-3) architecture. The classification of these architectures is based on the type of system interconnections, the cyber levels of diagnosis, the task of the local diagnosers, as well as the type of communication and information exchanged between the local and high-level diagnosers.
The decentralized or distributed nature of the FDI process is related to either the task executed by the local diagnosers or the communication between the local diagnosers. In decentralized schemes, a local diagnoser is commonly designed to detect and isolate faults only in its underlying system [Yan and Edwards (2008); Klinkhieo et al. (2008); Reppa et al. (2015a)], while it may not exchange any information with other local diagnosers [Ferdowsi et al. (2012); Indra et al. (2012)]. On the contrary, in distributed schemes, there is communication between the local diagnosers and every local diagnoser can detect and isolate faults in neighboring systems [Zhang and Zhang (2013); Shames et al. (2011); Davoodi et al. (2014); Daigle et al. (2007); Ferrari et al. (2012); Boem et al. (2013a); Reppa et al. (2015b)]. The design of distributed FDI architectures may also differ in the type of exchanged information. Specifically, the local diagnosers may exchange estimations [Zhang and Zhang (2013); Yan and Edwards (2008)] [Daigle et al. (2007)], or measurements of the interconnected states [Shames et al. (2011); Ferrari et al. (2012); Boem et al. (2013a)], or fault signatures [Daigle et al. (2007)]. In multi-level FDI schemes, the communication between levels is commonly sporadic and event-driven, while the information transmitted to higher levels can be the decisions of the local diagnosers [Ferrari et al. (2012); Boem et al. (2013a); Reppa et al. (2015b)], the time instants of fault detection of the local diagnosers [Ferdowsi et al. (2012)] or the calculated analytical redundancy relations [Indra et al. (2012)].
It is worth noting that non-centralized fault diagnosis techniques can be applied to a monolithic system. A common approach is based on the use of multiple processing units (agents) and an aggregation unit that fuses the information from these units. This approach is followed by several existing FDI methods for stochastic systems based on interacting multiple models (IMM) [Zhang and Li (1998)], multiple sensor fusion (MSF) [Salahshoor et al. (2008); Reece et al. (2009)] or hidden Markov models (HMM) [Alippi et al. (2013)]. In IMM-based techniques, the multiple models describe the system in healthy and various faulty system modes and are designed using the a priori knowledge of the possible system faults. Fault diagnosis using MSF-based techniques can be conducted by using local filters that generate local estimates and local decisions, and a global filter that combines the local state estimates to derive an improved global estimate and/or fuses the local decisions for obtaining a global decision. In HMM-based methods, spatial and temporal relationships among sensor datastreams are exploited, and an HMM-based module is designed for each pair of sensors. The lower processing layer detects variations in the relationships between pairs of sensors, while the upper processing (cognitive) level aggregates the information coming from all sensor units to distinguish faults from changes in the environment and false positives [Alippi et al. (2013)].
The main goal of this tutorial is to provide a cyber-physical methodology for designing and analyzing quantitative model-based sensor FDI techniques for large-scale nonlinear systems, which are monitored and controlled by a large number of sensors. To this end, Chapter 2 presents models that describe the system behavior, along with the underlying assumptions that are commonly used for the design of sensor FDI techniques and the formulation of the sensor fault diagnosis problem. Then, Chapter 3 surveys various architectures (centralized, decentralized, distributed) for solving the sensor FDI problem, taking into account the system scale and the number of sensors, as well as the communication needs. Chapter 4 describes the stages for designing observer-based fault detection methods, taking into account the nonlinear system nature, while Chapter 5 details the isolation steps with emphasis on multiple sensor faults. The performance of the observer-based sensor FDI techniques is analyzed in Chapter 6 with respect to robustness against modeling uncertainties, sensor fault detectability and isolability. Chapter 7 presents learning techniques that can be used for enhancing the performance of sensor FDI methods under healthy and faulty conditions. This tutorial is completed by summarizing the concluding remarks and discussing some open issues in fault diagnosis in Chapter 8.

Figure 1.1: Typical architectures for interconnected systems. (1) In a centralized approach, input/output information of all systems is transmitted to one agent. (2) In a distributed architecture, input/output information of each subsystem is transmitted to its dedicated agent, and the agents are allowed to exchange information (input/output information, decisions). (3) In a decentralized architecture, input/output information of each subsystem is transmitted to its dedicated agent, but the agents do not exchange information.
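The information flows described in the caption can be summarized in a small sketch that lists which subsystems' input/output information each agent can access under the three architectures. The function name, the agent labels and the assumption that, in the distributed case, agents exchange information only with the agents of interconnected (neighboring) subsystems are illustrative choices made here.

```python
def agent_inputs(architecture, neighbors):
    """Map each agent to the set of subsystems whose I/O information it can use."""
    subsystems = sorted(neighbors)
    if architecture == "centralized":
        # One agent receives the information of all subsystems.
        return {"agent": set(subsystems)}
    if architecture == "decentralized":
        # Each dedicated agent sees only its own subsystem.
        return {f"agent_{s}": {s} for s in subsystems}
    if architecture == "distributed":
        # Each dedicated agent also receives information from neighboring agents.
        return {f"agent_{s}": {s} | set(neighbors[s]) for s in subsystems}
    raise ValueError(f"unknown architecture: {architecture}")

# Hypothetical chain of three interconnected subsystems: 1 - 2 - 3.
neighbors = {1: [2], 2: [1, 3], 3: [2]}
for name in ("centralized", "distributed", "decentralized"):
    print(name, agent_inputs(name, neighbors))
```

The sketch makes the trade-off visible: the centralized agent needs every signal, while the distributed agents need only local and neighboring information, and the decentralized agents need no inter-agent communication at all.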