Prognostics approach for power MOSFET under thermal-stress aging

The prognostic technique for a power MOSFET presented in this paper is based on accelerated aging of IRF520Npbf MOSFETs in a TO-220 package. The methodology uses thermal and power cycling to accelerate the aging of the devices. The major failure mechanism under these stress conditions is die-attach degradation, typical for discrete devices with lead-free solder die attachment. Die-attach degradation has been found to increase the ON-state resistance because of its dependence on junction temperature. Increasing resistance can therefore be used as a precursor of failure for the die-attach failure mechanism under thermal stress. A feature based on normalized ON-resistance is computed from in-situ measurements of the electro-thermal response. An extended Kalman filter is used as a model-based prognostics technique based on the Bayesian tracking framework. This paper reports preliminary work that serves as a case study on the prediction of the remaining life of power MOSFETs and builds upon the work presented in [1]. The algorithm considered in this study has been used as a prognostics algorithm in different applications and is regarded as a suitable candidate for component-level prognostics. This work attempts to further the validation of the algorithm by presenting it with real degradation data, including measurements from real sensors, with all the complications (noise, bias, etc.) that are usually not captured in simulated degradation data. The algorithm is developed and tested on the accelerated aging test timescale. In real-world operation, the timescale of the degradation process, and therefore of the remaining useful life (RUL) predictions, will be considerably larger. It is hypothesized that even though the timescale will be larger, it remains constant through the degradation process, so the algorithm and model would still apply under the slower degradation process.
By using accelerated aging data with actual device measurements and real sensors (no simulated behavior), we attempt to assess how such an algorithm behaves under realistic conditions.


INTRODUCTION
Prognostics is an engineering discipline focused on predicting the time at which an in-service component will fail. The science of prognostics is based on the analysis of failure modes, the detection of early signs of wear and aging, and fault conditions. These signs are then correlated with a damage propagation model and suitable prediction algorithms to arrive at a "remaining useful life" (RUL) estimate. The discipline that links studies of failure mechanisms to system lifecycle management is often referred to as prognostics and health management (PHM). Power semiconductor devices such as MOSFETs (metal-oxide-semiconductor field-effect transistors) are essential components of electronic and electrical subsystems in onboard autonomous functions for vehicle controls, communications, navigation, and radar systems. In current practice, maintenance schedules are usually based on reliability data available from the manufacturer. While this approach works well in aggregate over a large number of components, it does not necessarily avert failures of individual components. For mission-critical systems it is extremely important to avoid such failures. This calls for condition-based prognostic health management methods.

Related Work
In [2], a model-based prognostics approach for discrete IGBTs was presented. RUL predictions were made using a particle filter algorithm in which the collector-emitter leakage current was used as the primary precursor of failure. A prognostics approach for power MOSFETs was presented in [3], where the threshold voltage was used as a precursor of failure and a particle filter was used in conjunction with an empirical degradation model.
The identification of parameters that serve as precursors of failure in discrete power MOSFETs and IGBTs has received considerable attention in recent years. Several studies have focused on precursors of failure for discrete IGBTs under thermal degradation due to power cycling overstress. In [4], the collector-emitter voltage was identified as a health indicator; in [5], the maximum peak of the collector-emitter ringing at the turn-OFF transient was identified as the degradation variable; in [6], the switching turn-OFF time was recognized as a failure precursor; and switching ringing was used in [7] to characterize degradation. For discrete power MOSFETs, ON-resistance was identified as a precursor of failure for the die-solder degradation failure mechanism [8,9]. A shift in threshold voltage was identified as a failure precursor for the gate structure degradation fault mode [10].
There have been some efforts to develop degradation models that are a function of usage or aging time based on accelerated life tests. For example, empirical degradation models for model-based prognostics are presented in [2] and [3] for discrete IGBTs and power MOSFETs, respectively. Gate structure degradation modeling of discrete power MOSFETs under ion impurities has been presented in [11].

ACCELERATED LIFE EXPERIMENTS
The development of prognostics algorithms faces constraints similar to those of reliability engineering, in that both need information about failure events of critical electronics systems. Such data are rarely available. In addition, prognostics requires information about the degradation process leading to an irreversible failure; therefore, it is necessary to record in-situ measurements of key output variables and observable parameters during the accelerated aging process in order to develop and learn failure progression models.
Thermal cycling overstress leads to thermo-mechanical stresses in electronics due to the mismatch in the coefficient of thermal expansion between different elements in the component's packaged structure. The accelerated aging applied to the devices presented in this work consists of thermal overstress. Latch-up, thermal runaway, or failure to turn ON due to loss of gate control are considered failure conditions. Thermal cycles were induced by power cycling the devices without an external heat sink. The device case temperature was measured and used directly as the control variable for the thermal cycling. For power cycling, the applied gate voltage was a square wave with an amplitude of approximately 15 V, a frequency of 1 kHz and a duty cycle of 40%. The drain-source was biased at 4 V DC, and a resistive load of 0.2 Ω was used on the collector side output of the device. The aging system used for these experiments is described in [5], and the accelerated aging methodology is presented in [8].
In-situ measurements of the drain current (I_D) and the drain-to-source voltage (V_DS) are recorded while the device is under the aging regime. The ON-state resistance R_DS(ON) in this application was computed as the ratio of V_DS to I_D during the ON-state of the square waveform. In the accelerated aging system it is not possible to measure junction temperature directly; instead, the increase in junction temperature is observed by monitoring the increase in R_DS(ON). Junction temperature is also a function of the case temperature, which is measured and recorded in-situ. Therefore, the measured R_DS(ON) was normalized to eliminate the case temperature effects and reflect only changes due to degradation. Due to manufacturing variability, the pristine-condition R_DS(ON) varies from device to device. To take this into account, the normalized R_DS(ON) time series is shifted by a bias factor representing the pristine-condition value. The resulting trajectory (ΔR_DS(ON)), from pristine condition to failure, represents the degradation process due to die-attach failure, that is, the increase in R_DS(ON) through the aging process.
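The feature computation described above can be sketched as follows. This is a minimal illustration rather than the processing used in the experiments: the function names, the gate threshold `gate_on_level`, and the use of the first few samples as the pristine-condition value are all assumptions. The case-temperature normalization is not specified in the text, so the sketch takes an already normalized series as input.

```python
from statistics import mean

def on_state_resistance(v_ds, i_d, gate, gate_on_level=7.5):
    """Mean of V_DS / I_D over the ON-state samples of one snapshot.

    gate_on_level is a hypothetical threshold (half of the ~15 V gate
    amplitude) used to decide which samples belong to the ON state.
    """
    ratios = [v / i for v, i, g in zip(v_ds, i_d, gate)
              if g > gate_on_level and i > 0]
    if not ratios:
        raise ValueError("snapshot contains no ON-state samples")
    return mean(ratios)

def delta_r(r_normalized, n_pristine=5):
    """Shift a normalized R_DS(ON) series so pristine condition maps to zero.

    The pristine-condition bias is taken as the mean of the first
    n_pristine samples, which is an assumption made for illustration.
    """
    bias = mean(r_normalized[:n_pristine])
    return [r - bias for r in r_normalized]
```

Averaging the ratio over the ON-state samples, rather than taking a single sample, is one simple way to reduce the effect of switching transients at the edges of the square wave.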
These measurements do not have a fixed sampling rate. On average, a transient response measurement is taken every 400 ns. Each measurement is a snapshot of the transient response that includes one full square waveform cycle. Therefore, the curve was resampled to obtain uniform sampling and a reduced sampling frequency on the failure precursor trajectory. The signals were filtered by computing the mean over every one-minute window. Six MOSFETs aged under thermal overstress are available. Figure 1 presents the ΔR_DS(ON) trajectories for the six devices.
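The resampling and filtering step can be illustrated with a short sketch that averages irregularly timed samples in consecutive fixed-length windows. The function name and the choice to report each window at its center time are assumptions; windows that contain no samples are simply skipped here, which may differ from the handling in the original processing.

```python
from collections import defaultdict

def window_mean(times_s, values, window_s=60.0):
    """Average irregular samples over consecutive windows of window_s seconds.

    Returns (centers, means): the center time of every non-empty window
    and the mean of the samples that fall inside it.
    """
    buckets = defaultdict(list)
    for t, v in zip(times_s, values):
        buckets[int(t // window_s)].append(v)
    centers, means = [], []
    for k in sorted(buckets):
        centers.append((k + 0.5) * window_s)
        means.append(sum(buckets[k]) / len(buckets[k]))
    return centers, means
```

The output has one value per minute, which gives the uniformly sampled, lower-rate trajectory that a discrete-time filter expects.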

DEGRADATION MODELING
An empirical degradation model is proposed based on the degradation process observed in ΔR_DS(ON) for the six aged devices. The process grows exponentially as a function of time, and the exponential behavior starts at different points in time for different devices. An empirical degradation model can be used when a physics-based degradation model is not available. This methodology has been used for prognostics of electrolytic capacitors using a Kalman filter [12]. There, the exponential degradation model was posed as a linear first-order discrete dynamic system in the form of a state-space model representing the dynamics of the degradation process. The proposed degradation model for the power MOSFET application, equation (1), expresses the increase in ON-resistance due to aging as an exponential function of time with two model parameters, which could be static or estimated on-line as part of the Bayesian tracking framework. This model structure is capable of representing the exponential behavior of the degradation process for the different devices. Table 1 presents parameter estimation results for model (1), applied to the degradation data in Figure 1, based on non-linear least-squares estimation. The estimates of both parameters are presented along with their corresponding sample variances. The parameters of the model clearly differ from device to device. Therefore, both parameters need to be estimated online in order to ensure accuracy. Figure 2 presents the estimation results for device #36.
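As an illustration of fitting a two-parameter exponential model by least squares, the sketch below assumes the form ΔR(t) = α(exp(βt) − 1). Both this form and the symbols α and β are assumptions, since the equation is not legible in this text. The solver is a separable least-squares search (closed-form α for each candidate β, golden-section search over β), swapped in for simplicity; it is not claimed to be the estimator used for Table 1, and it assumes the squared error is unimodal in β over the search bracket.

```python
import math

def fit_exponential(t, y, beta_lo=1e-4, beta_hi=1.0, iters=80):
    """Fit y ~ alpha * (exp(beta * t) - 1) by separable least squares.

    The model form and the bracket [beta_lo, beta_hi] are assumptions.
    """
    def alpha_and_sse(beta):
        g = [math.exp(beta * ti) - 1.0 for ti in t]
        # For a fixed beta the model is linear in alpha, so alpha is closed-form.
        alpha = sum(yi * gi for yi, gi in zip(y, g)) / sum(gi * gi for gi in g)
        sse = sum((yi - alpha * gi) ** 2 for yi, gi in zip(y, g))
        return alpha, sse

    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = beta_lo, beta_hi
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if alpha_and_sse(c)[1] < alpha_and_sse(d)[1]:
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    beta = 0.5 * (a + b)
    return alpha_and_sse(beta)[0], beta
```

A call such as `alpha, beta = fit_exponential(times, delta_r_values)` returns one parameter pair per device, which is the kind of per-device estimate that motivates on-line estimation of the parameters.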

Dynamic degradation model for Bayesian tracking
The degradation model presented in equation (1) is converted into a dynamic model in order to obtain the state-space representation needed for Bayesian tracking. Treating the two model parameters as time-dependent, the time derivative of (1) is given by equation (2). Setting the time derivatives of the two parameters to zero yields the dynamic model representation in equation (3). In this model, the two parameters are also state variables that change through time. Therefore, the model is a non-linear dynamic system, and Bayesian tracking algorithms such as the extended Kalman filter or the particle filter are needed for on-line state estimation. The forward difference method is used to approximate the time derivatives in order to discretize the model in equation (3). The method is first applied to the ON-resistance state and solved for its value at the next time step; applying it in the same way to the two parameter states gives the discrete dynamic model in equation (5).
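The conversion can be worked through under an assumed model form. The derivation below takes ΔR(t) = α(exp(βt) − 1) as the exponential model; this form, the symbols α, β and Δt, and the index k are assumptions made for illustration, because the equations are not legible in this text.

```latex
% Assumed exponential model (illustrative form)
\Delta R(t) = \alpha(t)\left(e^{\beta(t)\,t} - 1\right)

% Time derivative with time-dependent parameters
\Delta\dot{R} = \dot{\alpha}\left(e^{\beta t} - 1\right)
              + \alpha\, e^{\beta t}\left(\dot{\beta}\, t + \beta\right)

% Setting \dot{\alpha} = 0 and \dot{\beta} = 0, and using
% \alpha e^{\beta t} = \Delta R + \alpha:
\Delta\dot{R} = \beta\left(\Delta R + \alpha\right), \qquad
\dot{\alpha} = 0, \qquad \dot{\beta} = 0

% Forward difference with step \Delta t:
% (\Delta R_{k+1} - \Delta R_k)/\Delta t \approx \beta_k(\Delta R_k + \alpha_k)
\Delta R_{k+1} = \Delta R_k + \Delta t\,\beta_k\left(\Delta R_k + \alpha_k\right), \qquad
\alpha_{k+1} = \alpha_k, \qquad \beta_{k+1} = \beta_k
```

Under this assumption the first state equation contains the products of β with ΔR and with α, which is what makes the system non-linear in the augmented state.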

PROGNOSTICS ALGORITHM DEVELOPMENT
A prognostics algorithm in this application predicts the remaining useful life of a particular power MOSFET device at different points in time through the accelerated life of the device. As indicated earlier, ΔR DS(ON) is used in this study as a health indicator feature and as a precursor of failure. The prognostics problem is posed in the following way.
• A single feature is used to assess the health state of the device (ΔR_DS(ON)).
• It is assumed that the die-attach failure mechanism is the only active degradation mechanism during the accelerated aging experiment.
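Given an estimate of the degradation state, one common way to turn it into an RUL value is to propagate the degradation model forward until the health indicator crosses a failure threshold. The sketch below does this for an assumed discrete model ΔR[k+1] = ΔR[k] + Δt·β·(ΔR[k] + α); the model form, the function name, and the `threshold` argument are all assumptions, and no threshold value is taken from the experiments.

```python
def predict_rul(x, dt, threshold, max_steps=100000):
    """Propagate the assumed model until delta-R reaches the threshold.

    x is [delta_r, alpha, beta]; returns the time to threshold in the
    units of dt, or None if it is not reached within max_steps.
    """
    r, alpha, beta = x
    for k in range(max_steps + 1):
        if r >= threshold:
            return k * dt
        r = r + dt * beta * (r + alpha)
    return None
```

Repeating the call at successive points in time, each with the latest state estimate, gives a sequence of RUL predictions through the life of the device.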

Extended Kalman filter implementation
The extended Kalman filter allows the Kalman filter algorithm to be applied to on-line estimation of non-linear dynamic systems [13,14]. This algorithm has been used in other applications for health state estimation and prognostics. The general form of the system considered by the extended Kalman filter is given in equation (6), where f and h are non-linear functions and the remaining terms are the model noise and the measurement noise. Both noise terms are considered to be normally distributed, with zero mean and known variance. For the prognostics implementation using the discrete dynamic degradation model in equation (5), the state vector consists of the increase in ON-resistance and the two model parameters. Therefore, f is a vector-valued function given by equation (8). The ON-resistance is the only measured value; therefore, the measurement equation is given by equation (9).
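One predict and update cycle of an extended Kalman filter for a three-state model of this kind can be sketched as follows. The sketch assumes the discrete model ΔR[k+1] = ΔR[k] + Δt·β·(ΔR[k] + α) with constant α and β, a state ordered as [ΔR, α, β], and a direct measurement of ΔR. The model form, the diagonal process noise `q`, and the measurement variance `r` are assumptions for illustration, not values or equations taken from the paper.

```python
def ekf_step(x, P, z, dt, q, r):
    """One EKF cycle for the assumed state [delta_r, alpha, beta].

    x: state list, P: 3x3 covariance (list of lists), z: measured delta_r,
    q: three process-noise variances (diagonal), r: measurement variance.
    """
    R, a, b = x
    # Predict: propagate the state through the assumed non-linear model.
    x_pred = [R + dt * b * (R + a), a, b]
    # Jacobian of the model with respect to the state.
    F = [[1.0 + dt * b, dt * b, dt * (R + a)],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
    FP = [[sum(F[i][k] * P[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    P_pred = [[sum(FP[i][k] * F[j][k] for k in range(3))
               + (q[i] if i == j else 0.0)
               for j in range(3)] for i in range(3)]
    # Update: the measurement matrix is H = [1, 0, 0].
    s = P_pred[0][0] + r                      # innovation variance
    K = [P_pred[i][0] / s for i in range(3)]  # Kalman gain
    innov = z - x_pred[0]
    x_new = [x_pred[i] + K[i] * innov for i in range(3)]
    P_new = [[P_pred[i][j] - K[i] * P_pred[0][j] for j in range(3)]
             for i in range(3)]
    return x_new, P_new
```

Because only ΔR is measured, the parameter states are corrected solely through their cross-covariance with ΔR, which the Jacobian builds up during the predict step.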

RUL ESTIMATION RESULTS
This section presents the results of the implemented algorithm. Four test cases are defined following the leave-one-out validation concept:
• Predict RUL on device #36, estimate initial conditions with the rest of the devices and compute RUL at several prediction times.
• Predict RUL on device #09, estimate initial conditions with the rest of the devices and compute RUL at several prediction times.
• Predict RUL on device #08, estimate initial conditions with the rest of the devices and compute RUL at several prediction times.