A hidden-Gamma model-based filtering and prediction approach for monotonic health factors in manufacturing

In the context of Smart Monitoring and Fault Detection and Isolation in industrial systems, the aim of Predictive Maintenance technologies is to predict the occurrence of process or equipment faults. For a Predictive Maintenance technology to be effective, its predictions have to be both accurate and timely, so that strategic decisions on maintenance scheduling can be taken in a cost-minimization perspective. A number of Predictive Maintenance technologies are based on the use of ‘‘health factors’’, quantitative indicators associated with equipment wear that exhibit a monotone evolution. In real industrial environments, such indicators are usually affected by measurement noise and non-uniform sampling times. In this work we present a methodology, formulated as a stochastic filtering problem, to optimally predict the evolution of the aforementioned health factors based on noisy and irregularly sampled observations. In particular, a hidden-Gamma process model is proposed to capture the nonnegativity of the health factor and of its derivative. As such a filtering problem is not amenable to a closed-form solution, a numerical Monte Carlo approach based on particle filtering is employed. An adaptive parameter identification procedure is proposed to achieve the best trade-off between promptness and low noise sensitivity. Furthermore, a methodology to identify the risk function associated with the observed equipment, based on previous maintenance data, is proposed. The present study is motivated and tested on a real industrial Predictive Maintenance problem in semiconductor manufacturing, with reference to a dry etching equipment.


Introduction
Advanced monitoring is a fundamental activity in the Industry 4.0 scenario to implement control, maintenance, quality, reliability, and safety policies [1,2,3]. In particular, Fault Detection and Isolation (FDI) [3] and Predictive Maintenance (PdM) [4] technologies have proliferated in recent years for the diagnosis and prognosis of process/tool failures [5]. While the aims of such technologies are similar and partly overlapping, PdM technologies are more focused on prognosis. Prognosis can be defined as the capability to provide early detection of the precursor and/or incipient fault condition of a component, and to design tools for managing and predicting the progression of such fault condition to component failure [6]. Given their goal, PdM technologies are typically applied to failures that are associated with wear and usage of the system/process [7] or, more generally, to failures that can be predicted in advance [8,9]. Examples of such faults are the breaking of the source in ion-implantation processes in semiconductor manufacturing [7], the flute wear in cutting tool equipment [10], and the lifespan of lithium-ion batteries [11].
In this work we focus on the so-called 'Health Factors' (HFs), an important concept in prognostics. HFs are quantitative indexes used to define the current status of a tool/process, to assess the future statuses of the system under exam (or of one of its components/subsystems), and to estimate its Remaining Useful Life (RUL) [14,16,17], so that strategic decisions regarding maintenance scheduling and dynamic sampling plans can be taken [4]. Being in direct relationship with the wear, usage or stress of an equipment/component or system, HFs generally have a monotone evolution. A HF can be of very different natures: in its simplest form, a HF can be an observable parameter that, thanks to specific domain expertise, can be associated with the equipment/process health status. Examples of health factors that are directly related to system health are the thermal index of a polymeric material [18], the scar width in sliding metal wear [19], and the temperature difference in semiconductor manufacturing epitaxy processes [20]. HFs can also be the output of Soft Sensor modules [21,22], when the health status is impossible or costly to monitor directly. Moreover, HFs can be the residuals of first-principle FDI models [23]. In fact, in many practical examples [1,17,23,24], residuals have a monotonic behaviour and threshold-based maintenance management policies are implemented on such quantities. HFs are therefore relevant quantities in both model-based [24,25,26] and model-free [1,3,27,28] prognostic approaches.
In the present paper, the problem of designing a HF for Predictive Maintenance (PdM) purposes is considered [7,29,30]. In particular, the issue of assessing the probability distribution of the HF future values given its past measurements is addressed, under the following assumptions: (i) the HF is monotonically increasing; (ii) its measurements are subject to random noise that may conceal its monotonic nature; (iii) measurements are non-uniformly sampled over time. The aforementioned features are typical traits of HF signals [17,20,31,32,33], but they are generally not simultaneously accounted for in the related literature. Non-stochastic models for HFs have been presented in the literature (see [12] for a broad review on RUL estimation), as well as inspection and intervention approaches for increasing the effectiveness of maintenance actions and decreasing the associated costs. However, such methodologies are best suited to noise-free scenarios and, given the aforementioned assumptions on the HF signals, it is here proposed to adopt a stochastic filtering paradigm [34]. With the proposed approach, the HF is treated as a stochastic process, with the possibility of combining prior knowledge on the HF with statistical information from the observed noisy data. A simple approach to the problem at hand is provided by the Wiener and Kalman predictors [20,35,36,37], which are statistically optimal for linear Gaussian models. However, such classical approaches may be suboptimal for signals with the characteristics given in assumptions (i)-(iii). As a matter of fact, far from being Gaussian, the HF derivative is in this work considered to be a nonnegative random variable.
Given such premises, a framework for HF filtering and prediction based on the Gamma distribution is here proposed. PdM applications employing Gamma distributions have been developed since the 1970s [38], especially in mechanical and civil engineering applications [39,40,41] and, recently, in industrial environments [42]. Indeed, if the HF is modeled as the sum of Gamma distributed random variables, such a sum is still Gamma distributed, with the advantage that convenient estimation and prediction algorithms can be derived. Given that in real-world industrial applications HFs are usually observed with noise, the approach proposed in this work considers the HF as a monotonic Gamma process (with time-varying shape parameter) corrupted by Gaussian noise (hidden-Gamma model). Such assumptions lead to the lack of closed-form solutions for the estimation of the model parameters. However, it will be shown that the prediction problem can be efficiently solved by resorting to particle filtering methods [43,44,45], employing Monte Carlo (MC) simulations to derive the target posterior distributions. Finally, a recursive procedure to estimate the time-varying shape parameter is proposed. This procedure optimizes the trade-off between the need for promptness and noise insensitivity/outlier rejection.
The paper is organized as follows. In Section 2 the hidden-Gamma model is presented. In Section 3 the principles of Particle Filtering (PF) are briefly summarized and adapted to Gamma processes. In Section 4 an adaptive recursive scheme for the estimation of monotone HFs is presented. Section 5 is dedicated to the definition and estimation of an appropriate Risk Function for the proposed model. In Section 6 experimental results on synthetic datasets are reported, whereas in Section 7 a real PdM semiconductor manufacturing problem related to dry etching is tackled.

Gamma Probability Distribution
The most notable property of Gamma distributions is their nonnegative support. We consider a random variable x with Gamma distribution Γ(a, θ), where a is the shape parameter and θ is the scale factor. The first two statistical moments of x are E[x] = aθ and Var[x] = aθ², and the probability density function (PDF) is

p(x) = x^(a−1) e^(−x/θ) / (Γ(a) θ^a),  x ≥ 0.

Gamma distributed random variables enjoy the following property. Property 1 (Infinite Divisibility): if x_1 ∼ Γ(a_1, θ) and x_2 ∼ Γ(a_2, θ) are independent, then the sum x = x_1 + x_2 has a Gamma distribution with shape a_1 + a_2 and scale factor θ.
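As a quick numerical illustration of Property 1, the following sketch (standard-library Python only; note that `random.gammavariate(alpha, beta)` takes shape and scale) checks the first two moments of a sum of independent Gamma variables against those of Γ(a_1 + a_2, θ):

```python
import random

random.seed(42)

# Empirical check of Property 1: if x1 ~ Gamma(a1, theta) and
# x2 ~ Gamma(a2, theta) are independent, then x1 + x2 ~ Gamma(a1 + a2, theta).
a1, a2, theta = 2.0, 3.0, 0.5
n = 200_000

sums = [random.gammavariate(a1, theta) + random.gammavariate(a2, theta)
        for _ in range(n)]

mean = sum(sums) / n
var = sum((s - mean) ** 2 for s in sums) / n

# Moments of Gamma(a1 + a2, theta): mean = (a1+a2)*theta = 2.5,
# variance = (a1+a2)*theta^2 = 1.25.
```

With 200,000 draws, the empirical mean and variance should land within a few hundredths of the theoretical 2.5 and 1.25.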
The shape of the Gamma probability distribution for different values of a and θ is shown in Figure 1.

The Hidden-Gamma Model
In the following, the HF is denoted by x(·). Measurements x(t_1), x(t_2), . . . , x(t_k) are available at time instants t_1 ≤ t_2 ≤ . . . ≤ t_k. We indicate with t_0 ≤ t_1 the initial time instant, and with t_{k+1} ≥ t_k the time instant at which predictions of the HF are desired.
In the following, we adopt a Bayesian paradigm by assigning to {x_j, j = 1, . . . , k + 1} a joint prior distribution. As stated in Section 1, the HF is associated with equipment/process degradation; therefore, the prior information on the HF can be formalized as follows:
• the HF takes nonnegative values;
• the HF has nonnegative increments ∆x_j := x_j − x_{j−1};
• ∆x_j is positively correlated with the length of the interval t_j − t_{j−1}.
The previous characteristics of the HF are captured by the following stochastic model. The HF evolution is governed by the equation

x_j = x_{j−1} + w_j,  j = 1, 2, . . . ,  (1)

where x_0 ∼ Γ(a_0, θ) and w_j ∼ Γ(α(t_j − t_{j−1}), θ). The w_j are supposed to be mutually independent random variables, also independent from x_0.
It is straightforward to see from Eq. (1) that both the HF and its increments are nonnegative. Moreover, thanks to Property 1, it can be seen that E[x(t_j)] = (a_0 + αt_j)θ, which means that the HF is expected to grow linearly with time.
Remark 1: the discrete-time model described in (1) can be obtained by sampling the continuous-time Gamma process x(t) that satisfies

dx(t) = dw(t),  (2)

where x(t_0) ∼ Γ(a_0, θ) and w(t) is a Gamma process with independent, nonnegative increments dw(t) ∼ Γ(α dt, θ). Equation (2) can be obtained thanks to Property 1. This formulation allows us to estimate the HF at the generic time instant t = t_j.
If {x_j} were noiseless, the estimation of the unknown parameters {a_0, α, θ}, which specify the distribution of the future values of the HF, could be performed, for example, via maximum likelihood estimation (MLE). In such a case, the posterior expectation could be employed as point predictor:

x̂_{k+1} = E[x_{k+1} | x_k] = x_k + α(t_{k+1} − t_k)θ.

Moreover, the knowledge of the distribution of w_{k+1} allows confidence intervals for the HF to be defined. Such confidence intervals could be exploited in a PdM perspective, by computing the probability of exceeding predefined thresholds associated with a maintenance action.
Unfortunately, in real-world scenarios, HFs are usually affected by measurement noise. For this reason, a measurement equation is added to the model (1): the HF is observed through the noisy measurements

y_j = x_j + v_j,  (3)

where v_j ∼ N(0, σ²) (Gaussian distribution with zero mean and variance σ²) is independent from the initial value x_0 and from {w_j}.
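The full model (1)-(3) is easy to simulate. The sketch below (illustrative Python; function and parameter names are our own, not the paper's) draws a monotone hidden path with Gamma increments over a non-uniform time grid and adds Gaussian measurement noise:

```python
import random

random.seed(0)

def simulate_hgp(ts, a0, alpha, theta, sigma):
    """Simulate the hidden-Gamma model:
       x_j = x_{j-1} + w_j,  w_j ~ Gamma(alpha*(t_j - t_{j-1}), theta)
       y_j = x_j + v_j,      v_j ~ N(0, sigma^2)
    over the (possibly non-uniform) sampling instants ts."""
    x = random.gammavariate(a0, theta)      # x_0 ~ Gamma(a0, theta)
    xs, ys = [], []
    t_prev = ts[0]
    for t in ts[1:]:
        # Monotone increment: longer gaps give larger expected increments.
        x += random.gammavariate(alpha * (t - t_prev), theta)
        xs.append(x)
        ys.append(x + random.gauss(0.0, sigma))  # noisy measurement
        t_prev = t
    return xs, ys

# Irregular sampling grid, as in assumption (iii).
ts = [0, 1, 3, 4, 8, 9, 15, 20]
xs, ys = simulate_hgp(ts, a0=1.0, alpha=0.5, theta=1.0, sigma=0.2)
```

The hidden path `xs` is monotone by construction, while the observed `ys` need not be, exactly as remarked for the hidden Gamma process.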
The stochastic process y k is named here a hidden Gamma process (HGP). Given the presence of the measurement noise v j in (3), there is no guarantee that the sequence {y j } is monotonic. In the following, it is assumed that a 0 , α, θ are known, as they can be estimated by MLE even in presence of noise. The formulation of the filtering problem is then the following: Problem 1. Given the available measures Y k = {y j , j = 1, . . . , k} and Assumptions 1-2, compute the posterior PDF p(x k+1 |Y k ).
Since p(x k+1 |x k , x k−1 , . . .) = p(x k+1 |x k ), the HGP defined in (1) is a first-order Markov process. Then, Problem 1 can be approached by looking for a recursive solution where p(x k+1 |Y k ) is computed by updating p(x k |Y k−1 ), once the measure y k is available. In the noisy conditions we are considering in this work, such solution must be derived with numerical MC techniques like PF.

Basics of Particle Filtering
PFs, or Sequential Monte Carlo methods, are numerical approaches that approximate intractable or complex distributions by discrete distributions whose statistical moments and confidence intervals can be easily calculated. PFs exploit the generation of N random variables, named particles, to approximate the unknown posterior of the stochastic process. A basic PF algorithm for a hidden state-space system such as the one given by (1)-(3) proceeds as follows (a graphical representation of the PF procedure is depicted in Fig. 2). At each step k, N particles x_k^(i), i = 1, . . . , N, are drawn from a proposal distribution G(x_k | x_{k−1}^(i), y_k); the particle weights are then adjusted as

w_k^(i) ∝ w_{k−1}^(i) L(y_k | x_k^(i)) p(x_k^(i) | x_{k−1}^(i)) / G(x_k^(i) | x_{k−1}^(i), y_k),  (4)

where L is a likelihood function defined by the measurement model (3) and the known statistics of v_j, while the state-transition probability p(x_k | x_{k−1}) is given by (1). The weights are normalized in order to sum to 1, yielding a discrete distribution with support points x_k^(i) that approximates the posterior:

p(x_k | Y_k) ≈ Σ_{i=1}^N w_k^(i) δ(x_k − x_k^(i)).  (5)

Besides the selection of the number of particles N, the most important design choice of the PF procedure is the selection of G. The simplest choice is to set G(x_k | x_{k−1}, y_k) = p(x_k | x_{k−1}), so that only L is required in (4). Notably, with this choice, y_k has no influence on the sampling step of the procedure, reducing in general the robustness of the estimates. A possible alternative is to use a G that considers a preliminary approximation of p(x_k | Y_{k−1}); this can be achieved, for example, by means of a Kalman Filter (KF) approach [48].
A second critical design issue in the PF procedure is the resampling step [49]. If a large number of particles have very small weights w_k^(i) ≪ 1/N, such particles must be discarded and re-sampled from the distribution p(x_k | Y_{k−1}). This is done in order to allow the uninformative particles to contribute again to the estimation. Many resampling strategies have computational cost O(N); less resource-demanding approaches for resampling can also be implemented [50].
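A minimal bootstrap PF for the hidden-Gamma model can be sketched as follows (illustrative Python; the proposal G is set to the transition prior and systematic resampling is performed at every step — a common simplification, not the authors' exact implementation):

```python
import math
import random

random.seed(1)

def bootstrap_pf(ts, ys, a0, alpha, theta, sigma, n_particles=500):
    """Bootstrap PF for the hidden-Gamma model: propagate particles through
    the Gamma transition prior, weight them by the Gaussian likelihood of
    y_k, and resample systematically at every step."""
    parts = [random.gammavariate(a0, theta) for _ in range(n_particles)]
    estimates = []
    t_prev = ts[0]
    for t, y in zip(ts[1:], ys):
        dt = t - t_prev
        # Propagate each particle through the state equation (monotone step).
        parts = [p + random.gammavariate(alpha * dt, theta) for p in parts]
        # Weight by the Gaussian likelihood of the measurement model.
        ws = [math.exp(-0.5 * ((y - p) / sigma) ** 2) for p in parts]
        tot = sum(ws) or 1.0
        ws = [w / tot for w in ws]
        # Posterior mean of the discrete approximation of p(x_k | Y_k).
        estimates.append(sum(w * p for w, p in zip(ws, parts)))
        # Systematic resampling.
        u0 = random.random() / n_particles
        cum, new, i = 0.0, [], 0
        for p, w in zip(parts, ws):
            cum += w
            while i < n_particles and u0 + i / n_particles <= cum:
                new.append(p)
                i += 1
        while len(new) < n_particles:   # guard against float round-off
            new.append(parts[-1])
        parts = new
        t_prev = t
    return estimates

# Noisy monotone signal: the true HF grows by 0.5 per unit time.
ts = list(range(0, 51))
true_x = [0.5 * t for t in ts[1:]]
ys = [x + random.gauss(0, 0.3) for x in true_x]
est = bootstrap_pf(ts, ys, a0=0.1, alpha=0.5, theta=1.0, sigma=0.3)
```

Since the proposal equals the transition prior, the weights reduce to the measurement likelihood alone, the simplest choice discussed above.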

Adaptation to Gamma Processes
A possible issue affecting the PF problem for system (1)-(3) is related to the nonnegativity of the quantities w_j: an overestimation of the lower limit of the distribution of x_k will propagate to all estimates x_i with i > k. In such a case, the lower limits of x_i will be overestimated as well, leading to the accumulation of one-sided errors. This issue is formally described in the following proposition. Proposition 1. If, for some k, the posterior distribution p(x_k |Y_{k−1}) is approximated by a representation with discrete support, the lower limit of the support of p(x_{k+1} |Y_k) is greater than or equal to that of p(x_k |Y_{k−1}). For a proof of Proposition 1, we refer the interested reader to [46]. A possible way to mitigate the aforementioned propagation error is to employ a fixed-lag smoother [47], where p(x_{k+1} |Y_{k+W}) is taken as the basis for future updates instead of p(x_{k+1} |Y_k) (the integer W denotes a fixed window size). With this approach, the smoothed p(x_{k+1} |Y_{k+W}) is generally more accurate and less prone to overestimating the lower limit of the distribution, thanks to the increased availability of information.
The fixed-lag smoother can be derived as follows. The augmented state vector x̃_j := [x_j x_{j+1} . . . x_{j+W}] is introduced and, similarly, w̃_j, ṽ_j, ỹ_j. By exploiting (1)-(3), an augmented-state model with dynamics and measurement equations (6)-(7) is obtained. The augmented-state model allows more robustness in the PF approach. In Fig. 4 an example with a synthetic HF is illustrated: larger lag sizes allow enhanced estimation stability (even in presence of outliers) and prevent systematic bias. A final design guideline for the hidden-Gamma-process PF regards resampling. Conditional resampling has been implemented here, making the probability that a particle is resampled depend on its distance from the critical edge of the support. While this procedure can create zones where particles are rarely resampled, it mitigates the risk of error propagation.

Regularized Adaptive Filtering
The a-priori knowledge on the HF increment from t_{j−1} to t_j is expressed by the statistics E[w_j] = α(t_j − t_{j−1})θ and Var[w_j] = α(t_j − t_{j−1})θ²: the higher the α, the higher the expected size of the increments. In real-world industrial cases, variations in the HF may happen quite suddenly, with a steep rise of x(t) after flat steady-state behavior (see the application case discussed in Section 7). For this reason, a time-varying shape factor α = α(t) is used here (see Fig. 5), which must be estimated by the PF as well.
Considering the discrete-time nature of (1)-(3), the shape factor can be denoted as α_j = α(t_j), and the following hypothesis can be made: Assumption 3. The shape factor α_j evolves according to the random walk

α_j = α_{j−1} + δ_j,  δ_j ∼ N(0, λ²),  (8)

with mutually independent increments δ_j. In this perspective, the hyper-parameter λ² can be tuned to modify the variability of α_j. Indeed, large values of λ² lead to a quickly varying α_j and a promptly adapting PF. On the other hand, small values of λ² can reduce the noise sensitivity at the price of a less responsive PF.
A maximum a posteriori (MAP) estimate for α_k is adopted, based on a moving-window approach. If α̃_j := [α_j α_{j+1} . . . α_{j+W}], the MAP estimate α̂_k is

α̂_k = arg max p(α̃ | ỹ_k, x_{k−W}, α_{k−W}),  (9)

with x_{k−W} and α_{k−W} set equal to their point estimates at the previous iteration.
Given that the conditional distributions of ỹ_k and α_k in (9) are Gaussian, the log-posterior L = log(p(α_k | ỹ_k, x_{k−W}, α_{k−W})) is defined (up to an additive constant) as

L = −SSR/(2σ²) − R/(2λ²),  (10)

where SSR and R are, respectively, the sum of squared residuals and the sum of squares of the increments δ_j, j = k − W, . . . , k.
Equation (10) is typical of regularization methods [51], where a trade-off between accuracy in fitting the training data and complexity of the estimated function has to be made. In regularization methods, the following family of penalty terms is usually considered:

R_q = Σ_j |δ_j|^q.  (11)

For q = 2, the well-known Ridge Regression (RR) [52] is obtained. The advantage of RR is that it admits a closed-form solution. For q = 1, a LASSO-type [53,54] regularization is obtained instead. LASSO provides sparse solutions, an important property that makes LASSO the first choice in many applications over RR, even at the price of not admitting a closed-form solution. Values of q larger than 1 can also be adopted; for instance, q ∈]1, 2[ leads to a penalization region similar to the Elastic Net [55].
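The contrast between the q = 1 and q = 2 penalties can be seen on a scalar toy problem, where both admit closed-form minimizers (an illustrative sketch, not the paper's estimator):

```python
def penalty(deltas, q):
    """Family of penalties R_q = sum_j |delta_j|^q:
    q = 2 gives Ridge, q = 1 gives LASSO."""
    return sum(abs(d) ** q for d in deltas)

def scalar_map(r, lam, q):
    """Minimize (x - r)^2 + lam * |x|^q for a single coefficient.
    Closed forms exist for q = 2 (shrinkage) and q = 1 (soft-thresholding)."""
    if q == 2:
        return r / (1.0 + lam)                      # ridge shrinks toward 0
    if q == 1:
        if r > lam / 2:
            return r - lam / 2                      # soft-threshold
        if r < -lam / 2:
            return r + lam / 2
        return 0.0                                  # exact zero: sparsity
    raise ValueError("only q = 1 or q = 2 handled here")

# LASSO zeroes out small coefficients; ridge only shrinks them.
small = scalar_map(0.3, 1.0, 1)   # 0.0
ridge = scalar_map(0.3, 1.0, 2)   # 0.15
```

The q = 1 solution is exactly zero for inputs below the threshold λ/2 (sparsity), while the q = 2 solution merely shrinks every coefficient by a constant factor.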
The computational cost of this operation for k > W is O(W²k) for q = 1 and O(W³ + W²k) for q = 2; optimized approaches [56] are available for the less frequent case k < W. Remark 2: The discrete-time model with time-varying α_k can still be interpreted as the sampled-data version of the continuous-time model with time-varying α(t). In fact, p(x(t_{j+1}) | x(t_j)) in (2) depends on the mean value of α(t) in the interval [t_j, t_{j+1}], and not on its evolution inside the interval. From Property 1, w_j = x(t_{j+1}) − x(t_j) is Gamma distributed, that is, w_j ∼ Γ(ᾱ_j, θ), where ᾱ_j := ∫_{t_j}^{t_{j+1}} α(t) dt. Then, by setting α_j := ᾱ_j / (t_{j+1} − t_j), it follows that w_j ∼ Γ(α_j(t_{j+1} − t_j), θ) and the discrete-time increment model is obtained.

Implementation notes
The HGP-PF approach proposed in this work can be summarized (for the sequences {t_j, y_j}, j = 1, . . . , k) as follows:
1. Initial parameters are selected: the process-noise-related quantities θ and α_0, the initial state distribution P(x_0), the measurement noise variance σ², and the PF design parameters N, W and λ².
2. For j = 1, . . . , k:
   (a) if j ≥ W, α̂_j is updated by solving the regularization problem (Section 4);
   (b) p(x_j | Y_j) is approximated.
3. Predictions and confidence intervals are computed using the newest estimates.
It can be shown, given that p(x_{k+1} | x_k) is Gamma distributed [46], that p(x_{k+1} | Y_k) is a continuous mixture of Gamma distributions. Since the PF provides an approximation of p(x_k | Y_k) with discrete support (see Eq. (5)), p(x_{k+1} | Y_k) is in turn approximated by a finite mixture of Gamma distributions.
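The prediction step can be sketched by Monte Carlo: given the weighted particles approximating p(x_k | Y_k), the threshold-crossing probability under the Gamma-mixture predictor p(x_{k+1} | Y_k) is estimated by averaging sampled increments over particles (illustrative Python; the function name and toy numbers are our own):

```python
import random

random.seed(7)

def predict_crossing_prob(particles, weights, dt, alpha, theta, H, n_draws=200):
    """Approximate P(x_{k+1} > H | Y_k): p(x_{k+1} | Y_k) is a mixture with
    one shifted-Gamma component per particle, so we average Monte Carlo
    draws of the increment w_{k+1} ~ Gamma(alpha*dt, theta) over particles."""
    prob = 0.0
    for p, w in zip(particles, weights):
        hits = sum(1 for _ in range(n_draws)
                   if p + random.gammavariate(alpha * dt, theta) > H)
        prob += w * hits / n_draws
    return prob

# Toy discrete posterior: four equally weighted particles around 5.
parts = [4.0, 4.5, 5.0, 5.5]
ws = [0.25] * 4
p_low  = predict_crossing_prob(parts, ws, dt=1.0, alpha=0.5, theta=1.0, H=20.0)
p_high = predict_crossing_prob(parts, ws, dt=1.0, alpha=0.5, theta=1.0, H=0.1)
```

A threshold far above the particle cloud gives a crossing probability near 0, while a threshold already below every particle gives exactly 1, since the increments are nonnegative.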

Risk Function Evaluation
Once a prediction of the future probability distribution of an observed HF is available, it can be compared with a maintenance threshold to compute and evaluate a risk function (RF) [57] associated with the maintenance operation (Figure 6). Such a maintenance threshold may be given by process/equipment operating conditions, or inferred from historical, noisy HF data, and it can be time/usage-dependent. In the following, a RF is formally defined and motivated in a maintenance optimization perspective. Furthermore, resorting to supervised classification theory, a method to estimate the maintenance threshold from historical maintenance data is also presented.

RF definition
Let p_τ := p(x(t_k + τ) | Y_k), τ > 0, be the predicted distribution of the observed health factor at time instant t_k + τ. According to the paradigm established in the previous sections, p_τ is a finite mixture of Gamma densities, each of the form x^(a−1) e^(−x/θ) / (Γ(a) θ^a), where Γ(·) is the gamma function. Furthermore, let H(t) be a continuous real function with nonnegative codomain representing the time-dependent threshold for the analyzed HF. It follows that the probability of exceeding the threshold H at time instant t_k + τ is

P(x(t_k + τ) > H(t_k + τ)) = ∫_{H(t_k+τ)}^{∞} p_τ(x) dx.  (12)

[Figure 7: A dataset D along with an example of HF reading (in grey). The optimal threshold H(t) is obtained using SVM (p = 1).] It is to be noted that the optimal order of the class separation, p, can be determined by means of Generalized Cross Validation (GCV).
By observing that the cumulative distribution function of a Γ(a, θ) random variable can be written as F(x) = γ(a, x/θ)/Γ(a), where γ(·, ·) is the lower incomplete gamma function, it is possible to compute the risk function

R(t_k + τ) := P(x(t_k + τ) > H(t_k + τ)).  (13)

The risk function R(t) depends on the choice of the threshold function H(t). Such a choice can be made either by exploiting expert knowledge (for instance, a threshold beyond which the machine is known to malfunction) or by analysing historical maintenance data. In the next subsection, classification theory is employed in order to estimate the optimal threshold when such data are available.
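Evaluating the risk only requires the regularized lower incomplete gamma function, which can be computed with the standard series γ(a, x) = x^a e^(−x) Σ_{n≥0} x^n / (a(a+1)···(a+n)). A stdlib Python sketch (for simplicity, the mixture here uses plain, unshifted Gamma components; all names are our own):

```python
import math

def reg_lower_gamma(a, x, tol=1e-12):
    """Regularized lower incomplete gamma P(a, x) = gamma(a, x) / Gamma(a),
    computed via the standard power series (converges for all x >= 0)."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > tol * total:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def gamma_cdf(x, shape, scale):
    return reg_lower_gamma(shape, x / scale)

def risk(H, mixture):
    """Risk = P(x > H) for a finite mixture [(weight, shape, scale), ...]
    of Gamma components, as produced by the particle approximation."""
    return sum(w * (1.0 - gamma_cdf(H, a, th)) for w, a, th in mixture)

# Sanity check: Gamma(1, theta) is Exponential(theta), so
# P(x > H) = exp(-H / theta).
r = risk(2.0, [(1.0, 1.0, 1.0)])   # exp(-2) ~ 0.1353
```

The exponential special case gives an exact reference value to validate the series implementation against.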

Threshold Estimation
Let D = {(T_i, y_i, s_i), i = 1, . . . , N_m} be a set of N_m maintenance operations, where T_i is the duration of the i-th production cycle (maintenance to maintenance), y_i is the last observed HF measurement, and s_i is an indicator of the effectiveness of the i-th maintenance cycle. Since the status of the maintained component cannot be known before its replacement, two situations can occur. Let s_i = −1 conventionally represent an early replacement (the component is still functional when replaced) and let s_i = 1 represent a belated replacement (the component is replaced after it has broken).
To derive the threshold function H(t) from D, a Support Vector Machine (SVM) approach is hereby proposed. SVM techniques find an optimal nonlinear separation between two categories of data points (if such categories are separable) or, in the soft-margin version, produce an optimal classification that is robust with respect to data mislabeling. In the following, the focus is set on the estimation of an H(t) represented by a p-th degree polynomial function of t. Let t̃_i := [t_i, t_i², . . . , t_i^p] be the polynomial span of t_i up to the p-th degree. The problem of estimating H, according to SVM soft-margin theory, can then be seen as the search for a parameter vector c̃ := [c^(y), c^(t1), c^(t2), . . . , c^(tp)] that solves the following problem.
where the cost function J is defined as

J = ½ c̃ᵀc̃ + C Σ_{i=1}^{N_m} β_i,

under the constraints µ_i ≥ 0, β_i ≥ 0, where the β_i are the slack variables of the soft margin and the µ_i are the associated multipliers. C is a tuning parameter that can be selected by means of Generalized Cross Validation (GCV).
Problem 2 is the standard representation of the SVM problem with soft margin. Its solution can be obtained by means of a Sequential Minimal Optimization (SMO) approach [58].
The optimal separating hyperplane resulting from the solution of Problem 2 (Figure 7) satisfies the condition

c^(y) y + c^(t1) t + c^(t2) t² + . . . + c^(tp) t^p + c_0 = 0.  (14)

The threshold function H(t) is then obtained by solving (14) for y:

H(t) = −(c_0 + Σ_{q=1}^{p} c^(tq) t^q) / c^(y).

By combining (13) and (14), the risk function R(t) = P(x(t) > H(t)) is then evaluated through the Gamma-mixture predictive distribution.
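The paper's solver is SMO; as a hedged illustration of the same soft-margin idea, the sketch below fits a degree-1 separating line in the (t, y) plane by batch subgradient descent on the hinge-loss objective and reads the threshold H(t) off the resulting line (toy data and all names are our own, not the authors' implementation):

```python
import random

random.seed(3)

def fit_linear_threshold(data, C=10.0, lr=0.01, epochs=3000):
    """Soft-margin separator c_y*y + c_t*t + c0 = 0 fitted by subgradient
    descent on 0.5*(c_y^2 + c_t^2) + C*sum(hinge losses); a simplified
    stand-in for an SMO solver.
    data: list of (t, y, s) with s = +1 (late) / -1 (early) replacement."""
    cy, ct, c0 = 0.0, 0.0, 0.0
    n = len(data)
    for e in range(epochs):
        step = lr / (1.0 + 0.01 * e)          # diminishing step size
        gy, gt, g0 = cy, ct, 0.0              # gradient of the margin term
        for t, y, s in data:
            if s * (cy * y + ct * t + c0) < 1.0:   # hinge is active
                gy -= C * s * y
                gt -= C * s * t
                g0 -= C * s
        cy -= step * gy / n
        ct -= step * gt / n
        c0 -= step * g0 / n
    return cy, ct, c0

def threshold(t, cy, ct, c0):
    """H(t): solve cy*y + ct*t + c0 = 0 for y."""
    return -(ct * t + c0) / cy

# Toy maintenance history: late replacements (s = +1) sit above early ones.
data = ([(t, 8.0 + 0.1 * t + random.gauss(0, 0.2), +1) for t in range(10)]
        + [(t, 5.0 + 0.1 * t + random.gauss(0, 0.2), -1) for t in range(10)])
cy, ct, c0 = fit_linear_threshold(data)
acc = sum(1 for t, y, s in data if s * (cy * y + ct * t + c0) > 0) / len(data)
h5 = threshold(5.0, cy, ct, c0)   # estimated threshold at t = 5
```

On this well-separated toy dataset the subgradient routine should classify essentially all points correctly; a production implementation would use SMO or an off-the-shelf SVM solver, with C chosen by GCV as in the text.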

Use of RF for Maintenance Optimization
The main goal of a maintenance management system is the minimization of the costs associated with failures and maintenance operations. The cost of maintenance operations can be attributed to several factors, such as spare parts, equipment/production downtime, the staff performing the interventions, and scrap products due to the requalification of the system (a post-maintenance phase in which the process needs to run before being 'stable', i.e., within the production standards). At first glance, minimizing the number of interventions seems the natural approach to minimize the aforementioned costs. However, Run-to-Failure (R2F) policies, where maintenance is performed only after a failure has happened, are generally deprecated, because the costs related to unexpected failures can be very high. The trade-off between early-stage maintenance (and the associated unexploited lifetime, UL, of the system) and unexpected breaks (UB) can be optimized by using the corresponding agglomerated costs (respectively c_UL and c_UB) and the proposed RF. Furthermore, in the proposed approach, reliable predictions can be obtained over given time frames, a relevant feature in applications where timeliness is a critical issue, such as when interventions have to be planned in advance or when the monitoring of the RF and re-scheduling can be guaranteed only with a fixed time delay.
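The trade-off can be made concrete with a standard expected-cost argument (our illustration, not a rule stated in the paper): acting now incurs c_UL when the component was still functional, while waiting incurs c_UB when it breaks, so intervening becomes cheaper once the predicted risk R exceeds c_UL / (c_UL + c_UB):

```python
def expected_cost(intervene, risk, c_ul, c_ub):
    """One-step expected cost: acting now wastes lifetime with probability
    (1 - risk); waiting risks an unexpected break with probability risk."""
    return c_ul * (1.0 - risk) if intervene else c_ub * risk

def should_intervene(risk, c_ul, c_ub):
    """Intervene when c_ul*(1 - R) < c_ub*R, i.e. R > c_ul / (c_ul + c_ub)."""
    return risk > c_ul / (c_ul + c_ub)

# With unexpected breaks 4x as costly as an early replacement,
# intervene once the predicted risk exceeds 20%.
flag = should_intervene(0.25, c_ul=1.0, c_ub=4.0)   # True
```

The higher the cost of an unexpected break relative to an early replacement, the lower the risk level at which the RF should trigger maintenance.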

Simulation Results
To test the proposed methodology, synthetic datasets representing HFs have been created. The generic dataset D contains a total of N time series; the i-th time series is observed up to the time instant τ_i at which the i-th HF crosses a predefined threshold H.
The accuracy of the prediction at time instant τ_i − ∆t is used to assess the performance of the proposed PdM algorithm. The initial filter parameters are chosen via likelihood maximization. A truncated multivariate Normal distribution is used as proposal distribution, which is then sampled through a Gibbs sampler [59]. A Kalman Filter is used to obtain the non-truncated Normal distribution. Although the use of the Kalman filter is justified by model linearity, an Unscented Kalman Filter can also be used [60].

Sigmoid Data

A first dataset D^(1) of sigmoid time series has been generated, where the i-th HF is a sigmoid function of time with slope parameter a_i ∼ U(0.005, 0.02) (a uniform distribution with support [0.005, 0.02]) and inflection point b_i ∼ U(750, 1000). Gaussian noise v ∼ N(0, 1) is then added to the HF. A set of 5 time series belonging to D^(1) is reported in Fig. 8. For this study, a fixed maintenance threshold has been set to H = 8.

The proposed HGP-PF is compared with a KF-based approach. The KF employed in this Section, and also in the experimental work detailed in Section 7, is formulated as follows. The following linear state-space model is considered:

z_{j+1} = A z_j + G v_j,  (16)
y_j = C z_j + w_j,  (17)

where z_j collects the estimated present and past values of the HF, v_j ∼ N(0, Q) is the model error, and w_j ∼ N(0, R) is the measurement error; v_j and w_j are supposed to be uncorrelated. Matrices A, C and G are estimated using the N4SID algorithm [61], while the model order n_O is chosen according to the Akaike Information Criterion [62]. The Kalman predictor for the 1-step ahead prediction has the classical formulation

ẑ_{j+1|j} = A ẑ_{j|j−1} + A P_{j|j−1} Cᵀ (C P_{j|j−1} Cᵀ + R)⁻¹ (y_j − C ẑ_{j|j−1}),  (18)

where P_{j|j−1} is the variance matrix of the prediction error ẑ_{j+1|j} − z_{j+1} and it is updated through the discrete Riccati equation. The tuning of Q and R has been done by computing a test on the correlation of the residuals e_{Q,R}(j) = y_j − C ẑ_{j|j}, where the estimate ẑ_{j|j} depends on the choice of Q and R. A grid search on different sets of values of Q and R has been performed to minimize max_{σ>0} |RE_{Q,R}(σ)|, where RE_{Q,R}(σ) denotes the residual autocorrelation at lag σ. The multiple-step ahead prediction can be easily derived by exploiting (16)-(18) [20].
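A synthetic HF in the style of the sigmoid dataset can be generated as follows (illustrative Python; the sigmoid amplitude is an assumption of ours, since the source does not state it):

```python
import math
import random

random.seed(5)

def make_sigmoid_hf(T=1500, noise_sd=1.0, amplitude=10.0):
    """One synthetic sigmoid HF: random slope a ~ U(0.005, 0.02) and
    inflection point b ~ U(750, 1000), plus N(0, noise_sd^2) noise.
    The amplitude is an assumed value, not given in the source."""
    a = random.uniform(0.005, 0.02)
    b = random.uniform(750.0, 1000.0)
    clean = [amplitude / (1.0 + math.exp(-a * (t - b))) for t in range(T)]
    noisy = [x + random.gauss(0.0, noise_sd) for x in clean]
    return clean, noisy

clean, noisy = make_sigmoid_hf()
```

The clean series is monotone (so a fixed threshold is always eventually crossed), while the noisy series reproduces the measurement conditions under which the filters are compared.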
In Table 1 the filtering performances of the proposed HGP-PF are reported and compared with those of a KF-based approach in terms of RMSE. As for prediction accuracy, averaged Risk Functions are reported in Fig. 9 for two cases, the 50-step and 100-step ahead predictions. Risk Functions are aligned with respect to the time instant of the fault τ and compared with an ideal risk function R_ideal(·), which assigns zero risk before the fault and full risk at the fault instant. The HGP-PF outperforms the KF both in terms of RMSE and of PdM accuracy. In particular, it can be appreciated in Fig. 9 how the Risk Function of the HGP-PF is qualitatively closer to R_ideal(·) than that of the KF.

Sigmoid Data with Discontinuities
A sigmoid dataset with discontinuities, D^(2) (with N = 100 time series and support T = [0, 1, . . . , 1500], as before), has been generated as follows: the i-th HF at time t is a sigmoid with slope a_i ∼ U(0.005, 0.02) and inflection point b_i ∼ U(750, 1000), with random upward jumps triggered by c_i ∼ B(0.01, 1), a binomial distribution with 1 trial and 0.01 success probability. Gaussian noise v ∼ N(0, 1) is then added to the HF. A set of 5 time series in D^(2) is reported in Fig. 10. A fixed maintenance threshold has been set to H = 14, and all of the N generated HFs exceed H in T. In Fig. 11 the averaged Risk Functions of PF and Kalman Filter for the 50-step and 100-step ahead predictions are reported. RMSE performance is given in Table 1. In this case, too, the HGP-PF outperforms the KF both in terms of RMSE and of PdM accuracy.

Application to predictive maintenance of a dry etching equipment for semiconductor manufacturing
Maintenance optimization strategies based on HFs are receiving increasing attention in the field of semiconductor manufacturing [17,20]. In this Section, the HGP-PF approach developed in the present paper is tested on a PdM problem related to a dry etching equipment. Etching is a fundamental step in semiconductor fabrication, employed to chemically remove layers from the wafer surface. In some etching tools, the wafer is held on an Electrostatic Chuck (ESC) by electrostatic charge, while a backside helium flow cools down the wafer and prevents problems during the unloading of the product [63]. During this operation, the quartz parts around the ESC (and the ESC itself) undergo the action of aggressive plasma, causing a wearing out that affects both wafer quality and process stability. The helium flow intensity in the etching equipment, represented in Fig. 12, reflects the wear of the ESC and can be considered a HF for the degradation problem at hand. The helium flow exhibits a monotonically increasing trend (masked by noise) and it is common practice, in a Condition-based Maintenance fashion, that when a given threshold is exceeded, maintenance operations take place, including the (expensive) replacement of several components. Predicting future values of the HF is fundamental to timely schedule maintenance actions and to optimize the trade-off between unnecessary replacements and unexpected breaks. For such reasons, stability and reliability of the predictions are primary needs in the problem at hand, as in general for HF signals employed in PdM.
To test the proposed methodology, a dataset consisting of 17 complete helium flow readings (from maintenance to maintenance) is employed as benchmark. The i-th test series is associated with the time instant t̂_i at which the i-th maintenance operation has taken place, and {t_j, y_j}_{j=1}^{n_i} are the related helium flow readings. Specifically, the accuracy of the prediction at time t̂_i − ∆t is used to assess the performance of the proposed algorithm.
In Table 1 the filtering performances of the proposed HGP-PF and of a KF are reported in terms of RMSE. As in the simulation studies of the previous section, the HGP-PF outperforms the KF. Fig. 13 reports filtering and prediction results based on the data of Fig. 12: the prediction distributions are represented with grey and red areas if they are below or above the FCL, respectively. Fig. 14 shows the estimated risk (i.e., the threshold-crossing probability) at time instant t̂_i − ∆t for the actual failure time t̂_i (i.e., R(t̂_i) computed with all the information available at time instant t̂_i − ∆t). Rather unsurprisingly, the best performances are obtained with the smallest ∆t, namely, 100 working hours (left panel of Figure 14): in this case, only 3 out of 17 estimated risks are below 50%, and only one is below 40%. When ∆t is increased to 200 working hours (central panel of Figure 14), the increased uncertainty does not allow optimal results to be obtained. For ∆t = 250 working hours (right panel of Figure 14) the phenomenon is even more evident, as the distribution of the risk function assessment is almost flat.
From an operational point of view, the most important feature of the proposed methodology is its capability to precisely assess the fault event probability when the lifecycle of the equipment is coming to an end, thus allowing an early maintenance operation to be scheduled. With respect to this requirement, the performances presented in Fig. 14 are satisfactory. The experimental results presented in this section show that the choice of a proper prediction range is crucial: indeed, a long range would result in almost uninformative predictions, while a short range would yield extremely precise predictions when the optimal time to adjust the maintenance schedule has already passed. Such trade-off between precision and timeliness is consistent with the characteristics of the proposed prediction paradigm.
Finally, the HGP-PF-based and KF-based PdM policies have been tested against a simulated Preventive Maintenance (PvM) tool. PvM is a widely adopted maintenance approach that triggers actions based solely on the amount of time elapsed since the previous maintenance. Here, the PvM tool has been simulated by computing a risk factor R_PvM on a different set D_0 of N_0 time series drawn from the same distributions as D, as follows:

R_PvM(t) = (1/N_0) Σ_{i=1}^{N_0} Θ(t − t̄_i),   (21)

where Θ(·) is the Heaviside step function and t̄_i is the failure time of the i-th series in D_0. Observe that the Risk Function (21) is computed on the training data and does not depend on current sensor readings. Let ρ_UB be the percentage of unexpected breaks and ρ_UL the percentage of unexploited runs. Performances for fixed values of the ratio c_UL/c_UB and different values of ∆t for the HGP-PF PdM-based and the PvM-based maintenance management policies are reported in Fig. 15 and Table 2. It can be noticed that, for ∆t = 200 and ∆t = 250, the PvM-based policy is more effective than the HGP-PF PdM-based one. It can also be appreciated that, as expected, the more timely a prediction is, the lower the performance is in terms of costs associated with unexpected breaks and unexploited lifetime. However, in a cost-minimization perspective, such lower performance could be justified in some real-world applications by the cost savings associated with timely maintenance planning.
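Since the PvM risk function (21) depends only on historical failure times, it reduces to the empirical cumulative distribution of those times. A minimal sketch follows; the function name and the example failure times are purely illustrative:

```python
import numpy as np

def pvm_risk(t, failure_times):
    """Empirical PvM risk: fraction of training runs that had already
    failed by time t, i.e. (1/N0) * sum_i Heaviside(t - t_i) over D_0.
    No current sensor readings are used."""
    failure_times = np.asarray(failure_times, dtype=float)
    # np.heaviside(x, 1.0): 0 for x < 0, 1 for x >= 0.
    return float(np.mean(np.heaviside(t - failure_times, 1.0)))

# Illustrative training set of three historical failure times (working hours):
r = pvm_risk(150.0, [100.0, 200.0, 300.0])  # one of the three runs failed by t = 150
```

This makes explicit why the PvM baseline cannot react to the condition of the specific equipment unit: its risk at a given elapsed time is identical for every unit in operation.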

Conclusions and Discussion
In this work, a hidden Gamma process particle-filter approach for health factor filtering and prediction has been presented. The proposed approach is well suited for real-world industrial health factors, characterized by monotonic behavior and observed through irregularly sampled and noisy measurements. The proposed approach provides predictions based on a Particle Filter that employs Monte Carlo simulation to approximate the health factor posterior distribution from the aforementioned data. To account for changes in variability of the health factor, an adaptive filtering scheme, based on a regularization approach, has also been proposed. Furthermore, the definition and generation of a proper risk function associated with the model have been discussed.
The proposed approach has been tested on both synthetic data and experimental data coming from a semiconductor manufacturing application. In both cases, the new approach has proved to ensure better performance than Kalman Filtering-based approaches when applied to the definition of Predictive Maintenance policies. Furthermore, the possibility of calculating the future distribution of the health factor can be used to obtain a quantitative assessment of failure risks.
In Fault Detection and Isolation, model robustness and reliability are crucial issues [64,65]. For this reason, many data-driven approaches, thanks to their simple forms and limited requirements in terms of design and engineering efforts, have become increasingly popular in both industry and academia [66]. One of the main advantages of the approach proposed in this work is that it relies only on the simple model assumption that the Health Factor and its increments are nonnegative. As shown in Section 1, such assumption is common and realistic in most industrial/real-world scenarios.
Besides robustness and reliability, current research in Fault Detection and Isolation and Predictive Maintenance is dedicated to the derivation of multi-component solutions, solutions for incipient faults [67], and model-free solutions. In the present work, the latter two aspects have been addressed, while multi-component problems have not been explored. In fact, in complex, real-world industrial scenarios, processes are described by a large number of variables that can be related to a given fault. For this reason, many works on multi-dimensional diagnosis have been presented in recent years [66,68,3]. However, in terms of prognosis, the focus of the proposed approach, Health Factors can be traced back to a single variable/quantity, identified through experience/domain expertise, the output of a multi-dimensional Soft Sensor, or the residual of a multi-dimensional Fault Detection and Isolation procedure. One aspect that has not been tackled in this work, and may be the subject of future research activities, is the case of multiple fault problems that may be associated with multiple Health Factors. While the Hidden-Gamma Process-Particle Filter procedure can still be applied in such cases (prognostics for each Health Factor can be run in parallel by different instances of the proposed approach), a joint risk function has to be considered to implement Predictive Maintenance policies covering multiple faults.