AMSAA maturity projection model based on Stein estimation

Assessment methods are presented for projecting the impact of corrective actions, or fixes, on system reliability. The presented approach to reliability projection, referred to as AMPM-Stein, addresses the case where all fixes are delayed to the end of the current test phase. The assessment procedure allows for a unified treatment of failure modes. The procedure when applied in this fashion only distinguishes for estimation purposes between modes for which a fix would be attempted if surfaced (B-modes) and those that would not be addressed with a fix even if surfaced (A-modes) through assigned positive and zero fix effectiveness factors (FEFs) for surfaced B-modes and A-modes, respectively. A projection procedure that can treat failure modes in such a unified manner is necessary in cases where it is not realistic to divide the potential failure modes into an inherent set of A-modes and B-modes. In many instances A-modes are not inherent. Modes could be reclassified from A to B due to repeat occurrences and the necessity of meeting a reliability requirement, additional information, or changes in the level of resources available for corrective action. The AMPM-Stein procedure can also be applied to the case addressed by widely used projection methods for which there is an assumed inherent division between potential failure modes with respect to A-mode/B-mode categorization. Current simulation results indicate that greater accuracy with regard to the MTBF estimates can be achieved by the AMPM-Stein approach even when such an A-mode/B-mode split is valid. In particular, these simulation results indicate that the AMPM-Stein MTBF projections obtained for this case tend to be more accurate than those obtained from the reliability projection method adopted as a standard by the International Electrotechnical Commission. 
The simulations also indicate that the AMPM-Stein MTBF assessments based on simple closed-form parameter estimators obtained from the method of moments are almost as accurate as the AMPM-Stein assessments that utilize the more computationally intensive maximum likelihood estimators. The AMPM-Stein approach has several additional appealing characteristics. In contrast to current projection methods, the approach precludes the need to assess the arithmetic average of the fix effectiveness factor values that would be realized if all B-modes were to be surfaced. It only requires assessment of FEFs for the surfaced modes that will be mitigated. In addition, the AMPM-Stein approach naturally leads to an expression for the portion of the mitigated system failure rate due to the unsurfaced modes (or unsurfaced B-modes for two classifications). Thus no functional form for this failure rate need be assumed.


Reliability Projection Concepts
Initial prototypes of complex systems incorporating new, and often unproven, technological advances will inevitably possess reliability defects. The same is true even for prototypes that consist of the integration of existing systems. These known, and often unknown, defects are identified and examined in developmental and operational testing of the system, and are referred to as failure modes. Failure modes are the root causes of potential reliability deficiencies in a system. Failure modes are typically unforeseen problems, and have associated failure mechanisms. Corrective actions, or fixes, are measures which are taken to address these problem failure modes. More specifically, corrective actions alter the design, maintenance and operational procedures, or manufacturing process of an item for the purpose of improving its reliability. To model the reliability improvement resulting from a corrective action, we use Fix Effectiveness Factors (FEFs). A fix effectiveness factor is the expected fraction reduction in initial mode failure rate due to corrective action. Corrective actions are typically reserved for failure modes which exhibit one or more failures during testing, which are referred to as surfaced modes, or observed modes. Unsurfaced modes, or unobserved modes, are failure modes which did not trigger a failure during the test phase.
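As a small worked illustration of the FEF definition above, a mode's expected failure rate after a fix is its initial rate scaled by one minus the FEF. The function name and the numbers below are hypothetical, chosen only for the example.

```python
def mitigated_mode_rate(initial_rate: float, fef: float) -> float:
    """Expected failure rate of one mode after a fix with the given FEF.

    A FEF of 0 leaves the mode unchanged (no fix); a FEF of 1 would
    eliminate the mode entirely.
    """
    if not 0.0 <= fef <= 1.0:
        raise ValueError("FEF must lie between 0 and 1")
    return (1.0 - fef) * initial_rate


# Illustrative values only: a mode failing 0.002 times per hour,
# fixed with an assessed FEF of 0.8, retains 20% of its initial rate.
print(mitigated_mode_rate(0.002, 0.8))
```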
Reliability growth projection is the process of assessing the reliability of a system which can be anticipated due to implementation of corrective actions to surfaced failure modes. Reliability projections are based on the test data to date, as well as engineering assessments of the effectiveness of planned or implemented corrective actions.
Current projection methods ([1], [2]) distinguish between failure modes that will be addressed by a fix if observed (B-modes) and those that will not be fixed (A-modes). Typical reasons why a surfaced mode might not be fixed include: the fix may not be economically justifiable; the surfaced mode may be related to Government Furnished Equipment (GFE) or a Commercial-Off-The-Shelf (COTS) item (whose failure rate is known and/or accepted); or the diagnosis of the underlying failure mechanism may be unclear. The projected system reliability depends upon this mode classification. In particular, the reliability projection depends on estimates of the rate of occurrence of B-modes based on B-mode first occurrence times, and of the presumed constant A-mode failure rate. It is desirable and more natural to base the projection of system reliability on estimates that do not depend on such a classification scheme, but merely on failure mode test phase data and the assessed effectiveness of the failure mode corrective actions. Such a projection would only distinguish between failure modes on the basis of their assigned FEF. Here, A-modes would be assigned a zero FEF. For situations where fixes to surfaced failure modes will be delayed until the conclusion of the test phase, such a projection method would be helpful in conducting a reliability versus cost tradeoff analysis with regard to deciding which modes to fix.

Study Overview
In this paper we shall present a projection method that can treat failure modes in a unified manner for the case where all corrective actions are delayed until the end of the test phase. A unique characteristic of the projection methodology is that the estimation procedure is based on a Stein shrinkage estimator. For complex systems, the estimation procedure is shown to approximate an estimator that satisfies an optimal expected squared-error loss criterion associated with Stein estimation for the vector of unknown mode initial failure rates. The Stein optimality criterion leads to a particular functional form for the rate of occurrence of new failure modes. This functional form is compared to the B-mode rate of occurrence functions currently utilized in several projection models. The procedure does not require one to distinguish between failure modes that will not be corrected if surfaced (A-modes) and those that would receive a corrective action if observed (B-modes). Such a unified treatment of failure modes avoids the frequently unrealistic assumption that there are two inherent types of potential failure modes a priori. Often failure modes are switched from A-modes to B-modes during a development program due to the necessity of meeting a requirement or due to additional funding or information that allows a mode to be addressed. Several estimation procedures are presented for assessing the Stein shrinkage factor utilized by the reliability projection method. The accuracy of the presented projection procedures is compared against the IEC standard projection model.

The AMPM based on Stein Estimation
The U.S. Army Materiel Systems Analysis Activity (AMSAA) has recently developed a new reliability growth projection model. The new model is closely related to the current AMSAA Maturity Projection Model (AMPM) [2]. This new model was developed for making reliability projections based on Stein estimation of mode initial failure rates in the case where the test duration is measured in a continuous fashion, and where all corrective actions are deferred until the end of the test phase. The motivation for developing the new model was to obtain a potentially more accurate reliability projection based on the number of surfaced modes and failures for each mode. This is done by minimizing the expected sum of squared errors for the mode initial failure rates. Estimates for unknown parameters in the Stein approach are obtained by treating the mode initial failure rates as a realization of a random sample from a gamma distribution as is done in the AMPM approach. Thus, the new model is referred to as AMPM-Stein.

Notable Features of AMPM-Stein
The model allows for a unified treatment of failure modes. This permits one to conduct a trade-off analysis between reliability improvement and incremental cost, since failure modes are only distinguished through their FEFs. A second feature of AMPM-Stein is that all mode failures are utilized in estimating model parameters. Third, the new model only requires assessing FEFs associated with the surfaced modes; there is no need to consider FEFs for unsurfaced modes. In particular, AMPM-Stein does not need an assessment of the average FEF for all the modes that would receive corrective action if surfaced. Finally, AMPM-Stein avoids inaccuracies in assessments that can arise in projection methods which utilize A-mode and B-mode classification. Such inaccuracies can occur if modes initially considered A-modes are switched to B-modes. This can happen for a variety of reasons, including repetition of A-mode failures, a more accurate diagnosis of a failure mode, and increased funding. Such events could motivate management to implement a fix, in which case A-modes could be reclassified as B-modes.

Differences in Technical Approach
The AMPM-Stein approach does not require one to distinguish between A-modes and B-modes other than through the assignment of zero and positive FEFs, respectively. Also, only FEFs associated with the surfaced modes need be referenced. In particular, unlike the methods in [1] and [2], no estimate of the arithmetic average of all the FEFs that would be realized if all the B-modes were surfaced is required. Another significant difference between the Stein approach and the other methods is that the Stein projection is a direct assessment of the realized system failure rate after failure mode mitigation. The approaches in [1], [2], and [3] indirectly attempt to assess the realized system reliability by estimating the expected value of the mitigated system probability of failure or system failure rate. Reference [3] proceeds in a similar fashion for one-shot systems.

Stein Approach to Projection using One Classification
Assume the system has $k > 1$ potential failure modes that have initial failure rates $\lambda_1, \ldots, \lambda_k$. It is assumed that the modes independently generate failures and that the system fails whenever a failure mode occurs. It is also assumed that corrective actions do not spawn new failure modes and that all fixes are incorporated into the system at the end of a test period of duration $T$ hours or miles.
Let $N_i$ denote the number of failures encountered for mode $i$ that occur during the test. The standard maximum likelihood estimate (MLE) of $\lambda_i$ is $\hat{\lambda}_i = N_i / T$. [...] Estimators of multidimensional parameters that satisfy such an optimality criterion were considered in [4]. After some detailed calculation, one can show [...]

The Stein estimators for [...]
The unique value of $\theta$ that minimizes (4) is $\theta_S$, and [...]. After mitigation of the failure modes surfaced during the test period $[0, T]$, the realized system failure rate is [...]. Let $m$ denote the number of surfaced modes during $[0, T]$. Then by (7) and (8), [...]. The Stein projection cannot be directly calculated from the data for a set of assessed FEFs $d_i^*$ [...]. However, approximations to the Stein projection can be obtained that can be calculated from the test data and the assessed FEFs.
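One way to arrive at a shrinkage factor of this kind can be sketched as follows. The sketch assumes that $N_i$ is Poisson with mean $\lambda_i T$, that the Stein estimator shrinks each MLE toward the average of the MLEs, and that $\mathrm{Var}[\lambda_i]$ denotes the population variance of the $k$ rates. The symbols $\hat{\lambda}_i$, $\tilde{\lambda}_i$, $\bar{\lambda}$, $N$, and $\rho_S$ are notation introduced here for illustration, and the displayed lines are not numbered to match the paper's equations.

```latex
% Assumed setup: N_i ~ Poisson(lambda_i T), independent across modes.
\[
\hat{\lambda}_i = \frac{N_i}{T}, \qquad
\tilde{\lambda}_i(\theta) = \theta\,\hat{\lambda}_i
  + (1-\theta)\,\frac{1}{k}\sum_{j=1}^{k}\hat{\lambda}_j,
\qquad 0 \le \theta \le 1 .
\]

% Expected sum of squared errors, with
% lambda = sum_i lambda_i, bar{lambda} = lambda / k,
% Var[lambda_i] = (1/k) sum_i (lambda_i - bar{lambda})^2.
\[
E\!\left[\sum_{i=1}^{k}\bigl(\tilde{\lambda}_i(\theta)-\lambda_i\bigr)^2\right]
  = \frac{\lambda}{T}\left[\theta^2\Bigl(1-\frac{1}{k}\Bigr)+\frac{1}{k}\right]
  + (1-\theta)^2\,k\,\mathrm{Var}[\lambda_i] .
\]

% The right-hand side is a convex quadratic in theta; setting its
% derivative to zero gives the unique minimizer.
\[
\theta_S = \frac{\mathrm{Var}[\lambda_i]}
  {\mathrm{Var}[\lambda_i] + \dfrac{\lambda}{kT}\Bigl(1-\dfrac{1}{k}\Bigr)} .
\]

% Realized rate after mitigation (true FEFs d_i) and the corresponding
% projection (assessed FEFs d_i^*), with obs the set of m surfaced modes
% and N = sum_i N_i.
\[
\rho(T) = \sum_{i \in obs}(1-d_i)\,\lambda_i + \sum_{i \notin obs}\lambda_i ,
\qquad
\rho_S(T) = \sum_{i \in obs}(1-d_i^*)\,\tilde{\lambda}_i(\theta_S)
  + (k-m)\,(1-\theta_S)\,\frac{N}{kT} .
\]
```

Under these assumptions the minimizer depends only on $k$, $\lambda$, and $\mathrm{Var}[\lambda_i]$, and every unsurfaced mode ($N_i = 0$) receives the same estimate $(1-\theta_S)\,N/(kT)$.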

Stein Approach to Projection using Two Classifications
One can also use the Stein projection approach with two failure mode classifications as is done for the AMSAA-Crow and AMPM models. Strictly speaking, such an application of these models demands that there are a priori ground rules for classifying observed modes into A or B-modes which do not become reclassified. The Stein projection for the two failure mode classification case is given by

Failure Rate due to Unobserved Modes for large k
The term ∑

AMPM-Stein Approximations using MLEs and MMEs
As shown in the previous section, the Stein projection depends on the unknown constants $k$, $\lambda$, and $\mathrm{Var}[\lambda_i]$. We shall now consider an approximation to the Stein projection obtained for a given $k$ and for when $k$ is unknown but large. To obtain the approximations we assume $\lambda_1, \ldots, \lambda_k$ is a realization of a random sample from a gamma distribution with density function $f(x)$ and that at least one failure mode has a repeat. One can use the data $N_i$ and $m$ to obtain estimates for $\alpha$ and $\beta$ based on marginal maximum likelihood estimates (MLEs) or based on marginal method of moments estimates (MMEs). These methods are presented in [5]. Approximations to the Stein shrinkage factor $\theta_S$ can be obtained from these estimates. Let [...]. Thus, using MLEs, we shall approximate $\lambda$ by $\hat{\lambda}_k$ and [...]. Based on (5), using MLEs, we approximate $\theta_S$ by [...]. For large $k$, we approximate $\theta_S$ by [...]. We could also approximate $\theta_S$ using the MMEs by replacing $\hat{\beta}_k$ in (19) by the corresponding MME of $\beta$ to obtain the approximation [...]. Likewise, a large-$k$ approximation for $\theta_S$ based on MMEs, $\theta_{S,\infty}$, is obtained by replacing $\hat{\beta}_\infty$ in (20) by the corresponding MME. From the MLE equations (7.171) and (7.172) in [5] one can show [...], where the inner sum is zero for [...] $= 1$. Likewise, proceeding as in Section 7.7.1 of [5] it can be shown that [...]. The large-$k$ AMPM-Stein projection based on $\theta_{S,\infty}$ is [...].
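A large-$k$ calculation of this kind can be sketched in a few lines, under stated assumptions rather than as the estimators of [5]. The sketch assumes each surfaced mode's shrinkage estimate tends to $\theta N_i/T$ as $k$ grows, that the unsurfaced modes jointly contribute $(1-\theta)N/T$, and that for gamma-distributed rates with scale $\beta$ the shrinkage factor tends to $\beta T/(1+\beta T)$. The scale is estimated by a simple moment match, $\beta \approx (\sum N_i^2 / N - 1)/T$, used here as an illustrative stand-in. All function and variable names are hypothetical.

```python
def large_k_shrinkage_projection(counts, fefs, test_length):
    """Sketch of a large-k shrinkage projection of the mitigated failure rate.

    counts      -- failures N_i for each surfaced mode (each >= 1)
    fefs        -- assessed FEFs d_i* for the same modes (0 for unfixed modes)
    test_length -- test duration T
    """
    if len(counts) != len(fefs):
        raise ValueError("counts and fefs must have the same length")
    n_total = sum(counts)
    sum_squares = sum(n * n for n in counts)
    if sum_squares <= n_total:
        raise ValueError("at least one mode must have a repeat failure")

    # Assumed moment match for the gamma scale parameter (illustrative).
    beta = (sum_squares / n_total - 1.0) / test_length
    # Assumed large-k limit of the shrinkage factor.
    theta = beta * test_length / (1.0 + beta * test_length)

    # Surfaced modes: shrunken rate estimates reduced by their FEFs.
    surfaced = sum((1.0 - d) * theta * n / test_length
                   for n, d in zip(counts, fefs))
    # Unsurfaced modes: remaining share of the observed failure rate.
    unsurfaced = (1.0 - theta) * n_total / test_length

    rate = surfaced + unsurfaced
    return {"beta": beta, "theta": theta, "rate": rate, "mtbf": 1.0 / rate}


# Illustrative data only: four surfaced modes in a 1,000-hour test.
print(large_k_shrinkage_projection([3, 1, 1, 2], [0.8, 0.8, 0.0, 0.7], 1000.0))
```

A quick sanity check on the design: when every FEF is zero, the two terms sum to $N/T$, so the projection reduces to the observed failure rate.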

MTBF Projections
Simulations were run in Mathematica to investigate the accuracy of the Stein MTBF projections. The accuracy of the procedures was investigated for two mode classifications, and for the case where modes are only differentiated via the mode FEFs (referred to as one classification). The accuracy of these procedures for MTBF assessment was compared to the accuracy of the IEC standard [6] for reliability projection (the AMSAA-Crow model). Since all these projection methods incorporate the $d_i^*$ in the same manner, $d_i^*$ is set equal to $d_i$ in the simulation.
The simulation consists of the following steps:
1. Specify the distribution type and parameters.
2. Generate $k_A$ A-mode and $k_B$ B-mode failure rates.
3. Calculate A-mode and B-mode first occurrence times.
4. Generate B-mode fix effectiveness factors from a beta distribution with shape parameters 19.2 and 4.8, which yield a mean of 0.8 and a coefficient of variation of 0.1. The A-mode fix effectiveness factors are set equal to zero, since these modes are not fixed.
5. Calculate a sequence of failure times for each failure mode.
6. Calculate the MTBF projections discussed above.
7. Reclassify repeat A-modes as B-modes. The associated fix effectiveness factors are changed from zero to random numbers generated from the beta distribution in step 4.
8. Repeat step 6 to calculate the projections after reclassification.
The presented simulation runs perform the steps above for a test phase of length 3,000 hours. This process is replicated 1,000 times. In these runs, 200 A-mode and 500 B-mode failure rates were generated from a gamma distribution with shape and scale parameters $\alpha = 0.6667$ and $\beta = 2.000 \cdot 10^{-4}$, respectively. These parameter values were held constant for all replications; however, new sets of initial mode failure rates and FEFs were generated for each replication. On average, 54 A-modes and 134 B-modes were surfaced over the 1,000 replications.
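The data-generating part of these steps can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the authors' Mathematica code: failures of each mode are assumed to follow a Poisson process with a constant rate, FEFs are drawn only for surfaced B-modes (the others never enter the mitigated rate), the returned MTBF is the actual value before reclassification, and all function names are hypothetical.

```python
import random


def count_failures(rng, rate, test_length):
    """Failures in [0, test_length] for a constant-rate (Poisson) mode."""
    n, t = 0, rng.expovariate(rate)
    while t <= test_length:
        n += 1
        t += rng.expovariate(rate)
    return n


def expected_surfaced(k, shape, scale, test_length):
    """Closed-form expected number of surfaced modes for gamma-distributed rates."""
    return k * (1.0 - (1.0 + scale * test_length) ** (-shape))


def simulate_replication(rng, k_a=200, k_b=500, shape=0.6667,
                         scale=2.0e-4, test_length=3000.0):
    """One replication: surfaced A/B counts, repeat A-modes, actual MTBF."""
    surfaced_a = surfaced_b = repeat_a = 0
    mitigated_rate = 0.0
    for mode in range(k_a + k_b):
        is_b_mode = mode >= k_a
        rate = rng.gammavariate(shape, scale)  # (shape, scale) parameterization
        n = count_failures(rng, rate, test_length) if rate > 0.0 else 0
        if n == 0:
            mitigated_rate += rate            # unsurfaced: rate unchanged
        elif is_b_mode:
            surfaced_b += 1
            fef = rng.betavariate(19.2, 4.8)  # mean 0.8, CV 0.1
            mitigated_rate += (1.0 - fef) * rate
        else:
            surfaced_a += 1
            repeat_a += n >= 2                # candidate for reclassification
            mitigated_rate += rate            # A-modes have zero FEF
    return surfaced_a, surfaced_b, repeat_a, 1.0 / mitigated_rate


rng = random.Random(1)
runs = [simulate_replication(rng) for _ in range(100)]
print("mean surfaced A-modes:", sum(r[0] for r in runs) / len(runs))
print("mean surfaced B-modes:", sum(r[1] for r in runs) / len(runs))
print("mean repeat A-modes:  ", sum(r[2] for r in runs) / len(runs))
```

Under these assumptions the closed-form expectation $k\,[1 - (1 + \beta T)^{-\alpha}]$ is about 53.8 for the 200 A-modes and about 134.5 for the 500 B-modes, which agrees with the reported averages of 54 and 134.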

Accuracy Results
For every replication, a simulated failure history was generated for each of the failure modes over the specified test period (T = 3,000 hours for the displayed results). Table 1 displays the average actual and assessed mitigated system MTBF values over the 1,000 replications for each of the following cases:
1. Assuming an inherent set of A-modes and B-modes. The versions of the Stein, AMPM-Stein, and AMSAA-Crow models used to generate the assessed MTBFs displayed in Table 1 assume such a mode split (labeled 2C).
2. [...] for the simulated A-modes.
3. Reclassification using one classification. Each generated A-mode was reclassified to a B-mode if one or more repeat failures of the mode occurred during the simulated test. The averages for the actual and assessed MTBFs are displayed in Table 1 for the Stein and AMPM-Stein methods and the AMSAA-Crow model (labeled 1CR).
The appropriate variants of the Stein, AMPM-Stein, and AMSAA-Crow methods for the one classification case are applied to obtain the assessed MTBFs before and after reclassification (17 modes on average were reclassified).
Simulation accuracy results, based on the absolute error between the actual MTBF and the assessed MTBF, indicate the Stein and AMPM-Stein approximations compare favorably to the AMSAA-Crow method. Figure 1 shows three pie charts.
The pie charts are one-on-one comparisons between two estimation methods. For example, the pie chart on the left side of Figure 1 compares the MTBF accuracy of the AMPM-Stein MLE (with an infinite number of modes) with that of the AMSAA-Crow model. The distribution is shown below the pie chart. For the pie chart on the left, the AMPM-Stein MLE (with an infinite number of modes) provided a more accurate MTBF projection than the AMSAA-Crow model in 73.5% of the 1,000 replications. Figure 2 shows similar results in the one classification case after repeat A-modes were reclassified as B-modes. AMPM-Stein procedures for finite $k$ and large $k$ (i.e., $k \to \infty$) based on the MLE and MME shrinkage estimators produced MTBF assessments that were close to each other. In practice, the number of modes, $k$, is not known and could be difficult to estimate. However, these results (and underlying theory) indicate that for complex systems, one does not need to assess $k$. The displayed simulation results also indicate that the AMPM-Stein assessments from the simple closed-form MMEs are only slightly less accurate than the MTBF assessments based on the more computationally intensive MLEs. For the displayed simulation results, prior to reclassification, the accuracy of the assessed MTBFs for the Stein, AMPM-Stein, and AMSAA-Crow procedures for one classification was comparable to the achieved accuracy of the corresponding procedures that address the two classification case. Additional simulation runs were performed for the cases where the mode initial failure rates were generated from a Weibull distribution. Even though the AMPM-Stein MLE and MME estimation procedures for the unknown Stein parameters assume the $\lambda_i$ are realizations from a gamma distribution, all the comments for the previous tables concerning accuracy still apply for these simulation runs. Comparable results were also attained when the $\lambda_i$ were generated from a lognormal distribution.
This perhaps indicates that the true MTBF and AMPM-Stein assessed MTBFs are not strongly affected by the tails of the mode failure rate distribution.
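The one-on-one comparison behind each pie chart can be sketched as follows. This is an illustrative helper, not the authors' code: it assumes accuracy is judged per replication by absolute error against the actual MTBF, counts ties as not more accurate, and uses made-up numbers rather than simulation output.

```python
def fraction_more_accurate(actual, assessed_a, assessed_b):
    """Share of replications where method A has the smaller absolute MTBF error.

    Ties are counted as 'not more accurate' for method A (an assumption).
    """
    if not (len(actual) == len(assessed_a) == len(assessed_b)):
        raise ValueError("all inputs must have the same length")
    if not actual:
        raise ValueError("at least one replication is required")
    wins = sum(abs(a - x) < abs(b - x)
               for x, a, b in zip(actual, assessed_a, assessed_b))
    return wins / len(actual)


# Made-up MTBF values for four replications, for illustration only.
actual_mtbf = [10.0, 12.0, 9.0, 11.0]
method_a = [10.5, 11.0, 9.2, 14.0]
method_b = [11.0, 12.5, 8.0, 11.5]
print(fraction_more_accurate(actual_mtbf, method_a, method_b))
```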