N a g - 1 9 8 2 4 DIAGNOSTICS IN THE EXTENDABLE INTEGRATED SUPPORT ENVIRONMENT (EISE)

EISE is an Air Force developed real-time computer network consisting of conercially available hardware and software components to support systems level integration, modifications and enhancements to weapons systems. The EISE approach offers substantial potential savings by eliminating unique support environments in favor of sharing common modules for the support of operational weapon systems. An expert system is being developed that will help support diagnosing faults in this network. This is a multi-level, multi-expert diagnostic system which uses experiential knowledge relating symptoms to faults and also reasons from structural and functional models of the underlying physical model when experiential reasoning is inadequate. The individual expert systems are orchestrated by a supervisory reasoning controller, a meta-level reasoner which plans the sequence of reasoning steps to solve the given specific problem. The overall system, termed the Diagnostic Executive, accesses systems level performance checks and error reports, and issues remote test procedures to formulate and confirm fault hypotheses.

In general, once a weapon system has been operationally accepted and placed into the Air Force inventory, management responsibility for its support is transferred from the acquiring agency to one of the Air Logistics Centers (ALC) within the Air Force Logistics Command (AFLC).
Aside from the classical logistics management functions, these centers also provide the engineering capability to do systems analysis, modifications of the hardware and software, component level testing and evaluation, and system level integration and test. The primary engineering tool for AFLC weapon system support is the Integration Support Facility (ISF) .
A typical ISF, being a subset of the tools the contractor originally used to develop the system,is useful for supporting just the original system. Because modification is difficult, individual ISFs are developed to support the various models of the same weapon system. Like the weapon systems they To alleviate these problems, the Air Force is developing the Extendable Integration Support Environment, or EISE. The EISE concept consists of common hardware and software modules for common integration support functions. These modules, or building blocks, are logically reconfigurable to provide support for multiple weapon systems within the same environment. The EISE building blocks are off-the-shelf, commercially available items to the greatest extend possible. The building blocks are connected by Ethernet for non-real time requirements, and by a high-speed token passing network during real time simulations. As a result, custom ISFs for each weapon system will no longer be needed, thus reducing support costs through resource sharing, economies of scale for sparing of the individual building blocks, and for facility maintenance contracts.

OVERVIEW OF DIAGNOSTICS APPROACHES
Diagnosing problems associated with accommodating a great variability in the building blocks for EISE on a real time h i g h speed network is expected to be quite difficult. Thus, a complex expert system, named the "Diagnostic Executive", is being developed to ensure the functioning and availability of EISEs. It is an "executive" because it performs high level diagnostics and calls upon diagnostics in the various processors as required to identify and isolate faults in EISE. In order to provide background on the development strategy for the Diagnostic Executive, we first summarize existing approaches to diagnostic expert systems.
The conventional approach to diagnostic expert systems development involves collecting and organizing the knowledge gained through experience by repair technicians, essentially associating a set o f symptoms to the set of faults causing those symptoms. This approach is termed surface, shallow, experiential or empirical reasoning and it works we1 1 where human maintenance experts have accumulated enough experience to provide rules of thumb for most of the probable faults. Early experiential diagnostic systems developed as a flat knowledge base with every rule being scanned at every step of inference. In spite of some weaknesses, this CSRL developed a t t h e Ohio State U n i v e r s i t y (1983,1986) and T h i s "model based reasoning" approach i s sometimes termed "deep reasoning" when reasoning from p h y s i c a l f i r s t p r i n c i p l e s .

The model based reasoning s t r a t e g y f o r d i a g n o s i s involves n o t i n g the d i f f e r e n c e s between t h e expected o u t p u t s as p r e d i c t e d by t h e model and r e a l i t y as measured a t t e s t p o i n t s . C o n n e c t i v i t y i s used t o f i n d t h e components along t h e t o p o l o g i c a l p a t h o f t h e s i g n a l .
The

THE DIAGNOSTIC EXECUTIVE
The approach used i n t h e E I S E D i a g n o s t i c Executive employs a m i x t u r e of t h e approaches described above.  Pau (19861, Havlicsek (1986), McCown andConway (1988), Warn (1988).
The Second, timing problems can occur during the realtime simulation phase, causing problems ranging from bad data to total system shutdown. Isolating these problems is expected to be especially difficult because they can arise from a variety of sources including hardware failures (intermittent or otherwise), errors in the network software, errors in the applications software or configuration errors. Third, faults can result from an incorrect setup or operation of the simulation itself. These faults can occur when an operator does not correctly follow the setup protocols, when an object code file does not get properly downloaded to a processor, or when the incorrect, or old version of object code is downloaded to a processor.
Many of these problems are highly interrelated and may have unpredictable side effects. Error messages resulting when an anomaly finally surfaces and becomes detectable by system checks may be highly unrelated to the cause of the problem. Thus, the diagnostic executive incorporates reasoning approaches to resolve the resulting ambiguities. IMPLEMENTATION The Diagnostic Executive is currently being implemented on a Symbolics 3675 using Intellicorp's suite of knowledge engineering tools. Diagnostics information from each layer of the ESIE network (see Figure 1) is routed to the Diagnostic Executive, which calls the appropriate diagnostic routines and repair actions. The planned structure of the Diagnostic Executive (shown in FIGURE 2.) is a multi-level, multi-expert diagnostic system which uses experiential knowledge relating symptoms to faults and also reasons from structural and functional models of the underlying system. The individual expert systems, termed Reasoners, are orchestrated by a supervisor termed the Reasoning Controller, a meta-level reasoner which plans the sequence of reasoning steps to solve the given specific problem. The Reasoners are integrated via highly structured common working memory managed by the truth maintenance system which keeps track of all relevant facts, deductions, hypotheses and chains of reasoning. A conclusion reached by any reasoner serves to constrain the space of possible causes of difficulty known as the ambiguity group. The constraints from all reasoners are summed in the Ambiguity Group Truth Maintenance System as they are determined. This stepwise summation of constraints, known as propagation of constraints, has the tremendous advantage of limiting the size of search space. The principle is that two simple constraints by separate reasoners can synergistically add to tremendously reduce the size of the search space a third reasoner must look through.
Coordinating the multiple expert systems employed in the Diagnostic Executive is the responsibility of the "Reasoning Controller", a knowledge base that contains information concerning which reasoning strategy is best to employ for each type of diagnostic problem. Upon malfunction, the Reasoning Controller is activated to determine the state of the network, what parts are functioning correctly, and the nature and extent of the problem.
The Reasoning Controller consists of a planner, agenda, scheduler and progress monitor. Using the refinement of skeletal plans technique, the planner matches relevant features of the problem, symptoms, and states of the network or operator requests, and chooses a sequence of applying the Reasoners that best fits the problem at hand. The sequence is placed in the agenda and executed by the scheduler. The progress monitor i s a regularly executed watchdog process which reports information on the current known state of all nodes, processors and functions as available from the current ambiguity group, its current strategy, goals and deductions. In the future, it is expected to compare current progress against established norms and time constraints so that replanning can be directed to the planner when required. Explanations are also available in the form of dependency records and tracings of rule firings.
The typical control strategy is for the Reasoning Controller to first invoke the Event Reasoner to look at current system status data (including reported status of each node and error conditions), and to trace the operator command input history, forward chaining from this information to deduce the estimated status of all processors on the network. Next, the Reasoning Controller invokes the Surface Reasoner, to identify probable fault classes that can explain the observed symptoms (using its experential knowledge). Assuming the fault is not isolated by the Surface Reasoner, the Structural Reasoner is called upon to locate optimal test points in the system, given the malfunctions currently under diagnosis. The test points are chosen both to reduce the number of tests required to isolate a fault and to minimize the cost of performing the test. At this time, the Functional Reasoner can be used to determine expected values of intermediate test points from known good points or known bad points. Differences between the model predictions and the actual responses of the system constitute the symptoms to be used by the model based Reasoners.
The Event Reasoner is the primary diagnostic aid in beginning failure analysis of network startup from power-off condition to full-up, real-time condition. Major event sequences modeled include initialization of the individual computers, communication over a non-real-time network, the downloading of configuration files from servers to the diskless systems, communication over the realtime network, initialization of processors with real-time application data, handshaking and communication in real-time and post simulation activities.
The Event Reasoner uses models of temporal sequences to constrain the search to those portions of the system active during each action. Correct sequences of operations indicate processors and functions which have operated correctly. Thus the recognition of temporal sequences can serve as a landmark to indicate state of the system. The results of time sequences trigger rules to insert facts into the structural and functional dimensions of our ambiguity group truth maintenance system.

E N O I N C E R S
.

OF POOR QUALITY
The Structural Reasoner is a topological based expert system as described in the Diagnostics Overview section.
Each system component is represented as a unit, containing slots for source and destination of data paths. A connectivity tracer uses this information to find the paths from known good test points to known bad test points. After the paths have been found, feedback loops are marked to be opened, and calculation is done to find the point in the path that will yield the most information for the least cost. Included with each component is its failure probability, cost to access, test setup costs, required testing times and information yield. A computation is performed to optimize for least test cost, least time to locate, least technical skill required, or least test equipment required. The criteria to be optimized is passed down from the Reasoning Controller each time the Structural Reasoner is invoked. The result of this process is the optimal test point.
The Functional Reasoner is a model based reasoner as described in the Diagnostics Overview section. It models the transformations which occur as a signal is passed through the component. We have adapted the methodology of Pipitone (1986).
A separate rule is used for every dimension of the signal. Qualitative reasoning (always, sometimes, never, low, OK, high) is used, not quantitative, numeric reasoning. All functionality rules are bi-directional. Computation of expected signal downstream from a known signal is done by forward chaining through any rules (always, sometimes, or never). Backward chaining can also be done from a measured failed testpoint to find components responsible for the erroneous behavior.
After the expected values have been computed, the test is run. Comparison of actual test results with expected test results yields a symptom. The dimensions of the symptom which differ from the predicted value is used by the Functional Reasoner to search for rules of each component along the test path that influence this dimension of the signal. This is done by matching on the functionality rules. Components along the signal path that contain rules that influence the dimension of the signal differing from prediction are the ambiguity set, the set of possible causes of the abnormality. Most of the diagnostic power of the Functional Reasoner comes from chaining on rules describing behavior.
This narrows down the topological search to only those components that affect the behavior of the specific parameter under measurement.
The Ambiguity Group Truth Maintenance System constitutes Level 4 of Figure 2 and overlays the device data base. The deductions and corresponding rule firings are recorded in a set of worlds, or state/dependency graphs, managed by the truth maintenance system.
Most of the intermediary conclusions made by the system are stored in these systems adds, deletes, merges or invalidates worlds explicitly control this process. Facts not fitting I situation graphs or worlds. The truth maintenance I as information is confirmed, contradictions are I found and new hypotheses are generated. Rule firings I into the worlds structure are stored in an unstructured facts list.
The Device Data Base constitutes Level 5 of Figure  2. This data base contains a structural hierarchy o f EISE, from system to node, to individual processors on the node, to cards inside each processor. Boards are the replaceable unit in the EISE system, therefore the structural hierarchy only goes down to the board level. The device data base also contains a functional hierarchy of EISE.
As indicated above, the architecture described here represents our implementation plan. Implementation began early in 1988, and a working version of the Diagnostic Executive for the A-10 EISE will be ready for validation in late 1988. Currently, the Event Reasoner has received the most implementation attention and can handle many of the problems encountered. Portions of the other reasoners have been implemented, but integration of all reasoners as described through the reasoning controller is as yet incomplete.

CONCLUSIONS
The diagnostic executive described here is being built on the A-10 EISE, i.e., the EISE which serves as the Integration Support Facility for the A-10. Other Air Logistics Centers are expected to use EISEs to support their weapon systems. Extending EISE to other weapon systems will not only introduce other configurations and hardware components to diagnose, but operator setup protocols are expected to be more complex and timing problems will be more severe because of greater traffic on the network.
The Diagnostics Executive will substantially decrease the need to employ many high cost troubleshooting experts. In addition, faster and more thorough diagnostics will decrease downtime allowing greater utilization of existing valuable computer network resources.
A third important benefit will be increased availability of the Integrated Support Facility ,a1 lowing faster turnaround time to implement the needed, timely and highly responsive updates to our aircraft systems.