AI-Driven Engineering Agency: A Multi-Agent Diagnostic Framework with NVIDIA NeMo and DeepSeek-R1 for Safety-Critical Systems
Description
Executive Summary: Engineering Agency Framework
This research introduces Engineering Agency, a paradigm that shifts aerospace diagnostics from traditional rule-based expert systems to adaptive, multi-agent AI architectures. Developed by Frank Morales, the framework uses the NVIDIA NeMo Agent Toolkit (NAT) and the DeepSeek-R1 reasoning model to perform real-time diagnostic tasks for safety-critical systems, with the Artemis II mission as its primary test case.
The paper detailing this framework, titled "AI-Driven Engineering Agency: A Multi-Agent Diagnostic Framework with NVIDIA NeMo and DeepSeek-R1 for Safety-Critical Systems," has been officially accepted for publication and presentation at the CCECE 2026 conference.
Core Architecture and Coordination
The system utilizes a hierarchical multi-agent structure managed by the NAT Orchestrator. This setup allows for advanced state management, context persistence, and strict safety constraint enforcement. The framework employs three distinct coordination patterns:
- Sequential Tool Execution: For deterministic diagnostic procedures.
- Parallel Tool Integration: For time-critical data retrieval and analysis.
- Hierarchical Agent Delegation: Utilizing specialized sub-agents for domain-specific tasks under supervisor coordination.
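The three coordination patterns can be illustrated with a minimal sketch in plain Python `asyncio`. This is not the NAT Orchestrator API: the names `run_sequential`, `run_parallel`, `Supervisor`, the toy tools, and all telemetry values and thresholds below are hypothetical and exist only to show the shape of each pattern.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# A "tool" here is any async function that takes a state dict and returns one.
Tool = Callable[[dict], Awaitable[dict]]


async def run_sequential(tools: List[Tool], state: dict) -> dict:
    """Sequential execution: each tool sees the previous tool's output."""
    for tool in tools:
        state = await tool(state)
    return state


async def run_parallel(tools: List[Tool], state: dict) -> dict:
    """Parallel integration: run tools concurrently, then merge their results."""
    results = await asyncio.gather(*(tool(dict(state)) for tool in tools))
    merged = dict(state)
    for result in results:
        merged.update(result)
    return merged


class Supervisor:
    """Hierarchical delegation: route a task to a domain-specific sub-agent."""

    def __init__(self, sub_agents: Dict[str, Tool]) -> None:
        self.sub_agents = sub_agents

    async def delegate(self, domain: str, state: dict) -> dict:
        if domain not in self.sub_agents:
            raise KeyError(f"no sub-agent registered for domain {domain!r}")
        return await self.sub_agents[domain](state)


# Toy tools with made-up readings (illustrative values, not mission data).
async def read_co2(state: dict) -> dict:
    return {**state, "co2_mmhg": 3.1}


async def read_scrubber_power(state: dict) -> dict:
    return {**state, "scrubber_w": 140.0}


async def eclss_agent(state: dict) -> dict:
    # Hypothetical limit, chosen only so the example has something to check.
    return {**state, "eclss_ok": state["co2_mmhg"] < 5.0}


if __name__ == "__main__":
    telemetry = asyncio.run(run_parallel([read_co2, read_scrubber_power], {}))
    verdict = asyncio.run(Supervisor({"eclss": eclss_agent}).delegate("eclss", telemetry))
    print(verdict)
```

In this sketch the parallel pattern trades ordering guarantees for speed, which is why it suits independent telemetry reads, while the sequential pattern keeps each step's input fully determined by the previous step.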
Multi-Phase Implementation Methodology
The framework was validated through a five-phase evolution, moving from basic telemetry observation to complex physical modeling:
- Foundation Establishment: Implementing the basic Reasoning and Acting (ReAct) loop to analyze Environmental Control and Life Support System (ECLSS) telemetry, such as scrubber power and $CO_2$ levels.
- Configuration Compliance: Enforcing strict YAML-based formatting and validation schemas required for aerospace standards.
- Dynamic Knowledge Integration: Integrating live internet search to cross-reference real-time mission telemetry with current NASA benchmarks.
- Stability Enhancement: Implementing comprehensive error handling, parsing retries, and timeout management for complex agentic loops.
- Physics-Based Computation: Executing constrained code within sandboxed environments to perform orbital mechanics and station-keeping calculations.
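The last two phases can be sketched together: a retry-and-timeout wrapper of the kind the Stability Enhancement phase describes, wrapped around a small orbital mechanics calculation. This is an assumption-laden illustration rather than the framework's implementation. The helper names `call_with_retries` and `make_flaky_step` are hypothetical, the sandboxing itself is not shown, and the physics is the standard two-body orbital period from Kepler's third law, $T = 2\pi\sqrt{a^3/\mu}$.

```python
import asyncio
import math

MU_EARTH = 3.986004418e14  # Earth's gravitational parameter, m^3/s^2


def orbital_period_s(semi_major_axis_m: float) -> float:
    """Two-body orbital period from Kepler's third law: T = 2*pi*sqrt(a^3/mu)."""
    if semi_major_axis_m <= 0:
        raise ValueError("semi-major axis must be positive")
    return 2.0 * math.pi * math.sqrt(semi_major_axis_m ** 3 / MU_EARTH)


async def call_with_retries(step, *, attempts: int = 3, timeout_s: float = 1.0):
    """Run an async step, retrying when it times out or raises ValueError.

    ValueError stands in for a parsing failure on the model's output.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return await asyncio.wait_for(step(), timeout=timeout_s)
        except (asyncio.TimeoutError, ValueError) as exc:
            last_error = exc
    raise RuntimeError(f"step failed after {attempts} attempts") from last_error


def make_flaky_step(semi_major_axis_m: float):
    """Build a step that fails once (simulated parse error), then succeeds."""
    calls = {"n": 0}

    async def step() -> float:
        calls["n"] += 1
        if calls["n"] == 1:
            raise ValueError("unparseable model output")
        return orbital_period_s(semi_major_axis_m)

    return step, calls


if __name__ == "__main__":
    # 6.778e6 m is roughly a 400 km altitude circular orbit (illustrative input).
    step, calls = make_flaky_step(6.778e6)
    period = asyncio.run(call_with_retries(step))
    print(f"period: {period / 60:.1f} min after {calls['n']} attempts")
```

Bounding both the number of attempts and the time per attempt keeps a misbehaving reasoning step from stalling the whole diagnostic loop, at the cost of surfacing a hard failure that the caller must handle.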
Performance and Results
Evaluated over more than 500 diagnostic cycles, the framework demonstrated significant improvements over traditional systems:
| Metric | Traditional Systems | Engineering Agency (Agentic) |
|---|---|---|
| Accuracy | ~95–97% | 99.2% |
| Response Time | 15–30 minutes | 23–32 seconds (Sub-5s in optimized cycles) |
| Adaptability | Limited (Rule-based) | High (Emergent Reasoning) |
| Human Intervention | High | Low (85% reduction) |
| False Positives | 3% to 5% | 0.5% to 1% |
The system achieved 100% parity with manual engineering baselines in complex calculations while maintaining a constant, low memory footprint of approximately 65 MB per agent.
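For readers who want to reproduce metrics of this kind on their own evaluation runs, the sketch below shows one conventional way to compute accuracy, false-positive rate, and a relative reduction from confusion counts. The description does not state how the reported figures were defined, so these formulas are an assumption, and the counts used are toy values rather than the paper's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts over a set of diagnostic cycles."""
    tp: int  # anomalies correctly flagged
    fp: int  # nominal cycles incorrectly flagged
    tn: int  # nominal cycles correctly passed
    fn: int  # anomalies missed


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of cycles classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    if total == 0:
        raise ValueError("no cycles to evaluate")
    return (c.tp + c.tn) / total


def false_positive_rate(c: ConfusionCounts) -> float:
    """Fraction of nominal cycles that were incorrectly flagged."""
    negatives = c.fp + c.tn
    if negatives == 0:
        raise ValueError("no nominal cycles to evaluate")
    return c.fp / negatives


def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from a baseline value, e.g. interventions per 100 cycles."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) / before


if __name__ == "__main__":
    toy = ConfusionCounts(tp=40, fp=2, tn=56, fn=2)  # toy counts, not paper data
    print(f"accuracy = {accuracy(toy):.3f}")
    print(f"false-positive rate = {false_positive_rate(toy):.3f}")
    print(f"reduction = {relative_reduction(20, 3):.0%}")
```

Note that a false-positive figure can also be reported as a share of all alerts rather than of all nominal cycles; the two definitions give different numbers, so comparisons should use the same one on both sides.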
Discussion and Future Directions
The framework successfully diagnosed 47 previously unseen anomaly patterns, demonstrating that it can handle novel scenarios that traditional rule-based systems cannot. Although the reasoning process currently adds latency on complex tasks, the gains in adaptability and real-time knowledge fusion outweigh this cost for next-generation autonomous systems.
Future research will focus on:
- Real-time Optimization: Reducing latency through predictive caching and hardware acceleration.
- Enhanced Safety Certification: Developing formal verification methods for autonomous AI decisions in human spaceflight.
- Multi-Mission Coordination: Extending the framework to manage concurrent missions and distributed fault tolerance.
Files
2690283180-manuscript.pdf