Published August 26, 2025 | Version v1.0
Report Open

Production-Ready AI Inference for Healthcare with Triton, FastAPI, and Kubernetes

  • 1. Principal Member of Technical Staff at AMD

Description

This work presents a production-ready AI inference architecture for healthcare and pharmaceutical applications, designed to address the stringent requirements of scalability, compliance, and reliability. The system integrates:

  • FastAPI Gateway for authentication, request validation, and routing

  • Optional NLP/CV Preprocessor as an independent Kubernetes microservice for PHI de-identification and multimodal data handling

  • Triton Inference Server for serving ONNX/TorchScript models at scale

  • Model Registry + CI/CD with GitHub Actions for automated deployment and model versioning

  • Kubernetes (k8s.yaml, hpa.yaml, preprocessor.yaml) for deployment, scaling, and orchestration

  • Observability with Prometheus + Grafana for monitoring latency, throughput, and failures

  • Security & Compliance as outlined in SECURITY.md, including TLS, OAuth2/JWT, structured audit logs, and HIPAA-aligned controls
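As a rough illustration of how the gateway, preprocessor, and Triton stages listed above might connect, the sketch below masks a few identifier patterns in a clinical note and wraps the result in a request body for Triton's HTTP inference endpoint (the KServe v2 protocol, `POST /v2/models/{model}/versions/{version}/infer`). It is not taken from the published files: the regex patterns, the model name `clinical_ner`, and the input tensor name `TEXT` are assumptions made for the example, and real PHI de-identification needs a validated tool rather than three regular expressions.

```python
from __future__ import annotations

import json
import re
from typing import Any

# Hypothetical patterns for illustration only; production de-identification
# must cover many more identifier types and be validated on real data.
PHI_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE),
}


def deidentify(text: str) -> str:
    """Replace each matched identifier with a bracketed placeholder."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def build_infer_request(model: str, version: str, note: str) -> tuple[str, dict[str, Any]]:
    """Validate a note and build the URL path and JSON body for Triton.

    The tensor name "TEXT" is an assumption; it must match the input name
    declared in the served model's configuration.
    """
    if not note.strip():
        raise ValueError("note must not be empty")
    path = f"/v2/models/{model}/versions/{version}/infer"
    body = {
        "inputs": [
            {
                "name": "TEXT",       # assumed input tensor name
                "shape": [1],         # a single string element
                "datatype": "BYTES",  # strings travel as BYTES tensors
                "data": [deidentify(note)],
            }
        ]
    }
    return path, body


if __name__ == "__main__":
    path, body = build_infer_request(
        "clinical_ner", "1", "Pt MRN: 12345, call 555-123-4567."
    )
    print(path)
    print(json.dumps(body, indent=2))
```

In a deployment like the one described, the gateway would send this body to the Triton service only after authentication succeeds, so the masking step runs before any text leaves the preprocessing boundary.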

Key features include horizontal pod autoscaling, self-healing with readiness/liveness probes, rollback and promotion strategies for safe model lifecycle management, and support for both NLP (clinical notes) and CV (medical imaging) pipelines. The architecture is illustrated in architecture.png and is validated through modular YAML and curl snippets that demonstrate deployment and inference in real environments.
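To make the autoscaling behaviour concrete, the following sketch reproduces the replica calculation that the Kubernetes Horizontal Pod Autoscaler documents: the desired count is the current count scaled by the ratio of observed to target metric, rounded up, with no change when the ratio is within a tolerance (0.1 by default). The metric values and the replica bounds here are assumptions for illustration; the actual settings in hpa.yaml are not shown in this record.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Approximate the HPA scaling rule for a single metric.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within `tolerance` of 1.0, then clamped to
    the [min_replicas, max_replicas] range.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no scaling
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Four inference pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```

The same rule scales down as well: with four pods at 15% against a 60% target, the function returns one replica, subject to the configured minimum.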

This publication is intended to serve as a reference architecture for practitioners, researchers, and engineers deploying AI systems in sensitive clinical contexts. By releasing the design openly, it encourages reuse, adaptation, and further validation in production environments requiring both technical performance and regulatory compliance.

Files

architecture (1).png

Files (282.3 kB)

md5:523dfa88b95f9313e8675c5865332cc5 (106.3 kB)
md5:a76d8b73180e5e2e9e3d1baa04ffafc2 (176.1 kB)
