Measuring Cloud API Usage in a Local-First Hermes Agent Deployment: A Log-Based Comparison of Remote Inference vs. On-Premises ds4
Description
Organizations that run Hermes Agent with a local inference backend still need to know whether auxiliary tasks silently route conversation data to cloud LLM providers. We quantify that exposure on a production deployment where main chat inference uses a local ds4-server endpoint (DeepSeek V4 Flash), while several Hermes auxiliary slots were configured for a commercial remote API. Parsing retained Hermes agent logs (2026-04-12–2026-06-26), we find 227 logged remote API calls totaling roughly 18 million input tokens and 0.17 million output tokens. All successful remote chat completions in this window were tagged platform=curator (background skill maintenance). Context compression was configured to prefer the remote provider (32 route attempts, 32 compression events). After the main model was bound locally on 2026-06-17, successful remote API calls dropped to 0, while local ds4 calls reached 694. We conclude that cloud APIs were not on the critical path of day-to-day operations and that on-premises ds4-server with DeepSeek V4 Flash is viable for production use. We also outline configuration hygiene and a future local-only path for analyzing customer-confidential documents.
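The log-based tally described above can be illustrated with a minimal Python sketch. It is not the analysis code used for this record: the log line layout, the field names (`provider`, `platform`, `status`, `in_tokens`, `out_tokens`), and the sample lines are all hypothetical, since the actual Hermes Agent log format is not reproduced here. Only the 2026-06-17 cutoff date comes from the description, and treating that day as part of the "after" period is an assumption.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical log layout (an assumption, NOT the real Hermes Agent format):
# <ISO timestamp> api_call key=value key=value ...
LINE_RE = re.compile(r"^(?P<day>\d{4}-\d{2}-\d{2})T\S+\s+api_call\s+(?P<fields>.*)$")

# Made-up sample lines for illustration only; these are not real measurements.
SAMPLE_LINES = [
    "2026-05-02T03:10:00 api_call provider=remote platform=curator status=ok in_tokens=1000 out_tokens=20",
    "2026-06-10T11:00:00 api_call provider=remote platform=compression status=error in_tokens=0 out_tokens=0",
    "2026-06-18T09:14:02 api_call provider=local platform=chat status=ok in_tokens=500 out_tokens=50",
    "2026-06-18T09:15:00 some unrelated log line",
]


def parse_line(line):
    """Return a dict of fields for an api_call line, or None for any other line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = dict(kv.split("=", 1) for kv in match.group("fields").split() if "=" in kv)
    fields["day"] = date.fromisoformat(match.group("day"))
    return fields


def summarize(lines, cutoff):
    """Count successful calls and sum tokens per (provider, period) pair.

    A call dated on or after `cutoff` falls in the "after" period.
    """
    totals = defaultdict(lambda: {"calls": 0, "in_tokens": 0, "out_tokens": 0})
    for line in lines:
        record = parse_line(line)
        if record is None or record.get("status") != "ok":
            continue  # skip unrelated lines and unsuccessful calls
        period = "before" if record["day"] < cutoff else "after"
        bucket = totals[(record.get("provider", "unknown"), period)]
        bucket["calls"] += 1
        bucket["in_tokens"] += int(record.get("in_tokens", 0))
        bucket["out_tokens"] += int(record.get("out_tokens", 0))
    return dict(totals)


# Cutoff taken from the description: main model bound locally on 2026-06-17.
summary = summarize(SAMPLE_LINES, cutoff=date(2026, 6, 17))
for key, counts in sorted(summary.items()):
    print(key, counts)
```

Under these assumptions, a provider and period pair with no successful calls simply has no entry in the result, which is how a drop to zero remote calls after the cutoff would show up.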
Files
| Name | Size |
|---|---|
| cloud-api-traffic_20260626.pdf (md5:3ceaba43eba0a1e39ba0d77fecad9d54) | 80.1 kB |
Additional details
Related works
- Is supplement to: Preprint, 10.5281/zenodo.20519019 (DOI)