Published June 26, 2026 | Version v1

Measuring Cloud API Usage in a Local-First Hermes Agent Deployment: A Log-Based Comparison of Remote Inference vs. On-Premises ds4

Authors/Creators

  • Elvez, Inc.

Description

Organizations that run Hermes Agent with a local inference backend still need to know whether auxiliary tasks silently route conversation data to cloud LLM providers. We quantify that exposure on a production deployment where main chat inference uses a local ds4-server endpoint (DeepSeek V4 Flash), while several Hermes auxiliary slots were configured for a commercial remote API. Parsing the retained Hermes agent logs (2026-04-12 to 2026-06-26), we find 227 logged remote API calls totaling roughly 18 million input tokens and 0.17 million output tokens. All successful remote chat completions in this window were tagged platform=curator (background skill maintenance). Context compression was configured to prefer the remote provider (32 route attempts, 32 compression events). After the main model was bound to the local endpoint on 2026-06-17, successful remote API calls dropped to 0, while local ds4 calls reached 694. We conclude that cloud APIs were not on the critical path of day-to-day operations and that on-premises ds4-server with DeepSeek V4 Flash is viable for production use. We also outline configuration hygiene and a future local-only path for analyzing customer-confidential documents.
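
The log-based counting described above can be illustrated with a minimal sketch. The actual Hermes log format is not given in this description, so the key=value line layout, the field names (provider, platform, status, input_tokens, output_tokens), the "api_call" marker, and the sample lines below are all hypothetical and invented for illustration; they are not data from the study.

```python
import re
from collections import Counter

# Hypothetical log excerpt: the layout and every value here are made up
# for illustration and do not come from the deployment described above.
SAMPLE_LOG = """\
2026-05-02T10:15:00 api_call provider=remote platform=curator status=ok input_tokens=81234 output_tokens=702
2026-06-18T09:00:00 api_call provider=local platform=chat status=ok input_tokens=4100 output_tokens=350
2026-06-18T09:05:00 api_call provider=remote platform=curator status=error input_tokens=0 output_tokens=0
2026-06-18T09:06:00 heartbeat ok
"""

# Matches key=value pairs such as provider=remote or input_tokens=81234.
FIELD = re.compile(r"(\w+)=(\S+)")


def tally(log_text):
    """Count calls per (provider, status) and sum tokens per (provider, direction)."""
    calls = Counter()
    tokens = Counter()
    for line in log_text.splitlines():
        # Skip lines that are not API call records (assumed marker).
        if " api_call " not in line:
            continue
        fields = dict(FIELD.findall(line))
        provider = fields.get("provider", "unknown")
        status = fields.get("status", "unknown")
        calls[(provider, status)] += 1
        tokens[(provider, "input")] += int(fields.get("input_tokens", 0))
        tokens[(provider, "output")] += int(fields.get("output_tokens", 0))
    return calls, tokens


calls, tokens = tally(SAMPLE_LOG)
for (provider, status), n in sorted(calls.items()):
    print(f"{provider:7s} {status:6s} calls={n}")
for (provider, direction), n in sorted(tokens.items()):
    print(f"{provider:7s} {direction:6s} tokens={n}")
```

Separating successful from failed calls per provider mirrors the distinction the description draws between logged remote calls and successful remote completions; a real analysis would need to adapt the pattern to the fields the logs actually contain.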

Files

cloud-api-traffic_20260626.pdf

Files (80.1 kB)

md5:3ceaba43eba0a1e39ba0d77fecad9d54
80.1 kB

Additional details

Related works

Is supplement to
Preprint: 10.5281/zenodo.20519019 (DOI)