Benchmarking Multi-Agent LLM Architectures for Home Energy Management: Real-World Tariff Validation and Cross-Model Cost-Efficiency Analysis
Description
Multi-agent large language model (LLM) systems have recently been proposed for home energy management systems (HEMS), but prior work has largely evaluated a single backend or a single market context. This paper benchmarks four LLMs spanning self-hosted, low-cost, mid-tier, and frontier deployment classes (Llama 4 Maverick, DeepSeek-V3, GPT-4.1, Claude Sonnet 4.6) across three US utility-linked tariff profiles, three household archetypes, and three random seeds, for a total of 108 seven-day simulations.
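The factorial design described above can be sketched as a simple enumeration; the tariff and archetype labels below are illustrative placeholders, not the paper's exact names.

```python
from itertools import product

# Benchmark grid from the abstract: 4 models x 3 tariffs x 3 archetypes x 3 seeds.
models = ["Llama 4 Maverick", "DeepSeek-V3", "GPT-4.1", "Claude Sonnet 4.6"]
tariffs = ["tariff_A", "tariff_B", "tariff_C"]   # three US utility-linked profiles (placeholder labels)
archetypes = ["small", "medium", "large"]        # three household archetypes (placeholder labels)
seeds = [0, 1, 2]                                # three random seeds

runs = list(product(models, tariffs, archetypes, seeds))
print(len(runs))  # 4 * 3 * 3 * 3 = 108 seven-day simulations

# Each model appears in 27 runs, matching the n = 27 per group used in the pairwise tests.
per_model = sum(1 for r in runs if r[0] == "GPT-4.1")
print(per_model)  # 27
```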
Three of the four tested models — DeepSeek-V3 (37.8%), GPT-4.1 (44.8%), and Claude Sonnet 4.6 (49.3%) — achieve statistically equivalent mean energy cost reductions above 20% versus an unmanaged baseline (p > 0.10 for all pairwise comparisons, n = 27 per group), while Llama 4 Maverick (17.4%) significantly underperforms (p < 0.001, Cohen's d > 1.0). Because frontier-tier savings are statistically indistinguishable, the deployment decision reduces to API cost and latency: DeepSeek-V3 delivers equivalent savings at an estimated $0.005/day, a 7.5× lower daily API cost than GPT-4.1 and a 17× lower cost than Claude Sonnet 4.6. Tariff structure complexity emerges as a stronger model differentiator than household size alone.
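A back-of-envelope view of the cost comparison: only DeepSeek-V3's $0.005/day figure is stated directly in the abstract; the GPT-4.1 and Claude Sonnet 4.6 daily costs below are derived from the quoted 7.5× and 17× multipliers, not reported separately.

```python
# Estimated daily API cost from the abstract, with the other two models
# reconstructed from the stated cost ratios.
deepseek_daily = 0.005                 # $/day, stated in the abstract
gpt41_daily = deepseek_daily * 7.5     # implied: $0.0375/day
claude_daily = deepseek_daily * 17.0   # implied: $0.085/day

for name, daily in [("DeepSeek-V3", deepseek_daily),
                    ("GPT-4.1", gpt41_daily),
                    ("Claude Sonnet 4.6", claude_daily)]:
    # Annualizing highlights why equivalent savings make the cheapest API attractive.
    print(f"{name}: ${daily:.4f}/day, ${daily * 365:.2f}/year")
```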
Files
Benchmarking Multi-Agent LLM Architectures for Home Energy Management.pdf
(360.0 kB)
md5:203fcb8a88cb09ce26072d16aebbbc7f