Published February 3, 2026 | Version v1.0 | Preprint | Open
Decentralized Hybrid LLM Inference Architectures Under Free-Tier Infrastructure Constraints
Description
Access to advanced artificial intelligence systems is increasingly shaped by infrastructure and cost rather than capability alone. While large proprietary models dominate public benchmarks, a growing ecosystem of open-weight models offers an alternative path that prioritizes local control, transparency, and privacy. This paper examines whether such models can be deployed and used meaningfully under free-tier computational constraints.
Rather than proposing new algorithms, this work focuses on system-level analysis and hands-on deployment. Open-weight reasoning models were examined in terms of memory requirements, inference latency, privacy properties, and operational stability when run on single-GPU free-tier instances such as NVIDIA T4 and P100. Particular attention is given to the gap between benchmark-reported performance and what is practically achievable on constrained hardware.
The analysis highlights a clear hardware capability gap: while large distilled reasoning models (e.g., 32B variants) report strong benchmark results, free-tier infrastructure realistically supports only smaller 7B–8B deployments unless aggressive quantization and offloading are applied. Experimental observations confirm that these smaller models remain usable for reasoning-oriented tasks, albeit with longer setup times, variable latency, and limited throughput.
The findings suggest that open-weight models can function as privacy-preserving reasoning systems for specific use cases, but they do not replace hosted platforms universally. Instead, they occupy a distinct role shaped by accessibility, user control, and infrastructure limits. A full reference implementation, execution notebook, and empirical runtime evidence are publicly available to support reproducibility.
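The hardware capability gap described above can be made concrete with a back-of-envelope VRAM estimate. The sketch below is illustrative only: the ~20% overhead factor for activations, KV cache, and framework buffers is an assumption, not a figure reported in the paper, and the parameter counts are nominal model sizes rather than measured footprints.

```python
def model_vram_gb(params_billion: float, bits_per_weight: int,
                  overhead: float = 1.2) -> float:
    """Rough VRAM (GB) needed to hold model weights.

    overhead: assumed ~20% headroom for activations, KV cache,
    and framework buffers (an illustrative guess, not measured).
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# Free-tier GPUs discussed in the paper (T4, P100) offer ~16 GB of VRAM.
for name, params in [("7B", 7.0), ("8B", 8.0), ("32B", 32.0)]:
    fp16 = model_vram_gb(params, 16)
    int4 = model_vram_gb(params, 4)
    print(f"{name}: fp16 ~ {fp16:.1f} GB, 4-bit ~ {int4:.1f} GB")
```

Under these assumptions, even a 7B model in fp16 slightly exceeds a 16 GB card once overhead is counted, while a 4-bit 32B model still does not fit — consistent with the paper's observation that free-tier hardware confines practical use to quantized 7B–8B deployments.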
Files
- Decentralized Hybrid LLM Inference Architectures Under Free-Tier Infrastructure Constraints.pdf (669.8 kB, md5:b66bc896e69e7a0793a34b2a9eb2168c)
Additional details
Related works
- Is supplemented by:
  - Software: https://github.com/thedevx-shivansh/free-tier-llm-inference-dhlia
  - Computational notebook: https://www.kaggle.com/code/shivanshdevx/free-tier-llm-inference-validation