Published October 20, 2025 | Version v4
Preprint · Open Access

Topic-Aware Inference Boost: A Fast Microservice Architecture for Reducing Large Language Model Hallucinations

Founder, Gigi Sehgal LLC

Description

Large language models (LLMs) often hallucinate—producing plausible but inaccurate responses—particularly when they misjudge their own confidence [arXiv:2401.01313].

This paper introduces Topic-Aware Inference Boost, a modular microservice architecture designed to mitigate hallucinations through rapid, topic-specific inference augmentation. The system delivers just-in-time expert-level responses from curated subject-matter-expert (SME) models through a lightweight API, without requiring retraining or prompt engineering. The prototype demonstrates end-to-end latency of 1 to 7 seconds on standard CPUs with over 90% inference quality across multiple domain tasks. By decoupling topic specialization from monolithic LLMs, the architecture enables any client model to enhance its reliability through targeted grounding. Phase 2 will extend the framework to allow models to self-evaluate confidence and selectively invoke the service for low-confidence inferences, maintaining real-time performance and high accuracy.
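The flow described above—topic identification, routing to a curated SME model, and the planned Phase 2 confidence gating—can be sketched roughly as follows. This is a minimal illustration, not the paper's actual API: the names `classify_topic`, `SME_REGISTRY`, and `boost` are hypothetical, and the SME models are stubbed as local functions in place of real expert endpoints.

```python
from typing import Callable, Dict, Tuple

# Curated subject-matter-expert (SME) "models", stubbed here as plain
# functions; in the described architecture these sit behind a lightweight API.
SME_REGISTRY: Dict[str, Callable[[str], str]] = {
    "medicine": lambda q: f"[medicine SME] grounded answer to: {q}",
    "law": lambda q: f"[law SME] grounded answer to: {q}",
}

def classify_topic(query: str) -> str:
    """Toy keyword router standing in for the topic-identification method."""
    if "statute" in query or "contract" in query:
        return "law"
    return "medicine"

def boost(query: str, base_answer: str, confidence: float,
          threshold: float = 0.7) -> Tuple[str, bool]:
    """Phase-2-style gating: invoke an SME only for low-confidence inferences.

    Returns the (possibly boosted) answer and whether an SME was invoked.
    """
    if confidence >= threshold:
        return base_answer, False   # client model is confident; no SME call
    topic = classify_topic(query)
    sme = SME_REGISTRY.get(topic)
    if sme is None:
        return base_answer, False   # no curated expert for this topic
    return sme(query), True         # just-in-time expert-level response

answer, boosted = boost("Is this contract clause enforceable?",
                        "Probably?", confidence=0.4)
print(boosted)  # True: low confidence routed the query to the law SME
```

Because the gate fires only below the confidence threshold, high-confidence inferences skip the service entirely, which is how the design keeps the augmentation path off the critical latency budget for most queries.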

 

Note to Readers: This document, formerly titled "InferBoost," has been renamed Topic-Aware Inference Boost to improve technical clarity and to disambiguate the research from external websites currently using the "InferBoost" term. The underlying architecture, topic-identification methodology, and performance metrics remain unchanged.

 

Files

TopicAwareInferenceBoost.pdf (64.1 kB)
md5:d07d58b75433cfa1133c05e2c73e16e6

Additional details

Related works

Is new version of
Preprint: 10.5281/zenodo.17429009 (DOI)

Dates

Updated
2025-10-23
Title updated for technical clarity and disambiguation; no other changes.