Khayyam Math: A Voice-Narrated Math Tutor with Multi-Tool Figure Routing and Vision-Audited Generation

Kermani Kolankeh, Arash

doi:10.5281/zenodo.20579983

Published June 7, 2026 | Version v2

Preprint Open

Kermani Kolankeh, Arash (Researcher)¹

1. Independent

Concept DOI for citation: https://doi.org/10.5281/zenodo.20367146

Visit and try the work at http://www.khayyammath.com/

Repos:

https://github.com/khayyam-math/khayyam-math

https://huggingface.co/khayyam-math/khayyam-math-qwen2.5-7b-v6

Video samples:

https://youtu.be/2BDtxlCl7qM

https://youtu.be/IL_JO7H-Fb8

https://youtu.be/2cboozNW9OM

Producing a tutor-quality mathematical figure from a natural-language prompt and synchronising it with a spoken walkthrough is harder than producing either alone. A clean labelled figure is no good if the narration drifts out of time with it, asserts an identity that is mathematically false, or describes elements the figure does not contain; and a single end-to-end LLM call rediscovers, badly, layout problems that mature symbolic tools already solve.

We describe Khayyam Math, an end-to-end tutoring system, live in production at khayyammath.com, that turns a single natural-language prompt into a labelled SVG figure synchronised with a phrase-timed spoken walkthrough that the learner can interrupt at any moment. The visualisation pipeline routes each prompt to one of four execution paths (deterministic Python templates, the Graphviz layout engine, a matplotlib backend, or a vision-audited LLM-SVG path with structured-fix retries) and runs a five-tier symbolic verifier (SymPy, then Z3 SMT, then the Lean 4 kernel, then per-domain structural checkers, then an offline Mathlib catalog) that blocks any figure whose narration is provably wrong. The narration pipeline synthesises each phrase with a local neural TTS engine, reads exact WAV durations to build a phrase-timing manifest, binds the spoken script to the figure by injecting stable element ids when the LLM omits them, and pauses mid-phrase on chat focus, typing, mic activation, or speech input.
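The phrase-timing step can be illustrated with a minimal sketch, assuming each phrase arrives as PCM WAV bytes paired with the id of the figure element it describes. The function names, the manifest fields, and the `silent_wav` stand-in for TTS output are hypothetical and are not taken from the Khayyam Math codebase; only Python's standard `wave` module is used.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of a PCM WAV clip, computed from its header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def build_manifest(phrases):
    """Build cumulative start/end times for phrases given in spoken order.

    phrases: list of (text, element_id, wav_bytes) tuples (hypothetical shape).
    """
    manifest, cursor = [], 0.0
    for text, element_id, wav_bytes in phrases:
        duration = wav_duration_seconds(wav_bytes)
        manifest.append({
            "text": text,
            "element_id": element_id,  # figure element highlighted during the phrase
            "start": cursor,
            "end": cursor + duration,
        })
        cursor += duration
    return manifest


def silent_wav(seconds: float, rate: int = 8000) -> bytes:
    """Stand-in for TTS output: a mono 16-bit clip of silence."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as clip:
        clip.setnchannels(1)
        clip.setsampwidth(2)
        clip.setframerate(rate)
        clip.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


manifest = build_manifest([
    ("Start with the right triangle.", "triangle", silent_wav(0.5)),
    ("Its hypotenuse is labelled c.", "side-c", silent_wav(1.25)),
])
```

Deriving the timings from the synthesised audio itself, rather than from an estimate based on text length, is what keeps the highlight on each element aligned with the phrase that names it.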

On a diverse benchmark, the GPT-4o express backend is the strongest configuration under a blind multimodal-judge protocol. On a 1,000-question Lean-graded production stress test, SymPy alone resolves the majority of emitted claims, and the chain has converted approximately forty systemic LLM failure modes into named regression checks. We also report an open negative result on neural layout correction: a GNN delta-predictor and a LayoutDM-style discrete-diffusion denoiser, both trained on tens of thousands of (broken, fixed) layout pairs, fail to outperform the trivial no-op baseline; a small graph-conditioned binary quality classifier used as a re-ranker over a CP-SAT label-placement planner produces a small but consistent lift. Every accepted user turn is captured as a training example for a Qwen2.5-7B LoRA adapter published periodically through huggingface.co/khayyam-math for offline and air-gapped deployment; the production service itself uses GPT-4o.
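A tiered verifier of this kind can be sketched as a chain in which the first tier to reach a decision wins, which is one way a cheap first tier could end up resolving most claims. The sketch below is an assumption about the control flow only: the two toy tiers are hypothetical stand-ins written with the standard library, not the SymPy, Z3, or Lean integrations, and the rule that only a refuted claim blocks a figure is inferred from the phrase "provably wrong".

```python
from typing import Callable, List, Optional, Tuple

# A tier returns True (claim proved), False (claim refuted), or None (cannot decide).
Tier = Callable[[str], Optional[bool]]


def verify(claim: str, tiers: List[Tuple[str, Tier]]) -> Tuple[Optional[bool], str]:
    """Run tiers in order and return (verdict, name of the deciding tier)."""
    for name, tier in tiers:
        verdict = tier(claim)
        if verdict is not None:
            return verdict, name
    return None, "undecided"


def arithmetic_tier(claim: str) -> Optional[bool]:
    """Toy cheap tier: decides claims of the form '<int> + <int> = <int>'."""
    try:
        left, right = claim.split("=")
        a, b = left.split("+")
        return int(a) + int(b) == int(right)
    except ValueError:
        return None  # not in this tier's fragment; escalate to the next tier


def lookup_tier(claim: str) -> Optional[bool]:
    """Toy last tier: a small offline catalogue of known identities."""
    known = {"sin(x)^2 + cos(x)^2 = 1": True}
    return known.get(claim)


TIERS = [("arithmetic", arithmetic_tier), ("lookup", lookup_tier)]


def figure_is_blocked(claims: List[str]) -> bool:
    """Block only when some claim is refuted; undecided claims pass through."""
    return any(verify(claim, TIERS)[0] is False for claim in claims)
```

Recording the name of the deciding tier alongside each verdict also gives the per-tier resolution counts that a stress test of this kind would report.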

The complete source code, training corpora, trained adapters, layout-correction models, and live deployment infrastructure are released under the MIT licence.

Files

Khayyam-Math-v2.pdf (1.5 MB), md5:26c139fe73d79c3d779086180dfa41a8

Additional details

Repository URL: https://github.com/khayyam-math/khayyam-math

	All versions	This version
Views	287	69
Downloads	174	62
Data volume	450.3 MB	179.9 MB
