Khayyam Math: A Voice-Narrated Math Tutor with Multi-Tool Figure Routing and Vision-Audited Generation
Description
Concept DOI for citation: https://doi.org/10.5281/zenodo.20367146
Visit and try the work at http://www.khayyammath.com/
Repos:
https://github.com/khayyam-math/khayyam-math
https://huggingface.co/khayyam-math/khayyam-math-qwen2.5-7b-v6
Video samples:
https://youtu.be/2BDtxlCl7qM
https://youtu.be/IL_JO7H-Fb8
https://youtu.be/2cboozNW9OM
Producing a tutor-quality mathematical figure from a natural-language prompt and synchronising it with a spoken walkthrough is harder than producing either alone. A clean labelled figure is no good if the narration drifts out of time with it, asserts an identity that is mathematically false, or describes elements the figure does not contain; and a single end-to-end LLM call rediscovers, badly, layout problems that mature symbolic tools already solve.
We describe Khayyam Math, an end-to-end tutoring system, live in production at khayyammath.com, that turns a single natural-language prompt into a labelled SVG figure synchronised with a phrase-timed spoken walkthrough the learner can interrupt at any moment. The visualisation pipeline routes each prompt to one of four execution paths (deterministic Python templates, the Graphviz layout engine, a matplotlib backend, or a vision-audited LLM-SVG path with structured-fix retries) and runs a five-tier symbolic verifier (SymPy, then Z3 SMT, then the Lean 4 kernel, then per-domain structural checkers, then an offline Mathlib catalog) that blocks any figure whose narration is provably wrong. The narration pipeline synthesises each phrase with a local neural TTS, reads exact WAV durations to build a phrase-timing manifest, binds the spoken script to the figure by injecting stable element ids when the LLM omits them, and pauses mid-phrase on chat focus, typing, mic activation, or speech input.
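The phrase-timing step can be illustrated with a minimal sketch. It is not the project's implementation: the function names (`wav_duration_seconds`, `build_manifest`, `silent_wav`), the manifest fields, and the element ids are all assumptions made for illustration. It reads each phrase's duration from the WAV header with Python's standard `wave` module and accumulates start and end times, so that each phrase can be tied to a figure element.

```python
import io
import wave


def wav_duration_seconds(source):
    """Return the duration of a PCM WAV (a path or a binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_manifest(phrases):
    """Build cumulative timings from (element_id, text, wav_source) tuples.

    The field names below are hypothetical, not the project's schema.
    """
    manifest, cursor = [], 0.0
    for element_id, text, source in phrases:
        duration = wav_duration_seconds(source)
        manifest.append({
            "element_id": element_id,  # id of the figure element to highlight
            "text": text,
            "start": cursor,
            "end": cursor + duration,
        })
        cursor += duration
    return manifest


def silent_wav(seconds, rate=16000):
    """Create an in-memory mono 16-bit silent WAV, standing in for TTS output."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    demo = build_manifest([
        ("tri-abc", "Start with the triangle.", silent_wav(0.5)),
        ("side-c", "This side is the hypotenuse.", silent_wav(1.25)),
    ])
    for entry in demo:
        print(entry)
```

Reading durations from the synthesised audio, rather than estimating them from text length, keeps the manifest aligned with what the learner actually hears.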
On a diverse benchmark, the GPT-4o express backend is the strongest configuration on a blind multimodal-judge protocol. On a 1,000-question Lean-graded production stress test, SymPy alone resolves the majority of emitted claims and the chain has converted approximately forty systemic LLM failure modes into named regression checks. We also report an open negative result on neural layout correction: a GNN delta-predictor and a LayoutDM-style discrete-diffusion denoiser, both trained on tens of thousands of (broken, fixed) layout pairs, fail to outperform the trivial no-op baseline; a small graph-conditioned binary quality classifier used as a re-ranker over a CP-SAT label-placement planner produces a small but consistent lift. Every accepted user turn is captured as a training example for a Qwen2.5-7B LoRA adapter published periodically through huggingface.co/khayyam-math for offline and air-gapped deployment; the production service itself uses GPT-4o.
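The fall-through behaviour of a tiered verifier can be sketched as follows. This is a hedged illustration only: the `Verdict` type, the claim tuples, and both stand-in tiers are hypothetical, and a real deployment would call SymPy, Z3, Lean, and the other tiers instead. Each tier either decides a claim or passes it on, and a figure is blocked only when some claim is refuted.

```python
from enum import Enum


class Verdict(Enum):
    PROVED = "proved"
    REFUTED = "refuted"
    UNKNOWN = "unknown"


def run_chain(claim, tiers):
    """Try each (name, checker) tier in order; stop at the first decisive verdict."""
    for name, checker in tiers:
        verdict = checker(claim)
        if verdict is not Verdict.UNKNOWN:
            return name, verdict
    return None, Verdict.UNKNOWN


def figure_is_blocked(claims, tiers):
    """Block the figure if any narrated claim is refuted by some tier."""
    return any(run_chain(c, tiers)[1] is Verdict.REFUTED for c in claims)


# Stand-in tiers (hypothetical); they only mimic the "decide or pass on" contract.
def arithmetic_tier(claim):
    """Decide claims of the form ("eq", lhs, rhs) over plain numbers."""
    kind, *args = claim
    if kind != "eq":
        return Verdict.UNKNOWN
    lhs, rhs = args
    return Verdict.PROVED if lhs == rhs else Verdict.REFUTED


def parity_tier(claim):
    """Decide claims of the form ("even", n)."""
    kind, *args = claim
    if kind != "even":
        return Verdict.UNKNOWN
    return Verdict.PROVED if args[0] % 2 == 0 else Verdict.REFUTED


TIERS = [("arithmetic", arithmetic_tier), ("parity", parity_tier)]

if __name__ == "__main__":
    print(run_chain(("eq", 2 + 2, 4), TIERS))
    print(figure_is_blocked([("eq", 1, 1), ("even", 3)], TIERS))
```

Returning the name of the deciding tier alongside the verdict makes it possible to count how many claims each tier resolves, which is the kind of per-tier statistic reported above.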
The complete source code, training corpora, trained adapters, layout-correction models, and live deployment infrastructure are released under the MIT licence.
Files
| Name | Size |
|---|---|
| Khayyam-Math-v2.pdf (md5:26c139fe73d79c3d779086180dfa41a8) | 1.5 MB |
Additional details
Software
- Repository URL: https://github.com/khayyam-math/khayyam-math