AI-Run Modern LaTeX Manuscript Workflow and Replication Packet
Description
This record contains a public, reusable workflow note and a sanitized replication packet for an AI-run, human-directed pipeline that modernizes public-domain mathematical manuscripts into source-checkable LaTeX, reader PDFs, translations, and archival release packages.
The packet documents the current practical workflow: scan acquisition and indexing, chunking, TeX transcription, compilation, visual/text auditing, translation passes, Zenodo publication, GitHub mirroring, preservation of provenance artifacts, and current OCR/math-OCR experiments. The 2026-05-31 refresh adds the concrete local document-tooling notes used for public workflow packets: Pandoc, LibreOffice/soffice, Python pdf2image, and Poppler-compatible PDF rendering checks.
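As a minimal sketch of a pre-flight check for the local document tooling named above, the snippet below looks for the relevant executables on the PATH. The executable names are assumptions, not taken from the packet: `pandoc`, `soffice` for LibreOffice, and `pdftoppm` as the Poppler utility that pdf2image calls. The helper names `check_tools` and `missing_tools` are hypothetical.

```python
import shutil

# Assumed executable names (not confirmed by the packet):
# pandoc, soffice (LibreOffice), pdftoppm (Poppler, used by pdf2image).
REQUIRED_TOOLS = ("pandoc", "soffice", "pdftoppm")


def check_tools(names=REQUIRED_TOOLS):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(report):
    """Return the sorted names of tools that could not be found."""
    return sorted(name for name, path in report.items() if path is None)


if __name__ == "__main__":
    report = check_tools()
    for name, path in report.items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

Running this before a rendering pass shows whether a missing page image might come from an absent tool and not from the source PDF.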
The workflow is model-agnostic. It records the observed roles of web LLM sessions, Codex-style local automation, Kimi-style large agent swarms, Claude Code-style local repair agents, Python/PyMuPDF tooling, TeX engines, GitHub, and the Zenodo API. OCR tools are described as witnesses only; mathematical fidelity still requires compilation, visual audit, and source checking against scans. The 2026-06-04 refresh adds reusable OCR/math-OCR scripts and lessons, including local GPU/OCR experiment notes and page-crop/formula-crop witness workflows.
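One way to treat OCR output as a witness and not as an authority is to compare it against the transcription and flag pages where the two diverge. The sketch below does this with Python's standard `difflib`; it is illustrative only and is not one of the scripts in the packet. The function names and the 0.9 threshold are hypothetical.

```python
import difflib
import re


def normalize(text):
    """Collapse whitespace and lowercase so layout differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def witness_agreement(transcription, ocr_text):
    """Return a similarity ratio in [0, 1] between transcription and OCR witness."""
    matcher = difflib.SequenceMatcher(
        None, normalize(transcription), normalize(ocr_text)
    )
    return matcher.ratio()


def pages_needing_audit(pages, threshold=0.9):
    """Return sorted page ids whose agreement falls below the threshold.

    pages maps a page id to a (transcription, ocr_text) pair.
    The default threshold is a hypothetical value, not a recommendation.
    """
    return sorted(
        page_id
        for page_id, (transcription, ocr_text) in pages.items()
        if witness_agreement(transcription, ocr_text) < threshold
    )
```

A low score only prompts a human check against the scan. A high score does not establish mathematical fidelity, since TeX markup and OCR errors can both distort the comparison.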
Files
OCR Workflow Scripts and Lessons 20260604.zip
Additional details
Related works
- Is supplement to: https://zenodo.org/records/20393488
- Is supplemented by: https://github.com/KokunoYumeto/modern-latex-manuscripts