
Published June 4, 2026 | Version 2026-06-04 workflow packet with OCR/math-OCR scripts, document tooling, and current publication workflow

AI-Run Modern LaTeX Manuscript Workflow and Replication Packet

  • 1. Independent

Description

This record contains a public, reusable workflow note and a sanitized replication packet for an AI-run, human-directed pipeline that modernizes public-domain mathematical manuscripts into source-checkable LaTeX, reader PDFs, translations, and archival release packages.

The packet documents the current practical workflow: scan acquisition and indexing, chunking, TeX transcription, compilation, visual/text auditing, translation passes, Zenodo publication, GitHub mirroring, preservation of provenance artifacts, and current OCR/math-OCR experiments. The 2026-05-31 refresh adds the concrete local document-tooling notes used for public workflow packets: Pandoc, LibreOffice/soffice, Python pdf2image, and Poppler-compatible PDF rendering checks.
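The chunking step named above can be illustrated with a small sketch. It assumes pages are numbered from 1 and that a fixed chunk size is acceptable; the names `Chunk` and `plan_chunks` and the default size of 8 pages are hypothetical and are not taken from the packet's scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A contiguous, inclusive range of scan pages handed to one transcription pass."""
    index: int
    first_page: int
    last_page: int


def plan_chunks(page_count: int, chunk_size: int = 8) -> list[Chunk]:
    """Split pages 1..page_count into consecutive chunks of at most chunk_size pages.

    The chunk size is an illustrative assumption, not a value from the packet.
    """
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    chunks = []
    for index, first in enumerate(range(1, page_count + 1, chunk_size)):
        # The final chunk may be shorter than chunk_size.
        last = min(first + chunk_size - 1, page_count)
        chunks.append(Chunk(index=index, first_page=first, last_page=last))
    return chunks


if __name__ == "__main__":
    for chunk in plan_chunks(20, chunk_size=8):
        print(f"chunk {chunk.index}: pages {chunk.first_page}-{chunk.last_page}")
```

Recording explicit page ranges per chunk is one way to keep each TeX fragment traceable to the scan pages it was transcribed from, which the later audit steps rely on.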

The workflow is model-agnostic. It records the observed role of web LLM sessions, Codex-style local automation, Kimi-style large agent swarms, Claude Code-style local repair agents, Python/PyMuPDF tooling, TeX engines, GitHub, and the Zenodo API. OCR tools are described as witnesses only; mathematical fidelity still requires compilation, visual audit, and source checking against scans. The 2026-06-04 refresh adds reusable OCR/math-OCR scripts and lessons, including local GPU/OCR experiment notes and page-crop/formula-crop witness workflows.
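To make the "witness only" role of OCR concrete, the following sketch compares a transcription's plain text with an OCR witness and flags pages whose agreement is low. It uses only the Python standard library. The function names, the 0.9 threshold, and the assumption that both texts are already available as plain strings per page are illustrative and are not taken from the packet.

```python
import difflib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def witness_agreement(transcription: str, ocr_witness: str) -> float:
    """Return a similarity ratio in [0, 1] between the two normalized texts."""
    matcher = difflib.SequenceMatcher(
        None, normalize(transcription), normalize(ocr_witness), autojunk=False
    )
    return matcher.ratio()


def pages_needing_review(
    pages: dict[int, tuple[str, str]], threshold: float = 0.9
) -> list[int]:
    """List page numbers whose agreement falls below the threshold.

    `pages` maps a page number to (transcription_text, ocr_witness_text).
    The threshold is an assumed value chosen for illustration.
    """
    return sorted(
        page
        for page, (transcription, witness) in pages.items()
        if witness_agreement(transcription, witness) < threshold
    )


if __name__ == "__main__":
    sample = {
        1: ("Let x  be\na point", "let x be a point"),
        2: ("The curve is closed", "xyz"),
    }
    print(pages_needing_review(sample))
```

A low score only directs a page to human visual audit against the scan. A high score does not certify the page, since OCR and transcription can share the same misreading, especially in formulas.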

Notes

Companion workflow record for the Modern LaTeX Editions of Public-Domain Mathematics Manuscripts archive. The packet is not intended to contain private local paths or personal maintainer names; any that remain are unintentional.

Files

OCR Workflow Scripts and Lessons 20260604.zip

Files (170.6 kB)
