The Digital Stoic Library
Description
This dataset contains a computationally enriched edition of two foundational texts of Roman Stoicism: Epictetus’s Enchiridion and Marcus Aurelius’s Meditations. The data was produced with a "Deep Matching" framework that links the Ancient Greek source text to an English translation that preserves technical philosophical vocabulary.
The dataset is provided in a cleaned, machine-readable JSON format, suitable for immediate use in thematic lexical analysis, natural language processing (NLP), or educational application development.
Methodology
The data was generated through a four-stage pipeline:
1. Source Acquisition: Greek text (edition perseus-grc2) was harvested from the Scaife Viewer CTS API.
2. Contextual Translation: English translations and commentary were generated using the Gemini 3 Pro reasoning model, employing a 150-character sliding context window to maintain narrative continuity across chapter boundaries.
3. Lexical Lemmatization: Passages were enriched using the Classical Language Toolkit (CLTK) to map 45 core Stoic technical terms (e.g., prohairesis, logos, eph’ hēmin) back to their original Greek lemmas, regardless of grammatical inflection.
4. Refinement: Automated post-processing was applied to remove model artifacts and ensure structural integrity for production environments.
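The context-window and lemma-matching stages can be illustrated with a minimal Python sketch. It is not the project's own code. The function names, the two-entry lemma table, and the choice to take the context from the tail of the preceding passage are all assumptions made for illustration. The real pipeline uses CLTK for lemmatization, so the sketch takes already-lemmatized tokens as input and does not call CLTK.

```python
import unicodedata
from typing import Iterator

CONTEXT_CHARS = 150  # window size stated in the methodology


def with_context(passages: list[str], width: int = CONTEXT_CHARS) -> Iterator[tuple[str, str]]:
    """Yield (context, passage) pairs, where context is the last `width`
    characters of the preceding passage (empty for the first passage)."""
    previous = ""
    for passage in passages:
        context = previous[-width:] if width > 0 else ""
        yield context, passage
        previous = passage


def _nfc(text: str) -> str:
    # Greek accents can be encoded in several ways; normalize before comparing.
    return unicodedata.normalize("NFC", text)


# Hypothetical two-entry excerpt; the dataset's actual table covers 45 terms.
TERM_LEMMAS = {
    _nfc("προαίρεσις"): "prohairesis",
    _nfc("λόγος"): "logos",
}


def tag_terms(lemmas: list[str], table: dict[str, str] = TERM_LEMMAS) -> list[str]:
    """Return the technical terms whose lemma occurs in `lemmas`,
    in first-seen order and without duplicates."""
    found: list[str] = []
    for lemma in lemmas:
        term = table.get(_nfc(lemma))
        if term is not None and term not in found:
            found.append(term)
    return found
```

Matching on lemmas rather than surface forms is what lets an inflected form such as a genitive or plural count toward the same technical term.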
Contents
enchiridion_final_clean.json: The complete Enchiridion (53 chapters) with Greek text, English translation, Stoic notes, and thematic lexical tags.
meditations_final_clean.json: The complete Meditations (12 books) with Greek text, reflective English translation, commentary, and thematic lexical tags.
Processing Scripts: The suite of Python scripts used for fetching, translating, tagging, and cleaning the data.
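As a starting point for working with the JSON files, the following sketch loads a file and keeps only well-formed passage records. The schema is not documented here, so the top-level array layout and the field names greek, translation, and tags are assumptions; inspect the actual files and adjust REQUIRED_FIELDS before relying on it.

```python
import json
from pathlib import Path

# Hypothetical field names; verify them against the actual files.
REQUIRED_FIELDS = ("greek", "translation", "tags")


def parse_passages(text: str) -> list[dict]:
    """Parse JSON text and keep only objects that carry every required field."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array of passages")
    return [
        item
        for item in data
        if isinstance(item, dict) and all(field in item for field in REQUIRED_FIELDS)
    ]


def load_passages(path: str) -> list[dict]:
    """Read a UTF-8 JSON file from disk and parse it with parse_passages."""
    return parse_passages(Path(path).read_text(encoding="utf-8"))
```

A call such as load_passages("enchiridion_final_clean.json") would then return a list of dictionaries, provided the file matches the assumed layout.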
Use Cases
Thematic Research: Quantitative analysis of technical Stoic vocabulary distribution.
Educational Tools: Development of interactive readers that highlight the relationship between original Greek and modern translation.
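For the thematic research use case, a vocabulary distribution can be computed with a few lines of Python. This is a sketch under assumptions: each passage is taken to be a dictionary with a tags list (a hypothetical field name), and each term is counted once per passage, so the result is the number of passages in which a term appears rather than its total number of occurrences.

```python
from collections import Counter
from typing import Iterable


def tag_distribution(passages: Iterable[dict], tag_field: str = "tags") -> Counter:
    """Count, for each tag, the number of passages in which it appears."""
    counts: Counter = Counter()
    for passage in passages:
        # set() ensures a tag repeated within one passage is counted once.
        counts.update(set(passage.get(tag_field, [])))
    return counts


# Illustrative records only; not taken from the dataset.
example = [
    {"tags": ["logos", "prohairesis"]},
    {"tags": ["logos", "logos"]},
    {"tags": []},
]
distribution = tag_distribution(example)
```

Calling distribution.most_common() lists the terms from most to least widespread, which is a convenient first view before comparing the Enchiridion with the Meditations.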
Citation
Moss, W. (2026). Data Paper: The Digital Stoic Library. Knowledge Commons. https://doi.org/10.5281/zenodo.18273320
Files
discourses_final_clean.json
Additional details
Dates
- Created: 2026-01-15