Beyond Linguistic Decipherment: A Structural Analysis of the Voynich Manuscript as a Diagrammatic Information System (DIS)
Authors/Creators
Description
The Voynich Manuscript (Beinecke MS 408) has resisted linguistic decipherment for over a century. This study reframes the manuscript as a Diagrammatic Information System (DIS), a non-linear technical artifact where text operates as modular, position-bound annotation.
A full-manuscript quantitative analysis establishes a robust architectural signature: an extreme, scale-invariant suppression of lexical continuity between segments ($\bar{J} \approx 0.08$). This pattern remains stable across folio types and segmentation strategies, forming a global structural constraint statistically incompatible with medieval prose and poetry (Cohen’s $d > 2.0$).
The signature is also distinct from that of randomized text and persists after controlling for formulaic language. The study provides a falsifiable, constraint-based framework for future research: it delineates the manuscript's macro-scale architecture independently of linguistic content and remains compatible with complementary models of local text generation.
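For readers unfamiliar with the two statistics quoted above, one conventional way to write them is given here. This is a sketch under stated assumptions, not a definition taken from the paper: it assumes that $\bar{J}$ averages the Jaccard index over adjacent segment pairs and that Cohen's $d$ uses a pooled standard deviation. The symbols $V_i$, $m$, $s_p$ and the subscripts "ref" and "MS" are introduced for illustration only.

$$
J_i = \frac{|V_i \cap V_{i+1}|}{|V_i \cup V_{i+1}|}, \qquad
\bar{J} = \frac{1}{m-1} \sum_{i=1}^{m-1} J_i
$$

$$
d = \frac{\bar{J}_{\text{ref}} - \bar{J}_{\text{MS}}}{s_p}, \qquad
s_p = \sqrt{\frac{(n_{\text{ref}}-1)\,s_{\text{ref}}^2 + (n_{\text{MS}}-1)\,s_{\text{MS}}^2}{n_{\text{ref}} + n_{\text{MS}} - 2}}
$$

Here $V_i$ is the set of distinct tokens in segment $i$, $m$ is the number of segments, and $n$ and $s^2$ are the number of pairwise Jaccard values and their sample variance in the reference corpus and in MS 408 respectively.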
Changes in Version 3.0 — January 6th, 2026
1. Methodological Formalization & Reproducibility
- Strict Segmentation Protocols: Introduces explicit procedures for Manuscript Line Segmentation and Fixed-length Segmentation ($N=50$ blocks), ensuring the DIS signature is analyzed across multiple scales.
- Tokenization Standards: Specifies space-delimited tokens taken from diplomatic transcriptions, lowercased and not lemmatized, so that experiments can be replicated.
- Exclusion Criteria: Formally defines text isolation (excluding marginalia and labels) so that the architectural properties of "continuous" text blocks are what is measured.
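The protocols listed above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's own pipeline: it assumes that $N=50$ denotes a block length of 50 tokens, that lexical continuity is measured between adjacent segments, and that marginalia and labels have already been removed from the input. All function names are hypothetical.

```python
from typing import List, Sequence


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace; no lemmatization is applied."""
    return text.lower().split()


def line_segments(text: str) -> List[List[str]]:
    """Manuscript line segmentation: one segment per non-empty line."""
    return [tokenize(line) for line in text.splitlines() if line.strip()]


def fixed_segments(tokens: Sequence[str], n: int = 50) -> List[List[str]]:
    """Fixed-length segmentation: consecutive blocks of n tokens.

    Assumption: a trailing block shorter than n is dropped.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return [list(tokens[i:i + n]) for i in range(0, len(tokens) - n + 1, n)]


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard index of the two segments' vocabularies (sets of tokens)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def mean_adjacent_jaccard(segments: Sequence[Sequence[str]]) -> float:
    """Average Jaccard index over adjacent segment pairs."""
    pairs = list(zip(segments, segments[1:]))
    if not pairs:
        raise ValueError("need at least two segments")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Running `mean_adjacent_jaccard` on both `line_segments(text)` and `fixed_segments(tokenize(text), 50)` gives the two scales of analysis described above; whether the short trailing block is dropped or kept is a design choice that should be fixed in advance for replicability.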
2. Enhanced Comparative Controls
- Scale-Invariance Analysis: Demonstrates that while natural language (Latin/Italian) shows significant scale sensitivity, MS 408 maintains a stable, low Jaccard index ($\bar{J} \approx 0.08$) regardless of segment length.
- Control for Formulaic Language: Includes "formula-stripped" Latin baselines to confirm that the manuscript's lexical independence is not a byproduct of missing common technical terms.
- Randomization Baseline: Compares against globally shuffled corpora to distinguish the manuscript's structural modularity from purely stochastic or randomized word distributions.
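The scale sweep, the shuffled baseline, and the effect-size comparison described above might look as follows. This is a hedged sketch, not the study's code: it assumes fixed-length blocks with adjacent-pair Jaccard values, a single global shuffle of the token sequence, and Cohen's $d$ computed with a pooled standard deviation. The function names and the `seed` parameter are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence


def adjacent_jaccards(tokens: Sequence[str], n: int) -> List[float]:
    """Jaccard index for each pair of adjacent n-token blocks."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Blocks are never empty because n >= 1, so the union is never empty.
    blocks = [set(tokens[i:i + n]) for i in range(0, len(tokens) - n + 1, n)]
    return [len(a & b) / len(a | b) for a, b in zip(blocks, blocks[1:])]


def scale_profile(tokens: Sequence[str], lengths: Sequence[int]) -> Dict[int, float]:
    """Mean adjacent Jaccard index per block length.

    Lengths that yield fewer than two blocks are omitted.
    """
    profile = {}
    for n in lengths:
        values = adjacent_jaccards(tokens, n)
        if values:
            profile[n] = statistics.mean(values)
    return profile


def shuffled_baseline(tokens: Sequence[str], n: int, seed: int = 0) -> List[float]:
    """Adjacent Jaccard values after one global shuffle of the tokens."""
    rng = random.Random(seed)  # seeded so the baseline is reproducible
    shuffled = list(tokens)
    rng.shuffle(shuffled)
    return adjacent_jaccards(shuffled, n)


def cohens_d(x: Sequence[float], y: Sequence[float]) -> float:
    """Cohen's d with pooled standard deviation (needs >= 2 values per sample).

    Raises ZeroDivisionError if both samples have zero variance.
    """
    nx, ny = len(x), len(y)
    pooled_var = (
        (nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)
    ) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / pooled_var ** 0.5
```

A flat `scale_profile` across block lengths would correspond to the scale invariance reported for MS 408, and `cohens_d(adjacent_jaccards(reference, n), adjacent_jaccards(manuscript, n))` would quantify the separation from a reference corpus. Averaging the baseline over several seeds is advisable, since a single shuffle is only one draw.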
3. Theoretical Refinement
- Defining the DIS Signature: Establishes systematic, persistent inter-segment lexical independence as the primary quantitative hallmark of a diagrammatically anchored system.
- Structural Envelope: Positions the DIS as a global architectural constraint that describes the manuscript's organization at the manuscript-wide (macro) level, thereby providing a boundary condition for any local generative mechanism.
Files
Beyond Linguistic Decipherment - DIS v3.pdf (268.9 kB, md5:dc43b8aafa11c790e8ab3f810bead6b4)