Published April 1, 2026 | Version 5.0
Working paper | Open access
Low-Redundancy Text, High-Redundancy System: Evidence for Cross-Modal Encoding in the Voynich Manuscript
Authors/Creators
Description
This paper presents a falsifiable structural model of the Voynich manuscript (Beinecke MS 408). It is based on computational analysis of the complete ZL IVTFF 2b transcription (36,234 tokens, 226 folios, 8 sections). Rather than attempting to identify an underlying natural language, the study asks what kind of system the manuscript implements. It answers through holdout-validated formal analysis, independent unsupervised confirmation, and cross-modal testing against the manuscript's illustrations.

The model establishes five principal findings.

First, a four-layer morphological grammar classifies 91-97% of tokens across six stratified holdout blocks. These blocks span five manuscript sections, three or more scribal hands, and both Currier languages. No block shows a stacking-order violation, and no parameters were adjusted after the model was frozen.

Second, the invariant formal system is deployed in at least six distinct compositional regimes, which vary systematically by section and hand: loop-based prose, topic-dominant chaining, nominal labelling, a weakened-loop variant, a closure-weighted operational mode, and a balanced connective mode. Two of these regimes were discovered only when the sealed reserve holdout was opened, showing that the taxonomy expands under evaluation.

Third, discourse-framing density in the text predicts the visual complexity of herbal illustrations (Spearman rho = 0.600, p < 0.0001, n = 43). This result was confirmed on a pre-registered holdout with minimal attenuation. At the label level, specific morphemes predict specific plant features across five independent visual channels. Morpheme bundles also predict multi-feature plant profiles compositionally (leave-one-out AUC, p = 0.0006).

Fourth, a 17-mapping codebook decodes plant architecture from herbal labels with 58.5% accuracy across 72 folios, and the mapping is bidirectional: image features recover label morpheme sets above chance (p < 0.0001). Forward accuracy exceeds inverse accuracy, and this asymmetry is diagnostic of selective encoding rather than a cipher.
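The third finding rests on a Spearman rank correlation between per-folio discourse-framing density and illustration complexity (rho = 0.600, n = 43). The description does not say how that p-value was obtained. The sketch below therefore uses a permutation test, a swapped-in technique that may not match the deposited scripts. All function names, variable names, and the six toy values are hypothetical and are not the paper's data. Only the Python standard library is used.

```python
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman rho = Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p_value(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for Spearman rho, with add-one smoothing."""
    rng = random.Random(seed)
    rx, ry = average_ranks(x), average_ranks(y)
    observed = abs(pearson(rx, ry))
    hits = 0
    for _ in range(n_perm):
        # Shuffling y's ranks is equivalent to shuffling y itself.
        rng.shuffle(ry)
        if abs(pearson(rx, ry)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-folio scores (NOT the paper's data).
framing_density = [0.02, 0.05, 0.03, 0.08, 0.10, 0.07]
visual_complexity = [2, 4, 3, 6, 7, 5]
print(spearman_rho(framing_density, visual_complexity))
print(permutation_p_value(framing_density, visual_complexity, n_perm=2000))
```

A permutation test makes no distributional assumption about the scores. Its resolution, however, is limited by the number of permutations: with 10,000 shuffles the smallest attainable p-value is about 0.0001.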
Labels and prose perform complementary, load-balanced functions, as evidenced by an adaptive compensation mechanism (rho = -0.337, p = 0.011).

Fifth, the system meets 8 of 10 criteria for restricted technical notation. It fails the criterion most diagnostic of natural language: lexical recoverability.

These findings are independently triangulated. Three lines of evidence converge on the same structural conclusions:
- a rule-based grammar;
- holdout replication across two evaluation stages;
- unsupervised HMM recovery of grammar classes from suffix sequences alone (NMI = 0.181, entity purity 0.53).

The architecture is inconsistent with a simple cipher, random generation, a hoax, or a classical mnemonic system.

The study also situates the manuscript within the documented manuscript ecology of the eastern Mediterranean, presenting quantitative visual comparisons against six comparator manuscript traditions. The herbal section aligns closely with early encyclopedic Qazwini copies (Euclidean distance 2.37). The zodiac section, by contrast, occupies a distinct visual regime that matches no tested tradition. It combines Latin computational diagram architecture with Byzantine Greek medico-astrological content and a unique figurative encoding system.

The manuscript is best understood as a structured, sectionally differentiated technical system with partially recoverable semantics: structurally technical but lexically local. Its grammar is real and invariant. Its regimes are real and section-specific. Its text and images interact. Its labels carry structured semantic content. What it does not yield through structural analysis alone is a reading.

This deposit includes the pre-submission draft (v5.0), analysis scripts, data files, and figures.
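The unsupervised check compares HMM state assignments with the rule-based grammar classes using normalized mutual information (NMI) and a purity score. The description does not specify the NMI normalization or how "entity purity" is defined. This minimal sketch therefore assumes NMI normalized by the arithmetic mean of the two entropies (one common convention) and standard cluster purity. It scores labels that are already aligned token by token and does not fit an HMM. The class names and toy labels are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (nats) between two aligned label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """MI normalized by the arithmetic mean of the two entropies (an assumed convention)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both partitions are a single cluster: treat as identical
    return mutual_information(a, b) / ((h_a + h_b) / 2)


def purity(clusters, classes):
    """Share of tokens whose cluster's majority class equals their own class."""
    per_cluster = {}
    for k, c in zip(clusters, classes):
        per_cluster.setdefault(k, Counter())[c] += 1
    majority = sum(cnt.most_common(1)[0][1] for cnt in per_cluster.values())
    return majority / len(classes)


# Hypothetical token-level labels (NOT the paper's data): HMM states vs grammar classes.
hmm_states = [0, 0, 1, 1, 2, 2, 0, 1]
grammar_classes = ["PRE", "PRE", "ROOT", "ROOT", "SUF", "ROOT", "PRE", "SUF"]
print(round(nmi(hmm_states, grammar_classes), 3), purity(hmm_states, grammar_classes))
```

Purity rises trivially as the number of clusters grows: one cluster per token gives a purity of 1. NMI is also not corrected for chance agreement. Both scores are therefore most informative when compared with a shuffled-label baseline.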
all_visual_scores.json
Files (3.8 MB)

| MD5 checksum | Size |
|---|---|
| a6c8199069c8ff5cb2b29bdfb069f141 | 12.1 kB |
| f68b20c977a799c9f9100ddb7e725ab1 | 5.5 kB |
| 35fa2dbe73277ef9228ef5930555df63 | 4.1 kB |
| 6fb129c8d17f12f9d133019b3b287afd | 5.0 kB |
| 94c1a6649607313412124c837f1c7f3e | 3.2 kB |
| eb4d99e4c0b4701b0acc51e88b4e8ed0 | 7.3 kB |
| f8055f6728837bdd201d44891a247077 | 5.0 kB |
| bc6ed0e76eaf03fca4031980608d1687 | 562 B |
| 0dda52ab9d2994b8358e371bd3a121e6 | 3.9 kB |
| 0eb1dbf16b589d523da47810279b1773 | 95.6 kB |
| 4f510abc893320c69dbc5f6df3ffa3ee | 5.4 kB |
| d91e72510c53882e26caeda324eb7ea1 | 50.3 kB |
| 851f350e1db1a9c00e59de6aa221a16a | 3.5 MB |
| 31b54b1e57128a08b87d83ea5d007190 | 7.2 kB |
| 39d62fed7350f2911c92cc2abea5bcce | 6.8 kB |
| 9d1c07a01e9a3b647d309fcaef1c4c54 | 4.5 kB |
| 38b1ab24621d50a7b368071bf82d7c3c | 4.8 kB |
| ff0ed06226b7a1fa966d7fb8b43e1858 | 6.7 kB |
| cc22619c880be7985e91239d00dee85d | 8.2 kB |
| 5f3debf8022417ca172791b341472a28 | 6.1 kB |
| 1fb30ab54b2c77f9f49fe44f27595f9e | 12.0 kB |
| b7a4e93b6b27431af62e7101e3d4a95f | 11.7 kB |
| a6c40872f557a2fb81a109ed6a029280 | 14.1 kB |
| 26b1d256b6edb28838c104751be0e439 | 12.8 kB |
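Each file in the listing is published with an MD5 checksum. A minimal way to confirm that a download arrived intact is to recompute the digest with Python's standard `hashlib` module and compare the result. The function names here are illustrative and are not part of the deposit.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Stream a file through MD5 in chunks (1 MiB by default) so large files need not fit in memory."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """True if the file's MD5 matches the published checksum (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `verify("path/to/downloaded_file", "<md5 from the listing>")` returns `True` for an intact download. The listing pairs each checksum with a size but not a file name, so matching a download by its size first narrows which checksum to compare against. MD5 reliably detects accidental corruption, but it is not collision-resistant and should not be relied on to detect deliberate tampering.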