Published April 1, 2026 | Version 5.0
Working paper | Open access

Low-Redundancy Text, High-Redundancy System: Evidence for Cross-Modal Encoding in the Voynich Manuscript


Description

This paper presents a falsifiable structural model of the Voynich manuscript (Beinecke MS 408), based on computational analysis of the complete ZL IVTFF 2b transcription (36,234 tokens, 226 folios, 8 sections). Rather than attempting to identify an underlying natural language, the study asks what kind of system the manuscript implements. It answers that question through holdout-validated formal analysis, independent unsupervised confirmation, and cross-modal testing against the manuscript's illustrations.
 
The model establishes five principal findings. First, a four-layer morphological grammar classifies 91-97% of tokens across six stratified holdout blocks spanning five manuscript sections, three or more scribal hands, and both Currier languages, with zero stacking-order violations in any block and no parameter adjustment after the model was frozen. Second, this invariant formal system is deployed in at least six distinct compositional regimes (loop-based prose, topic-dominant chaining, nominal labelling, a weakened-loop variant, a closure-weighted operational mode, and a balanced connective mode), which vary systematically by section and hand. Two of these regimes emerged only when the sealed reserve holdout was opened, indicating that the taxonomy continued to expand under evaluation. Third, discourse-framing density in the text predicts the visual complexity of herbal illustrations (Spearman rho = 0.600, p < 0.0001, n = 43), a result confirmed on a pre-registered holdout with minimal attenuation. At the label level, specific morphemes predict specific plant features across five independent visual channels, and morpheme bundles predict multi-feature plant profiles compositionally (leave-one-out AUC, p = 0.0006). Fourth, a 17-mapping codebook decodes plant architecture from herbal labels with 58.5% accuracy across 72 folios and works in both directions: image features recover label morpheme sets above chance (p < 0.0001). Forward decoding nonetheless outperforms inverse recovery, an asymmetry diagnostic of selective encoding rather than of a cipher. Labels and prose perform complementary, load-balanced functions, confirmed by an adaptive compensation mechanism (rho = -0.337, p = 0.011). Fifth, the system meets 8 of 10 criteria for restricted technical notation while failing the criterion most diagnostic of natural language: lexical recoverability.
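The text-image result above is reported as a Spearman rank correlation. As an illustration only, the sketch below shows one standard way to compute Spearman's rho (the Pearson correlation of tie-averaged ranks) together with a two-sided permutation p-value. It is not the deposited analysis script: the permutation test is a swapped-in choice (the paper's p-value may be computed analytically), and the variable names and toy values in the usage lines are hypothetical, not data from the study.

```python
import random
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n; tied values all receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; undefined (ZeroDivisionError) if an input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p(x, y, n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation p-value: shuffle y and count |rho| >= observed."""
    rng = random.Random(seed)
    observed = abs(spearman_rho(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        if abs(spearman_rho(x, y_shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy values for six folios; NOT data from the study.
    framing_density = [0.12, 0.30, 0.25, 0.08, 0.41, 0.19]
    visual_complexity = [2, 5, 3, 1, 6, 4]
    rho = spearman_rho(framing_density, visual_complexity)
    p = permutation_p(framing_density, visual_complexity)
    print(f"rho = {rho:.3f}, permutation p = {p:.4f}")
```

Ranking before correlating makes the statistic sensitive only to monotonic association, which suits ordinal scores such as a visual-complexity rating.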
 
These findings are independently triangulated: a rule-based grammar, holdout replication across two evaluation stages, and unsupervised HMM recovery of grammar classes from suffix sequences alone (NMI = 0.181, entity purity 0.53) all converge on the same structural conclusions. The architecture is inconsistent with a simple cipher, random generation, a hoax, or a classical mnemonic system.
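The unsupervised check is summarized by normalized mutual information (NMI) and a purity score between HMM states and grammar classes. A minimal sketch of both metrics follows, assuming NMI uses arithmetic-mean normalization, I(U;V) / ((H(U) + H(V)) / 2) (geometric-mean or max normalization give different values), and assuming "entity purity" behaves like standard cluster purity. The paper may define either metric differently, and the token labels in the usage lines are invented for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """I(A;B) from paired labels, computed from joint and marginal counts."""
    n = len(a)
    joint = Counter(zip(a, b))
    ca, cb = Counter(a), Counter(b)
    # Each term is p(x,y) * log(p(x,y) / (p(x) p(y))), rewritten with counts.
    return sum((c / n) * math.log(c * n / (ca[x] * cb[y]))
               for (x, y), c in joint.items())


def nmi(true_classes, states) -> float:
    """NMI with arithmetic-mean normalization (an assumption, see lead-in)."""
    denom = (entropy(true_classes) + entropy(states)) / 2
    if denom == 0:
        return 1.0  # both labelings are a single block: trivially identical
    return mutual_information(true_classes, states) / denom


def purity(true_classes, states) -> float:
    """Share of tokens whose state's majority class equals their own class."""
    by_state: dict = {}
    for cls, st in zip(true_classes, states):
        by_state.setdefault(st, Counter())[cls] += 1
    return sum(c.most_common(1)[0][1] for c in by_state.values()) / len(states)


if __name__ == "__main__":
    # Invented example: grammar class per token vs. a decoded HMM state.
    grammar_classes = ["prefix", "root", "suffix", "root", "suffix", "prefix"]
    hmm_states = [0, 1, 2, 1, 1, 0]
    print(f"NMI = {nmi(grammar_classes, hmm_states):.3f}")
    print(f"purity = {purity(grammar_classes, hmm_states):.3f}")
```

Both scores ignore how states are numbered, so an HMM that recovers the classes under any relabelling still scores well.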
 
The study also situates the manuscript within the documented manuscript ecology of the eastern Mediterranean, presenting quantitative visual comparisons against six comparator manuscript traditions. The herbal section aligns closely with early copies of Qazwini's encyclopedic work (Euclidean distance 2.37), while the zodiac section occupies a distinct visual regime that matches no tested tradition, combining Latin computational diagram architecture with Byzantine Greek medico-astrological content and a unique figurative encoding system.
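The tradition comparison reports a Euclidean distance between visual profiles. As a sketch only, assume each section and each comparator tradition is summarized as a numeric vector of visual scores in a shared feature order. The feature set, any scaling applied before comparison, and the tradition names and values in the usage lines are hypothetical, not taken from the paper.

```python
import math
from typing import Mapping, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """sqrt(sum_i (a_i - b_i)^2) over features listed in the same order."""
    if len(a) != len(b):
        raise ValueError("profiles must share the same feature order")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_comparators(profile: Sequence[float],
                     comparators: Mapping[str, Sequence[float]]):
    """Return (name, distance) pairs sorted nearest first."""
    pairs = [(name, euclidean(profile, vec)) for name, vec in comparators.items()]
    return sorted(pairs, key=lambda t: t[1])


if __name__ == "__main__":
    # Hypothetical three-feature profiles; NOT values from the study.
    section_profile = [2.0, 1.5, 3.0]
    traditions = {
        "tradition_A": [2.5, 1.0, 3.5],
        "tradition_B": [4.0, 3.0, 0.5],
    }
    for name, d in rank_comparators(section_profile, traditions):
        print(f"{name}: {d:.2f}")
```

Because raw Euclidean distance is scale-sensitive, features measured in different units would normally be standardized first; whether the study does so is not stated here.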
 
The manuscript is best understood as a structured, sectionally differentiated technical system with partially recoverable semantics -- structurally technical but lexically local. Its grammar is real and invariant. Its regimes are real and section-specific. Its text and images interact. Its labels carry structured semantic content. What it does not yield through structural analysis alone is a reading.
 
This deposit includes the pre-submission draft (v5.0), analysis scripts, data files, and figures.

Files

all_visual_scores.json

Files (3.8 MB)

MD5 checksum                          Size
md5:a6c8199069c8ff5cb2b29bdfb069f141  12.1 kB
md5:f68b20c977a799c9f9100ddb7e725ab1  5.5 kB
md5:35fa2dbe73277ef9228ef5930555df63  4.1 kB
md5:6fb129c8d17f12f9d133019b3b287afd  5.0 kB
md5:94c1a6649607313412124c837f1c7f3e  3.2 kB
md5:eb4d99e4c0b4701b0acc51e88b4e8ed0  7.3 kB
md5:f8055f6728837bdd201d44891a247077  5.0 kB
md5:bc6ed0e76eaf03fca4031980608d1687  562 Bytes
md5:0dda52ab9d2994b8358e371bd3a121e6  3.9 kB
md5:0eb1dbf16b589d523da47810279b1773  95.6 kB
md5:4f510abc893320c69dbc5f6df3ffa3ee  5.4 kB
md5:d91e72510c53882e26caeda324eb7ea1  50.3 kB
md5:851f350e1db1a9c00e59de6aa221a16a  3.5 MB
md5:31b54b1e57128a08b87d83ea5d007190  7.2 kB
md5:39d62fed7350f2911c92cc2abea5bcce  6.8 kB
md5:9d1c07a01e9a3b647d309fcaef1c4c54  4.5 kB
md5:38b1ab24621d50a7b368071bf82d7c3c  4.8 kB
md5:ff0ed06226b7a1fa966d7fb8b43e1858  6.7 kB
md5:cc22619c880be7985e91239d00dee85d  8.2 kB
md5:5f3debf8022417ca172791b341472a28  6.1 kB
md5:1fb30ab54b2c77f9f49fe44f27595f9e  12.0 kB
md5:b7a4e93b6b27431af62e7101e3d4a95f  11.7 kB
md5:a6c40872f557a2fb81a109ed6a029280  14.1 kB
md5:26b1d256b6edb28838c104751be0e439  12.8 kB
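Each deposited file is listed with an MD5 checksum. A minimal sketch for checking a download against its listed value, using only Python's standard `hashlib`, follows. The path in the usage comment is a placeholder. Because the checksum rows here do not carry file names, each download has to be matched to its row by name on the record page, with the size as a cross-check.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a file, read in 1 MiB chunks so large files fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, listed: str) -> bool:
    """Compare a file with a checksum written as on the record ('md5:<hex>')."""
    expected = listed.strip().removeprefix("md5:").lower()
    return md5_of(path) == expected


# Usage (placeholder path; substitute the checksum listed for that file):
# verify(Path("downloads/some_file.json"), "md5:<listed checksum>")
```

MD5 is adequate for detecting accidental corruption in transfer, though it does not protect against deliberate tampering.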