==============================================================================
ENTROPY AND DIRECTIONALITY ANALYSIS
Voynich Manuscript EVA vs H12-Decoded Sinhala
==============================================================================

------------------------------------------------------------------------------
ANALYSIS 1: CHARACTER-LEVEL CONDITIONAL ENTROPY
------------------------------------------------------------------------------

Background:
  The literature reports that Voynichese has h2 ~ 2 bits (second-order
  conditional entropy), while natural languages have h2 ~ 3-4 bits.
  If H12 decoding maps EVA to natural Sinhala, the decoded text
  should have a higher h2, closer to that of natural language.

Methodology:
  h0 = log2(alphabet_size)
  h1 = Shannon entropy = -sum(p(c) * log2(p(c)))
  h2 = H(X_n | X_{n-1}) = conditional entropy given 1 previous char
  h3 = H(X_n | X_{n-1}, X_{n-2}) = cond. entropy given 2 prev chars
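
  All four measures can be estimated directly from n-gram counts. The
  sketch below is a minimal plug-in (maximum-likelihood) estimator, not
  the original analysis script. The function names are hypothetical, and
  it assumes the "no boundaries" variant is obtained by deleting spaces.

```python
import math
from collections import Counter


def conditional_entropy(text: str, order: int) -> float:
    """Plug-in estimate of H(X_n | previous `order` chars), in bits.

    order=0 gives h1, order=1 gives h2, order=2 gives h3.
    """
    grams = Counter(text[i:i + order + 1] for i in range(len(text) - order))
    contexts = Counter()
    for gram, c in grams.items():
        contexts[gram[:-1]] += c          # how often each context is followed by a char
    total = sum(grams.values())
    # -sum p(context, char) * log2 p(char | context)
    return -sum((c / total) * math.log2(c / contexts[gram[:-1]])
                for gram, c in grams.items())


def entropy_profile(text: str, keep_spaces: bool = True) -> dict:
    """Len, |A|, h0..h3 for one corpus string (hypothetical helper)."""
    if not keep_spaces:
        text = text.replace(" ", "")      # assumption: "no boundaries" = spaces deleted
    alphabet = set(text)
    return {
        "len": len(text),
        "A": len(alphabet),
        "h0": math.log2(len(alphabet)),
        "h1": conditional_entropy(text, 0),
        "h2": conditional_entropy(text, 1),
        "h3": conditional_entropy(text, 2),
    }
```

  Calling entropy_profile(corpus, keep_spaces=False) and then
  entropy_profile(corpus) would give one row each of the two tables
  for a given corpus string.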

TABLE 1: Entropy measures WITHOUT word boundaries (spaces removed)
(Len = characters; |A| = alphabet size; h0-h3 in bits)
Corpus                                   Len   |A|      h0      h1      h2      h3
------------------------------------------------------------------------------
EVA (no boundaries)                   182554    22   4.459   3.860   2.358   2.114
H12 Decoded (no boundaries)           156773    22   4.459   3.373   2.339   2.164
Sinhala Dict (no boundaries)          376309    26   4.700   3.855   3.338   3.163
English (no boundaries)               124851    25   4.644   4.078   3.343   2.701

TABLE 2: Entropy measures WITH word boundaries (space = character)
Corpus                                   Len   |A|      h0      h1      h2      h3
------------------------------------------------------------------------------
EVA (with word boundaries)            218469    23   4.524   3.870   2.141   1.873
H12 Decoded (with boundaries)         192666    23   4.524   3.439   2.055   1.856
Sinhala Dict (with boundaries)        411308    27   4.755   3.947   3.354   3.116
English (with boundaries)             159850    26   4.700   3.944   2.919   2.057

INTERPRETATION:

  EVA h2 = 2.358 bits
  H12-Decoded h2 = 2.339 bits
  Delta (decoded - EVA) = -0.019 bits

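  RESULT: H12-decoded text has essentially the same h2 as raw EVA.
  The difference (-0.019 bits) is negligible compared with the ~1-bit
  gap between either text and the reference languages.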

  Sinhala dictionary h2 = 3.338 bits (reference)
  English h2 = 3.343 bits (reference)

  How close is decoded h2 to natural language?
  Distance from decoded to Sinhala: 0.999 bits
  Distance from EVA to Sinhala:     0.980 bits

  The decoded text is NOT closer to Sinhala entropy than raw EVA.

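  Third-order analysis:
  EVA h3 = 2.114, Decoded h3 = 2.164, Sinhala h3 = 3.163, English h3 = 2.701
  The decoded text sits marginally closer to Sinhala at third order
  (gap 0.999 bits vs 1.049 bits for EVA), but both gaps remain about 1 bit.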

  Redundancy, R = 1 - h1/h0 (from Table 1):
    EVA (no boundaries):          0.134
    H12 Decoded (no boundaries):  0.244
    Sinhala Dict (no boundaries): 0.180
    English (no boundaries):      0.122
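
  The derived figures in this interpretation follow from the rounded
  table values. The sketch below recomputes them, assuming redundancy
  is R = 1 - h1/h0 (the definition that reproduces all four values);
  2^h1 is printed as a perplexity-style "effective alphabet" size for
  comparison, which is an addition not used elsewhere in the report.

```python
# (h0, h1, h2) in bits, copied from Table 1 (no word boundaries).
TABLE1 = {
    "EVA":          (4.459, 3.860, 2.358),
    "H12 Decoded":  (4.459, 3.373, 2.339),
    "Sinhala Dict": (4.700, 3.855, 3.338),
    "English":      (4.644, 4.078, 3.343),
}


def redundancy(h0: float, h1: float) -> float:
    """First-order redundancy: share of the maximum entropy h0 not used by h1."""
    return 1 - h1 / h0


for name, (h0, h1, _h2) in TABLE1.items():
    print(f"{name:14s} R = {redundancy(h0, h1):.3f}   2^h1 = {2 ** h1:5.1f}")

eva_h2 = TABLE1["EVA"][2]
dec_h2 = TABLE1["H12 Decoded"][2]
sin_h2 = TABLE1["Sinhala Dict"][2]
print(f"delta h2 (decoded - EVA)   = {dec_h2 - eva_h2:+.3f}")
print(f"distance decoded -> Sinhala = {sin_h2 - dec_h2:.3f}")
print(f"distance EVA -> Sinhala     = {sin_h2 - eva_h2:.3f}")
```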

------------------------------------------------------------------------------
ANALYSIS 2: DIRECTIONAL PERPLEXITY (LTR vs RTL)
------------------------------------------------------------------------------

Background:
  Parisel (arXiv 2509.10573) reports that Voynichese is RTL-optimized,
  with 73.9% prediction accuracy using character 4-grams, whereas
  natural LTR languages are LTR-optimized. If the H12 abugida
  encoding reverses directional properties, the decoded text should
  show LTR optimization even though EVA shows RTL.

Methodology:
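  For each character position in each word, compare the probability
  of the actual character given:
    LTR context: the preceding (n-1) characters
    RTL context: the following (n-1) characters
  'X>Y%' = percentage of positions where the X context assigns the
  actual character a strictly higher probability than the Y context.
  Positions where the two contexts tie count toward neither column,
  so LTR>RTL% and RTL>LTR% need not sum to 100
  (e.g. English, n=4: 58.2 + 16.0 = 74.2).
  'Optimized' = the direction that wins more positions.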
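
  A minimal sketch of the pairwise comparison, under assumptions the
  methodology does not state: unsmoothed maximum-likelihood estimates
  trained on the same words, contexts that never cross word boundaries,
  and scoring only positions with a full (n-1)-character context on both
  sides. The function name is hypothetical, and the acc% columns are not
  reproduced here.

```python
from collections import Counter


def directional_comparison(words: list[str], n: int) -> dict:
    """Share of positions where the LTR or RTL (n-1)-char context gives the
    actual character a strictly higher ML probability (sketch)."""
    k = n - 1
    ltr_joint, ltr_ctx = Counter(), Counter()
    rtl_joint, rtl_ctx = Counter(), Counter()
    for w in words:
        for i, c in enumerate(w):
            if i >= k:                      # full preceding context inside the word
                ltr_joint[w[i - k:i], c] += 1
                ltr_ctx[w[i - k:i]] += 1
            if i + k < len(w):              # full following context inside the word
                rtl_joint[w[i + 1:i + 1 + k], c] += 1
                rtl_ctx[w[i + 1:i + 1 + k]] += 1

    ltr_wins = rtl_wins = ties = 0
    for w in words:
        for i in range(k, len(w) - k):      # positions with both contexts
            c, left, right = w[i], w[i - k:i], w[i + 1:i + 1 + k]
            p_ltr = ltr_joint[left, c] / ltr_ctx[left]
            p_rtl = rtl_joint[right, c] / rtl_ctx[right]
            if p_ltr > p_rtl:
                ltr_wins += 1
            elif p_rtl > p_ltr:
                rtl_wins += 1
            else:
                ties += 1                   # counted in neither percentage column

    total = ltr_wins + rtl_wins + ties
    if total == 0:
        raise ValueError("no position has a full context on both sides")
    return {
        "positions": total,
        "ltr_better_pct": 100.0 * ltr_wins / total,
        "rtl_better_pct": 100.0 * rtl_wins / total,
        "tie_pct": 100.0 * ties / total,
        "optimized": ("LTR" if ltr_wins > rtl_wins
                      else "RTL" if rtl_wins > ltr_wins else "tie"),
    }
```

  Under the both-sides assumption, a word of length L contributes
  L - 2(n-1) scored positions, so longer contexts leave fewer positions.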

TABLE 3: Directional prediction accuracy
Corpus             n  LTR acc%  RTL acc%  LTR>RTL%  RTL>LTR%  Positions   Optimized
------------------------------------------------------------------------------
EVA                2      45.3      49.2      51.4      48.6     111365         LTR
EVA                3      52.7      55.9      48.7      51.3      48395         RTL
EVA                4      53.8      59.7      45.5      54.5      10876         RTL
H12 Decoded        2      47.8      39.1      54.6      45.4      85610         LTR
H12 Decoded        3      59.8      50.8      39.2      60.8      26994         RTL
H12 Decoded        4      68.6      51.6      58.6      41.4       4207         LTR
Sinhala Dict       2      32.5      29.1      50.2      49.8     306309         LTR
Sinhala Dict       3      35.6      33.4      49.6      50.4     236424         RTL
Sinhala Dict       4      41.3      38.3      50.5      49.4     168368         LTR
English            2      36.7      36.9      50.3      49.7      55551         LTR
English            3      59.2      56.8      55.8      42.7      12993         LTR
English            4      86.3      81.3      58.2      16.0       2364         LTR

TABLE 4: N-gram perplexity (lower = more predictable)
(RTL/LTR = PP(RTL) / PP(LTR); > 1 means LTR-optimized, < 1 RTL-optimized)
Corpus             n    PP(LTR)    PP(RTL)  RTL/LTR  Direction
------------------------------------------------------------------------------
EVA                2       4.33       3.94    0.910   RTL opt.
EVA                3       3.71       3.41    0.920   RTL opt.
EVA                4       3.42       3.07    0.899   RTL opt.
H12 Decoded        2       3.97       5.45    1.372   LTR opt.
H12 Decoded        3       2.92       3.85    1.316   LTR opt.
H12 Decoded        4       2.28       4.11    1.804   LTR opt.
Sinhala Dict       2       9.02       9.86    1.094   LTR opt.
Sinhala Dict       3       7.77       8.37    1.077   LTR opt.
Sinhala Dict       4       5.90       6.60    1.118   LTR opt.
English            2       6.73       7.03    1.046   LTR opt.
English            3       2.76       2.85    1.033   LTR opt.
English            4       1.32       1.45    1.093   LTR opt.
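
  Table 4 does not state its smoothing, train/test split, or treatment
  of word edges. As a sketch only, the code below assumes add-one
  (Laplace) smoothing and within-word contexts, and obtains the RTL
  model by reversing every word and reusing the LTR code.
  ngram_perplexity() and direction_ratio() are hypothetical helpers;
  the latter returns the quantity in the RTL/LTR column.

```python
import math
from collections import Counter


def ngram_perplexity(train_words, test_words, n, reverse=False):
    """Character n-gram perplexity with add-one smoothing (sketch).

    reverse=True reverses every word, so each character is predicted
    from the characters that FOLLOW it (the RTL direction).
    """
    k = n - 1

    def orient(w):
        return w[::-1] if reverse else w

    joint, ctx, vocab = Counter(), Counter(), set()
    for w in map(orient, train_words):
        vocab.update(w)
        for i in range(k, len(w)):          # contexts never cross word edges
            joint[w[i - k:i], w[i]] += 1
            ctx[w[i - k:i]] += 1
    v = len(vocab)

    log_prob, count = 0.0, 0
    for w in map(orient, test_words):
        for i in range(k, len(w)):
            h = w[i - k:i]
            # Laplace-smoothed P(char | context)
            log_prob += math.log2((joint[h, w[i]] + 1) / (ctx[h] + v))
            count += 1
    if count == 0:
        raise ValueError("no test word is long enough for this n")
    return 2 ** (-log_prob / count)


def direction_ratio(train_words, test_words, n):
    """PP(RTL) / PP(LTR): values above 1 mean the LTR context predicts better."""
    return (ngram_perplexity(train_words, test_words, n, reverse=True)
            / ngram_perplexity(train_words, test_words, n))
```

  Passing the same word list as train and test measures in-sample fit;
  a held-out split would give a more conservative comparison.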

INTERPRETATION:

  EVA     (n=4): RTL-optimized (LTR>RTL 45.5% vs RTL>LTR 54.5%)
  Decoded (n=4): LTR-optimized (LTR>RTL 58.6% vs RTL>LTR 41.4%)
  Sinhala (n=4): LTR-optimized, marginally (LTR>RTL 50.5% vs RTL>LTR 49.4%)
  English (n=4): LTR-optimized (LTR>RTL 58.2% vs RTL>LTR 16.0%)

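  RESULT: In the perplexity comparison (Table 4), EVA is RTL-optimized
  and the decoded text is LTR-optimized at every n. The pairwise
  comparison (Table 3) agrees only at n=4; at n=2 EVA is marginally
  LTR-optimized, and at n=3 the decoded text is RTL-optimized.
  This pattern is consistent with an abugida encoding that reverses
  directional properties: the encoding rules would map LTR Sinhala
  structure into patterns that appear RTL-optimized in EVA, and H12
  decoding would restore LTR directionality. The decoded text's LTR
  bias (RTL/LTR 1.32-1.80) is, however, much stronger than that of
  the Sinhala reference (1.08-1.12).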

==============================================================================
SUMMARY
==============================================================================

Key findings:

  1. Entropy: EVA h2 = 2.358 -> Decoded h2 = 2.339 (delta = -0.019)
     The change is negligible; the decoded text is no closer to the
     Sinhala reference (h2 = 3.338) than raw EVA is.

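  2. Alphabet: EVA |A| = 22, Decoded |A| = 22
     The nominal alphabet size is unchanged, but the decoding changes
     the effective alphabet: h1 falls from 3.860 to 3.373 bits (2^h1
     ~ 14.5 -> ~ 10.4 symbols) and redundancy rises from 0.134 to 0.244.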

  3. Directionality: EVA = RTL-optimized, Decoded = LTR-optimized
     (perplexity at every n; pairwise comparison at n=4 only).
     This is consistent with the H12 abugida mapping reversing the
     apparent reading direction through position-dependent encoding rules.

==============================================================================
END OF ANALYSIS
==============================================================================
