================================================================================
DECODER SPECIFICITY TEST
Does H12 specifically construct Sinhala, or generic CV strings?
================================================================================

Loading Voynich corpus...
  35,916 tokens, 7,733 types

  Raw EVA unique forms:     7,733
  H12 decoded unique forms: 4,972

Loading language dictionaries...
  Arabic      :    379,737 words
  Hebrew      :    897,978 words
  Hindi       :    127,451 words
  Latin       :    141,827 words
  Sinhala     :  1,470,278 words
  Tamil       :    403,980 words
  Turkish     :    409,612 words

================================================================================
CONDITION COMPARISON: RAW EVA vs H12 DECODED
Matched tokens out of 35,916; Boost = H12 / Raw EVA
================================================================================

  Language       Dict Size     Raw EVA    Raw %         H12    H12 %     Delta   Delta %    Boost
  ------------  ----------  ----------  -------  ----------  -------  --------  --------  -------
  Arabic           379,737       2,984     8.3%       3,365     9.4%      +381     +1.1%    1.13x
  Hebrew           897,978       5,114    14.2%           0     0.0%    -5,114    -14.2%    0.00x
  Hindi            127,451         844     2.3%      12,917    36.0%   +12,073    +33.6%   15.30x
  Latin            141,827       5,426    15.1%      10,848    30.2%    +5,422    +15.1%    2.00x
  Sinhala        1,470,278       6,776    18.9%      16,977    47.3%   +10,201    +28.4%    2.51x
  Tamil            403,980       4,043    11.3%       5,509    15.3%    +1,466     +4.1%    1.36x
  Turkish          409,612       8,017    22.3%      14,533    40.5%    +6,516    +18.1%    1.81x
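
  The derived columns in the table above can be reproduced from the two match
  counts and the 35,916-token corpus size. The sketch below is illustrative
  only: `compare` and its field names are hypothetical and are not taken from
  the script that produced this report.

```python
# Hypothetical helper: recompute the derived columns for one language.
TOTAL_TOKENS = 35_916  # corpus size reported at load time


def compare(raw_matches: int, h12_matches: int, total: int = TOTAL_TOKENS) -> dict:
    """Return percentages, delta and boost for one dictionary."""
    delta = h12_matches - raw_matches
    return {
        "raw_pct": 100 * raw_matches / total,
        "h12_pct": 100 * h12_matches / total,
        "delta": delta,
        "delta_pct": 100 * delta / total,
        # Boost is undefined when the raw condition matches nothing.
        "boost": h12_matches / raw_matches if raw_matches else float("nan"),
    }


# Sinhala row: 6,776 raw matches, 16,977 H12 matches.
sinhala = compare(6_776, 16_977)
print({k: round(v, 2) for k, v in sinhala.items()})
```

  Rounded to the table's precision, this gives 18.9%, 47.3%, +10,201, +28.4%
  and 2.51x for Sinhala.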

  Size-normalized delta (decoder boost per 100K dictionary entries):
  Language         Delta    Per 100K
  ------------  --------  ----------
  Arabic            +381      +100.3
  Hebrew          -5,114      -569.5
  Hindi          +12,073    +9,472.7
  Latin           +5,422    +3,823.0
  Sinhala        +10,201      +693.8
  Tamil           +1,466      +362.9
  Turkish         +6,516    +1,590.8
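
  The size normalization divides each delta by the dictionary size and scales
  to 100,000 entries. A minimal sketch, assuming that definition; `per_100k`
  is a hypothetical name, and the inputs are copied from the tables above.

```python
# Hypothetical helper: change in matched tokens per 100,000 dictionary entries.
def per_100k(delta_tokens: int, dict_size: int) -> float:
    return delta_tokens / dict_size * 100_000


# (delta, dictionary size) pairs copied from the report.
rows = {
    "Hindi": (12_073, 127_451),
    "Sinhala": (10_201, 1_470_278),
}
normalized = {lang: round(per_100k(d, n), 1) for lang, (d, n) in rows.items()}
print(normalized)
```

  The example shows why the normalization matters: Sinhala has the second
  largest raw delta but a far smaller per-entry gain than Hindi, because its
  dictionary is more than ten times larger.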

================================================================================
RANDOM DECODER COMPARISON (200 trials)
What does a random decoder add per language?
================================================================================

  ... trial 1/200
  ... trial 51/200
  ... trial 101/200
  ... trial 151/200
  Done.

================================================================================
DECODER SPECIFICITY: DELTA Z-SCORES
How much does each decoder SPECIFICALLY boost each language?
================================================================================

  Language       H12 Delta   Rnd Avg Δ   Rnd Std Δ   Δ Z-score   Beat H12
  ------------  ----------  ----------  ----------  ----------  ---------
  Arabic              +381    +3,036.7     2,979.4       -0.89   175/200
  Hebrew            -5,114    -4,969.2       198.4       -0.73   200/200
  Hindi            +12,073   +14,448.2     1,480.4       -1.60   196/200
  Latin             +5,422    +6,338.4     1,191.0       -0.77   162/200
  Sinhala          +10,201   +11,819.3     1,974.2       -0.82   154/200
  Tamil             +1,466    +2,659.0     1,466.0       -0.81   157/200
  Turkish           +6,516    +8,868.6     1,322.0       -1.78   198/200

  For comparison: raw match Z-scores (H12 absolute, not delta). These equal the
  delta Z-scores because each language's raw EVA count is a constant offset.
  Language         H12 tok     Rnd avg     Rnd std     Z-score
  ------------  ----------  ----------  ----------  ----------
  Arabic             3,365     6,020.7     2,979.4       -0.89
  Hebrew                 0       144.8       198.4       -0.73
  Hindi             12,917    15,292.2     1,480.4       -1.60
  Latin             10,848    11,764.4     1,191.0       -0.77
  Sinhala           16,977    18,595.3     1,974.2       -0.82
  Tamil              5,509     6,702.0     1,466.0       -0.81
  Turkish           14,533    16,885.6     1,322.0       -1.78
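
  Both Z-score tables use the same formula: the H12 value minus the mean of
  the random-decoder trials, divided by their standard deviation. The sketch
  below works from the summary figures printed above rather than the 200
  individual trials; `z_score` is a hypothetical name.

```python
# Hypothetical helper: standard score of an observation against a baseline.
def z_score(observed: float, baseline_mean: float, baseline_std: float) -> float:
    return (observed - baseline_mean) / baseline_std


RAW_EVA_SINHALA = 6_776  # raw EVA matches for Sinhala, from the comparison table

# Delta form: H12 delta vs. the random-decoder deltas.
sinhala_delta_z = z_score(10_201, 11_819.3, 1_974.2)

# Absolute form: adding the same constant to the observation and to the mean
# leaves the standard deviation, and therefore the Z-score, unchanged.
sinhala_abs_z = z_score(
    10_201 + RAW_EVA_SINHALA, 11_819.3 + RAW_EVA_SINHALA, 1_974.2
)

print(round(sinhala_delta_z, 2), round(sinhala_abs_z, 2))
```

  Both values round to -0.82, matching the Sinhala rows in the two tables.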

================================================================================
H12 SPECIFICITY PROFILE
How selectively does H12 boost each language?
================================================================================

  1. Hebrew        Δ Z= -0.73  Δ= -5,114 tokens  
  2. Latin         Δ Z= -0.77  Δ= +5,422 tokens  
  3. Tamil         Δ Z= -0.81  Δ= +1,466 tokens  
  4. Sinhala       Δ Z= -0.82  Δ=+10,201 tokens   <== H12 HYPOTHESIS
  5. Arabic        Δ Z= -0.89  Δ=   +381 tokens  
  6. Hindi         Δ Z= -1.60  Δ=+12,073 tokens  
  7. Turkish       Δ Z= -1.78  Δ= +6,516 tokens  

  Highest delta Z:         Hebrew (Δ Z=-0.73; H12 matches fall from 5,114 to 0)
  Second highest delta Z:  Latin (Δ Z=-0.77)

================================================================================
THE 'GENERIC CV' HYPOTHESIS TEST
================================================================================

  If H12 produces 'generic CV strings', all languages should be
  boosted equally (delta Z-scores would be similar).

  Delta Z-score range: 1.05
  Delta Z-score mean:  -1.06
  Delta Z-score std:   0.41

  Sinhala delta Z:         -0.82
  All-other-languages avg: -1.10
  Sinhala advantage:       +0.28

  RESULT: Sinhala's delta Z is 0.28 above the average of the other languages,
  less than one standard deviation (0.41) of the spread across languages.
  The 'generic CV' hypothesis cannot be rejected by this test alone.
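
  The summary statistics in this section can be recomputed from the rounded
  delta Z-scores listed in the specificity profile. This is a sketch under one
  assumption: the population standard deviation (`statistics.pstdev`) is used,
  because it reproduces the reported 0.41, whereas the sample standard
  deviation would give 0.44.

```python
import statistics

# Rounded delta Z-scores copied from the specificity profile.
delta_z = {
    "Arabic": -0.89, "Hebrew": -0.73, "Hindi": -1.60, "Latin": -0.77,
    "Sinhala": -0.82, "Tamil": -0.81, "Turkish": -1.78,
}

values = list(delta_z.values())
z_range = max(values) - min(values)
z_mean = statistics.fmean(values)
z_std = statistics.pstdev(values)  # assumption: population std, see lead-in

# Sinhala compared with the mean of the six other languages.
others = [z for lang, z in delta_z.items() if lang != "Sinhala"]
advantage = delta_z["Sinhala"] - statistics.fmean(others)

print(round(z_range, 2), round(z_mean, 2), round(z_std, 2), round(advantage, 2))
```

  The recomputed values agree with the report: range 1.05, mean -1.06,
  std 0.41, Sinhala advantage +0.28.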

================================================================================
VERDICT
================================================================================

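  Hebrew has a higher delta Z-score than Sinhala (-0.73 vs -0.82), even though
  H12 reduces Hebrew matches from 5,114 to 0. All seven delta Z-scores are
  negative: H12 adds fewer matches than the average random decoder.
  This requires investigation.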

  NOTE: This test measures DECODER SPECIFICITY: does the specific
  H12 mapping selectively produce one language over others?
  Structural tests (Panchavidha Z=10.5, SOV Z=7.04, Collocations)
  test whether the matched words form COHERENT TEXT, which is the
  stronger form of evidence.

================================================================================
