================================================================================
NAIBBE CIPHER vs H12 STRUCTURAL TEST FRAMEWORK
================================================================================

Purpose: Test whether Naibbe ciphertext (Latin/Italian source,
encrypted to mimic Voynichese statistics) can be distinguished
from genuine H12-decoded Voynich manuscript text.

If Naibbe-through-H12 fails these tests, it means:
  - H12 structural signals are NOT artifacts of EVA statistics
  - Matching Voynichese char/word frequencies is NECESSARY
    but NOT SUFFICIENT to reproduce H12 linguistic signals

--------------------------------------------------------------------------------
1. LOADING DATA
--------------------------------------------------------------------------------

  Naibbe ciphertext: 33,750 tokens, 6,423 types
  Source: Pliny Natural History Book 16, Naibbe cipher (Greshko 2025)

  Real VMS corpus:   35,916 tokens, 7,733 types
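
  The token and type counts above can be reproduced with a short sketch like
  the following. It assumes whitespace-separated transliteration with no
  locus markers or comments left in the text; the function name and sample
  string are illustrative and not taken from the framework itself.

```python
from collections import Counter


def corpus_stats(text: str) -> tuple[int, int]:
    """Return (token count, type count) for whitespace-separated text."""
    tokens = text.split()
    # Types are the distinct word forms; Counter keeps their frequencies too.
    return len(tokens), len(Counter(tokens))


# Tiny illustrative sample, not real corpus data.
sample = "ol shedy ol okaiin chedy ol"
n_tokens, n_types = corpus_stats(sample)
print(f"{n_tokens:,} tokens, {n_types:,} types")
```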

--------------------------------------------------------------------------------
2. H12 DECODING
--------------------------------------------------------------------------------

  Naibbe decoded: 33,750 tokens, 4,198 unique
  VMS decoded:    35,916 tokens, 4,972 unique

  Top 20 Naibbe-decoded words:
  Rank  Decoded             Freq  EVA                 
  ----  ----------------  ------  --------------------
     1  ula                 1009  ol                  
     2  meda                 744  shedy               
     3  ugena                684  okaiin              
     4  eda                  666  chedy               
     5  ugēa                 628  qokeey              
     6  ugæna                610  okain               
     7  ugēda                585  qokeedy             
     8  gara                 554  dar                 
     9  ugeda                492  qokedy              
    10  uteda                469  otedy               
    11  gena                 438  daiin               
    12  ugara                435  okar                
    13  uga                  346  qoky                
    14  ara                  342  ar                  
    15  ena                  337  chaiin              
    16  ura                  313  chor                
    17  utena                305  otaiin              
    18  utēda                300  qoteedy             
    19  ugala                299  qokal               
    20  mea                  291  shey                

  Sinhala dictionary: 1,470,278 words
  External pharma vocab: 150 terms

--------------------------------------------------------------------------------
3. DICTIONARY MATCH RATE
--------------------------------------------------------------------------------

  Corpus                      Matched     Total      Rate
  -------------------------  --------  --------  --------
  Naibbe-through-H12           13,619    33,750     40.4%
  VMS-through-H12              16,977    35,916     47.3%
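
  The Rate column is consistent with matched tokens divided by total tokens.
  A minimal sketch of that calculation, assuming an exact-string lookup in a
  set of dictionary words (the helper name and toy dictionary are
  hypothetical; the framework's own matching rules may differ):

```python
def match_rate(tokens: list[str], dictionary: set[str]) -> tuple[int, int, float]:
    """Return (matched tokens, total tokens, matched / total)."""
    matched = sum(1 for tok in tokens if tok in dictionary)
    return matched, len(tokens), matched / len(tokens)


# Toy example with a hypothetical three-word dictionary.
print(match_rate(["ula", "xyz", "gena", "ula"], {"ula", "gena", "sena"}))

# Cross-check of the reported rates from the Matched and Total columns.
print(f"{13619 / 33750:.1%}")  # Naibbe-through-H12
print(f"{16977 / 35916:.1%}")  # VMS-through-H12
```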

================================================================================
TEST A: SOV SYNTAX PATTERNS
================================================================================

Sinhala is SOV with postpositions. H12-decoded VMS shows:
  noun-before-verb ~83%, post-after-noun ~78%, verb-final ~71%.
Random/non-SOV text should be ~50% on all metrics.

  Metric                              Naibbe         VMS      Chance
  ------------------------------  ----------  ----------  ----------
  Noun-before-verb %                   49.5%       50.1%        ~50%
    (NV pairs tested)                   4946        5573
  Post-after-noun %                    52.2%       50.6%        ~50%
    (NP pairs tested)                     69         176
  Verb-final %                         72.3%       72.1%        ~50%
    (verbs tested)                      9106        8512

  SOV composite Z-score:
    Naibbe: NV Z=-0.74, NP Z=0.36, VF Z=42.50 -> Composite Z=14.04
    VMS:    NV Z=0.12, NP Z=0.15, VF Z=40.69 -> Composite Z=13.65
    (Paper reference: VMS SOV Z=7.04)
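
  The reported values are numerically consistent with a one-proportion
  z-score against a 50% chance rate for each metric and an unweighted mean
  of the three z-scores as the composite. This is inferred from the figures
  above, not confirmed from the framework source; the hit count of 36 is
  likewise inferred from 52.2% of 69 pairs. A sketch under those assumptions:

```python
import math


def z_vs_chance(successes: int, trials: int, p0: float = 0.5) -> float:
    """One-proportion z-score of successes/trials against chance rate p0."""
    expected = trials * p0
    sd = math.sqrt(trials * p0 * (1.0 - p0))
    return (successes - expected) / sd


def composite_z(z_scores: list[float]) -> float:
    """Unweighted mean of the component z-scores (assumed combination rule)."""
    return sum(z_scores) / len(z_scores)


# Post-after-noun, Naibbe: 36 of 69 pairs (count inferred from 52.2%).
print(round(z_vs_chance(36, 69), 2))                # 0.36
print(round(composite_z([-0.74, 0.36, 42.50]), 2))  # 14.04 (Naibbe)
print(round(composite_z([0.12, 0.15, 40.69]), 2))   # 13.65 (VMS)
```

  Under this combination rule the composite is driven almost entirely by
  the verb-final component, since the NV and NP z-scores are close to zero
  for both corpora.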

  POS tag distribution:
  Tag           Naibbe     Naibbe%         VMS        VMS%
  N              9,229       27.3%      12,343       34.4%
  V              9,106       27.0%       8,512       23.7%
  POST              97        0.3%         247        0.7%
  UNK           15,318       45.4%      14,814       41.2%

================================================================================
TEST B: AYURVEDIC COLLOCATION TEST (36 pairs, window=10)
================================================================================

Tests whether canonical Ayurvedic recipe pairs (water+senna,
honey+take, root+strain, etc.) co-occur within 10-word windows.
H12 on VMS (paper reference): 16/36 pairs. Random decoders: ~2-4/36.
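
  A minimal sketch of the window check described above. It assumes that a
  pair counts as found when the two words occur at most `window` tokens
  apart in either order, and that `dist` is the smallest such distance; the
  function name and sample tokens are hypothetical.

```python
from typing import Optional


def min_pair_distance(tokens: list[str], a: str, b: str, window: int = 10) -> Optional[int]:
    """Smallest token distance between a and b, or None if never within window."""
    last = {a: None, b: None}  # most recent position of each word
    best = None
    for i, tok in enumerate(tokens):
        if tok not in last:
            continue
        other = b if tok == a else a
        j = last[other]
        if j is not None and i - j <= window and (best is None or i - j < best):
            best = i - j
        last[tok] = i
    return best


# Toy decoded sequence, not real corpus data.
toks = "ula sena x x mula y gena".split()
print(min_pair_distance(toks, "ula", "sena"))            # 1
print(min_pair_distance(toks, "mula", "gena"))           # 2
print(min_pair_distance(toks, "ula", "gena", window=3))  # None
```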

  Corpus                      Found   Total      Rate
  -------------------------  ------  ------  --------
  Naibbe-through-H12             21      36     58.3%
  VMS-through-H12                20      36     55.6%

  Naibbe collocations found:
           ula + sena        (water+senna              )  dist=1
           ula + gala        (water+strain             )  dist=1
           mea + gena        (honey+take               )  dist=1
           mea + kara        (honey+make               )  dist=1
          mula + gena        (root+take                )  dist=2
          mula + sena        (root+senna               )  dist=1
          mula + gala        (root+strain              )  dist=1
          gula + kara        (pill+make                )  dist=7
          pala + gena        (fruit+take               )  dist=5
           ata + gena        (leaf/branch+take         )  dist=2
           ala + gena        (tuber+take               )  dist=1
           ala + sena        (tuber+senna              )  dist=1
          meda + kara        (fat+make/knead           )  dist=1
           ula + kara        (water+make               )  dist=1
          dena + kara        (give+make                )  dist=5
          mula + kara        (root+make                )  dist=1
          sena + gala        (senna+strain             )  dist=2
          sena + gena        (senna+take               )  dist=2
          gena + kara        (take+make                )  dist=1
          gena + sena        (take+senna               )  dist=2
          gena + dena        (take+give                )  dist=3

  VMS collocations found (top 20):
           ula + sena        (water+senna              )  dist=1
           ula + gala        (water+strain             )  dist=1
           mea + gena        (honey+take               )  dist=1
           mea + kara        (honey+make               )  dist=1
          mula + gena        (root+take                )  dist=1
          mula + sena        (root+senna               )  dist=2
          mula + gala        (root+strain              )  dist=1
          gula + kara        (pill+make                )  dist=2
          pala + gena        (fruit+take               )  dist=4
           ata + gena        (leaf/branch+take         )  dist=1
           ala + gena        (tuber+take               )  dist=1
           ala + sena        (tuber+senna              )  dist=1
          meda + kara        (fat+make/knead           )  dist=5
           ula + kara        (water+make               )  dist=1
          mula + kara        (root+make                )  dist=4
          sena + gala        (senna+strain             )  dist=1
          sena + gena        (senna+take               )  dist=1
          gena + kara        (take+make                )  dist=1
          gena + sena        (take+senna               )  dist=1
          gena + dena        (take+give                )  dist=4

================================================================================
TEST C: EXTERNAL PHARMACEUTICAL VOCABULARY (156 terms)
================================================================================

Tests how many independently-compiled Sinhala pharmaceutical terms
appear in decoded output. H12 on VMS: Z of about 3.5 over random.

  Corpus                       Tokens   Types    Match%
  -------------------------  --------  ------  --------
  Naibbe-through-H12            6,416      41     19.0%
  VMS-through-H12               7,130      45     19.9%

  Category breakdown:
  Category                Naibbe       VMS
  --------------------  --------  --------
  apparatus                   91        64
  body_part                  508     1,001
  descriptor                   3         6
  disease                    371       681
  dosha                        0         0
  efficacy                     0         0
  function                 1,350     1,021
  general_med                830       547
  ingredient               2,238     2,137
  measurement                 27        66
  plant                        0         8
  plant_part                 370       534
  preparation                  5         0
  process                    549       883
  quality                      2         0
  timing                      72       182
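
  The category counts above sum to the Tokens column (6,416 for Naibbe,
  7,130 for VMS), and Match% is consistent with matched tokens divided by
  corpus tokens. A sketch of that tally, assuming the vocabulary is held as
  a term-to-category mapping; the four-entry mapping below is a hypothetical
  excerpt using terms and categories from the match lists in this report.

```python
from collections import Counter

# Hypothetical excerpt of the external vocabulary: term -> category.
PHARMA = {"ula": "ingredient", "mea": "ingredient", "gena": "process", "ura": "body_part"}


def category_breakdown(tokens: list[str], vocab: dict[str, str]):
    """Return (tokens per category, matched type count, matched-token share)."""
    by_category = Counter(vocab[tok] for tok in tokens if tok in vocab)
    matched_types = {tok for tok in tokens if tok in vocab}
    matched_tokens = sum(by_category.values())
    return by_category, len(matched_types), matched_tokens / len(tokens)


by_cat, n_types, share = category_breakdown(["ula", "gena", "ula", "zzz", "mea"], PHARMA)
for category in sorted(by_cat):
    print(f"{category:<20}  {by_cat[category]:>8,}")
print(f"types={n_types}  match={share:.1%}")
```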

  Naibbe pharma matches (top 20):
    ula          (1009x)  water/spring water              [ingredient]
    meda         ( 744x)  fat/soften/knead                [ingredient]
    ugena        ( 684x)  having learned/studied          [function]
    eda          ( 666x)  then/that                       [function]
    gara         ( 554x)  poison/toxin                    [general_med]
    gena         ( 438x)  having taken                    [process]
    ara          ( 342x)  fever (Elu form)                [disease]
    ura          ( 313x)  chest                           [body_part]
    mea          ( 291x)  honey                           [ingredient]
    leda         ( 221x)  disease/sick                    [general_med]
    ala          ( 186x)  root/tuber                      [plant_part]
    gala         ( 169x)  throat/neck                     [body_part]
    mula         ( 169x)  root (plant part)               [plant_part]
    sena         (  99x)  senna (Cassia angustifolia)     [ingredient]
    kara         (  79x)  make/do                         [process]
    uda          (  72x)  morning                         [timing]
    kala         (  68x)  pot/vessel                      [apparatus]
    sara         (  48x)  essence/ghee                    [ingredient]
    dena         (  32x)  give (to patient)               [process]
    mala         (  25x)  bodily waste                    [general_med]

  VMS pharma matches (top 20):
    ula          (1131x)  water/spring water              [ingredient]
    gena         ( 797x)  having taken                    [process]
    ura          ( 607x)  chest                           [body_part]
    eda          ( 530x)  then/that                       [function]
    ugena        ( 491x)  having learned/studied          [function]
    ara          ( 461x)  fever (Elu form)                [disease]
    meda         ( 445x)  fat/soften/knead                [ingredient]
    ala          ( 348x)  root/tuber                      [plant_part]
    gara         ( 331x)  poison/toxin                    [general_med]
    gala         ( 302x)  throat/neck                     [body_part]
    mea          ( 282x)  honey                           [ingredient]
    uda          ( 182x)  morning                         [timing]
    mula         ( 178x)  root (plant part)               [plant_part]
    sena         ( 137x)  senna (Cassia angustifolia)     [ingredient]
    gula         ( 135x)  abdominal tumor                 [disease]
    leda         ( 129x)  disease/sick                    [general_med]
    sara         (  90x)  essence/ghee                    [ingredient]
    sula         (  75x)  pain/colic                      [disease]
    kara         (  70x)  make/do                         [process]
    tula         (  67x)  weight/balance                  [general_med]

================================================================================
ADDITIONAL: PHONOLOGICAL STRUCTURE
================================================================================

  Vowel-final words:
    Naibbe: 99.4%  (33554/33750)
    VMS:    99.7%  (35798/35916)
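
  The vowel-final share is a simple ratio of counts. A sketch, assuming the
  vowel inventory is a, e, i, o, u plus the long and front vowels that
  appear in the decoded forms above (the exact set used by the framework is
  not stated, so `VOWELS` here is an assumption):

```python
# Assumed vowel inventory for H12-decoded text; not confirmed by the report.
VOWELS = set("aeiouēæ")


def vowel_final_share(tokens: list[str]) -> tuple[int, int, float]:
    """Return (vowel-final tokens, total tokens, share)."""
    hits = sum(1 for tok in tokens if tok and tok[-1] in VOWELS)
    return hits, len(tokens), hits / len(tokens)


# Toy example; "gar" is a made-up consonant-final form.
print(vowel_final_share(["ula", "ugēa", "gar", "mea"]))

# Cross-check of the reported percentages from the counts in parentheses.
print(f"{33554 / 33750:.1%}  {35798 / 35916:.1%}")
```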

  Word length distribution (decoded):
   Len     Naibbe%        VMS%
     1        0.9%        1.7%
     2        4.4%        6.5%
     3       16.1%       20.3%
     4       24.0%       26.0%
     5       29.7%       27.6%
     6       12.0%       10.2%
     7        8.1%        5.0%
     8        2.6%        1.8%
     9        1.1%        0.6%
    10        0.5%        0.2%
    11        0.3%        0.1%
    12        0.2%        0.0%
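
  A sketch of how a length distribution like the one above can be computed.
  It assumes length is counted in characters after NFC normalization, so
  that a letter such as ē counts once whether it is stored precomposed or
  as a base letter plus combining mark; whether the framework normalizes
  this way is not stated.

```python
import unicodedata
from collections import Counter


def length_distribution(tokens: list[str]) -> dict[int, float]:
    """Map word length (in NFC code points) to its share of all tokens."""
    lengths = Counter(len(unicodedata.normalize("NFC", tok)) for tok in tokens)
    total = len(tokens)
    return {n: lengths[n] / total for n in sorted(lengths)}


# Toy example using decoded forms from the top-20 list.
for n, share in length_distribution(["ula", "eda", "meda", "ugena"]).items():
    print(f"{n:>6}  {share:>10.1%}")
```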

================================================================================
SUMMARY: NAIBBE vs VMS COMPARISON
================================================================================

  Test                                       Naibbe     VMS (H12)     Paper ref
  -----------------------------------  ------------  ------------  ------------
  Corpus size (tokens)                       33,750        35,916
  Dictionary match rate                       40.4%         47.3%
  Vowel-final %                               99.4%         99.7%
  SOV composite Z                             14.04         13.65        Z=7.04
    Noun-before-verb %                        49.5%         50.1%
    Post-after-noun %                         52.2%         50.6%
    Verb-final %                              72.3%         72.1%
  Ayurvedic collocations (of 36)                 21            20            16
  Pharma vocab tokens                         6,416         7,130
  Pharma vocab types                             41            45
  Pharma vocab match %                        19.0%         19.9%

================================================================================
VERDICT: CAN NAIBBE BE DISTINGUISHED FROM H12-DECODED VMS?
================================================================================

  Test A (SOV syntax):         PASS (SOV signal present)
  Test B (Collocations):       PASS (collocations present)
  Test C (Pharma vocabulary):  PASS (pharma terms present)

  Naibbe passes: 3/3 structural tests

  CONCLUSION: WEAK DISCRIMINATION
  Naibbe passes all structural tests.

  INTERPRETATION:
  The Naibbe cipher replicates many statistical properties of Voynichese
  (character frequencies, word-length distributions, entropy profile).
  Passing its output through the H12 decoder was meant to test whether
  the structural tests (SOV order, Ayurvedic collocations, pharma
  vocabulary) respond to genuine Sinhala linguistic structure or to
  statistical mimicry alone. Naibbe, built from a non-Sinhala source
  (Pliny, Natural History), scores close to VMS on all three tests, so
  in this run the tests do not separate the two corpora.

================================================================================
