================================================================================
EXTERNAL PHARMACEUTICAL VOCABULARY VALIDATION
Anti-circularity test: external sources vs H12 decoder output
================================================================================

Loading corpus...
  35,916 tokens, 7,733 types
Loading external pharmaceutical vocabulary...
  150 terms across 16 categories
  Categories: {'plant_part': 6, 'preparation': 13, 'process': 9, 'ingredient': 23, 'plant': 37, 'disease': 14, 'body_part': 13, 'apparatus': 4, 'measurement': 10, 'timing': 3, 'efficacy': 3, 'dosha': 2, 'general_med': 6, 'descriptor': 6, 'function': 5, 'quality': 2}
  Sources (88):
    - Alpinia galanga
    - Ayurvedic pharmacopoeia (Arabic sanā borrowing)
    - Azadirachta indica
    - Bacopa monnieri
    - Bodleian MS
    - Bodleian MS Sinh.a.2(R)
    - Bodleian MS Sinh.d.3(R)
    - Bodleian MS Sinh.d.5(R)
    - Cassia fistula
    - Charaka
    ... and 78 more

Running H12 decoder...
  H12 matches: 7,211 tokens (20.1%), 235 types
  By category:
    ingredient          : 2,137 tokens
    function            : 1,021 tokens
    body_part           : 1,001 tokens
    process             :   883 tokens
    disease             :   681 tokens
    general_med         :   547 tokens
    plant_part          :   534 tokens
    timing              :   182 tokens
    apparatus           :   132 tokens
    measurement         :    66 tokens
    plant               :    21 tokens
    descriptor          :     6 tokens

  Matched terms:
    gena         ( 792x)  having taken                    [process]  src: Bodleian MS Sinh.d.3(R)
    ula          ( 518x)  water/spring water              [ingredient]  src: Yogaratnakaraya
    eda          ( 495x)  then/that                       [function]  src: Yogaratnakaraya
    meda         ( 424x)  fat/soften/knead                [ingredient]  src: Yogaratnakaraya
    ula          ( 376x)  water/spring water              [ingredient]  src: Yogaratnakaraya
    ara          ( 342x)  fever (Elu form)                [disease]  src: Yogaratnakaraya
    ura          ( 340x)  chest                           [body_part]  src: Yogaratnakaraya
    gara         ( 295x)  poison/toxin                    [general_med]  src: Charaka Samhita
    mea          ( 275x)  honey                           [ingredient]  src: Yogaratnakaraya
    ugena        ( 262x)  having learned/studied          [function]  src: Yogaratnakaraya
    ala          ( 250x)  root/tuber                      [plant_part]  src: Bodleian MS Sinh.a.2(R)
    gala         ( 241x)  throat/neck                     [body_part]  src: Yogaratnakaraya
    ugena        ( 207x)  having learned/studied          [function]  src: Yogaratnakaraya
    ura          ( 204x)  chest                           [body_part]  src: Yogaratnakaraya
    mula         ( 174x)  root (plant part)               [plant_part]  src: Bodleian MS Sinh.a.2(R)
    ula          ( 147x)  water/spring water              [ingredient]  src: Yogaratnakaraya
    sena         ( 134x)  senna (Cassia angustifolia)     [ingredient]  src: Ayurvedic pharmacopoeia (Arabic sanā borrowing)
    leda         ( 119x)  disease/sick                    [general_med]  src: Yogaratnakaraya
    gula         ( 111x)  abdominal tumor                 [disease]  src: Charaka Samhita
    uda          (  89x)  morning                         [timing]  src: Yogaratnakaraya
    sara         (  82x)  essence/ghee                    [ingredient]  src: Dravyaguna
    ara          (  70x)  fever (Elu form)                [disease]  src: Yogaratnakaraya
    sula         (  67x)  pain/colic                      [disease]  src: Charaka Samhita
    saina        (  66x)  mortar                          [apparatus]  src: Yogaratnakaraya
    kara         (  51x)  make/do                         [process]  src: Bodleian MS Sinh.d.3(R)
    ula          (  50x)  water/spring water              [ingredient]  src: Yogaratnakaraya
    ala          (  46x)  root/tuber                      [plant_part]  src: Bodleian MS Sinh.a.2(R)
    tula         (  45x)  weight/balance                  [general_med]  src: Yogaratnakaraya
    uda          (  41x)  morning                         [timing]  src: Yogaratnakaraya
    ala          (  29x)  root/tuber                      [plant_part]  src: Bodleian MS Sinh.a.2(R)
    ... and 205 more

Running 200 random decoders...
  Random # 1: 5,599 tokens (15.6%)
  Random # 2: 6,178 tokens (17.2%)
  Random # 3: 4,115 tokens (11.5%)
  Random # 4: 5,136 tokens (14.3%)
  Random # 5: 5,194 tokens (14.5%)
  ...
  Random #200: 2,148 tokens (6.0%)

================================================================================
RESULTS
================================================================================

  External vocabulary:    150 pharmaceutical terms
  Corpus:                 35,916 tokens

  H12 decoder:             7,211 tokens  (20.1%)  235 types
  Random average:          3,843 tokens  (10.7%)
  Random best:             7,458 tokens  (20.8%)
  Random worst:            1,411 tokens  (3.9%)
  Random std dev:         3.95%

  H12 advantage:          +9.4 percentage points over random average
  H12 / Random ratio:     1.9x
  Z-score:                2.4

  Random decoders matching/beating H12: 3/200

  VERDICT: PASS
  H12 significantly exceeds the random average (Z=2.4), but 3 of 200 random
  decoders match or beat it (random best: 20.8% vs H12: 20.1%).
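
The Z-score and the "matching/beating" count above can be cross-checked from the reported summary figures alone. The sketch below is an illustration, not the script that produced this report: the function names `z_score` and `empirical_p` are hypothetical, and it assumes the Z-score is the H12 match rate minus the random mean, divided by the random standard deviation.

```python
def z_score(observed_pct, random_mean_pct, random_std_pct):
    """Standard score of the observed match rate against the random decoders."""
    return (observed_pct - random_mean_pct) / random_std_pct


def empirical_p(n_matching_or_beating, n_random):
    """Fraction of random decoders that did at least as well as the observed one."""
    return n_matching_or_beating / n_random


# Figures copied from the RESULTS block above.
z = z_score(20.1, 10.7, 3.95)
p = empirical_p(3, 200)
print(f"Z = {z:.1f}, empirical p = {p:.3f}")
```

The empirical fraction makes no normality assumption about the random match rates, so it is a useful companion to the Z-score when the random distribution is wide (here 3.9% to 20.8%).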

================================================================================
CONTROLLED COMPARISON (same vowel system: o→u)
================================================================================

  o→u decoders:       38/200
  H12 (o→u):          20.1%
  o→u average:        13.3%
  o→u best:           17.9%
  o→u std dev:        1.95%
  Controlled Z:       3.5
  Beating H12:        0/38

  CONTROLLED VERDICT: STRONG PASS
  When comparing to decoders with the same vowel system,
  H12's consonant mappings produce Z=3.5 (p<0.001).
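
A minimal sketch of the controlled comparison follows, assuming each random decoder is stored as a record with a vowel-system label and a match rate. The field names `vowel_system` and `match_pct`, the function names, and the toy data are all hypothetical; the choice of population standard deviation (`pstdev`) is also an assumption, since the report does not say which estimator was used.

```python
import math
import statistics


def controlled_z(h12_pct, decoders, vowel_system="o->u"):
    """Compare H12 only against random decoders sharing its vowel system."""
    subset = [d["match_pct"] for d in decoders if d["vowel_system"] == vowel_system]
    mean = statistics.mean(subset)
    std = statistics.pstdev(subset)  # assumption: population standard deviation
    beating = sum(pct >= h12_pct for pct in subset)
    return (h12_pct - mean) / std, beating, len(subset)


def one_sided_p(z):
    """Upper-tail probability of a standard normal at z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Toy data for illustration only; not taken from the actual run.
toy_decoders = [
    {"vowel_system": "o->u", "match_pct": 12.0},
    {"vowel_system": "o->u", "match_pct": 14.0},
    {"vowel_system": "o->a", "match_pct": 21.0},
    {"vowel_system": "o->u", "match_pct": 16.0},
]
z, beating, n = controlled_z(20.0, toy_decoders)
print(f"subset size = {n}, beating = {beating}, Z = {z:.2f}")
print(f"one-sided p at Z=3.5: {one_sided_p(3.5):.5f}")
```

Filtering first and computing the mean and standard deviation afterwards is what separates the contribution of the consonant mappings from that of the vowel system: the decoder with a different vowel system in the toy data has the highest match rate but does not enter the comparison.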

================================================================================
CATEGORY BREAKDOWN: H12 vs Random Average
================================================================================

  Category              H12 tokens  Random avg   Ratio
  --------------------  ----------  ----------  ------
  apparatus                    132         159    0.8x
  body_part                  1,001         236    4.2x
  descriptor                     6           0    6.0x
  disease                      681         567    1.2x
  dosha                          0           0    0.0x
  efficacy                       0           0    0.0x
  function                   1,021          20   49.8x
  general_med                  547          78    7.0x
  ingredient                 2,137         680    3.1x
  measurement                   66         183    0.4x
  plant                         21          41    0.5x
  plant_part                   534         622    0.9x
  preparation                    0          45    0.0x
  process                      883         384    2.3x
  quality                        0           0    0.0x
  timing                       182           1  182.0x
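
The Ratio column divides the H12 token count by the random average for the same category. The sketch below recomputes a few rows from the displayed values and marks the ratio as undefined when the random average is zero. The function name `category_ratio` and the "n/a" label are hypothetical, and the report itself prints a number in those rows. Because the Random avg column is rounded to whole tokens, ratios recomputed this way can differ slightly from the printed ones (for example, the function row).

```python
def category_ratio(h12_tokens, random_avg):
    """H12 tokens divided by the random average; None when the average is zero."""
    if random_avg == 0:
        return None
    return h12_tokens / random_avg


# (H12 tokens, random average) copied from the table above.
rows = {
    "body_part": (1001, 236),
    "dosha": (0, 0),
    "measurement": (66, 183),
}

labels = {}
for name, (h12_tokens, random_avg) in rows.items():
    ratio = category_ratio(h12_tokens, random_avg)
    labels[name] = "n/a" if ratio is None else f"{ratio:.1f}x"
    print(f"{name:<20}  {h12_tokens:>10,}  {random_avg:>10,}  {labels[name]:>6}")
```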

================================================================================
PROVENANCE
All terms were sourced from published materials predating the H12 decoder:
  - Bodleian Library palm-leaf MSS (Liyanaratne 1992)
  - Yogaratnakaraya (15th-century Sri Lankan medical text)
  - Charaka Samhita / Sushruta Samhita (classical Ayurveda)
  - Sri Lanka Ayurvedic Drugs Corporation product formulary
  - Sri Lankan materia medica (published ethnobotanical lists)
  - Dravyaguna Vijnana (pharmacological texts)
NO terms were derived from H12 decoder output.
================================================================================
