============================================================
=== JUDGE PASS 1/3: Naive Baseline (138 results) ===
=== Started: Thu Mar 12 20:59:55 CET 2026 ===
============================================================
INFO:exp_v4_naive_baseline:[1/138] Judging: 01_sarek_FR1
INFO:exp_v4_naive_baseline:[2/138] Judging: 01_sarek_CO2
INFO:exp_v4_naive_baseline:[3/138] Judging: 01_sarek_TE3
INFO:exp_v4_naive_baseline:[4/138] Judging: 02_snakemake_FR1
INFO:exp_v4_naive_baseline:[5/138] Judging: 02_snakemake_CO2
INFO:exp_v4_naive_baseline:[6/138] Judging: 02_snakemake_TE3
INFO:exp_v4_naive_baseline:[7/138] Judging: 03_nfcore_framework_FR1
INFO:exp_v4_naive_baseline:[8/138] Judging: 03_nfcore_framework_CO2
INFO:exp_v4_naive_baseline:[9/138] Judging: 03_nfcore_framework_TE3
INFO:exp_v4_naive_baseline:[10/138] Judging: 04_fastp_FR1
INFO:exp_v4_naive_baseline:[11/138] Judging: 04_fastp_CO2
INFO:exp_v4_naive_baseline:[12/138] Judging: 04_fastp_TE3
INFO:exp_v4_naive_baseline:[13/138] Judging: 05_multiqc_FR1
INFO:exp_v4_naive_baseline:[14/138] Judging: 05_multiqc_CO2
INFO:exp_v4_naive_baseline:[15/138] Judging: 05_multiqc_TE3
INFO:exp_v4_naive_baseline:[16/138] Judging: 06_star_aligner_FR1
INFO:exp_v4_naive_baseline:[17/138] Judging: 06_star_aligner_CO2
INFO:exp_v4_naive_baseline:[18/138] Judging: 06_star_aligner_TE3
INFO:exp_v4_naive_baseline:[19/138] Judging: 07_salmon_FR1
INFO:exp_v4_naive_baseline:[20/138] Judging: 07_salmon_CO2
INFO:exp_v4_naive_baseline:[21/138] Judging: 07_salmon_TE3
INFO:exp_v4_naive_baseline:[22/138] Judging: 08_deseq2_FR1
INFO:exp_v4_naive_baseline:[23/138] Judging: 08_deseq2_CO2
INFO:exp_v4_naive_baseline:[24/138] Judging: 08_deseq2_TE3
INFO:exp_v4_naive_baseline:[25/138] Judging: 09_seqkit_FR1
INFO:exp_v4_naive_baseline:[26/138] Judging: 09_seqkit_CO2
INFO:exp_v4_naive_baseline:[27/138] Judging: 09_seqkit_TE3
INFO:exp_v4_naive_baseline:[28/138] Judging: 10_cutadapt_FR1
INFO:exp_v4_naive_baseline:[29/138] Judging: 10_cutadapt_CO2
INFO:exp_v4_naive_baseline:[30/138] Judging: 10_cutadapt_TE3
INFO:exp_v4_naive_baseline:[31/138] Judging: 11_asf_burkina_faso_FR1
INFO:exp_v4_naive_baseline:[32/138] Judging: 11_asf_burkina_faso_CO2
INFO:exp_v4_naive_baseline:[33/138] Judging: 11_asf_burkina_faso_TE3
INFO:exp_v4_naive_baseline:[34/138] Judging: 12_hpai_netherlands_FR1
INFO:exp_v4_naive_baseline:[35/138] Judging: 12_hpai_netherlands_CO2
INFO:exp_v4_naive_baseline:[36/138] Judging: 12_hpai_netherlands_TE3
INFO:exp_v4_naive_baseline:[37/138] Judging: 13_lsd_nepal_FR1
INFO:exp_v4_naive_baseline:[38/138] Judging: 13_lsd_nepal_CO2
INFO:exp_v4_naive_baseline:[39/138] Judging: 13_lsd_nepal_TE3
INFO:exp_v4_naive_baseline:[40/138] Judging: 14_bovine_tb_cameroon_FR1
INFO:exp_v4_naive_baseline:[41/138] Judging: 14_bovine_tb_cameroon_CO2
INFO:exp_v4_naive_baseline:[42/138] Judging: 14_bovine_tb_cameroon_TE3
INFO:exp_v4_naive_baseline:[43/138] Judging: 15_rabies_tanzania_FR1
INFO:exp_v4_naive_baseline:[44/138] Judging: 15_rabies_tanzania_CO2
INFO:exp_v4_naive_baseline:[45/138] Judging: 15_rabies_tanzania_TE3
INFO:exp_v4_naive_baseline:[46/138] Judging: 16_ppr_ethiopia_FR1
INFO:exp_v4_naive_baseline:[47/138] Judging: 16_ppr_ethiopia_CO2
INFO:exp_v4_naive_baseline:[48/138] Judging: 16_ppr_ethiopia_TE3
INFO:exp_v4_naive_baseline:[49/138] Judging: 17_brucellosis_ethiopia_FR1
INFO:exp_v4_naive_baseline:[50/138] Judging: 17_brucellosis_ethiopia_CO2
INFO:exp_v4_naive_baseline:[51/138] Judging: 17_brucellosis_ethiopia_TE3
INFO:exp_v4_naive_baseline:[52/138] Judging: 18_fmd_review_FR1
INFO:exp_v4_naive_baseline:[53/138] Judging: 18_fmd_review_CO2
INFO:exp_v4_naive_baseline:[54/138] Judging: 18_fmd_review_TE3
INFO:exp_v4_naive_baseline:[55/138] Judging: 19_hpai_canada_FR1
INFO:exp_v4_naive_baseline:[56/138] Judging: 19_hpai_canada_CO2
INFO:exp_v4_naive_baseline:[57/138] Judging: 19_hpai_canada_TE3
INFO:exp_v4_naive_baseline:[58/138] Judging: 20_lsd_review_FR1
INFO:exp_v4_naive_baseline:[59/138] Judging: 20_lsd_review_CO2
INFO:exp_v4_naive_baseline:[60/138] Judging: 20_lsd_review_TE3
INFO:exp_v4_naive_baseline:[61/138] Judging: CD_001
INFO:exp_v4_naive_baseline:[62/138] Judging: CD_002
INFO:exp_v4_naive_baseline:[63/138] Judging: CD_003
INFO:exp_v4_naive_baseline:[64/138] Judging: CD_004
INFO:exp_v4_naive_baseline:[65/138] Judging: CD_005
INFO:exp_v4_naive_baseline:[66/138] Judging: CD_006
INFO:exp_v4_naive_baseline:[67/138] Judging: CD_007
INFO:exp_v4_naive_baseline:[68/138] Judging: CD_008
INFO:exp_v4_naive_baseline:[69/138] Judging: CD_009
INFO:exp_v4_naive_baseline:[70/138] Judging: CD_010
INFO:exp_v4_naive_baseline:[71/138] Judging: CD_011
INFO:exp_v4_naive_baseline:[72/138] Judging: CD_012
INFO:exp_v4_naive_baseline:[73/138] Judging: CD_013
INFO:exp_v4_naive_baseline:[74/138] Judging: CD_014
INFO:exp_v4_naive_baseline:[75/138] Judging: CD_015
INFO:exp_v4_naive_baseline:[76/138] Judging: CD_016
INFO:exp_v4_naive_baseline:[77/138] Judging: CD_017
INFO:exp_v4_naive_baseline:[78/138] Judging: CD_018
INFO:exp_v4_naive_baseline:[79/138] Judging: SY_001
INFO:exp_v4_naive_baseline:[80/138] Judging: SY_002
INFO:exp_v4_naive_baseline:[81/138] Judging: SY_003
INFO:exp_v4_naive_baseline:[82/138] Judging: SY_004
INFO:exp_v4_naive_baseline:[83/138] Judging: SY_005
INFO:exp_v4_naive_baseline:[84/138] Judging: SY_006
INFO:exp_v4_naive_baseline:[85/138] Judging: SY_007
INFO:exp_v4_naive_baseline:[86/138] Judging: SY_008
INFO:exp_v4_naive_baseline:[87/138] Judging: SY_009
INFO:exp_v4_naive_baseline:[88/138] Judging: SY_010
INFO:exp_v4_naive_baseline:[89/138] Judging: OOD_001
INFO:exp_v4_naive_baseline:[90/138] Judging: OOD_002
INFO:exp_v4_naive_baseline:[91/138] Judging: OOD_003
INFO:exp_v4_naive_baseline:[92/138] Judging: OOD_004
INFO:exp_v4_naive_baseline:[93/138] Judging: OOD_005
INFO:exp_v4_naive_baseline:[94/138] Judging: OOD_006
INFO:exp_v4_naive_baseline:[95/138] Judging: OOD_007
INFO:exp_v4_naive_baseline:[96/138] Judging: OOD_008
INFO:exp_v4_naive_baseline:[97/138] Judging: OOD_009
INFO:exp_v4_naive_baseline:[98/138] Judging: OOD_010
INFO:exp_v4_naive_baseline:[99/138] Judging: OOD_011
INFO:exp_v4_naive_baseline:[100/138] Judging: OOD_012
INFO:exp_v4_naive_baseline:[101/138] Judging: OOD_013
INFO:exp_v4_naive_baseline:[102/138] Judging: OOD_014
INFO:exp_v4_naive_baseline:[103/138] Judging: OOD_015
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): Server disconnected without sending a response.
INFO:exp_v4_naive_baseline:[104/138] Judging: OOD_016
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[105/138] Judging: OOD_017
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[106/138] Judging: OOD_018
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[107/138] Judging: OOD_019
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[108/138] Judging: OOD_020
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[109/138] Judging: OOD_021
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[110/138] Judging: OOD_022
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[111/138] Judging: OOD_023
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[112/138] Judging: OOD_024
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[113/138] Judging: OOD_025
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[114/138] Judging: OOD_026
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[115/138] Judging: OOD_027
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[116/138] Judging: OOD_028
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[117/138] Judging: OOD_029
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[118/138] Judging: OOD_030
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[119/138] Judging: OOD_031
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[120/138] Judging: OOD_032
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[121/138] Judging: OOD_033
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[122/138] Judging: OOD_034
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[123/138] Judging: OOD_035
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[124/138] Judging: OOD_036
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[125/138] Judging: OOD_037
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[126/138] Judging: OOD_038
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[127/138] Judging: OOD_039
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[128/138] Judging: OOD_040
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[129/138] Judging: OOD_041
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[130/138] Judging: OOD_042
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[131/138] Judging: OOD_043
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[132/138] Judging: OOD_044
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[133/138] Judging: OOD_045
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[134/138] Judging: OOD_046
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[135/138] Judging: OOD_047
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[136/138] Judging: OOD_048
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[137/138] Judging: OOD_049
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:[138/138] Judging: OOD_050
ERROR:exp_common:Judge error (ollama::qwen3-coder:latest): All connection attempts failed
INFO:exp_v4_naive_baseline:Judging complete: 138 newly scored
Warning: No Gemini API Key provided.
Traceback (most recent call last):
  File "/Users/bioinfo-001/projects/Mentori/tests/experiments_v4/exp_v4_naive_baseline.py", line 323, in <module>
    main()
    ~~~~^^
  File "/Users/bioinfo-001/projects/Mentori/tests/experiments_v4/exp_v4_naive_baseline.py", line 319, in main
    asyncio.run(run_judge(args))
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/bioinfo-001/.local/share/uv/python/cpython-3.13.5-macos-aarch64-none/lib/python3.13/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ~~~~~~~~~~^^^^^^
  File "/Users/bioinfo-001/.local/share/uv/python/cpython-3.13.5-macos-aarch64-none/lib/python3.13/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/Users/bioinfo-001/.local/share/uv/python/cpython-3.13.5-macos-aarch64-none/lib/python3.13/asyncio/base_events.py", line 725, in run_until_complete
    return future.result()
           ~~~~~~~~~~~~~^^
  File "/Users/bioinfo-001/projects/Mentori/tests/experiments_v4/exp_v4_naive_baseline.py", line 291, in run_judge
    "metrics": aggregate_v4_metrics(intermediate["results"]),
               ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bioinfo-001/projects/Mentori/tests/experiments_v4/exp_v4_common.py", line 218, in aggregate_v4_metrics
    "source_coverage": compute_mean_source_coverage(results),
                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
  File "/Users/bioinfo-001/projects/Mentori/tests/experiments_v4/exp_v4_common.py", line 140, in compute_mean_source_coverage
    cov = cit.get("source_coverage")
          ^^^^^^^
AttributeError: 'str' object has no attribute 'get'
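
Note on the traceback above: `compute_mean_source_coverage` calls `.get("source_coverage")` on a value that turned out to be a `str`. One plausible reading, which the log does not confirm, is that the 36 judge errors (items 103 to 138) left an error string where a metrics dict was expected. The sketch below shows one way such an aggregation could skip malformed entries instead of crashing. It is not the project's actual code: the function name `mean_source_coverage` and the field name `citation_metrics` are hypothetical, and only the `source_coverage` key comes from the traceback.

```python
from typing import Any, Iterable, Optional


def mean_source_coverage(results: Iterable[dict[str, Any]]) -> Optional[float]:
    """Average "source_coverage" over results, skipping malformed entries.

    Returns None when no entry carries a usable numeric value.
    """
    values: list[float] = []
    for result in results:
        # "citation_metrics" is a hypothetical field name; the real key
        # used by exp_v4_common.py is not visible in the log.
        cit = result.get("citation_metrics")
        if not isinstance(cit, dict):
            # Covers the failing case from the traceback: a str (or None)
            # stored where a dict of metrics was expected.
            continue
        cov = cit.get("source_coverage")
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(cov, (int, float)) and not isinstance(cov, bool):
            values.append(float(cov))
    return sum(values) / len(values) if values else None
```

Skipping bad entries keeps the aggregate computable, but it also hides how many items were dropped. Reporting the skipped count next to the mean would make a partially failed judge pass visible in the summary.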
