Published December 17, 2024 | Version v0.4.7

EleutherAI/lm-evaluation-harness: v0.4.7

Description

lm-eval v0.4.7 Release Notes

This release includes several bug fixes, minor improvements to model handling, and task additions.

⚠️ Python 3.8 End of Support Notice

Python 3.8 support will be dropped in future releases as it has reached its end of life. Users are encouraged to upgrade to Python 3.9 or newer.
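As a minimal sketch of what this notice implies (illustrative only, not part of the harness itself), a script can guard against interpreters below the upcoming 3.9 floor:

```python
import sys

# Hedged sketch: a simple version guard matching the support notice
# above (Python 3.8 is end-of-life; 3.9+ will be required).
def python_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter meets the upcoming 3.9 minimum."""
    return tuple(version_info[:2]) >= (3, 9)

if not python_supported():
    raise RuntimeError(
        f"Python {sys.version_info.major}.{sys.version_info.minor} detected; "
        "please upgrade to Python 3.9 or newer."
    )
```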

Backwards Incompatibilities

Chat Template Delimiter Handling (in v0.4.6)

v0.4.6 changed how delimiters are handled when applying chat templates during request construction, which particularly affects multiple-choice tasks. The change improves compatibility with chat models by respecting their native formatting conventions.

📝 For detailed documentation, please refer to docs/chat-template-readme.md
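To illustrate why delimiter handling matters here, the following is a conceptual sketch only, not the harness's actual code: completion-style scoring inserts an explicit delimiter between context and answer choice, whereas a chat template supplies its own formatting, so no extra delimiter should be injected.

```python
# Conceptual sketch (NOT the harness's implementation) of delimiter
# handling in multiple-choice request construction.

def build_requests(context: str, choices: list[str],
                   use_chat_template: bool) -> list[str]:
    # Completion-style scoring joins context and each answer choice
    # with an explicit whitespace delimiter. With a chat template, the
    # template itself supplies formatting (role markers, turn
    # separators), so no extra delimiter is added.
    delimiter = "" if use_chat_template else " "
    return [context + delimiter + choice for choice in choices]
```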

New Benchmarks & Tasks

  • Basque Integration: Added Basque translation of PIQA (piqa_eu) to BasqueBench by @naiarapm in #2531
  • SCORE Tasks: Added new subtask for non-greedy robustness evaluation by @rimashahbazyan in #2558

The release also includes several minor fixes and changes to existing tasks, as noted by the incrementing of task versions.

Thanks, the LM Eval Harness team (@baberabb and @lintangsutawika)

What's Changed

  • Score tasks by @rimashahbazyan in https://github.com/EleutherAI/lm-evaluation-harness/pull/2452
  • Filters bugfix; add metrics and filter to logged sample by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/2517
  • skip casting if predict_only by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/2524
  • make utility function to handle until by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/2518
  • Update Unitxt task to use locally installed unitxt and not download Unitxt code from Huggingface by @yoavkatz in https://github.com/EleutherAI/lm-evaluation-harness/pull/2514
  • add Basque translation of PIQA (piqa_eu) to BasqueBench by @naiarapm in https://github.com/EleutherAI/lm-evaluation-harness/pull/2531
  • avoid timeout errors with high concurrency in api_model by @dtrawins in https://github.com/EleutherAI/lm-evaluation-harness/pull/2307
  • Update README.md by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/2534
  • better doc_to_test testing by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/2535
  • Support pipeline parallel with OpenVINO models by @sstrehlk in https://github.com/EleutherAI/lm-evaluation-harness/pull/2349
  • Super little tiny fix doc by @fzyzcjy in https://github.com/EleutherAI/lm-evaluation-harness/pull/2546
  • [API] left truncate for generate_until by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/2554
  • Update Lightning import by @maanug-nv in https://github.com/EleutherAI/lm-evaluation-harness/pull/2549
  • add optimum-intel ipex model by @yao-matrix in https://github.com/EleutherAI/lm-evaluation-harness/pull/2566
  • add warning to readme by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/2568
  • Adding new subtask to SCORE tasks: non greedy robustness by @rimashahbazyan in https://github.com/EleutherAI/lm-evaluation-harness/pull/2558
  • batch loglikelihood_rolling across requests by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/2559
  • fix DeprecationWarning: invalid escape sequence '\s' for whitespace filter by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/2560
  • increment version to 4.6.7 by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/2574

New Contributors

  • @rimashahbazyan made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/2452
  • @naiarapm made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/2531
  • @dtrawins made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/2307
  • @sstrehlk made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/2349
  • @fzyzcjy made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/2546
  • @maanug-nv made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/2549
  • @yao-matrix made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/2566

Full Changelog: https://github.com/EleutherAI/lm-evaluation-harness/compare/v0.4.6...v0.4.7

Files (3.5 MB)

  • EleutherAI/lm-evaluation-harness-v0.4.7.zip (3.5 MB, md5:379d6e427a03e6cdaa9de8979e60fd74)
