EleutherAI/lm-evaluation-harness: v0.4.7
Creators
- Lintang Sutawika (@EleutherAI)
- Hailey Schoelkopf
- Leo Gao
- Baber Abbasi
- Stella Biderman (Booz Allen Hamilton, EleutherAI)
- Jonathan Tow
- ben fattori
- Charles Lovering
- farzanehnakhaee70
- Jason Phang
- Anish Thite (@ClarosAI)
- Fazz
- Aflah (Max Planck Institute for Software Systems: MPI SWS)
- Niklas Muennighoff
- Thomas Wang (MistralAI)
- sdtblck
- nopperl
- gakada
- tttyuntian
- researcher2
- Julen Etxaniz (Hitz Zentroa UPV/EHU)
- Chris (@azurro)
- Hanwool Albert Lee (NCSOFT)
- Leonid Sinev
- Zdeněk Kasner (Charles University)
- Khalid
- KonradSzafer
- Jeffrey Hsu (Ivy Natal)
- Anjor Kanekar (Platypus Tech)
- Pawan Sasanka Ammanamanchi
Description
lm-eval v0.4.7 Release Notes
This release includes several bug fixes, minor improvements to model handling, and task additions.
⚠️ Python 3.8 End of Support Notice
Support for Python 3.8 will be dropped in upcoming releases, as Python 3.8 has reached end of life. Users are encouraged to upgrade to Python 3.9 or newer.
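If your launch scripts still pin Python 3.8, a small guard such as the sketch below (illustrative only, not part of the harness) can fail fast with an actionable message:

```python
# Minimal sketch: fail fast on an end-of-life interpreter before the
# harness itself starts breaking on newer syntax or dependencies.
import sys

if sys.version_info < (3, 9):
    raise RuntimeError(
        f"Python {sys.version_info.major}.{sys.version_info.minor} detected; "
        "lm-eval is dropping Python 3.8 support, please upgrade to 3.9+."
    )
```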
Backwards Incompatibilities
Chat Template Delimiter Handling (in v0.4.6)
An important modification has been made to how delimiters are handled when applying chat templates in request construction, particularly affecting multiple-choice tasks. This change ensures better compatibility with chat models by respecting their native formatting conventions.
📝 For detailed documentation, please refer to docs/chat-template-readme.md
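As a rough illustration of why this matters (a sketch using the transformers API, not the harness's internal request-construction code; the model and strings are placeholders): the chat template itself supplies formatting around the context, so the harness now lets that formatting stand rather than unconditionally inserting the task's plain-text delimiter between context and continuation.

```python
# Illustrative sketch only; the harness's actual request construction
# lives inside lm_eval, not in this snippet.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

context = "Question: Which liquid is best for soaking stains?\nAnswer:"
choice = " soapy water"  # multiple-choice target, classically joined by a leading space

# The chat template wraps the context in the model's own role tags and
# whitespace, so the template's conventions, rather than the task's default
# delimiter, now determine how the continuation is joined.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": context}],
    tokenize=False,
    add_generation_prompt=True,
)
print(repr(prompt), repr(choice))
```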
New Benchmarks & Tasks
- Basque Integration: Added Basque translation of PIQA (piqa_eu) to BasqueBench by @naiarapm in #2531
- SCORE Tasks: Added new subtask for non-greedy robustness evaluation by @rimashahbazyan in #2558
This release also includes several small fixes and changes to existing tasks, as reflected in their incremented version numbers.
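For example, the new Basque task can be run like any other through the harness's Python API (a minimal sketch; the model choice here is illustrative, any HF causal LM works):

```python
# Minimal sketch: evaluate the newly added piqa_eu task via lm_eval's
# Python entry point. The pretrained model is just an example.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["piqa_eu"],
    batch_size=8,
)
print(results["results"]["piqa_eu"])
```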
Thanks, the LM Eval Harness team (@baberabb and @lintangsutawika)
What's Changed
- Score tasks by @rimashahbazyan in https://github.com/EleutherAI/lm-evaluation-harness/pull/2452
- Filters bugfix; add `metrics` and `filter` to logged sample by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/2517
- skip casting if predict_only by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/2524
- make utility function to handle `until` by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/2518
- Update Unitxt task to use locally installed unitxt and not download Unitxt code from Huggingface by @yoavkatz in https://github.com/EleutherAI/lm-evaluation-harness/pull/2514
- add Basque translation of PIQA (piqa_eu) to BasqueBench by @naiarapm in https://github.com/EleutherAI/lm-evaluation-harness/pull/2531
- avoid timeout errors with high concurrency in api_model by @dtrawins in https://github.com/EleutherAI/lm-evaluation-harness/pull/2307
- Update README.md by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/2534
- better doc_to_test testing by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/2535
- Support pipeline parallel with OpenVINO models by @sstrehlk in https://github.com/EleutherAI/lm-evaluation-harness/pull/2349
- Super little tiny fix doc by @fzyzcjy in https://github.com/EleutherAI/lm-evaluation-harness/pull/2546
- [API] left truncate for generate_until by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/2554 (see the sketch after this list)
- Update Lightning import by @maanug-nv in https://github.com/EleutherAI/lm-evaluation-harness/pull/2549
- add optimum-intel ipex model by @yao-matrix in https://github.com/EleutherAI/lm-evaluation-harness/pull/2566
- add warning to readme by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/2568
- Adding new subtask to SCORE tasks: non greedy robustness by @rimashahbazyan in https://github.com/EleutherAI/lm-evaluation-harness/pull/2558
- batch `loglikelihood_rolling` across requests by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/2559
- fix `DeprecationWarning: invalid escape sequence '\s'` for whitespace filter by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/2560
- increment version to 4.6.7 by @baberabb in https://github.com/EleutherAI/lm-evaluation-harness/pull/2574
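On the left-truncation change (#2554) above: the idea, sketched below under assumed names (this is not the harness's internal code), is that when a prompt exceeds the context window, tokens are dropped from the left so the most recent context survives and room for generation is preserved.

```python
# Hypothetical helper illustrating left-truncation for generate_until-style
# requests; the name and signature are assumptions, not the harness's API.
def left_truncate(token_ids: list[int], max_ctx_len: int, max_gen_toks: int) -> list[int]:
    # Reserve room for generation, then keep only the rightmost tokens.
    budget = max_ctx_len - max_gen_toks
    return token_ids[-budget:] if len(token_ids) > budget else token_ids

# e.g. a 4096-token window with 256 tokens reserved for generation keeps
# the last 3840 prompt tokens.
assert len(left_truncate(list(range(5000)), 4096, 256)) == 3840
```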
New Contributors
- @rimashahbazyan made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/2452
- @naiarapm made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/2531
- @dtrawins made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/2307
- @sstrehlk made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/2349
- @fzyzcjy made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/2546
- @maanug-nv made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/2549
- @yao-matrix made their first contribution in https://github.com/EleutherAI/lm-evaluation-harness/pull/2566
Full Changelog: https://github.com/EleutherAI/lm-evaluation-harness/compare/v0.4.6...v0.4.7
Files
- EleutherAI/lm-evaluation-harness-v0.4.7.zip (3.5 MB, md5:379d6e427a03e6cdaa9de8979e60fd74)
Additional details
Related works
- Is supplement to: https://github.com/EleutherAI/lm-evaluation-harness/tree/v0.4.7 (Software)
Software
- Repository URL: https://github.com/EleutherAI/lm-evaluation-harness