Jellyfish042/uncheatable_eval: Uncheatable Eval release
Description
Uncheatable Eval assesses the language modeling capabilities of LLMs on new data from various sources, such as recent arXiv papers, new GitHub projects, and news articles. Because this data is brand new (e.g., from the past 1-2 weeks), it cannot have been included in the training sets of models that were publicly released before it was collected. This avoids the impact of unintentional or intentional data leaks.
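The freshness requirement can be illustrated with a small filter. This is a minimal sketch under stated assumptions, not the project's collection code: the `Document` record, its field names, the `select_fresh` helper, and the 14-day default window are all hypothetical, and "fresh" is taken to mean "inside the recent window and published strictly after the latest release date among the models being compared".

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Document:
    # Hypothetical record for one collected text; field names are illustrative.
    source: str        # e.g. "arxiv", "github", "news"
    published: date    # publication or creation date of the text
    text: str


def select_fresh(docs, latest_model_release, today, window_days=14):
    """Keep documents from the last `window_days` days that were also published
    strictly after the latest release date among the evaluated models."""
    window_start = today - timedelta(days=window_days)
    return [
        d for d in docs
        if d.published >= window_start and d.published > latest_model_release
    ]


if __name__ == "__main__":
    docs = [
        Document("arxiv", date(2024, 3, 1), "old abstract"),
        Document("github", date(2024, 3, 20), "new README"),
        Document("news", date(2024, 3, 25), "new article"),
    ]
    fresh = select_fresh(docs, latest_model_release=date(2024, 3, 10), today=date(2024, 3, 28))
    print([d.source for d in fresh])  # ['github', 'news']
```

Comparing against the release date is a conservative choice: a model's training data cannot contain text published after the model itself was released.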
To score the models, we calculate the sum of the negative log probabilities that each model assigns to these texts; a lower sum is better. In other words, models that are more likely to generate these texts are considered better.
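The scoring rule can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes each model has already produced a probability for every successive token of the same text, and the model names and probabilities are made up. Summing negative log probabilities gives -log P(text) under each model, so the model with the lowest total ranks first.

```python
import math


def total_nll(token_probs):
    """Sum of negative natural-log probabilities a model assigned to each
    successive token of a text, i.e. -log P(text) under that model."""
    return -sum(math.log(p) for p in token_probs)


def rank_models(per_model_probs):
    """Return (model names sorted best-first by lowest total NLL, score dict)."""
    scores = {name: total_nll(probs) for name, probs in per_model_probs.items()}
    return sorted(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical per-token probabilities for the same text; the two models
    # tokenize it differently, so they produce different numbers of tokens.
    probs = {
        "model_a": [0.5, 0.25, 0.5],       # P(text) = 0.0625
        "model_b": [0.8, 0.5, 0.5, 0.5],   # P(text) = 0.1
    }
    order, scores = rank_models(probs)
    print(order)  # ['model_b', 'model_a']
```

In this example model_b wins even though it needs more tokens, because it assigns the text a higher overall probability. A per-token average would depend on how each tokenizer splits the text, whereas the total is tied to the text as a whole.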
Files
Name | Size | MD5 checksum
---|---|---
Jellyfish042/uncheatable_eval-UncheatableEval-v0.1.zip | 36.8 MB | 6642ac75a1ce4aa17fbb228c7f087a7f
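After downloading the archive, its integrity can be checked against the MD5 checksum listed above. The sketch below uses only Python's standard `hashlib`; the local filename `uncheatable_eval-UncheatableEval-v0.1.zip` is an assumption about how the download is saved, and the helper names are illustrative.

```python
import hashlib
from pathlib import Path

# MD5 checksum listed for the release archive on this record.
EXPECTED_MD5 = "6642ac75a1ce4aa17fbb228c7f087a7f"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("uncheatable_eval-UncheatableEval-v0.1.zip")  # assumed local filename
    if archive.exists():
        print("OK" if verify_archive(archive) else "MD5 mismatch")
    else:
        print(f"{archive} not found")
```

Reading in chunks keeps memory use constant regardless of archive size.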
Additional details
Related works
- Is supplement to software: https://github.com/Jellyfish042/uncheatable_eval/tree/UncheatableEval-v0.1