Published May 24, 2024 | Version UncheatableEval-v0.1
Software Open

Jellyfish042/uncheatable_eval: Uncheatable Eval release

Description

Uncheatable Eval assesses the language modeling capabilities of LLMs on new data drawn from sources such as recent arXiv papers, new GitHub projects, and news articles, among others. Because this data is brand new (e.g., from the past 1-2 weeks), it cannot have been included in the training sets of already released public models, which rules out both unintentional and intentional data leakage.

Specifically, we calculate the sum of the negative log probabilities that each model assigns to these texts. In other words, a model that assigns a higher probability to these texts, and therefore has a lower sum, is considered better.
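
As a rough illustration of the scoring rule described above, the following minimal sketch sums negative log probabilities for two models on the same text. The function name `sum_negative_log_prob` and the per-token probabilities are hypothetical and chosen only for the example; this is not the code shipped in the release, and a real evaluation would obtain the probabilities from each model's output distribution.

```python
import math


def sum_negative_log_prob(token_probs):
    """Sum of negative log probabilities (in nats) over a token sequence.

    token_probs: probabilities a model assigned to each actual next token.
    """
    return -sum(math.log(p) for p in token_probs)


# Hypothetical per-token probabilities from two models on the same text.
model_a_probs = [0.5, 0.25, 0.5]
model_b_probs = [0.25, 0.125, 0.25]

score_a = sum_negative_log_prob(model_a_probs)
score_b = sum_negative_log_prob(model_b_probs)

# A lower sum means the model found the text more likely, so it ranks higher.
better = "model_a" if score_a < score_b else "model_b"
print(f"model_a: {score_a:.4f}  model_b: {score_b:.4f}  better: {better}")
```

Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on long texts, and the ranking is the same either way.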

Files

Jellyfish042/uncheatable_eval-UncheatableEval-v0.1.zip

Files (36.8 MB)
