Jellyfish042/uncheatable_eval: Uncheatable Eval release

Jellyfish042

doi:10.5281/zenodo.11284693

Published May 24, 2024 | Version UncheatableEval-v0.1

Software Open

Uncheatable Eval assesses the language modeling capabilities of LLMs on new data from sources such as recent arXiv papers, new GitHub projects, and news articles. Because this data is brand new (e.g., from the past 1-2 weeks), it cannot have been included in the training sets of publicly released models, which avoids the effect of unintentional or intentional data leaks.

Specifically, we calculate the sum of the negative log probabilities that each model assigns to these texts. In other words, models that assign a higher probability to these texts (and so produce a lower sum) are considered better.
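The scoring idea can be illustrated with a minimal sketch. It is not the repository's implementation. The per-token probabilities below are made-up values, the function name `sum_negative_log_prob` is hypothetical, and the natural logarithm is an assumption, since the description does not state a log base or any length normalization.

```python
import math


def sum_negative_log_prob(token_probs):
    """Sum of -log(p) over the probabilities a model assigned to each token.

    Hypothetical helper for illustration; natural log is an assumption.
    """
    return -sum(math.log(p) for p in token_probs)


# Made-up probabilities that two models assign to the same three tokens.
model_a_probs = [0.5, 0.25, 0.5]
model_b_probs = [0.25, 0.125, 0.25]

score_a = sum_negative_log_prob(model_a_probs)
score_b = sum_negative_log_prob(model_b_probs)

# The model with the lower sum found the text more likely.
better = "model_a" if score_a < score_b else "model_b"
print(score_a, score_b, better)
```

In this toy case model A assigns higher probability to every token, so its sum is lower and it would rank as the better model under the metric described above.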

Files

Files (36.8 MB)

Name	Size
Jellyfish042/uncheatable_eval-UncheatableEval-v0.1.zip (md5:6642ac75a1ce4aa17fbb228c7f087a7f)	36.8 MB

Additional details

Is supplement to (Software): https://github.com/Jellyfish042/uncheatable_eval/tree/UncheatableEval-v0.1

	All versions	This version
Views	114	114
Downloads	32	32
Data volume	1.2 GB	1.2 GB

Resource type: Software

Publisher: Zenodo

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: May 24, 2024
Modified: May 24, 2024
