Published June 13, 2024 | Version v1
Software | Open

Autonomous Assessment of LLM Truth Maintenance in Formal Translation Tasks without Human Labeling: Dynamic Datasets, Assessment Paradigms, and End-to-End Benchmarks

Authors/Creators

Description

Code and datasets for the NeurIPS 2024 Datasets and Benchmarks track submission.

Files

Name: autoeval.zip
Size: 346.8 MB
md5: 2980cda4b6636be6538608a33ff40ea7
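
The md5 value listed for autoeval.zip can be used to confirm that a download completed intact. The following is a minimal sketch of that check, not part of the published code. The helper names `md5_of_file` and `matches_checksum` and the local path `autoeval.zip` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Checksum as listed on the record page for autoeval.zip.
EXPECTED_MD5 = "2980cda4b6636be6538608a33ff40ea7"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large archive never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected=EXPECTED_MD5):
    """Hypothetical helper: compare a file's MD5 against the expected hex string."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("autoeval.zip")  # assumed local download location
    if archive.exists():
        print("checksum OK" if matches_checksum(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually means a truncated or corrupted download. MD5 is adequate for detecting such accidental damage, but it is not a guarantee against deliberate tampering.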