Published August 31, 2025 | Version v0.2.0
Software | Open Access

RISE-UNIBAS/humanities_data_benchmark

Description

This repository contains benchmark datasets (images), prompts, ground truths, and evaluation scripts for assessing the performance of large language models (LLMs) on humanities-related tasks. The suite is designed as a resource for researchers and practitioners interested in systematically evaluating how well various LLMs perform on digital humanities (DH) tasks involving visual materials. For detailed test results and model comparisons, visit our results dashboard at https://rise-unibas.github.io/humanities_data_benchmark/.
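The evaluation scripts themselves live in the repository; as a rough illustration of the kind of scoring such a suite performs, the sketch below compares a model-generated transcription of a benchmark image against a ground-truth transcription using a simple character error rate. All names, values, and file layouts here are hypothetical and do not reflect the repository's actual API.

```python
# Hypothetical sketch (not the repository's actual API): score a model's
# transcription of a benchmark image against its ground-truth text using
# character error rate (edit distance normalized by reference length).

def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def character_error_rate(prediction: str, ground_truth: str) -> float:
    """CER = edit distance divided by the length of the ground truth."""
    if not ground_truth:
        return 0.0 if not prediction else 1.0
    return levenshtein(prediction, ground_truth) / len(ground_truth)


if __name__ == "__main__":
    # In a real run, `prediction` would come from an LLM prompted with a
    # benchmark image, and `ground_truth` from the dataset's reference file.
    prediction = "Lettre de M. Dupont, 12 mars 1887"
    ground_truth = "Lettre de M. Dupont, 12 mai 1887"
    print(f"CER: {character_error_rate(prediction, ground_truth):.3f}")
```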

Notes

If you use this software, please cite it using the metadata from this file.

Files (100.6 MB)

RISE-UNIBAS/humanities_data_benchmark-v0.2.0.zip
Size: 100.6 MB
md5: 0ee2c8e5b11294088099908ec9a090d9

Additional details

Related works