RISE-UNIBAS/humanities_data_benchmark

Hindermann, Maximilian; Marti, Sorin; Alberto, Anthea; Binnenkade, Alexandra; Bosse, Arno; Burkhardt, Sven; Decker, Eric; Frick, Pema; Kasper, Lea; Lienhard, Sven; Losada Palenzuela, José Luis; Müller, Gabriel; Serif, Ina; Spadini, Elena; Wullschleger, Tabea

doi:10.5281/zenodo.19204070

Published March 24, 2026 | Version v0.5.0

Resource type: Software | Access: Open

This repository contains benchmark datasets (images and text), prompts, ground truths, and evaluation scripts for assessing the performance of large language models (LLMs) on humanities-related tasks. The suite is intended for researchers and practitioners who want to evaluate systematically how well different LLMs handle digital humanities (DH) tasks involving visual and text-like materials. Detailed test results and model comparisons are available on the results dashboard at https://rise-services.rise.unibas.ch/benchmarks/.

Notes

If you use this software, please cite it using the metadata from this file.

Files