HiST-LLM

Hauser, Jakob Elias

doi:10.5281/zenodo.14671248

Published January 16, 2025 | Version v1

Dataset Open

Hauser, Jakob Elias (Contact person)¹

1. Complexity Science Hub Vienna

Large Language Models' Expert-level Global History Knowledge Benchmark (HiST-LLM)

Large Language Models (LLMs) have the potential to transform humanities and social science research, yet their history knowledge and comprehension at a graduate level remain untested. Benchmarking LLMs in history is particularly challenging, given that human knowledge of history is inherently unbalanced, with more information available on Western history and recent periods. We introduce the History Seshat Test for LLMs (HiST-LLM), based on a subset of the Seshat Global History Databank, which provides a structured representation of human historical knowledge, containing 36,000 data points across 600 historical societies and over 2,700 scholarly references. This dataset covers every major world region from the Neolithic period to the Industrial Revolution and includes information reviewed and assembled by history experts and graduate research assistants. Using this dataset, we benchmark a total of seven models from the Gemini, OpenAI, and Llama families. We find that, in a four-choice format, LLMs have a balanced accuracy ranging from 33.6% (Llama-3.1-8B) to 46% (GPT-4-Turbo), outperforming random guessing (25%) but falling short of expert comprehension. LLMs perform better on earlier historical periods. Regionally, performance is more even but still better for the Americas and lowest in Oceania and Sub-Saharan Africa for the more advanced models. Our benchmark shows that while LLMs possess some expert-level historical knowledge, there is considerable room for improvement.

Dataset links

Dataset Repository (Github)

Croissant Metadata (Github)

Usage

This dataset can be used to benchmark LLMs on their expert-level history knowledge.

Loading the dataset

Using Python and pandas:

import pandas as pd
main = pd.read_parquet("Neurips_HiST-LLM.parquet")
ref = pd.read_parquet("references.parquet")

Dataset metadata

Dataset metadata is documented in the croissant.json file.

Model Fingerprints

When model fingerprints are available, we created an extra column for each model fingerprint. These columns are named using the pattern `<model-name>_<model-fingerprint>`.
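The naming pattern above can be used to locate the fingerprint columns programmatically. The following is a minimal sketch, not part of the dataset tooling: the helper `fingerprint_columns` and the model name `model-x` are hypothetical, and the toy frame only stands in for the real data. Check the actual column names in the loaded DataFrame or in croissant.json before relying on it.

```python
import pandas as pd


def fingerprint_columns(df, model_names):
    """Map each model name to the columns that start with '<model-name>_'.

    Hypothetical helper: it relies only on the documented naming pattern
    '<model-name>_<model-fingerprint>' and knows nothing about real model names.
    """
    return {
        name: [col for col in df.columns if col.startswith(f"{name}_")]
        for name in model_names
    }


# Toy frame standing in for the real dataset; these column names are made up.
toy = pd.DataFrame(
    columns=["Q", "A", "model-x", "model-x_fp_abc", "model-x_fp_def"]
)

print(fingerprint_columns(toy, ["model-x"]))
```

Matching on the prefix `<model-name>_` rather than on the bare model name keeps a column named exactly after the model out of the result.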

Column Descriptions

additional_review

Boolean. This column indicates whether data points underwent additional expert review. See Section 3.2 of the paper.

Q

The multiple choice question.

A

The expected completion of the prompt.

polity old id

ID for the polity, according to Seshat IDs.

start year str

String for when polity started existing (in BCE/CE format).

end year str

String for when polity stopped existing (in BCE/CE format).

start year int

Int for when polity started existing (in BCE/CE format).

end year int

Int for when polity stopped existing (in BCE/CE format).

name

Polity name.

nga

Natural Geographic Area for Polity.

world_region

The world region of an NGA (based on the UN regions with some modifications).

root cat

Major category of fact.

value

Value of data point.

variable

Variable of data point.

id

Request ID for OpenAI batch requests.

description

Description of the fact, provided by research assistants (RAs).
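As an illustration of how these columns might be combined, the sketch below computes balanced accuracy (the mean of per-class recall) per world region. It runs on a small made-up frame. The prediction column `toy_model_answer` is hypothetical, and the sketch assumes that a model column holds the model's chosen answer in the same format as `A`, which should be verified against the real data. It is not the evaluation code used for the reported results.

```python
import pandas as pd


def balanced_accuracy(expected, predicted):
    """Mean of per-class recall.

    For each distinct expected answer, compute the share of rows answered
    correctly, then average those shares across the answer classes.
    """
    correct = expected == predicted
    return correct.groupby(expected).mean().mean()


def balanced_accuracy_by(df, group_col, pred_col, answer_col="A"):
    """Balanced accuracy of pred_col against answer_col within each group."""
    return pd.Series(
        {
            key: balanced_accuracy(group[answer_col], group[pred_col])
            for key, group in df.groupby(group_col)
        }
    )


# Made-up rows; 'toy_model_answer' is a hypothetical prediction column.
toy = pd.DataFrame(
    {
        "world_region": ["Europe"] * 4 + ["Oceania"] * 4,
        "A": ["A", "A", "B", "B", "A", "B", "B", "B"],
        "toy_model_answer": ["A", "B", "B", "B", "A", "A", "A", "B"],
    }
)

scores = balanced_accuracy_by(toy, "world_region", "toy_model_answer")
print(scores)
```

The same helper can be pointed at other grouping columns, such as `root cat` or `nga`, assuming those names match the loaded DataFrame.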

Files

Files (25.0 MB)

Name	MD5	Size
croissant.json	55c08c1163e4dfc963644514117c5e9c	48.7 kB
Neurips_HiST-LLM.parquet	6f565f9dfb0619b75328c9b8b3084284	24.4 MB
references.parquet	3d1f8f289c458702efe390dd9906abca	569.0 kB

Additional details

Repository URL: https://github.com/seshat-db/HiST-LLM

