Published August 21, 2025 | Version 0.1
Dataset Open

The Basel Land Records Ground Truth: An Annotated Dataset for Information Extraction on German-language Administrative Records.

Description

We present a dataset based on the Historical Land Records of Basel, covering the period 1400–1700. The dataset comprises 829 source excerpts in premodern German containing more than 50,000 tokens and 30,000 annotations. The annotations capture nested entities, events, and relations, reflecting complex interactions between actors and properties. Over two-thirds of entity references are nested within others, providing rich material for training and evaluating models in nested sequence tagging, low-resource named entity recognition, and noise-tolerant NLP. The dataset may also support the development of generalized models for premodern German. The dataset is hosted on Zenodo and provided in TEI and XML formats.
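To make the nesting statistic concrete, the following minimal sketch counts how many entity elements in an XML tree sit inside another entity element. It rests on assumptions: the tag names `doc` and `entity`, the `type` attribute, and the sample text are invented for illustration and are not taken from the dataset's actual TEI schema, which should be consulted before adapting this.

```python
import xml.etree.ElementTree as ET

# Invented toy excerpt. The tags <doc> and <entity> are hypothetical
# placeholders, not the element names used in the dataset.
SAMPLE = """<doc>
  <entity type="person">Hans <entity type="place">von Basel</entity></entity>
  <entity type="property">das hus <entity type="place">an der gassen</entity></entity>
  <entity type="person">Peter</entity>
</doc>"""


def nested_fraction(root, tag="entity"):
    """Return the share of `tag` elements that have an ancestor with the same tag."""
    total = 0
    nested = 0

    def walk(node, inside):
        nonlocal total, nested
        is_entity = node.tag == tag
        if is_entity:
            total += 1
            if inside:
                nested += 1
        for child in node:
            # Children count as nested if this node or any ancestor is an entity.
            walk(child, inside or is_entity)

    walk(root, False)
    return nested / total if total else 0.0


root = ET.fromstring(SAMPLE)
fraction = nested_fraction(root)
print(f"{fraction:.2f} of entity elements are nested")
```

Real TEI files normally declare an XML namespace, so the `tag` argument would need the namespace-qualified element name when this is applied to actual data.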

The source corpus from which this dataset was created, the Historical Land Records of Basel, has long been a rich source of information for researchers in history, economics, and sociology.

This dataset was created as part of the research project Economies of Space. The project investigates strategies for accessing a large-scale corpus such as the central Historical Land Records of Basel, dealing with premodern German, and producing analyses from a historian's perspective.

Files

benasch.zip

Files (5.9 MB)

Checksum Size
md5:b3c9bca64979945744462a1d16491285 2.7 MB
md5:8daaba4e555a6df6f17a268520a6bb1e 208.5 kB
md5:833bbccd1f592ed54913cc80ba05cebe 1.6 MB
md5:e2bdb04f1fd57a463abb2e229c2f8afc 1.5 MB
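
The MD5 checksums listed above can be used to confirm that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_listing` are illustrative, and because the file names are not shown in the listing, the demonstration runs on a temporary file rather than on a real download.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listing entry such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demonstration on a throwaway file; replace the path and the listing entry
# with a downloaded file and its checksum from the record.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.bin"
    demo.write_bytes(b"hello")
    demo_ok = matches_listing(demo, "md5:" + hashlib.md5(b"hello").hexdigest())

print(demo_ok)
```

Reading in chunks keeps memory use constant regardless of file size. MD5 is adequate for detecting accidental corruption during download, though it is not a safeguard against deliberate tampering.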

Additional details

Related works

Is described by
Data paper: 10.5334/johd.387 (DOI)
Is part of
Poster: 10.5281/zenodo.13908083 (DOI)
Is supplement to
Presentation: 10.5281/zenodo.11500543 (DOI)