The Basel Land Records Ground Truth: An Annotated Dataset for Information Extraction on German-language Administrative Records.
Authors/Creators
Description
We present a dataset based on the Historical Land Records of Basel, covering the period 1400–1700. The dataset comprises 829 source excerpts in premodern German containing more than 50,000 tokens and 30,000 annotations. The annotations capture nested entities, events, and relations, reflecting complex interactions between actors and properties. Over two-thirds of entity references are nested within others, providing rich material for training and evaluating models in nested sequence tagging, low-resource named entity recognition, and noise-tolerant NLP. The dataset may also support the development of generalized models for premodern German. The dataset is stored on Zenodo, and provided in TEI and XML formats.
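To illustrate what "nested within others" means in practice, the following is a minimal sketch of how the share of nested entity references could be counted in an XML-encoded excerpt. The element name `entity`, its `type` attribute, and the sample sentence are all hypothetical and invented for illustration. They are not taken from the dataset, whose actual TEI/XML schema should be checked in the files themselves.

```python
import xml.etree.ElementTree as ET

# Invented toy excerpt; the tag and attribute names are assumptions, not the dataset's schema.
SAMPLE = """<excerpt>
<entity type="person">Hans <entity type="occupation">der Schmied</entity></entity>
verkauft
<entity type="property">das Haus <entity type="place">an der <entity type="place">Freien Strasse</entity></entity></entity>
</excerpt>"""


def nesting_stats(xml_text, tag="entity"):
    """Return (total, nested): all elements named `tag`, and those with an ancestor named `tag`."""
    root = ET.fromstring(xml_text)
    total = 0
    nested = 0

    def walk(node, inside_entity):
        nonlocal total, nested
        is_entity = node.tag == tag
        if is_entity:
            total += 1
            if inside_entity:
                nested += 1
        for child in node:
            walk(child, inside_entity or is_entity)

    walk(root, False)
    return total, nested


total, nested = nesting_stats(SAMPLE)
print(f"{nested} of {total} entity references are nested")
```

A recursive walk is used here because `xml.etree.ElementTree` elements do not store a reference to their parent, so the "inside an entity" state has to be passed down explicitly.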
The source corpus from which this dataset was created, the Historical Land Records of Basel, has long been a rich source of information for researchers in history, economics and sociology.
This dataset was created as part of the research project Economies of Space. The project investigates strategies for accessing a large-scale corpus such as the central Historical Land Records of Basel, handling premodern German, and producing analyses from a historian's perspective.
Files
benasch.zip
Additional details
Related works
- Is described by
- Data paper: 10.5334/johd.387 (DOI)
- Is part of
- Poster: 10.5281/zenodo.13908083 (DOI)
- Is supplement to
- Presentation: 10.5281/zenodo.11500543 (DOI)