HIPE-2026 Shared Task Person-Place Relation Dataset
Authors/Creators
- 1. EPFL-DHLAB
- 2. University of Zurich
Description
HIPE-2026 Data — v1.0
HIPE-2026 is a CLEF 2026 Evaluation Lab on the qualification of person–place relations in multilingual historical documents (Who was where, when?). This release contains the complete v1.0 dataset used during the official evaluation campaign.
📦 What's in this release
The HIPE-2026 dataset covers two evaluation domains across two test sets:
- Domain A (Historical Newspapers, Test A): Articles in German, English, and French from the HIPE historical newspaper corpus (derived from the HIPE-2022 impresso dataset). Entity annotations and Wikidata links were manually created for HIPE-2022; person–place relations were manually annotated for HIPE-2026. Evaluates both `at` and `isAt` relations.
- Domain B (Literary Works, Test B, surprise): A held-out test set of works of French literature and history from the 16th–18th centuries, included to assess out-of-domain generalization. Evaluates `at` only.
| Split | Domain | Languages | File(s) |
|---|---|---|---|
| Train | Newspapers | DE, EN, FR | data/newspapers/v1.0/HIPE-2026-v1.0-impresso-train-*.jsonl |
| Test | Newspapers | DE, EN, FR | data/newspapers/v1.0/HIPE-2026-v1.0-impresso-test-*.jsonl |
| Test (masked) | Newspapers | DE, EN, FR | data/newspapers/v1.0/HIPE-2026-v1.0-impresso-test_masked-*.jsonl |
| Test | Literary works | FR | data/litworks/v1.0/HIPE-2026-v1.0-surprise-test-fr.jsonl |
| Test (masked) | Literary works | FR | data/litworks/v1.0/HIPE-2026-v1.0-surprise-test_masked-fr.jsonl |
For detailed information on tasks, datasets, and evaluation protocol, refer to the CLEF HIPE-2026 Shared Task Participation Guidelines.
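The file patterns in the table above can be expanded programmatically. The following is a minimal sketch, assuming the archive has been unpacked so that the `data/` directory sits directly under some root folder; the helper name `list_split_files` and the `(domain, split)` keys are illustrative and not part of the release.

```python
from pathlib import Path

# Glob patterns copied from the table above, relative to the unpacked archive root.
# The (domain, split) keys are hypothetical labels chosen for this sketch.
SPLIT_PATTERNS = {
    ("newspapers", "train"): "data/newspapers/v1.0/HIPE-2026-v1.0-impresso-train-*.jsonl",
    ("newspapers", "test"): "data/newspapers/v1.0/HIPE-2026-v1.0-impresso-test-*.jsonl",
    ("newspapers", "test_masked"): "data/newspapers/v1.0/HIPE-2026-v1.0-impresso-test_masked-*.jsonl",
    ("litworks", "test"): "data/litworks/v1.0/HIPE-2026-v1.0-surprise-test-fr.jsonl",
    ("litworks", "test_masked"): "data/litworks/v1.0/HIPE-2026-v1.0-surprise-test_masked-fr.jsonl",
}


def list_split_files(root, domain, split):
    """Return the sorted files under `root` that match the pattern for one split."""
    pattern = SPLIT_PATTERNS[(domain, split)]
    return sorted(Path(root).glob(pattern))
```

Because the unmasked test pattern ends in `test-*`, it does not pick up the `test_masked-*` files, so the two test variants stay separate.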
🔬 Reproducing the official evaluation
The full campaign evaluation — including reference data, participant submissions, and the evaluation orchestrator — is available at: 👉 https://github.com/hipe-eval/hipe-2026-eval
📐 Data format
Data is distributed as UTF-8 JSON Lines (.jsonl). Each line is one document with OCR text, document metadata, and sampled person–location pairs. Prediction targets are:
- `at`: evidence that a person was at a location at any time before publication (`TRUE`, `FALSE`, `PROBABLE`, `null`)
- `isAt`: evidence of presence within approximately one month before publication (`TRUE`, `FALSE`, `null`)
The full schema is in schemas/hipe-2026-data.schema.json. See the README for format details, validation, and a prediction/evaluation walkthrough.
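As a rough illustration of the format described above, the sketch below reads a JSON Lines file one document at a time and checks a label pair against the value sets listed for `at` and `isAt`. It assumes that labels are serialized as strings such as `"TRUE"` and that JSON `null` maps to Python `None`; the actual field names and encodings are defined in the schema file and are deliberately not guessed here. The helper names `read_jsonl` and `labels_are_valid` are illustrative.

```python
import json

# Label sets as listed above; None stands for JSON null.
# Assumption: labels are stored as strings like "TRUE"; consult the schema to confirm.
AT_LABELS = {"TRUE", "FALSE", "PROBABLE", None}
IS_AT_LABELS = {"TRUE", "FALSE", None}


def read_jsonl(path):
    """Yield one parsed document per non-empty line of a UTF-8 JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}, line {line_number}: {err}") from err


def labels_are_valid(at, is_at):
    """Check an (at, isAt) label pair against the value sets listed above."""
    return at in AT_LABELS and is_at in IS_AT_LABELS
```

For full validation, the schema in schemas/hipe-2026-data.schema.json remains the reference; this check only covers the label vocabulary.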
📖 How to cite
Juri Opitz, Corina Raclé, Andrianos Michail, Matteo Romanello, Emanuela Boros, Simon Gabay, Maud Ehrmann, and Simon Clematide. 2026. Extended Overview of HIPE-2026: Evaluating Accurate and Efficient Person–Place Relation Extraction from Multilingual Historical Texts. In CLEF 2026 Working Notes, CEUR Workshop Proceedings. https://doi.org/10.5281/zenodo.20344461
📜 License
Released under CC BY-NC-SA 4.0.
HIPE-2026 is organised within the Impresso project, funded by the Swiss National Science Foundation (grant CRSII5_213585) and the Luxembourg National Research Fund (grant 17498891).
Files
| Name | Size | Checksum |
|---|---|---|
| hipe-eval/HIPE-2026-data-v1.0.zip | 1.9 MB | md5:16c53c21f5eec0f66790aa251823a171 |
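The MD5 checksum in the file listing above can be used to confirm that a download is intact. A minimal sketch using Python's standard `hashlib` follows; the local file name passed to `md5_of_file` is an assumption about where the archive was saved.

```python
import hashlib

# Checksum published in the file listing above.
EXPECTED_MD5 = "16c53c21f5eec0f66790aa251823a171"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing `md5_of_file("HIPE-2026-data-v1.0.zip")` with `EXPECTED_MD5` should yield `True` for an uncorrupted copy of the v1.0 archive.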
Additional details
Related works
- Is supplement to: Software, https://github.com/hipe-eval/HIPE-2026-data/tree/v1.0
Software
- Repository URL: https://github.com/hipe-eval/HIPE-2026-data