Published June 9, 2026 | Version v1.0
Dataset Open

HIPE-2026 Shared Task Person-Place Relation Dataset

  • 1. EPFL-DHLAB
  • 2. University of Zurich

Description

HIPE-2026 Data — v1.0

HIPE-2026 is a CLEF 2026 Evaluation Lab on the qualification of person–place relations in multilingual historical documents (Who was where, when?). This release contains the complete v1.0 dataset used during the official evaluation campaign.

📦 What's in this release

The HIPE-2026 dataset covers two evaluation domains across two test sets:

  • Domain A — Historical Newspapers (Test A): Articles in German, English, and French from the HIPE historical newspaper corpus (derived from the HIPE-2022 impresso dataset). Entity annotations and Wikidata links were created manually for HIPE-2022; person–place relations were annotated manually for HIPE-2026. Both the at and isAt relations are evaluated.
  • Domain B — Literary Works (Test B, surprise): A held-out test set of French literary and historical works from the 16th–18th centuries, included to assess out-of-domain generalization. Only the at relation is evaluated.

| Split | Domain | Languages | File(s) |
|---|---|---|---|
| Train | Newspapers | DE, EN, FR | data/newspapers/v1.0/HIPE-2026-v1.0-impresso-train-*.jsonl |
| Test | Newspapers | DE, EN, FR | data/newspapers/v1.0/HIPE-2026-v1.0-impresso-test-*.jsonl |
| Test (masked) | Newspapers | DE, EN, FR | data/newspapers/v1.0/HIPE-2026-v1.0-impresso-test_masked-*.jsonl |
| Test | Literary works | FR | data/litworks/v1.0/HIPE-2026-v1.0-surprise-test-fr.jsonl |
| Test (masked) | Literary works | FR | data/litworks/v1.0/HIPE-2026-v1.0-surprise-test_masked-fr.jsonl |
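
The file patterns in the table can be resolved with a small helper. The following is a minimal sketch, assuming the archive has been unpacked so that the `data/` directory sits directly under a root folder of your choice. The function name `find_split_files` and the `(domain, split)` keys are illustrative and are not part of the release.

```python
from pathlib import Path

# Glob patterns copied from the file table. The "*" is kept as a wildcard
# because the table does not spell out the per-file suffixes.
SPLIT_PATTERNS = {
    ("newspapers", "train"): "data/newspapers/v1.0/HIPE-2026-v1.0-impresso-train-*.jsonl",
    ("newspapers", "test"): "data/newspapers/v1.0/HIPE-2026-v1.0-impresso-test-*.jsonl",
    ("newspapers", "test_masked"): "data/newspapers/v1.0/HIPE-2026-v1.0-impresso-test_masked-*.jsonl",
    ("litworks", "test"): "data/litworks/v1.0/HIPE-2026-v1.0-surprise-test-fr.jsonl",
    ("litworks", "test_masked"): "data/litworks/v1.0/HIPE-2026-v1.0-surprise-test_masked-fr.jsonl",
}


def find_split_files(root, domain, split):
    """Return the sorted list of files for one (domain, split) pair.

    `root` is the folder that contains `data/` (an assumption about how
    the archive was unpacked). An empty list means nothing matched.
    """
    pattern = SPLIT_PATTERNS[(domain, split)]
    return sorted(Path(root).glob(pattern))
```

Because the test pattern requires a hyphen right after `test`, it does not pick up the `test_masked` files.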

For detailed information on tasks, datasets, and evaluation protocol, refer to the CLEF HIPE-2026 Shared Task Participation Guidelines.

🔬 Reproducing the official evaluation

The full campaign evaluation — including reference data, participant submissions, and the evaluation orchestrator — is available at: 👉 https://github.com/hipe-eval/hipe-2026-eval

📐 Data format

Data is distributed as UTF-8 JSON Lines (.jsonl). Each line is one document with OCR text, document metadata, and sampled person–location pairs. Prediction targets are:

  • at — evidence that a person was at a location at any time before publication (TRUE, FALSE, PROBABLE, null)
  • isAt — evidence of presence within ~one month before publication (TRUE, FALSE, null)
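
The two label sets above can be checked before a prediction file is submitted. This is a minimal sketch, assuming that labels are serialised as the uppercase strings shown and that null is read as Python `None`; the authoritative encoding is the one in the schema file. The helper name `is_valid_prediction` is hypothetical.

```python
# Allowed values as listed for each prediction target.
# Assumption: labels are uppercase strings, and JSON null becomes None.
AT_LABELS = {"TRUE", "FALSE", "PROBABLE", None}
IS_AT_LABELS = {"TRUE", "FALSE", None}


def _is_label(value, allowed):
    """True if value is a string or None and belongs to the allowed set."""
    return (value is None or isinstance(value, str)) and value in allowed


def is_valid_prediction(at, is_at, evaluate_is_at=True):
    """Check one (at, isAt) pair against the allowed label sets.

    Set evaluate_is_at=False for data where only `at` is evaluated,
    such as the literary works test set.
    """
    if not _is_label(at, AT_LABELS):
        return False
    if evaluate_is_at and not _is_label(is_at, IS_AT_LABELS):
        return False
    return True
```

Note that PROBABLE is accepted for at but not for isAt, which mirrors the two lists.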

The full schema is in schemas/hipe-2026-data.schema.json. See the README for format details, validation, and a prediction/evaluation walkthrough.
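
Since each line of a .jsonl file is one JSON document, the files can be streamed with the standard library alone. The sketch below makes no assumption about field names; it only parses each line and reports the line number when parsing fails. The function name `iter_jsonl` is illustrative.

```python
import json


def iter_jsonl(path):
    """Yield one parsed JSON document per non-empty line of a UTF-8 file."""
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                # Point at the offending line so it is easy to locate.
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({err.msg})"
                ) from err
```

Reading lazily keeps memory use flat, and the parsed documents can then be checked against schemas/hipe-2026-data.schema.json with a JSON Schema validator of your choice.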

📖 How to cite

Juri Opitz, Corina Raclé, Andrianos Michail, Matteo Romanello, Emanuela Boros, Simon Gabay, Maud Ehrmann, and Simon Clematide. 2026. Extended Overview of HIPE-2026: Evaluating Accurate and Efficient Person–Place Relation Extraction from Multilingual Historical Texts. In CLEF 2026 Working Notes, CEUR Workshop Proceedings. https://doi.org/10.5281/zenodo.20344461

📜 License

Released under CC BY-NC-SA 4.0.

HIPE-2026 is organised within the Impresso project, funded by the Swiss National Science Foundation (grant CRSII5_213585) and the Luxembourg National Research Fund (grant 17498891).

Files

hipe-eval/HIPE-2026-data-v1.0.zip (1.9 MB)
md5:16c53c21f5eec0f66790aa251823a171
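
The MD5 checksum listed for the archive can be used to confirm that a download is complete. The following is a minimal sketch using Python's `hashlib`; the function names and the local file path in the usage note are assumptions, while the expected digest is the one given in the file listing.

```python
import hashlib

# Digest copied from the file listing of this record.
EXPECTED_MD5 = "16c53c21f5eec0f66790aa251823a171"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `archive_is_intact("HIPE-2026-data-v1.0.zip")` would check a copy saved under that (assumed) local name. MD5 is suitable here for detecting corrupted downloads, not for security purposes.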
