Datasets for "Reading Order Independent Metrics for Information Extraction in Handwritten Documents"

doi:10.5281/zenodo.11083657

Published April 29, 2024 | Version v1

Dataset Open

TARRIDE, Solène (Data curator)¹

1. TEKLIA

This repository contains the five datasets used in our paper "Reading Order Independent Metrics for Information Extraction in Handwritten Documents", in which we compare several metrics for evaluating end-to-end information extraction from scanned documents.

Datasets

Five datasets are released following the BIO format:

IAM
Simara
POPP
Esposalles
French Military Records
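
As a rough illustration of what BIO tagging encodes, the sketch below groups tagged tokens into entities. It assumes one whitespace-separated "token TAG" pair per line, with tags of the form B-label, I-label, or O. That layout, the function name bio_to_entities, and the sample labels are assumptions made for the example; the files in the archive may be organized differently.

```python
from typing import List, Optional, Tuple


def bio_to_entities(lines: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) pairs.

    Assumed input: one "token TAG" pair per line (hypothetical layout).
    """
    entities: List[Tuple[str, str]] = []
    label: Optional[str] = None
    tokens: List[str] = []

    def flush() -> None:
        # Close the entity currently being built, if any.
        nonlocal label, tokens
        if label is not None:
            entities.append((label, " ".join(tokens)))
        label, tokens = None, []

    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            # Blank or malformed line: end the current entity.
            flush()
            continue
        token, tag = parts[0], parts[-1]
        if tag.startswith("B-"):
            # B- always starts a new entity.
            flush()
            label, tokens = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            # I- continues the entity only if the label matches.
            tokens.append(token)
        else:
            # O tag, or an I- tag without a matching B-: outside any entity.
            flush()
    flush()
    return entities


sample = [
    "Jean B-firstname",
    "Pierre I-firstname",
    "Dupont B-surname",
    "farmer O",
    "Paris B-location",
]
print(bio_to_entities(sample))
```

In this sketch, an I- tag that does not continue a matching B- tag is dropped; a stricter reader might raise an error instead.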

For each dataset, we provide the following data (on test sets):

Ground truth annotations (gt/)
Automatic predictions (dan/)
Automatic predictions with entities appearing in random order (dan_shuffled/)

The data is organized as follows:

├── Dataset name/
│ ├── gt/
│ ├── dan/
│ └── dan_shuffled/
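
After extracting the archive, a quick check that every dataset folder contains the three subdirectories shown above can catch an incomplete download. The following is a minimal sketch; the helper name missing_subdirs and the extraction root passed to it are assumptions, not part of the release.

```python
from pathlib import Path
from typing import Dict, List

# Subdirectories each dataset folder is expected to contain.
EXPECTED_SUBDIRS = ("gt", "dan", "dan_shuffled")


def missing_subdirs(root: Path) -> Dict[str, List[str]]:
    """Map each dataset folder under `root` to the subdirectories it lacks."""
    report: Dict[str, List[str]] = {}
    for dataset_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        report[dataset_dir.name] = [
            name for name in EXPECTED_SUBDIRS if not (dataset_dir / name).is_dir()
        ]
    return report
```

An empty list for every dataset means the layout matches the tree above; for example, `missing_subdirs(Path("datasets"))` would inspect a folder named datasets, a hypothetical extraction path.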

Metrics

To install the ie-eval package, run:

pip install ie-eval

To compute all metrics on a specific dataset, run:

ie-eval all --label-dir IAM_paragraph/gt/ --prediction-dir IAM_paragraph/dan/

To learn more about the various options, use the --help argument or read the documentation.
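
The command above scores the dan/ predictions. To compare them with the shuffled predictions, the same invocation can be pointed at dan_shuffled/. The sketch below only builds the two command lines for one dataset folder, using the flags shown above; the helper name build_commands is hypothetical, and actually running the commands requires ie-eval to be installed.

```python
import shlex
from pathlib import Path
from typing import List


def build_commands(dataset_dir: Path) -> List[List[str]]:
    """Build one `ie-eval all` command per prediction folder of a dataset."""
    label_dir = (dataset_dir / "gt").as_posix() + "/"
    commands = []
    for prediction_name in ("dan", "dan_shuffled"):
        prediction_dir = (dataset_dir / prediction_name).as_posix() + "/"
        commands.append(
            [
                "ie-eval",
                "all",
                "--label-dir",
                label_dir,
                "--prediction-dir",
                prediction_dir,
            ]
        )
    return commands


for command in build_commands(Path("IAM_paragraph")):
    # Print the command; pass the list to subprocess.run(command, check=True)
    # to execute it once ie-eval is installed.
    print(shlex.join(command))
```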

Files

Name	MD5	Size
datasets.zip	cae039515544710f6b239763c56172a3	2.9 MB
README.md	a71ce0bf6fd95faae5354c31909a2c5e	1.3 kB
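
The MD5 checksums listed above can be used to confirm that a download is intact. A minimal sketch using Python's standard hashlib module follows; the helper names md5_of and verify are hypothetical, and the files are assumed to keep their original names after download.

```python
import hashlib
from pathlib import Path

# Checksums as listed on this record.
EXPECTED_MD5 = {
    "datasets.zip": "cae039515544710f6b239763c56172a3",
    "README.md": "a71ce0bf6fd95faae5354c31909a2c5e",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Check a downloaded file against the checksum listed for its name."""
    return md5_of(path) == EXPECTED_MD5[path.name]
```

For example, `verify(Path("datasets.zip"))` returns True when the local copy matches the published checksum.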
