Datasets for "Reading Order Independent Metrics for Information Extraction in Handwritten Documents"
Description
This repository includes the five datasets used in our paper "Reading Order Independent Metrics for Information Extraction in Handwritten Documents", in which we compare several metrics for evaluating end-to-end information extraction from scanned documents.
Datasets
Five datasets are released following the BIO format:
- IAM
- Simara
- POPP
- Esposalles
- French Military Records
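As a reminder of what BIO tagging encodes, the sketch below groups tagged tokens into entities: `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. The function name `bio_to_entities`, the example tokens, and the labels `name` and `location` are hypothetical illustrations; the exact file layout and label set of each dataset are not described here, so check the archive before relying on this.

```python
from typing import List, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) pairs, in reading order."""
    entities: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []

    def close_entity() -> None:
        # Store the open entity, if any, and reset the buffer.
        nonlocal current_label, current_tokens
        if current_label is not None:
            entities.append((current_label, " ".join(current_tokens)))
        current_label, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            close_entity()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag with the same label extends the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not match the open entity:
            # close the open entity and skip this token.
            close_entity()
    close_entity()
    return entities


# Hypothetical example (tokens and labels are not taken from the datasets).
example_tokens = ["Maria", "Lopez", "married", "in", "Barcelona"]
example_tags = ["B-name", "I-name", "O", "O", "B-location"]
example_entities = bio_to_entities(example_tokens, example_tags)
```

Because entities are returned as a list in reading order, shuffling that list is one way to picture what distinguishes order-dependent from order-independent evaluation.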
For each dataset, we provide the following data (on test sets):
- Ground truth annotations (gt/)
- Automatic predictions (dan/)
- Automatic predictions with entities appearing in random order (dan_shuffled/)
The data is organized as follows:
├── Dataset name/
│ ├── gt/
│ ├── dan/
│ └── dan_shuffled/
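After extracting the archive, a quick sanity check of this layout can catch incomplete downloads. The following is a minimal sketch, assuming that prediction files in `dan/` and `dan_shuffled/` carry the same file names as their ground truth counterparts in `gt/`; that naming convention is an assumption, and `check_dataset_layout` is a hypothetical helper, not part of any released tool.

```python
from pathlib import Path
from typing import Dict, List

# Sub-folders expected inside every dataset folder.
SUBDIRS = ("gt", "dan", "dan_shuffled")


def check_dataset_layout(dataset_dir: str) -> Dict[str, List[str]]:
    """Return, for each prediction folder, the gt/ file names it lacks."""
    root = Path(dataset_dir)
    names = {}
    for sub in SUBDIRS:
        folder = root / sub
        if not folder.is_dir():
            raise FileNotFoundError(f"Missing folder: {folder}")
        names[sub] = {p.name for p in folder.iterdir() if p.is_file()}
    # Assumption: predictions share file names with the ground truth.
    return {
        sub: sorted(names["gt"] - names[sub])
        for sub in ("dan", "dan_shuffled")
    }
```

An empty list for both keys means every ground truth file has a matching prediction file under that assumption.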
Metrics
To install the ie-eval package, run `pip install ie-eval`.
To compute all metrics on a specific dataset, run: `ie-eval all --label-dir IAM_paragraph/gt/ --prediction-dir IAM_paragraph/dan/`
To learn more about the various options, use the `--help` argument or read the documentation.
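To compare scores on the ordered and shuffled predictions, the same command can be run once per prediction folder. The sketch below only reuses the `ie-eval all --label-dir ... --prediction-dir ...` invocation shown above; the helper names `build_ie_eval_command` and `evaluate_both` are hypothetical, and the script assumes `ie-eval` is installed and on the `PATH`.

```python
import subprocess
from typing import List


def build_ie_eval_command(dataset_dir: str, prediction_subdir: str = "dan") -> List[str]:
    """Build the ie-eval call for one dataset and one prediction folder."""
    root = dataset_dir.rstrip("/")
    return [
        "ie-eval", "all",
        "--label-dir", f"{root}/gt/",
        "--prediction-dir", f"{root}/{prediction_subdir}/",
    ]


def evaluate_both(dataset_dir: str) -> None:
    """Run ie-eval on the ordered and the shuffled predictions."""
    for subdir in ("dan", "dan_shuffled"):
        # check=True raises CalledProcessError if ie-eval exits with an error.
        subprocess.run(build_ie_eval_command(dataset_dir, subdir), check=True)
```

For example, `evaluate_both("IAM_paragraph")` would run the command above and then repeat it with `IAM_paragraph/dan_shuffled/` as the prediction directory.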
Files
Name | Size | Checksum |
---|---|---|
datasets.zip | 2.9 MB | md5:cae039515544710f6b239763c56172a3 |
(name not listed) | 1.3 kB | md5:a71ce0bf6fd95faae5354c31909a2c5e |
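The MD5 checksums listed above can be used to confirm that a download is intact. This is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the local path `datasets.zip` in the usage note assumes the archive was saved under its original name.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For the 2.9 MB archive, the check would be `matches_checksum("datasets.zip", "md5:cae039515544710f6b239763c56172a3")`.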