Published July 9, 2025 | Version v1

Pretrained weights for Document Attention Network for Information Extraction and Labelling of handwritten documents

  • Université de Rouen Normandie

Description

This repository contains the pretrained weights for DANIEL, whose codebase is available on GitHub and which is described in the paper DANIEL: a fast document attention network for information extraction and labelling of handwritten documents, authored by Thomas Constum, Pierrick Tranouez, and Thierry Paquet (LITIS, University of Rouen Normandie).

The paper has been accepted for publication in the International Journal on Document Analysis and Recognition (IJDAR) and is also accessible on arXiv.

This project is licensed under a custom Research Usage Only (RUO) license. Please refer to the license file LICENSE for more details.

The contents of this archive should be extracted into the outputs/ directory of the DANIEL codebase.
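As a minimal sketch of the extraction step, assuming the DANIEL repository was cloned to ./DANIEL (a hypothetical path) and the archive was downloaded to the current directory:

```shell
# Extract the weights archive into the outputs/ directory of the DANIEL codebase.
# Uses Python's standard-library zipfile CLI, which creates target directories
# as needed; plain `unzip -d` works equally well if installed.
python3 -m zipfile -e daniel_pretrained_weights.zip DANIEL/outputs/

# Each extracted folder should follow the daniel_<datasetname>_strategy_X convention.
ls DANIEL/outputs/
```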

Each folder in the archive follows the naming convention: daniel_<datasetname>_strategy_X, where:

  • <datasetname> refers to the target dataset used during training,

  • strategy_X refers to the specific training strategy applied to obtain the corresponding model weights.
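The naming convention above can be parsed mechanically, for instance when scripting over several weight folders. A sketch using shell parameter expansion; the folder name daniel_read_2016_strategy_A is a hypothetical example that follows the stated convention:

```shell
# Split a weights folder name into its dataset and strategy components.
name="daniel_read_2016_strategy_A"

dataset="${name#daniel_}"          # strip the "daniel_" prefix
dataset="${dataset%_strategy_*}"   # strip the "_strategy_X" suffix
strategy="${name##*_strategy_}"    # keep only the strategy letter

echo "$dataset"   # read_2016
echo "$strategy"  # A
```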

For a detailed explanation of the training strategies, please refer to the DANIEL paper.

Selecting pre-trained weights for transfer learning

When performing transfer learning, choosing the right pre-trained weights is crucial for achieving optimal results. Below are the recommended weight options based on your dataset and annotation availability:

1. daniel_iam_ner_strategy_A_custom_split

  • Training Data: Trained on all synthetic datasets and real datasets except M-POPP.
  • Best Use Case: Suitable when only a small amount of annotated data is available in the target dataset.
  • Attention Granularity: 32-pixel vertical granularity, meaning the encoder’s output feature map has a height of H/32 (where H is the input image height).

2. daniel_multi_synth

  • Training Data: Trained exclusively on synthetic datasets excluding M-POPP, with no real data. Used to initialize fine-tuning strategies A and B for the IAM/IAM NER, RIMES 2009, and READ 2016 datasets.
  • Best Use Case: Suitable for modern document datasets with several thousand annotated pages.
  • Attention Granularity: 32-pixel vertical granularity (H/32).

Citation Request

If you publish material based on these weights, we ask that you include a reference to the paper:

Constum, T., Tranouez, P. & Paquet, T., DANIEL: a fast document attention network for information extraction and labelling of handwritten documents. IJDAR (2025). https://doi.org/10.1007/s10032-024-00511-9

Files (8.7 GB)

  • daniel_pretrained_weights.zip (8.7 GB), md5:017dfc139125090d3a02ae9688e9a211
  • (filename not shown) (17.1 kB), md5:6a8120c32612b8905863b151c6dd6a73

Additional details

Dates

Available: 2025-07-09
