Token files for DANIEL (Document Attention Network for Information Extraction and Labeling)
Description
These files are required to run the DANIEL code, which is available on GitHub and described in the paper "DANIEL: a fast document attention network for information extraction and labelling of handwritten documents" by Thomas Constum, Pierrick Tranouez, and Thierry Paquet (LITIS, University of Rouen Normandie).
The paper has been accepted for publication in the International Journal on Document Analysis and Recognition (IJDAR) and is also accessible on arXiv.
The contents of this archive must be extracted into the basic directory of the DANIEL codebase.
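The extraction step can be sketched as follows. This is a minimal sketch, assuming the archive is the subwords.zip file listed on this page and that the DANIEL repository has already been cloned locally. The function name extract_token_files and both paths in the usage comment are illustrative and are not part of the DANIEL code.

```python
import zipfile
from pathlib import Path


def extract_token_files(archive_path, codebase_root):
    """Extract the archive into <codebase_root>/basic and return the member names."""
    target = Path(codebase_root) / "basic"
    # Create the target directory if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return archive.namelist()


# Hypothetical usage; adjust both paths to your own setup:
# extract_token_files("subwords.zip", "path/to/DANIEL")
```

After extraction, the tokenizer-daniel directory and replace_dict.pkl should sit inside the basic directory, assuming the archive stores them at its top level.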
Contents of the archive:
- tokenizer-daniel: This directory contains the tokenizer used by the DANIEL model, saved in the format of the Hugging Face tokenizers library.
- replace_dict.pkl: This file contains a replacement dictionary used during the teacher forcing phase of training. It is designed to randomly substitute certain subwords with similar alternatives. Each key in the dictionary corresponds to a subword index from the DANIEL vocabulary, and each associated value is a list of indices representing the candidate subwords for replacement.
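The role of replace_dict.pkl can be illustrated with a short sketch. It assumes the pickle loads into a plain dict mapping an integer subword index to a list of integer candidate indices, as described above. The function names, the substitution probability p, and the toy dictionary are hypothetical; the actual DANIEL training code may select and apply substitutions differently.

```python
import pickle
import random


def load_replace_dict(path):
    """Load the pickled replacement dictionary (assumed shape: dict[int, list[int]])."""
    # Only unpickle files from a source you trust: pickle can run code on load.
    with open(path, "rb") as handle:
        return pickle.load(handle)


def perturb_tokens(token_ids, replace_dict, p=0.1, rng=None):
    """Return a copy of token_ids in which each replaceable index is swapped with probability p."""
    if rng is None:
        rng = random.Random()
    perturbed = []
    for index in token_ids:
        candidates = replace_dict.get(index)
        if candidates and rng.random() < p:
            # Pick one of the similar subwords listed for this index.
            perturbed.append(rng.choice(candidates))
        else:
            # No candidates, or the random draw says to keep the original.
            perturbed.append(index)
    return perturbed


# Toy dictionary standing in for replace_dict.pkl; the indices are made up.
toy_dict = {5: [6, 7], 9: [10]}
noisy = perturb_tokens([1, 5, 9, 2], toy_dict, p=0.5, rng=random.Random(0))
```

Indices without an entry in the dictionary (1 and 2 in the toy input) are always left unchanged, so only subwords with listed alternatives are ever perturbed.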
Citation Request
If you publish material based on these files, we request that you include a reference to the paper:
« Constum, T., Tranouez, P. & Paquet, T., DANIEL: a fast document attention network for information extraction and labelling of handwritten documents. IJDAR (2025). https://doi.org/10.1007/s10032-024-00511-9 »
Files
subwords.zip
Total size: 18.7 MB
Checksum | Size |
---|---|
md5:6a8120c32612b8905863b151c6dd6a73 | 17.1 kB |
md5:2a2a7f1a10222f8462891ee38e05afde | 18.6 MB |
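A downloaded file can be checked against the MD5 checksums listed above. The following is a minimal sketch using only the Python standard library. The function names are illustrative, and because the listing here does not show which checksum belongs to which file, the check accepts either value.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksums as listed on this page; the file each one belongs to is not shown.
LISTED_MD5 = {
    "6a8120c32612b8905863b151c6dd6a73",
    "2a2a7f1a10222f8462891ee38e05afde",
}


def matches_listing(path):
    """Return True if the file's MD5 digest is one of the listed checksums."""
    return md5_of_file(path) in LISTED_MD5
```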
Additional details
Related works
- Is described by: Journal article, https://link.springer.com/article/10.1007/s10032-024-00511-9
Dates
- Available: 2025-07-09
References
- Constum, T., Tranouez, P. & Paquet, T., DANIEL: a fast document attention network for information extraction and labelling of handwritten documents. IJDAR (2025). https://doi.org/10.1007/s10032-024-00511-9